
Open sourcing protobuf-to-bigquery converter

At Sardine we use protocol buffers (protobuf) everywhere. Protobuf is a lightweight, language-neutral, typed serialization format.

We use protobuf to define the schema of our BigQuery tables, and we generate backend server (Go) code from the same protobuf file. Our backend servers populate the Go struct (generated from protobuf) and send it to Pub/Sub; Dataflow jobs then pick the messages up and write them to BigQuery.

This simple pattern worked pretty well, with one caveat. When we started, there was no clear way to convert a Pub/Sub message into a BigQuery row insert, so we had to write boilerplate code like the following to copy data from one object to another:

TableRow clientMetadataRow = new TableRow()
  .set("session_key", clientMetadata.getSessionKey())
  .set("client_id", clientMetadata.getClientId())
  .set("revision", clientMetadata.getRevision())
  .set("user_id", clientMetadata.getUserId())...

This was tedious and error-prone. Whenever we added a new field to a BigQuery table, we had to update the boilerplate code above as well. There had to be a better solution.

(Side note: if you are starting a new service today, you could use JSON instead of protobuf as the serialization format for Pub/Sub, then use Google’s official Pub/Sub-to-BigQuery job template (beta): https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming)

It turns out that you can iterate over the fields of a protobuf object dynamically, so instead of hard-coding field names we could do:

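import com.google.api.services.bigquery.model.TableRow;
import com.google.protobuf.Descriptors;
import com.google.protobuf.GeneratedMessageV3;
import java.util.List;

void copyFields(GeneratedMessageV3 fromProto, TableRow toRow) {
  // Look up the field descriptors at runtime instead of hard-coding field names.
  List<Descriptors.FieldDescriptor> allFields = fromProto.getDescriptorForType().getFields();
  for (Descriptors.FieldDescriptor field : allFields) {
    // Read the value through the descriptor; the column name is the proto field name.
    Object value = fromProto.getField(field);
    String columnName = field.getName();
    switch (field.getJavaType()) {
      case STRING:
        if (value != null) {
          toRow.set(columnName, value);
        }
        break;...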

That’s it! This reduced our boilerplate code, and we no longer need to update the Java code when a field is added. (Note that you still need to redeploy the Dataflow job for any schema change so that the job runner knows about the latest proto definition.)
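To make the idea concrete, here is a minimal sketch of schema-driven copying that runs without the protobuf or BigQuery libraries. It is not the code from our repo: `FieldSpec`, `JavaType`, and `Message` are hypothetical stand-ins for `Descriptors.FieldDescriptor`, its Java type enum, and a generated message, and a plain `Map` stands in for `TableRow`. The `device` field and the handling of nested messages are assumptions added for illustration only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DescriptorCopySketch {
  // Hypothetical stand-in for the Java type reported by a field descriptor.
  enum JavaType { STRING, LONG, BOOLEAN, MESSAGE }

  // Hypothetical stand-in for a field descriptor: a field name plus its type.
  static final class FieldSpec {
    final String name;
    final JavaType type;
    FieldSpec(String name, JavaType type) { this.name = name; this.type = type; }
  }

  // Hypothetical stand-in for a message: its schema plus the values that were set.
  static final class Message {
    final List<FieldSpec> fields;
    final Map<String, Object> values = new LinkedHashMap<>();
    Message(FieldSpec... fields) { this.fields = Arrays.asList(fields); }
    Message set(String name, Object value) { values.put(name, value); return this; }
  }

  // Copies every set field into the row, driven by the schema rather than hard-coded names.
  static void copyFields(Message from, Map<String, Object> toRow) {
    for (FieldSpec field : from.fields) {
      Object value = from.values.get(field.name);
      if (value == null) {
        continue; // Unset fields are skipped, so no entry is written for that column.
      }
      switch (field.type) {
        case STRING:
        case LONG:
        case BOOLEAN:
          toRow.put(field.name, value);
          break;
        case MESSAGE:
          // Assumption: a nested message is copied recursively into a nested row.
          Map<String, Object> nested = new LinkedHashMap<>();
          copyFields((Message) value, nested);
          toRow.put(field.name, nested);
          break;
      }
    }
  }

  public static void main(String[] args) {
    Message device = new Message(new FieldSpec("os", JavaType.STRING)).set("os", "android");
    Message clientMetadata = new Message(
            new FieldSpec("session_key", JavaType.STRING),
            new FieldSpec("revision", JavaType.LONG),
            new FieldSpec("user_id", JavaType.STRING),
            new FieldSpec("device", JavaType.MESSAGE))
        .set("session_key", "abc")
        .set("revision", 3L)
        .set("device", device);

    Map<String, Object> row = new LinkedHashMap<>();
    copyFields(clientMetadata, row);
    System.out.println(row); // prints {session_key=abc, revision=3, device={os=android}}
  }
}
```

Adding a field to the schema in this sketch requires no change to `copyFields`, which is the property we were after. With the real protobuf library the schema comes from `getDescriptorForType()` instead of a hand-built list.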

We made the code public in the repo below so the community doesn’t have to reinvent the wheel: https://github.com/sardine-ai/proto-to-bq-java

If you have any feedback, please reach out to me on Twitter: https://twitter.com/kazukinishiura

We’re hiring across all engineering roles — https://www.sardine.ai/careers

About the author
Kazuki Nishiura
Head of Engineering
