Backward and Forward Compatibility, Protobuf Versioning, Serialization

Goutham Pilla
July 01, 2019
#backendengineering

What is backward and forward compatibility?

A change made to a system or technology in such a way that existing users are unaffected is a backward-compatible change. The obvious advantage is that existing users get a graceful, non-time-sensitive path for upgrading their integrations. A non-backward-compatible change, on the other hand, breaks existing integrations and forces users to deal with an immediate fix. Authors of libraries, frameworks and APIs take particular care in how they evolve their software so as not to disrupt the current user base, and a backward-compatible design philosophy serves them well in the long run.

Forward compatibility, on the other hand, is the ability of a system to process input meant for a later version of itself. A message/standard/library/tool (e.g. protobuf) supports forward compatibility if an implementation that uses an older version of the message (e.g. a gRPC service built on protobufs) can process a future version of that message. A daily example of forward-compatible software is the typical web browser. A forward-compatible browser accepts a newer version of HTML and gracefully handles the portions it cannot render (e.g. a new HTML tag), potentially showing a friendly message asking the user to upgrade for the full effect.

Writing software in a forward and a backward compatible way facilitates easy adoption and minimizes disruption. In this article we see how protobufs have the natural ability to support forward and backward compatibility of services in a typical microservice architecture.

Backward and Forward compatibility in protobuf

Protobuf is a format for serializing structured data; it is primarily used for communication between services and for storage. It is language-neutral, platform-neutral and, thanks to its friendliness to backward and forward compatibility, provides an easily extensible way of serializing data. Let's study it with an example.

Consider the simple service below that utilizes two structured data messages (Version 1) - HelloRequest and HelloReply. HelloRequest has no fields in it, and HelloReply returns the reply in its message field.

service Greeter {
  // Sends a greeting
  rpc SayHello (HelloRequest) returns (HelloReply) {}
}

// Version 1 of the proto messages.

// The request message; it carries no fields in Version 1.
message HelloRequest {}

// The response message containing the greetings
message HelloReply {
  string message = 1;
}

The following service implementation just fills the message with the string “Hello World!”

def SayHello(self, request, context):
    # Build a Version 1 reply; "message" is the only field to fill in.
    response = helloworld_pb2.HelloReply()
    response.message = "Hello World!"
    return response
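For context, here is a minimal sketch of how such a handler could be wired into a running gRPC server. It assumes the generated modules helloworld_pb2 and helloworld_pb2_grpc produced by the protobuf/gRPC compiler; the port number is an arbitrary choice.

from concurrent import futures
import grpc

import helloworld_pb2
import helloworld_pb2_grpc


class Greeter(helloworld_pb2_grpc.GreeterServicer):
    def SayHello(self, request, context):
        response = helloworld_pb2.HelloReply()
        response.message = "Hello World!"
        return response


def serve():
    # A thread-pool based gRPC server listening on an arbitrary port.
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()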

Building on this simplistic implementation, we decide to add more flavour to our messages and come up with Version 2 of the protobuf messages, where we add a personalized message.

// Version 2 of Protobuf messages
message HelloRequest {
  string name = 1;
}

// Server Response Message
message HelloReply {
  string message = 1;
  string personalized_message = 2;
}

Now let's examine the various cases affected by this change.

Case 1 - Server upgrades to Version 2 of the protobuf messages and the client is on Version 1

As part of the upgrade, the server could choose to change the service endpoint implementation or not. If it does not change the implementation, it would only be filling in the “message” field of the HelloReply message.

On the other hand, it could change the implementation to fill in both fields while remaining backward compatible, as in the following implementation.

def SayHello(self, request, context):
    response = helloworld_pb2.HelloReply()
    response.message = "Hello World!"

    # Only fill in the Version 2 field when the client actually sent a name.
    if request.name != "":
        response.personalized_message = "Hi, {0}".format(request.name)

    return response

Either way, the client only understands Version 1 of the protobuf messages, so it will only reference the “message” field in the response and ignore the “personalized_message” field.
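To make this concrete, here is a minimal sketch of what the Version 1 client could look like, assuming the server above listens on localhost:50051 (an arbitrary address) and the client's stubs were generated from the Version 1 .proto file.

import grpc

import helloworld_pb2
import helloworld_pb2_grpc


def run():
    # This client was compiled against Version 1, so HelloRequest has no
    # fields and HelloReply only exposes the "message" field.
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        response = stub.SayHello(helloworld_pb2.HelloRequest())
        # Any "personalized_message" bytes sent by a Version 2 server are
        # simply ignored by this older client.
        print(response.message)


if __name__ == "__main__":
    run()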

Case 2 - Client upgrades to Version 2 of the protobuf messages and the server is on Version 1

In this case, the client fills in the “name” field of the HelloRequest message, but the server, which is compiled with Version 1 of the protobuf messages, only fills in the “message” field of the HelloReply response.

The client expects two fields in the response, and the second field (personalized_message) comes back set to the default value for the string datatype: an empty string.
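Here is a minimal sketch of the Version 2 client, under the same assumptions as before; the check on personalized_message works because proto3 returns the empty string as the default for an unset string field.

import grpc

import helloworld_pb2
import helloworld_pb2_grpc


def run():
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        # The Version 2 client fills in the new "name" field.
        response = stub.SayHello(helloworld_pb2.HelloRequest(name="Goutham"))
        print(response.message)
        # Against a Version 1 server this field is never set, so it falls
        # back to the proto3 string default: an empty string.
        if response.personalized_message:
            print(response.personalized_message)
        else:
            print("Server did not send a personalized message.")


if __name__ == "__main__":
    run()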

As the cases illustrate, we have a simple method to upgrade individual components in isolation, which is an important need while building microservices. Let's summarize the dos and don'ts of changing protobuf message definitions in the next section.

Tips while changing protobuf message definitions

  1. Do not change the field numbers (the numbered tags) of existing fields. Doing so breaks the wire-format guarantees that backward and forward compatibility rely on.
  2. Do not remove a field right away when it is no longer used. Mark it deprecated and set a timeline for its complete removal, giving integrating applications time to drop their dependency on that field.
  3. Adding fields is always a safe option as long as you manage them and don't end up with too many of them.
  4. Add new fields for newer implementations and deprecate older fields in a timely way.
  5. Generally speaking, do not reuse field numbers; the reserved keyword helps here, as shown in the sketch after this list.
  6. Be aware of the default values for the data types so that new code can work with messages generated by old code.
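As an illustration of tips 1, 2 and 5, proto3's reserved keyword can fence off the field numbers (and names) of removed fields so they cannot accidentally be reused. The hypothetical Version 3 below assumes personalized_message has already gone through its deprecation window.

// Hypothetical Version 3: "personalized_message" has been removed after its
// deprecation period. Reserving its number and name prevents accidental reuse.
message HelloReply {
  reserved 2;
  reserved "personalized_message";

  string message = 1;
}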

In conclusion, protobufs vastly simplify planning for backward and forward compatibility and help with the velocity of development of software in a typical microservice architecture.

