
What the Protobuf

Introduction

If you've worked with microservices or distributed systems, you've probably heard someone mention Protocol Buffers. Maybe you dismissed it as just another serialization format, or maybe you were too deep in JSON schemas to care. But here's the thing: once you understand what Protobuf actually does and why it exists, you'll start seeing its use cases everywhere.

This isn't about jumping on the Google bandwagon. This is about understanding a tool that solves real problems in machine-to-machine communication, and knowing when to reach for it instead of defaulting to JSON for the hundredth time.

What is Protocol Buffers?

Protocol Buffers (Protobuf) is a language-neutral, platform-neutral serialization format developed by Google. Think of it as a way to define the structure of your data once, and then generate code to read and write that data in any language you need.

The key difference from JSON or XML? Protobuf is a binary format. Your data gets serialized into a compact binary representation rather than human-readable text. This means smaller payloads, faster parsing, and strongly typed contracts between services.

Here's what makes it different:

  • You define your data structure in .proto files using a simple interface definition language
  • The protobuf compiler (protoc) generates code for your target language
  • The generated code handles all serialization and deserialization
  • The binary format is backwards and forwards compatible (with some rules)

Why Protobuf Matters

The honest answer? Performance and type safety.

Size Matters

A typical JSON payload might look like this:

{
  "user_id": 12345,
  "username": "developer",
  "email": "dev@example.com",
  "is_active": true
}

That's 98 bytes. The equivalent Protobuf message? Around 30-40 bytes depending on the values. When you're sending millions of messages between services, that difference adds up fast.
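
Don't take my word for the byte counts. Here's a minimal sketch to measure it yourself, assuming the User message defined later in this post has been compiled into a Go package (the import path is made up):

package main

import (
    "fmt"

    "google.golang.org/protobuf/proto"

    pb "example.com/users" // hypothetical import path for the generated code
)

func main() {
    u := &pb.User{
        UserId:   12345,
        Username: "developer",
        Email:    "dev@example.com",
        IsActive: true,
    }
    b, err := proto.Marshal(u)
    if err != nil {
        panic(err)
    }
    // ~33 bytes for these values: varints plus length-prefixed strings,
    // with no field names and no whitespace on the wire.
    fmt.Printf("protobuf size: %d bytes\n", len(b))
}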

Type Safety

With JSON, you're essentially flying blind until runtime. Sure, you might have TypeScript definitions or JSON schemas, but there's no guarantee that the service on the other end is actually sending what you expect.

Protobuf enforces a contract. Field types are baked into the schema and into the generated code, so a type mismatch surfaces when you compile, not in production. No more "user_id": "12345" (string) vs "user_id": 12345 (number) bugs at 3 AM.
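
That contract is literal in the generated code. Sketched against the User message we'll define in a moment, the Go compiler simply refuses the bad version:

u := &pb.User{UserId: 12345} // UserId is an int64 field on the generated struct
// u.UserId = "12345"        // won't compile: cannot use "12345" (string) as int64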

Performance

Parsing JSON requires a full text scan and type inference. Protobuf reads a binary format with field tags, skipping fields it doesn't care about. The performance difference is measurable, especially at scale.
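
If you want numbers for your own payloads, here's a minimal benchmark sketch, run with go test -bench=. (it assumes the generated pb package from the next section; the import path is made up):

package users_test

import (
    "encoding/json"
    "testing"

    "google.golang.org/protobuf/proto"

    pb "example.com/users" // hypothetical import path for the generated code
)

// Plain struct standing in for the JSON side of the comparison.
type jsonUser struct {
    UserID   int64  `json:"user_id"`
    Username string `json:"username"`
    Email    string `json:"email"`
    IsActive bool   `json:"is_active"`
}

func BenchmarkJSONUnmarshal(b *testing.B) {
    data, _ := json.Marshal(jsonUser{UserID: 12345, Username: "developer", Email: "dev@example.com", IsActive: true})
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        var u jsonUser
        if err := json.Unmarshal(data, &u); err != nil {
            b.Fatal(err)
        }
    }
}

func BenchmarkProtoUnmarshal(b *testing.B) {
    data, _ := proto.Marshal(&pb.User{UserId: 12345, Username: "developer", Email: "dev@example.com", IsActive: true})
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        var u pb.User
        if err := proto.Unmarshal(data, &u); err != nil {
            b.Fatal(err)
        }
    }
}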

Getting Started with Proto Files

Let's write an actual .proto file. Here's a simple example for a user service:

syntax = "proto3";

package users;

message User {
  int64 user_id = 1;
  string username = 2;
  string email = 3;
  bool is_active = 4;
  int64 created_at = 5;
}

message GetUserRequest {
  int64 user_id = 1;
}

message GetUserResponse {
  User user = 1;
  bool found = 2;
}

A few things to note:

  • syntax = "proto3" specifies the version (proto3 is the current standard)
  • Each field has a type, name, and a unique number (the number is critical for backwards compatibility)
  • Numbers 1-15 take one byte to encode, so use them for frequently used fields
  • You can nest messages inside each other

The field numbers are permanent. Once you assign user_id = 1, that's its number forever. This is how Protobuf maintains compatibility when you add or remove fields later.

Using Protobuf with a Server

Let's say you want to build a simple gRPC service (gRPC uses Protobuf by default). First, define your service in the proto file:

syntax = "proto3";

package users;

// Hypothetical import path; protoc-gen-go needs go_package (or an
// equivalent --go_opt=M flag) to know where the generated Go code lives.
option go_package = "example.com/users/pb";

service UserService {
  rpc GetUser(GetUserRequest) returns (GetUserResponse);
  rpc CreateUser(CreateUserRequest) returns (CreateUserResponse);
  rpc UpdateUser(UpdateUserRequest) returns (UpdateUserResponse);
}

message GetUserRequest {
  int64 user_id = 1;
}

message GetUserResponse {
  User user = 1;
  bool found = 2;
}

message User {
  int64 user_id = 1;
  string username = 2;
  string email = 3;
  bool is_active = 4;
}

// Kept minimal so the file compiles; a real service would flesh these out.
message CreateUserRequest { User user = 1; }
message CreateUserResponse { User user = 1; }
message UpdateUserRequest { User user = 1; }
message UpdateUserResponse { User user = 1; }

Now compile it:

protoc --go_out=. --go-grpc_out=. users.proto
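
This assumes the two Go generators for protoc are on your PATH; if they aren't, they're separate plugins that install with:

go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest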

This generates two files:

  • users.pb.go - the message types
  • users_grpc.pb.go - the service interfaces

Your server implementation might look like this (in Go):

type userServer struct {
    pb.UnimplementedUserServiceServer
    store *userStore // hypothetical data-access layer; a bare *sql.DB has no QueryUser method
}

func (s *userServer) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.GetUserResponse, error) {
    user, err := s.store.QueryUser(ctx, req.UserId)
    if err != nil {
        // Treat lookup failures as "not found" rather than an RPC error.
        return &pb.GetUserResponse{Found: false}, nil
    }
    
    return &pb.GetUserResponse{
        User: &pb.User{
            UserId:   user.ID,
            Username: user.Name,
            Email:    user.Email,
            IsActive: user.Active,
        },
        Found: true,
    }, nil
}
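
Wiring it up takes a few more lines. A minimal sketch, assuming a newUserStore constructor for the hypothetical data layer above (imports: net, log, google.golang.org/grpc, and the generated pb package):

func main() {
    lis, err := net.Listen("tcp", ":50051")
    if err != nil {
        log.Fatalf("failed to listen: %v", err)
    }

    s := grpc.NewServer()
    pb.RegisterUserServiceServer(s, &userServer{store: newUserStore()})

    log.Println("UserService listening on :50051")
    if err := s.Serve(lis); err != nil {
        log.Fatalf("failed to serve: %v", err)
    }
}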

The client code is just as simple:

// grpc.WithInsecure is deprecated; insecure.NewCredentials comes from google.golang.org/grpc/credentials/insecure.
conn, err := grpc.Dial("localhost:50051", grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
    log.Fatalf("failed to connect: %v", err)
}
defer conn.Close()

client := pb.NewUserServiceClient(conn)

resp, err := client.GetUser(context.Background(), &pb.GetUserRequest{
    UserId: 12345,
})

No manual serialization, no JSON marshaling, no wondering if the field is a string or a number. The generated code handles everything.

Machine-to-Machine Communication

This is where Protobuf really shines. When humans aren't reading the data, why are we using human-readable formats?

The HTTP/JSON Habit

Most APIs use JSON over HTTP because it's what we know. It's easy to debug with curl, you can read it in browser dev tools, and every language has JSON support. But when you have:

  • Service A calling Service B
  • Service B calling Service C
  • Service C publishing to a message queue
  • Service D consuming from that queue

Nobody is looking at those payloads. You're using tools like Postman or debugging with logs anyway. The "human readable" benefit is negligible.

What You Get Instead

With Protobuf in machine-to-machine communication:

Bandwidth: Smaller messages mean less network overhead. In a high-throughput system, this can reduce your bandwidth costs significantly.

Speed: Faster serialization and deserialization means lower latency. If Service A makes 100 calls to Service B per request, those milliseconds add up.

Type Safety: The contract is explicit and enforced. If Service B changes the shared proto, regenerating the code breaks Service A's build at every affected call site until you update it.

Schema Evolution: Need to add a field? Just add it to the proto file with a new number. Old services ignore it, new services use it. No breaking changes.

Cross-Language: Your Python service can talk to your Go service can talk to your Java service. The proto file is the single source of truth.

Backwards Compatibility in Practice

Let's say you start with:

message User {
  int64 user_id = 1;
  string username = 2;
}

Later, you need to add email:

message User {
  int64 user_id = 1;
  string username = 2;
  string email = 3;  // new field
}

Old services that don't know about email will simply ignore it. New services will read it. The old binary format still deserializes correctly because the field numbers haven't changed.

This is huge for microservices where you can't update all services simultaneously.
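
The flip side: if you ever delete a field, reserve its number (and optionally its name) so nobody reuses it later; an old binary still in flight would silently misread the new data. Proto has a keyword for exactly this:

message User {
  reserved 3;        // email was removed; its number must never be reused
  reserved "email";  // reserving the name prevents an accidental re-add

  int64 user_id = 1;
  string username = 2;
}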

When to Use Protobuf (and When Not To)

Use Protobuf when:

  • Building microservices that talk to each other
  • You need high-performance serialization
  • Type safety and contracts matter
  • You're working with multiple languages
  • Bandwidth is a concern

Stick with JSON when:

  • Building a public REST API (external developers expect JSON)
  • You need human-readable logs or debugging output
  • You're building a simple CRUD app with a single service

Conclusion

Protobuf isn't magic, and it's not always the right choice. But for machine-to-machine communication in distributed systems, it solves real problems that JSON doesn't. The performance gains are measurable, the type safety prevents entire classes of bugs, and the tooling ecosystem is mature.

If you're building microservices and still serializing everything to JSON, maybe it's time to ask "what the Protobuf?" and give it a shot. Start small with one service-to-service call, measure the difference, and go from there.

The Protocol Buffers documentation is actually well-written, and there are implementations for practically every language. The gRPC project provides a complete framework built on top of Protobuf if you want to go that route.

Sometimes the best tools are the ones that get out of your way and just work. Protobuf is one of those tools.


Further reading: Protocol Buffers Language Guide, gRPC Basics Tutorial