YaFF Deep-Dive: Yandex's Zero-Copy Wire Format For Protobuf

Introduction

On June 17, 2026, Yandex open-sourced YaFF (Yet Another Flat Format), a zero-copy wire format for the Protobuf ecosystem, released under Apache 2.0 at version 0.1.0. It is less a new serialization library than a rethinking of how Protobuf data sits in memory, designed for server-side, performance-critical C++ runtimes.

The headline numbers come from the read benchmark detailed under Benchmark Analysis: YaFF’s Flat Layout reads hot hierarchical data ~3.8x faster than FlatBuffers and comes within 1.2x of a raw C++ struct. In production, Yandex’s advertising recommendation system reports 10–20% CPU savings at scale. Your .proto file also stays the single source of truth: no dual-schema maintenance, no manual converters, no ecosystem lock-in.

The Problem: Protobuf’s Parsing Tax

Protobuf is the industry standard for schematized data serialization. But in high-throughput server environments, its wire format extracts a painful tax: a message must be parsed in full before any field can be read. The CPU cost is non-trivial, and it grows with message size and nesting.

To understand the scale, consider a real production scenario from Yandex’s ad system. A single hot service handles ~4,000 RPS per host, processing hundreds of banner objects per request. That is millions of objects per second. The full serving shard — 30 million banners weighing ~50 GB — must live locally because network overhead is prohibitive.

With Protobuf, you face two bad options:

Parse on the fly: Deserializing millions of objects per second burns enormous CPU.
Parse at startup: Deserializing 50 GB into C++ structs takes tens of minutes per replica and balloons RAM usage.

The only viable pattern at this scale is a memory-mapped file: map the file into virtual memory and read from it directly. This works only if the on-disk bytes match the in-memory representation exactly, which Protobuf’s wire format does not. FlatBuffers does, but it brings its own set of compromises.
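
As a rough illustration of the memory-mapped pattern, here is a minimal sketch using plain POSIX calls. It is not YaFF's API: the Banner struct and the MappedFile wrapper are hypothetical names introduced for this example, and the record layout is simply "whatever the writer's struct looked like in memory".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical fixed-size record whose on-disk bytes equal its in-memory bytes.
struct Banner {
    std::uint64_t id;
    std::uint64_t bid_micros;
};

// Minimal RAII wrapper around a read-only POSIX mapping (illustrative only).
class MappedFile {
public:
    explicit MappedFile(const char* path) {
        const int fd = ::open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat st{};
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            void* p = ::mmap(nullptr, static_cast<std::size_t>(st.st_size),
                             PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const unsigned char*>(p);
                size_ = static_cast<std::size_t>(st.st_size);
            }
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_) ::munmap(const_cast<unsigned char*>(data_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    bool ok() const { return data_ != nullptr; }
    std::size_t size() const { return size_; }

    // Reads record `index` straight from the mapped pages. No parse step runs;
    // memcpy is used only to sidestep alignment and aliasing concerns.
    Banner banner(std::size_t index) const {
        assert((index + 1) * sizeof(Banner) <= size_);
        Banner b;
        std::memcpy(&b, data_ + index * sizeof(Banner), sizeof(Banner));
        return b;
    }

private:
    const unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

The operating system pages data in on first touch, so startup cost does not scale with file size. That is the property the 50 GB shard relies on.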

Architecture: Zero-Copy, Proto-Native, Two-Way

.proto as the Single Source of Truth

Unlike FlatBuffers, which requires a separate .fbs schema that must be kept in sync with the original .proto and can drift from it, YaFF has no schema language of its own. Your existing .proto files are the one and only definition. A protoc plugin generates YaFF accessors directly from the proto descriptors, preserving field numbers, reserved ranges, and evolution semantics.

Zero-Copy, mmap-Compatible Buffers

YaFF serializes data into a flat binary buffer where fields are accessed by computing offsets rather than parsing. You can mmap a file and read it as a YaFF message with zero deserialization — no parsing, no allocation, no copying.

Bidirectional Protobuf Conversion

Less performance-sensitive services can continue using standard Protobuf. YaFF provides two-way conversion between its buffers and standard Protobuf messages, enabling incremental adoption module by module:

// Hot path: zero-copy read from YaFF buffer
const auto& response = yaff::ReadMessage<protoyaff::feed::FeedResponse>(buffer.Data());
std::string_view title = response.items(0).title();

// Edge: convert back to standard Protobuf for downstream services
feed::FeedResponse restored;
response.ParseTo(restored);

The Four Layouts

YaFF’s defining innovation is its adaptive layout system. A layout determines only the physical memory organization — the schema and the generated API remain identical.

Fixed Layout

  +---------------------------+
  | Field a (8 bytes)         |  <-- direct offset 0
  +---------------------------+
  | Field b (8 bytes, default)|  <-- direct offset 8
  +---------------------------+
  | Field c (8 bytes)         |  <-- direct offset 16
  +---------------------------+
  Size known at compile time. No header. 0 bytes overhead.

1 read, 0 branches, 0 bytes overhead. Schema must be frozen.
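
To make the "1 read, 0 branches" claim concrete, here is a toy reader for the three-field record in the diagram. It is a sketch under stated assumptions, not YaFF's generated code: the names ReadFixed and WriteFixed are hypothetical, and fields are assumed to be 8-byte integers stored in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical frozen schema: three 8-byte fields a, b, c at offsets 0, 8, 16.
constexpr std::size_t kFieldSize = 8;
constexpr std::size_t kFieldCount = 3;
constexpr std::size_t kRecordSize = kFieldSize * kFieldCount;

// One load at an offset known at compile time: no header to consult and no
// presence check, because every field is always stored.
template <std::size_t Index>
std::uint64_t ReadFixed(const unsigned char* record) {
    static_assert(Index < kFieldCount, "field index out of range");
    std::uint64_t value;
    std::memcpy(&value, record + Index * kFieldSize, sizeof(value));
    return value;
}

// Writer for the sketch: unset fields are written as their default value, so
// they still occupy their slot.
inline void WriteFixed(unsigned char* record, std::uint64_t a,
                       std::uint64_t b, std::uint64_t c) {
    assert(record != nullptr);
    const std::uint64_t fields[kFieldCount] = {a, b, c};
    std::memcpy(record, fields, kRecordSize);
}
```

Adding or removing a field would shift every later offset, which is why this layout only suits schemas that never change.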

Flat Layout

  +------+----------------------------+
  |Header| Field a | Field c          |
  |(2 B) | (8 B)   | (8 B)            |
  +------+---------+------------------+
         ^         ^
         |         +-- offset from header
         +-- bitmask of present fields
  Unset fields occupy zero space.

2 reads, 1 branch, 2 bytes overhead. Dense, hot data. Best for performance-critical paths.
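
One way to get "2 reads, 1 branch" out of a presence bitmask is to count the set bits below the requested field and use that count as the slot index. The sketch below does exactly that for the diagram's 2-byte header. The encoding details (bit i marks field i, 8-byte values, host byte order) and the names ReadFlat and BuildFlat are assumptions made for illustration; YaFF's real encoding may differ.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kHeaderSize = 2;  // presence bitmask, as in the diagram
constexpr std::size_t kFieldSize = 8;

// Returns field `index`, or `default_value` if the field is not present.
std::uint64_t ReadFlat(const unsigned char* msg, unsigned index,
                       std::uint64_t default_value) {
    assert(index < 16);
    std::uint16_t mask;
    std::memcpy(&mask, msg, sizeof(mask));                     // read 1: header
    const std::uint16_t bit = static_cast<std::uint16_t>(1u << index);
    if ((mask & bit) == 0) return default_value;               // the one branch
    // Present fields are packed in index order, so the slot number is the
    // count of present fields with a lower index.
    const std::size_t slot = std::bitset<16>(mask & (bit - 1u)).count();
    std::uint64_t value;
    std::memcpy(&value, msg + kHeaderSize + slot * kFieldSize,
                sizeof(value));                                // read 2: field
    return value;
}

struct FlatField {
    unsigned index;
    std::uint64_t value;
};

// Builder for the sketch. `fields` must be sorted by index with no duplicates.
std::vector<unsigned char> BuildFlat(const std::vector<FlatField>& fields) {
    std::uint16_t mask = 0;
    for (const FlatField& f : fields) {
        assert(f.index < 16);
        mask = static_cast<std::uint16_t>(mask | (1u << f.index));
    }
    std::vector<unsigned char> out(kHeaderSize + fields.size() * kFieldSize);
    std::memcpy(out.data(), &mask, sizeof(mask));
    std::size_t offset = kHeaderSize;
    for (const FlatField& f : fields) {
        std::memcpy(out.data() + offset, &f.value, kFieldSize);
        offset += kFieldSize;
    }
    return out;
}
```

With fields a and c set and b unset, the message is 18 bytes: the 2-byte header plus two 8-byte values, matching the diagram.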

Sparse Layout

  +--------+----------+--------------+------------+
  | Meta   | Meta     | Field a      | Field c    |
  | Header | Table    | (8 B)        | (8 B)      |
  | (6 B)  | (idx Map)|              |            |
  +--------+----------+--------------+------------+
           +-- field_number -> offset mapping
  Supports arbitrary gaps and unlimited evolution.

4 reads, 2 branches, 6 bytes overhead. For sparse schemas.
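
The field_number -> offset mapping can be pictured as a small sorted table that is searched on each access. The sketch below invents its own toy encoding to show the idea; the header contents, entry format, and the names ReadSparse and BuildSparse are all assumptions, and the binary search here does not reproduce the exact "4 reads, 2 branches" cost quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy encoding (assumed for this sketch, not YaFF's real format):
//   [u16 entry_count][4 reserved bytes]            6-byte header
//   entry_count x { u32 field_number, u32 offset }  table sorted by number
//   8-byte values; `offset` is relative to the start of the message
constexpr std::size_t kHeaderSize = 6;
constexpr std::size_t kEntrySize = 8;
constexpr std::size_t kValueSize = 8;

template <typename T>
T Load(const unsigned char* p) {
    T v;
    std::memcpy(&v, p, sizeof(T));
    return v;
}

// Binary-searches the table for `field_number`; absent fields yield the default.
std::uint64_t ReadSparse(const unsigned char* msg, std::uint32_t field_number,
                         std::uint64_t default_value) {
    const std::size_t count = Load<std::uint16_t>(msg);
    const unsigned char* table = msg + kHeaderSize;
    std::size_t lo = 0, hi = count;
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        const unsigned char* entry = table + mid * kEntrySize;
        const std::uint32_t number = Load<std::uint32_t>(entry);
        if (number == field_number) {
            return Load<std::uint64_t>(msg + Load<std::uint32_t>(entry + 4));
        }
        if (number < field_number) lo = mid + 1; else hi = mid;
    }
    return default_value;
}

struct SparseField {
    std::uint32_t number;
    std::uint64_t value;
};

// Builder for the sketch. `fields` must be sorted by field number.
std::vector<unsigned char> BuildSparse(const std::vector<SparseField>& fields) {
    assert(fields.size() <= 0xFFFF);
    const std::size_t values_start = kHeaderSize + fields.size() * kEntrySize;
    std::vector<unsigned char> out(values_start + fields.size() * kValueSize, 0);
    const std::uint16_t count = static_cast<std::uint16_t>(fields.size());
    std::memcpy(out.data(), &count, sizeof(count));
    for (std::size_t i = 0; i < fields.size(); ++i) {
        const std::uint32_t offset =
            static_cast<std::uint32_t>(values_start + i * kValueSize);
        unsigned char* entry = out.data() + kHeaderSize + i * kEntrySize;
        std::memcpy(entry, &fields[i].number, 4);
        std::memcpy(entry + 4, &offset, 4);
        std::memcpy(out.data() + offset, &fields[i].value, kValueSize);
    }
    return out;
}
```

Because the table is keyed by field number rather than by position, a message holding only fields 1 and 1000 costs two table entries, not a thousand slots.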

Dynamic Layout

Selects Flat or Sparse at runtime based on the schema’s structure. If the schema has no gaps, it uses the fast Flat path; otherwise it transparently falls back to Sparse. This is the default layout.
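
The selection step might look something like the sketch below. The rule it applies is an assumption made for illustration: "no gaps" is read as field numbers forming exactly 1..N, with N small enough for a 2-byte presence bitmask. The source does not spell out YaFF's actual criterion, and ChooseLayout is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Layout { kFlat, kSparse };

// Assumed rule: pick Flat when the field numbers are exactly 1..N and N fits
// a 16-bit presence bitmask; otherwise fall back to Sparse.
Layout ChooseLayout(std::vector<std::uint32_t> field_numbers) {
    constexpr std::size_t kMaxFlatFields = 16;
    if (field_numbers.size() > kMaxFlatFields) return Layout::kSparse;
    std::sort(field_numbers.begin(), field_numbers.end());
    // Protobuf field numbers start at 1.
    assert(field_numbers.empty() || field_numbers.front() >= 1);
    for (std::size_t i = 0; i < field_numbers.size(); ++i) {
        if (field_numbers[i] != i + 1) return Layout::kSparse;  // gap or duplicate
    }
    return Layout::kFlat;
}
```

Under this rule a schema with fields 1, 2, 3 takes the Flat path, while one with fields 1, 2, 7 falls back to Sparse.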

Layout	Read Access	Overhead	Evolution	Best for
Fixed	1 read, 0 branches	0 bytes	Frozen	Inlined primitives
Flat	2 reads, 1 branch	2 bytes	Restricted	Dense hot data
Sparse	4 reads, 2 branches	6 bytes	Unrestricted	Sparse schemas
Dynamic	Flat or Sparse	2 or 6 bytes	Unrestricted	General (default)

Benchmark Analysis

AMD EPYC 7713, Clang 20.1.8. Lower is better.

Format	Read time (ns)	Slowdown vs struct
Raw C++ struct	8.14	1.0x
YaFF Flat	9.79	1.2x
YaFF Sparse	21.23	2.6x
FlatBuffers	37.30	4.6x
Protobuf	219.35	26.9x

Flat Layout reads 3.8x faster than FlatBuffers and 22.4x faster than Protobuf, within 1.2x of raw struct.
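
The ratios quoted here follow directly from the table. A small sketch that recomputes them from the published read times (the constant and function names are introduced for this example only):

```cpp
#include <cassert>
#include <cmath>

// Read times in nanoseconds, copied from the table above.
constexpr double kStructNs = 8.14;
constexpr double kYaffFlatNs = 9.79;
constexpr double kYaffSparseNs = 21.23;
constexpr double kFlatBuffersNs = 37.30;
constexpr double kProtobufNs = 219.35;

// How many times slower `ns` is than `baseline_ns`, rounded to one decimal.
inline double Ratio(double ns, double baseline_ns) {
    assert(baseline_ns > 0.0);
    return std::round(ns / baseline_ns * 10.0) / 10.0;
}
```

Ratio(kFlatBuffersNs, kYaffFlatNs) gives 3.8 and Ratio(kProtobufNs, kYaffFlatNs) gives 22.4, while Ratio(kYaffFlatNs, kStructNs) gives the 1.2 in the table's last column.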

Compiler Aliasing & TBAA

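FlatBuffers achieves zero-copy via reinterpret_cast into a uint8_t array. Because character types may alias any object, LLVM’s Type-Based Alias Analysis (TBAA) cannot rule out that an intervening store touched the buffer, and it falls back to a conservative MayAlias verdict. The compiler then cannot apply common subexpression elimination (CSE) or global value numbering (GVN) across chained accesses, which causes up to a 1.8x slowdown (36.8 ns vs 20.7 ns) unless the caller caches intermediate pointers manually.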

// Without caching: 36.8 ns — compiler re-does walk each time
s += root->intermediate()->leaf()->a();
s += root->intermediate()->leaf()->b();

// With manual caching: 20.7 ns — only one walk
const auto* l = root->intermediate()->leaf();
s += l->a();
s += l->b();

YaFF addresses this with __restrict annotations and explicit offset arithmetic, reading in 9.79 ns with or without manual caching of the chain; the compiler removes the redundant walks on its own.
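
For intuition on why byte-typed access forces the repeated walks in the uncached chain above, consider two loads from a byte buffer with a store in between. This is a generic C++ sketch, not code from FlatBuffers or YaFF; the function names are hypothetical, and __restrict is a compiler extension (GCC, Clang, MSVC) rather than standard C++.

```cpp
#include <cassert>
#include <cstring>

// Loads an int from a byte buffer, as zero-copy accessors typically do.
inline int LoadInt(const unsigned char* buf) {
    int v;
    std::memcpy(&v, buf, sizeof(v));
    return v;
}

// Two loads from `buf` with a store through `out` in between. Since
// `unsigned char` may alias any object, the compiler must assume the store
// could have changed the bytes behind `buf`, so the second load cannot be
// folded into the first.
inline int LoadStoreLoad(const unsigned char* buf, int* out) {
    assert(buf != nullptr && out != nullptr);
    const int first = LoadInt(buf);
    *out = first + 1;
    const int second = LoadInt(buf);
    return first + second;
}

// Same logic, but the caller promises that `buf` and `out` do not overlap,
// which permits the compiler to reuse the first load. Calling this with
// overlapping pointers is undefined behavior.
inline int LoadStoreLoadNoAlias(const unsigned char* __restrict buf,
                                int* __restrict out) {
    assert(buf != nullptr && out != nullptr);
    const int first = LoadInt(buf);
    *out = first + 1;
    const int second = LoadInt(buf);
    return first + second;
}
```

Calling LoadStoreLoad with both pointers aimed at the same int shows the reload is observable: starting from 10, the first load sees 10, the store writes 11, and the second load sees 11.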

Production: Yandex Ad System

YaFF runs in Yandex’s advertising recommendation system: 50 GB shards, 30M+ entries, thousands of RPS per host. The reported result is 10–20% CPU savings at production scale. Adoption can be gradual: convert at service boundaries and keep Protobuf elsewhere.

Code Walkthrough

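#include <string_view>

#include "feed.pb.h"    // standard Protobuf classes
#include "feed.yaff.h"  // YaFF accessors from the protoc plugin

// Build the message with regular Protobuf, then serialize it into a YaFF buffer.
// LoadFeedResponse() and Process() are application-defined.
feed::FeedResponse proto = LoadFeedResponse();
const auto buffer = yaff::Serialize<protoyaff::feed::FeedResponse>(proto);

// Zero-copy read: accessors return views into the buffer, so keep the buffer
// alive for as long as the views are in use.
const auto& response = yaff::ReadMessage<protoyaff::feed::FeedResponse>(buffer.Data());
for (const auto& item : response.items()) {
    std::string_view title  = item.title();
    std::string_view author = item.author().name();
    Process(title, author);
}

// Convert back to standard Protobuf for code that still expects it.
feed::FeedResponse restored;
response.ParseTo(restored);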

Integration Guide

# CMake (CMakeLists.txt)
find_package(yaff REQUIRED)
target_link_libraries(my_app yaff::core yaff::proto)
yaff_generate(TARGET my_app PROTO_FILES my_schema.proto)

# Conan v2 (conanfile.txt)
[requires]
yaff/0.1.0

[generators]
CMakeDeps
CMakeToolchain

Target	Type	Purpose
yaff::core	INTERFACE	Runtime headers
yaff::proto	STATIC	Protobuf support
yaff::protoc_plugin	Executable	Protoc plugin

Roadmap

Columnar Layout — Analytics/ML-optimized columnar representation
Automated Adaptivity — PGO-driven layout selection
Multi-Language Bindings — Go, Rust, Python

Conclusion

By keeping .proto as the source of truth while offering multiple zero-copy physical layouts, YaFF pairs Protobuf’s ergonomics with read performance close to a raw struct (within 1.2x in the benchmark above). For any team running Protobuf at scale, it is worth a serious look.

Links:

GitHub: github.com/yandex/yaff
Docs: yaff.tech
Telegram (EN): t.me/yaff_en