Contents
Introduction
On June 17, 2026, Yandex open-sourced YaFF (Yet Another Flat Format) — a zero-copy wire format for the Protobuf ecosystem, released under Apache 2.0 at version 0.1.0. This is not just another serialization library. It is a fundamental rethinking of how Protobuf data sits in memory, designed from the ground up for server-side, performance-critical C++ runtimes.
The headline numbers speak for themselves: YaFF’s Flat Layout reads hot hierarchical data ~3.8x faster than FlatBuffers and comes within 1.2x of a raw C++ struct. In production, Yandex’s advertising recommendation system reports 10–20% CPU savings at scale. And critically, your .proto file stays the single source of truth — no dual-schema maintenance, no manual converters, no ecosystem lock-in.
The Problem: Protobuf’s Parsing Tax
Protobuf is the industry standard for schematized data serialization. But in high-throughput server environments, its wire format extracts a painful tax: every read requires a full parse pass. The CPU cost is non-trivial, and it multiplies with message complexity.
To understand the scale, consider a real production scenario from Yandex’s ad system. A single hot service handles ~4,000 RPS per host, processing hundreds of banner objects per request. That is millions of objects per second. The full serving shard — 30 million banners weighing ~50 GB — must live locally because network overhead is prohibitive.
With Protobuf, you face two bad options:
- Parse on the fly: Deserializing millions of objects per second burns enormous CPU.
- Parse at startup: Deserializing 50 GB into C++ structs takes tens of minutes per replica and balloons RAM usage.
The only viable pattern at this scale is memory-mapped files — map the file into virtual memory and read directly. But this requires the on-disk bytes to match the in-memory representation exactly. Protobuf’s wire format cannot do this. FlatBuffers can, but it brings its own set of compromises.
Architecture: Zero-Copy, Proto-Native, Two-Way
.proto as the Single Source of Truth
Unlike FlatBuffers — which requires a separate .fbs schema that drifts from the original .proto — YaFF has no schema language of its own. Your existing .proto files are the one and only definition. A protoc plugin generates YaFF accessors directly from the proto descriptors, preserving field numbers, reserved ranges, and evolution semantics.
Zero-Copy, mmap-Compatible Buffers
YaFF serializes data into a flat binary buffer where fields are accessed by computing offsets rather than parsing. You can mmap a file and read it as a YaFF message with zero deserialization — no parsing, no allocation, no copying.
Bidirectional Protobuf Conversion
Less performance-sensitive services can continue using standard Protobuf. YaFF provides two-way conversion between its buffers and standard Protobuf messages, enabling incremental adoption module by module:
// Hot path: zero-copy read from YaFF buffer
const auto& response = yaff::ReadMessage<protoyaff::feed::FeedResponse>(buffer.Data());
std::string_view title = response.items(0).title();
// Edge: convert back to standard Protobuf for downstream services
feed::FeedResponse restored;
response.ParseTo(restored);
The Four Layouts
YaFF’s defining innovation is its adaptive layout system. A layout determines only the physical memory organization — the schema and the generated API remain identical.
Fixed Layout
+---------------------------+ | Field a (8 bytes) | <-- direct offset 0 +---------------------------+ | Field b (8 bytes, default)| <-- direct offset 8 +---------------------------+ | Field c (8 bytes) | <-- direct offset 16 +---------------------------+ Size known at compile time. No header. 0 bytes overhead.
1 read, 0 branches, 0 bytes overhead. Schema must be frozen.
Flat Layout
+------+----------------------------+
|Header| Field a | Field c |
|(2 B) | (8 B) | (8 B) |
+------+---------+------------------+
^ ^
| +-- offset from header
+-- bitmask of present fields
Unset fields occupy zero space.
2 reads, 1 branch, 2 bytes overhead. Dense, hot data. Best for performance-critical paths.
Sparse Layout
+--------+----------+--------------+------------+
| Meta | Meta | Field a | Field c |
| Header | Table | (8 B) | (8 B) |
| (6 B) | (idx Map)| | |
+--------+----------+--------------+------------+
+-- field_number -> offset mapping
Supports arbitrary gaps and unlimited evolution.
4 reads, 2 branches, 6 bytes overhead. For sparse schemas.
Dynamic Layout
Selects Flat or Sparse at runtime based on schema structure. If no gaps, uses the fast Flat path. Transparently falls back to Sparse when needed. Default layout.
| Layout | Read Access | Overhead | Evolution | Best for |
|---|---|---|---|---|
| Fixed | 1 read, 0 branches | 0 bytes | Frozen | Inlined primitives |
| Flat | 2 reads, 1 branch | 2 bytes | Restricted | Dense hot data |
| Sparse | 4 reads, 2 branches | 6 bytes | Unrestricted | Sparse schemas |
| Dynamic | Flat or Sparse | 2 or 6 bytes | Unrestricted | General (default) |
Benchmark Analysis
AMD EPYC 7713, Clang 20.1.8. Lower is better.
| Format | Read time (ns) | Slowdown vs struct |
|---|---|---|
| Raw C++ struct | 8.14 | 1.0x |
| YaFF Flat | 9.79 | 1.2x |
| YaFF Sparse | 21.23 | 2.6x |
| FlatBuffers | 37.30 | 4.6x |
| Protobuf | 219.35 | 26.9x |
Flat Layout reads 3.8x faster than FlatBuffers and 22.4x faster than Protobuf, within 1.2x of raw struct.
Compiler Aliasing & TBAA
FlatBuffers achieves zero-copy via reinterpret_cast into a uint8_t array. This breaks LLVM’s Type-Based Alias Analysis (TBAA), forcing a conservative MayAlias verdict. The compiler cannot CSE or GVN across chained accesses, causing up to 1.8x slowdown without manual caching.
// Without caching: 36.8 ns — compiler re-does walk each time
s += root->intermediate()->leaf()->a();
s += root->intermediate()->leaf()->b();
// With manual caching: 20.7 ns — only one walk
const auto* l = root->intermediate()->leaf();
s += l->a();
s += l->b();
YaFF solves this with __restrict annotations and explicit offset arithmetic, achieving 9.79ns with or without chain caching — the compiler handles it automatically.
Production: Yandex Ad System
YaFF runs in Yandex’s advertising recommendation system — 50 GB shards, 30M+ entries, thousands of RPS. Results: 10–20% CPU savings at production scale. Gradual adoption: convert at boundaries, keep Protobuf elsewhere.
Code Walkthrough
#include "feed.pb.h"
#include "feed.yaff.h"
feed::FeedResponse proto = LoadFeedResponse();
const auto buffer = yaff::Serialize<protoyaff::feed::FeedResponse>(proto);
const auto& response = yaff::ReadMessage<protoyaff::feed::FeedResponse>(buffer.Data());
for (const auto& item : response.items()) {
std::string_view title = item.title();
std::string_view author = item.author().name();
Process(title, author);
}
feed::FeedResponse restored;
response.ParseTo(restored);
Integration Guide
# CMake
find_package(yaff REQUIRED)
target_link_libraries(my_app yaff::core yaff::proto)
yaff_generate(TARGET my_app PROTO_FILES my_schema.proto)
# Conan v2
[requires]
yaff/0.1.0
[generators]
CMakeDeps
CMakeToolchain
| Target | Type | Purpose |
|---|---|---|
yaff::core | INTERFACE | Runtime headers |
yaff::proto | STATIC | Protobuf support |
yaff::protoc_plugin | Executable | Protoc plugin |
Roadmap
- Columnar Layout — Analytics/ML-optimized columnar representation
- Automated Adaptivity — PGO-driven layout selection
- Multi-Language Bindings — Go, Rust, Python
Conclusion
By keeping .proto as the source of truth while offering multiple zero-copy physical layouts, YaFF bridges Protobuf’s ergonomics with raw struct performance. For any team running Protobuf at scale, it is worth a serious look.
Links:
- GitHub: github.com/yandex/yaff
- Docs: yaff.tech
- Telegram (EN): t.me/yaff_en