1. Introduction

On February 28, 2026, ByteDance’s DeerFlow claimed the #1 spot on GitHub Trending — an extraordinary milestone for an open-source AI agent project with over 73,000 stars and counting. What made this even more remarkable was that version 2.0 represented a complete ground-up rewrite, shipping with 180+ merged pull requests since the first milestone tag. This was not an incremental update; it was a reinvention.

The journey from DeerFlow 1.x to 2.0 tells a larger story about the evolution of AI agents. Version 1 began as a Deep Research framework—a focused tool for autonomous information gathering and synthesis. But as ByteDance’s engineers observed how the community used it, they realized the original scope was too narrow. Users were pushing the framework far beyond research, into territory that demanded code execution, file management, persistent memory, multi-agent coordination, and integration with external services. The framework needed to become something bigger.

Thus was born the concept of a super agent harness. DeerFlow 2.0 is no longer a framework you wire together. It is a complete runtime environment—batteries included, fully extensible—built on LangGraph and LangChain. It ships with everything an agent needs out of the box: an isolated filesystem (sandbox), short- and long-term memory, extensible skills and tools, message gateway integration across eight IM channels, Model Context Protocol (MCP) support, and the ability to plan, spawn, and coordinate sub-agents for complex multi-step tasks that can run from minutes to hours.

What makes DeerFlow fundamentally different from other agent frameworks is the harness philosophy. Most frameworks provide a thin orchestration layer and expect you to bring your own infrastructure—your own vector store, your own execution environment, your own tool integrations. DeerFlow ships as a self-contained deployment: a FastAPI Gateway with an embedded LangGraph agent runtime, a Next.js frontend, Nginx reverse proxy, Docker sandbox support, and PostgreSQL integration. Clone the repository, configure your models, and you have a production-ready super agent infrastructure running on your own hardware under the permissive MIT License.

In this article, we will look under the hood at five critical layers of DeerFlow 2.0: the architecture split between harness and app, the sub-agent orchestration engine, the sandbox and execution environment, the skills and tools system, and practical guidance for getting started. Along the way we also cover the memory system and the IM message gateway. Whether you are building a research assistant, a code-generation pipeline, or a multi-agent automation platform, understanding these layers will give you a blueprint for how to design your own super agent infrastructure.

2. Architecture Overview

Two-Layer Split: Harness vs. App

The single most important architectural decision in DeerFlow 2.0 is the strict separation between the harness layer and the application layer. The harness, contained in the deerflow.* Python package, is a publishable agent framework that can be imported and used independently. The application layer, app.*, is a FastAPI Gateway that imports and extends the harness, adding REST endpoints, IM channel connectors, and configuration management. The dependency direction is strict and enforced: app imports deerflow, never the reverse. This means the harness can be packaged, versioned, and published to PyPI as a standalone library, while the app layer remains a thin deployment wrapper.
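
A dependency rule like this is easiest to keep honest with a small static check. The sketch below is not taken from the DeerFlow repository: the helper name find_forbidden_imports and the choice of Python's ast module are assumptions made for illustration. It parses the source of a harness module and reports any import that reaches into app.

```python
import ast


def find_forbidden_imports(source: str, forbidden_root: str = "app") -> list[str]:
    """Return imported module names in `source` that live under `forbidden_root`.

    Hypothetical helper: it only illustrates how the rule
    "app imports deerflow, never the reverse" could be checked statically.
    """
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            # Match "app" and "app.something", but not "application".
            if name == forbidden_root or name.startswith(forbidden_root + "."):
                offenders.append(name)
    return offenders


harness_ok = "from deerflow.sandbox import provider\nimport os\n"
harness_bad = "import os\nfrom app.api import routes\n"

print(find_forbidden_imports(harness_ok))   # []
print(find_forbidden_imports(harness_bad))  # ['app.api']
```

Run over every file in the harness package, a check of this kind would fail the build as soon as the harness starts depending on the deployment wrapper.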

Gateway-Embedded Agent Runtime

Unlike architectures that run the agent runtime as a separate microservice, DeerFlow embeds the LangGraph agent runtime directly inside the FastAPI Gateway process. The Gateway exposes both standard REST endpoints (/api/models, /api/skills, /api/memory, /api/threads) and a LangGraph-compatible route prefix. Nginx, running on port 2026, serves as a unified reverse proxy with the following routing rules:

| Route Pattern | Target | Purpose |
|---|---|---|
| /api/langgraph/* | Gateway (rewritten to /api/*) | LangGraph SDK-compatible agent runtime |
| /api/* (other) | Gateway (port 8001) | REST APIs (models, skills, memory, uploads) |
| /* (non-API) | Frontend (port 3000) | Next.js UI |

The Nginx rewrite of /api/langgraph/* to the Gateway’s native /api/* routes is a clever design choice: it means clients using the standard LangGraph SDK can connect without running a separate LangGraph server. The Gateway speaks the LangGraph protocol natively, which dramatically simplifies deployment.
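
To make the routing rules concrete, here is a small Python model of the table above. It is not the Nginx configuration itself, which the article does not reproduce; the function name route and the target labels are illustrative assumptions.

```python
def route(path: str) -> tuple[str, str]:
    """Map a request path to (target, path forwarded to that target).

    Illustrative model of the routing table, not the real Nginx config.
    """
    langgraph_prefix = "/api/langgraph/"
    if path.startswith(langgraph_prefix):
        # Strip the LangGraph prefix so the Gateway sees its native /api/* route.
        return "gateway:8001", "/api/" + path[len(langgraph_prefix):]
    if path.startswith("/api/"):
        return "gateway:8001", path
    return "frontend:3000", path


print(route("/api/langgraph/threads/abc/runs"))  # ('gateway:8001', '/api/threads/abc/runs')
print(route("/api/models"))                      # ('gateway:8001', '/api/models')
print(route("/chat"))                            # ('frontend:3000', '/chat')
```

The order of the checks matters: the more specific LangGraph prefix has to be tested before the generic /api/ prefix, otherwise the rewrite would never apply.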

Request Flow

When a user sends a message via the browser or an IM channel, the full request flow is:

  1. Client → Nginx (port 2026) — request arrives at the unified entry point.
  2. Nginx routes to the FastAPI Gateway (port 8001) based on the path prefix.
  3. Gateway loads or creates the ThreadState for the conversation thread.
  4. Middleware chain executes in strict order (10 middlewares):
       1. ThreadDataMiddleware creates per-thread directories.
       2. UploadsMiddleware injects uploaded files.
       3. SandboxMiddleware acquires the sandbox environment.
       4. SummarizationMiddleware manages context windows.
       5. TitleMiddleware generates conversation titles.
       6. TodoListMiddleware loads task lists (Plan Mode).
       7. MemoryMiddleware queues memory updates.
       8. ViewImageMiddleware prepares vision data.
       9. SubagentLimitMiddleware enforces concurrency limits.
       10. ClarificationMiddleware handles user clarification interrupts.
  5. Lead Agent (LangGraph create_react_agent) processes the messages, calls tools (bash, web search, file operations, sub-agent delegation), and streams responses via Server-Sent Events (SSE).
  6. Tools execute against the acquired sandbox—writing files, running commands, fetching web pages.
  7. Response streams back through the middleware chain to Nginx to the client.
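
The middleware step in the flow above can be pictured as an ordered list of hooks wrapped around the model call. The sketch below is a simplified model, not DeerFlow's implementation: the Middleware base class, the run_chain helper, and the choice to run the after_model hooks in reverse order are all assumptions made for illustration.

```python
class Middleware:
    """Hypothetical base class with hooks before and after a model call."""

    def __init__(self, name: str):
        self.name = name

    def before_model(self, state: dict) -> None:
        state["trace"].append(f"before:{self.name}")

    def after_model(self, state: dict) -> None:
        state["trace"].append(f"after:{self.name}")


def run_chain(middlewares: list, state: dict, model) -> dict:
    for mw in middlewares:            # registration order on the way in
        mw.before_model(state)
    state["reply"] = model(state)
    for mw in reversed(middlewares):  # assumed: reverse order on the way out
        mw.after_model(state)
    return state


# Only the first three of the ten middlewares, to keep the trace short.
chain = [Middleware("ThreadData"), Middleware("Uploads"), Middleware("Sandbox")]
state = run_chain(chain, {"trace": []}, model=lambda s: "ok")
print(state["trace"])
```

The point of a strict order is that later middlewares can rely on earlier ones: in this model, anything that needs the per-thread directories runs after the middleware that creates them.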

Directory Structure

The project follows a clean, modular layout:

deer-flow/
├── backend/
│   ├── src/
│   │   ├── app/                    # FastAPI Gateway layer
│   │   │   ├── api/                # REST endpoints
│   │   │   ├── channels/           # IM channel connectors
│   │   │   └── main.py             # FastAPI application
│   │   └── packages/
│   │       └── harness/
│   │           └── deerflow/       # Core harness package
│   │               ├── agents/     # Lead agent, sub-agents
│   │               ├── sandbox/    # Sandbox providers
│   │               ├── subagents/  # Sub-agent executor
│   │               ├── memory/     # Memory system
│   │               ├── tools/      # Built-in tools
│   │               └── skills/     # Skills parser & loader
│   └── docs/                       # Architecture, config, MCP docs
├── frontend/                       # Next.js UI
├── skills/                         # Skill definitions
│   ├── public/                     # Built-in skills
│   └── custom/                     # User-defined skills
├── nginx/                          # Nginx configuration
├── docker/                         # Docker Compose, Dockerfiles
└── config.yaml                     # Root configuration

3. Sub-Agent System — The Orchestration Engine

The sub-agent system is arguably the most architecturally impressive component of DeerFlow 2.0. It transforms the lead agent from a single-threaded chat model into a task orchestrator capable of spawning, monitoring, and synthesizing results from multiple specialized agents running in parallel.

The task() Tool

Sub-agents are spawned through a single task() tool call. When the lead agent determines that a complex sub-task would benefit from its own dedicated context and execution thread, it invokes task() with a prompt and an optional sub-agent type. The tool delegates immediately to the SubagentExecutor, which handles the entire lifecycle behind the scenes.

Execution Flow

The execution flow from tool invocation to result collection follows this path:

  1. Lead agent calls task(prompt="Research topic X", subagent_type="general-purpose").
  2. The task() tool handler creates a SubagentExecutor with filtered tools and parent context, stores a SubagentResult with PENDING status, and submits the work to the scheduler thread pool.
  3. The scheduler delegates the actual sub-agent execution to the execution thread pool, where a fresh LangGraph agent is built via _create_agent() with its own isolated ThreadState (no checkpointer inheritance from the parent).
  4. Polling: the executor polls the sub-agent’s status every 5 seconds, emitting SSE events (task_started, task_running, task_completed/failed/timed_out) to the lead agent’s event stream.
  5. On completion, the result (structured output, files, token usage) is returned to the lead agent for synthesis.
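
The submit-then-poll pattern in these steps can be sketched with the standard library alone. Everything below is a simplified stand-in rather than DeerFlow code: run_subagent replaces the real agent build, the Status enum and the SubagentResult fields are assumptions, a single pool stands in for the two pools, and the polling interval is shortened from 5 seconds so the example finishes quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class SubagentResult:
    task_id: str
    status: Status = Status.PENDING
    output: Optional[str] = None


def run_subagent(result: SubagentResult, prompt: str) -> None:
    """Stand-in for building and running a fresh sub-agent."""
    result.status = Status.RUNNING
    try:
        time.sleep(0.05)  # pretend to do research
        result.output = f"findings for: {prompt}"
        result.status = Status.COMPLETED
    except Exception:
        result.status = Status.FAILED


def task(pool: ThreadPoolExecutor, prompt: str, poll_interval: float = 0.01):
    """Submit the work, then poll its status and record the emitted events."""
    result = SubagentResult(task_id="t1")
    events = ["task_started"]
    pool.submit(run_subagent, result, prompt)
    while result.status in (Status.PENDING, Status.RUNNING):
        events.append("task_running")
        time.sleep(poll_interval)
    events.append("task_completed" if result.status is Status.COMPLETED else "task_failed")
    return result, events


with ThreadPoolExecutor(max_workers=3) as pool:
    result, events = task(pool, "Research topic X")

print(result.status, events[0], events[-1])
```

Polling a shared result object keeps the lead agent's side simple: it never touches the worker thread directly, it only reads a status field and turns changes into events.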

Concurrency Model

DeerFlow enforces strict concurrency limits to prevent resource exhaustion. The maximum is 3 concurrent sub-agents, controlled by dual thread pools:

import asyncio
from concurrent.futures import ThreadPoolExecutor

class SubagentExecutor:
    def __init__(self):
        # Each pool has three workers, matching the cap of 3 concurrent sub-agents
        self._scheduler_pool = ThreadPoolExecutor(max_workers=3)
        self._execution_pool = ThreadPoolExecutor(max_workers=3)
        self._isolated_subagent_loop_thread = None  # Dedicated asyncio loop

    async def _aexecute(self, task_id, prompt, subagent_type, ...):
        # Submitted to scheduler pool, then forwarded to execution pool
        loop = asyncio.get_running_loop()  # the loop this coroutine is running on
        result = await loop.run_in_executor(
            self._scheduler_pool, self._run_task, task_id, prompt, ...
        )
        return result

A dedicated _isolated_subagent_loop_thread hosts a persistent asyncio event loop for sub-agent executions triggered from an already-running parent loop, preventing nested event loop conflicts. The SubagentLimitMiddleware enforces the cap in the middleware chain: any task() call beyond the limit is truncated in the after_model phase, and the lead agent receives a message indicating that concurrency limits have been reached.

Each sub-agent runs with a default timeout of 30 minutes and a maximum of 150 turns, providing guards against runaway execution.
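
The truncation described above can be illustrated with a small filter over the tool calls a model emits in one turn. This is a sketch under assumptions, not the actual SubagentLimitMiddleware: the function name limit_task_calls, the dict shape of a tool call, and the wording of the notice are hypothetical, and the sketch ignores sub-agents that are already running.

```python
MAX_CONCURRENT_SUBAGENTS = 3  # the limit stated in the article


def limit_task_calls(tool_calls: list, limit: int = MAX_CONCURRENT_SUBAGENTS):
    """Keep every non-task call, but only the first `limit` task() calls."""
    kept = []
    seen = 0
    dropped = 0
    for call in tool_calls:
        if call["name"] == "task":
            seen += 1
            if seen > limit:
                dropped += 1
                continue  # truncate task() calls beyond the limit
        kept.append(call)
    notice = None
    if dropped:
        notice = (
            f"Concurrency limit reached: {dropped} task() call(s) "
            f"were dropped (max {limit})."
        )
    return kept, notice


calls = [{"name": "task", "args": {"prompt": f"topic {i}"}} for i in range(5)]
calls.append({"name": "bash", "args": {"command": "ls"}})

kept, notice = limit_task_calls(calls)
print([c["name"] for c in kept])  # ['task', 'task', 'task', 'bash']
print(notice)
```

Returning a notice alongside the trimmed list mirrors the behavior described in the text: the lead agent is told that work was dropped, so it can reschedule it in a later turn.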

Context Isolation

Each sub-agent operates in a completely isolated context:

  • Isolated: conversation history, tool call history, intermediate results—sub-agents cannot see the lead agent’s conversation or other sub-agents’ contexts.
  • Shared via sandbox: the sandbox filesystem is shared, so sub-agents can coordinate by writing and reading files at /mnt/user-data/workspace/.
  • Memory: sub-agents have read-only access to the user’s persistent memory, allowing them to incorporate user context without side effects.

Built-in Sub-Agent Types

| Type | Description | Tools Available |
|---|---|---|
| general-purpose | Full-capability agent with all tools except task() | Sandbox (bash, file ops), research (web search/fetch), built-in (ask_clarification), skills, memory (read-only) |
| bash | Command specialist focused on shell execution | bash (primary), read_file, write_file, ls |

Parallel Research Workflow Example

The following pseudocode illustrates how a lead agent can decompose a complex research question into parallel sub-agent tasks:

# Lead agent receives: "Compare RAG architectures for legal document analysis"

# Step 1: Spawn research across three parallel sub-agents
task_1 = task(
    prompt="Research hybrid RAG architectures (GraphRAG, LightRAG) for legal docs",
    subagent_type="general-purpose"
)
task_2 = task(
    prompt="Research agentic RAG patterns with tool use for legal document workflows",
    subagent_type="general-purpose"
)
task_3 = task(
    prompt="Evaluate chunking strategies and embedding models for legal text",
    subagent_type="general-purpose"
)

# Step 2: Wait for all three to complete (polling, 5s interval)
results = await asyncio.gather(task_1, task_2, task_3)

# Step 3: Synthesize into a structured comparison
synthesis = f"""
Based on parallel research:

## Architecture Comparison
{results[0]}

## Agentic Patterns
{results[1]}

## Chunking & Embeddings
{results[2]}

## Recommendation
Synthesize findings into a final recommendation...
"""

4. Sandbox & Execution Environment

Every thread in DeerFlow gets its own isolated filesystem—a sandbox that provides the agent with a real computing environment. The sandbox abstraction is one of the most important differentiators between DeerFlow and simpler agent frameworks, because it enables agents to write code, execute it, store results, and build artifacts within a controlled, secure boundary.

Virtual Path System

Agents never see physical filesystem paths. Instead, they interact with a virtual path hierarchy that is transparently translated by the sandbox layer:

| Virtual Path | Physical Path |
|---|---|
| /mnt/user-data/workspace/ | .deer-flow/threads/{thread_id}/user-data/workspace/ |
| /mnt/user-data/uploads/ | .deer-flow/threads/{thread_id}/user-data/uploads/ |
| /mnt/user-data/outputs/ | .deer-flow/threads/{thread_id}/user-data/outputs/ |
| /mnt/skills/ | skills/ (project root) |

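The path translation is implemented in a straightforward replace function. Note that it writes the per-thread base with a backend/ prefix, that is, relative to the project root, whereas the table above gives the same location as seen from inside backend/: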

def replace_virtual_path(path: str, thread_id: str) -> str:
    if "/mnt/user-data/" in path:
        base = f"backend/.deer-flow/threads/{thread_id}/user-data"
        return path.replace("/mnt/user-data", base)
    if "/mnt/skills/" in path:
        return path.replace("/mnt/skills", "skills")
    return path

This translation is transparent to the agent and fully invisible in tool outputs. The agent always sees virtual paths; the sandbox always writes to physical paths.
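
Keeping physical paths out of tool outputs implies a reverse mapping as well. The sketch below pairs the article's forward function (renamed to_physical here) with a hypothetical to_virtual helper; the reverse direction is an assumption about how masking could work, not DeerFlow's code, and it only covers the user-data mount.

```python
def to_physical(path: str, thread_id: str) -> str:
    """Forward translation, following the function shown in the article."""
    if "/mnt/user-data/" in path:
        base = f"backend/.deer-flow/threads/{thread_id}/user-data"
        return path.replace("/mnt/user-data", base)
    if "/mnt/skills/" in path:
        return path.replace("/mnt/skills", "skills")
    return path


def to_virtual(text: str, thread_id: str) -> str:
    """Hypothetical reverse step: hide the physical base in tool output."""
    base = f"backend/.deer-flow/threads/{thread_id}/user-data"
    return text.replace(base, "/mnt/user-data")


command = to_physical("cat /mnt/user-data/workspace/analysis.py", "t-42")
print(command)
# cat backend/.deer-flow/threads/t-42/user-data/workspace/analysis.py

# Simulate an error message that mentions the physical path.
raw_error = f"No such file: {command.split()[1]}"
print(to_virtual(raw_error, "t-42"))
# No such file: /mnt/user-data/workspace/analysis.py
```

The skills mount is left out of the reverse step on purpose: the bare word "skills" is too common a substring to replace safely, so a real implementation would need a more precise match there.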

Sandbox Providers

DeerFlow provides two built-in sandbox providers, selected based on the deployment mode:

LocalSandboxProvider operates directly on the host filesystem. It creates per-thread directories managed by an LRU cache with 256 entries. When a thread completes or the cache is full, directories are cleaned up automatically. This provider is suitable for development and trust-zone deployments where shell access isolation is not critical. Note: bash execution is disabled by default with LocalSandboxProvider for security reasons.

AioSandboxProvider provides full Docker-based isolation. Each thread (or group of threads) gets a dedicated Docker container from a warm pool, with health checks and automatic cleanup. This provider enables safe shell execution (bash is available), mounts the virtual paths into the container, and supports DooD (Docker-outside-of-Docker) mode for nested container operations. The Docker socket is only mounted in DooD mode, never by default.

For Kubernetes environments, a provisioner interface allows dynamic pod creation and teardown, giving each sandbox its own K8s pod with full network isolation.
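
The LRU-managed per-thread directories mentioned for LocalSandboxProvider can be sketched with an OrderedDict. The class name ThreadDirCache, the on_evict callback, and the path layout are assumptions for illustration; only the capacity of 256 entries comes from the text above.

```python
from collections import OrderedDict


class ThreadDirCache:
    """Hypothetical LRU cache mapping thread IDs to per-thread directories."""

    def __init__(self, capacity: int = 256, on_evict=None):
        self._entries = OrderedDict()
        self._capacity = capacity
        self._on_evict = on_evict or (lambda thread_id, path: None)

    def get_or_create(self, thread_id: str) -> str:
        if thread_id in self._entries:
            self._entries.move_to_end(thread_id)  # mark as most recently used
            return self._entries[thread_id]
        path = f".deer-flow/threads/{thread_id}/user-data"
        self._entries[thread_id] = path
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry and let the caller clean it up.
            old_id, old_path = self._entries.popitem(last=False)
            self._on_evict(old_id, old_path)
        return path


evicted = []
cache = ThreadDirCache(capacity=2, on_evict=lambda tid, path: evicted.append(tid))
cache.get_or_create("a")
cache.get_or_create("b")
cache.get_or_create("a")  # "a" becomes the most recently used entry
cache.get_or_create("c")  # over capacity: "b" is now the least recently used
print(evicted)  # ['b']
```

In a real provider the eviction callback is where directory cleanup would happen, which is why the sketch passes both the thread ID and the path to it.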

Sandbox Operations

The sandbox exposes a clean, minimal interface. Here are the three core operations every agent uses:

# Write a Python analysis script to the workspace
sandbox.write_file(
    "/mnt/user-data/workspace/analysis.py",
    "print('hello')"
)

# Execute it inside the sandbox
result = sandbox.execute_command(
    "python3 /mnt/user-data/workspace/analysis.py"
)
# result: {"exit_code": 0, "stdout": "hello\n", "stderr": ""}

# Read a file back from the workspace (assumes an earlier step wrote output.json)
content = sandbox.read_file("/mnt/user-data/workspace/output.json")

Additional operations include list_dir() for directory listing, str_replace for surgical file edits (with per-sandbox, per-path serialization to prevent concurrent write conflicts), and write_file with an append mode flag.

Security Considerations

DeerFlow takes sandbox security seriously. The file-editing tools (write_file and str_replace) serialize read-modify-write operations per (sandbox.id, path) tuple: two edits to the same file in the same sandbox run one after the other, while isolated sandboxes can operate concurrently even when their virtual paths collide. On Windows hosts, MSYS path conversion is handled transparently. The bash tool carries a warning: it is disabled by default under LocalSandboxProvider, and even with AioSandboxProvider, the Docker socket is mounted only when explicitly configured for DooD mode.
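
Serializing edits per (sandbox.id, path) comes down to a registry of locks keyed by that tuple. The sketch below shows one way to do it with the standard library; the names lock_for and str_replace, and the in-memory dict standing in for a filesystem, are illustrative assumptions rather than DeerFlow's implementation.

```python
import threading
from collections import defaultdict

_registry_guard = threading.Lock()       # protects the registry itself
_locks = defaultdict(threading.Lock)     # one lock per (sandbox_id, path)


def lock_for(sandbox_id: str, path: str) -> threading.Lock:
    with _registry_guard:
        return _locks[(sandbox_id, path)]


def str_replace(files: dict, sandbox_id: str, path: str, old: str, new: str) -> None:
    """Read-modify-write on an in-memory 'filesystem', serialized per key."""
    with lock_for(sandbox_id, path):
        content = files[(sandbox_id, path)]
        if old not in content:
            raise ValueError(f"{old!r} not found in {path}")
        files[(sandbox_id, path)] = content.replace(old, new, 1)


# Two sandboxes hold a file at the same virtual path.
p = "/mnt/user-data/workspace/config.py"
files = {("sb-1", p): "x = 1", ("sb-2", p): "x = 1"}

str_replace(files, "sb-1", p, "1", "2")
print(files[("sb-1", p)])  # x = 2
print(files[("sb-2", p)])  # x = 1
```

Because the sandbox ID is part of the key, the two sandboxes never wait on each other even though their virtual paths are identical; only edits to the same file in the same sandbox are forced into sequence.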

5. Skills & Tools System

DeerFlow’s skills system is the extensibility layer that allows the agent to acquire domain-specific capabilities without modifying the core runtime. Unlike traditional plugin systems where code is loaded and executed, a skill in DeerFlow is primarily a structured knowledge package—a Markdown file that teaches the agent how to perform a specific task.

Skill as Markdown Manifest

Each skill is a Markdown file with YAML front matter that declares metadata, environment variables, and tool requirements. When the agent encounters a relevant task, it loads the skill file, reads the instructions, and uses them as additional context for its reasoning loop. The skill file itself becomes part of the agent’s prompt, not compiled code. This is a fundamentally different approach from plugin architectures:

---
name: custom-web-scraper
description: "Scrape and extract structured data from websites"
env:
  HEADLESS: "true"
  MAX_PAGES: "10"
tools:
  - bash
  - web_fetch
  - read_file
---

## Instructions

You are a web scraping specialist. Your goal is to extract clean, structured data from websites.

### Rules
1. Always respect robots.txt
2. Add delays between requests (min 2 seconds)
3. Never scrape personal data without permission
4. Return data as CSV format by default

### Process
1. Analyze the target page structure using web_fetch
2. Identify the data fields to extract
3. Write a Python script using BeautifulSoup/lxml
4. Execute the script and save results
5. Validate output completeness

This approach means that extending DeerFlow with a new capability does not require writing Python code, modifying the framework, or deploying new services. It requires writing a Markdown file. This dramatically lowers the barrier to customization.
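
To show how little machinery a Markdown skill needs, here is a minimal loader sketch that separates the front matter from the instructions. It is not DeerFlow's parser. The function names are hypothetical, and parse_simple_yaml handles only the small YAML subset used in the example above (scalars, one nested mapping level, simple lists); a real loader would more likely rely on a full YAML library.

```python
def split_front_matter(text: str) -> tuple:
    """Return (front_matter, body) for a document that starts with '---'."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:]).lstrip("\n")
    raise ValueError("front matter is not closed with '---'")


def parse_simple_yaml(block: str) -> dict:
    """Parse only the subset used in the example skill file."""
    data = {}
    current = None  # most recent top-level key without an inline value
    for raw in block.splitlines():
        if not raw.strip():
            continue
        if raw.startswith("  - "):                       # list item
            data.setdefault(current, []).append(raw[4:].strip().strip('"'))
        elif raw.startswith("  "):                       # nested key: value
            key, _, value = raw.strip().partition(":")
            data.setdefault(current, {})[key] = value.strip().strip('"')
        else:                                            # top-level key
            key, _, value = raw.partition(":")
            value = value.strip().strip('"')
            if value:
                data[key] = value
            else:
                current = key
    return data


SKILL = '''---
name: custom-web-scraper
description: "Scrape and extract structured data from websites"
env:
  HEADLESS: "true"
  MAX_PAGES: "10"
tools:
  - bash
  - web_fetch
---

## Instructions
You are a web scraping specialist.
'''

front, body = split_front_matter(SKILL)
meta = parse_simple_yaml(front)
print(meta["name"], meta["tools"])
print(body.splitlines()[0])
```

The metadata dictionary is what a runtime would use to decide whether a skill applies and which tools it needs, while the body is the text that ends up in the agent's prompt.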

Skill Resolution

When the lead agent needs to solve a problem, it does not blindly load all skills. The ToolSelectionMiddleware filters available skills based on the agent’s message history, so only skills relevant to the current task are injected into the prompt. This keeps the context window lean and prevents irrelevant instructions from diluting the agent’s focus.

Skills are loaded from two directories:

  • skills/public/ — built-in skills shipped with DeerFlow (e.g., web research, code analysis, data processing).
  • skills/custom/ — user-created skills, loaded at runtime, taking precedence over built-in skills with the same name.
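
The precedence rule between the two directories can be sketched as a two-pass directory scan in which later entries overwrite earlier ones. The layout assumed here (one flat .md file per skill, with the file stem as the skill name) and the function name discover_skills are assumptions for illustration; the real on-disk layout may differ.

```python
import tempfile
from pathlib import Path


def discover_skills(root: Path) -> dict:
    """Map skill name -> file, letting skills/custom/ override skills/public/."""
    resolved = {}
    for layer in ("public", "custom"):  # later layers win on name clashes
        for path in sorted((root / layer).glob("*.md")):
            resolved[path.stem] = path
    return resolved


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for layer in ("public", "custom"):
        (root / layer).mkdir()
    (root / "public" / "web-research.md").write_text("built-in version")
    (root / "public" / "data-processing.md").write_text("built-in version")
    (root / "custom" / "web-research.md").write_text("user override")

    skills = discover_skills(root)
    chosen = {name: path.parent.name for name, path in skills.items()}

print(chosen)
```

With this fixture, web-research resolves to the custom directory and data-processing to the public one, which is the override behavior described in the list above.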

Built-in Tools Complementing Skills

While skills provide knowledge, tools provide execution. DeerFlow ships a comprehensive set of built-in tools that skills can reference:

| Tool | When It Runs | What It Does |
|---|---|---|
| bash | Shell execution needed | Executes commands in the sandbox with timeout, returns stdout/stderr |
| web_search | Real-time information needed | Queries the configured web search provider (e.g., Tavily, SearXNG) |
| web_fetch | Page content needed | Fetches and converts URLs to Markdown |
| write_file | File creation/modification | Writes or appends to files in the sandbox with concurrent write protection |
| read_file | File reading needed | Reads file content from the sandbox (up to 80K chars) |
| str_replace_editor | Surgical file edit | Search-and-replace editing with per-path serialization |
| task | Complex sub-task | Spawns a sub-agent for parallel/hierarchical execution |
| ask_clarification | Ambiguity detected | Sends a clarification question to the user |
| memory | Long-term storage | Reads from and writes to persistent memory |
| mcp_tools | MCP server query | Lists and calls tools exposed by registered MCP servers |

6. Memory System

DeerFlow implements a two-tier memory architecture that gives the agent both short-term context awareness and long-term persistence across sessions.

Short-Term Memory: Thread State

Short-term memory is managed through LangGraph’s ThreadState, which tracks conversation history, tool call results, and intermediate state within a single thread. The thread state is persisted via a configurable checkpointer—PostgreSQL by default (via AsyncPostgresSaver), with SQLite as a lightweight alternative. Each thread is identified by a thread_id and can be resumed later, enabling stateful multi-turn conversations that span days or weeks.

Long-Term Memory: Profile & Knowledge Graph

Long-term memory uses a two-pronged approach for different types of persistent information:

User Profile Memory stores structured key-value information about the user: preferences, facts, skills, and context that the agent should remember across threads. The API for reading and writing is straightforward:

# Write a memory entry
result = memory.add(
    key="user_name",
    value="Alice",
    scope="user"
)

# Read memory entries
results = memory.search(
    query="What do I know about Alice?",
    scope="user"
)

Knowledge Graph Memory stores relationships and entities as a graph structure. Unlike profile memory (which is key-value), this stores interconnected facts. Notes can be linked to other notes with relationship types, enabling the agent to build a growing web of knowledge. The graph supports add, search, and delete operations.

How Memory Integrates with the Agent Loop

The MemoryMiddleware (seventh in the chain listed in the request flow) does its write-side work in the after_model phase, once the agent has generated a response. At this point, the middleware examines the agent’s output, determines whether there is information worth persisting, and writes it to long-term memory. On the next agent invocation, the middleware’s before_model phase reads relevant memory entries and injects them into the system prompt. This happens automatically, without explicit tool calls from the agent.

For explicit memory operations, the agent can use the memory tool, which provides direct read/write/search access. Sub-agents have read-only access to the user’s memory, preventing accidental side effects.
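
The read-then-write cycle of the memory middleware can be modeled in a few lines. This is a toy sketch, not DeerFlow's MemoryMiddleware: the state dict, the list used as a store, and the "REMEMBER:" marker are placeholder assumptions. In particular, how the real middleware decides what is worth persisting is not described in the article.

```python
class MemoryMiddleware:
    """Toy model: inject known facts before the model, persist new ones after."""

    MARKER = "REMEMBER:"  # placeholder rule for what counts as a fact

    def __init__(self, store: list):
        self.store = store

    def before_model(self, state: dict) -> None:
        if self.store:
            facts = "\n".join(f"- {fact}" for fact in self.store)
            state["system_prompt"] += f"\n\nKnown about the user:\n{facts}"

    def after_model(self, state: dict) -> None:
        for line in state["reply"].splitlines():
            if line.startswith(self.MARKER):
                fact = line[len(self.MARKER):].strip()
                if fact and fact not in self.store:
                    self.store.append(fact)


store = []
memory_mw = MemoryMiddleware(store)

# Turn 1: nothing to inject yet; the reply contains a fact to persist.
turn1 = {"system_prompt": "You are helpful.", "reply": ""}
memory_mw.before_model(turn1)
turn1["reply"] = "Nice to meet you.\nREMEMBER: user_name is Alice"
memory_mw.after_model(turn1)

# Turn 2 (possibly a different thread): the fact is injected automatically.
turn2 = {"system_prompt": "You are helpful.", "reply": ""}
memory_mw.before_model(turn2)
print(turn2["system_prompt"])
```

The agent never calls a tool in this flow, which is the property the text highlights: persistence happens around the model call, not inside it.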

7. IM Channels & Message Gateway

DeerFlow’s message gateway is a unified IM abstraction layer that enables the agent to communicate across eight different messaging platforms transparently.

The gateway architecture mirrors the harness/app split. The harness layer defines an abstract Channel interface:

class Channel(ABC):
    @abstractmethod
    async def send(self, to: str, message: str) -> None:
        ...

    @abstractmethod
    async def send_block(self, to: str, block: dict) -> None:
        """Send a structured block (markdown, table, image)"""
        ...

    @abstractmethod
    async def on_message(self, handler: Callable) -> None:
        """Register an incoming message handler"""
        ...

The app layer provides concrete implementations for each channel. The channels/ directory contains connectors for:

| Channel | Implementation | Primary Use Case |
|---|---|---|
| Discord | WebSocket bot with message intent filtering | Developer communities, team discussions |
| Slack | Socket Mode with webhook receivers | Enterprise team collaboration |
| Telegram | Polling-based bot with inline keyboards | Personal assistant, broadcast channels |
| Lark | Custom bot integration | ByteDance’s internal IM, enterprise Asia markets |
| DingTalk | Enterprise bot with callback URLs | Chinese enterprise ecosystem |
| WeChat Work | Third-party application connector | Chinese enterprise communication |
| WhatsApp | Cloud API integration (Meta) | Global consumer & business messaging |
| Bark | Push notification service | iOS push alerts for agent notifications |
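
A concrete connector only has to fill in the three abstract methods. The sketch below implements the interface shown earlier with an in-memory channel that is handy for tests. InMemoryChannel, simulate_incoming, and the (sender, text) handler signature are assumptions for illustration, since the article does not specify what arguments a handler receives.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Callable


class Channel(ABC):
    """Same three methods as the interface shown in the article."""

    @abstractmethod
    async def send(self, to: str, message: str) -> None: ...

    @abstractmethod
    async def send_block(self, to: str, block: dict) -> None: ...

    @abstractmethod
    async def on_message(self, handler: Callable) -> None: ...


class InMemoryChannel(Channel):
    """Hypothetical connector that records outgoing messages in a list."""

    def __init__(self):
        self.outbox = []
        self._handlers = []

    async def send(self, to: str, message: str) -> None:
        self.outbox.append((to, message))

    async def send_block(self, to: str, block: dict) -> None:
        self.outbox.append((to, block))

    async def on_message(self, handler: Callable) -> None:
        self._handlers.append(handler)

    async def simulate_incoming(self, sender: str, text: str) -> None:
        # Test helper: deliver a fake inbound message to every handler.
        for handler in self._handlers:
            await handler(sender, text)


async def demo() -> list:
    channel = InMemoryChannel()

    async def echo(sender: str, text: str) -> None:
        await channel.send(sender, f"echo: {text}")

    await channel.on_message(echo)
    await channel.simulate_incoming("alice", "hello")
    return channel.outbox


outbox = asyncio.run(demo())
print(outbox)  # [('alice', 'echo: hello')]
```

Because the agent side only ever sees the abstract interface, swapping this test double for a Slack or Telegram connector requires no change to the code that produces replies.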

8. Real-World Deployment Guide

Quick Start

Getting DeerFlow running requires very little configuration. Clone the repo, install dependencies, configure at least one model, and run the setup wizard:

git clone https://github.com/bytedance/deer-flow.git
cd deer-flow

# Option A: Local Dev (fastest)
pip install -e "backend/[dev]"
cp config.example.yaml config.yaml
# Edit config.yaml to set your LLM API keys
deer-flow setup  # Interactive wizard

# Option B: Docker (recommended for production)
docker compose up -d

The Docker Compose setup includes the FastAPI Gateway, Next.js frontend, Nginx reverse proxy, PostgreSQL, and Redis. Configuration is managed through environment variables and the config.yaml file at the project root.

Configuration Reference

The configuration is organized into clear sections in config.yaml:

  • models: Define LLM providers with API keys, base URLs, model names, and parameters (temperature, max tokens, etc.). Multiple models can be configured and selected at runtime.
  • sandbox: Choose provider type (local or docker), configure timeouts, memory limits, and network policies.
  • memory: Set storage backends (PostgreSQL for production, SQLite for development), enable/disable auto-memory, configure embedding models.
  • channels: Enable IM channels with platform-specific credentials (bot tokens, webhook URLs, app IDs).
  • mcp: Register MCP servers with their command-line invocations, environment variables, and allowed tools/folders.
  • skills: Set skill directories, enable/disable selective skill loading.
  • workspace: Configure workspace root, file size limits, allowed file extensions.

Production Hardening

For production deployments, consider these best practices:

  • Authentication: Deploy behind an authenticating reverse proxy or configure the Gateway’s built-in API key authentication.
  • Database: Use PostgreSQL for production. Enable connection pooling with PgBouncer for high concurrency.
  • Sandbox: Always use AioSandboxProvider with Docker in production. Never use LocalSandboxProvider for untrusted users.
  • Rate Limiting: Configure rate limits in Nginx to prevent abuse of the LangGraph and REST endpoints.
  • Monitoring: DeerFlow exposes Prometheus metrics at /metrics on the Gateway, covering token usage, request latency, error rates, and active thread counts.
  • Scaling: For high-load deployments, run multiple Gateway instances behind a load balancer, sharing the same PostgreSQL and Redis backends.

9. Why DeerFlow Matters

With DeerFlow 2.0, ByteDance has redefined what an open-source agent framework should be. The shift from a research-specific tool to a general-purpose super agent harness reflects a broader industry understanding: the next generation of AI applications will not be built on monolithic models but on agentic systems that compose models, tools, memory, and external services into autonomous workflows.

The key insight behind DeerFlow is that the harness is the product. By embedding the LangGraph runtime directly in a FastAPI process with Nginx routing, ByteDance eliminated the operational complexity of managing separate agent servers. By treating skills as Markdown files instead of code packages, they made extensibility accessible to non-developers. By building a sandbox abstraction with Docker isolation, they made safe code execution practical. And by designing a sub-agent system with strict concurrency controls, they proved that complex, hours-long tasks can be decomposed and orchestrated reliably.

Perhaps most importantly, DeerFlow is MIT-licensed. In an ecosystem where many powerful agent frameworks are proprietary or come with restrictive licenses, DeerFlow’s permissive license means startups, enterprises, and individual developers can adopt, modify, and deploy it without legal friction. ByteDance is not just building a product; they are seeding an ecosystem.

For developers evaluating agent frameworks in 2026, DeerFlow 2.0 sets a new baseline. The questions to ask are no longer “Can it run code?” or “Does it have memory?”—the answers are yes and yes. The questions are now: How many IM channels does it support natively? Does it have an intelligent sub-agent system? Can I extend it with a Markdown file? Can I deploy it in five minutes? DeerFlow answers all of these, and that is why it earned the #1 spot.

10. Conclusion

DeerFlow 2.0 represents a significant leap forward in open-source agent infrastructure. Its architecture—a two-layer split between harness and app, a Gateway-embedded LangGraph runtime, a virtualized sandbox system, and a Markdown-based skill framework—provides a production-ready foundation for building autonomous AI agents.

The five layers we explored each offer valuable lessons for system architects and AI engineers:

  1. Architecture: The harness/app separation and Gateway-embedded runtime demonstrate how to design frameworks that are both independently publishable and deployment-ready.
  2. Sub-Agent System: The task() tool, polling-based execution, and strict concurrency controls offer a blueprint for safe multi-agent orchestration.
  3. Sandbox: The virtual path system and Docker isolation show how to give agents real computing power without compromising security.
  4. Skills & Tools: The Markdown-as-plugin approach reimagines extensibility for the LLM era, making it accessible to non-programmers.
  5. Memory & Channels: The two-tier memory architecture and multi-IM gateway illustrate how agents can integrate seamlessly into human workflows.

As the AI agent ecosystem continues to evolve, DeerFlow 2.0 stands as both a tool and a reference architecture. Whether you are building a personal research assistant, a code-generation pipeline, or a multi-agent platform for enterprise automation, the patterns in DeerFlow provide a proven, production-tested foundation. Clone the repository, run the setup wizard, and discover what a super agent harness can do for you.
