Ornith-1.0: Open-Source Self-Improving Models For Agentic Coding

Contents

Introduction

Ornith-1.0 is a family of open-source, self-improving large language models purpose-built for agentic coding tasks. Unlike traditional static models, Ornith-1.0 leverages iterative self-refinement to enhance its code generation, debugging, and reasoning capabilities over time. Released under the MIT license, it represents a significant step forward in making high-performance AI coding assistants accessible to everyone.

The flagship Ornith-1.0-397B variant not only matches but surpasses proprietary alternatives like Claude Opus 4.7 on key benchmarks — a milestone for open-source AI in the coding domain.

Key Features

Self-Improving Architecture: Ornith-1.0 employs iterative self-refinement loops that allow the model to learn from its own outputs, progressively improving code quality and reasoning accuracy without additional human-labeled data.
Fully Open-Source (MIT License): Released under the permissive MIT license, enabling unrestricted commercial and research use. No API keys, no usage restrictions, no vendor lock-in.
Multiple Model Variants: Available in 397B, 35B, and 9B parameter configurations, allowing developers to choose the optimal balance between performance and computational cost for their use case.
Agentic Coding Focus: Specifically optimized for multi-step coding workflows, including autonomous debugging, code refactoring, test generation, and repository-level reasoning.
Hardware Flexible: Smaller variants (9B, 35B) can run on consumer-grade GPUs, while the 397B model delivers state-of-the-art performance on multi-GPU setups.

Benchmark Performance

Ornith-1.0 achieves exceptional results across industry-standard coding benchmarks. Below is a comparison with leading proprietary and open-source models:

Model                    | Terminal-Bench 2.1 | SWE-bench Verified
-------------------------|--------------------|------------------
Ornith-1.0-397B          | 77.5%              | 82.4%
Ornith-1.0-35B           | 64.2%              | 75.6%
Ornith-1.0-9B            | 43.1%              | 69.4%
Claude Opus 4.7          | 70.3%              | 80.8%
Qwen3.5-397B             | 53.5%              | 76.4%

The flagship Ornith-1.0-397B scores 77.5% on Terminal-Bench 2.1 and 82.4% on SWE-bench Verified — outperforming Claude Opus 4.7 by +7.2 percentage points and +1.6 percentage points respectively. Most notably, it beats Qwen3.5-397B by a wide margin on both benchmarks, despite similar parameter counts.

Key Insight: Even the smaller Ornith-1.0-35B variant (64.2% TB-2.1) outperforms the much larger Qwen3.5-397B (53.5%), demonstrating the efficiency of the self-improving approach.

Model Variants & Serving Options

Ornith-1.0 ships in three variants tailored to different deployment scenarios:

Ornith-1.0-397B (Flagship)

Best-in-class performance across all benchmarks
Recommended: Multi-GPU setup with 8x A100 80GB or equivalent
Serve via vLLM, TGI, or TensorRT-LLM
Optimal for complex, repository-level coding tasks

Ornith-1.0-35B (Balanced)

Strong performance-to-cost ratio
Runs on 2x A100 80GB or equivalent consumer GPUs
Ideal for development teams needing local deployment
Beats models 10x its size on coding benchmarks

Ornith-1.0-9B (Lightweight)

Runs on a single consumer GPU (RTX 4090 or similar)
Surprisingly capable for its size, especially on SWE-bench
Perfect for edge deployment and resource-constrained environments
Excellent baseline for fine-tuning on domain-specific codebases

Integration with OpenCode

Ornith-1.0 is fully compatible with OpenCode, the open-source AI coding assistant. You can serve any Ornith variant locally and connect it to OpenCode for a completely private, offline agentic coding workflow:

# Serve Ornith-1.0-35B locally with vLLM
vllm serve deepreinforce-ai/Ornith-1.0-35B \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 2

# Configure OpenCode to use your local server
# Add to opencode.json:
{
  "provider": {
    "name": "openai-compatible",
    "baseURL": "http://localhost:8000/v1",
    "model": "deepreinforce-ai/Ornith-1.0-35B"
  }
}

For the flagship 397B model, vLLM handles tensor parallelism automatically across multiple GPUs. The smaller 9B variant can run on a single GPU, making it ideal for quick prototyping with OpenCode.

Getting Started

# Install from Hugging Face
pip install transformers torch

# Quick inference example
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "deepreinforce-ai/Ornith-1.0-35B",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("deepreinforce-ai/Ornith-1.0-35B")

prompt = "Write a Python function to implement a thread-safe LRU cache."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Conclusion

Ornith-1.0 marks a turning point in open-source AI for coding. By combining a self-improving architecture with the permissive MIT license, it democratizes access to state-of-the-art agentic coding capabilities. Whether you’re running the 397B flagship for maximum performance or the lightweight 9B variant on consumer hardware, Ornith-1.0 delivers exceptional value.

The numbers speak for themselves: a 397B open-source model outperforming Claude Opus 4.7 on both Terminal-Bench 2.1 and SWE-bench Verified is not just impressive — it’s a signal that the gap between proprietary and open-source AI is closing rapidly.