For the past year, the open-source AI community has been chasing a moving target: can a freely available model match the frontier proprietary systems on real-world agentic tasks? Nex-AGI just made the strongest case yet. The lab today released Nex-N2, an open-source agentic reasoning model that goes toe-to-toe with GPT-5.5, Claude Opus 4.7, and DeepSeek-V4-Pro across coding, browsing, and software engineering benchmarks.

Nex-N2 ships in two variants. Nex-N2-Pro (post-trained on Qwen3.5-397B-A17B, a 397-billion-parameter MoE with 17B active) targets maximum performance. Nex-N2-mini (built on Qwen3.5-35B-A3B-Base) offers a lighter entry point for teams with tighter infrastructure. Both share a 256K context window, multimodal input support, and the same architectural philosophy: Agentic Thinking.

Agentic Thinking: Beyond Chain-of-Thought

The core insight behind Nex-N2 is that reasoning, tool use, and environment execution have historically been treated as separate capabilities. Nex-AGI fused them into a single closed-loop framework called Agentic Thinking, which connects requirement understanding, task planning, code implementation, environmental feedback, evaluation, and iterative debugging.

The framework rests on two pillars:

  • Adaptive Thinking — The model dynamically decides when to think and how deeply. Simple actions execute quickly; critical decisions trigger thorough reasoning. This avoids the latency tax of models that over-reason on trivial steps while still catching edge cases on high-stakes branches.
  • Coherent Thinking — A single reasoning paradigm spans general reasoning, agentic coding, tool calling, and terminal execution. Consistency across task modalities enables stable capability transfer — a lesson learned from how real developers work, switching context fluidly between planning, coding, debugging, and searching.

In practice, this means Nex-N2 can autonomously navigate a multi-hour software engineering task, pause to search documentation when stuck, write and test code, and iterate based on compiler or runtime errors — without human intervention. That is the difference between a chatbot and an agent.

Benchmark Breakdown: Keeping Up with the Frontier

The numbers tell a compelling story. Nex-AGI evaluated Nex-N2 across three axes — agentic tasks, coding, and general reasoning — against GPT-5.5, Opus 4.7, Kimi-K2.6, and DeepSeek-V4-Pro.

Benchmark Nex-N2-Pro Nex-N2-mini GPT-5.5 Opus 4.7 Kimi-K2.6 DeepSeek-V4-Pro
BrowseComp 83.7 74.1 84.4 79.8 83.2 83.4
GDPval 1585 1402 1769 1753 1481 1554
Toolathlon 51.9 33.3 55.6 52.8 50.0 51.8
Terminal-Bench 2.1 75.3 60.7 83.4 69.7 72.0
SWE-Bench Verified 80.8 74.4 82.9 87.6 80.2 80.6
SWE-Bench Pro 58.8 50.2 58.6 64.3 58.6 55.4
GPQA Diamond 90.7 82.6 93.6 94.2

Three numbers stand out. BrowseComp at 83.7 puts Nex-N2-Pro within 0.7 points of GPT-5.5 and ahead of Opus 4.7 (79.8). Terminal-Bench 2.1 at 75.3 beats Opus 4.7 by nearly 6 points, proving its mettle in real shell-based workflows. And SWE-Bench Verified at 80.8 lands within 2 points of GPT-5.5 — a gap small enough that most developers will find it indistinguishable in practice.

The GDPval score of 1585 and GPQA Diamond of 90.7 further reinforce that this is not a one-trick pony: the model holds its own on long-horizon planning and graduate-level science reasoning.

How to Run It

Deploying Nex-N2 requires Nex-AGI's custom sglang fork, available as a Docker image at nexagi/sglang:v0.5.12. The Pro variant demands serious hardware: 2 nodes × 8×H100 (CUDA 12.8), with tensor parallelism set to 16. The mini variant is far more accessible at 1 node × 2×H100 with tp=2.

Nex-AGI recommends the following sampling parameters for production use:

  • Temperature: 0.7
  • Top-p: 0.95
  • Top-k: 40

Function calling is enabled via --tool-call-parser qwen3_coder, and reasoning parsing via --reasoning-parser qwen3. The 256K context window makes it viable for long-session agentic workloads like repository-wide refactoring or multi-turn research.

For teams without dedicated H100 clusters, a free trial is available on OpenRouter until June 23, 2026. The model is also accessible via SiliconFlow and Novita AI.

Open Source & Caveats

Both variants are available for download on Hugging Face and ModelScope. The repository on GitHub has accumulated 297 stars at the time of writing.

One important caveat: despite positioning itself as open source, the Nex-N2 repository does not currently include a LICENSE file. The Hugging Face model card and several downstream API providers reference Apache 2.0, but the official repo has not confirmed this. Prospective users should treat licensing as unresolved until Nex-AGI publishes a formal license file.

Deployability Matters

The gap between a model that scores well on static benchmarks and one that works reliably in production is wide. Nex-N2's closed-loop Agentic Thinking framework is designed precisely to bridge that gap: by unifying reasoning, tool execution, and environmental feedback within a single architecture, it reduces the engineering overhead typically required to chain separate models together. For teams building agentic coding assistants, automated research tools, or CI/CD debugging agents, this means less glue code and fewer failure modes between the model and the task.

The Pro variant's hefty hardware requirement (16 H100 GPUs) will limit self-hosting to well-resourced teams, but the mini variant and API access lower the barrier significantly. For early-stage teams, the OpenRouter free trial offers a zero-commitment path to evaluation.

The Bottom Line

Nex-N2-Pro is a genuine milestone for open-source agentic AI. A free model that matches GPT-5.5 on BrowseComp, beats Opus 4.7 on Terminal-Bench, and lands within striking distance on SWE-Bench would have been unthinkable even six months ago. The model is not perfect — it trails on DeepSWE (33.6 vs 70 for GPT-5.5) and GDPval (1585 vs 1769), and the missing license file needs prompt resolution — but the trajectory is unmistakable.

Who should care? If you build agentic systems, write code for a living, or evaluate models for enterprise deployment, Nex-N2 deserves a spot on your testing shortlist. The open-source lead over proprietary models is narrowing, and Nex-N2 is the closest thing yet to a frontier agent you can download, inspect, and run on your own hardware.

The agent wars just got a new player — and this one is free.

Sponsored Links

Leave a Comment

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply