For much of 2025, the frontier of open-weight language models has been defined not in Silicon Valley or New York City, but in Beijing and Hangzhou.
Chinese research labs including Alibaba’s Qwen team, DeepSeek, Moonshot AI, and Baidu have rapidly set the pace in developing large-scale, open Mixture-of-Experts (MoE) models, often with permissive licenses and leading benchmark performance. OpenAI did field its own open-weight, general-purpose models this summer, gpt-oss-20B and gpt-oss-120B, but their uptake has been slowed by the sheer number of alternatives that perform as well or better.
Now, one small U.S. company is pushing back.
Today, Arcee AI announced the release of Trinity Mini and Trinity Nano Preview, the first two models in its new “Trinity” family—an open-weight MoE model suite fully trained in the United States.
Users can try the former directly for themselves in a chatbot format on Arcee’s new website, chat.arcee.ai, while developers can download both models from Hugging Face to run, modify, and fine-tune as they see fit, all for free under an enterprise-friendly Apache 2.0 license.
While small compared to the largest frontier models, these releases represent a rare attempt by a U.S. startup to build end-to-end open-weight models at scale—trained from scratch, on American infrastructure, using a U.S.-curated dataset pipeline.
“I’m experiencing a combination of extreme pride in my team and crippling exhaustion, so I’m struggling to put into words just how excited I am to have these models out,” wrote Arcee Chief Technology Officer (CTO) Lucas Atkins in a post on the social network X (formerly Twitter). “Especially Mini.”
A third model, Trinity Large, is already in training: a 420B parameter model with 13B active parameters per token, scheduled to launch in January 2026.
“We want to add something that has been missing in that picture,” Atkins wrote in the Trinity launch manifesto published on Arcee’s website. “A serious open weight model family trained end to end in America… that businesses and developers can actually own.”
From Small Models to Scaled Ambition
The Trinity project marks a turning point for Arcee AI, which until now has been known for its compact, enterprise-focused models. The company has raised $29.5 million in funding to date, including a $24 million Series A in 2024 led by Emergence Capital. Its previous releases include AFM-4.5B, a compact instruct-tuned model released in mid-2025, and SuperNova, an earlier 70B-parameter instruction-following model designed for in-VPC enterprise deployment.
Both were aimed at solving regulatory and cost issues plaguing proprietary LLM adoption in the enterprise.
With Trinity, Arcee is aiming higher: not just instruction tuning or post-training, but full-stack pretraining of open-weight foundation models—built for long-context reasoning, synthetic data adaptation, and future integration with live retraining systems.
Originally conceived as a stepping stone to Trinity Large, both Mini and Nano emerged from early experimentation with sparse modeling and quickly became production targets themselves.
Technical Highlights
Trinity Mini is a 26B parameter model with 3B active per token, designed for high-throughput reasoning, function calling, and tool use. Trinity Nano Preview is a 6B parameter model with roughly 800M active non-embedding parameters—a more experimental, chat-focused model with a stronger personality, but lower reasoning robustness.
Both models use Arcee’s new Attention-First Mixture-of-Experts (AFMoE) architecture, a custom MoE design blending global sparsity, local/global attention, and gated attention techniques.
Inspired by recent advances from DeepSeek and Qwen, AFMoE departs from traditional MoE by tightly integrating sparse expert routing with an enhanced attention stack — including grouped-query attention, gated attention, and a local/global pattern that improves long-context reasoning.
Think of a typical MoE model as a call center with 128 specialized agents (called “experts”), only a few of whom are consulted on each call, depending on the question. This saves time and energy, since not every expert needs to weigh in.
What makes AFMoE different is how it decides which agents to call and how it blends their answers. Most MoE models use a standard approach that picks experts based on a simple ranking.
AFMoE, by contrast, uses a smoother method (called sigmoid routing) that’s more like adjusting a volume dial than flipping a switch — letting the model blend multiple perspectives more gracefully.
The “attention-first” part means the model focuses heavily on how it pays attention to different parts of the conversation. Imagine reading a novel and remembering some parts more clearly than others based on importance, recency, or emotional impact — that’s attention. AFMoE improves this by combining local attention (focusing on what was just said) with global attention (remembering key points from earlier), using a rhythm that keeps things balanced.
Finally, AFMoE introduces something called gated attention, which acts like a volume control on each attention output — helping the model emphasize or dampen different pieces of information as needed, like adjusting how much you care about each voice in a group discussion.
All of this is designed to make the model more stable during training and more efficient at scale — so it can understand longer conversations, reason more clearly, and run faster without needing massive computing resources.
Unlike many existing MoE implementations, AFMoE emphasizes stability at depth and training efficiency, using techniques like sigmoid-based routing without auxiliary loss, and depth-scaled normalization to support scaling without divergence.
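To make those ideas concrete, here is a minimal, illustrative PyTorch sketch of sigmoid-based expert routing with an always-on shared expert and a simple output gate. It is not Arcee’s AFMoE code: the expert counts (128 total, 8 active, plus a shared expert) mirror the figures Arcee reports for Trinity Mini, but the layer sizes, module layout, and gating placement are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SigmoidRoutedMoE(nn.Module):
    """Illustrative sparse MoE layer with sigmoid routing, a shared expert,
    and a gated output. A sketch only, not Arcee's AFMoE implementation."""

    def __init__(self, d_model=1024, d_ff=2048, n_experts=128, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # One always-on expert that every token passes through.
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model)
        )
        # A learned "volume dial" applied to the mixed output.
        self.out_gate = nn.Linear(d_model, d_model)

    def forward(self, x):                                 # x: [tokens, d_model]
        # Sigmoid routing: each expert gets an independent 0..1 score instead
        # of competing in one softmax; no auxiliary balancing loss is added here.
        scores = torch.sigmoid(self.router(x))            # [tokens, n_experts]
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)

        out = self.shared_expert(x)                        # shared expert first
        for slot in range(self.top_k):
            idx = topk_idx[:, slot]                        # chosen expert per token
            w = topk_scores[:, slot].unsqueeze(-1)         # its routing weight
            for e in idx.unique().tolist():                # dispatch token groups
                mask = idx == e
                out[mask] = out[mask] + w[mask] * self.experts[e](x[mask])

        # Gated output: emphasize or dampen each dimension of the result.
        return out * torch.sigmoid(self.out_gate(x))
```

One practical difference from standard top-k softmax routing is visible here: sigmoid scores do not have to sum to one, so strengthening one expert’s weight does not automatically weaken the others.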
Model Capabilities
Trinity Mini adopts an MoE architecture with 128 experts, 8 active per token, and 1 always-on shared expert. Context windows reach up to 131,072 tokens, depending on provider.
Benchmarks show Trinity Mini performing competitively with larger models across reasoning tasks, including outperforming gpt-oss on SimpleQA (which tests factual recall and whether the model admits uncertainty), zero-shot MMLU (which measures broad academic knowledge and reasoning across many subjects without examples), and BFCL V3 (which evaluates multi-step function calling and real-world tool use):
MMLU (zero-shot): 84.95
Math-500: 92.10
GPQA-Diamond: 58.55
BFCL V3: 59.67
Latency and throughput numbers across providers like Together and Clarifai show 200+ tokens per second throughput with sub-three-second E2E latency—making Trinity Mini viable for interactive applications and agent pipelines.
Trinity Nano, while smaller and less stable on edge cases, demonstrates that a sparse MoE architecture remains viable at under 1B active parameters per token.
Access, Pricing, and Ecosystem Integration
Both Trinity models are released under the permissive, enterprise-friendly Apache 2.0 license, allowing unrestricted commercial and research use. Trinity Mini is available via chat.arcee.ai, Hugging Face, and API providers including OpenRouter, Together, and Clarifai.
API pricing for Trinity Mini via OpenRouter:
$0.045 per million input tokens
$0.15 per million output tokens
A free tier is available for a limited time on OpenRouter
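At those rates, costs are easy to estimate. The snippet below is a simple back-of-the-envelope calculator using the OpenRouter prices quoted above; the daily token volumes in the example are hypothetical.

```python
# Back-of-the-envelope cost estimate for Trinity Mini at the OpenRouter rates
# quoted above. The daily token volumes below are hypothetical examples.
INPUT_PRICE_PER_TOKEN = 0.045 / 1_000_000   # $0.045 per million input tokens
OUTPUT_PRICE_PER_TOKEN = 0.15 / 1_000_000   # $0.15 per million output tokens

def cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload with the given token counts."""
    return input_tokens * INPUT_PRICE_PER_TOKEN + output_tokens * OUTPUT_PRICE_PER_TOKEN

# Example: 1M prompt tokens and 200K completion tokens per day
# comes to $0.045 + $0.03 = $0.075 per day, or about $2.25 over 30 days.
print(f"${cost(1_000_000, 200_000):.3f} per day")
```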
The model is already integrated into apps including Benchable.ai, Open WebUI, and SillyTavern. It’s supported in Hugging Face Transformers, vLLM, LM Studio, and llama.cpp.
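For local experimentation, loading the model through Hugging Face Transformers should follow the standard causal-LM pattern sketched below. The repository id used here is an assumption, and because AFMoE is a custom architecture the checkpoint may require trust_remote_code; check Arcee’s model card for the exact name and loading instructions.

```python
# Minimal sketch of running Trinity Mini with Hugging Face Transformers.
# The repo id is hypothetical; consult Arcee's Hugging Face page for the real
# name and for whether trust_remote_code is needed for the AFMoE architecture.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/Trinity-Mini"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,      # matches the bf16 training pipeline
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain what a Mixture-of-Experts model is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```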
Data Without Compromise: DatologyAI’s Role
Central to Arcee’s approach is control over training data—a sharp contrast to many open models trained on web-scraped or legally ambiguous datasets. That’s where DatologyAI, a data curation startup co-founded by former Meta and DeepMind researcher Ari Morcos, plays a critical role.
DatologyAI’s platform automates data filtering, deduplication, and quality enhancement across modalities, ensuring Arcee’s training corpus avoids the pitfalls of noisy, biased, or copyright-risk content.
For Trinity, DatologyAI helped construct a 10 trillion token curriculum organized into three phases: 7T general data, 1.8T high-quality text, and 1.2T STEM-heavy material, including math and code.
This is the same partnership that powered Arcee’s AFM-4.5B—but scaled significantly in both size and complexity. According to Arcee, it was Datology’s filtering and data-ranking tools that allowed Trinity to scale cleanly while improving performance on tasks like mathematics, QA, and agent tool use.
Datology’s contribution also extends into synthetic data generation. For Trinity Large, the company has produced over 10 trillion synthetic tokens—paired with 10T curated web tokens—to form a 20T-token training corpus for the full-scale model now in progress.
Building the Infrastructure to Compete: Prime Intellect
Arcee’s ability to execute full-scale training in the U.S. is also thanks to its infrastructure partner, Prime Intellect. The startup, founded in early 2024, began with a mission to democratize access to AI compute by building a decentralized GPU marketplace and training stack.
While Prime Intellect made headlines with its distributed training of INTELLECT-1—a 10B parameter model trained across contributors in five countries—its more recent work, including the 106B INTELLECT-3, acknowledges the tradeoffs of scale: distributed training works, but for 100B+ models, centralized infrastructure is still more efficient.
For Trinity Mini and Nano, Prime Intellect supplied the orchestration stack, a modified TorchTitan runtime, and the physical compute environment: 512 H200 GPUs in a custom bf16 pipeline running high-efficiency hybrid sharded data parallelism (HSDP). It is also hosting the 2,048-GPU B300 cluster used to train Trinity Large.
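Arcee has not published its training code, but as a rough illustration of what HSDP means in practice, the PyTorch sketch below shards model parameters across the GPUs within each node and replicates them across nodes using a 2-D device mesh. The mesh shape (64 nodes of 8 GPUs, matching a 512-GPU cluster) and the wrapping details are assumptions; the actual Trinity runs use a modified TorchTitan stack.

```python
# Rough PyTorch sketch of hybrid sharded data parallelism (HSDP): parameters
# are sharded across the 8 GPUs inside a node and replicated across 64 nodes,
# mirroring a 512-GPU layout. This is not Arcee's TorchTitan pipeline.
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

def wrap_hsdp(model: nn.Module) -> FSDP:
    # Outer mesh dimension replicates across nodes; inner dimension shards
    # parameters across the GPUs of each node. Assumes a distributed launch
    # (e.g. torchrun) so world size and ranks are already set.
    mesh = init_device_mesh("cuda", (64, 8), mesh_dim_names=("replicate", "shard"))
    return FSDP(
        model.to(torch.bfloat16),                      # bf16 training pipeline
        device_mesh=mesh,
        sharding_strategy=ShardingStrategy.HYBRID_SHARD,
        use_orig_params=True,
    )
```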
The collaboration shows the difference between branding and execution. While Prime Intellect’s long-term goal remains decentralized compute, its short-term value for Arcee lies in efficient, transparent training infrastructure—infrastructure that remains under U.S. jurisdiction, with known provenance and security controls.
A Strategic Bet on Model Sovereignty
Arcee’s push into full pretraining reflects a broader thesis: that the future of enterprise AI will depend on owning the training loop—not just fine-tuning. As systems evolve to adapt from live usage and interact with tools autonomously, compliance and control over training objectives will matter as much as performance.
“As applications get more ambitious, the boundary between ‘model’ and ‘product’ keeps moving,” Atkins noted in Arcee’s Trinity manifesto. “To build that kind of software you need to control the weights and the training pipeline, not only the instruction layer.”
This framing sets Trinity apart from other open-weight efforts. Rather than patching someone else’s base model, Arcee has built its own—from data to deployment, infrastructure to optimizer—alongside partners who share that vision of openness and sovereignty.
Looking Ahead: Trinity Large
Training is currently underway for Trinity Large, Arcee’s 420B parameter MoE model, using the same AFMoE architecture scaled to a larger expert set.
The dataset includes 20T tokens, split evenly between synthetic data from DatologyAI and curated web data.
The model is expected to launch in January 2026, with a full technical report to follow shortly thereafter.
If successful, it would make Trinity Large one of the only fully open-weight, U.S.-trained frontier-scale models—positioning Arcee as a serious player in the open ecosystem at a time when most American LLM efforts are either closed or based on non-U.S. foundations.
A Recommitment to U.S. Open Source
In a landscape where the most ambitious open-weight models are increasingly shaped by Chinese research labs, Arcee’s Trinity launch signals a rare shift in direction: an attempt to reclaim ground for transparent, U.S.-controlled model development.
Backed by specialized partners in data and infrastructure, and built from scratch for long-term adaptability, Trinity is a bold statement about the future of U.S. AI development. It shows that small, lesser-known companies can still push boundaries and innovate in the open, even as the industry grows increasingly productized and commoditized.
What remains to be seen is whether Trinity Large can match the capabilities of its better-funded peers. But with Mini and Nano already in use, and a strong architectural foundation in place, Arcee may already be proving its central thesis: that model sovereignty, not just model size, will define the next era of AI.