Nous Research just released Nomos 1, an open-source AI that ranks second on the notoriously brutal Putnam math exam
Nous Research, the San Francisco-based artificial intelligence startup, released on Tuesday an open-source mathematical reasoning system called Nomos 1 that achieved near-elite human performance on this year's William Lowell Putnam Mathematical Competition, one of the most prestigious and notoriously difficult undergraduate math contests in the world.
The Putnam is known for its difficulty: While a perfect score is 120, this year's top score was 90, and the median was just 2. Nomos 1, by contrast, scored 87 points — a result that would have ranked second out of 3,988 participants in the 2024 competition, according to the company.
The release marks an inflection point in the rapidly accelerating race to build AI systems capable of sophisticated mathematical reasoning. Unlike the massive, compute-intensive models deployed by major technology companies, Nomos 1 achieves its results with a relatively compact architecture: 30 billion parameters with roughly 3 billion active at any given time, using a mixture-of-experts design based on Alibaba's Qwen3 model.
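A mixture-of-experts layer routes each token through only a few small "expert" networks chosen by a lightweight router, which is how a model can hold 30 billion parameters while activating only about 3 billion per token. Here is a minimal sketch of top-k routing, with hypothetical layer sizes and expert counts rather than Qwen3's actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Illustrative top-k mixture-of-experts layer.

    Only k of num_experts feed-forward blocks run for each token, so
    the parameters "active at any given time" are a small fraction of
    the layer's total -- the same principle behind 30B-total/3B-active.
    """

    def __init__(self, dim=512, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                             # x: (tokens, dim)
        scores = self.router(x)                       # (tokens, num_experts)
        weights, picks = scores.topk(self.k, dim=-1)  # keep only the best k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                    # run just the chosen experts
            for e in picks[:, slot].unique().tolist():
                mask = picks[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out
```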
"This score would rank #2/3988 in 2024 and marks our first step with Hillclimb AI towards creating a SOTA AI mathematician," Nous Research announced on social media Tuesday.
The same base model scored 24 points without Nous Research's specialized training
Perhaps most striking is the gap between Nomos 1 and its base model. When Nous Research ran the same Qwen3-30B-A3B-Thinking-2507 model through an identical testing harness, it scored just 24 out of 120 — a result that underscores the critical importance of post-training optimization and specialized reasoning techniques over raw model scale.
"Nomos 1 achieved an 87/120 with 8 perfect scores," the company stated, noting that the performance difference "is largely due to post-training and data quality rather than the harness."
The results were verified through blind grading by a human expert who had previously finished in the top 200 on the Putnam. Nous Research provided the anonymized submissions to the grader, then published the full set of de-anonymized files and the runbooks used to generate them on GitHub.
Why the Putnam competition is considered the ultimate test of mathematical reasoning
The William Lowell Putnam Mathematical Competition is an annual contest for undergraduate students at colleges and universities in the United States and Canada. It is widely considered the most prestigious university-level mathematics competition in the world.
The Putnam is more of a mathematical sporting event than an academic test. The exam consists of two three-hour sessions separated by a two-hour break, with six questions per session. Each of the twelve questions is worth 10 points, for a total of 120.
Putnam questions are not the kind found on regular exams or in textbooks. They are more like puzzles than calculations, often requiring students to find new ways to represent a problem before a solution emerges.
Last year, nearly 4,000 students across the continent took the Putnam. Sixty-one percent scored three points or fewer, according to the Mathematical Association of America, which organizes the competition. The top score was 90 out of 120.
Many Putnam Fellows have gone on to become distinguished researchers in mathematics and other fields, including three Fields Medalists — John Milnor, David Mumford, and Daniel Quillen — and two Nobel laureates in physics — Richard Feynman and Kenneth Wilson.
Inside the two-phase reasoning system that powers Nomos 1's mathematical breakthroughs
Nomos 1 is a specialization of Qwen's Qwen3-30B-A3B-Thinking model, optimized for mathematical problem-solving and proof-writing in natural language. The system was developed in collaboration with Hillclimb AI.
What distinguishes Nomos 1 from simple model inference is its sophisticated reasoning harness — an open-source framework that orchestrates how the model approaches and solves problems. The harness operates in two distinct phases within a three-hour time limit, mirroring the actual Putnam competition structure.
In the solving phase, parallel workers simultaneously tackle problems using a priority-based system. Each worker picks a problem, generates a submission, then scores its own work on a scale of 1 to 7. Problems with the fewest perfect scores receive priority, ensuring the system focuses its compute on the hardest challenges. This process continues until either all problems have achieved a target number of self-critiqued perfect scores or time runs out.
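Nous Research has published the actual harness on GitHub; the sketch below only illustrates the priority loop as described, with the hypothetical functions `generate_submission` and `self_score` standing in for model calls, and a single worker standing in for the parallel pool:

```python
import time

TARGET_PERFECT = 3      # assumed target of self-scored 7s per problem
TIME_LIMIT = 3 * 3600   # three hours, mirroring the Putnam format

def solving_phase(problems, generate_submission, self_score):
    """Repeatedly attempt the least-solved problem and self-grade it."""
    submissions = {p: [] for p in problems}
    perfect = {p: 0 for p in problems}
    start = time.monotonic()

    while time.monotonic() - start < TIME_LIMIT:
        # Problems with the fewest perfect self-scores get priority.
        open_problems = [p for p in problems if perfect[p] < TARGET_PERFECT]
        if not open_problems:
            break  # every problem has enough perfect-scored attempts
        problem = min(open_problems, key=lambda p: perfect[p])

        attempt = generate_submission(problem)  # model call (stand-in)
        score = self_score(problem, attempt)    # 1-7 self-critique (stand-in)
        submissions[problem].append((attempt, score))
        if score == 7:
            perfect[problem] += 1

    return submissions
```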
The finalization phase begins 15 minutes before the time limit (or at 50% for shorter runs) and employs a two-stage selection process. First, a consolidation step groups submissions by conclusion and attempts to identify the correct group — importantly, not necessarily the majority group. Then, a pairwise tournament using single elimination determines the final submission for each problem.
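Continuing the sketch under the same assumptions, with `conclusion_key`, `pick_group`, and `judge` as hypothetical stand-ins for the model-driven comparisons:

```python
from collections import defaultdict

def finalization_phase(submissions, conclusion_key, pick_group, judge):
    """Consolidate by conclusion, then run a single-elimination tournament."""
    final = {}
    for problem, attempts in submissions.items():
        # Stage 1: group attempts that reach the same conclusion, then ask
        # the model which group is correct -- not simply the largest one.
        groups = defaultdict(list)
        for attempt, score in attempts:
            groups[conclusion_key(problem, attempt)].append(attempt)
        candidates = pick_group(problem, groups)

        # Stage 2: pairwise single elimination over the chosen group.
        while len(candidates) > 1:
            survivors = [a if judge(problem, a, b) else b
                         for a, b in zip(candidates[::2], candidates[1::2])]
            if len(candidates) % 2:          # odd one out gets a bye
                survivors.append(candidates[-1])
            candidates = survivors
        final[problem] = candidates[0] if candidates else None
    return final
```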
"Our open source reasoning system consists of a solving phase, where workers attempt a least-solved problem and self-assess, followed by a finalization phase, which consolidates submissions to choose a final submission for each problem," Nous Research explained.
How Nomos 1 compares to mathematical AI systems from DeepSeek, Google, and OpenAI
The Nomos 1 results arrive amid a flurry of advances in mathematical reasoning AI. DeepSeek's model, DeepSeekMath-V2, scored 118 out of 120 points on questions from the 2024 William Lowell Putnam Mathematical Competition, beating the top human score of 90. The model also performed at the level of gold-medal winners in the International Mathematical Olympiad.
At this year's International Mathematical Olympiad, Google's Gemini operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions, all within the competition's 4.5-hour time limit. Google DeepMind achieved that result using an advanced version of Gemini Deep Think.
What makes Nomos 1's achievement notable is not raw performance — it trails DeepSeek's 118/120 — but rather its accessibility and efficiency. At 30 billion parameters with only 3 billion active, the model can run on consumer-grade hardware, a stark contrast to the massive compute clusters required by frontier models from OpenAI and Google.
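That efficiency claim is easy to sanity-check with back-of-the-envelope weight-memory arithmetic; the quantization formats below are common deployment choices, not figures Nous Research has published:

```python
PARAMS_TOTAL = 30e9  # Nomos 1's total parameter count
BYTES_PER_PARAM = {"fp16": 2, "int8": 1, "int4": 0.5}

for fmt, nbytes in BYTES_PER_PARAM.items():
    gb = PARAMS_TOTAL * nbytes / 1024**3
    print(f"{fmt}: ~{gb:.0f} GB of weights")
# fp16: ~56 GB -> workstation territory
# int8: ~28 GB -> high-end consumer GPU or unified-memory laptop
# int4: ~14 GB -> fits many consumer GPUs and laptops
```

Note that the roughly 3 billion active parameters reduce per-token compute, not the memory needed to hold all 30 billion weights.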
Hermes 4.3 arrived just six days earlier, trained on a decentralized blockchain network
The Nomos 1 announcement follows closely on the heels of Nous Research's December 3 release of Hermes 4.3, a general-purpose language model that marked another significant milestone for the company.
Hermes 4.3, based on ByteDance's Seed-OSS-36B-Base model, is the first production model that Nous Research trained entirely on its Psyche network — a distributed training infrastructure that uses a novel optimizer called DisTrO to coordinate training across nodes spread throughout data centers over the open internet, secured by consensus on the Solana blockchain.
The company trained Hermes 4.3 both through traditional centralized methods and on the Psyche network, specifically to verify that distributed training could match or exceed centralized performance for production workloads. The Psyche-trained version outperformed the centralized version across a suite of downstream tasks, the company reported.
"The training run proved stable throughout, averaging 144k tokens/second spread across 24 Psyche nodes," Nous Research stated. "Using DisTrO's overlapped collective strategy, the entirety of the P2P communications were hidden by the training time, effectively achieving equivalent throughput to traditional, centralized training."
Hermes 4.3 also achieved state-of-the-art results on RefusalBench, a new benchmark that measures a model's willingness to be helpful across a variety of scenarios commonly restricted by other models. The model answered 74.60% of RefusalBench questions in non-reasoning mode, surpassing its predecessor Hermes 4 70B (59.50%) and outperforming closed models including Grok 4 (51.30%) and Gemini 2.5 Pro (24.23%).
Small models with smart training are closing the gap with trillion-parameter giants
Together, the two releases in a single week signal Nous Research's strategic bet: that smaller, more efficient models with sophisticated post-training techniques and reasoning harnesses can compete with — and in some cases outperform — the massive models developed by better-funded competitors.
For enterprise decision-makers, the implications are significant. Mathematical reasoning capabilities have applications far beyond academic competitions: they're essential for formal verification, theorem proving, scientific modeling, cryptographic analysis, and any domain requiring rigorous logical deduction.
The open-source nature of both releases — Nomos 1 is available under the Apache 2.0 license on Hugging Face, with the full reasoning harness on GitHub — means that organizations can deploy these capabilities on their own infrastructure without relying on API calls to major cloud providers.
"For the first time, anyone can run or access a state-of-the-art AI mathematician," one observer noted on social media. "This lowers the barrier to serious math research, proof verification, modeling complex systems, advanced reasoning work."
The key contributors to Nomos 1 include Roger Jin, who led the training; Jeffrey Quesnelle and Dakota Mahan, who built the infrastructure; Chen Guang, who advised; and Ryan Teknium and Jeffrey Quesnelle, who provided leadership. The model was developed with contributions from Hillclimb AI and a team of math experts including Samuel Kim, Miron Yurkevich, and others.
The race to build AI mathematicians is accelerating faster than anyone predicted
The 86th Putnam Competition took place on Saturday, December 6, 2025 — just three days before Nous Research released Nomos 1. The timing underscores how rapidly the field is moving: companies are now releasing mathematical AI systems capable of near-elite human performance within days of the competitions they're designed to solve.
Competition in mathematical AI has intensified dramatically in recent months. In July, an advanced version of Google DeepMind's Gemini model and an experimental reasoning model from OpenAI both achieved gold-medal scores at the 2025 International Mathematical Olympiad. DeepSeek's new model matched their performance, solving five of the six problems.
But the resource requirements for those frontier systems remain prohibitive for most organizations. OpenAI's o1-pro is estimated at over 1.8 trillion parameters; Google's Gemini 2.5 Pro likely exceeds 400 billion. Nomos 1, by contrast, achieves competitive results with a fraction of that footprint.
The gap between massive frontier models and efficient open-source alternatives is narrowing. And for organizations that need mathematical reasoning capabilities without the budget for hyperscale compute, that gap may have just closed enough to matter.
As one observer put it on social media: "This marks a significant jump for AI math models that are small enough to run on your laptop."
A laptop that can now outperform nearly 4,000 of the continent's best undergraduate mathematicians.