
Alan Dao's personal blog
AI Researcher
I need to vent and share something that’s blown my mind today. I just came across this paper evaluating state-of-the-art LLMs (like O3-MINI, Claude 3.7, etc.) on the 2025 USA Mathematical Olympiad (USAMO) problems. And let me tell you—this is wild.
First off, here’s a quick breakdown of what they did:
They tested six top-tier LLMs on six proof-based math problems from the 2025 USAMO. Each model attempted every problem four times, and solutions were graded by expert human judges using standardized rubrics.
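To make the setup concrete, here is a minimal sketch of that protocol in Python. The model list, query_model, and judge_score are placeholders I invented to illustrate the loop; they are not the paper's actual harness or rubric.

```python
# Hypothetical sketch of the evaluation protocol described above:
# six models x six USAMO problems x four attempts, each attempt
# scored by a human judge against a standardized rubric.
# `query_model` and `judge_score` are stand-ins, not the paper's code.

from statistics import mean

MODELS = ["o3-mini", "claude-3.7", "model-c", "model-d", "model-e", "model-f"]
PROBLEMS = [f"usamo-2025-p{i}" for i in range(1, 7)]
ATTEMPTS = 4
MAX_POINTS = 7  # olympiad problems are conventionally graded out of 7

def query_model(model: str, problem: str, attempt: int) -> str:
    """Placeholder: call the model's API and return its written proof."""
    return f"proof by {model} for {problem}, attempt {attempt}"

def judge_score(proof: str) -> float:
    """Placeholder: a human judge scores the proof against the rubric."""
    return 0.0

# Average rubric score per model over all problems and attempts.
scores = {
    model: mean(
        judge_score(query_model(model, problem, attempt))
        for problem in PROBLEMS
        for attempt in range(ATTEMPTS)
    )
    for model in MODELS
}

for model, avg in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {avg:.2f} / {MAX_POINTS}")
```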
This quarter, I focused on launching two major projects:
AlphaMaze

AlphaMaze is a two-stage training framework that enhances large language models with visual reasoning for maze navigation. It combines Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO) to help models build an internal “mental map” of their environment.
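To give a feel for the second stage, here is a minimal sketch of GRPO's group-relative advantage computation, following the published GRPO formulation rather than AlphaMaze's actual training code; the rewards and group size below are invented.

```python
# Minimal sketch of GRPO's group-relative advantages (not AlphaMaze's code).
# For each prompt we sample a group of G completions, score them with a
# reward function, and normalize rewards within the group; the normalized
# value is the advantage that weights the policy-gradient update.

import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, G) raw rewards for G sampled completions per prompt."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy example: 2 prompts, 4 sampled maze-navigation rollouts each
# (reward 1.0 = reached the goal, 0.0 = failed).
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
adv = group_relative_advantages(rewards)
print(adv)  # above-average rollouts get positive advantage, below-average negative

# The policy loss is then PPO-style clipping with these group-normalized
# advantages, with no learned value function:
# loss = -min(ratio * adv, clip(ratio, 1 - eps, 1 + eps) * adv).mean()
```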
Find out more:
Paper: arXiv:2502.14669
Code: GitHub - janHQ/visual-thinker
Live Demo: alphamaze.menlo.ai

PoseLess

PoseLess is a vision-based robot control framework that maps 2D images to joint angles without explicit pose estimation.
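As a rough illustration of the idea (not PoseLess's actual architecture), here is a generic sketch of regressing joint angles directly from pixels, with no intermediate pose-estimation stage; the network, joint count, and shapes are all assumptions.

```python
# Generic sketch of direct image-to-joint-angle regression: it illustrates
# skipping an explicit pose-estimation stage, not PoseLess's actual model.

import torch
import torch.nn as nn

NUM_JOINTS = 16  # assumed joint count for a robot hand

class ImageToJoints(nn.Module):
    def __init__(self, num_joints: int = NUM_JOINTS):
        super().__init__()
        self.encoder = nn.Sequential(            # tiny CNN stand-in for a real vision backbone
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, num_joints)    # regress joint angles directly

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, 3, H, W) -> joint angles in radians: (batch, num_joints)
        return self.head(self.encoder(images))

model = ImageToJoints()
angles = model(torch.randn(2, 3, 224, 224))
print(angles.shape)  # torch.Size([2, 16])
```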
If you’ve been keeping an eye on OpenRouter, you might have noticed that its total token usage has skyrocketed. I’ve tracked the public data from their website and compiled some stats, showing an incredible 76x growth since 2024 (the y-axis is in tens of billions of tokens).
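For transparency, that multiple is just a ratio of the tracked totals; the figures below are placeholders I made up to show the arithmetic, not OpenRouter's real numbers.

```python
# Sanity-check arithmetic for a growth multiple from tracked token totals.
# The numbers are placeholders, not OpenRouter's actual usage data.

tokens_per_week_2024_baseline = 2.5e10   # assumed early-2024 weekly tokens
tokens_per_week_latest = 1.9e12          # assumed recent weekly tokens

growth = tokens_per_week_latest / tokens_per_week_2024_baseline
print(f"growth multiple: {growth:.0f}x")  # -> 76x with these placeholder figures
```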
So, what’s driving OpenRouter’s success?
What gave rise to the success of OpenRouter?

OpenRouter’s growth isn’t happening in isolation; it’s closely tied to the rapid adoption of AI-powered coding assistants.
DeepSeek-V3 is the latest model in DeepSeek’s model family. It packs in many cumulative refinements that improve performance.
For its size, the model blows every closed-source and open-source model out of the water.
But what made DeepSeek-V3 so good? Let’s dive in.
Overall

At first you might have the impression that this model is a gigantic monster. Sure, 671B parameters is big, but it is really not that big from an architecture point of view, as I will explain shortly.
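As a back-of-the-envelope hint at why, here is the Mixture-of-Experts arithmetic. The figures follow DeepSeek-V3's reported configuration as I recall it (256 routed experts with top-8 routing plus 1 shared expert, 37B of 671B parameters activated per token), so treat them as approximate.

```python
# Back-of-the-envelope: why a Mixture-of-Experts model's total parameter
# count overstates its per-token compute. Figures quoted from memory from
# DeepSeek-V3's report; treat them as approximate.

total_params = 671e9       # reported total parameters
activated_params = 37e9    # reported parameters activated per token

routed_experts = 256       # routed experts per MoE layer
active_experts = 8         # experts selected per token (top-k routing)
shared_experts = 1         # always-on shared expert

# Per token, each MoE layer runs only the shared expert plus the top-k
# routed experts, so most expert weights sit idle on any given token.
expert_fraction = (active_experts + shared_experts) / (routed_experts + shared_experts)
print(f"fraction of experts active per token: {expert_fraction:.1%}")                     # ~3.5%
print(f"fraction of all params active per token: {activated_params / total_params:.1%}")  # ~5.5%
```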
2024, the year of papers

In 2024, over 100 papers were published daily on arXiv, a staggering volume that no one could hope to read in full. However, I’ve come across a few fascinating AI papers, some less mainstream but with solid ideas, grouped into three main categories of interest:
Emergent compressive behavior
Distribution matching
Alternative

I will share what I learned by giving a short explanation (not just a summary) of each paper, along with some of my general views on each category.