This quarter, I focused on launching two major projects:
AlphaMaze
AlphaMaze is a two-stage training framework that enhances large language models with visual reasoning for maze navigation. It combines Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO) to help models build an internal “mental map” of their environment.
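To make the GRPO stage a little more concrete, here is a minimal, self-contained sketch of the group-relative scoring it is built on. This is not taken from the visual-thinker code; the function name and reward values are illustrative.

```python
# Sketch of the group-relative advantage at the heart of GRPO.
# Names and numbers here are illustrative, not from the AlphaMaze codebase.
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """For a group of responses sampled from the same maze prompt,
    score each one relative to its group: A_i = (r_i - mean) / std."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 sampled navigation attempts for one maze, rewarded by whether
# the predicted move sequence reaches the goal (values are made up).
rewards = np.array([1.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))  # positive for the successful attempts
```

The intuition is that responses which solve the maze better than their sampled siblings get pushed up, without needing a separate value network.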
Find out more:
- Paper: arXiv:2502.14669
- Code: GitHub - janHQ/visual-thinker
- Live Demo: alphamaze.menlo.ai
PoseLess
PoseLess is a vision-based robot control framework that maps 2D images to joint angles without explicit pose estimation. By leveraging tokenized visual inputs and a transformer-based decoder, it enables zero-shot generalization and cross-morphology transfer from robots to human hands.
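As a rough illustration of the direct image-to-joint-angle idea, the sketch below patchifies an image into visual tokens and regresses joint angles with a small transformer. This is not the PoseLess architecture: the ImageToJoints name, the layer sizes, and the use of a compact encoder stack in place of the VLM-style decoder are all simplifications.

```python
# Hypothetical sketch: tokenize pixels into patches, run a small transformer,
# and regress one angle per joint. Illustrative only, not the PoseLess model.
import torch
import torch.nn as nn

class ImageToJoints(nn.Module):
    def __init__(self, num_joints: int = 16, dim: int = 128, patch: int = 16):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # pixels -> visual tokens
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_joints)  # regress joint angles directly

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        tokens = self.patchify(images).flatten(2).transpose(1, 2)  # (B, N_tokens, dim)
        pooled = self.encoder(tokens).mean(dim=1)                  # pool over visual tokens
        return self.head(pooled)                                   # (B, num_joints)

# Example: two 224x224 RGB frames -> 16 joint-angle predictions each.
angles = ImageToJoints()(torch.randn(2, 3, 224, 224))
print(angles.shape)  # torch.Size([2, 16])
```

The key design point, mirrored here, is that there is no intermediate pose-estimation stage: the mapping goes straight from image tokens to joint angles.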
Find out more:
- Paper: arXiv:2503.07111
- Code: GitHub - janHQ/poseless
Both projects have received positive feedback from the AI and robotics communities. Stay tuned for future work!