This project dates back to our post Multi Modal Tokenizing With Chameleon. Since then, I have been working with my team at HomeBrew Research to build something new. We wanted to give the community something fresh rather than a simple replica of Chameleon (vision modality).
So we decided to work on a model that handles sound: one you can talk to and give voice commands to. A Llama model that can listen!
“What is the color of the sky?”
We proudly present the result: a working Llama 3.1 checkpoint (which we call llama3-s) that can listen to you! Announcement Blog
If you don’t have time to read through the blog post, you can also try the demo directly here:
We hope our effort brings you some joy and fun (this is, after all, a project of passion).