
Alan Dao's personal blog

AI Researcher

⚡Ultra Compact Text-to-Speech: A Quantized F5-TTS

How small can an F5-TTS model get? Very small! You can now generate synthetic speech with AI, at fairly high quality, on very constrained hardware: the quantized version of F5-TTS needs only around ~400 MB of VRAM. (Spoiler: Mac only, for now.) That is all you need to do voice generation (and voice cloning) on a MacBook in under 30 seconds. The full pipeline includes several types of weights, detailed in the full post.
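As a rough sanity check on that ~400 MB figure, here is a back-of-envelope estimate of weight memory at different bit-widths. The ~330M parameter count below is an illustrative assumption, not an official F5-TTS number.

```python
# Back-of-envelope memory for storing model weights alone.
# NOTE: the ~330M parameter count is an assumption for illustration,
# not an official F5-TTS figure.

def weight_memory_mb(num_params: float, bits_per_weight: int) -> float:
    """Megabytes needed to store `num_params` weights at a given bit-width."""
    return num_params * bits_per_weight / 8 / 1e6

params = 330e6  # assumed parameter count
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_mb(params, bits):.0f} MB")
```

At 8 bits, a model of that size needs roughly 330 MB for weights alone, so a ~400 MB runtime footprint (weights plus activations and buffers) is plausible.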

MLOps Engineer: A Pragmatic Perspective

I’ve noticed that there are plenty of guides and courses on “learning MLOps” available online, but many of them fail to address a few critical aspects: what the daily work looks like, which roles can transfer into MLOps, and what the career prospects are. With that in mind, I hope to offer a practical perspective based on my own experience. It may help you adjust your learning path (if you’re planning to transition into this role) or adapt when moving from a large company with standardized processes to a smaller company or startup.

Paper Summary: What Makes RoPE Useful

For a while now, everyone has been using “Rotary Position Embedding” (RoPE) as the default method for positional encoding. However, exactly how and why RoPE makes things “better” remains a little unexplored. Luckily, the paper Round and Round We Go! What makes Rotary Positional Encodings useful? addresses this specific question, and I found a few of its results quite interesting. 1. RoPE does not necessarily decay activations with distance: in the original RoFormer paper, the authors argued that RoPE exhibits some decay as the context length increases.
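The excerpt cuts off here, but the decay claim is easy to probe numerically. Below is a minimal NumPy sketch of RoPE (the rotate-half variant used in LLaMA-style models) that prints query-key scores at growing relative distances; the random vectors and dimension are arbitrary for illustration, not taken from the paper.

```python
import numpy as np

def rope(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Apply Rotary Position Embedding to one vector at position `pos`."""
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) * 2.0 / d)   # theta_i = base^(-2i/d)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:half], x[half:]                    # rotate-half pairing
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

rng = np.random.default_rng(0)
d = 64
q = rng.standard_normal(d)
k = rng.standard_normal(d)

# The score depends only on the relative distance between positions;
# for random q/k it oscillates rather than decaying monotonically,
# which is the paper's point 1 in a nutshell.
for dist in (0, 1, 4, 16, 64, 256):
    score = rope(q, 0) @ rope(k, dist)
    print(f"relative distance {dist:>3}: score {score:+.3f}")
```

Because each rotation is an orthogonal transform, rope(q, m) @ rope(k, n) depends only on n - m, which is what makes the positional signal relative in the first place.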

🍓 Ichigo: Llama Learns to Talk

We rebranded llama3-s as “Ichigo”, complete with a cute new UI. If you are coming from Singapore Techweek, you can also visit the new announcement post on the Homebrew blog. I will update this blog post once our paper comes out.

Tutorial: High-Quality LLM on Low VRAM - Llama 3.1

With the recent release of cutting-edge models like Gemma 9B, Llama 3.1, etc., we are in an era where a model with as few as 8B parameters can match the performance of ChatGPT 3.5 or ChatGPT 4 (according to the LMSYS ranking). Strangely, the general vibe in communities like r/LocalLLaMA does not seem to agree. But why does my Llama 3.1 seem dumb? Or, at the very least, nowhere near ChatGPT 3.
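The full tutorial covers the details, and it may well use a different toolchain (e.g., llama.cpp with GGUF files), but for context, a common way to fit an 8B model into low VRAM is 4-bit quantization. Here is a hedged sketch using Hugging Face transformers with bitsandbytes (which requires a CUDA GPU); the gated model id below is an example, and any compatible checkpoint works.

```python
# Sketch: load Llama 3.1 8B in 4-bit to cut VRAM use roughly 4x vs fp16.
# Assumes a CUDA GPU and access to the gated Meta repo on Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NormalFloat4, a common default
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                    # spill layers to CPU if VRAM runs out
)

prompt = "Explain RoPE in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Quantizing to 4 bits trades a small amount of quality for a large memory saving, which is usually the right trade when the alternative is not running the model at all.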