
Alan

AI Practitioner

Rotary Position Embedding (RoPE) Down to the Bone

You may have seen it everywhere on Reddit or Twitter: "Model A has RoPE implemented." "We can make it run longer by changing the RoPE scaling." And so on. But really, what is RoPE and how does it work? People say something about sin and cos, but what does that even mean? I am about to demystify all of that for you.

The intuition behind RoPE

To understand RoPE, we first need to review some high school math.
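Before the math, here is a minimal numerical sketch of what RoPE actually does: it rotates consecutive pairs of a vector's coordinates by position-dependent angles. This is an illustrative toy (the function name and shapes are my own, not any model's actual code), but it demonstrates RoPE's key property: attention scores between rotated queries and keys depend only on the *relative* position.

```python
import numpy as np

def rope_rotate(x, pos, base=10000):
    """Rotate vector x (even length d) by position-dependent angles.

    Each pair (x[2i], x[2i+1]) is rotated by angle pos * base^(-2i/d),
    following the RoPE formulation. Illustrative sketch only.
    """
    d = x.shape[-1]
    assert d % 2 == 0
    i = np.arange(d // 2)
    theta = pos * base ** (-2 * i / d)      # one rotation angle per pair
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[0::2], x[1::2]               # split into coordinate pairs
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin         # standard 2-D rotation
    out[1::2] = x1 * sin + x2 * cos
    return out

# Dot products of rotated vectors depend only on the position offset:
q = np.random.default_rng(0).standard_normal(8)
k = np.random.default_rng(1).standard_normal(8)
s1 = rope_rotate(q, pos=5) @ rope_rotate(k, pos=3)   # offset 2
s2 = rope_rotate(q, pos=9) @ rope_rotate(k, pos=7)   # offset 2 again
print(np.allclose(s1, s2))  # → True
```

That `True` is the whole point: shifting both positions by the same amount leaves the score unchanged, so the model encodes relative, not absolute, position.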

Euler's Formula Proof

Understanding Euler’s formula is fundamental to understanding the implementation of Rotary Positional Embedding (RoPE) in models like LLaMA. Below is the proof of Euler’s formula, which this blog refers back to again and again. Euler’s formula, often expressed as $$ e^{ix} = \cos(x) + i\sin(x) $$ is a fundamental result in complex analysis that connects the trigonometric functions with the exponential function. It was discovered by Leonhard Euler in the 18th century. Here’s a proof using Taylor series expansions:
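A sketch of that standard Taylor-series argument, included here for completeness: expand $e^{ix}$ and use $i^2 = -1$,

$$
e^{ix} = \sum_{n=0}^{\infty} \frac{(ix)^n}{n!} = 1 + ix - \frac{x^2}{2!} - i\frac{x^3}{3!} + \frac{x^4}{4!} + i\frac{x^5}{5!} - \cdots
$$

then group the real and imaginary terms, which are exactly the Taylor series of cosine and sine:

$$
e^{ix} = \left(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots\right) + i\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\right) = \cos(x) + i\sin(x)
$$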

AI in a Flash: AI and the Matrix

Hello, Neo here. Ha! Got you there, not that matrix. The matrix I am talking about is this one: $$ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} $$ Boring? But this is the fundamental object that connects you to the “neural network,” which is, surprisingly, the building block of artificial intelligence and the like. I will show you how in the section below. But hey, hold on there! Let me tell you the important thing first.
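To make the matrix-to-neural-network connection concrete before the full explanation: a single neural-network layer is just an $m \times n$ matrix like $A$ above acting on an input vector. A minimal sketch (the shapes and names here are illustrative assumptions, not from any particular framework):

```python
import numpy as np

# One "linear layer" of a neural network: y = A @ x + b,
# i.e. the m-by-n matrix A maps n inputs to m outputs.
rng = np.random.default_rng(42)
A = rng.standard_normal((3, 4))   # m = 3 outputs, n = 4 inputs
x = rng.standard_normal(4)        # input vector
b = np.zeros(3)                   # bias vector

y = A @ x + b                     # matrix-vector product: the core of a NN
print(y.shape)  # → (3,)
```

Stack a few of these (with a nonlinearity in between) and you have a neural network; that is the connection the next section spells out.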