Rotating and Stretching Data Spaces
A single 2-by-2 matrix can rotate the embedding of the phrase "machine learning" by 90 degrees and simultaneously stretch it to three times its distance from the origin, instantly moving it closer in vector space to wiki entries like "neural network" and "deep learning".
Problem
You have built a small wiki inside an LLM where each concept such as "machine learning" lives as an embedding (a vector). You want to move this embedding toward related ideas without manually editing every number. A learned weight matrix should do the heavy lifting, acting like a gentle push that rotates and stretches the idea in the right direction so it sits near its neighbors. This is the exact job linear transformations perform inside every Transformer layer.
Concept
Think of a vector as an arrow on a city map. A geometric transformation is like applying a rule that tells every arrow where to point next. One common tool is a rotation matrix. It spins every arrow by a fixed angle without changing its length. Another tool is a scaling matrix. It stretches or shrinks arrows along each direction, like stretching a rubber sheet in one way but not another.
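The two tools above can be written down directly. Here is a minimal sketch (the function names rotationMatrix and scalingMatrix are illustrative): for an angle theta, the rotation matrix has cosines on the diagonal and sines off it, while a scaling matrix is diagonal with one stretch factor per axis.

```typescript
// Build a 2x2 matrix that rotates every vector counterclockwise by theta radians.
function rotationMatrix(theta: number): number[][] {
  return [
    [Math.cos(theta), -Math.sin(theta)],
    [Math.sin(theta),  Math.cos(theta)],
  ];
}

// Build a 2x2 matrix that stretches the x-axis by sx and the y-axis by sy.
function scalingMatrix(sx: number, sy: number): number[][] {
  return [
    [sx, 0],
    [0, sy],
  ];
}

// A quarter turn: rotationMatrix(Math.PI / 2) is (up to float rounding)
// [[0, -1], [1, 0]], the matrix used throughout this lesson.
console.log(rotationMatrix(Math.PI / 2));
console.log(scalingMatrix(2, 2)); // uniform stretch: [[2, 0], [0, 2]]
```

When sx equals sy the rubber sheet stretches the same amount in every direction; when they differ, one axis stretches more than the other.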
Multiplying a matrix by a vector is called matrix-vector multiplication, and the result is the new position of the arrow after the transformation. The determinant of the matrix tells you how much the whole map has been stretched or squashed overall. A determinant of 2 means every area became twice as large. A negative determinant means the map was also flipped over.
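For a 2-by-2 matrix [[a, b], [c, d]] the determinant is simply a·d − b·c, so it is easy to check these claims in code (a sketch; the helper name det2x2 is our own):

```typescript
// Determinant of a 2x2 matrix [[a, b], [c, d]] is a*d - b*c.
function det2x2(m: number[][]): number {
  return m[0][0] * m[1][1] - m[0][1] * m[1][0];
}

const scaleBy2 = [[2, 0], [0, 2]];   // uniform scaling by 2
const rotate90 = [[0, -1], [1, 0]];  // 90-degree counterclockwise rotation
const flip     = [[0, 1], [1, 0]];   // reflection across the line y = x

console.log(det2x2(scaleBy2)); // 4: lengths double, so areas become 4x larger
console.log(det2x2(rotate90)); // 1: rotation preserves area
console.log(det2x2(flip));     // -1: area preserved, but orientation flipped
```

Note that uniform scaling by 2 gives a determinant of 4, not 2: the determinant measures area change, and doubling both axes quadruples every area.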
All of these together let a single weight matrix act as a learned nudge that moves embeddings of related wiki entries closer together in the hidden space.
This live simulation shows exactly how a rotation matrix and scaling matrix combine into one geometric transformation. Drag the sliders and watch the input vector (our "machine learning" embedding) swing around while the length changes. The determinant updates instantly so you can see whether the space is being enlarged or shrunk.
Minimal working example
Below is a tiny TypeScript function that applies a transformation to a vector. Every line is commented.
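A minimal sketch matching the breakdown that follows (the names matrixVectorMultiply, rotationMatrix, and scalingMatrix are the ones the breakdown refers to):

```typescript
// A 2D vector and a 2x2 matrix, stored as plain arrays.
type Vec2 = [number, number];
type Mat2 = [[number, number], [number, number]];

// Multiply a 2x2 matrix by a 2D vector: each output component is
// the dot product of one matrix row with the input vector.
function matrixVectorMultiply(m: Mat2, v: Vec2): Vec2 {
  return [
    m[0][0] * v[0] + m[0][1] * v[1], // first row  · vector
    m[1][0] * v[0] + m[1][1] * v[1], // second row · vector
  ];
}

// Our "machine learning" embedding starts at (3, 1).
const embedding: Vec2 = [3, 1];

// Rotate 90 degrees counterclockwise: the columns are the rotated basis vectors.
const rotationMatrix: Mat2 = [[0, -1], [1, 0]];

// Stretch every direction by a factor of 2.
const scalingMatrix: Mat2 = [[2, 0], [0, 2]];

// Apply the rotation first, then the scaling.
const rotated = matrixVectorMultiply(rotationMatrix, embedding); // [-1, 3]
const result  = matrixVectorMultiply(scalingMatrix, rotated);    // [-2, 6]

console.log(result); // [-2, 6]
```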
Example breakdown
We start with an embedding that lives at coordinates (3, 1). This is a concrete vector.
The rotationMatrix turns every vector 90 degrees counterclockwise. Its columns are the new basis vectors after rotation. Because we have already covered linear transformations, we know this matrix represents a pure rotation.
The scalingMatrix stretches every direction equally. Multiplying by 2 doubles every length (and therefore quadruples every area).
matrixVectorMultiply does the actual work. For each row of the weight matrix it computes the weighted sum of the input components. This is exactly how a neural network layer transforms an embedding.
We apply the rotation first, then the scaling. Because our scaling here is uniform (the same factor 2 in every direction), swapping the order would happen to land in the same place; with a non-uniform scaling, though, the order changes the final position, because matrix multiplication is not commutative in general.
The final numbers [-2, 6] tell us the "machine learning" idea has now been moved to a place twice as far from the origin and rotated toward the positive y-axis, closer to the embeddings of related wiki pages.
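To make the ordering point concrete, here is a sketch that swaps in a non-uniform stretch (the hypothetical stretchX matrix stretches only the x-axis), where rotating-then-stretching and stretching-then-rotating land in visibly different places:

```typescript
type Vec2 = [number, number];
type Mat2 = [[number, number], [number, number]];

function matrixVectorMultiply(m: Mat2, v: Vec2): Vec2 {
  return [
    m[0][0] * v[0] + m[0][1] * v[1],
    m[1][0] * v[0] + m[1][1] * v[1],
  ];
}

const embedding: Vec2 = [3, 1];
const rotationMatrix: Mat2 = [[0, -1], [1, 0]]; // 90 degrees counterclockwise
const stretchX: Mat2 = [[2, 0], [0, 1]];        // stretch x only (non-uniform)

// Rotate first, then stretch the x-axis:
const a = matrixVectorMultiply(stretchX, matrixVectorMultiply(rotationMatrix, embedding));
// Stretch the x-axis first, then rotate:
const b = matrixVectorMultiply(rotationMatrix, matrixVectorMultiply(stretchX, embedding));

console.log(a); // [-2, 3]
console.log(b); // [-1, 6] — a different point: the order matters
```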
This step-through lets you cycle through four different matrices while keeping the same input embedding fixed. Watch how a pure rotation matrix, a pure scaling matrix, a combined transformation, and a realistic learned weight matrix each move the vector to a completely different place. The numbers update at every click so you can connect the algebra to the geometry.
Extended example
Now we make the function more realistic for a wiki. We combine a rotation, scaling, and a small learned nudge that mimics a weight matrix inside an attention layer.
The learnedTransform matrix was not hand-crafted. In a real system it would be updated during training so that embeddings of related concepts naturally drift toward one another. This is the geometric heart of how Transformers organize knowledge.
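A sketch of what this extended version can look like. The learnedTransform values below are invented for illustration: a near-identity matrix with small off-diagonal terms, the kind of gentle nudge training might produce. We also compose all three matrices into a single matrix first, since in a real layer one weight matrix does all the work at once.

```typescript
type Vec2 = [number, number];
type Mat2 = [[number, number], [number, number]];

function matrixVectorMultiply(m: Mat2, v: Vec2): Vec2 {
  return [
    m[0][0] * v[0] + m[0][1] * v[1],
    m[1][0] * v[0] + m[1][1] * v[1],
  ];
}

// Multiply two 2x2 matrices so several transformations compose into one.
function matrixMultiply(a: Mat2, b: Mat2): Mat2 {
  return [
    [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
    [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
  ];
}

const rotationMatrix: Mat2 = [[0, -1], [1, 0]]; // 90 degrees counterclockwise
const scalingMatrix: Mat2 = [[2, 0], [0, 2]];   // uniform stretch by 2

// Stand-in for a trained weight matrix; these numbers are made up.
const learnedTransform: Mat2 = [[1.05, 0.10], [-0.08, 0.97]];

// Compose right to left: rotate, then scale, then apply the learned nudge.
const combined = matrixMultiply(learnedTransform, matrixMultiply(scalingMatrix, rotationMatrix));

const embedding: Vec2 = [3, 1];
const transformed = matrixVectorMultiply(combined, embedding);
console.log(transformed); // approximately [-1.5, 5.98]
```

Folding the three steps into one combined matrix is exactly what a trained layer stores: the rotation, the stretch, and the nudge are no longer separate, they are baked into a single grid of weights.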
Common mistakes
Beginners often forget that matrix-vector multiplication is not ordinary multiplication. Writing matrix * vector with the * operator in JavaScript does not throw an error; it silently coerces both arrays and produces NaN. The fix is to keep the matrixVectorMultiply helper from the minimal example.
Another frequent error is mixing row-major and column-major order. If your rotation looks wrong, check whether the matrix rows or columns represent the new basis vectors. Printing the determinant quickly reveals the mistake: a value near zero means the matrix has collapsed the space into a line and you will lose information.
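A determinant check along these lines takes only a few lines (a sketch; det2x2 and the tolerance are our own choices):

```typescript
// Determinant of a 2x2 matrix [[a, b], [c, d]] is a*d - b*c.
function det2x2(m: number[][]): number {
  return m[0][0] * m[1][1] - m[0][1] * m[1][0];
}

// A rank-deficient matrix: the second row is twice the first, so every
// input vector lands on the same line through the origin.
const collapsed = [[1, 2], [2, 4]];

const d = det2x2(collapsed);
if (Math.abs(d) < 1e-6) {
  console.log("Warning: determinant near 0 — this matrix collapses 2D space onto a line");
}
```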
Finally, people sometimes apply the scaling before the rotation when the model expects the reverse order. The quickest way to spot this is to run the live simulation above with the same numbers and compare where the arrow ends up.
These small bugs become visible immediately on a 2D canvas, which is why geometric intuition helps debug models before they ever touch a GPU.
Real-World Application
Inside a vector database that powers your self-evolving knowledge LLM, each new wiki article arrives as a fresh embedding. A single learned weight matrix can rotate and stretch the entire set of embeddings so that "machine learning" naturally sits between "neural network" and "gradient descent" without any extra code. The determinant of that matrix tells you whether the transformation is preserving distances (determinant near 1) or deliberately expanding certain directions to emphasize rare concepts. When you train the next layer, these geometric transformations compound, letting the model discover higher-level relationships exactly as Andrej Karpathy describes in his recent explorations of self-organizing knowledge systems. You can now experiment with these ideas in a few lines of TypeScript before you ever touch a full Transformer.
This lesson gives you the concrete visual and coding foundation you need before the next lesson on detecting features via basis vectors, where we will see how a matrix can automatically discover which directions in embedding space carry the most meaning.