A document containing only the word 'hello' and a massive encyclopedia entry on 'Greetings' can be mathematically identical in meaning, even though the encyclopedia contains thousands more words. In the world of LLMs, we often want to know whether two things are about the same topic, regardless of how long-winded they are. If we use the raw Dot Product we learned previously, the encyclopedia would appear much 'larger' simply because it has higher counts for its features. To fix this, we must strip away the 'size' of the vector and look only at its direction.
The Problem of Magnitude
When we represent text as a vector (an array of numbers), each number usually corresponds to a coordinate axis representing a feature. If a word appears once, the value on that axis might be 1. If it appears 100 times, it is 100. Geometrically, this makes the vector point in the same semantic direction, but it stretches it very far away from the center. This 'length' is called the L2 Norm, or Magnitude. If we don't normalize these lengths, the Dot Product will give gargantuan scores to long documents and tiny scores to short ones, failing to capture their shared meaning.
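To make the imbalance concrete, here is a small sketch; the two-feature count vectors and the dotProductRaw helper are invented for illustration, standing in for the dot product from the previous lesson:

```typescript
// Hypothetical 2-feature count vectors (e.g. counts of 'hello' and 'greeting')
const shortDoc = [1, 1];      // the one-line document
const longDoc = [100, 100];   // the encyclopedia entry: same topic, 100x the counts

// A raw dot product, with no normalization
function dotProductRaw(a: number[], b: number[]): number {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

// The long document dominates purely because of its magnitude:
console.log(dotProductRaw(shortDoc, shortDoc)); // 2
console.log(dotProductRaw(shortDoc, longDoc));  // 200
```

Both documents point in exactly the same direction, yet the raw score is 100 times larger for the longer one.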
Concept: The L2 Norm (Magnitude)
Think of the L2 Norm as the 'as-the-crow-flies' distance from the start of the vector (the origin) to its tip. In 2D, this is just the Pythagorean theorem: square both numbers, add them, and take the square root. In higher-dimensional space, the formula stays the same: you square every component, sum them up, and then take the square root of that total. This single number tells you how much 'mass' or 'energy' the vector has.
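The recipe above translates directly into code; this is a minimal sketch, and the name l2Norm is ours:

```typescript
// L2 Norm: square every component, sum them, take the square root
function l2Norm(vector: number[]): number {
  const sumOfSquares = vector.reduce((sum, val) => sum + val * val, 0);
  return Math.sqrt(sumOfSquares);
}

// The Pythagorean 3-4-5 triangle in vector form:
console.log(l2Norm([3, 4])); // 5
```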
Implementation: Normalizing to Unit Vectors
To compare documents fairly, we perform Normalization. This means we take our original vector and divide every number inside it by the L2 Norm we just calculated. The result is a Unit Vector: a vector that still points in exactly the same direction but has a magnitude of exactly 1.0. When two vectors are both unit vectors, their Dot Product is called Cosine Similarity. This value will always be between -1 and 1, where 1 means they are perfectly aligned (identical direction) and 0 means they are orthogonal (completely unrelated).
```typescript
/**
 * Normalizes a vector into a Unit Vector (magnitude of 1).
 */
function normalize(vector: number[]): number[] {
  // 1. Calculate the L2 Norm (Magnitude)
  const magnitude = Math.sqrt(
    vector.reduce((sum, val) => sum + val * val, 0)
  );

  // 2. Avoid division by zero for empty/zero vectors
  if (magnitude === 0) return vector.map(() => 0);

  // 3. Divide every component by the magnitude
  // This shrinks/stretches the vector to reach the 'unit circle'
  return vector.map(val => val / magnitude);
}

// Example: 2D vector for 'Greeting' embedding
const docA = [3, 4]; // Magnitude is 5
const normalizedA = normalize(docA); // [0.6, 0.8]
```
Example Breakdown
In the TypeScript code above, the first step uses reduce to implement the sum of squares, Σvᵢ². We then apply Math.sqrt to find the total distance. The second step is a safety check: if a vector is all zeros (meaning it has no features), we can't divide by its length. Finally, we map over the array and scale every element down. The vector [0.6, 0.8] is our Unit Vector. If you calculate its magnitude, √(0.6² + 0.8²) = √(0.36 + 0.64), you will find it equals exactly 1.
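You can confirm that claim with a couple of lines:

```typescript
// Verify the breakdown above: the Unit Vector's magnitude is 1
const unit = [0.6, 0.8];
const magnitude = Math.sqrt(unit.reduce((sum, val) => sum + val * val, 0));
console.log(magnitude); // 1 (up to floating-point rounding)
```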
Cosine Similarity in Action
When we use the ai-sdk to fetch embeddings, we often get vectors that are already normalized. This is efficient because to compare two normalized vectors, we just use the dot product. This mathematical shortcut—Normalize then Dot Product—yields the Cosine Similarity. Since the length is fixed at 1, the dot product only responds to how much the two vectors point in the same direction.
```typescript
import { dotProduct } from './math-utils'; // Assume logic from previous lesson

function cosineSimilarity(vecA: number[], vecB: number[]): number {
  const normA = normalize(vecA);
  const normB = normalize(vecB);
  // After normalization, dot product = cosine similarity
  return dotProduct(normA, normB);
}

// A 'hello' vector and a large 'Greeting' vector
const hello = [1, 0.1];
const longGreeting = [100, 10];

// Even though longGreeting has 100x the magnitude,
// the similarity will be ~1.0 because their direction is identical.
const score = cosineSimilarity(hello, longGreeting);
console.log(`Similarity: ${score}`); // ~1.0 (up to floating-point rounding)
```
Common Mistakes
Normalizing Twice: If you normalize a unit vector, it stays exactly the same. While harmless, it's a waste of compute in large-scale LLM applications. Check if your embedding provider (like OpenAI or HuggingFace) already returns normalized vectors.
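If you'd rather detect this than rely on the operation being harmless, a cheap guard is to test whether the sum of squares is already 1; the isNormalized helper and its tolerance are illustrative, not any provider's API:

```typescript
// A cheap guard: skip normalization when the vector is already unit-length
function isNormalized(vector: number[], tolerance = 1e-6): boolean {
  const sumOfSquares = vector.reduce((sum, val) => sum + val * val, 0);
  return Math.abs(sumOfSquares - 1) < tolerance;
}

console.log(isNormalized([0.6, 0.8])); // true
console.log(isNormalized([3, 4]));     // false
```

Note the tolerance: floating-point arithmetic means a normalized vector's sum of squares is rarely bit-for-bit equal to 1.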
Confusing Euclidean Distance with Cosine Similarity: Euclidean distance looks at the distance between the tips of the vectors. If one document is much longer than the other, they will be "far apart" in space even if they mean the same thing. Always use Cosine Similarity (via normalization) when text length shouldn't matter.
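A quick sketch makes the contrast visible; euclideanDistance implements the standard tip-to-tip formula, and the two vectors reuse the 'hello' example from earlier:

```typescript
// Euclidean distance: the length of the difference between the two tips
function euclideanDistance(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, val, i) => sum + (val - b[i]) ** 2, 0));
}

// Same direction, very different lengths:
const shortVec = [1, 0.1];
const longVec = [100, 10];

console.log(euclideanDistance(shortVec, longVec)); // ~99.5: 'far apart' in space
// ...yet their cosine similarity is ~1.0, because the direction matches.
```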
Summing instead of Root-Sum-Square: Beginners often try to normalize by dividing by the plain sum of components (Σvᵢ). That is L1 normalization, a different operation that does not produce unit-length vectors. For vector geometry and similarity, you must always use the L2 Norm (square, sum, square-root).
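A sketch of what goes wrong; the function name deliberately flags the mistake:

```typescript
// The mistake: dividing by the plain sum of components (L1 normalization)
function l1NormalizeWrongForGeometry(vector: number[]): number[] {
  const sum = vector.reduce((s, val) => s + val, 0);
  return vector.map(val => val / sum);
}

const v = [3, 4];
const l1 = l1NormalizeWrongForGeometry(v); // [3/7, 4/7] ≈ [0.43, 0.57]

// Its geometric length is NOT 1, so dot products are no longer cosines:
const len = Math.sqrt(l1.reduce((s, val) => s + val * val, 0));
console.log(len); // ~0.714 (exactly 5/7), not 1
```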
Think about it
If you are building a search engine for a law firm and want to ensure that a 1-page summary and a 50-page legal brief are both returned when searching for 'contract liability', why is normalization strictly necessary?
Real-World Application
In Vector Databases (like Pinecone or Weaviate), billions of vectors are stored to help LLMs find relevant context. When you perform a search, the database doesn't just look for words; it calculates the Cosine Similarity between your query and its stored documents. By storing vectors as unit vectors, the database can perform millions of comparisons per second using optimized hardware (GPUs) that can compute Dot Products of length-1 vectors at incredible speeds.
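As a toy illustration of that pipeline, here is a minimal in-memory index that normalizes once at insert time so each query is just a batch of dot products; the names IndexedDoc and search are ours, not any real database's API, and normalizeVec/dot are repeated here to keep the sketch self-contained:

```typescript
// Toy in-memory 'vector index': vectors are normalized once, at insert time,
// so every search reduces to cheap dot products against unit vectors.
type IndexedDoc = { id: string; unitVector: number[] };

function normalizeVec(vector: number[]): number[] {
  const magnitude = Math.sqrt(vector.reduce((s, v) => s + v * v, 0));
  return magnitude === 0 ? vector.map(() => 0) : vector.map(v => v / magnitude);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((s, v, i) => s + v * b[i], 0);
}

function search(index: IndexedDoc[], query: number[], topK: number): IndexedDoc[] {
  const unitQuery = normalizeVec(query);
  return [...index]
    .sort((a, b) => dot(b.unitVector, unitQuery) - dot(a.unitVector, unitQuery))
    .slice(0, topK);
}

// Both the 1-page summary and the 50-page brief match a topical query:
const index: IndexedDoc[] = [
  { id: 'summary', unitVector: normalizeVec([2, 1]) },
  { id: 'brief', unitVector: normalizeVec([200, 100]) },
  { id: 'recipe', unitVector: normalizeVec([0, 5]) },
];
console.log(search(index, [4, 2], 2).map(d => d.id)); // summary and brief, not recipe
```

Real systems replace the sort with approximate nearest-neighbor structures, but the core comparison is exactly this dot product of unit vectors.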