Two vectors can point in completely different directions while containing the exact same numbers. In a vector database for technical documentation, a user's search query might use the same words as a page in your manual, but if those words are arranged in a way that creates a different semantic direction, the two vectors are not truly 'near' each other. To find the right documentation, we don't just look for shared numbers; we look for how much two vectors point at the same thing.
The Problem
When you build a document search tool using the ai-sdk, you convert questions into vectors (arrays of numbers). However, a computer doesn't naturally know whether a vector representing "how to deploy" is similar to a vector representing "deployment guide." We need a mathematical operation that takes two arrays, treats each coordinate axis as a feature, and spits out a single number representing how well they overlap. Without it, your AI search is just a random guess.
The Concept: Alignment and Projection
Imagine two arrows starting from the same point. If they point in the same direction, they are perfectly aligned. If one arrow points North and the other points East, they are orthogonal (at a 90-degree angle), meaning they share zero common direction.
We measure this using geometric projection: if you shine a flashlight from above one vector, how much shadow does it cast onto the other? A long shadow means high similarity; no shadow means the vectors are unrelated. The Dot Product is the numerical tool that calculates this "shadow length." It multiplies matching components and sums the results, collapsing two lists of numbers into one single value (a scalar). This value tells us how much the vectors are working together.
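The shadow intuition corresponds to the standard definition of the dot product (standard linear-algebra notation, not taken from the text above):

$$\vec{a} \cdot \vec{b} \;=\; \sum_{i=1}^{n} a_i b_i \;=\; \|\vec{a}\|\,\|\vec{b}\|\cos\theta$$

Here θ is the angle between the two vectors: ‖a‖ cos θ is the length of the shadow that a casts onto b, and the dot product is that shadow scaled by ‖b‖. When θ = 90°, cos θ = 0 and the shadow vanishes.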
Minimal Working Example
This function calculates the dot product for any two vectors of the same length, which is the core logic inside vector databases.
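Here is a minimal sketch of such a function. The name `dotProduct` and the example vectors are chosen for illustration:

```typescript
// Multiply matching components and sum the results into one scalar.
function dotProduct(vecA: number[], vecB: number[]): number {
  if (vecA.length !== vecB.length) {
    throw new Error("Vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < vecA.length; i++) {
    sum += vecA[i] * vecB[i]; // scale feature i of A by feature i of B
  }
  return sum;
}

console.log(dotProduct([1, 2, 3], [4, 5, 6])); // 1*4 + 2*5 + 3*6 = 32
```

The length check up front matters: as noted under Common Mistakes below, mismatched dimensions make the operation meaningless.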
Example Breakdown
The logic works by multiplying matching components. If the i-th feature in both vectors is large (e.g., both vectors mention "deployment"), the product vecA[i] * vecB[i] will be large. If one is large and the other is zero, the product is zero.
By summing every pair, we get a global score of alignment. If every pair produces a positive product, the sum grows quickly. If corresponding components have opposite signs (one positive, one negative), their products are negative and the sum shrinks, signaling a lack of similarity.
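A toy example makes the sign behavior concrete. The three "features" and all values here are made up for illustration:

```typescript
// Sum of component-wise products.
const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, ai, i) => sum + ai * b[i], 0);

// Hypothetical 3-feature vectors: [deployment, pricing, fonts]
const query = [9, 1, 0];  // strongly about "deployment"
const docA  = [8, 2, 0];  // also about "deployment" -> products all positive
const docB  = [-7, 3, 0]; // opposite sign on the first feature

console.log(dot(query, docA)); // 72 + 2 + 0 = 74 (aligned, sum grows)
console.log(dot(query, docB)); // -63 + 3 + 0 = -60 (opposed, sum goes negative)
```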
Extended Example: Real Search Logic
In a real app using the ai-sdk, you receive large vectors from an embedding model. You would iterate through your database to find the highest dot product.
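A sketch of that search loop is below. It assumes the embeddings have already been computed and stored as plain number arrays; the `Doc` shape, `findBestMatch` name, and toy 3-dimensional embeddings are all hypothetical:

```typescript
interface Doc {
  title: string;
  embedding: number[];
}

const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, ai, i) => sum + ai * b[i], 0);

// Scan every stored document and keep the one with the highest dot product.
function findBestMatch(queryEmbedding: number[], docs: Doc[]): Doc {
  let best = docs[0];
  let bestScore = -Infinity;
  for (const doc of docs) {
    const score = dot(queryEmbedding, doc.embedding);
    if (score > bestScore) {
      bestScore = score;
      best = doc;
    }
  }
  return best;
}

// Toy database with made-up embeddings.
const docs: Doc[] = [
  { title: "Deployment Guide", embedding: [9, 1, 0] },
  { title: "Pricing FAQ",      embedding: [1, 8, 2] },
];
console.log(findBestMatch([8, 2, 0], docs).title); // "Deployment Guide"
```

Real embedding models return vectors with hundreds or thousands of dimensions, and production vector databases use indexing structures instead of a linear scan, but the scoring at the core is this same loop.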
Common Mistakes
- Length Mismatch: Trying to compute the dot product of a 3D vector and a 2D vector. This is like trying to compare a color to a sound; the dimensions (features) must match exactly.
- Confusing Scale: A large dot product doesn't always mean similarity if the vectors themselves are huge. If one vector is very "long" (large values), it creates a huge dot product even if the direction is slightly off.
- Orthogonality Blindness: Beginners often assume a score of 0 means "bad." In reality, it means Orthogonality—the two concepts are completely independent, like "price" and "font color."
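The scale pitfall in particular is easy to demonstrate. In this sketch (values made up), a long vector pointing 45 degrees away out-scores a perfectly aligned unit vector:

```typescript
const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, ai, i) => sum + ai * b[i], 0);

const query      = [1, 0];
const alignedDoc = [1, 0];   // exact same direction, small magnitude
const hugeDoc    = [50, 50]; // 45 degrees off, but very "long"

console.log(dot(query, alignedDoc)); // 1
console.log(dot(query, hugeDoc));    // 50 -- wins on magnitude, not direction
```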
Real-World Application
In recommendation systems like Netflix, the dot product is used to compare a "User Feature Vector" (how much you like Comedy, Action, Horror) against a "Movie Feature Vector." If you love Comedy (high value in index 0) and the movie is a Comedy (high value in index 0), the dot product explodes upward, and the movie appears on your homepage simply because this scalar sum is the highest in the database.
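In miniature (with entirely made-up taste profiles), that comparison looks like this:

```typescript
const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, ai, i) => sum + ai * b[i], 0);

// Hypothetical feature order: [comedy, action, horror]
const user       = [9, 2, 1]; // loves comedy, tolerates action
const comedyFilm = [8, 1, 0];
const horrorFilm = [0, 1, 9];

console.log(dot(user, comedyFilm)); // 72 + 2 + 0 = 74 -> surfaces on homepage
console.log(dot(user, horrorFilm)); // 0 + 2 + 9 = 11 -> buried
```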
While the dot product is powerful, it is sensitive to the total length of the vectors. In our next lesson, we will explore Cosine Similarity, which solves this by looking strictly at the angle between vectors, ignoring how large or small the numbers are.