Building a Matrix from Scratch

A single matrix multiplication can turn a list of 512 numbers representing a Wikipedia article into a new list of 512 numbers in which, given trained weights, the article sits closer to similar articles.

Problem

Imagine you are building a self-evolving knowledge wiki powered by a small language model. Every time a user adds or edits an article, you convert its text into a numerical list called an embedding. This embedding must then pass through the first layer of your transformer to become a hidden state — a richer representation that captures meaning. Without a mathematical engine to perform this conversion reliably, your wiki cannot connect related ideas or surface relevant pages. That engine is a weight matrix, and we will build one from scratch in TypeScript.

Concept

Think of a vector the way you think of a shopping list: a simple ordered collection of numbers. Each number is a feature. An embedding is a special kind of vector that encodes the meaning of a token or a short piece of text. In our wiki, the word "transformer" might become the vector [0.12, -0.45, 0.78, ...] with 512 numbers.

A matrix is simply a grid of numbers arranged in rows and columns. When we use a matrix inside a neural network we call it a weight matrix because each number (weight) decides how strongly one feature from the input should influence a feature in the output.
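To see what a single weight does, here is a minimal sketch using a 3×3 matrix that is all zeros except for the entry in row 1, column 2. The names matVec, singleWeight, features, and influenced are hypothetical, chosen only for this illustration. Only output feature 1 responds, and it receives 0.5 times input feature 2.

```typescript
// Multiply matrix W (rows = output features, columns = input features) by vector x.
function matVec(W: number[][], x: number[]): number[] {
  return W.map((row) => row.reduce((sum, w, j) => sum + w * x[j], 0));
}

// All weights are zero except W[1][2]: "input feature 2 influences output feature 1".
const singleWeight: number[][] = [
  [0, 0, 0],
  [0, 0, 0.5],
  [0, 0, 0],
];

const features: number[] = [1, 2, 4];
const influenced = matVec(singleWeight, features);
console.log(influenced); // influenced is [0, 2, 0]
```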

A linear transformation is what happens when you multiply this weight matrix by your input vector. Geometrically, it can stretch, rotate, or shear the space of possible embeddings (and, when the matrix is not square, map it into a space with more or fewer dimensions); with trained weights, this tends to move similar wiki articles closer together in the new space. Algebraically, each number in the output is a weighted sum of all the input numbers, using one row of the weight matrix, and together these sums form a new vector: our first hidden state.
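What makes the transformation linear is that it respects addition and scaling: transforming a blend of two embeddings gives the same blend of their transformed versions. The sketch below checks this numerically. The helpers matVec and blend, the example vectors, and the 1e-12 tolerance are assumptions made for illustration.

```typescript
// Matrix-vector product: output[i] = sum over j of W[i][j] * x[j].
function matVec(W: number[][], x: number[]): number[] {
  return W.map((row) => row.reduce((sum, w, j) => sum + w * x[j], 0));
}

// Element-wise a*x + b*y for two vectors of equal length.
function blend(a: number, x: number[], b: number, y: number[]): number[] {
  return x.map((xi, k) => a * xi + b * y[k]);
}

const W: number[][] = [
  [0.5, 0.2],
  [-0.1, 0.8],
];
const articleA: number[] = [3, 4];
const articleB: number[] = [-1, 2];

// Transform the blend, versus blend the transforms.
const left = matVec(W, blend(2, articleA, 0.5, articleB));
const right = blend(2, matVec(W, articleA), 0.5, matVec(W, articleB));

// The two agree up to floating-point rounding.
const isLinear = left.every((v, k) => Math.abs(v - right[k]) < 1e-12);
console.log(isLinear); // true
```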

Minimal Working Example

Here is the tiniest piece of runnable TypeScript that turns a token embedding into the first hidden state of our wiki transformer layer. Each step is commented.

```typescript
// 1. Define a small embedding that represents the token "wiki"
const embedding: number[] = [0.2, -0.5, 0.8, 0.1]; // length = 4 for demonstration

// 2. Build a weight matrix that will transform the embedding
// Each row corresponds to one output feature of the hidden state
const weightMatrix: number[][] = [
  [0.1, 0.3, -0.2, 0.4],   // weights for hidden neuron 0
  [-0.4, 0.2, 0.5, -0.1],  // weights for hidden neuron 1
  [0.6, -0.3, 0.1, 0.7],   // weights for hidden neuron 2
  [-0.1, 0.4, -0.5, 0.2],  // weights for hidden neuron 3
];

// 3. Perform the linear transformation: hidden = weightMatrix × embedding
const hiddenState: number[] = new Array(4).fill(0);
for (let i = 0; i < 4; i++) {        // loop over each output neuron
  for (let j = 0; j < 4; j++) {      // loop over each input feature
    hiddenState[i] += weightMatrix[i][j] * embedding[j]; // multiply & accumulate
  }
}

console.log("Hidden state after first layer:", hiddenState);
// Output is approximately [-0.25, 0.21, 0.42, -0.6] (printed digits may show floating-point rounding)
```

Example Breakdown

The embedding is our input vector — four numbers that came from an earlier tokenizer and embedding lookup. We treat it as a column even though we wrote it horizontally in code.

The weight matrix has exactly as many rows as we want output features (here 4) and exactly as many columns as the embedding length (also 4). Each cell is a learned weight; in a real model these numbers come from training, but we made them up for now.

The two nested for loops implement matrix-vector multiplication. The outer loop picks which hidden-state slot we are computing. The inner loop walks across one row of the matrix, multiplies each weight by the matching element of the embedding, and adds the results. This is the linear transformation.
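Written as a formula, the nested loops compute the sum below. The symbols W, x, h, m (number of rows, i.e. output features) and n (number of columns, i.e. input features) are introduced here for illustration, with indices starting at 0 to match the code. The last line plugs in row 0 of the minimal example.

```latex
h = W x, \qquad W \in \mathbb{R}^{m \times n},\; x \in \mathbb{R}^{n},\; h \in \mathbb{R}^{m}

h_i = \sum_{j=0}^{n-1} W_{ij}\, x_j, \qquad i = 0, \dots, m-1

h_0 = (0.1)(0.2) + (0.3)(-0.5) + (-0.2)(0.8) + (0.4)(0.1)
    = 0.02 - 0.15 - 0.16 + 0.04 = -0.25
```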

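Running the code produces a new vector with one entry per row of the weight matrix; here it has the same length as the input only because the matrix is square. This is the first hidden state that later transformer layers (attention, feed-forward) will refine further.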
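The general rule is easier to see with a non-square matrix. This sketch uses a hypothetical 2×4 matrix, so the hidden state has two entries: four input features are compressed into two. The weights and the names project, compress, input4, and hidden2 are made up for illustration.

```typescript
// Multiply a (rows × cols) matrix by a vector of length cols; the result has length rows.
function project(W: number[][], x: number[]): number[] {
  const output: number[] = new Array(W.length).fill(0);
  for (let i = 0; i < W.length; i++) {
    for (let j = 0; j < x.length; j++) {
      output[i] += W[i][j] * x[j];
    }
  }
  return output;
}

// 2 rows (output features) × 4 columns (input features)
const compress: number[][] = [
  [1, 0, 1, 0],  // hidden feature 0 = input 0 + input 2
  [0, 1, 0, -1], // hidden feature 1 = input 1 - input 3
];

const input4: number[] = [0.5, 2, 1.5, 1];
const hidden2 = project(compress, input4);
console.log(hidden2); // hidden2 is [2, 1]
```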

Extended Example

Let us make the example closer to a real wiki transformer. We increase the embedding size to 8 and the hidden size to 8. The code stays almost identical; we only change the array lengths and add a helper function for clarity.

```typescript
// Helper that multiplies a matrix by a vector; reusable for any size
function linearTransform(weightMatrix: number[][], inputVector: number[]): number[] {
  const outputSize = weightMatrix.length; // number of rows = output features
  const inputSize = inputVector.length;   // must equal the number of columns
  const output: number[] = new Array(outputSize).fill(0);
  for (let i = 0; i < outputSize; i++) {
    for (let j = 0; j < inputSize; j++) {
      output[i] += weightMatrix[i][j] * inputVector[j];
    }
  }
  return output;
}

// A realistic embedding from our wiki tokenizer and embedding lookup for the phrase "self-evolving wiki"
const wikiEmbedding: number[] = [0.12, -0.34, 0.67, -0.09, 0.45, 0.22, -0.51, 0.38];

// A larger weight matrix (8×8 in the full layer); in a trained model these values would be learned.
// Only 3 of the 8 rows are shown, so this snippet as written computes 3 hidden features.
const layer1Weights: number[][] = [
  [0.05, 0.12, -0.23, 0.08, -0.15, 0.31, 0.09, -0.04],
  [-0.18, 0.27, 0.06, -0.33, 0.14, -0.07, 0.22, 0.11],
  // ... 5 more rows omitted for brevity but would be present in real code
  [0.09, -0.05, 0.19, 0.28, -0.12, 0.04, -0.21, 0.33],
];

// Compute the first hidden state for the wiki article
const firstHiddenState = linearTransform(layer1Weights, wikiEmbedding);
console.log("Wiki article first hidden state:", firstHiddenState);
```

The function linearTransform encapsulates the matrix-vector multiplication so that our wiki code remains readable even when we later add attention heads or layer normalization.
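As a design alternative, the same helper can be written with map and reduce, which reads as "one dot product per row". This is a sketch; the name linearTransformCompact is hypothetical. One behavioral difference: it loops over each row's length instead of the input's length, so when shapes disagree it produces NaN or silently drops values in the opposite cases from linearTransform. Shape checks are still worth adding either way.

```typescript
// Compact matrix-vector product: each row yields the dot product of that row with the input.
function linearTransformCompact(weightMatrix: number[][], inputVector: number[]): number[] {
  return weightMatrix.map((row) =>
    row.reduce((sum, weight, j) => sum + weight * inputVector[j], 0)
  );
}

// Same inputs as the minimal example earlier in the lesson
const demoEmbedding: number[] = [0.2, -0.5, 0.8, 0.1];
const demoWeights: number[][] = [
  [0.1, 0.3, -0.2, 0.4],
  [-0.4, 0.2, 0.5, -0.1],
  [0.6, -0.3, 0.1, 0.7],
  [-0.1, 0.4, -0.5, 0.2],
];

// Approximately [-0.25, 0.21, 0.42, -0.6], up to floating-point rounding
console.log(linearTransformCompact(demoWeights, demoEmbedding));
```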

Common Mistakes

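Mistake 1: Wrong dimensions. The number of columns in the weight matrix must equal the length of the embedding. If each row holds only 4 weights but your embedding has 8 numbers, the inner loop of linearTransform (which runs over the embedding's length) reads past the end of every row; weightMatrix[i][j] is undefined there, and multiplying undefined by a number produces NaN. Recognize it by seeing NaN in the console. (The minimal example's hard-coded loops would instead silently ignore the extra features, which is harder to spot.) Fix: make the number of columns in the weight matrix match the length of the embedding.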
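A quick way to reproduce the symptom: this sketch copies the loop structure of linearTransform into a hypothetical naiveTransform and feeds it a 2×2 matrix with a 3-number input.

```typescript
// Same loop structure as linearTransform: the inner loop runs over the input length.
function naiveTransform(weightMatrix: number[][], inputVector: number[]): number[] {
  const output: number[] = new Array(weightMatrix.length).fill(0);
  for (let i = 0; i < weightMatrix.length; i++) {
    for (let j = 0; j < inputVector.length; j++) {
      // When j >= row length, weightMatrix[i][j] is undefined and the product is NaN.
      output[i] += weightMatrix[i][j] * inputVector[j];
    }
  }
  return output;
}

const tooShortRows: number[][] = [
  [1, 2],
  [3, 4],
];
const threeFeatures: number[] = [1, 1, 1];

console.log(naiveTransform(tooShortRows, threeFeatures)); // both entries are NaN
```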

Mistake 2: Forgetting to initialize the output vector to zeros. new Array(outputSize) creates empty slots that read back as undefined, and undefined + number is NaN. Always create new Array(outputSize).fill(0) before the loops.
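A two-line sketch of the pitfall; the names unfilled and zeroed are hypothetical.

```typescript
const unfilled: number[] = new Array(3);       // three empty slots, read back as undefined
const zeroed: number[] = new Array(3).fill(0); // [0, 0, 0]

unfilled[0] += 5; // undefined + 5 -> NaN
zeroed[0] += 5;   // 0 + 5 -> 5

console.log(unfilled[0], zeroed[0]); // NaN 5
```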

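Mistake 3: Confusing row vs column vectors. In the wiki code we treat the embedding as a column vector on the right of the matrix (matrix × vector), so the matrix is shaped outputs × inputs. Some libraries and papers use the row-vector convention instead (vector × matrix), which stores the weight matrix transposed, shaped inputs × outputs. Whichever convention a library uses, keep the loop order output[i] += matrix[i][j] * input[j] for a matrix whose rows are output features.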
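To see that the two conventions agree, this sketch computes the row-vector form (embedding × transposed matrix) and compares it with the column-vector form (matrix × embedding). The helpers matVec, vecMat, and transpose are hypothetical, written for this illustration rather than taken from any library.

```typescript
// Column-vector convention: output[i] = sum_j W[i][j] * x[j]
function matVec(W: number[][], x: number[]): number[] {
  return W.map((row) => row.reduce((sum, w, j) => sum + w * x[j], 0));
}

// Row-vector convention: output[j] = sum_i x[i] * M[i][j]
function vecMat(x: number[], M: number[][]): number[] {
  const cols = M[0].length;
  const output: number[] = new Array(cols).fill(0);
  for (let i = 0; i < x.length; i++) {
    for (let j = 0; j < cols; j++) {
      output[j] += x[i] * M[i][j];
    }
  }
  return output;
}

// Swap rows and columns.
function transpose(W: number[][]): number[][] {
  return W[0].map((_, j) => W.map((row) => row[j]));
}

const Wdemo: number[][] = [
  [0.5, 0.2],
  [-0.1, 0.8],
];
const sampleEmbedding: number[] = [3, 4];

const columnStyle = matVec(Wdemo, sampleEmbedding);          // W × x
const rowStyle = vecMat(sampleEmbedding, transpose(Wdemo));  // x × transpose(W)
console.log(columnStyle, rowStyle); // both approximately [2.3, 2.9]
```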

Avoid these by writing a small test that prints the shapes before multiplying and by using the helper function we created in the extended example.
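One way to act on that advice is a guard that prints the shapes and fails loudly instead of returning NaN. The function name checkedTransform and its error message format are assumptions for this sketch.

```typescript
// Multiply only after confirming every row has as many columns as the input has entries.
function checkedTransform(weightMatrix: number[][], inputVector: number[]): number[] {
  const rows = weightMatrix.length;
  const cols = inputVector.length;
  console.log(`weights: ${rows} rows, input: ${cols} features`); // print shapes first
  weightMatrix.forEach((row, i) => {
    if (row.length !== cols) {
      throw new Error(`row ${i} has ${row.length} columns, expected ${cols}`);
    }
  });
  const output: number[] = new Array(rows).fill(0); // zero-initialized accumulator
  for (let i = 0; i < rows; i++) {
    for (let j = 0; j < cols; j++) {
      output[i] += weightMatrix[i][j] * inputVector[j];
    }
  }
  return output;
}

// Usage: a mismatch now raises a descriptive error instead of producing NaN.
try {
  checkedTransform([[1, 2], [3, 4]], [1, 1, 1]);
} catch (err) {
  console.log((err as Error).message); // row 0 has 2 columns, expected 3
}
```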

Think about it

How might changing only a few weights in the first matrix affect which wiki articles your model considers similar later in the network?

Real-World Application

In our self-evolving knowledge wiki, every new article is tokenized and turned into an embedding by a pretrained embedding layer. That embedding immediately passes through exactly this kind of weight matrix to become the first hidden state. The hidden state is then stored in a vector database so that when users search, we can compare hidden states geometrically. If the matrix was trained on millions of wiki-style documents, articles about "large language models" and "transformer attention" tend to land near each other after the linear transformation. The next lesson, "Rotating and Stretching data spaces," will show you how to visualize what this matrix actually does to the geometry of the entire embedding space.
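Comparing hidden states geometrically usually means computing a similarity score. This sketch assumes cosine similarity (the cosine of the angle between two vectors); a given vector database may use cosine, dot product, or Euclidean distance instead. The vectors are made-up stand-ins for hidden states, and cosineSimilarity is a hypothetical helper.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let k = 0; k < a.length; k++) {
    dot += a[k] * b[k];
    normA += a[k] * a[k];
    normB += b[k] * b[k];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up hidden states for three articles
const llmArticle: number[] = [0.9, 0.1, 0.4];
const attentionArticle: number[] = [0.8, 0.2, 0.5];
const gardeningArticle: number[] = [-0.3, 0.9, -0.2];

console.log(cosineSimilarity(llmArticle, attentionArticle)); // close to 1
console.log(cosineSimilarity(llmArticle, gardeningArticle)); // negative for these vectors
```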

Quiz

Q1: What is an embedding?
A. A grid of learned weights
B. A list of numbers that numerically represents the meaning of a token or text
C. The output of a matrix multiplication
D. A single scalar value used for bias
Correct: B
An embedding converts discrete tokens into a continuous vector space so that mathematical operations such as similarity search become possible.

Q2: In the context of this lesson, a weight matrix is
A. A rectangular table of numbers that performs a linear transformation on an input vector
B. The final prediction output of the model
C. A list that stores only one number per row
D. The same thing as an embedding
Correct: A
The dimensions of the weight matrix determine how many features enter and how many features leave the layer.

Q3: What does the linear transformation accomplish geometrically?
A. It leaves every embedding unchanged
B. It stretches, rotates, or shears the space of embeddings so similar wiki articles move closer together
C. It randomly shuffles the order of numbers
D. It converts vectors into images
Correct: B
This geometric change helps downstream attention layers discover relationships between articles.

Q4: Given a 2×2 weight matrix [[0.5, 0.2], [-0.1, 0.8]] and an embedding vector [3, 4], what is the resulting hidden-state vector after the linear transformation?
A. [2.3, 2.9]
B. [2.3, 3.1]
C. [1.9, 2.9]
D. [2.7, 2.5]
Correct: A
Calculation: first component = 0.5×3 + 0.2×4 = 2.3; second = -0.1×3 + 0.8×4 = 2.9.
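The same arithmetic can be checked in a few lines of TypeScript; the helper name matVec is ours, not from the lesson.

```typescript
// Matrix-vector product: one dot product per row.
function matVec(W: number[][], x: number[]): number[] {
  return W.map((row) => row.reduce((sum, w, j) => sum + w * x[j], 0));
}

const q4 = matVec([[0.5, 0.2], [-0.1, 0.8]], [3, 4]);
console.log(q4.map((v) => v.toFixed(1))); // ['2.3', '2.9']
```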

Q5: Which of the following best explains why we initialize the hidden-state array with zeros before the loops?
A. To ensure we are only performing addition on defined numbers and avoid NaN results
B. To make the matrix square
C. So the final vector is always positive
D. To normalize the values between 0 and 1
Correct: A
Forgetting to zero the output is a common source of NaN values, and a single NaN propagates through every later layer of the transformer stack.
