A single tweet can sit almost equidistant between two diametrically opposed clusters in embedding space, and the points in that geometric no-man’s-land are precisely where the most valuable contradictions hide.
Problem
When you ingest the latest 50 Karpathy tweets on self-reflection into your evolving knowledge wiki, simple retrieval via cosine similarity on the embedding vectors readily surfaces semantically related statements. Yet human readers quickly notice internal contradictions: one tweet celebrates the necessity of brutal self-honesty while another implies we should be kinder to our past selves. These contradictions are invisible to standard top-k retrieval because they hide in the geometry of high-dimensional space. The real task for a self-evolving system is to surface these geometric outliers automatically so that a reflective agent can critique them and evolve the knowledge graph. This is where k-means clustering, silhouette score, and visual contradiction detection become essential architectural components.
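To make the baseline concrete, here is a small sketch of top-k retrieval by cosine similarity, the mechanism that surfaces related statements but cannot see contradictions. The toy 2-D vectors and the function names are illustrative stand-ins for real tweet embeddings, not part of any actual system.

```python
# Baseline top-k retrieval by cosine similarity.
# Toy 2-D vectors stand in for real high-dimensional tweet embeddings.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, corpus, k):
    # Rank corpus indices by similarity to the query, highest first.
    ranked = sorted(range(len(corpus)),
                    key=lambda i: cosine(query, corpus[i]),
                    reverse=True)
    return ranked[:k]

# Two tweets that contradict each other can still sit close in embedding
# space, so both come back for the same query and the tension stays hidden.
corpus = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9]]
query = [1.0, 0.0]
result = top_k(query, corpus, 2)
```

Both near-duplicate vectors are returned together; nothing in the ranking signals that their contents might disagree.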
Concept
k-means clustering partitions points in high-dimensional embedding space into k groups by iteratively assigning each tweet (represented by its embedding vector) to the nearest centroid and then moving each centroid to the mean of its assigned points. Because the space is high-dimensional, we rely on t-SNE projection (which you already know preserves local neighborhoods better than PCA projection) to visualize the result while keeping the semantic cluster structure that cosine similarity reveals in the original space.
Geometric outliers are points whose distance to their assigned centroid significantly exceeds the average intra-cluster distance. These are not noise; in the context of self-reflection tweets they frequently mark conceptual tension — places where the same author’s thinking appears to contradict itself when measured by angle between vectors.
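A minimal sketch of that outlier test, using cosine distance (1 − cosine similarity) as the metric to stay consistent with the angle-based view above. The `factor` multiplier and the function names are assumptions for illustration, not canonical values.

```python
# Flag geometric outliers: points whose cosine distance to their assigned
# centroid significantly exceeds the cluster's average intra-cluster distance.
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def geometric_outliers(vectors, labels, centroids, factor=2.0):
    # factor is an assumed tunable multiplier; "significantly exceeds" is
    # operationalized here as more than factor times the cluster mean.
    flagged = []
    for c in set(labels):
        dists = {i: cosine_distance(vectors[i], centroids[c])
                 for i in range(len(vectors)) if labels[i] == c}
        mean_d = sum(dists.values()) / len(dists)
        flagged += [i for i, d in dists.items() if d > factor * mean_d]
    return sorted(flagged)

# Two tight points and one stray, all assigned to the same cluster:
vecs = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5]]
outliers = geometric_outliers(vecs, [0, 0, 0], [[1.0, 0.0]])
```

The stray vector is the only one flagged; in the tweet setting it would be handed to the reflective agent as a candidate contradiction.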
The silhouette score quantifies how well each point fits its cluster: for point i it is (b − a) / max(a, b) where a is average dissimilarity to other points in its own cluster and b is the smallest average dissimilarity to any other cluster. A score near +1 means the point is well-clustered; near 0 means it sits on the boundary; negative values indicate it is probably misassigned. When we color nodes by this score we turn an opaque clustering result into an immediately readable map of confidence.
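The per-point formula translates directly into code. This sketch uses cosine distance as the dissimilarity, matching the angle-based metric used throughout; the singleton-cluster convention (score 0) is an assumption borrowed from common practice.

```python
# Silhouette score for one point: (b - a) / max(a, b), where
# a = average dissimilarity to the other points in its own cluster,
# b = smallest average dissimilarity to any other cluster.
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def silhouette(i, vectors, labels):
    own = labels[i]

    def avg_dist(cluster):
        members = [j for j in range(len(vectors))
                   if labels[j] == cluster and j != i]
        return sum(cosine_distance(vectors[i], vectors[j])
                   for j in members) / len(members)

    same = [j for j in range(len(vectors)) if labels[j] == own and j != i]
    if not same:
        return 0.0  # singleton cluster: score 0 by convention (assumed)
    a = avg_dist(own)
    b = min(avg_dist(c) for c in set(labels) if c != own)
    return (b - a) / max(a, b)

# A point deep inside a tight cluster scores near +1:
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
score = silhouette(0, vecs, labels)
```

Running this over every point yields exactly the per-node values used to color the map described above.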
Visual contradiction detection ties these together: any tweet whose silhouette score falls below an adaptive threshold (or whose distance from its centroid exceeds a geometric multiple of the cluster’s radius) is flagged for agent reflection. The geometry itself becomes the signal that retrieval relevance via similarity alone cannot see.
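One way to make the threshold adaptive is to flag any point whose silhouette score falls more than one standard deviation below the mean of all scores. The one-sigma cutoff and the function name are illustrative assumptions, not a prescribed rule.

```python
# Adaptive flagging sketch: send a tweet to the reflective agent when its
# silhouette score is more than one standard deviation below the mean.
import statistics

def flag_for_reflection(silhouette_scores):
    mean = statistics.fmean(silhouette_scores)
    sd = statistics.pstdev(silhouette_scores)
    threshold = mean - sd  # assumed one-sigma cutoff; tune per corpus
    return [i for i, s in enumerate(silhouette_scores) if s < threshold]

# Three well-clustered tweets and one boundary case:
flagged = flag_for_reflection([0.9, 0.85, 0.88, 0.1])
```

Because the threshold moves with the score distribution, a uniformly fuzzy clustering does not flood the agent with flags, while a single sharp outlier still stands out.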
Minimal working example
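The sketch below is reconstructed from the breakdown that follows: pure Python with no libraries, centroids seeded from actual tweet vectors, assignment by cosine similarity, a mean-vector update step, and plain data structures as output. The toy 2-D vectors stand in for real high-dimensional embeddings.

```python
# Minimal k-means over embedding vectors, clustering by cosine similarity.
# No libraries: the point is to expose the geometric mechanism.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def kmeans_cosine(vectors, k, iterations=20):
    # Initialize centroids from actual tweet vectors, not random noise,
    # so early iterations remain semantically meaningful.
    centroids = [list(v) for v in vectors[:k]]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins the centroid it is most
        # similar to by angle (cosine similarity, not Euclidean distance).
        assignments = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the arithmetic mean of its
        # members, a "prototype" vector for the semantic cluster.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                dim = len(members[0])
                centroids[c] = [sum(v[i] for v in members) / len(members)
                                for i in range(dim)]
    # Pure data structures out: labels plus centroids, deliberately kept
    # separate from any visualization step.
    return assignments, centroids

# Toy 2-D "embeddings": two obvious semantic groups.
vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
labels, cents = kmeans_cosine(vecs, k=2)
```

Seeding from the first k vectors keeps the demo deterministic; a production variant would pick seeds more carefully (for example, k-means++-style spreading) before handing the resulting clusters to the silhouette and outlier passes.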
Example breakdown
The minimal example deliberately avoids libraries to expose the geometric mechanism. We initialize centroids from actual tweet vectors rather than random noise so that early iterations remain semantically meaningful. The assignment loop uses cosine similarity (the same angle between vectors you studied last lesson) instead of Euclidean distance because embedding vectors are usually normalized; this choice aligns clustering with the retrieval relevance via similarity used elsewhere in the wiki. The update step computes the arithmetic mean of vectors, which in embedding space corresponds to a “prototype” statement that best represents the semantic cluster. The final silhouette-ready clusters are pure data structures, deliberately separated from visualization so that the same clustering engine can run server-side while t-SNE projection happens only for human review.