In 384-dimensional embedding space, two tweets that never share any words can end up closer than two tweets that are almost identical in wording.
This counterintuitive geometry lies at the heart of building a self-evolving LLM knowledge wiki inspired by Andrej Karpathy’s tweets and papers. When your system ingests 50 recent tweets about LLM reasoning, the raw text lives in an impossibly sparse 50,000-dimensional bag-of-words space. Embedding vectors collapse that space into a dense 384- or 1536-dimensional manifold where semantic similarity becomes measurable distance. A self-evolving wiki must constantly add new nodes, detect contradictions, and cluster related ideas: tasks that only become tractable once you can visualize and navigate these high-dimensional spaces.
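A toy sketch makes the opening claim concrete. The bag-of-words vectors below are real one-hot counts for two short phrases with zero word overlap; the dense vectors are hypothetical embedding values chosen for illustration, standing in for what a model like `all-MiniLM-L6-v2` might produce:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Bag-of-words over a tiny vocabulary: the two tweets share no words,
# so their sparse vectors are exactly orthogonal.
# vocab = ["chain", "thought", "fails", "reasoning", "traces", "break"]
bow_a = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # "chain of thought fails"
bow_b = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # "reasoning traces break"
print(cosine(bow_a, bow_b))  # 0.0 — no overlap, maximally dissimilar

# Dense embeddings (hypothetical values): the same two tweets can point
# in nearly the same direction because the model encodes their shared meaning.
emb_a = np.array([0.90, 0.10, -0.30])
emb_b = np.array([0.85, 0.15, -0.25])
print(round(cosine(emb_a, emb_b), 3))  # 0.997 — nearly parallel
```

The wiki relies on exactly this property: similarity of meaning, not of surface wording, determines distance.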
Problem
The real task is to automatically discover emergent structure inside an unstructured stream of Karpathy’s writing without any hand-crafted labels or rules. You want the wiki to notice that tweets about “chain-of-thought failures” naturally group together even though they use completely different phrases, while tweets about “attention tricks” form a separate tight cluster. This emergent semantic clustering lets the reflective agent critique new incoming knowledge against existing clusters, flag contradictions, and propose merges. Doing this at production scale means projecting the 1536-dimensional vectors down to something a human (or a 3D visualization) can inspect, without destroying the very semantic neighborhoods that make the clustering useful.
Concept
Embedding vectors are simply fixed-length arrays of floating-point numbers that locate each piece of text at a specific point in a high-dimensional space. A 1536-dimensional embedding lives in a space so vast that most of its volume is empty; since embeddings are usually compared by cosine similarity, distances are dominated by direction rather than magnitude. High-dimensional spaces exhibit surprising concentration-of-measure phenomena: most pairs of random vectors are roughly orthogonal, yet semantically related texts produced by the same model end up pointing in nearly identical directions.
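The near-orthogonality of random high-dimensional vectors is easy to verify numerically. This sketch draws random unit vectors in 1536 dimensions and measures their pairwise cosine similarities:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 1536, 200

# Draw n random unit vectors in d dimensions.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Pairwise cosine similarities; drop the diagonal (self-similarity = 1).
sims = X @ X.T
off_diag = sims[~np.eye(n, dtype=bool)]

# Concentration of measure: similarities cluster tightly around 0,
# with standard deviation ~ 1/sqrt(d) ≈ 0.0255.
print(off_diag.mean(), off_diag.std())
```

Semantically related embeddings with cosine similarity of 0.7 or 0.8 are therefore wildly non-random: tens of standard deviations away from what chance alignment would produce.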
PCA projection finds the orthogonal axes (principal components) that successively maximize variance in the data. It is a linear map: each new dimension is a weighted sum of the original dimensions, chosen to preserve as much global spread as possible. Geometrically, it is like casting the shadow of a 3D object onto a flat wall from the most informative angle. The price of this global focus is that fine-grained local neighborhoods can be crushed together.
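The linear map described above can be written in a few lines of NumPy via the SVD of the centered data; the input here is synthetic random data standing in for real 1536-dimensional tweet embeddings:

```python
import numpy as np

def pca_project(X, k=3):
    """Project rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                      # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # linear map: weighted sums of dims

# Synthetic stand-in for fifty 1536-d tweet embeddings (hypothetical data).
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 1536))
Y = pca_project(X, k=3)
print(Y.shape)  # (50, 3)
```

By construction, the variance of the first projected coordinate is at least that of the second, and so on, which is exactly the "successively maximize variance" property.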
t-SNE projection instead optimizes a non-linear map that preserves local neighborhoods. It models each point’s similarity to its neighbors with a Gaussian in high dimensions and a Student-t distribution in low dimensions, then minimizes the Kullback-Leibler divergence between these two probability distributions. The result is that tweets that were close in 1536-d stay close in 3-d, even at the cost of distorting global distances.
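A minimal sketch using scikit-learn's `TSNE`, with two synthetic, well-separated clusters standing in for real embedding data; the cluster geometry and parameter choices here are illustrative assumptions, not values from the original system:

```python
import numpy as np
from sklearn.manifold import TSNE

# Two tight synthetic clusters in 1536-d (hypothetical stand-in for tweets).
rng = np.random.default_rng(2)
cluster_a = rng.standard_normal(1536) + 0.05 * rng.standard_normal((25, 1536))
cluster_b = rng.standard_normal(1536) + 0.05 * rng.standard_normal((25, 1536))
X = np.vstack([cluster_a, cluster_b])

# Non-linear projection to 3-d that preserves local neighborhoods.
# perplexity sets the effective neighborhood size and must be < n_samples.
tsne = TSNE(n_components=3, perplexity=10, init="pca", random_state=0)
Y = tsne.fit_transform(X)
print(Y.shape)  # (50, 3)
```

Points that were neighbors in 1536-d remain neighbors in the 3-d output, which is what makes the result suitable for the wiki's 3D visualization; distances between far-apart clusters, by contrast, should not be read quantitatively.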
Semantic clustering in embedding space means that points that land near one another share latent meaning even though no explicit label was provided. In the context of Karpathy’s tweets we often observe spontaneous formation of clusters around topics like “reasoning traces vs. direct answers,” “scaling laws,” and “attention geometry tricks.” Because the wiki is self-evolving, these clusters become dynamic knowledge modules that the agent can query, critique, and merge.
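The unsupervised discovery of such topic clusters can be sketched with k-means on the embedding vectors. The three "topics" below are synthetic Gaussian blobs in 384-d, a hypothetical stand-in for real tweet embeddings:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical embeddings: three well-separated topic clusters in 384-d.
rng = np.random.default_rng(3)
centers = 5.0 * rng.standard_normal((3, 384))
X = np.vstack([c + rng.standard_normal((20, 384)) for c in centers])

# Discover the latent topics without any labels.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(np.bincount(km.labels_))  # cluster sizes; here each topic has 20 tweets
```

In the wiki, each discovered cluster becomes a knowledge module: new embeddings are assigned to the nearest centroid, and outliers far from every centroid are flagged for the reflective agent to review as potential contradictions or new topics.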