Storing Embeddings and Building the Initial Neo4j Knowledge Graph
A single high-quality embedding computed from one of Andrej Karpathy's tweets on self-attention can cluster with related claims spread across months of content, even though the original tweets never explicitly reference each other.
Problem
The task is to turn the atomic knowledge claims extracted from Karpathy's tweets, which are semantically rich but disconnected, into a knowledge graph that can be queried continuously, can keep evolving, and respects the geometric structure of the embedding space. Without this step, the wiki remains a flat collection of the validated nodes produced by previous lessons: contradictions stay hidden, and reflective agents cannot traverse conceptual neighborhoods to critique existing claims or synthesize new understanding. This lesson combines four pieces: computing text embeddings with Voyage or OpenAI, modeling knowledge nodes and relationships in Neo4j, storing embeddings as node properties with Cypher, and visualizing the knowledge graph in 3D with a geometric layout. Together they give the system a spatial view of ideas, loosely mirroring how attention mechanisms themselves operate on vector representations.
Concept
Text embeddings translate the cleaned tweet text (already produced by the tweet cleaning pipeline for LLM ingestion) into fixed-length vectors that encode semantic meaning in a high-dimensional Euclidean space. Two vectors that point in similar directions have high cosine similarity, regardless of the surface words used.
In Neo4j, each structured wiki entry is modeled as a :WikiNode with the properties claim, context, sources, and confidence (already validated by the Zod schema), plus a new embedding property that holds the vector as a list of floats. :RELATED_TO relationships are created between pairs of nodes whose cosine similarity exceeds a chosen threshold. A :CONTRADICTS relationship needs more than that: high similarity only shows that two claims address the same topic, so the threshold can at most select candidate pairs, and deciding whether they actually disagree requires a separate check.
Because the reader is learning linear algebra for machine learning, we visualize the geometric layout first and examine the underlying algebra afterwards. Nodes that belong to the same conceptual cluster (e.g., variants of Karpathy's self-attention explanations) appear close together in 3D space, revealing emergent structure without any hand-crafted labels.
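The node model and the similarity threshold described above can be sketched in TypeScript as follows. This is a minimal illustration, not the lesson's actual implementation: the WikiNode interface, the id field, the types chosen for context and sources, the helper names, the toy 3-dimensional vectors, and the 0.8 threshold are all assumptions made for the example. The two Cypher strings are parameterized statements that would be passed to a Neo4j driver session; they are not executed here.

```typescript
// Hypothetical shape of a validated wiki entry plus its embedding.
// Property names follow the text; `id` and the field types are assumptions.
interface WikiNode {
  id: string;
  claim: string;
  context: string;
  sources: string[];
  confidence: number;
  embedding: number[];
}

interface RelatedEdge {
  from: string;
  to: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Compare every unordered pair once and keep pairs at or above the threshold.
function proposeRelatedEdges(nodes: WikiNode[], threshold: number): RelatedEdge[] {
  const edges: RelatedEdge[] = [];
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const score = cosineSimilarity(nodes[i].embedding, nodes[j].embedding);
      if (score >= threshold) {
        edges.push({ from: nodes[i].id, to: nodes[j].id, score });
      }
    }
  }
  return edges;
}

// Parameterized Cypher for an assumed schema: one :WikiNode per id,
// with the embedding stored as a list-of-floats property.
const UPSERT_NODE = `
MERGE (n:WikiNode {id: $id})
SET n.claim = $claim,
    n.context = $context,
    n.sources = $sources,
    n.confidence = $confidence,
    n.embedding = $embedding
`;

// One :RELATED_TO relationship per proposed pair, carrying its score.
const UPSERT_RELATED = `
MATCH (a:WikiNode {id: $from}), (b:WikiNode {id: $to})
MERGE (a)-[r:RELATED_TO]->(b)
SET r.score = $score
`;

// Toy data: "a" and "b" point in nearly the same direction, "c" is orthogonal.
const demoNodes: WikiNode[] = [
  { id: "a", claim: "claim A", context: "", sources: [], confidence: 0.9, embedding: [1, 0, 0] },
  { id: "b", claim: "claim B", context: "", sources: [], confidence: 0.8, embedding: [0.9, 0.1, 0] },
  { id: "c", claim: "claim C", context: "", sources: [], confidence: 0.7, embedding: [0, 0, 1] },
];

// Only the a/b pair clears the 0.8 threshold.
const demoEdges = proposeRelatedEdges(demoNodes, 0.8);
console.log(demoEdges);
```

The pairwise loop is quadratic in the number of nodes, which is fine for a sketch but would likely need an index-backed nearest-neighbor query as the wiki grows. Each entry of demoEdges already has the parameter names that UPSERT_RELATED expects, so it could be passed to the driver as-is.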