External Brain — LLM Knowledge Base Showcase

01 — Capture Pipeline

From Thought to Vector in Seconds

Every insight — terminal output, article highlight, Slack thread, or shower thought — flows through a 4-stage pipeline before it's queryable.

Ingest

CLI tool, browser extension, API webhook, or mobile quick-capture

→

Parse & Tag

LLM extracts entities, assigns type, generates title & summary

→

Embed

Text → 1536-dim vector via text-embedding-3-small, stored in Qdrant

→

Link

Auto-connect to related notes whose cosine similarity exceeds the 0.82 threshold
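As a rough illustration of the Link stage, the sketch below scores a new note against existing ones and keeps those above the threshold. Only the 0.82 cutoff comes from the description above; the function names, the toy 3-dimensional vectors, and the in-memory dict are assumptions, and a real deployment would presumably ask Qdrant for nearest neighbours instead of scanning a dict.

```python
import math

LINK_THRESHOLD = 0.82  # from the pipeline description; everything else is illustrative


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def link_candidates(new_vector, existing, threshold=LINK_THRESHOLD):
    """Return (note_id, score) pairs scoring above the threshold, best first."""
    scored = ((note_id, cosine_similarity(new_vector, vec))
              for note_id, vec in existing.items())
    return sorted((pair for pair in scored if pair[1] > threshold),
                  key=lambda pair: pair[1], reverse=True)


# Toy 3-dim vectors stand in for the 1536-dim embeddings (hypothetical ids).
notes = {
    "kv-cache": [0.9, 0.1, 0.0],
    "raft": [0.0, 0.2, 0.9],
}
links = link_candidates([1.0, 0.2, 0.0], notes)
```

Here only the "kv-cache" note clears the threshold, so it is the single link proposed for the new note.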

02 — Knowledge Map

Topic Clusters at a Glance

Organic clusters formed by semantic proximity — not manual folders. The map shows how domains overlap and evolve.

LLM Internals

287 notes — attention mechanisms, KV caches, quantization tricks, fine-tuning logs, prompt engineering patterns.

transformers · GGUF · LoRA · context window

Systems Design

194 notes — distributed systems, event sourcing, CQRS, database internals, consistency models, queue architectures.

Kafka · Postgres · CRDTs · raft

Rust & Performance

156 notes — ownership model, async runtime, SIMD, zero-copy patterns, unsafe ergonomics, compiler diagnostics.

tokio · borrow checker · wasm · perf

RAG & Retrieval

203 notes — chunking strategies, hybrid search, re-ranking, HyDE, metadata filtering, evaluation frameworks.

embeddings · Qdrant · BM25 · NDCG

DevOps & Infra

138 notes — Kubernetes patterns, Nix flakes, CI/CD pipelines, observability, cost optimization, GPU scheduling.

k8s · Nix · Prometheus · Terraform

Product Thinking

89 notes — user research, growth loops, pricing models, developer experience, onboarding friction analysis.

PLG · DX · activation · churn

03 — Note Types

Five Flavors of Knowledge

Each note is auto-classified into one of five types, shaping how it's stored, linked, and surfaced during retrieval.
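One way to picture a typed note is as a small record. The sketch below is an assumed schema, not the system's actual storage model: the five type names and the metadata line format follow the note cards in this section, while the field names and the link ids are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class NoteType(Enum):
    CONCEPT = "Concept"
    SNIPPET = "Snippet"
    INSIGHT = "Insight"
    QUESTION = "Question"
    REFERENCE = "Reference"


@dataclass
class Note:
    title: str
    note_type: NoteType
    created: date
    cluster: str
    body: str = ""
    links: list = field(default_factory=list)  # ids of related notes

    def header(self):
        """Render the metadata line in the style used on the note cards."""
        return f"{self.created.isoformat()} · {self.cluster} · {len(self.links)} links"


note = Note(
    title="KV Cache Compression in Long-Context Models",
    note_type=NoteType.CONCEPT,
    created=date(2025, 3, 14),
    cluster="LLM Internals",
    links=["n1", "n2", "n3", "n4"],  # hypothetical ids
)
print(note.header())
```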

Concept

KV Cache Compression in Long-Context Models

2025-03-14 · LLM Internals · 4 links

Grouped-query attention (GQA) reduces KV cache size by sharing key-value heads across query groups. In Llama 3, this cuts KV cache memory by ~4× relative to standard multi-head attention at 128k context, with <1% perplexity regression. The tradeoff: decoding throughput improves, but prefill latency is largely unchanged because prefill is compute-bound rather than memory-bound. Pair with PagedAttention for serving.
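The ~4× figure can be sanity-checked with back-of-the-envelope arithmetic. The configuration below (32 layers, 32 query heads, 8 KV heads, head dimension 128, fp16 values) is an assumption chosen to resemble an 8B-class model, not a number taken from the note; the ratio depends only on query heads divided by KV heads.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes for keys plus values across all layers, for one sequence."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


# Assumed configuration (8B-class model), fp16 values, 128k-token context.
LAYERS, QUERY_HEADS, KV_HEADS, HEAD_DIM = 32, 32, 8, 128
SEQ_LEN = 128 * 1024

mha = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, SEQ_LEN)  # one KV head per query head
gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)     # KV heads shared by groups

GIB = 1024 ** 3
print(f"MHA: {mha / GIB:.0f} GiB, GQA: {gqa / GIB:.0f} GiB, ratio: {mha / gqa:.0f}x")
# prints "MHA: 64 GiB, GQA: 16 GiB, ratio: 4x"
```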

Snippet

Streaming OpenAI Responses with Backpressure

2025-02-28 · RAG & Retrieval · 2 links

Use an AsyncIterator wrapper around the SSE stream. Apply a bounded channel (capacity=64) between the HTTP reader and the consumer coroutine. When the channel is full, the reader awaits — this naturally applies TCP backpressure upstream to the OpenAI API, preventing OOM on slow clients.
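A minimal Python sketch of the bounded-channel idea, using `asyncio.Queue` as the channel. The stream here is simulated by a hypothetical `fake_sse` generator rather than a real API call, and all function names are assumptions; only the capacity of 64 comes from the note.

```python
import asyncio

CAPACITY = 64  # bounded channel size from the note


async def read_stream(source, queue):
    """Forward chunks into the bounded queue; put() suspends while it is full."""
    async for chunk in source:
        await queue.put(chunk)
    await queue.put(None)  # sentinel: stream finished


async def consume(queue, delay=0.0):
    """Drain the queue, optionally simulating a slow client."""
    received = []
    while True:
        chunk = await queue.get()
        if chunk is None:
            return received
        received.append(chunk)
        await asyncio.sleep(delay)


async def fake_sse(n):
    """Stand-in for an SSE response body; a real client would yield parsed events."""
    for i in range(n):
        yield f"token-{i}"


async def main(n=200):
    queue = asyncio.Queue(maxsize=CAPACITY)
    reader = asyncio.create_task(read_stream(fake_sse(n), queue))
    received = await consume(queue)
    await reader
    return received


chunks = asyncio.run(main())
```

With a real HTTP client, a suspended reader stops pulling bytes from the socket, which is what lets the receive window fill and slow the sender; the simulated source above only demonstrates the in-process half of that behaviour.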

Insight

Why Chunk Size Matters More Than Embedding Model

2025-01-09 · RAG & Retrieval · 7 links

After testing 12 embedding models against 4 chunk sizes (128, 256, 512, 1024 tokens), chunk size explained 3× more variance in retrieval quality than model choice. Sweet spot: 256 tokens with 10% overlap for technical docs. Semantic chunking helped only marginally vs. fixed-size with overlap.
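For reference, fixed-size chunking with overlap might look like the sketch below. The function name and the synthetic token list are assumptions; a real pipeline would feed in tokenizer output. With size 256 and 10% overlap, each window starts 231 tokens after the previous one.

```python
def chunk_tokens(tokens, size=256, overlap_ratio=0.10):
    """Split a token list into fixed-size windows that overlap by a fraction."""
    overlap = int(size * overlap_ratio)  # 25 tokens for size=256
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap_ratio must be below 1.0")
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end
    return chunks


# Synthetic tokens stand in for real tokenizer output.
tokens = [f"t{i}" for i in range(600)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # prints [256, 256, 138]
```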

Question

Does LoRA Rank Correlate with Task Complexity?

2025-04-02 · LLM Internals · 3 links

Rank 8 works for style transfer but rank 64 seems necessary for domain-specific reasoning. Is this because reasoning requires modifying deeper weight subspaces? Need to test: hold dataset constant, sweep rank [4,8,16,32,64,128], measure on a multi-hop QA benchmark.
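When reading the results of such a sweep, it may help to know how much adapter capacity each rank buys. The sketch below counts the parameters LoRA adds to a single weight matrix; the 4096×4096 projection size is an assumption for illustration, and the rank list is the one proposed in the note.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns B (d_out x rank) and A (rank x d_in), so the count is
    rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


D_MODEL = 4096  # assumed projection size, for illustration only
RANKS = [4, 8, 16, 32, 64, 128]  # the sweep proposed in the note

full = D_MODEL * D_MODEL
for r in RANKS:
    added = lora_params(D_MODEL, D_MODEL, r)
    print(f"rank {r:>3}: {added:>9,} params ({added / full:.2%} of the full matrix)")
```

Because the count grows linearly with rank, going from rank 8 to rank 64 adds eight times as many trainable parameters per adapted matrix.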

Reference

Raft Consensus — Understandable Distributed Consensus

2024-11-20 · Systems Design · 5 links

Leader election uses randomized timeouts (150–300ms). Log replication is append-only, with each entry tagged by its term number. Safety property: at most one leader per term. Key insight from the paper: decomposing consensus into leader election, log replication, and safety makes it easier to understand and implement than Paxos.
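The election rule can be sketched in a few lines. This is a deliberately simplified toy, not a Raft implementation: it omits RPCs, log comparison, and term advancement, and the `Node` class and function names are assumptions. It shows only the two points from the note: randomized timeouts in the 150–300 ms range, and one vote per node per term, which is what prevents two leaders in the same term.

```python
import random

ELECTION_TIMEOUT_MS = (150, 300)  # range quoted in the note


def election_timeout(rng):
    """Each follower draws its own timeout so they rarely expire together."""
    return rng.uniform(*ELECTION_TIMEOUT_MS)


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.voted_for = {}  # term -> candidate id; at most one vote per term

    def request_vote(self, term, candidate_id):
        """Grant the vote only if this node has not voted for someone else this term."""
        return self.voted_for.setdefault(term, candidate_id) == candidate_id


def run_election(nodes, term, rng):
    """The node whose timeout fires first stands as candidate for the term."""
    timeouts = {n.node_id: election_timeout(rng) for n in nodes}
    candidate = min(nodes, key=lambda n: timeouts[n.node_id])
    votes = sum(n.request_vote(term, candidate.node_id) for n in nodes)
    majority = len(nodes) // 2 + 1
    return candidate.node_id if votes >= majority else None


rng = random.Random(0)
cluster = [Node(i) for i in range(5)]
leader = run_election(cluster, term=1, rng=rng)
```

Once every node has voted in term 1, a second candidate in the same term collects no votes, so it cannot also become leader.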

04 — Retrieval Examples

Ask Your Past Self Anything

Natural language queries against the knowledge base — showing real results with similarity scores and source attribution.

"How did I handle backpressure in streaming LLM responses?"

0.94 · Snippet · 2025-02-28

Use an AsyncIterator wrapper around the SSE stream with a bounded channel (capacity=64). When full, the reader awaits — applying TCP backpressure upstream to the API.

0.87 · Concept · 2025-01-15

Backpressure patterns in async Rust: tokio::sync::mpsc with bounded capacity, or use Semaphore to limit concurrent in-flight requests.

"What's the optimal chunk size for RAG on technical documentation?"

0.96 · Insight · 2025-01-09

256 tokens with 10% overlap is the sweet spot. Chunk size explained 3× more variance in retrieval quality than embedding model choice across 12 models tested.

0.83 · Reference · 2024-12-03

LlamaIndex benchmark: recursive character splitter vs. semantic chunking. Fixed-size with overlap won on NDCG@10 for structured docs; semantic chunking edged ahead on narrative text.

"Explain the Raft leader election mechanism"

0.95 · Reference · 2024-11-20

Leader election uses randomized timeouts (150–300ms). Safety guarantee: at most one leader per term. Decomposing consensus into election, replication, and safety makes Raft easier to implement than Paxos.

0.81 · Insight · 2025-03-22

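In practice, Raft read optimizations avoid appending a log entry for every read. etcd serves linearizable reads through ReadIndex, which confirms leadership with one heartbeat round instead of running full consensus on every GET; leader leases go further and skip that round, at the cost of relying on bounded clock drift.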
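The result headers in these examples follow a "score · type · date" pattern. A small sketch of selecting and rendering the best hits is shown here; the `Hit` record, the `k` and `min_score` defaults, and the third low-scoring hit are assumptions, and the scores are taken as already computed by the vector search.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Hit:
    score: float    # similarity returned by the vector search
    note_type: str
    created: str    # ISO date


def top_hits(hits, k=2, min_score=0.80):
    """Keep the k best hits at or above a floor; both defaults are assumed."""
    eligible = [h for h in hits if h.score >= min_score]
    return heapq.nlargest(k, eligible, key=lambda h: h.score)


def render(hit):
    """Format a hit like the result headers in the examples."""
    return f"{hit.score:.2f} · {hit.note_type} · {hit.created}"


hits = [
    Hit(0.87, "Concept", "2025-01-15"),
    Hit(0.41, "Reference", "2024-10-02"),  # hypothetical weak match
    Hit(0.94, "Snippet", "2025-02-28"),
]
for hit in top_hits(hits):
    print(render(hit))
```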

05 — Weekly Review Loop

Compound Knowledge, One Week at a Time

Every Sunday, the system generates a review: new notes, emerging connections, stale clusters, and knowledge gaps to explore.

Week of April 21 — Activity Log

Monday

Added 8 notes from "Attention Is All You Need" re-read. Linked 3 to existing KV cache cluster.

Tuesday

Captured debugging session — CUDA OOM during LoRA merge. Root cause: accumulated optimizer states.

Wednesday

5 notes from Kubernetes sig-scheduling meeting. New cluster forming: "GPU scheduling policies."

Thursday

Insight note: realized our re-ranker adds 120ms latency for only 2% NDCG gain. Flagged for removal.

Friday

3 snippet captures from pair programming — async Rust patterns for graceful shutdown.

Weekend

Review session: merged 2 overlapping clusters, archived 14 stale notes, opened 3 new questions.

Weekly Summary

New notes

New links

Clusters merged

Notes archived

Knowledge Gaps Detected

▸ Speculative decoding — referenced in 6 notes but no concept note exists
▸ Ring attention for distributed inference — only 1 shallow mention
▸ Postgres BRIN indexes — linked from 4 systems notes, never explained
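A gap check of this kind could be approximated by counting how often a topic is mentioned and comparing that against the concept notes that exist. The sketch below is an assumption about how such a check might work, not the system's actual logic: the note schema, the `min_mentions` cutoff, and the exact-title matching are all hypothetical simplifications.

```python
from collections import Counter


def find_gaps(notes, min_mentions=4):
    """Topics mentioned in several notes that no concept note covers.

    `notes` is a list of dicts with 'type', 'title', and 'mentions' keys;
    this schema and the min_mentions cutoff are assumptions for illustration.
    """
    covered = {n["title"].lower() for n in notes if n["type"] == "Concept"}
    mentions = Counter(
        topic.lower() for n in notes for topic in set(n["mentions"])
    )
    return sorted(
        (topic, count) for topic, count in mentions.items()
        if count >= min_mentions and topic not in covered
    )


# Six notes mention both topics, but only "KV cache" has a concept note.
notes = [
    {"type": "Insight", "title": f"note {i}",
     "mentions": ["Speculative decoding", "KV cache"]}
    for i in range(6)
]
notes.append({"type": "Concept", "title": "KV cache", "mentions": []})

print(find_gaps(notes))  # prints [('speculative decoding', 6)]
```

Matching topics to concept notes by exact title is crude; a semantic match against note embeddings would likely be more robust.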