Surface Memory, Internalized Memory, and Why Mindcache Should Digest Slowly
Date: 2026-04-22
Status: Published note
Scope: Product philosophy and runtime direction
One of the clearest lessons from building Mindcache is that "memory digitization" has fuzzier boundaries than we first assumed.
At the beginning, it is tempting to think in data structures:
- graph or tree
- node or document
- relation or cluster
Those questions matter, but they are not the first question.
The first question is:
What kind of memory are we actually trying to digitize?
Surface Memory vs Internalized Memory
The most useful distinction we have found is between surface memory and internalized memory.
Surface memory
Surface memory is the part of life that is easy to say out loud but easy to lose:
- what you ate yesterday
- a place you noticed while walking with a friend
- a quick thought captured on the commute
- a paper you saved but have not really studied yet
- a plan you want to remember next month
This memory is often fragmented, situational, and transactional. It has not yet been integrated into a larger conceptual framework.
Its challenge is capture.
Internalized memory
Internalized memory is different. It is what remains after repeated reading, reflection, comparison, and abstraction.
Examples:
- understanding how several papers fit into the same field
- seeing how electromagnetic induction relates to generators and motors
- gaining stable intuition in a domain after studying it long enough
At the highest level, internalized memory can even move beyond language. Some knowledge becomes shared intuition instead of verbal description.
That is one reason not all memory needs to be digitized in the same way.
Not Everything Worth Knowing Needs to Be Stored
Highly shared, consensus-level intuition often does not need explicit storage.
When an apple falls, we already know what will happen. When a player faces an open goal, a whole stadium reacts before anyone explains anything.
That kind of common internalized knowledge is often already present:
- in human culture
- in education
- in collective intuition
- and increasingly, in language model parameters
The memories that benefit most from digital support are usually the ones that still need language:
- personal context
- unfinished understanding
- private timelines
- saved sources
- fragmentary plans
- partially internalized knowledge
That is the real surface area of a memory product.
Why Trees and Graphs Keep Coming Back
Whenever people try to organize knowledge, they reach for trees and graphs.
That is not accidental.
Sources are already relational:
- papers cite papers
- concepts overlap
- topics partially cover one another
- new material often re-explains old material from a different angle
To manage that, the mind compresses.
We do not want to remember every paper independently forever. We want to form a smaller abstract union of them, while keeping original sources as evidence and provenance.
That is already a kind of internalization.
The problem is that this process is slow.
Reading ten papers and forming a clean abstraction can take days or weeks of focused work. So if a product tries to perform that full internalization too early, especially during import, it becomes expensive and brittle.
The Runtime Lesson
Mindcache started with a more structured instinct:
- import source material
- extract structured memory
- build links quickly
- grow a graph and a topic tree
The experiments taught us where that breaks:
- small snippets become too fragmented
- large documents become too heavy
- backfilling a real memory archive becomes too slow
- the cost of eager structure building blocks product adoption
That last point matters the most.
If importing old memory already feels too expensive, the product stops behaving like a product.
A Better Direction
The direction that now looks more realistic is:
1. Flat import first
Import should be broad and cheap:
- preserve the input
- keep lightweight metadata
- avoid heavy structure building by default
This is how a system captures surface memory at scale.
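The three points above can be sketched in a few lines of Python. This is a minimal illustration, not Mindcache's actual importer: the `MemoryRecord` shape, the `flat_import` function, and the in-memory list used as a store are all hypothetical names chosen for the example.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MemoryRecord:
    # Hypothetical record shape; not Mindcache's actual schema.
    id: str             # short content hash, used only as a stable handle
    text: str           # the input, preserved verbatim
    source: str         # lightweight metadata, e.g. "note" or "paper"
    captured_at: float  # Unix timestamp of capture


def flat_import(text: str, source: str, store: List[MemoryRecord],
                now: Optional[float] = None) -> MemoryRecord:
    """Append one record to a flat store without extracting any structure."""
    record = MemoryRecord(
        id=hashlib.sha256(text.encode("utf-8")).hexdigest()[:16],
        text=text,
        source=source,
        captured_at=time.time() if now is None else now,
    )
    store.append(record)
    return record


store: List[MemoryRecord] = []
flat_import("Saved a paper on induction motors; not read yet.", "paper", store)
flat_import("Lunch idea: the noodle place near the station.", "note", store)
```

The import path does no linking, clustering, or summarizing. Its cost per item stays roughly constant, which is what makes backfilling a large archive tolerable.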
2. Slow digestion later
A background digest process can gradually:
- reconcile atom memory
- group related material
- write daily summaries
- build topic trees
- create more human-readable structure
This is much closer to how memory actually forms:
- first capture
- then revisit
- then internalize
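As a rough sketch of one digest step, the function below groups already-captured records by day and writes a one-line summary for each day. The function name `digest_by_day`, the `(timestamp, text)` input shape, and the preview-style summary are assumptions made for illustration; a real digest would presumably produce something richer.

```python
from collections import defaultdict
from datetime import datetime, timezone


def digest_by_day(records):
    """Group flat records by UTC capture date and summarize each day.

    `records` is an iterable of (captured_at_unix_seconds, text) pairs.
    The summary is only a joined preview of the day's items; it is a
    placeholder for whatever a real digest step would write.
    """
    by_day = defaultdict(list)
    for captured_at, text in records:
        day = datetime.fromtimestamp(captured_at, tz=timezone.utc).date().isoformat()
        by_day[day].append(text)

    summaries = {}
    for day in sorted(by_day):
        texts = by_day[day]
        preview = "; ".join(t[:40] for t in texts)
        summaries[day] = f"{len(texts)} item(s): {preview}"
    return summaries


records = [
    (0, "Quick thought on the commute"),
    (3600, "Paper saved, not yet studied"),
    (86400, "Plan to revisit next month"),
]
daily = digest_by_day(records)
```

Because this pass reads from the flat store and writes its output separately, it can run in the background, be rerun, or be skipped entirely without affecting capture.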
3. Different representations for agents and humans
This may be the most important product insight.
An agent often does not need a fully digested global tree.
It can often work from:
- flat memory materials
- keyword search
- exact retrieval
- BM25-style ranking
- temporary local relationship building
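To make the retrieval side concrete, here is a small, dependency-free sketch of BM25-style ranking over flat text. It uses the common Okapi BM25 formula with the usual default parameters `k1=1.5` and `b=0.75`; the tokenizer and the function names are illustrative choices, not part of Mindcache.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return (doc_index, score) pairs, best match first (Okapi BM25)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    "electromagnetic induction in generators",
    "what I ate yesterday",
    "notes on induction motors and generators",
]
ranked = bm25_rank("induction motors", docs)
```

Nothing here requires a prebuilt graph or tree. The ranking is computed directly from the flat records at query time, which is why an agent can often work without global structure.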
Humans are different.
When people browse their own memory, they want:
- abstraction
- grouping
- temporal organization
- overview
- navigable structure
That means the best human-facing representation may still be:
- topic trees
- daily writeups
- grouped views
- local memory maps
But those should be understood as slow digestion outputs, not mandatory ingestion requirements.
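One way to picture a topic tree as a derived output is to build it from records that a digest pass has already labeled. In the sketch below, the slash-separated topic paths, the record IDs, and the node layout are all hypothetical; the note does not specify how topics are assigned or stored.

```python
def new_node():
    """A tree node holds record IDs filed here plus named child topics."""
    return {"items": [], "children": {}}


def build_topic_tree(labeled):
    """Build a topic tree from (topic_path, record_id) pairs.

    Topic paths such as "physics/induction" are assumed to come from an
    earlier digest step; this function only arranges them for browsing.
    """
    root = new_node()
    for path, record_id in labeled:
        node = root
        for part in path.split("/"):
            node = node["children"].setdefault(part, new_node())
        node["items"].append(record_id)
    return root


tree = build_topic_tree([
    ("physics/induction", "r1"),
    ("physics/induction", "r2"),
    ("plans", "r3"),
])
```

The tree stores only record IDs, so it can be discarded and rebuilt from the flat store whenever the labeling improves.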
Why This Matters for Mindcache
This changes how we should think about the product itself.
The job of the default system is not to fully internalize memory at import time. The job is to reliably capture memory material.
Then a slower layer can digest that material into something more structured and more useful for people to inspect.
That gives us a cleaner split:
- free or lightweight use: flat memory, local retrieval
- advanced or paid use: background digestion, topic trees, richer human-facing structure
That is not just a business-model convenience. It also matches the true cost of memory consolidation, which is paid over time rather than at import.
Public References That Helped
Two public references sharpened this direction for us.
mempalace
mempalace is useful because it takes hierarchical organization seriously. It shows that memory navigation does not have to begin from a dense global graph.
Reference:
- [mempalace](https://github.com/milla-jovovich/mempalace)
Karpathy's llm-wiki
Karpathy's llm-wiki is useful because it frames knowledge as something compiled over time, rather than retrieved raw on every request.
Reference:
- [LLM Wiki gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)
Those ideas support the same broad conclusion:
- preserve source material
- do not force heavy structure too early
- let a slower layer build more usable knowledge over time
The Working Principle
The most useful principle we have now is simple:
Surface memory should be captured cheaply; internalized memory should emerge slowly.
That is the direction Mindcache is now moving toward:
- flat import
- ongoing digestion
- local retrieval for agents
- structural consolidation for humans
It is less elegant on paper than "everything becomes a graph immediately," but it feels much closer to how memory actually works.