Surface Memory, Internalized Memory, and Why Mindcache Should Digest Slowly

Date: 2026-04-22
Status: Published note
Scope: Product philosophy and runtime direction

One of the clearest lessons from building Mindcache is that "memory digitization" has fuzzier boundaries than we first assumed.

At the beginning, it is tempting to think in data structures:

graph or tree
node or document
relation or cluster

Those questions matter, but they are not the first question.

The first question is:

what kind of memory are we actually trying to digitize?

Surface Memory vs Internalized Memory

The most useful distinction we have found is between surface memory and internalized memory.

Surface memory

Surface memory is the part of life that is easy to say out loud but easy to lose:

what you ate yesterday
a place you noticed while walking with a friend
a quick thought captured on the commute
a paper you saved but have not really studied yet
a plan you want to remember next month

This memory is often fragmented, situational, and transactional. It has not yet been integrated into a larger conceptual framework.

Its challenge is capture.

Internalized memory

Internalized memory is different. It is what remains after repeated reading, reflection, comparison, and abstraction.

Examples:

understanding how several papers fit into the same field
seeing how electromagnetic induction relates to generators and motors
gaining stable intuition in a domain after studying it long enough

At the highest level, internalized memory can even move beyond language. Some knowledge becomes shared intuition instead of verbal description.

That is one reason not all memory needs to be digitized in the same way.

Not Everything Worth Knowing Needs to Be Stored

Highly shared, consensus-level intuition often does not need explicit storage.

When an apple falls, we already know what will happen. When a player faces an open goal, a whole stadium reacts before anyone explains anything.

That kind of common internalized knowledge is often already present:

in human culture
in education
in collective intuition
and increasingly, in language model parameters

The memories that benefit most from digital support are usually the ones that still need language:

personal context
unfinished understanding
private timelines
saved sources
fragmentary plans
partially internalized knowledge

That is the real surface area of a memory product.

Why Trees and Graphs Keep Coming Back

Whenever people try to organize knowledge, they reach for trees and graphs.

That is not accidental.

Sources are already relational:

papers cite papers
concepts overlap
topics partially cover one another
new material often re-explains old material from a different angle

To manage that, the mind compresses.

We do not want to remember every paper independently forever. We want to form a smaller abstract union of them, while keeping original sources as evidence and provenance.

That is already a kind of internalization.

The problem is that this process is slow.

Reading ten papers and forming a clean abstraction can take days or weeks of focused work. So if a product tries to perform that full internalization too early, especially during import, it becomes expensive and brittle.

The Runtime Lesson

Mindcache started with a more structured instinct:

import source material
extract structured memory
build links quickly
grow a graph and a topic tree

The experiments taught us where that breaks:

small snippets become too fragmented
large documents become too heavy
backfilling a real memory archive becomes too slow
the cost of eagerly building structure becomes a barrier to product adoption

That last point matters the most.

If importing an existing memory archive already feels too expensive, people give up before the product has a chance to be useful.

A Better Direction

The direction that now looks more realistic is:

1. Flat import first

Import should be broad and cheap:

preserve the input
keep lightweight metadata
avoid heavy structure building by default

This is how a system captures surface memory at scale.
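A minimal sketch of what flat import could look like, under stated assumptions. The names `MemoryItem`, `FlatStore`, and `import_text`, the `source` label, and the choice of a content hash as the identifier are all hypothetical and are not Mindcache's actual API. The point is only that import stores the raw input plus a little metadata and builds no links, clusters, or trees.

```python
import hashlib
import time
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    """One flat memory record: raw text plus lightweight metadata."""
    item_id: str
    text: str            # the input, preserved verbatim
    source: str          # hypothetical label, e.g. "note" or "paper"
    captured_at: float   # Unix timestamp of capture


@dataclass
class FlatStore:
    """Flat store: nothing beyond the records themselves is built on import."""
    items: dict = field(default_factory=dict)

    def import_text(self, text, source="note", captured_at=None):
        # Assumption: a content hash as the id, so re-importing the
        # same text is a cheap no-op rather than a duplicate record.
        item_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
        if item_id not in self.items:
            self.items[item_id] = MemoryItem(
                item_id=item_id,
                text=text,
                source=source,
                captured_at=time.time() if captured_at is None else captured_at,
            )
        return item_id
```

Because each import is one hash and one dictionary write, backfilling a large archive stays cheap, and any heavier structure can be derived later from the preserved text.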

2. Slow digestion later

A background digest process can gradually:

reconcile atomic memories
group related material
write daily summaries
build topic trees
create more human-readable structure

This is much closer to how memory actually forms:

first capture
then revisit
then internalize
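The capture-then-revisit loop can be sketched as a small incremental job, again under assumptions. The function name `digest_batch`, the tuple layout of the items, and grouping by UTC day are illustrative choices, not a description of Mindcache's digest process. Each call handles only a bounded number of undigested items, so the work can be spread over time instead of being paid at import.

```python
from datetime import datetime, timezone


def digest_batch(items, digested_ids, daily_groups, batch_size=2):
    """Digest up to `batch_size` undigested items; return how many were processed.

    items:        list of (item_id, unix_timestamp, text) from flat import
    digested_ids: set of ids already digested (mutated in place)
    daily_groups: dict mapping "YYYY-MM-DD" to a list of texts (mutated in place)
    """
    processed = 0
    for item_id, ts, text in items:
        if processed >= batch_size:
            break
        if item_id in digested_ids:
            continue  # already revisited in an earlier pass
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        daily_groups.setdefault(day, []).append(text)
        digested_ids.add(item_id)
        processed += 1
    return processed
```

A scheduler could call this repeatedly until it returns 0. The per-day groups are then raw material for daily summaries, and the flat items themselves are never modified.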

3. Different representations for agents and humans

This may be the most important product insight.

An agent often does not need a fully digested global tree.

It can often work from:

flat memory materials
keyword search
exact retrieval
BM25-style ranking
temporary local relationship building
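To make "BM25-style ranking" over flat material concrete, here is a small self-contained sketch of Okapi BM25 scoring. The whitespace tokenizer, the function names, and the parameter defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration; a real system would more likely use an existing search library.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is good enough for a sketch.
    return text.lower().split()


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return indices of matching docs, sorted by BM25 score, best first."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: how many docs contain each term.
    df = Counter(term for t in tokenized for term in set(t))
    query_terms = set(tokenize(query))
    scores = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average docs are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores.append((score, i))
    # Ties are broken by original position to keep the order deterministic.
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]
```

Nothing here needs a graph or a topic tree: the ranking is computed directly from the stored text at query time, which is why an agent can often work from flat memory alone.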

Humans are different.

When people browse their own memory, they want:

abstraction
grouping
temporal organization
overview
navigable structure

That means the best human-facing representation may still be:

topic trees
daily writeups
grouped views
local memory maps

But those should be understood as outputs of slow digestion, not as requirements at ingestion time.
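One way to picture a topic tree as a derived view is the sketch below. It assumes that some earlier digestion step has already assigned each item a "/"-separated topic path; that labeling step, the path format, and the function names are all hypothetical. The tree is built entirely from those labels, so it can be thrown away and rebuilt without touching the captured material.

```python
def build_topic_tree(tagged_items):
    """Build a nested topic tree from (topic_path, item_id) pairs.

    topic_path is a hypothetical "/"-separated label such as
    "physics/induction", assumed to come from a digestion step.
    """
    root = {"children": {}, "items": []}
    for topic_path, item_id in tagged_items:
        node = root
        for part in topic_path.split("/"):
            node = node["children"].setdefault(
                part, {"children": {}, "items": []}
            )
        node["items"].append(item_id)
    return root


def render(node, depth=0):
    """Return indented outline lines, with the item count at each topic."""
    lines = []
    for name in sorted(node["children"]):
        child = node["children"][name]
        lines.append("  " * depth + f"{name} ({len(child['items'])})")
        lines.extend(render(child, depth + 1))
    return lines
```

For example, rendering a tree built from two items under "physics/induction" and one under "food" gives an outline a person can browse, while an agent can keep querying the flat items underneath.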

Why This Matters for Mindcache

This changes how we should think about the product itself.

The job of the default system is not to fully internalize memory at import time. The job is to reliably capture memory material.

Then a slower layer can digest that material into something more structured and more useful for people to inspect.

That gives us a cleaner split:

free or lightweight use: flat memory, local retrieval
advanced or paid use: background digestion, topic trees, richer human-facing structure

That is not just a business-model convenience. It also matches the true cost of memory consolidation, which is paid over time rather than at import.

Public References That Helped

Two public references sharpened this direction for us.

mempalace

mempalace is useful because it takes hierarchical organization seriously. It shows that memory navigation does not have to begin from a dense global graph.

Reference:

[mempalace](https://github.com/milla-jovovich/mempalace)

Karpathy's llm-wiki

Karpathy's llm-wiki is useful because it frames knowledge as something compiled over time, rather than retrieved raw on demand every time it is needed.

Reference:

[LLM Wiki gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)

Those ideas support the same broad conclusion:

preserve source material
do not force heavy structure too early
let a slower layer build more usable knowledge over time

The Working Principle

The most useful principle we have now is simple:

surface memory should be captured cheaply; internalized memory should emerge slowly.

That is the direction Mindcache is now moving toward:

flat import
ongoing digestion
local retrieval for agents
structural consolidation for humans

It is less elegant on paper than "everything becomes a graph immediately," but it feels much closer to how memory actually works.