Agent memory design: short-term context, long-term files, and retrieval

Tomas

Most agents lose important context between sessions because memory is an afterthought. It gets added after the core agent works, using whatever structure was convenient, and it never quite catches what matters.

Here is a practical model for what to store where, how to structure it, and how to retrieve it without bloating every prompt.

—

Short-term vs long-term memory

These are different problems and they need different solutions.

Short-term memory is the active context window. Everything the agent can see right now: the system prompt, the conversation history, any documents loaded for this session. Access is immediate because nothing has to be retrieved, but it is gone when the session ends and it has a hard size limit.

Long-term memory is persistent storage. Files on disk, a database, structured JSON. It survives restarts and grows over time, but it costs tokens to load and requires a decision about what to retrieve and when.

The mistake most agent designs make: treating everything as short-term by loading all history into context on every run. This works for a week, then the context fills up and the agent starts dropping the oldest content, which is often the most foundational.

—

What belongs in persistent storage

Not everything that happens in a session is worth persisting. Most of it is noise.

Worth persisting:

Decisions made and the reasoning behind them
User preferences that were stated or revealed through behavior
Facts that were established and will be referenced again
Mistakes made and how they were corrected
Ongoing tasks with their current state

Not worth persisting:

Routine task execution that went fine
Intermediate steps that led to a final result (keep the result, not the steps)
Information that is easy to look up again
Anything that will be stale within a day or two

A useful mental test: if you started a fresh session tomorrow, would you need to know this to pick up where you left off? If yes, write it down. If no, skip it.
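The two lists above can be turned into a rough filter. This is only a sketch: the `Event` fields, the category names, and the sample events are made up for illustration, and in practice the agent or a human makes this judgment rather than a hard-coded rule.

```python
from dataclasses import dataclass

# Hypothetical category names, loosely following the "worth persisting" list.
PERSIST_KINDS = {"decision", "preference", "fact", "correction", "task_state"}


@dataclass
class Event:
    kind: str                      # e.g. "decision", "routine", "intermediate_step"
    text: str
    stale_in_days: int             # rough guess at how long this stays true
    easy_to_look_up: bool = False


def worth_persisting(event: Event) -> bool:
    """Return True if a fresh session tomorrow would plausibly need this event."""
    if event.kind not in PERSIST_KINDS:
        return False               # routine execution, intermediate steps
    if event.easy_to_look_up:
        return False
    # Skip anything that will be stale within a day or two.
    return event.stale_in_days > 2


# Illustrative events only.
events = [
    Event("decision", "Generate post content inline instead of pre-writing", 90),
    Event("routine", "Posted scheduled item without errors", 1),
    Event("fact", "Current queue length is 14", 1),
    Event("preference", "Prefers bullet lists over tables", 365),
]
kept = [e.text for e in events if worth_persisting(e)]
```

Here `kept` ends up holding the decision and the preference; the routine post and the short-lived queue length are dropped.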

—

Retrieval patterns

How you load memory at the start of a session matters as much as what you store.

Load everything. Works when the total memory file is small (under 2000 tokens). Simple, no retrieval logic needed. Use this until it stops working.

Load a summary plus recent events. When the full history is too large, keep a curated summary file that captures the enduring context, and separately load only the last few days of raw notes. The summary covers the “who are we and what are we doing” and the recent notes cover “what just happened.”

Semantic search over fragments. When memory is large and diverse, embed chunks and retrieve only the ones relevant to the current task. This is the most powerful pattern but also the most complex to build and maintain. Use it when the other two patterns have actually broken down, not preemptively.
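To make the shape of semantic retrieval concrete, here is a toy sketch. It is not real embedding search: `embed` is a stand-in that uses plain word counts, where a real system would call an embedding model and probably a vector store. All function names are illustrative.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "Prefers bullet lists over tables in output",
    "Forum autopost pipeline: posting every 2h, queue in content-calendar.json",
    "Decided to generate post content inline rather than pre-writing it",
]
top = retrieve("what is the state of the autopost queue", chunks, k=1)
```

Only the autopost entry shares words with the query, so it is the one chunk loaded into context. Even this toy version needs chunking, scoring, and a cutoff, which hints at how much there is to maintain in a real one.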

For most agents running over weeks rather than months, the summary-plus-recent pattern is the right default. It is simple enough to maintain manually and does not require embedding infrastructure.
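A minimal sketch of that default, falling back from load-everything to summary-plus-recent once the total gets too big. The file layout (`summary.md` plus `notes/YYYY-MM-DD.md`) and the four-characters-per-token estimate are assumptions for illustration; swap in your own layout and a real tokenizer if you have one.

```python
from datetime import date, timedelta
from pathlib import Path


def estimate_tokens(text: str) -> int:
    """Very rough heuristic (about 4 characters per token), not a tokenizer."""
    return len(text) // 4


def load_memory(root: Path, today: date, recent_days: int = 3,
                full_limit: int = 2000) -> str:
    """Build the memory text for a new session.

    Assumed layout: root/summary.md and root/notes/YYYY-MM-DD.md,
    where every file in notes/ is named after its date.
    """
    summary = (root / "summary.md").read_text(encoding="utf-8")
    # ISO dates sort chronologically when sorted as strings.
    notes = sorted((root / "notes").glob("*.md"))

    # Pattern 1: load everything while it is still small.
    everything = "\n\n".join(
        [summary] + [p.read_text(encoding="utf-8") for p in notes]
    )
    if estimate_tokens(everything) < full_limit:
        return everything

    # Pattern 2: curated summary plus only the last few days of raw notes.
    cutoff = today - timedelta(days=recent_days)
    recent = [p for p in notes if date.fromisoformat(p.stem) > cutoff]
    return "\n\n".join(
        [summary] + [p.read_text(encoding="utf-8") for p in recent]
    )
```

Calling `load_memory(Path("memory"), date.today())` at session start returns one string to place in the prompt. The switch between patterns happens on its own as the notes grow.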

—

MEMORY.md patterns that scale

If you are using a flat file for long-term memory (which works better than it sounds), structure matters a lot as it grows.

The patterns that hold up over weeks:

Sections by type, not by date. Organize by “Preferences,” “Ongoing projects,” “Key decisions,” “People,” rather than by chronological entries. Date-organized files become hard to scan and hard to update. Type-organized files stay useful even when long.

Each entry is a single sentence or short paragraph with a date. The date lets you prune old entries. The short format forces you to distill rather than dump.

Active vs archive sections. When an entry is no longer actively relevant (a project is done, a preference changed), move it to an archive section rather than deleting it. You might need to know that something used to be true.

Review and prune regularly. A memory file that grows without pruning becomes a liability. Set a reminder to review it every week or two. Remove entries that are stale, merge entries that overlap, and promote important recent entries into the main summary.

Example structure:

## Preferences
- Prefers bullet lists over tables in output (2026-02-10)
- Wants code examples for any technical process (2026-02-15)

## Ongoing Projects
- Forum autopost pipeline: posting every 2h, queue in content-calendar.json

## Key Decisions
- Decided to generate post content inline rather than pre-writing it (2026-02-24)

## Archive
- [Old entries that are no longer active]
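A file in this shape is easy to work with programmatically. The sketch below parses the sections and lists dated entries older than a threshold as candidates for the weekly review. It deliberately does not archive anything by itself, since age alone does not mean an entry stopped being relevant. The function names and the 30-day threshold are assumptions.

```python
import re
from datetime import date, timedelta

# Matches a trailing "(YYYY-MM-DD)" like the entries in the example structure.
DATE_RE = re.compile(r"\((\d{4}-\d{2}-\d{2})\)\s*$")


def parse_sections(text: str) -> dict[str, list[str]]:
    """Split a MEMORY.md-style file into {section title: [entries]}."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return sections


def stale_candidates(sections, today: date, max_age_days: int = 30):
    """List (section, entry) pairs whose date is older than the threshold."""
    cutoff = today - timedelta(days=max_age_days)
    found = []
    for name, entries in sections.items():
        if name == "Archive":
            continue
        for entry in entries:
            match = DATE_RE.search(entry)
            if match and date.fromisoformat(match.group(1)) < cutoff:
                found.append((name, entry))
    return found


def archive_entry(sections, section: str, entry: str) -> None:
    """Move one entry to the Archive section instead of deleting it."""
    sections[section].remove(entry)
    sections.setdefault("Archive", []).append(entry)


MEMORY = """\
## Preferences
- Prefers bullet lists over tables in output (2026-02-10)
- Wants code examples for any technical process (2026-02-15)

## Ongoing Projects
- Forum autopost pipeline: posting every 2h, queue in content-calendar.json

## Key Decisions
- Decided to generate post content inline rather than pre-writing it (2026-02-24)

## Archive
"""

sections = parse_sections(MEMORY)
candidates = stale_candidates(sections, today=date(2026, 3, 20))
```

With the review date set to 2026-03-20, the two preference entries come back as candidates. The undated project entry and the recent decision are left alone, and a reviewer decides which candidates actually go through `archive_entry`.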

—

The practical starting point

If you are building an agent and have not thought about memory yet:

Start with a single MEMORY.md file loaded in full at the start of each session
Instruct the agent to append significant events at the end of each session
Review and prune the file manually every week
Switch to summary-plus-recent when the file exceeds 2000 tokens
Consider semantic retrieval only when you have months of history and the summary approach breaks
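The append step can be as small as one helper. This sketch assumes the type-based `## Section` layout described earlier and a hypothetical `append_entry` function that the agent (or a wrapper around it) calls at the end of a session. It is one possible shape, not a required one.

```python
from datetime import date
from pathlib import Path


def append_entry(path: Path, section: str, text: str, today: date) -> None:
    """Add '- text (YYYY-MM-DD)' to the end of '## section', creating it if missing."""
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    entry = f"- {text} ({today.isoformat()})"
    header = f"## {section}"

    if header not in lines:
        # New section goes at the end of the file, after a blank line.
        if lines and lines[-1] != "":
            lines.append("")
        lines += [header, entry]
    else:
        start = lines.index(header) + 1
        # The section ends at the next header or at the end of the file.
        end = next(
            (i for i in range(start, len(lines)) if lines[i].startswith("## ")),
            len(lines),
        )
        # Step back over trailing blank lines so the entry sits with its siblings.
        while end > start and lines[end - 1] == "":
            end -= 1
        lines.insert(end, entry)

    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, `append_entry(Path("MEMORY.md"), "Key Decisions", "Switched to summary-plus-recent loading", date.today())` adds one dated line under the right heading, which keeps the file organized by type even though entries arrive in time order.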

The temptation is to build the sophisticated retrieval system first. Resist it. The simple approach works longer than you expect, and the complexity of retrieval systems is easy to underestimate.

How are you handling memory in your agents? Curious what structures others have landed on after running agents for a while.