How we built Agent Builder’s memory system
We launched LangSmith Agent Builder last month as a no-code way to build agents. A key part of Agent Builder is its memory system. In this article we cover our rationale for prioritizing a memory system, the technical details of how we built it, what we learned along the way, what the memory system enables, and future work.
What is LangSmith Agent Builder?
LangSmith Agent Builder is a no-code agent builder built on top of the Deep Agents harness. It is a hosted web product aimed at lightly technical citizen developers. In LangSmith Agent Builder, builders create an agent to automate a particular workflow or part of their day. Examples include an email assistant, a documentation helper, etc.
Early on we made a conscious choice to prioritize memory as part of the platform. This was not an obvious choice: most AI products launch without any form of memory, and even when memory has been added, it has not yet transformed products the way some expected. We prioritized it because of the usage patterns of our users.
Unlike ChatGPT, Claude, or Cursor, LangSmith Agent Builder is not a general-purpose agent. Rather, it is specifically designed to let builders customize agents for particular tasks. With a general-purpose agent, you are doing a wide variety of tasks that may be completely unrelated, so learnings from one session may not be relevant to the next. When a LangSmith Agent is doing a task, it is doing the same task over and over again, so lessons from one session translate to the next at a much higher rate. In fact, it would be a bad user experience if memory were absent: you would have to repeat yourself to the agent over and over across sessions.
When thinking about what memory would even mean for LangSmith Agents, we turned to a third-party definition of memory. The CoALA paper defines memory for agents in three categories:
Procedural: the set of rules that can be applied to working memory to determine the agent’s behavior
Semantic: facts about the world
Episodic: sequences of the agent’s past behavior
How we built our memory system
We represent memory in Agent Builder as a set of files. This is an intentional choice to take advantage of the fact that models are good at using filesystems. In this way, we could easily let the agent read and modify its memory without having to give it specialized tools - we just give it access to the filesystem!
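To make the filesystem-as-memory idea concrete, here is a minimal sketch of what it could look like to give an agent generic file tools instead of specialized memory APIs. The class and method names are illustrative, not the actual Agent Builder tool names:

```python
import tempfile
from pathlib import Path

class FileMemory:
    """Toy memory folder exposed to an agent as three generic file tools.

    The agent "remembers" by reading and editing plain text files --
    no specialized memory API is needed.
    """

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def ls(self) -> list[str]:
        """List every file in the agent's memory folder."""
        return sorted(p.name for p in self.root.iterdir() if p.is_file())

    def read_file(self, name: str) -> str:
        return (self.root / name).read_text()

    def write_file(self, name: str, content: str) -> None:
        (self.root / name).write_text(content)

# The same three tools cover core instructions (AGENTS.md),
# skills, and arbitrary knowledge files alike.
mem = FileMemory(tempfile.mkdtemp())
mem.write_file("AGENTS.md", "Summarize meeting notes.")
print(mem.ls())  # ['AGENTS.md']
```

Because the interface is just files, the model can apply the same read/edit skills it already uses for code to its own memory.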
When possible, we try to use industry standards. We use AGENTS.md to define the core instruction set for the agent. We use agent skills to give the agents particular specialized instructions for specific tasks. There is no subagent standard, but we use a similar format to Claude Code. For MCP access, we use a custom tools.json file. The reason we use a custom tools.json file and not the standard mcp.json is that we want to allow users to give the agent only a subset of the tools in an MCP server to avoid context overflow.
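To illustrate the tool-subsetting idea, a tools.json entry might look something like the fragment below. This schema is hypothetical (the field names are ours, not the actual Agent Builder format); the point is that the builder whitelists specific tools from an MCP server rather than importing all of them:

```json
{
  "servers": {
    "linkedin": {
      "url": "https://example.com/mcp",
      "tools": ["search_people"]
    }
  }
}
```

A standard mcp.json would pull in every tool the server exposes, which is exactly the context overflow this custom file avoids.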
We actually do not use a real filesystem to store these files. Rather, we store them in Postgres and expose them to the agent in the shape of a filesystem. We do this because LLMs are great at working with filesystems, but from an infrastructure perspective it is easier and more efficient to use a database. This “virtual filesystem” is natively supported by Deep Agents, and it is completely pluggable, so you can bring any storage layer you want (S3, MySQL, etc.).
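A pluggable virtual filesystem can be sketched as a small interface plus interchangeable backends. This is an illustrative design, not Deep Agents' actual API; the in-memory dict backend below stands in for a Postgres table mapping paths to contents:

```python
from abc import ABC, abstractmethod

class VirtualFS(ABC):
    """The agent sees paths and file contents; storage is pluggable."""

    @abstractmethod
    def read(self, path: str) -> str: ...

    @abstractmethod
    def write(self, path: str, content: str) -> None: ...

    @abstractmethod
    def listdir(self, prefix: str = "") -> list[str]: ...

class DictFS(VirtualFS):
    """In-memory backend; a Postgres backend would map each path to a row,
    and an S3 backend would map each path to an object key."""

    def __init__(self):
        self._files: dict[str, str] = {}

    def read(self, path: str) -> str:
        return self._files[path]

    def write(self, path: str, content: str) -> None:
        self._files[path] = content

    def listdir(self, prefix: str = "") -> list[str]:
        return sorted(p for p in self._files if p.startswith(prefix))

fs = DictFS()
fs.write("memory/AGENTS.md", "Summarize meeting notes.")
fs.write("memory/skills/summarize.md", "Use bullet points.")
print(fs.listdir("memory/"))
```

Swapping the backend never changes what the agent sees, which is what makes the storage layer a pure infrastructure decision.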
We also allow users (and agents themselves) to write other files to an agent’s memory folder. These files can contain arbitrary knowledge that the agent can reference as it runs, and the agent edits them as it works, “in the hot path”.
The reason it is possible to build complicated agents without any code or any domain specific language (DSL) is that we use a generic agent harness like Deep Agents under the hood. Deep Agents abstracts away a lot of complex context engineering (like summarization, tool call offloading, and planning) and lets you steer your agent with relatively simple configuration.
These files map nicely onto the memory types defined in the CoALA paper. Procedural memory, which drives the core agent directive, is AGENTS.md and tools.json. Semantic memory is agent skills and other knowledge files. The only type missing is episodic memory, which we didn’t think was as important for these kinds of agents as the other two.
What agent memory in a file system looks like
We can look at a real agent we’ve been using internally – a LinkedIn recruiter – built on LangSmith Agent Builder.
AGENTS.md: defines the agent’s core instructions
subagents/: defines only one subagent
linkedin_search_worker: after the main agent is calibrated on a search, it will kick off this agent to source ~50 candidates.
tools.json: defines an MCP server with access to a LinkedIn search tool
There are also currently three other files in memory, containing job descriptions (JDs) for different candidate searches. As we’ve worked with the agent on these searches, it has updated and maintained those JDs.
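Putting the pieces together, the recruiter agent's memory folder might be laid out roughly as follows (filenames are illustrative; only the structure mirrors the files described above):

```
linkedin-recruiter/
├── AGENTS.md                      # core agent instructions
├── tools.json                     # MCP server with LinkedIn search tool
├── subagents/
│   └── linkedin_search_worker.md  # sources ~50 candidates per search
├── jd_role_a.md                   # JDs the agent updates and
├── jd_role_b.md                   # maintains as searches progress
└── jd_role_c.md
```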
How memory editing works: a concrete example
To make it more concrete how memory works, we can walk through an illustrative example.
You start with a simple AGENTS.md:
Summarize meeting notes.
The agent produces paragraph summaries. You correct it: "Use bullet points instead." The agent edits AGENTS.md to be:
Formatting Preferences
User prefers bullet points for summaries, not paragraphs.
You ask the agent to summarize a different meeting. It reads its memory and uses bullet points automatically. No reminder needed. During this session, you ask it to: "Extract action items separately at the end." Memory updates:
Formatting Preferences
User prefers bullet points for summaries, not paragraphs.
Extract action items in separate section at end.
Both patterns apply automatically. You continue adding refinements as new edge cases surface.
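The correction loop above can be sketched in a few lines. This is a toy illustration of the hot-path write, with a hypothetical helper name; the real agent edits its files with its own judgment rather than via a fixed function:

```python
def apply_correction(memory: dict[str, str], note: str) -> None:
    """Append a user correction to AGENTS.md under a preferences header.

    `memory` is a toy stand-in for the agent's file store
    (path -> contents), matching the virtual-filesystem idea.
    """
    doc = memory.get("AGENTS.md", "")
    if "## Formatting Preferences" not in doc:
        doc += "\n\n## Formatting Preferences\n"
    memory["AGENTS.md"] = doc + f"- {note}\n"

mem = {"AGENTS.md": "Summarize meeting notes."}
apply_correction(mem, "User prefers bullet points for summaries, not paragraphs.")
apply_correction(mem, "Extract action items in separate section at end.")
print(mem["AGENTS.md"])
```

Each correction lands in the same file the agent reads at the start of the next session, which is why both patterns apply automatically without a reminder.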
Over time, the agent's memory grows to include:
Formatting preferences for different document types
Domain-specific terminology
Distinctions between "action items", "decisions", and "discussion points"
Names and roles of frequent meeting participants
Meeting type handling (engineering vs. planning vs. customer)
Edge case corrections accumulated through use
The memory file might look like:
Meeting Summary Preferences
- Use bullet points, not paragraphs
- Extract action items in separate section at end
- Use past tense for decisions
- Include timestamp at top
- Engineering meetings: highlight technical decisions and rationale
- Planning meetings: emphasize priorities and timelines
- Customer meetings: redact sensitive information
- Short meetings (<10 min): just key points
- Sarah Chen (Engineering Lead) - focus on technical details
- Mike Rodriguez (PM) - focus on business impact
The AGENTS.md built itself through corrections, not upfront documentation. We arrived iteratively at an appropriately detailed agent specification without the user ever manually editing AGENTS.md.
Learnings from building this memory system
There are several lessons we learned along the way.