Claude FINALLY Fixed AI Coding By Releasing This
Ever been stuck in a long coding session with an AI agent, only for it to suddenly forget something important you told it to remember? Claude Code is no doubt one of the best AI agents out there, especially with its new model, Sonnet 4.5. But even this agent can hallucinate, because we're not using Claude Code to its maximum potential. The solution: context engineering. Anthropic recently released a paper titled "Effective Context Engineering for AI Agents," which explained how to use context effectively, and it was a real game-changer for getting the most out of Claude Code. So we took the concepts from the paper and simplified them for you so you can implement them right away. Let's get started.

But first, here's a word from our sponsor, ACI.dev. Gate22 by ACI.dev is the open-source MCP gateway helping build trust in how agentic clients operate. It's redefining how agentic frameworks connect and collaborate. Built for clients like Claude Code, GPT, and other agentic systems, ACI.dev gives you a single intelligent bridge to manage everything MCP-related. With ACI.dev, you get an admin control center to manage MCPs, monitor performance, and even create custom MCPs or connect to listed ones, all from one clean dashboard. Need collaboration? ACI.dev makes it easy. Create teams, share tools securely with permission-based access, and distribute configurations using bundles. These bundles condense multiple MCP servers into one unified endpoint. With a single click, you can generate your setup, copy it, and paste it into your local MCP configuration file to start instantly. ACI.dev: your control hub for every agentic connection. Click the link in the description and start building today.

While working with AI models, you might have noticed that as the conversation grows, the model suddenly starts hallucinating. This happens because all these models have a specific context window in which they work, and when the conversation becomes too long, the context window gets filled.
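The idea of a filling context window can be sketched in a few lines. This is purely illustrative: the token estimate below is a crude word-count heuristic, not a real tokenizer, and the 200K figure is simply the advertised window size.

```python
# Illustrative sketch: every turn consumes part of a fixed token budget,
# and quality degrades as that budget fills up.
CONTEXT_WINDOW = 200_000  # tokens (Sonnet 4.5's advertised window)

def estimate_tokens(text: str) -> int:
    # Very rough heuristic: about 1.3 tokens per word.
    return int(len(text.split()) * 1.3)

def context_usage(history: list[str]) -> float:
    # Fraction of the window consumed by the conversation so far.
    used = sum(estimate_tokens(msg) for msg in history)
    return used / CONTEXT_WINDOW

history = ["Build a login page with email and password fields"] * 50
print(f"{context_usage(history):.4%} of the window used")
```

Once this fraction approaches 1.0, something has to give, which is exactly what the next principle addresses.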
As a result, the model begins to hallucinate and loses track of what's happening. This is where our first principle comes in: compaction. It's the practice of taking the conversation history and summarizing it, keeping the key components and discarding anything that's no longer needed. This way, the next session starts with a fresh context window guided by a compact summary of what had already happened. You've probably heard about this feature and may have even used it, but despite being an old concept, the paper emphasized that people still weren't using it enough. There are two ways Claude handles this. The first is called auto-compact. The Claude context window doesn't just consist of a simple conversation history; it also includes system prompts, tools, MCP servers, memory files, and other messages that make up the context. When you monitor the context using Claude's built-in /context command, you can see how much of the context window has been used up; in my case, it was 92%. When Claude detects that the context window is almost full, it automatically runs auto-compact: it summarizes the entire conversation history and creates a new session with a summary that helps Claude understand what has already happened. Now, you might be wondering, should you always wait for the context window to fill up completely before compacting? The answer is no. Whenever you feel the context window needs to be reduced, you can trigger compaction manually with the /compact command. You can also pass optional customization instructions to ensure that the important details you want to remember are retained. I ran the /compact command and checked how much context was left: the previous 92% was reduced to 36%, and the available space increased. This is how compaction helps manage context effectively. So the question is, if Claude automatically handles compaction, why do it manually?
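The mechanics of compaction can be sketched as follows: summarize everything except the most recent turns, then seed a fresh history with that summary. In Claude Code the summary comes from the model itself; here `summarize` is a placeholder so the sketch stays self-contained.

```python
# Minimal sketch of compaction: older turns are collapsed into a summary,
# recent turns are kept verbatim, and the session continues from there.

def summarize(messages: list[str]) -> str:
    # Placeholder: a real implementation would ask the model for a summary,
    # optionally guided by custom instructions (e.g. "keep all file paths").
    return f"[summary of {len(messages)} earlier messages]"

def compact(history: list[str], keep_last: int = 2) -> list[str]:
    if len(history) <= keep_last:
        return history  # nothing worth compacting yet
    summary = summarize(history[:-keep_last])
    return [summary] + history[-keep_last:]

history = ["msg1", "msg2", "msg3", "msg4", "msg5"]
compacted = compact(history)
# The five-message history shrinks to a summary plus the last two messages.
```

The `keep_last` knob mirrors the trade-off discussed above: keep too little and you lose important details, keep too much and you defeat the purpose of compacting.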
Because auto-compact might trigger at an unexpected time, save only what it considers essential, and discard the rest. It's considered best practice to run compaction manually because it prevents auto-compact from triggering unexpectedly and gives you control over what you want to save: you can decide which instructions you want it to remember, which you can't control with auto-compact. The new Claude Sonnet 4.5 model is extremely powerful and highly intelligent, with built-in context management tools, and it has become even more agentic. This model introduced new capabilities, making it smarter than ever at managing context. It features a 200K context window with improved intelligence that helps it determine which parts of the context are important to keep and which can be discarded, making context management better than ever. Sonnet 4.5 also features context editing, which automatically trims results from previous tool calls and generated outputs, significantly expanding your usable context window even before any manual compaction. This greatly reduces context clutter, since you don't want to keep tool calls from long ago in memory. Now, the context window isn't the only place for storing context. The Claude memory tool uses a file-based system to create, read, update, and delete files within the working environment. This memory tool helps Claude preserve the project state, so when you open a new session, it already has a knowledge base of the project and you don't need to repeat the project details. Claude uses file names and file paths as references to retrieve details from these files just in time and load them into context; descriptive names and paths help it identify the purpose of each file and access the data whenever necessary. You can also control what you want to save in Claude's memory by using the # command.
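The file-based idea behind the memory tool can be sketched as a small store of named files in a working directory. This is a simplified stand-in, not the memory tool's actual API; the `.claude_memory` directory name and `project_state.md` file are illustrative choices.

```python
# Sketch of a file-based memory store: small named files that can be
# created, read, updated, and deleted, then pulled into context just
# in time rather than living permanently in the context window.
from pathlib import Path

class FileMemory:
    def __init__(self, root: str = ".claude_memory"):
        self.root = Path(root)
        self.root.mkdir(exist_ok=True)

    def write(self, name: str, content: str) -> None:
        (self.root / name).write_text(content)

    def read(self, name: str) -> str:
        return (self.root / name).read_text()

    def delete(self, name: str) -> None:
        (self.root / name).unlink()

    def list_files(self) -> list[str]:
        # File names double as retrieval keys, so descriptive names matter.
        return sorted(p.name for p in self.root.iterdir())

memory = FileMemory()
memory.write("project_state.md", "# State\nAuth done; profile page next.")
print(memory.list_files())
```

Because only the file names need to stay in context, the full contents cost nothing until they are actually read back in.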
Every instruction you add after it is stored in Claude's memory. Here I asked Claude to remember to tell me how much context was remaining before starting a new task, to make sure there was enough left for the task. When I pressed enter, it asked which memory to store the information in; I saved it in the project memory, and it updated the claude.md file. Let's see how that worked in action. After saving the instruction in memory, I asked it to create a new page for the user profile. Since the saved instruction told it to issue a warning, it alerted me about the remaining context: it let me know there wasn't enough context available to safely complete the task and advised me to run the /compact command first to free up space before proceeding. This ensured there was enough room to complete the page implementation successfully. So, if you're starting a new session or can't resume from where you left off, do you have to explain your project all over again? Not anymore. With the claude.md file located in the root of your project, all the details are stored in one place, helping Claude understand what the project is about and what the requirements are. This concept isn't new and has been around for quite some time, but the paper emphasized that this strategy isn't being used enough. The claude.md file generally starts with a project overview and includes the technical details that guide Claude on how to build the project and what information it should remember. It contains key development details, including specific instructions you want it to retain, and you can modify it with your own instructions as needed. To assess the impact of having a claude.md file, I created two projects using the same prompts: one with the claude.md file and one without. This was the result of the project without the file.
When I asked it to clone Twitter, it produced a very basic version of the application. The authentication worked fine, and you could post tweets, like, and retweet. On the other hand, this was what happened when I gave the instructions in the claude.md file. As you can see, this is a much more refined version of the application. It closely resembled the current X homepage, included a theme toggle, and had authentication working perfectly. You could like and retweet posts properly, and your retweeted posts appeared in the application. Overall, it was a much more complete and polished version, all thanks to the detailed instructions defined in the claude.md file. That brings us to the next concept: structured note-taking. It's a technique where your agent regularly makes notes that persist in memory outside the context window. Just like how Claude creates a to-do list or another agent maintains a notes.md file, this simple habit allows the agent to track its progress and manage context effectively. The idea is to have Claude maintain certain files that are identified by their names. For example, you might have a progress.md file that records all completed tasks and outlines the next steps, a decisions.md file that documents the architectural decisions Claude has made along the way, and a bugs.md file that records the bugs the application has encountered and how they were resolved. The claude.md file is a special file that contains all project instructions for the agent. As we've discussed, these files let Claude take notes in a structured way that helps it manage and maintain the project more effectively. You can explicitly ask Claude to use structured note-taking by instructing it to update these files after every run.
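The note-taking convention above can be sketched as a small helper that appends a dated entry to the right purpose-named file. The file names (progress.md, decisions.md, bugs.md) are the ones from the examples; the helper itself is an illustrative sketch, not part of Claude Code.

```python
# Sketch of structured note-taking: after each run, append a timestamped
# entry to a purpose-named file so later sessions can consult it without
# holding the whole history in the context window.
from datetime import date
from pathlib import Path

NOTE_KINDS = {"progress", "decisions", "bugs"}

def take_note(kind: str, entry: str, root: Path = Path(".")) -> None:
    if kind not in NOTE_KINDS:
        raise ValueError(f"unknown note file: {kind}")
    note_file = root / f"{kind}.md"
    line = f"- {date.today().isoformat()}: {entry}\n"
    with note_file.open("a") as f:  # append, so history accumulates
        f.write(line)

take_note("progress", "User profile page scaffolded; routing wired up.")
take_note("decisions", "Chose SQLite over Postgres for local dev simplicity.")
```

Because entries only ever append, the files become a durable, greppable log the agent can reread at any point in a future session.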
You can add this instruction to the project memory, and whenever Claude executes tasks, it will update these files along the way, using them as its persistent memory. The good thing about this method is that Claude can reference these files at any time without needing them in the active context window, so whenever it needs to look back at a past decision, it can consult the file and act on it. When agents use structured notes while working, they become more reliable in the long term and their performance improves significantly. This approach becomes even more powerful when combined with the memory tool, as Claude can automatically update these files, creating a self-documenting workflow. Why use a single AI agent when you can have a team of multiple specialized agents working together? With the sub-agent architecture, one main agent acts as a coordinator and handles task delegation, while multiple smaller agents, each with their own access to tools and context, handle specialized tasks with their own specific instructions. We can better understand how the sub-agent architecture works with this simple example from Anthropic. The user's request is first sent to the lead agent, which decides which sub-agent will perform each task. Each sub-agent has its own data, instructions, and tools; it performs its task, generates a summary, and sends the response back to the lead agent. Finally, after coordinating the tasks among all the agents, the lead agent compiles the results and sends them back to the user. There are multiple open-source sub-agent frameworks that you can explore. This is a really cool framework that I found; it contains all the installation details and information about the sub-agents.
It also features a van command that acts as an entry point to the task orchestrator, the main component that decides which specialist agent to forward the task to. I'll link the repository in the description, so do check it out for yourself. Sub-agents are an effective way to manage context because each operates within its own isolated window with specific instructions and tools. This lets them handle smaller, well-defined tasks instead of depending on a single agent with one large and complex set of instructions. While a single-agent system can get overloaded as the context expands, a multi-agent setup spreads the workload across sub-agents, keeping things clear and efficient: each sub-agent handles its part, and the lead agent coordinates and combines their outputs into a complete final result. Despite being such a cool feature, sub-agents can become troublesome if not handled effectively, because the agents need to coordinate with each other. Processing a single instruction can take more time than with one agent, since sub-agents involve message passing and multiple summarization steps. If not managed properly, this can lead to issues such as repetitive tool calls or bloated context, which can hinder rather than enhance overall performance. To see them in action, I created four agents in this project, each with its own specialized purpose. You can either use an open-source framework like the one I showed you or create your own agents with Claude. I'll be using these agents to help us build a beautiful website. What happened here was that I gave the task coordinator agent a prompt to develop an aesthetic coffee website with multiple pages. It analyzed the prompt and passed the result to the web developer agent, which then started building the pages. With the coordination of multiple agents, this website was created.
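The lead-agent/sub-agent flow described above can be sketched in miniature. The specialists here are stub functions rather than real model calls, and the agent names (design, web_dev) are invented for illustration; what matters is the shape: the lead agent routes the request, collects only each sub-agent's summary, and compiles the result.

```python
# Sketch of the lead/sub-agent pattern: each specialist works in its own
# isolated context and returns only a summary; the lead agent never sees
# the specialists' full working context, which keeps its window small.

def design_agent(task: str) -> str:
    # Stub for a specialist with its own tools, data, and instructions.
    return f"design summary for: {task}"

def web_dev_agent(task: str) -> str:
    return f"pages built for: {task}"

SPECIALISTS = {"design": design_agent, "web_dev": web_dev_agent}

def lead_agent(request: str) -> str:
    # The lead agent delegates, collects each sub-agent's summary,
    # and compiles the results into one final answer.
    summaries = [agent(request) for agent in SPECIALISTS.values()]
    return " | ".join(summaries)

print(lead_agent("aesthetic coffee website with multiple pages"))
```

The cost noted in the transcript is visible even here: every delegation adds a message-passing and summarization step, so a real system needs to keep those hops from multiplying.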
You can see that it's a well-built website that is responsive across all platforms and contains multiple pages with all the details instructed in the prompt. It includes a menu page with a full menu and even went a step further by integrating a map showing the location. All in all, this is a thorough application created from just a simple prompt, thanks to the coordination of agents working together. You just saw how to use multiple context engineering techniques to make your coding sessions better than ever. The choice between these approaches depends on the nature of the task: use compaction when you want to maintain a natural flow in a conversation that has become too long, note-taking when you're developing iteratively with clear goals, and the multi-agent architecture when you need to handle a complex task. Try these techniques for yourself and see which one works best for you. That brings us to the end of this video. If you'd like to support the channel and help us keep making videos like this, you can do so by using the super thanks button below. As always, thank you for watching, and I'll see you in the next one.
Summary
This video explains how to effectively manage context in AI coding with Claude, focusing on techniques like compaction, structured note-taking, and multi-agent architecture to prevent hallucinations and improve performance.
Key Points
- Claude's context window can fill up, causing hallucinations, so context management is crucial.
- Compaction summarizes conversation history to reduce context usage and prevent overflow.
- Claude's autocompact feature runs automatically but manual compaction offers better control over what to retain.
- The Sonnet 4.5 model has a 200K context window and improved context editing to trim tool outputs automatically.
- Claude memory tool uses files like claude.md to store project details and instructions for persistent context.
- Structured note-taking with files like progress.md, decisions.md, and bugs.md helps agents track progress and decisions.
- The claude.md file acts as a central instruction set, improving project understanding and output quality.
- Sub-agent architecture divides tasks among specialized agents, improving efficiency for complex projects.
- Multi-agent systems use a lead agent to coordinate specialists, reducing context overload and improving reliability.
- Open-source sub-agent frameworks (such as the one shown, with its van command as entry point) can help implement multi-agent systems with structured task delegation.
Key Takeaways
- Use manual compaction to maintain control over context retention and prevent unwanted information loss.
- Store project instructions in a claude.md file to ensure Claude retains essential details across sessions.
- Implement structured note-taking with persistent files to track progress, decisions, and bugs.
- Leverage multi-agent architecture for complex tasks to avoid context overload and improve task specialization.
- Combine context management techniques based on task complexity—use compaction for long conversations and sub-agents for complex builds.