Google's Biggest Fail of the Year
If you've been following the channel, you're probably familiar with the different types of context engineering workflows we've covered here. Well, Google has released another one. I wish I could say it's better than the other workflows, but the truth is that it's not, and there are many problems with it. Even if you argue that it's better for the Gemini ecosystem, it's still not good.

Before we dive into why there was no need to release this, let's take a quick break to talk about Automata. After teaching millions of people how to build with AI, we started implementing these workflows ourselves and discovered we could build better products faster than ever before. We help bring your ideas to life, whether it's apps or websites. Maybe you've watched our videos thinking, "I have a great idea, but I don't have a tech team to build it." That's exactly where we come in. Think of us as your technical co-pilot: we apply the same workflows we've taught millions directly to your project, turning concepts into real, working solutions without the headaches of hiring or managing a dev team. Ready to accelerate your idea into reality? Reach out at hello@automator.dev.

Now, before I explain why this is just another poor attempt at a context engineering workflow, let's first dive into how Conductor actually works. This is the article, and I'll have a link to it down in the description below. At the end of it, you get a command to install Conductor as an extension in Gemini CLI. For those of you who don't know, extensions are sets of commands, MCPs, and other rules that are bundled together into a package that people can host and share with others. Claude has something similar called plugins. To start the workflow, you run the install command, and once it's installed you can use Conductor's slash commands. You get five commands that control Conductor and how you use the workflow. As a rough sketch of that flow (the repository URL and exact command names here are my best guesses; use whatever the article gives you):
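```bash
# Install the Conductor extension into Gemini CLI
# (the URL is illustrative -- copy the exact one from the article)
gemini extensions install https://github.com/gemini-cli-extensions/conductor

# Inside a Gemini CLI session, the five Conductor slash commands
# (names approximated from the video):
#   /conductor:setup      -- initialize the project and planning files
#   /conductor:implement  -- work through the current track's tasks
#   /conductor:status     -- report which tracks are done or in progress
#   /conductor:newTrack   -- scope a new feature as a fresh track
#   /conductor:revert     -- git-aware rollback when something goes wrong
```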
The very first command you're going to use is the setup command. It first checks whether the existing Conductor files, such as the setup state and the other files that tell it whether a project has already been initialized, are present. Instead of stories, Conductor works with files called tracks and completes them one by one. After that, it initialized a new GitHub repo and asked me what I wanted to build. To test it out, I created a simple project, but I did want to see whether the architecture it came up with would actually be good. So, to check whether it would recommend the things I would actually need, I told it the app should be production-ready and scale to a large number of users.

It then created the product.md file, which contained the concept of what I wanted to build. To refine and craft it, Conductor started asking me questions, but the questions weren't leading anywhere and were really simplistic, so at the end I just had it autogenerate everything. After I approved and saved the product guide, it wanted to create another file, the product guidelines, which focused mainly on the styling of the product and some design principles. I approved and saved the product guidelines as well. After that, it defined the technology stack, and this is one of the reasons the workflow isn't good: it messed up the tech stack it was offering me. It knew what my whole project was, and it still didn't recommend what was appropriate. After I had that corrected, I approved the tech stack and it updated that .md file as well.

Conductor also ships with files called code style guides. If I go into the folder, there are only a handful of languages in there, and if it thinks we'll be using any of them in the project, it adds them to the current project's code style guides during initialization. The default workflow it uses is actually pretty good: by default it targets 80% test coverage, and while it was setting things up and writing the base components, it made sure the tests were being written as well. After completing tasks, it tested them too. At the same time, it was committing changes after every task and using git notes so we could trace where and when something went wrong.

After completing the initial setup, it created some high-level product requirements so we could get going on the initial track. This was the first track it tried to implement, and again, it was too broad and needed to be broken into smaller tracks; it was too much to do at once, with a lot of chances to mess up. Once setup is complete, you start your work by running the implement command. In the tracks folder, you have the different tracks, which it implements one by one. Each track has two files, a plan.md and a spec.md. The spec.md contains the objective and the technical details extracted from the tech stack and the information we input at the start. The plan.md contains the tasks it needs to implement one by one. When you run the implement command, it looks at tracks.md and reads each track's status to know what to do: an empty status means the track hasn't started, one marker means it's in progress, and another means it's completed. As you can see, the current track is in progress. To make the structure concrete, here's an illustrative sketch of the layout; beyond plan.md, spec.md, and tracks.md, the file and folder names are my guesses rather than confirmed from the article:
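```text
conductor/
├── product.md              # the product concept
├── product-guidelines.md   # styling and design principles (name assumed)
├── tech-stack.md           # chosen technology stack (name assumed)
├── code-style-guides/      # per-language style guides pulled in at init
├── tracks.md               # index of tracks, one status marker each
└── tracks/
    └── 001-initial-setup/  # hypothetical track name
        ├── spec.md         # objective + technical details
        └── plan.md         # ordered task list, worked through one by one

# tracks.md status markers (illustrative):
#   [ ] not started    [~] in progress    [x] completed
```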
As for the other commands, the status command gives you a report of what's currently going on: which tracks are being worked on and which ones aren't complete. If you use the new track command, it asks you the same kinds of questions again for the new task. I also ran Conductor in a pre-existing repository, and it went pretty much the same way. It was a little different in that it looked at the existing files and just asked me clarifying questions, and it didn't prompt me for a new track; I had to create a new track myself for the new feature. And then there's revert, another genuinely clever feature: it's git-aware, using git to mitigate the damage if the agent messes up anywhere.

Now, the file management and structure aren't bad. The way it turns new features or existing tasks into tracks and then keeps track of them is actually pretty good. But the way the instructions in these command files have been written needs work, because they don't properly manage the context loop: the agent should check everything and, when something changes, update only what that change affects. Even during this initial process there were a lot of mistakes. The first was that while it was walking me through the creation of each document, it didn't really dissect my idea properly, and I had to guide it through a lot of it. When I thought the result was adequate, I just let it autogenerate the rest of the content.

And again, as I mentioned before, it also missed a lot while defining the technology stack. Option B was good, but since I had told it I wanted a fully scalable app with a large number of users, it missed several things I had to clarify and explicitly tell it I also needed; only then did it modify the plan. When the initial track was generated, I went in and looked at the plan and the spec it had produced, and the database schema was totally incomplete. It had missed a lot of things that were crucial to setting up the app, and I had to guide it again and steer it in the right direction. Now, Gemini is actually a really good model, so I have to suspect that the commands that have been implemented are what's making it behave this way.

The biggest reason I believe that, even though the setup itself is good, there are a lot of problems in the main slash commands, and especially in the workflow.md, is that it messed up a really big part. I told it I wanted to switch from npm to pnpm, since I had forgotten to mention that earlier. For some reason it tried to make a backup first, and while doing that it stated that it needed to remove the files npm had generated, but it ended up removing the entire conductor folder itself, which contained all the planning files. After deleting it, it kept looking for the folder, and when it couldn't find it, it said it would reconstruct the conductor folder from its context and everything it had in memory. So it basically had to rewrite everything, as opposed to what a proper context workflow should do, where a change only touches the main context files and the files related to that specific task, which is how BMAD operates efficiently.

Now, if I hadn't asked it to change something abruptly, maybe it would have gone well. But even when it was initializing all the tasks and I asked it to start implementing the first track, problems showed up. It began by initializing the project and the other core services I needed, but when it came to configuring the environment variables for the Supabase connection, for some reason it automatically marked the task as completed while clearly putting a dummy key in there. It didn't even ask me to set up the Supabase project or provide an actual key, and it automatically tried to push the database schema. Since there was no real key, that failed, and then it asked me to double-check the connection string. So even the tasks aren't being properly updated, and it wasn't really following them correctly.

I honestly wouldn't use this right now for end-to-end spec development. BMAD is a much better option, and for small projects I still make my own context files. That brings us to the end of this video. If you'd like to support the channel and help us keep making videos like this, you can do so by using the Super Thanks button below. As always, thank you for watching, and I'll see you in the next one.
Summary
The video critiques Google's Conductor workflow for Gemini CLI, highlighting its poor context management, flawed command structure, and inability to handle changes effectively, concluding that it's not suitable for end-to-end development compared to alternatives like BMAD.
Key Points
- Google released a context engineering workflow called Conductor for Gemini CLI, which the creator finds ineffective.
- Conductor uses a track-based system with plan.md and spec.md files to implement features incrementally.
- The workflow includes commands like setup, implement, status, new track, and revert, but suffers from poor context handling.
- During setup, Conductor failed to properly understand the project requirements, missing key elements like a scalable database schema.
- It made incorrect technology stack recommendations and failed to properly manage environment variables for Supabase.
- A critical bug occurred when changing package managers: Conductor deleted the entire conductor folder instead of just updating the relevant files (see the git-recovery sketch after this list).
- The workflow attempted to reconstruct the deleted planning files from the model's memory instead of preserving context, violating efficient context engineering principles.
- Even with a strong model like Gemini, poor command design leads to flawed execution and context drift.
- The creator recommends using BMAD for better context management and building small projects manually when needed.
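Since Conductor commits after every task, a deleted planning folder like this should, at least in principle, be recoverable from git history rather than rebuilt from memory. A minimal sketch of that kind of git-aware recovery, assuming the folder was committed before it was deleted (the folder name follows the video; the rest is generic git):

```bash
# Find the last commit that still contained the planning files
git log --oneline -- conductor/

# Restore the folder from that commit into the working tree
git checkout <commit-hash> -- conductor/

# If notes were attached to commits (as Conductor does), show them
# alongside the log to trace where a task went wrong
git log --show-notes
```

This is the recovery path a git-aware revert command would be expected to reach for before attempting any reconstruction from the model's context.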
Key Takeaways
- Evaluate context engineering workflows not just on model capability, but on how well they manage context and handle changes.
- Use tools that maintain context integrity and only modify relevant files when changes occur.
- Avoid workflows that require full reconstruction of project state after minor changes.
- Consider alternatives like BMAD for more reliable end-to-end development workflows.
- When building AI-powered tools, ensure command design supports accurate context tracking and error recovery.