ShadCN is Way too Powerful Now

AILABS-393 · MFJ0mH72_qI · Watch on YouTube · Published January 06, 2026 · Scored

Duration: 9:43
Views: 35,703
Likes: 916

Scores

Composite: 0.68
Freshness: 0.04
Quality: 0.88
Relevance: 1.00

2,073 words · Language: en · Auto-generated

Most of you already know ShadCN as one of the most widely used UI libraries, but using an AI agent to build with it can be problematic. If you're building one-shot landing pages, you won't have a huge problem. But if you're building a new app or implementing a new feature, things break, and they break other parts of the app as well. This isn't anything new; the problem has already been solved, and it's how engineers build apps. AI agents always test the code they write, but these agents become unreliable with large contexts. Therefore, we need a way to ensure that agents complete the work they're given. This is where the concept of agentic loops comes in, and Anthropic solves it with the Ralph loop.

To solve my UI problem, I tried to implement the Ralph loop, and at first it completely failed. But I soon learned that the failure wasn't the Ralph loop itself; it was the process I had implemented with it. Ralph is a new plugin released by Anthropic themselves, but it wasn't one of their original ideas: it's based on a technique by someone else that Anthropic implemented and open-sourced. Basically, Ralph is a loop. If you know about Claude Code hooks, it uses stop hooks, which run when Claude stops outputting an answer. As soon as it stops, the AI agent is fed its initial prompt file again, which allows it to iteratively improve its work.

Now, here's the important question: when does it actually break out of the loop? There's something called a completion promise, which can be any word you choose. When Claude decides its task is complete, it outputs this promise on its own. In this case, the promise is the word "complete". If the promise appears in the returned output, the loop doesn't run again. So until Claude outputs the promise, it doesn't stop, which makes sure Claude doesn't just quit whenever it wants.

After you install the plugin, you'll have three commands: the Ralph loop command, a cancel command, and a help command. In the loop command, you provide the prompt that is fed to the agent again and again. Sometimes the agent gets an impossible task it can't solve and gets stuck in an infinite loop, so setting a max iteration count is a really good practice. I will leave the link to the repo below because it has some good best practices for the prompts you can give to the Ralph loop, but in this video I'm only going to discuss the ones related to the UI workflow I'm about to show you.

So let's say we want to implement two features in this app. One is a command palette, where we add a menu to search through our app and execute other commands. To make sure that this new feature doesn't break other parts of the app, you would start with the tests. This is called test-driven development. If you're not familiar with it, you can ask Claude Code to set up the TDD structure for you: an end-to-end test folder, a screenshots folder to check for UI problems, and the corresponding tests. The other feature we're going to implement is a board view in the database, similar to what Notion allows us to do with its databases. If you've caught on, test-driven development is an approach where tests are written before the code is implemented, which means the initial tests will always fail. So if I'm implementing the command palette feature, I wouldn't just start writing the code for it. Instead, I would first write elaborate tests for it.
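To make that concrete, here is a minimal sketch of what a failing-first end-to-end test for the command palette could look like, assuming Playwright; the shortcut, selectors, routes, and command names are illustrative assumptions, not taken from the video.

```typescript
// e2e/command-palette.spec.ts: written before the feature exists, so it fails first.
// The shortcut, roles, and command names here are hypothetical examples.
import { test, expect } from '@playwright/test';

test('command palette opens with Ctrl+K and filters commands', async ({ page }) => {
  await page.goto('/');

  // The palette should stay hidden until the shortcut is pressed.
  const palette = page.getByRole('dialog', { name: 'Command palette' });
  await expect(palette).toBeHidden();

  await page.keyboard.press('Control+K');
  await expect(palette).toBeVisible();

  // Typing should narrow the visible commands.
  await palette.getByRole('combobox').fill('board');
  await expect(palette.getByRole('option', { name: 'Open board view' })).toBeVisible();

  // Capture this behavior for the later screenshot review pass.
  await page.screenshot({ path: 'screenshots/command-palette-filtered.png' });
});
```

Because the feature doesn't exist yet, every assertion here fails on the first run, which is exactly the TDD starting state the next step builds on.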
Then we write the minimum amount of code required to pass those tests. Once that's done, we refactor and add more functionality, and with every addition we make sure the tests still pass. Another interesting thing is that these tests are automated, and Playwright can be imported and used for visual verification. If you think we're using the Playwright MCP to autonomously verify this through the browser, you're wrong. With TDD, you can take a screenshot for each functional behavior. For example, if the functional behavior is adding a card, then the screenshot would show a card added to the board. So now all the AI agent has to do is look at those screenshots and make sure there are no problems in the way the ShadCN components have been implemented. These test files ensure that whenever something new is added, or while a feature is being built, all of our behavioral requirements are fulfilled. But in our case, we want to use the screenshots purely for UI verification.

But if we already have TDD, why do we need the Ralph loop? As I already stated, with larger tasks and context windows becoming nearly full, these models abruptly quit their tasks and require constant human input. Therefore, I can have tests written beforehand for any type of function that I want, then use the loop to instruct the agent on what to do, and it can work autonomously. By telling it what workflow to follow, and then giving it the condition for when it may output the promise, it completes the task and exits the loop; in this case, that condition is passing all 25 unique tests (a minimal sketch of this loop-and-promise logic follows this passage).

So, using the Ralph slash command, I gave it a prompt so that it would iteratively build the command palette feature. The prompt basically told it to implement the feature along with some basic requirements, which aren't really important because the requirements can be found in the tests as well, but we did outline the whole workflow. In that workflow, it was supposed to start by running the tests. It knows that the tests will fail, and after that it needs to implement the components to make them pass. That's the whole goal. Now, if this were a much broader task, chances are that when the context window fills up or Claude gets confused, it will quit automatically and never output the completion promise. And since it never outputs the promise, the prompt is fed back in and it has to start all over, meaning it keeps working on the task iteratively. But since this was a smaller task, it was actually able to implement everything in a single go, write out all the components, and make all of the tests pass.

After the tests pass, the workflow tells it to review all of the screenshots for the command palette. These are screenshots taken at different stages to make sure the UI, whether it's ShadCN or any other component library you're using, is implemented correctly and that there aren't any minor issues. After that, it should run the tests again and make sure they still pass after the UI changes. Since all of the tests were passing and the screenshots were reviewed, it output the completion promise, and this is where the loop stopped.

But there was a really big problem with this, which I didn't notice in the command palette feature because there were very few chances of UI errors there. However, when I moved on to implementing the board view, I realized there was a huge mistake in the system. I started by implementing the board with the same prompt.
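As referenced above, here is a minimal sketch of the loop-and-promise logic in TypeScript. This is not Anthropic's plugin code; the real plugin works through Claude Code stop hooks, and runAgent, the promise token, and the prompt path here are hypothetical stand-ins for illustration.

```typescript
// ralph-loop-sketch.ts: conceptual only. The real plugin is built on Claude Code
// stop hooks; runAgent is a hypothetical stand-in for invoking the agent.
async function runAgent(promptFile: string): Promise<string> {
  // In reality this would feed the prompt file to the agent and return its output.
  return `ran ${promptFile}, work still in progress`;
}

const PROMISE = 'COMPLETE';  // the completion promise the agent must output by itself
const MAX_ITERATIONS = 10;   // guards against an impossible task looping forever

async function ralphLoop(promptFile: string): Promise<void> {
  for (let i = 1; i <= MAX_ITERATIONS; i++) {
    const output = await runAgent(promptFile);

    // Exit only when the agent itself declares completion.
    if (output.includes(PROMISE)) {
      console.log(`Completion promise received on iteration ${i}; stopping.`);
      return;
    }
    // No promise: the same prompt is fed back in on the next iteration.
  }
  console.warn('Hit the max iteration count without a completion promise.');
}

ralphLoop('prompts/command-palette.md');
```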
The requirements were different, of course, but the workflow was pretty much the same. Now, I was kind of surprised when it completed all of the requirements in one go. Don't get me wrong, it was actually making sure that all of the tests were passing. But while it was doing that, there were cases where the number of successful tests would actually decrease, because by changing something it would break something else. This is why TDD is really important: the recursive testing makes sure that everything keeps working. The main problem appeared after it had verified it was done and I went ahead and checked the UI. Most things were implemented correctly, but it had completely missed some UI errors, such as this one. I also checked the screenshots, and the errors were showing up in those screenshots as well.

So I asked it, and we analyzed what actually went wrong. The real issue was a process failure, specifically in terms of fixing the UI. It did pass all of the tests, because it was supposed to run the test files again and again, but there was no specific test for the UI other than the screenshots. It glanced at a few of them, ignored some of the UI errors it had seen, and skipped some files entirely. So the main issue was that it output its promise statement prematurely and didn't verify whether the UI was actually fixed.

We went through a whole brainstorming session on how we could fix this, and I even gave the prompt-writing best practices from the repo to Claude Code. In the end, we came up with some specific rules and a change in the process that would ensure the UI was always correct. This had nothing to do with the tests, because they're always going to run. The prompt we used for the command palette is really helpful when the feature or implementation is very large: Claude doesn't hallucinate that it has completed the task, but due to a full context window or the complexity of the task, it quits abruptly. Claude Code is already really autonomous, there's no doubt about that, but there are still issues like this that we need to fix.

So we changed a number of things in the main prompt. The first was the screenshot verification protocol. We added a simple prefix to each image that told Claude whether it had read the screenshot or not. But when I first implemented this, it still didn't read all of the images. It would read a few, write "verified" on top of them, and, just like before, quit early. To solve this, we encouraged it to think in a different way. We told it that after it renamed all the screenshots, it should not output the promise yet, meaning it should not consider the task completed, and it should let the next iteration confirm completion. So at least two loops should run. In the next iteration, Claude verifies that all the files have the verified prefix. Of course, this meant we had to change the tests and separate the image verification from the functional tests (a sketch of that separation follows this passage). The next iteration makes sure that all the images have verified results, and if Claude missed any, it looks at them again and fixes the output. With this change, the small UI errors we were facing were finally fixed, and it was able to implement all of these features correctly. So when it entered the next loop, it ran the tests again. Since it found some errors, it fixed them, and because all the files had the word "verified" in them, it ran one final test.
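Here is a minimal sketch of what that separated image-verification check could look like, assuming Playwright as the test runner and a "verified-" filename prefix as the marker; the folder layout and file names are illustrative, not taken from the video.

```typescript
// e2e/verify-screenshots.spec.ts: image verification kept separate from functional tests.
// Assumes the agent marks each reviewed image by renaming it with a "verified-" prefix.
import { test, expect } from '@playwright/test';
import { readdirSync } from 'node:fs';

test('every screenshot has been visually reviewed', () => {
  const files = readdirSync('screenshots').filter((f) => f.endsWith('.png'));
  const unreviewed = files.filter((f) => !f.startsWith('verified-'));

  // Fails while any image is unreviewed, which forces at least one more loop iteration.
  expect(unreviewed, `unreviewed screenshots: ${unreviewed.join(', ')}`).toHaveLength(0);
});
```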
This time, it completed its task in two loops and was able to fix all the major UI errors in the app.

Let's talk about Automator now. After teaching millions of people how to build with AI, we started implementing these workflows ourselves and discovered we could build better products faster than ever before. We help bring your ideas to life, whether it's apps or websites. Maybe you've watched our videos thinking, "I have a great idea, but I don't have a tech team to build it." That's exactly where we come in. Think of us as your technical co-pilot: we apply the same workflows we've taught millions directly to your project, turning concepts into real working solutions without the headaches of hiring or managing a dev team. Ready to accelerate your idea into reality? Reach out at hello@automator.dev. If you'd like to support the channel and help us keep making videos like this, you can do so by using the Super Thanks button below.

Summary

The video explores how to use Anthropic's Ralph loop with AI agents to reliably build complex UI features in ShadCN applications, emphasizing test-driven development and iterative verification to prevent UI errors and ensure reliable code completion.

Key Points

  • ShadCN is a popular UI library, but AI agents often fail to build complex features reliably due to context limitations and early termination.
  • The Ralph loop, a plugin by Anthropic, enables iterative code generation by refeeding prompts until a completion promise is output.
  • AI agents use stop hooks and a completion promise (e.g., 'complete') to determine when to exit the loop, ensuring task completion.
  • Test-driven development (TDD) is crucial: tests are written first, and AI agents must pass them before completing a feature.
  • Playwright screenshots are used for UI verification, ensuring ShadCN components are implemented correctly.
  • The main issue is premature completion: AI may pass functional tests but miss UI bugs, especially when ignoring screenshots.
  • To fix UI errors, a screenshot verification protocol was implemented where each image must be explicitly marked as 'verified'.
  • The process was refined so that at least two loops run: one to fix UI issues and a second to confirm all images are verified.
  • The solution ensures that AI agents don't prematurely exit the loop and only complete tasks after all functional and UI checks pass.
  • This approach enables autonomous, reliable feature development without human intervention, even for complex UI implementations.

Key Takeaways

  • Use the Ralph loop with a completion promise to ensure AI agents complete complex tasks reliably.
  • Implement test-driven development (TDD) with Playwright screenshots to verify both functionality and UI correctness.
  • Always verify UI changes by explicitly checking screenshots and marking them as 'verified' to prevent AI from skipping errors.
  • Set a max iteration count to prevent infinite loops when AI gets stuck on complex tasks.
  • Refine your prompt to require multiple iterations for verification, ensuring robustness in AI-generated code.

Primary Category

AI Agents

Secondary Categories

AI Tools & Frameworks, Programming & Development, AI Engineering

Topics

ShadCN UI, AI coding, agentic loops, Claude Code, Ralph loop, test-driven development, Playwright, TDD, automated testing, UI verification, context engineering, Cursor AI, Anthropic, MCP server, AI agent workflows

Entities

people
organizations
Anthropic, Automator
products
ShadCN UI, Claude Code, Ralph loop, Cursor AI, Playwright, MCP server
technologies
AI agents, LLMs, agentic workflows, test-driven development, automated screenshot verification, context engineering
domain_specific

Sentiment

0.70 (Positive)

Content Type

tutorial

Difficulty

intermediate

Tone

educational, technical, inspiring, promotional