Learning AI by Breaking It: What Building an MCP System in a Flower Shop Taught Me
I didn’t start with a framework. I started with a question that kept me up at night.
Why did every AI system I tried feel like it was operating in a different reality than the one I was working in?
The models were impressive. The prompts were decent. But the AI kept suggesting things that made sense in theory and collapsed under real constraints. It ignored inventory limits until after it had made the recommendation. It missed timing pressure entirely. It treated every decision as if options were infinite and consequences didn’t exist.
That gap, it turned out, is exactly what Model Context Protocol would fix.

I wanted to know if that was fixable.
So I wired an AI system directly into a real business with real constraints, real inventory, and real money on the line. The test case was a local flower shop. Small operation. Tight margins. Inventory that expires. Demand that spikes without warning.
I learned more in three months of building and breaking that system than I learned in two years of reading about AI.
What I Tried First
I started the way most people start. I built a prompt library.
I wrote prompts for SEO analysis, customer intent classification, inventory planning, and demand forecasting. Each prompt was carefully structured. Each one worked fine in isolation.
The problem showed up when I tried to use them together.
The AI would classify intent beautifully, but had no idea what inventory looked like. It would forecast demand, but couldn’t connect that to actual customer behavior. It would suggest optimizing a page for “sympathy flowers” without knowing we only had three arrangements in stock and that 12 people were searching for same-day delivery.
Every answer was technically correct and operationally useless.
I realized the problem wasn’t the prompts. The problem was that the AI was reasoning in a vacuum. It had access to my questions but not to the business.
That’s when I started rebuilding the whole thing from the ground up.
Wiring Context Instead of Writing Better Prompts
I stopped trying to make the prompts smarter.
I started trying to make the context real.
I didn’t design this from scratch. I found an existing Model Context Protocol (MCP) implementation, cloned it, and started wiring it into a live business. That decision mattered. Starting from real code forced me to deal with assumptions I wouldn’t have hit if I’d stayed in theory.
Nothing worked out of the box.
The MCP concepts were sound, but connecting them to Search Console, GA4, and Shopify exposed how messy real data actually is. The naming didn’t match. Timing didn’t line up. IDs drifted. Every integration decision forced me to choose what the system should treat as truth.

That’s when working with real data made the LLM useful.
Not as a code generator. As a coach.
I used it to reason through tradeoffs.
- What should count as intent versus noise?
- When should behavior override search signals?
- How should inventory constraints shape recommendations?
I would wire something up, test it, watch it fail, then ask the LLM why the logic didn’t hold once real data flowed through it. We’d adjust the structure, not the prompt. Then I’d test again.
That loop repeated dozens of times.
Instead of pulling raw data, I built objects that carried relationships.
A search for “funeral flowers delivered today” wasn’t a string anymore. It became an object that knew urgency, typical behavior under pressure, historical conversion patterns, current inventory state, and revenue outcomes from similar moments.
That shift changed everything.
I wasn’t feeding the LLM more data. I was feeding it meaning.
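To make that concrete, here is a minimal sketch of what one of those context objects might look like in Python. The field names and values are illustrative stand-ins, not the exact schema I built, but they show the difference between a raw query string and an object that carries relationships.

```python
from dataclasses import dataclass, field

@dataclass
class SearchContext:
    """A search query enriched with what the business already knows about it."""
    query: str                        # the raw search string
    urgency: str                      # classified intent pressure, e.g. "same_day" or "browsing"
    typical_behavior: str             # how visitors with this intent usually act on the site
    historical_conversion: float      # conversion rate in similar past moments
    inventory_state: dict = field(default_factory=dict)   # live stock for the products this intent maps to
    revenue_outcomes: list = field(default_factory=list)  # revenue from comparable past moments

# Not a string anymore: an object the LLM can reason over.
sympathy_search = SearchContext(
    query="funeral flowers delivered today",
    urgency="same_day",
    typical_behavior="scans for delivery cutoff, exits quickly if it is not visible",
    historical_conversion=0.04,
    inventory_state={"sympathy_arrangement": 3},
    revenue_outcomes=[180.0, 145.0, 210.0],
)
```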
Once that layer existed, I could ask questions that only made sense in context.
Questions like, “Which pages are attracting urgent intent but failing to convert?”
The system didn’t guess. It retrieved the context objects, compared intent to behavior, cross-referenced outcomes, and surfaced pages where the mismatch was obvious.
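A hedged sketch of how that question became a filter over context objects. The PageContext fields and thresholds below are assumptions for illustration; the real system used richer signals, but the shape of the comparison is the same.

```python
from dataclasses import dataclass

@dataclass
class PageContext:
    url: str
    dominant_intent: str         # e.g. "urgent" or "browsing"
    intent_traffic_share: float  # share of sessions arriving with that intent
    conversion_rate: float       # outcome data from GA4 / Shopify

def urgent_but_not_converting(pages, min_share=0.5, max_conversion=0.02):
    """Surface pages attracting urgent intent but failing to convert it."""
    return [
        p for p in pages
        if p.dominant_intent == "urgent"
        and p.intent_traffic_share >= min_share
        and p.conversion_rate <= max_conversion
    ]

pages = [
    PageContext("/sympathy-flowers", "urgent", 0.72, 0.011),
    PageContext("/birthday-bouquets", "browsing", 0.40, 0.034),
]
for page in urgent_but_not_converting(pages):
    print(page.url)  # -> /sympathy-flowers surfaces immediately
```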
That’s when I found the first broken page.
The Page That Looked Fine But Performed Terribly
The “Sympathy Flowers” page ranked well. Traffic was steady. Conversion was terrible.
I passed the context to the LLM and asked it to reason about the gap.
The dominant search intent was urgent. “Funeral flowers delivered today.” “Same-day sympathy arrangements.” People were scanning for speed and certainty.
The page copy was calm and explanatory. It talked about the meaning of flowers. It offered browsing options. Delivery timing was buried halfway down the page.
The LLM didn’t tell me to rewrite the page. It asked me a question.
“If someone arrives with this intent and this urgency, what questions are they trying to answer in the first 10 seconds?”
I wasn’t ready for that question.
I had been thinking about the page as content. The LLM, working within a structured context, was treating it as a decision environment.
Once I saw it that way, the mismatch was obvious. The page answered questions nobody was asking. It leaned on emotion but ignored intent. People wanted to know if we could deliver today. We talked about flower symbolism.
I restructured the page. Delivery timing moved to the top. Availability became explicit. The copy stopped explaining and started confirming.
Conversion rate doubled in four days.
I didn’t write better sentences. I reordered information based on decision pressure.
What Surprised Me Once the System Was Live
The biggest surprise wasn’t that it worked. The biggest surprise was how differently the LLM behaved once it had structured context.
Before, when I asked the LLM questions about SEO strategy, it would give me generic best practices. “Focus on user intent.” “Improve page speed.” “Add schema markup.” All true. None of it was actionable.
After wiring in the MCP layer, the LLM stopped guessing and started reasoning.
I would ask: “What’s driving same-day demand right now?”
The system would return:
- Intent patterns from the last 48 hours showing a spike in sympathy arrangement searches
- Behavioral signals showing people checking delivery timing and leaving without converting
- Current inventory state showing only three arrangements left
- Revenue implications based on historical conversion under similar conditions
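In practice, the assembled context looked something like the payload below before it went to the model. The keys and values here are illustrative, not the production schema; the point is that the LLM saw structured, related state rather than raw exports.

```python
# Illustrative payload; field names and values are stand-ins, not the production schema.
same_day_demand_context = {
    "question": "What's driving same-day demand right now?",
    "intent_patterns_last_48h": {"sympathy_arrangement_searches": "spiking vs prior 48h"},
    "behavioral_signals": {
        "delivery_timing_checks": "elevated",
        "exits_without_converting": "elevated",
    },
    "inventory_state": {"sympathy_arrangements_remaining": 3},
    "revenue_implications": {
        "historical_conversion_under_similar_conditions": 0.05,
    },
}
```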
The LLM could reason over that context. It didn’t hallucinate personas or invent motivations. It worked with what was real.
The difference was dramatic. I stopped getting advice. I started getting analysis grounded in the actual business state.
How RAG Changed What the LLM Could Do
Next, I implemented retrieval-augmented generation (RAG) in the system, but not as most tutorials describe it.
I wasn’t retrieving documents. I was retrieving the operational state.
When I asked a question, the system pulled meaning, not SQL results. It returned context objects with relationships already defined.
That’s what made RAG worthwhile. Not knowledge retrieval. State retrieval.
The LLM could reason about what was happening right now, not what generally happens or what the training data suggests might happen.
Hallucination dropped almost to zero. Not because I wrote better prompts. Because I constrained the LLM to known objects and verified relationships.
If the system didn’t have an object for something, the LLM couldn’t reason about it. That sounds limiting. It was liberating.
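Here is a minimal sketch of what “constrained to known objects” meant in practice, with a hypothetical registry standing in for the real MCP layer. The guard is the important part: if no verified object exists for something, nothing about it reaches the prompt, so there is nothing for the model to improvise around.

```python
# Hypothetical registry of context objects the system has actually built and verified.
CONTEXT_REGISTRY = {
    "sympathy_flowers": {"inventory": 3, "dominant_intent": "urgent", "conversion_rate": 0.011},
    "birthday_bouquets": {"inventory": 14, "dominant_intent": "browsing", "conversion_rate": 0.034},
}

def retrieve_state(entities):
    """Return only verified context objects; unknown entities are reported, never invented."""
    known = {name: CONTEXT_REGISTRY[name] for name in entities if name in CONTEXT_REGISTRY}
    missing = [name for name in entities if name not in CONTEXT_REGISTRY]
    return {"context": known, "missing": missing}

state = retrieve_state(["sympathy_flowers", "wedding_packages"])
# Only state["context"] goes into the prompt. "wedding_packages" lands in
# state["missing"], which tells both the LLM and me that the system
# cannot reason about it yet.
```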
I stopped worrying about whether the AI was making things up. I started trusting it to help me think through decisions.

What I Got Wrong at First
I thought the hard part would be the AI.
The hard part was the data.
Product names didn’t match across systems. Dates were in different formats. IDs drifted. I spent three weeks just normalizing data so that “sympathy arrangement” in Shopify meant the same thing as “sympathy flowers” in Search Console and “funeral flowers” in GA4.
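The unglamorous version of that work looked roughly like the sketch below: a hand-maintained alias map plus a normalizer that every source passes through before anything becomes a context object. The aliases are the examples from this paragraph; the function itself is a simplified stand-in for the real pipeline.

```python
from datetime import datetime, timezone

# Canonical product key, mapped from the label each system actually uses.
ALIASES = {
    "sympathy arrangement": "sympathy_flowers",  # Shopify
    "sympathy flowers": "sympathy_flowers",      # Search Console
    "funeral flowers": "sympathy_flowers",       # GA4
}

def normalize_record(raw_name: str, raw_date: str, date_format: str) -> dict:
    """Force every source onto one product vocabulary and one timestamp format."""
    name = ALIASES.get(raw_name.strip().lower())
    if name is None:
        # Surface drift loudly instead of silently guessing.
        raise ValueError(f"Unmapped product name: {raw_name!r}")
    timestamp = datetime.strptime(raw_date, date_format).replace(tzinfo=timezone.utc)
    return {"product": name, "timestamp": timestamp.isoformat()}

# Two sources, two date formats, one consistent record:
normalize_record("Sympathy Arrangement", "2024-05-03", "%Y-%m-%d")   # Shopify-style
normalize_record("funeral flowers", "05/03/2024", "%m/%d/%Y")        # GA4-style export
```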
Nobody writes blog posts about that work. That work made everything else possible.
I also thought I needed to start big. Build the whole system. Connect everything. Make it comprehensive.
That was wrong.
I should have started with three objects, one question, one decision. Prove that the structure worked before expanding it.
Complexity has to earn its place. It doesn’t get one upfront.
How Using the LLM as a Coach Changed My Approach
I stopped asking the LLM to write things for me.
I started asking it to help me think.
While restructuring pages, I didn’t ask the LLM to rewrite the copy. I asked it to help me sequence answers.
- “What question should be answered first for someone with urgent intent?”
- “What question comes second?”
- “What can wait?”
The LLM was good at ordering. It wasn’t good at writing in my voice.
So I used it as a thinking partner, not a generator. It helped me pressure-test whether the structure matched the intent. I wrote the actual words.
That kept the pages human. But it also kept them aligned with how people were actually making decisions.
The same approach worked for strategy questions.
I would show the LLM the context objects and ask: “Does this title confirm or contradict the intent?”
If the answer was a contradiction, I rewrote. If the answer was confirmation, I moved on.
The LLM became a mirror, reflecting whether my decisions matched the reality the system was seeing.
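A sketch of that mirror in code. The prompt wording and ask_llm are placeholders for whatever model client you use; the design choice that mattered was showing the LLM the context object and the title together and asking for a one-word verdict, which keeps the answer easy to act on.

```python
def build_check_prompt(title: str, context: dict) -> str:
    """Put the context object and the title side by side; ask for a single verdict."""
    return (
        "Given this visitor context:\n"
        f"{context}\n\n"
        f'Page title: "{title}"\n'
        "Does this title confirm or contradict the intent above? "
        "Answer with exactly one word: CONFIRM or CONTRADICT."
    )

def title_matches_intent(title: str, context: dict, ask_llm) -> bool:
    # ask_llm is a placeholder for your model client call (string in, string out).
    verdict = ask_llm(build_check_prompt(title, context)).strip().upper()
    return verdict.startswith("CONFIRM")

# Usage: if it contradicts, rewrite; if it confirms, move on.
# if not title_matches_intent("The Meaning of Sympathy Flowers", sympathy_context, ask_llm):
#     rewrite the title before touching anything else
```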
What Changed Once the System Had a Clear Context
The most significant operational shift wasn’t that we made better decisions. It was that we made them faster.
Before, analyzing a page took days. Pull Search Console data. Export GA4 reports. Check Shopify manually. Try to correlate the signals. Guess what might be wrong.
After, the system surfaced mismatches within minutes.
Pages with urgent intent, fast scroll behavior, low conversion rates, and early exits near delivery sections appeared automatically. I didn’t have to hunt for them. The context objects made the pattern obvious.
Decision time collapsed after I removed ambiguity from the system.
I wasn’t guessing which pages needed work. The system showed me pages where intent and structure were misaligned.
I wasn’t guessing what to fix. The LLM helped me reason about decision pressure based on actual behavior patterns.
I wasn’t guessing if it worked. Shopify revenue outcomes told me.
That feedback loop changed how I thought about AI systems entirely.
Why Building Under Real Constraints Matters
You can’t learn this from tutorials.
Tutorials assume clean data, clear objectives, and forgiving timelines. Real businesses have messy data, conflicting constraints, and inventory that expires before you finish the analysis.
I learned more from the system breaking than from the system working.
When the normalization layer failed, I learned why the MCP schema matters more than I thought.
When the LLM hallucinated before I added structured context, I learned why retrieval beats cleverness.
When pages I thought were fine showed clear misalignment, I learned why intent patterns matter more than keywords.
Every mistake taught me something I couldn’t have learned by reading about frameworks and theories.
The system I built isn’t perfect. It’s good enough to make better decisions under real constraints. That’s the bar that matters.
What I’d Tell Someone Starting Now
Don’t start with the AI. Start with the context.
Figure out what signals actually matter in your business. Build objects that carry those signals and their relationships. Make sure the data is clean enough to mean something consistent.
Then bring the LLM in, but not as a generator. Bring it in as a reasoning layer that sits atop a structured context.
Ask it to help you think, not to think for you.
Test it under real constraints. Let it break. Fix what breaks. Test it again.
Learning happens during the iteration, not in the initial build.
Why This Matters More Than Tools or Models

The tools will change. The models will get better. The frameworks will evolve.
What won’t change is that operators make decisions under constraints with imperfect information.
AI can’t replace that judgment. AI can amplify it if you build systems that reflect how decisions actually get made in your business.
That requires learning by building, not learning by reading.
You have to wire the AI into real operations, test it against real outcomes, and fix it when it fails. You have to spend weeks on tedious normalization work that nobody will celebrate. You have to rebuild things you thought were done.
That’s not scalable advice. That’s not frameworkable. That’s just how you learn whether AI can actually help you make better decisions or if it’s just adding complexity.
I learned that building an MCP system in a flower shop wasn’t about the flower shop. It was about understanding what happens when you force AI to operate inside the same constraints you do.
Constraints are where learning happens.
Constraints are where AI stops being impressive and starts being useful.

Frequently Asked Questions
What did building an MCP system teach you that reading about AI didn’t?
Building inside a real business exposed constraints that tutorials never show. Inventory limits, timing pressure, messy data, and incomplete signals forced the system to fail in ways theory never predicted. Each failure clarified what context actually matters for decision-making.
How did the MCP change how the LLM behaved?
Before MCP, the LLM offered generic best practices. After MCP, it reasoned over actual business state. The difference came from structured context, not better prompts. The model stopped guessing and started analyzing.
How did this affect how you optimized SEO pages?
It shifted my focus from keywords to decision pressure. Pages failed when they answered the wrong question for the intent driving the visit. Once intent, behavior, and outcomes were connected, mismatches became obvious and fixes became structural, not copy-driven.
Did the LLM write the content for you?
No. I used the LLM as a reasoning partner, not a writer. It helped me sequence answers, test assumptions, and validate whether page structure matched intent. I wrote the copy.
What broke first when you built this system?
Data consistency. Product names, dates, and identifiers didn’t align across systems. Normalization took longer than anything else, and it determined whether the rest of the system worked.
Does this approach apply outside e-commerce?
Yes. Any business where decisions depend on intent, behavior, and outcomes can benefit. The flower shop was a test case. The pattern generalizes.
Stay Connected
I share new leadership frameworks and case studies every week. Subscribe to my newsletter below or follow me on LinkedIn and Substack to stay ahead and put structured decision-making into practice.
Related Articles
Building AI Tools With LLMs: A Practical Guide Leaders Need
Regression vs Classification: The Truth About My $8K ML Failure
AI Task Analysis: How Top Performers Cut Workflow Time 40%
AI Traffic in GA4: How to Separate Humans vs Bots
The Truth About AI-Driven SEO Most Pros Miss
Intent-Driven SEO: The Future of Scalable Growth
SEO Strategy for ROI: A Better Way to Win Big
Future of SEO: Unlocking AEO & GEO for Smarter Growth
Skyrocket Growth with Keyword Strategy for Founders
Unlock Massive Growth with This 4-Step SEO Funnel
About the Author
I’m Richard Naimy, an operator and product leader with over 20 years of experience growing platforms like Realtor.com and MyEListing.com. I work with founders and operating teams to solve complex problems at the intersection of product, marketing, AI, systems, and scale. I write to share real-world lessons from inside fast-moving organizations, offering practical strategies that help ambitious leaders build smarter and lead with confidence.
I write about:
- AI + MarTech Automation
- AI Strategy
- COO Ops & Systems
- Growth Strategy (B2B & B2C)
- Infographic
- Leadership & Team Building
- My Case Studies
- Personal Journey
- Revenue Operations (RevOps)
- Sales Strategy
- SEO & Digital Marketing
- Strategic Thinking
Want 1:1 strategic support?
Connect with me on LinkedIn
Read my playbooks on Substack
