Building RAG That Actually Works: Lessons From 5 Production Systems
I've built RAG systems for Caffè Vergnano (coffee), I Love Riccio (hair care), and Amati Model (ship modeling). Each had completely different data structures, query patterns, and accuracy requirements. Here's what I learned.
Lesson 1: Metadata Is More Important Than Embeddings
Everyone focuses on embedding models and chunk sizes. The real differentiator is metadata. When searching across multiple product catalogs, I built a universal metadata parser that normalizes fields across indexes. Without it, cross-index search was useless.
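A minimal sketch of what such a normalizer can look like. The index names and field mappings here are illustrative, not the production schemas: each catalog stores the same concept under a different key, and a per-index map translates raw metadata into one shared vocabulary before any cross-index filtering happens.

```python
# Per-index field maps (hypothetical keys): translate each catalog's
# metadata vocabulary into one shared schema.
FIELD_MAPS = {
    "coffee":   {"nome_prodotto": "title", "prezzo": "price", "categoria": "category"},
    "haircare": {"product_name": "title", "cost": "price", "type": "category"},
}

def normalize_metadata(index_name: str, raw: dict) -> dict:
    """Map index-specific metadata keys onto the shared schema.

    Unknown keys pass through unchanged, so new fields don't break search.
    """
    field_map = FIELD_MAPS.get(index_name, {})
    return {field_map.get(key, key): value for key, value in raw.items()}
```

Once every hit carries the same keys, a single filter like `price < 20` works across all indexes instead of needing per-catalog special cases.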
Lesson 2: Multi-Index Search Requires a Router
Searching one Pinecone index is straightforward. Searching three simultaneously, each with a different schema, requires an intelligent routing layer that decides which indexes to query based on intent.
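The routing idea can be sketched with a deliberately simple keyword heuristic (a production router would typically use an LLM or a classifier; the index names and keyword sets below are made up for illustration):

```python
# Illustrative keyword rules per index; a real router would classify intent
# with a model rather than literal token overlap.
INDEX_KEYWORDS = {
    "coffee":   {"espresso", "beans", "roast"},
    "haircare": {"curl", "shampoo", "scalp"},
    "modeling": {"hull", "rigging", "plank"},
}

def route_query(query: str) -> list[str]:
    """Return the indexes worth querying; fall back to all of them."""
    tokens = set(query.lower().split())
    matched = [name for name, keywords in INDEX_KEYWORDS.items() if tokens & keywords]
    return matched or list(INDEX_KEYWORDS)
```

The key design choice is the fallback: when intent is ambiguous, query everything and let re-ranking sort it out, rather than risk missing the right index.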
Lesson 3: Fine-Tuning Beats Prompt Engineering for Domain Knowledge
For AmatiBot (ship modeling), generic prompts couldn't capture the domain depth needed. We created custom JSONL datasets from product catalogs and trained using ReAct reasoning chains: Thought → Action → Action Input → Observation. The improvement was dramatic.
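One plausible shape for such a training record, assuming a prompt/completion JSONL format (the exact field names and dataset layout used for AmatiBot are not shown here, so treat this as a sketch):

```python
import json

def make_react_record(question: str, thought: str, action: str,
                      action_input: str, observation: str, answer: str) -> str:
    """Serialize one fine-tuning example whose completion walks a full
    ReAct chain: Thought -> Action -> Action Input -> Observation -> answer."""
    completion = (
        f"Thought: {thought}\n"
        f"Action: {action}\n"
        f"Action Input: {action_input}\n"
        f"Observation: {observation}\n"
        f"Final Answer: {answer}"
    )
    # One JSON object per line is the JSONL convention.
    return json.dumps({"prompt": question, "completion": completion})
```

Training on full chains rather than bare question/answer pairs is what teaches the model the reasoning pattern, not just the facts.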
Lesson 4: User Parameter Extraction Changes Everything
For I Love Riccio, we extract hair type, thickness, problems, and gender before making any recommendation. This structured extraction turns a vague "recommend something" into a precise query that hits the right vectors.
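As a rough sketch of the extraction step, here is a keyword-based version (the production system would do this with an LLM call; the vocabularies and field names are illustrative, loosely mirroring the parameters listed above):

```python
# Illustrative vocabularies; a real extractor would use an LLM with a
# structured-output schema rather than token matching.
HAIR_TYPES = {"curly", "wavy", "straight", "coily"}
PROBLEMS = {"frizz", "dryness", "breakage", "dandruff"}

def extract_params(query: str) -> dict:
    """Pull structured hair parameters out of a free-text query."""
    tokens = set(query.lower().split())
    return {
        "hair_type": next(iter(tokens & HAIR_TYPES), None),
        "problems": sorted(tokens & PROBLEMS),
    }
```

The output dict then becomes metadata filters on the vector query, so "recommend something" plus extracted parameters retrieves products for that specific hair profile instead of generic bestsellers.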
Lesson 5: Test With Real Users, Not Benchmarks
Every RAG system I've built scored well on synthetic benchmarks but initially failed with real users. The gap is always between how real people phrase queries and how we expect them to.
The Architecture Pattern
Query → Parameter Extraction → Intent Classification → Index Selection → Retrieval → Re-ranking → Response Generation → Reflection
Each step is a node. Each node is testable. That's the difference between a demo and a system.
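One way to realize "each step is a node" in code: make every node a plain function from a state dict to a state dict, and the pipeline a fold over them. The node names below are stand-ins for the real steps, and the bodies are trivial placeholders.

```python
from typing import Callable

# A node is any function that takes the pipeline state and returns it updated.
Node = Callable[[dict], dict]

def run_pipeline(nodes: list[Node], state: dict) -> dict:
    """Thread the state through each node in order."""
    for node in nodes:
        state = node(state)
    return state

# Two toy nodes standing in for real pipeline steps:
def extract_parameters(state: dict) -> dict:
    return {**state, "params": {"raw_query": state["query"]}}

def classify_intent(state: dict) -> dict:
    return {**state, "intent": "product_search"}
```

Because each node is a pure function of the state, every step gets its own unit test with a hand-built input dict; that is what makes the pipeline a system rather than a demo.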