
Fine-Tuning LLMs With ReAct Reasoning Chains

February 4, 2026 · 6 min read

Generic LLMs know a little about everything and not enough about your specific domain. Fine-tuning bridges that gap, but only if your training data encodes the right reasoning patterns.

The ReAct Pattern

ReAct (Reasoning + Acting) structures each training example as:

Thought - The model's internal reasoning about what to do
Action - The specific action to take (search, recommend, calculate)
Action Input - The parameters for that action
Observation - The result of the action

This teaches the model HOW to think, not just WHAT to answer.
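The four-field structure above can be sketched as a small data type. This is a minimal illustration, not the post's actual tooling; the class and method names are my own.

```python
from dataclasses import dataclass


@dataclass
class ReActStep:
    """One Thought -> Action -> Action Input -> Observation cycle."""
    thought: str
    action: str
    action_input: str
    observation: str

    def render(self) -> str:
        # Serialize the fields in the order the model should learn to emit them.
        return (
            f"Thought: {self.thought}\n"
            f"Action: {self.action}\n"
            f"Action Input: {self.action_input}\n"
            f"Observation: {self.observation}"
        )


step = ReActStep(
    thought="The customer wants beginner ship models.",
    action="search_products",
    action_input="difficulty=beginner, category=ship_kits",
    observation="Found 12 beginner kits",
)
print(step.render())
```

Keeping each step as structured data (rather than free text) makes it easy to validate every example before it reaches the training file.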

Building the Dataset

For each client AI assistant, we create JSONL datasets from:

Company websites (product info, policies, FAQs)
Product catalogs (specifications, compatibility, pricing)
Prompt builder rules (response format, tone, constraints)

Each example follows the ReAct chain, showing the model the complete reasoning process from question to answer.
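One common way to lay this out is chat-style JSONL: one example per line, with the ReAct chain in the assistant turn. The schema below is a hedged sketch of that layout, not necessarily the exact format used here.

```python
import json

# One chat-style example per JSONL line; the assistant turn carries the
# full ReAct chain. Schema shown is an assumption for illustration.
examples = [
    {
        "messages": [
            {"role": "user", "content": "Which ship kits suit a beginner?"},
            {
                "role": "assistant",
                "content": (
                    "Thought: Filter the catalog by difficulty level.\n"
                    "Action: search_products\n"
                    "Action Input: difficulty=beginner, category=ship_kits\n"
                    "Observation: Found 12 beginner kits"
                ),
            },
        ]
    }
]

# JSONL = one JSON object per line, newline-delimited.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```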

Case Study: AmatiBot

For Amati Model's ship modeling assistant, we extracted every product detail from their catalog: materials, scales, difficulty levels, and tools required. Each training example shows the model reasoning through a customer question:

Thought: The customer is asking about beginner ship models. I should filter by difficulty level and recommend appropriate kits.
Action: search_products
Action Input: difficulty=beginner, category=ship_kits
Observation: Found 12 beginner kits...
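Catalog rows can be templated into examples like the one above. The converter below is a hypothetical sketch; the field names (name, difficulty, scale, material) and the lookup_product action are assumptions, not Amati's actual schema.

```python
def example_from_product(product: dict) -> dict:
    """Turn one catalog row into a chat-style ReAct training example.
    Field names here are illustrative, not the real catalog schema."""
    question = (
        f"Is the {product['name']} suitable for a "
        f"{product['difficulty']} builder?"
    )
    chain = (
        f"Thought: The customer asks about the {product['name']}; "
        f"I should check its difficulty level.\n"
        f"Action: lookup_product\n"
        f"Action Input: name={product['name']}\n"
        f"Observation: difficulty={product['difficulty']}, "
        f"scale={product['scale']}, material={product['material']}"
    )
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": chain},
        ]
    }


row = {
    "name": "HMS Example",  # placeholder product, not a real kit
    "difficulty": "beginner",
    "scale": "1:60",
    "material": "wood",
}
ex = example_from_product(row)
```

Running this over the whole catalog yields hundreds of consistent examples from a single template, which is where most of the dataset volume comes from.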

The Process

1. Extract domain knowledge from all available sources
2. Structure into ReAct reasoning chains
3. Format as JSONL training data
4. Fine-tune and evaluate
5. Deploy with monitoring
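Between steps 3 and 4, it pays to validate the training file mechanically. A minimal checker, assuming the chat-style JSONL layout described earlier, might look like this:

```python
import json

# Every example should contain a complete ReAct chain.
REQUIRED = ("Thought:", "Action:", "Action Input:", "Observation:")


def validate_jsonl(path: str) -> list[str]:
    """Return error messages for lines that fail basic ReAct checks."""
    errors = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, 1):
            try:
                ex = json.loads(line)
            except json.JSONDecodeError:
                errors.append(f"line {i}: invalid JSON")
                continue
            # Serialize back to text so the check covers nested fields.
            text = json.dumps(ex)
            for field in REQUIRED:
                if field not in text:
                    errors.append(f"line {i}: missing '{field}'")
    return errors
```

Catching a malformed line here is far cheaper than discovering it after a failed or degraded fine-tuning run.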

This is repeatable across any domain. The key investment is in building high-quality training data with proper reasoning chains.