Response Text Chatbot Service - Complete Developer Documentation¶
Service: LLM Response Generation for Text Chatbots (RAG Implementation)
Port: 8012
Purpose: Generate AI responses using retrieval-augmented generation (RAG) with Milvus vector search
Technology: FastAPI, LangChain, Azure OpenAI, Milvus, FastEmbed
Code Location: /response-text-chatbot-service/src/main.py (726 lines)
Owner: Backend Team
Last Updated: 2025-12-26
Table of Contents¶
- Service Overview
- Complete Architecture
- Complete Endpoints
- RAG Implementation
- UTM-Based Targeting
- Conversation History
- System Prompts
- Security Analysis
- Performance
- Deployment
Service Overview¶
The Response Text Chatbot Service implements a production-grade Retrieval-Augmented Generation (RAG) system for text-based chatbot interactions. This service is one of three response generation services, handling text-only chatbot conversations.
Key Responsibilities¶
✅ RAG Pipeline - Semantic search + LLM generation
✅ Milvus Vector Search - Find relevant content chunks
✅ UTM-Based Targeting - Personalized responses by traffic source
✅ Conversation History - Multi-turn conversations with context
✅ Token Tracking - Monitor API usage per session
✅ LLM Orchestration - Azure OpenAI GPT-3.5-Turbo-16k via LangChain
Technology Stack¶
| Technology | Purpose | Specifications |
|---|---|---|
| Azure OpenAI | LLM | GPT-3.5-Turbo-16k (API version 2024-02-15) |
| Milvus | Vector database | Cosine similarity search |
| FastEmbed | Embeddings | BAAI/bge-small-en-v1.5 (384D) |
| LangChain | LLM framework | Messages, chains, parsers |
| MongoDB | Data storage | chatbot_history1, files |
| Tiktoken | Token counting | cl100k_base encoding |
Statistics¶
- Total Lines: 726
- Endpoints: 2 (main RAG + simple chain)
- System Prompts: 3 (Sales-Agent, Service-Agent, Informational-Agent)
- Default Top-K: 5 chunks
- Average Response Time: 2-5 seconds
- Token Limit: 16,000 tokens
Complete Architecture¶
End-to-End Data Flow¶
graph TB
USER[\"User Question\"]
subgraph \"Step 1: Context Retrieval\"
EMBED[\"Generate Embedding<br/>(BAAI/bge-small-en-v1.5)\"]
UTM[\"Match UTM Config<br/>(Scoring Algorithm)\"]
MILVUS[\"Milvus Search<br/>(Top-5 Chunks)\"]
UTM_CHUNKS[\"Retrieve UTM<br/>Chunks\"]
end
subgraph \"Step 2: History Retrieval\"
MONGO[\"MongoDB Lookup<br/>(chatbot_history1)\"]
PARSE[\"Parse Chat<br/>History\"]
end
subgraph \"Step 3: Prompt Assembly\"
SYS_PROMPT[\"System Prompt<br/>(Sales/Service/Info)\"]
UTM_INST[\"UTM Instructions<br/>(Append)\"]
CONTEXT[\"Build Context<br/>(General + UTM)\"]
MESSAGES[\"LangChain Messages<br/>(System + History + User)\"]
end
subgraph \"Step 4: LLM Generation\"
OPENAI[\"Azure OpenAI<br/>GPT-3.5-Turbo-16k\"]
RESPONSE[\"AI Response\"]
end
subgraph \"Step 5: Post-Processing\"
TOKENS[\"Count Tokens<br/>(Tiktoken)\"]
SAVE[\"Save to MongoDB<br/>(chatbot_history1)\"]
end
USER --> EMBED
USER --> UTM
EMBED --> MILVUS
UTM --> UTM_CHUNKS
MILVUS --> CONTEXT
UTM_CHUNKS --> UTM_INST
USER --> MONGO
MONGO --> PARSE
SYS_PROMPT --> MESSAGES
UTM_INST --> MESSAGES
CONTEXT --> MESSAGES
PARSE --> MESSAGES
MESSAGES --> OPENAI
OPENAI --> RESPONSE
RESPONSE --> TOKENS
TOKENS --> SAVE
SAVE --> USER
style USER fill:#e1f5fe
style OPENAI fill:#fff3e0
style MILVUS fill:#f3e5f5
style SAVE fill:#c8e6c9
Complete Endpoints¶
1. POST /v2/get-response-text-chatbot¶
Purpose: Generate AI response using full RAG pipeline with conversation history and UTM targeting
Code Location: Lines 560-677 (118 lines)
Request:
POST /v2/get-response-text-chatbot
Content-Type: multipart/form-data
user_id=User-123456
project_id=User-123456_Project_1
session_id=session_20250115_140530
question=What are your pricing options?
originating_url=https://example.com/pricing?utm_source=google&utm_medium=cpc
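For quick local testing, here is a minimal client sketch using Python's requests library (the localhost:8012 base URL follows the deployment section; FastAPI Form endpoints accept standard form-encoded POSTs):
import requests

resp = requests.post(
    "http://localhost:8012/v2/get-response-text-chatbot",
    data={
        "user_id": "User-123456",
        "project_id": "User-123456_Project_1",
        "session_id": "session_20250115_140530",
        "question": "What are your pricing options?",
        "originating_url": "https://example.com/pricing?utm_source=google&utm_medium=cpc",
    },
)
print(resp.json()["text"])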
Parameters:
| Parameter | Required | Description |
|---|---|---|
| user_id | Yes | User identifier |
| project_id | Yes | Project identifier |
| session_id | Yes | Session identifier for history tracking |
| question | Yes | User's question |
| originating_url | No | URL with UTM parameters for targeting |
Response:
{
"question": "What are your pricing options?",
"text": "We offer three pricing tiers: Basic ($29/month), Pro ($99/month), and Enterprise (custom pricing). Each tier includes different features and chatbot limits. Would you like me to explain the differences in detail?"
}
Processing Steps¶
Step 1: System Prompt Selection (Line 572)
⚠️ Note: Currently hardcoded to "Sales-Agent" - doesn't use chatbot_purpose from database
Step 2: UTM Config Matching (Lines 574-603)
if originating_url:
utm_config = get_matching_utm_config(originating_url, user_id, project_id)
if utm_config:
# Retrieve UTM-specific chunks from Milvus
utm_chunks = retrieve_utm_content(
utm_config_id=str(utm_config.get("_id")),
user_id=user_id,
project_id=project_id,
question=question,
top_k=5
)
utm_content = "\n".join([chunk["content"] for chunk in utm_chunks])
# Append UTM-specific instructions to system prompt
utm_instructions = utm_config.get("instructions", "")
if utm_instructions:
system_prompt = system_prompt + f"\n\n[UTM-Specific Instructions]:\n{utm_instructions}"
Step 3: General RAG Retrieval (Line 606)
results = retrieve_relevant_documents(user_id, project_id, question)
context = "\n".join([result["content"] for result in results[:5]])
Step 4: Chat History Retrieval (Lines 617-633)
chat_session = history_collection.find_one({
"project_id": project_id,
"user_id": user_id,
"session_id": session_id
})
chat_history_text = ""
if chat_session:
for msg in chat_session.get("chat_data", []):
user_input = msg.get("input_prompt", "").strip()
bot_response = msg.get("output_response", "").strip()
if user_input and bot_response:
chat_history_text += f"User: {user_input}\nAssistant: {bot_response}\n"
Step 5: LangChain Message Construction (Lines 635-661)
messages = []
# System message
messages.append(SystemMessage(content=system_prompt))
# Chat history messages
if chat_history_text:
    history_lines = chat_history_text.strip().split('\n')
    i = 0
    while i < len(history_lines) - 1:
        if history_lines[i].startswith("User: ") and history_lines[i + 1].startswith("Assistant: "):
            messages.append(HumanMessage(content=history_lines[i].replace("User: ", "", 1)))
            messages.append(AIMessage(content=history_lines[i + 1].replace("Assistant: ", "", 1)))
            i += 2
        else:
            i += 1  # advance past malformed lines; without this the loop never terminates
# Current question with context
context_parts = []
if utm_content:
context_parts.append(f"[UTM-Specific Content]:\n{utm_content}")
context_parts.append(f"Context:\n{context}")
context_and_question = "\n\n".join(context_parts) + f"\n\nQuestion: {question}"
messages.append(HumanMessage(content=context_and_question))
Step 6: LLM Invocation (Lines 663-665)
Step 7: Save to History (Line 668)
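Neither step's code is excerpted above, so here is a minimal sketch of both, assuming llm is the AzureChatOpenAI instance created at startup and that save_chat_history() takes the fields described under Conversation History (argument order is an assumption):
# Step 6 (sketch): invoke the chat model with the assembled messages
response = llm.invoke(messages)   # LangChain chat models return an AIMessage
answer_text = response.content

# Step 7 (sketch): persist the turn; token counts are computed inside save_chat_history()
save_chat_history(user_id, project_id, session_id, question, answer_text)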
2. POST /v2/get-response-chain¶
Purpose: Simplified response generation using LangChain chains (no history, no UTM)
Code Location: Lines 681-721
Differences from Main Endpoint:
| Feature | Main Endpoint | Chain Endpoint |
|---|---|---|
| UTM Targeting | ✅ Yes | ❌ No |
| Chat History | ✅ Yes | ❌ No |
| RAG Retrieval | ✅ Yes | ✅ Yes |
| Use Case | Production | Testing/Simple Q&A |
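The chain endpoint's code is not excerpted here; the sketch below shows a LangChain pipe-style chain matching the described behavior (the prompt wording is an assumption; context comes from the same RAG retrieval as the main endpoint):
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "Context:\n{context}\n\nQuestion: {question}"),
])
# No history and no UTM handling - just retrieval, prompt, LLM, string output
chain = prompt | llm | StrOutputParser()
answer = chain.invoke({"context": context, "question": question})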
RAG Implementation¶
Retrieval Function¶
Function: retrieve_relevant_documents() (Lines 516-555)
Purpose: Find top-5 most relevant content chunks from Milvus
Complete Flow:
Step 1: Generate Query Embedding
question_embedding = list(embedder.embed([question]))[0]
question_embedding_list = [float(x) for x in question_embedding]
Embedding Model:
- Name: BAAI/bge-small-en-v1.5
- Dimensions: 384
- Max Length: 512 tokens
- Same model as Data Crawling Service
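The embedder object used above is presumably created once at startup; a minimal sketch with FastEmbed's TextEmbedding class (initialization details are assumptions, but the model name must match what the Data Crawling Service used at indexing time):
from fastembed import TextEmbedding

embedder = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")  # 384-dim output
vector = list(embedder.embed(["What are your pricing options?"]))[0]
print(len(vector))  # 384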
Step 2: Search Milvus
search_results = milvus_embeddings.search_embeddings(
collection_name="embeddings",
query_vector=question_embedding_list,
user_id=user_id,
project_id=project_id,
top_k=5
)
Milvus Search:
- Collection: embeddings
- Partition: {project_id}
- Metric: Cosine similarity
- Filter: user_id + project_id
- Returns: Top-5 chunks with scores
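search_embeddings() is a shared wrapper whose source is not shown here; a plausible sketch of the underlying pymilvus call (the "embedding" field name and output fields are assumptions, and a Milvus connection is assumed to already exist):
from pymilvus import Collection

collection = Collection("embeddings")
search_results = collection.search(
    data=[query_vector],
    anns_field="embedding",            # assumed vector field name
    param={"metric_type": "COSINE"},
    limit=5,
    expr=f'user_id == "{user_id}" && project_id == "{project_id}"',
    partition_names=[project_id],
    output_fields=["text", "document_id", "chunk_index", "data_type"],
)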
Step 3: Format Results
top_chunks = []
for result in search_results:
top_chunks.append({
"content": result.get("text", ""),
"similarity": result.get("score", 0.0), # 0.0-1.0
"document_id": result.get("document_id", ""),
"chunk_index": result.get("chunk_index", 0),
"data_type": result.get("data_type", "") # "url", "pdf", "utm"
})
Example Search Result:
[
{
"content": "Our pricing starts at $29/month for the Basic plan, which includes 1 chatbot...",
"similarity": 0.87,
"document_id": "a1b2c3d4-e5f6-...",
"chunk_index": 3,
"data_type": "url"
},
{
"content": "The Pro plan at $99/month offers 5 chatbots, advanced analytics...",
"similarity": 0.82,
"document_id": "a1b2c3d4-e5f6-...",
"chunk_index": 7,
"data_type": "url"
},
...
]
UTM-Based Targeting¶
What is UTM Targeting?¶
UTM targeting enables personalized chatbot responses based on where users come from. Different traffic sources can have:
- Different content chunks
- Different instructions
- Different tone/messaging
Use Cases:
- Google Ads users → Focus on pricing and ROI
- Facebook users → Social proof and testimonials
- Product page visitors → Feature comparisons
- Pricing page visitors → Discount offers
UTM Config Structure¶
Collection: files (file_type="utm")
Document Example:
{
"_id": ObjectId("..."),
"user_id": "User-123456",
"project_id": "User-123456_Project_1",
"file_type": "utm",
"filename": "google_ads_pricing.json",
// Target URL (optional)
"target_url": "https://example.com/pricing",
// UTM parameters (optional)
"utm_config": {
"source": "google",
"medium": "cpc",
"campaign": "pricing_2025",
"content": "",
"term": ""
},
// Custom instructions to append to system prompt
"instructions": "Focus heavily on ROI and cost savings. User is price-sensitive. Emphasize value proposition.",
// Milvus embedding IDs for UTM-specific content
"milvus_embedding_ids": [123456, 123457, 123458],
"timestamp": ISODate("2025-01-15T14:00:00Z")
}
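Before scoring, the service presumably loads every UTM config for the chatbot with a filtered query; a hedged sketch (the files_collection handle name is an assumption):
# Fetch candidate configs, score each, keep the best positive match
candidates = files_collection.find({
    "user_id": user_id,
    "project_id": project_id,
    "file_type": "utm",
})
scored = [(calculate_match_score(originating_url, cfg), cfg) for cfg in candidates]
best_score, best_config = max(scored, key=lambda t: t[0], default=(-1, None))
utm_config = best_config if best_score > 0 else None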
Matching Algorithm¶
Function: get_matching_utm_config() (Lines 401-462)
Scoring System:
| Match Type | Points | Description |
|---|---|---|
| Target URL match | +10 | Originating URL starts with config target_url |
| Each UTM param match | +2 | utm_source, utm_medium, utm_campaign, etc. |
| Target URL only | +5 | Config has URL but no UTM params (lower priority) |
Matching Logic:
def calculate_match_score(originating_url: str, config: Dict[str, Any]) -> float:
    score = 0.0
    # Extract components
    url_params = extract_utm_parameters(originating_url)
    config_params = config.get('utm_config', {})
    config_target_url = config.get('target_url', '').strip()
    # Check Target URL match (a URL-only config scores +5, one with UTM params +10, per the table above)
    if config_target_url:
        originating_base = extract_base_url(originating_url)
        if originating_base and originating_base.startswith(config_target_url.rstrip('/')):
            score += 10 if config_params else 5
    # Check UTM parameters
    if config_params and url_params:
        for key in ['source', 'medium', 'campaign', 'content', 'term']:
            config_value = config_params.get(key)
            if config_value:
                url_key = f'utm_{key}'
                if url_params.get(url_key) == config_value:
                    score += 2
                else:
                    # Mismatch - this config doesn't match
                    return -1
    return score if score > 0 else -1
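extract_utm_parameters() and extract_base_url() are not excerpted; a minimal sketch of what they likely do, using only the standard library:
from urllib.parse import parse_qs, urlparse

def extract_utm_parameters(url: str) -> dict:
    """Return utm_* query parameters as single string values."""
    query = parse_qs(urlparse(url).query)
    return {k: v[0] for k, v in query.items() if k.startswith("utm_")}

def extract_base_url(url: str) -> str:
    """Strip query string and fragment, keeping scheme://host/path."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}".rstrip("/")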
Example Matching:
Request URL: https://example.com/pricing?utm_source=google&utm_medium=cpc
Configs:
Config A:
{
"target_url": "https://example.com/pricing",
"utm_config": {"source": "google", "medium": "cpc"}
}
Score: 10 (URL) + 2 (source) + 2 (medium) = 14 points ← Winner!
Config B (target URL only, no UTM params):
Score: 5 (URL only) = 5 points
Config C (non-matching target URL):
Score: -1 (URL doesn't match) = No match
UTM Content Retrieval¶
Function: retrieve_utm_content() (Lines 465-513)
Purpose: Retrieve chunks ONLY from UTM-specific content
Code:
# Get milvus_embedding_ids from UTM config
milvus_ids = utm_config.get("milvus_embedding_ids", [])
# Search only within these specific IDs
search_results = milvus_embeddings.search_embeddings(
collection_name="embeddings",
query_vector=question_embedding_list,
user_id=user_id,
project_id=project_id,
top_k=5,
milvus_ids=milvus_ids # ⭐ Filter to these IDs only
)
Benefit: Ensures responses use campaign-specific content!
Conversation History¶
History Storage¶
Collection: chatbot_history1
Document Structure:
{
"user_id": "User-123456",
"project_id": "User-123456_Project_1",
"session_id": "session_20250115_140530",
"datetime": "2025-01-15 14:05:30", // IST timezone
"session_total_tokens": 1523, // Cumulative for entire session
"chat_data": [
{
"input_prompt": "What are your pricing options?",
"output_response": "We offer three pricing tiers...",
"timestamp": "2025-01-15 14:05:30",
"input_tokens": 8,
"output_tokens": 52,
"total_tokens": 60
},
{
"input_prompt": "Tell me about the Pro plan",
"output_response": "The Pro plan at $99/month includes...",
"timestamp": "2025-01-15 14:05:45",
"input_tokens": 7,
"output_tokens": 48,
"total_tokens": 55
}
]
}
Token Counting¶
Function: count_tokens() (Lines 160-162)
Uses Tiktoken:
import tiktoken

tokenizer = tiktoken.get_encoding("cl100k_base")  # GPT-3.5/4 encoding

def count_tokens(text):
    """Counts tokens using OpenAI's tokenizer"""
    return len(tokenizer.encode(text))
Why Token Counting Matters:
- Cost tracking - OpenAI charges per token
- Context limits - GPT-3.5-Turbo-16k has 16,000 token limit
- Performance monitoring - Large prompts = slow responses
Save History Function¶
Function: save_chat_history() (Lines 164-208)
Features:
- ✅ Computes token counts (input, output, total)
- ✅ Uses IST timezone
- ✅ Appends to existing session or creates new
- ✅ Increments session_total_tokens
Code:
# Compute tokens
input_tokens = count_tokens(input_prompt)
output_tokens = count_tokens(output_response)
total_tokens = input_tokens + output_tokens
chat_entry = {
"input_prompt": input_prompt,
"output_response": output_response,
"timestamp": current_time.strftime("%Y-%m-%d %H:%M:%S"),
"input_tokens": input_tokens,
"output_tokens": output_tokens,
"total_tokens": total_tokens
}
# Update or insert
if existing_session:
history_collection.update_one(
{"user_id": user_id, "project_id": project_id, "session_id": session_id},
{
"$push": {"chat_data": chat_entry},
"$set": {"datetime": current_time},
"$inc": {"session_total_tokens": total_tokens} # Cumulative!
}
)
else:
history_collection.insert_one({
"user_id": user_id,
"project_id": project_id,
"session_id": session_id,
"datetime": current_time,
"session_total_tokens": total_tokens,
"chat_data": [chat_entry]
})
System Prompts¶
Three Agent Types¶
Defined at: Lines 57-155
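The prompts are held in a dict keyed by agent name (the system_prompts lookup appears in the Security Analysis section); a sketch of the shape, with prompt text abridged:
system_prompts = {
    "Sales-Agent": "You are a highly skilled AI sales agent...",        # abridged
    "Service-Agent": "You are a specialized service agent...",          # abridged
    "Informational-Agent": "You are an informational agent...",         # abridged
}
system_prompt = system_prompts["Sales-Agent"]  # currently hardcoded (see Security Analysis)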
1. Sales-Agent¶
Character: Highly skilled AI sales agent
Goal: Drive sales conversions
Tone: Warm, friendly, persuasive
Key Guidelines:
- Keep conversation flowing with questions
- Responses: 100-150 characters (concise!)
- Refer to old chat history
- Collect customer info via form
- Add open-ended questions for engagement
Example Response:
"We offer Basic ($29), Pro ($99), and Enterprise plans. Each includes different chatbot limits and features. Which tier interests you most?"
2. Service-Agent¶
Character: Specialized service agent
Goal: Efficient troubleshooting and support
Tone: Professional, empathetic, builds trust
Response Structure:
- Acknowledgment (1 sentence)
- Solution (step-by-step when applicable)
- Additional resources
- Call-to-action or follow-up
Guidelines:
- Responses: 170-200 characters
- Step-by-step format for solutions
- Reference official docs when helpful
- Politely redirect unrelated queries
Example Response:
"I can help with that! Navigate to Settings > Chatbot Configuration > Voice Settings. Select your preferred voice from the dropdown and click Save. Need help with anything else?"
3. Informational-Agent¶
Character: Informational agent
Goal: Provide accurate, concise information
Tone: Professional yet conversational
Response Structure:
- Direct answer (1-2 sentences)
- Concise explanation (200-300 characters)
- Next step or call-to-action
Guidelines:
- Include key features and pricing
- Clear policy summaries (no legal jargon)
- Never fabricate information
- Balanced comparisons
Example Response:
"Our chatbots support 10 languages including English, Spanish, French, German, and more. You can set the default language in your dashboard under Settings > Language. Want to see the full list?"
Security Analysis¶
Issues Found¶
1. ⚠️ Hardcoded Azure OpenAI API Key (Line 217)
subscription_key = os.getenv("AZURE_OPENAI_API_KEY",
    "AZxD...")  # full key redacted here; it is hardcoded in source
Issue: Hardcoded default API key in source code
2. ⚠️ Duplicate LLM Initialization (Lines 260-266 & 271-277)
Issue: LLM initialized twice with identical config
Impact: Wastes memory (minor)
3. ⚠️ Hardcoded System Prompt Selection (Line 572)
Issue: Doesn't use chatbot_purpose from chatbot_selections
Should Be:
chatbot_selection = chatbot_collection.find_one({
"user_id": user_id,
"project_id": project_id
})
purpose_map = {
"Sales Bot": "Sales-Agent",
"Service Bot": "Service-Agent",
"Custom Bot": "Informational-Agent"
}
purpose = chatbot_selection.get("chatbot_purpose", "Sales Bot")
system_prompt = system_prompts.get(purpose_map.get(purpose, "Sales-Agent"))
Performance¶
Response Time Breakdown¶
| Step | Avg Latency | Notes |
|---|---|---|
| 1. Embedding generation | 50-100ms | BAAI/bge-small-en-v1.5 on CPU |
| 2. UTM config matching | 10-30ms | MongoDB query + scoring |
| 3. Milvus search | 50-150ms | Depends on corpus size |
| 4. MongoDB history lookup | 20-50ms | Simple find query |
| 5. Azure OpenAI call | 1-3 seconds | GPT-3.5-Turbo-16k |
| 6. Token counting | 10-20ms | Tiktoken encoding |
| 7. MongoDB history save | 20-50ms | Update/insert operation |
| TOTAL | 2-5 seconds | User-facing latency |
Optimization Opportunities:
- ✅ Parallel Milvus + history retrieval (async; see the sketch below)
- ✅ Cache common question embeddings
- ✅ Stream OpenAI responses (llm.stream())
- ✅ Redis cache for recent Q&A pairs
- ✅ Batch token counting
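For the first item, a minimal sketch of running the Milvus retrieval and the MongoDB history lookup concurrently with asyncio.to_thread (function names from this doc; the wrapper itself is hypothetical):
import asyncio

async def gather_context(user_id, project_id, session_id, question):
    """Run vector retrieval and the history lookup in parallel threads."""
    results, chat_session = await asyncio.gather(
        asyncio.to_thread(retrieve_relevant_documents, user_id, project_id, question),
        asyncio.to_thread(history_collection.find_one, {
            "user_id": user_id, "project_id": project_id, "session_id": session_id,
        }),
    )
    return results, chat_session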
Deployment¶
Docker Configuration¶
Dockerfile:
FROM python:3.9-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
build-essential \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy shared modules
COPY shared/ ./shared/
COPY src/ .
EXPOSE 8012
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8012"]
Requirements.txt¶
fastapi>=0.95.0
uvicorn[standard]>=0.22.0
pymongo>=4.3.3
python-multipart>=0.0.6
python-dotenv>=1.0.0
# LangChain & OpenAI
langchain>=0.1.0
langchain-openai>=0.0.5
langchain-core>=0.1.0
openai>=1.0.0
# Embeddings
fastembed>=0.1.0
# Utilities
tiktoken>=0.5.0
pytz>=2023.3
scikit-learn>=1.3.0
numpy>=1.24.0
# Monitoring
ddtrace>=1.19.0
Environment Variables¶
# Azure OpenAI
AZURE_OPENAI_API_KEY=<your-key>
ENDPOINT_URL=https://machineagentopenai.openai.azure.com/...
DEPLOYMENT_NAME=gpt-35-turbo-16k-0613
# MongoDB
MONGO_URI=mongodb://...
MONGO_DB_NAME=Machine_agent_demo
# Milvus
MILVUS_HOST=localhost
MILVUS_PORT=19530
# DataDog
DD_SERVICE=response-text-chatbot-service
DD_ENV=production
Related Documentation¶
- Response 3D Chatbot Service - 3D chatbot responses (MOST CRITICAL)
- Response Voice Chatbot Service - Voice chatbot responses
- Data Crawling Service - Creates embeddings for RAG
Recommendations¶
Improvements¶
- Fix System Prompt Selection - Use chatbot_purpose from database
- Add Response Caching - Redis with hash of (user+project+question)
- Stream OpenAI Responses - Better UX with llm.stream()
- Add Fallback - If Milvus fails, use OpenAI without RAG
- Optimize Token Usage - Truncate context if it exceeds 8,000 tokens (see the sketch after this list)
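For the token-usage item, a minimal sketch built on the count_tokens() helper documented above (the 8,000-token budget is the recommendation's number, not a value from the code):
MAX_CONTEXT_TOKENS = 8000

def truncate_context(chunks: list[str]) -> str:
    """Keep the highest-ranked chunks until the token budget is spent."""
    kept, used = [], 0
    for chunk in chunks:
        tokens = count_tokens(chunk)
        if used + tokens > MAX_CONTEXT_TOKENS:
            break
        kept.append(chunk)
        used += tokens
    return "\n".join(kept)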
Code Quality¶
- Remove Duplicate LLM Init
- Add Type Hints - Complete typing for all functions
- Extract Helper Functions - Message construction, context building
- Add Unit Tests - Test UTM matching, token counting, message construction
Last Updated: 2025-12-26
Code Version: response-text-chatbot-service/src/main.py (726 lines)
Total Endpoints: 2
Review Cycle: Monthly (Important Service)
"Intelligent conversations through retrieval-augmented generation."