Response Text Chatbot Service - Complete Developer Documentation¶
Service: LLM Response Generation for Text Chatbots (RAG Implementation)
Port: 8012
Purpose: Generate AI responses using retrieval-augmented generation (RAG) with Milvus vector search
Technology: FastAPI, LangChain, Azure OpenAI, Milvus, FastEmbed
Code Location: /response-text-chatbot-service/src/main.py (726 lines)
Owner: Backend Team
Last Updated: 2025-12-26
Table of Contents¶
- Service Overview
- Complete Architecture
- Complete Endpoints
- RAG Implementation
- UTM-Based Targeting
- Conversation History
- System Prompts
- Security Analysis
- Performance
- Deployment
Service Overview¶
The Response Text Chatbot Service implements a production-grade Retrieval-Augmented Generation (RAG) system for text-based chatbot interactions. This service is one of three response generation services, handling text-only chatbot conversations.
Key Responsibilities¶
✅ RAG Pipeline - Semantic search + LLM generation
✅ Milvus Vector Search - Find relevant content chunks
✅ UTM-Based Targeting - Personalized responses by traffic source
✅ Conversation History - Multi-turn conversations with context
✅ Token Tracking - Monitor API usage per session
✅ LLM Orchestration - Azure OpenAI GPT-3.5-Turbo-16k via LangChain
Technology Stack¶
| Technology | Purpose | Specifications |
|---|---|---|
| Azure OpenAI | LLM | GPT-3.5-Turbo-16k (API version 2024-02-15) |
| Milvus | Vector database | Cosine similarity search |
| FastEmbed | Embeddings | BAAI/bge-small-en-v1.5 (384D) |
| LangChain | LLM framework | Messages, chains, parsers |
| MongoDB | Data storage | chatbot_history1, files |
| Tiktoken | Token counting | cl100k_base encoding |
Statistics¶
- Total Lines: 726
- Endpoints: 2 (main RAG + simple chain)
- System Prompts: 3 (Sales-Agent, Service-Agent, Informational-Agent)
- Default Top-K: 5 chunks
- Average Response Time: 2-5 seconds
- Token Limit: 16,000 tokens
Complete Architecture¶
End-to-End Data Flow¶
graph TB
USER[\"User Question\"]
subgraph \"Step 1: Context Retrieval\"
EMBED[\"Generate Embedding<br/>(BAAI/bge-small-en-v1.5)\"]
UTM[\"Match UTM Config<br/>(Scoring Algorithm)\"]
MILVUS[\"Milvus Search<br/>(Top-5 Chunks)\"]
UTM_CHUNKS[\"Retrieve UTM<br/>Chunks\"]
end
subgraph \"Step 2: History Retrieval\"
MONGO[\"MongoDB Lookup<br/>(chatbot_history1)\"]
PARSE[\"Parse Chat<br/>History\"]
end
subgraph \"Step 3: Prompt Assembly\"
SYS_PROMPT[\"System Prompt<br/>(Sales/Service/Info)\"]
UTM_INST[\"UTM Instructions<br/>(Append)\"]
CONTEXT[\"Build Context<br/>(General + UTM)\"]
MESSAGES[\"LangChain Messages<br/>(System + History + User)\"]
end
subgraph \"Step 4: LLM Generation\"
OPENAI[\"Azure OpenAI<br/>GPT-3.5-Turbo-16k\"]
RESPONSE[\"AI Response\"]
end
subgraph \"Step 5: Post-Processing\"
TOKENS[\"Count Tokens<br/>(Tiktoken)\"]
SAVE[\"Save to MongoDB<br/>(chatbot_history1)\"]
end
USER --> EMBED
USER --> UTM
EMBED --> MILVUS
UTM --> UTM_CHUNKS
MILVUS --> CONTEXT
UTM_CHUNKS --> UTM_INST
USER --> MONGO
MONGO --> PARSE
SYS_PROMPT --> MESSAGES
UTM_INST --> MESSAGES
CONTEXT --> MESSAGES
PARSE --> MESSAGES
MESSAGES --> OPENAI
OPENAI --> RESPONSE
RESPONSE --> TOKENS
TOKENS --> SAVE
SAVE --> USER
style USER fill:#e1f5fe
style OPENAI fill:#fff3e0
style MILVUS fill:#f3e5f5
style SAVE fill:#c8e6c9
Complete Endpoints¶
1. POST /v2/get-response-text-chatbot¶
Purpose: Generate AI response using full RAG pipeline with conversation history and UTM targeting
Code Location: Lines 560-677 (118 lines)
Request:
POST /v2/get-response-text-chatbot
Content-Type: multipart/form-data
user_id=User-123456
project_id=User-123456_Project_1
session_id=session_20250115_140530
question=What are your pricing options?
originating_url=https://example.com/pricing?utm_source=google&utm_medium=cpc
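For quick local testing, here is a minimal client sketch using Python's requests library (the localhost:8012 base URL follows the deployment section; FastAPI Form endpoints accept standard form-encoded POSTs):
import requests

resp = requests.post(
    "http://localhost:8012/v2/get-response-text-chatbot",
    data={
        "user_id": "User-123456",
        "project_id": "User-123456_Project_1",
        "session_id": "session_20250115_140530",
        "question": "What are your pricing options?",
        "originating_url": "https://example.com/pricing?utm_source=google&utm_medium=cpc",
    },
)
print(resp.json()["text"])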
Parameters:
| Parameter | Required | Description |
|---|---|---|
| user_id | Yes | User identifier |
| project_id | Yes | Project identifier |
| session_id | Yes | Session identifier for history tracking |
| question | Yes | User's question |
| originating_url | No | URL with UTM parameters for targeting |
Response:
{
"question": "What are your pricing options?",
"text": "We offer three pricing tiers: Basic ($29/month), Pro ($99/month), and Enterprise (custom pricing). Each tier includes different features and chatbot limits. Would you like me to explain the differences in detail?"
}
Processing Steps¶
Step 1: System Prompt Selection (Line 572)
⚠️ Note: Currently hardcoded to "Sales-Agent" - doesn't use chatbot_purpose from database
Step 2: UTM Config Matching (Lines 574-603)
if originating_url:
utm_config = get_matching_utm_config(originating_url, user_id, project_id)
if utm_config:
# Retrieve UTM-specific chunks from Milvus
utm_chunks = retrieve_utm_content(
utm_config_id=str(utm_config.get("_id")),
user_id=user_id,
project_id=project_id,
question=question,
top_k=5
)
utm_content = "\n".join([chunk["content"] for chunk in utm_chunks])
# Append UTM-specific instructions to system prompt
utm_instructions = utm_config.get("instructions", "")
if utm_instructions:
system_prompt = system_prompt + f"\n\n[UTM-Specific Instructions]:\n{utm_instructions}"
Step 3: General RAG Retrieval (Line 606)
results = retrieve_relevant_documents(user_id, project_id, question)
context = "\n".join([result["content"] for result in results[:5]])
Step 4: Chat History Retrieval (Lines 617-633)
chat_session = history_collection.find_one({
"project_id": project_id,
"user_id": user_id,
"session_id": session_id
})
chat_history_text = ""
if chat_session:
for msg in chat_session.get("chat_data", []):
user_input = msg.get("input_prompt", "").strip()
bot_response = msg.get("output_response", "").strip()
if user_input and bot_response:
chat_history_text += f"User: {user_input}\nAssistant: {bot_response}\n"
Step 5: LangChain Message Construction (Lines 635-661)
messages = []
# System message
messages.append(SystemMessage(content=system_prompt))
# Chat history messages
if chat_history_text:
    history_lines = chat_history_text.strip().split('\n')
    i = 0
    while i < len(history_lines) - 1:
        if history_lines[i].startswith("User: ") and history_lines[i + 1].startswith("Assistant: "):
            messages.append(HumanMessage(content=history_lines[i].replace("User: ", "", 1)))
            messages.append(AIMessage(content=history_lines[i + 1].replace("Assistant: ", "", 1)))
            i += 2
        else:
            i += 1  # advance past malformed lines; without this the loop never terminates
# Current question with context
context_parts = []
if utm_content:
context_parts.append(f"[UTM-Specific Content]:\n{utm_content}")
context_parts.append(f"Context:\n{context}")
context_and_question = "\n\n".join(context_parts) + f"\n\nQuestion: {question}"
messages.append(HumanMessage(content=context_and_question))
Step 6: LLM Invocation (Lines 663-665)
Step 7: Save to History (Line 668)
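Neither step's code is excerpted above, so here is a minimal sketch of both, assuming llm is the AzureChatOpenAI instance created at startup and that save_chat_history() takes the fields described under Conversation History (argument order is an assumption):
# Step 6 (sketch): invoke the chat model with the assembled messages
response = llm.invoke(messages)   # LangChain chat models return an AIMessage
answer_text = response.content

# Step 7 (sketch): persist the turn; token counts are computed inside save_chat_history()
save_chat_history(user_id, project_id, session_id, question, answer_text)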
2. POST /v2/get-response-chain¶
Purpose: Simplified response generation using LangChain chains (no history, no UTM)
Code Location: Lines 681-721
Differences from Main Endpoint:
| Feature | Main Endpoint | Chain Endpoint |
|---|---|---|
| UTM Targeting | ✅ Yes | ❌ No |
| Chat History | ✅ Yes | ❌ No |
| RAG Retrieval | ✅ Yes | ✅ Yes |
| Use Case | Production | Testing/Simple Q&A |
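The chain endpoint's code is not excerpted here; the sketch below shows a LangChain pipe-style chain matching the described behavior (the prompt wording is an assumption; context comes from the same RAG retrieval as the main endpoint):
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "Context:\n{context}\n\nQuestion: {question}"),
])
# No history and no UTM handling - just retrieval, prompt, LLM, string output
chain = prompt | llm | StrOutputParser()
answer = chain.invoke({"context": context, "question": question})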
RAG Implementation¶
Retrieval Function¶
Function: retrieve_relevant_documents() (Lines 516-555)
Purpose: Find top-5 most relevant content chunks from Milvus
Complete Flow:
Step 1: Generate Query Embedding
question_embedding = list(embedder.embed([question]))[0]
question_embedding_list = [float(x) for x in question_embedding]
Embedding Model:
- Name: BAAI/bge-small-en-v1.5
- Dimensions: 384
- Max Length: 512 tokens
- Same model as Data Crawling Service
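The embedder object used above is presumably created once at startup; a minimal sketch with FastEmbed's TextEmbedding class (initialization details are assumptions, but the model name must match what the Data Crawling Service used at indexing time):
from fastembed import TextEmbedding

embedder = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")  # 384-dim output
vector = list(embedder.embed(["What are your pricing options?"]))[0]
print(len(vector))  # 384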
Step 2: Search Milvus
search_results = milvus_embeddings.search_embeddings(
collection_name="embeddings",
query_vector=question_embedding_list,
user_id=user_id,
project_id=project_id,
top_k=5
)
Milvus Search:
- Collection: embeddings
- Partition: {project_id}
- Metric: Cosine similarity
- Filter: user_id + project_id
- Returns: Top-5 chunks with scores
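search_embeddings() is a shared wrapper whose source is not shown here; a plausible sketch of the underlying pymilvus call (the "embedding" field name and output fields are assumptions, and a Milvus connection is assumed to already exist):
from pymilvus import Collection

collection = Collection("embeddings")
search_results = collection.search(
    data=[query_vector],
    anns_field="embedding",            # assumed vector field name
    param={"metric_type": "COSINE"},
    limit=5,
    expr=f'user_id == "{user_id}" && project_id == "{project_id}"',
    partition_names=[project_id],
    output_fields=["text", "document_id", "chunk_index", "data_type"],
)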
Step 3: Format Results
top_chunks = []
for result in search_results:
top_chunks.append({
"content": result.get("text", ""),
"similarity": result.get("score", 0.0), # 0.0-1.0
"document_id": result.get("document_id", ""),
"chunk_index": result.get("chunk_index", 0),
"data_type": result.get("data_type", "") # "url", "pdf", "utm"
})
Example Search Result:
[
{
"content": "Our pricing starts at $29/month for the Basic plan, which includes 1 chatbot...",
"similarity": 0.87,
"document_id": "a1b2c3d4-e5f6-...",
"chunk_index": 3,
"data_type": "url"
},
{
"content": "The Pro plan at $99/month offers 5 chatbots, advanced analytics...",
"similarity": 0.82,
"document_id": "a1b2c3d4-e5f6-...",
"chunk_index": 7,
"data_type": "url"
},
...
]
UTM-Based Targeting¶
What is UTM Targeting?¶
UTM targeting enables personalized chatbot responses based on where users come from. Different traffic sources can have:
- Different content chunks
- Different instructions
- Different tone/messaging
Use Cases:
- Google Ads users → Focus on pricing and ROI
- Facebook users → Social proof and testimonials
- Product page visitors → Feature comparisons
- Pricing page visitors → Discount offers
UTM Config Structure¶
Collection: files (file_type="utm")
Document Example:
{
"_id": ObjectId("..."),
"user_id": "User-123456",
"project_id": "User-123456_Project_1",
"file_type": "utm",
"filename": "google_ads_pricing.json",
// Target URL (optional)
"target_url": "https://example.com/pricing",
// UTM parameters (optional)
"utm_config": {
"source": "google",
"medium": "cpc",
"campaign": "pricing_2025",
"content": "",
"term": ""
},
// Custom instructions to append to system prompt
"instructions": "Focus heavily on ROI and cost savings. User is price-sensitive. Emphasize value proposition.",
// Milvus embedding IDs for UTM-specific content
"milvus_embedding_ids": [123456, 123457, 123458],
"timestamp": ISODate("2025-01-15T14:00:00Z")
}
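Before scoring, the service presumably loads every UTM config for the chatbot with a filtered query; a hedged sketch (the files_collection handle name is an assumption):
# Fetch candidate configs, score each, keep the best positive match
candidates = files_collection.find({
    "user_id": user_id,
    "project_id": project_id,
    "file_type": "utm",
})
scored = [(calculate_match_score(originating_url, cfg), cfg) for cfg in candidates]
best_score, best_config = max(scored, key=lambda t: t[0], default=(-1, None))
utm_config = best_config if best_score > 0 else None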
Matching Algorithm¶
Function: get_matching_utm_config() (Lines 401-462)
Scoring System:
| Match Type | Points | Description |
|---|---|---|
| Target URL match | +10 | Originating URL starts with config target_url |
| Each UTM param match | +2 | utm_source, utm_medium, utm_campaign, etc. |
| Target URL only | +5 | Config has URL but no UTM params (lower priority) |
Matching Logic:
def calculate_match_score(originating_url: str, config: Dict[str, Any]) -> float:
    score = 0.0
    # Extract components
    url_params = extract_utm_parameters(originating_url)
    config_params = config.get('utm_config', {})
    config_target_url = config.get('target_url', '').strip()
    # Check Target URL match (a URL-only config scores +5, one with UTM params +10, per the table above)
    if config_target_url:
        originating_base = extract_base_url(originating_url)
        if originating_base and originating_base.startswith(config_target_url.rstrip('/')):
            score += 10 if config_params else 5
    # Check UTM parameters
    if config_params and url_params:
        for key in ['source', 'medium', 'campaign', 'content', 'term']:
            config_value = config_params.get(key)
            if config_value:
                url_key = f'utm_{key}'
                if url_params.get(url_key) == config_value:
                    score += 2
                else:
                    # Mismatch - this config doesn't match
                    return -1
    return score if score > 0 else -1
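extract_utm_parameters() and extract_base_url() are not excerpted; a minimal sketch of what they likely do, using only the standard library:
from urllib.parse import parse_qs, urlparse

def extract_utm_parameters(url: str) -> dict:
    """Return utm_* query parameters as single string values."""
    query = parse_qs(urlparse(url).query)
    return {k: v[0] for k, v in query.items() if k.startswith("utm_")}

def extract_base_url(url: str) -> str:
    """Strip query string and fragment, keeping scheme://host/path."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}".rstrip("/")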
Example Matching:
Request URL: https://example.com/pricing?utm_source=google&utm_medium=cpc
Configs:
Config A:
{
"target_url": "https://example.com/pricing",
"utm_config": {"source": "google", "medium": "cpc"}
}
Score: 10 (URL) + 2 (source) + 2 (medium) = 14 points ← Winner!
Config B (target URL only, no UTM params):
Score: 5 (URL only) = 5 points
Config C (non-matching target URL):
Score: -1 (URL doesn't match) = No match
UTM Content Retrieval¶
Function: retrieve_utm_content() (Lines 465-513)
Purpose: Retrieve chunks ONLY from UTM-specific content
Code:
# Get milvus_embedding_ids from UTM config
milvus_ids = utm_config.get("milvus_embedding_ids", [])
# Search only within these specific IDs
search_results = milvus_embeddings.search_embeddings(
collection_name="embeddings",
query_vector=question_embedding_list,
user_id=user_id,
project_id=project_id,
top_k=5,
milvus_ids=milvus_ids # ⭐ Filter to these IDs only
)
Benefit: Ensures responses use campaign-specific content!
Conversation History¶
History Storage¶
Collection: chatbot_history1
Document Structure:
{
"user_id": "User-123456",
"project_id": "User-123456_Project_1",
"session_id": "session_20250115_140530",
"datetime": "2025-01-15 14:05:30", // IST timezone
"session_total_tokens": 1523, // Cumulative for entire session
"chat_data": [
{
"input_prompt": "What are your pricing options?",
"output_response": "We offer three pricing tiers...",
"timestamp": "2025-01-15 14:05:30",
"input_tokens": 8,
"output_tokens": 52,
"total_tokens": 60
},
{
"input_prompt": "Tell me about the Pro plan",
"output_response": "The Pro plan at $99/month includes...",
"timestamp": "2025-01-15 14:05:45",
"input_tokens": 7,
"output_tokens": 48,
"total_tokens": 55
}
]
}
Token Counting¶
Function: count_tokens() (Lines 160-162)
Uses Tiktoken:
import tiktoken

tokenizer = tiktoken.get_encoding("cl100k_base")  # GPT-3.5/4 encoding

def count_tokens(text):
    """Counts tokens using OpenAI's tokenizer"""
    return len(tokenizer.encode(text))
Why Token Counting Matters:
- Cost tracking - OpenAI charges per token
- Context limits - GPT-3.5-Turbo-16k has 16,000 token limit
- Performance monitoring - Large prompts = slow responses
Save History Function¶
Function: save_chat_history() (Lines 164-208)
Features:
- ✅ Computes token counts (input, output, total)
- ✅ Uses IST timezone
- ✅ Appends to existing session or creates new
- ✅ Increments session_total_tokens
Code:
# Compute tokens
input_tokens = count_tokens(input_prompt)
output_tokens = count_tokens(output_response)
total_tokens = input_tokens + output_tokens
chat_entry = {
"input_prompt": input_prompt,
"output_response": output_response,
"timestamp": current_time.strftime("%Y-%m-%d %H:%M:%S"),
"input_tokens": input_tokens,
"output_tokens": output_tokens,
"total_tokens": total_tokens
}
# Update or insert
if existing_session:
history_collection.update_one(
{"user_id": user_id, "project_id": project_id, "session_id": session_id},
{
"$push": {"chat_data": chat_entry},
"$set": {"datetime": current_time},
"$inc": {"session_total_tokens": total_tokens} # Cumulative!
}
)
else:
history_collection.insert_one({
"user_id": user_id,
"project_id": project_id,
"session_id": session_id,
"datetime": current_time,
"session_total_tokens": total_tokens,
"chat_data": [chat_entry]
})
System Prompts¶
Three Agent Types¶
Defined at: Lines 57-155
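The prompts are held in a dict keyed by agent name (the system_prompts lookup appears in the Security Analysis section); a sketch of the shape, with prompt text abridged:
system_prompts = {
    "Sales-Agent": "You are a highly skilled AI sales agent...",        # abridged
    "Service-Agent": "You are a specialized service agent...",          # abridged
    "Informational-Agent": "You are an informational agent...",         # abridged
}
system_prompt = system_prompts["Sales-Agent"]  # currently hardcoded (see Security Analysis)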
1. Sales-Agent¶
Character: Highly skilled AI sales agent
Goal: Drive sales conversions
Tone: Warm, friendly, persuasive
Key Guidelines:
- Keep conversation flowing with questions
- Responses: 100-150 characters (concise!)
- Refer to old chat history
- Collect customer info via form
- Add open-ended questions for engagement
Example Response:
"We offer Basic ($29), Pro ($99), and Enterprise plans. Each includes different chatbot limits and features. Which tier interests you most?"
2. Service-Agent¶
Character: Specialized service agent
Goal: Efficient troubleshooting and support
Tone: Professional, empathetic, builds trust
Response Structure:
- Acknowledgment (1 sentence)
- Solution (step-by-step when applicable)
- Additional resources
- Call-to-action or follow-up
Guidelines:
- Responses: 170-200 characters
- Step-by-step format for solutions
- Reference official docs when helpful
- Politely redirect unrelated queries
Example Response:
"I can help with that! Navigate to Settings > Chatbot Configuration > Voice Settings. Select your preferred voice from the dropdown and click Save. Need help with anything else?"
3. Informational-Agent¶
Character: Informational agent
Goal: Provide accurate, concise information
Tone: Professional yet conversational
Response Structure:
- Direct answer (1-2 sentences)
- Concise explanation (200-300 characters)
- Next step or call-to-action
Guidelines:
- Include key features and pricing
- Clear policy summaries (no legal jargon)
- Never fabricate information
- Balanced comparisons
Example Response:
"Our chatbots support 10 languages including English, Spanish, French, German, and more. You can set the default language in your dashboard under Settings > Language. Want to see the full list?"
Security Analysis¶
Issues Found¶
1. ⚠️ Hardcoded Azure OpenAI API Key (Line 217)
subscription_key = os.getenv("AZURE_OPENAI_API_KEY",
    "AZxD...")  # full key redacted here; it is hardcoded in source
Issue: Hardcoded default API key in source code
2. ⚠️ Duplicate LLM Initialization (Lines 260-266 & 271-277)
Issue: LLM initialized twice with identical config
Impact: Wastes memory (minor)
3. ⚠️ Hardcoded System Prompt Selection (Line 572)
Issue: Doesn't use chatbot_purpose from chatbot_selections
Should Be:
chatbot_selection = chatbot_collection.find_one({
"user_id": user_id,
"project_id": project_id
})
purpose_map = {
"Sales Bot": "Sales-Agent",
"Service Bot": "Service-Agent",
"Custom Bot": "Informational-Agent"
}
purpose = chatbot_selection.get("chatbot_purpose", "Sales Bot")
system_prompt = system_prompts.get(purpose_map.get(purpose, "Sales-Agent"))
Performance¶
Response Time Breakdown¶
| Step | Avg Latency | Notes |
|---|---|---|
| 1. Embedding generation | 50-100ms | BAAI/bge-small-en-v1.5 on CPU |
| 2. UTM config matching | 10-30ms | MongoDB query + scoring |
| 3. Milvus search | 50-150ms | Depends on corpus size |
| 4. MongoDB history lookup | 20-50ms | Simple find query |
| 5. Azure OpenAI call | 1-3 seconds | GPT-3.5-Turbo-16k |
| 6. Token counting | 10-20ms | Tiktoken encoding |
| 7. MongoDB history save | 20-50ms | Update/insert operation |
| TOTAL | 2-5 seconds | User-facing latency |
Optimization Opportunities:
- ✅ Parallel Milvus + history retrieval (async; see the sketch below)
- ✅ Cache common question embeddings
- ✅ Stream OpenAI responses (llm.stream())
- ✅ Redis cache for recent Q&A pairs
- ✅ Batch token counting
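For the first item, a minimal sketch of running the Milvus retrieval and the MongoDB history lookup concurrently with asyncio.to_thread (function names from this doc; the wrapper itself is hypothetical):
import asyncio

async def gather_context(user_id, project_id, session_id, question):
    """Run vector retrieval and the history lookup in parallel threads."""
    results, chat_session = await asyncio.gather(
        asyncio.to_thread(retrieve_relevant_documents, user_id, project_id, question),
        asyncio.to_thread(history_collection.find_one, {
            "user_id": user_id, "project_id": project_id, "session_id": session_id,
        }),
    )
    return results, chat_session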
Deployment¶
Docker Configuration¶
Dockerfile:
FROM python:3.9-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
build-essential \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy shared modules
COPY shared/ ./shared/
COPY src/ .
EXPOSE 8012
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8012"]
Requirements.txt¶
fastapi>=0.95.0
uvicorn[standard]>=0.22.0
pymongo>=4.3.3
python-multipart>=0.0.6
python-dotenv>=1.0.0
# LangChain & OpenAI
langchain>=0.1.0
langchain-openai>=0.0.5
langchain-core>=0.1.0
openai>=1.0.0
# Embeddings
fastembed>=0.1.0
# Utilities
tiktoken>=0.5.0
pytz>=2023.3
scikit-learn>=1.3.0
numpy>=1.24.0
# Monitoring
ddtrace>=1.19.0
Environment Variables¶
# Azure OpenAI
AZURE_OPENAI_API_KEY=<your-key>
ENDPOINT_URL=https://machineagentopenai.openai.azure.com/...
DEPLOYMENT_NAME=gpt-35-turbo-16k-0613
# MongoDB
MONGO_URI=mongodb://...
MONGO_DB_NAME=Machine_agent_demo
# Milvus
MILVUS_HOST=localhost
MILVUS_PORT=19530
# DataDog
DD_SERVICE=response-text-chatbot-service
DD_ENV=production
Related Documentation¶
- Response 3D Chatbot Service - 3D chatbot responses (MOST CRITICAL)
- Response Voice Chatbot Service - Voice chatbot responses
- Data Crawling Service - Creates embeddings for RAG
Recommendations¶
Improvements¶
- Fix System Prompt Selection - Use chatbot_purpose from database
- Add Response Caching - Redis with hash of (user+project+question)
- Stream OpenAI Responses - Better UX with llm.stream()
- Add Fallback - If Milvus fails, use OpenAI without RAG
- Optimize Token Usage - Truncate context if it exceeds 8,000 tokens (see the sketch after this list)
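For the token-usage item, a minimal sketch built on the count_tokens() helper documented above (the 8,000-token budget is the recommendation's number, not a value from the code):
MAX_CONTEXT_TOKENS = 8000

def truncate_context(chunks: list[str]) -> str:
    """Keep the highest-ranked chunks until the token budget is spent."""
    kept, used = [], 0
    for chunk in chunks:
        tokens = count_tokens(chunk)
        if used + tokens > MAX_CONTEXT_TOKENS:
            break
        kept.append(chunk)
        used += tokens
    return "\n".join(kept)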
Code Quality¶
- Remove Duplicate LLM Init
- Add Type Hints - Complete typing for all functions
- Extract Helper Functions - Message construction, context building
- Add Unit Tests - Test UTM matching, token counting, message construction
Last Updated: 2025-12-26
Code Version: response-text-chatbot-service/src/main.py (726 lines)
Total Endpoints: 2
Review Cycle: Monthly (Important Service)
"Intelligent conversations through retrieval-augmented generation."