Response Text Chatbot Service - Complete Developer Documentation

Service: LLM Response Generation for Text Chatbots (RAG Implementation)
Port: 8012
Purpose: Generate AI responses using retrieval-augmented generation (RAG) with Milvus vector search
Technology: FastAPI, LangChain, Azure OpenAI, Milvus, FastEmbed
Code Location: /response-text-chatbot-service/src/main.py (726 lines)
Owner: Backend Team
Last Updated: 2025-12-26


Table of Contents

  1. Service Overview
  2. Complete Architecture
  3. Complete Endpoints
  4. RAG Implementation
  5. UTM-Based Targeting
  6. Conversation History
  7. System Prompts
  8. Security Analysis
  9. Performance
  10. Deployment

Service Overview

The Response Text Chatbot Service implements a production-grade Retrieval-Augmented Generation (RAG) system for text-based chatbot interactions. This service is one of three response generation services, handling text-only chatbot conversations.

Key Responsibilities

RAG Pipeline - Semantic search + LLM generation
Milvus Vector Search - Find relevant content chunks
UTM-Based Targeting - Personalized responses by traffic source
Conversation History - Multi-turn conversations with context
Token Tracking - Monitor API usage per session
LLM Orchestration - Azure OpenAI GPT-3.5-Turbo-16k via LangChain

Technology Stack

| Technology | Purpose | Specifications |
|---|---|---|
| Azure OpenAI | LLM | GPT-3.5-Turbo-16k (API 2024-02-15) |
| Milvus | Vector database | Cosine similarity search |
| FastEmbed | Embeddings | BAAI/bge-small-en-v1.5 (384D) |
| LangChain | LLM framework | Messages, chains, parsers |
| MongoDB | Data storage | chatbot_history1, files |
| Tiktoken | Token counting | cl100k_base encoding |

Statistics

  • Total Lines: 726
  • Endpoints: 2 (main RAG + simple chain)
  • System Prompts: 3 (Sales-Agent, Service-Agent, Informational-Agent)
  • Default Top-K: 5 chunks
  • Average Response Time: 2-5 seconds
  • Context Window: 16,385 tokens (GPT-3.5-Turbo-16k)

Complete Architecture

End-to-End Data Flow

graph TB
    USER["User Question"]

    subgraph "Step 1: Context Retrieval"
        EMBED["Generate Embedding<br/>(BAAI/bge-small-en-v1.5)"]
        UTM["Match UTM Config<br/>(Scoring Algorithm)"]
        MILVUS["Milvus Search<br/>(Top-5 Chunks)"]
        UTM_CHUNKS["Retrieve UTM<br/>Chunks"]
    end

    subgraph "Step 2: History Retrieval"
        MONGO["MongoDB Lookup<br/>(chatbot_history1)"]
        PARSE["Parse Chat<br/>History"]
    end

    subgraph "Step 3: Prompt Assembly"
        SYS_PROMPT["System Prompt<br/>(Sales/Service/Info)"]
        UTM_INST["UTM Instructions<br/>(Append)"]
        CONTEXT["Build Context<br/>(General + UTM)"]
        MESSAGES["LangChain Messages<br/>(System + History + User)"]
    end

    subgraph "Step 4: LLM Generation"
        OPENAI["Azure OpenAI<br/>GPT-3.5-Turbo-16k"]
        RESPONSE["AI Response"]
    end

    subgraph "Step 5: Post-Processing"
        TOKENS["Count Tokens<br/>(Tiktoken)"]
        SAVE["Save to MongoDB<br/>(chatbot_history1)"]
    end

    USER --> EMBED
    USER --> UTM

    EMBED --> MILVUS
    UTM --> UTM_CHUNKS

    MILVUS --> CONTEXT
    UTM_CHUNKS --> UTM_INST

    USER --> MONGO
    MONGO --> PARSE

    SYS_PROMPT --> MESSAGES
    UTM_INST --> MESSAGES
    CONTEXT --> MESSAGES
    PARSE --> MESSAGES

    MESSAGES --> OPENAI
    OPENAI --> RESPONSE

    RESPONSE --> TOKENS
    TOKENS --> SAVE

    SAVE --> USER

    style USER fill:#e1f5fe
    style OPENAI fill:#fff3e0
    style MILVUS fill:#f3e5f5
    style SAVE fill:#c8e6c9

Complete Endpoints

1. POST /v2/get-response-text-chatbot

Purpose: Generate AI response using full RAG pipeline with conversation history and UTM targeting

Code Location: Lines 560-677 (118 lines)

Request:

POST /v2/get-response-text-chatbot
Content-Type: multipart/form-data

user_id=User-123456
project_id=User-123456_Project_1
session_id=session_20250115_140530
question=What are your pricing options?
originating_url=https://example.com/pricing?utm_source=google&utm_medium=cpc

Parameters:

| Parameter | Required | Description |
|---|---|---|
| user_id | Yes | User identifier |
| project_id | Yes | Project identifier |
| session_id | Yes | Session identifier for history tracking |
| question | Yes | User's question |
| originating_url | Optional | URL with UTM parameters for targeting |

Response:

{
  "question": "What are your pricing options?",
  "text": "We offer three pricing tiers: Basic ($29/month), Pro ($99/month), and Enterprise (custom pricing). Each tier includes different features and chatbot limits. Would you like me to explain the differences in detail?"
}
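Calling the endpoint can be sketched as follows. This is a hypothetical client (host, port, and field values are illustrative); it sends URL-encoded form fields, which FastAPI's `Form(...)` parameters accept alongside multipart/form-data, so the standard library suffices:

```python
# Hypothetical client sketch - assumes the service is reachable at localhost:8012.
import json
import urllib.parse
import urllib.request

def ask_chatbot(question, user_id, project_id, session_id,
                originating_url=None, base_url="http://localhost:8012"):
    """POST a question to /v2/get-response-text-chatbot and return the answer text."""
    fields = {
        "user_id": user_id,
        "project_id": project_id,
        "session_id": session_id,
        "question": question,
    }
    if originating_url:  # optional: enables UTM-based targeting
        fields["originating_url"] = originating_url
    body = urllib.parse.urlencode(fields).encode()
    req = urllib.request.Request(
        f"{base_url}/v2/get-response-text-chatbot", data=body, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

For a quick smoke test, `ask_chatbot("What are your pricing options?", "User-123456", "User-123456_Project_1", "session_20250115_140530")` should return the `text` field of the JSON response shown above.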

Processing Steps

Step 1: System Prompt Selection (Line 572)

system_prompt = system_prompts["Sales-Agent"]  # Currently hardcoded

⚠️ Note: Currently hardcoded to "Sales-Agent" - doesn't use chatbot_purpose from database

Step 2: UTM Config Matching (Lines 574-603)

if originating_url:
    utm_config = get_matching_utm_config(originating_url, user_id, project_id)

    if utm_config:
        # Retrieve UTM-specific chunks from Milvus
        utm_chunks = retrieve_utm_content(
            utm_config_id=str(utm_config.get("_id")),
            user_id=user_id,
            project_id=project_id,
            question=question,
            top_k=5
        )

        utm_content = "\n".join([chunk["content"] for chunk in utm_chunks])

        # Append UTM-specific instructions to system prompt
        utm_instructions = utm_config.get("instructions", "")
        if utm_instructions:
            system_prompt = system_prompt + f"\n\n[UTM-Specific Instructions]:\n{utm_instructions}"

Step 3: General RAG Retrieval (Line 606)

results = retrieve_relevant_documents(user_id, project_id, question)
context = "\n".join([result["content"] for result in results[:5]])

Step 4: Chat History Retrieval (Lines 617-633)

chat_session = history_collection.find_one({
    "project_id": project_id,
    "user_id": user_id,
    "session_id": session_id
})

chat_history_text = ""
if chat_session:
    for msg in chat_session.get("chat_data", []):
        user_input = msg.get("input_prompt", "").strip()
        bot_response = msg.get("output_response", "").strip()
        if user_input and bot_response:
            chat_history_text += f"User: {user_input}\nAssistant: {bot_response}\n"

Step 5: LangChain Message Construction (Lines 635-661)

messages = []

# System message
messages.append(SystemMessage(content=system_prompt))

# Chat history messages
if chat_history_text:
    history_lines = chat_history_text.strip().split('\n')
i = 0
while i + 1 < len(history_lines):  # bounds check guards against odd line counts
    if history_lines[i].startswith("User: ") and history_lines[i + 1].startswith("Assistant: "):
        messages.append(HumanMessage(content=history_lines[i].replace("User: ", "")))
        messages.append(AIMessage(content=history_lines[i + 1].replace("Assistant: ", "")))
    i += 2

# Current question with context
context_parts = []
if utm_content:
    context_parts.append(f"[UTM-Specific Content]:\n{utm_content}")
context_parts.append(f"Context:\n{context}")

context_and_question = "\n\n".join(context_parts) + f"\n\nQuestion: {question}"
messages.append(HumanMessage(content=context_and_question))

Step 6: LLM Invocation (Lines 663-665)

response = llm.invoke(messages)
answer = response.content

Step 7: Save to History (Line 668)

save_chat_history(user_id, project_id, session_id, question, answer)

2. POST /v2/get-response-chain

Purpose: Simplified response generation using LangChain chains (no history, no UTM)

Code Location: Lines 681-721

Differences from Main Endpoint:

| Feature | Main Endpoint | Chain Endpoint |
|---|---|---|
| UTM Targeting | ✅ Yes | ❌ No |
| Chat History | ✅ Yes | ❌ No |
| RAG Retrieval | ✅ Yes | ✅ Yes |
| Use Case | Production | Testing/Simple Q&A |
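The chain endpoint's pipeline shape can be sketched as below. This is illustrative only: the real code (lines 681-721) composes the same steps with LangChain's `prompt | llm | StrOutputParser()` piping, while here a stub stands in for the LLM so the flow is visible without LangChain installed:

```python
# Illustrative pipeline: prompt assembly -> LLM -> parse, mirroring
# LangChain's pipe composition. fake_llm is a stand-in for llm.invoke().

def build_prompt(inputs):
    # Fill the RAG template with retrieved context and the user question.
    return (
        f"{inputs['system_prompt']}\n\n"
        f"Context:\n{inputs['context']}\n\n"
        f"Question: {inputs['question']}"
    )

def fake_llm(prompt):
    # Canned answer for illustration; the real endpoint calls Azure OpenAI.
    return "We offer Basic, Pro, and Enterprise plans."

def chain(inputs):
    # Equivalent of: (prompt | llm | StrOutputParser()).invoke(inputs)
    return fake_llm(build_prompt(inputs)).strip()

answer = chain({
    "system_prompt": "You are a helpful sales assistant.",
    "context": "Basic $29/month. Pro $99/month. Enterprise custom.",
    "question": "What are your pricing options?",
})
```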

RAG Implementation

Retrieval Function

Function: retrieve_relevant_documents() (Lines 516-555)

Purpose: Find top-5 most relevant content chunks from Milvus

Complete Flow:

Step 1: Generate Query Embedding

question_embedding = list(embedder.embed([question]))[0]
question_embedding_list = [float(x) for x in question_embedding]

Embedding Model:

  • Name: BAAI/bge-small-en-v1.5
  • Dimensions: 384
  • Max Length: 512 tokens
  • Same model as Data Crawling Service

Step 2: Search Milvus

search_results = milvus_embeddings.search_embeddings(
    collection_name="embeddings",
    query_vector=question_embedding_list,
    user_id=user_id,
    project_id=project_id,
    top_k=5
)

Milvus Search:

  • Collection: embeddings
  • Partition: {project_id}
  • Metric: Cosine similarity
  • Filter: user_id + project_id
  • Returns: Top-5 chunks with scores

Step 3: Format Results

top_chunks = []
for result in search_results:
    top_chunks.append({
        "content": result.get("text", ""),
        "similarity": result.get("score", 0.0),  # 0.0-1.0
        "document_id": result.get("document_id", ""),
        "chunk_index": result.get("chunk_index", 0),
        "data_type": result.get("data_type", "")  # "url", "pdf", "utm"
    })

Example Search Result:

[
    {
        "content": "Our pricing starts at $29/month for the Basic plan, which includes 1 chatbot...",
        "similarity": 0.87,
        "document_id": "a1b2c3d4-e5f6-...",
        "chunk_index": 3,
        "data_type": "url"
    },
    {
        "content": "The Pro plan at $99/month offers 5 chatbots, advanced analytics...",
        "similarity": 0.82,
        "document_id": "a1b2c3d4-e5f6-...",
        "chunk_index": 7,
        "data_type": "url"
    },
    ...
]

UTM-Based Targeting

What is UTM Targeting?

UTM targeting enables personalized chatbot responses based on where users come from. Different traffic sources can have:

  • Different content chunks
  • Different instructions
  • Different tone/messaging

Use Cases:

  • Google Ads users → Focus on pricing and ROI
  • Facebook users → Social proof and testimonials
  • Product page visitors → Feature comparisons
  • Pricing page visitors → Discount offers

UTM Config Structure

Collection: files (file_type="utm")

Document Example:

{
    "_id": ObjectId("..."),
    "user_id": "User-123456",
    "project_id": "User-123456_Project_1",
    "file_type": "utm",
    "filename": "google_ads_pricing.json",

    // Target URL (optional)
    "target_url": "https://example.com/pricing",

    // UTM parameters (optional)
    "utm_config": {
        "source": "google",
        "medium": "cpc",
        "campaign": "pricing_2025",
        "content": "",
        "term": ""
    },

    // Custom instructions to append to system prompt
    "instructions": "Focus heavily on ROI and cost savings. User is price-sensitive. Emphasize value proposition.",

    // Milvus embedding IDs for UTM-specific content
    "milvus_embedding_ids": [123456, 123457, 123458],

    "timestamp": ISODate("2025-01-15T14:00:00Z")
}

Matching Algorithm

Function: get_matching_utm_config() (Lines 401-462)

Scoring System:

| Match Type | Points | Description |
|---|---|---|
| Target URL match | +10 | Originating URL starts with config target_url |
| Each UTM param match | +2 | utm_source, utm_medium, utm_campaign, etc. |
| Target URL only | +5 | Config has URL but no UTM params (lower priority) |

Matching Logic:

def calculate_match_score(originating_url: str, config: Dict[str, Any]) -> float:
    score = 0.0

    # Extract components
    url_params = extract_utm_parameters(originating_url)
    config_params = config.get('utm_config', {})
    config_target_url = config.get('target_url', '').strip()

    # Check Target URL match
    if config_target_url:
        originating_base = extract_base_url(originating_url)
        if originating_base and originating_base.startswith(config_target_url.rstrip('/')):
            # URL-only configs (no UTM params) score lower, per the table above
            has_utm_params = any(
                config_params.get(k) for k in ['source', 'medium', 'campaign', 'content', 'term']
            )
            score += 10 if has_utm_params else 5

    # Check UTM parameters
    if config_params and url_params:
        for key in ['source', 'medium', 'campaign', 'content', 'term']:
            config_value = config_params.get(key)
            if config_value:
                url_key = f'utm_{key}'
                if url_params.get(url_key) == config_value:
                    score += 2
                else:
                    # Mismatch - this config doesn't match
                    return -1

    return score if score > 0 else -1
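The helpers `extract_utm_parameters()` and `extract_base_url()` referenced above are not shown in this document; a plausible sketch using only the standard library (hypothetical implementations, not the service's actual code) would be:

```python
# Hypothetical implementations of the URL helpers used by calculate_match_score().
from urllib.parse import urlparse, parse_qs

def extract_utm_parameters(url: str) -> dict:
    """Return only the utm_* query parameters, flattened to single values."""
    qs = parse_qs(urlparse(url).query)
    return {k: v[0] for k, v in qs.items() if k.startswith("utm_")}

def extract_base_url(url: str) -> str:
    """Strip the query string and fragment, keeping scheme + host + path."""
    p = urlparse(url)
    return f"{p.scheme}://{p.netloc}{p.path}"
```

With these, `extract_utm_parameters("https://example.com/pricing?utm_source=google&utm_medium=cpc")` yields `{"utm_source": "google", "utm_medium": "cpc"}`, which the scoring loop compares against each config's `utm_config` values.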

Example Matching:

Request URL:

https://example.com/pricing?utm_source=google&utm_medium=cpc&utm_campaign=spring_sale

Configs:

Config A:

{
    "target_url": "https://example.com/pricing",
    "utm_config": {"source": "google", "medium": "cpc"}
}

Score: 10 (URL) + 2 (source) + 2 (medium) = 14 points → Winner!

Config B:

{
    "target_url": "https://example.com/pricing",
    "utm_config": {}
}

Score: 5 (URL only) = 5 points

Config C:

{
    "target_url": "https://example.com/features",
    "utm_config": {"source": "google"}
}

Score: -1 (URL doesn't match) = No match


UTM Content Retrieval

Function: retrieve_utm_content() (Lines 465-513)

Purpose: Retrieve chunks ONLY from UTM-specific content

Code:

# Get milvus_embedding_ids from UTM config
milvus_ids = utm_config.get("milvus_embedding_ids", [])

# Search only within these specific IDs
search_results = milvus_embeddings.search_embeddings(
    collection_name="embeddings",
    query_vector=question_embedding_list,
    user_id=user_id,
    project_id=project_id,
    top_k=5,
    milvus_ids=milvus_ids  # ⭐ Filter to these IDs only
)

Benefit: Ensures responses use campaign-specific content!


Conversation History

History Storage

Collection: chatbot_history1

Document Structure:

{
    "user_id": "User-123456",
    "project_id": "User-123456_Project_1",
    "session_id": "session_20250115_140530",
    "datetime": "2025-01-15 14:05:30",  // IST timezone
    "session_total_tokens": 1523,  // Cumulative for entire session
    "chat_data": [
        {
            "input_prompt": "What are your pricing options?",
            "output_response": "We offer three pricing tiers...",
            "timestamp": "2025-01-15 14:05:30",
            "input_tokens": 8,
            "output_tokens": 52,
            "total_tokens": 60
        },
        {
            "input_prompt": "Tell me about the Pro plan",
            "output_response": "The Pro plan at $99/month includes...",
            "timestamp": "2025-01-15 14:05:45",
            "input_tokens": 7,
            "output_tokens": 48,
            "total_tokens": 55
        }
    ]
}

Token Counting

Function: count_tokens() (Lines 160-162)

Uses Tiktoken:

tokenizer = tiktoken.get_encoding("cl100k_base")  # GPT-3.5/4 encoding

def count_tokens(text):
    """Counts tokens using OpenAI's tokenizer"""
    return len(tokenizer.encode(text))

Why Token Counting Matters:

  • Cost tracking - OpenAI charges per token
  • Context limits - GPT-3.5-Turbo-16k has a 16,385-token context window
  • Performance monitoring - Large prompts = slow responses

Save History Function

Function: save_chat_history() (Lines 164-208)

Features:

  • ✅ Computes token counts (input, output, total)
  • ✅ Uses IST timezone
  • ✅ Appends to existing session or creates new
  • ✅ Increments session_total_tokens

Code:

# Compute tokens
input_tokens = count_tokens(input_prompt)
output_tokens = count_tokens(output_response)
total_tokens = input_tokens + output_tokens

chat_entry = {
    "input_prompt": input_prompt,
    "output_response": output_response,
    "timestamp": current_time.strftime("%Y-%m-%d %H:%M:%S"),
    "input_tokens": input_tokens,
    "output_tokens": output_tokens,
    "total_tokens": total_tokens
}

# Update or insert
if existing_session:
    history_collection.update_one(
        {"user_id": user_id, "project_id": project_id, "session_id": session_id},
        {
            "$push": {"chat_data": chat_entry},
            "$set": {"datetime": current_time},
            "$inc": {"session_total_tokens": total_tokens}  # Cumulative!
        }
    )
else:
    history_collection.insert_one({
        "user_id": user_id,
        "project_id": project_id,
        "session_id": session_id,
        "datetime": current_time,
        "session_total_tokens": total_tokens,
        "chat_data": [chat_entry]
    })

System Prompts

Three Agent Types

Defined at: Lines 57-155


1. Sales-Agent

Character: Highly skilled AI sales agent
Goal: Drive sales conversions
Tone: Warm, friendly, persuasive

Key Guidelines:

  • Keep conversation flowing with questions
  • Responses: 100-150 characters (concise!)
  • Refer to old chat history
  • Collect customer info via form
  • Add open-ended questions for engagement

Example Response:

"We offer Basic ($29), Pro ($99), and Enterprise plans. Each includes different chatbot limits and features. Which tier interests you most?"

2. Service-Agent

Character: Specialized service agent
Goal: Efficient troubleshooting and support
Tone: Professional, empathetic, builds trust

Response Structure:

  1. Acknowledgment (1 sentence)
  2. Solution (step-by-step when applicable)
  3. Additional resources
  4. Call-to-action or follow-up

Guidelines:

  • Responses: 170-200 characters
  • Step-by-step format for solutions
  • Reference official docs when helpful
  • Politely redirect unrelated queries

Example Response:

"I can help with that! Navigate to Settings > Chatbot Configuration > Voice Settings. Select your preferred voice from the dropdown and click Save. Need help with anything else?"

3. Informational-Agent

Character: Informational agent
Goal: Provide accurate, concise information
Tone: Professional yet conversational

Response Structure:

  1. Direct answer (1-2 sentences)
  2. Concise explanation (200-300 characters)
  3. Next step or call-to-action

Guidelines:

  • Include key features and pricing
  • Clear policy summaries (no legal jargon)
  • Never fabricate information
  • Balanced comparisons

Example Response:

"Our chatbots support 10 languages including English, Spanish, French, German, and more. You can set the default language in your dashboard under Settings > Language. Want to see the full list?"
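The shape of the `system_prompts` mapping these three agents live in (defined at lines 57-155) might look like the following. The prompt texts here are heavily abbreviated paraphrases of the guidelines above, not the actual prompts:

```python
# Hypothetical, abbreviated sketch of the system_prompts dict; the real
# prompts (lines 57-155 of main.py) are far longer and more detailed.
system_prompts = {
    "Sales-Agent": (
        "You are a highly skilled AI sales agent. Be warm, friendly, and "
        "persuasive. Keep responses to 100-150 characters and end with an "
        "open-ended question to keep the conversation flowing."
    ),
    "Service-Agent": (
        "You are a specialized service agent. Acknowledge the issue, give a "
        "step-by-step solution, point to resources, and close with a follow-up."
    ),
    "Informational-Agent": (
        "You are an informational agent. Answer directly in 1-2 sentences, "
        "explain concisely, and never fabricate information."
    ),
}

# Endpoint code then selects one, e.g.:
system_prompt = system_prompts["Sales-Agent"]
```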

Security Analysis

Issues Found

1. ⚠️ Hardcoded Azure OpenAI API Key (Line 217)

subscription_key = os.getenv("AZURE_OPENAI_API_KEY",
    "AZxDVMYB08AaUip0i5ed1sy73ZpUsqencYYxKDbm6nfWfG1AqPZ3JQQJ99BBACOGVUo7")

Issue: Hardcoded default API key in source code


2. ⚠️ Duplicate LLM Initialization (Lines 260-266 & 271-277)

Issue: LLM initialized twice with identical config

Impact: Wastes memory (minor)


3. ⚠️ Hardcoded System Prompt Selection (Line 572)

system_prompt = system_prompts["Sales-Agent"]  # Always Sales-Agent!

Issue: Doesn't use chatbot_purpose from chatbot_selections

Should Be:

chatbot_selection = chatbot_collection.find_one({
    "user_id": user_id,
    "project_id": project_id
})
purpose_map = {
    "Sales Bot": "Sales-Agent",
    "Service Bot": "Service-Agent",
    "Custom Bot": "Informational-Agent"
}
purpose = (chatbot_selection or {}).get("chatbot_purpose", "Sales Bot")  # guard against missing doc
system_prompt = system_prompts.get(purpose_map.get(purpose, "Sales-Agent"))

Performance

Response Time Breakdown

| Step | Avg Latency | Notes |
|---|---|---|
| 1. Embedding generation | 50-100ms | BAAI/bge-small-en-v1.5 on CPU |
| 2. UTM config matching | 10-30ms | MongoDB query + scoring |
| 3. Milvus search | 50-150ms | Depends on corpus size |
| 4. MongoDB history lookup | 20-50ms | Simple find query |
| 5. Azure OpenAI call | 1-3 seconds | GPT-3.5-Turbo-16k |
| 6. Token counting | 10-20ms | Tiktoken encoding |
| 7. MongoDB history save | 20-50ms | Update/insert operation |
| **TOTAL** | **2-5 seconds** | User-facing latency |

Optimization Opportunities:

  1. Parallelize Milvus search and history retrieval (async)
  2. Cache embeddings for frequently asked questions
  3. Stream OpenAI responses (llm.stream())
  4. Add a Redis cache for recent Q&A pairs
  5. Batch token counting
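Opportunity 1 can be sketched with `asyncio.gather`: the Milvus search and the MongoDB history lookup are independent, so overlapping them bounds the wait by the slower call instead of the sum. The two coroutines below are hypothetical async wrappers standing in for the real (synchronous) calls:

```python
# Sketch of parallel context retrieval. The sleep() calls stand in for the
# ~100ms Milvus search and ~50ms MongoDB lookup measured above.
import asyncio

async def retrieve_relevant_documents_async(user_id, project_id, question):
    await asyncio.sleep(0.1)  # placeholder for Milvus search
    return [{"content": "chunk"}]

async def fetch_chat_history_async(user_id, project_id, session_id):
    await asyncio.sleep(0.05)  # placeholder for chatbot_history1 lookup
    return "User: hi\nAssistant: hello\n"

async def gather_context(user_id, project_id, session_id, question):
    # Both I/O waits overlap; total wait ~= max(latencies), not their sum.
    chunks, history = await asyncio.gather(
        retrieve_relevant_documents_async(user_id, project_id, question),
        fetch_chat_history_async(user_id, project_id, session_id),
    )
    return chunks, history

chunks, history = asyncio.run(
    gather_context("User-123456", "User-123456_Project_1", "s1", "pricing?")
)
```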

Deployment

Docker Configuration

Dockerfile:

FROM python:3.9-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy shared modules
COPY shared/ ./shared/

COPY src/ .

EXPOSE 8012

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8012"]

Requirements.txt

fastapi>=0.95.0
uvicorn[standard]>=0.22.0
pymongo>=4.3.3
python-multipart>=0.0.6
python-dotenv>=1.0.0

# LangChain & OpenAI
langchain>=0.1.0
langchain-openai>=0.0.5
langchain-core>=0.1.0
openai>=1.0.0

# Embeddings
fastembed>=0.1.0

# Utilities
tiktoken>=0.5.0
pytz>=2023.3
scikit-learn>=1.3.0
numpy>=1.24.0

# Monitoring
ddtrace>=1.19.0

Environment Variables

# Azure OpenAI
AZURE_OPENAI_API_KEY=<your-key>
ENDPOINT_URL=https://machineagentopenai.openai.azure.com/...
DEPLOYMENT_NAME=gpt-35-turbo-16k-0613

# MongoDB
MONGO_URI=mongodb://...
MONGO_DB_NAME=Machine_agent_demo

# Milvus
MILVUS_HOST=localhost
MILVUS_PORT=19530

# DataDog
DD_SERVICE=response-text-chatbot-service
DD_ENV=production


Recommendations

Improvements

  1. Fix System Prompt Selection - Use chatbot_purpose from database
  2. Add Response Caching - Redis with hash of (user+project+question)
  3. Stream OpenAI Responses - Better UX with llm.stream()
  4. Add Fallback - If Milvus fails, use OpenAI without RAG
  5. Optimize Token Usage - Truncate context if exceeds 8000 tokens
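Recommendation 5 (context truncation) can be sketched as a pre-assembly trim. A real implementation would count with Tiktoken's cl100k_base encoding, as `count_tokens()` above does; a whitespace split stands in here so the sketch runs without tiktoken installed:

```python
# Sketch of context truncation before prompt assembly (assumed 8,000-token budget).
# NOTE: context.split() approximates tokenization; swap in
# tokenizer.encode()/decode() with cl100k_base for accurate counts.

def truncate_context(context: str, max_tokens: int = 8000) -> str:
    tokens = context.split()  # stand-in for tokenizer.encode(context)
    if len(tokens) <= max_tokens:
        return context
    return " ".join(tokens[:max_tokens])

short = truncate_context("word " * 20, max_tokens=10)
```

Truncating the retrieved chunks (rather than the history or system prompt) is usually the safest cut, since the lowest-similarity chunks are dropped last-first.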

Code Quality

  1. Remove Duplicate LLM Init
  2. Add Type Hints - Complete typing for all functions
  3. Extract Helper Functions - Message construction, context building
  4. Add Unit Tests - Test UTM matching, token counting, message construction

Last Updated: 2025-12-26
Code Version: response-text-chatbot-service/src/main.py (726 lines)
Total Endpoints: 2
Review Cycle: Monthly (Important Service)


"Intelligent conversations through retrieval-augmented generation."