Remote Physio Service (Port 8018)

Service Path: machineagents-be/remote-physio-service/
Port: 8018
Total Lines: 1,602
Purpose: ⚠️ CLIENT-SPECIFIC HARDCODED SERVICE for "Remote Physios" - Specialized physiotherapy chatbot with multi-stage consultation flow, bilingual support (English/Hindi), RAG-based exercise/assessment recommendations, and clinical summary generation.


⚠️ CAUTION: ARCHITECTURAL ISSUE: CLIENT-SPECIFIC CODE IN MAIN PRODUCT

This is a HARDCODED client-specific service embedded in the main MachineAgents product backend. This violates separation of concerns and should be refactored into a configurable multi-tenant system.

Problems:

  • Client name "Remote Physios" hardcoded throughout
  • Specific API endpoints hardcoded (rp-api.anubhaanant.com, api.remotephysios.com)

  • Specialized medical conversation flow not reusable for other chatbots
  • Cannot be disabled for non-Remote Physio deployments
  • Mixes product code with client-specific logic

Impact: Every deployment includes Remote Physios code even if not used
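One direction for the refactor the caution box calls for is to move the magic strings into a per-client registry keyed on `avtarType`. A minimal sketch, assuming such a registry exists (the class and function names here are hypothetical; the values simply mirror the hardcoded strings quoted later in this document):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClientConfig:
    """Per-tenant settings that are currently hardcoded in the service."""
    brand_name: str
    api_base: str
    booking_url: Optional[str] = None


# Hypothetical registry; keys and values mirror the hardcoded magic strings.
CLIENT_CONFIGS = {
    "User-181473_Project_15": ClientConfig(
        brand_name="Remote Physios",
        api_base="https://api.remotephysios.com/api/v1",
        booking_url="https://calendly.com/remotephysios/free-consultation",
    ),
}

DEFAULT_CONFIG = ClientConfig(
    brand_name="MachineAgents",
    api_base="https://rp-api.anubhaanant.com/api/v1",
)


def get_client_config(avtar_type: str) -> ClientConfig:
    """Resolve tenant config instead of branching on a magic string."""
    return CLIENT_CONFIGS.get(avtar_type, DEFAULT_CONFIG)
```

With this in place, greetings, API base URLs, and booking links become data instead of code, and a deployment without the Remote Physios entry simply never ships their branding.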


Table of Contents

  1. Service Overview
  2. Hardcoded Client-Specific Configuration
  3. Architecture & Dependencies
  4. 8-Stage State Machine
  5. Session State Management
  6. Database Collections
  7. RAG System (Hybrid BM25 + Vector)
  8. API Integration Points
  9. Main Conversation Flow
  10. Language Support (English/Hindi)
  11. User Profile Management
  12. Medical Context Extraction
  13. Clinical Summary Generation
  14. TTS & Lip-Sync Pipeline
  15. API Endpoints
  16. Security Analysis
  17. Refactoring Recommendations

Service Overview

Primary Responsibility

Remote Physios Physiotherapy Chatbot: Conducts virtual physiotherapy consultations through an 8-stage conversation flow, collects patient information, generates clinical summaries, and recommends exercises and assessments using RAG.

Client-Specific Purpose

This service is built exclusively for "Remote Physios", a specific client offering online physiotherapy consultations. The entire service is tailored to their workflow:

  1. Language Selection (English or Hindi/Hinglish)
  2. Patient Onboarding (Name, Age, Weight)
  3. Problem Assessment (Root cause analysis)
  4. Clinical Summary Generation (Medical documentation)
  5. Exercise/Assessment Recommendations (via RAG from knowledge base)
  6. Follow-up Consultation
  7. Inactivity Handling (5-minute timeout)
  8. New Concern Flow (Restart consultation)

Key Features

  1. 8-Stage State Machine - Complex conversation flow management
  2. Bilingual Support - English + Hindi (Hinglish - Hindi in English script)
  3. Hybrid RAG - BM25 + Vector search for exercises/assessments
  4. Clinical Summary Generation - Automated medical documentation
  5. User Profile Persistence - Name/Age/Weight stored permanently
  6. Session Inactivity Detection - 5-minute timeout with return flow
  7. External API Integration - Remote Physios API for prompts & data storage
  8. Medical Context Extraction - Prevents redundant questions
  9. Calendly Integration - Hardcoded consultation booking link

Hardcoded Client-Specific Configuration

1. Client Name References

Lines with "Remote Physios" hardcoded:

# Line 1037
answer = f"""Hello {state.user_name}! Welcome back to Remote Physios."""

# Line 1045
answer = """Hello! Welcome back to Remote Physios."""

# Line 1113
answer = """If you need further assistance or have any questions, you can contact Remote Physios:"""

# Line 1150
answer = f"""Great, {state.user_name}! I'm happy to help with your new concern."""

# Line 1290
"Hello {state.user_name}! Welcome to Remote Physios."

# Line 1297
"Hello! Welcome to Remote Physios."

# Line 1397
"Hello {state.user_name}! Welcome to Remote Physios."

# Line 1407
"Hello! Welcome to Remote Physios."

Total Occurrences: 15+ hardcoded references to "Remote Physios" brand name

2. Hardcoded API Endpoints

Lines 111-112:

DEFAULT_API_BASE = "https://rp-api.anubhaanant.com/api/v1"
REMOTEPHYSIOS_API_BASE = "https://api.remotephysios.com/api/v1"

API Selection Logic (Lines 522, 550, 764):

# Different API based on avtarType
if avtarType == "User-181473_Project_15":
    base_url = REMOTEPHYSIOS_API_BASE
else:
    base_url = DEFAULT_API_BASE

Problem: User-181473_Project_15 is a magic string identifying Remote Physios client

3. Hardcoded Calendly Link

Lines 587-588, 1107, 1114:

if "calendly.com/remotephysios/free-consultation" in url:
    return "calendly link for free consultation"

# In responses
📅 Free Consultation: https://calendly.com/remotephysios/free-consultation

Impact: Non-Remote Physios users will see Remote Physios' booking link

4. Hardcoded Prompts Structure

Prompt IDs (Lines 818-819, 923-924):

# Hardcoded prompt IDs for Remote Physios
fetch_prompt_from_api(1, avtarType)  # Main consultation prompt
fetch_prompt_from_api(2, avtarType)  # Assessment specialist
fetch_prompt_from_api(3, avtarType)  # Exercise specialist
fetch_prompt_from_api(4, avtarType)  # Follow-up prompt

Problem: Prompt IDs 1, 2, 3, 4 are assumed to exist in Remote Physios API

5. Hardcoded Conversation Flow

Entire 8-stage state machine is physio-specific:

class SessionState(BaseModel):
    stage: str = "language_selection"  # Hardcoded stages
    # Stages: language_selection → root_problem → follow_up

Medical-Specific Logic:

  • Pain location detection (Lines 282-299)
  • Duration/intensity extraction (Lines 309-340)
  • Symptom detection (Lines 353-362)
  • Clinical summary indicators (Lines 200-208)

Architecture & Dependencies

Technology Stack

Framework:

  • FastAPI (web framework)
  • Uvicorn (ASGI server)

AI/ML:

  • Azure OpenAI GPT-4 (conversation, summaries)
  • LangChain (message handling, streaming)
  • Hybrid RAG (BM25 + Vector search)
  • Custom HybridRetriever for assessments & exercises

TTS & Voice:

  • Azure Cognitive Services Speech SDK
  • 2 voices: English (en-IN-AartiNeural), Hindi (hi-IN-AartiNeural)
  • Regional: centralindia

Lip-Sync:

  • Rhubarb Lip-Sync 1.13.0
  • FFmpeg for audio conversion

Storage:

  • MongoDB (CosmosDB) - Chat history, user profiles
  • RAG Artifacts (BM25 + FAISS indices)

External APIs:

  • Remote Physios API (rp-api.anubhaanant.com)
  • RemotePhysios.com API (api.remotephysios.com)

Key Imports

from langchain_openai import AzureChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage
from physio.assessments.hybrid_retriever import HybridRetriever as AssessmentRetriever
from physio.exercises.hybrid_retriever import HybridRetriever as ExerciseRetriever
from cachetools import LRUCache  # Prompt caching
from asyncio import Semaphore  # TTS rate limiting
import tiktoken  # Token counting

Environment Variables

Azure OpenAI:

AZURE_OPENAI_API_KEY=0d9d78cabb4c4e22a5b4a6ef53253155  # ⚠️ HARDCODED (Lines 99, 708, 744)

Azure TTS:

# No environment variable - HARDCODED in code
subscription="9N41NOfDyVDoduiD4EjlzmZU9CbUX3pPqWfLCORpl7cBf0l2lzVQJQQJ99BCACGhslBXJ3w3AAAYACOG2329"  # Line 650
region="centralindia"

MongoDB:

MONGO_URI=mongodb://...
MONGO_DB_NAME=Machine_agent_dev

Remote Physios APIs:

DEFAULT_API_BASE=https://rp-api.anubhaanant.com/api/v1  # Hardcoded (Line 111)
REMOTEPHYSIOS_API_BASE=https://api.remotephysios.com/api/v1  # Hardcoded (Line 112)

RAG Directory Structure

Artifacts Paths:

ASSESSMENT_ARTIFACTS_PATH = "physio/assessments/rag_artifacts"
EXERCISE_ARTIFACTS_PATH = "physio/exercises/rag_artifacts"

Contents:

  • bm25_index.pkl - BM25 index for keyword search
  • embeddings.npy - Dense embeddings (likely FAISS)
  • metadata.json - Document metadata
  • documents/ - Raw assessment/exercise documents
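The artifact files above suggest a straightforward load path. A sketch, assuming these file names and formats (the actual HybridRetriever may load them differently):

```python
import json
import pickle
from pathlib import Path

import numpy as np


def load_rag_artifacts(artifacts_dir: str):
    """Illustrative loader for the artifact files listed above."""
    root = Path(artifacts_dir)
    with open(root / "bm25_index.pkl", "rb") as f:
        bm25 = pickle.load(f)                      # keyword index
    embeddings = np.load(root / "embeddings.npy")  # dense vectors
    metadata = json.loads((root / "metadata.json").read_text())
    return bm25, embeddings, metadata
```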

8-Stage State Machine

Stage Diagram

┌──────────────────────────────────────────────────────────────────────┐
│ STAGE 1: language_selection                                          │
│ - Ask: "English or Hindi?"                                           │
│ - Detect language preference                                          │
│ - Transition → root_problem                                           │
└──────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│ STAGE 2: root_problem (Onboarding + Problem Assessment)              │
│ - Collect: Name, Age, Weight (if not stored)                         │
│ - Extract: Pain location, duration, intensity, triggers              │
│ - Extract: Initial problem description from first message            │
│ - Conduct: Medical history questions                                 │
│ - Detect: Clinical summary in GPT response                           │
│ - Transition → follow_up (when summary generated)                    │
└──────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│ STAGE 3: follow_up (Post-Summary Interaction)                        │
│ - Display: Clinical summary to user                                  │
│ - Background: RAG pipeline (assessments + exercises)                 │
│ - Answer: Questions about summary/recommendations                    │
│ - Detect: Acknowledgment (ok, thanks, etc.)                          │
│ - Transition → waiting_for_new_concern_decision                      │
└──────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│ STAGE 4: waiting_for_new_concern_decision                            │
│ - Show: Calendly link for consultation                               │
│ - Ask: "Would you like to discuss a new concern?"                    │
│ - Detect: Yes → STAGE 1, No → END                                    │
└──────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│ SPECIAL STAGE: user_returned_after_inactivity                        │
│ (Triggered if >5 minutes since last message)                         │
│ - Ask: Language preference again                                     │
│ - Show: Previous conversation preview                                │
│ - Ask: "Continue previous or start new?"                             │
│ - Transition based on user choice                                    │
└──────────────────────────────────────────────────────────────────────┘

Stage Transitions

State Transition Logic (Lines 862-883):

async def handle_state_transitions(state, user_input, raw_answer, ...):
    # Detect clinical summary in GPT response
    if is_clinical_summary(raw_answer):
        logger.info(f"✅ Clinical summary detected")
        state.summary = raw_answer.strip()
        state.conversation_has_summary = True
        state.last_completed_summary = raw_answer.strip()
        state.last_summary_timestamp = datetime.now()
        state.summary_shown = True
        state.stage = "follow_up"  # TRANSITION
        state.last_transition_time = datetime.now()

        # Start RAG pipeline in background
        asyncio.create_task(run_rag_and_api_pipeline_in_background(...))
        return raw_answer

    return raw_answer

Clinical Summary Indicators (Lines 200-208):

def is_clinical_summary(text: str) -> bool:
    indicators = [
        'clinical understanding:', 'chief complaint:', 'provisional diagnosis:',
        'clinical summary:', 'mukhya shikayat:', 'dard ki tivrata:',
        'praarambhik nidan:', 'clinical assessment'
    ]
    return any(indicator in text.lower() for indicator in indicators)

Session State Management

SessionState Model (Lines 152-192)

25 State Fields:

class SessionState(BaseModel):
    # === CORE STATE ===
    stage: str = "language_selection"  # Current stage
    summary: Optional[str] = None  # Clinical summary
    assessments: Optional[Any] = None  # RAG assessments
    exercises: Optional[Any] = None  # RAG exercises
    last_transition_time: Optional[datetime] = None

    # === USER PROFILE (PERMANENT - NEVER RESET) ===
    user_name: Optional[str] = None
    user_age: Optional[int] = None
    user_weight: Optional[float] = None
    profile_loaded: bool = False
    name_permanently_stored: bool = False

    # === INITIAL PROBLEM TRACKING ===
    initial_problem_description: Optional[str] = None  # First message if contains problem
    problem_already_described: bool = False

    # === SESSION TRACKING ===
    current_language: Optional[str] = None  # "english" or "hindi"
    language_asked_this_session: bool = False
    conversation_has_summary: bool = False

    # === CONVERSATION TRACKING ===
    last_user_message_time: Optional[datetime] = None
    conversation_started_at: Optional[datetime] = None
    last_history_length: int = 0

    # === USER RETURN TRACKING (5-min timeout) ===
    user_returned_after_inactivity: bool = False
    asked_continue_or_new: bool = False
    previous_conversation_summary: Optional[str] = None

    # === POST-SUMMARY TRACKING ===
    summary_shown: bool = False
    waiting_for_new_concern_decision: bool = False

    # === LAST COMPLETED SUMMARY ===
    last_completed_summary: Optional[str] = None
    last_summary_timestamp: Optional[datetime] = None

Session Inactivity Detection (Lines 900-911)

5-Minute Timeout:

SESSION_INACTIVITY_TIMEOUT = 300  # seconds (5 minutes)

if state.last_user_message_time:
    time_since_last_message = (now - state.last_user_message_time).total_seconds()

    if time_since_last_message > SESSION_INACTIVITY_TIMEOUT and current_history_length > 4:
        if not state.user_returned_after_inactivity:
            logger.info(f"🔄 USER RETURNED after {time_since_last_message/60:.1f} minutes")
            state.user_returned_after_inactivity = True
            state.asked_continue_or_new = False
            state.previous_conversation_summary = extract_conversation_preview(history)

Return Flow (Lines 987-1067):

  1. First ask for language preference
  2. Show previous conversation preview
  3. Ask: "Continue previous or start new?"
  4. Route based on response
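The routing in step 4 could be expressed as a small dispatcher on the user's reply. A sketch under assumed keyword lists (the excerpt does not show the service's actual matching logic):

```python
def route_return_choice(user_input: str) -> str:
    """Map the user's answer to the inactivity prompt onto a next stage.

    Keyword lists are illustrative; "1"/"2" match the numbered options
    shown to the user in the return-flow prompt.
    """
    text = user_input.lower()
    continue_words = ["continue", "previous", "pichli", "1"]
    new_words = ["new", "naya", "2"]
    if any(w in text for w in continue_words):
        return "root_problem"            # resume the prior consultation
    if any(w in text for w in new_words):
        return "language_selection"      # restart the full flow
    return "user_returned_after_inactivity"  # unclear answer: re-ask
```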

Database Collections

2 MongoDB Collections

history_collection = db["chatbot_history"]       # Chat messages
user_profiles_collection = db["user_profiles"]  # User data

chatbot_history Schema

{
    "session_id": "remote_physio_session_12345",
    "messages": [
        {
            "role": "user",
            "content": "I have back pain since yesterday",
            "timestamp": "2024-01-15 10:30:00"
        },
        {
            "role": "assistant",
            "content": "I understand. Can you tell me exactly where in your back?",
            "timestamp": "2024-01-15 10:30:05"
        }
    ]
}

History Retrieval (Lines 495-503):

async def get_chat_history_from_db_async(session_id: str):
    doc = await asyncio.to_thread(history_collection.find_one, {"session_id": session_id})
    messages = doc.get("messages", []) if doc else []
    # Return last 30 messages
    return [
        {"role": msg["role"], "content": msg["content"]}
        for msg in messages[-30:]
        if msg.get("content") is not None
    ]

user_profiles Schema

{
    "session_id": "remote_physio_session_12345",
    "name": "John Doe",
    "name_stored": true,  // Flag to prevent re-asking
    "age": 35,
    "weight": 75.5,
    "updated_at": ISODate("2024-01-15T10:30:00.000Z")
}

Profile is PERMANENT - Never deleted, even on new concern
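This permanence is consistent with an upsert that only `$set`s fields that were actually extracted, so a partial update never blanks stored values. A sketch of the update document the service's `save_user_profile` helper could build (the helper's real shape is not shown in this excerpt; standard PyMongo `update_one(..., upsert=True)` semantics are assumed):

```python
from datetime import datetime, timezone


def build_profile_upsert(session_id, name=None, age=None, weight=None):
    """Build a (filter, update) pair for update_one(..., upsert=True).

    Only extracted fields are $set, so previously stored values survive.
    """
    fields = {"updated_at": datetime.now(timezone.utc)}
    if name:
        fields["name"] = name
        fields["name_stored"] = True   # flag that prevents re-asking
    if age is not None:
        fields["age"] = age
    if weight is not None:
        fields["weight"] = weight
    return {"session_id": session_id}, {"$set": fields}
```

Usage would be `user_profiles_collection.update_one(*build_profile_upsert(...), upsert=True)`.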


RAG System (Hybrid BM25 + Vector)

Hybrid Retriever Architecture

Two Separate RAG Systems:

  1. Assessment Retriever - Medical assessments knowledge base
  2. Exercise Retriever - Physiotherapy exercises knowledge base

Hybrid Search (Lines 807-814, 815-821):

# BM25 (keyword-based) + Dense Vector (semantic)
result = retriever.retrieve(
    query=state.summary,  # Clinical summary as query
    k=5,                  # Top 5 results
    alpha=0.7             # Weight: 0.7 vector + 0.3 BM25
)

Alpha = 0.7:

  • 70% weight to dense vector search (semantic similarity)
  • 30% weight to BM25 search (keyword matching)
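The alpha blend above amounts to a one-line linear combination. A sketch (this assumes both scores are already normalized to a comparable range; the HybridRetriever's actual normalization is not shown in this excerpt):

```python
def hybrid_score(vector_score: float, bm25_score: float, alpha: float = 0.7) -> float:
    """Blend semantic and keyword relevance: alpha weights the dense
    (vector) score, (1 - alpha) the BM25 score."""
    return alpha * vector_score + (1 - alpha) * bm25_score
```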

RAG Pipeline (Lines 792-856)

Background Processing (After Clinical Summary):

async def run_rag_and_api_pipeline_in_background(state, session_id, avtarType):
    # 1. Run parallel retrieval
    results = await asyncio.gather(
        run_retriever(data_as, state.summary, "assessment"),  # Assessment RAG
        run_retriever(data_ex, state.summary, "exercise"),    # Exercise RAG
        fetch_prompt_from_api(2, avtarType),                  # Assessment prompt
        fetch_prompt_from_api(3, avtarType),                  # Exercise prompt
    )

    assessment_docs, exercise_docs, prompt_2, prompt_3 = results

    # 2. Build context from retrieved documents
    assessment_context = "\n\n".join([d['text'] for d in assessment_docs])
    exercise_context = "\n\n".join([d['text'] for d in exercise_docs])

    # 3. Generate recommendations using GPT-4
    assessment_msgs = [
        {"role": "system", "content": f"{prompt_2}\n\nContext:\n{assessment_context}"},
        {"role": "user", "content": f"Assessments for: {state.summary}"}
    ]
    exercise_msgs = [
        {"role": "system", "content": f"{prompt_3}\n\nContext:\n{exercise_context}"},
        {"role": "user", "content": f"Exercises for: {state.summary}"}
    ]

    assessment_resp, exercise_resp = await asyncio.gather(
        lc_generate_from_simple_messages(assessment_msgs),
        lc_generate_from_simple_messages(exercise_msgs)
    )

    # 4. Store in session state
    state.assessments = assessment_resp
    state.exercises = exercise_resp

    # 5. Send to Remote Physios API
    await send_data_to_api(state.summary, assessment_resp, "Not applicable",
                        exercise_resp, session_id, avtarType)

RAG Artifacts Initialization (Lines 122-131):

data_as = AssessmentRetriever(artifacts_dir=ASSESSMENT_ARTIFACTS_PATH)
data_ex = ExerciseRetriever(artifacts_dir=EXERCISE_ARTIFACTS_PATH)

If RAG fails to initialize, service exits:

if not data_as or not data_ex:
    sys.exit(f"Could not initialize retrievers. Error: {e}")

API Integration Points

1. Remote Physios API Endpoints

Base URL Selection (Lines 522, 550, 764):

def get_base_url(avtarType):
    if avtarType == "User-181473_Project_15":
        return "https://api.remotephysios.com/api/v1"
    else:
        return "https://rp-api.anubhaanant.com/api/v1"

API Endpoints Used:

  1. GET /chatbot-prompts/{prompt_id} - Fetch conversation prompts
# Lines 540-566
async def fetch_prompt_from_api(prompt_id: int, avtarType: str):
    url = f"{base_url}/chatbot-prompts/{prompt_id}"
    headers = {"Origin": "https://api.machineavatars.com"}
    response = await async_http_client.get(url, headers=headers)
    prompt_content = response.json().get("data", {}).get("prompt")
    return prompt_content
  2. POST /chatbot/save - Save chat history to Remote Physios backend
# Lines 520-538
async def save_chat_to_api_async(session_id, question, answer, avtarType):
    url = f"{base_url}/chatbot/save"
    payload = {
        "threadId": session_id,
        "clientId": session_id,
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer}
        ]
    }
    await async_http_client.post(url, json=payload, headers=headers)
  3. POST /cb-assessments - Send clinical summary + RAG results
    # Lines 761-790
    async def send_data_to_api(state_summary, rag_result, evaluation, rag_result_ex, session_id, avtarType):
        url = f"{base_url}/cb-assessments"
        payload = {
            "clinicalSummary": state_summary,
            "assessment": rag_result,
            "evaluation": evaluation,
            "exercise": rag_result_ex,
            "clientId": session_id
        }
        await client.post(url, headers=headers, json=payload)
    

2. Prompt Caching (Lines 103-105, 540-566)

LRU Cache for API Prompts:

prompt_cache = LRUCache(maxsize=128)
prompt_cache_lock = Lock()

# Check cache before API call
cache_key = (prompt_id, avtarType)
with prompt_cache_lock:
    if cache_key in prompt_cache:
        logger.info(f"Cache HIT: prompt {prompt_id}")
        return prompt_cache[cache_key]

# Store in cache after fetch
with prompt_cache_lock:
    prompt_cache[cache_key] = prompt_content

Cache Size: 128 prompts max (likely sufficient for 4 prompts × multiple avatars)


Main Conversation Flow

POST /v2/get-response-physio (Lines 889-1473)

Primary Endpoint - 585 Lines of Logic

Request:

POST /v2/get-response-physio
Content-Type: application/x-www-form-urlencoded

question=I+have+back+pain
&session_id=remote_physio_12345
&avtarType=User-181473_Project_15

Flow Overview:

1. Load session state
2. Check inactivity (>5 min)
3. Fetch data in parallel (history, prompts, profile)
4. Load user profile from DB
5. Extract user info from question (name, age, weight)
6. Extract initial problem description
7. Handle special cases (inactivity, new concern, acknowledgment)
8. Detect language preference
9. Build context (profile, medical history, language)
10. Get GPT response
11. Handle state transitions (detect clinical summary)
12. Generate TTS + lip-sync in parallel
13. Save to DB + API
14. Return response
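The 14 steps above can be compressed into an orchestration skeleton. A sketch with every I/O step stubbed out (the helper names are illustrative stand-ins, not the service's actual internals):

```python
import asyncio

SESSION_STATES: dict = {}   # in-memory, lost on restart (see Step 1)


async def fetch_history(session_id):                    # stand-in for the DB read
    return []


async def generate_answer(question, history):           # stand-in for the GPT call
    return f"echo: {question}"


async def synthesize_audio(answer):                     # stand-in for Azure TTS
    return b"wav-bytes"


async def save_history(session_id, question, answer):   # stand-in for Mongo/API writes
    return None


async def handle_physio_request(question: str, session_id: str) -> dict:
    # Steps 1-3: session state + data fetch
    state = SESSION_STATES.setdefault(session_id, {"stage": "language_selection"})
    history = await fetch_history(session_id)

    # Step 10: model response
    answer = await generate_answer(question, history)

    # Steps 12-13: TTS and persistence are independent, so run concurrently
    audio, _ = await asyncio.gather(
        synthesize_audio(answer),
        save_history(session_id, question, answer),
    )
    return {"answer": answer, "audio": audio, "stage": state["stage"]}
```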

Detailed Flow Steps

Step 1: Initialize Session State (Line 891)

state = session_states.setdefault(session_id, SessionState())

Session states stored in memory - Will be lost on service restart

Step 2: Inactivity Detection (Lines 894-911)

history = await get_chat_history_from_db_async(session_id)
now = datetime.now(pytz.timezone("Asia/Kolkata"))

if state.last_user_message_time:
    time_since_last_message = (now - state.last_user_message_time).total_seconds()

    if time_since_last_message > 300 and current_history_length > 4:
        if not state.user_returned_after_inactivity:
            state.user_returned_after_inactivity = True
            state.previous_conversation_summary = extract_conversation_preview(history)

Step 3: Parallel Data Fetch (Lines 920-927)

results = await asyncio.gather(
    get_chat_history_from_db_async(session_id),
    fetch_prompt_from_api(1, avtarType),  # Main prompt
    fetch_prompt_from_api(4, avtarType),  # Follow-up prompt
    get_user_profile(session_id),
    return_exceptions=True
)

Performance: 4 operations in parallel (2-3 seconds total vs 8-12 sequential)
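`return_exceptions=True` is what keeps one failed fetch from cancelling the others: failures come back as exception objects in the results list, so each result must be type-checked before use. A self-contained illustration (the coroutines are stand-ins, not service code):

```python
import asyncio


async def gather_with_fallbacks():
    """Show why results from gather(return_exceptions=True) need
    isinstance checks before they are unpacked."""
    async def ok():
        return "history"

    async def boom():
        raise RuntimeError("prompt API down")

    results = await asyncio.gather(ok(), boom(), return_exceptions=True)
    # Replace any raised exception with a safe fallback value.
    return [r if not isinstance(r, Exception) else None for r in results]
```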

Step 4: Load User Profile (Lines 937-948)

if user_profile and not state.profile_loaded:
    state.user_name = user_profile.get('name')
    state.user_age = user_profile.get('age')
    state.user_weight = user_profile.get('weight')
    state.name_permanently_stored = user_profile.get('name_stored', False)
    state.profile_loaded = True

Step 5: Extract User Info from Text (Lines 950-974)

Regex Patterns for Name/Age/Weight:

def extract_user_info_from_text(text: str, name_already_stored: bool):
    # Name patterns
    name_patterns = [
        r"(?:my name is|i am|i'm|naam hai|mera naam)\s+([A-Za-z]+)",
        r"^([A-Z][a-z]+)\s*(?:here|hai)",
    ]

    # Age patterns
    age_patterns = [
        r"(?:age|umar|umra)(?:\s+is)?\s*(?:hai)?\s*(\d{1,3})",
        r"(\d{1,3})\s*(?:years?|saal|sal)(?:\s+old)?",
    ]

    # Weight patterns
    weight_patterns = [
        r"(?:weight|wazan|vajan)(?:\s+is)?\s*(?:hai)?\s*(\d{1,3}(?:\.\d+)?)\s*(?:kg)?",
        r"(\d{1,3}(?:\.\d+)?)\s*(?:kg|kgs|kilos?)",
    ]

    return {"name": ..., "age": ..., "weight": ...}

Save to DB in background:

if profile_updated:
    asyncio.create_task(save_user_profile(session_id, name=..., age=..., weight=...))

Step 6: Extract Initial Problem (Lines 976-982)

if not state.initial_problem_description and not state.problem_already_described:
    problem_desc = extract_problem_description(question, history)
    if problem_desc:
        state.initial_problem_description = problem_desc
        state.problem_already_described = True

Problem Detection (Lines 426-454):

  • Skip if just greeting ("hi", "hello", "namaste")
  • Skip if too short (<5 words)
  • Check for problem keywords: "pain", "dard", "hurt", "swelling", "injury", etc.
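The three checks above can be sketched as a short predicate (the keyword sets here are partial examples taken from the bullets, not the service's full lists):

```python
GREETINGS = {"hi", "hello", "namaste", "hey"}
PROBLEM_KEYWORDS = {"pain", "dard", "hurt", "swelling", "injury", "stiff", "ache"}


def looks_like_problem_description(text: str) -> bool:
    """Apply the three checks: not a bare greeting, not too short,
    contains at least one problem keyword."""
    words = text.lower().split()
    if len(words) == 1 and words[0].strip("!,.") in GREETINGS:
        return False                 # just a greeting
    if len(words) < 5:
        return False                 # too short to describe a problem
    return any(kw in text.lower() for kw in PROBLEM_KEYWORDS)
```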

Language Support (English/Hindi)

Language Detection (Lines 378-394)

Detect Language Preference:

def detect_language_preference(user_input: str) -> Optional[str]:
    user_lower = user_input.lower().strip()

    # Keyword-based
    if any(word in user_lower for word in ["english", "angrezi", "angrez"]):
        return "english"
    elif any(word in user_lower for word in ["hindi", "हिंदी", "हिन्दी", "hinglish"]):
        return "hindi"

    # Unicode-based (Devanagari script)
    if any('\u0900' <= char <= '\u097F' for char in user_input):
        return "hindi"

    return None

Hindi = Hinglish: Hindi in English script (not Devanagari)

Language-Specific Responses

Lines 998-1017 (After inactivity):

if detected_lang == "hindi":
    answer = f"""Dhanyavaad! Main aapki madad karne ke liye yahan hoon.

Aapki pichli baat: "{state.previous_conversation_summary}"

Kya aap chahte hain:
1. Pichli baat ko continue karein
2. Naya concern discuss karein

Kripya batayein."""

else:
    answer = f"""Thank you! I'm here to help you.

Your previous conversation: "{state.previous_conversation_summary}"

Would you like to:
1. Continue the previous conversation
2. Discuss a new concern

Please let me know."""

Lines 1103-1116 (Post-summary new concern offer):

if state.current_language == "hindi":
    answer = """Aapka consultation complete ho gaya hai. Summary aapko mil gayi hai.

Agar aapko aur koi madad chahiye ya koi sawal hai, toh aap Remote Physios se contact kar sakte hain:
📅 Free Consultation: https://calendly.com/remotephysios/free-consultation

Agar aapko koi naya concern hai, toh main aapki madad ke liye yahan hoon! Kya aap naya concern discuss karna chahte hain?"""

else:
    answer = """Your consultation is complete. You've received your clinical summary.

If you need further assistance or have any questions, you can contact Remote Physios:
📅 Free Consultation: https://calendly.com/remotephysios/free-consultation

If you have a new concern, I'm here to help! Would you like to discuss a new concern?"""

TTS Voice Selection (Lines 1430)

voice = VOICE_HINDI if detect_language(answer) == "hindi" else VOICE_ENGLISH
# VOICE_HINDI = "hi-IN-AartiNeural"
# VOICE_ENGLISH = "en-IN-AartiNeural"

Note: Both voices use AartiNeural, just different language variants
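The service imports `Semaphore` for TTS rate limiting (see Key Imports). A sketch of that pattern around the voice-selected synthesis call (the concurrency limit of 3 and the function names are assumptions, not from the source):

```python
import asyncio

TTS_SEMAPHORE = asyncio.Semaphore(3)  # assumed limit on concurrent Azure calls


async def rate_limited_tts(text: str, voice: str) -> bytes:
    """Gate synthesis so at most N requests hit the TTS service at once."""
    async with TTS_SEMAPHORE:
        return await fake_azure_tts(text, voice)


async def fake_azure_tts(text, voice):  # stand-in for the real Speech SDK call
    await asyncio.sleep(0)
    return f"{voice}:{text}".encode()
```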


User Profile Management

Profile Persistence (PERMANENT)

Lines 160-165:

# User profile (PERMANENT - never reset)
user_name: Optional[str] = None
user_age: Optional[int] = None
user_weight: Optional[float] = None
profile_loaded: bool = False
name_permanently_stored: bool = False

Profile NEVER reset on new concern - stays for entire user lifetime

Profile Context in Prompts (Lines 1189-1254)

Complete Profile:

if profile_complete:
    user_context = f"""
=== 🎯 USER PROFILE ===
✓ Name: {state.user_name}
✓ Age: {state.user_age} years
✓ Weight: {state.user_weight} kg

🚫 NEVER ASK FOR NAME, AGE, OR WEIGHT AGAIN!

⚠️ CRITICAL - NAME USAGE RULES:
🚫 DO NOT say things like "Do you have fever, {state.user_name}?"
🚫 DO NOT end questions with the patient's name
✅ Use name in initial greeting or when specifically addressing the patient
✅ Ask questions naturally: "Do you have any fever?" NOT "Do you have fever, {state.user_name}?"

EXAMPLES OF WHAT TO AVOID:
❌ "Is there any swelling in your upper legs, {state.user_name}?"
❌ "Do you have any medical history, {state.user_name}?"

CORRECT WAY:
✅ "Is there any swelling in your upper legs?"
✅ "Do you have any medical history?"
✅ "Thank you, {state.user_name}! I'm here to help you."
"""

Incomplete Profile:

if state.user_name:
    user_context += f"✓ Name: {state.user_name} (STORED)\n"
else:
    user_context += "❌ Name: NOT PROVIDED\n"

# Same for age, weight

Onboarding Instruction (Lines 1351-1376)

Ask for missing profile data ONE AT A TIME:

if not profile_complete and state.stage == "root_problem":
    if not state.user_name:
        if state.current_language == "hindi":
            onboarding_instruction = 'ASK: "Aapka naam kya hai?"'
        else:
            onboarding_instruction = 'ASK: "What is your name?"'

    elif not state.user_age:
        if state.current_language == "hindi":
            onboarding_instruction = 'ASK: "Aapki umar kya hai?"'
        else:
            onboarding_instruction = 'ASK: "How old are you?"'

    elif not state.user_weight:
        if state.current_language == "hindi":
            onboarding_instruction = 'ASK: "Aapka weight kitna hai (kg mein)?"'
        else:
            onboarding_instruction = 'ASK: "What is your weight in kg?"'

    onboarding_instruction += "\n⚠️ Ask ONE question at a time.\n"

Medical Context Extraction

Comprehensive Medical Info Detection (Lines 269-376)

Prevents Redundant Questions:

def extract_medical_context_from_history(history, current_question):
    # Combine last 10 messages + current question
    all_text = current_question.lower() + " "
    for msg in history[-10:]:
        if msg.get('role') == 'user':
            all_text += msg.get('content', '').lower() + " "

    provided_info = []

    # 1. Pain location detection (15 locations)
    location_keywords = {
        'lower back': ['lower back', 'kamar', 'neeche ki peeth', 'lumbar'],
        'upper back': ['upper back', 'upar ki peeth', 'shoulder blade'],
        'neck': ['neck', 'gardan', 'cervical', 'gala'],
        'shoulder': ['shoulder', 'kandha', 'rotator cuff'],
        'knee': ['knee', 'ghutna', 'patella'],
        'ankle': ['ankle', 'takna', 'takhna', 'gatta'],
        'hip': ['hip', 'kalha', 'koolha'],
        'elbow': ['elbow', 'kohnee', 'koni'],
        'wrist': ['wrist', 'kalai'],
        'foot': ['foot', 'pair', 'feet', 'paon'],
        'hand': ['hand', 'hath', 'haath'],
        'leg': ['leg', 'pair', 'taang'],
        'arm': ['arm', 'baazu', 'baah'],
        'head': ['head', 'headache', 'sir', 'sar dard'],
        'chest': ['chest', 'chaati', 'seena'],
    }

    # 2. Duration detection (patterns)
    duration_patterns = [
        (r'since yesterday', 'Duration: since yesterday'),
        (r'kal se', 'Duration: since yesterday'),
        (r'since morning', 'Duration: since morning'),
        (r'for (\d+) (day|days|week|weeks|month|months)', 'Duration'),
        (r'(\d+) (din|dino|hafte|mahine|saal)', 'Duration'),
    ]

    # 3. Pain intensity detection
    intensity_keywords = {
        'severe': ['severe', 'unbearable', 'bahut zyada', '8', '9', '10'],
        'moderate': ['moderate', 'bearable', 'thoda', '5', '6', '7'],
        'mild': ['mild', 'little', 'halka', '2', '3', '4'],
    }

    # 4. Activity/trigger detection
    activity_keywords = [
        'lifting', 'exercise', 'workout', 'running', 'walking', 'sitting',
        'standing', 'bending', 'twisting', 'sleeping', 'driving',
        'uthaate hue', 'chalne se', 'baithne se', 'sote hue'
    ]

    # 5. Symptoms detection
    symptom_keywords = [
        'numbness', 'tingling', 'weakness', 'stiffness', 'swelling',
        'burning', 'sharp pain', 'dull ache', 'shooting pain',
        'sunpan', 'kamzori', 'akdan', 'sujan', 'jalan'
    ]

    # Build context
    context = """
🔍 === CONTEXT FROM CONVERSATION ===
User has ALREADY mentioned:
✓ Pain location: lower back
✓ Duration: since yesterday
✓ Pain intensity: severe
✓ Triggers/Activities: lifting, bending
✓ Associated symptoms: numbness, stiffness

⚠️ IMPORTANT - Use this context to ask SMART follow-up questions!
🚫 DO NOT ask 'which part' if location is clear!
🚫 DO NOT ask 'since when' if duration is mentioned!

✅ Ask ONLY for NEW missing information that will help diagnosis.
"""

    return context

Impact: Reduces frustrating duplicate questions, improves user experience


Clinical Summary Generation

Summary Detection (Lines 868-882)

Triggered when GPT response contains clinical summary keywords:

if is_clinical_summary(raw_answer):
    logger.info(f"✅ Clinical summary detected: {session_id}")
    state.summary = raw_answer.strip()
    state.conversation_has_summary = True
    state.last_completed_summary = raw_answer.strip()
    state.last_summary_timestamp = datetime.now()
    state.summary_shown = True
    state.stage = "follow_up"  # TRANSITION TO FOLLOW-UP STAGE

    # Start RAG pipeline in background
    asyncio.create_task(run_rag_and_api_pipeline_in_background(state, session_id, avtarType))

    return raw_answer  # Return summary to user

Summary Format (Expected from GPT)

Prompt instructs GPT to generate structured clinical summary:

Clinical Understanding:
- Chief Complaint: Lower back pain
- Duration: Since yesterday evening
- Intensity: 8/10 (severe)
- Location: Lumbosacral region, bilateral
- Character: Sharp, shooting pain
- Aggravating Factors: Bending forward, prolonged sitting
- Relieving Factors: Lying down, rest
- Associated Symptoms: Mild numbness in left leg

Provisional Diagnosis:
- Acute mechanical lower back pain
- Possible lumbar disc herniation (to be confirmed)

Recommendations:
- Rest for 48 hours
- Apply ice packs (15 minutes, 3-4 times daily)
- Avoid bending, lifting, twisting
- Follow prescribed exercises
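
The `is_clinical_summary` check used in the detection code is not shown in this section; a plausible keyword-based sketch (marker list and threshold are assumptions, not the service's actual values):

```python
# Hypothetical markers based on the expected summary format above;
# the real keyword list lives in the service code.
SUMMARY_MARKERS = ("clinical understanding", "provisional diagnosis", "chief complaint")

def is_clinical_summary(answer: str) -> bool:
    """Treat a GPT reply as a clinical summary if enough section markers appear."""
    lowered = answer.lower()
    hits = sum(marker in lowered for marker in SUMMARY_MARKERS)
    return hits >= 2  # require two markers to avoid false positives on casual replies
```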

TTS & Lip-Sync Pipeline

Complete Pipeline (Lines 1428-1448)

async def audio_pipeline():
    # 1. Select voice based on detected language
    voice = VOICE_HINDI if detect_language(answer) == "hindi" else VOICE_ENGLISH

    # 2. Azure TTS with rate limiting
    wav_file = await text_to_speech_azure(answer, voice, session_id)

    # 3. FFmpeg PCM conversion
    pcm_file = os.path.join(OUTPUT_DIR, f"{session_id}_pcm.wav")
    converted_pcm = await convert_wav_to_pcm_async(wav_file, pcm_file)

    # 4. Rhubarb lip-sync generation
    json_file = await generate_lip_sync_async(converted_pcm, session_id)
    lip_sync_data = parse_lip_sync(json_file, pcm_file) if json_file else None

    # 5. Base64 encode audio
    with open(wav_file, "rb") as f:
        audio_base64 = base64.b64encode(f.read()).decode()

    # 6. Cleanup temp files
    for f in [wav_file, pcm_file, json_file]:
        if f and os.path.exists(f):
            os.remove(f)

    return {"audio": audio_base64, "lipsync": lip_sync_data}

TTS Rate Limiting (Lines 145-148, 642-679)

Semaphore limits concurrent TTS requests:

TTS_MAX_CONCURRENT_REQUESTS = 3
tts_semaphore = Semaphore(TTS_MAX_CONCURRENT_REQUESTS)

@time_it
async def text_to_speech_azure(text, voice, session_id, max_retries=3):
    async with tts_semaphore:  # Limit to 3 concurrent
        for attempt in range(max_retries):
            try:
                speech_config = speechsdk.SpeechConfig(
                    subscription="9N41NOfDyVDoduiD4EjlzmZU9CbUX3pPqWfLCORpl7cBf0l2lzVQJQQJ99BCACGhslBXJ3w3AAAYACOG2329",
                    region="centralindia"
                )
                speech_config.speech_synthesis_voice_name = voice
                # wav_file is constructed earlier in the function (omitted in this excerpt)
                audio_config = speechsdk.audio.AudioOutputConfig(filename=wav_file)
                synthesizer = speechsdk.SpeechSynthesizer(...)

                result = await asyncio.to_thread(synthesizer.speak_text_async(text).get)

                if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
                    error_details = result.cancellation_details
                    is_rate_limit = "429" in str(error_details.error_details)

                    if is_rate_limit and attempt < max_retries - 1:
                        wait_time = 2 ** attempt  # Exponential backoff
                        await asyncio.sleep(wait_time)
                        continue

                    raise Exception(f"TTS failed: {error_details.reason}")

                return wav_file
            except Exception as e:
                if "429" in str(e) and attempt < max_retries - 1:
                    wait_time = 2 ** attempt
                    await asyncio.sleep(wait_time)
                    continue
                raise

        raise Exception(f"TTS failed after {max_retries} attempts")

Rate Limit Handling:

  • Detect 429 errors
  • Exponential backoff: 1s → 2s → 4s
  • Max 3 retries
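
The retry policy above generalizes to a small helper that could wrap any rate-limited async call (a sketch; `base_delay` is an added parameter, and the "429 in the message" heuristic mirrors the service's own check):

```python
import asyncio

async def with_backoff(op, max_retries: int = 3, base_delay: float = 1.0):
    """Retry an async operation on rate-limit errors with exponential backoff (1s, 2s, 4s)."""
    for attempt in range(max_retries):
        try:
            return await op()
        except Exception as e:
            # Same heuristic as the service: treat "429" in the message as rate limiting
            if "429" in str(e) and attempt < max_retries - 1:
                await asyncio.sleep(base_delay * 2 ** attempt)
                continue
            raise
```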

API Endpoints

Main Endpoint

POST /v2/get-response-physio - Primary conversation endpoint (584 lines)

Utility Endpoints

GET /check-profile/{session_id} (Lines 1479-1510)

Purpose: Debug endpoint to inspect session state

Response:

{
  "session_id": "remote_physio_12345",
  "profile_in_db": {
    "name": "John Doe",
    "age": 35,
    "weight": 75.5,
    "name_stored": true
  },
  "state_in_memory": {
    "stage": "follow_up",
    "current_language": "english",
    "user_name": "John Doe",
    "summary_shown": true,
    "waiting_for_new_concern_decision": false,
    "initial_problem_description": "I have back pain since yesterday morning",
    "problem_already_described": true
  }
}

POST /reset-session/{session_id} (Lines 1512-1519)

Purpose: Reset session state in memory

if session_id in session_states:
    del session_states[session_id]
    return {"status": "success"}

GET /remote_physio_history (Lines 1529-1594)

Purpose: Get chat history with token counts

Query: ?session_id=remote_physio_12345

Response:

{
  "session_id": "remote_physio_12345",
  "datetime": "2024-01-15 10:30:00",
  "session_total_tokens": 3450,
  "chat_data": [
    {
      "input_prompt": "I have back pain",
      "output_response": "I understand...",
      "timestamp": "2024-01-15 10:30:00",
      "input_tokens": 5,
      "output_tokens": 120,
      "total_tokens": 125
    }
  ],
  "month": "January"
}

Token Counting:

tokenizer = tiktoken.get_encoding("cl100k_base")
def count_tokens(text):
    return len(tokenizer.encode(text))

Security Analysis

🔴 CRITICAL: Hardcoded Azure TTS API Key

Line 650:

speech_config = speechsdk.SpeechConfig(
    subscription="9N41NOfDyVDoduiD4EjlzmZU9CbUX3pPqWfLCORpl7cBf0l2lzVQJQQJ99BCACGhslBXJ3w3AAAYACOG2329",
    region="centralindia"
)

Risk: Same key as other services - high exposure

🔴 CRITICAL: Hardcoded Azure OpenAI API Key

Lines 99, 708, 744:

openai_client = AsyncAzureOpenAI(
    azure_endpoint='https://eastus.api.cognitive.microsoft.com/',
    api_key='0d9d78cabb4c4e22a5b4a6ef53253155',  # HARDCODED
    api_version="2024-02-01",
)

# Also used as default in LangChain
api_key=os.getenv("AZURE_OPENAI_API_KEY", "0d9d78cabb4c4e22a5b4a6ef53253155")

Total Hardcoded Keys: 2 (TTS + OpenAI)

🟠 SECURITY: Overly Permissive CORS

Lines 69-71:

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # ANY ORIGIN
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

🟡 CODE QUALITY: Magic String for Client Detection

Lines 522, 550, 764:

if avtarType == "User-181473_Project_15":
    base_url = REMOTEPHYSIOS_API_BASE
else:
    base_url = DEFAULT_API_BASE

Problem: "User-181473_Project_15" is magic string identifying Remote Physios

Better Approach:

REMOTE_PHYSIOS_AVATAR_ID = os.getenv("REMOTE_PHYSIOS_AVATAR_ID", "User-181473_Project_15")
if avtarType == REMOTE_PHYSIOS_AVATAR_ID:
    ...

🟡 CODE QUALITY: Session State in Memory

Line 194:

session_states = {}  # In-memory dictionary

Problem:

  • Lost on service restart
  • No sharing across multiple service instances (horizontal scaling)
  • Memory leak potential (no cleanup)

Fix: Store in Redis or MongoDB

🟡 DATA INTEGRITY: Profile Stored in User Profiles Collection

Lines 471-493:

await asyncio.to_thread(
    user_profiles_collection.update_one,
    {"session_id": session_id},
    {"$set": update_data},
    upsert=True
)

Issue: User profiles tied to session_id, not user_id

Impact: Same user with different session_id = different profile

🟢 GOOD PRACTICE: Performance Timing

Lines 57-65:

def time_it(func):
    @functools.wraps(func)
    async def async_wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = await func(*args, **kwargs)
        end = time.perf_counter()
        logger.info(f"Function '{func.__name__}' executed in {end - start:.4f}s")
        return result
    return async_wrapper

Decorated Functions:

  • text_to_speech_azure
  • convert_wav_to_pcm_async
  • generate_lip_sync_async
  • get_gpt_response
  • fetch_prompt_from_api
  • send_data_to_api

🟢 GOOD PRACTICE: Prompt Caching

Lines 540-566:

Reduces API calls to Remote Physios for prompts (LRU cache with 128 max)


Refactoring Recommendations

1. Extract Client-Specific Configuration

Create a configurable PhysioServiceConfig:

class PhysioServiceConfig(BaseModel):
    client_name: str = "Generic Physiotherapy"
    api_base_url: str
    calendly_link: Optional[str] = None
    avatar_id: str
    supported_languages: List[str] = ["english", "hindi"]
    consultation_stages: List[str] = ["language_selection", "root_problem", "follow_up"]
    rag_enabled: bool = True

# Load from database or config file per chatbot
config = load_config(project_id, user_id)

2. Move to Multi-Tenant Architecture

Instead of:

if avtarType == "User-181473_Project_15":
    base_url = REMOTEPHYSIOS_API_BASE

Use:

chatbot_config = db["chatbot_configurations"].find_one({"avatarType": avtarType}) or {}
base_url = chatbot_config.get("api_base_url") or DEFAULT_API_BASE

3. Externalize Hardcoded Strings

Move to database/config:

  • Brand name ("Remote Physios")
  • API endpoints
  • Calendly links
  • Prompt IDs
  • Magic strings

4. Make State Machine Configurable

Current: Hardcoded 8-stage flow for physiotherapy

Better: Load stage configuration per chatbot type

stage_config = {
    "physiotherapy": ["language", "onboarding", "assessment", "summary", "follow_up"],
    "general_support": ["greeting", "problem", "resolution"],
    "sales": ["greeting", "qualification", "demo", "close"]
}
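
A stage list like the one above could drive a minimal state machine; the sketch below is illustrative (class and config names are assumptions, not existing code):

```python
STAGE_CONFIG = {
    "physiotherapy": ["language", "onboarding", "assessment", "summary", "follow_up"],
    "general_support": ["greeting", "problem", "resolution"],
}

class ConfigurableStateMachine:
    """Advance through a per-chatbot stage list instead of a hardcoded 8-stage flow."""
    def __init__(self, chatbot_type: str):
        self.stages = STAGE_CONFIG[chatbot_type]
        self.index = 0

    @property
    def stage(self) -> str:
        return self.stages[self.index]

    def advance(self) -> str:
        # Terminal stage is sticky: repeated advances stay on the last stage
        if self.index < len(self.stages) - 1:
            self.index += 1
        return self.stage
```

The same session-state object could then store `chatbot_type` plus an index, making the flow data-driven per tenant.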

5. Session State to Redis

Current: In-memory dictionary (session_states = {})

Better: Redis with TTL

await redis.setex(
    f"session:{session_id}",
    3600,  # 1 hour TTL
    json.dumps(state.dict())
)

6. Separate Generic Chatbot Service

Architecture:

machineagents-be/
├── generic-chatbot-service/  # Base chatbot with configurable flows
│   ├── state_machine.py      # Configurable state machine
│   ├── profile_manager.py    # Reusable profile management
│   └── context_extractor.py  # Medical/domain context extraction
└── client-configurations/     # Client-specific configs (not code!)
    ├── remote-physios.yaml    # Remote Physios configuration
    ├── therapist-ai.yaml      # Another therapy client
    └── default.yaml           # Default chatbot config

Prompt Structure Details

4 Prompts from Remote Physios API

The service fetches 4 different prompts from the Remote Physios API, each serving a distinct purpose in the consultation flow:

Prompt Fetching (Lines 818-819, 923-924):

# Parallel fetch
results = await asyncio.gather(
    fetch_prompt_from_api(1, avtarType),  # Main consultation prompt
    fetch_prompt_from_api(2, avtarType),  # Assessment specialist prompt
    fetch_prompt_from_api(3, avtarType),  # Exercise specialist prompt
    fetch_prompt_from_api(4, avtarType),  # Follow-up prompt
)

Prompt Usage Breakdown

Prompt 1: Main Consultation Prompt

  • Used in: Primary conversation flow (Lines 923, 1418)
  • Purpose: Guides GPT through patient assessment and clinical summary generation
  • Expected Content:
You are a physiotherapy assistant conducting a virtual consultation.
Your goal is to gather patient information and provide a clinical summary.

Ask targeted questions about:
1. Pain location and characteristics
2. Duration and onset
3. Aggravating/relieving factors
4. Medical history
5. Previous treatments

When you have enough information, provide a structured clinical summary...

Prompt 2: Assessment Specialist Prompt

  • Used in: RAG pipeline for assessments (Lines 818, 834)
  • Purpose: Generate assessment recommendations based on clinical summary + retrieved documents
  • Expected Content:
You are a physiotherapy assessment specialist.
Based on the clinical summary and assessment protocols provided,
recommend appropriate assessments for diagnosis.

Context: [RAG-retrieved assessment documents]

Provide:
1. Primary assessments to conduct
2. Expected findings
3. Red flags to watch for

Prompt 3: Exercise Specialist Prompt

  • Used in: RAG pipeline for exercises (Lines 819, 838)
  • Purpose: Generate exercise plan based on clinical summary + retrieved exercises
  • Expected Content:
You are a physiotherapy exercise specialist.
Based on the clinical summary and exercise library provided,
design a personalized exercise plan.

Context: [RAG-retrieved exercise documents]

Provide:
1. Progressive exercise protocol
2. Dosage (sets, reps, duration)
3. Precautions and contraindications

Prompt 4: Follow-up Prompt

  • Used in: Post-summary interactions (Lines 924, 1422)
  • Purpose: Guide conversations after clinical summary is shown
  • Expected Content:
You have already provided a clinical summary to the patient.
Answer any questions they have about the summary, assessments, or exercises.

Encourage them to:
- Book a consultation if needed
- Start exercises gradually
- Monitor symptoms

Prompt API Response Format

Expected API Response (Lines 555-559):

{
  "data": {
    "id": 1,
    "prompt": "You are a physiotherapy assistant...",
    "created_at": "2024-01-15T10:30:00Z",
    "updated_at": "2024-01-20T15:45:00Z"
  }
}

Alternative Format:

{
    "content": "You are a physiotherapy assistant...",
    "metadata": {...}
}

Fallback (Lines 558-559):

prompt_content = data.get("data", {}).get("prompt") or data.get("content")
if not prompt_content:
    raise ValueError(f"Prompt not found: {prompt_id}")

Dockerfile Configuration

Build Configuration

Path: remote-physio-service/Dockerfile
Base Image: python:3.9-slim
Exposed Port: 8018

System Dependencies

Installed Packages (Lines 47-49):

RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg \
    curl \
    build-essential \
    cifs-utils \
    unzip

Package purposes:

  • ffmpeg (audio conversion for TTS)
  • curl (HTTP requests)
  • build-essential (Python package compilation)
  • cifs-utils (network file system support)
  • unzip (archive extraction)

Application Setup

Working Directory:

WORKDIR /app

Python Virtual Environment (Lines 55-57):

RUN python -m venv /opt/venv && \
    /opt/venv/bin/pip install --upgrade pip && \
    /opt/venv/bin/pip install -r requirements.txt

Source Code Copy (Line 60):

COPY src/ ./src/

Rhubarb Permissions (Line 63):

RUN chmod +x /app/src/Rhubarb/rhubarb || true

Runtime Configuration

Environment Variables (Line 66):

ENV PYTHONPATH=/app

Exposed Port (Line 68):

EXPOSE 8018

Start Command (Line 71):

CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8018"]

⚠️ IMPORTANT: Code has port=8000 in if __name__ == "__main__" (Line 1601) but Dockerfile uses --port 8018

Effective port by run mode:

  • Docker run command: uvicorn src.main:app ... --port 8018
  • Direct Python run: uvicorn.run(app, host="0.0.0.0", port=8000)
  • Production uses Docker → Port 8018 ✅
  • Local development might use Port 8000 ⚠️

Additional Source Files

Directory Structure

remote-physio-service/src/
├── main.py                      # Primary service (1,602 lines) ✅ ACTIVE
├── oldversion_main.py           # Legacy backup (1,603 lines) ⚠️ DEPRECATED
├── lang.py                      # Alternative version (659 lines) ⚠️ EXPERIMENTAL
├── physio/
│   ├── __init__.py
│   ├── assessments/
│   │   ├── __init__.py
│   │   ├── hybrid_retriever.py  # Assessment RAG retriever
│   │   └── rag_artifacts/       # BM25 + FAISS indices
│   │       ├── bm25_index.pkl
│   │       ├── embeddings.npy
│   │       ├── metadata.json
│   │       └── documents/
│   └── exercises/
│       ├── __init__.py
│       ├── hybrid_retriever.py  # Exercise RAG retriever
│       └── rag_artifacts/       # BM25 + FAISS indices
│           ├── bm25_index.pkl
│           ├── embeddings.npy
│           ├── metadata.json
│           └── documents/
├── utils/
│   └── (utility modules)
├── Rhubarb/
│   └── rhubarb                  # Linux executable
├── Rhubarb-Lip-Sync-1.13.0-Windows/
│   └── rhubarb.exe              # Windows executable
└── __pycache__/

Version Comparison

main.py (ACTIVE):

  • 1,602 lines
  • LangChain integration
  • Full 8-stage state machine
  • Hybrid RAG implementation
  • Current production version

oldversion_main.py (DEPRECATED):

  • 1,603 lines
  • Likely previous iteration
  • ⚠️ Should be deleted or archived

lang.py (EXPERIMENTAL):

  • 659 lines
  • Possibly simplified version or language-specific variant
  • Has shutdown event handler (@app.on_event("shutdown"))
  • ⚠️ Unclear purpose - needs investigation

physio Module (RAG Implementation)

HybridRetriever Class:

Expected interface (based on usage):

class HybridRetriever:
    def __init__(self, artifacts_dir: str):
        """Load BM25 index, embeddings, metadata"""
        self.bm25 = load_bm25(f"{artifacts_dir}/bm25_index.pkl")
        self.embeddings = np.load(f"{artifacts_dir}/embeddings.npy")
        with open(f"{artifacts_dir}/metadata.json") as f:
            self.metadata = json.load(f)

    def retrieve(self, query: str, k: int = 5, alpha: float = 0.7) -> List[Dict]:
        """
        Hybrid search combining BM25 and vector similarity

        Args:
            query: Search query (clinical summary)
            k: Number of results to return
            alpha: Weight (0.7 = 70% vector, 30% BM25)

        Returns:
            List of documents with 'text' field
        """
        # Vector search
        vector_scores = cosine_similarity(embed(query), self.embeddings)

        # BM25 search (rank_bm25's get_scores expects a tokenized query)
        bm25_scores = self.bm25.get_scores(query.split())

        # Combine scores
        final_scores = alpha * vector_scores + (1 - alpha) * bm25_scores

        # Get top-k
        top_indices = np.argsort(final_scores)[-k:][::-1]

        return [self.metadata[i] for i in top_indices]
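
Raw BM25 scores and cosine similarities live on different scales, so a fusion like `alpha * vector + (1 - alpha) * bm25` usually normalizes each score list first; the interface sketch above elides this. A minimal pure-Python version of the fusion step (function name is illustrative):

```python
def hybrid_scores(vector_scores, bm25_scores, alpha=0.7):
    """Min-max normalize each score list, then blend: alpha*vector + (1-alpha)*bm25."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        # Degenerate case: all scores equal -> contribute nothing to the blend
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    v, b = norm(vector_scores), norm(bm25_scores)
    return [alpha * vi + (1 - alpha) * bi for vi, bi in zip(v, b)]
```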

RAG Artifacts Format:

metadata.json:

[
    {
        "id": "ex_001",
        "text": "Lumbar Extension Exercise:\n\nLie prone...",
        "category": "lower_back",
        "difficulty": "beginner",
        "contraindications": ["acute disc herniation"]
    },
    ...
]

Embeddings: Likely generated using sentence-transformers or Azure OpenAI embeddings


Performance Characteristics

Timing Decorator Coverage

All major functions decorated with @time_it (Lines 57-65):

Timed Functions:

  1. fetch_prompt_from_api (~200-500ms with cache, ~800-1500ms without)
  2. text_to_speech_azure (~2000-4000ms)
  3. convert_wav_to_pcm_async (~300-800ms)
  4. generate_lip_sync_async (~1000-2000ms)
  5. get_gpt_response (~3000-8000ms depending on context)
  6. send_data_to_api (~500-1500ms)
  7. handle_state_transitions (~10-50ms)

End-to-End Performance

Typical Request Breakdown:

User Question → Response
├── 1. Load session state (memory)                    ~1ms
├── 2. Fetch history from MongoDB                     ~50-200ms
├── 3. Fetch prompts from API (parallel)              ~200-500ms (cached)
│                                                      ~800-1500ms (uncached)
├── 4. Load user profile from MongoDB                 ~50-150ms
├── 5. Extract user info from text (regex)            ~5-10ms
├── 6. Build context (medical extraction)             ~10-30ms
├── 7. GPT-4 response generation                      ~3000-8000ms ⏳ BOTTLENECK
├── 8. State transitions (detect summary)             ~10-50ms
├── 9. TTS generation (parallel with history save)    ~2000-4000ms
├── 10. FFmpeg conversion                             ~300-800ms
├── 11. Rhubarb lip-sync                              ~1000-2000ms
├── 12. Base64 encoding                               ~10-30ms
├── 13. Save to MongoDB                               ~100-300ms
└── 14. Save to Remote Physios API                    ~500-1500ms

TOTAL (parallel optimized): ~6-15 seconds
TOTAL (if all sequential): ~10-20 seconds

Parallel Optimization (Lines 920-927, 1456):

# Step 1: Parallel data fetch (saves ~2-4 seconds)
results = await asyncio.gather(
    get_chat_history_from_db_async(session_id),
    fetch_prompt_from_api(1, avtarType),
    fetch_prompt_from_api(4, avtarType),
    get_user_profile(session_id),
)

# Step 2: Parallel audio generation + history save (saves ~2-3 seconds)
audio_result, _ = await asyncio.gather(
    audio_pipeline(),
    save_history_pipeline()
)

Without Parallelization: ~18-25 seconds per request

With Parallelization: ~6-15 seconds per request

Speedup: ~60-70% improvement
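
The saving from overlapping independent awaits can be demonstrated with stand-ins for the I/O calls (`fake_io` simulates MongoDB/prompt-API latency; the numbers are illustrative, not measurements):

```python
import asyncio
import time

async def fake_io(label: str, seconds: float) -> str:
    await asyncio.sleep(seconds)  # stand-in for a network round trip
    return label

async def main():
    # Sequential: total latency is the sum of the calls
    start = time.perf_counter()
    await fake_io("history", 0.05)
    await fake_io("prompt", 0.05)
    sequential = time.perf_counter() - start

    # Parallel: total latency is roughly the slowest single call
    start = time.perf_counter()
    await asyncio.gather(fake_io("history", 0.05), fake_io("prompt", 0.05))
    parallel = time.perf_counter() - start
    return sequential, parallel

sequential, parallel = asyncio.run(main())
```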

Background RAG Pipeline Performance

After Clinical Summary Detected (Lines 792-856):

Clinical Summary Generated
Background Task Started (non-blocking)
    ├── 1. Parallel RAG retrieval                      ~2000-4000ms
    │   ├── Assessment retriever (k=5)                 ~1000-2000ms
    │   ├── Exercise retriever (k=5)                   ~1000-2000ms
    │   ├── Fetch prompt 2                             ~200-500ms (cached)
    │   └── Fetch prompt 3                             ~200-500ms (cached)
    ├── 2. GPT-4 generate assessments                  ~3000-6000ms
    ├── 3. GPT-4 generate exercises                    ~3000-6000ms
    └── 4. Send to Remote Physios API                  ~500-1500ms

TOTAL BACKGROUND: ~8-15 seconds

User Impact: NONE - runs asynchronously, results stored in state.assessments and state.exercises

TTS Rate Limiting Impact

Concurrent Request Limit: 3 (Line 146)

Scenario: 10 simultaneous users

Users 1-3:  Start TTS immediately              (0ms wait)
Users 4-6:  Wait for slot, average             (~2000ms wait)
Users 7-9:  Wait for slot, average             (~4000ms wait)
User 10:    Wait for slot, average             (~6000ms wait)

With retry + backoff (Lines 663-677):

  • 1st retry: Wait 1 second (2^0)
  • 2nd retry: Wait 2 seconds (2^1)
  • 3rd retry: Wait 4 seconds (2^2)

Max total delay if all retries: ~7 seconds additional

Memory Usage

Session States (In-Memory):

  • SessionState object: ~2-5 KB per session
  • 1000 active sessions: ~2-5 MB
  • 10,000 active sessions: ~20-50 MB
  • No garbage collection → grows unbounded ⚠️
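
Until the recommended move to Redis happens, the unbounded growth could be mitigated with a periodic TTL sweep over the in-memory dictionary (a sketch; helper names are assumptions):

```python
import time

SESSION_TTL_SECONDS = 3600
session_states = {}   # session_id -> state object
last_seen = {}        # session_id -> last-activity timestamp (monotonic clock)

def touch(session_id, state):
    """Record activity for a session; call on every request."""
    session_states[session_id] = state
    last_seen[session_id] = time.monotonic()

def sweep_expired(now=None) -> int:
    """Drop sessions idle longer than the TTL; returns the number removed."""
    now = now if now is not None else time.monotonic()
    expired = [sid for sid, ts in last_seen.items() if now - ts > SESSION_TTL_SECONDS]
    for sid in expired:
        session_states.pop(sid, None)
        last_seen.pop(sid, None)
    return len(expired)
```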

Prompt Cache (LRU):

  • Max 128 prompts
  • Estimated size: ~10-50 KB per prompt
  • Total: ~1.3-6.4 MB maximum

RAG Artifacts (Loaded on Startup):

  • BM25 indices: ~10-50 MB (2 systems)
  • Embeddings FAISS: ~50-200 MB (2 systems)
  • Metadata: ~5-20 MB (2 systems)
  • Total RAM: ~150-500 MB for RAG

Total Service Memory Footprint: ~200-600 MB


Summary

Service Statistics

  • Total Lines: 1,602
  • Total Endpoints: 4
  • Total Collections: 2 (MongoDB)
  • Total Stages: 8
  • Total Languages: 2 (English, Hindi/Hinglish)
  • Total RAG Systems: 2 (Assessments, Exercises)
  • Total Hardcoded API Keys: 2
  • Total Client References: 15+

Key Capabilities

  1. 8-Stage State Machine with inactivity handling
  2. Bilingual Support (English + Hinglish)
  3. Hybrid RAG (BM25 + Vector, alpha=0.7)
  4. Medical Context Extraction (15 pain locations, duration, intensity, symptoms)
  5. User Profile Persistence (Name/Age/Weight stored permanently)
  6. Clinical Summary Auto-Detection
  7. Background RAG Pipeline (Parallel assessments + exercises)
  8. External API Integration (2 Remote Physios API endpoints)
  9. TTS Rate Limiting (3 concurrent max, exponential backoff)
  10. Performance Timing (All major functions decorated)

Critical Architectural Issues

  1. 🔴 Client-Specific Code in Main Product - Should be configurable
  2. 🔴 Hardcoded API Keys (2 discovered)
  3. 🔴 Hardcoded Brand References (15+ occurrences)
  4. 🔴 Magic Strings ("User-181473_Project_15")
  5. 🟡 Session State in Memory - Not scalable
  6. 🟡 No Multi-Tenancy - One client = entire service

Immediate Actions Needed

  1. Extract hardcoded API keys to environment variables
  2. Document refactoring plan to make service generic
  3. Create migration path to client configuration system
  4. Restrict CORS to specific origins
  5. Move session state to Redis
  6. Create client configuration schema for Remote Physios
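
For item 1, a fail-fast loader keeps a missing key from silently falling back to a hardcoded default (variable names are suggestions, not the service's current configuration):

```python
import os

def require_env(name: str) -> str:
    """Read a required secret from the environment; fail at startup if absent."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# e.g. AZURE_SPEECH_KEY and AZURE_OPENAI_API_KEY, replacing the inline literals
```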

Documentation Complete: Remote Physio Service (Port 8018)
Status: COMPREHENSIVE, DEVELOPER-GRADE, INVESTOR-GRADE, AUDIT-READY ✅

⚠️ ARCHITECTURAL DEBT DOCUMENTED - REFACTORING REQUIRED