AI Ethics Guidelines

Purpose: Comprehensive framework for responsible AI development and deployment
Audience: Engineering Team, Legal, Compliance, Investors, Customers
Owner: CTO, Ethics Committee, Legal
Last Updated: 2025-12-26
Version: 1.0
Status: Active


Executive Summary

MachineAvatars is committed to developing AI systems that are safe, fair, transparent, and accountable. This document outlines our ethical principles, implementation practices, and oversight mechanisms to ensure our AI chatbots serve users responsibly.

Core Commitments:

  • Safety First: Prevent harmful outputs through multi-layered guardrails
  • Bias Mitigation: Actively detect and reduce algorithmic bias
  • Transparency: Clear disclosure of AI capabilities and limitations
  • User Consent: Explicit opt-in for AI interactions
  • Accountability: Human oversight and recourse mechanisms
  • Privacy: Data protection and user control

1. Responsible AI Principles

1.1 Core Principles

graph TB
    A[Responsible AI] --> B[Safety]
    A --> C[Fairness]
    A --> D[Transparency]
    A --> E[Accountability]
    A --> F[Privacy]
    A --> G[Beneficence]

    style A fill:#4CAF50

Principle 1: Safety

Commitment: AI systems must not cause harm to users, organizations, or society.

Implementation:

  • Multi-layered content filtering (profanity, violence, illegal content)
  • Prompt injection detection and prevention
  • Rate limiting to prevent abuse
  • Continuous monitoring for misuse patterns

Example:

# Safety check before LLM response
import re

def safety_check(user_input: str, ai_response: str) -> bool:
    """
    Check for unsafe content in input and output.
    """
    # Check for harmful patterns
    harmful_patterns = [
        r"how to (build|make|create) (bomb|weapon|explosive)",
        r"(hack|steal|pirate)",
        r"(suicide|self-harm|kill myself)"
    ]

    for pattern in harmful_patterns:
        if re.search(pattern, user_input, re.IGNORECASE):
            return False
        if re.search(pattern, ai_response, re.IGNORECASE):
            return False

    return True

Principle 2: Fairness

Commitment: AI systems must treat all users equitably, regardless of race, gender, age, religion, or other protected characteristics.

Implementation:

  • Diverse training data representation
  • Bias testing across demographic groups
  • Regular fairness audits
  • Inclusive language policies

Metrics:

  • Response quality parity across user demographics
  • Equal access to features regardless of background
  • Regular bias assessments (quarterly)

Principle 3: Transparency

Commitment: Users must understand when they are interacting with AI and how it works.

Implementation:

  • Clear "Powered by AI" labeling on all chatbots
  • Disclosure of AI limitations in user documentation
  • Explainability: Show which documents informed responses (RAG citations)
  • Model information disclosure (which LLM used)

Example UI:

┌────────────────────────────────────┐
│ 🤖 AI Chatbot                      │
│ Powered by GPT-4 | AskGalore       │
├────────────────────────────────────┤
│ Hello! I'm an AI assistant trained │
│ on your organization's documents.  │
│                                    │
│ ⚠️ I may make mistakes. Always     │
│ verify critical information.       │
└────────────────────────────────────┘

Principle 4: Accountability

Commitment: Clear ownership and recourse mechanisms for AI decisions.

Implementation:

  • Human-in-the-loop for sensitive decisions
  • Audit trails for all AI interactions
  • Clear escalation paths (AI → Human support)
  • Incident response procedures

Accountability Chain:

User Issue
    ↓
Chatbot Response
    ↓
AI Review (Automated)
    ↓
Human Review (If flagged)
    ↓
Executive Review (Serious issues)

Principle 5: Privacy

Commitment: User data is protected and used only for stated purposes.

Implementation:

  • Data minimization (collect only what's needed)
  • Encryption at rest and in transit
  • User data deletion rights honored
  • No training on user data without consent

Compliance:

  • GDPR Article 21 (Right to object)
  • DPDPA 2023 Section 16 (Consent withdrawal)
  • CCPA Chapter 3 (Data deletion)

Principle 6: Beneficence

Commitment: AI systems should actively benefit users and society.

Implementation:

  • Focus on productivity enhancement
  • Accessibility features (TTS for visually impaired)
  • Educational use cases prioritized
  • Positive social impact measurement

2. Bias Detection & Mitigation

2.1 Types of Bias We Address

| Bias Type          | Description                           | Mitigation Strategy                        |
|--------------------|---------------------------------------|--------------------------------------------|
| Training Data Bias | Underrepresentation of certain groups | Diverse data sourcing, synthetic data      |
| Selection Bias     | Non-representative user base          | Demographic tracking, targeted outreach    |
| Measurement Bias   | Metrics favor certain outcomes        | Multi-metric evaluation                    |
| Algorithmic Bias   | Model favors certain patterns         | Fairness constraints, debiasing techniques |
| Interaction Bias   | User feedback creates loops           | Balanced feedback collection               |

2.2 Bias Testing Framework

Quarterly Bias Audits:

from typing import Dict, List

def conduct_bias_audit(test_cases: List[TestCase]) -> BiasReport:
    """
    Test model responses across demographic groups.

    Test cases include:
    - Same question, different names (cultural diversity)
    - Same question, different pronouns (gender)
    - Same question, different locations (geographic)
    """
    results: Dict[str, Dict[str, Dict]] = {}

    for test_case in test_cases:
        # Keep one result set per test case so groups are compared on the same question
        results[test_case.question] = {}
        for demographic in ["Male", "Female", "Non-binary", "Various ethnicities"]:
            response = chatbot.ask(test_case.question, demographic_context=demographic)
            results[test_case.question][demographic] = {
                "response_quality": evaluate_quality(response),
                "response_length": len(response),
                "politeness_score": calculate_politeness(response),
                "helpfulness": rate_helpfulness(response)
            }

    # Check for statistical parity across demographic groups
    return analyze_parity(results)
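
The analyze_parity helper referenced above is not shown elsewhere in this document; a minimal sketch of one way it could work, assuming "response_quality" is normalized to a 0-1 score, is:

# Hypothetical sketch of analyze_parity; the threshold and output format are assumptions.
from statistics import mean
from typing import Dict, List

PARITY_THRESHOLD = 0.05  # assumed maximum allowed gap between a group and the overall mean

def analyze_parity(results: Dict[str, Dict[str, Dict]]) -> Dict:
    """Average each demographic group's quality across test cases and flag outliers."""
    # Collect quality scores per demographic group across all test cases
    per_group: Dict[str, List[float]] = {}
    for groups in results.values():
        for group, metrics in groups.items():
            per_group.setdefault(group, []).append(metrics["response_quality"])

    group_means = {group: mean(scores) for group, scores in per_group.items()}
    overall = mean(group_means.values())

    gaps = {group: abs(score - overall) for group, score in group_means.items()}
    flagged = [group for group, gap in gaps.items() if gap > PARITY_THRESHOLD]

    return {
        "overall_quality": overall,
        "per_group_gap": gaps,
        "groups_outside_threshold": flagged,
        "parity_achieved": not flagged,
    }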

Example Test Cases:

  • "Tell me about John Smith" vs. "Tell me about Priya Sharma" (name bias)
  • "What career should he pursue?" vs. "What career should she pursue?" (gender bias)
  • "Describe a CEO" (check for male-default assumptions)

2.3 Debiasing Techniques

1. Prompt Engineering:

# Add fairness instruction to system prompt
system_prompt = """
You are a helpful, unbiased assistant. When answering questions:
- Avoid stereotypes based on gender, race, age, or nationality
- Use inclusive language (they/them when gender unknown)
- Present diverse examples and perspectives
- Challenge biased assumptions in user questions
"""

2. Response Filtering:

import re

def check_bias(response: str) -> bool:
    """
    Flag potentially biased responses.
    """
    bias_indicators = [
        r"(all|most) (women|men|blacks|asians|muslims|christians)",
        r"(women|men) are (better|worse) at",
        r"typical (male|female) (job|trait)"
    ]

    for pattern in bias_indicators:
        if re.search(pattern, response, re.IGNORECASE):
            return False  # Flag for review

    return True

3. Balanced Training (Future):

  • Collect diverse user feedback
  • Oversample underrepresented scenarios
  • Regular model fine-tuning on balanced data

3. Transparency & Explainability

3.1 User-Facing Transparency

AI Disclosure Requirements:

Every chatbot interaction must include the following four notices (a minimal assembly sketch follows this list):

  1. AI Identity:
     "I'm an AI assistant created by AskGalore."

  2. Capability Disclosure:
     "I can answer questions based on your uploaded documents.
      I cannot access external information or make real-time decisions."

  3. Limitation Warnings:
     ⚠️ I may occasionally make mistakes or provide incomplete information.
     Always verify critical facts independently.

  4. Data Usage Notice:
     📊 Our conversation is stored to improve your experience.
     You can delete your data anytime in Settings.

3.2 RAG Citations (Explainability)

Show Source Documents:

User: "What is your refund policy?"

AI: "Our refund policy allows returns within 30 days of purchase.

📄 Source: Refund_Policy.pdf, Page 3"

Implementation:

from typing import Dict, List

def generate_response_with_citation(query: str, context_chunks: List[Dict]):
    """
    Include source attribution in response.
    """
    # Generate response
    response = llm.generate(query, context_chunks)

    # Add citations
    sources = [
        f"{chunk['document_name']}, Page {chunk['page_number']}"
        for chunk in context_chunks[:3]  # Top 3 sources
    ]

    citation = f"\n\n📄 Sources: {', '.join(sources)}"

    return response + citation

3.3 Model Information Disclosure

For Enterprise Customers:

Provide visibility into:

  • Which LLM model is being used (GPT-4, GPT-3.5, etc.)
  • Model version and update history
  • Training data cutoff date
  • Known limitations

Example Dashboard:

Chatbot: Customer Support Bot
Model: GPT-3.5 Turbo 16K
Last Updated: 2024-12-01
Knowledge Cutoff: 2024-11-30
Accuracy (Self-reported): 92%
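
One way to surface this information programmatically is a read-only metadata endpoint. The sketch below assumes FastAPI (used for the chat endpoint in Section 4.1) with an in-memory stand-in for the real configuration store; the route and field names are illustrative:

from fastapi import FastAPI, HTTPException

app = FastAPI()

# In-memory stand-in for the real chatbot configuration store (illustrative only)
CHATBOT_CONFIGS = {
    "support-bot": {
        "model": "gpt-35-turbo-16k-0613",
        "last_updated": "2024-12-01",
        "knowledge_cutoff": "2024-11-30",
        "known_limitations": ["No access to live data", "May be incomplete on out-of-scope questions"],
    }
}

@app.get("/api/chatbots/{chatbot_id}/model-info")
async def model_info(chatbot_id: str):
    """Expose model metadata so enterprise customers can see what powers their bot."""
    config = CHATBOT_CONFIGS.get(chatbot_id)
    if config is None:
        raise HTTPException(404, "Chatbot not found")
    return config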

4. User Consent & Control

4.1 Informed Consent

Before AI Interaction:

┌─────────────────────────────────────┐
│ 🤖 AI-Powered Conversation          │
├─────────────────────────────────────┤
│ This chatbot uses AI to provide     │
│ answers based on your documents.    │
│                                     │
│ By continuing, you agree to:       │
│ ✓ AI processing of your messages   │
│ ✓ Conversation storage (90 days)   │
│ ✓ Anonymous usage analytics        │
│                                     │
│ [ Agree & Continue ]  [ Learn More ]│
└─────────────────────────────────────┘

Implementation:

from fastapi import HTTPException

@app.post("/api/chat")
async def chat_endpoint(request: ChatRequest):
    # Check consent
    user = get_user(request.user_id)
    if not user.ai_consent_given:
        raise HTTPException(403, "AI consent required")

    # Proceed with chat
    return generate_response(request.message)

4.2 User Control Mechanisms

Data Rights:

  1. Right to Know:
     • Access all stored conversations
     • View AI training data sources
     • See model decision explanations

  2. Right to Delete:
     • Delete conversation history (instant)
     • Delete uploaded documents (cascade to embeddings)
     • Purge all user data (GDPR/DPDPA compliant)

  3. Right to Opt-Out:
     • Disable AI entirely (use human support only)
     • Opt-out of analytics
     • Prevent data use for training

Settings UI:

AI Preferences:
☑ Allow AI chatbot responses
☐ Contribute data to improve AI (opt-in)
☑ Store conversation history
☐ Share anonymous usage analytics

[Delete All My Data]  [Export My Data]

Process:

def withdraw_consent(user_id: str):
    """
    User revokes AI consent.
    """
    user = get_user(user_id)

    # 1. Disable AI features
    user.ai_enabled = False
    user.save()

    # 2. Delete conversation history
    db.chatbot_history.delete_many({"user_id": user_id})

    # 3. Delete embeddings (Milvus partitions)
    milvus.delete_embeddings_by_user_project(user_id, all_projects=True)

    # 4. Notify user
    send_email(user.email, "AI consent withdrawn", ...)

    # 5. Log for compliance
    audit_log.record("user_consent_withdrawn", user_id)
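
The Settings UI above also offers "[Export My Data]" (GDPR Article 20 data portability). A sketch of that flow, reusing the helpers shown elsewhere in this document; the JSON export format itself is an assumption:

import json
from datetime import datetime, timezone

def export_user_data(user_id: str) -> str:
    """Bundle a user's stored conversations and consent settings into portable JSON."""
    user = get_user(user_id)
    conversations = list(db.chatbot_history.find({"user_id": user_id}))

    export = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "ai_consent_given": user.ai_consent_given,
        "conversations": conversations,
    }

    # Log the export for the compliance audit trail
    audit_log.record("user_data_exported", user_id)

    return json.dumps(export, default=str, indent=2)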

5. Safety Guardrails

5.1 Content Filtering Layers

flowchart TD
    A[User Input] --> B[Layer 1: Profanity Filter]
    B --> C[Layer 2: Harmful Content Detection]
    C --> D[Layer 3: Prompt Injection Check]
    D --> E[LLM Processing]
    E --> F[Layer 4: Output Safety Check]
    F --> G[Layer 5: Bias Check]
    G --> H{Safe?}
    H -->|Yes| I[Return to User]
    H -->|No| J[Block & Log]

    style J fill:#F44336
    style I fill:#4CAF50

5.2 Guardrail Implementation

Layer 1: Input Sanitization

import re

def sanitize_input(user_input: str) -> str:
    """
    Remove/escape dangerous patterns.
    """
    # Strip common prompt-injection phrases (case-insensitive)
    injection_patterns = [
        r"ignore (all )?previous instructions",
        r"you are now",
    ]
    for pattern in injection_patterns:
        user_input = re.sub(pattern, "", user_input, flags=re.IGNORECASE)

    # Limit length (prevent overflow attacks)
    user_input = user_input[:2000]

    return user_input

Layer 2: Custom Guardrails (Database)

# guardrails collection in MongoDB
{
    "user_id": "User-123",
    "project_id": "Project-456",
    "prohibited_topics": [
        "politics",
        "religion",
        "medical advice"
    ],
    "prohibited_keywords": [
        "competitor_name_A",
        "competitor_name_B"
    ],
    "custom_safety_rules": [
        "Never discuss pricing without approval",
        "Always redirect legal questions to compliance team"
    ]
}
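
A minimal sketch of how a stored guardrail document might be enforced before a query reaches the LLM. Simple case-insensitive substring matching is used here as an assumption; a production check would likely use a topic classifier:

from typing import Dict, Optional

def apply_custom_guardrails(user_input: str, guardrails: Dict) -> Optional[str]:
    """Return a violation reason if the input breaks a configured rule, else None."""
    lowered = user_input.lower()

    # Block prohibited topics configured for this project
    for topic in guardrails.get("prohibited_topics", []):
        if topic.lower() in lowered:
            return f"prohibited_topic:{topic}"

    # Block prohibited keywords (e.g. competitor names)
    for keyword in guardrails.get("prohibited_keywords", []):
        if keyword.lower() in lowered:
            return f"prohibited_keyword:{keyword}"

    return None

If a violation reason is returned, the request is blocked and logged as shown in Section 5.3.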

Layer 3: LLM-Based Safety

def safety_classification(text: str) -> Dict:
    """
    Use LLM to classify safety of content.
    """
    prompt = f"""
    Classify the following text for safety:
    - Violence: Yes/No
    - Hate Speech: Yes/No
    - Sexual Content: Yes/No
    - Misinformation: Yes/No

    Text: {text}
    """

    result = llm.classify(prompt)
    return result
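
Putting the layers together, here is a sketch of how the flowchart in Section 5.1 might be orchestrated. It reuses the example functions defined above (sanitize_input, apply_custom_guardrails, safety_check, check_bias); the orchestration itself is illustrative, not the production pipeline:

BLOCKED_MESSAGE = "❌ I cannot provide information on that topic."

def guarded_chat(user_input: str, guardrails: dict) -> str:
    """Run input checks, call the LLM, then run output checks before replying."""
    # Layers 1-3: input-side checks
    cleaned = sanitize_input(user_input)

    if apply_custom_guardrails(cleaned, guardrails) is not None:
        return BLOCKED_MESSAGE

    if not safety_check(cleaned, ""):  # pre-check the input before spending LLM tokens
        return BLOCKED_MESSAGE

    # LLM processing (context retrieval omitted for brevity)
    response = llm.generate(cleaned)

    # Layers 4-5: output-side safety and bias checks
    if not safety_check(cleaned, response) or not check_bias(response):
        return BLOCKED_MESSAGE

    return response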

5.3 Blocked Content Response

User-Facing:

❌ I cannot provide information on that topic.

Our AI is designed to assist with [product/service] questions only.
For other inquiries, please contact our support team.

Internal Logging:

from datetime import datetime

logger.warning("Content blocked", extra={
    "user_id": user_id,
    "project_id": project_id,
    "violation_type": "prohibited_topic",
    "blocked_query": truncate(user_input, 100),
    "timestamp": datetime.utcnow()
})

6. Accountability Framework

6.1 Human Oversight

Escalation Triggers:

| Scenario               | Auto-Handle             | Human Review       |
|------------------------|-------------------------|--------------------|
| Normal Q&A             | ✅ AI responds          | ❌ No review       |
| Guardrail violation    | ✅ AI blocks + logs     | ✅ Reviewed daily  |
| User reports issue     | ❌ Escalate immediately | ✅ Within 24 hours |
| Sensitive topic        | ⚠️ AI disclaimer        | ✅ Periodic audit  |
| Legal/compliance query | ❌ Redirect to human    | ✅ Immediate       |
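
A sketch of the routing implied by this table; the category names and the enum are illustrative assumptions:

from enum import Enum

class ReviewAction(Enum):
    AI_RESPONDS = "ai_responds"
    BLOCK_AND_LOG = "block_and_log"
    ESCALATE_TO_HUMAN = "escalate_to_human"
    REDIRECT_TO_HUMAN = "redirect_to_human"

def route_interaction(category: str) -> ReviewAction:
    """Map an interaction category to the handling defined in the table above."""
    if category == "guardrail_violation":
        return ReviewAction.BLOCK_AND_LOG          # blocked + logged, reviewed daily
    if category == "user_reported_issue":
        return ReviewAction.ESCALATE_TO_HUMAN      # human review within 24 hours
    if category == "sensitive_topic":
        return ReviewAction.AI_RESPONDS            # AI answers with disclaimer; periodically audited
    if category == "legal_or_compliance":
        return ReviewAction.REDIRECT_TO_HUMAN      # immediate human handling
    return ReviewAction.AI_RESPONDS                # normal Q&A, no review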

6.2 Audit Trails

Every AI Interaction Logged:

# chatbot_history collection
{
    "user_id": "...",
    "project_id": "...",
    "session_id": "...",
    "timestamp": "2025-12-26T12:00:00Z",
    "chat_data": [
        {
            "input_prompt": "User's question",
            "enhanced_question": "Preprocessed query",
            "output_response": "AI's answer",
            "Similar Vectors": [...],  # Which chunks used
            "input_tokens": 150,
            "output_tokens": 300,
            "total_tokens": 450,
            "model_used": "gpt-35-turbo-16k-0613",
            "safety_checks_passed": true,
            "guardrails_triggered": []
        }
    ]
}

Retention: 90 days (configurable per customer)
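
One way to enforce the retention window is a MongoDB TTL index on the history collection; a sketch assuming pymongo, with placeholder connection details:

from pymongo import MongoClient

RETENTION_DAYS = 90  # configurable per customer

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client["machineavatars"]                      # placeholder database name

# Documents expire once "timestamp" is older than the retention window.
# Note: this requires "timestamp" to be stored as a BSON date, not a string.
db.chatbot_history.create_index(
    "timestamp",
    expireAfterSeconds=RETENTION_DAYS * 24 * 60 * 60,
)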


6.3 Incident Response

If AI Causes Harm:

  1. Immediate Actions (< 1 hour) — see the sketch after this list:
     • Disable affected chatbot
     • Notify affected users
     • Begin incident investigation

  2. Investigation (< 24 hours):
     • Review conversation logs
     • Identify root cause
     • Assess scope of impact

  3. Remediation (< 72 hours):
     • Fix underlying issue
     • Update guardrails
     • Retrain model if needed
     • Compensate affected users (if applicable)

  4. Prevention (< 1 week):
     • Update safety procedures
     • Add new test cases
     • Conduct team training
     • Document lessons learned
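
A sketch of the immediate-actions step: disable the affected chatbot, notify users, and open an investigation record. The chatbots and ai_incidents collections and the notification helper are assumptions for illustration:

from datetime import datetime, timezone

def trigger_ai_incident(chatbot_id: str, description: str):
    """Contain an AI incident within the first hour."""
    # 1. Disable the affected chatbot
    db.chatbots.update_one({"_id": chatbot_id}, {"$set": {"enabled": False}})

    # 2. Notify affected users (recipient lookup omitted; helper is hypothetical)
    notify_affected_users(chatbot_id, description)

    # 3. Open the investigation record and log for compliance
    db.ai_incidents.insert_one({
        "chatbot_id": chatbot_id,
        "description": description,
        "status": "investigating",
        "opened_at": datetime.now(timezone.utc),
    })
    audit_log.record("ai_incident_opened", chatbot_id)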

7. Monitoring & Evaluation

7.1 AI Ethics Metrics

| Metric                  | Target                 | Frequency | Owner            |
|-------------------------|------------------------|-----------|------------------|
| Safety violations       | < 0.1% of interactions | Daily     | ML Team          |
| Bias audit score        | > 90/100               | Quarterly | Ethics Committee |
| User trust score        | > 4.0/5                | Monthly   | Product Team     |
| Transparency compliance | 100%                   | Weekly    | Legal            |
| Consent rate            | > 95% explicit consent | Monthly   | Compliance       |
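
A sketch of how the first metric might be computed from the audit trail in Section 6.2, assuming "timestamp" is stored as a date and using the "safety_checks_passed" flag from that schema:

from datetime import datetime, timedelta, timezone

def safety_violation_rate(last_hours: int = 24) -> float:
    """Share of interactions in the window where a safety check failed."""
    since = datetime.now(timezone.utc) - timedelta(hours=last_hours)

    total = db.chatbot_history.count_documents({"timestamp": {"$gte": since}})
    violations = db.chatbot_history.count_documents({
        "timestamp": {"$gte": since},
        "chat_data.safety_checks_passed": False,  # any turn that failed a safety check
    })

    return violations / total if total else 0.0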

7.2 Continuous Monitoring

Real-Time Dashboards:

AI Ethics Dashboard:
├── Safety
│   ├── Blocked queries: 23 today
│   ├── False positives: 2 (reviewed)
│   └── Escalations: 1 pending
├── Fairness
│   ├── Response quality parity: 98%
│   └── Next bias audit: 45 days
├── Transparency
│   ├── Disclosure shown: 100%
│   └── Citations included: 87%
└── User Control
    ├── Consent rate: 97%
    └── Data deletion requests: 5 today

8. Compliance & Governance

8.1 Regulatory Alignment

India (DPDPA 2023):

  • ✅ Explicit consent for data processing (Section 6)
  • ✅ Right to access data (Section 13)
  • ✅ Right to erasure (Section 15)
  • ✅ Data breach notification (Section 8)

EU (GDPR):

  • ✅ Lawful basis for processing (Article 6)
  • ✅ Data minimization (Article 5)
  • ✅ Right to explanation (Article 22)
  • ✅ Data portability (Article 20)

EU AI Act (in force since August 2024; obligations phase in through 2026-2027):

  • ✅ High-risk AI classification assessment
  • ✅ Transparency obligations for general-purpose AI
  • ✅ Risk management system

8.2 Ethics Committee

Composition:

  • CTO (Chair)
  • Legal Counsel
  • ML Engineering Lead
  • UX/Product Manager
  • External Ethics Advisor (planned Q2 2025)

Responsibilities:

  • Quarterly ethics reviews
  • AI incident investigations
  • Policy updates
  • Bias audit oversight

Meeting Schedule: Monthly


9. Future Commitments

Q1 2025:

  • Implement automated bias testing pipeline
  • Add real-time fairness monitoring
  • Expand guardrail coverage (100% of sensitive topics)

Q2 2025:

  • Hire external ethics advisor
  • Conduct independent AI audit
  • Publish annual AI transparency report

Q3 2025:

  • Open-source bias testing framework
  • AI safety certification (ISO/IEC 42001)
  • Multi-language ethics documentation


Last Updated: 2025-12-26
Version: 1.0
Review Cycle: Quarterly
Next Review: 2026-03-31


"With great AI power comes great responsibility."