Hausa Chatbot Documentation

1. Project Overview

The Hausa AI Chatbot is a conversational assistant for the Hausa language. It uses OpenAI GPT models, fine-tuned or prompted for Hausa, together with Google Cloud's Speech-to-Text and Text-to-Speech APIs so that users can chat by text or by voice.

Key Objectives:

Fine-tune GPT models for Hausa language understanding
Enable voice-based interaction in Hausa
Provide natural, culturally-aware responses
Support both text and voice input/output

2. Features

Text Features

Real-time text chat in Hausa
Context-aware conversations
English-Hausa bilingual support
Chat history management

Voice Features

Speech-to-Text (Hausa)
Text-to-Speech (Hausa)
Audio visualization
Natural voice synthesis

3. System Architecture

┌─────────────────┐
│   Frontend UI   │ (hausa_chatbot.html)
│   - HTML/JS     │
│   - Tailwind    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Backend API    │ (Flask Application)
│  - app.py       │
│  - Routes       │
└────────┬────────┘
         │
    ┌────┴────┐
    ▼         ▼
┌────────┐ ┌──────────┐
│  GPT   │ │  Google  │
│  API   │ │  Cloud   │
└────────┘ └──────────┘

Components:
• Frontend: Single-page application with voice recording capabilities
• Backend: Flask API handling GPT and Google Cloud integrations
• GPT Model: Fine-tuned or prompted for Hausa language
• Google Cloud: Speech-to-Text and Text-to-Speech services
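A voice exchange passes through these components in a fixed order: speech is transcribed, the transcript goes to the GPT model, and the reply is synthesized back to audio. The sketch below shows that flow with each service passed in as a plain function, so it can be wired to the real Google Cloud and OpenAI clients or to fakes in tests. The names handle_voice_turn and VoiceTurn are hypothetical and are not taken from app.py.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """Result of one spoken exchange (hypothetical structure)."""
    transcript: str
    reply_text: str
    reply_audio: bytes


def handle_voice_turn(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    chat: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> VoiceTurn:
    """Run audio through STT -> GPT -> TTS, mirroring the architecture diagram."""
    transcript = speech_to_text(audio)        # e.g. Google Cloud Speech-to-Text
    reply_text = chat(transcript)             # e.g. a GPT chat completion
    reply_audio = text_to_speech(reply_text)  # e.g. Google Cloud Text-to-Speech
    return VoiceTurn(transcript, reply_text, reply_audio)
```

Injecting the three services keeps the orchestration independent of any SDK, which makes the flow easy to test without network access or credentials.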

4. Installation & Setup

4.1 Prerequisites

Python 3.8 or higher
Node.js (for package management)
OpenAI API account and key
Google Cloud Platform account
Git

4.2 Backend Setup

# Clone the repository
git clone https://github.com/adab-tech/adab-tech.github.io.git
cd adab-tech.github.io/backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys

4.3 Google Cloud Setup

Steps:

Go to the Google Cloud Console
Create a new project or select an existing one
Enable the Speech-to-Text API
Enable the Text-to-Speech API
Create a service account and download its JSON key
Set GOOGLE_APPLICATION_CREDENTIALS in .env to the path of that key file

4.4 OpenAI API Setup

# Get your API key from https://platform.openai.com
# Add to .env file:
OPENAI_API_KEY=sk-your-api-key-here
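Both services are configured through environment variables, and a missing or mistyped value is a common cause of the authentication errors listed in section 10. A minimal startup check might look like the sketch below. The function name check_environment is hypothetical; if the backend reads .env through python-dotenv, call load_dotenv() before running the check (an assumption about how app.py loads its settings).

```python
import os
from typing import List, Mapping


def check_environment(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of configuration problems (empty if everything looks set)."""
    problems = []

    # The OpenAI client reads its key from OPENAI_API_KEY.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    # Google client libraries read the service-account key path from this variable.
    creds = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not creds:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not os.path.isfile(creds):
        problems.append(f"GOOGLE_APPLICATION_CREDENTIALS file not found: {creds}")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("Configuration problem:", problem)
```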

5. Data Preparation

5.1 Hausa Dataset Sources

HausaNLP: Open-source Hausa language datasets
JW300: Parallel corpus with Hausa translations
OPUS: Open parallel corpus collections
Custom data: Text collected from Hausa news sites, literature, and conversations

5.2 Data Format

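Prepare data in OpenAI's chat fine-tuning format: each example is a JSON object with a "messages" list of role/content pairs. The example is expanded here for readability, but in the JSONL training file each example must occupy a single line. (In the sample, the user says "Hello, good morning?" and the assistant replies "Very well, thank you. How are you?")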

{
  "messages": [
    {"role": "system", "content": "You are a helpful Hausa assistant."},
    {"role": "user", "content": "Sannu, ina kwana?"},
    {"role": "assistant", "content": "Lafiya lau. Na gode. Yaya kake?"}
  ]
}
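Before writing the training file, it helps to check each example against this shape. The sketch below is a minimal check assuming only the three roles shown above; OpenAI runs its own, stricter validation when the file is uploaded. validate_example is a hypothetical helper, not the project's HausaDataPreprocessor.validate_data.

```python
from typing import Any, Dict, List

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_example(example: Dict[str, Any]) -> List[str]:
    """Return a list of problems with one chat-format training example."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]

    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i}: must be an object")
            continue
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {role!r}")
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i}: content must be a non-empty string")

    # Assistant turns are what the model learns to produce, so require one.
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("example has no assistant message")
    return problems
```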

5.3 Using the Preprocessor

Run it from the command line:

python data_preprocessing.py

Or use it from Python:

from data_preprocessing import HausaDataPreprocessor

preprocessor = HausaDataPreprocessor()
data = preprocessor.load_from_csv('hausa_data.csv')
preprocessor.validate_data(data)
preprocessor.save_for_finetuning(data, 'training.jsonl')
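To show what the CSV-to-JSONL conversion involves, the sketch below turns a two-column CSV of prompts and replies into chat-format JSONL. The column names user and assistant and the function csv_to_jsonl are assumptions for illustration; the real hausa_data.csv layout may differ. ensure_ascii=False keeps Hausa hooked letters such as ɓ, ɗ and ƙ readable in the output.

```python
import csv
import json
from typing import IO

SYSTEM_PROMPT = "You are a helpful Hausa assistant."


def csv_to_jsonl(src: IO[str], dst: IO[str], system_prompt: str = SYSTEM_PROMPT) -> int:
    """Convert CSV rows with 'user' and 'assistant' columns into chat JSONL.

    Returns the number of examples written; rows with an empty cell are skipped.
    """
    written = 0
    for row in csv.DictReader(src):
        user = (row.get("user") or "").strip()
        reply = (row.get("assistant") or "").strip()
        if not user or not reply:
            continue
        example = {
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user},
                {"role": "assistant", "content": reply},
            ]
        }
        # One JSON object per line; keep non-ASCII Hausa characters as-is.
        dst.write(json.dumps(example, ensure_ascii=False) + "\n")
        written += 1
    return written
```

When calling it on real files, open both with encoding="utf-8" (and the CSV with newline="") and pass the file objects in.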

6. GPT Fine-Tuning

6.1 Prepare Training Data

Ensure your data is in JSONL format (one JSON object per line) with at least 10 examples; 100 or more is recommended.
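Both conditions can be checked before uploading. The sketch below parses every line as JSON and counts the examples, using the 10-example minimum and 100-example recommendation stated above; check_training_file is a hypothetical helper name.

```python
import json

MIN_EXAMPLES = 10
RECOMMENDED_EXAMPLES = 100


def check_training_file(path: str, minimum: int = MIN_EXAMPLES) -> int:
    """Return the number of examples in a JSONL file, or raise ValueError."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            try:
                json.loads(line)  # a blank line also fails here, as it should
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_no} is not valid JSON: {exc}") from exc
            count += 1
    if count < minimum:
        raise ValueError(f"only {count} examples; at least {minimum} are required")
    if count < RECOMMENDED_EXAMPLES:
        print(f"warning: {count} examples; {RECOMMENDED_EXAMPLES}+ is recommended")
    return count
```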

6.2 Upload Training Data

from openai import OpenAI

# Uses the openai>=1.0 client; the older openai.File.create and
# openai.FineTuningJob.create calls were removed in version 1.0.
# OpenAI() reads OPENAI_API_KEY from the environment.
client = OpenAI()

# Upload training file
with open("hausa_training.jsonl", "rb") as f:
    uploaded = client.files.create(
        file=f,
        purpose="fine-tune"
    )

file_id = uploaded.id
print(f"File uploaded: {file_id}")

6.3 Create Fine-Tuning Job

# Create fine-tuning job (reuses client and file_id from 6.2)
job = client.fine_tuning.jobs.create(
    training_file=file_id,
    model="gpt-3.5-turbo"
)

job_id = job.id
print(f"Fine-tuning job created: {job_id}")

# Check status
status = client.fine_tuning.jobs.retrieve(job_id)
print(f"Status: {status.status}")

Note:

Fine-tuning can take several hours to complete, so check the job status periodically. When the job succeeds, its fine_tuned_model field holds the new model ID; set that ID as the model in your backend configuration.
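Rather than checking by hand, a small polling loop can wait for the job to finish. This sketch takes the retrieve function as a parameter, so it works with client.fine_tuning.jobs.retrieve from the openai>=1.0 library and can be tested without network access. wait_for_job, the 60-second interval, and the set of terminal statuses are assumptions to check against the current OpenAI documentation.

```python
import time
from typing import Any, Callable

# Statuses after which a fine-tuning job no longer changes (assumed set).
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(
    retrieve: Callable[[str], Any],
    job_id: str,
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll a fine-tuning job until it reaches a terminal status, then return it."""
    while True:
        job = retrieve(job_id)
        print(f"Status: {job.status}")
        if job.status in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
```

For example, job = wait_for_job(client.fine_tuning.jobs.retrieve, job_id); if job.status is "succeeded", job.fine_tuned_model is the model ID for the backend.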

7. Speech API Integration

7.1 Speech-to-Text Configuration

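from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="ha-NG",  # Hausa (Nigeria)
    alternative_language_codes=["en-US"],
    enable_automatic_punctuation=True
)

# Transcribe a short 16 kHz, 16-bit mono WAV clip
# ("recording.wav" is a placeholder file name)
with open("recording.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)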
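The configuration above tells Speech-to-Text to expect 16-bit linear PCM at 16,000 Hz, and audio that does not match tends to fail or transcribe poorly. Assuming the frontend uploads WAV files (browser recorders often produce WebM/Opus instead, which would need a different encoding setting or a conversion step), a standard-library check could look like this. check_linear16_wav is a hypothetical helper, and the mono requirement assumes the config leaves audio_channel_count unset.

```python
import io
import wave
from typing import List

EXPECTED_RATE = 16000  # matches sample_rate_hertz in the config above
EXPECTED_WIDTH = 2     # LINEAR16 means 16-bit (2-byte) samples
EXPECTED_CHANNELS = 1  # mono


def check_linear16_wav(data: bytes) -> List[str]:
    """Return mismatches between a WAV header and the LINEAR16 config (empty if OK)."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            channels = wav.getnchannels()
            width = wav.getsampwidth()
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE}")
    if width != EXPECTED_WIDTH:
        problems.append(f"sample width is {width * 8} bits, expected 16")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    return problems
```

If the returned list is empty, the bytes can be passed to client.recognize(config=config, audio=speech.RecognitionAudio(content=data)).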

7.2 Text-to-Speech Configuration

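from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

voice = texttospeech.VoiceSelectionParams(
    language_code="ha-NG",
    name="ha-NG-Standard-A"
)

audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=0.9,  # slightly slower than the default 1.0
    pitch=0.0
)

# Synthesize a reply and save it as an MP3 file
synthesis_input = texttospeech.SynthesisInput(text="Sannu, yaya kake?")
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)
with open("reply.mp3", "wb") as out:
    out.write(response.audio_content)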

Hausa voices referenced by this project (voice availability can change, so check the current Text-to-Speech voice list):

ha-NG-Standard-A (Female)
ha-NG-Wavenet-A (Female, higher quality)
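Instead of hard-coding one voice name, the backend could choose from whatever client.list_voices(language_code="ha-NG").voices returns at startup. The sketch assumes each voice object exposes name and language_codes, as the google-cloud-texttospeech Voice type does; pick_voice and the Wavenet-before-Standard preference are illustrative choices, not project code.

```python
from typing import Iterable, Optional, Sequence

# Preferred voice families, best first (an illustrative preference order).
PREFERRED_TIERS: Sequence[str] = ("Wavenet", "Standard")


def pick_voice(voices: Iterable, language_code: str = "ha-NG") -> Optional[str]:
    """Return the best matching voice name for language_code, or None if none exist."""
    names = sorted(v.name for v in voices if language_code in v.language_codes)
    for tier in PREFERRED_TIERS:
        for name in names:
            if f"-{tier}-" in name:
                return name
    # Fall back to any voice for the language (e.g. a newer voice family).
    return names[0] if names else None
```

A None result means no voice for that language is available to the project, in which case the backend could fall back to text-only replies.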

8. Deployment

8.1 Local Deployment

# Start the backend server
cd backend
python app.py

# Server will run on http://localhost:5000
# Open hausa_chatbot.html in browser

8.2 Cloud Deployment (AWS)

# Using AWS Elastic Beanstalk
eb init -p python-3.8 hausa-chatbot
eb create hausa-chatbot-env
eb deploy

# Or using Docker
docker build -t hausa-chatbot .
docker run -p 5000:5000 hausa-chatbot
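Inside a container, a Flask development server listening on its default address (127.0.0.1) is not reachable through -p 5000:5000; it has to listen on 0.0.0.0. One way to keep local runs unchanged while allowing that is to read the bind address from environment variables. The HOST and PORT variable names and the server_bind helper are assumptions, since how app.py starts its server is not shown here.

```python
import os
from typing import Mapping, Tuple


def server_bind(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Return (host, port) for app.run(), defaulting to local-only on port 5000."""
    host = env.get("HOST", "127.0.0.1")
    port = int(env.get("PORT", "5000"))
    return host, port


# In app.py (hypothetical wiring):
#     host, port = server_bind()
#     app.run(host=host, port=port)
```

The container would then be started with docker run -e HOST=0.0.0.0 -p 5000:5000 hausa-chatbot.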

8.3 Frontend Deployment (GitHub Pages)

The frontend is already configured for GitHub Pages. Simply push to the main branch.

git add hausa_chatbot.html
git commit -m "Add Hausa chatbot"
git push origin main

# Access at: https://adab-tech.github.io/hausa_chatbot.html

9. Testing

9.1 Backend API Testing

# Test health endpoint
curl http://localhost:5000/api/health

# Test chat endpoint
curl -X POST http://localhost:5000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Sannu", "history": []}'
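The chat request carries the new message plus a history array, which is what lets the backend keep conversations context-aware. The history format app.py expects is not documented here; the sketch below assumes it is a list of {"role": ..., "content": ...} objects and shows one way to turn a request body into the messages list for a GPT chat call, keeping only the most recent turns to bound prompt size. build_messages, MAX_HISTORY and the system prompt text are hypothetical.

```python
from typing import Any, Dict, List

SYSTEM_PROMPT = "You are a helpful Hausa assistant."
MAX_HISTORY = 10  # illustrative cap on prior messages sent to the model


def build_messages(body: Dict[str, Any], max_history: int = MAX_HISTORY) -> List[Dict[str, str]]:
    """Turn a /api/chat request body into a chat-completion messages list."""
    message = str(body.get("message") or "").strip()
    if not message:
        raise ValueError("'message' is required")

    history: List[Dict[str, str]] = []
    for item in body.get("history") or []:
        if not isinstance(item, dict):
            continue
        role, content = item.get("role"), item.get("content")
        # Forward only well-formed user/assistant turns from the client.
        if role in ("user", "assistant") and isinstance(content, str) and content.strip():
            history.append({"role": role, "content": content})

    recent = history[-max_history:] if max_history > 0 else []
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + recent
        + [{"role": "user", "content": message}]
    )
```

The result can be passed as messages= to client.chat.completions.create in openai>=1.0. Dropping client-supplied system messages keeps the system prompt under the server's control.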

9.2 Testing Checklist

✓ Text input/output functionality
✓ Voice recording and transcription
✓ Text-to-speech playback
✓ Hausa language accuracy
✓ Response time and latency
✓ Error handling
✓ Cross-browser compatibility

10. Troubleshooting

Issue: API Key Not Working

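Solution: Verify that OPENAI_API_KEY is set correctly in .env, that the backend loads .env before creating the OpenAI client, and that the key is active and has the required permissions.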

Issue: Google Cloud Authentication Failed

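Solution: Ensure GOOGLE_APPLICATION_CREDENTIALS contains the path to a valid service-account JSON key file and that the Speech-to-Text and Text-to-Speech APIs are enabled for the same project.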

Issue: Microphone Access Denied

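Solution: Enable microphone permissions in the browser settings. Browsers only allow microphone access on secure origins (HTTPS or localhost), so serve production deployments over HTTPS.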

Issue: CORS Errors

Solution: Ensure flask-cors is installed and enabled on the Flask app (for example with CORS(app)), and that the frontend's origin is allowed.
