# Xena SignalX Chat-Based AI Service - Design Guide

## Overview
Xena is a chat-based query interface for the RM and Dart platforms, supporting risk-based, view-based, and data-based queries through LLM reasoning, vector-database search, and query classification.
- Risk-Based Query Answering: LLM-driven responses for the due-diligence domain, customizable across risk domains.
- View-Based Query Answering: Uses precomputed embeddings for structured responses.
- Data-Based Query Answering: Dynamic embeddings with vector search for RM & Dart platforms.
## Core Functional Requirements
| Platform | Query Type | Storage |
|---|---|---|
| RM | View-Based | Milvus Lite (Temporary) |
| RM | Risk-Based | LLM Only |
| Dart | View-Based | Milvus Lite (Temporary) |
| Dart | Data-Based | Milvus Standalone (Persistent) |
## Modules
- Query Classification: Routes queries into view, data, or risk categories for both platforms.
- LLM-Based Query Answering: Single risk-focused LLM, using domain-specific prompts.
- View-Based Query Handling: Fetches from Milvus Lite, calls LLM for structured response.
- Data-Based Query (Dart): Uses Milvus Standalone for persistent vector search accuracy.
- Efficient Searching: Metadata filtering for fast, relevant vector retrieval (Dart data queries).
## Architecture Overview
- Modular microservices, shared across RM & Dart.
- Dockerized and orchestrated via Kubernetes.
- Milvus Lite for temporary embeddings; Milvus Standalone for persistent embeddings.
- WebSockets supported for real-time query responses.
## Processing Flow

- User query enters the system
- Classification: categorize as view, data, or risk-based
- Processing:
  - View: fetch from Milvus Lite → call LLM → return response
  - Data: vector search in Milvus Standalone → call LLM → return response
  - Risk: direct LLM call → domain-specific prompt → return response
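The flow above can be sketched as a classify-then-dispatch function. This is a minimal sketch assuming a keyword-based fallback classifier (the design also allows an LLM for this step); all function names, keyword sets, and stubbed pipeline strings are illustrative:

```python
# Illustrative keyword-based classifier; the real service may route
# ambiguous queries through an LLM instead.
VIEW_KEYWORDS = {"view", "dashboard", "summary"}
RISK_KEYWORDS = {"risk", "due diligence", "compliance"}

def classify(query: str) -> str:
    """Categorize a query as 'view', 'risk', or 'data' (the default)."""
    text = query.lower()
    if any(k in text for k in RISK_KEYWORDS):
        return "risk"
    if any(k in text for k in VIEW_KEYWORDS):
        return "view"
    return "data"

def process(query: str) -> str:
    """Dispatch a classified query to the matching pipeline (stubbed here)."""
    category = classify(query)
    if category == "view":
        return f"[view] fetch from Milvus Lite, then LLM: {query}"
    if category == "data":
        return f"[data] vector search in Milvus Standalone, then LLM: {query}"
    return f"[risk] direct LLM call with domain prompt: {query}"
```

In the deployed system each branch would be its own microservice behind the API gateway; the dispatch shape stays the same.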
## Deployment Strategy
- Microservices: Each query type (classification, view, data, risk) independently deployable
- API Gateway: Routes via REST or WebSockets
## Low-Level Design (LLD)

### Modules and Functions
- Query Classification
  - Keyword/LLM-based logic
  - Determines query type (view/data/risk)
- View-Based Query Processing
  - Fetches embeddings from Milvus Lite
  - LLM generates responses from context and embeddings
- Data-Based Query Processing (Dart only)
  - Vector search via Milvus Standalone
  - LLM response generated from search results and the query
- Risk-Based Query Processing
  - Direct LLM response with prompts engineered for the risk domain
- Efficient Searching (Dart data-based)
  - Metadata filtering for fast, relevant vector retrieval
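Metadata filtering in Milvus is expressed as a boolean filter string passed alongside the vector search. The sketch below builds such an expression from simple key/value constraints; `build_expr` and the field names are illustrative helpers, not part of the codebase:

```python
# Illustrative builder for a Milvus boolean filter expression, e.g.
# 'platform == "dart" and doc_type in ["report", "filing"]'.
def build_expr(filters: dict) -> str:
    """Turn {field: value-or-list} into a Milvus filter expression string."""
    clauses = []
    for field, value in filters.items():
        if isinstance(value, (list, tuple)):
            items = ", ".join(f'"{v}"' for v in value)
            clauses.append(f"{field} in [{items}]")
        else:
            clauses.append(f'{field} == "{value}"')
    return " and ".join(clauses)
```

The resulting string would be supplied as the `expr` argument to a pymilvus `Collection.search(...)` call, so only vectors whose metadata matches are scored, which keeps Dart data queries fast and relevant.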
## High-Level Component Overview (v0.1.0)
- FastAPI backend: Handles requests, endpoints
- Milvus vector database: Embedding storage/retrieval
- SentenceTransformer model: Generates embeddings
- Authentication: API security
- Services: Business logic implementation
## Data Flow

- Document ingestion
- Embedding generation
- Storage in Milvus
- Query converted to an embedding
- Semantic search for similar embeddings
- Response returned via API
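The semantic-search step above reduces to a nearest-neighbour lookup over stored embeddings. A self-contained sketch of that core step, using plain cosine similarity in place of Milvus and toy vectors in place of MiniLM-L6-v2 output (`cosine` and `top_k` are illustrative names):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], stored: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        stored,
        key=lambda doc_id: cosine(query_vec, stored[doc_id]),
        reverse=True,
    )
    return ranked[:k]
```

In production this lookup is a Milvus `search` call with an inner-product or cosine metric over indexed embeddings; the sketch only shows the ranking logic.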
## API Modules

- API Server (`server.py`): FastAPI endpoints/routes, middleware
- Embeddings (`embeddings.py`): MiniLM-L6-v2 for embedding generation
- Milvus Client (`milvusclient.py`): Connects to the DB, performs vector search
- Document Processing (`loadview.py`): Parses and stores documents
- Authentication (`authutils.py`): Authentication middleware
- Database (`dbservice.py`): Data persistence
- Business Logic: Query routing, suggestions, response handling
## Deployment

### Prerequisites

```shell
# Start the Milvus database
docker-compose up -d

# Install dependencies
poetry install

# Run the FastAPI server
uvicorn app.server:app --host 0.0.0.0 --port 8000
```