Architecture

How VoiceA is technically built

VoiceA consists of five services that run as containers on your infrastructure: an ASR service (Whisper large-v3-turbo), a TTS service (Piper), an LLM service (Ollama plus fine-tuned models), a vector-retrieval service (Qdrant), and an operator API. A Bayesian fusion layer reconciles the signals from all services into a calibrated comprehension score. Every event is persisted in a SHA-256-chained audit log.
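The fusion step can be pictured as a naive-Bayes update in log-odds space. The sketch below is an illustration only, not VoiceA's actual implementation: the function name `fuse_signals`, the idea that each service reports a likelihood ratio, and the independence assumption between signals are all hypothetical.

```python
import math


def fuse_signals(prior: float, likelihood_ratios: list[float]) -> float:
    """Fuse independent evidence into a posterior probability of comprehension.

    prior: assumed probability (0..1, exclusive) that the caller was understood.
    likelihood_ratios: one ratio per signal, P(signal | understood) /
        P(signal | not understood). Values > 1 raise the score, values < 1
        lower it. Treating the signals as independent is an assumption.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    # Work in log-odds so that each signal contributes an additive term.
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        if ratio <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += math.log(ratio)
    # Map the fused log-odds back to a probability with the logistic function.
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With a neutral prior of 0.5, a single signal that is three times more likely under "understood" yields a score of 0.75; two signals with ratios 2.0 and 0.5 cancel out and leave the score at 0.5. Whether such a score is calibrated in practice depends on how the ratios are estimated, which this sketch does not cover.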

System overview

The components communicate over internal gRPC. Externally, VoiceA exposes only two endpoints: SIP trunk / WebRTC for the audio layer and a secured HTTPS interface for the operator UI. No service requires internet access; updates ship as signed OCI images through an authority-owned registry.

ASR & TTS

The ASR service uses Whisper large-v3-turbo, fine-tuned on German administrative vocabulary, on Saxon, Austrian, and Alemannic dialect variants, and on languages widely spoken by migrants (Turkish, Arabic, Russian, French). In internal benchmarks, the word error rate on typical citizen-office telephony is below 6 per cent. The TTS service uses Piper and delivers responses with less than 150 ms latency on commodity CPU hardware.
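For readers who want to reproduce a word error rate on their own recordings, the following is a minimal sketch of the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` is illustrative, and the whitespace tokenisation is an assumption; the benchmark's actual text normalisation is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A four-word reference transcribed with one wrong word and one missing word gives 2 / 4 = 0.5. A rate below 6 per cent therefore corresponds to fewer than six such errors per hundred reference words.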

LLM & RAG

The LLM service runs on Ollama with a fine-tuned German-language 8-billion-parameter model. The retriever (Qdrant) indexes your authority's knowledge base (forms, deadline calculators, eligibility rules, internal work instructions) and semantically enriches each query. The LLM generates only answers that are grounded in at least one indexed document; otherwise the enquiry is handed off to a human specialist.
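The grounding rule can be sketched as a small routing function. This is a simplified illustration, not the product's code: the `Hit` type, the `route` function, and the similarity threshold of 0.75 are hypothetical. The text above only states that at least one indexed document must support the answer; how "supporting" is measured is an assumption here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One retrieval result; both field names are illustrative."""
    doc_id: str
    score: float  # similarity score as returned by the retriever


def route(hits: list[Hit], min_score: float = 0.75) -> tuple[str, list[str]]:
    """Answer only if at least one retrieved document clears the threshold.

    Returns ("answer", supporting_doc_ids) or ("handoff", []).
    The threshold value is an assumption, not a documented setting.
    """
    grounded = [hit.doc_id for hit in hits if hit.score >= min_score]
    if grounded:
        return ("answer", grounded)
    # No sufficiently similar document: pass the enquiry to a human specialist.
    return ("handoff", [])
```

Returning the supporting document IDs alongside the decision lets the caller record which documents an answer was based on.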

Integrations

VoiceA integrates through the following interfaces: SIP trunk for classical telephony (Asterisk, FreeSWITCH), WebRTC for browser- and app-based calls, and REST for domain-system connections (ELAK, EVA, ePostfach). The operator UI is a progressive web app that also runs offline inside the authority network. Supported standards: OAuth 2.1 / OIDC for sign-in, SAML 2.0 for federated identity, and audit-safe logging per BSI TR-03107.

Security posture

Every call is documented in a SHA-256-chained event log (ASR transcript, intent classification, handoff decision, operator action). The chain is append-only; checksums are mirrored hourly to a read-only volume. All session traffic is encrypted in transit (TLS 1.3 externally, mTLS within the cluster). Role model: operators, supervisors, data protection officers, and system administrators, each with minimal privileges.
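A hash-chained log of this kind can be sketched in a few lines. The example is a minimal illustration under stated assumptions: the entry layout (`prev`, `payload`, `hash`), the all-zero genesis value, and the JSON canonicalisation are hypothetical choices and do not describe VoiceA's actual log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    # Serialise deterministically so the same payload always hashes the same.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list[dict], payload: dict) -> dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev_hash, "payload": payload,
             "hash": _digest(prev_hash, payload)}
    chain.append(entry)
    return entry


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash depends on its predecessor, changing one stored event invalidates that entry and every later one. Mirroring the latest checksum to a read-only volume, as described above, is what prevents an attacker from simply recomputing the whole chain after tampering.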