Private, in-tenant AI: your data never leaves your tenant
Transcription, sentiment, automated QA scoring, and call rating and ranking run on your own models inside your environment. No PHI or PII egress, no data sniffing, and nothing is ever used to train a model vendor's system.
AI that never sees the outside world.
Transcription, sentiment, and automated QA run on models deployed inside your own environment. PHI and PII never leave your tenant and are never sent to a proprietary model provider — no data sniffing, no training on your conversations, no exposure.
Your data never leaves your boundary. Every AI step — speech-to-text, sentiment, QA scoring, redaction — executes on your own infrastructure. No recording, transcript, or prompt is ever transmitted to an external LLM for inference, logging, or training.
Private transcription
Speech-to-text with speaker diarization, running on open ASR models (Whisper-class) or your own — entirely inside your environment.
Sentiment & emotion
Per-turn and per-call sentiment scoring to power QA, escalation, and trend analysis — the audio never leaves your tenant.
Automated QA scoring
Score every interaction against your scorecards and rubrics automatically — consistent evaluation across 100% of calls, not a 2% sample.
Rating, ranking & prioritization
Auto-rank the archive to surface best and worst calls, coaching moments, and compliance risk — so reviewers see what matters first.
Automatic PII / PHI redaction
Detect and mask sensitive spans — card numbers, member IDs, health data — before storage or playback. Privacy by default.
Topic & intent detection
Classify interactions, detect intents and categories, and auto-tag the archive for faster, smarter search.
Summaries & coaching insights
Generate call summaries and agent-coaching feedback to shorten QA reviews and accelerate improvement.
Bring your own model
Run open-weight LLMs (Llama, Mistral, and more), your fine-tuned models, or your existing inference stack. No lock-in to a model vendor.
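As an illustration of the redaction step described above (detecting and masking sensitive spans such as card numbers before storage), here is a minimal sketch in Python. It is not the product's implementation: the pattern, the `[REDACTED_CARD]` mask, and the function names are assumptions made for this example, and it covers card numbers only. Member IDs and health data would need their own detectors.

```python
import re

# Assumed pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_card_numbers(text: str, mask: str = "[REDACTED_CARD]") -> str:
    """Mask card-like digit runs that pass the Luhn check; leave others as-is."""

    def _replace(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return mask if luhn_valid(digits) else match.group()

    return CARD_CANDIDATE.sub(_replace, text)
```

Calling `redact_card_numbers(transcript_text)` on a transcript before it is written to storage would mask card-like spans. The Luhn check is there to reduce false positives on other long digit runs, such as order or ticket numbers.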
With a third-party model API:
- Your PHI/PII is sent to a third-party model provider
- Conversations may be retained or used to improve their models
- Per-token data egress on every transcript and prompt
- Compliance scope expands to every vendor in the path
With in-tenant models:
- Models run inside your VPC, on-prem, or fully air-gapped
- Recordings, transcripts, and prompts never leave your tenant
- Never used to train anyone's model, ours or a provider's
- One compliance boundary: your environment
Every step runs inside your boundary
Audio, transcripts, and prompts move only between services you operate — from object storage to the search index, without a single outbound call.
Air-gapped path with zero outbound calls. The entire pipeline can run with no internet route at all. Every hop between services stays inside your network; nothing dials home for licensing, telemetry, or inference.
GPU and CPU paths both supported. Run on GPUs for maximum throughput, or on CPU-only nodes where accelerators aren't available — the same models, the same boundary, sized to your hardware.
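One way to sanity-check the "zero outbound calls" claim in your own deployment is to audit the configured service endpoints. The sketch below is an assumption-laden example, not part of the product: the endpoint names and URLs are hypothetical, and it only inspects configuration for literal IP addresses. It does not resolve hostnames or observe real traffic, so network-level egress rules remain the actual control.

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_endpoint(url: str) -> bool:
    """Return True if the URL's host is a literal private or loopback IP."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames are not resolved here, so they count as unverified.
        return False
    return addr.is_private or addr.is_loopback


def find_external_endpoints(endpoints: dict[str, str]) -> list[str]:
    """List the names of endpoints that are not provably internal."""
    return [name for name, url in endpoints.items()
            if not is_internal_endpoint(url)]


# Hypothetical pipeline configuration, for illustration only.
pipeline = {
    "object_storage": "http://10.0.4.12:9000/recordings",
    "asr": "http://10.0.5.20:8080/transcribe",
    "search_index": "http://10.0.6.7:9200",
}
print(find_external_endpoints(pipeline))  # an empty list means nothing flagged
```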
Pick the models that fit your hardware
Mix and match ASR and LLM families across CPU, single-GPU, or multi-GPU nodes — or drop in your own fine-tuned weights. Throughput figures are indicative and vary with audio length, hardware, and batching.
| Model family | Task | Hardware | Throughput (indicative) | Languages | Accuracy tier |
|---|---|---|---|---|---|
| Whisper large-v3 | ASR | CPU / single-GPU | ~120 calls/hr | Multilingual (99+) | High |
| NVIDIA Parakeet | ASR | Single-GPU | ~600 calls/hr | English-focused | Highest (EN) |
| Llama 3.x 8B | LLM | CPU / single-GPU | ~400 calls/hr | Multilingual | Strong |
| Llama 3.x 70B | LLM | Multi-GPU | ~90 calls/hr | Multilingual | Highest |
| Mistral | LLM | CPU / single-GPU | ~450 calls/hr | Multilingual (EU) | Strong |
| Qwen | LLM | Single-GPU | ~350 calls/hr | Multilingual (CJK) | Strong |
| Your fine-tuned model | LLM | CPU / single-GPU / multi-GPU | Depends on size | Your domain & languages | Tuned to you |
Throughput is indicative only — measured in calls per hour and dependent on average call duration, node specification, and concurrency. Bring your own model to set your own profile.
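To turn an indicative calls-per-hour figure into a rough backfill estimate, a small calculation like the following can help. It is a sketch under simple assumptions: throughput scales linearly with node count, and the archive size and node count in the example are made up for illustration. Real timings depend on the factors listed above.

```python
def backfill_hours(total_calls: int, calls_per_hour: float, nodes: int = 1) -> float:
    """Estimate wall-clock hours to process an archive.

    Assumes each node sustains `calls_per_hour` and that throughput
    scales linearly with the number of nodes.
    """
    if calls_per_hour <= 0 or nodes < 1:
        raise ValueError("calls_per_hour must be positive and nodes at least 1")
    return total_calls / (calls_per_hour * nodes)


# Hypothetical archive: 1,000,000 calls at ~600 calls/hr per node on 4 nodes.
hours = backfill_hours(1_000_000, 600, nodes=4)
print(f"{hours:.0f} hours, about {hours / 24:.1f} days")
```

With those example inputs the estimate comes to roughly 417 hours, or a little over 17 days of continuous processing.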
No token egress · no training on your data.
Audio, transcripts, and prompts never cross the tenant boundary, and your conversations are never used to train any vendor model — ours or a third party's. This is backed by a short, signed technical commitment and data-handling appendix you can attach to your contract and hand to your auditors.
Backfill the past, keep up with the present
The same in-tenant models run in two modes — a full historical backfill of the migrated archive, and continuous enrichment of new recordings as they arrive.
Pay for compute you own, not per-minute AI SaaS fees.
Running models in your own tenant turns AI from a metered line item into fixed infrastructure — so enriching a decade of recordings doesn't come with a usage-based bill.
Capex, not metered SaaS
Spend on GPUs or CPU capacity you own and amortize — no per-minute or per-token invoice that scales with every transcript.
Reuse idle capacity
Schedule batch enrichment on hardware you already run, filling spare cycles instead of renting someone else's inference.
Predictable at archive scale
Reprocessing millions of historical calls costs compute time, not a usage bill — so a full backfill never triggers a surprise overage.
Own your recordings. Keep the experience.
See the control plane live in minutes, or talk to us about migrating off NICE or Genesys into the cloud you already trust. No rip-and-replace, no lost calls.
No data migration required to evaluate · Your cloud, your keys, your data