Private AI

Private, in-tenant AI: your data never leaves your tenant

Transcription, sentiment, automated QA scoring, and call rating and ranking run on your own models inside your environment. No PHI or PII egress, no data sniffing, and nothing is ever used to train a model vendor's system.

Private AI · your models, your data

AI that never sees the outside world.

Transcription, sentiment, and automated QA run on models deployed inside your own environment. PHI and PII never leave your tenant and are never sent to a proprietary model provider — no data sniffing, no training on your conversations, no exposure.

Your data never leaves your boundary. Every AI step — speech-to-text, sentiment, QA scoring, redaction — executes on your own infrastructure. No recording, transcript, or prompt is ever transmitted to an external LLM for inference, logging, or training.

Private transcription

Speech-to-text with speaker diarization, running on open ASR models (Whisper-class) or your own — entirely inside your environment.

Sentiment & emotion

Per-turn and per-call sentiment scoring to power QA, escalation, and trend analysis — the audio never leaves your tenant.

Automated QA scoring

Score every interaction against your scorecards and rubrics automatically — consistent evaluation across 100% of calls, not a 2% sample.
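
As an illustration only, scoring a call against a scorecard can be pictured as a weighted checklist. The sketch below assumes a hypothetical rubric of weighted pass/fail criteria; the criterion names, the weights, and the `score_call` helper are invented for the example and are not Arkivo's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str      # hypothetical rubric item
    weight: float  # relative importance within the scorecard


def score_call(results: dict[str, bool], scorecard: list[Criterion]) -> float:
    """Return a 0-100 score: the weighted share of criteria that passed."""
    total = sum(c.weight for c in scorecard)
    if total <= 0:
        raise ValueError("scorecard weights must sum to a positive number")
    # A criterion missing from the results counts as not passed.
    earned = sum(c.weight for c in scorecard if results.get(c.name, False))
    return round(100 * earned / total, 1)


scorecard = [
    Criterion("greeting", 1.0),
    Criterion("identity_verified", 3.0),
    Criterion("disclosure_read", 2.0),
]
# In practice the pass/fail results would come from the in-tenant LLM step.
results = {"greeting": True, "identity_verified": True, "disclosure_read": False}
score = score_call(results, scorecard)
print(score)
```

Because the same rubric is applied to every call, two calls with the same pass/fail results always get the same score, which is the consistency a sampled manual review cannot guarantee.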

Rating, ranking & prioritization

Auto-rank the archive to surface best and worst calls, coaching moments, and compliance risk — so reviewers see what matters first.

Automatic PII / PHI redaction

Detect and mask sensitive spans — card numbers, member IDs, health data — before storage or playback. Privacy by default.
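
To make span-level masking concrete, here is a minimal sketch that covers card numbers only. It assumes a simple regex plus a Luhn checksum as the detector; the `redact_cards` helper and the `[REDACTED_CARD]` placeholder are hypothetical, and member IDs or health data would need their own detectors, typically model-based.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to skip digit runs that cannot be card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_cards(text: str) -> tuple[str, list[tuple[int, int]]]:
    """Mask Luhn-valid card numbers; return the masked text and the original spans."""
    spans: list[tuple[int, int]] = []

    def mask(m: re.Match) -> str:
        digits = re.sub(r"\D", "", m.group())
        if not luhn_ok(digits):
            return m.group()  # leave order numbers and similar digit runs alone
        spans.append(m.span())  # offsets refer to the original, unmasked text
        return "[REDACTED_CARD]"

    return CARD_RE.sub(mask, text), spans


text = "My card is 4111 1111 1111 1111, order 1234567890123."
masked, spans = redact_cards(text)
print(masked)
```

Returning the spans alongside the masked text lets the same offsets drive both transcript masking and, once mapped to timestamps, audio muting at playback.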

Topic & intent detection

Classify interactions, detect intents and categories, and auto-tag the archive for faster, smarter search.

Summaries & coaching insights

Generate call summaries and agent-coaching feedback to shorten QA reviews and accelerate improvement.

Bring your own model

Run open-weight LLMs (Llama, Mistral, and more), your fine-tuned models, or your existing inference stack. No lock-in to a model vendor.

Proprietary cloud AI
  • Your PHI/PII is sent to a third-party model provider
  • Conversations may be retained or used to improve their models
  • Per-token data egress on every transcript and prompt
  • Compliance scope expands to every vendor in the path

Arkivo private AI
  • Models run inside your VPC, on-prem, or fully air-gapped
  • Recordings, transcripts, and prompts never leave your tenant
  • Never used to train any vendor's model, ours or a provider's
  • One compliance boundary: your environment

Deploys alongside Arkivo in your cloud, on-prem, or air-gapped · GPU-accelerated or CPU · scales with your volume

Inference architecture

Every step runs inside your boundary

Audio, transcripts, and prompts move only between services you operate — from object storage to the search index, without a single outbound call.

01 · Audio in object storage

Recordings sit in your own bucket or container, the same tenant-owned storage that Arkivo migrates recordings into and indexes.

02 · In-tenant ASR

Whisper-class, NVIDIA Parakeet, or your own speech model transcribes with diarization — inside your environment.

03 · In-tenant LLM

Llama 3.x, Mistral, Qwen, or your fine-tuned model reads each transcript on infrastructure you control.

04 · Enrichment

Sentiment, intent, redaction spans, QA scores, and summaries are generated per call — no transcript leaves the boundary.

05 · Metadata & search index

Structured results land in the Arkivo index, making the whole archive searchable, filterable, and reportable.
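
The five steps above can be sketched as one function that passes each call through locally supplied stages. This is a simplified illustration under stated assumptions: `CallRecord`, `run_pipeline`, and the `fake_asr` / `fake_llm` stand-ins are hypothetical names, and a plain dictionary stands in for the search index.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CallRecord:
    call_id: str
    audio_uri: str                                   # step 01: object in tenant-owned storage
    transcript: str = ""                             # step 02: filled by in-tenant ASR
    enrichment: dict = field(default_factory=dict)   # steps 03-04: LLM outputs


def run_pipeline(
    record: CallRecord,
    transcribe: Callable[[str], str],
    enrich: Callable[[str], dict],
    index: dict[str, dict],
) -> CallRecord:
    """Run one call through ASR, LLM enrichment, and indexing, in that order."""
    record.transcript = transcribe(record.audio_uri)
    record.enrichment = enrich(record.transcript)
    # Step 05: structured results land in the index, keyed by call.
    index[record.call_id] = {"transcript": record.transcript, **record.enrichment}
    return record


# Stand-ins for local model servers; a real deployment would call in-tenant endpoints.
def fake_asr(audio_uri: str) -> str:
    return f"transcript of {audio_uri}"


def fake_llm(transcript: str) -> dict:
    return {"sentiment": "neutral", "summary": transcript[:20]}


search_index: dict[str, dict] = {}
record = run_pipeline(
    CallRecord("c-1", "s3://tenant-bucket/c-1.wav"), fake_asr, fake_llm, search_index
)
print(search_index["c-1"]["sentiment"])
```

Passing the ASR and LLM stages in as callables mirrors the bring-your-own-model idea: swapping a model changes an argument, not the pipeline.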

Air-gapped path: zero outbound calls. The entire pipeline can run with no internet route at all. Every step above stays inside your network; nothing dials home for licensing, telemetry, or inference.
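
One way to back a zero-outbound claim at configuration time is to refuse any model endpoint that is not a literal IP address inside your own address ranges. The sketch below assumes that policy; `is_in_boundary` and the example CIDR range are hypothetical, and real enforcement still belongs at the network layer (no route, firewall rules).

```python
import ipaddress
from urllib.parse import urlsplit


def is_in_boundary(url: str, allowed_networks: list[str]) -> bool:
    """True if the URL's host is a literal IP inside one of the allowed CIDR ranges."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames are rejected in this sketch; resolving them safely
        # depends on tenant DNS policy, which is out of scope here.
        return False
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)


tenant_ranges = ["10.0.0.0/16"]  # hypothetical tenant address space
print(is_in_boundary("http://10.0.3.7:8000/v1", tenant_ranges))
print(is_in_boundary("https://8.8.8.8/", tenant_ranges))
```

A check like this catches a misconfigured endpoint before a job starts, rather than relying on a blocked connection to surface the mistake mid-run.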

GPU and CPU paths both supported. Run on GPUs for maximum throughput, or on CPU-only nodes where accelerators aren't available: the same pipeline and the same boundary, with models sized to your hardware.

Model menu & sizing

Pick the models that fit your hardware

Mix and match ASR and LLM families across CPU, single-GPU, or multi-GPU nodes — or drop in your own fine-tuned weights. Throughput figures are indicative and vary with audio length, hardware, and batching.

Model family | Task | Hardware | Throughput (indicative) | Languages | Accuracy tier
--- | --- | --- | --- | --- | ---
Whisper large-v3 | ASR | CPU / single-GPU | ~120 calls/hr | Multilingual (99+) | High
NVIDIA Parakeet | ASR | Single-GPU | ~600 calls/hr | English-focused | Highest (EN)
Llama 3.x 8B | LLM | CPU / single-GPU | ~400 calls/hr | Multilingual | Strong
Llama 3.x 70B | LLM | Multi-GPU | ~90 calls/hr | Multilingual | Highest
Mistral | LLM | CPU / single-GPU | ~450 calls/hr | Multilingual (EU) | Strong
Qwen | LLM | Single-GPU | ~350 calls/hr | Multilingual (CJK) | Strong
Your fine-tuned model | LLM | CPU / single-GPU / multi-GPU | Depends on size | Your domain & languages | Tuned to you

Throughput is indicative only — measured in calls per hour and dependent on average call duration, node specification, and concurrency. Bring your own model to set your own profile.
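
As a rough worked example of how the indicative figures translate into a backfill window: if the ASR and LLM stages run concurrently, the slower stage sets the pace. The sketch below assumes linear scaling across nodes, which real deployments only approximate; the `backfill_hours` helper, the one-million-call archive, and the node counts are hypothetical.

```python
def backfill_hours(
    total_calls: int,
    asr_calls_per_hr: float,
    llm_calls_per_hr: float,
    asr_nodes: int = 1,
    llm_nodes: int = 1,
) -> float:
    """Estimate wall-clock hours, assuming concurrent stages and linear scaling."""
    # The stage with the lowest aggregate throughput is the bottleneck.
    bottleneck = min(asr_calls_per_hr * asr_nodes, llm_calls_per_hr * llm_nodes)
    if bottleneck <= 0:
        raise ValueError("throughput and node counts must be positive")
    return total_calls / bottleneck


# Indicative table figures: Whisper large-v3 at ~120 calls/hr, Llama 3.x 8B at ~400 calls/hr.
hours = backfill_hours(1_000_000, 120, 400, asr_nodes=10, llm_nodes=3)
print(round(hours, 1), "hours, about", round(hours / 24, 1), "days")
```

With 10 ASR nodes and 3 LLM nodes both stages reach about 1,200 calls/hr, so adding nodes to only one of them would not shorten the window.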

No token egress · no training on your data.

Audio, transcripts, and prompts never cross the tenant boundary, and your conversations are never used to train any vendor model — ours or a third party's. This is backed by a short, signed technical commitment and data-handling appendix you can attach to your contract and hand to your auditors.

Processing modes

Backfill the past, keep up with the present

The same in-tenant models run in two modes — a full historical backfill of the migrated archive, and continuous enrichment of new recordings as they arrive.

Batch

Backfill the whole archive

Run enrichment across 100% of the historical recordings you migrate — not a sampled slice — so every legacy call carries transcripts, sentiment, and QA scores.

  • Processes the full migrated corpus on your schedule
  • Scales across GPU or CPU workers to fit your window
  • Resumable, checkpointed jobs with live progress
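
A resumable, checkpointed job can be sketched as a loop that records each finished call and skips it on the next run. This is a minimal illustration, not Arkivo's job runner: the JSON checkpoint file, `run_backfill`, and the per-call save are assumptions made for clarity.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def load_done(checkpoint: Path) -> set[str]:
    """Read the IDs of calls already processed, or an empty set on first run."""
    if not checkpoint.exists():
        return set()
    return set(json.loads(checkpoint.read_text()))


def save_done(checkpoint: Path, done: set[str]) -> None:
    """Write via a temp file and rename, so a crash never leaves a half-written file."""
    tmp = checkpoint.with_suffix(".tmp")
    tmp.write_text(json.dumps(sorted(done)))
    os.replace(tmp, checkpoint)


def run_backfill(
    call_ids: Iterable[str], process: Callable[[str], None], checkpoint: Path
) -> int:
    """Process calls not yet in the checkpoint; return how many were handled this run."""
    done = load_done(checkpoint)
    handled = 0
    for call_id in call_ids:
        if call_id in done:
            continue  # finished in an earlier run
        process(call_id)
        done.add(call_id)
        save_done(checkpoint, done)  # per call for clarity; batch this in practice
        handled += 1
    return handled


with tempfile.TemporaryDirectory() as workdir:
    ckpt = Path(workdir) / "backfill.json"
    processed: list[str] = []
    first = run_backfill(["a", "b"], processed.append, ckpt)
    second = run_backfill(["a", "b", "c"], processed.append, ckpt)  # resumes, handles only "c"
    print(first, second, processed)
```

The length of the checkpoint set against the corpus size also gives a simple progress figure for a live dashboard.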
Near-real-time

Enrich new syncs as they land

As fresh recordings sync from your connectors, they flow through the same in-tenant pipeline within minutes — keeping the index current without a manual rerun.

  • Triggered automatically on each new sync
  • Same models, same boundary as the batch path
  • New calls are searchable and scored shortly after capture

Pay for compute you own, not per-minute AI SaaS fees.

Running models in your own tenant turns AI from a metered line item into fixed infrastructure — so enriching a decade of recordings doesn't come with a usage-based bill.

Capex, not metered SaaS

Spend on GPUs or CPU capacity you own and amortize — no per-minute or per-token invoice that scales with every transcript.

Reuse idle capacity

Schedule batch enrichment on hardware you already run, filling spare cycles instead of renting someone else's inference.

Predictable at archive scale

Reprocessing millions of historical calls costs compute time, not a usage bill — so a full backfill never triggers a surprise overage.
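
The fixed-versus-metered argument comes down to a break-even point. The sketch below is simple arithmetic under loud assumptions: both input figures are placeholders rather than real prices, `break_even_minutes` is a hypothetical helper, and power, hosting, and staffing costs are ignored.

```python
def break_even_minutes(fixed_cost: float, metered_rate_per_minute: float) -> float:
    """Audio minutes at which a fixed compute spend equals a per-minute metered bill."""
    if metered_rate_per_minute <= 0:
        raise ValueError("metered rate must be positive")
    return fixed_cost / metered_rate_per_minute


# Placeholder inputs for illustration only; substitute your own quotes.
hypothetical_hardware_cost = 60_000.0   # one-off spend on owned capacity
hypothetical_metered_rate = 0.02        # per audio minute on a metered service
minutes = break_even_minutes(hypothetical_hardware_cost, hypothetical_metered_rate)
print(f"{minutes:,.0f} audio minutes to break even")
```

Above the break-even volume, each extra minute processed on owned hardware adds compute time rather than invoice lines, which is why the gap widens at archive scale.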

Your cloud · Your keys · Your data

Own your recordings. Keep the experience.

See the control plane live in minutes, or talk to us about migrating off NICE or Genesys into the cloud you already trust. No rip-and-replace, no lost calls.

No data migration required to evaluate · Your cloud, your keys, your data