How it works

Every voice command goes through a 5-step pipeline on the server. The browser receives live updates via Server-Sent Events (SSE) so the UI updates as each step completes rather than waiting for everything at once.

1. Whisper STT (Speech to Text): Your audio recording is sent to OpenAI Whisper (whisper-1), which returns a text transcript. ~2s latency.
2. GPT-4o (Intent Parser): The transcript is classified into a structured action (read_inbox, read_from, compose, reply, etc.) with parameters such as sender, account, and time range. ~1s latency.
3. Gmail API / Microsoft Graph (Email Fetch): Emails are fetched from the relevant account(s) using the parsed parameters (account, sender filter, date range, unread/all). ~2–5s latency.
4. GPT-4o (Email Summarizer): Raw email data is converted into natural spoken language. Emails that need a reply are highlighted if requested. ~2s latency.
5. TTS (Text to Speech): The summary text is converted to audio using either OpenAI TTS (~2s, best quality) or PocketTTS (~200ms, free local model). Audio is streamed back to the browser.
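
To get a feel for the end-to-end wait, the per-step estimates above can be added up. The sketch below assumes the five steps run strictly one after another, which the list suggests but does not state outright; the object and function names are illustrative only.

```javascript
// Per-step latency estimates in seconds, copied from the list above as [min, max].
const steps = {
  whisper: [2, 2],
  intent: [1, 1],
  emailFetch: [2, 5],
  summarise: [2, 2],
};
const tts = { openai: [2, 2], pocket: [0.2, 0.2] };

// Adds up best-case and worst-case latency for the chosen TTS engine,
// assuming the steps run sequentially.
function endToEnd(engine) {
  const all = [...Object.values(steps), tts[engine]];
  return all.reduce(([lo, hi], [a, b]) => [lo + a, hi + b], [0, 0]);
}

console.log(endToEnd("openai")); // 9 to 12 seconds
console.log(endToEnd("pocket")); // roughly 1.8 seconds less at both ends
```

Under that assumption, a command with OpenAI TTS takes about 9 to 12 seconds in total.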

Cost per command

Approximate OpenAI API cost per voice command (using OpenAI TTS):

| API | Usage per command | Cost (USD) |
|---|---|---|
| Whisper STT | ~5 seconds of audio | ~$0.0005 |
| GPT-4o intent | ~200 tokens | ~$0.001 |
| GPT-4o summarise | ~800 tokens | ~$0.004 |
| OpenAI TTS | ~400 characters | ~$0.006 |
| Total | | ~$0.01 |

Tip: Switch to PocketTTS in the app to eliminate the TTS cost, which is roughly half of the per-command total. PocketTTS runs locally on the server at ~200ms and is free.
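
As a quick check on the arithmetic, the rows of the cost table can be summed directly. The figures below are copied from the table; the helper names are illustrative only.

```javascript
// Per-command cost estimates in USD, copied from the table above.
const costs = {
  whisperStt: 0.0005,
  gptIntent: 0.001,
  gptSummarise: 0.004,
  openaiTts: 0.006,
};

// Sums every row of the cost table.
function totalCost(rows) {
  return Object.values(rows).reduce((sum, cost) => sum + cost, 0);
}

// Fraction of the total that a single row accounts for.
function share(rows, key) {
  return rows[key] / totalCost(rows);
}

const total = totalCost(costs);
const ttsShare = share(costs, "openaiTts");
console.log(`total ≈ $${total.toFixed(4)}, TTS share ≈ ${Math.round(ttsShare * 100)}%`);
```

The unrounded sum is about $0.0115 per command, and the OpenAI TTS row accounts for a little over half of it.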

Voice commands

Speak naturally — the intent parser understands full sentences, not just keywords. Examples:

Reading emails

"Read my inbox"
"Check my emails"
"Any new emails?"
"Read my Ajishra email"
"Check my Emory inbox this week"
"Emails from John"
"Anything from ETS?"
"Important emails"
"Starred emails"
"All emails including read"

Time filters

"Emails today"
"Last 3 days"
"This week"
"Last week"
"Emails from Sarah this week"

Navigating

"Read the first one"
"Open number two"
"Next"
"Previous"
"Mark as read"

Replying & composing

"Reply to this"
"Write back and say I'll be there"
"Send an email to john@example.com about the meeting"
"Yes send it"
"Cancel"

Filters & toggles

The filter bar at the top of the app provides quick controls that apply to every voice command.

| Control | What it does |
|---|---|
| Account pill (tap) | Locks the next command to a specific mailbox (e.g. "Ajishra Email"). Tap Clear to search all accounts again. |
| Unread / All toggle | Default is Unread. Switch to All to include already-read emails. If no unread emails are found, the app automatically falls back to all emails. |
| 🔊 OpenAI / Pocket toggle | Switches the TTS engine. OpenAI = best voice quality. Pocket = free local model (~200ms). Choice is remembered across sessions. |

TTS engines

| Engine | Quality | Latency | Cost | Requires internet |
|---|---|---|---|---|
| openai | Excellent (natural) | ~2s | ~$0.006/cmd | Yes |
| pocket | Good (100M params) | ~200ms | Free | No |

Change the default engine in .env:

TTS_ENGINE=openai   # or: pocket

Override per-session using the toggle button in the app, or per-request via the X-TTS-Engine header.

Reconnecting accounts

If any account shows an amber Reconnect button, its token has expired or was never saved. Gmail and Outlook use different flows.

Gmail — web browser

Tap Reconnect next to the disconnected Gmail account. A new tab opens, redirects to Google sign-in, and returns automatically when done.

Gmail — iOS app

The same button works from the iOS app, but the reconnect URL is automatically routed through ASWebAuthenticationSession instead of WKWebView. See the ASWebAuthenticationSession (iOS OAuth) section.

Outlook — device code flow

Tap Reconnect on the Work Outlook account. A page opens showing a short code and a Microsoft URL. Open the URL, enter the code, sign in — the page auto-detects completion and shows a confirmation.

Gmail prerequisites: credentials.json must be of type Web Application. The redirect URI https://gotenor.totesoft.com/api/email/gmail/callback must be registered in Google Cloud Console, and .env must set APP_URL=https://gotenor.totesoft.com.

POST /api/auth/register

POST Public — no auth

Create a new user account. Returns a JWT on success.

Request

POST /api/auth/register
Content-Type: application/json

{ "email": "you@example.com", "password": "yourpassword" }

Response — success (200)

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "user": { "id": 1, "email": "you@example.com", "plan": "free" }
}

Response — email already registered (400)

{ "detail": "Email already registered." }

POST /api/auth/login

POST Public — no auth

Sign in to an existing account. Returns a JWT.

Request

POST /api/auth/login
Content-Type: application/json

{ "email": "you@example.com", "password": "yourpassword" }

Response — success (200)

{ "token": "eyJhbGci...", "user": { "id": 1, "email": "you@example.com", "plan": "free" } }

Response — wrong credentials (401)

{ "detail": "Invalid email or password." }
Token usage: Include the JWT in every subsequent request as Authorization: Bearer <token>.
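
A client might wrap the login call and the header construction roughly as follows. This is a minimal sketch: the function names and the injectable fetchImpl parameter are illustrative assumptions, and only the request and response shapes come from the examples above.

```javascript
// Hypothetical helper: logs in and resolves to the JWT, or throws with the
// server's "detail" message. fetchImpl is injectable so it can be stubbed in tests.
async function login(baseUrl, email, password, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/api/auth/login`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ email, password }),
  });
  const body = await res.json();
  if (!res.ok) {
    // e.g. 401 with { "detail": "Invalid email or password." }
    throw new Error(body.detail || `Login failed with status ${res.status}`);
  }
  return body.token;
}

// Builds the Authorization header expected by the protected endpoints.
function authHeader(token) {
  return { Authorization: `Bearer ${token}` };
}
```

Typical use would be `const token = await login(baseUrl, email, password)` followed by spreading `authHeader(token)` into the headers of later requests.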

GET /api/auth/me

GET Requires Authorization header

Verify a stored token and fetch the current user's profile.

Request

GET /api/auth/me
Authorization: Bearer eyJhbGci...

Response

{ "id": 1, "email": "you@example.com", "plan": "free", "is_active": true }

GET /api/health

GET Public — no auth

Health check. Use this to verify the server is running.

curl https://gotenor.totesoft.com/api/health

Response

{ "status": "ok", "auth": "jwt" }

GET /api/status

GET Requires Authorization header

Returns the connection status of all configured email accounts.

Request

GET /api/status
Authorization: Bearer eyJhbGci...

Response

{
  "status": "ok",
  "accounts": [
    { "label": "Rojavaasam Gmail",   "connected": true,  "type": "gmail" },
    { "label": "Ajishra Email",      "connected": false, "type": "gmail" },
    { "label": "Work Outlook",       "connected": true,  "type": "outlook" }
  ],
  "total_accounts": 3,
  "all_connected": false
}

Accounts with "connected": false are shown with a Reconnect button in the app UI.
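
A frontend could derive its list of Reconnect buttons from this response along these lines. The helper name and the returned shape are illustrative assumptions; the two reauth paths are the ones documented in this reference.

```javascript
// Reauth endpoints documented in this reference, keyed by account type.
const REAUTH_PATHS = {
  gmail: "/api/email/gmail/reauth",
  outlook: "/api/email/outlook/reauth",
};

// Given the parsed JSON body of GET /api/status, returns the accounts that
// need reconnecting together with the endpoint that starts the flow.
function accountsNeedingReconnect(status) {
  return (status.accounts || [])
    .filter((account) => !account.connected)
    .map((account) => ({
      label: account.label,
      reauthPath: REAUTH_PATHS[account.type],
    }));
}

// Sample response copied from the example above.
const sampleStatus = {
  status: "ok",
  accounts: [
    { label: "Rojavaasam Gmail", connected: true, type: "gmail" },
    { label: "Ajishra Email", connected: false, type: "gmail" },
    { label: "Work Outlook", connected: true, type: "outlook" },
  ],
  total_accounts: 3,
  all_connected: false,
};

console.log(accountsNeedingReconnect(sampleStatus));
```

For the sample response, only "Ajishra Email" is returned, paired with the Gmail reauth path.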

GET /api/email/accounts

GET Requires Authorization header

Extended account list including credential type and re-auth capability.

Response

{
  "accounts": [
    {
      "label": "Ajishra Email",
      "connected": false,
      "type": "gmail",
      "can_reauth": true,
      "credential_type": "web"
    }
  ]
}

GET /api/email/gmail/reauth

GET Public — browser redirect

Starts the Gmail OAuth flow for a given account. Redirects the browser to Google sign-in. On success, the token is saved server-side.

Query parameters

| Param | Required | Description |
|---|---|---|
| label | Yes | Account label matching one configured in .env |
| platform | No | web (default) or ios. When ios, the callback redirects to voicemail://oauth/callback so ASWebAuthenticationSession closes automatically. |

Note: This endpoint is public because the browser navigates to it directly during the OAuth flow, before a JWT can be attached.
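
Because account labels can contain spaces, it is safer to build the reauth URL with the URL API than by string concatenation. The sketch below uses a hypothetical helper name; the path and the two query parameters are the documented ones.

```javascript
// Builds the Gmail reauth URL for an account label.
// platform is "web" (the documented default) or "ios".
function gmailReauthUrl(baseUrl, label, platform = "web") {
  if (platform !== "web" && platform !== "ios") {
    throw new Error(`Unsupported platform: ${platform}`);
  }
  const url = new URL("/api/email/gmail/reauth", baseUrl);
  url.searchParams.set("label", label); // percent/plus-encodes spaces etc.
  if (platform === "ios") url.searchParams.set("platform", "ios");
  return url.toString();
}

console.log(gmailReauthUrl("https://gotenor.totesoft.com", "Ajishra Email", "ios"));
```

Using searchParams also ensures that platform is appended with & when label is already present, rather than with a second ?.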

GET /api/email/outlook/reauth

GET Public — browser page

Starts the Outlook re-auth using MSAL device code flow. Returns an HTML page showing a short code and the Microsoft sign-in URL. A background task polls for completion; when the user signs in, the page updates automatically.

Flow

1. User taps Reconnect on the Outlook account pill.
2. Backend calls MSAL initiate_device_flow() → gets user_code + verification_uri.
3. HTML page is shown with the code and a link to microsoft.com/devicelogin.
4. Background thread polls Microsoft every few seconds via acquire_token_by_device_flow().
5. User opens the link, enters the code, and signs in with Microsoft.
6. Page JS polls /api/email/outlook/poll every 4 seconds.
7. On success, the token is saved and the page shows "✓ Connected".

Note: Public endpoint, no JWT required. No redirect URI registration is needed in Azure AD because the device code flow does not use one.

GET /api/email/outlook/poll

GET Public — polled by device code page

Returns the current state of a pending Outlook device code session. Called automatically by the page returned from /outlook/reauth.

Query parameter

| Param | Description |
|---|---|
| session | Session ID returned from the /reauth page |

Responses

{ "status": "pending" }                             // still waiting
{ "status": "connected", "label": "Work Outlook" }   // success
{ "status": "error",     "detail": "expired_token: ..." } // failed
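
A custom client that wants to reproduce what the device code page does could poll along these lines. This is a sketch under assumptions: the helper name, the injectable fetchImpl and sleep parameters, and the attempt cap are illustrative, while the 4-second interval and the three status values come from this reference.

```javascript
// Polls the Outlook session until it leaves the "pending" state.
// Resolves to the account label on success, throws on error or timeout.
async function waitForOutlook(baseUrl, session, options = {}) {
  const {
    fetchImpl = fetch,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    intervalMs = 4000,  // the device code page polls every 4 seconds
    maxAttempts = 200,  // arbitrary cap chosen for this sketch
  } = options;

  const url = new URL("/api/email/outlook/poll", baseUrl);
  url.searchParams.set("session", session);

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchImpl(url.toString());
    const body = await res.json();
    if (body.status === "connected") return body.label;
    if (body.status === "error") throw new Error(body.detail);
    await sleep(intervalMs); // still "pending": wait and try again
  }
  throw new Error("Timed out waiting for Outlook sign-in");
}
```

Injecting fetchImpl and sleep keeps the loop testable without a network connection or real delays.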

POST /api/voice

POST Requires Authorization header

The core endpoint. Accepts an audio file (multipart/form-data) and streams Server-Sent Events (SSE) back as each pipeline step completes.

Request headers

| Header | Required | Values | Description |
|---|---|---|---|
| Authorization | Yes | Bearer <token> | JWT from /api/auth/login |
| X-Account-Label | No | partial name | Restrict to one mailbox e.g. "Ajishra" |
| X-Unread-Only | No | true or false | Default: true |
| X-TTS-Engine | No | openai or pocket | Override TTS engine for this request |
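
One way to keep these headers consistent across a client is a small builder that only emits the optional ones when they are set. The function and option names below are illustrative assumptions; the header names and allowed values are taken from the table above.

```javascript
// Builds the request headers for POST /api/voice.
// Optional headers are omitted so the server-side defaults apply.
function voiceHeaders(token, { accountLabel, unreadOnly, ttsEngine } = {}) {
  const headers = { Authorization: `Bearer ${token}` };
  if (accountLabel) headers["X-Account-Label"] = accountLabel;
  if (unreadOnly !== undefined) headers["X-Unread-Only"] = String(Boolean(unreadOnly));
  if (ttsEngine !== undefined) {
    if (ttsEngine !== "openai" && ttsEngine !== "pocket") {
      throw new Error(`Unknown TTS engine: ${ttsEngine}`);
    }
    headers["X-TTS-Engine"] = ttsEngine;
  }
  return headers;
}

console.log(voiceHeaders("eyJhbGci...", { accountLabel: "Ajishra", ttsEngine: "pocket" }));
```

Content-Type is deliberately left out: when the body is a FormData object, fetch sets the multipart boundary itself.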

Request body

POST /api/voice
Content-Type: multipart/form-data
Authorization: Bearer eyJhbGci...
X-TTS-Engine: pocket

audio=@recording.webm

SSE response stream

The response is a stream of text/event-stream events. Parse each line starting with data: as JSON.

| Event | Fields | When sent |
|---|---|---|
| transcript | text | After Whisper transcribes (~2s) |
| processing | text | Status hint while emails are fetching |
| response | text | Full email summary text (~8s) |
| audio | data (base64), engine | Audio bytes ready to play (~12s) |
| suggestions | items[] | Reply suggestions after reading a single email |
| error | message | If any step fails |
| done | | End of stream |

Example SSE stream

data: {"event":"transcript","text":"check my Ajishra email this week"}

data: {"event":"processing","text":"Checking your emails in Ajishra (this week)..."}

data: {"event":"response","text":"You have 2 emails this week in Ajishra..."}

data: {"event":"audio","data":"SUQzBAAA...","engine":"openai"}

data: {"event":"done"}
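
The audio event carries base64 text, so a client has to decode it to bytes before playback. The sketch below uses a hypothetical helper name and only the visible prefix of the sample payload as a stand-in, since the full payload is elided above.

```javascript
// Decodes the base64 "data" field of an audio event into raw bytes.
function audioEventToBytes(event) {
  if (event.event !== "audio") throw new Error("not an audio event");
  const binary = atob(event.data); // one character per byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Stand-in payload: just the visible prefix of the example above.
const sampleAudio = { event: "audio", data: "SUQzBAAA", engine: "openai" };
const audioBytes = audioEventToBytes(sampleAudio);
console.log(String.fromCharCode(...audioBytes.slice(0, 3))); // "ID3"
```

The prefix decodes to "ID3", which is how MP3 files with an ID3v2 tag begin. In a browser the bytes could then be wrapped in a Blob and played through an Audio element. A MIME type of audio/mpeg is an assumption based on that prefix and may not hold for the pocket engine.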

Consuming SSE from any frontend

const res = await fetch("https://gotenor.totesoft.com/api/voice", {
  method: "POST",
  headers: {
    "Authorization": "Bearer " + token,
    "X-TTS-Engine":  "openai",
  },
  body: formData,   // FormData with field name "audio"; fetch sets the multipart Content-Type
});
if (!res.ok) throw new Error("Voice request failed: HTTP " + res.status);

const reader  = res.body.getReader();
const decoder = new TextDecoder();
let   buffer  = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop();   // keep the trailing partial line for the next chunk
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const event = JSON.parse(line.slice(6));
    // handle: transcript, processing, response, audio, suggestions, error, done
  }
}
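
The buffering step in the loop above is the part most likely to go wrong, because a network chunk can end in the middle of a line. Pulling it into a pure function makes it easy to unit test. The function name and the CRLF tolerance are additions of this sketch, not part of the documented client.

```javascript
// Splits accumulated text into complete SSE "data:" payloads plus the
// unfinished tail that must be carried over to the next chunk.
function drainSseBuffer(buffer) {
  const lines = buffer.split("\n");
  const rest = lines.pop(); // last piece may be an incomplete line
  const events = [];
  for (const raw of lines) {
    const line = raw.replace(/\r$/, ""); // tolerate CRLF line endings
    if (!line.startsWith("data:")) continue; // skips blank separator lines
    events.push(JSON.parse(line.slice(5).trimStart()));
  }
  return { events, rest };
}

// A chunk boundary that falls in the middle of the second event:
const first = drainSseBuffer('data: {"event":"transcript","text":"hi"}\n\ndata: {"event":"do');
const second = drainSseBuffer(first.rest + 'ne"}\n\n');
console.log(first.events, second.events);
```

In the read loop, the caller would set `buffer = rest` after each call and dispatch every entry of `events`.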

Environment variables

| Variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | | Required. Your OpenAI API key. |
| OPENAI_TTS_VOICE | nova | OpenAI TTS voice: alloy, echo, fable, onyx, nova, or shimmer |
| TTS_ENGINE | openai | Default TTS engine: openai or pocket |
| GMAIL_CREDENTIALS_1 | | Path to Google OAuth Web Application credentials JSON (account 1) |
| GMAIL_LABEL_1 | Gmail Account 1 | Display name for account 1 |
| GMAIL_TOKEN_1 | credentials/gmail_account1_token.json | Where the OAuth token is saved after reconnect |
| OUTLOOK_CLIENT_ID | | Azure AD app registration client ID |
| OUTLOOK_TENANT_ID | | Azure AD tenant ID |
| DATABASE_URL | | PostgreSQL connection: postgresql+asyncpg://user:pass@host/db |
| JWT_SECRET_KEY | | Secret for signing JWTs. Generate: python -c "import secrets; print(secrets.token_hex(32))" |
| APP_URL | http://localhost:8000 | Public HTTPS URL. Used for OAuth redirect URIs and Stripe callbacks. Must be set correctly on the VM. |
| ADMIN_EMAIL | (first user) | Email with admin panel access. Defaults to user id=1. |
| STRIPE_SECRET_KEY | | Optional. Enables billing features. |
| WAKE_WORD_ENABLED | true | Enable desktop wake word listener. Set false on GCP VM. |
| APP_HOST | 0.0.0.0 | Bind address for uvicorn |
| APP_PORT | 8000 | Port for uvicorn |
| INBOX_FETCH_LIMIT | 10 | Max emails fetched per inbox read command |

Building an external frontend

Any app (React, Flutter, Swift, Android) can connect to the VoiceMail API. Follow this flow:

Step 1 — Register or log in

POST /api/auth/login
{ "email": "you@example.com", "password": "yourpassword" }
→ { "token": "eyJhbGci..." }

Store the JWT in secure storage. Attach it as Authorization: Bearer <token> on every request.

Step 2 — Check account status

GET /api/status
Authorization: Bearer eyJhbGci...

Step 3 — Send voice commands

Record audio → POST to /api/voice → parse the SSE stream as shown in the API reference above.

CORS

All origins are allowed (Access-Control-Allow-Origin: *). No additional CORS setup needed.

Audio formats: The API accepts any format Whisper supports — webm (Chrome/Android), m4a (Safari/iOS), wav, mp3. Send as multipart/form-data with field name audio.
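
When uploading a recording, a client can name the file after its MIME type so the extension matches one of the formats listed above. Whether the VoiceMail backend relies on the filename is not stated here, so treat this as a precaution. The helper names and the MIME-to-extension mapping are assumptions of this sketch.

```javascript
// Maps a recorder MIME type (possibly with codec parameters) to one of the
// file extensions listed above.
function extensionFor(mimeType) {
  const base = mimeType.split(";")[0].trim().toLowerCase();
  const known = {
    "audio/webm": "webm",
    "audio/mp4": "m4a",
    "audio/wav": "wav",
    "audio/mpeg": "mp3",
  };
  if (!(base in known)) throw new Error(`Unsupported audio type: ${mimeType}`);
  return known[base];
}

// Wraps a recorded Blob in multipart form data under the field name "audio".
function buildVoiceForm(blob) {
  const form = new FormData();
  form.append("audio", blob, `recording.${extensionFor(blob.type)}`);
  return form;
}

console.log(extensionFor("audio/webm;codecs=opus")); // "webm"
```

The result of `buildVoiceForm(blob)` can be passed directly as the `body` of the fetch call to /api/voice.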

iOS TestFlight wrapper

The ios-wrapper/ directory in the repository contains a minimal native iOS app that loads the VoiceMail PWA in a full-screen WKWebView. It can be submitted to TestFlight for beta distribution.

Files

| File | Purpose |
|---|---|
| VoiceMailApp.swift | SwiftUI @main entry point |
| ContentView.swift | Full-screen WKWebView loading gotenor.totesoft.com |
| WebCoordinator.swift | Navigation delegate + Gmail OAuth interception via ASWebAuthenticationSession |

Xcode setup (5 minutes)

1. File → New → Project → iOS → App (SwiftUI, Swift).
2. Delete the generated ContentView.swift and VoiceMailApp.swift.
3. Drag the three Swift files from ios-wrapper/VoiceMail/ into the project.
4. In Info.plist, add a URL Type with Identifier: voicemail and Schemes: voicemail.
5. Set your Team in Signing & Capabilities.
6. Product → Archive → Distribute → TestFlight.

No App Store review is needed for internal TestFlight testers, so you can distribute to yourself immediately after uploading.

ASWebAuthenticationSession (iOS OAuth)

Google blocks OAuth sign-in inside WKWebView as a security policy. ASWebAuthenticationSession is Apple's solution — it opens OAuth flows in a separate sandboxed Safari process that Google trusts.

How it works in VoiceMail

1. User taps Reconnect in the PWA.
2. WKWebView tries to load /api/email/gmail/reauth?label=...
3. WebCoordinator cancels the navigation and adds platform=ios to the query string.
4. ASWebAuthenticationSession opens the URL in a Safari sheet.
5. User signs in on Google (trusted Safari, not WKWebView).
6. Google → backend callback → token saved → backend redirects to voicemail://oauth/callback.
7. ASWebAuthenticationSession detects the voicemail:// scheme → Safari sheet closes.
8. App calls loadStatus() → account pills refresh automatically.

The registered redirect URI in Google Cloud Console remains unchanged: https://gotenor.totesoft.com/api/email/gmail/callback. The voicemail:// scheme is only used for the final app callback after the token is saved — Google never sees it.