How it works
Every voice command goes through a 5-step pipeline on the server. The browser receives live updates via Server-Sent Events (SSE) so the UI updates as each step completes rather than waiting for everything at once.
Cost per command
Approximate OpenAI API cost per voice command (using OpenAI TTS):
| API | Usage per command | Cost (USD) |
|---|---|---|
| Whisper STT | ~5 seconds of audio | ~$0.0005 |
| GPT-4o intent | ~200 tokens | ~$0.001 |
| GPT-4o summarise | ~800 tokens | ~$0.004 |
| OpenAI TTS | ~400 characters | ~$0.006 |
| Total | | ~$0.01 |
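For rough budgeting, the per-command figures in the table can be summed and scaled to an expected volume. The sketch below is illustrative only: `COST_PER_COMMAND_USD` and `estimateCostUsd` are hypothetical names, the numbers are the approximations from the table rather than billing data, and the Pocket engine is assumed to add no API cost.

```javascript
// Approximate per-command costs in USD, copied from the table above.
const COST_PER_COMMAND_USD = {
  whisperStt: 0.0005,
  gptIntent: 0.001,
  gptSummarise: 0.004,
  openaiTts: 0.006,
};

// Estimate the total API cost for `commands` voice commands.
// Any engine other than "openai" is treated as free (assumption: local TTS).
function estimateCostUsd(commands, ttsEngine = "openai") {
  const { whisperStt, gptIntent, gptSummarise, openaiTts } = COST_PER_COMMAND_USD;
  const tts = ttsEngine === "openai" ? openaiTts : 0;
  return commands * (whisperStt + gptIntent + gptSummarise + tts);
}

console.log(estimateCostUsd(1000).toFixed(2));           // with OpenAI TTS
console.log(estimateCostUsd(1000, "pocket").toFixed(2)); // with Pocket TTS
```

With these figures, 1,000 commands come to about $11.50 with OpenAI TTS and about $5.50 with Pocket.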
Voice commands
Speak naturally — the intent parser understands full sentences, not just keywords. Examples:
Reading emails
Time filters
Navigating
Replying & composing
Filters & toggles
The filter bar at the top of the app provides quick controls that apply to every voice command.
| Control | What it does |
|---|---|
| Account pill tap | Locks the next command to a specific mailbox (e.g. "Ajishra Email"). Tap Clear to search all accounts again. |
| Unread / All toggle | Default is Unread. Switch to All to include already-read emails. If no unread emails are found, the app automatically falls back to all emails. |
| 🔊 OpenAI / Pocket toggle | Switches the TTS engine. OpenAI = best voice quality. Pocket = free local model (~200ms). Choice is remembered across sessions. |
TTS engines
| Engine | Quality | Latency | Cost | Requires internet |
|---|---|---|---|---|
| openai | Excellent (natural) | ~2s | ~$0.006/cmd | Yes |
| pocket | Good (100M params) | ~200ms | Free | No |
Change the default engine in .env:
TTS_ENGINE=openai # or: pocket
Override per-session using the toggle button in the app, or per-request via the X-TTS-Engine header.
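The three levels described above (default from .env, per-session toggle, per-request header) suggest a simple fallback chain. The following is a sketch of one plausible resolution order, not the server's actual code: `resolveTtsEngine` and its parameter names are hypothetical, and the idea that the header wins over the session choice is inferred from the word "override".

```javascript
const VALID_ENGINES = ["openai", "pocket"];

// Return the first valid engine among: request header, session choice, env default.
// Unknown values are skipped rather than rejected (an assumption of this sketch).
function resolveTtsEngine({ header, sessionChoice, envDefault } = {}) {
  for (const candidate of [header, sessionChoice, envDefault]) {
    const value = typeof candidate === "string" ? candidate.trim().toLowerCase() : "";
    if (VALID_ENGINES.includes(value)) return value;
  }
  return "openai"; // documented default for TTS_ENGINE
}

console.log(resolveTtsEngine({ header: "pocket", envDefault: "openai" }));
```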
Reconnecting accounts
If any account shows an amber Reconnect button, its token has expired or was never saved. Gmail and Outlook use different flows.
Gmail — web browser
Tap Reconnect next to the disconnected Gmail account. A new tab opens, redirects to Google sign-in, and returns automatically when done.
Gmail — iOS app
The same button works from the iOS app, but the reconnect URL is automatically routed through ASWebAuthenticationSession instead of WKWebView. See iOS OAuth.
Outlook — device code flow
Tap Reconnect on the Work Outlook account. A page opens showing a short code and a Microsoft URL. Open the URL, enter the code, sign in — the page auto-detects completion and shows a confirmation.
Gmail reconnect requirements: credentials.json must be of the Web Application type, the redirect URI https://gotenor.totesoft.com/api/email/gmail/callback must be registered in Google Cloud Console, and .env must set APP_URL=https://gotenor.totesoft.com.
POST /api/auth/register
POST Public — no auth
Create a new user account. Returns a JWT on success.
Request
POST /api/auth/register
Content-Type: application/json
{ "email": "you@example.com", "password": "yourpassword" }
Response — success (200)
{
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"user": { "id": 1, "email": "you@example.com", "plan": "free" }
}
Response — email already registered (400)
{ "detail": "Email already registered." }
POST /api/auth/login
POST Public — no auth
Sign in to an existing account. Returns a JWT.
Request
POST /api/auth/login
Content-Type: application/json
{ "email": "you@example.com", "password": "yourpassword" }
Response — success (200)
{ "token": "eyJhbGci...", "user": { "id": 1, "email": "you@example.com", "plan": "free" } }
Response — wrong credentials (401)
{ "detail": "Invalid email or password." }
Send the returned token on every authenticated request as Authorization: Bearer <token>.
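Register and login return the same success shape and report failures through a detail field, so a client can handle both with one helper. The sketch below relies only on the status codes and bodies shown above; `readAuthResponse` is a hypothetical name, and treating every non-200 status as an error is a simplification.

```javascript
// Turn an auth response (HTTP status + parsed JSON body) into a session object,
// or throw an Error carrying the server's "detail" message.
function readAuthResponse(status, body) {
  if (status === 200 && body && typeof body.token === "string") {
    return {
      token: body.token,
      user: body.user,
      // Ready-made header value for authenticated requests.
      authorization: "Bearer " + body.token,
    };
  }
  const detail = body && body.detail ? body.detail : "Unexpected response";
  const error = new Error(detail);
  error.status = status; // keep the HTTP status for callers that branch on it
  throw error;
}
```

In a client this would typically be called as `readAuthResponse(res.status, await res.json())` after a `fetch` to either endpoint.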
GET /api/auth/me
GET Requires Authorization header
Verify a stored token and fetch the current user's profile.
Request
GET /api/auth/me
Authorization: Bearer eyJhbGci...
Response
{ "id": 1, "email": "you@example.com", "plan": "free", "is_active": true }
GET /api/health
GET Public — no auth
Health check. Use this to verify the server is running.
curl https://gotenor.totesoft.com/api/health
Response
{ "status": "ok", "auth": "jwt" }
GET /api/status
GET Requires Authorization header
Returns the connection status of all configured email accounts.
Request
GET /api/status
Authorization: Bearer eyJhbGci...
Response
{
"status": "ok",
"accounts": [
{ "label": "Rojavaasam Gmail", "connected": true, "type": "gmail" },
{ "label": "Ajishra Email", "connected": false, "type": "gmail" },
{ "label": "Work Outlook", "connected": true, "type": "outlook" }
],
"total_accounts": 3,
"all_connected": false
}
Accounts with "connected": false are shown with a Reconnect button in the app UI.
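An external client can derive the same Reconnect list from the response. This is a minimal sketch using only the fields shown in the example above; `accountsNeedingReconnect` and `exampleStatus` are hypothetical names.

```javascript
// Return the accounts from a /api/status response that should show a Reconnect button.
function accountsNeedingReconnect(statusResponse) {
  const accounts = Array.isArray(statusResponse.accounts) ? statusResponse.accounts : [];
  return accounts.filter((account) => account.connected === false);
}

// Sample data copied from the response example above.
const exampleStatus = {
  status: "ok",
  accounts: [
    { label: "Rojavaasam Gmail", connected: true, type: "gmail" },
    { label: "Ajishra Email", connected: false, type: "gmail" },
    { label: "Work Outlook", connected: true, type: "outlook" },
  ],
  total_accounts: 3,
  all_connected: false,
};

console.log(accountsNeedingReconnect(exampleStatus).map((a) => a.label));
```

For the sample response, only "Ajishra Email" is returned.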
GET /api/email/accounts
GET Requires Authorization header
Extended account list including credential type and re-auth capability.
Response
{
"accounts": [
{
"label": "Ajishra Email",
"connected": false,
"type": "gmail",
"can_reauth": true,
"credential_type": "web"
}
]
}
GET /api/email/gmail/reauth
GET Public — browser redirect
Starts the Gmail OAuth flow for a given account. Redirects the browser to Google sign-in. On success, the token is saved server-side.
Query parameters
| Param | Required | Description |
|---|---|---|
| label | Yes | Account label matching one configured in .env |
| platform | No | web (default) or ios. When ios, the callback redirects to voicemail://oauth/callback so ASWebAuthenticationSession closes automatically. |
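Because account labels can contain spaces, the label should be URL-encoded when the re-auth link is built. The helper below is a sketch based on the two parameters in the table; `buildGmailReauthUrl` is a hypothetical name, and omitting `platform` for web is a choice made here because web is the documented default.

```javascript
// Build the Gmail re-auth URL for an account label.
// `platform` is "web" (default, left out of the URL) or "ios".
function buildGmailReauthUrl(baseUrl, label, platform = "web") {
  if (!label) throw new Error("label is required");
  if (platform !== "web" && platform !== "ios") {
    throw new Error("platform must be 'web' or 'ios'");
  }
  const url = new URL("/api/email/gmail/reauth", baseUrl);
  url.searchParams.set("label", label); // encodes spaces and reserved characters
  if (platform === "ios") url.searchParams.set("platform", "ios");
  return url.toString();
}

console.log(buildGmailReauthUrl("https://gotenor.totesoft.com", "Ajishra Email", "ios"));
```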
GET /api/email/outlook/reauth
GET Public — browser page
Starts the Outlook re-auth using MSAL device code flow. Returns an HTML page showing a short code and the Microsoft sign-in URL. A background task polls for completion; when the user signs in, the page updates automatically.
Flow
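1. The server calls initiate_device_flow() → gets user_code + verification_uri
2. The user opens microsoft.com/devicelogin and enters the code
3. A background task waits in acquire_token_by_device_flow()
4. The page polls /api/email/outlook/poll every 4 seconds
GET /api/email/outlook/poll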
GET Public — polled by device code page
Returns the current state of a pending Outlook device code session. Called automatically by the page returned from /outlook/reauth.
Query parameter
| Param | Description |
|---|---|
| session | Session ID returned from the /reauth page |
Responses
{ "status": "pending" } // still waiting
{ "status": "connected", "label": "Work Outlook" } // success
{ "status": "error", "detail": "expired_token: ..." } // failed
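A custom client that does not use the built-in page could poll the endpoint itself. The sketch below maps the three documented responses to an outcome and wraps them in a polling loop. `interpretPollResponse`, `pollOutlookSession`, and the `maxAttempts` cutoff are hypothetical; the 4-second interval mirrors the built-in page.

```javascript
// Map one poll response body to what the caller should do next.
function interpretPollResponse(body) {
  switch (body.status) {
    case "pending":
      return { done: false };
    case "connected":
      return { done: true, ok: true, label: body.label };
    case "error":
      return { done: true, ok: false, detail: body.detail };
    default:
      return { done: true, ok: false, detail: "unexpected status: " + body.status };
  }
}

// Poll until the session is connected, fails, or maxAttempts is reached.
async function pollOutlookSession(baseUrl, sessionId, { intervalMs = 4000, maxAttempts = 150 } = {}) {
  const url = new URL("/api/email/outlook/poll", baseUrl);
  url.searchParams.set("session", sessionId);
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(url);
    if (!res.ok) return { done: true, ok: false, detail: "HTTP " + res.status };
    const outcome = interpretPollResponse(await res.json());
    if (outcome.done) return outcome;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return { done: true, ok: false, detail: "gave up waiting for sign-in" };
}
```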
POST /api/voice
POST Requires Authorization header
The core endpoint. Accepts an audio file (multipart/form-data) and streams Server-Sent Events (SSE) back as each pipeline step completes.
Request headers
| Header | Required | Values | Description |
|---|---|---|---|
| Authorization | Yes | Bearer <token> | JWT from /api/auth/login |
| X-Account-Label | No | partial name | Restrict to one mailbox e.g. "Ajishra" |
| X-Unread-Only | No | true or false | Default: true |
| X-TTS-Engine | No | openai or pocket | Override TTS engine for this request |
Request body
POST /api/voice
Content-Type: multipart/form-data
Authorization: Bearer eyJhbGci...
X-TTS-Engine: pocket
audio=@recording.webm
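The header table maps naturally onto the filter-bar state of a client. The helper below is a sketch of that mapping; `buildVoiceHeaders` and its option names are hypothetical. Content-Type is deliberately left out: when the body is a FormData object, `fetch` sets the multipart boundary itself.

```javascript
// Build the request headers for POST /api/voice from client state.
// Optional headers are only included when a value was actually chosen.
function buildVoiceHeaders({ token, accountLabel, unreadOnly, ttsEngine }) {
  if (!token) throw new Error("token is required");
  const headers = { Authorization: "Bearer " + token };
  if (accountLabel) headers["X-Account-Label"] = accountLabel;
  if (typeof unreadOnly === "boolean") headers["X-Unread-Only"] = String(unreadOnly);
  if (ttsEngine) headers["X-TTS-Engine"] = ttsEngine;
  return headers;
}

console.log(buildVoiceHeaders({ token: "eyJhbGci...", accountLabel: "Ajishra", unreadOnly: false }));
```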
SSE response stream
The response is a stream of text/event-stream events. Parse each line starting with data: as JSON.
| Event | Fields | When sent |
|---|---|---|
| transcript | text | After Whisper transcribes (~2s) |
| processing | text | Status hint while emails are fetching |
| response | text | Full email summary text (~8s) |
| audio | data (base64), engine | Audio bytes ready to play (~12s) |
| suggestions | items[] | Reply suggestions after reading a single email |
| error | message | If any step fails |
| done | — | End of stream |
Example SSE stream
data: {"event":"transcript","text":"check my Ajishra email this week"}
data: {"event":"processing","text":"Checking your emails in Ajishra (this week)..."}
data: {"event":"response","text":"You have 2 emails this week in Ajishra..."}
data: {"event":"audio","data":"SUQzBAAA...","engine":"openai"}
data: {"event":"done"}
Consuming SSE from any frontend
const res = await fetch("https://gotenor.totesoft.com/api/voice", {
  method: "POST",
  headers: {
    "Authorization": "Bearer " + token,
    "X-TTS-Engine": "openai",
  },
  body: formData, // FormData with field name "audio"
});
if (!res.ok) throw new Error("Voice request failed: HTTP " + res.status);

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // A chunk can end mid-line, so keep the trailing partial line in the buffer.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop();
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue; // skip blank separator lines
    const event = JSON.parse(line.slice(6));
    // handle: transcript, processing, response, audio, suggestions, error, done
  }
}
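The handling step at the end of the loop can be written as a small reducer that folds each event into a result object. This is a sketch built from the event table above; `applyVoiceEvent`, `base64ToBytes`, and the state field names are hypothetical.

```javascript
// Decode the base64 payload of an "audio" event into raw bytes.
function base64ToBytes(base64) {
  const binary = atob(base64);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

// Fold one parsed SSE event into the accumulated state and return the new state.
function applyVoiceEvent(state, event) {
  switch (event.event) {
    case "transcript":
      return { ...state, transcript: event.text };
    case "processing":
      return { ...state, statusHint: event.text };
    case "response":
      return { ...state, summary: event.text };
    case "audio":
      return { ...state, audioBytes: base64ToBytes(event.data), engine: event.engine };
    case "suggestions":
      return { ...state, suggestions: event.items };
    case "error":
      return { ...state, error: event.message };
    case "done":
      return { ...state, finished: true };
    default:
      return state; // ignore event types this client does not know
  }
}
```

Inside the loop this would be used as `state = applyVoiceEvent(state, event)`, with the UI re-rendered after each update.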
Environment variables
| Variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | — | Required. Your OpenAI API key. |
| OPENAI_TTS_VOICE | nova | OpenAI TTS voice: alloy, echo, fable, onyx, nova, or shimmer |
| TTS_ENGINE | openai | Default TTS engine: openai or pocket |
| GMAIL_CREDENTIALS_1 | — | Path to Google OAuth Web Application credentials JSON (account 1) |
| GMAIL_LABEL_1 | Gmail Account 1 | Display name for account 1 |
| GMAIL_TOKEN_1 | credentials/gmail_account1_token.json | Where the OAuth token is saved after reconnect |
| OUTLOOK_CLIENT_ID | — | Azure AD app registration client ID |
| OUTLOOK_TENANT_ID | — | Azure AD tenant ID |
| DATABASE_URL | — | PostgreSQL connection: postgresql+asyncpg://user:pass@host/db |
| JWT_SECRET_KEY | — | Secret for signing JWTs. Generate: python -c "import secrets; print(secrets.token_hex(32))" |
| APP_URL | http://localhost:8000 | Public HTTPS URL. Used for OAuth redirect URIs and Stripe callbacks. Must be set correctly on the VM. |
| ADMIN_EMAIL | (first user) | Email with admin panel access. Defaults to user id=1. |
| STRIPE_SECRET_KEY | — | Optional. Enables billing features. |
| WAKE_WORD_ENABLED | true | Enable desktop wake word listener. Set false on GCP VM. |
| APP_HOST | 0.0.0.0 | Bind address for uvicorn |
| APP_PORT | 8000 | Port for uvicorn |
| INBOX_FETCH_LIMIT | 10 | Max emails fetched per inbox read command |
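Some of the constraints in the table can be checked before deployment. The script below is a standalone sketch, independent of how the server itself loads configuration; `checkEnv` is a hypothetical helper and covers only rules stated in the table (the required API key, the allowed engine and voice values, and a valid port).

```javascript
const TTS_ENGINES = ["openai", "pocket"];
const TTS_VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"];

// Check an env-style object against the table above.
// Returns a list of problems; an empty list means none of these checks failed.
function checkEnv(env) {
  const problems = [];
  if (!env.OPENAI_API_KEY) problems.push("OPENAI_API_KEY is required");

  const engine = env.TTS_ENGINE || "openai"; // documented default
  if (!TTS_ENGINES.includes(engine)) problems.push("TTS_ENGINE must be openai or pocket");

  const voice = env.OPENAI_TTS_VOICE || "nova"; // documented default
  if (!TTS_VOICES.includes(voice)) problems.push("OPENAI_TTS_VOICE is not a listed voice");

  const port = Number(env.APP_PORT || "8000"); // documented default
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    problems.push("APP_PORT must be an integer between 1 and 65535");
  }
  return problems;
}
```

Under Node.js it could be run as `checkEnv(process.env)` after the .env file has been loaded.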
Building an external frontend
Any app (React, Flutter, Swift, Android) can connect to the VoiceMail API. Follow this flow:
Step 1 — Register or log in
POST /api/auth/login
{ "email": "you@example.com", "password": "yourpassword" }
→ { "token": "eyJhbGci..." }
Store the JWT in secure storage. Attach it as Authorization: Bearer <token> on every request.
Step 2 — Check account status
GET /api/status
Authorization: Bearer eyJhbGci...
Step 3 — Send voice commands
Record audio → POST to /api/voice → parse the SSE stream as shown in the API reference above.
CORS
All origins are allowed (Access-Control-Allow-Origin: *). No additional CORS setup needed.
Supported audio formats: webm (Chrome/Android), m4a (Safari/iOS), wav, and mp3. Send the file as multipart/form-data with the field name audio.
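A browser client usually knows the recording's MIME type rather than its extension, so a small mapping helps when naming the upload. The sketch below is an assumption-heavy illustration: the MIME-to-extension table and both function names are hypothetical, and whether the server inspects the filename or the content type is not stated here.

```javascript
// Common MIME types for the four supported formats (assumed mapping).
const MIME_TO_EXTENSION = new Map([
  ["audio/webm", "webm"],
  ["audio/mp4", "m4a"],
  ["audio/x-m4a", "m4a"],
  ["audio/wav", "wav"],
  ["audio/x-wav", "wav"],
  ["audio/mpeg", "mp3"],
]);

// Map a recorder MIME type (possibly with a ";codecs=..." suffix) to an extension.
// Returns null for types outside the supported list.
function extensionForMime(mimeType) {
  const base = String(mimeType).split(";")[0].trim().toLowerCase();
  return MIME_TO_EXTENSION.get(base) || null;
}

// Wrap recorded audio in a FormData body using the field name "audio".
function buildAudioForm(blob) {
  const extension = extensionForMime(blob.type);
  if (!extension) throw new Error("Unsupported audio type: " + blob.type);
  const form = new FormData();
  form.append("audio", blob, "recording." + extension);
  return form;
}
```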
iOS TestFlight wrapper
The ios-wrapper/ directory in the repository contains a minimal native iOS app that loads the VoiceMail PWA in a full-screen WKWebView. It can be submitted to TestFlight for beta distribution.
Files
| File | Purpose |
|---|---|
| VoiceMailApp.swift | SwiftUI @main entry point |
| ContentView.swift | Full-screen WKWebView loading gotenor.totesoft.com |
| WebCoordinator.swift | Navigation delegate + Gmail OAuth interception via ASWebAuthenticationSession |
Xcode setup (5 minutes)
1. Add the files in ios-wrapper/VoiceMail/ into the project
2. Add a URL type: voicemail, Schemes: voicemail
ASWebAuthenticationSession (iOS OAuth)
Google blocks OAuth sign-in inside WKWebView as a security policy. ASWebAuthenticationSession is Apple's solution — it opens OAuth flows in a separate sandboxed Safari process that Google trusts.
How it works in VoiceMail
1. The user taps Reconnect and the WKWebView starts navigating to /api/email/gmail/reauth?label=...
2. WebCoordinator cancels the navigation and appends platform=ios to the query string
3. ASWebAuthenticationSession opens the URL in a Safari sheet
4. After sign-in, the server saves the token and redirects to voicemail://oauth/callback
5. ASWebAuthenticationSession detects the voicemail:// scheme → Safari sheet closes
6. The app calls loadStatus() → account pills refresh automatically ✓
The registered redirect URI in Google Cloud Console remains unchanged: https://gotenor.totesoft.com/api/email/gmail/callback. The voicemail:// scheme is only used for the final app callback after the token is saved, so Google never sees it.