VoiceMail — Documentation

How it works

Every voice command goes through a 5-step pipeline on the server. The server streams progress to the browser as Server-Sent Events (SSE), so the UI updates as each step completes instead of waiting for the whole pipeline to finish.

1 Whisper STT (speech to text)
Your audio recording is sent to OpenAI Whisper (whisper-1), which returns a text transcript. ~2s latency.

2 GPT-4o (intent parser)
The transcript is classified into a structured action (read_inbox, read_from, compose, reply, etc.) with parameters such as sender, account, and time range. ~1s latency.

3 Gmail API / Microsoft Graph (email fetch)
Emails are fetched from the relevant account(s) using the parsed parameters (account, sender filter, date range, unread/all). ~2–5s latency.

4 GPT-4o (email summarizer)
The raw email data is rewritten as natural spoken language and, if you ask, highlights the emails that need a reply. ~2s latency.

5 TTS (text to speech)
The summary text is converted to audio by either OpenAI TTS (~2s, best quality) or PocketTTS (~200ms, free local model). The audio is sent back to the browser as an event in the SSE stream.
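Because each stage needs the previous stage's output, the steps run strictly in order, and each SSE event can be sent as soon as its stage finishes. The JavaScript sketch below models that ordering with an async generator. It is not the server's code: the stage functions are hypothetical placeholders and the "processing" text is made up; only the event names and fields are taken from the POST /api/voice reference.

```javascript
// Sketch of the five-stage pipeline as an async generator. Each entry in
// `stages` is a hypothetical async function standing in for Whisper, the
// GPT-4o intent parser, the Gmail/Graph fetch, the GPT-4o summarizer, and
// the TTS engine. Events use the names and fields of the /api/voice stream.
async function* runVoicePipeline(audio, stages) {
  try {
    const text = await stages.transcribe(audio);            // 1. speech to text
    yield { event: "transcript", text };

    const intent = await stages.parseIntent(text);          // 2. structured action
    yield { event: "processing", text: "Checking your emails..." };

    const emails = await stages.fetchEmails(intent);        // 3. mailbox fetch
    const summary = await stages.summarize(emails, intent); // 4. spoken-style summary
    yield { event: "response", text: summary };

    const speech = await stages.speak(summary);             // 5. text to speech
    yield { event: "audio", data: speech.data, engine: speech.engine };
  } catch (err) {
    // A failing stage stops the pipeline; the sketch reports it as an error event.
    yield { event: "error", message: err instanceof Error ? err.message : String(err) };
  }
  // Assumption: the stream always ends with "done", even after an error.
  yield { event: "done" };
}
```

Iterating with `for await (const event of runVoicePipeline(audio, stages))` yields events in the order a client sees them on the wire, which is why the transcript can appear about 2s in while the summary is still being generated.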

Cost per command

Approximate OpenAI API cost per voice command (using OpenAI TTS):

API	Usage per command	Cost (USD)
Whisper STT	~5 seconds of audio	~$0.0005
GPT-4o intent	~200 tokens	~$0.001
GPT-4o summarise	~800 tokens	~$0.004
OpenAI TTS	~400 characters	~$0.006
Total		~$0.01

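Tip: Switch to PocketTTS in the app to eliminate the TTS cost, which cuts the per-command cost roughly in half (about $0.0055 instead of $0.0115 at the estimates above). PocketTTS runs locally on the server with ~200ms latency and is free.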
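The table's figures can be combined to check the total and the effect of switching engines. This small JavaScript calculation uses only the estimates above; real costs depend on the length of your audio, the number of emails summarized, and current OpenAI pricing.

```javascript
// Per-command cost estimates in USD, copied from the table above.
const COST_PER_COMMAND = {
  whisper: 0.0005,
  intent: 0.001,
  summarise: 0.004,
  openaiTts: 0.006,
};

// Total cost of one command; PocketTTS replaces the OpenAI TTS line with $0.
function commandCost(ttsEngine) {
  const { whisper, intent, summarise, openaiTts } = COST_PER_COMMAND;
  const tts = ttsEngine === "pocket" ? 0 : openaiTts;
  return whisper + intent + summarise + tts;
}

const withOpenAi = commandCost("openai");   // about $0.0115
const withPocket = commandCost("pocket");   // about $0.0055
const saving = 1 - withPocket / withOpenAi; // about 0.52, i.e. roughly half
```

At these estimates, 1,000 commands cost roughly $11.50 with OpenAI TTS and $5.50 with PocketTTS.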

Voice commands

Speak naturally — the intent parser understands full sentences, not just keywords. Examples:

Reading emails

"Read my inbox"

"Check my emails"

"Any new emails?"

"Read my Ajishra email"

"Check my Emory inbox this week"

"Emails from John"

"Anything from ETS?"

"Important emails"

"Starred emails"

"All emails including read"

Time filters

"Emails today"

"Last 3 days"

"This week"

"Last week"

"Emails from Sarah this week"

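The intent parser turns phrases like these into a time-range parameter for the email fetch. How the server resolves them is not documented here, so the JavaScript helper below is a hypothetical illustration: the filter keys (today, this_week, last_week, last_N_days), the Monday week start, and counting today as one of the "last N days" are all assumptions.

```javascript
// Hypothetical helper: turn a parsed time filter into a [start, end) range
// of local Date objects. Assumes weeks start on Monday and that
// "last N days" counts today as one of the N days.
function startOfDay(d) {
  return new Date(d.getFullYear(), d.getMonth(), d.getDate());
}

// Adds whole days and drops the time of day (callers pass midnights).
function addDays(d, n) {
  return new Date(d.getFullYear(), d.getMonth(), d.getDate() + n);
}

function timeRange(filter, now = new Date()) {
  const today = startOfDay(now);
  const daysSinceMonday = (today.getDay() + 6) % 7; // getDay(): 0 = Sunday
  const thisMonday = addDays(today, -daysSinceMonday);

  if (filter === "today") return { start: today, end: now };
  if (filter === "this_week") return { start: thisMonday, end: now };
  if (filter === "last_week") return { start: addDays(thisMonday, -7), end: thisMonday };

  const match = /^last_(\d+)_days$/.exec(filter);
  if (match) {
    const days = Number(match[1]);
    if (days < 1) throw new Error(`Invalid day count: ${filter}`);
    return { start: addDays(today, -(days - 1)), end: now };
  }
  throw new Error(`Unknown time filter: ${filter}`);
}
```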
Navigating

"Read the first one"

"Open number two"

"Next"

"Previous"

"Mark as read"

Replying & composing

"Reply to this"

"Write back and say I'll be there"

"Send an email to john@example.com about the meeting"

"Yes send it"

"Cancel"

Filters & toggles

The filter bar at the top of the app provides quick controls that apply to every voice command.

Control	What it does
Account pill tap	Locks the next command to a specific mailbox (e.g. "Ajishra Email"). Tap Clear to search all accounts again.
Unread / All toggle	Default is Unread. Switch to All to include already-read emails. If no unread emails are found, the app automatically falls back to all emails.
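Each control in the filter bar corresponds to an optional header on POST /api/voice (documented in the API reference). A custom frontend could translate its own filter state into those headers as in this JavaScript sketch; the `filters` object and its property names are hypothetical, while the header names and values come from the reference.

```javascript
// Sketch: map hypothetical filter-bar state to the optional /api/voice headers.
function voiceHeaders(token, filters = {}) {
  const headers = { Authorization: `Bearer ${token}` };

  // Account pill: restrict the command to one mailbox (a partial name is enough).
  if (filters.accountLabel) headers["X-Account-Label"] = filters.accountLabel;

  // Unread / All toggle: the server defaults to unread-only.
  headers["X-Unread-Only"] = filters.unreadOnly === false ? "false" : "true";

  // 🔊 toggle: only forward the two engines the API accepts.
  if (filters.ttsEngine === "openai" || filters.ttsEngine === "pocket") {
    headers["X-TTS-Engine"] = filters.ttsEngine;
  }
  return headers;
}
```

The result can be passed straight to `fetch(url, { method: "POST", headers: voiceHeaders(token, filters), body: formData })`.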
🔊 OpenAI / Pocket toggle	Switches the TTS engine. OpenAI = best voice quality. Pocket = free local model (~200ms). Choice is remembered across sessions.

TTS engines

Engine	Quality	Latency	Cost	Requires internet
`openai`	Excellent (natural)	~2s	~$0.006/cmd	Yes
`pocket`	Good (100M params)	~200ms	Free	No

Change the default engine in .env:

TTS_ENGINE=openai   # or: pocket

Override per-session using the toggle button in the app, or per-request via the X-TTS-Engine header.

Reconnecting accounts

If any account shows an amber Reconnect button, its token has expired or was never saved. Gmail and Outlook use different flows.

Gmail — web browser

Tap Reconnect next to the disconnected Gmail account. A new tab opens, redirects to Google sign-in, and returns automatically when done.

Gmail — iOS app

The same button works in the iOS app, but there the reconnect URL is routed automatically through ASWebAuthenticationSession instead of WKWebView, because Google blocks OAuth sign-in inside WKWebView. See ASWebAuthenticationSession (iOS OAuth) below.

Outlook — device code flow

Tap Reconnect on the Work Outlook account. A page opens showing a short code and a Microsoft URL. Open the URL, enter the code, sign in — the page auto-detects completion and shows a confirmation.

Gmail prerequisites: credentials.json must be a Web application OAuth client, the redirect URI https://gotenor.totesoft.com/api/email/gmail/callback must be registered in Google Cloud Console, and .env must set APP_URL=https://gotenor.totesoft.com.
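Because the redirect URI is built from APP_URL, a mismatch between .env and the URI registered in Google Cloud Console will break Gmail reconnects. This JavaScript sketch shows the expected derivation so the two can be compared; the helper name is hypothetical, and the HTTPS check reflects Google's rule that web-client redirect URIs use HTTPS except for localhost.

```javascript
// Sketch: derive the Gmail OAuth redirect URI from APP_URL so it can be
// compared with the URI registered in Google Cloud Console.
function gmailRedirectUri(appUrl) {
  const base = new URL(appUrl); // throws on a malformed APP_URL
  if (base.protocol !== "https:" && base.hostname !== "localhost") {
    throw new Error(`APP_URL must use https outside localhost: ${appUrl}`);
  }
  // Absolute path, so a trailing slash on APP_URL makes no difference.
  return new URL("/api/email/gmail/callback", base).toString();
}
```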

POST /api/auth/register

POST Public — no auth

Create a new user account. Returns a JWT on success.

Request

POST /api/auth/register
Content-Type: application/json

{ "email": "you@example.com", "password": "yourpassword" }

Response — success (200)

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "user": { "id": 1, "email": "you@example.com", "plan": "free" }
}

Response — email already registered (400)

{ "detail": "Email already registered." }

POST /api/auth/login

POST Public — no auth

Log in to an existing account. Returns a JWT on success.

Request

POST /api/auth/login
Content-Type: application/json

{ "email": "you@example.com", "password": "yourpassword" }

Response — success (200)

{ "token": "eyJhbGci...", "user": { "id": 1, "email": "you@example.com", "plan": "free" } }

Response — wrong credentials (401)

{ "detail": "Invalid email or password." }

Token usage: Include the JWT in every subsequent request as Authorization: Bearer <token>.
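Register and login share the same request and response shapes, so a client can use one helper for both. The JavaScript sketch below assumes a runtime with `fetch` (browsers or Node 18+); the function name and the injectable `fetchImpl` parameter are illustrative, and error messages are taken from the `detail` field shown in the 400 and 401 responses.

```javascript
// Sketch of a combined login/register helper. `fetchImpl` defaults to the
// global fetch and can be replaced with a stub in tests.
async function authenticate(baseUrl, mode, email, password, fetchImpl = fetch) {
  if (mode !== "login" && mode !== "register") throw new Error(`Unknown mode: ${mode}`);

  const res = await fetchImpl(`${baseUrl}/api/auth/${mode}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ email, password }),
  });
  const body = await res.json();

  if (!res.ok) {
    // 400 (email already registered) and 401 (wrong credentials) carry "detail".
    throw new Error(body.detail || `HTTP ${res.status}`);
  }
  return body; // { token, user: { id, email, plan } }
}
```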

GET /api/auth/me

GET Requires Authorization header

Verify a stored token and fetch the current user's profile.

Request

GET /api/auth/me
Authorization: Bearer eyJhbGci...

Response

{ "id": 1, "email": "you@example.com", "plan": "free", "is_active": true }
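A client that keeps a JWT between launches can call this endpoint first to decide whether to show the login screen. In this JavaScript sketch, treating 401/403 as "log in again" and requiring `is_active` to be true are assumptions; only the success response is documented above. The helper name is hypothetical.

```javascript
// Sketch: validate a stored JWT on app launch. Returns the user profile,
// or null when the user should log in again.
async function restoreSession(baseUrl, token, fetchImpl = fetch) {
  if (!token) return null; // nothing stored, skip the network call

  const res = await fetchImpl(`${baseUrl}/api/auth/me`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (res.status === 401 || res.status === 403) return null; // assumed rejection codes
  if (!res.ok) throw new Error(`Unexpected HTTP ${res.status} from /api/auth/me`);

  const user = await res.json();
  return user.is_active ? user : null;
}
```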

GET /api/health

GET Public — no auth

Health check. Use this to verify the server is running.

curl https://gotenor.totesoft.com/api/health

Response

{ "status": "ok", "auth": "jwt" }

GET /api/status

GET Requires Authorization header

Returns the connection status of all configured email accounts.

Request

GET /api/status
Authorization: Bearer eyJhbGci...

Response

{
  "status": "ok",
  "accounts": [
    { "label": "Rojavaasam Gmail",   "connected": true,  "type": "gmail" },
    { "label": "Ajishra Email",      "connected": false, "type": "gmail" },
    { "label": "Work Outlook",       "connected": true,  "type": "outlook" }
  ],
  "total_accounts": 3,
  "all_connected": false
}

Accounts with "connected": false are shown with a Reconnect button in the app UI.
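A client can derive the list of Reconnect buttons directly from this response. This JavaScript sketch uses only the fields shown above; the function name is hypothetical.

```javascript
// Sketch: pick out the accounts that should show an amber Reconnect button,
// using the /api/status response shape.
function accountsToReconnect(status) {
  return status.accounts
    .filter((account) => !account.connected)
    .map((account) => ({ label: account.label, type: account.type }));
}
```

The `type` field tells the client which flow to start: the Gmail redirect or the Outlook device code page.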

GET /api/email/accounts

GET Requires Authorization header

Extended account list including credential type and re-auth capability.

Response

{
  "accounts": [
    {
      "label": "Ajishra Email",
      "connected": false,
      "type": "gmail",
      "can_reauth": true,
      "credential_type": "web"
    }
  ]
}

GET /api/email/gmail/reauth

GET Public — browser redirect

Starts the Gmail OAuth flow for a given account. Redirects the browser to Google sign-in. On success, the token is saved server-side.

Query parameters

Param	Required	Description
label	Yes	Account label matching one configured in `.env`
platform	No	`web` (default) or `ios`. When `ios`, the callback redirects to `voicemail://oauth/callback` so ASWebAuthenticationSession closes automatically.

Note: This endpoint is public because the browser navigates to it directly during the OAuth flow, before a JWT can be attached.
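Account labels usually contain spaces, so the label must be URL-encoded when a client builds this link. A minimal JavaScript sketch using the standard URL API (the helper name is hypothetical):

```javascript
// Sketch: build the Gmail reconnect URL for an account label.
// URLSearchParams encodes spaces and other special characters.
function gmailReauthUrl(baseUrl, label, platform = "web") {
  const url = new URL("/api/email/gmail/reauth", baseUrl);
  url.searchParams.set("label", label);
  // "web" is the server default, so only send the parameter for iOS.
  if (platform === "ios") url.searchParams.set("platform", "ios");
  return url.toString();
}
```

In a browser, open the result with `window.open(url)` or navigate to it; since the endpoint is public, no Authorization header is needed.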

GET /api/email/outlook/reauth

GET Public — browser page

Starts Outlook re-authentication using the MSAL device code flow. Returns an HTML page showing a short code and the Microsoft sign-in URL. A background task polls Microsoft for completion, and the page updates automatically once the user signs in.

Flow

1 User taps Reconnect on the Outlook account pill

2 Backend calls MSAL initiate_device_flow() → gets user_code + verification_uri

3 HTML page shown with the code and a link to microsoft.com/devicelogin

4 Background thread polls Microsoft every few seconds via acquire_token_by_device_flow()

5 User opens the link, enters the code, signs in with Microsoft

6 Page JS polls /api/email/outlook/poll every 4 seconds

7 On success: the token is saved and the page shows ✓ Connected

Note: Public endpoint — no JWT required. No redirect URI registration needed in Azure AD (device code flow doesn't use one).

GET /api/email/outlook/poll

GET Public — polled by device code page

Returns the current state of a pending Outlook device code session. The page returned by /api/email/outlook/reauth calls this endpoint automatically.

Query parameter

Param	Description
session	Session ID from the page returned by `/api/email/outlook/reauth`

Responses

{ "status": "pending" }                             // still waiting
{ "status": "connected", "label": "Work Outlook" }   // success
{ "status": "error",     "detail": "expired_token: ..." } // failed
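The device code page already does this polling, but a native client that displays the code itself could poll the same endpoint. This JavaScript sketch follows the documented 4-second interval and the three response shapes; the function name, the injectable `fetchImpl`, the `maxAttempts` cap (150 × 4s, about 10 minutes), and how the client obtains the session ID are assumptions.

```javascript
// Sketch: poll /api/email/outlook/poll until the session leaves "pending".
async function waitForOutlook(baseUrl, session, { fetchImpl = fetch, intervalMs = 4000, maxAttempts = 150 } = {}) {
  const url = new URL("/api/email/outlook/poll", baseUrl);
  url.searchParams.set("session", session);

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchImpl(url.toString());
    if (!res.ok) throw new Error(`Poll failed: HTTP ${res.status}`);
    const result = await res.json();

    if (result.status === "connected") return result.label; // token saved server-side
    if (result.status === "error") throw new Error(result.detail);

    // Still pending: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for Microsoft sign-in");
}
```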

POST /api/voice

POST Requires Authorization header

The core endpoint. Accepts an audio file (multipart/form-data) and streams Server-Sent Events (SSE) back as each pipeline step completes.

Request headers

Header	Required	Values	Description
Authorization	Yes	Bearer <token>	JWT from /api/auth/login
X-Account-Label	No	partial name	Restrict to one mailbox e.g. "Ajishra"
X-Unread-Only	No	true | false	Default: true
X-TTS-Engine	No	openai | pocket	Override TTS engine for this request

Request body

curl -N -X POST https://gotenor.totesoft.com/api/voice \
  -H "Authorization: Bearer eyJhbGci..." \
  -H "X-TTS-Engine: pocket" \
  -F "audio=@recording.webm"

SSE response stream

The response is a stream of text/event-stream events. Parse each line starting with data: as JSON.

Event	Fields	When sent
`transcript`	text	After Whisper transcribes (~2s)
`processing`	text	Status hint while emails are fetching
`response`	text	Full email summary text (~8s)
`audio`	data (base64), engine	Audio bytes ready to play (~12s)
`suggestions`	items[]	Reply suggestions after reading a single email
`error`	message	If any step fails
`done`	—	End of stream
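Frontends typically fold these events into one piece of UI state. The JavaScript reducer below is a sketch of that pattern: event names and fields follow the table, while the state shape and clearing the status hint once the summary arrives are illustrative choices.

```javascript
// Sketch: fold /api/voice SSE events into a single UI state object.
function initialVoiceState() {
  return { transcript: "", status: "", summary: "", audio: null, suggestions: [], error: null, done: false };
}

function reduceVoiceEvent(state, event) {
  switch (event.event) {
    case "transcript":  return { ...state, transcript: event.text };
    case "processing":  return { ...state, status: event.text };
    case "response":    return { ...state, summary: event.text, status: "" };
    case "audio":       return { ...state, audio: { data: event.data, engine: event.engine } };
    case "suggestions": return { ...state, suggestions: event.items };
    case "error":       return { ...state, error: event.message };
    case "done":        return { ...state, done: true };
    default:            return state; // ignore unknown events
  }
}
```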

Example SSE stream

data: {"event":"transcript","text":"check my Ajishra email this week"}

data: {"event":"processing","text":"Checking your emails in Ajishra (this week)..."}

data: {"event":"response","text":"You have 2 emails this week in Ajishra..."}

data: {"event":"audio","data":"SUQzBAAA...","engine":"openai"}

data: {"event":"done"}

Consuming SSE from any frontend

const res = await fetch("https://gotenor.totesoft.com/api/voice", {
  method: "POST",
  headers: {
    "Authorization": "Bearer " + token,
    "X-TTS-Engine":  "openai",
    // No Content-Type: fetch sets multipart/form-data with its boundary.
  },
  body: formData,   // FormData with field name "audio"
});
if (!res.ok) throw new Error(`Voice request failed: HTTP ${res.status}`);

const reader  = res.body.getReader();
const decoder = new TextDecoder();
let   buffer  = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop();                       // keep a partial line for the next chunk
  for (const raw of lines) {
    const line = raw.replace(/\r$/, "");      // tolerate CRLF line endings
    if (!line.startsWith("data:")) continue;  // skip blank separators and comments
    const event = JSON.parse(line.slice(5).trim());
    // handle: transcript, processing, response, audio, suggestions, error, done
  }
}
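The `audio` event carries the clip as base64 rather than as a URL, so the client has to decode it before playback. In the example stream the payload starts with `SUQzBAAA`, which decodes to the ASCII bytes "ID3" (an MP3 ID3 tag); the format each engine returns is otherwise not documented here, so this JavaScript sketch sniffs the leading bytes instead of hard-coding a MIME type. The helper names are hypothetical, and `playAudioEvent` needs a browser (`Audio`, `URL.createObjectURL`).

```javascript
// Decode a base64 string into raw bytes (atob is global in browsers and Node 16+).
function base64ToBytes(b64) {
  return Uint8Array.from(atob(b64), (ch) => ch.charCodeAt(0));
}

// Guess the container from magic bytes: ID3 tag or MPEG frame sync, RIFF, Ogg.
function guessAudioMime(bytes) {
  const ascii = String.fromCharCode(...bytes.slice(0, 4));
  if (ascii.startsWith("ID3") || (bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0)) return "audio/mpeg";
  if (ascii === "RIFF") return "audio/wav";
  if (ascii === "OggS") return "audio/ogg";
  return "application/octet-stream";
}

// Browser only: play an `audio` event through a temporary object URL.
function playAudioEvent(event) {
  const bytes = base64ToBytes(event.data);
  const blob = new Blob([bytes], { type: guessAudioMime(bytes) });
  const url = URL.createObjectURL(blob);
  const player = new Audio(url);
  player.addEventListener("ended", () => URL.revokeObjectURL(url)); // free memory
  return player.play(); // a Promise; may reject if autoplay is blocked
}
```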

Environment variables

Variable	Default	Description
OPENAI_API_KEY	—	Required. Your OpenAI API key.
OPENAI_TTS_VOICE	nova	OpenAI TTS voice: alloy | echo | fable | onyx | nova | shimmer
TTS_ENGINE	openai	Default TTS engine: openai | pocket
GMAIL_CREDENTIALS_1	—	Path to Google OAuth Web Application credentials JSON (account 1)
GMAIL_LABEL_1	Gmail Account 1	Display name for account 1
GMAIL_TOKEN_1	credentials/gmail_account1_token.json	Where the OAuth token is saved after reconnect
OUTLOOK_CLIENT_ID	—	Azure AD app registration client ID
OUTLOOK_TENANT_ID	—	Azure AD tenant ID
DATABASE_URL	—	PostgreSQL connection: postgresql+asyncpg://user:pass@host/db
JWT_SECRET_KEY	—	Secret for signing JWTs. Generate: `python -c "import secrets; print(secrets.token_hex(32))"`
APP_URL	http://localhost:8000	Public HTTPS URL. Used for OAuth redirect URIs and Stripe callbacks. Must be set correctly on the VM.
ADMIN_EMAIL	(first user)	Email with admin panel access. Defaults to user id=1.
STRIPE_SECRET_KEY	—	Optional. Enables billing features.
WAKE_WORD_ENABLED	true	Enable the desktop wake word listener. Set to false on the GCP VM.
APP_HOST	0.0.0.0	Bind address for uvicorn
APP_PORT	8000	Port for uvicorn
INBOX_FETCH_LIMIT	10	Max emails fetched per inbox read command
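A quick pre-deploy check can catch the most common misconfigurations in this table. The server itself is configured through .env; this JavaScript sketch checks an environment object such as `process.env`, for example in a CI step. Treating DATABASE_URL and JWT_SECRET_KEY as required (they have no default) is an assumption, and the function name is hypothetical.

```javascript
// Sketch: return a list of configuration problems (empty when all is well).
function checkEnv(env) {
  const problems = [];

  // OPENAI_API_KEY is documented as required; the other two are assumed.
  for (const key of ["OPENAI_API_KEY", "DATABASE_URL", "JWT_SECRET_KEY"]) {
    if (!env[key]) problems.push(`${key} is not set`);
  }

  const engine = env.TTS_ENGINE ?? "openai";
  if (engine !== "openai" && engine !== "pocket") {
    problems.push(`TTS_ENGINE must be openai or pocket, not "${engine}"`);
  }

  // APP_URL feeds the OAuth redirect URIs, so it must be HTTPS outside localhost.
  try {
    const appUrl = new URL(env.APP_URL ?? "http://localhost:8000");
    if (appUrl.protocol !== "https:" && appUrl.hostname !== "localhost") {
      problems.push(`APP_URL should be a public https:// URL, not ${appUrl.href}`);
    }
  } catch {
    problems.push(`APP_URL is not a valid URL: ${env.APP_URL}`);
  }
  return problems;
}
```

Usage: `const problems = checkEnv(process.env); if (problems.length) { console.error(problems.join("\n")); process.exit(1); }`.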

Building an external frontend

Any app (React, Flutter, Swift, Android) can connect to the VoiceMail API. Follow this flow:

Step 1 — Register or log in

POST /api/auth/login
{ "email": "you@example.com", "password": "yourpassword" }
→ { "token": "eyJhbGci..." }

Store the JWT in secure storage. Attach it as Authorization: Bearer <token> on every request.

Step 2 — Check account status

GET /api/status
Authorization: Bearer eyJhbGci...

Step 3 — Send voice commands

Record audio → POST to /api/voice → parse the SSE stream as shown in the API reference above.

CORS

All origins are allowed (Access-Control-Allow-Origin: *), so no additional CORS setup is needed.

Audio formats: The API accepts any format Whisper supports — webm (Chrome/Android), m4a (Safari/iOS), wav, mp3. Send as multipart/form-data with field name audio.
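Browsers differ in which container MediaRecorder can produce (Chrome and Android typically record WebM, Safari and iOS MP4/M4A), and the filename sent with the `audio` field should match. This JavaScript sketch picks the first supported format from a preference list and builds the form; the list order and helper names are assumptions. In a browser, pass `(m) => MediaRecorder.isTypeSupported(m)` as `isSupported` and use `format.mime` as the recorder's `mimeType`.

```javascript
// Candidate recording formats, most preferred first (an assumed order).
const RECORDING_FORMATS = [
  { mime: "audio/webm;codecs=opus", ext: "webm" }, // Chrome, Android
  { mime: "audio/webm", ext: "webm" },
  { mime: "audio/mp4", ext: "m4a" },               // Safari, iOS
];

// Pick the first format the environment reports as supported.
function pickRecordingFormat(isSupported) {
  const format = RECORDING_FORMATS.find((f) => isSupported(f.mime));
  if (!format) throw new Error("No supported recording format");
  return format;
}

// Wrap the recorded Blob in FormData under the required field name "audio".
function buildVoiceForm(blob, format) {
  const form = new FormData();
  form.append("audio", blob, `recording.${format.ext}`);
  return form;
}
```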

iOS TestFlight wrapper

The ios-wrapper/ directory in the repository contains a minimal native iOS app that loads the VoiceMail PWA in a full-screen WKWebView. It can be submitted to TestFlight for beta distribution.

Files

File	Purpose
VoiceMailApp.swift	SwiftUI `@main` entry point
ContentView.swift	Full-screen WKWebView loading `gotenor.totesoft.com`
WebCoordinator.swift	Navigation delegate + Gmail OAuth interception via ASWebAuthenticationSession

Xcode setup (5 minutes)

→ File → New → Project → iOS → App (SwiftUI, Swift)

→ Delete the generated ContentView.swift and VoiceMailApp.swift

→ Drag the three Swift files from ios-wrapper/VoiceMail/ into the project

→ In Info.plist add URL Type — Identifier: voicemail, Schemes: voicemail

→ Set your Team in Signing & Capabilities

→ Product → Archive → Distribute → TestFlight ✓

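No App Store review is needed to test on your own devices: internal TestFlight testers (members of your App Store Connect team) can install a build once App Store Connect finishes processing the upload. Distributing to external testers requires Beta App Review.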

ASWebAuthenticationSession (iOS OAuth)

Google blocks OAuth sign-in inside WKWebView as a security policy. ASWebAuthenticationSession is Apple's solution — it opens OAuth flows in a separate sandboxed Safari process that Google trusts.

How it works in VoiceMail

1 User taps Reconnect in the PWA

2 WKWebView tries to load /api/email/gmail/reauth?label=...

3 WebCoordinator cancels the navigation and appends platform=ios to the query string

4 ASWebAuthenticationSession opens the URL in a Safari sheet

5 User signs in on Google (trusted Safari, not WKWebView)

6 Google → backend callback → token saved → backend redirects to voicemail://oauth/callback

7 ASWebAuthenticationSession detects voicemail:// scheme → Safari sheet closes

8 App calls loadStatus() → account pills refresh automatically ✓

The registered redirect URI in Google Cloud Console remains unchanged: https://gotenor.totesoft.com/api/email/gmail/callback. The voicemail:// scheme is only used for the final app callback after the token is saved — Google never sees it.
