IVR Telephone Voting System - Technical Design Document
1. Executive Summary
This document outlines the technical design for an IVR (Interactive Voice Response) telephone voting system for the Sequent Voting Platform. The system will be deployed in Canada and will allow voters without internet access to participate in elections via telephone.
Key Design Decisions
- Lambda Runtime: Rust (consistent with existing codebase)
- IVR Provider: Amazon Connect with Contact Flows
- State Management: DynamoDB for ephemeral call session state; phone-number routing in a versioned S3 file (§6.2)
- Authentication: Keycloak OIDC Direct Grant (ROPC) with configurable multi-factor authentication
- Election Config: Published ballot publication on public S3 (same data as voting portal)
- Election Status: Hasura GraphQL for real-time status checks
- Vote Casting: Harvest API for vote submission
2. Architecture Overview
2.1 Component Responsibilities
| Component | Responsibility |
|---|---|
| Amazon Connect | Receive calls, play prompts via Polly, capture DTMF input, route to Lambda |
| IVR Lambda | State machine logic, prompt generation, input validation, API orchestration |
| DynamoDB | Ephemeral call session state, keyed by contact_id; read-write with conditional-write guards (§4.1) |
| S3 (versioned, private) | Phone number → cluster/environment/tenant/event routing file (§6.2). Read-only from the Lambda; written only by gitops CI |
| Public S3 | Published ballot publication: election structure, ballot styles, contests, candidates, IVR flow config, prompts, IVR-only spoken-text overrides, public keys (same data used by voting portal in preview mode) |
| Keycloak | Voter authentication via OIDC Direct Grant (ROPC) with configurable auth factors, JWT issuance |
| Hasura | Real-time election event status query, plus the voter's already-cast-ballot listing for re-entry after a dropped call (both use the same GraphQL surface and row-level permissions as the voting portal) |
| Harvest API | Cast votes via /insert-cast-vote |
3. Config-Driven Flow Engine
3.0 Design Principle
The IVR call flow is not a hardcoded state machine. It is a configurable pipeline of phases defined in the election event's presentation.ivr.flow configuration and published to S3. The Lambda ships with execution engines for a finite set of phase types, but which phases run, in what order, and with what settings is entirely configuration.
This means:
- Adding a declaration step, receipt readback, or phone blacklist check = config change, not code change
- Removing phases for a simpler deployment = config change
- Reordering phases = config change
- Adding a new phase type (e.g., ranked-choice input) = code change (new execution engine)
3.1 Flow Configuration
The flow is an ordered array of phases stored in presentation.ivr.flow:
{
"ivr": {
"flow": [
{ "phase": "blacklist_check" },
{ "phase": "language_select" },
{ "phase": "announcement", "name": "welcome", "prompt_key": "greeting" },
{ "phase": "auth" },
{ "phase": "eligibility_check" },
{ "phase": "announcement", "name": "declaration", "prompt_key": "declaration_text", "accept_key": "2" },
{ "phase": "announcement", "name": "pre_voting_statement", "prompt_key": "pre_voting_statement" },
{ "phase": "ballot_loop", "receipt_format": "phonetic_hex_4" },
{ "phase": "goodbye" }
]
}
}
A simpler deployment (voter ID + PIN, no frills):
{
"ivr": {
"flow": [
{ "phase": "language_select" },
{ "phase": "announcement", "name": "welcome", "prompt_key": "greeting" },
{ "phase": "auth" },
{ "phase": "ballot_loop" },
{ "phase": "goodbye" }
]
}
}
Same Lambda code, different config.
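As a minimal sketch of how the flow array could map onto typed phases inside the Lambda: the `Phase` enum, `simple_flow`, and `next_cursor` below are hypothetical names, and JSON deserialization is omitted so the example needs only the standard library. Only the phase-type strings are taken from the configuration above.

```rust
/// One entry of `presentation.ivr.flow`. The variant and field names are
/// illustrative; only the phase-type strings come from the configuration.
#[derive(Debug, Clone, PartialEq)]
pub enum Phase {
    BlacklistCheck,
    LanguageSelect,
    Announcement {
        name: String,
        prompt_key: String,
        accept_key: Option<char>,
    },
    Auth,
    EligibilityCheck,
    BallotLoop { receipt_format: Option<String> },
    Goodbye,
}

impl Phase {
    /// The "phase" string this variant corresponds to in the JSON.
    pub fn type_name(&self) -> &'static str {
        match self {
            Phase::BlacklistCheck => "blacklist_check",
            Phase::LanguageSelect => "language_select",
            Phase::Announcement { .. } => "announcement",
            Phase::Auth => "auth",
            Phase::EligibilityCheck => "eligibility_check",
            Phase::BallotLoop { .. } => "ballot_loop",
            Phase::Goodbye => "goodbye",
        }
    }
}

/// The simpler deployment, built by hand instead of parsed from JSON.
pub fn simple_flow() -> Vec<Phase> {
    vec![
        Phase::LanguageSelect,
        Phase::Announcement {
            name: "welcome".to_string(),
            prompt_key: "greeting".to_string(),
            accept_key: None,
        },
        Phase::Auth,
        Phase::BallotLoop { receipt_format: None },
        Phase::Goodbye,
    ]
}

/// Index of the phase that follows `cursor`, or `None` at the end of the flow.
pub fn next_cursor(flow: &[Phase], cursor: usize) -> Option<usize> {
    let next = cursor.checked_add(1)?;
    if next < flow.len() {
        Some(next)
    } else {
        None
    }
}
```

Because the engine only walks the vector, a different deployment is a different vector; no branch in the engine depends on which phases are present.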
3.2 Phase Types
Each phase type has an execution engine in the Lambda. The engine handles prompting, input collection, validation, and API calls for that phase.
| Phase Type | Description | Input | Behavior |
|---|---|---|---|
| announcement | Play a prompt, optionally wait for an acceptance key | None (auto-advance) or DTMF if accept_key set | Play the configured prompt_key. If accept_key is set, wait for that DTMF and retry on invalid input up to max_retries. If not, auto-advance. Used for greeting, declaration, pre-voting statement, and any other play-and-continue or play-and-confirm prompts: one engine, different config. The executor only considers accept_key matches; * never reaches it because the dispatcher replays the last prompt before dispatch (§3.5.3, §3.4) |
| language_select | Language selection menu | DTMF if more than 1 enabled language | If language_conf.enabled_language_codes contains exactly 1 language, set it automatically and advance without prompting. Otherwise collect DTMF (1=English, 2=French, etc.), set session language, advance |
| blacklist_check | Check caller phone against blacklist | None (auto-advance) | Query Hasura (see §6.3) for a blacklist entry matching the caller phone number; if present, play blacklist_message and disconnect. Because this phase runs before language selection, the message should be authored to work before the caller has chosen a language, typically by making it bilingual |
| auth | Collect credentials, authenticate with Keycloak | DTMF per step | Iterate through auth steps discovered via Keycloak's /realms/{realm}/ivr-config endpoint (see §5.1), submit to Keycloak ROPC. On failure, retry up to the limit. (OTP over IVR is a possible future extension; see §5.1.4.) |
| eligibility_check | Validate voter eligibility and election status | None (auto-advance) | Play eligibility_check prompt. Check voter eligibility via API; if ineligible, play not_eligible and disconnect. Also query Hasura for telephone_voting_status (see §5.2); if not OPEN, play election_closed and disconnect |
| ballot_loop | Per-election voting cycle: select → confirm → submit → receipt | DTMF | The inner voting loop (see §3.3). For each election: vote all contests, read back summary, confirm, encrypt and submit ballot via Harvest API, read a ballot locator derived from the first 4 hex characters of ballot_id using phonetic spelling (a3f2 → "alpha three foxtrot two"). Then advance to next election or finish. All behavior driven by published election/contest data |
| goodbye | Farewell message, disconnect | None (disconnect) | Play goodbye prompt, disconnect |
Note on the announcement phase. Three previously separate phase types (welcome, declaration, pre_voting_statement) are all the same pattern: play a prompt, optionally wait for a key, advance. Collapsing them into one engine replaces three execution paths, three test surfaces, and three config schemas with one of each. Each instance in the flow carries a name field so logs and metrics remain distinguishable (name: "welcome", name: "declaration", etc.).
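The single announcement engine can be sketched as one pure function. This is an illustration under assumptions: the `Outcome` variants and the `run_announcement` signature are hypothetical, and what the flow does once the retry budget is exhausted (`GiveUp`) is not specified here, so the sketch only reports it.

```rust
/// Result of one announcement step. Variant names are illustrative.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// Move to the next phase in the flow.
    Advance,
    /// Prompt has been played; wait for the acceptance key.
    AwaitKey,
    /// Wrong key: replay the prompt and wait again.
    Reprompt { retries_used: u32 },
    /// Retry budget exhausted (follow-up behavior is an assumption).
    GiveUp,
}

/// Per-instance configuration taken from the flow entry.
pub struct Announcement {
    pub name: &'static str,
    pub prompt_key: &'static str,
    pub accept_key: Option<char>,
    pub max_retries: u32,
}

/// `input` is `None` on first entry and `Some(digit)` once the caller has
/// pressed a key. `*` never arrives here; the dispatcher handles it.
pub fn run_announcement(cfg: &Announcement, input: Option<char>, retries_used: u32) -> Outcome {
    match (cfg.accept_key, input) {
        // Play-and-continue: no key expected.
        (None, _) => Outcome::Advance,
        // Play-and-confirm, nothing pressed yet.
        (Some(_), None) => Outcome::AwaitKey,
        (Some(key), Some(digit)) if digit == key => Outcome::Advance,
        (Some(_), Some(_)) if retries_used < cfg.max_retries => Outcome::Reprompt {
            retries_used: retries_used + 1,
        },
        (Some(_), Some(_)) => Outcome::GiveUp,
    }
}
```

The welcome, declaration, and pre-voting statement instances differ only in the `Announcement` value passed in, which is the point of collapsing them.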
Overall Phase Flow
The following diagram shows the complete end-to-end IVR call flow through all configured phases. Each box corresponds to a phase type from the table above. Diamond nodes represent phases where the call may terminate early.
Per-Election Submission Cycle (inside ballot_loop)
After all contests in one election are voted, the ballot loop enters the per-election submission sub-phases: ElectionSummary → ElectionSubmit → ElectionReceipt. Only after the ballot for the current election is submitted does the voter proceed to the next election or finish.
3.3 Ballot Loop (Inner Flow)
The ballot_loop phase is the most complex. Rather than implementing it as a single monolith, it is decomposed into sub-phases — each one a small, testable unit. The outer ballot_loop phase engine advances through sub-phases like a mini flow engine within the main flow.
All behavior is driven by the published election/contest data — the same structures the voting portal reads. The IVR Lambda honors the same config fields:
3.3.1 Config Fields Consumed by the Ballot Loop
| Config Field | Source | IVR Behavior |
|---|---|---|
| skip_election_list | ElectionEventPresentation | If true, there is only 1 election, and that election is still selectable for this voter (not already cast with num_allowed_revotes = 0; see §9.3): skip the election-selection sub-phase and go straight into that election's language check / intro / contest loop. If the single election is not selectable at ballot-loop entry (typically a re-entry after a prior submission), the skip is not applied: ElectionSelect runs so the voter hears the "already voted" announcement and can exit via 0 instead of being dropped into a ballot loop for a closed election. Matches the voting portal's behavior when the election is selectable |
| elections_order | ElectionEventPresentation | Sort elections before presenting: alphabetical (by alias/name), custom (by sort_order), random (shuffled once at session init) |
| contests_order | ElectionPresentation | Sort contests within an election: alphabetical, custom, random |
| candidates_order | ContestPresentation | Sort candidates within a contest: alphabetical, custom, random. Determines DTMF assignment order |
| blank_vote_policy | ContestPresentation | allowed: offer blank ballot confirmation. warn/warn-only-in-review: play warning then allow. not-allowed: require at least one selection |
| under_vote_policy | ContestPresentation | allowed: accept silently. warn/warn-and-alert: play warning before confirming. warn-only-in-review: warn during summary only |
| language_conf | ElectionPresentation | If the election's enabled/default language differs from the session language, offer a per-ballot language switch. If exactly 1 language is enabled for the election, select it automatically without prompting |
| min_votes / max_votes | Contest | Enforce selection count. max_votes=1 → stop after 1 selection. min_votes>0 + blank_vote_policy=not-allowed → force selection |
| is_explicit_invalid | CandidatePresentation | Excluded from the numbered DTMF list (IVR has no "invalid vote" affordance; invalid ballots cannot be cast via phone by design) |
| is_explicit_blank | CandidatePresentation | Excluded from the numbered DTMF list, but reachable through the reserved 0 key when the voter wants to cast blank for the contest. See §3.3.5 for the 0-key decision tree (explicit-blank selection vs. implicit blank via min_votes = 0 vs. rejection) |
3.3.2 Ballot Loop Sub-Phases
The ballot loop is a nested state machine with three levels: election → contest → candidate selection. After all contests in an election are voted, the voter reviews, confirms, and submits the ballot for that election before moving to the next one. Each level has its own sub-phases:
3.3.3 Sub-Phase Descriptions
| Sub-Phase | Input | Behavior |
|---|---|---|
| ElectionSelect | DTMF (election index, or 0 = exit ballot loop) | Present sorted elections (by elections_order) with each election annotated as either "already voted" or selectable, based on the voter's cast-vote history read through CastVoteHistoryPort (§3.5.2, §9.3). Already-voted elections are announced but not selectable when num_allowed_revotes = 0. Single-digit if ≤9, multi-digit otherwise. Pressing 0 exits the ballot loop and advances to the next outer phase (typically goodbye); this is the escape hatch for voters whose elections are all already voted or not currently open. Skipped at entry only if skip_election_list=true, only 1 election, and that election is selectable; otherwise ElectionSelect runs so the voter can hear the state and exit cleanly (§3.3.1, §9.3) |
| LanguageSwitch | DTMF (1=keep, 2=switch) if multiple languages are available | Offer only if the election's language_conf differs from the current effective_language() (§3.3.4). If the election exposes exactly 1 enabled language, switch automatically without prompting. Scope is per election by construction: LanguageSwitch writes to BallotLoopState.election_language_override, not to session.language; the override is read via effective_language() = override.unwrap_or(session.language) and is cleared automatically by advance_to_election (§3.3.6) on the next election-boundary transition. So if election A is bilingual and the voter switches to French on A, the override is dropped the moment the loop advances to election B, and effective_language() falls back to the event-level session.language; B's own LanguageSwitch then decides independently. Runs before ElectionIntro so the intro is read in the correct language. Invariant: an election's language_conf.enabled_language_codes is always a subset of the election event's; additionally an election may override the default_language_code, so "differs from current effective_language()" means either the effective language is not in the election's enabled set, or the election's default differs from the current effective language. Both cases trigger the offer; otherwise skip |
| ElectionIntro | None (auto-advance) | Play election_intro prompt with {election_name}, announce contest count. Rendered in effective_language() (§3.3.4), so it picks up an override from the preceding LanguageSwitch automatically and falls back to session.language otherwise |
| ContestIntro | None (auto-advance) or DTMF to repeat | Play contest_intro with {contest_name}, {max_votes}, {min_votes}. Explain rules: "Select up to {max_votes} candidates" |
| CandidateSelect | DTMF per candidate | Present only unselected candidates sorted by candidates_order. Single-digit (1-9) or multi-digit (01-99#) based on remaining count. Accumulate selections until max_votes is reached or the voter signals done with # (the Connect terminator). 0 means "skip/abstain" from this contest, never "end multi-select". Its interaction with pending_selections is fully specified in §3.3.5 and matches the voting-portal behavior. Already-selected candidates are omitted from the list (DTMF numbers are reassigned to remaining candidates) |
| SelectionCheck | DTMF (confirm/restart) | Validate selections against min_votes/max_votes. Apply blank_vote_policy: if no selections and allowed → blank_ballot_confirm; if not-allowed → re-prompt. Apply under_vote_policy: if under minimum and warn → play warning, then confirm |
| VoteConfirm | DTMF (1=confirm, 2=change) | Read back selected candidates. "You selected {candidate_name} for {contest_name}. Press 1 to confirm, 2 to change your selection" |
| ElectionSummary | DTMF (00# = submit, NN# = edit contest N) | Read back all selections for the current election, numbering each contest. "For contest 1, {contest_name}: you selected {candidate_name}. For contest 2, …" Press 00# to submit this election's ballot, or press a contest number followed by # to edit that contest's selection. Editing a contest goes through enter_contest_edit (§3.3.4), which atomically clears the prior votes[contest_id], clears pending_selections, and marks the edit target, then re-enters CandidateSelect for that contest only; afterwards it returns directly to ElectionSummary (not to the next contest). This matters for max_votes > 1: the voter re-makes all selections for that contest; no pre-edit selections carry over. Note: the summary is its own explicit confirmation; there is no separate ElectionConfirm step before submission. The summary uniformly uses multi-digit input regardless of contest count: contest indices always take the form 01#–NN#, and 00# is the unambiguous submit code (contest numbering starts at 1, so 00 cannot collide) |
| ElectionSubmit | None (auto-advance) | Refresh access token if needed, encrypt ballot with election public keys, POST /insert-cast-vote with election_id. On success → play vote_success, advance to ElectionReceipt. On per-election rejection from Harvest (revote limit reached, channel closed, etc.; see §5.4 for the full variant list) → play the matching error prompt, advance to next election. On fatal error (timeout, session expired) → disconnect |
| ElectionReceipt | DTMF (*=repeat) | Read a ballot locator derived from the first 4 hex characters of ballot_id, rendered phonetically (a3f2 → "alpha three foxtrot two"). "Your ballot locator for {election_name} is {confirmation_number}. Press star to repeat." Skipped if receipt_format is not configured. Portal dependency: the voting portal ballot locator lookup must be scoped to the authenticated voter and current election, so uniqueness only needs to hold within that smaller set |
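The phonetic readback used by ElectionReceipt can be sketched as a small pure function. The function names are hypothetical, and the word list beyond the documented example (a3f2 → "alpha three foxtrot two") is an assumption: NATO-style words for a–f and plain English digit names.

```rust
/// Spoken form of one hex character, or `None` if it is not a hex digit.
/// Only "alpha", "three", "foxtrot" and "two" are confirmed by the design
/// example; the remaining words are assumed to follow the same convention.
fn spoken_symbol(c: char) -> Option<&'static str> {
    let word = match c.to_ascii_lowercase() {
        '0' => "zero",
        '1' => "one",
        '2' => "two",
        '3' => "three",
        '4' => "four",
        '5' => "five",
        '6' => "six",
        '7' => "seven",
        '8' => "eight",
        '9' => "nine",
        'a' => "alpha",
        'b' => "bravo",
        'c' => "charlie",
        'd' => "delta",
        'e' => "echo",
        'f' => "foxtrot",
        _ => return None,
    };
    Some(word)
}

/// Ballot locator: the first 4 hex characters of `ballot_id`, spelled out.
/// Returns `None` if the id is shorter than 4 characters or not hex.
pub fn ballot_locator(ballot_id: &str) -> Option<String> {
    let words: Vec<&'static str> = ballot_id
        .chars()
        .take(4)
        .map(spoken_symbol)
        .collect::<Option<Vec<_>>>()?;
    if words.len() == 4 {
        Some(words.join(" "))
    } else {
        None
    }
}
```

Returning `None` rather than a partial locator keeps a malformed ballot_id from being read out as if it were valid.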
3.3.4 BallotLoopState (Session Cursor)
The ballot loop's position is tracked in PhaseState::BallotLoop, which acts as a cursor into the nested election→contest→sub-phase structure.
The ballot-loop cursor carries:
- Position: current election index, current contest index, and the current sub-phase (a typed enum, which gives the dispatcher exhaustive coverage; see the sub-phase list in §3.3.2).
- Sorted ID snapshots: sorted election IDs (computed once on entry using elections_order), sorted contest IDs for the current election (refreshed on election change), sorted candidate IDs for the current contest (refreshed on contest change). The candidate sort stays stable for the whole contest; CandidateSelect just skips already-selected IDs when reading the list, so the underlying order and DTMF mapping do not change.
- Pending selections: an accumulator used by multi-selection contests (max_votes > 1).
- election_list_skipped: records whether ElectionSelect was bypassed via skip_election_list; VoteConfirm / ElectionSummary consult this to decide whether to offer navigation back to the election list.
- edit_target_contest (Option<usize>): set when the voter enters a contest via ElectionSummary "edit contest N". When present, VoteConfirm returns to ElectionSummary instead of advancing to the next contest, and clears the field.
- election_language_override (Option<Language>): scoped override for the current election frame, set by the inner LanguageSwitch sub-phase (or auto-set when the election exposes exactly one enabled language that differs from session.language). Read path: every prompt lookup inside the ballot loop goes through effective_language() = election_language_override.unwrap_or(session.language); ballot-loop sub-phases never read session.language directly. Write path: session.language is the event-level choice and is never mutated by LanguageSwitch; only the override is written. Reset: the override is cleared as part of the single advance_to_election(state, next_index) helper that also refreshes the sorted contest IDs and zeroes the contest cursor on an election-boundary transition (see §3.3.6). This makes the §3.3.3 promise ("switch affects prompts for this election only") true by construction: when the loop moves to election B, effective_language() naturally falls back to session.language, and election B's own LanguageSwitch then decides whether to set a new override.
Edit-entry invariant. Every transition from ElectionSummary into CandidateSelect for editing contest N MUST atomically (a) remove the prior votes[contest_id] entry for that contest, (b) clear pending_selections, and (c) set edit_target_contest = Some(N). This is especially important for max_votes > 1 contests, where a forgotten reset would let pre-edit selections silently merge with new ones — the voter hears the edit prompt, makes fewer selections than before, and the ballot ends up with a union of the two sets instead of only the new one.
The invariant is enforced by a single helper — enter_contest_edit(state: &mut BallotLoopState, contest_index: usize) — that owns all three mutations. No other code path may construct the edit transition by mutating these fields individually. The sub-phase dispatcher calls this helper on every NN# branch out of ElectionSummary; no other caller should exist. A unit test asserts that after enter_contest_edit(_, N), all three post-conditions hold, so the first forgetful refactor that open-codes the transition fails the test before it reaches review.
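A minimal sketch of the helper and its three post-conditions follows. The struct is reduced to the fields the invariant touches; the `Result` return type, the `EditError` enum, and the choice to treat `contest_index` as a zero-based index into the sorted contest IDs are assumptions made for the example.

```rust
use std::collections::HashMap;

pub type ContestId = String;
pub type CandidateId = String;

/// Reduced cursor: only the fields involved in the edit-entry invariant.
#[derive(Debug, Default)]
pub struct BallotLoopState {
    pub sorted_contest_ids: Vec<ContestId>,
    pub votes: HashMap<ContestId, Vec<CandidateId>>,
    pub pending_selections: Vec<CandidateId>,
    pub edit_target_contest: Option<usize>,
}

#[derive(Debug, PartialEq)]
pub enum EditError {
    /// The requested index does not name a contest in this election.
    UnknownContest,
}

/// The only place allowed to build the ElectionSummary -> CandidateSelect
/// edit transition. All three mutations happen together or not at all.
pub fn enter_contest_edit(
    state: &mut BallotLoopState,
    contest_index: usize,
) -> Result<(), EditError> {
    // Validate first so a bad index leaves the state untouched.
    let contest_id = state
        .sorted_contest_ids
        .get(contest_index)
        .cloned()
        .ok_or(EditError::UnknownContest)?;

    state.votes.remove(&contest_id); // (a) drop the prior vote for this contest
    state.pending_selections.clear(); // (b) no pre-edit selections carry over
    state.edit_target_contest = Some(contest_index); // (c) mark the edit target
    Ok(())
}
```

Validating the index before mutating anything means a rejected request cannot leave the cursor half-reset, which is the same "one place, one invariant" argument made above.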
3.3.5 Candidate Selection Detail
Candidate presentation follows the same ordering as the voting portal (candidates_order), then assigns DTMF mappings. The rule is simple: if the contest has ≤ 9 candidates, each gets a single-digit code 1–9; if there are more, all candidates get zero-padded two-digit codes (01, 02, … 99). The choice is per-contest, not global — a short contest with 5 candidates keeps the fast single-digit UX even when the next contest has 20. 0 is never a candidate code — it is reserved for "skip/abstain" — and * is never a candidate code — it is reserved for "repeat instructions". See §3.4 for the full reserved-key table.
Candidates flagged is_explicit_invalid or is_explicit_blank in CandidatePresentation are excluded from the numbered DTMF list — no single-digit or NN# code is assigned to them, so a voter can never select them by candidate number. They are still present in the underlying candidates array, and the explicit-blank candidate (if any) is reachable only through the reserved 0 key.
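The code-assignment rule can be sketched as follows. The type and function names are hypothetical, and one detail is an assumption: the ≤ 9 threshold is applied to the candidates that remain after explicit-invalid and explicit-blank entries are excluded.

```rust
/// Minimal candidate view; field names follow CandidatePresentation.
pub struct Candidate {
    pub id: String,
    pub is_explicit_invalid: bool,
    pub is_explicit_blank: bool,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum InputMode {
    /// Codes "1".."9", captured on a single keypress.
    SingleDigit,
    /// Codes "01".."99", terminated by '#'.
    TwoDigitWithPound,
}

/// Assigns DTMF codes to the already-sorted candidates of one contest.
/// Returns `(mode, [(code, candidate_id)])`, or `None` above 99 candidates.
pub fn assign_codes(sorted: &[Candidate]) -> Option<(InputMode, Vec<(String, String)>)> {
    // Explicit-invalid and explicit-blank candidates never get a number.
    let selectable: Vec<&Candidate> = sorted
        .iter()
        .filter(|c| !c.is_explicit_invalid && !c.is_explicit_blank)
        .collect();
    if selectable.len() > 99 {
        return None;
    }
    let mode = if selectable.len() <= 9 {
        InputMode::SingleDigit
    } else {
        InputMode::TwoDigitWithPound
    };
    let codes = selectable
        .iter()
        .enumerate()
        .map(|(i, c)| {
            let n = i + 1; // 0 is reserved for skip/abstain
            let code = match mode {
                InputMode::SingleDigit => n.to_string(),
                InputMode::TwoDigitWithPound => format!("{:02}", n),
            };
            (code, c.id.clone())
        })
        .collect();
    Some((mode, codes))
}
```

Because the mode is derived per call, a 5-candidate contest and a 20-candidate contest in the same election each get the input style that suits them.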
0 semantics in CandidateSelect (voting-portal parity). In a max_votes > 1 contest the voter may already have accumulated some selections in pending_selections before pressing 0. The behavior mirrors the voting portal's current contest-selection rules exactly — one decision tree, three branches, evaluated in order:
- Explicit-blank candidate exists in the contest (any candidate with is_explicit_blank = true). Pressing 0 clears pending_selections, records that single explicit-blank candidate as the sole selection for the contest, and advances to SelectionCheck. This matches how the portal "Select None / Blank" button works when the ballot defines an explicit blank option: selecting blank replaces whatever the voter had picked; it does not co-exist with other selections.
- No explicit-blank candidate, and min_votes = 0 (implicit blank is allowed). Pressing 0 clears pending_selections and advances to SelectionCheck with zero selections, which SelectionCheck then routes through blank_vote_policy (§3.3.3): allowed → blank_ballot_confirm, warn → warning then confirm, not-allowed → reject back to CandidateSelect.
- No explicit-blank candidate, and min_votes > 0. Pressing 0 is rejected inline: replay the candidate prompt with a short "you must select at least {min_votes} candidates" preamble, without modifying pending_selections. The voter keeps whatever they had already picked.
The reason for the order: branch 1 is a hard contest-level decision (the ballot author declared that an explicit-blank slot exists; picking it is an affirmative choice, not an omission) and must take priority over any per-voter policy interpretation in branch 2. Branch 3 exists because pressing 0 in a contest that requires selections is almost always a keypad slip — rejecting inline lets the voter continue rather than forcing them to re-enter from the top.
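The three branches, evaluated in order, can be sketched as a single function. `ContestView`, `ZeroKeyOutcome`, and `on_zero_key` are hypothetical names; the routing through blank_vote_policy that follows branch 2 belongs to SelectionCheck and is not modeled here.

```rust
/// What the 0 key resolved to. Variant names are illustrative.
#[derive(Debug, PartialEq)]
pub enum ZeroKeyOutcome {
    /// Branch 1: the explicit-blank candidate becomes the sole selection.
    SelectExplicitBlank { candidate_id: String },
    /// Branch 2: zero selections, handed to SelectionCheck.
    ImplicitBlank,
    /// Branch 3: rejected inline; selections are left untouched.
    Rejected { min_votes: u32 },
}

/// The two contest facts the decision depends on.
pub struct ContestView<'a> {
    pub explicit_blank_id: Option<&'a str>,
    pub min_votes: u32,
}

pub fn on_zero_key(contest: &ContestView<'_>, pending: &mut Vec<String>) -> ZeroKeyOutcome {
    // Branch 1 takes priority: an explicit blank replaces prior picks.
    if let Some(id) = contest.explicit_blank_id {
        pending.clear();
        pending.push(id.to_string());
        return ZeroKeyOutcome::SelectExplicitBlank {
            candidate_id: id.to_string(),
        };
    }
    // Branch 2: implicit blank is possible only when nothing is required.
    if contest.min_votes == 0 {
        pending.clear();
        return ZeroKeyOutcome::ImplicitBlank;
    }
    // Branch 3: treat as a keypad slip and keep what the voter had.
    ZeroKeyOutcome::Rejected {
        min_votes: contest.min_votes,
    }
}
```

Keeping the decision in one function with an explicit outcome type is also what would make a later move to a shared policy engine a call-site change rather than a behavior change.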
Forward reference — Ballot Policy Engine. The three-branch decision above is authored as a short, self-contained block in the CandidateSelect executor today. Longer-term it is meant to be expressed through the Ballot Policy Engine described in meta#6557, which will centralize contest-level validation and selection-transform rules across the voting portal, IVR, and admin portal so that "what does 0 do" has exactly one implementation rather than one-per-client. When the BPE lands, the IVR executor's branch 1/2/3 dispatch collapses into a single BPE call with a BlankIntent input; the user-visible behavior is unchanged. Until then, the IVR implementation matches the portal's current behavior literally to avoid a divergence that the BPE migration would later have to reconcile.
3.3.6 Shared LanguageSelector Component
The outer LanguageSelect phase (event-level) and the inner LanguageSwitch sub-phase (per-election override) share the same selection logic: if only one language is enabled, select it automatically; otherwise offer the enabled set and collect a DTMF digit. What differs is where the result is written — and that is the point of keeping a single shared component with a scope argument:
- Event scope (outer LanguageSelect): reads the event's language_conf, writes session.language. Runs exactly once per call.
- Election scope (inner LanguageSwitch): reads the election's language_conf, writes BallotLoopState.election_language_override. Runs once per election iteration of the ballot loop. Never writes session.language.
Implement once as a helper parameterized by scope and have both the outer phase engine and the ballot-loop sub-phase dispatch to it. One implementation, one set of tests, two call sites.
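One way to shape such a helper, as a sketch only: the selection logic is a pure function over the enabled language codes, and the scope argument decides where the result is written. `SelectorStep`, `Scope`, `Languages`, `select_language`, and `write_selection` are hypothetical names, and the digit-to-language mapping (1 = first enabled code) is an assumption.

```rust
/// Result of one pass through the shared selector.
#[derive(Debug, Clone, PartialEq)]
pub enum SelectorStep {
    /// A language was chosen (automatically or by DTMF).
    Selected(String),
    /// More than one language is enabled and no digit has arrived yet.
    Prompt(Vec<String>),
    /// The digit does not map to an enabled language.
    Invalid,
}

#[derive(Debug, Clone, Copy)]
pub enum Scope {
    Event,
    Election,
}

/// The two places a selection can land.
#[derive(Debug, Default)]
pub struct Languages {
    pub session_language: Option<String>,
    pub election_language_override: Option<String>,
}

/// Shared logic: auto-select a lone language, otherwise map digit N to the
/// N-th enabled code.
pub fn select_language(enabled: &[String], digit: Option<u8>) -> SelectorStep {
    if enabled.len() == 1 {
        return SelectorStep::Selected(enabled[0].clone());
    }
    match digit {
        None => SelectorStep::Prompt(enabled.to_vec()),
        Some(d) => match usize::from(d).checked_sub(1).and_then(|i| enabled.get(i)) {
            Some(code) => SelectorStep::Selected(code.clone()),
            None => SelectorStep::Invalid,
        },
    }
}

/// The only scope-dependent step: where the chosen code is stored.
pub fn write_selection(scope: Scope, langs: &mut Languages, code: String) {
    match scope {
        Scope::Event => langs.session_language = Some(code),
        Scope::Election => langs.election_language_override = Some(code),
    }
}
```

With the write isolated in `write_selection`, the rule that the election scope never touches session.language is visible in a single match arm.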
Election-boundary reset — advance_to_election. A single helper owns every election-boundary transition inside the ballot loop: on entering the ballot loop for the first time, and whenever ElectionSelect picks a different election (or the loop auto-advances after submit). The helper sets the new election index, refreshes the sorted contest IDs, zeroes the contest cursor, clears pending_selections and edit_target_contest, and clears election_language_override. The language reset sits here alongside the other per-election cursor fields for the same reason enter_contest_edit owns the per-contest reset (§3.3.4) — one place, one invariant, one unit test. The dispatcher must not open-code election transitions; a forgetful refactor that mutated the index directly would leak the prior election's language override into the next one, silently re-introducing the exact leak this section was written to prevent.
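A sketch of the boundary helper together with the effective_language() read path. The struct is reduced to the per-election cursor fields, and one liberty is taken for self-containment: the new election's sorted contest IDs are passed in as a third argument instead of being computed from the publication, so the signature differs from the two-argument form named above.

```rust
/// Reduced cursor: only the fields reset at an election boundary.
#[derive(Debug, Default)]
pub struct BallotLoopState {
    pub election_index: usize,
    pub sorted_contest_ids: Vec<String>,
    pub contest_index: usize,
    pub pending_selections: Vec<String>,
    pub edit_target_contest: Option<usize>,
    pub election_language_override: Option<String>,
}

impl BallotLoopState {
    /// Read path for every prompt lookup inside the ballot loop:
    /// the per-election override if set, else the event-level language.
    pub fn effective_language<'a>(&'a self, session_language: &'a str) -> &'a str {
        self.election_language_override
            .as_deref()
            .unwrap_or(session_language)
    }
}

/// Single owner of election-boundary transitions.
/// `sorted_contests` is passed in here for illustration only.
pub fn advance_to_election(
    state: &mut BallotLoopState,
    next_index: usize,
    sorted_contests: Vec<String>,
) {
    state.election_index = next_index;
    state.sorted_contest_ids = sorted_contests;
    state.contest_index = 0;
    state.pending_selections.clear();
    state.edit_target_contest = None;
    // Dropping the override is what scopes LanguageSwitch to one election.
    state.election_language_override = None;
}
```

A unit test that sets every field to a non-default value, calls the helper, and checks each one is the natural counterpart to the enter_contest_edit test.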
3.4 Multi-Digit DTMF Input Handling
Amazon Connect supports multi-digit DTMF collection, which allows menus with more than 9 options:
Single-Digit Mode (1-9 options):
- Immediate capture after single keypress
- Best UX: "Press 1 for Alice, Press 2 for Bob..."
- Use for: Language selection, most contests
Multi-Digit Mode (10-99 options):
- Collect 2 digits terminated by pound key (#)
- Prompts: "Enter the two-digit candidate number followed by pound"
- Example: "Candidate 01: Alice Smith, Candidate 02: Bob Johnson... Candidate 15: Zoe Martinez"
- Amazon Connect "Get customer input" block configured with "Maximum digits: 2" and terminator: "#"
Reserved keys (uniform across every phase and sub-phase). Each reserved key has exactly one meaning everywhere it appears — there are no context-dependent overloads:
| Key | Meaning | Notes |
|---|---|---|
| * | Repeat instructions | Intercepted by the flow-engine dispatcher (§3.5.3) before any phase executor is invoked: the dispatcher replays session.last_response (§4.1) and returns without advancing the cursor. Phase executors never see * and must not handle it themselves; this is the mechanism that makes "uniform across every phase" enforceable rather than a per-phase convention. Never a candidate number, never a contest number, never a terminator. Safe on every phone keypad |
| 0 | Skip/abstain the current item | In a contest: skip/abstain, gated by EBlankVotePolicy and rejected if not-allowed. Interaction with in-progress selections in a max_votes > 1 contest is defined in §3.3.5 (voting-portal parity: select the explicit-blank candidate if one exists, else clear selections when min_votes = 0, else reject). On ElectionSelect: skip the election selection entirely, which exits the ballot loop and advances to the next outer phase. In both cases the semantic is "I don't want to make a selection here"; the behavior is context-appropriate but the meaning is uniform. Never doubles as "end of multi-select" |
| # | Terminator for multi-digit input | Matches the Connect "Get customer input" block terminator. Also ends accumulation in a multi-select contest once max_votes selections have been made or the voter has entered fewer than max_votes and wants to stop |
| 00# | Submit on ElectionSummary | Unambiguous because contest numbering starts at 1, so 00 cannot collide with a contest index |
| 01#–NN# | Edit contest N on ElectionSummary | Always multi-digit on summary, regardless of contest count: one rule, no edge cases |
Single digits 1–9 are always candidate numbers (in single-digit mode)
or are rejected (in multi-digit mode, where only two-digit entries are
valid). Under this convention there are no collisions between candidate
selection, contest editing, submit, skip, and repeat.
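The ElectionSummary codes from the table can be sketched as a small parser. The names are hypothetical, and the sketch assumes the Lambda receives the collected digits without the trailing # terminator; that detail depends on how the Connect block is configured.

```rust
/// What a summary entry means. Variant names are illustrative.
#[derive(Debug, PartialEq)]
pub enum SummaryAction {
    /// "00": submit this election's ballot.
    Submit,
    /// "01".."NN": edit the contest with that 1-based number.
    EditContest(usize),
    /// Anything else: re-prompt.
    Invalid,
}

/// `digits` is the input collected before the '#' terminator (assumption).
/// `*` never reaches this function; the dispatcher intercepts it earlier.
pub fn parse_summary_input(digits: &str, contest_count: usize) -> SummaryAction {
    // The summary always expects exactly two digits, whatever the count.
    if digits.len() != 2 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return SummaryAction::Invalid;
    }
    // Two ASCII digits always parse; fall back to Invalid defensively.
    let n: usize = match digits.parse() {
        Ok(n) => n,
        Err(_) => return SummaryAction::Invalid,
    };
    match n {
        0 => SummaryAction::Submit,
        n if n <= contest_count => SummaryAction::EditContest(n),
        _ => SummaryAction::Invalid,
    }
}
```

Since contest numbering starts at 1, the `0 => Submit` arm cannot shadow a contest, which is the no-collision property stated above.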
Practical Limits:
- 1-9 candidates: Single-digit input (optimal UX)
- 10-30 candidates: Two-digit input acceptable
- >30 candidates: Consider pagination or warn that phone voting may not be suitable
- >99 candidates: Not supported via phone (usability limit, not technical)
Implementation Notes:
- Lambda detects option count and instructs Connect whether to use single or multi-digit mode
- Prompts adapt based on mode: "Press 1" vs "Enter zero one followed by pound"
- Listing >20 candidates takes several minutes; consider pagination or summary mode
3.5 Hexagonal Architecture & Flow Engine
The IVR Lambda follows hexagonal architecture (ports & adapters). The domain logic (flow engine, phase engines, ballot loop) has zero knowledge of AWS, DynamoDB, S3, or HTTP. All external dependencies are behind port traits, with concrete adapters injected at startup.
3.5.1 Architecture Overview
3.5.2 Ports
Ports are the seams between domain logic and the outside world. Each port has one external dependency behind it and a narrow responsibility. The exact trait signatures and method shapes are an implementation decision — below is what each port is for and what guarantees it must preserve, not a prescription for how to spell it in Rust.
| Port | Backed by | Responsibility | Must preserve |
|---|---|---|---|
| Session | DynamoDB | Load, save, delete per-call session state keyed by contact_id | Conditional writes on every path (see §4.1): attribute_not_exists(contact_id) on create, version = :expected on update. One mechanism, applied uniformly — no read-then-write TOCTOU inside the adapter |
| Auth | Keycloak | Exchange collected credentials for tokens; refresh tokens | Never persist credentials in the port; tokens carry an absolute expiry, not a relative expires_in |
| ElectionConfig | Public S3 | Fetch the published ballot publication pinned to a specific publication_id | Process-level cache keyed by (tenant_id, event_id, publication_id) so concurrent calls share one copy |
| ElectionStatus | Hasura | Query real-time per-channel voting status | Requires a voter JWT (same auth model as the portal) |
| CastVoteHistory | Hasura | List ballots already cast by the authenticated voter in the current event; list the per-election num_allowed_revotes needed to decide whether re-entry is possible | Row-level scoping via JWT voter claims — mirrors the portal's GetCastVotes / GetElections so the IVR sees exactly what the portal would show the same voter. Distinct from ElectionStatus because the question ("what has this voter cast?") and the callers (ballot-loop entry vs. eligibility check) are different — shared Hasura adapter wiring, separate port trait |
| VoteCasting | Harvest | Submit an encrypted ballot | Must carry a deterministic idempotency key so retries can't double-submit (§4.1 blockquote) |
| PhoneConfig | S3 object (versioned bucket) | Map caller_phone → tenant/event/URLs | Read-only from the Lambda — the IAM execution role has s3:GetObject on this one object and nothing else on this bucket; no PutObject, no DeleteObject. Lookups resolve against a process-cached copy of the file (§6.2) |
| Blacklist | Hasura (+ service-account JWT via Keycloak client_credentials, see §6.3) | Yes/no answer for a phone number before auth | Authenticated query — not an anonymous endpoint. Service token comes from the platform IVR service client (shared client_id / client_secret installed identically in every IVR-enabled realm; secret in Secrets Manager), fetched through a TokenManager::get_service_token(realm) path that is separate from the voter ROPC path (§5.1.9) |
| PhoneHasher | AWS Secrets Manager | Produce (hash, salt_gen) for a raw E.164 phone number scoped to a tenant_id, for CloudWatch logging | Signature is hash(tenant_id, e164) -> (hash, salt_gen) — salt is per-tenant so rotation can align with each tenant's election calendar (§9.2.1). Per-container HashMap<TenantId, (Salt, SaltGen)> cache, no TTL; a new salt takes effect on cold start. Lambda must never log the raw E.164 — raw values live only in the in-flight DynamoDB session and the Hasura blacklist table |
Ports are separate; shared backends share one adapter. Three of the ports above route to Hasura — ElectionStatus, CastVoteHistory, and Blacklist — and they are distinct ports because their access patterns diverge (different query set, different JWT principal, different call timing: pre-auth for Blacklist, post-auth for the other two). But underneath, all three adapter implementations share a single HasuraClient per Lambda container — one reqwest::Client, one connection pool, one retry/backoff config, one circuit-breaker and metric surface. The port traits stay unaware of each other; the adapter structs each hold an Arc<HasuraClient> and differ only in which GraphQL document they send and which TokenManager they pull the JWT from (voter ROPC for ElectionStatus / CastVoteHistory, get_service_token(realm) for Blacklist).
This is called out explicitly because the naive reading of "one port, one adapter" leads to three separate HTTP clients — which would mean 3× connection pools to Hasura, three independent retry budgets firing in parallel when Hasura hiccups, and three places to keep TLS / timeout / tracing config in sync. One shared HasuraClient avoids all of that without compromising the port separation that makes the code testable. The same pattern applies to any future port that reaches Hasura: add a new trait, reuse the client.
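The shared-client pattern can be sketched as follows. This is a minimal illustration, not the actual adapter code: the `HasuraClient` here is a hypothetical stand-in that only counts requests (a real one would own the reqwest::Client, retry config, and metrics), and the trait method names `status` and `lookup` are assumptions made for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the shared client. It only counts requests
/// and echoes what it was asked to send.
pub struct HasuraClient {
    requests: AtomicUsize,
}

impl HasuraClient {
    pub fn new() -> Self {
        Self { requests: AtomicUsize::new(0) }
    }

    /// Pretends to send a GraphQL document with a bearer token.
    pub fn send(&self, document: &str, jwt: &str) -> String {
        self.requests.fetch_add(1, Ordering::SeqCst);
        format!("{document} as {jwt}")
    }

    pub fn request_count(&self) -> usize {
        self.requests.load(Ordering::SeqCst)
    }
}

/// Two of the Hasura-backed ports. Neither trait knows about the other.
pub trait ElectionStatusPort {
    fn status(&self, voter_jwt: &str) -> String;
}

pub trait BlacklistPort {
    fn lookup(&self, service_jwt: &str, phone: &str) -> String;
}

/// Adapters differ only in the document they send and the token they use.
pub struct HasuraElectionStatus {
    client: Arc<HasuraClient>,
}

pub struct HasuraBlacklist {
    client: Arc<HasuraClient>,
}

impl ElectionStatusPort for HasuraElectionStatus {
    fn status(&self, voter_jwt: &str) -> String {
        self.client.send("query ElectionStatus", voter_jwt)
    }
}

impl BlacklistPort for HasuraBlacklist {
    fn lookup(&self, service_jwt: &str, phone: &str) -> String {
        self.client.send(&format!("query Blacklist({phone})"), service_jwt)
    }
}

/// Wires both adapters onto one client, as the handler would at start-up.
pub fn wire() -> (Arc<HasuraClient>, HasuraElectionStatus, HasuraBlacklist) {
    let client = Arc::new(HasuraClient::new());
    let status = HasuraElectionStatus { client: Arc::clone(&client) };
    let blacklist = HasuraBlacklist { client: Arc::clone(&client) };
    (client, status, blacklist)
}
```

Adding a further Hasura-backed port in this shape means one new trait and one new adapter struct holding another clone of the same `Arc`.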
Three domain types are referenced by the ports but deliberately left abstract in this document because the right definition depends on what the implementer chooses to reuse from sequent-core:
- Published ballot publication. The subset of the S3 publication JSON the IVR reads: event, sorted elections/contests/candidates, crypto config. Start from the portal's existing published-ballot types in packages/sequent-core rather than inventing a new one.
- Encrypted ballot. The in-memory representation the Lambda builds before calling /insert-cast-vote. Must match the portal's ciphertext + proof layout so server-side acceptance rules do not diverge; mirror sequent-core::ballot.
- Auth credentials. What the Lambda hands to the Auth port. Should be a narrow tagged type (one case per step kind: voter-id, password/PIN, DoB, …), not a HashMap<String, String>, so that the port signature documents the contract. New step kinds (e.g. OTP, if ever added; see §5.1.4) show up as new cases.
Adapter implementations are free to add methods (batch queries, streaming, etc.) as long as the responsibility above stays intact. Tests substitute in-memory adapters; the handler wires the live ones.
3.5.3 Domain: Flow Engine
Key concept — the Lambda is stateless. Every invocation loads the session from DynamoDB (including the cursor into the flow pipeline), executes exactly one phase, saves the updated session, and responds. There is no in-memory state that survives between invocations other than the process-level publication cache.
The flow engine's job is small:
- Intercept the reserved * = repeat key before dispatch. If the incoming LambdaInput is Dtmf("*") and the session has a cached last_response (§4.1), return that cached response unchanged: the phase executor is not invoked, the cursor is not advanced, and no session fields are mutated other than version. This makes the §3.4 reserved-key promise ("* repeats instructions uniformly across every phase and sub-phase") true by construction: no phase executor sees *, so no phase executor can forget to honor it. If there is no cached response (e.g., * arrives on the very first turn before any input-expecting prompt has been rendered), fall through to normal dispatch; the phase executor may treat it as an invalid input per its own rules.
- Look up the current phase from the pipeline using the cursor in session state.
- Dispatch to the right phase executor. A typed (tagged-enum-style) pipeline makes the dispatch exhaustive: unknown phase tags fail at deserialization time, never mid-call.
- Cache the response, then return it unchanged from the phase executor. If the returned response has expect_input = true, it is stored in session.last_response so the next turn's * interception has something to replay. Auto-advancing responses (expect_input = false) are not cached: there is nothing to repeat yet, and the next input-expecting turn will overwrite the slot.
The engine itself owns no state. It borrows the flow pipeline, the prompt resolver, and the published ballot publication for the duration of the invocation. Phase executors are pure functions of (session, input, ports) → (new session, response); all external effects happen through ports. Phase executors must not list * in their own per-phase valid_inputs handling or treat * as invalid input — the dispatcher owns it before the executor is called; adding * to valid_inputs on the outgoing response is the dispatcher's responsibility too, so every input-expecting prompt accepts * automatically.
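The interception and caching rules above can be sketched as a small dispatcher. This is an illustration under simplifying assumptions: `Session`, `Input`, and `Response` are cut-down stand-ins for IvrSession, LambdaInput, and ConnectResponse, and `run_phase` is a hypothetical toy executor that replaces the real phase lookup and dispatch.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct Response {
    pub prompt_key: String,
    pub expect_input: bool,
}

#[derive(Clone, Debug, PartialEq)]
pub enum Input {
    NoInput,
    Dtmf(String),
}

#[derive(Debug, Default)]
pub struct Session {
    pub phase_index: usize,
    pub last_response: Option<Response>,
}

/// Toy executor (hypothetical): every phase asks for input and advances
/// the cursor on any DTMF it is shown.
fn run_phase(session: &mut Session, input: &Input) -> Response {
    if let Input::Dtmf(_) = input {
        session.phase_index += 1;
    }
    Response {
        prompt_key: format!("phase_{}", session.phase_index),
        expect_input: true,
    }
}

pub fn dispatch(session: &mut Session, input: &Input) -> Response {
    // Reserved key: replay the cached response without calling the executor.
    if *input == Input::Dtmf("*".to_string()) {
        if let Some(cached) = &session.last_response {
            return cached.clone();
        }
        // Nothing cached yet: fall through to normal dispatch.
    }

    // Phase lookup and dispatch, collapsed into the single toy executor.
    let response = run_phase(session, input);

    // Only input-expecting responses are worth replaying.
    if response.expect_input {
        session.last_response = Some(response.clone());
    }
    response
}
```

Because the `*` check runs before `run_phase`, the executor never observes the reserved key once a prompt has been cached, which is the property the section relies on.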
Phase context: PhaseCtx<'a>. A struct of &'a dyn Port references (one field per port) plus non-port environment (publication, prompts, clock). Every phase executor has the same signature fn(&mut IvrSession, &LambdaInput, &PhaseCtx<'_>) -> PhaseResult; dispatch is a mechanical match on the phase enum. Rejected alternatives: a generic PhaseCtx<S, A, …> (9+ type parameters in exchange for a static-dispatch gain that would not be measurable in I/O-bound code) and a single "god trait" (which collapses the one-port-one-responsibility rule from §3.5.2). Constraint: every port trait MUST be object-safe, meaning no generic methods, &self receivers only, and async via async_trait. Port traits are naturally written that way anyway. Test doubles are hand-rolled fakes behind dyn Port.
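A minimal sketch of the PhaseCtx shape with one executor and hand-rolled fakes. The two traits, the `eligibility_check` body, the `voting_closed` prompt key, and the reduced `IvrSession` and `PhaseResult` types are all assumptions made for illustration; only the executor signature follows the text above.

```rust
/// Hypothetical object-safe environment and port traits.
pub trait Clock {
    fn now_unix(&self) -> i64;
}

pub trait ElectionStatusPort {
    fn is_open(&self, election_id: &str) -> bool;
}

/// One `&dyn` field per dependency; no type parameters.
pub struct PhaseCtx<'a> {
    pub clock: &'a dyn Clock,
    pub election_status: &'a dyn ElectionStatusPort,
}

#[derive(Debug, Default)]
pub struct IvrSession {
    pub election_id: String,
    pub last_checked_at: Option<i64>,
}

pub enum LambdaInput {
    NoInput,
    Dtmf(String),
}

#[derive(Debug, PartialEq)]
pub enum PhaseResult {
    Advance,
    Disconnect { prompt_key: &'static str },
}

/// Every executor shares this signature, so dispatch is a plain match.
pub fn eligibility_check(
    session: &mut IvrSession,
    _input: &LambdaInput,
    ctx: &PhaseCtx<'_>,
) -> PhaseResult {
    session.last_checked_at = Some(ctx.clock.now_unix());
    if ctx.election_status.is_open(&session.election_id) {
        PhaseResult::Advance
    } else {
        PhaseResult::Disconnect { prompt_key: "voting_closed" }
    }
}

/// Hand-rolled fakes for tests.
pub struct FixedClock(pub i64);

impl Clock for FixedClock {
    fn now_unix(&self) -> i64 {
        self.0
    }
}

pub struct FakeStatus {
    pub open: bool,
}

impl ElectionStatusPort for FakeStatus {
    fn is_open(&self, _election_id: &str) -> bool {
        self.open
    }
}
```

A test builds a `PhaseCtx` from the fakes and calls the executor directly; the live handler would build the same struct from the real adapters.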
3.5.4 Domain: Ballot Loop Phase (Sub-Phase Dispatch)
The ballot-loop phase is itself a tiny flow engine one level down: it holds a sub-phase cursor (which sub-phase of the loop is active) and dispatches to the matching sub-phase executor. On first entry — when the previous outer phase has just transitioned into the loop — it initializes the cursor: computes the sorted election IDs using elections_order, decides whether to skip ElectionSelect (per §3.3.1 skip_election_list), reads the voter's cast-vote history through CastVoteHistoryPort so subsequent sub-phases can distinguish already-voted elections from eligible ones (§9.3), and seeds sub-phase state.
Sub-phase executors follow the same pure-function shape as outer phases. Most of them only need the session, the input, prompts, and the published publication. ElectionSelect additionally reads from the CastVoteHistoryPort (to annotate already-voted elections), and ElectionSubmit is the only one that reaches the Auth and VoteCasting ports.
Sub-phase transitions (what advances to what, when the loop goes back to ElectionSummary vs. forward to the next contest, how edit_target_contest interacts with VoteConfirm, and how the enter_contest_edit helper is the single owner of the edit-entry invariant in §3.3.4) are fully specified in §3.3; the dispatch code itself is mechanical.
Two dispatchers by design, not by accident. The outer dispatcher (§3.5.3) and this one are not unified into a single generic dispatcher, even though both take the shape (cursor, input) → (new_cursor, response). They dispatch different kinds of flow: the outer flow is a configurable linear pipeline (admin-editable at publication time, cursor is phase_index: usize, reserved-key interception for * lives at this level); the ballot-loop flow is a closed state machine (fixed sub-phase set modelling "cast one ballot", non-linear transitions, never sees * because the outer dispatcher has already consumed it). A unified dispatcher would need generics over cursor shape, transition kind, and port-context width — machinery that hides the difference rather than expressing it. The sub-phase set is not a configuration surface, so adding new outer phase types does not reopen this design.
3.5.5 Driving Adapter: Lambda Handler
The handler is thin — it does not contain business logic, it wires things together:
- Read contact_id and the optional user_input DTMF from the Connect event.
- Load or create the session via the Session port. Create uses ConditionExpression: attribute_not_exists(contact_id), symmetric with the version = :expected guard on the update path (§4.1); a concurrent creator surfaces as SessionRaced and is handled by the same reload-and-decide policy. On create, look up the caller phone in the phone-config file (S3, §6.2) and snapshot the URLs/realm into the session so later phases don't re-read the routing config.
- Fetch the published publication via the ElectionConfig port, pinned to the session's publication_id (§5.1.8).
- Construct the flow engine from the publication and invoke it.
- Save the session through the Session port (with optimistic concurrency, see §4.1) and return the response to Connect.
Errors bubble up through IvrError; the handler's only job with them is turning the "presented-to-voter" errors (§8.2) into a response whose prompt + should_disconnect match the error's intent, and logging the internal errors.
3.5.6 Testing Strategy
The pure-function shape of phase executors is the lever — every interesting scenario can be driven as (session_in, input) → (session_out, response):
- Phase / sub-phase unit tests. Construct a session, call the executor with in-memory adapters, assert on the resulting session cursor and response prompt key. No DynamoDB, S3, or HTTP.
- Record-and-replay session tests. Because every turn is deterministic, a full call is a list of (input, expected_prompt_key, expected_expect_input, expected_disconnect) tuples. Client IVR specs (Barrie-style) become replay fixtures checked in alongside the code; regressions fail at CI time against a known-good script. (See §15.2.)
- Text-in / text-out harness. The same pure-function shape lets the engine run without Amazon Connect at all (stdin/stdout via a CLI, a fixture file, or a hosted endpoint), substituting only the Connect adapter. Used for automated scenarios and the step-ivr CLI for manual walkthroughs and reproducing production issues. (See §15.2.1.) The admin portal is deliberately not a consumer of this harness in the initial release (§7.4); it remains a text-only editor.
- Contract tests at port boundaries. The /ivr-config response shape (§5.1.2) is verified by running a real Keycloak with a representative flow and asserting the JSON matches the Lambda's parser. (See §15.3.)
- End-to-end tests exercise the Connect contact flow, Polly, and the live Harvest/Hasura/Keycloak stack.
The developer picks the concrete mock / trait-double style (mockall, hand-rolled fakes, wiremock for HTTP-level fakes, etc.) per port.
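The record-and-replay idea can be sketched with a toy engine. Everything here is hypothetical: the three-step flow, the prompt keys, and the `replay` helper exist only to show how a scripted call reduces to a list of tuples checked turn by turn.

```rust
#[derive(Debug, PartialEq)]
pub struct Response {
    pub prompt_key: &'static str,
    pub expect_input: bool,
    pub should_disconnect: bool,
}

#[derive(Default)]
pub struct Session {
    pub step: u8,
}

/// Toy engine (hypothetical flow): welcome, then a menu that accepts "1",
/// then goodbye.
pub fn turn(session: &mut Session, input: Option<&str>) -> Response {
    match (session.step, input) {
        (0, _) => {
            session.step = 1;
            Response { prompt_key: "welcome", expect_input: true, should_disconnect: false }
        }
        (1, Some("1")) => {
            session.step = 2;
            Response { prompt_key: "enter_pin", expect_input: true, should_disconnect: false }
        }
        (1, _) => Response { prompt_key: "invalid_input", expect_input: true, should_disconnect: false },
        _ => Response { prompt_key: "goodbye", expect_input: false, should_disconnect: true },
    }
}

/// (input, expected_prompt_key, expected_expect_input, expected_disconnect)
pub type Step = (Option<&'static str>, &'static str, bool, bool);

/// Replays a script against a fresh session. On divergence, returns the
/// index of the first step whose response did not match.
pub fn replay(script: &[Step]) -> Result<(), usize> {
    let mut session = Session::default();
    for (i, (input, prompt_key, expect_input, disconnect)) in script.iter().enumerate() {
        let got = turn(&mut session, *input);
        let want = Response {
            prompt_key: *prompt_key,
            expect_input: *expect_input,
            should_disconnect: *disconnect,
        };
        if got != want {
            return Err(i);
        }
    }
    Ok(())
}
```

A fixture file would deserialize into the same `Step` list, so a client spec change shows up as a failing step index rather than a vague end-to-end failure.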
3.5.7 Why Hexagonal Architecture
- Testable — Domain logic tested with mock ports; no DynamoDB/S3/HTTP in unit tests
- Portable — Same domain could run in a different runtime (e.g., local CLI for testing) by swapping adapters
- Isolated changes — Switching from DynamoDB to Redis = new adapter, zero domain changes. Adding a new external service = new port + adapter
- Phase engines are pure — Given session state + input, produce new state + response. No side effects except through ports
- Config-driven — Flow composition is data, not code. Adding/removing phases = config change
- Ballot behavior from source of truth — Contest rules (blank, decline, min/max, ordering) read from published election data, same as voting portal
3.6 Channel-Specific Voting Periods
Phone voting can have independent start/stop times from online voting, following the same pattern as KIOSK and EARLY_VOTING channels.
What must exist. ElectionEventStatus in sequent-core already carries per-channel status + period pairs for voting (online), kiosk, and early_voting. The IVR feature adds a fourth channel — telephone — using the same shape: a VotingStatus field plus a PeriodDates field. Hasura permissions, admin-portal UI, and any code that iterates over channels must be updated to treat the new channel uniformly with the others; there is nothing IVR-specific about the status/period representation itself.
This allows administrators to configure phone voting hours independently (e.g., phone voting 9am–5pm, online voting 24/7). TELEPHONE is selected at the authorization layer directly from the JWT azp claim (ivr-voting); the full AzpClient → VotingStatusChannel mapping — kiosk straight from azp, portal clients fanning out into ONLINE vs EARLY_VOTING via the area's early-voting window — lives in Appendix C.7. See also §5.2 and sequent_core::services::authorization::authorize_voter_election.
4. Data Models
4.1 DynamoDB Session State Table
Table Name: ivr-voting-sessions
Primary Key: contact_id (Amazon Connect Contact ID)
Design principle. The session is per-contact and stays well under DynamoDB's 400 KB item-size limit: anything large (prompts, elections, candidates, event presentation) lives in the process-level publication cache keyed by (tenant_id, election_event_id, publication_id). The publication for a large municipality (dozens of contests × hundreds of candidates × multiple language bundles) can exceed 400 KB on its own, so duplicating it per call would be both wasteful and fragile. See §5.1.8 for the publication-discovery flow.
Concurrency & idempotency. A given contact moves through its Connect flow strictly sequentially — Connect does not issue overlapping Lambda invocations for the same contact_id, and it does not auto-retry a synchronous Invoke Lambda block (unlike async Event-type invocations, Connect's sync calls fail over to the Error branch rather than being retried). The races that matter are therefore not Lambda-vs-Lambda inside one call; they are Lambda-vs-its-backends and Lambda-vs-other-callers:
- Harvest partial completion. The handler encrypts and submits a ballot, Harvest writes it and commits, then the response is lost: the Lambda times out mid-flight, the socket drops, or the process is OOM-killed. Connect follows the Error branch, HandleError runs on the next turn, and, absent a defense, it might resubmit, silently recording a second ballot for the voter. This is the common-case race, and it is a property of any non-idempotent HTTP backend, not of Connect.
- External invokers of a live session row. The step-ivr CLI (§3.5.6), the text-in/text-out replay harness, and diagnostic replays during incident response can all re-enter the handler against a contact_id that also has a live Connect call. They must fail safely instead of clobbering state.
- Defense in depth against Connect edges we don't model. Transfers, holds, and future Connect features could introduce interleavings the current design does not anticipate. A cheap conditional-write guard is durable against "something we didn't think of."
A naive get_session → mutate → save_session in any of those scenarios would let the later write silently clobber the earlier one — potentially dropping a selection, double-submitting a vote already accepted by Harvest, or advancing the phase cursor to a position the voter never reached. Three layers prevent that:
- Conditional writes on every SessionPort mutation. IvrSession carries a version: u64 (see struct below) bumped on every update, and the DynamoDB PutItem is guarded by ConditionExpression: version = :expected. The create path is guarded by the same mechanism, using a different precondition: ConditionExpression: attribute_not_exists(contact_id). One write model ("put succeeds only if the precondition holds") is applied uniformly to create and update, with no read-then-write window inside the adapter where a concurrent creator or updater could slip through unnoticed.

  The create-path guard is belt-and-suspenders. A given contact_id should not see two concurrent cold-starts in production: Connect runs the contact flow strictly sequentially for one contact and does not auto-retry the synchronous Invoke Lambda block. The guard exists to protect against the same class of scenarios that motivates the update-path guard: external invokers against a live contact_id (the step-ivr CLI, replay harnesses, diagnostic re-runs), contact-flow authoring mistakes that fork two Invoke-Lambda branches before init completes, and the general "something we didn't model" category. It adds no extra round trip (it is a single DynamoDB condition on a write that happens anyway), stays consistent with the update-path pattern, and removes an otherwise-silent race class from the adapter contract.

  Both guards surface a lost race internally as IvrError::SessionRaced, which is never presented to the voter as a user-facing prompt: the scenarios these guards defend against are not voter-caused, so "please try again" would be both confusing and pointless. Instead, the handler applies a reload-and-decide policy:

  - On SessionRaced, re-get_session to see what the winning writer committed.
  - If the reloaded position has already advanced past this invocation's starting cursor, the other writer did our work for us: drop silently and return a no-op response, logging the conflict with full context (who the winning writer was, if derivable). Ignore-and-log is the correct answer for every race this defends against.
  - If the reloaded session is still at our starting cursor but its version changed under us, retry the write exactly once against the fresh version. A second SessionRaced on the same turn indicates a degenerate situation that should not occur in production: log at error level with full context and return the generic system_error prompt with disconnect.

  The result: SessionRaced has no voter-visible prompt, and the voter never hears a "please try again" for something they did not cause. The only voter-visible error that can come out of this path is system_error on the second-failure arm, which is already generic, already disconnects, and already covers arbitrary internal faults (§8.2).

- Encrypt-once, resubmit-same for vote idempotency. This is the defense against the Harvest partial-completion race. ballot_id is the SHA-256 of the encrypted ballot content; Harvest recomputes it and rejects any mismatch with BallotIdMismatch, so the Lambda cannot simply pick a deterministic ID. Instead, the Lambda encrypts each election's ballot exactly once per session and caches the encrypted payload + its ballot_id in the session (a per-election slot on IvrSession). An ElectionSubmit retry after a timeout resubmits the cached payload verbatim: same ciphertext → same hash → same ballot_id → Harvest's existing revote check (CheckRevotesFailed / InsertFailedExceedsAllowedRevotes when max_revotes = 1, see §5.4) rejects the second attempt rather than recording a second ballot. Re-encrypting on retry would produce a new ballot_id (fresh ElGamal randomness) and defeat the de-dup, so the "encrypt once, store, resubmit" rule is a load-bearing invariant.

- Connect-side input-replay contract. Although Connect does not auto-retry Invoke Lambda, the contact flow itself must be authored so it doesn't manually reintroduce a retry loop. On the "Invoke Lambda" block's failure branch, the flow plays system_error and disconnects; it does not wire the Error branch back into the same "Get customer input" block. This makes each (contact_id, turn) pair at-most-once by construction. The contract is asserted in the contact-flow fixture tests (§15) so a flow edit that reintroduces a retry loop fails CI.
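The conditional-write guards and the reload-and-decide policy from the first layer can be sketched against an in-memory store. This is an illustration only: `Store` is a hypothetical stand-in for the DynamoDB adapter, the cursor is reduced to a plain `usize`, and `save_with_policy` and `Outcome` are names invented for the example.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct Session {
    pub version: u64,
    pub cursor: usize,
}

#[derive(Debug, PartialEq)]
pub enum IvrError {
    SessionRaced,
}

/// In-memory stand-in for the DynamoDB-backed session adapter.
#[derive(Default)]
pub struct Store {
    rows: HashMap<String, Session>,
}

impl Store {
    /// Mirrors `attribute_not_exists(contact_id)`: fails if a row exists.
    pub fn create(&mut self, id: &str, s: Session) -> Result<(), IvrError> {
        if self.rows.contains_key(id) {
            return Err(IvrError::SessionRaced);
        }
        self.rows.insert(id.to_string(), s);
        Ok(())
    }

    /// Mirrors `version = :expected`: fails unless the stored version
    /// matches, and bumps the version on success.
    pub fn update(&mut self, id: &str, expected: u64, mut s: Session) -> Result<(), IvrError> {
        let current = self.rows.get(id).map(|row| row.version);
        if current != Some(expected) {
            return Err(IvrError::SessionRaced);
        }
        s.version = expected + 1;
        self.rows.insert(id.to_string(), s);
        Ok(())
    }

    pub fn get(&self, id: &str) -> Option<Session> {
        self.rows.get(id).cloned()
    }
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Saved,
    DroppedAsNoOp,
    SystemError,
}

/// Reload-and-decide: `starting` is the session as loaded at the start of
/// this invocation; `next_cursor` is where this turn wants to move it.
pub fn save_with_policy(store: &mut Store, id: &str, starting: &Session, next_cursor: usize) -> Outcome {
    let wanted = Session { version: starting.version, cursor: next_cursor };
    if store.update(id, starting.version, wanted.clone()).is_ok() {
        return Outcome::Saved;
    }
    // Lost the race: reload to see what the winning writer committed.
    let fresh = match store.get(id) {
        Some(s) => s,
        None => return Outcome::SystemError,
    };
    if fresh.cursor > starting.cursor {
        // The other writer already advanced the call; ignore and log.
        return Outcome::DroppedAsNoOp;
    }
    // Same cursor, newer version: retry exactly once.
    match store.update(id, fresh.version, wanted) {
        Ok(()) => Outcome::Saved,
        Err(IvrError::SessionRaced) => Outcome::SystemError,
    }
}
```

Only the `SystemError` arm reaches the voter (as the generic system_error prompt); the other two outcomes are invisible on the call.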
/// Per-call session state stored in DynamoDB. The ballot publication is
/// NOT here — it is cached at the Lambda process level (see design note
/// above and §5.1.8).
#[derive(Serialize, Deserialize)]
pub struct IvrSession {
// Identity — pins the call to one publication snapshot
pub contact_id: String,
pub caller_phone: String,
pub call_start_time: DateTime<Utc>,
pub tenant_id: Uuid,
pub election_event_id: Uuid,
/// Process-level publication cache key. Resolved once at session init
/// so a mid-call republish cannot change the ballot under the voter.
pub publication_id: String,
// URL snapshot — copied once from PhoneConfig at session init
pub keycloak_url: String,
pub harvest_url: String,
pub hasura_url: String,
pub s3_public_base_url: String,
pub keycloak_realm: String,
// Authentication
pub voter_id: Option<String>,
pub access_token: Option<String>,
pub refresh_token: Option<String>,
/// Absolute Unix timestamp from the JWT `exp` claim — not a relative
/// `expires_in`, so it round-trips through DynamoDB without rebasing.
pub access_token_expires_at: Option<i64>,
pub session_started_at: Option<i64>,
pub area_id: Option<Uuid>,
/// Auth step list pinned at session init from Keycloak's /ivr-config
/// (§5.1.7). Read once; every subsequent turn reads from here, not
/// from Keycloak — a mid-call admin edit to the Direct Grant flow
/// cannot change what credentials this call collects.
pub auth_steps: Vec<AuthStep>,
// Event-level language — chosen once in the outer `LanguageSelect`
// phase and fixed for the rest of the call. Per-election overrides
// live on `BallotLoopState.election_language_override` (§3.3.4) and
// are read via `effective_language()`; this field is never mutated
// by the inner `LanguageSwitch` sub-phase.
pub language: Language,
// Votes in progress — accumulated during ballot loop, consumed by ElectionSubmit
pub votes: HashMap<Uuid, ContestVote>,
/// Per-election encrypted-ballot cache, populated the first time
/// `ElectionSubmit` is attempted for that election and reused on any
/// subsequent retry. Guarantees that an `ElectionSubmit` retry after a
/// timeout hashes to the same `ballot_id` (which is the SHA-256 of the
/// encrypted content — see §9.3), so Harvest rejects the resubmission
/// as a duplicate rather than recording a second ballot.
pub encrypted_ballots: HashMap<Uuid, EncryptedBallotCacheEntry>,
// Submission results — drives ElectionReceipt and the end-of-call summary
pub submission_results: Vec<ElectionSubmissionResult>,
// Flow engine cursor + phase-local state
pub position: FlowPosition,
/// Cached response from the previous turn, used by the dispatcher-level
/// `*` = repeat short-circuit (§3.5.3, §3.4). Overwritten on every turn
/// that produces a response with `expect_input = true`; not written on
/// auto-advancing turns (there is nothing to repeat yet). On `*` input,
/// the dispatcher returns this unchanged — the phase executor is not
/// invoked, so `*` cannot accidentally advance or mutate state. Kept
/// on the session rather than re-rendered on demand because phase
/// executors may auto-advance on `NoInput`; a dedicated "render-only"
/// mode would have to thread through every executor, and persisting
/// the response is cheaper.
pub last_response: Option<ConnectResponse>,
/// Per-error-class retry counters. Distinct reset semantics — see
/// `RetryCounters` below and §8.1.
pub retries: RetryCounters,
/// Optimistic-concurrency guard for the update path. Bumped on every
/// write; the DynamoDB `PutItem` is guarded by
/// `ConditionExpression: version = :expected`. The create path uses
/// `ConditionExpression: attribute_not_exists(contact_id)` instead —
/// same "put only if the precondition holds" model, different
/// precondition. Lost races (either guard) surface internally as
/// `IvrError::SessionRaced` and are handled via the reload-and-decide
/// policy described in §4.1 — never surfaced to the voter as a prompt.
pub version: u64,
/// DynamoDB TTL — sliding idle window (default 1 h) capped at a
/// hard ceiling of `session_started_at + ssoSessionMaxLifespan`, so
/// long calls don't lapse mid-flight but a looping contact flow
/// can't keep a row alive forever. See §9.2.1.
pub ttl: i64,
}
/// Separate retry counters by error class. A single counter would mix up
/// unrelated kinds of failure — "3rd invalid DTMF while picking a candidate"
/// must not cross-contaminate "3rd auth attempt". Reset semantics:
///
/// - `auth` — cleared on successful authentication.
/// - `invalid_input` — cleared on any phase or sub-phase transition.
/// - `timeout` — cleared on any successful DTMF capture.
///
/// Maximums are configurable per event via `presentation.ivr.retry_limits`
/// (§7.3). Default 3 for each counter; missing values fall back to this.
#[derive(Serialize, Deserialize, Clone, Default)]
pub struct RetryCounters {
pub auth: u8,
pub invalid_input: u8,
pub timeout: u8,
}
/// Flow position: cursor into the phase pipeline plus per-phase state.
/// The `state` variant must correspond to the `FlowPhase` variant at
/// `flow_config[phase_index]`. Enforced by construction, not by runtime
/// checks alone — see **Invariant: positional variant alignment** below.
#[derive(Serialize, Deserialize, Clone)]
pub struct FlowPosition {
pub(crate) phase_index: usize,
pub(crate) state: PhaseState,
}
/// Phase-internal state — one variant per `FlowPhase` variant. Each phase
/// carries its own state shape; no generic "entry / waiting / done" state
/// every phase has to interpret.
#[derive(Serialize, Deserialize, Clone)]
pub enum PhaseState {
Announcement(AnnouncementState),
LanguageSelect(SimpleState),
BlacklistCheck(SimpleState),
Auth(AuthState),
EligibilityCheck(SimpleState),
BallotLoop(BallotLoopState),
Goodbye(SimpleState),
}
/// Fallback state for phases that collapse to "play, optionally wait for
/// input, advance."
#[derive(Serialize, Deserialize, Clone)]
pub enum SimpleState {
Entry,
WaitingForInput,
}
#[derive(Serialize, Deserialize, Clone)]
pub struct AnnouncementState {
pub simple: SimpleState,
}
#[derive(Serialize, Deserialize, Clone)]
pub struct AuthState {
/// Current index into the auth step list discovered via /ivr-config.
pub step_index: usize,
}
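The ttl rule documented on IvrSession (a sliding idle window capped at a hard ceiling) reduces to the minimum of two timestamps. A minimal sketch, assuming every value is expressed in Unix seconds; the function and parameter names are illustrative and not taken from the codebase.

```rust
/// Computes the DynamoDB TTL for a session row.
///
/// * `now` and `session_started_at` are Unix timestamps in seconds.
/// * `idle_window_secs` is the sliding window (the text gives 1 h as default).
/// * `max_lifespan_secs` stands in for the realm's ssoSessionMaxLifespan.
pub fn compute_ttl(
    now: i64,
    session_started_at: i64,
    idle_window_secs: i64,
    max_lifespan_secs: i64,
) -> i64 {
    // Each write pushes the expiry forward by the idle window...
    let sliding = now.saturating_add(idle_window_secs);
    // ...but never past the hard ceiling fixed at session start.
    let ceiling = session_started_at.saturating_add(max_lifespan_secs);
    sliding.min(ceiling)
}
```

Early in a call the sliding term wins; once the call approaches the ceiling, further writes stop extending the row's lifetime.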
Flow phases (typed dispatch). The flow is a list of typed phases, not a list of { phase: String, config: HashMap<String, Value> } pairs, per CLAUDE.md's "policies use enums, not magic strings" rule. An exhaustive match in the dispatcher gives compile-time coverage, and the admin portal can render form fields from each variant's shape. A typo in a config key fails at deserialization time, not mid-call.
#[derive(Serialize, Deserialize, Clone)]
#[serde(tag = "phase", rename_all = "snake_case")]
pub enum FlowPhase {
/// Play a prompt, optionally wait for an acceptance key. Covers
/// welcome / declaration / pre-voting statement.
Announcement(AnnouncementConfig),
LanguageSelect,
BlacklistCheck,
Auth,
EligibilityCheck,
BallotLoop(BallotLoopConfig),
Goodbye,
}
#[derive(Serialize, Deserialize, Clone)]
pub struct AnnouncementConfig {
/// Non-semantic label used for logs, metrics, and admin-portal
/// rendering. Examples: "welcome", "declaration", "pre_voting_statement".
pub name: String,
/// Prompt key looked up in the i18n bundle for the current language.
pub prompt_key: String,
/// If `Some("2")`, the voter must press `2` to advance (Barrie
/// declaration style). If `None`, the engine auto-advances.
pub accept_key: Option<String>,
}
#[derive(Serialize, Deserialize, Clone, Default)]
pub struct BallotLoopConfig {
/// A 4-character ballot locator read back phonetically, or none.
pub receipt_format: Option<ReceiptFormat>,
}
#[derive(Serialize, Deserialize, Clone)]
#[serde(rename_all = "snake_case")]
pub enum ReceiptFormat {
PhoneticHex4,
}
/// One auth step — retrieved from Keycloak's /ivr-config endpoint,
/// NOT from S3. The list reflects the realm's Direct Grant flow
/// execution order.
#[derive(Serialize, Deserialize, Clone)]
pub struct AuthStep {
/// Semantic name, e.g. "voter_id", "pin", "dob".
pub field: String,
pub max_digits: u8,
/// "#", "*", or "".
pub terminator: String,
/// ROPC form param: "username", "password", "dob", etc.
pub maps_to: String,
/// Override; if `None`, derive from `maps_to` (see §5.1.3).
pub prompt_key: Option<String>,
}
Invariant: positional variant alignment. FlowPhase and PhaseState are parallel enums whose variants must stay positionally matched (FlowPhase::Auth pairs with PhaseState::Auth, etc.). Enforced by construction, not by a single runtime assertion:
1. FlowPhase::initial_state() is the single mapping between the two enums: one exhaustive match. Adding a FlowPhase variant without its PhaseState peer fails to compile.
2. FlowPosition::new(flow) and FlowPosition::advance(flow) are the only paths that construct or move a position; both go through initial_state(). Fields are crate-private so no call site can hand-build a mismatched pair.
3. Dispatch co-matches on (phase, state) exhaustively; a surviving _ arm returns IvrError::PhaseStateMismatch, a last line of defence that is logged and treated the same way as SessionRaced (reload and decide, §4.1).
4. A unit test iterates both enums and asserts the matching is total.
Why not bundle config and state into one enum (Announcement(AnnouncementConfig, AnnouncementState), etc.)? Full compile-time enforcement would require this, but it conflates immutable flow config (from S3, cached at process level, shared across sessions — §5.1.8) with mutable per-turn session state (serialized to DynamoDB every invocation). That would either write the whole pipeline back to DynamoDB per turn or require custom serde that strips config on write and rebinds on read — in both cases adding more machinery than it saves. The (1)–(4) combination above shrinks the engineer-error surface to zero for practical purposes, which is the actual payoff; the residual runtime check exists only for defence in depth.
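The construction-time enforcement can be sketched with reduced enums. This is a simplified illustration: only three variants are shown, the per-variant state shapes are invented, and returning `Option` from `new` and `advance` is an assumption about how an empty or exhausted pipeline might be signalled.

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum FlowPhase {
    Announcement,
    Auth,
    Goodbye,
}

#[derive(Clone, Debug, PartialEq)]
pub enum PhaseState {
    Announcement { waiting: bool },
    Auth { step_index: usize },
    Goodbye,
}

impl FlowPhase {
    /// The single mapping between the two enums. The match has no `_` arm,
    /// so a new FlowPhase variant does not compile until its peer is added.
    pub fn initial_state(&self) -> PhaseState {
        match self {
            FlowPhase::Announcement => PhaseState::Announcement { waiting: false },
            FlowPhase::Auth => PhaseState::Auth { step_index: 0 },
            FlowPhase::Goodbye => PhaseState::Goodbye,
        }
    }
}

#[derive(Clone, Debug, PartialEq)]
pub struct FlowPosition {
    // Private fields: callers cannot hand-build a mismatched pair.
    phase_index: usize,
    state: PhaseState,
}

impl FlowPosition {
    /// Starts at the first phase; `None` for an empty pipeline.
    pub fn new(flow: &[FlowPhase]) -> Option<Self> {
        flow.first().map(|phase| Self {
            phase_index: 0,
            state: phase.initial_state(),
        })
    }

    /// Moves to the next phase; `None` once the pipeline is exhausted.
    pub fn advance(&self, flow: &[FlowPhase]) -> Option<Self> {
        let next = self.phase_index + 1;
        flow.get(next).map(|phase| Self {
            phase_index: next,
            state: phase.initial_state(),
        })
    }

    pub fn phase_index(&self) -> usize {
        self.phase_index
    }

    pub fn state(&self) -> &PhaseState {
        &self.state
    }
}
```

Since both constructors derive the state from the phase at the target index, a position built through them cannot pair, say, an Auth phase with a Goodbye state.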
Ballot loop context. Rendering and validation need access to election / contest / candidate data from the publication (name, ordering, per-contest max_votes / min_votes, blank-vote policy, under-vote policy, the DTMF option assigned to each candidate at session init). Reuse the existing sequent-core types (EBlankVotePolicy, EUnderVotePolicy, VoteBehavior, candidate/contest/election models) rather than redeclaring them. Per-contest voter choices and per-election submission outcomes round-trip through DynamoDB:
#[derive(Serialize, Deserialize)]
pub struct ContestVote {
pub contest_id: Uuid,
pub selected_candidate_ids: Vec<Uuid>,
pub is_blank: bool,
pub is_declined: bool,
}
/// Cached encrypted ballot for one election. Populated once when
/// `ElectionSubmit` first runs for that election; reused verbatim on
/// any retry so `ballot_id` (= SHA-256 of `content`) stays stable and
/// Harvest's duplicate check catches the retry. See §9.3.
#[derive(Serialize, Deserialize)]
pub struct EncryptedBallotCacheEntry {
/// Serialized `SignedHashableBallot` — the exact bytes POSTed to
/// Harvest `/insert-cast-vote`.
pub content: String,
/// Hex-encoded SHA-256 of `content`; matches the `ballot_id`
/// Harvest will recompute and validate.
pub ballot_id: String,
}
/// Result of submitting one election's ballot during ElectionSubmit.
/// Extend at the enum, not with booleans.
#[derive(Serialize, Deserialize)]
pub struct ElectionSubmissionResult {
pub election_id: Uuid,
pub status: SubmissionStatus,
/// Ballot hash — used to derive the spoken ballot locator in
/// ElectionReceipt. Current format: first 4 hex characters read
/// phonetically.
pub ballot_hash: Option<String>,
}
#[derive(Serialize, Deserialize)]
pub enum SubmissionStatus {
Success,
/// Per-election rejection from Harvest — the adapter has already
/// classified the `CastVoteError` variant into a prompt-ready shape.
/// See §5.4 for the rejection taxonomy and the raw-variant mapping.
Rejected(CastVoteRejection),
/// Transport-level failure (timeout, 5xx, malformed body) — played
/// as `vote_failed`; the ballot loop advances to the next election.
Failed { error: String },
}
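The encrypt-once rule behind EncryptedBallotCacheEntry can be sketched as a get-or-insert on the per-election cache. The cryptography here is deliberately fake and labelled as such: `ToyEncryptor` imitates probabilistic encryption with a counter, and `toy_ballot_id` uses the standard library's non-cryptographic DefaultHasher as a stand-in for SHA-256. The key type is a String instead of Uuid to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Clone, Debug, PartialEq)]
pub struct EncryptedBallotCacheEntry {
    pub content: String,
    pub ballot_id: String,
}

/// Stand-in for probabilistic encryption: every call mixes in fresh
/// "randomness" (a counter), so re-encrypting never repeats the same bytes.
pub struct ToyEncryptor {
    nonce: u64,
}

impl ToyEncryptor {
    pub fn new() -> Self {
        Self { nonce: 0 }
    }

    pub fn encrypt(&mut self, plaintext: &str) -> String {
        self.nonce += 1;
        format!("ct[{}:{}]", self.nonce, plaintext)
    }
}

/// Stand-in for SHA-256. DefaultHasher is NOT cryptographic; it is used
/// here only so the example needs no external crates.
fn toy_ballot_id(content: &str) -> String {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Returns the cached entry for this election, encrypting only on first use.
pub fn ballot_for_submit<'a>(
    cache: &'a mut HashMap<String, EncryptedBallotCacheEntry>,
    encryptor: &mut ToyEncryptor,
    election_id: &str,
    plaintext: &str,
) -> &'a EncryptedBallotCacheEntry {
    cache.entry(election_id.to_string()).or_insert_with(|| {
        let content = encryptor.encrypt(plaintext);
        let ballot_id = toy_ballot_id(&content);
        EncryptedBallotCacheEntry { content, ballot_id }
    })
}
```

A retry that goes through `ballot_for_submit` gets the stored bytes back, so the identifier is stable; calling the encryptor directly would produce different content and therefore a different identifier.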
4.2 Lambda Request/Response Models
Request from Amazon Connect. Connect's invocation payload shape is a fixed AWS contract — prefer the AWS Lambda Rust runtime's published types over redefining them. The Lambda reads only the fields the handler needs: ContactId, the E.164 Address from CustomerEndpoint, any DTMF captured in the previous turn from Parameters, and any attributes set by earlier contact-flow blocks.
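/// Rust fields stay snake_case; serde maps them to Connect's PascalCase
/// JSON keys (`Details`, `ContactData`, `ContactId`, ...).
#[derive(Deserialize)]
#[serde(rename_all = "PascalCase")]
pub struct ConnectEvent {
    pub details: ContactDetails,
}
#[derive(Deserialize)]
#[serde(rename_all = "PascalCase")]
pub struct ContactDetails {
    pub contact_data: ContactData,
    /// Bag of `{ String: String }` set by the contact flow's "Invoke
    /// Lambda" block; carries the DTMF captured on the previous turn.
    pub parameters: HashMap<String, String>,
}
#[derive(Deserialize)]
#[serde(rename_all = "PascalCase")]
pub struct ContactData {
    pub contact_id: String,
    pub customer_endpoint: Endpoint,
    pub attributes: HashMap<String, String>,
}
#[derive(Deserialize)]
#[serde(rename_all = "PascalCase")]
pub struct Endpoint {
    /// E.164 caller phone.
    pub address: String,
    /// "TELEPHONE_NUMBER". `type` is a Rust keyword, hence the explicit rename.
    #[serde(rename = "Type")]
    pub endpoint_type: String,
}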
Response to Amazon Connect. Deliberately minimal — just enough for the contact flow to play a prompt and optionally capture input. No SSML flag, no error flag, no debug dump. An "error" is just a prompt with should_disconnect = true; the contact flow does not need to know. Internal phase-state debugging belongs in CloudWatch structured logs (§10.2), not in the Connect attribute bag.
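// `Deserialize` and `Clone` are required because `IvrSession.last_response`
// (§4.1) persists this type in DynamoDB and the dispatcher replays it.
#[derive(Serialize, Deserialize, Clone)]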
pub struct ConnectResponse {
/// Text (plain or SSML) played via Polly. SSML is allowed; see §7.2
/// for language-tag usage and the required sanitizer.
pub prompt_text: String,
/// Whether the contact flow should capture DTMF after the prompt.
pub expect_input: bool,
/// Characters allowed on this turn — digits `0`–`9`, `*`, `#`, or
/// multi-digit sequences like `00#`, `01#`, …. Empty when
/// `expect_input = false`. Enforced by the Lambda on the next turn;
/// the contact flow's input block does not filter. See §3.4 for the
/// reserved-key convention.
pub valid_inputs: String,
/// Seconds to wait for DTMF before timing out.
pub input_timeout: u8,
/// If `true`, the contact flow plays the prompt and disconnects.
pub should_disconnect: bool,
}
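As an illustration of how small the response surface is, the two common shapes (collect input, or play a final prompt and hang up) can be sketched with two constructors. The helper names `collect` and `hang_up` are hypothetical, the serde derive is omitted so the sketch compiles with the standard library alone, and `input_timeout = 0` on the hang-up path is an assumption rather than a documented convention.

```rust
/// Mirrors the `ConnectResponse` fields from this section
/// (serde derive omitted so the sketch needs no external crates).
#[derive(Debug, PartialEq)]
pub struct ConnectResponse {
    pub prompt_text: String,
    pub expect_input: bool,
    pub valid_inputs: String,
    pub input_timeout: u8,
    pub should_disconnect: bool,
}

impl ConnectResponse {
    /// Hypothetical helper: play a prompt, then capture DTMF.
    pub fn collect(prompt: &str, valid_inputs: &str, input_timeout: u8) -> Self {
        Self {
            prompt_text: prompt.to_string(),
            expect_input: true,
            valid_inputs: valid_inputs.to_string(),
            input_timeout,
            should_disconnect: false,
        }
    }

    /// Hypothetical helper: play a final prompt and disconnect.
    /// An "error" response is exactly this shape; there is no error flag.
    pub fn hang_up(prompt: &str) -> Self {
        Self {
            prompt_text: prompt.to_string(),
            expect_input: false,
            valid_inputs: String::new(),
            // Assumption: the timeout is irrelevant when no input is expected.
            input_timeout: 0,
            should_disconnect: true,
        }
    }
}
```

Keeping construction behind two helpers makes it hard to emit an inconsistent combination such as `expect_input = true` together with `should_disconnect = true`.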
5. API Integration
5.1 Authentication Flow
Authentication uses standard OIDC Direct Grant (ROPC) via Keycloak's token endpoint. The Lambda does not know what authentication factors are required — it discovers them at runtime by asking Keycloak, collects credentials accordingly, and submits them to the token endpoint.
Design principle: Keycloak is the single source of truth for auth configuration. The realm's Direct Grant flow already defines which credentials are required; duplicating that into presentation.ivr.auth in S3 would create drift between the two. Instead, the Lambda queries a small custom Keycloak REST endpoint that derives the auth step list from the realm's flow executions.
5.1.1 How It Works
- At session init, Lambda calls GET {KEYCLOAK_URL}/realms/{realm}/ivr-config, a custom Keycloak REST extension that walks the realm's Direct Grant flow and returns an ordered list of auth steps. The call carries the ivr-service client_credentials bearer token (the same service JWT reused for the pre-auth blacklist read in the same turn, §6.3); the Lambda fetches it via TokenManager::get_service_token(realm) (§5.1.9), and the per-realm token cache is normally already warm from the blacklist call
- Lambda caches the response in the DynamoDB session record (same cache used for S3 election config)
- For each step, Lambda prompts for DTMF input using a well-known prompt key derived from the step's maps_to field (see 5.1.3)
- Lambda maps collected fields to ROPC form parameters and POSTs to Keycloak's token endpoint
- On success, Lambda stores the JWT and proceeds to the next flow phase
The Lambda doesn't know whether it's collecting a PIN, DoB, or any other credential — it just iterates the discovered steps, collects digits, and maps them to ROPC parameters. Keycloak validates them using the authenticators configured on the realm's Direct Grant flow.
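The step-driven collection described above could look roughly like the following. This is a minimal sketch, not the implementation: `clean_input` and `build_ropc_form` are hypothetical names, stripping the terminator on the Lambda side is an assumption (the contact flow may already do it), and the only fixed parts are the standard ROPC parameters `grant_type=password` and `client_id`.

```rust
use std::collections::HashMap;

/// One entry of the `/ivr-config` step list (fields as in §5.1.2).
#[derive(Debug, Clone)]
pub struct AuthStep {
    pub field: String,
    pub max_digits: usize,
    pub terminator: char,
    pub maps_to: String,
}

/// Strip the terminator (if present) and validate one DTMF capture.
/// Returns `None` when the input is empty, too long, or not all digits.
pub fn clean_input(step: &AuthStep, raw: &str) -> Option<String> {
    let digits = raw.strip_suffix(step.terminator).unwrap_or(raw);
    let ok = !digits.is_empty()
        && digits.len() <= step.max_digits
        && digits.chars().all(|c| c.is_ascii_digit());
    ok.then(|| digits.to_string())
}

/// Build the ROPC form from the inputs collected so far, keyed by `field`.
/// The Lambda never inspects what a field means; it only copies the value
/// to the parameter named by `maps_to`.
pub fn build_ropc_form(
    steps: &[AuthStep],
    collected: &HashMap<String, String>,
    client_id: &str,
) -> Option<Vec<(String, String)>> {
    let mut form = vec![
        ("grant_type".to_string(), "password".to_string()),
        ("client_id".to_string(), client_id.to_string()),
    ];
    for step in steps {
        let raw = collected.get(&step.field)?;
        form.push((step.maps_to.clone(), clean_input(step, raw)?));
    }
    Some(form)
}
```

Because the form is built purely from the discovered steps, switching a realm from PIN to DoB changes the submitted parameter names without any Lambda change.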
5.1.2 The ivr-config Keycloak Endpoint
A new Keycloak REST extension (ivr-config-resource, see Appendix C.8.2) exposes a single endpoint:
GET /realms/{realm}/ivr-config
Response (voter_id + PIN deployment, both from stock authenticators):
{
"steps": [
{ "field": "voter_id", "max_digits": 8, "terminator": "#", "maps_to": "username" },
{ "field": "pin", "max_digits": 8, "terminator": "#", "maps_to": "password" }
]
}
Response (voter_id + DoB deployment, DoB coming from the custom IvrDobAuthenticator):
{
"steps": [
{ "field": "voter_id", "max_digits": 8, "terminator": "#", "maps_to": "username" },
{ "field": "dob", "max_digits": 8, "terminator": "#", "maps_to": "dob" }
]
}
How the endpoint builds the response (~100 lines of Java — see Appendix C.8.2 for full implementation notes):
- Look up the effective Direct Grant flow for the ivr-voting client (client-level override if present, else realm default)
- Walk the flow's executions in order, filtering to REQUIRED/CONDITIONAL (matches the Java filter in Appendix C.8.2; ALTERNATIVE and DISABLED executions are ignored)
- For each execution, produce a step from one of two sources:
  - Stock Keycloak authenticators: a small static lookup table baked into the extension:
    - direct-grant-validate-username → { field: "voter_id", max_digits: 8, terminator: "#", maps_to: "username" }
    - direct-grant-validate-password → { field: "pin", max_digits: 8, terminator: "#", maps_to: "password" }
  - Custom IVR authenticators (IvrDobAuthenticator, etc.): read the execution's AuthenticatorConfig, which the admin configures in the Keycloak admin UI. Each custom authenticator declares these keys in its getConfigProperties(): field_name, max_digits, terminator, maps_to
- Return the list as JSON
The endpoint requires the ivr-service client_credentials bearer token — the same service JWT the Lambda already obtains for the pre-auth blacklist read (§6.3, §C.8.b), fetched through TokenManager::get_service_token(realm) (§5.1.9). Authorization is gated by the same service-account role mapping (can_read_phone_blacklist — the role also carries /ivr-config read rights; see §C.8.b). Rationale: although individual fields (max_digits, terminator, maps_to) are low-sensitivity, the shape of the list (how many factors, whether DoB or PIN is active, whether a custom authenticator is configured) is a meaningful fingerprint of a realm's auth posture, and there is no reason to leave it anonymously enumerable per-realm. The marginal engineering cost is near-zero because the service-auth path already exists for the blacklist read earlier in the same turn: the Lambda simply attaches the cached service token to the outbound HTTP call. No new secret, no new client, no new cache.
If the admin adds a non-IVR-aware authenticator to the flow, the endpoint returns 500 Internal Server Error with a clear message identifying the unknown authenticator ID, so misconfigurations surface at deployment time instead of mid-call.
5.1.3 Prompt Keys — Well-Known by maps_to
The Lambda uses a fixed, well-known mapping from ROPC parameter name to prompt key. This keeps the config minimal — since auth fields are essentially just "username", "password", and a few standard custom fields, admins only need to provide translations for a handful of prompt keys that never vary per election.
| maps_to value | Prompt key | Typical content |
|---|---|---|
| username | auth_enter_username | "Please enter your voter ID followed by the number sign key." |
| password | auth_enter_password | "Please enter your PIN (or date of birth) followed by the number sign key." |
| dob (custom) | auth_enter_dob | "Please enter your date of birth as MMDDYYYY followed by the number sign key." |
These keys live in presentation.i18n[lang].ivr, the same namespace used for all IVR prompts and IVR-only spoken-text overrides. The admin provides translations in the admin portal's IVR Prompts editor. The Lambda ships sensible English/French defaults for each well-known key as a fallback.
If a custom authenticator uses a new maps_to value that isn't in the table, the admin can override the prompt key via the authenticator's AuthenticatorConfig (prompt_key property). The endpoint passes it through in the step response:
{ "field": "birth_year", "max_digits": 4, "terminator": "#", "maps_to": "birth_year", "prompt_key": "auth_enter_birth_year" }
Lambda precedence: step's explicit prompt_key (if present) > well-known mapping by maps_to > error.
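The precedence rule can be sketched as a single function. The function names are hypothetical, and treating an empty `prompt_key` as absent is an assumption (an optional Keycloak config property may plausibly come back as an empty string).

```rust
/// Subset of an `/ivr-config` step relevant to prompt selection.
pub struct AuthStep {
    pub maps_to: String,
    pub prompt_key: Option<String>,
}

/// Well-known mapping from ROPC parameter name to prompt key (table in §5.1.3).
fn well_known_prompt_key(maps_to: &str) -> Option<&'static str> {
    match maps_to {
        "username" => Some("auth_enter_username"),
        "password" => Some("auth_enter_password"),
        "dob" => Some("auth_enter_dob"),
        _ => None,
    }
}

/// Precedence: explicit `prompt_key` > well-known mapping by `maps_to` > error.
pub fn resolve_prompt_key(step: &AuthStep) -> Result<String, String> {
    // Assumption: an empty override is treated the same as no override.
    if let Some(key) = step.prompt_key.as_deref().filter(|k| !k.is_empty()) {
        return Ok(key.to_string());
    }
    well_known_prompt_key(&step.maps_to)
        .map(str::to_string)
        .ok_or_else(|| format!("no prompt key for maps_to = {:?}", step.maps_to))
}
```

Resolving every step's prompt key once at session init, rather than per turn, would surface the error case before the voter has entered anything.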
5.1.4 OTP Flow — Possible Future Extension (Not In Scope)
OTP over IVR is not planned for the initial release. None of the deployments currently on the roadmap require a second factor delivered over the phone channel — voter-ID + PIN (or voter-ID + DoB) is sufficient. This section documents the shape a future extension could take so the current architecture doesn't foreclose it, not a feature to build now.
If it is ever added, the natural shape — which this design deliberately leaves room for — is:
- Lambda submits the first ROPC call with the collected credentials (unchanged).
- Keycloak's Direct Grant flow can return an otp_required error the same way it does for TOTP today. No IVR-side config would be needed; whether OTP runs would be purely a Keycloak flow decision.
- On otp_required, the Lambda collects an OTP code via DTMF (new phase-internal state) and resubmits all original credentials plus otp={code} to the same token endpoint.
- On success → JWT issued. On failure → retry or disconnect.
This fits cleanly because: (a) the auth credentials port (§3.5.2) is a tagged type that can grow a new Otp case, (b) the /ivr-config endpoint does not need to declare OTP — it would be discovered reactively, and (c) the well-known prompt-key table (§5.1.3) can grow new keys (auth_otp_sent, auth_enter_otp, auth_otp_invalid) without schema changes.
What the initial implementation should NOT do: build the IvrOtpDirectGrantAuthenticator Keycloak extension, add OTP prompt keys to the default i18n bundle, or reserve DynamoDB session fields for OTP state. All of that can be added when a deployment actually needs it.
5.1.5 Keycloak Direct Grant Flow Configuration
The realm's Direct Grant flow uses ConditionalClientAuthenticator (already in packages/keycloak-extensions/conditional-authenticators/) to branch by client ID.
The same realm handles both web portal and IVR authentication. The Keycloak admin configures which authenticators are active for the ivr-voting client in the Keycloak admin UI — this is the one and only place auth is configured. The IVR Lambda learns about it automatically via /ivr-config.
5.1.6 Custom Keycloak Authenticators & Extensions
| Component | When Needed | Complexity | Description |
|---|---|---|---|
| ivr-config-resource | Always (replaces S3 auth config) | ~100 lines Java | RealmResourceProvider exposing GET /realms/{realm}/ivr-config. Walks the Direct Grant flow and returns auth steps |
| IvrDobAuthenticator | Optional: only if DoB is NOT stored as password | ~80 lines Java | Reads dob from form params, validates against user's date_of_birth attribute. Declares field_name/max_digits/terminator/maps_to as config properties |
| IvrOtpDirectGrantAuthenticator | Not in initial scope (see §5.1.4) | n/a | Deferred. Would follow the same pattern as message-otp-authenticator, triggering the Direct Grant otp_required flow |
Custom authenticators must declare the IVR metadata fields in their getConfigProperties() so the ivr-config-resource can read them back:
public static final List<ProviderConfigProperty> CONFIG_PROPERTIES = ProviderConfigurationBuilder.create()
.property().name("field_name").type(STRING_TYPE).label("IVR field name").add()
.property().name("max_digits").type(STRING_TYPE).label("IVR max DTMF digits").add()
.property().name("terminator").type(STRING_TYPE).label("IVR terminator key").defaultValue("#").add()
.property().name("maps_to").type(STRING_TYPE).label("ROPC form parameter").add()
.property().name("prompt_key").type(STRING_TYPE).label("IVR prompt key override (optional)").add()
.build();
If OTP is ever added as described in §5.1.4, the authenticator would reuse existing infrastructure from packages/keycloak-extensions/message-otp-authenticator/ (code generation, SMS/email couriers, constant-time validation) — it is not being built as part of the initial IVR release.
If the election uses simple voter ID + PIN (where PIN = Keycloak password), no custom authenticators are needed — only ivr-config-resource needs to be deployed.
5.1.7 Pinning & Caching
Per-session pinning (required). The resolved Vec<AuthStep> is read from /ivr-config exactly once, at session init, and stored on IvrSession (see §4.1). Every subsequent turn of the same call reads the auth step list from the session row, never from /ivr-config. This is symmetric with how the ballot publication is pinned per-session (§5.1.8) and guarantees a single call cannot observe two different auth-step lists — e.g. step 1 collecting PIN under the old flow, then step 2 being asked for a newly-added DoB after an admin edit.
To make this a compile-time invariant rather than a convention, IvrSession.auth_steps: Vec<AuthStep> is populated during session construction and the Auth phase engine reads only from the session — it has no port to call /ivr-config mid-call.
Per-realm cache (optimization for new sessions only). New sessions hitting the same realm within a short window share a cached /ivr-config response to avoid hammering Keycloak during a call spike. The cache lives in DynamoDB, keyed by realm, with a TTL controlled by the Lambda env var IVR_CONFIG_CACHE_TTL_SECONDS (default 300, i.e. 5 minutes). Setting it to 0 disables the cache entirely (every new session hits Keycloak).
- Sessions already in flight are unaffected by cache expiry or invalidation — they read from their pinned session row.
- New sessions pick up admin changes within IVR_CONFIG_CACHE_TTL_SECONDS.
- Ops can flush the cache manually (DynamoDB delete) for emergency rollout, or drop the TTL to 0 during an incident.
The cache is strictly an optimization: removing it (or setting TTL to 0) only increases Keycloak load, never affects correctness, because pinning is the source of truth for any given call.
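The per-realm cache decision for a new session reduces to a freshness check. The sketch below assumes a cache row that records its fetch time in Unix seconds; `CachedIvrConfig`, `cache_is_usable` and `ttl_from_env` are hypothetical names, and falling back to the default on an unparseable env value is an assumption.

```rust
/// Hypothetical shape of a per-realm cache row (only the timestamp matters here).
pub struct CachedIvrConfig {
    /// Unix seconds at which the response was fetched from Keycloak.
    pub fetched_at: u64,
}

/// Decide whether a NEW session may reuse the cached `/ivr-config` response.
/// Sessions already in flight never call this; they read their pinned row.
/// `ttl_seconds` is the value of IVR_CONFIG_CACHE_TTL_SECONDS; 0 disables the cache.
pub fn cache_is_usable(entry: Option<&CachedIvrConfig>, now: u64, ttl_seconds: u64) -> bool {
    match entry {
        Some(e) if ttl_seconds > 0 => now.saturating_sub(e.fetched_at) < ttl_seconds,
        _ => false,
    }
}

/// Parse the env var value, falling back to the documented default of 300.
pub fn ttl_from_env(raw: Option<&str>) -> u64 {
    raw.and_then(|v| v.trim().parse().ok()).unwrap_or(300)
}
```

A miss here only costs one extra Keycloak call for that session, which matches the claim that the cache affects load and never correctness.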
5.1.8 IVR Config Discovery — S3 + Keycloak
IVR session config comes from two sources:
- Public S3 (published ballot publication): election structure, prompts, flow pipeline, presentation
- Keycloak /ivr-config endpoint: authentication step list (see 5.1.2)
The IVR flow, prompts, and IVR-only spoken-text overrides are part of the frozen ballot publication. Once a publication is cut, its ivr.flow + i18n[lang].ivr data is immutable — admin edits in the portal only take effect after a new publication is produced. This is a deliberate choice: the ballot publication is an attested, signed artifact used by the voting portal in preview mode, and pulling IVR presentation out of it would fragment the source of truth. Admins who need to change IVR prompts or spoken overrides after ballot freeze run a new publication, same as any other presentation edit. (The blacklist is the one exception — it changes too frequently to live in the publication; see §6.3.)
Published ballot publication structure (tenant-{tenantId}/document-{documentId}/{publicationId}.json):
{
"ballot_styles": [
// Ballot EML: contests, candidates, public keys, presentation config
],
"elections": [
// Election metadata, presentation, voting channels
// Note: voting_status is always "OPEN" in published data (static snapshot)
],
"election_event": {
// Full event: presentation (IVR flow + prompts, NOT auth steps),
// i18n (including IVR prompts and spoken-text overrides), language_conf, voting_channels
},
"support_materials": [...],
"documents": [...]
}
What the IVR Lambda reads from published S3 data:
- election_event.presentation.ivr.flow: phase pipeline
- election_event.presentation.i18n[lang]["ivr"]: event-level prompts and spoken-text overrides (including the well-known auth prompt keys)
- election_event.presentation.language_conf: enabled languages
- ballot_styles[].ballot_eml: contests, candidates, min/max votes, public keys
- elections[].presentation.i18n[lang]["ivr"]: election-level prompts and spoken-text overrides
- contests[].presentation.i18n[lang]["ivr"] / candidates[].presentation.i18n[lang]["ivr"]: contest/candidate spoken-text overrides used only by IVR
- elections[].voting_channels: which channels are enabled
What the IVR Lambda reads from Keycloak /ivr-config:
- The ordered list of auth steps (field, max_digits, terminator, maps_to, optional prompt_key override)
What is NOT available from S3 (requires a live backend call):
- Real-time voting status, which comes from Hasura (§5.2); S3 always shows voting_status: "OPEN"
- Vote submission, which goes through the Harvest API (§5.3)
Publication flow:
- Admin configures IVR flow + prompts/overrides in admin portal (not auth steps — those live in Keycloak)
- Settings stored in presentation.ivr.flow and presentation.i18n[lang]["ivr"] in PostgreSQL
- Ballot publication task generates the publication JSON and uploads to public S3
- Auth flow is configured separately by the admin in the Keycloak admin UI (realm's Direct Grant flow)
- Published data is publicly accessible — no authentication needed
Lambda session initialization:
- Call arrives → Lambda reads the ivr-phone-config.json object from S3 (process-cached, see §6.2) → resolves the dialled number to S3 base URL + tenant_id + election_event_id + keycloak realm
- Lambda fetches published ballot publication JSON from public S3
- Lambda fetches auth step list from {KEYCLOAK_URL}/realms/{realm}/ivr-config (per-realm cache, default TTL 5 min; see §5.1.7)
- Both sets cached in DynamoDB session
- Flow engine begins executing the configured phase pipeline
Keycloak Realm: tenant-{tenantId}-event-{eventId}
Required Keycloak Configuration:
- Deploy ivr-config-resource extension (see Appendix C.8.2)
- Create ivr-voting client with direct-access-grants enabled (see Appendix C.8.a)
- Create ivr-service client with service-accounts enabled and a service-account role mapping for can_read_phone_blacklist (see Appendix C.8.b), installed identically in every IVR-enabled realm; secret in AWS Secrets Manager. This same role also gates the authenticated /ivr-config read (§5.1.2, §C.8.2); the Lambda reuses the cached token across both pre-auth reads per turn
- Configure Direct Grant flow with conditional branching for ivr-voting client; this is now the only place voter auth is configured
- Configure voters with voter ID as username
- Credential storage matches the Direct Grant flow (e.g., password credential for PIN, or user attribute + IvrDobAuthenticator for DoB)
- For custom authenticators (IvrDobAuthenticator, etc.): fill in their AuthenticatorConfig (field_name, max_digits, terminator, maps_to) so the /ivr-config endpoint can return them
- JWT claims include area_id and authorized_election_ids (via existing AuthorizedElectionsUserAttributeMapper)
5.1.9 Token Expiry Handling (Critical)
The Problem: JWT tokens have limited lifetimes. From the current Keycloak configuration:
- accessTokenLifespan: 300 seconds (5 minutes)
- ssoSessionIdleTimeout: 1800 seconds (30 minutes), the refresh token idle timeout
- ssoSessionMaxLifespan: 36000 seconds (10 hours), the max session duration
- refreshTokenMaxReuse: 0 (single-use refresh tokens)
Phone calls can easily exceed 5 minutes, especially for:
- Voters needing to repeat instructions
- Elections with multiple contests
- Elderly voters or those with accessibility needs
Risk: If the access token expires mid-call and cannot be refreshed, the voter completes all selections but vote submission fails with a 401.
Token Lifecycle Constraints:
- Access token (5 min): Can be refreshed using refresh token
- Refresh token: Valid while SSO session is active
  - Idle timeout: 30 min of inactivity invalidates it
  - Max lifespan: 10 hours absolute limit
  - Single-use: Each refresh returns a new refresh token
- SSO Session: The underlying session that backs the refresh token
Proposed Solution - Proactive Token Refresh:
Contract. TokenManager is reconstructed on each invocation from the session's serialized token fields, because Lambda is stateless and cannot keep an in-memory Instant across turns. It must:
- Track access-token expiry as an absolute Unix timestamp in seconds (from the JWT exp claim), not a relative expires_in, so it round-trips through DynamoDB without rebasing.
- Expose a needs_refresh(now) check with a safety margin (default 60 s) so a token about to expire during the current turn gets refreshed first.
- Always persist the new refresh token returned by Keycloak; refresh tokens are single-use under refreshTokenMaxReuse: 0.
- Retry on transient failures (network / 5xx) with short backoff; fail fast on 400/401 (refresh token dead) and 403 (client misconfigured).
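A minimal sketch of the expiry bookkeeping in this contract follows. Field names come from §4.1 but their types are assumed, and `apply_refresh` is a hypothetical helper; the HTTP call and the DynamoDB write are deliberately left out.

```rust
/// Token fields as persisted on the session row (names from §4.1; types assumed).
pub struct TokenManager {
    pub access_token: String,
    pub refresh_token: String,
    /// Absolute Unix seconds, taken from the JWT `exp` claim.
    pub access_token_expires_at: u64,
}

/// Default safety margin from the contract.
pub const REFRESH_MARGIN_SECS: u64 = 60;

impl TokenManager {
    /// True when the access token has expired or will expire within the margin.
    /// `now` is passed in (Unix seconds) so the check is deterministic in tests.
    pub fn needs_refresh(&self, now: u64) -> bool {
        now.saturating_add(REFRESH_MARGIN_SECS) >= self.access_token_expires_at
    }

    /// Apply a successful refresh. Both tokens are replaced together because the
    /// previous refresh token is dead once used (refreshTokenMaxReuse: 0).
    pub fn apply_refresh(&mut self, access: String, refresh: String, exp: u64) {
        self.access_token = access;
        self.refresh_token = refresh;
        self.access_token_expires_at = exp;
    }
}
```

Taking `now` as a parameter instead of reading the clock inside the method keeps the margin logic testable and makes the absolute-timestamp requirement explicit in the signature.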
Error classification. The refresh path collapses HTTP and network outcomes into three categories, each with a different policy:
| Category | Cause | Retry? | Maps to |
|---|---|---|---|
| Transient | Connection timeout, DNS, 5xx | Yes (≤2 retries, short backoff) | KeycloakUnavailable after budget |
| TokenExpired | 400 / 401 — refresh token invalid or SSO session timed out | No | SessionExpired |
| Unauthorized | 403 — IVR client disabled / realm misconfigured | No | ConfigurationError |
Session State in DynamoDB: Token management fields are part of IvrSession (see Section 4.1): access_token, refresh_token, access_token_expires_at, session_started_at.
When to Refresh:
- Before vote submission (critical path): Always refresh if within threshold
- On each Lambda invocation: Check and refresh proactively
- After authentication: Store both tokens and expiry
Refresh Failure Handling Strategy:
| Error Type | Cause | Detection | Retry? | User Message |
|---|---|---|---|---|
| Transient | Network issue, Keycloak restart, load spike | Connection timeout, DNS failure, 5xx errors | Yes (2 retries, 500ms delay) | "We're experiencing technical difficulties. Please try again later." |
| TokenExpired | Idle timeout (>30 min) or max lifespan (>10 hrs) | 400/401 from Keycloak | No | "Your session has expired. Please call back to vote again." |
| Unauthorized | IVR client disabled, realm misconfigured | 403 from Keycloak | No | "The voting system is temporarily unavailable. Please try again later." |
Auth error enum. Mapped from the categories above. Keep as an enum; don't collapse into strings or booleans.
pub enum AuthError {
/// 400 / 401 — refresh token invalid or SSO session timed out.
SessionExpired,
/// Transient network / 5xx after retry budget.
KeycloakUnavailable,
/// 403 — IVR client disabled or realm misconfigured.
ConfigurationError,
}
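One way to wire the classification tables to this enum is a retry loop that stops on the first non-transient failure. This is a sketch under assumptions: `RefreshFailure`, `terminal_error` and `refresh_with_retry` are hypothetical names, backoff sleeping is left to the caller's closure, and mapping status codes outside 400/401/403/5xx to `ConfigurationError` is a guess the tables do not cover.

```rust
/// Auth error enum from this section.
#[derive(Debug, PartialEq)]
pub enum AuthError {
    SessionExpired,
    KeycloakUnavailable,
    ConfigurationError,
}

/// Hypothetical raw outcome of one failed refresh attempt.
#[derive(Debug, Clone, Copy)]
pub enum RefreshFailure {
    /// No HTTP response at all (timeout, DNS, connection reset).
    Network,
    /// Keycloak answered with this non-2xx status code.
    Http(u16),
}

/// Maximum retries after the first attempt, per the tables in this section.
pub const MAX_RETRIES: u32 = 2;

/// `None` means "transient: retry if budget remains".
fn terminal_error(failure: RefreshFailure) -> Option<AuthError> {
    match failure {
        RefreshFailure::Http(400) | RefreshFailure::Http(401) => Some(AuthError::SessionExpired),
        RefreshFailure::Http(403) => Some(AuthError::ConfigurationError),
        RefreshFailure::Network => None,
        RefreshFailure::Http(s) if (500..=599).contains(&s) => None,
        // Assumption: any other status is treated as a misconfiguration.
        RefreshFailure::Http(_) => Some(AuthError::ConfigurationError),
    }
}

/// Run `attempt` up to 1 + MAX_RETRIES times, failing fast on terminal errors.
pub fn refresh_with_retry<T>(
    mut attempt: impl FnMut() -> Result<T, RefreshFailure>,
) -> Result<T, AuthError> {
    for _ in 0..=MAX_RETRIES {
        match attempt() {
            Ok(tokens) => return Ok(tokens),
            Err(failure) => {
                if let Some(err) = terminal_error(failure) {
                    return Err(err);
                }
            }
        }
    }
    // Retry budget exhausted on transient failures only.
    Err(AuthError::KeycloakUnavailable)
}
```

Keeping the status-to-category decision in one function means the critical and non-critical paths cannot drift apart in how they interpret a 401 or a 503.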
Critical vs. non-critical refresh policy.
- Vote submission (critical). Proactively refresh immediately before calling Harvest. Any refresh failure is fatal for the call: SessionExpired → "your session has expired, please call back"; KeycloakUnavailable / ConfigurationError → emit a critical ops alert (see monitoring below) and play the generic "system unavailable" prompt with should_disconnect = true. Never submit a ballot with a stale token; a 401 from Harvest mid-submit is harder to recover from than a clean refusal up front.
- Non-critical reads (e.g. election-status check). Be lenient: try the current token, and on 401 attempt one refresh-then-retry. If the retry also fails, map to SessionExpired and disconnect.
Both paths share the refresh_token port call and the AuthError classification — the difference is only in how aggressively refresh is attempted and how a failure is surfaced to the voter.
Operational Monitoring:
Critical metrics to track:
- ivr.token.refresh.success: counter
- ivr.token.refresh.failure.transient: counter (alert on spike)
- ivr.token.refresh.failure.expired: counter (expected, monitor trends)
- ivr.token.refresh.failure.unauthorized: counter (ALERT immediately)
- ivr.vote.submission.failed.token_error: counter (CRITICAL alert)
Alerting Rules:
- CRITICAL: ivr.vote.submission.failed.token_error > 0 in 5 minutes
  - Action: Page on-call engineer immediately
  - Reason: Voters completing calls but can't submit votes
- HIGH: ivr.token.refresh.failure.unauthorized > 5 in 1 minute
  - Action: Alert ops team
  - Reason: IVR client misconfigured or disabled
- MEDIUM: ivr.token.refresh.failure.transient > 20% of attempts
  - Action: Alert ops team
  - Reason: Keycloak connectivity issues
Keycloak Configuration Recommendations for IVR: Consider adjusting IVR client-specific settings (can be per-client in Keycloak):
- accessTokenLifespan: Could increase to 15-30 min for IVR client
- ssoSessionIdleTimeout: 60 min for IVR (calls can have pauses)
- ssoSessionMaxLifespan: Keep at 10 hours (reasonable max call duration)
Implementation Notes:
- Store refresh_token securely in DynamoDB (encrypted at rest)
- Always use the new refresh token after each refresh (single-use policy)
- Log token refresh events (without token values) for debugging
- Monitor refresh failure rate as operational metric
Scope of this section. Everything above describes the voter token lifecycle — tokens issued to the calling voter via the Direct Grant (ROPC) flow against the ivr-voting client. The Lambda also needs service-auth tokens for its two pre-authentication reads against the realm — the blacklist query (§6.3) and the /ivr-config auth-discovery call (§5.1.2). These service tokens belong to no voter and are obtained via Keycloak's client_credentials grant against a separate platform IVR service client (ivr-service, §C.8.b). That path is a distinct TokenManager::get_service_token(realm) method with a narrower signature — no refresh tokens, no session fields, no DynamoDB persistence — cached per-realm and reused across both pre-auth reads in the same turn. The path is specified in §6.3. Keeping the voter and service paths in separate methods (and reusing only the AuthError taxonomy) prevents service credentials from accidentally flowing through the voter code path, and vice versa.
5.2 Check Election Status via Hasura GraphQL
Election structure, contests, and candidates are loaded from the published S3 data (see 5.1.8). However, the published S3 data is a static snapshot where voting_status is always "OPEN". The IVR Lambda needs to query Hasura to check the real-time status of telephone voting before proceeding. This is the same mechanism the voting portal uses (GET_ELECTION_EVENT query).
Endpoint: POST https://{HASURA_DOMAIN}/v1/graphql
GraphQL Query:
query GetElectionEventStatus($eventId: uuid!) {
sequent_backend_election_event_by_pk(id: $eventId) {
status
}
}
The status field is a JSON object containing per-channel statuses:
{
"voting_status": "OPEN",
"kiosk_voting_status": "CLOSED",
"early_voting_status": "CLOSED",
"telephone_voting_status": "OPEN"
}
Purpose: Verify that telephone voting is currently open. The Lambda checks the telephone_voting_status field:
- OPEN → proceed with voting
- CLOSED / NOT_STARTED → play election_closed prompt and disconnect
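The decision itself is a two-way branch on one string. The sketch below takes the already-extracted `telephone_voting_status` value rather than parsing the GraphQL response; `StatusDecision` and `decide` are hypothetical names, and failing closed on a missing or unrecognised value is an assumption.

```rust
/// What the flow engine should do after the pre-flight channel check.
#[derive(Debug, PartialEq)]
pub enum StatusDecision {
    /// Continue into the ballot loop.
    Proceed,
    /// Play this prompt key and disconnect.
    Disconnect { prompt_key: &'static str },
}

/// `telephone_voting_status` is the string taken from the `status` JSON object;
/// `None` models a missing field.
pub fn decide(telephone_voting_status: Option<&str>) -> StatusDecision {
    match telephone_voting_status {
        Some("OPEN") => StatusDecision::Proceed,
        // CLOSED, NOT_STARTED, and (assumption) anything missing or unrecognised.
        _ => StatusDecision::Disconnect { prompt_key: "election_closed" },
    }
}
```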
When called: After authentication (JWT required), before entering the ballot loop.
Note: This is a UX optimization to fail early with a clear message. The backend also validates channel status during insert_cast_vote via status_by_channel(voting_channel), so a vote submitted to a closed telephone channel would be rejected regardless.
5.3 Cast Vote via Harvest API
IVR Lambda calls the Harvest API directly to submit encrypted ballots.
Endpoint: POST https://{HARVEST_DOMAIN}/insert-cast-vote
Input Structure:
{
"ballot_id": "...",
"election_id": "...",
"content": "{encrypted_ballot}"
}
Headers:
- Authorization: Bearer {jwt}
- JWT must have azp: "ivr-voting" to identify TELEPHONE channel
- Harvest extracts area_id from JWT claims
5.4 Backend Error Handling for Vote Submission
Overview: Backend (Harvest) validates all vote submission rules — revote limits, channel enablement, ballot-hash integrity, area/election scoping. The IVR Lambda treats Harvest as the source of truth and maps its rejection variants to voter-facing prompts.
Source of truth. The authoritative error set lives in CastVoteError and is surfaced over HTTP by packages/harvest/src/routes/insert_cast_vote.rs. The IVR adapter classifies each variant into one of three adapter outcomes and does not invent codes of its own.
| Adapter outcome | Domain mapping | Voter-facing effect |
|---|---|---|
| Per-election rejection (Harvest returned a CastVoteError variant whose meaning is scoped to this one election) | Record on ElectionSubmissionResult, play the matching prompt, continue to next election | Announce via prompt; do not disconnect |
| Network / read timeout before any response | Fatal system error | Play system_error prompt and disconnect |
| Other transport failure | Generic transport error (§8.2) | Play system_error prompt and disconnect |
Relevant CastVoteError variants and their prompt keys. Only the
variants reachable on the IVR submission path are listed; every other variant
(e.g. DeserializeBallotFailed, BallotSignFailed, GetDbClientFailed) is an
internal/infra failure that collapses to vote_failed plus a raw-code log
entry for ops.
| CastVoteError variant | Prompt key | Notes |
|---|---|---|
| CheckRevotesFailed(_) | duplicate_vote when the election runs with max_revotes = 1 (the Canadian municipal default); max_revotes_exceeded otherwise | Today this is the only signal that a voter has already cast a ballot in this election; there is no dedicated DuplicateVote variant (see design note below). The adapter reads the election's max_revotes from the session-cached publication to decide which prompt to play |
| InsertFailedExceedsAllowedRevotes | same as CheckRevotesFailed | Race-condition surfacing of the same business rule at INSERT time. Mapped identically |
| CheckVotesInOtherAreasFailed(_) | vote_failed with a raw-code log entry | Means the voter has already voted for this election in a different area. Rare on IVR (area is derived from the voter's own identity, not caller input); treat as vote_failed until product decides whether a dedicated prompt is warranted |
| VotingChannelNotEnabled(_) | election_closed | The election's TELEPHONE channel flipped off between the eligibility_check and the ballot submission; the same prompt used for the pre-flight channel check in §5.2 applies |
| BallotIdMismatch(_) | vote_failed (fatal for this election) | Cannot recover by re-encrypting (would defeat the retry-idempotency invariant in §9.3). Log as a hard integrity failure |
| CheckPreviousVotesFailed(_) / CheckStatusFailed(_) / CheckStatusInternalFailed(_) / CheckRevotesFailed query-layer errors | vote_failed | These are database/query failures inside the pre-insert checks, not business-rule rejections. Per-election outcome, not fatal to the call |
| AreaNotFound / ElectionEventNotFound(_) / ElectoralLogNotFound(_) | system_error (disconnect) | Indicates a config/routing mismatch between the session and Harvest: nothing the voter can do about it, and the same mismatch will hit every subsequent election. Fail the call, alert ops |
| All other variants (InsertFailed, CommitFailed, Deserialize*Failed, BallotSignFailed, UnknownError, …) | vote_failed | Per-election fallback; raw variant name goes into the structured log |
Mapping is implemented as an exhaustive match in the adapter, with no wildcard arm (not string comparisons in domain code), so a new CastVoteError variant added upstream surfaces as a compile error here rather than silently falling through.
Design note: first-class DuplicateVote. Today, "voter already cast a ballot in this election" is not a dedicated CastVoteError; it is inferred from CheckRevotesFailed under max_revotes = 1 (or from InsertFailedExceedsAllowedRevotes on the INSERT-time race). Every channel (portal, kiosk, IVR) has to duplicate the same "read max_revotes, decide which message to show" logic. A small, self-contained improvement worth doing alongside the IVR rollout is to add a DuplicateVote variant to CastVoteError in windmill/src/services/insert_cast_vote.rs, emitted when max_revotes = 1 and a previous ballot exists. CheckRevotesFailed then genuinely means "the voter has exceeded the allowed number of revotes" in elections that permit more than one, which is what its name suggests. The IVR adapter table above collapses to one line per variant with no max-revotes conditional; the portal and kiosk benefit equally. Tracked as a follow-up; see CastVoteRejection::DuplicateVote below, which is already wired for that future variant.
/// IVR-adapter error. Distinct from the generic transport error
/// (§8.2) so per-election rejections are a first-class variant.
pub enum CastVoteAdapterError {
/// Harvest returned a `CastVoteError` variant (see the mapping
/// table above). The adapter has already collapsed it into the
/// prompt-ready `CastVoteRejection`.
Rejected(CastVoteRejection),
/// Network / read timeout before any response from Harvest.
Timeout,
/// Other transport failure (DNS, TLS, unexpected 5xx, malformed body).
Transport(String),
}
/// Prompt-ready classification of a per-election Harvest rejection.
/// Shapes match the i18n prompt keys in Appendix D, not the raw
/// `CastVoteError` variant names — the adapter bridges the two.
#[derive(Debug, PartialEq, Eq)]
pub enum CastVoteRejection {
/// Voter has already cast a ballot in this election
/// (today: `CheckRevotesFailed` / `InsertFailedExceedsAllowedRevotes`
/// when `max_revotes = 1`; future: a dedicated `DuplicateVote`
/// variant per the design note above).
DuplicateVote,
/// Voter has exhausted the configured revote budget
/// (`max_revotes > 1` case).
MaxRevotesExceeded,
/// TELEPHONE channel was closed between eligibility check and submit.
ChannelClosed,
/// Ballot-hash integrity failure — cannot recover for this election.
BallotIdMismatch,
/// Any other `CastVoteError` variant not classified above. Played
/// as `vote_failed`; the raw variant name is logged for ops.
Other(String),
}
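As a rough illustration of the revote branch described in the design note, the sketch below classifies the two revote-related Harvest errors using max_revotes. It assumes the adapter sees the raw error as its variant name (a string), which is a simplification; `classify_revote_failure` is a hypothetical helper, and the handling of max_revotes = 0 is an assumption since this section only discusses 1 and greater than 1.

```rust
/// Copy of the prompt-ready enum from this section, repeated so the
/// sketch is self-contained.
#[derive(Debug, PartialEq, Eq)]
pub enum CastVoteRejection {
    DuplicateVote,
    MaxRevotesExceeded,
    ChannelClosed,
    BallotIdMismatch,
    Other(String),
}

/// Hypothetical helper: classify the two revote-related Harvest errors.
/// `raw_variant` is assumed to be the `CastVoteError` variant name.
/// Returns `None` for every variant this sketch does not cover.
pub fn classify_revote_failure(raw_variant: &str, max_revotes: u32) -> Option<CastVoteRejection> {
    match raw_variant {
        "CheckRevotesFailed" | "InsertFailedExceedsAllowedRevotes" => {
            if max_revotes > 1 {
                // Election permits revotes and the budget is spent.
                Some(CastVoteRejection::MaxRevotesExceeded)
            } else {
                // Assumption: max_revotes <= 1 is treated as a single-ballot election.
                Some(CastVoteRejection::DuplicateVote)
            }
        }
        _ => None,
    }
}
```

Once a dedicated DuplicateVote variant exists in CastVoteError, the max_revotes parameter in a helper like this would no longer be needed.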
Error Prompts:
Backend errors use prompt keys from i18n[lang]["ivr"]. Per-election rejections are announced but do not end the call — the ElectionSubmit sub-phase reports the error and the ballot loop advances to the next election:
- duplicate_vote: "You have already voted in this election." (continue to next election)
- max_revotes_exceeded: "You have reached the maximum number of allowed votes for this election." (continue to next election)
- election_closed: "Telephone voting is not currently open for this election." (continue to next election)
- vote_failed: "We were unable to record your vote. Please try again later." (continue to next election)
Fatal errors (network timeout, session expired, Keycloak unavailable, area/election config mismatch) disconnect immediately since they affect all elections.
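The split between "announce and continue" and "disconnect" might be expressed as a small pure function over the adapter error. This is a sketch only: `SubmitOutcome`, `prompt_key`, and `outcome_for` are hypothetical names, mapping BallotIdMismatch to vote_failed is an assumption (no dedicated prompt key is listed here), and treating Transport as fatal alongside Timeout is likewise assumed rather than stated.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum CastVoteRejection {
    DuplicateVote,
    MaxRevotesExceeded,
    ChannelClosed,
    BallotIdMismatch,
    Other(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum CastVoteAdapterError {
    Rejected(CastVoteRejection),
    Timeout,
    Transport(String),
}

/// Hypothetical: what the ElectionSubmit sub-phase does after an error.
#[derive(Debug, PartialEq, Eq)]
pub enum SubmitOutcome {
    /// Play this prompt key, then advance to the next election.
    AnnounceAndContinue(&'static str),
    /// Fatal for the whole call.
    Disconnect,
}

/// Prompt key under i18n[lang]["ivr"] for a per-election rejection.
pub fn prompt_key(rejection: &CastVoteRejection) -> &'static str {
    match rejection {
        CastVoteRejection::DuplicateVote => "duplicate_vote",
        CastVoteRejection::MaxRevotesExceeded => "max_revotes_exceeded",
        CastVoteRejection::ChannelClosed => "election_closed",
        // Assumption: no dedicated key is listed, so reuse the generic one.
        CastVoteRejection::BallotIdMismatch => "vote_failed",
        CastVoteRejection::Other(_) => "vote_failed",
    }
}

pub fn outcome_for(error: &CastVoteAdapterError) -> SubmitOutcome {
    match error {
        CastVoteAdapterError::Rejected(r) => SubmitOutcome::AnnounceAndContinue(prompt_key(r)),
        // Assumption: transport failures are fatal in the same way as timeouts.
        CastVoteAdapterError::Timeout | CastVoteAdapterError::Transport(_) => SubmitOutcome::Disconnect,
    }
}
```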
Simplicity:
- No frontend filtering needed
- Backend is source of truth — Harvest validates per-election
- IVR translates CastVoteError variants into user-friendly messages
- Each election is submitted independently; one failure does not block others
6. Multi-Tenancy & Municipality Discrimination
6.1 Phone Number to Election Event Mapping
Each election event gets its own dedicated Amazon Connect phone number. The Lambda looks up the dialled number in the phone-config file (§6.2) and resolves it directly to (tenant_id, election_event_id) — no IVR-level "which municipality?" menu, no shared numbers. A municipality running multiple concurrent elections therefore operates multiple numbers, one per event. This keeps the voter-facing experience as short as possible (the caller reaches the right ballot on connect, without an extra menu) and makes routing, blacklisting, and per-event metrics trivially scoped by the dialled number.
6.2 Phone Number Configuration File
Location: s3://<ivr-routing-bucket>/ivr-phone-config.json — a single JSON file in a versioning-enabled S3 bucket, authored through gitops (§16.2) and read by the Lambda. Not a DynamoDB table.
Why S3 rather than DynamoDB. This is static routing config, not runtime state: a small number of Sequent-owned DIDs mapping to tenants and cluster URLs, edited infrequently by operators, and never written by the Lambda. DynamoDB's point-read strengths (high-throughput keyed access, conditional writes, TTL) buy nothing for this access pattern — the Lambda loads the whole file once per cold start and caches it in-process — while S3 gives, for free, the properties that actually matter here: native object versioning (every prior revision retained indefinitely, not just a 35-day PITR window), CloudTrail PutObject audit trail naming the principal and version, atomic whole-file writes (no torn multi-row updates), and the same 11-nines durability plus bucket-level lifecycle tooling we already use for the published ballot publication (§5.1.8). It is also the pattern the Lambda already knows: "fetch versioned static JSON from S3, cache in-process." Adding another DynamoDB table is new machinery; adding another S3 path is a trivial extension of an existing one.
/// One entry in the routing file. The file itself is a JSON array of
/// these (wrapped in `{ "entries": [...] }` for future-compatibility).
#[derive(Serialize, Deserialize)]
pub struct PhoneConfig {
/// Lookup key — E.164 format, e.g. "+14165551234".
pub phone_number: String,
// What this number resolves to
pub tenant_id: Uuid,
pub election_event_id: Uuid,
// Cluster + region routing (see "Multi-cluster / multi-region" below)
pub cluster_id: String, // e.g. "prod1-euw1", "googleinfra-euw4"
pub region: String, // AWS region hosting the cluster, e.g. "eu-west-1"
pub environment: String, // e.g. "qa", "staging", "cixug"
// Full set of per-cluster URLs — snapshot into IvrSession at session init
pub keycloak_url: String, // https://keycloak.{env}.{cluster}.sequentech.io
pub harvest_url: String, // https://harvest.{env}.{cluster}.sequentech.io
pub hasura_url: String, // https://hasura.{env}.{cluster}.sequentech.io
pub s3_public_base_url: String, // https://{public-bucket}.s3.amazonaws.com
/// First-turn language before `LanguageSelect` runs.
pub default_language: Language,
/// Allowlist flag. Disabled or missing entries are rejected at session init.
pub enabled: bool,
}
Bucket configuration — non-negotiable.
- Versioning: enabled. Every PutObject preserves the prior version; accidental deletes are recoverable by removing the delete marker (s3api delete-object with the delete marker's version ID) or from the console. This is the backup story: no separate PITR decision needed.
- Deletion protection: enabled via bucket policy denying s3:DeleteBucket to every principal except a break-glass admin role. Bucket-level deletion-protection Terraform flag (force_destroy = false) on the resource.
- Block public access: all four settings on. The routing file carries cluster URLs but is not voter data; still, public access has no legitimate use.
- Server-side encryption: SSE-S3 (default) is sufficient; the file has no secrets, just URLs and IDs.
- CloudTrail data events: enabled for this bucket so every PutObject and GetObject is audit-logged with principal, version, and timestamp. This is the audit trail that DynamoDB only provides via the expensive data-plane-CloudTrail path.
Lambda IAM — strictly read-only. The IVR Lambda execution role has exactly one action on this bucket: s3:GetObject on ivr-phone-config.json. No PutObject, no DeleteObject, no ListBucket. Separating the read-only routing bucket from the read-write sessions table under distinct IAM statements means that a Lambda compromise cannot rewrite phone-number routing (cf. §4.8 of the review). Writes come from one place only: the gitops CI role, used by Atlantis when applying phone-map.yaml.
Cache TTL. Cold Lambda containers fetch the file on first use and cache it in-process for IVR_PHONE_CONFIG_CACHE_TTL_SECONDS (default 300, i.e. 5 minutes, matching the pattern established by IVR_CONFIG_CACHE_TTL_SECONDS in §5.1). Setting it to 0 expires the cached copy immediately, so every lookup triggers a re-fetch (a cold start always fetches regardless). Warm containers refresh the cache lazily on the next lookup after the TTL elapses; the lookup itself continues to serve the cached copy during the refresh so a slow S3 request never stalls a call.
Propagation window. A routing-config edit takes full effect within one TTL plus any warm-container lifetime, typically well under 10 minutes. For an urgent change (wrong DID mapped to wrong tenant), ops can drop the TTL to 0 and cycle Lambda aliases to force a full refresh. Document this propagation window to operators explicitly so manual edits don't come with "but I just uploaded the file, why isn't it live?" surprise.
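The serve-cached-while-refreshing behaviour can be modelled without any async machinery by having the cache report whether its copy is fresh or stale and leaving the refresh to the caller. The following is a minimal sketch under that assumption; `PhoneConfigCache`, `Lookup`, and the explicit `now` parameter (used to keep the logic testable) are hypothetical and not taken from the Lambda code.

```rust
use std::time::{Duration, Instant};

/// In-process cache for the parsed routing file (hypothetical type).
pub struct PhoneConfigCache<T> {
    ttl: Duration,
    entry: Option<(T, Instant)>, // (value, fetched_at)
}

/// Result of a cache lookup.
#[derive(Debug, PartialEq, Eq)]
pub enum Lookup<'a, T> {
    /// Nothing cached yet (cold container): the caller must fetch first.
    Miss,
    /// Cached copy is within the TTL.
    Fresh(&'a T),
    /// Cached copy is past the TTL: serve it now and refresh separately.
    Stale(&'a T),
}

impl<T> PhoneConfigCache<T> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entry: None }
    }

    /// Never blocks on S3: a stale copy is still returned to the caller.
    pub fn lookup(&self, now: Instant) -> Lookup<'_, T> {
        match &self.entry {
            None => Lookup::Miss,
            Some((value, fetched_at)) => {
                if now.duration_since(*fetched_at) < self.ttl {
                    Lookup::Fresh(value)
                } else {
                    Lookup::Stale(value)
                }
            }
        }
    }

    /// Called once a (re-)fetch has completed successfully.
    pub fn store(&mut self, value: T, now: Instant) {
        self.entry = Some((value, now));
    }
}
```

With a TTL of zero, every lookup after the first fetch reports `Stale`, so each lookup would trigger a refresh while still answering from the previous copy.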
Edit workflow. The authoritative copy lives in gitops (§16.2); Atlantis's apply uploads it to S3 and bumps the S3 object version. Direct S3 console edits are permitted as a break-glass mechanism but go through CloudTrail, and the gitops repo file is the canonical source — any direct edit must be reconciled into gitops before the next apply, or it will be overwritten on the next reconcile. Concurrent editors are not a failure mode in practice (edits go through PR review), but if two uploads race, S3 object versioning keeps both; the loser can be restored.
Multi-cluster / multi-region support.
A single IVR Lambda deployment serves every cluster and every region that hosts a Sequent environment. Each entry in ivr-phone-config.json carries the full set of per-cluster routing URLs (Keycloak, Harvest, Hasura, public S3) plus cluster_id and region labels, so the Lambda looks up the dialled number and dispatches every downstream call to the cluster that owns that election event — whether that cluster is in prod1-euw1, prod2-use1, googleinfra-euw4, or anywhere else. Clusters are infrastructure groups (e.g. prod1-euw1, prod2-use1, testing-euw1); environments are tenants/deployments within a cluster (qa, dev, staging, cixug). Both dimensions are carried in the phone-config entry.
In practice we may start with a single cluster hosting all IVR-enabled events, but the configuration schema and the Lambda dispatch must support the multi-cluster / multi-region case from day one — we do not want to retrofit routing when a second cluster is added mid-election-season.
Isolation.
- Cluster-level: the phone-config entry is the only place that binds a dialled number to a cluster's URLs; a misconfigured entry cannot leak calls to the wrong cluster because Keycloak/Hasura/Harvest tokens are cluster-scoped.
- Environment-level: Keycloak realms (tenant-{id}-event-{id}) provide tenant isolation; URLs are environment-scoped.
- Phone-level: only entries with enabled: true are accepted at session init; a missing or disabled entry rejects the call before any authentication is attempted.
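The phone-level rule above (reject missing or disabled entries before any authentication) could be sketched as a lookup over an index built from the routing file. `PhoneEntry` is a trimmed-down stand-in for PhoneConfig with string IDs so the example stays std-only; `RoutingError`, `index_by_number`, and `resolve` are hypothetical names.

```rust
use std::collections::HashMap;

/// Trimmed-down stand-in for `PhoneConfig` (IDs as strings, one URL only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PhoneEntry {
    pub phone_number: String, // E.164 lookup key
    pub tenant_id: String,
    pub election_event_id: String,
    pub keycloak_url: String,
    pub enabled: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RoutingError {
    /// Dialled number has no entry in the routing file.
    UnknownNumber,
    /// Entry exists but `enabled` is false.
    Disabled,
}

/// Build the lookup index once per file load.
pub fn index_by_number(entries: Vec<PhoneEntry>) -> HashMap<String, PhoneEntry> {
    entries
        .into_iter()
        .map(|e| (e.phone_number.clone(), e))
        .collect()
}

/// Resolve the dialled number; runs before any authentication is attempted.
pub fn resolve<'a>(
    index: &'a HashMap<String, PhoneEntry>,
    dialled: &str,
) -> Result<&'a PhoneEntry, RoutingError> {
    match index.get(dialled) {
        None => Err(RoutingError::UnknownNumber),
        Some(entry) if !entry.enabled => Err(RoutingError::Disabled),
        Some(entry) => Ok(entry),
    }
}
```

Every downstream URL for the call would then be read from the returned entry, so a call never mixes endpoints from two clusters.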
6.3 Phone Blacklist (Hasura-Backed)
The blacklist_check phase consults a Hasura table, not DynamoDB. The blacklist is domain data — it is managed alongside the rest of the election event by the same admin users who manage voters, and it benefits from Hasura's row-level authorization, audit trails, and migration tooling rather than being a sidecar AWS table owned by the IVR.
What needs to be built:
1. Hasura table sequent_backend.ivr_phone_blacklist with columns:
   - phone_number (E.164, primary key or unique per tenant)
   - tenant_id (FK)
   - election_event_id (nullable; blacklist can be scoped to an event or tenant-wide)
   - reason (optional free text)
   - created_at, created_by
2. Hasura permissions: a new permission (e.g. can_manage_phone_blacklist) granted to admin roles that should be able to CRUD blacklist entries. Scoped to their tenant.
3. Harvest endpoints to create, list, and delete blacklist entries (these wrap the Hasura mutations with the existing permission-check middleware). The IVR Lambda reads the blacklist with a service-account JWT obtained via Keycloak's client_credentials grant: the Lambda authenticates as a dedicated platform IVR service client (ivr-service, distinct from the voter-facing ivr-voting client; see §C.8.b) that is installed identically (same client_id, same client_secret) in every IVR-enabled realm. The client secret lives in AWS Secrets Manager (rotatable without Lambda redeploy); the Lambda reads it once at cold start. Because Keycloak realms are trust boundaries, each realm still signs its own access token, but the credential material is uniform across realms, so the Lambda's service-auth path has one grant flow, one set of credentials, and a token cache keyed by realm (not by (tenant, realm, credentials)). A new Hasura permission can_read_phone_blacklist is granted only to this service client's service-account role mapping (see §C.8.b); can_manage_phone_blacklist continues to gate CRUD from the admin portal. The same service-account role also gates the Lambda's /ivr-config auth-discovery read at session init (§5.1.2): one principal, one role, one token cache, two pre-auth reads. This avoids exposing an anonymous blacklist oracle (which would leak moderation decisions, conflict with PIPEDA, and create a harassment vector).
   Why client_credentials, not password grant. Both are realistic designs (a shared service user with ROPC would also work), but client_credentials has no user account to rotate, no voter-grade ROPC code path carrying service credentials (keeping voter auth and IVR-internal auth in strictly separate code paths, with no accidental cross-wiring), and no refresh-token lifecycle: client_credentials has no refresh token, so the service-auth path just re-requests a fresh access token when the cached one's exp is within the safety margin. Blast radius is identical in the two designs (one stolen secret compromises blacklist-read across every IVR-enabled realm), so there is no security regression from the simpler shape.
   TokenManager port signature: service-auth path. The existing TokenManager (§5.1.9) handles voter tokens via ROPC + refresh. The service-auth path is a distinct concern and is modelled as a separate trait method (or a second TokenManager flavour; an implementation decision, not a contract one) with a narrower signature: get_service_token(realm) -> Result<AccessToken, IvrError>. The implementation fetches client_credentials against the named realm using the shared client secret, caches the resulting token keyed by realm until exp - safety_margin, and re-fetches on expiry. No refresh-token bookkeeping. Error classification reuses the voter-auth TokenManager's three-category map (transient / auth / config) so the handler has one error-taxonomy, not two.
   Cold-start latency. The reviewer correctly flagged that the first call into a freshly-scaled-out Lambda container pays one round-trip to Keycloak for the service token. Quantify before treating this as a problem: a single client_credentials POST to Keycloak in-region is typically well under 100 ms, and it happens once per cold container per realm. For the provisioned-concurrency tier the Lambda uses during an election event, cold starts are bounded. If benchmarks on representative hardware show the hop dominates first-call latency, the fallback is to move the blacklist query off Keycloak entirely: sign a short-lived service JWT in-Lambda with a private key held in Secrets Manager and have Hasura verify it directly. That eliminates the Keycloak round-trip at the cost of introducing a JWT-signing mechanism that does not exist elsewhere in Sequent today; treat it as a Phase-2 optimization contingent on measured latency, not a default.
4. Admin-portal UI: a "Phone Blacklist" management view under the Election Event settings, with list + add + remove actions, tied to the new Hasura permission.
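The service-auth caching rule described for get_service_token above (cache per realm until exp minus a safety margin, then re-request) might look roughly like the sketch below. It is illustrative only: the `now` and `fetch` parameters are added here for testability and are not part of the documented signature, `String` stands in for IvrError, and `ServiceTokenCache` is a hypothetical name.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AccessToken {
    pub value: String,
    /// Expiry as Unix seconds (the JWT `exp` claim).
    pub exp: u64,
}

/// Hypothetical per-realm cache for client_credentials tokens.
pub struct ServiceTokenCache {
    safety_margin_secs: u64,
    by_realm: HashMap<String, AccessToken>,
}

impl ServiceTokenCache {
    pub fn new(safety_margin_secs: u64) -> Self {
        Self { safety_margin_secs, by_realm: HashMap::new() }
    }

    /// Return the cached token while `now` is before `exp - safety_margin`;
    /// otherwise call `fetch` (the client_credentials POST) and cache the result.
    pub fn get_service_token<F>(
        &mut self,
        realm: &str,
        now: u64,
        fetch: F,
    ) -> Result<AccessToken, String>
    where
        F: FnOnce(&str) -> Result<AccessToken, String>,
    {
        if let Some(token) = self.by_realm.get(realm) {
            if now + self.safety_margin_secs < token.exp {
                return Ok(token.clone());
            }
        }
        // No refresh token exists for this grant: simply request a new one.
        let fresh = fetch(realm)?;
        self.by_realm.insert(realm.to_string(), fresh.clone());
        Ok(fresh)
    }
}
```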
Why not DynamoDB? Same reason auth config went to Keycloak: it belongs to the domain. Putting it in DynamoDB would duplicate responsibilities, bypass Hasura's permission/migration/audit pipeline, and force the admin portal to talk to two different backends for data that is logically part of the election event. One source of truth wins.
Why not part of the published ballot publication (S3)? Because blacklists change more often than ballots are published, and an admin needs to be able to block a phone mid-election without re-running the ballot publication pipeline. Keep the publication immutable and artifact-like; keep the blacklist mutable and operational.
7. Internationalization (i18n) & IVR Prompts
7.1 Leveraging Existing Infrastructure
The platform already supports:
- telephone channel in VotingChannels struct (packages/sequent-core/src/types/hasura/core.rs, pub struct VotingChannels)
- i18n pattern via presentation.i18n with nested structure {lang: {key: value}}
- Per-election presentation via ElectionPresentation (packages/sequent-core/src/ballot.rs, pub struct ElectionPresentation)
- Per-event presentation via ElectionEventPresentation (packages/sequent-core/src/ballot.rs, pub struct ElectionEventPresentation)
- Channel-based authorization via the JWT azp claim, mapped into VotingStatusChannel by authorize_voter_election and the AzpClient::to_voting_channel resolver; portal clients fan out into ONLINE vs EARLY_VOTING via the area's early-voting window (packages/sequent-core/src/services/authorization.rs; see Appendix C.7 for the exhaustive match)
7.2 IVR Prompt Storage - Inside Existing i18n Structure
Key Decision: IVR prompts and IVR-only spoken-text overrides are stored inside the existing presentation.i18n object under an "ivr" key. This keeps all translations in one place and follows Felix's recommendation.
Structure Overview
No changes are needed to the existing presentation structs. ElectionEventPresentation, ElectionPresentation, ContestPresentation, and CandidatePresentation already expose the nested i18n shape that IVR can reuse for both prompt keys and spoken-text overrides:
pub struct ElectionEventPresentation {
pub i18n: Option<I18nContent<I18nContent<Option<String>>>>,
// ... existing fields ...
// NO separate ivr_prompts field needed
}
Storage Pattern
IVR strings are nested inside i18n under the "ivr" key:
presentation.i18n = {
"en": {
"name": "Election Name",
"description": "Portal-facing description",
"alias": "Election Alias",
"ivr": { // ← IVR prompts + IVR-only spoken-text overrides
"name": "Election name optimized for telephone readback",
"description": "Telephone version of the election description",
"greeting": "Welcome...",
"auth_enter_username": "Please enter your voter ID...",
"auth_enter_password": "Please enter your PIN...",
...
}
},
"fr": {
"name": "Nom de l'élection",
"ivr": {
"greeting": "Bienvenue...",
...
}
}
}
At contest and candidate scope, the same pattern lives under presentation.i18n[lang]["ivr"], leaving the existing name_i18n / description_i18n fields untouched for the voting portal while giving IVR an override path when the spoken version needs to differ.
IVR-Only Spoken Text Overrides
The ivr namespace is an override system, not a second full copy of the translation tree. If an IVR-only value is absent, the Lambda falls back to the normal portal text.
Typical keys:
- name
- alias
- description
Example candidate override:
{
"presentation": {
"i18n": {
"en": {
"ivr": {
"name": "<lang xml:lang=\"fr-CA\">Jean-François Côté</lang>"
}
}
}
}
}
In that example, the voting portal can continue showing the regular English or bilingual candidate name, while IVR gets a spoken-only override tailored for text-to-speech.
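The override-with-fallback rule for the ivr namespace can be illustrated with a small lookup. This sketch flattens one language of presentation.i18n into two maps; `LangStrings` and `spoken_text` are hypothetical names, not types from sequent-core.

```rust
use std::collections::HashMap;

/// Flattened stand-in for one language of presentation.i18n.
pub struct LangStrings {
    /// Portal-facing keys, e.g. "name", "alias", "description".
    pub portal: HashMap<String, String>,
    /// Contents of the nested "ivr" object for the same language.
    pub ivr: HashMap<String, String>,
}

/// Prefer the IVR-only override; fall back to the portal text when absent.
pub fn spoken_text<'a>(strings: &'a LangStrings, key: &str) -> Option<&'a str> {
    strings
        .ivr
        .get(key)
        .or_else(|| strings.portal.get(key))
        .map(String::as_str)
}
```

Only keys whose spoken form differs need an entry under "ivr"; everything else resolves to the portal text.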
Mixed-Language Readback with SSML
Amazon Polly supports SSML <lang xml:lang="..."> tags, and Amazon Connect supports passing SSML prompts through to Polly. That makes it reasonable to allow SSML fragments directly inside IVR overrides and prompt templates for short mixed-language phrases such as:
<speak>You selected <lang xml:lang="fr-CA">Jean-François Côté</lang> for Mayor.</speak>
Design note:
- IVR overrides and prompt templates may contain SSML fragments such as <lang>, <phoneme>, or <say-as>
- If any resolved string contains SSML markup, the final rendered prompt should be sent to Polly as SSML and wrapped once in <speak>...</speak>
- This is best suited to names and short phrases; Polly's lang tag changes pronunciation rules, but many voices will still sound accented rather than fully native unless a bilingual voice is used
SSML sanitizer & allowlist (required)
SSML in prompts is a trust boundary: i18n overrides are admin-editable, and raw SSML is effectively "arbitrary instructions to the TTS engine." An admin with prompt-edit permission could otherwise make Polly say fabricated candidate names, inject instructions that contradict the ballot, or insert <break time="30s"/> filibusters that stall a call. Plain-text portal values interpolated into an SSML template can also contain <, &, or " and silently break the whole prompt. Both problems are addressed by a single pipeline component — the SSML renderer — which every prompt sent to Polly MUST pass through. No code path should construct an SSML string and hand it to Polly without going through this renderer.
The renderer has three responsibilities, applied in order:
1. Template interpolation with typed slots. Templates distinguish two marker styles:
   - {var}: escaped slot. The substituted value is XML-escaped (<, >, &, ", ' → entities) before insertion. This is the default and applies to all placeholder variables listed in Appendix D ({candidate_name}, {election_name}, {contest_name}, {number}, {confirmation_number}, etc.); even when the resolved value happens to contain SSML-looking characters, they are treated as literal text.
   - {{ssml:var}}: SSML include. The substituted value is resolved from the same i18n scope chain as the template and is passed through without escaping, but still goes through the tag allowlist (step 2). This is the only path for mixed-language names such as <lang xml:lang="fr-CA">Jean-François Côté</lang> to reach Polly intact. Recursive resolution is bounded (max depth 3) to prevent cyclic overrides.
2. Tag allowlist. After interpolation, the renderer parses the resulting fragment as XML and strips any element not in the allowlist. Attributes are also allowlisted per tag. Anything outside the list is dropped (element removed, inner text preserved) rather than escaped; silent degradation to plain text is safer than surfacing broken markup to Polly at runtime.

   | Tag | Allowed attributes | Rationale |
   |---|---|---|
   | speak | (none) | Root wrapper; renderer always emits exactly one at the outermost level. |
   | lang | xml:lang (value matched against a static locale allowlist: en-CA, en-US, fr-CA, fr-FR) | Mixed-language name readback. |
   | phoneme | alphabet (ipa or x-sampa), ph | Pronunciation overrides for names. |
   | say-as | interpret-as (characters, spell-out, digits, telephone, date, time), format | Ballot-locator and date/number readback. |
   | break | time (capped at 2 s by the renderer regardless of the input value), strength | Pacing. The time cap prevents long-pause filibuster; longer pauses must be composed of multiple shorter breaks, which show up clearly in the audit log. |
   | sub | alias | Abbreviation expansion. |
   | p, s | (none) | Paragraph / sentence pacing. |

   Every other SSML tag that Polly supports (prosody, emphasis, voice, audio, mark, w, etc.) is stripped. voice and audio in particular are explicitly out of scope: changing voice mid-prompt or injecting external audio would make voter-audit of prompts much harder and has no justified use in a ballot readback.
3. Wrap and emit. The sanitized fragment is wrapped once in <speak>…</speak> (the renderer strips any caller-supplied outer <speak> before wrapping, so double-wrapping is not possible). The final string is what is sent to Polly and what is recorded in the electoral audit log (§9.3) so post-election review can replay exactly what each voter heard.
Fail-loud vs fail-soft. Malformed XML in a template (unbalanced tags after interpolation) is a fail-loud error: the renderer returns a domain error, the prompt is not sent to Polly, and the handler falls back to system_error. Unknown tags and attributes are fail-soft (stripped with a WARN-level structured log recording the dropped tag name and the prompt key) — this keeps a single bad override from taking down a live call while still surfacing the misconfiguration for ops.
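To make the escaped-slot and wrap-once steps of the renderer concrete, here is a minimal std-only sketch. It deliberately omits the tag allowlist and the {{ssml:var}} include path (those markers are left untouched), and the function names `escape_xml`, `fill_escaped_slots`, and `wrap_in_speak` are hypothetical rather than the sequent-core API.

```rust
use std::collections::HashMap;

/// XML-escape the five characters that can break an SSML document.
pub fn escape_xml(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '&' => out.push_str("&amp;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Single-pass substitution of `{name}` slots with escaped values.
/// Unknown slots and `{{ssml:...}}` markers are copied through unchanged;
/// substituted values are never re-scanned for further slots.
pub fn fill_escaped_slots(template: &str, values: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        match after.find('}') {
            Some(close) => {
                let name = &after[..close];
                match values.get(name) {
                    Some(value) => out.push_str(&escape_xml(value)),
                    None => {
                        out.push('{');
                        out.push_str(name);
                        out.push('}');
                    }
                }
                rest = &after[close + 1..];
            }
            None => {
                // Unbalanced brace: keep it as literal text.
                out.push('{');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}

/// Strip one caller-supplied outer <speak> pair, then wrap exactly once.
pub fn wrap_in_speak(fragment: &str) -> String {
    let trimmed = fragment.trim();
    let inner = trimmed
        .strip_prefix("<speak>")
        .and_then(|s| s.strip_suffix("</speak>"))
        .unwrap_or(trimmed);
    format!("<speak>{}</speak>", inner)
}
```

Doing the substitution in one pass matters: a value that itself contains text like {election_name} is inserted as literal text and cannot pull in another slot.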
Admin-portal editor requirement. The prompt editor in the admin portal (§7.4) MUST invoke validate_ivr_subtree + the same sanitizer on save (and ideally on keystroke, for inline feedback), before the value can be persisted. The editor consumes and produces a TypedIvrScope (§7.2 Rust Type), not a raw serde_json::Value, so the sanitizer operates on validated IvrTemplate values rather than hunting through untyped JSON. Both the validator and the sanitizer live in sequent-core, where the Lambda and the admin portal (via its WASM build) share them — do not implement either twice.
Sanitization is a pure function; audio preview is not part of it. The sanitizer takes (IvrTemplate, values, scope) → sanitized SSML String. It is pure, WASM-compatible, and has no AWS dependency — which is exactly why it can live in sequent-core. Polly synthesis is an AWS adapter call; it cannot live in sequent-core (WASM toolchain, credential boundary, and the "sequent-core holds domain, not adapters" rule all forbid it). The admin portal therefore does not render a Polly audio preview in the initial release (§7.4): the editor's contract is "validated, sanitized text in, validated, sanitized text out." Listening to what a voter would hear is a separate concern — exercised via the step-ivr CLI (§15.2.1) or end-to-end test calls (§15.4), not through the admin portal.
Testing. The sanitizer gets its own unit-test suite (tag allowlist, attribute allowlist, locale allowlist for xml:lang, break time cap, depth bound on {{ssml:…}} recursion, escape correctness for every placeholder in Appendix D, malformed-XML handling). Record-and-replay fixtures (§15.2) assert the final sanitized SSML string, not just the prompt key, so regressions in the renderer surface immediately.
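One of the unit-test targets listed above, the break time cap, could be exercised against a helper along these lines. This is a sketch under stated assumptions: it accepts only whole-number values with an "s" or "ms" suffix, normalises the output to milliseconds, and returns None for anything it cannot parse (the caller would then drop the attribute). `parse_break_ms`, `cap_break_time`, and `MAX_BREAK_MS` are hypothetical names.

```rust
/// Renderer-wide ceiling for a single <break time="...">: 2 seconds.
pub const MAX_BREAK_MS: u64 = 2_000;

/// Parse "500ms" or "3s" into milliseconds (whole numbers only in this sketch).
pub fn parse_break_ms(time: &str) -> Option<u64> {
    let t = time.trim();
    // Check "ms" first, because "500ms" also ends with 's'.
    if let Some(ms) = t.strip_suffix("ms") {
        ms.trim().parse::<u64>().ok()
    } else if let Some(secs) = t.strip_suffix('s') {
        secs.trim().parse::<u64>().ok().map(|v| v.saturating_mul(1_000))
    } else {
        None
    }
}

/// Clamp the requested pause to the cap, regardless of the input value.
pub fn cap_break_time(time: &str) -> Option<String> {
    let ms = parse_break_ms(time)?;
    Some(format!("{}ms", ms.min(MAX_BREAK_MS)))
}
```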
Track this as part of the IVR prompt editor work, not as a separate ticket — the editor, escaper, and allowlist are one unit.
Rust Type: Validated IVR Sub-Tree
Storage stays compatible with the existing I18nContent<I18nContent<Option<String>>> shape (the sub-tree under "ivr" is carried as serde_json::Value on the wire), but every consumer — Lambda, admin editor, SSML sanitizer — reads the sub-tree through a single validator that produces a strongly-typed intermediate. The untyped value never reaches domain code; it is an implementation detail of the serialization boundary.
/// Typed view of presentation.i18n[lang]["ivr"] for one scope
/// (event / election / contest / candidate). Produced by
/// `validate_ivr_subtree` — no code path constructs one directly.
pub struct TypedIvrScope {
/// Prompt overrides recognised by this Lambda version, keyed
/// by the prompt-key enum. Absence means "fall back to the
/// next scope up, or the built-in default".
pub prompts: BTreeMap<IvrPromptKey, IvrTemplate>,
/// Spoken-text overrides for this entity (`name`, `alias`,
/// `description`). Meaningful only on entity scopes.
pub overrides: IvrSpokenOverrides,
/// Keys we did not recognise — preserved verbatim so an older
/// admin-portal build cannot drop keys introduced by a newer
/// Lambda. Not rendered by this Lambda version; logged once
/// per publication load at INFO with the full key list.
pub unknown: BTreeMap<String, String>,
}
/// Validated prompt template — still a `String`, but the
/// placeholder set (`{var}`) and SSML allowlist have already been
/// checked. `contains_ssml` lets the renderer skip the XML parse
/// on pure-text prompts.
pub struct IvrTemplate {
pub raw: String,
pub contains_ssml: bool,
}
pub struct IvrSpokenOverrides {
pub name: Option<IvrTemplate>,
pub alias: Option<IvrTemplate>,
pub description: Option<IvrTemplate>,
}
/// The validator boundary. Called at **admin-save time** by the
/// prompt editor (so malformed input fails loudly before it ever
/// reaches the publication), and at **publication-load time** by
/// both the Lambda and the ballot-verifier as a defence-in-depth
/// parse (so a publication produced by an older admin-portal
/// cannot feed unsanitised markup to Polly). The two call sites
/// MUST produce identical output for identical input — enforced by
/// fixture tests that feed the same raw JSON through both paths
/// and assert the `TypedIvrScope` is equal.
pub fn validate_ivr_subtree(
raw: &serde_json::Value,
scope: IvrScope,
lang: Language,
) -> Result<TypedIvrScope, IvrValidationError>;
/// Thin loader used by the Lambda at publication-load time.
fn load_ivr_scope(
i18n: &serde_json::Map<String, serde_json::Value>,
lang: &str,
scope: IvrScope,
) -> Result<TypedIvrScope, IvrValidationError> {
let raw = i18n.get(lang)
.and_then(|lang_content| lang_content.get("ivr"))
.cloned()
.unwrap_or(serde_json::Value::Object(Default::default()));
validate_ivr_subtree(&raw, scope, lang.parse()?)
}
Adding a new prompt key means adding a variant to IvrPromptKey — a one-line change in sequent-core that the compiler then propagates to every match site. The admin-portal editor and the Lambda resolver pick up the new key at the same time because they both consume TypedIvrScope; they never hand-parse serde_json::Value.
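How recognised keys and the unknown overflow map might be separated is sketched below. The three `IvrPromptKey` variants are illustrative picks from the prompt keys shown in this section, not the full set; `from_key` and `partition_keys` are hypothetical helpers, and the spoken-override keys (name, alias, description) are left out of the sketch.

```rust
use std::collections::BTreeMap;

/// Illustrative subset of the prompt-key enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum IvrPromptKey {
    Greeting,
    AuthEnterUsername,
    AuthEnterPassword,
}

impl IvrPromptKey {
    /// Map a wire key to its variant; adding a prompt means adding one arm here.
    pub fn from_key(key: &str) -> Option<Self> {
        match key {
            "greeting" => Some(Self::Greeting),
            "auth_enter_username" => Some(Self::AuthEnterUsername),
            "auth_enter_password" => Some(Self::AuthEnterPassword),
            _ => None,
        }
    }
}

/// Split raw "ivr" entries into recognised prompts and preserved unknown keys.
pub fn partition_keys(
    raw: &BTreeMap<String, String>,
) -> (BTreeMap<IvrPromptKey, String>, BTreeMap<String, String>) {
    let mut prompts = BTreeMap::new();
    let mut unknown = BTreeMap::new();
    for (key, value) in raw {
        match IvrPromptKey::from_key(key) {
            Some(prompt_key) => {
                prompts.insert(prompt_key, value.clone());
            }
            None => {
                // Preserved verbatim so a round-trip does not drop newer keys.
                unknown.insert(key.clone(), value.clone());
            }
        }
    }
    (prompts, unknown)
}
```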
Type-system note: where the I18nContent shape starts and stops. The published I18nContent<T> type alias in sequent-core::ballot.rs is HashMap<String, T> where T defaults to Option<String>, so the portal-facing presentation types use shapes like Option<I18nContent<I18nContent<Option<String>>>> (lang → key → leaf string). The IVR "ivr" value is a nested object, not a leaf string, so it does not fit that pre-existing shape. Three ways to reconcile this, in increasing blast-radius order:
1. Leak serde_json::Value everywhere. Have every IVR consumer (the admin editor, the SSML sanitizer, the Lambda resolver) hand-parse i18n[lang]["ivr"] as untyped JSON. Rejected: the "typed dispatch" selling point of the i18n structure evaporates for the IVR sub-tree, the published shape silently diverges from what the Rust types describe, and every consumer re-implements the same schema with slightly different bugs.
2. Validated boundary (chosen). Keep the serde_json::Value only at the serialization boundary and define a validator, validate_ivr_subtree above, that every consumer calls. The Rust types fully describe the sub-tree (TypedIvrScope, IvrPromptKey, IvrTemplate, IvrSpokenOverrides); the untyped value is an implementation detail of the two wire-boundary points (admin save and publication load). Cost: one extra parse per save/load, plus keeping the validator in lock-step with the prompt-key set; both small and localised.
3. Widen the leaf type of I18nContent<T> (e.g. to an untagged enum of String | Map), so the sub-tree fits natively under I18nContent<I18nContent<…>>. The right answer in a greenfield codebase but touches every existing consumer of I18nContent<I18nContent<…>> in sequent-core, admin-portal, voting-portal, and ballot-verifier. Tracked as a follow-up meta issue; option 2's validator is the exact migration boundary that work would need, so option 2 is not throwaway scaffolding: it is the seam. Not a blocker for the IVR MVP.
The validator in option 2 is the single authoritative description of what i18n[lang]["ivr"] may contain; the Rust type aliases above are its codomain. No domain code should accept or produce a serde_json::Value for this sub-tree outside of that one function.
Benefits of This Approach
- All IVR strings in one place - no separate ivr_prompts or ivr_overrides field
- Backward compatible - missing "ivr" key means no IVR prompts (use defaults)
- Follows existing pattern - same structure as "name", "alias", etc.
- Override-based - only spoken differences need to be entered; everything else falls back to portal text
- Extensible with typed well-known keys - adding a well-known prompt means one IvrPromptKey variant in sequent-core; deployment-specific custom keys ride the overflow unknown map with no code change (§7.2)
- Admin portal simplicity - edit within existing i18n editor
7.3 Example: Barrie-Style Full Configuration
ElectionEvent presentation (complex Barrie-style deployment with declaration, receipt, etc.):
{
"presentation": {
"ivr": {
"flow": [
{ "phase": "blacklist_check" },
{ "phase": "language_select" },
{ "phase": "announcement", "name": "welcome", "prompt_key": "greeting" },
{ "phase": "auth" },
{ "phase": "eligibility_check" },
{ "phase": "announcement", "name": "declaration", "prompt_key": "declaration_text", "accept_key": "2" },
{ "phase": "announcement", "name": "pre_voting_statement", "prompt_key": "pre_voting_statement" },
{ "phase": "ballot_loop", "receipt_format": "phonetic_hex_4" },
{ "phase": "goodbye" }
],
"retry_limits": { "auth": 3, "invalid_input": 3, "timeout": 3 },
"assistance_phone": "1-800-555-0199"
},
"i18n": {
"en": {
"name": "City of Barrie 2025 Municipal Election",
"ivr": {
"greeting": "Welcome to the phone voting service for the City of Barrie 2025 Municipal Election.",
"language_select": "For English, press 1. Pour le français, appuyez sur 2.",
"auth_enter_username": "Using your touch-tone phone, please enter your voter ID followed by the number sign key.",
"auth_enter_password": "Using your touch-tone phone, please enter your date of birth using two digits for the month and day, and four digits for the year. Please press the number sign key following your date of birth entry.",
"auth_failed": "Your voting credentials are not valid. Please refer to your voting instructions for the correct voter credentials and try again.",
"auth_max_attempts": "You seem to be having trouble. Please contact the Voter Assistance Line if you need assistance at {assistance_phone}.",
"blacklist_message": "Your telephone number is blocked. For English, please contact the Voter Assistance Line. Pour le français, veuillez communiquer avec la ligne d'assistance aux électeurs. Goodbye.",
"eligibility_check": "The system will now validate your eligibility to vote. One moment please.",
"not_eligible": "You are not authorized to vote in this election. Please refer to your voting instructions and contact the Voter Assistance Line if you need assistance. Goodbye.",
"not_active": "Your voting credentials have been deactivated. Please refer to your voting instructions and contact the Voter Assistance Line if you need assistance. Goodbye.",
"declaration_text": "In accordance with the Municipal Elections Act you are eligible to vote... [full legal declaration text]. Please press 2 to agree with the terms.",
"pre_voting_statement": "If you get disconnected or leave the phone voting process before you submit your ballot, you will need to hang up and call the phone voting system again. Your vote will only be cast once you confirmed all your selections AND submitted your ballot.",
"already_selected": "You have already selected this option. Please enter your next selection now.",
"blank_ballot_confirm": "You have not made a selection therefore your ballot will be cast as blank. To confirm your intent to cast a blank ballot, press the number sign key now. To repeat the list of options press the star key now.",
"decline_confirm": "By selecting 'Decline to vote' you will not vote for any candidate in this election. To submit your declined ballot, press the number sign key now. To not decline and start your selection, press zero key now.",
"summary_intro": "Here is a summary of your selections for {election_name}.",
"summary_item": "For contest {contest_number}, {contest_name}: you selected {candidate_name}.",
"summary_edit_prompt": "Press zero zero pound to submit, or press a contest number followed by pound to change your selection for that contest.",
"summary_edit_restart": "Changing your selection for {contest_name}. Your previous selections for this contest have been cleared.",
"receipt_info": "You are about to be given a 4-character ballot locator for each election. You may choose to write it down for your reference.",
"receipt_number": "Your ballot locator for {election_name} is {confirmation_number}. To repeat, please press the star key.",
"system_error": "We're experiencing technical difficulties. Please try your call again later.",
"invalid_input": "That is an invalid input. Please re-enter your selection.",
"timeout": "We have not detected any input or the number sign key.",
"goodbye": "Thank you for your participation. Goodbye."
}
},
"fr": {
"name": "Élections municipales de Barrie 2025",
"ivr": {
"greeting": "Bienvenue au service de vote téléphonique des élections municipales 2025 de Barrie.",
"auth_enter_username": "Veuillez entrer votre numéro d'électeur suivi de la touche carré.",
"auth_enter_password": "Veuillez entrer votre date de naissance en utilisant deux chiffres pour le mois et le jour, et quatre chiffres pour l'année. Appuyez sur la touche carré après votre saisie.",
"auth_failed": "Vos informations de vote ne sont pas valides. Veuillez vous référer à vos instructions de vote et réessayer.",
"goodbye": "Merci de votre participation. Au revoir."
}
}
},
"language_conf": {
"default_language_code": "en",
"enabled_language_codes": ["en", "fr"]
}
}
}
Simple deployment (voter ID + PIN, no declaration/receipt):
{
"presentation": {
"ivr": {
"flow": [
{ "phase": "language_select" },
{ "phase": "announcement", "name": "welcome", "prompt_key": "greeting" },
{ "phase": "auth" },
{ "phase": "ballot_loop" },
{ "phase": "goodbye" }
]
},
"i18n": {
"en": {
"name": "City of Toronto 2025 Elections",
"ivr": {
"greeting": "Welcome to the City of Toronto telephone voting system.",
"auth_enter_username": "Please enter your 8-digit voter ID followed by the pound key.",
"auth_enter_password": "Please enter your 4-digit PIN followed by the pound key.",
"auth_failed": "The voter ID or PIN you entered is incorrect.",
"goodbye": "Thank you for using the telephone voting system. Goodbye."
}
}
}
}
}
Note that neither example contains an ivr.auth section — the auth step list is no longer part of S3 config. It is fetched at session init from Keycloak's /realms/{realm}/ivr-config endpoint (see §5.1). The only auth-related data in S3 is the i18n for the well-known prompt keys (auth_enter_username, auth_enter_password, auth_enter_dob, etc. — see §5.1.3).
Same Lambda code handles both configurations. The Barrie deployment has declaration, blacklist, eligibility check, and a 4-character phonetic ballot locator receipt — all through config. The per-election summary/confirm/submit/receipt cycle is always part of ballot_loop and runs for every election. Which credentials are collected (voter ID + DoB for Barrie, voter ID + PIN for Toronto) is determined entirely by each realm's Direct Grant flow in Keycloak — not by the S3 config.
7.4 Admin Portal Integration
Scope of the admin portal for IVR. The admin portal is a text-only editor for IVR configuration. Concretely, an admin can:
- Edit IVR translations (prompt text and spoken-text overrides) per language, as plain text / SSML fragments — no audio playback, no synthesis, no listen button.
- Configure the flow — reorder / add / remove the big flow blocks (phases) and fill in the subset of per-block fields that are surfaced as typed form inputs.
- Edit the raw IVR JSON through the escape-hatch panel for anything not surfaced as a typed input.
Explicitly out of scope for the initial release: Polly audio preview, in-browser audio playback of prompts, in-browser flow dry-run / transcript preview, any other interaction that would require the admin-portal backend to call Polly or drive the Lambda's flow engine. These would each require a server-side adapter (Polly synthesis, a hosted step-ivr harness) that we are deliberately not building now. If any of these land later they are separate projects with their own design — not implied by this document.
What the voter will actually hear is verified through the step-ivr CLI (§15.2.1) and end-to-end test calls (§15.4), not through the admin portal.
When the telephone channel is enabled in voting_channels:
ElectionEvent settings → new "IVR Prompts" tab:
- Text fields for event-level prompts and optional spoken-text overrides — including the well-known auth prompt keys (auth_enter_username, auth_enter_password, auth_enter_dob, etc. — see §5.1.3)
- Language tabs from language_conf.enabled_language_codes
- Editor state is a TypedIvrScope produced by validate_ivr_subtree on load and re-validated on save (§7.2). Malformed placeholders, stray SSML tags, and unknown prompt keys surface as inline field errors before the form can be persisted — the untyped serde_json::Value never reaches the UI layer. Validation is a pure client-side check (the validator and sanitizer compile to WASM via sequent-core, see §7.2) — no server round-trip, no AWS call
ElectionEvent settings → "IVR Flow" tab:
- Flow pipeline editor (presentation.ivr.flow) — an ordered list of flow steps with drag-to-reorder, add, and remove controls. Each step surfaces type-specific configuration inline:
  - Announcement steps (announcement:welcome, announcement:declaration, announcement:pre_voting_statement, …) — edited structurally: pick the announcement key from a dropdown (sourced from the prompt catalogue), tick whether expect_input is required, and link to the matching prompt in the IVR Prompts tab. No JSON required for the common case
  - Other step types (auth, language_select, blacklist_check, eligibility_check, ballot_loop, goodbye, …) — if a step type exposes typed fields today, they appear as form inputs; otherwise the UI falls through to a raw-JSON editor for the step (see below)
- Raw-JSON editor (escape hatch). A collapsible "Edit JSON" panel shows the underlying presentation.ivr.flow object. Saving runs the same sequent-core deserializer the Lambda uses, so any malformed or unknown-phase input fails loudly at save time rather than at runtime mid-call. This lets us support new phase types or unusual shapes immediately in beyond/Lambda without waiting on an admin-portal release
- Retry limits — three separate numeric inputs for auth, invalid_input, timeout (stored under ivr.retry_limits, see §8.1), applied per election event (uniform across all elections in the event)
- Assistance phone number and other non-auth settings
Election settings → new "IVR Prompts" section:
- Text fields for election-specific prompts and optional IVR-only name / alias / description overrides
- Inherits languages from parent event
Contest and candidate editors:
- Optional IVR-only name / alias / description override inputs beside the standard portal text
- Empty override fields mean "reuse the portal translation"
Phone Blacklist management view — separate admin portal section (not per-election-event) where operators with the can_manage_phone_blacklist Keycloak permission can add/remove/annotate blacklisted E.164 numbers backed by the sequent_backend.ivr_phone_blacklist Hasura table. See §6.3 for the full data model, Harvest endpoints, and rationale for why the blacklist lives in Hasura rather than in the frozen ballot publication.
What is NOT configured in the admin portal — auth steps. The authentication flow (which credentials to collect, in what order, validated against what) is configured in the Keycloak admin UI for the election event's realm, under Authentication → Flows → IVR Direct Grant Flow. The admin portal intentionally does not duplicate this — there is only one source of truth for auth, and it is Keycloak.
For the common case, the admin portal can link directly to the Keycloak admin URL for the realm's Direct Grant flow to simplify the workflow.
7.5 Lambda Prompt Resolution (Fallback Chain)
Since prompts and spoken-text overrides are flat key/value maps, resolution is a simple key lookup with fallback. The resolver takes a prompt scope — the set of TypedIvrScope views visible on the current turn (candidate / contest / election / event, each produced by validate_ivr_subtree, §7.2) — and walks it narrowest-first, ending at a built-in default bundle. Each caller passes only the scopes that apply on its turn: ContestIntro fills election + contest; blacklist_check fills only event.
Prompt/template fallback order (narrowest first):
- candidate presentation.i18n[lang]["ivr"][key] — only meaningful for candidate-scoped prompts (e.g. a phonetic pronunciation override for a single candidate's name)
- contest presentation.i18n[lang]["ivr"][key]
- election presentation.i18n[lang]["ivr"][key]
- event presentation.i18n[lang]["ivr"][key]
- built-in default prompt
A missing key returns a visible sentinel (e.g. [missing prompt: <key>]) rather than an empty string, so a translator forgetting a key shows up loudly in a test call instead of producing silent dead air.
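The narrowest-first walk can be sketched as below. This is a minimal illustration, not the sequent-core implementation: the `Scope` alias, the `PromptScope` struct, and `resolve_prompt` are hypothetical names, and each scope is assumed to be already reduced to a flat key-to-template map for the active language.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one validated scope: prompt key -> template
/// for the active language.
type Scope = HashMap<String, String>;

/// Scopes visible on the current turn. Callers fill only what applies,
/// e.g. blacklist_check sets `event` and leaves the rest as None.
#[derive(Default)]
struct PromptScope<'a> {
    candidate: Option<&'a Scope>,
    contest: Option<&'a Scope>,
    election: Option<&'a Scope>,
    event: Option<&'a Scope>,
}

/// Walk the scopes narrowest-first, end at the built-in defaults, and
/// return a visible sentinel when no layer defines the key.
fn resolve_prompt(scope: &PromptScope, defaults: &Scope, key: &str) -> String {
    let layers = [
        scope.candidate,
        scope.contest,
        scope.election,
        scope.event,
        Some(defaults),
    ];
    for layer in layers.iter().flatten() {
        if let Some(template) = layer.get(key) {
            return template.clone();
        }
    }
    format!("[missing prompt: {}]", key)
}
```

With only the election and event scopes filled, a key defined in both resolves to the election value, and a key defined nowhere yields the sentinel string.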
Template interpolation. Resolved templates contain {placeholder} tokens (e.g. {candidate_name}, {ballot_locator}). Substitution happens after resolution, against a variables map supplied by the caller. See the design-review blockquote in §7.2 on SSML interpolation — placeholder content that may end up inside SSML must be escaped at the substitution point, not left to each prompt author.
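A sketch of substitution with escaping at the substitution point follows. The function names are hypothetical, and two behaviours are assumptions rather than decisions recorded in this document: unknown placeholders are left verbatim (so they are audible in a test call), and an unbalanced brace is emitted as-is. SSML markup in the template itself is untouched; only the substituted values are escaped.

```rust
use std::collections::HashMap;

/// Escape the characters that are significant in SSML/XML.
fn escape_ssml(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Replace `{name}` tokens with escaped values from `vars`.
/// Assumption: unknown placeholders are kept verbatim.
fn interpolate(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        match after.find('}') {
            Some(close) => {
                let name = &after[..close];
                match vars.get(name) {
                    Some(value) => out.push_str(&escape_ssml(value)),
                    None => {
                        out.push('{');
                        out.push_str(name);
                        out.push('}');
                    }
                }
                rest = &after[close + 1..];
            }
            None => {
                // Unbalanced brace: emit the remainder unchanged.
                out.push_str(&rest[open..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

Escaping inside `interpolate` means a candidate name containing `&` or `<` cannot break the SSML document, regardless of how the prompt author wrote the template.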
Spoken dynamic-text fallback is:
- entity presentation.i18n[lang]["ivr"][field]
- normal portal translation for that field
- default-language translation
- base non-i18n field
If the resolved value contains SSML markup, the renderer should preserve it and emit the final prompt as SSML rather than escaping the tags.
7.6 Using Existing i18n for Dynamic Content
Election, contest, and candidate names already have translation helpers in sequent-core that resolve from presentation.i18n. IVR reuses them directly: first check the optional IVR-only override at presentation.i18n[lang]["ivr"].name / alias / description on the relevant entity, then fall back to the standard portal helper. No new translation machinery.
Template variables and well-known prompt keys are listed in Appendix D.
8. Error Handling
8.1 Retry Logic
Retry budgets are configured per election event in
presentation.ivr.retry_limits (editable in the admin portal's IVR Flow
tab on the election event). The same budget applies to every election inside
the event — we explicitly do not expose per-election retry limits, because
retry semantics are voter-facing behaviour that should be uniform across the
ballot within a single call. Runtime counters are tracked in
IvrSession.retries: RetryCounters (see §4.1). Each class of retry has its
own counter and its own reset semantics.
| Error Class | Counter | Reset on | Default max | Action on exceed |
|---|---|---|---|---|
| Invalid DTMF input | retries.invalid_input | Any phase or sub-phase transition | 3 | Play invalid_input_final and disconnect |
| Input timeout | retries.timeout | Any successful DTMF capture | 3 | Play timeout_final and disconnect |
| Authentication failure | retries.auth | Successful authentication | 3 | Play auth_max_attempts and disconnect |
| API timeout (internal) | — | — | 2 retries | After retries, return IvrError::ApiTimeout → disconnect |
| API error (internal) | — | — | 1 retry | Return IvrError::ApiError → disconnect |
Keeping the counters separate means "3rd invalid DTMF while picking a
candidate" can never cross-contaminate "3rd auth attempt," and each sub-phase
gets its own fresh invalid_input budget. Timeout resets on any successful
DTMF (not just per-phase) so a voter who is pausing thoughtfully but still
pressing keys does not run down their timeout budget unfairly.
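The separate counters and their reset rules might look like the sketch below. Field and method names are illustrative, not the §4.1 definitions, and the sketch assumes the budget is exhausted on the Nth failure (count reaches the configured max); the table above does not say whether the boundary is inclusive.

```rust
/// Per-call retry counters, one per error class.
#[derive(Debug, Default, Clone, PartialEq)]
struct RetryCounters {
    invalid_input: u32,
    timeout: u32,
    auth: u32,
}

/// Budgets from presentation.ivr.retry_limits; defaults follow the table above.
#[derive(Debug, Clone, Copy)]
struct RetryLimits {
    invalid_input: u32,
    timeout: u32,
    auth: u32,
}

impl Default for RetryLimits {
    fn default() -> Self {
        RetryLimits { invalid_input: 3, timeout: 3, auth: 3 }
    }
}

#[derive(Debug, PartialEq)]
enum RetryOutcome {
    Retry,
    /// Budget exhausted: play this prompt, then disconnect.
    Exceeded { prompt_key: &'static str },
}

/// Assumption: the budget is exhausted when the counter reaches `max`.
fn bump(counter: &mut u32, max: u32, prompt_key: &'static str) -> RetryOutcome {
    *counter += 1;
    if *counter >= max {
        RetryOutcome::Exceeded { prompt_key }
    } else {
        RetryOutcome::Retry
    }
}

impl RetryCounters {
    fn on_invalid_input(&mut self, limits: &RetryLimits) -> RetryOutcome {
        bump(&mut self.invalid_input, limits.invalid_input, "invalid_input_final")
    }
    fn on_timeout(&mut self, limits: &RetryLimits) -> RetryOutcome {
        bump(&mut self.timeout, limits.timeout, "timeout_final")
    }
    fn on_auth_failure(&mut self, limits: &RetryLimits) -> RetryOutcome {
        bump(&mut self.auth, limits.auth, "auth_max_attempts")
    }
    /// Any phase or sub-phase transition gives a fresh invalid-input budget.
    fn on_phase_transition(&mut self) {
        self.invalid_input = 0;
    }
    /// Any successful DTMF capture resets the timeout budget.
    fn on_dtmf_captured(&mut self) {
        self.timeout = 0;
    }
    fn on_auth_success(&mut self) {
        self.auth = 0;
    }
}
```

Because each method touches exactly one field, a run of timeouts can never consume the auth budget, which is the cross-contamination property described above.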
8.2 Error States
Shape contract. The domain error is an enum split into two groups, both exhaustively matched at the handler boundary:
- Presented-to-voter errors — every variant carries the same pair: a static prompt_key (resolved to an i18n message at the adapter boundary) and a should_disconnect flag. This forces every voter-facing error through a uniform presentation contract: the domain never decides how to phrase something, and no variant carries a free-form string payload that could leak internal detail into a prompt. Variants needed today: authentication failed, voter not eligible, election closed, invalid input, max retries exceeded, session expired, vote rejected, API timeout, system temporarily unavailable (with an is_critical flag for alerting), system configuration error. SessionRaced is deliberately not in this list — per §4.1 it is handled internally via reload-and-decide and never surfaces as a voter-facing prompt; the only voter-visible fallout is the generic system_error disconnect on the degenerate double-failure arm, which is already covered by the internal-error group below.
- Internal / system errors — unknown phone number, invalid state, invalid phase index, transport failures by backend (Keycloak / Hasura / Harvest / S3 / DynamoDB). These are logged verbatim, then mapped to a single generic system_error prompt at the handler boundary; the voter never hears the raw message.
Keep the backend-classified transport errors (Keycloak / Hasura / Harvest / S3 / Dynamo) as enum variants, not strings — metrics and alerting rules key off the variant, not a parsed message. There is deliberately no UnknownPhaseType variant: with the typed FlowPhase enum (§4.1), unknown phase strings fail at JSON deserialization when the publication is loaded, never at runtime mid-call.
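One possible shape for the two-group enum is sketched below. It is a variation on the contract above: instead of each variant carrying the pair, a single exhaustive match derives it, which keeps the uniform presentation rule in one place. All type and variant names are illustrative. The prompt keys election_closed, session_expired, vote_rejected, and max_retries_exceeded are hypothetical, as are the should_disconnect values and the choice to route the last three voter variants to system_error.

```rust
/// Backends whose transport failures stay enum variants, so metrics and
/// alerting can key off the variant rather than a parsed message.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend { Keycloak, Hasura, Harvest, S3, DynamoDb }

/// Presented-to-voter group (names illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
enum VoterError {
    AuthenticationFailed,
    NotEligible,
    ElectionClosed,
    InvalidInput,
    MaxRetriesExceeded,
    SessionExpired,
    VoteRejected,
    ApiTimeout,
    SystemUnavailable { is_critical: bool },
    ConfigurationError,
}

/// Internal group: logged verbatim, never phrased to the voter.
#[derive(Debug, Clone, Copy, PartialEq)]
enum InternalError {
    UnknownPhoneNumber,
    InvalidState,
    InvalidPhaseIndex(usize),
    Transport(Backend),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum IvrError { Voter(VoterError), Internal(InternalError) }

#[derive(Debug, PartialEq)]
struct Presentation { prompt_key: &'static str, should_disconnect: bool }

/// Exhaustive mapping at the handler boundary. Adding a variant without
/// extending this match is a compile error.
fn present(err: &IvrError) -> Presentation {
    use VoterError::*;
    let (prompt_key, should_disconnect) = match err {
        // Every internal error collapses to the same generic prompt.
        IvrError::Internal(_) => ("system_error", true),
        IvrError::Voter(v) => match v {
            AuthenticationFailed => ("auth_failed", false),
            InvalidInput => ("invalid_input", false),
            NotEligible => ("not_eligible", true),
            // Hypothetical prompt keys from here on.
            MaxRetriesExceeded => ("max_retries_exceeded", true),
            ElectionClosed => ("election_closed", true),
            SessionExpired => ("session_expired", true),
            VoteRejected => ("vote_rejected", true),
            ApiTimeout | SystemUnavailable { .. } | ConfigurationError => ("system_error", true),
        },
    };
    Presentation { prompt_key, should_disconnect }
}
```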
9. Security Considerations
9.1 Network Security
- Lambda deployed in VPC with access to Keycloak, Hasura, and Harvest API
- Lambda IP whitelisted in Keycloak, Hasura, and Harvest (as noted in CTO notes)
- All API calls over HTTPS
- No sensitive data in CloudWatch logs (PINs, full phone numbers)
9.2 Data Protection
- PIN never stored in DynamoDB session
- JWT access tokens have short TTL (determined from exp claim after login; configurable in Keycloak, default 5 min); proactive refresh via TokenManager (see 5.1.9)
- Session data TTL: 1 hour (auto-cleanup)
- Phone numbers hashed in logs (see §9.2.1 for the full retention / salt policy)
9.2.1 PIPEDA-aligned phone-number retention
Caller phone numbers are personal information under PIPEDA. The IVR stack handles them at three different tiers, each with its own rule:
| Tier | What is stored | Retention | Notes |
|---|---|---|---|
| In-flight session (DynamoDB ivr-sessions) | Raw E.164 number, only for the duration of the call (needed for blacklist check, Hasura queries, and admin dashboard) | DynamoDB TTL = 1 h sliding (see §9.2 TTL blockquote); record hard-deleted | The record is keyed by contact_id, not by phone, so it is not queryable by phone after the call ends |
| Electoral audit log (Harvest / ImmuDB, see §9.3) | No phone number — only voter attestations keyed by voter id + ballot_id | Follows Harvest's existing electoral retention policy | Channel is identified by azp: ivr-voting on the JWT; the phone number never reaches this tier |
| Operational log (CloudWatch) | Salted SHA-256 hash of the E.164 number — never the raw value | 90 days (log group retention) then auto-deleted | See below for salt handling |
| Admin-portal dashboard (see §14.2) | Raw E.164 at rest in Hasura (for ivr_phone_blacklist and live per-call rows); masked on display | Live-call rows expire when their DynamoDB session expires; blacklist rows are operator-managed | Display format masks all but the last four digits, e.g. +1 ***-***-1234 |
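The dashboard masking rule can be illustrated with a small helper. This is a sketch only: `mask_e164` is a hypothetical name, and it assumes North American (+1, ten-digit) numbers, returning None for anything else so the caller has to choose an explicit fallback rather than leak digits.

```rust
/// Mask all but the last four digits of a +1 E.164 number for display.
/// Assumption: only NANP numbers (+1 followed by ten digits) are handled.
fn mask_e164(e164: &str) -> Option<String> {
    let digits = e164.strip_prefix('+')?;
    let is_nanp = digits.len() == 11
        && digits.starts_with('1')
        && digits.chars().all(|c| c.is_ascii_digit());
    if !is_nanp {
        return None;
    }
    // Safe to slice by byte index: all characters are ASCII digits.
    Some(format!("+1 ***-***-{}", &digits[7..]))
}
```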
Salt for the CloudWatch hash — per-tenant, rotated on each tenant's own calendar. The platform is shared across tenants, so on any given day some tenant is mid-election; a single global salt could never rotate without cutting some tenant's log timeline in half. The salt is therefore scoped per tenant, which is the smallest scope that makes "never rotate mid-election" enforceable (because "mid-election" is now something a tenant operator actually knows).
Storage and access.
- One Secrets Manager entry per (env, tenant_id), e.g. ivr/log-salt/prod/{tenant_id}. AWS Secrets Manager's built-in versioning is the rotation mechanism: AWSCURRENT is the active salt, AWSPREVIOUS is the last-rotated salt (retained while old logs are still in the 90-day window), and older versions are deleted on schedule.
- Port signature: PhoneHasher::hash(tenant_id, e164) -> (hash, salt_gen). The Lambda always has tenant_id before any log line that references the caller — phone-config resolution (§6.2) is the very first thing that runs. Hashing before tenant_id is known is not a case we need to support; operator-level CloudWatch entries about phone-config failures log the dialled DID, not the caller ANI.
- Per-container cache: HashMap<TenantId, (Salt, SaltGen)>, populated on first use per tenant, no TTL. A rotation only takes effect in containers that cold-start after it — which is exactly the drain behaviour we want (both generations coexist in logs for the drain window, both are tagged, and queries over that window must know to union the two generations; this is a feature, not a bug).
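The per-container cache can be sketched as follows. The names are hypothetical, and the Secrets Manager read is abstracted as a caller-supplied closure so the example stays self-contained; the SHA-256 step itself is omitted because it needs a hashing crate. The point illustrated is only "populate on first use per tenant, never refresh".

```rust
use std::collections::HashMap;

type TenantId = String;

/// Salt bytes plus the generation tag written to log lines,
/// e.g. "tenant-acme-202604".
#[derive(Clone, Debug, PartialEq)]
struct SaltEntry {
    salt: Vec<u8>,
    salt_gen: String,
}

/// Per-container cache with no TTL: a rotation is only observed by
/// containers that cold-start after it.
struct SaltCache<F>
where
    F: FnMut(&str) -> SaltEntry,
{
    /// Stand-in for the Secrets Manager lookup of the AWSCURRENT version.
    fetch: F,
    entries: HashMap<TenantId, SaltEntry>,
}

impl<F> SaltCache<F>
where
    F: FnMut(&str) -> SaltEntry,
{
    fn new(fetch: F) -> Self {
        SaltCache { fetch, entries: HashMap::new() }
    }

    /// Return the cached salt for the tenant, fetching it once on first use.
    fn get(&mut self, tenant_id: &str) -> &SaltEntry {
        if !self.entries.contains_key(tenant_id) {
            let entry = (self.fetch)(tenant_id);
            self.entries.insert(tenant_id.to_string(), entry);
        }
        &self.entries[tenant_id]
    }
}
```

A container that serves two tenants performs two fetches over its lifetime, which matches the read-cost estimate of cold containers times tenants seen per container.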
Rotation policy (per tenant).
- Cadence. Quarterly by default, and immediately on any suspected leak. A tenant with no natural dead zones can defer rotation — the 90-day CloudWatch retention still ages everything out on its own, so rotation is a privacy hardening on top of the retention floor, not the mechanism itself. A skipped rotation is not a compliance failure.
- Window. Rotation runs in a tenant-local dead zone (between that tenant's own elections), making "never rotate mid-election" a real rule rather than aspirational.
- Mechanics. A gitops IaC job (rotate-ivr-salt --tenant X) generates 32 random bytes, writes them to that tenant's SM entry (AWS automatically promotes the new value to AWSCURRENT and demotes the old to AWSPREVIOUS), and tags the new version with an ISO month stamp. No human sees the raw salt.
- Forgetting. A scheduled job deletes SM versions older than 90 days — i.e. older than the CloudWatch retention window they could be used to reverse. Until that step runs, an insider with SM read access could in principle brute-force old hashes; after it runs, the old hashes are irreversible even to the operator. This is the step that achieves PIPEDA "right to forget" semantics on the operational log tier.
- Compromise response. A salt leak at one tenant triggers immediate rotation for that tenant only — the blast radius of a leaked per-tenant salt is contained to that tenant's logs, not the platform, which is another reason per-tenant beats global here.
What this does not break.
- Cross-call correlation within a tenant + generation (the IvrRepeatedCallsSameNumber 30-minute window alert, §10.3) is preserved because the alert fires inside a single tenant's stream and the salt is stable across that 30-minute window in all realistic rotation cadences.
- Cross-tenant correlation was never a supported query — the platform already partitions operational data by tenant, and logs were already filtered by tenant_id for any meaningful search.
Cost sanity. Per-tenant Secrets Manager entries are ~$0.05/secret/month; at O(100) tenants that is ~$5/month baseline. Read cost is O(cold-containers × tenants-seen-per-container) — a handful of reads per container lifetime, negligibly small under SM's $0.05 / 10k-calls pricing. The Lambda already performs per-tenant bootstrap I/O (phone-config resolution, Keycloak realm discovery) so an additional per-tenant SM lookup fits the existing cold-start shape rather than adding a new I/O class.
What this gives us.
- A full phone number is reversible for at most 1 h after hang-up (DynamoDB session) plus the voter's own right to access under PIPEDA (the blacklist table, explicitly operator-managed).
- Operational analytics over CloudWatch logs work within a single retention window (same hash identifies the same caller across calls within the window), but cease to correlate across rotations — which is the right trade-off: brute-force / abuse investigation is a short-window concern, longitudinal tracking is not a legitimate use case.
- Electoral audit remains intact because it never stored the phone number in the first place.
Implementation notes. The hashing helper lives in sequent-core so the
Lambda and any batch export script use the same canonicalisation (E.164
normalisation before hashing) and the same per-tenant salt-lookup path. Log lines tag the current salt's
generation id as salt_gen: "{tenant_id}-{yyyymm}" (e.g. tenant-acme-202604) so dashboards can correctly group
within a tenant and a generation without needing to decrypt anything, and cross-generation queries are explicit about which salts they are unioning.
Sliding TTL with a hard ceiling. The IvrSession row's DynamoDB TTL is refreshed on every save_session so long calls don't lapse mid-flight, but it is capped at an absolute ceiling so a misbehaving contact flow or hostile client cannot keep a row alive forever by poking it every <1 h. The adapter computes:
ttl = min(
now + IDLE_WINDOW, // sliding component
session_started_at + SSO_MAX_LIFESPAN, // hard ceiling
)
- IDLE_WINDOW — 1 hour by default. A voter turn that takes longer than this is already well outside the intended UX envelope, and the next Lambda invocation will fail cleanly with SessionExpired.
- SSO_MAX_LIFESPAN — matches the Keycloak realm's ssoSessionMaxLifespan (10 h by default). Past this, the refresh token is dead and the Lambda cannot do anything useful anyway, so the row should evaporate with it.
Written on every PutItem under the same ConditionExpression: version = :expected guard from §4.1. This closes both failure modes: the row vanishing mid-call while the refresh token is still valid (original bug), and a row living beyond its own authenticated lifespan (the "calls forever" hole a pure sliding TTL would open).
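In Rust the TTL computation reduces to a few lines. This sketch uses the default values stated above as constants (in practice they would come from configuration), and works in Unix epoch seconds, which is what a DynamoDB TTL attribute stores. The function name is illustrative.

```rust
use std::time::Duration;

/// Defaults from the text above; assumed configurable in a real deployment.
const IDLE_WINDOW: Duration = Duration::from_secs(60 * 60);
const SSO_MAX_LIFESPAN: Duration = Duration::from_secs(10 * 60 * 60);

/// Compute the TTL attribute written on every save_session.
/// All values are Unix epoch seconds.
fn session_ttl(now: u64, session_started_at: u64) -> u64 {
    let sliding = now + IDLE_WINDOW.as_secs();
    let ceiling = session_started_at + SSO_MAX_LIFESPAN.as_secs();
    sliding.min(ceiling)
}
```

Early in a call the sliding component wins, so each turn pushes expiry one hour out. Once the call is within an hour of the ten-hour ceiling, the ceiling wins and further saves can no longer extend the row.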
9.3 Vote Integrity
- Votes only submitted after explicit confirmation (§3.3 VoteConfirm).
- Duplicate vote prevention via Harvest — today surfaced through CheckRevotesFailed / InsertFailedExceedsAllowedRevotes when max_revotes = 1 (see §5.4 for the full adapter mapping and the proposed dedicated DuplicateVote variant).
- Retry idempotency on vote submission (§4.1 concurrency): the Lambda encrypts the ballot once per (session, election), caches the resulting encrypted payload and its content hash (ballot_id) in the session, and reuses that exact payload on retry. Because ballot_id is the SHA-256 hash of the encrypted ballot content — validated by Harvest (computed_hash != input.ballot_id → BallotIdMismatch at packages/windmill/src/services/insert_cast_vote.rs) — an identical resubmission hashes to the same ballot_id and hits Harvest's existing duplicate check. Re-encrypting on retry would produce a new ballot_id (new ElGamal randomness → different ciphertext) and defeat the de-dup, so "encrypt once, store, resubmit" is a load-bearing invariant, not an optimization.
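The "encrypt once, store, resubmit" invariant can be sketched as a get-or-insert on the session. Type and method names are hypothetical, encryption is abstracted as a closure, and the in-memory map stands in for state that would in practice be persisted with the DynamoDB session row before the first submission attempt.

```rust
use std::collections::HashMap;

/// Encrypted ballot plus its content hash as submitted to Harvest.
#[derive(Clone, Debug, PartialEq)]
struct EncryptedBallot {
    ballot_id: String,
    payload: Vec<u8>,
}

/// Slice of the session relevant to vote submission (illustrative).
#[derive(Default)]
struct Session {
    /// election id -> ballot encrypted for that election in this session.
    encrypted: HashMap<String, EncryptedBallot>,
}

impl Session {
    /// Return the cached ballot for this election, running `encrypt`
    /// only if none exists yet. Retries therefore resubmit identical
    /// bytes and the same ballot_id.
    fn ballot_for_submit<F>(&mut self, election_id: &str, encrypt: F) -> &EncryptedBallot
    where
        F: FnOnce() -> EncryptedBallot,
    {
        self.encrypted
            .entry(election_id.to_string())
            .or_insert_with(encrypt)
    }
}
```

Because the closure is only invoked on a cache miss, a retry after a Harvest timeout cannot draw fresh encryption randomness by accident.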
Re-entrant voting across dropped calls. The ballot loop can submit to multiple elections in one call (Mayor, Council, School Board…). A dropped call after one ballot commits but before the next means the voter has partially voted. On redial the Lambda gets a fresh contact_id with no memory of what already succeeded, so the handler must reconstruct progress from Harvest:
- At ballot_loop entry, the election-selection sub-phase reads through the CastVoteHistoryPort (§3.5.2). The Hasura adapter behind that port runs the same queries the voting portal runs — sequent_backend_cast_vote (GetCastVotes) to enumerate ballots already cast by this voter, and sequent_backend_election (GetElections) for per-election metadata like num_allowed_revotes. Hasura's row-level permissions scope sequent_backend_cast_vote to the authenticated voter via the JWT's voter claims, so the Lambda sees exactly the same already-voted set the portal would show the same voter. No new Harvest endpoint is introduced; the Lambda reuses the platform's existing read surface. The selection UI renders the authoritative state: elections already submitted are marked "already voted" (and, if num_allowed_revotes = 0, not selectable); eligible elections are selectable as normal. This is the summary surface — voters don't need a separate end-of-call roll-up because the selection screen always reflects Hasura's truth, which is the same source check_previous_votes / check_revotes consult at insert time (the ones that raise CheckRevotesFailed / InsertFailedExceedsAllowedRevotes).

  Exit path — 0 at ElectionSelect. If every election is already voted (or none are currently selectable for any other reason), the voter presses 0 to exit the ballot loop and advance to the next outer phase (typically goodbye). This is the escape hatch for the dead-state case that would otherwise arise when skip_election_list=true, exactly one election is configured, and the voter already cast it on a prior call: without the exit path the voter would be dropped straight into LanguageSwitch → ElectionIntro → ContestLoop → … for an election they can no longer vote in. The skip_election_list shortcut in §3.3.1 is therefore gated on selectability — the skip only fires if the single election is still selectable at entry; otherwise ElectionSelect runs normally and the 0-to-exit path is available. See §3.3.3 ElectionSelect and §3.4 for the reserved-key semantics.
- Where max-revotes is disabled (one ballot per voter per election — the default for Canadian municipal ballots), this re-entrant path is the voter's only recovery route after a dropped call. Without it, a dropped call mid-ballot-loop means permanent disenfranchisement for the remaining elections.
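The selectability gate on the skip_election_list shortcut can be sketched as below. All names are illustrative. The revote rule is an assumption: an election stays selectable while the number of ballots already cast does not exceed num_allowed_revotes. The only case this document states outright is that an already-voted election with num_allowed_revotes = 0 is not selectable; Harvest remains the authority at insert time either way.

```rust
/// Per-election state reconstructed from the cast-vote history at
/// ballot_loop entry (illustrative shape).
struct ElectionState {
    id: String,
    num_allowed_revotes: u32,
    already_cast: u32,
}

impl ElectionState {
    /// Assumed rule: selectable if never voted, or if the ballots already
    /// cast have not used up the allowed revotes.
    fn is_selectable(&self) -> bool {
        self.already_cast == 0 || self.already_cast <= self.num_allowed_revotes
    }
}

#[derive(Debug, PartialEq)]
enum BallotLoopEntry {
    /// Shortcut: go straight to the single selectable election.
    SkipToElection(String),
    /// Run ElectionSelect; with nothing selectable only 0-to-exit applies.
    ElectionSelect { any_selectable: bool },
}

fn ballot_loop_entry(elections: &[ElectionState], skip_election_list: bool) -> BallotLoopEntry {
    let selectable: Vec<&ElectionState> =
        elections.iter().filter(|e| e.is_selectable()).collect();
    // The skip fires only if the single configured election is still selectable.
    if skip_election_list && elections.len() == 1 && selectable.len() == 1 {
        return BallotLoopEntry::SkipToElection(selectable[0].id.clone());
    }
    BallotLoopEntry::ElectionSelect { any_selectable: !selectable.is_empty() }
}
```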
Electoral audit log — existing pipeline, no new components. Sequent already has a tamper-evident audit pipeline for vote events: Harvest's /insert-cast-vote calls windmill::services::insert_cast_vote::try_insert_cast_vote, which invokes ElectoralLog::post_cast_vote → enqueues an ElectoralLogMessage via Celery/RabbitMQ → Windmill workers drain the queue → the message is written to ImmuDB. The IVR inherits this end-to-end: vote attempts, successes, and Harvest-rule rejections are written exactly as for portal votes, differentiated only by the azp: ivr-voting JWT claim and the VotingStatusChannel::TELEPHONE value already propagated through try_insert_cast_vote (see voting_channel: VotingStatusChannel at packages/windmill/src/services/insert_cast_vote.rs). No new Lambda → ImmuDB integration is needed — giving the Lambda direct ImmuDB write access would expand attack surface for no gain.
Call-lifecycle events (call started, auth attempted, abandoned) go to CloudWatch structured logs (§10.2) — those are operational, not auditable, and belong outside the tamper-evident ledger. If a future requirement surfaces that demands IVR-specific events in the electoral log (e.g. "voter began a session via TELEPHONE channel at T"), the clean extension is a new Harvest write endpoint that reuses the same Windmill/Celery/RabbitMQ/ImmuDB pipeline — not a parallel path from the Lambda.
Brute-force protection against hang-up-and-redial. The per-call retries.auth counter (§8.1) resets on every new contact_id, so without additional controls an attacker could redial to reset their attempt budget. Defense in depth:
- Keycloak user-level brute-force detection (primary). Set bruteForceProtected=true on the tenant realm with failureFactor, maxFailureWaitSeconds, and waitIncrementSeconds tuned for voice latency (the defaults assume sub-second web retries and are too aggressive for IVR). Keycloak locks the voter account after N failed attempts across all calls and all channels, so the portal and the IVR share a single lockout policy. When Keycloak returns user_disabled / account_temporarily_disabled, the Lambda plays a dedicated auth_locked prompt ("this account is temporarily locked; please contact support") and disconnects — never looping on "incorrect PIN."
- Phone blacklist (already in place, §6.3). Operators can hard-block a number via the ivr_phone_blacklist Hasura table. This is the right tool for known-abusive callers, not for automated rate-limiting.
- Alert on repeated calls from the same number. The CloudWatch operational log already records a salted SHA-256 of the caller phone (§9.2.1). A Prometheus rule on the salted_phone_hash dimension — e.g. "more than 5 calls from the same hash within 30 minutes" — fires a medium-severity Alertmanager alert to the same receiver tree as the rest of the IVR alerts (§10.3). Operators decide whether the pattern is a legitimate accessibility use case (a supporter calling on behalf of multiple voters) or abuse that warrants adding the number to the blacklist. This is detection-and-respond, not automated throttling — the cost of a false positive on the detection path is an ops page, not a disenfranchised voter.
A per-call DTMF cooldown is explicitly not added: it would punish voters with dexterity or accessibility challenges, and the controls above already close the bulk-guessing attack.
10. Monitoring & Logging
10.1 CloudWatch Metrics
| Metric | Description |
|---|---|
| ivr.calls.total | Total calls received |
| ivr.calls.completed | Calls that completed voting |
| ivr.calls.abandoned | Calls dropped before completion |
| ivr.auth.success | Successful authentications |
| ivr.auth.failure | Failed authentications |
| ivr.votes.cast | Votes successfully cast |
| ivr.votes.duplicate | Duplicate vote attempts |
| ivr.errors.api | API errors |
| ivr.latency.auth | Authentication latency |
| ivr.latency.vote | Vote submission latency |
10.2 Structured Logging
Each log line is a single structured JSON object. The required fields are:
- Timing / correlation — ISO-8601 timestamp, contact_id (for correlating a whole call across invocations), latency in ms for this turn.
- Who / where (privacy-aware) — a salted SHA-256 hash of the caller phone (never the raw number — see §9.1), tenant_id, and where applicable election_event_id / election_id. Nothing that could identify the voter on its own.
- What happened — an event discriminator (typed enum, not a free string). The set needed today: CallStarted, LanguageSelected, AuthAttempt / AuthSuccess / AuthFailed, ElectionSelected, VoteRecorded, VoteSubmitted, VoteRejected, CallCompleted, CallAbandoned, Error. Extend at the enum when a new operational question can't be answered by existing variants.
- Flow position — current phase and phase-internal state (for debugging stuck calls via CloudWatch Insights).
- Error detail — only on error events; never contains credentials, token values, or raw DTMF bytes.
Do not log: PINs, DOBs, any auth-step credential value, access/refresh tokens, raw phone numbers, ballot contents. Anything that would be considered voter-identifying or credential-adjacent must either be hashed with a rotated salt or dropped. See the electoral audit-log design-review blockquote in §9.3 — the operational log here is distinct from the electoral audit log, which has different retention and tamper-evidence requirements.
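A minimal sketch of how the required fields could map onto a typed log line, using only the standard library. The struct, its field names, and the hand-rolled JSON writer are assumptions for illustration; a real implementation would more likely use an existing serialization or tracing crate. The point it shows is structural: the type has no field for PINs, tokens, or raw phone numbers, so they cannot be logged through it by accident.

```rust
/// Event discriminator from the list above: a typed enum, not a free string.
#[derive(Debug, Clone, Copy)]
enum LogEvent {
    CallStarted, LanguageSelected, AuthAttempt, AuthSuccess, AuthFailed,
    ElectionSelected, VoteRecorded, VoteSubmitted, VoteRejected,
    CallCompleted, CallAbandoned, Error,
}

/// Hypothetical log-line shape; field names are illustrative.
struct LogLine<'a> {
    timestamp: &'a str,  // ISO-8601
    contact_id: &'a str, // correlates a whole call across invocations
    latency_ms: u64,
    phone_hash: &'a str, // salted SHA-256 hex, computed elsewhere
    tenant_id: &'a str,
    election_event_id: Option<&'a str>,
    event: LogEvent,
    phase: &'a str,
    error_detail: Option<&'a str>, // callers set this only on error events
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl LogLine<'_> {
    /// Renders the line as a single JSON object.
    fn to_json(&self) -> String {
        let mut fields = vec![
            format!("\"timestamp\":{}", json_str(self.timestamp)),
            format!("\"contact_id\":{}", json_str(self.contact_id)),
            format!("\"latency_ms\":{}", self.latency_ms),
            format!("\"phone_hash\":{}", json_str(self.phone_hash)),
            format!("\"tenant_id\":{}", json_str(self.tenant_id)),
            format!("\"event\":{}", json_str(&format!("{:?}", self.event))),
            format!("\"phase\":{}", json_str(self.phase)),
        ];
        if let Some(id) = self.election_event_id {
            fields.push(format!("\"election_event_id\":{}", json_str(id)));
        }
        if let Some(detail) = self.error_detail {
            fields.push(format!("\"error_detail\":{}", json_str(detail)));
        }
        format!("{{{}}}", fields.join(","))
    }
}
```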
10.3 Alerting
Alerts are configured to flow into the same Alertmanager + Slack + PagerDuty
pipeline that gitops already wires up for every cluster (see
gitops/unified/cluster-apps/<cluster>/prometheus/values.yaml —
slack-notifications, slack-warning, slack-medium-critical,
slack-pagerduty-critical receivers). We do not introduce a new alerting
channel for IVR.
Metric source. CloudWatch alarms by themselves do not reach the cluster Alertmanager. Two viable integrations — pick one and standardise:
1. CloudWatch → SNS → Alertmanager webhook. A lightweight receiver in the infra cluster converts SNS messages into Alertmanager alerts. Simplest path and closest to existing beyond patterns.
2. CloudWatch exporter → Prometheus scrape → PrometheusRule. Run cloudwatch-exporter (or YACE) as a scraped target in the infra cluster and write IVR alert rules as PrometheusRule CRDs alongside the existing rules for RabbitMQ / ImmuDB. Richer expression language, aligns IVR alerts with the rest of the stack.
Option 2 is the recommended direction because it lets alert severity,
routing, and silencing reuse the existing labels and receiver tree
(severity: critical → slack-pagerduty-critical, severity: warning →
slack-warning).
Alert catalogue (initial).
| Alert | Condition | Severity / Receiver |
|---|---|---|
| IvrLambdaErrorRateHigh | Lambda error rate > 2 % over 5 min | warning → slack-warning |
| IvrLambdaErrorRateCritical | Lambda error rate > 10 % over 5 min, during an active election window | critical → slack-pagerduty-critical |
| IvrLambdaLatencyHigh | p99 invocation latency > 5 s over 10 min | warning → slack-warning |
| IvrAuthFailureSpike | ivr.auth.failure rate > 3× baseline over 10 min | medium → slack-medium-critical (brute-force signal, §9.3) |
| IvrRepeatedCallsSameNumber | > 5 calls with the same salted_phone_hash within 30 min | medium → slack-medium-critical (possible abuse, §9.3; operator decides whether to blacklist) |
| IvrAbandonmentRateHigh | ivr.calls.abandoned / ivr.calls.total > 20 % over 15 min during election window | medium → slack-medium-critical (Polly outage, broken prompt, or bad flow) |
| IvrPartialSubmitRatio | completed elections / attempted elections per call < 0.9 rolling 30 min | medium → slack-medium-critical (multi-election partial-submit, §9.3) |
| IvrKeycloakUnreachable | sustained ivr.errors.api{backend="keycloak"} > 1/min for 5 min | critical → slack-pagerduty-critical |
| IvrHarvestUnreachable | sustained ivr.errors.api{backend="harvest"} > 1/min for 5 min | critical → slack-pagerduty-critical |
| IvrHasuraUnreachable | sustained ivr.errors.api{backend="hasura"} > 1/min for 5 min | critical → slack-pagerduty-critical |
| IvrDynamoSessionWriteErrors | DynamoDB conditional-write failure rate > 0.5 % over 10 min | warning → slack-warning (concurrency violation signal) |
| IvrNatGatewayErrorPortAllocation | ErrorPortAllocation > 0 for 5 min | critical → slack-pagerduty-critical (imminent NAT exhaustion) |
| IvrConnectConcurrentCallsNearQuota | active calls > 80 % of Connect service quota | warning → slack-warning |
| IvrNoCallsDuringElection | ivr.calls.total == 0 for 30 min while telephone_voting_status is OPEN | critical → slack-pagerduty-critical (dead-air canary) |
Election-window gating. Alerts tagged "during an active election window"
use a recording rule derived from Hasura's telephone_voting_status (scraped
via the same cloudwatch/harvest exporter path) so severity can escalate
only when an election is actually open — off-hours noise goes to warning
instead of paging oncall.
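The gating rule above can be stated as a small pure function. This is a sketch of the decision only; in the design it would be expressed as a recording rule and alert labels, not as Rust. The `gated` flag and both function names are hypothetical; the receiver names are the ones listed in this section.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Severity {
    Warning,
    Medium,
    Critical,
}

/// Maps a severity to the receiver names used in the alert catalogue.
fn receiver(sev: Severity) -> &'static str {
    match sev {
        Severity::Warning => "slack-warning",
        Severity::Medium => "slack-medium-critical",
        Severity::Critical => "slack-pagerduty-critical",
    }
}

/// An election-gated alert keeps its declared severity only while an election
/// is open; outside the window it is downgraded to a warning. Ungated alerts
/// are never changed.
fn effective_severity(declared: Severity, gated: bool, election_open: bool) -> Severity {
    if gated && !election_open {
        Severity::Warning
    } else {
        declared
    }
}
```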
Silencing. Maintenance windows (Keycloak upgrades, contact-flow redeploys) are silenced via the normal Alertmanager silence flow — no IVR-specific tooling needed.
Definitions live in gitops. All PrometheusRule definitions ship in
gitops/unified/cluster-apps/<cluster>/prometheus/resources/ivr-alerts.yaml
so severity/threshold changes go through the same PR/Atlantis flow as any
other alert change.
11. AWS Infrastructure
11.1 Required Resources
| Resource | Purpose |
|---|---|
| Amazon Connect Instance | IVR platform |
| Connect Contact Flow | Call routing and DTMF capture |
| Connect Phone Number(s) | Inbound calling |
| Lambda Function | IVR logic (Rust) |
| DynamoDB Table | Session state (ephemeral, per-call) |
| S3 Bucket (versioned) | Phone number → cluster/environment/tenant/event routing file (§6.2) — read-only from the Lambda |
| IAM Role | Lambda execution role |
| VPC | Network isolation |
| NAT Gateway | Outbound API access (multi-AZ — see note below) |
| CloudWatch Log Group | Lambda logs |
| CloudWatch Alarms | Error alerting |
| Secrets Manager | API credentials |
Multi-AZ NAT for reliability. A single NAT Gateway is a single-AZ SPOF: if the AZ hosting it degrades, every outbound Lambda call (Keycloak, Hasura, Harvest) fails and the IVR is offline for the duration. On an election day that is unacceptable. Explore deploying one NAT Gateway per AZ that the Lambda's VPC subnets span (typically two or three AZs in the chosen region), with the Lambda attached to private subnets in each AZ so AWS routes outbound traffic through the local-AZ NAT. Cost impact is roughly 2–3× the single-NAT cost (~$32/mo per NAT plus data-transfer) but removes the AZ SPOF. Decide before Phase 3 (Production Pilot) and reflect the decision in the cost model (§17).
11.2 Lambda Configuration
Runtime: provided.al2023 (custom runtime for Rust)
Architecture: arm64
Memory: 256 MB
Timeout: 30 seconds
VPC: Yes (for API access)
Environment Variables:
- DYNAMODB_SESSION_TABLE
- DYNAMODB_PHONE_CONFIG_TABLE
- IVR_CONFIG_CACHE_TTL_SECONDS # default 300; 0 disables the cache (§5.1.7)
- LOG_LEVEL
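As an illustration of the IVR_CONFIG_CACHE_TTL_SECONDS semantics (default 300, 0 disables the cache), the value could be interpreted as below. The function name is hypothetical, and falling back to the default on an unparseable value is an assumption, not something this document specifies.

```rust
use std::time::Duration;

const DEFAULT_TTL_SECS: u64 = 300;

/// Interprets the raw value of IVR_CONFIG_CACHE_TTL_SECONDS.
/// Returns None when caching is disabled (value 0).
/// Assumption: a missing or unparseable value falls back to the default.
fn cache_ttl(raw: Option<&str>) -> Option<Duration> {
    let secs = raw
        .and_then(|v| v.trim().parse::<u64>().ok())
        .unwrap_or(DEFAULT_TTL_SECS);
    if secs == 0 {
        None
    } else {
        Some(Duration::from_secs(secs))
    }
}
```

At startup this would be called as `cache_ttl(std::env::var("IVR_CONFIG_CACHE_TTL_SECONDS").ok().as_deref())`.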
Lambda region vs target cluster. The Lambda is deployed in a single AWS
region (chosen for Amazon Connect availability and proximity to the target
voter base — for Canadian deployments, ca-central-1 or us-east-1). It is
not co-located with any particular Sequent cluster. The per-phone-number
config record (§6.2) carries the cluster's Keycloak / Hasura / Harvest base
URLs, so a single Lambda deployment routes each call to whichever cluster
owns the dialled number — including clusters in other regions or clouds
(e.g. prod1-euw1, googleinfra-euw4). This keeps Amazon Connect + Lambda
as a single shared telephony-edge tier and avoids duplicating the IVR
stack per cluster. Cross-region egress cost is covered in §17.
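A rough sketch of the routing step this paragraph describes: the dialled number selects a record, and the record carries the endpoints of the owning cluster. The struct, its field names, and the in-memory map are assumptions for illustration; the actual record schema is defined in §6.2.

```rust
use std::collections::HashMap;

/// One per-phone-number config record. Field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct PhoneRoute {
    cluster: String,
    tenant_id: String,
    election_event_id: String,
    keycloak_url: String,
    hasura_url: String,
    harvest_url: String,
}

/// In-memory view of the routing file, keyed by dialled number (E.164).
struct RoutingTable {
    by_did: HashMap<String, PhoneRoute>,
}

impl RoutingTable {
    /// Resolves the dialled number to its target cluster record.
    /// An unknown number yields None; the caller decides how to end the call.
    fn resolve(&self, dialled_e164: &str) -> Option<&PhoneRoute> {
        self.by_did.get(dialled_e164)
    }
}
```

Because every backend URL comes from the resolved record, the same Lambda deployment can serve numbers owned by different clusters without per-cluster configuration.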
12. Amazon Connect Contact Flow Design
12.1 Flow Structure
Reading the diagram if you're new to Amazon Connect. A Connect contact
flow is an authored graph of blocks — each block performs a fixed
operation (play a prompt, capture DTMF, branch on a condition, invoke a
Lambda, etc.) and has a fixed set of output branches wired to whatever
follows. The graph is the entire runtime: there is no scripting language,
no shared in-memory state between blocks, and no way to do arithmetic or
data transformation outside an "Invoke Lambda" block. Data flows block-to-block
through contact attributes — a flat key/value map that persists for the
duration of the call and is the only thing Connect can pass into a
"Play Prompt" or "Invoke Lambda" block (hence the $.Attributes.prompt_text
reference on the Play node). Every one of this design's five Invoke-Lambda
calls returns its response as a set of attributes that the subsequent
Connect blocks read.
Why there are four invoke blocks inside the loop, not one. Connect's
"Get customer input" block has three hardwired output branches — DTMF Received, Timeout, Error — and you cannot merge them inside Connect
before calling Lambda, nor can you pass "which branch fired" as an attribute
to a single common invoke block. So each branch must terminate in its own
Invoke-Lambda node, and the Connect flow ends up with ProcessInput,
HandleTimeout, and HandleError as three separate nodes even though, from
the Lambda's point of view, each one is the same kind of event: one turn
of the phase engine, triggered by one input variant. ProcessStep is the
fourth — the no-input-expected case that still needs to advance the state
machine after an announcement-style prompt. From inside the handler, all
four (plus InitSession) are a single dispatch on enum LambdaInput { Init, NoInput, Dtmf(String), Timeout, Error }; the one-phase-per-invocation
contract in §3.5.3 still holds. The multiplication in the diagram is a
Connect-side authoring artifact, not five different handlers.
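The single-dispatch claim can be sketched as follows. The LambdaInput enum is the one quoted above; the `node` string is a hypothetical static parameter that each Invoke-Lambda block would send to identify itself, and both function names are illustrative.

```rust
#[derive(Debug, PartialEq)]
enum LambdaInput {
    Init,
    NoInput,
    Dtmf(String),
    Timeout,
    Error,
}

/// Maps the invoking Connect node to an input variant.
/// `node` is an assumed per-block parameter; `user_input` is the captured DTMF.
fn to_lambda_input(node: &str, user_input: Option<&str>) -> Option<LambdaInput> {
    match node {
        "InitSession" => Some(LambdaInput::Init),
        "ProcessStep" => Some(LambdaInput::NoInput),
        // A DTMF turn without captured digits is malformed, so yield None.
        "ProcessInput" => Some(LambdaInput::Dtmf(user_input?.to_string())),
        "HandleTimeout" => Some(LambdaInput::Timeout),
        "HandleError" => Some(LambdaInput::Error),
        _ => None,
    }
}

/// One handler for all five nodes: a single match on the input variant.
/// Returns a description of the turn; the digits themselves are not echoed.
fn handle_turn(input: &LambdaInput) -> String {
    match input {
        LambdaInput::Init => "start session".to_string(),
        LambdaInput::NoInput => "advance after announcement".to_string(),
        LambdaInput::Dtmf(d) => format!("process {} digit(s)", d.len()),
        LambdaInput::Timeout => "count a timeout retry".to_string(),
        LambdaInput::Error => "count an error retry".to_string(),
    }
}
```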
Other Connect-side constraints to know. Set Logging Behavior at the
entry point is contact-flow-level config (log retention, redaction policy)
that fires once per call and has no per-turn state. "Play Prompt" with
$.Attributes.prompt_text renders through Amazon Polly TTS — meaning the
Lambda can return SSML in that attribute and Polly will interpret it, which
is how this design supports phonetic ballot-ID readback and paced
announcements (§7). An "Invoke Lambda" block has an 8-second hard total
synchronous timeout — anything slower must either be chunked across turns
or pre-fetched into the session on a fast turn; the session model in §4.1
is deliberately shaped around that ceiling. And the contact-flow JSON is
treated as code in this design (§16.2), not as something to be hand-edited
in the Connect console, because the graph structure is the control flow
and a console edit is equivalent to an unreviewed source-code change.
12.2 Contact Flow Attributes
| Attribute | Description |
|---|---|
| prompt_text | Text-to-speech content |
| expect_input | Whether to capture DTMF |
| valid_inputs | Valid DTMF digits (advisory; see note below) |
| input_timeout | Seconds to wait |
| should_disconnect | End call flag |
| user_input | Captured DTMF input (inbound, set by Connect) |
valid_inputs is enforced in the Lambda, not by Connect. Amazon Connect's "Get customer input" block does not accept a per-invocation whitelist of allowed digits from contact attributes: its InputType=DTMF just captures whatever the caller presses (bounded by the static block configuration such as max digits and terminator). The IVR Lambda therefore validates user_input against valid_inputs on the next turn, and if the press is outside the set it returns a "Sorry, please try again" prompt with expect_input=true and the same phase state, i.e. retries are driven from the domain layer, not the contact flow. Treat valid_inputs as documentation of what the Lambda accepts, not as a Connect-level guard.
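A minimal sketch of that Lambda-side check, assuming valid_inputs is available as a list of accepted key sequences. The types and function name are hypothetical, and retry counting (ivr.retry_limits) is deliberately left out.

```rust
/// Subset of the response attributes relevant to a retry.
#[derive(Debug, PartialEq)]
struct Response {
    prompt_text: String,
    expect_input: bool,
}

#[derive(Debug, PartialEq)]
enum Validation {
    /// Input is in the accepted set; the phase engine may advance.
    Accepted(String),
    /// Input is outside the set; replay a retry prompt in the same phase.
    Retry(Response),
}

/// Validates the captured DTMF against the accepted set for this turn.
fn validate_input(user_input: &str, valid_inputs: &[&str]) -> Validation {
    if valid_inputs.contains(&user_input) {
        Validation::Accepted(user_input.to_string())
    } else {
        Validation::Retry(Response {
            prompt_text: "Sorry, please try again".to_string(),
            expect_input: true,
        })
    }
}
```

The phase state is untouched on the Retry path, which is what keeps retries in the domain layer rather than in the contact flow.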
13. Ballot Encryption
Design Decision: The IVR Lambda behaves as a voter from the platform's perspective.
The IVR will:
- Construct the ballot from voter selections (DTMF input)
- Encrypt the ballot using existing sequent-core encryption logic (same as voting-portal)
- Submit encrypted ballot via the existing /insert-cast-vote API endpoint
- Include JWT with azp: "ivr-voting" to identify the channel as TELEPHONE
Implementation:
- Lambda includes sequent-core as a dependency (already written in Rust)
- Use election's public key from election data (fetched during setup)
- Ballot construction follows same structure as online voting
- Encryption is identical to voting-portal - no special handling needed
Security Benefits:
- Vote secrecy maintained end-to-end
- No plaintext votes in API calls
- Consistent security model across all voting channels
- Existing audit mechanisms work unchanged
14. Admin Portal Integration
14.1 New Election Event Configuration
Add to Election Event settings:
- Phone Voting Enabled: Boolean toggle
- Phone Numbers: List of assigned phone numbers
- Phone Voting Start/End: Optional separate voting period
- Default Language: For greeting before language selection
14.2 New Admin Views
- Phone Voting Dashboard: Real-time call statistics
- Call Logs: Searchable call history (without PINs)
- Phone Number Management: Assign/unassign numbers
- IVR Flow / IVR Prompts tabs (per election event): flow pipeline editing plus ivr.retry_limits (auth, invalid_input, timeout) configuration; see §7.4
- Phone Blacklist: manage the Hasura-backed sequent_backend.ivr_phone_blacklist table (add/remove/annotate E.164 numbers, optionally scoped to a specific election event). Gated by the can_manage_phone_blacklist Keycloak permission. See §6.3 for the data model and Harvest endpoints
Per-election-event and per-election dashboards
Both the Election Event dashboard and the Election dashboard in the
admin portal gain two new widgets when the telephone channel is enabled.
They parallel the existing IP-address view (see
ListIpAddress.tsx)
and reuse the same patterns (react-admin List, filters, polling via
QUERY_POLL_INTERVAL_MS, configurable columns).
1. Voters by channel, over time. A time-series chart of ballots cast,
grouped by VotingStatusChannel (ONLINE, KIOSK, EARLY_VOTING,
TELEPHONE). Controls:
- Time window filter (last hour / last 24 h / custom range), defaulting to "since voting opened on this channel"
- Granularity bucket (1 min / 15 min / 1 h) — auto-selected from the window
- Cumulative toggle (stacked area = cumulative count per channel; line = rate per bucket)
- Channel toggle (show/hide each channel legend entry)
Data source: existing cast_vote records in Hasura, grouped by the
channel column (populated by Harvest via AzpClient::to_voting_channel
— straight from the JWT azp claim for kiosk and IVR, and from azp
combined with the area's early-voting window for portal clients; see
Appendix C.7 — no new pipeline). The telephone series starts populating as soon as the
TELEPHONE variant lands (see Appendix C). Available at both Election
Event scope (all elections within the event) and Election scope
(single election), same as the existing IP view.
2. Phone-number activity list (obfuscated). A list view of phone numbers that have placed calls, modelled on ListIpAddress.tsx. Columns:
| Column | Source | Notes |
|---|---|---|
| Phone (masked) | ivr_call_log.phone_e164 | Display-masked: +1 ***-***-1234 — only the country code and last four digits are shown in the UI. The raw number never leaves the server except inside blacklist actions |
| Country | Derived from E.164 country code | For Canadian deployments typically a single value; kept for consistency with the IP view |
| Call count | Aggregate | Total completed + abandoned calls from this number within the filter window |
| Vote count | Aggregate | Ballots cast from this number (joined via the voter id recorded on success) |
| Last call at | Max timestamp | |
| Election | election_presentation | Mirrors the IP view |
| Voter id | Aggregate | Present only where authentication succeeded; omitted by default in the DatagridConfigurable (same pattern as voters_id is omitted in the IP view) |
Filters: masked-phone substring search (matches only against the visible
last-four suffix server-side, to avoid exposing raw numbers through the
filter input), country, election. Actions: Add to blacklist (one-click
from a row, gated by can_manage_phone_blacklist) and Export (CSV
export carries the masked form, not the raw number — an explicit "Export
raw (privileged)" action requires a separate permission and produces an
audit entry).
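The display mask and the suffix-only filter described above could look like the sketch below. It assumes North American numbers ("+1" followed by ten digits), which fits the Canadian deployment but is a simplification of general E.164 parsing; both function names are hypothetical.

```rust
/// Masks a raw number as "+1 ***-***-1234".
/// Assumption: NANP numbers only ("+1" plus exactly ten digits); anything
/// else returns None rather than risk showing a partially masked value.
fn mask_e164(raw: &str) -> Option<String> {
    let digits = raw.strip_prefix("+1")?;
    if digits.len() != 10 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    Some(format!("+1 ***-***-{}", &digits[6..]))
}

/// Server-side filter: the query is matched only against the visible last
/// four digits, so the hidden digits cannot be probed through the filter.
fn matches_filter(raw: &str, query: &str) -> bool {
    match mask_e164(raw) {
        Some(masked) => {
            let last_four = &masked[masked.len() - 4..];
            !query.is_empty() && last_four.contains(query)
        }
        None => false,
    }
}
```

A query for "416" against +14165551234 does not match, because the area code is not part of the visible suffix.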
Data source: a new Hasura view sequent_backend.ivr_call_log populated by
the Lambda at call end. Row TTL follows §9.2.1 — live rows expire when
their DynamoDB session does; aggregate totals persist for the election
event's normal reporting window. Raw phone numbers are stored server-side
in Hasura, but column-level select permissions deny phone_e164 to all
roles; only a masked computed column and aggregate counts are selectable.
The "Add to blacklist" action calls a Hasura action that reads the raw
value inside Harvest and inserts into ivr_phone_blacklist without
surfacing the raw number to the client.
15. Testing Strategy
15.1 Unit Tests
- Each phase and sub-phase engine tested in isolation with mock ports (see §3.5.6)
- Every FlowPhase / BallotSubPhase transition covered, including error paths
- Prompt resolution / i18n fallback chain
- Input validation per phase
- RetryCounters reset semantics per phase transition
15.2 Record-and-Replay Session Tests
Since the engine is a pure function of (session state, input) → (session state, response), the most valuable integration layer is a record-and-replay harness: a test file is a sequence of (input, expected_prompt_key, expected_expect_input, expected_disconnect) tuples driven through a fake PhasePorts implementation. Client IVR specs (e.g. Barrie) are encoded directly as replay fixtures, so regressions against a known-good script fail loudly at CI time.
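A minimal sketch of such a replay harness. The `step` function is a toy stand-in for the real engine, and the phases and prompt keys are invented for the example; only the fixture tuple shape comes from the text above.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Phase {
    Language,
    Auth,
    Done,
}

#[derive(Debug, Clone)]
struct Session {
    phase: Phase,
}

#[derive(Debug, PartialEq)]
struct Response {
    prompt_key: &'static str,
    expect_input: bool,
    disconnect: bool,
}

/// Toy engine: a pure function (session, input) -> (session, response).
fn step(session: Session, input: Option<&str>) -> (Session, Response) {
    let ask = |key| Response { prompt_key: key, expect_input: true, disconnect: false };
    let bye = Response { prompt_key: "goodbye", expect_input: false, disconnect: true };
    match (session.phase, input) {
        (Phase::Language, Some("1")) | (Phase::Language, Some("2")) => {
            (Session { phase: Phase::Auth }, ask("auth.enter_pin"))
        }
        (Phase::Language, _) => (Session { phase: Phase::Language }, ask("language.retry")),
        (Phase::Auth, Some(_)) => (Session { phase: Phase::Done }, bye),
        (phase, _) => (Session { phase }, bye),
    }
}

/// One fixture row: an input plus the expected response fields.
struct FixtureStep {
    input: Option<&'static str>,
    expected_prompt_key: &'static str,
    expected_expect_input: bool,
    expected_disconnect: bool,
}

/// Drives the engine through the fixture and reports the first mismatch.
fn replay(fixture: &[FixtureStep]) -> Result<(), String> {
    let mut session = Session { phase: Phase::Language };
    for (i, s) in fixture.iter().enumerate() {
        let (next, got) = step(session, s.input);
        let expected = Response {
            prompt_key: s.expected_prompt_key,
            expect_input: s.expected_expect_input,
            disconnect: s.expected_disconnect,
        };
        if got != expected {
            return Err(format!("step {}: expected {:?}, got {:?}", i, expected, got));
        }
        session = next;
    }
    Ok(())
}
```

Because the engine takes and returns the session by value, the harness needs no mocks beyond the fixture itself.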
15.2.1 Text-In / Text-Out Harness
Because the flow engine is a pure function of (session state, input) → (session state, response) and Amazon Connect only ever sees prompt_text + valid_inputs + user_input (§4.2), the entire voter-facing flow can be exercised without Amazon Connect at all. A text harness substitutes the Connect adapter with a pair of streams: stdin/stdout (CLI), a file (replay fixture), or an HTTP endpoint (admin portal). The Lambda's domain logic, flow engine, phase engines, port adapters for Keycloak/Hasura/Harvest, prompt resolution, SSML rendering, retry counters, and ballot construction all run unchanged — only the Connect adapter is swapped.
Initial deliverables.
- Automated-test harness: a Rust module in the IVR Lambda crate that drives the engine from a fixture describing (input, expected_prompt_key, expected_expect_input, expected_disconnect) tuples, exactly as §15.2. The same harness also supports free-form scripting (send arbitrary input, assert on the rendered prompt text or the final session state) so scenarios that are not keyed off prompt keys (e.g. "after 3 invalid inputs the call ends") can be expressed naturally.
- step-ivr command-line tool: a small binary that boots the engine, points it at any environment's Keycloak/Hasura/Harvest (via the same config the Lambda consumes), and exposes an interactive REPL: the tool prints the rendered prompt text (optionally with SSML expanded, optionally with a Polly-synth preview), waits for a DTMF line on stdin, and loops. Non-interactive mode reads inputs from a fixture file and writes a transcript. Useful for: manual UX walkthroughs, reproducing production issues from a captured session, and local development when Connect is not available. Lives under beyond/packages/ivr-lambda/src/bin/step-ivr.rs (same crate as the Lambda itself; see §16.2).
Port substitutions. The harness runs in two modes:
| Mode | Session port | Keycloak / Hasura / Harvest ports |
|---|---|---|
| Hermetic (unit / CI) | In-memory HashMap<contact_id, IvrSession> | Recorded fixtures — deterministic, no network |
| Live (manual dev / ops dry-run) | In-memory or real DynamoDB (configurable) | Real endpoints with a real JWT — exercises the actual auth and Harvest path end-to-end |
Hermetic mode is what CI runs on every PR; live mode is what a developer or on-call engineer uses to dry-run a real election event's flow against real Keycloak without placing a phone call. The admin portal is not a consumer of the live mode (see §7.4) — it is a text-only editor in the initial release.
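The hermetic session port from the table could be as small as the sketch below. The trait name, the `version` field, and the version-based conditional write are assumptions meant to mirror the conditional-write guards of §4.1, not the actual session schema.

```rust
use std::collections::HashMap;

/// Hypothetical minimal session; the real IvrSession is defined in §4.1.
#[derive(Debug, Clone, PartialEq)]
struct IvrSession {
    version: u64,
    phase: String,
}

#[derive(Debug, PartialEq)]
enum SaveError {
    VersionConflict,
}

/// Port the engine uses to load and persist session state.
trait SessionPort {
    fn load(&self, contact_id: &str) -> Option<IvrSession>;
    /// Conditional write: succeeds only if the stored version still equals
    /// `expected_version` (0 when no session exists yet).
    fn save(&mut self, contact_id: &str, expected_version: u64, next: IvrSession)
        -> Result<(), SaveError>;
}

/// Hermetic implementation backed by an in-memory map keyed by contact_id.
#[derive(Default)]
struct InMemorySessions {
    sessions: HashMap<String, IvrSession>,
}

impl SessionPort for InMemorySessions {
    fn load(&self, contact_id: &str) -> Option<IvrSession> {
        self.sessions.get(contact_id).cloned()
    }

    fn save(&mut self, contact_id: &str, expected_version: u64, next: IvrSession)
        -> Result<(), SaveError> {
        let current = self.sessions.get(contact_id).map(|s| s.version).unwrap_or(0);
        if current != expected_version {
            return Err(SaveError::VersionConflict);
        }
        // Bump the version on every successful write.
        let stored = IvrSession { version: expected_version + 1, ..next };
        self.sessions.insert(contact_id.to_string(), stored);
        Ok(())
    }
}
```

A live-mode adapter would implement the same trait against DynamoDB, so the engine code does not change between modes.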
What this harness is not. It does not exercise Amazon Connect itself (the contact flow JSON, DTMF collection block behaviour, Polly voice synthesis quality, telephony jitter). Those remain the job of §15.4 end-to-end tests. The harness covers everything on the Lambda side of the Connect boundary, which is where essentially all of the risk lives.
15.3 Integration Tests
- Keycloak authentication via ROPC against a test realm
- Contract test between the ivr-config-resource Keycloak extension and the Lambda: spin up Keycloak with a representative Direct Grant flow configuration and assert the /ivr-config response shape matches what the Lambda expects. The test covers both the authenticated happy path (request carries the ivr-service service-account token with the can_read_phone_blacklist role, see §C.8.b) and the negative cases (no token → 401; wrong-audience/voter token → 401/403; missing role → 403) so auth-shape drift between the two sides is caught alongside response-shape drift
- Harvest API /insert-cast-vote
- DynamoDB session round-trip
15.4 End-to-End Tests
- Full voting flow simulation via Amazon Connect test calls
- Multi-language paths
- Error scenarios
- Timeout handling
15.5 Load Testing
- Concurrent call simulation: must actually drive concurrent telephone calls into the Connect instance, not just parallel Lambda invocations; only the former exercises the Connect per-instance concurrent-calls quota (§17.4). Run this after AWS has granted the quota-increase ticket, so the test verifies the raised quota rather than the default of 10
- API latency under load
- DynamoDB throughput
16. Deployment Strategy
16.1 Phased Rollout
Phase 1: Development
- Local testing with mocked Amazon Connect
- Integration with dev Keycloak/Harvest
Phase 2: Staging
- Full Amazon Connect setup in staging
- Test phone number provisioned
- End-to-end testing
Phase 3: Production Pilot
- Single municipality deployment
- Limited voter pool
- Close monitoring
- AWS Connect concurrent-calls-per-instance quota raised via Service Quotas / AWS support ticket before the pilot's voting window opens — the default of 10 is insufficient for any real election (§17.4). Budget several business days of AWS lead time
Phase 4: Full Rollout
- All municipalities enabled
- Automated provisioning
- Operational runbooks
- Connect quota reviewed per municipality ahead of each election; a single shared Connect instance accumulates concurrent load across simultaneous elections, so the raised quota must cover the combined peak, not the largest single event
16.2 Repository Layout & GitOps
All paths in this section are proposed, not existing. The long-term IVR
stack is deliberately split across three repositories so that code lives
near its domain and instantiation lives in GitOps, matching how every other
Sequent service is shipped. The initial MVP that exists today lives in
playground/ivr/, where the Rust Lambda, Terraform, and Amazon Connect
contact-flow prototype are kept together for fast iteration. The repository
split below describes the target steady-state layout once that MVP is
promoted into the main Sequent repos — none of the target paths exist yet.
Current state of the target locations (2026-04):
- beyond/packages/ today contains only ballot-audit/. There is no keycloak-extensions/ tree in beyond, no ivr-lambda/, and no ivr-contact-flows/. Every existing Keycloak extension (conditional-authenticators, message-otp-authenticator, voter-enrollment, sequent-theme, custom-event-listener, url-truststore-provider, aws-ses-email-sender-provider, security-question-authenticator, dummy-email-sender-provider) lives in step/packages/keycloak-extensions/, not in beyond. The table below puts IVR extensions under beyond/packages/keycloak-extensions/ on the working assumption that newly-added, non-core Sequent extensions belong in beyond, but that split is an unmade design decision. A reasonable alternative is to keep ivr-config-resource and IvrDobAuthenticator in step/packages/keycloak-extensions/ next to the existing extensions and defer the beyond split to a broader reorganisation. Pick one consciously in the promotion ticket.
- gitops/iac-aws/ today contains cluster/, rds/, vpc/, vpc-peering/, client-apps-setup/, client-apps-setup-infra-cluster/, client-postgres-init/, tf-modules/. The ivr/<env>/ layout below is proposed as parallel to those; it does not exist.
- gitops/unified/global-config-apps/ today holds one directory per Argo app (admin-portal, harvest, keycloakx, hasura, windmill, voting-portal, etc.). No ivr/ subdir exists; the phone-map.yaml file below is new.
| Artifact | Repo | Path (proposed — none exist today) | Why |
|---|---|---|---|
| IVR Lambda (Rust) source | beyond (or step — see note above) | beyond/packages/ivr-lambda/ | Source of truth for the Lambda code. If placed in beyond, the crate is pulled into step's Cargo workspace as a workspace member (via a path reference from the beyond checkout, or a vendored/submoduled include) so it compiles against the exact same sequent-core revision that produces the portal WASM — ballot construction and encryption therefore cannot drift between channels. step owns the compilation and release artifact; beyond owns the code. If placed in step, the workspace reference is direct |
| ivr-config-resource Keycloak extension (Java) | beyond (or step; see note above) | beyond/packages/keycloak-extensions/ivr-config-resource/ or step/packages/keycloak-extensions/ivr-config-resource/ | If beyond: forms a new keycloak-extensions/ tree there, pulled into the Keycloak image build (see §16.3.2). If step: sits alongside existing extensions with no cross-repo build plumbing needed |
| IvrDobAuthenticator (if needed) | same as above | beyond/packages/keycloak-extensions/ivr-dob-authenticator/ | Same placement decision as ivr-config-resource |
| Amazon Connect contact-flow JSON (source of truth) | beyond | beyond/packages/ivr-contact-flows/<flow-name>.json | Treated as code: PR-reviewed, versioned, diffed. Each flow is referenced by a stable name from IaC. New directory |
| IaC to instantiate Connect instance, flows, phone numbers, Lambda alias, DynamoDB session table, S3 routing bucket, NAT, CloudWatch alarms | gitops | gitops/iac-aws/ivr/<env>/ | GitOps owns per-environment parameters (which region, which phone numbers, which cluster endpoints). Proposed as a new peer of iac-aws/rds/, iac-aws/vpc/ |
| Per-phone-number routing records (source of truth) | gitops | gitops/unified/global-config-apps/ivr/phone-map.yaml | Each record maps a DID to (cluster, tenant, event). Change = PR in gitops; Atlantis apply renders the YAML to ivr-phone-config.json and uploads it to the routing bucket (§6.2) with S3 versioning preserving every prior revision. YAML is the authored format, JSON in S3 is the deployed artifact. New directory + file |
Lambda deployment boundary. The Lambda is deployed once per region that hosts an Amazon Connect instance (today: one region, covering all deployments). It is decoupled from Sequent clusters — a single Lambda deployment can dispatch calls to any cluster in any region by reading the cluster endpoints from the phone-config file in S3 (§6.2). This keeps the IVR telephony edge as a shared tier, the way the Sequent CDN / edge services already work.
Contact-flow versioning discipline. The contact-flow JSON in beyond
is the source of truth. The gitops IaC reads the JSON at apply time
(e.g. via Terraform file() or a released beyond artifact version) and
calls aws_connect_contact_flow to create/update the flow in the target
Connect instance. If an operator edits a flow in the Connect console for
debugging, the ritual is: export the JSON, PR it into beyond, and
re-apply from gitops. The console is never the source of truth.
Promotion flow. A change that touches all three layers promotes in
order: beyond merges the IVR-lambda source / Keycloak extension / contact-flow JSON change → step pulls the updated beyond revision into its workspace, builds the Lambda artifact, and releases it → gitops PR bumps the referenced Lambda version and (where relevant) the contact-flow or Keycloak-extension version, then applies via Atlantis. This matches the existing release cadence for the admin-portal / voting-portal stack.
16.3 Build & Packaging
step's release pipeline (.github/workflows/release.yml → reusable_build_push.yml) builds every shipped service as a Docker image and pushes it to the shared ECR registry (AWS_ECR_REGISTRY_GLOBALDOT) tagged with SHORT_SHA + the release tag. The IVR introduces two deltas on that pipeline.
16.3.1 IVR Lambda — new ECR image
Yes — add a new ECR package. The Lambda is a net-new deployable and must ship the same way every other service does, so it plugs directly into the existing matrix in reusable_build_push.yml.
| Field | Value |
|---|---|
| service | ivr-lambda |
| context | packages |
| file | packages/ivr-lambda/Dockerfile.prod (Dockerfile lives in step, sources pulled from the beyond-owned crate; see §16.2) |
| Base image | public.ecr.aws/lambda/provided:al2023 (Lambda custom-runtime base) |
| Architecture | linux/arm64 (matches §11.2) |
| Registry | ${AWS_ECR_REGISTRY_GLOBALDOT}/ivr-lambda:<SHORT_SHA> + :<release-tag> |
The Lambda is deployed as a container image rather than a ZIP artifact because (a) it reuses the existing ECR + docker/build-push-action plumbing with zero new secrets or runners, (b) container-based Lambda publishes are idempotent and version-pinnable from gitops (aws_lambda_function.image_uri = "${ecr}/ivr-lambda:<tag>"), and (c) the existing buildcache-backed layer caching in reusable_build_push.yml applies to it without modification.
Dockerfile outline: multi-stage — stage 1 uses cargo-lambda (or cargo build --release --target aarch64-unknown-linux-gnu with a bootstrap entrypoint) against the step Cargo workspace, which transitively compiles the beyond-hosted ivr-lambda crate against the workspace's pinned sequent-core. Stage 2 copies the bootstrap binary into /var/task/ on the Lambda base image.
Gitops deployment reads the tag from the same version-bump PR described in the promotion flow above and applies aws_lambda_function pointing to image_uri = ...:<tag>.
16.3.2 Keycloak image — pulling extensions from beyond
Today packages/Dockerfile.keycloak builds the Keycloak image by copying a local ./keycloak-extensions/ tree into a Maven build stage and then copying the resulting JARs (one per extension: voter-enrollment, message-otp-authenticator, conditional-authenticators, sequent-theme, custom-event-listener, url-truststore-provider, aws-ses-email-sender-provider, security-question-authenticator, dummy-email-sender-provider) into /opt/keycloak/providers/.
This subsection only matters if the §16.2 placement decision puts the new ivr-config-resource extension (and optionally ivr-dob-authenticator) in beyond rather than next to the existing extensions in step/packages/keycloak-extensions/. If they stay in step, the existing build picks them up with no changes — add the new module directories, extend the JAR-copy list, done. The rest of this subsection covers the beyond-placement case, where the Keycloak image build must reach into a new (to-be-created) beyond/packages/keycloak-extensions/ tree to pick them up.
Pick one of two integration patterns — they are equivalent for correctness, so the choice is about how beyond integrates into step's build more broadly:
1. Source-level include (submodule / workspace pull). beyond's keycloak-extensions/ subtree is made available inside the step checkout at build time (git submodule, sparse clone, or whatever mechanism step adopts for pulling in the beyond-owned Rust IVR crate; they should use the same mechanism). Dockerfile.keycloak's first stage adds COPY ./beyond/keycloak-extensions/ivr-config-resource/ /build/keycloak-extensions/ivr-config-resource/ (plus ivr-dob-authenticator if present) and extends the JAR-copy list in the second stage:

       COPY --from=spis-build \
         /build/keycloak-extensions/ivr-config-resource/target/sequent.ivr-config-resource.jar \
         /build/keycloak-extensions/ivr-dob-authenticator/target/sequent.ivr-dob-authenticator.jar \
         /opt/keycloak/providers/

2. Pre-built JAR artifact from beyond. beyond has its own release pipeline that builds the Keycloak extensions and publishes them as a versioned OCI artifact (or Maven package). A COPY --from=<pinned-beyond-image> in Dockerfile.keycloak pulls in the JARs directly. Promotion order becomes: beyond publishes artifact version → step bumps the pinned artifact version in Dockerfile.keycloak (or an ARG) → step releases a new Keycloak image.
Pattern 1 is simpler and matches today's monorepo feel; pattern 2 is more rigorous in isolating the build graphs and maps 1:1 onto how the Rust IVR crate could also be pulled in. Use the same pattern for both Rust and Java to keep the two pipelines symmetric.
Either way, the java_test.yml workflow that currently runs mvn verify on packages/keycloak-extensions/pom.xml must also verify the beyond-hosted extensions (or be reorganised so those tests run in beyond's own CI and step consumes a tested artifact). Don't let the ivr-config-resource JAR ship untested through the integration.
Nothing else in the Keycloak image changes. The realm-template changes (new ivr-voting client, new ivr-service client with its service-account role mapping for can_read_phone_blacklist, and the Direct Grant flow override — see Appendix C.8) are data, not code; they flow through the existing realm-bootstrap mechanism the same way any other Keycloak realm change does. The ivr-service client_secret is provisioned the same way other shared secrets are: the bootstrap writes the realm with a placeholder, the operator seeds AWS Secrets Manager once per environment, and each realm's ivr-service secret is reset to match via a scripted admin-API call — no secret ever committed to git.
16.3.3 Summary
- IVR Lambda: new ECR package (ivr-lambda), new row in the reusable_build_push.yml matrix, new Dockerfile in packages/ivr-lambda/. Released on the same cadence and tag as the rest of step.
- Keycloak image: no new image; the existing keycloak ECR package continues to be the sole Keycloak artifact. What changes is its build input: the Dockerfile.keycloak build stage picks up the new ivr-config-resource extension (and optional ivr-dob-authenticator) from whichever repo §16.2 places them in (step/packages/keycloak-extensions/ requires no new plumbing; beyond/packages/keycloak-extensions/ requires the cross-repo integration in §16.3.2). Same image, expanded set of bundled JARs.
- gitops: references both the new ivr-lambda ECR tag and the existing keycloak ECR tag (the latter is already in gitops; only the tag bump is new).
17. Cost Considerations
All numbers below are list-price AWS as of the most recent published rates
for ca-central-1; FX, private pricing, and committed-use discounts are
ignored. Rates change — treat this model as a sanity-check, not a quote.
17.1 Per-Call Assumptions
Realistic reference call for a Canadian municipal ballot (Mayor + Council + School Board, ~15 contests total, English/French readback, one re-listen):
| Parameter | Value | Rationale |
|---|---|---|
| Call duration | 9 min | 1 min auth + greeting, 7 min ballot readback + selection, 1 min summary + submit + receipt |
| Lambda invocations | ~60 | one per DTMF press / timeout / announcement transition |
| Avg Lambda duration | 400 ms | cold-start ≤2%; most invocations are pure-compute + 1 DynamoDB read/write |
| DynamoDB requests | ~120 | 2 per Lambda invocation (read + conditional write) |
| S3 publication fetch | 1 | once per call, cached in-process thereafter |
| Polly characters | ~12 000 | mixed English/French readback, with ~15% re-listens |
| CloudWatch log volume | ~50 KB | structured JSON, one line per Lambda turn + error detail |
17.2 Per-Call Cost Breakdown (ca-central-1, list price)
| Line item | Unit rate | Quantity | Cost |
|---|---|---|---|
| Amazon Connect voice (inbound) | $0.018/min | 9 min | $0.162 |
| Amazon Connect DID usage (per-minute) | $0.004/min (toll) | 9 min | $0.036 |
| Lambda invocations | $0.20 / 1M | 60 | ~$0.000012 |
| Lambda compute (256 MB, arm64) | $0.0000133/GB·s | 60 × 0.4 s × 0.25 GB = 6 GB·s | ~$0.00008 |
| DynamoDB on-demand (read + write avg) | ~$0.625 / 1M req (blended) | 120 | ~$0.000075 |
| S3 GET | $0.0004 / 1 000 | 1 | negligible |
| Polly Neural TTS | $16 / 1M chars | 12 000 | $0.192 |
| CloudWatch Logs ingestion | $0.76 / GB | 50 KB | ~$0.000038 |
| Cross-region egress (Lambda → cluster in another region) | $0.02 / GB | ~0.5 MB per call | ~$0.00001 |
| Total per 9-min call | | | ~$0.39 |
Polly Standard TTS (not Neural) is ~$4 / 1M chars and drops the Polly line item to ~$0.05, bringing the total to ~$0.25, but Neural voices are materially more intelligible for older voters and worth the premium for a public-election channel. A re-listen-heavy call (voter re-listens to every contest) pushes Polly characters to ~20 000 (~$0.32 at Neural rates) and the total to ~$0.52.
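The arithmetic behind the table can be reproduced with a small model, which makes it easier to re-run when a rate or assumption changes. This is a minimal sketch: the type and function names (Rates, CallProfile, rates, reference_call, per_call_cost) are hypothetical, the numbers are copied from §17.1 and §17.2, and the S3 GET and cross-region egress lines are left out because the table rates them as negligible.

```rust
/// Unit rates in USD, copied from the §17.2 table (list price, ca-central-1).
pub struct Rates {
    pub telephony_per_min: f64, // Connect inbound voice + DID usage
    pub lambda_per_invocation: f64,
    pub lambda_per_gb_second: f64,
    pub dynamodb_per_request: f64,
    pub polly_per_char: f64,
    pub logs_per_gb: f64,
}

/// Per-call quantities, copied from the §17.1 assumptions.
pub struct CallProfile {
    pub minutes: f64,
    pub lambda_invocations: f64,
    pub lambda_avg_seconds: f64,
    pub lambda_memory_gb: f64,
    pub dynamodb_requests: f64,
    pub polly_chars: f64,
    pub log_gb: f64,
}

/// Rates with the Polly tier left as a parameter (16.0 = Neural, 4.0 = Standard).
pub fn rates(polly_usd_per_million_chars: f64) -> Rates {
    Rates {
        telephony_per_min: 0.018 + 0.004,
        lambda_per_invocation: 0.20 / 1e6,
        lambda_per_gb_second: 0.0000133,
        dynamodb_per_request: 0.625 / 1e6,
        polly_per_char: polly_usd_per_million_chars / 1e6,
        logs_per_gb: 0.76,
    }
}

/// The 9-minute reference call, with the Polly character count as a parameter.
pub fn reference_call(polly_chars: f64) -> CallProfile {
    CallProfile {
        minutes: 9.0,
        lambda_invocations: 60.0,
        lambda_avg_seconds: 0.4,
        lambda_memory_gb: 0.25,
        dynamodb_requests: 120.0,
        polly_chars,
        log_gb: 50e3 / 1e9, // ~50 KB of structured logs
    }
}

/// Sum of the non-negligible variable line items for one call, in USD.
pub fn per_call_cost(r: &Rates, c: &CallProfile) -> f64 {
    let telephony = r.telephony_per_min * c.minutes;
    let gb_seconds = c.lambda_invocations * c.lambda_avg_seconds * c.lambda_memory_gb;
    let lambda = r.lambda_per_invocation * c.lambda_invocations
        + r.lambda_per_gb_second * gb_seconds;
    let dynamodb = r.dynamodb_per_request * c.dynamodb_requests;
    let polly = r.polly_per_char * c.polly_chars;
    let logs = r.logs_per_gb * c.log_gb;
    telephony + lambda + dynamodb + polly + logs
}
```

Telephony and Polly dominate the result; everything else combined stays well under a tenth of a cent per call, which is why the optimisation effort in §17.5 focuses on prompt length and voice tier.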
17.3 Fixed Monthly Costs
| Line item | Rate | Notes |
|---|---|---|
| Canadian DID phone number (toll) | $1.00/mo per number | per Connect pricing |
| Canadian toll-free number | $2.00/mo per number | optional |
| NAT Gateway (single-AZ, baseline) | $32/mo + $0.045/GB data | see §11.1 |
| NAT Gateway (multi-AZ, recommended) | ~$96/mo + data | 3 × single-AZ; removes SPOF |
| Amazon Connect instance | $0 | no per-instance charge; pay per usage |
| DynamoDB storage (sessions, 1 h TTL) | negligible | < 1 GB at any point in time |
| CloudWatch Logs retention (90 days) | $0.03/GB-mo | ~$3/mo at 100 GB stored |
Phone-blacklist table (Hasura row in existing PostgreSQL, §6.3) and phone-config S3 object (§6.2, a few KB, one file, versioning-enabled) are both trivial (< $1/mo combined).
17.4 Election-Day Capacity Example
For a 50 000-voter municipality with an expected 5 % telephone-channel turnout (2 500 voters) concentrated into a 12-hour voting window:
- Calls: 2 500 × ~1.1 (some retries / dropped calls) ≈ 2 750 calls
- Variable cost: 2 750 × $0.39 ≈ $1 070
- Peak concurrency: rough Erlang estimate at peak hour, assuming 10 % of daily calls fall in the peak hour: ~275 calls/hour × 9 min / 60 min ≈ 41 concurrent calls. That is fine on the Lambda side (the default account-level concurrent-executions quota is 1 000) but not on Amazon Connect's default concurrent-calls-per-instance quota, which is 10 for a fresh Connect instance and must be raised via an AWS support ticket. The first election-day spike against the default quota would trip it and drop calls.
Go-live action item (must happen weeks before each election, not the day before). Open a Service Quotas / AWS support case to raise "Concurrent active calls per instance" on the IVR's Connect instance to a value comfortably above the peak projection — recommend 2× the Erlang estimate as a rule of thumb to absorb retry bursts and the long tail of the call-duration distribution (for the 50 K-voter example: request ≥ 100). AWS typically processes these in a few business days; build the lead time into the election timeline. Validate the raised quota with a pre-election load test (§15.5) that actually drives concurrent calls, not just Lambda invocations — Lambda-side load tests will not exercise the Connect-instance quota.
Quota dimensions worth checking alongside "concurrent active calls" (each is per-instance and may also need raising for larger deployments): concurrent calls per flow, concurrent API requests per instance, and any Polly request-rate limits relevant to the chosen region. The existing IvrConnectConcurrentCallsNearQuota alert (§10.3) is the runtime guard; the quota increase is the prerequisite.
Add monthly fixed costs (multi-AZ NAT + DIDs + logs retention) for a rough ~$1 200 all-in for a one-day election at this size. Scale is roughly linear in voters once fixed costs are amortised across multiple municipalities sharing the same Lambda + Connect instance.
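The capacity estimate in this section can be parameterised so each municipality's numbers produce a quota request directly. A minimal sketch, assuming hypothetical names (PeakEstimate, estimate_peak) and the same simplifications as the text: a flat peak-hour share and mean holding time, not a full Erlang-B blocking calculation.

```rust
/// Result of the rough peak-hour estimate.
pub struct PeakEstimate {
    pub total_calls: f64,
    pub concurrent_calls: f64,
    pub suggested_quota: u32,
}

/// Estimate peak concurrent calls and a Connect quota request.
///
/// `telephone_turnout` and `peak_hour_share` are fractions (0.05 = 5 %),
/// `retry_factor` inflates calls for retries and drops (1.1 in the text),
/// `safety_factor` is the multiplier applied to the estimate (2.0 in the text).
pub fn estimate_peak(
    eligible_voters: u32,
    telephone_turnout: f64,
    retry_factor: f64,
    peak_hour_share: f64,
    call_minutes: f64,
    safety_factor: f64,
) -> PeakEstimate {
    let total_calls = f64::from(eligible_voters) * telephone_turnout * retry_factor;
    let peak_hour_calls = total_calls * peak_hour_share;
    // Little's law: mean concurrency = arrival rate x mean holding time.
    let concurrent_calls = peak_hour_calls * call_minutes / 60.0;
    let suggested_quota = (concurrent_calls * safety_factor).ceil() as u32;
    PeakEstimate {
        total_calls,
        concurrent_calls,
        suggested_quota,
    }
}
```

The suggested quota is a floor, not a target: rounding the request up to a comfortable figure, as the go-live action item does, costs nothing and absorbs error in the turnout assumption.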
17.5 Cost Optimization
- Publication cache. The in-process publication cache (§3.5.2) avoids paying the S3 GET and JSON parse on every Lambda invocation — critical because without it a 60-turn call pays 60× S3 GETs.
- Polly voice selection. Standard voices are 4× cheaper than Neural; Long-form is the most expensive tier and should not be used for IVR. Cache Polly output for static prompts (greeting, goodbye, invalid_input) in S3 and reference it as pre-synthesised audio from the contact flow; these prompts account for a large share of characters across all calls.
- DynamoDB. On-demand is correct for bursty election-day traffic; only switch to provisioned + autoscaling if running continuous high-volume elections. Use short TTLs to keep storage cost near zero.
- Keep prompts concise. At Neural rates Polly is the largest single variable line item in §17.2 ($0.192, against $0.162 for Connect voice); shaving 20 % off prompt length shaves ~$0.04 off per-call cost.
- Share NAT across tenants. The Lambda is one deployment serving many clusters (§11.2), so the multi-AZ NAT cost amortises across every tenant using the IVR channel.
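The publication-cache point can be illustrated with a small in-process cache. This is a sketch under stated assumptions, not the §3.5.2 implementation: PublicationKey and PublicationCache are hypothetical names, the key shape follows the (tenant_id, event_id, publication_id) tuple from ticket 19.5, and the fetch closure stands in for the S3 GET plus JSON parse.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Cache key; mirrors the (tenant_id, event_id, publication_id) tuple.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PublicationKey {
    pub tenant_id: String,
    pub event_id: String,
    pub publication_id: String,
}

/// Process-wide cache in front of an expensive fetch (S3 GET + parse).
pub struct PublicationCache<F>
where
    F: Fn(&PublicationKey) -> String,
{
    fetch: F,
    entries: Mutex<HashMap<PublicationKey, Arc<String>>>,
}

impl<F> PublicationCache<F>
where
    F: Fn(&PublicationKey) -> String,
{
    pub fn new(fetch: F) -> Self {
        Self {
            fetch,
            entries: Mutex::new(HashMap::new()),
        }
    }

    /// Return the cached publication, fetching it only on the first miss.
    /// The lock is held across the fetch, which keeps the sketch simple
    /// but serialises concurrent misses.
    pub fn get(&self, key: &PublicationKey) -> Arc<String> {
        let mut entries = self.entries.lock().unwrap();
        if let Some(hit) = entries.get(key) {
            return Arc::clone(hit);
        }
        let value = Arc::new((self.fetch)(key));
        entries.insert(key.clone(), Arc::clone(&value));
        value
    }
}
```

Because publications are immutable once published, keying on publication_id means a new publication is a new cache entry and no invalidation logic is needed in this sketch.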
18. Open Questions / Decisions Needed
- Scheduled Opening/Closing: Telephone voting opens and closes independently of the ONLINE and KIOSK channels, following the same model KIOSK already uses in sequent-core/src/ballot.rs: a dedicated status + period_dates pair, set via ElectionEventStatus::set_status_by_channel(VotingStatusChannel::TELEPHONE, …). The only auto-coupling in the codebase is close_early_voting_if_online_status_change (EARLY_VOTING ↔ ONLINE); TELEPHONE stays decoupled.

  Scheduled transitions reuse the existing infrastructure with no new machinery:
  - Data model: ScheduledEvent rows in Hasura with event_processor ∈ {START_VOTING_PERIOD, END_VOTING_PERIOD} and a CronConfig { cron, scheduled_date } (sequent-core/src/types/scheduled_event.rs).
  - Execution: Windmill's manage_election_dates / manage_election_event_date tasks (packages/windmill/src/tasks/manage_election_dates.rs) fire on cron, map the event processor to a VotingStatus, and call voting_status::update_election_status with a Vec<VotingStatusChannel>. The channel list is already the extension point: today it hard-codes [ONLINE, KIOSK] for START and [ONLINE] for END; extending to TELEPHONE means either (a) adding TELEPHONE to those lists when the election event has a telephone channel configured, or (b) carrying the target channel set on the ScheduledEvent payload so admins can schedule per-channel transitions.
  - Admin Portal: the scheduled-event editor that today produces START_VOTING_PERIOD / END_VOTING_PERIOD rows gains a per-channel selector so operators can schedule "open TELEPHONE on 2026-05-01 09:00, close 2026-05-03 20:00" independently from ONLINE/KIOSK.

  Possible breaking refactor (tracked separately, not a blocker for IVR MVP): the three parallel fields on ElectionEventStatus (voting_status / kiosk_voting_status / early_voting_status + their *_period_dates) should be collapsed into a single BTreeMap<VotingStatusChannel, ChannelStatus>. See Appendix C.7.
- Audio File Support: Should the IVR support pre-recorded audio files in addition to TTS?
  - Barrie specs reference .mp3/.wav files for all prompts
  - Amazon Connect supports both Polly TTS and S3-hosted audio
  - Could extend prompt values to support {"type": "audio", "url": "s3://..."} vs {"type": "tts", "text": "..."}
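If the audio-file question is answered with yes, the two prompt shapes in the last bullet could be modelled as a tagged enum so the flow engine never has to guess which kind it holds. A minimal sketch, assuming hypothetical names (PromptValue, PromptError, prompt_from_parts) and leaving JSON wiring out; in the real crate the "type" tag would presumably be handled by the existing serde derives.

```rust
/// One prompt, either synthesised by Polly or played from a stored file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PromptValue {
    Tts { text: String },
    Audio { url: String },
}

/// Validation failures for a prompt value.
#[derive(Debug, PartialEq, Eq)]
pub enum PromptError {
    UnknownType(String),
    EmptyText,
    NotAnS3Url(String),
}

/// Build a prompt from its "type" tag and payload ("text" or "url").
/// Assumption for this sketch: audio prompts must be s3:// URLs.
pub fn prompt_from_parts(kind: &str, payload: &str) -> Result<PromptValue, PromptError> {
    match kind {
        "tts" if payload.trim().is_empty() => Err(PromptError::EmptyText),
        "tts" => Ok(PromptValue::Tts {
            text: payload.to_string(),
        }),
        "audio" if payload.starts_with("s3://") && payload.len() > "s3://".len() => {
            Ok(PromptValue::Audio {
                url: payload.to_string(),
            })
        }
        "audio" => Err(PromptError::NotAnS3Url(payload.to_string())),
        other => Err(PromptError::UnknownType(other.to_string())),
    }
}
```

Rejecting unknown tags at validation time, instead of falling back to TTS, keeps a typo in the admin portal from silently reading a URL aloud to voters.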
19. Implementation Plan — Ticket Breakdown
Survey of existing code vs. design:
- playground/ivr/: throwaway number-collection demo (~300 lines), not a base
- ivr-lambdas/: older parallel attempt, not promoted
- step/packages/keycloak-extensions/: conditional-authenticators, message-otp-authenticator exist; IVR extensions do not
- step/packages/sequent-core/: VotingChannels.telephone flag exists; VotingStatusChannel::TELEPHONE + status fields do not
- step/packages/harvest/: /insert-cast-vote exists; blacklist endpoints do not
- beyond/packages/: only ballot-audit/; no ivr-lambda/, ivr-contact-flows/, keycloak-extensions/
- gitops/iac-aws/, gitops/unified/: no ivr/ tree, no phone-map.yaml
Every ticket below is TDD: write failing tests → implement → make green. Listed small enough to ship in a day or two each.
19.1 Epic 0 — Placement & scaffolding
1. ADR: beyond vs step placement for ivr-lambda crate, ivr-config-resource Keycloak extension, and contact-flow JSON (§16.2). Decision doc, no code.
2. Scaffold ivr-lambda crate in chosen repo: empty binary, cargo-lambda build, Dockerfile.prod, wire into step's Cargo workspace.
3. Add ivr-lambda to reusable_build_push.yml matrix + create ECR repo.
19.2 Epic 1 — sequent-core TELEPHONE channel (Appendix C.1–C.9)
4. Add VotingStatusChannel::TELEPHONE variant + channel_from() mapping.
5. Add telephone_voting_status + telephone_voting_period_dates to ElectionEventStatus + ElectionStatus + Default impls + helper methods.
6. Wire AzpClient::IvrVoting → VotingStatusChannel::TELEPHONE in authorize_voter_election (Appendix C.7).
19.3 Epic 2 — Keycloak extensions
7. ivr-config-resource extension: walk Direct Grant flow, stock-authenticator lookup, custom-authenticator config read, unknown-authenticator → 500.
8. Bearer-token gate on ivr-config-resource: require ivr-service token with can_read_phone_blacklist role; 401/403 negatives covered (§5.1.2).
9. Realm-bootstrap additions: ivr-voting client (ROPC), ivr-service client (client_credentials), Direct Grant flow override, service-account role mapping (Appendix C.8.a/b).
10. (Conditional) IvrDobAuthenticator: only if first deployment needs DoB auth (Appendix C.8.1).
19.4 Epic 3 — Blacklist backend
11. Hasura migration: sequent_backend.ivr_phone_blacklist table + indexes + FKs.
12. Hasura permissions: can_read_phone_blacklist (service role), can_manage_phone_blacklist (admin role).
13. Harvest CRUD endpoints for blacklist entries, reusing existing permission middleware.
14. TokenManager::get_service_token(realm): per-realm token cache, Secrets Manager lookup, AuthError taxonomy reuse (§5.1.9).
19.5 Epic 4 — Lambda ports & adapters
15. Port trait definitions: all 9 ports (Session, Auth, ElectionConfig, ElectionStatus, CastVoteHistory, VoteCasting, PhoneConfig, Blacklist, PhoneHasher); object-safety enforced; in-memory fakes.
16. Shared HasuraClient: one reqwest::Client, one retry/backoff/circuit-breaker, Arc-shared across Hasura-backed adapters (§3.5.2).
17. DynamoDB Session adapter: conditional writes (attribute_not_exists + version CAS), round-trip against local DynamoDB.
18. S3 ElectionConfig adapter: process cache keyed by (tenant_id, event_id, publication_id).
19. S3 PhoneConfig adapter: read-only, narrow IAM, process-cached (§6.2).
20. Keycloak Auth adapter: ROPC, refresh, absolute expiry, 3-category error classifier (§5.1.9).
21. Hasura Blacklist adapter using service token.
22. Hasura ElectionStatus + CastVoteHistory adapters using voter JWT.
23. Harvest VoteCasting adapter with deterministic idempotency key.
24. PhoneHasher adapter: per-tenant salt in Secrets Manager, per-container cache, (hash, salt_gen) output (§9.2.1).
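To make the "object-safety enforced; in-memory fakes" and "version CAS" wording concrete, the Session port and its fake could look roughly like the following. This is a sketch, not the §4.1 model: SessionRecord, SessionError, SessionPort and InMemorySessions are hypothetical names, the trait is synchronous to stay dependency-free (the real ports are expected to be async), and the fake mimics the DynamoDB conditions with a mutex-guarded map.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Trimmed-down stand-in for the session row.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SessionRecord {
    pub contact_id: String,
    pub version: u64,
    pub phase: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SessionError {
    AlreadyExists,
    NotFound,
    VersionConflict { expected: u64, found: u64 },
}

/// Port the flow engine depends on; no generics, so it stays object-safe.
pub trait SessionPort {
    /// Insert only if absent (the `attribute_not_exists` guard).
    fn create(&self, record: SessionRecord) -> Result<(), SessionError>;
    fn load(&self, contact_id: &str) -> Result<SessionRecord, SessionError>;
    /// Write only if the stored version equals `expected_version`, then bump it.
    fn save(&self, record: SessionRecord, expected_version: u64)
        -> Result<SessionRecord, SessionError>;
}

/// In-memory fake with the same conditional-write semantics.
#[derive(Default)]
pub struct InMemorySessions {
    rows: Mutex<HashMap<String, SessionRecord>>,
}

impl SessionPort for InMemorySessions {
    fn create(&self, record: SessionRecord) -> Result<(), SessionError> {
        let mut rows = self.rows.lock().unwrap();
        if rows.contains_key(&record.contact_id) {
            return Err(SessionError::AlreadyExists);
        }
        rows.insert(record.contact_id.clone(), record);
        Ok(())
    }

    fn load(&self, contact_id: &str) -> Result<SessionRecord, SessionError> {
        let rows = self.rows.lock().unwrap();
        rows.get(contact_id).cloned().ok_or(SessionError::NotFound)
    }

    fn save(&self, mut record: SessionRecord, expected_version: u64)
        -> Result<SessionRecord, SessionError> {
        let mut rows = self.rows.lock().unwrap();
        let found = rows
            .get(&record.contact_id)
            .ok_or(SessionError::NotFound)?
            .version;
        if found != expected_version {
            return Err(SessionError::VersionConflict { expected: expected_version, found });
        }
        record.version = expected_version + 1;
        rows.insert(record.contact_id.clone(), record.clone());
        Ok(record)
    }
}

/// Compiles only while `SessionPort` remains object-safe.
pub fn as_port(store: &InMemorySessions) -> &dyn SessionPort {
    store
}
```

A fake with real conflict semantics lets the phase tests exercise the duplicate-invocation path (two Lambda turns racing on one contact_id) without a local DynamoDB.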
19.6 Epic 5 — Domain & flow engine
25. IvrSession model: full struct per §4.1 with version field + DynamoDB serde.
26. FlowPhase enum + PhaseState variants + FlowPosition: invariant-enforced via FlowPhase::initial_state(), FlowPosition::new/advance, exhaustiveness unit test (§3.5.3).
27. Outer dispatcher: * reserved-key interception, last_response cache, phase lookup.
28. PhaseCtx<'a> struct of &'a dyn Port refs + async_trait (§3.5.3).
29. Phase: announcement (one executor covering welcome / declaration / pre-voting / …).
30. Phase: language_select.
31. Phase: blacklist_check (pre-auth, PhoneHasher + Blacklist).
32. Phase: auth (iterates auth_steps from /ivr-config, ROPC submission).
33. Phase: eligibility_check.
34. Phase: goodbye.
35. ballot_loop shell + sub-phase dispatcher (§3.5.4).
36. Sub-phase: ElectionSelect (+ CastVoteHistoryPort annotation, skip_election_list logic).
37. Sub-phases: LanguageSwitch + ElectionIntro.
38. Sub-phases: ContestLoop + ContestIntro.
39. Sub-phases: CandidateSelect + SelectionCheck + multi-digit DTMF handling (§3.4).
40. Sub-phase: VoteConfirm + edit mode.
41. Sub-phase: ElectionSummary (edit-contest targeting, enter_contest_edit helper).
42. Sub-phase: ElectionSubmit (pre-submit refresh, encrypt, POST, §5.4 error taxonomy).
43. Sub-phase: ElectionReceipt (phonetic hex spelling + * repeat).
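As an illustration of the "phonetic hex spelling" in the ElectionReceipt ticket, a receipt code can be turned into unambiguous spoken words before it reaches Polly. A minimal sketch with hypothetical function names; the word list (digits plus the first six NATO alphabet words) is an assumption, and the per-language wording would come from the prompt bundle.

```rust
/// Spoken word for one hexadecimal digit, case-insensitive.
pub fn spell_hex_digit(c: char) -> Option<&'static str> {
    match c.to_ascii_lowercase() {
        '0' => Some("zero"),
        '1' => Some("one"),
        '2' => Some("two"),
        '3' => Some("three"),
        '4' => Some("four"),
        '5' => Some("five"),
        '6' => Some("six"),
        '7' => Some("seven"),
        '8' => Some("eight"),
        '9' => Some("nine"),
        'a' => Some("alpha"),
        'b' => Some("bravo"),
        'c' => Some("charlie"),
        'd' => Some("delta"),
        'e' => Some("echo"),
        'f' => Some("foxtrot"),
        _ => None,
    }
}

/// Spell a whole receipt code; returns None if any character is not hex,
/// so a corrupted receipt is never read out partially.
pub fn spell_receipt(receipt_hex: &str) -> Option<String> {
    let words: Option<Vec<&'static str>> =
        receipt_hex.chars().map(spell_hex_digit).collect();
    words.map(|w| w.join(", "))
}
```

The comma separators give the TTS engine a natural pause between characters, which matters when the voter is writing the code down.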
19.7 Epic 6 — i18n, prompts, SSML
44. validate_ivr_subtree validator in sequent-core → TypedIvrScope (WASM-compatible, §7.2).
45. Prompt fallback resolver: candidate → contest → election → event → default with sentinel on miss (§7.5).
46. SSML placeholder interpolation: structurally-safe vs user-supplied classes, escape(x) == x invariant on safe inputs (§7.2).
47. Default EN/FR bundle for well-known prompt keys (Appendix D).
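The fallback order in the prompt-resolver ticket (candidate → contest → election → event → default, with a sentinel on miss) can be sketched as a walk over scopes from most to least specific. The names PromptScopes, resolve_prompt and MISSING_PROMPT, and the sentinel text itself, are assumptions for illustration; §7.5 remains the authority on the real behaviour.

```rust
use std::collections::HashMap;

/// Returned when no scope defines the key. Assumed value for this sketch;
/// the point is that a miss is loud and greppable, never an empty string.
pub const MISSING_PROMPT: &str = "[[missing-prompt]]";

/// Prompt overrides at each level, most specific first.
#[derive(Default)]
pub struct PromptScopes {
    pub candidate: HashMap<String, String>,
    pub contest: HashMap<String, String>,
    pub election: HashMap<String, String>,
    pub event: HashMap<String, String>,
    pub defaults: HashMap<String, String>,
}

/// Resolve `key` by walking candidate → contest → election → event → default.
pub fn resolve_prompt<'a>(scopes: &'a PromptScopes, key: &str) -> &'a str {
    let order = [
        &scopes.candidate,
        &scopes.contest,
        &scopes.election,
        &scopes.event,
        &scopes.defaults,
    ];
    for scope in order {
        if let Some(text) = scope.get(key) {
            return text.as_str();
        }
    }
    MISSING_PROMPT
}
```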
19.8 Epic 7 — Connect & Lambda edge
48. Contact-flow JSON authoring: GetCallerPhoneNumber → Lambda loop → Play/GetDigits → Disconnect (§12.1).
49. Lambda input/output types: ConnectEvent / ConnectResponse serde round-trip tests (§4.2).
19.9 Epic 8 — Security & PIPEDA
50. Per-tenant salt rotation: AWSCURRENT/AWSPREVIOUS cycle, 90-day cleanup script (§9.2.1).
51. CloudWatch log redaction: raw E.164 filter, hash-only emission.
52. Session TTL + post-call phone wipe on DynamoDB (§9.2.1).
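The "raw E.164 filter" in the log-redaction ticket can be approximated by a last-line-of-defence scrubber applied to every log line before emission. A minimal sketch with a hypothetical function name and placeholder text; the threshold of 8 digits after the plus sign is an assumption chosen to avoid redacting short values such as "+1", and the primary control should still be never logging the raw number in the first place.

```rust
/// Replace anything that looks like a raw E.164 number ('+' followed by
/// at least 8 consecutive digits) with a fixed placeholder.
pub fn redact_e164(line: &str) -> String {
    const MIN_DIGITS: usize = 8; // assumption, see lead-in
    let chars: Vec<char> = line.chars().collect();
    let mut out = String::with_capacity(line.len());
    let mut i = 0;
    while i < chars.len() {
        if chars[i] == '+' {
            // Count the run of ASCII digits immediately after the '+'.
            let digits = chars[i + 1..]
                .iter()
                .take_while(|c| c.is_ascii_digit())
                .count();
            if digits >= MIN_DIGITS {
                out.push_str("[phone-redacted]");
                i += 1 + digits; // skip the '+' and the whole digit run
                continue;
            }
        }
        out.push(chars[i]);
        i += 1;
    }
    out
}
```

Over-long digit runs are redacted in full on purpose: a malformed number is still personal data.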
19.10 Epic 9 — Monitoring
53. CloudWatch metrics + structured logging (§10.1, §10.2).
54. Alerts: token-error, vote-submission failure, backlog, blacklist spikes (§10.3, §5.1.9).
19.11 Epic 10 — Admin portal
55. "IVR Prompts" tab: text inputs per language, TypedIvrScope WASM validator inline errors (§7.4).
56. "IVR Flow" tab: typed editor for announcement blocks + raw-JSON escape hatch using sequent-core deserializer (§7.4).
57. "Phone Blacklist" view: list/add/remove/annotate gated by can_manage_phone_blacklist (§14.2, §6.3).
58. Per-election/contest/candidate IVR overrides: optional name / alias / description inputs (§14.2).
19.12 Epic 11 — GitOps / IaC
59. TF module for the IVR Lambda: function, alias, IAM role, log group (gitops/iac-aws/ivr/<env>/).
60. TF: DynamoDB session table + TTL + autoscaling.
61. TF: S3 routing bucket (versioned) + narrow IAM.
62. TF: Amazon Connect instance + DIDs + contact-flow import.
63. phone-map YAML → JSON renderer + Atlantis apply → S3 upload (§16.2).
64. Connect concurrent-calls quota raise: AWS Support ticket template + runbook (§17.4, §16.1 Phase 3).
19.13 Epic 12 — Cross-layer tests
65. Contract test: ivr-config-resource ↔ Lambda parser, happy + auth negatives (§15.3).
66. Record-and-replay harness + step-ivr CLI: text-in/text-out (§15.2.1).
67. E2E test: scripted DTMF against dev Connect + real Keycloak + Hasura (§15.4).
68. Load test: concurrent real telephony calls after quota raise (§15.5).
19.14 Epic 13 — Docs & runbooks
69. Keycloak realm-bootstrap runbook for IVR clients + secret provisioning.
70. Operator runbook: blacklist ops, quota escalation, salt rotation.
19.15 Dependencies & Parallelization
Critical path: 2 → 15 → 25/26 → 27/28 → phase tickets 29–43 → 49 → 48 → 65 → 67 → 68.
Parallelizable once scaffolded: Epic 1 (sequent-core), Epic 2 (Java/Keycloak), Epic 3 (Hasura/Harvest), Epic 6 (i18n in sequent-core), Epic 10 (admin portal), Epic 11 (gitops) — each team can pick up independently after Epic 0 lands.
Appendix A: Sequence Diagrams
A.1 Complete Voting Flow
Appendix B: Glossary
| Term | Definition |
|---|---|
| DTMF | Dual-Tone Multi-Frequency - touch-tone phone signals |
| IVR | Interactive Voice Response |
| Contact Flow | Amazon Connect's visual call routing builder |
| Polly | AWS text-to-speech service |
| EML | Election Markup Language - ballot definition format |
| Hasura | GraphQL engine over PostgreSQL |
| Harvest | Backend API for vote casting |
| Keycloak | Identity and access management platform |
Appendix C: Required Code Changes for TELEPHONE Channel
To support scheduled phone voting with independent start/stop times, the following code changes are required.
What already exists (no code change needed). The per-event channel-enablement flag telephone: Option<bool> is already present on VotingChannels in packages/sequent-core/src/types/hasura/core.rs alongside online, kiosk, early_voting, and paper. Admin-portal UI and Hasura schema already let operators toggle it. The changes in C.1 and C.2 below wire a matching VotingStatusChannel::TELEPHONE enum variant to that pre-existing data — they do not add the flag itself.
C.1 Add TELEPHONE to VotingStatusChannel Enum
File: packages/sequent-core/src/ballot.rs (pub enum VotingStatusChannel)
#[allow(non_camel_case_types)]
#[derive(
Serialize,
Deserialize,
Debug,
PartialEq,
Eq,
Clone,
Copy,
EnumString,
JsonSchema,
IntoStaticStr,
)]
pub enum VotingStatusChannel {
ONLINE,
KIOSK,
EARLY_VOTING,
TELEPHONE, // ADD THIS
}
C.2 Update channel_from() Method
File: packages/sequent-core/src/ballot.rs (impl VotingStatusChannel::channel_from)
One new match arm reads the pre-existing VotingChannels.telephone field:
impl VotingStatusChannel {
pub fn channel_from(
&self,
channels: &core::VotingChannels,
) -> Option<bool> {
match self {
&VotingStatusChannel::ONLINE => channels.online.clone(),
&VotingStatusChannel::KIOSK => channels.kiosk.clone(),
&VotingStatusChannel::EARLY_VOTING => channels.early_voting.clone(),
// Reads the existing `telephone: Option<bool>` flag on
// `VotingChannels` (core.rs). No struct change needed.
&VotingStatusChannel::TELEPHONE => channels.telephone.clone(),
}
}
}
C.3 Add telephone_voting_status to ElectionEventStatus
File: packages/sequent-core/src/ballot.rs (pub struct ElectionEventStatus)
#[derive(
BorshSerialize,
BorshDeserialize,
Serialize,
Deserialize,
JsonSchema,
PartialEq,
Eq,
Debug,
Clone,
Default,
)]
pub struct ElectionEventStatus {
pub voting_status: VotingStatus,
pub kiosk_voting_status: VotingStatus,
pub early_voting_status: VotingStatus,
pub telephone_voting_status: VotingStatus, // ADD THIS
pub voting_period_dates: PeriodDates,
pub kiosk_voting_period_dates: PeriodDates,
pub early_voting_period_dates: PeriodDates,
pub telephone_voting_period_dates: PeriodDates, // ADD THIS
}
C.4 Update ElectionEventStatus Methods
File: packages/sequent-core/src/ballot.rs
Update status_by_channel():
impl ElectionEventStatus {
pub fn status_by_channel(
&self,
channel: VotingStatusChannel,
) -> VotingStatus {
match channel {
VotingStatusChannel::ONLINE => self.voting_status.clone(),
VotingStatusChannel::KIOSK => self.kiosk_voting_status.clone(),
VotingStatusChannel::EARLY_VOTING => self.early_voting_status.clone(),
VotingStatusChannel::TELEPHONE => self.telephone_voting_status.clone(), // ADD THIS
}
}
}
Update set_status_by_channel():
impl ElectionEventStatus {
pub fn set_status_by_channel(
&mut self,
channel: VotingStatusChannel,
new_status: VotingStatus,
) {
let mut period_dates = match channel {
VotingStatusChannel::ONLINE => {
self.voting_status = new_status.clone();
&mut self.voting_period_dates
}
VotingStatusChannel::KIOSK => {
self.kiosk_voting_status = new_status.clone();
&mut self.kiosk_voting_period_dates
}
VotingStatusChannel::EARLY_VOTING => {
self.early_voting_status = new_status.clone();
&mut self.early_voting_period_dates
}
VotingStatusChannel::TELEPHONE => { // ADD THIS
self.telephone_voting_status = new_status.clone();
&mut self.telephone_voting_period_dates
}
};
period_dates.update_period_dates(&new_status);
}
}
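The per-channel match arms in C.4 are what the possible refactor listed under the open questions (§18) would remove. A rough sketch of the collapsed shape follows, using stand-in types only: Channel, Status, ChannelStatus and EventStatus are hypothetical substitutes for VotingStatusChannel, VotingStatus, the status + PeriodDates pair and ElectionEventStatus, and the date bookkeeping is a simplification of whatever update_period_dates does today. Note that the real VotingStatusChannel would also need an Ord derive to serve as a BTreeMap key, which the derive list in C.1 does not include.

```rust
use std::collections::BTreeMap;

/// Stand-in for `VotingStatusChannel`; `Ord` is required for BTreeMap keys.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Channel {
    Online,
    Kiosk,
    EarlyVoting,
    Telephone,
}

/// Stand-in for `VotingStatus` (variant names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    NotStarted,
    Open,
    Paused,
    Closed,
}

impl Default for Status {
    fn default() -> Self {
        Status::NotStarted
    }
}

/// One channel's status plus simplified period dates (epoch seconds).
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct ChannelStatus {
    pub status: Status,
    pub first_opened_at: Option<u64>,
    pub last_closed_at: Option<u64>,
}

/// Collapsed replacement for the parallel `*_voting_status` fields.
#[derive(Debug, Default)]
pub struct EventStatus {
    channels: BTreeMap<Channel, ChannelStatus>,
}

impl EventStatus {
    /// Channels never written read as the default status.
    pub fn status_by_channel(&self, channel: Channel) -> Status {
        self.channels
            .get(&channel)
            .map(|c| c.status)
            .unwrap_or_default()
    }

    /// One code path for every channel; adding a channel adds no arm here.
    pub fn set_status_by_channel(&mut self, channel: Channel, new_status: Status, now: u64) {
        let entry = self.channels.entry(channel).or_default();
        entry.status = new_status;
        match new_status {
            Status::Open if entry.first_opened_at.is_none() => {
                entry.first_opened_at = Some(now)
            }
            Status::Closed => entry.last_closed_at = Some(now),
            _ => {}
        }
    }
}
```

The trade-off is the serialised shape: a map changes the Borsh and JSON layout of the status, which is why the document tracks this as a breaking refactor separate from the IVR MVP.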
C.5 Add telephone_voting_status to ElectionStatus
File: packages/sequent-core/src/ballot.rs (pub struct ElectionStatus)
#[derive(
BorshSerialize,
BorshDeserialize,
Serialize,
Deserialize,
JsonSchema,
PartialEq,
Eq,
Debug,
Clone,
)]
pub struct ElectionStatus {
pub voting_status: VotingStatus,
pub kiosk_voting_status: VotingStatus,
pub early_voting_status: VotingStatus,
pub telephone_voting_status: VotingStatus, // ADD THIS
pub voting_period_dates: PeriodDates,
pub kiosk_voting_period_dates: PeriodDates,
pub early_voting_period_dates: PeriodDates,
pub telephone_voting_period_dates: PeriodDates, // ADD THIS
pub allow_tally: Option<bool>,
}
C.6 Update ElectionStatus Methods
Similar to ElectionEventStatus, update:
- status_by_channel()
- dates_by_channel()
- set_status_by_channel()

so that each handles the VotingStatusChannel::TELEPHONE case.
C.7 Update Authorization for IVR Client
File: packages/sequent-core/src/services/authorization.rs (the azp match inside authorize_voter_election)
Per CLAUDE.md ("policies use enums, not magic strings") the azp match should not be keyed off ad-hoc string literals. Introduce an AzpClient enum in sequent-core that owns the canonical set of Keycloak client ids, annotated with the same strum derives already used elsewhere in sequent-core (see VotingStatusChannel in ballot.rs for the reference pattern: EnumString, IntoStaticStr, etc.). FromStr parses the string claim; the match on the enum is then exhaustive and compiler-checked.
// packages/sequent-core/src/types/auth.rs (new)
#[derive(
Serialize,
Deserialize,
Debug,
PartialEq,
Eq,
Clone,
Copy,
EnumString,
IntoStaticStr,
Display,
)]
pub enum AzpClient {
#[strum(serialize = "voting-portal")]
VotingPortal,
#[strum(serialize = "voting-portal-kiosk")]
VotingPortalKiosk,
#[strum(serialize = "ivr-voting")]
IvrVoting,
}
AzpClient is 1:1 with the client ID that Keycloak emits in azp for voter-issued tokens and intentionally has three variants, not four: the ONLINE and EARLY_VOTING channels share the voting-portal client. Early voting is a per-area policy (AreaPresentation.allow_early_voting) evaluated against the election event's early_voting_period_dates, not a distinct identity.
The ivr-service client (Appendix C.8.b) is deliberately not an AzpClient variant. Its tokens are obtained via client_credentials: they carry no voter identity, are never submitted as ballot-casting credentials, and are never resolved into a VotingStatusChannel. AzpClient is specifically the "voter-facing client that represents a channel" enum; service clients sit outside it on purpose, so authorize_voter_election cannot accidentally accept a service-auth token as if it were a voter token.
The enum therefore models who authenticated; a second step resolves which VotingStatusChannel this submission belongs to, where the portal case fans out into ONLINE vs EARLY_VOTING:
/// Whether a portal-client submission falls inside the voter's
/// early-voting window. Computed at the call site from the area's
/// `allow_early_voting` presentation policy and the election event's
/// `early_voting_period_dates`; ignored for kiosk and IVR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PortalTimeWindow {
Online,
EarlyVoting,
}
impl AzpClient {
/// Resolve the Keycloak client ID to the `VotingStatusChannel`
/// the submission will be tagged with. The match is exhaustive
/// on `(AzpClient, PortalTimeWindow)`, so adding a new client
/// variant forces a compile error here. A new channel variant
/// alone does not: channels appear only on the right-hand side.
pub fn to_voting_channel(
self,
portal_window: PortalTimeWindow,
) -> VotingStatusChannel {
match (self, portal_window) {
(AzpClient::VotingPortal, PortalTimeWindow::Online)
=> VotingStatusChannel::ONLINE,
(AzpClient::VotingPortal, PortalTimeWindow::EarlyVoting)
=> VotingStatusChannel::EARLY_VOTING,
(AzpClient::VotingPortalKiosk, _)
=> VotingStatusChannel::KIOSK,
(AzpClient::IvrVoting, _)
=> VotingStatusChannel::TELEPHONE,
}
}
}
authorize_voter_election parses the claim once and hands the authenticated client back to the caller, which already loads the area and election event while building the cast-vote and is the right place to evaluate the early-voting window:
pub fn authorize_voter_election(
claims: &JwtClaims,
permissions: Vec<VoterPermissions>,
election_id: &String,
) -> Result<(String, AzpClient), (Status, String)> {
// ... existing validation ...
let client = AzpClient::from_str(claims.azp.as_str())
.map_err(|_| (Status::Unauthorized, "Unknown Client".into()))?;
Ok((area_id, client))
}
The insert-cast-vote route then composes the two:
let (area_id, client) = authorize_voter_election(&claims, …, &election_id)?;
// area + election event are already loaded further down the cast-vote
// pipeline; `PortalTimeWindow` is a one-liner against
// `area.presentation.allow_early_voting` and
// `election_event.early_voting_period_dates`.
let portal_window = portal_time_window_for(&area, &election_event, now);
let voting_channel = client.to_voting_channel(portal_window);
Callers that do not care about the resulting channel (e.g. voter_electoral_log.rs, which discards _voting_channel today) can skip the resolution step entirely and match on AzpClient directly.
All four VotingStatusChannel variants are now reachable through a single compile-checked match: ONLINE and EARLY_VOTING via AzpClient::VotingPortal, KIOSK via AzpClient::VotingPortalKiosk, TELEPHONE via AzpClient::IvrVoting. The previous runtime-only "unknown client" branch is gone, and the EARLY_VOTING gap that existed on main — authorization.rs had no arm for it — is closed as part of this refactor rather than deferred.
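The route snippet above calls portal_time_window_for and describes it as a one-liner without showing it. One possible shape is sketched below, under explicit assumptions: the real helper takes the area and election event, whereas this version takes already-extracted values; EarlyVotingWindow is a hypothetical stand-in for early_voting_period_dates with epoch-second bounds; and "no opening date means no early voting" is a guess at the intended policy.

```rust
/// Mirrors the `PortalTimeWindow` enum defined earlier in this appendix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PortalTimeWindow {
    Online,
    EarlyVoting,
}

/// Hypothetical stand-in for `early_voting_period_dates` (epoch seconds).
#[derive(Debug, Clone, Copy)]
pub struct EarlyVotingWindow {
    pub opens_at: Option<i64>,
    pub closes_at: Option<i64>,
}

/// Classify a portal submission as early voting or regular online voting.
///
/// Early voting applies only when the area allows it AND `now` falls in
/// the window. A missing `opens_at` means the window never opened; a
/// missing `closes_at` means it has not been closed yet.
pub fn portal_time_window_for(
    allow_early_voting: bool,
    window: EarlyVotingWindow,
    now: i64,
) -> PortalTimeWindow {
    let opened = window.opens_at.map_or(false, |t| now >= t);
    let still_open = window.closes_at.map_or(true, |t| now < t);
    if allow_early_voting && opened && still_open {
        PortalTimeWindow::EarlyVoting
    } else {
        PortalTimeWindow::Online
    }
}
```

Falling back to Online in every other case keeps the function total, so the channel-specific status check downstream, not this helper, decides whether the vote is accepted.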
Any other call site that currently compares claims.azp == "voting-portal" should be migrated to the enum at the same time. One of those sites deserves special attention because it has a wire-level consequence that cannot be hand-waved.
Kiosk client-ID migration: voting-portal-kiosk wins. authorization.rs accepts the kiosk azp as "voting-portal-kiosk", but packages/sequent-core/src/services/keycloak/realm.rs (line 625) also special-cases a second string — "onsite-voting-portal" — when it rewrites redirect URLs at realm-bootstrap time. That second string is a separate client in the COMELEC realm template (packages/windmill/external-bin/janitor/templates/COMELEC/keycloak.hbs ships both onsite-voting-portal and voting-portal-kiosk as distinct clients), and some fielded realms historically ship only one of the two as the polling-station client. Any realm whose polling stations authenticate through onsite-voting-portal emits azp: "onsite-voting-portal" on cast-vote, which authorization.rs today rejects as "Unknown Client" — a latent pre-existing bug, not just a cosmetic drift.
The enum refactor forces the decision. Pick voting-portal-kiosk as the canonical kiosk client:
- it is the name authorization.rs already accepts in production, so realms already standardised on it keep working with zero wire churn;
- it matches the naming convention the rest of the realm uses (voting-portal, voting-portal-kiosk, ivr-voting); the -kiosk suffix is semantically parallel to the VotingStatusChannel::KIOSK variant;
- onsite-voting-portal in the COMELEC template is in fact a second, separately-deployed portal web app (different rootUrl / baseUrl, port 3003 in the template) whose purpose overlaps but is not identical to the kiosk auth client. Collapsing both names into one enum variant without picking a winner would silently paper over that deployment distinction.
Migration for realms currently shipping onsite-voting-portal as the kiosk client (wire-level, non-cosmetic):
- Realm templates and realm-bootstrap code: rename onsite-voting-portal → voting-portal-kiosk in the COMELEC template and any tenant realm templates, and update the realm.rs URL-override arm at line 625 to match only the canonical string. (If an existing deployment genuinely needs two separate polling-station clients, that is a design decision worth its own ticket, not a reason to preserve the drift here.)
- Transitional compatibility shim in AzpClient::FromStr for the duration of the deployment rollout:

      impl FromStr for AzpClient {
          type Err = strum::ParseError;
          fn from_str(s: &str) -> Result<Self, Self::Err> {
              match s {
                  // Canonical names. While this shim exists, this
                  // hand-written impl replaces the `EnumString` derive on
                  // `AzpClient` (both would define `FromStr` and conflict);
                  // restore the derive when the legacy arm is deleted.
                  "voting-portal" => Ok(AzpClient::VotingPortal),
                  "voting-portal-kiosk" => Ok(AzpClient::VotingPortalKiosk),
                  "ivr-voting" => Ok(AzpClient::IvrVoting),
                  // Deprecated legacy kiosk name: some realms still ship
                  // this as their polling-station client. Accept it so the
                  // enum refactor does not become a breaking change for
                  // those deployments. Remove once every realm has been
                  // migrated (tracked on the rollout checklist below).
                  "onsite-voting-portal" => Ok(AzpClient::VotingPortalKiosk),
                  _ => Err(strum::ParseError::VariantNotFound),
              }
          }
      }

  The shim is narrow by construction: one extra string, one extra arm, explicitly marked for removal. It stays out of the Display / IntoStaticStr direction: serialization always emits the canonical name, so no new clients start being issued under the legacy string.
- Rollout checklist: (a) merge enum + compat shim + realm-template rename; (b) per-deployment: re-run the realm-bootstrap so clients are renamed in each Keycloak realm, verify polling stations issue azp: "voting-portal-kiosk" after the re-import, update any integration test fixtures that hard-code the legacy string; (c) once every deployment reports the legacy string as unused (a Prometheus counter on the compat arm, incremented once per legacy-string parse, is the cheapest way to tell; the counter at zero across all prod realms for a full election cycle is the go-ahead), delete the compat arm and the realm.rs URL-override branch. Track as a single meta issue so the compat-shim removal is not forgotten.
This migration is in scope for the IVR change because the refactor is the point where the drift becomes a compile-time invariant rather than a runtime surprise — deferring it would mean re-opening authorization.rs a second time for the same enum, which the refactor exists to avoid.
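The compat arm and its removal signal can be sketched together. This is a minimal, std-only illustration rather than the codebase's implementation: a process-local AtomicU64 (LEGACY_AZP_PARSES, a hypothetical name) stands in for the Prometheus counter, canonical_name is hand-written where the real enum would presumably rely on its strum derives, and the error type is a plain String instead of strum::ParseError.

```rust
use std::str::FromStr;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the Prometheus counter on the compat arm.
static LEGACY_AZP_PARSES: AtomicU64 = AtomicU64::new(0);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AzpClient {
    VotingPortal,
    VotingPortalKiosk,
    IvrVoting,
}

impl AzpClient {
    /// Serialization direction: only canonical names are ever emitted.
    pub fn canonical_name(self) -> &'static str {
        match self {
            AzpClient::VotingPortal => "voting-portal",
            AzpClient::VotingPortalKiosk => "voting-portal-kiosk",
            AzpClient::IvrVoting => "ivr-voting",
        }
    }
}

impl FromStr for AzpClient {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "voting-portal" => Ok(AzpClient::VotingPortal),
            "voting-portal-kiosk" => Ok(AzpClient::VotingPortalKiosk),
            "ivr-voting" => Ok(AzpClient::IvrVoting),
            // Deprecated legacy name: accepted, but counted so the rollout
            // checklist can tell when this arm is safe to delete.
            "onsite-voting-portal" => {
                LEGACY_AZP_PARSES.fetch_add(1, Ordering::Relaxed);
                Ok(AzpClient::VotingPortalKiosk)
            }
            other => Err(format!("unknown azp client: {other}")),
        }
    }
}

/// Number of times the legacy string has been parsed in this process.
pub fn legacy_parse_count() -> u64 {
    LEGACY_AZP_PARSES.load(Ordering::Relaxed)
}
```

Parsing "onsite-voting-portal" yields VotingPortalKiosk and bumps the counter, while canonical_name never returns the legacy string, which is the asymmetry the checklist relies on.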
C.8 Create Keycloak IVR Clients
The IVR uses two Keycloak clients per realm, each a single-purpose credential. Both are installed by the realm-bootstrap (data, not code — see §13).
C.8.a ivr-voting — voter authentication (ROPC)
The client the Lambda uses to exchange voter-entered credentials (voter ID + PIN or DoB, optionally OTP) for a voter access token. One instance per realm; its azp is what identifies the TELEPHONE channel downstream (§C.7, §3.5.2).
- Client ID: ivr-voting
- Access Type: Confidential
- Direct Access Grants: Enabled (this is the ROPC voter path)
- Service Accounts Enabled: Disabled. This client must never hold a service identity. Service-auth lives on the separate ivr-service client (C.8.b) so that voter credentials and service credentials can never be confused in code or in logs
- Valid Redirect URIs: N/A (no browser flow)
- Direct Grant Flow Override: Set to a custom flow that uses ConditionalClientAuthenticator to branch IVR-specific authentication (e.g. DoB validation) away from the standard password flow used by web clients
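As an illustration of the voter path, the sketch below assembles the form parameters for a Direct Grant token request from the values collected over DTMF. It is a sketch under assumptions, not the Lambda's actual code: CollectedStep and build_ropc_form are hypothetical names, requiring a username step is an assumption, and URL-encoding and the HTTP call are left to whatever client sends the request.

```rust
/// One value collected over DTMF, tagged with the Direct Grant form
/// parameter it maps to ("username", "password", or a custom one like "dob").
/// Hypothetical type for illustration.
pub struct CollectedStep {
    pub maps_to: String,
    pub value: String,
}

/// Builds the (name, value) pairs for the token endpoint request body.
pub fn build_ropc_form(
    client_id: &str,
    client_secret: &str,
    steps: &[CollectedStep],
) -> Result<Vec<(String, String)>, String> {
    let mut form = vec![
        ("grant_type".to_string(), "password".to_string()),
        ("client_id".to_string(), client_id.to_string()),
        ("client_secret".to_string(), client_secret.to_string()),
    ];
    for step in steps {
        // Rejects duplicates, which also prevents a step from overriding
        // grant_type, client_id, or client_secret.
        if form.iter().any(|(k, _)| k == &step.maps_to) {
            return Err(format!("duplicate form parameter: {}", step.maps_to));
        }
        form.push((step.maps_to.clone(), step.value.clone()));
    }
    // Assumption: every IVR flow collects something that maps to username.
    if !form.iter().any(|(k, _)| k == "username") {
        return Err("no step maps to username".to_string());
    }
    Ok(form)
}
```

Because custom steps are passed through by their maps_to name, a DoB step added to the realm's flow would need no Lambda change in this sketch.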
C.8.b ivr-service — platform IVR service client (client_credentials)
The client the Lambda uses for non-voter calls — today that means the blacklist read that runs before voter authentication (§6.3) and the /ivr-config auth-discovery read at session init (§5.1.2). One logical client installed identically in every IVR-enabled realm (same client_id: ivr-service, same client_secret), because Keycloak realms are trust boundaries and the Lambda needs a credential shape that does not depend on a caller identity.
- Client ID: ivr-service
- Access Type: Confidential
- Direct Access Grants: Disabled. No ROPC on this client, ever. It is not a user-login path
- Service Accounts Enabled: Required. This is the whole point of the client. The Lambda calls POST /realms/{realm}/protocol/openid-connect/token with grant_type=client_credentials and receives a service-account access token scoped to the two pre-auth reads the Lambda performs against the realm: the Hasura blacklist read (§6.3) and /ivr-config auth discovery (§5.1.2)
- Valid Redirect URIs: N/A
- Service-account role mapping: grant the service account the Hasura role that carries can_read_phone_blacklist (and only that: never can_manage_phone_blacklist, never voter roles, never admin roles). This is the token-level enforcement that pairs with the Hasura permission in §6.3. The same role also gates /ivr-config reads (§5.1.2, §C.8.2): one role for both pre-auth Lambda reads, so there is still exactly one principal, one audit footprint, one rotation story
- Secret storage: the client_secret lives in AWS Secrets Manager (one secret, reused across realms because the credential material is uniform), read once by the Lambda at cold start. Rotation is a Secrets Manager update + a per-realm ivr-service secret-reset in Keycloak, scripted through the same realm-bootstrap pipeline, with no Lambda redeploy
- Token caching (Lambda side): keyed by realm, refreshed when exp - safety_margin is reached; no refresh token (client_credentials has none). See TokenManager::get_service_token(realm) in §5.1.9 / §6.3
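The token-caching rule can be made concrete with a small sketch. The types below are hypothetical and are not the TokenManager of §5.1.9: the clock is passed in as Unix seconds and the client_credentials request is abstracted as a fetch closure, so only the keying and refresh condition are shown.

```rust
use std::collections::HashMap;

/// Hypothetical cached service token.
#[derive(Debug, Clone, PartialEq)]
pub struct ServiceToken {
    pub access_token: String,
    /// Expiry as Unix seconds (the JWT `exp` claim).
    pub exp: u64,
}

/// Per-realm cache; there is no refresh token to fall back on.
pub struct ServiceTokenCache {
    safety_margin: u64,
    by_realm: HashMap<String, ServiceToken>,
}

impl ServiceTokenCache {
    pub fn new(safety_margin: u64) -> Self {
        ServiceTokenCache { safety_margin, by_realm: HashMap::new() }
    }

    /// Returns the cached token while `now < exp - safety_margin`;
    /// otherwise calls `fetch` (the client_credentials request) and
    /// caches whatever it returns.
    pub fn get_or_fetch<F>(
        &mut self,
        realm: &str,
        now: u64,
        fetch: F,
    ) -> Result<ServiceToken, String>
    where
        F: FnOnce(&str) -> Result<ServiceToken, String>,
    {
        if let Some(token) = self.by_realm.get(realm) {
            if now < token.exp.saturating_sub(self.safety_margin) {
                return Ok(token.clone());
            }
        }
        let fresh = fetch(realm)?;
        self.by_realm.insert(realm.to_string(), fresh.clone());
        Ok(fresh)
    }
}
```

With a 30-second margin and a token expiring at t=1300, a lookup at t=1200 reuses the cached token and a lookup at t=1270 triggers a new fetch.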
Why two clients and not one with both grants enabled. Keycloak lets a single client enable both Direct Access Grants and Service Accounts, but doing so would mean a compromise of ivr-voting's secret also exposes a service identity capable of reading the blacklist (and vice versa). Splitting gives each client exactly one grant flow, exactly one role-mapping concern, and exactly one audit trail — consistent with the "policies use enums, not booleans; credentials serve one purpose" rule the rest of the design follows.
C.8.1 Custom Keycloak Authenticators for IVR
The following authenticators may be needed depending on the election event's authentication requirements:
IvrDobAuthenticator (optional — only if DoB is NOT stored as the password):
- Implements Authenticator for the Direct Grant flow
- Reads dob from context.getHttpRequest().getDecodedFormParameters().getFirst("dob")
- Validates against the user's date_of_birth attribute
- getConfigProperties() returns the IVR metadata properties (field_name, max_digits, terminator, maps_to, optional prompt_key) so the ivr-config-resource endpoint can read them back
- ~80 lines of Java, following the same pattern as existing authenticators in packages/keycloak-extensions/
IvrOtpDirectGrantAuthenticator — deferred, not in initial scope. OTP over IVR is a possible future extension (see §5.1.4) and does not need to be built now. If ever added, it would implement Authenticator for the Direct Grant flow, check for an otp form param, generate/send/validate the code via the existing infrastructure in message-otp-authenticator, and surface otp_required to the IVR Lambda through the standard Direct Grant error channel. No Rust, Keycloak, admin-portal, or i18n work for OTP should land in the initial IVR release.
Direct Grant Flow configuration per realm:
This ensures web portal authentication (via voting-portal client) is unaffected.
C.8.2 ivr-config-resource Keycloak Extension (required)
Location (proposed): <repo>/packages/keycloak-extensions/ivr-config-resource/ (see §16.2 for the still-open decision on whether this lives in the beyond or the step repository). The directory does not exist yet in either repo; the snippet below is the extension to be written.
This is a new, always-required Keycloak extension. It exposes a single REST endpoint that the IVR Lambda calls at session init to discover the auth step list for the realm, replacing the old presentation.ivr.auth S3 config.
Endpoint:
GET /realms/{realm}/ivr-config
Response:
{
"steps": [
{ "field": "voter_id", "max_digits": 8, "terminator": "#", "maps_to": "username" },
{ "field": "pin", "max_digits": 4, "terminator": "#", "maps_to": "password" }
]
}
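To show how a step from this response might drive input handling, here is a minimal sketch of validating raw DTMF input against one step. AuthStep, InputError and parse_step_input are hypothetical Lambda-side names, and treating the terminator as optional is an assumption, not something the response contract states.

```rust
/// Hypothetical Lambda-side mirror of one entry in the "steps" array.
#[derive(Debug, Clone, PartialEq)]
pub struct AuthStep {
    pub field: String,
    pub max_digits: usize,
    pub terminator: char,
    pub maps_to: String,
}

#[derive(Debug, PartialEq)]
pub enum InputError {
    Empty,
    NonDigit,
    TooLong { max_digits: usize },
}

/// Strips the step's terminator (if present) and checks that what remains
/// is a non-empty run of at most `max_digits` ASCII digits.
pub fn parse_step_input(step: &AuthStep, raw: &str) -> Result<String, InputError> {
    // Assumption: input may arrive without the terminator, for example when
    // collection stopped because max_digits was reached.
    let digits = raw.strip_suffix(step.terminator).unwrap_or(raw);
    if digits.is_empty() {
        return Err(InputError::Empty);
    }
    if !digits.chars().all(|c| c.is_ascii_digit()) {
        return Err(InputError::NonDigit);
    }
    if digits.len() > step.max_digits {
        return Err(InputError::TooLong { max_digits: step.max_digits });
    }
    Ok(digits.to_string())
}
```

For the voter_id step above (max_digits 8, terminator #), the input 12345678# is accepted as 12345678, while a nine-digit entry is rejected as too long.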
Implementation (~100 lines of Java):
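// jakarta.ws.rs applies to Keycloak 22 and later; earlier versions use javax.ws.rs.
import static org.keycloak.models.AuthenticationExecutionModel.Requirement.CONDITIONAL;
import static org.keycloak.models.AuthenticationExecutionModel.Requirement.REQUIRED;

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.WebApplicationException;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.keycloak.models.AuthenticationExecutionModel;
import org.keycloak.models.AuthenticationFlowModel;
import org.keycloak.models.AuthenticatorConfigModel;
import org.keycloak.models.ClientModel;
import org.keycloak.models.KeycloakSession;
import org.keycloak.models.RealmModel;
import org.keycloak.services.resource.RealmResourceProvider;

public class IvrConfigResourceProvider implements RealmResourceProvider {
    private final KeycloakSession session;

    // Well-known mapping for stock Keycloak authenticators.
    // AuthStep is the small DTO serialized into the "steps" array (definition not shown).
    private static final Map<String, AuthStep> STOCK_AUTHENTICATORS = Map.of(
        "direct-grant-validate-username",
        new AuthStep("voter_id", 8, "#", "username", null),
        "direct-grant-validate-password",
        new AuthStep("pin", 8, "#", "password", null)
    );

    // Authenticators that are present in the Direct Grant flow but should not
    // surface as an IVR-collected step. Empty today; see §5.1.4 for why OTP is
    // not listed here yet (it would be added if OTP-over-IVR is ever built).
    private static final Set<String> SKIPPED_AUTHENTICATORS = Set.of();

    public IvrConfigResourceProvider(KeycloakSession session) {
        this.session = session;
    }

    // The provider serves as its own JAX-RS resource.
    @Override public Object getResource() { return this; }

    // Bearer-token and role validation (see key design points) is not shown here.
    @GET
    @Path("/")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getIvrConfig() {
        RealmModel realm = session.getContext().getRealm();
        // 1. Find effective Direct Grant flow for ivr-voting client
        ClientModel ivrClient = realm.getClientByClientId("ivr-voting");
        AuthenticationFlowModel flow = (ivrClient != null && ivrClient.getAuthenticationFlowBindingOverride("direct_grant") != null)
            ? realm.getAuthenticationFlowById(ivrClient.getAuthenticationFlowBindingOverride("direct_grant"))
            : realm.getDirectGrantFlow();
        // 2. Walk executions in order, filter to REQUIRED/CONDITIONAL
        List<AuthStep> steps = new ArrayList<>();
        realm.getAuthenticationExecutionsStream(flow.getId())
            .filter(e -> e.getRequirement() == REQUIRED || e.getRequirement() == CONDITIONAL)
            .filter(e -> !SKIPPED_AUTHENTICATORS.contains(e.getAuthenticator()))
            .forEachOrdered(e -> steps.add(buildStep(realm, e)));
        return Response.ok(Map.of("steps", steps)).build();
    }

    private AuthStep buildStep(RealmModel realm, AuthenticationExecutionModel exec) {
        // 3a. Stock authenticator: use static lookup
        if (STOCK_AUTHENTICATORS.containsKey(exec.getAuthenticator())) {
            return STOCK_AUTHENTICATORS.get(exec.getAuthenticator());
        }
        // 3b. Custom authenticator: read AuthenticatorConfig (the id may be unset)
        String cfgId = exec.getAuthenticatorConfig();
        AuthenticatorConfigModel cfg = (cfgId == null) ? null : realm.getAuthenticatorConfigById(cfgId);
        if (cfg == null) {
            throw new WebApplicationException(
                "Unknown IVR authenticator '" + exec.getAuthenticator() +
                "' has no AuthenticatorConfig; cannot derive IVR auth step",
                Response.Status.INTERNAL_SERVER_ERROR);
        }
        Map<String, String> c = cfg.getConfig();
        return new AuthStep(
            c.get("field_name"),
            Integer.parseInt(c.getOrDefault("max_digits", "10")),
            c.getOrDefault("terminator", "#"),
            c.get("maps_to"),
            c.get("prompt_key") // optional override
        );
    }

    @Override public void close() {}
}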
Factory (IvrConfigResourceProviderFactory implements RealmResourceProviderFactory, ~20 lines) registers the provider under /realms/{realm}/ivr-config.
Key design points:
- Authentication required: the endpoint validates a bearer token issued by the same realm under the ivr-service client (§C.8.b) and verifies the token carries the can_read_phone_blacklist service-account role (the role that already gates the Lambda's pre-auth Hasura read is widened to cover this endpoint: one role, two reads, same principal). An unauthenticated or wrong-audience request returns 401. The Lambda's actual call path uses TokenManager::get_service_token(realm) (§5.1.9) and reuses the cached token from the blacklist call earlier in the same turn. Rationale: see §5.1.2. The shape of the step list is a per-realm auth fingerprint, not something to expose anonymously.
- Stock authenticator lookup is hardcoded in the extension. If Keycloak renames direct-grant-validate-username in a major upgrade, the extension must be updated; this is covered by a startup integration test that calls the endpoint against a well-known realm configuration.
- Skipped authenticators list is a seam for authenticators that should not surface as an IVR-collected step. It is currently empty. If OTP-over-IVR is ever added (§5.1.4), its authenticator id would go here so it is reached reactively through the otp_required error response rather than declared up front.
- Unknown authenticators fail loudly with HTTP 500: misconfigurations surface at deployment time (first call after deploy) instead of silently producing a broken auth flow mid-election.
- Custom authenticator config properties (field_name, max_digits, terminator, maps_to, prompt_key) are declared by each custom authenticator's getConfigProperties(); Keycloak renders them as fields in the admin UI.
Build integration (proposed): create a new Maven module at the location chosen in §16.2 and include it in the Keycloak image alongside conditional-authenticators. If the module lands in step/packages/keycloak-extensions/, it slots into the existing pom.xml aggregator and Dockerfile.keycloak build stage with no cross-repo plumbing. If it lands in beyond/packages/keycloak-extensions/ (a tree that does not yet exist), the Keycloak image build must additionally reach into beyond to pick up the JAR — see §16.3.2 for the two integration patterns.
C.9 Update Default Values
File: packages/sequent-core/src/ballot.rs
Update Default implementations:
impl Default for ElectionEventStatus {
    fn default() -> Self {
        Self {
            voting_status: Default::default(),
            kiosk_voting_status: Default::default(),
            early_voting_status: Default::default(),
            telephone_voting_status: Default::default(), // ADD THIS
            voting_period_dates: Default::default(),
            kiosk_voting_period_dates: Default::default(),
            early_voting_period_dates: Default::default(),
            telephone_voting_period_dates: Default::default(), // ADD THIS
        }
    }
}
C.10 Possible Refactor: Generalize Voting Status Per Channel
The per-channel fan-out in C.3–C.6 (adding a fourth parallel telephone_voting_status + telephone_voting_period_dates pair) is structurally identical to what already happened for KIOSK and EARLY_VOTING. Each new channel adds another pair of fields and another match arm everywhere. This does not compose: per CLAUDE.md "Product Design Philosophy," channels should scale as data, not as struct fields.
The refactor collapses the parallel fields into a single map keyed by channel:
pub struct ElectionEventStatus {
    pub is_published: Option<bool>,
    pub channels: BTreeMap<VotingStatusChannel, ChannelStatus>,
}

#[derive(Default, Serialize, Deserialize, …)]
pub struct ChannelStatus {
    pub status: VotingStatus,
    pub period_dates: PeriodDates,
}

impl ElectionEventStatus {
    pub fn status_by_channel(&self, channel: VotingStatusChannel) -> VotingStatus {
        self.channels.get(&channel).map(|c| c.status.clone()).unwrap_or(VotingStatus::NOT_STARTED)
    }

    pub fn set_status_by_channel(&mut self, channel: VotingStatusChannel, new_status: VotingStatus) {
        let entry = self.channels.entry(channel).or_default();
        entry.status = new_status.clone();
        entry.period_dates.update_period_dates(&new_status);
    }
}
With this shape, adding TELEPHONE (or any future channel) is a single enum variant — no struct changes, no new match arms in status_by_channel / set_status_by_channel, no new GraphQL columns or Hasura permissions per channel.
Why this is classified as "possible" and not a prerequisite for the IVR MVP: ElectionEventStatus is serialized on the wire in many places. It is persisted in Hasura, exported/imported as part of election bundles, referenced by close_early_voting_if_online_status_change, read by admin-portal and voting-portal TypeScript, and signed as part of the bulletin board state. A refactor touches:
- sequent-core: struct + status_by_channel / set_status_by_channel / close_early_voting_if_online_status_change + every match arm that pattern-matches on the flat fields.
- Hasura: the PostgreSQL column (JSONB) is shape-compatible, but any computed fields, permissions, or subscriptions that project specific sub-fields (voting_status, kiosk_voting_status, …) need to be rewritten to index into channels.
- windmill: manage_election_dates / manage_election_event_date / voting_status::update_election_status, plus import/export in packages/windmill/src/services/import/import_election_event.rs and its export counterpart. The scheduled-event pipeline already accepts Vec<VotingStatusChannel>, so the map shape is a natural fit.
- harvest: any REST handlers returning or accepting ElectionEventStatus.
- admin-portal: the election-status UI, the scheduled-event editor, and anything that reads election_event.status.voting_status directly. After the refactor, these all go through channels[CHANNEL].
- voting-portal: any gating UI that checks voting_status to decide whether the "Vote" button is active.
- GraphQL codegen: yarn generate:voting-portal / yarn generate:admin-portal must be re-run.
- Migration: a one-shot data migration reads the flat parallel-field shape and writes the map shape. Export bundles need a version bump so older bundles can still be imported (read old shape → write new). This is the same backwards-compatibility concern called out in CLAUDE.md "Code Quality Standards."
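The one-shot migration in the last bullet can be sketched as a pure function from the flat shape to the map shape. Everything here is a simplified stand-in: Channel, Status and PeriodDates are hypothetical CamelCase substitutes for VotingStatusChannel, VotingStatus and the real period-dates type, serde is omitted, and the flat struct assumes TELEPHONE has already shipped as a fourth parallel pair per the recommended sequencing.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Channel { Online, Kiosk, EarlyVoting, Telephone }

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Status { #[default] NotStarted, Open, Paused, Closed }

/// Opaque stand-in for the real period-dates type.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct PeriodDates;

/// Old shape: one status field and one dates field per channel.
pub struct FlatStatus {
    pub is_published: Option<bool>,
    pub voting_status: Status,
    pub kiosk_voting_status: Status,
    pub early_voting_status: Status,
    pub telephone_voting_status: Status,
    pub voting_period_dates: PeriodDates,
    pub kiosk_voting_period_dates: PeriodDates,
    pub early_voting_period_dates: PeriodDates,
    pub telephone_voting_period_dates: PeriodDates,
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct ChannelStatus {
    pub status: Status,
    pub period_dates: PeriodDates,
}

/// New shape: a single map keyed by channel.
pub struct MapStatus {
    pub is_published: Option<bool>,
    pub channels: BTreeMap<Channel, ChannelStatus>,
}

impl MapStatus {
    /// Channels missing from the map read as NotStarted.
    pub fn status_by_channel(&self, channel: Channel) -> Status {
        self.channels.get(&channel).map(|c| c.status).unwrap_or_default()
    }
}

/// Reads the old shape and writes the new one, one map entry per pair.
pub fn migrate(flat: FlatStatus) -> MapStatus {
    let pairs = vec![
        (Channel::Online, flat.voting_status, flat.voting_period_dates),
        (Channel::Kiosk, flat.kiosk_voting_status, flat.kiosk_voting_period_dates),
        (Channel::EarlyVoting, flat.early_voting_status, flat.early_voting_period_dates),
        (Channel::Telephone, flat.telephone_voting_status, flat.telephone_voting_period_dates),
    ];
    let channels = pairs
        .into_iter()
        .map(|(channel, status, period_dates)| (channel, ChannelStatus { status, period_dates }))
        .collect();
    MapStatus { is_published: flat.is_published, channels }
}
```

An importer for older export bundles could call the same function after deserializing the old shape, which is one way to satisfy the read-old, write-new requirement.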
Recommended sequencing: ship TELEPHONE using the C.3–C.6 parallel-field pattern (adds exactly one more channel to a pattern the codebase already tolerates), then do the map refactor as its own meta-issue. The IVR MVP does not block on it, but the refactor is worth doing before a fifth channel is ever added.
Appendix D: IVR Prompt Keys Reference
The ivr namespace is strongly typed at the boundary (see §7.2 "Rust Type: Validated IVR Sub-Tree"): every well-known prompt or spoken-text override is a variant of the IvrPromptKey enum and is consumed via TypedIvrScope, while deployment-specific custom keys are preserved on the overflow unknown map. Adding a new well-known key means adding an IvrPromptKey variant in sequent-core; adding a custom key for one deployment is a data-only change that flows through the overflow path. The tables below list the well-known keys that the built-in phase engines reference.
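The split between well-known keys and the overflow map can be illustrated with a small sketch. It is not the TypedIvrScope from §7.2: IvrScopeSketch, its methods, and the handful of variants shown are hypothetical, and the real enum covers every key in the tables below.

```rust
use std::collections::BTreeMap;

/// A few well-known keys; the real enum would list all of them.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum IvrPromptKey {
    Greeting,
    LanguageSelect,
    AuthFailed,
    SystemError,
    Goodbye,
}

impl IvrPromptKey {
    /// Maps a JSON key to its variant, or None for a custom key.
    pub fn from_key(key: &str) -> Option<Self> {
        match key {
            "greeting" => Some(IvrPromptKey::Greeting),
            "language_select" => Some(IvrPromptKey::LanguageSelect),
            "auth_failed" => Some(IvrPromptKey::AuthFailed),
            "system_error" => Some(IvrPromptKey::SystemError),
            "goodbye" => Some(IvrPromptKey::Goodbye),
            _ => None,
        }
    }
}

/// Hypothetical typed view of one i18n[lang]["ivr"] sub-tree.
#[derive(Debug, Default)]
pub struct IvrScopeSketch {
    known: BTreeMap<IvrPromptKey, String>,
    unknown: BTreeMap<String, String>,
}

impl IvrScopeSketch {
    /// Sorts each (key, text) pair into the typed map or the overflow map.
    pub fn from_pairs<'a, I>(pairs: I) -> Self
    where
        I: IntoIterator<Item = (&'a str, &'a str)>,
    {
        let mut scope = IvrScopeSketch::default();
        for (key, text) in pairs {
            match IvrPromptKey::from_key(key) {
                Some(known_key) => {
                    scope.known.insert(known_key, text.to_string());
                }
                None => {
                    scope.unknown.insert(key.to_string(), text.to_string());
                }
            }
        }
        scope
    }

    /// Lookup for built-in phase engines.
    pub fn prompt(&self, key: IvrPromptKey) -> Option<&str> {
        self.known.get(&key).map(String::as_str)
    }

    /// Lookup for deployment-specific custom keys.
    pub fn custom(&self, key: &str) -> Option<&str> {
        self.unknown.get(key).map(String::as_str)
    }
}
```

In this sketch a custom key is never dropped: it lands in the overflow map and stays reachable by name, which is what makes it a data-only change.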
Event-Level Prompts
Stored in ElectionEvent.presentation.i18n[lang]["ivr"]
Core prompts (used by most deployments):
| Key | Phase | Description |
|---|---|---|
| greeting | announcement: welcome | Welcome message |
| language_select | language_select | Language menu |
| auth_enter_username | auth | Played for the step whose maps_to is username (typically voter ID) |
| auth_enter_password | auth | Played for the step whose maps_to is password (typically PIN or DoB) |
| auth_enter_dob | auth | Played for custom DoB step (maps_to: dob) if IvrDobAuthenticator is in the flow |
| auth_failed | auth | Authentication failed |
| auth_max_attempts | auth | Max auth retries exceeded |
| system_error | (any) | System error |
| invalid_input | (any) | Invalid DTMF input |
| timeout | (any) | Input timeout |
| repeat_instruction | (any) | Reminder that pressing * repeats the current prompt. Typically included once in the greeting and on long prompts where re-listening is likely |
| goodbye | goodbye | Farewell message |
Extended prompts (Barrie-style deployments):
| Key | Phase | Description |
|---|---|---|
| blacklist_message | blacklist_check | Phone number blocked. Since blacklist runs before language selection, this prompt should work before the caller has chosen a language |
| eligibility_check | eligibility_check | Eligibility validation in progress |
| not_eligible | eligibility_check | Not authorized to vote |
| not_active | eligibility_check | Credentials deactivated |
| election_closed | ballot_loop | Telephone voting not open (played when telephone_voting_status is not OPEN) |
| declaration_text | announcement: declaration | Legal declaration text |
| pre_voting_statement | announcement: pre_voting_statement | Disconnect warning / info |
| receipt_info | ballot_loop (ElectionReceipt) | About to read the ballot locator for this election |
| receipt_number | ballot_loop (ElectionReceipt) | Per-election ballot locator readback: first 4 hex characters of ballot_id, spoken phonetically (uses {confirmation_number}, {election_name}) |
| session_expired | (any) | Session timeout |
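The receipt_number readback can be sketched as follows. This assumes a NATO-style word for each hex letter and plain English digit words, and it ignores ballot_loop.config.receipt_format entirely; phonetic_confirmation is a hypothetical name.

```rust
/// Spoken word for one hex character, or None if it is not hex.
fn phonetic_word(c: char) -> Option<&'static str> {
    Some(match c.to_ascii_lowercase() {
        '0' => "zero",
        '1' => "one",
        '2' => "two",
        '3' => "three",
        '4' => "four",
        '5' => "five",
        '6' => "six",
        '7' => "seven",
        '8' => "eight",
        '9' => "nine",
        'a' => "alpha",
        'b' => "bravo",
        'c' => "charlie",
        'd' => "delta",
        'e' => "echo",
        'f' => "foxtrot",
        _ => return None,
    })
}

/// Spells out the first 4 hex characters of `ballot_id`.
/// Returns None if the id is shorter than 4 characters or if any of
/// the first 4 is not a hex digit.
pub fn phonetic_confirmation(ballot_id: &str) -> Option<String> {
    let words: Vec<&'static str> = ballot_id
        .chars()
        .take(4)
        .map(phonetic_word)
        .collect::<Option<Vec<_>>>()?;
    if words.len() == 4 {
        Some(words.join(" "))
    } else {
        None
    }
}
```

A ballot_id beginning with a3f2 is read back as "alpha three foxtrot two".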
IVR-Only Spoken Text Overrides
Stored in *.presentation.i18n[lang]["ivr"] at event, election, contest, and candidate scope
| Key | Typical Scope | Fallback |
|---|---|---|
| name | Event, election, contest, candidate | Portal name / name_i18n |
| alias | Event, election, contest, candidate | Portal alias / alias_i18n |
| description | Event, election, contest, candidate | Portal description / description_i18n |
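The fallback column amounts to a short lookup chain. The sketch below shows it for name only, with a hypothetical Named struct whose ivr_name field flattens presentation.i18n[lang]["ivr"]["name"] into a language-to-text map; the order (IVR override, then name_i18n, then the plain name) is an assumption about how the two portal fields relate.

```rust
use std::collections::HashMap;

/// Hypothetical, flattened view of an event, election, contest, or candidate.
pub struct Named {
    pub name: String,
    /// Portal translations: lang -> name.
    pub name_i18n: HashMap<String, String>,
    /// IVR-only overrides: lang -> spoken name.
    pub ivr_name: HashMap<String, String>,
}

/// Text to speak for `item` in `lang`: the IVR override if present,
/// else the portal translation, else the untranslated portal name.
pub fn spoken_name<'a>(item: &'a Named, lang: &str) -> &'a str {
    item.ivr_name
        .get(lang)
        .or_else(|| item.name_i18n.get(lang))
        .map(String::as_str)
        .unwrap_or(item.name.as_str())
}
```

The same chain would apply to alias and description with their respective portal fields.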
Election-Level Prompts
Stored in Election.presentation.i18n[lang]["ivr"]
| Key | Phase | Template Variables | Description |
|---|---|---|---|
| election_intro | ballot_loop | {election_name} | Election introduction |
| contest_intro | ballot_loop | {contest_name}, {max_votes} | Contest introduction |
| candidate_option | ballot_loop | {number}, {candidate_name} | Candidate option |
| vote_confirm | ballot_loop | {candidate_name}, {contest_name} | Vote confirmation |
| already_selected | ballot_loop | - | Duplicate selection (only reachable via race condition; normally already-selected candidates are omitted from the list) |
| blank_ballot_confirm | ballot_loop | - | Blank ballot confirmation |
| decline_confirm | ballot_loop | - | Decline-to-vote confirmation |
| summary_intro | ballot_loop (ElectionSummary) | - | Per-election summary introduction |
| summary_item | ballot_loop (ElectionSummary) | {contest_name}, {candidate_name}, {contest_number} | Summary line item per contest; includes contest number for edit selection |
| summary_edit_prompt | ballot_loop (ElectionSummary) | - | "Press 00# to submit, or press a contest number followed by # to change your selection for that contest" |
| summary_edit_restart | ballot_loop (ElectionSummary) | {contest_name} | "Changing your selection for {contest_name}. Your previous selections for this contest have been cleared." |
| vote_success | ballot_loop (ElectionSubmit) | {election_name} | Ballot submitted for this election |
| vote_failed | ballot_loop (ElectionSubmit) | - | Vote submission failed |
| duplicate_vote | ballot_loop (ElectionSubmit) | - | Already voted in this election |
| max_revotes_exceeded | ballot_loop (ElectionSubmit) | - | Max revotes exceeded for this election |
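The summary_edit_prompt wording implies a small input grammar: 00# submits, and a contest number followed by # edits that contest. A minimal sketch of that parse follows; SummaryAction and parse_summary_input are hypothetical names, and treating contest numbers as 1-based is an assumption.

```rust
#[derive(Debug, PartialEq)]
pub enum SummaryAction {
    /// Caller pressed 00#.
    Submit,
    /// Caller pressed a valid contest number followed by #.
    EditContest(usize),
    /// Anything else; the caller would hear invalid_input.
    Invalid,
}

/// Interprets raw DTMF input at the per-election summary.
/// `contest_count` is the number of contests read out in the summary.
pub fn parse_summary_input(raw: &str, contest_count: usize) -> SummaryAction {
    let digits = match raw.strip_suffix('#') {
        Some(d) => d,
        None => return SummaryAction::Invalid,
    };
    if digits == "00" {
        return SummaryAction::Submit;
    }
    // Only plain digits are accepted; this also rules out "+3" or "*".
    if digits.is_empty() || !digits.chars().all(|c| c.is_ascii_digit()) {
        return SummaryAction::Invalid;
    }
    match digits.parse::<usize>() {
        Ok(n) if n >= 1 && n <= contest_count => SummaryAction::EditContest(n),
        _ => SummaryAction::Invalid,
    }
}
```

With three contests in the summary, 2# selects contest 2 for editing and 7# is rejected as invalid.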
Template Variables
| Variable | Source | Example |
|---|---|---|
| {election_name} | IVR name override if present, else election.get_name(lang) | "Municipal Council" |
| {contest_name} | IVR name override if present, else contest.get_name(lang) | "Mayor" |
| {candidate_name} | IVR name override if present, else candidate name / name_i18n | <lang xml:lang="fr-CA">Jean-François Côté</lang> |
| {number} | DTMF mapping | "1" |
| {max_votes} | contest.max_votes | "3" |
| {min_votes} | contest.min_votes | "1" |
| {confirmation_number} | First 4 hex characters of ballot_id, formatted phonetically per ballot_loop.config.receipt_format | "alpha three foxtrot two" |
| {assistance_phone} | ivr.assistance_phone config | "1-800-555-0199" |
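Substituting these variables into a prompt is a simple scan for {name} placeholders. The sketch below is one possible approach, not the Lambda's renderer: render_prompt is a hypothetical name, and failing on an unknown or unclosed placeholder (rather than speaking it literally) is an assumption.

```rust
use std::collections::HashMap;

/// Replaces each `{name}` in `template` with its value from `vars`.
/// Returns an error for an unclosed brace or an unknown variable.
pub fn render_prompt(template: &str, vars: &HashMap<&str, String>) -> Result<String, String> {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        let close = after
            .find('}')
            .ok_or_else(|| "unclosed '{' in template".to_string())?;
        let name = &after[..close];
        let value = vars
            .get(name)
            .ok_or_else(|| format!("unknown template variable: {name}"))?;
        out.push_str(value);
        rest = &after[close + 1..];
    }
    // Copy whatever follows the last placeholder.
    out.push_str(rest);
    Ok(out)
}
```

Given contest_name = "Mayor" and max_votes = "3", the template "Contest {contest_name}: choose up to {max_votes}." renders as "Contest Mayor: choose up to 3.".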