MCC AI Interviewer — Giovanni Braghieri
01 · Overview

I designed and built the AI Interviewer, a production conversational-control system for consulting case interviews. It runs structured simulations (problem identification → framework → analysis → recommendation), measures how far each candidate deviates from an ideal solution path, and applies a correction policy in real time.

The interviewer speaks naturally; a separate orchestration layer owns belief state, advancement, and extraction. Domain experts configure cases declaratively, and the engine compiles that into phase gates, extraction ontologies, and behavioral policy at runtime.

The product thesis: a credible AI interviewer requires explicit state and governable policy, not a better persona.

02 · Context

Case interviews are rigidly structured. A real interviewer enforces phase discipline, catches structural mistakes early, and only advances when the key artifacts are locked: problem definition, a MECE framework, correct math, a clear recommendation.

Most AI interview products collapse this into one LLM call per turn. That fails in production: the model forgets gating rules, advances too early, or cannot tell whether the candidate stated segment-level profit or an option total.

The platform needed an interviewer that feels human and stays governable: coach-configurable, testable in CI, and debuggable turn by turn. I owned the core engine: orchestrator, perception layer, prompt policy, advancement logic, the coach authoring abstraction, and the automated regression harness.

03 · Problem

The hard problem is controlled dialogue under partial observability:

01 Unstructured input, structured expectations
Candidates speak naturally. The system must infer whether they completed clarify_objective, proposed a concrete framework, or filled the correct ROI slot in a granular analysis ontology.

02 Drift is continuous, not binary
Running ahead into analysis before structure is locked is a different failure mode from a vague problem definition or segment-versus-total confusion. The interviewer treats conversation as trajectory control, not turn-by-turn Q&A.

03 One model call cannot own everything
Generation, perception, and policy (when to advance, when to redirect) have different failure modes. Mixing them produces an interviewer that sounds authoritative while advancing on vibes.

04 Experts must steer behavior without redeploying
Coaches need to define case logic, gates, and rubrics as data. Engineering should scale with engine capability, not with N× prompt maintenance.
04 · What I built

01 Multi-component orchestration pipeline

Each turn runs through orchestrator.ts as a deterministic pipeline with selective LLM calls:

| Component | Role |
| --- | --- |
| Interviewer service | Builds phase-aware prompts; generates the coach utterance |
| Action extractor | Per-action classification over the full turn (candidate + interviewer) |
| Semantic gates | Deterministic advancement preconditions on artifact readiness |
| Advancement judge | Rescue-only LLM adjudication for ambiguous cases; never blocks a gate-ready advance |
| Framework compiler | Multi-turn synthesis of the candidate's framework across all structuring turns |
| Analysis extractor | Ontology-driven numeric slot filling; deterministic-first, LLM escalation on low confidence |
| Phase summarizer | Bounded context carry-forward on bucket advance |

Each turn emits structured diagnostics: action status updates, drift-control artifact deltas, gate failures, judge verdicts, and optional advance proposals.

02 Conversation as state-space control

I modeled the interview as movement through buckets (global phases) along an ideal action graph. The ideal path is the ordered actions per bucket with required / sufficient / contributing / optional states; the observed path is extracted artifacts plus completed actions plus turn-band pressure.

Canonical state vector (DriftControlArtifacts)

problem_definition: decision, objective, metrics
solution_structure: MECE tree / nested nodes
analysis_results[]: dimension, option, result (segment → aggregate)
case_conclusion: preferred option + rationale
compiled_framework: read-only multi-turn summary (not used for gating)

Control signals

  • Turn bands (early → light → heavy → lastTurn): correction aggressiveness increases with time in phase.
  • Consecutive next-bucket evidence: a guarded override requires evidence on two consecutive turns, so a single-turn hallucination cannot trigger it.
  • Locked artifact keys: immutable on phase ADVANCE.
  • Advance proposals: borderline judge verdicts become offers the candidate can accept or decline.
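
The first two control signals can be sketched roughly as follows. This is a minimal illustration, not the engine's code: the band thresholds and the names turnBand, updateNextBucketEvidence, and guardedOverrideAllowed are assumptions, and only the band names and the two-turn requirement come from the list above.

```typescript
// Hypothetical sketch: thresholds and function names are illustrative assumptions.
type TurnBand = "early" | "light" | "heavy" | "lastTurn";

// Map the number of turns spent in the current phase to a correction band.
// Later bands are meant to trigger firmer correction.
function turnBand(turnsInPhase: number, turnCap: number): TurnBand {
  if (turnsInPhase >= turnCap) return "lastTurn";
  const ratio = turnsInPhase / turnCap;
  if (ratio < 0.34) return "early";
  if (ratio < 0.67) return "light";
  return "heavy";
}

// Count consecutive turns that showed next-bucket evidence.
// Any turn without such evidence resets the streak to zero.
function updateNextBucketEvidence(
  previousStreak: number,
  sawNextBucketEvidence: boolean,
): number {
  return sawNextBucketEvidence ? previousStreak + 1 : 0;
}

// Two consecutive turns are required before the guarded override may fire.
const REQUIRED_EVIDENCE_TURNS = 2;

function guardedOverrideAllowed(streak: number): boolean {
  return streak >= REQUIRED_EVIDENCE_TURNS;
}
```

Keeping both functions pure means the same inputs always yield the same band or streak, which is what makes this kind of policy straightforward to unit test.
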
03 Four behavioral abstractions

Instead of one monolithic prompt, interviewer behavior decomposes into composable policy dimensions mapped to prompt sections:

| Abstraction | Controls |
| --- | --- |
| Drift control | How far off-path is tolerable before redirect |
| Correction aggressiveness | One short probe or a firm redirect, modulated by turn band |
| Advancement standard | Quality bar to unlock a phase transition |
| Tone | Interviewer voice within constraint-based leading |

A two-tier prompt architecture composes these at runtime from case JSON: a universal base for behavioral realism, plus phase-specific rules (identify_problem, frame_solution, lead_analysis, provide_recommendations). bucketPromptBuilder.ts assembles case description, previous-bucket context, in-run artifacts, turn-band instructions, drift rules, calculation references, and coach-authored guidance.
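
A minimal sketch of how such a two-tier composition might look, assuming each abstraction renders to one titled prompt section. The section titles, field names, and the buildPrompt function are hypothetical and do not reflect the actual contents of bucketPromptBuilder.ts.

```typescript
// Hypothetical sketch: section titles and field names are assumptions,
// not the actual contents of bucketPromptBuilder.ts.
type Phase =
  | "identify_problem"
  | "frame_solution"
  | "lead_analysis"
  | "provide_recommendations";

// The four behavioral abstractions, each rendered as one prompt section.
interface BehaviorPolicy {
  driftControl: string;
  correctionAggressiveness: string;
  advancementStandard: string;
  tone: string;
}

interface PromptInput {
  basePrompt: string; // tier 1: universal base for behavioral realism
  phase: Phase;
  phaseRules: Record<Phase, string>; // tier 2: phase-specific rules
  policy: BehaviorPolicy;
  coachGuidance?: string; // optional coach-authored guidance
}

// Compose the prompt from titled sections, skipping any that are empty.
function buildPrompt(input: PromptInput): string {
  const sections: Array<[string, string | undefined]> = [
    ["BASE", input.basePrompt],
    ["PHASE RULES", input.phaseRules[input.phase]],
    ["DRIFT CONTROL", input.policy.driftControl],
    ["CORRECTION AGGRESSIVENESS", input.policy.correctionAggressiveness],
    ["ADVANCEMENT STANDARD", input.policy.advancementStandard],
    ["TONE", input.policy.tone],
    ["COACH GUIDANCE", input.coachGuidance],
  ];
  return sections
    .filter((s): s is [string, string] => s[1] !== undefined && s[1].trim() !== "")
    .map(([title, body]) => `[${title}]\n${body.trim()}`)
    .join("\n\n");
}
```

Only the rules for the active phase are included, so each prompt carries one phase's constraints rather than all four.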

04 Perception layer

The action extractor runs one LLM call per action, which gives clearer classification and lets each call carry action-specific examples. Key design choices:

  • Dyadic acceptance: framework_proposed only completes when the interviewer turn shows acceptance language, not a refinement question.
  • Solution-graph matching: MECE node alignment against a case-defined semantic graph.
  • Artifact merge with lock respect: upsert analysis results without overwriting locked keys.

The framework compiler fixes a real bug class: a solution_structure extracted from a single acceptance turn misses frameworks that were built up over several turns. The compiler reads the full frame_solution transcript on ADVANCE; gating still uses the gate artifact, while display uses the compiled summary. Analysis extraction uses an explicit state space of dozens of granular slots (for example consumer_credit_revenue_cars versus consumer_credit_revenue) that aggregate to validation slots.
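
The lock-respecting merge and the granular-to-aggregate slot idea can be illustrated with a small sketch. The result shape, the dimension:option key format, and aggregation by simple summation are assumptions made for this example; the slot name consumer_credit_revenue_motorcycles in the usage note is invented.

```typescript
// Hypothetical sketch: the result shape and key format are assumptions.
interface AnalysisResult {
  dimension: string;
  option: string;
  value: number;
}

const resultKey = (r: AnalysisResult): string => `${r.dimension}:${r.option}`;

// Upsert incoming results by key, but never overwrite a key that was locked on ADVANCE.
function mergeAnalysisResults(
  existing: AnalysisResult[],
  incoming: AnalysisResult[],
  lockedKeys: ReadonlySet<string>,
): AnalysisResult[] {
  const merged = new Map<string, AnalysisResult>();
  for (const r of existing) merged.set(resultKey(r), r);
  for (const r of incoming) {
    const key = resultKey(r);
    if (lockedKeys.has(key)) continue; // locked artifacts are immutable
    merged.set(key, r);
  }
  return Array.from(merged.values());
}

// Sum granular slots into an aggregate validation slot (summation is an assumption).
// Returns undefined until every contributing slot has been filled.
function aggregateSlot(
  slots: Record<string, number | undefined>,
  parts: string[],
): number | undefined {
  let total = 0;
  for (const part of parts) {
    const value = slots[part];
    if (value === undefined) return undefined;
    total += value;
  }
  return total;
}
```

For example, aggregateSlot({ consumer_credit_revenue_cars: 120, consumer_credit_revenue_motorcycles: 30 }, [...both keys]) yields a single total that can be checked against the case's expected calculation, while a partially filled set of slots stays undefined rather than producing a misleading partial sum.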

05 Advancement policy (deterministic + rescue LLM)

Advancement is evidence-based, not conversational:

gateShouldAdvance = f(artifact readiness, action completion, turn cap, semantic gate)
shouldAdvance = gateShouldAdvance OR judge.rescuedAdvance OR proposal.accepted
policy cascade: bucket.advancePolicy → case.advancePolicyOverrides[phase] → PHASE_ADVANCE_DEFAULTS → GLOBAL_DEFAULT

The semantic gate requires a complete artifact shape (for example decision + objective + metrics) before identify_problem can close. The advancement judge is a single LLM call over gate failures, turn band, merged artifacts, transcript, and per-phase rubric; it returns advance, propose, or hold. Its bias toward advancing escalates with turn pressure so candidates are not trapped. The judge is rescue-only: it can unblock a stuck candidate, but it never overrides a gate-ready advance.
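
As a rough sketch of the cascade and the OR rule stated above: the AdvancePolicy fields, the default values, and the function names are assumptions, and the cascade is assumed to pick the first defined policy as a whole rather than merging fields.

```typescript
// Hypothetical sketch: policy fields and default values are assumptions; only the
// cascade order and the OR rule come from the text above.
type Phase =
  | "identify_problem"
  | "frame_solution"
  | "lead_analysis"
  | "provide_recommendations";

interface AdvancePolicy {
  turnCap: number;
  minCompletedActions: number;
}

const GLOBAL_DEFAULT: AdvancePolicy = { turnCap: 8, minCompletedActions: 1 };

const PHASE_ADVANCE_DEFAULTS: Partial<Record<Phase, AdvancePolicy>> = {
  identify_problem: { turnCap: 5, minCompletedActions: 2 },
};

// First defined policy wins: bucket, then case override, then phase default, then global.
function resolveAdvancePolicy(
  phase: Phase,
  bucketPolicy?: AdvancePolicy,
  caseOverrides?: Partial<Record<Phase, AdvancePolicy>>,
): AdvancePolicy {
  return (
    bucketPolicy ??
    caseOverrides?.[phase] ??
    PHASE_ADVANCE_DEFAULTS[phase] ??
    GLOBAL_DEFAULT
  );
}

type JudgeVerdict = "advance" | "propose" | "hold";

// The judge can only add an advance; a "hold" never cancels a gate-ready advance.
function shouldAdvance(
  gateShouldAdvance: boolean,
  judgeVerdict: JudgeVerdict | undefined,
  proposalAccepted: boolean,
): boolean {
  const rescuedAdvance = judgeVerdict === "advance";
  return gateShouldAdvance || rescuedAdvance || proposalAccepted;
}
```

Because the three signals are combined with OR, the rescue-only property holds by construction: no judge verdict can turn a true gate result into false.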

06 Coach authoring abstraction

Cases are declarative programs for the engine, with no code per scenario:

  • Buckets with objectives, actions, gates, turn caps, and turn-band instruction overrides
  • Calculations with expected results for in-prompt validation
  • Information-to-provide with triggers (exhibit routing, whitelisted facts)
  • Solution-graph node IDs for structure matching, plus a runtimeConfig ontology
  • Phase editors in the coach dashboard; the orchestrator always runs authoritative published content

Coaches enter domain knowledge once; the engine interprets it as constraints on the state space, and the four behavioral abstractions are injected at compile time rather than hand-edited in TypeScript.
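
To make "cases as data" concrete, here is a hedged sketch of what a declarative case and a structural check on it could look like. The schema, the example content, and validateCase are all hypothetical; only the action requirement states and the action and phase identifiers are taken from this write-up.

```typescript
// Hypothetical sketch: this schema is an assumption, not the product's case format.
type ActionRequirement = "required" | "sufficient" | "contributing" | "optional";

interface ActionConfig {
  id: string;
  requirement: ActionRequirement;
}

interface BucketConfig {
  id: string;
  objective: string;
  turnCap: number;
  actions: ActionConfig[];
}

interface CaseConfig {
  title: string;
  buckets: BucketConfig[];
}

const exampleCase: CaseConfig = {
  title: "Example profitability case", // placeholder content
  buckets: [
    {
      id: "identify_problem",
      objective: "Lock decision, objective, and metrics",
      turnCap: 5,
      actions: [{ id: "clarify_objective", requirement: "required" }],
    },
    {
      id: "frame_solution",
      objective: "Agree on a MECE framework",
      turnCap: 6,
      actions: [{ id: "framework_proposed", requirement: "required" }],
    },
  ],
};

// Cheap structural checks a publish step might run before the engine accepts a case.
function validateCase(c: CaseConfig): string[] {
  const errors: string[] = [];
  const seen = new Set<string>();
  for (const bucket of c.buckets) {
    if (seen.has(bucket.id)) errors.push(`duplicate bucket id: ${bucket.id}`);
    seen.add(bucket.id);
    if (bucket.turnCap < 1) errors.push(`bucket ${bucket.id}: turnCap must be at least 1`);
    const canComplete = bucket.actions.some(
      (a) => a.requirement === "required" || a.requirement === "sufficient",
    );
    if (!canComplete) errors.push(`bucket ${bucket.id}: no required or sufficient action`);
  }
  return errors;
}
```

Validating the data rather than reviewing prompts is one way a coach-authored case could be checked without any engineering involvement per scenario.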

07 User Agent test harness

Automated regression runs against the real orchestrator, not mocks:

User Simulator (LLM candidate) → AI Interviewer (orchestrator under test)
  • Deterministic suite (hard gate): phase transitions, ADVANCE/ROLLBACK, artifact extraction, numeric parsing, gate readiness.
  • LLM scenario runners: behavioral regression, drift detection, key collection.
  • Turn-level telemetry and structured TurnDetail make prompt and policy iteration measurable.
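
The shape of a deterministic scenario check can be sketched as below. Everything here is hypothetical: the TurnDetail fields, runScenario, and advanceTurns are illustrative names, and the toy orchestrator is a stand-in used only to keep the example self-contained, whereas the harness described above drives the real orchestrator.

```typescript
// Hypothetical sketch: TurnDetail fields and helper names are assumptions.
interface TurnDetail {
  turn: number;
  activeBucketId: string;
  advanced: boolean;
  gateFailures: string[];
}

type ProcessTurn = (candidateMessage: string) => TurnDetail;

// Feed a scripted candidate into the orchestrator and collect per-turn telemetry.
function runScenario(script: string[], processTurn: ProcessTurn): TurnDetail[] {
  return script.map((message) => processTurn(message));
}

// Toy stand-in: leaves identify_problem once the candidate states an objective.
function makeToyOrchestrator(): ProcessTurn {
  let turn = 0;
  let bucket = "identify_problem";
  return (message) => {
    turn += 1;
    const hasObjective = /objective/i.test(message);
    const advanced = bucket === "identify_problem" && hasObjective;
    const gateFailures =
      bucket === "identify_problem" && !hasObjective ? ["missing objective"] : [];
    if (advanced) bucket = "frame_solution";
    return { turn, activeBucketId: bucket, advanced, gateFailures };
  };
}

// Assertion helper: the turn numbers on which an ADVANCE occurred.
function advanceTurns(details: TurnDetail[]): number[] {
  return details.filter((d) => d.advanced).map((d) => d.turn);
}
```

A deterministic test then asserts on the collected telemetry, for example that the only advance in a three-turn script happens on the turn where the objective is stated.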

The voice path shares the same orchestrator contract over HTTP, so text and voice are surfaces on one engine.

05 · Product decisions

Separate orchestration from generation.

The interviewer sounds human; the orchestrator is paranoid. Mixing them creates confident-sounding premature advances.

Advancement is evidence-based.

Phase transitions require artifact readiness, action completion, and turn caps. The judge rescues; it does not govern.

Acceptance is dyadic.

Artifacts are credited only when the interviewer validates them in the same turn, which prevents crediting structure the coach never accepted.

Compile multi-turn for display, extract single-turn for gating.

Different consumers, different correctness requirements.

Declarative cases over prompt forks.

New cases ship as JSON and DB content, so engineering effort scales with engine features rather than with the number of cases.

Deterministic-first extraction.

Ontology mapping runs before the LLM: it is cheaper, faster, and auditable, which is the kind of NLU discipline production needs.

Rescue-only judge behind flags.

Higher-autonomy behaviors roll out safely behind feature flags (ADVANCEMENT_JUDGE_ENABLED, ADVANCEMENT_PROPOSALS_ENABLED).

Interviewer-first turn ordering.

The interviewer generates first; action extraction then runs on the full turn including coach acceptance, so advancement ties to interviewer behavior rather than pre-response guesswork.

06 · Technical approach

One interview turn, end to end:

candidate message
  → orchestrator
  → prompt builder (4 abstractions + phase rules)
  → interviewer LLM → response
  → action extractor (per-action perception)
  → artifact merge
  → semantic gate → advance?
      ── ambiguous → advancement judge (rescue-only)
  → phase summary (lock artifacts)
  → structured response + telemetry
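
The ordering in this flow can be sketched with injected steps. The step names and signatures are assumptions, and the real steps would be asynchronous LLM or policy calls; synchronous stubs are used here only so the order of operations is easy to follow.

```typescript
// Hypothetical sketch: step names and signatures are assumptions, and the real
// steps would be async; synchronous functions keep the ordering easy to see.
type JudgeVerdict = "advance" | "propose" | "hold";

interface TurnSteps {
  generateReply: (candidate: string) => string;
  extractActions: (candidate: string, reply: string) => string[];
  gatePasses: (completedActions: string[]) => boolean;
  judge: (completedActions: string[]) => JudgeVerdict;
}

interface TurnResult {
  reply: string;
  completedActions: string[];
  advanced: boolean;
  judgeVerdict?: JudgeVerdict; // present only when the judge was consulted
}

function processTurn(candidate: string, steps: TurnSteps): TurnResult {
  // 1. Interviewer-first: generate the reply before any extraction.
  const reply = steps.generateReply(candidate);
  // 2. Perception runs on the full turn, so it can see coach acceptance.
  const completedActions = steps.extractActions(candidate, reply);
  // 3. The deterministic gate decides first.
  if (steps.gatePasses(completedActions)) {
    return { reply, completedActions, advanced: true };
  }
  // 4. Rescue-only: the judge is consulted only after a gate failure.
  const judgeVerdict = steps.judge(completedActions);
  return {
    reply,
    completedActions,
    advanced: judgeVerdict === "advance",
    judgeVerdict,
  };
}
```

Returning early on a passing gate means the judge call is skipped entirely in the common case, which keeps the LLM adjudication off the hot path.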

RunState (client + server)

Messages (ring buffer, cap 50)
activeBucketId, actionStatus, driftControlArtifacts
lockedArtifactKeys, phaseSummaries, advanceProposal
consecutiveTurnsPreviousBucketOnly, consecutiveNextBucketEvidenceTurns
analysisExtractionEpoch (stale async write rejection)
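
Two of these fields lend themselves to a short sketch: the capped message buffer and the epoch used to reject stale asynchronous writes. The function names and the exact epoch protocol (bump on start, compare on apply) are assumptions about how such a mechanism could work, not a description of the production code.

```typescript
// Hypothetical sketch: function names and the epoch protocol are assumptions.
const MESSAGE_CAP = 50; // cap stated in the RunState listing above

interface Message {
  role: "candidate" | "interviewer";
  text: string;
}

// Append a message and drop the oldest entries once the cap is exceeded.
function pushMessage(
  messages: Message[],
  next: Message,
  cap: number = MESSAGE_CAP,
): Message[] {
  const appended = [...messages, next];
  return appended.length > cap ? appended.slice(appended.length - cap) : appended;
}

interface ExtractionState {
  analysisExtractionEpoch: number;
  slots: Record<string, number>;
}

// Starting a new extraction bumps the epoch; the async job remembers its own epoch.
function beginExtraction(
  state: ExtractionState,
): { state: ExtractionState; epoch: number } {
  const epoch = state.analysisExtractionEpoch + 1;
  return { state: { ...state, analysisExtractionEpoch: epoch }, epoch };
}

// Apply an async result only if no newer extraction has started since it began.
function applyExtraction(
  state: ExtractionState,
  epoch: number,
  slots: Record<string, number>,
): ExtractionState {
  if (epoch !== state.analysisExtractionEpoch) return state; // stale write, drop it
  return { ...state, slots: { ...state.slots, ...slots } };
}
```

With this protocol, a slow extraction that finishes after a newer one has started cannot overwrite fresher slot values, whatever order the results arrive in.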

Engineering properties

  • Pure policy functions in advancementPolicy.ts, testable without the API
  • Gate diagnostics on every turn, production-debuggable
  • Phase summaries plus a rolling message window (last 6 for the model) for bounded context
  • Exhibit eager-resolve and data-claim reconciliation, keeping spoken claims consistent with shown exhibits
  • API: POST /api/orchestrator/process (turn loop) and /opening; a Zustand store applies structured updates client-side
07 · Outcome

A production-grade conversational control system, running live in the product:

  • Coaches author full interview flows without engineering per case
  • Turn-level advancement is explainable: gate failures, missing artifacts, judge rationale
  • Automated regression catches drift in policy and language
  • The analysis phase supports fine-grained state tracking validated against case calculations
  • One orchestrator powers the text interview room, the voice agent, and the User Agent QA harness

You can try the live interviewer at myconsultingcoach.com/practice.