Build a Real-Time Voice Interview Coach with TypeScript and LiveKit
Do you struggle with interviews? You're not alone! You can have the best interview notes in the world, but when the actual spoken conversation starts, you might end up like a deer in the headlights, freezing and forgetting everything you prepared for.
There's good news though!
With modern AI tools like LiveKit, you can have a voice conversation with an AI agent that mimics a real interview experience. Imagine uploading a job description and your resume and being immediately paired with an expert (the agent) who asks you realistic questions for the job and gives feedback on how you answer and present yourself.
In this tutorial we'll use LiveKit and TypeScript, paired with various LLMs and Apache Tika, to build a very realistic interview coaching experience.
The Prerequisites
To be successful with this tutorial and build a great interview coach, you'll need the following:
- Node.js 24+
- A LiveKit account
- Apache Tika (optional)
I built and tested this tutorial with Node.js 24 and it works great. Earlier versions may work, but I haven't verified a minimum, so Node.js 24 or later is the safe choice.
LiveKit will be doing all of the orchestration between our agents and our frontend. You're going to need an account, but the good news is that the free tier should work fine for this project, at least at the scale that we're operating on.
Want to elevate the project and the interview experience? Consider installing Apache Tika or a similar tool that can extract text from PDFs and other popular document formats. We can fall back to Markdown in this project, but how often do you find yourself submitting your resume as Markdown? Probably not frequently, but the good news is Apache Tika is super easy to set up and use.
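If you want to try it, the quickest route is usually Docker. Assuming you have Docker installed, the official Apache Tika server image listens on port 9998 by default and can be started with a single command:
docker run -d --rm -p 9998:9998 apache/tika:latest
We'll point the backend at that port later with an environment variable.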
The Project Architecture
Before we jump into the actual development of our interview coach, it is best that we take a step back to understand each of the components that will go into our project.
The interview coach will consist of the following core components:
- A web service to manage file uploads and application memory.
- An agent that will communicate with our web service and the LiveKit service.
- A frontend that will communicate with our web service and the LiveKit service.
So what are we attempting to accomplish here and what aren't we trying to accomplish?
We could skip the web service component and just have our agents and our frontend. While we could come up with a bulletproof prompt for our agent to conduct interviews, the number of possible jobs is huge, so the interview experience would likely be shallow. Instead, the goal of the web service is to accept resume and job description uploads. This data is then consumed by a dispatched agent to make the coaching experience more authentic and tailored to the job and the candidate's experience.
What's interesting is that the frontend never actually communicates with a dispatched agent. Both communicate with the web service, but LiveKit does the orchestration between agent and frontend.
With all this in mind, we're going to have the following root directory structure:
- backend
- frontend
The backend will have an agent directory and a server directory while the frontend will just have our TypeScript and HTML. Just remember, when it comes to the backend we will have a separation of services.
Manage Resumes and Job Descriptions with an Express Web Server
We're going to start with the web service for managing our context material. The following will represent our project structure:
backend/
├── .env
├── package.json
├── tsconfig.json
└── src/
├── shared/
│ └── types.ts
└── server/
├── index.ts
├── routes/
│ └── session.ts
├── services/
│ └── tika.ts
└── store/
└── sessionStore.ts
We'll kick things off with installing the dependencies in our package.json file. This can be done with the following commands:
pnpm add @livekit/agents @livekit/agents-plugin-deepgram @livekit/agents-plugin-openai @livekit/agents-plugin-livekit @livekit/agents-plugin-silero @livekit/protocol cors dotenv express livekit-server-sdk multer zod
pnpm add @types/cors @types/express @types/multer @types/node concurrently tsx typescript --save-dev
Rather than breaking down each of these dependencies right now, we'll talk about them as we encounter them in the tutorial. If you're not using PNPM in your development setup, go ahead and make the necessary changes to use NPM instead.
We need to add some fairly standard TypeScript configuration to the project. Add the following to the tsconfig.json file:
{
"compilerOptions": {
"target": "ES2022",
"module": "ESNext",
"moduleResolution": "Bundler",
"lib": ["ES2022"],
"outDir": "dist",
"rootDir": "src",
"strict": true,
"noUncheckedIndexedAccess": true,
"esModuleInterop": true,
"skipLibCheck": true,
"resolveJsonModule": true,
"forceConsistentCasingInFileNames": true,
"declaration": false,
"sourceMap": true
},
"include": ["src/**/*.ts"]
}
Next, let's get our custom type definitions created. Add the following to the shared/types.ts file:
export interface SessionContext {
jobDescription: string;
resume: string;
createdAt: Date;
}
export interface CreateSessionResponse {
sessionId: string;
token: string;
roomName: string;
serverUrl: string;
}
export interface JobMetadata {
sessionId: string;
}
export const SUMMARY_STREAM_TOPIC = 'interview.summary';
export const STATUS_STREAM_TOPIC = 'interview.status';
It is important to note that this type definition file will be shared between the agent project and the web service project. Remember, the agents will communicate with the web service over HTTP, so it's probably a good idea to share these definitions rather than duplicate them between projects. Ignore the SUMMARY_STREAM_TOPIC and STATUS_STREAM_TOPIC constants for now as they are more relevant to the agent, not the web service.
With a lot of the setup out of the way, we can focus on the fun stuff!
To reduce the number of moving pieces as much as possible, we're not going to use a database in this tutorial. In production you'd probably want to use MongoDB, Redis, or something similar. For this example we're just going to store everything in memory.
Open the project's server/store/sessionStore.ts file and include the following:
import type { SessionContext } from '../../shared/types.js';
export interface SessionStore {
create(id: string, ctx: Omit<SessionContext, 'createdAt'>): void;
get(id: string): SessionContext | undefined;
list(): Array<{ id: string } & SessionContext>;
delete(id: string): boolean;
}
const SESSION_TTL_MS = 30 * 60 * 1000; // 30 minutes
const SWEEP_INTERVAL_MS = 60 * 60 * 1000; // 1 hour
class InMemorySessionStore implements SessionStore {
private sessions = new Map<string, SessionContext>();
private sweepTimer: ReturnType<typeof setInterval>;
constructor() {
console.warn('[store] using in-memory session store — data lost on restart');
this.sweepTimer = setInterval(() => this.sweep(), SWEEP_INTERVAL_MS);
if (this.sweepTimer.unref) this.sweepTimer.unref();
}
create(id: string, ctx: Omit<SessionContext, 'createdAt'>): void {
this.sessions.set(id, { ...ctx, createdAt: new Date() });
}
get(id: string): SessionContext | undefined {
return this.sessions.get(id);
}
list(): Array<{ id: string } & SessionContext> {
return Array.from(this.sessions.entries()).map(([id, ctx]) => ({ id, ...ctx }));
}
delete(id: string): boolean {
return this.sessions.delete(id);
}
private sweep(): void {
const cutoff = Date.now() - SESSION_TTL_MS;
for (const [id, ctx] of this.sessions) {
if (ctx.createdAt.getTime() < cutoff) {
this.sessions.delete(id);
}
}
}
}
export const sessionStore: SessionStore = new InMemorySessionStore();
The above code is a very basic key-value store with TTL functionality. For every key (id) we store a SessionContext, which includes the plaintext job description, a plaintext resume, and a timestamp for when the entry was created. The sweep function runs on a timer; each time it runs, every entry's timestamp is compared against the defined expiration duration, and expired entries are released from memory. While not strictly necessary for a demo, it doesn't hurt to have it.
I should reiterate once more that this in-memory solution works fine for development, but don't use it in production. It isn't persistent, data is lost on every restart, and memory can deplete quickly under real load. Use a database and you'll be better off.
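If you do want persistence, the swap is mostly mechanical, although the store methods would need to become async since any real datastore call is. As a rough sketch only, assuming the ioredis package is installed and a Redis instance is running, a Redis-backed create/get pair might look like this:
import Redis from 'ioredis';
import type { SessionContext } from '../../shared/types.js';
// Hypothetical Redis-backed variant of the session store. Unlike the
// synchronous in-memory store above, these functions are async.
const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');
const SESSION_TTL_SECONDS = 30 * 60;
export async function createSession(
  id: string,
  ctx: Omit<SessionContext, 'createdAt'>,
): Promise<void> {
  // The EX option gives us the same 30-minute expiration the sweep timer provides.
  await redis.set(
    `session:${id}`,
    JSON.stringify({ ...ctx, createdAt: new Date().toISOString() }),
    'EX',
    SESSION_TTL_SECONDS,
  );
}
export async function getSession(id: string): Promise<SessionContext | undefined> {
  const raw = await redis.get(`session:${id}`);
  if (!raw) return undefined;
  const parsed = JSON.parse(raw) as Omit<SessionContext, 'createdAt'> & { createdAt: string };
  return { ...parsed, createdAt: new Date(parsed.createdAt) };
}
Either way, the rest of this tutorial sticks with the in-memory store.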
With the session store in place, let's jump over to the Apache Tika component if you've chosen to make use of it.
Open the project's server/services/tika.ts file and include the following:
const TIKA_URL = process.env.TIKA_URL ?? 'http://localhost:9998';
function sanitizeFilename(name: string): string {
return name.replace(/[^a-zA-Z0-9._-]/g, '_');
}
export async function extractText(
buffer: Buffer,
filename: string,
): Promise<string> {
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 30_000);
try {
const res = await fetch(`${TIKA_URL}/tika`, {
method: 'PUT',
headers: {
'Content-Disposition': `attachment; filename="${sanitizeFilename(filename)}"`,
Accept: 'text/plain',
},
body: buffer,
signal: controller.signal,
});
if (!res.ok) {
throw new Error(
`Tika extraction failed for "${filename}": ${res.status} ${await res.text()}`,
);
}
return (await res.text()).trim();
} finally {
clearTimeout(timeout);
}
}
The beauty of Apache Tika is that, once it's running, you can use it as a web service. With the extractText function, we take the uploaded PDF (or similar document) and send it to Apache Tika with an HTTP request. You could use axios if you wanted, but we're using fetch here. The response we get from Apache Tika is a plaintext variant of what we sent.
Fun fact: if you send your resume to Apache Tika and you can't make sense of the response, it's likely that applicant tracking systems such as Greenhouse can't make sense of it either, which could explain why you're not receiving interviews.
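You can check that output for yourself. Assuming the extractText function above and a resume.pdf sitting in the working directory (both placeholders here), a throwaway script is enough to see exactly what Tika extracts:
import { readFile } from 'node:fs/promises';
import { extractText } from './src/server/services/tika.js';
// Throwaway check: send a local PDF through the running Tika server and
// print the plaintext it extracts.
const buffer = await readFile('./resume.pdf');
console.log(await extractText(buffer, 'resume.pdf'));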
If you can believe it, we're almost done with the web service.
Open the project's server/routes/session.ts file and include the following API endpoints:
import { Router, type Request, type Response } from 'express';
import { randomUUID } from 'node:crypto';
import multer from 'multer';
import { AccessToken } from 'livekit-server-sdk';
import { RoomAgentDispatch, RoomConfiguration } from '@livekit/protocol';
import { sessionStore } from '../store/sessionStore.js';
import { extractText } from '../services/tika.js';
import type { CreateSessionResponse } from '../../shared/types.js';
const router = Router();
const upload = multer({
storage: multer.memoryStorage(),
limits: { fileSize: 5 * 1024 * 1024 }, // 5 MB
});
const ALLOWED_MIME_TYPES = new Set([
'text/plain',
'text/markdown',
'application/pdf',
'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
]);
const ALLOWED_EXTENSIONS = new Set(['.txt', '.md', '.pdf', '.docx']);
const {
LIVEKIT_API_KEY,
LIVEKIT_API_SECRET,
LIVEKIT_URL,
AGENT_NAME = 'interview-coach',
} = process.env;
if (!LIVEKIT_API_KEY || !LIVEKIT_API_SECRET || !LIVEKIT_URL) {
throw new Error('LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET must be set');
}
async function resolveField(
textField: string | undefined,
file: Express.Multer.File | undefined,
): Promise<string | undefined> {
if (file) {
return extractText(file.buffer, file.originalname);
}
return textField?.trim() || undefined;
}
router.post(
'/session',
upload.fields([
{ name: 'jobDescriptionFile', maxCount: 1 },
{ name: 'resumeFile', maxCount: 1 },
]),
async (req: Request, res: Response) => {
const files = req.files as
| { [fieldname: string]: Express.Multer.File[] }
| undefined;
const jobDescriptionText = req.body.jobDescription as string | undefined;
const resumeText = req.body.resume as string | undefined;
const jobDescriptionFile = files?.jobDescriptionFile?.[0];
const resumeFile = files?.resumeFile?.[0];
for (const file of [jobDescriptionFile, resumeFile]) {
if (!file) continue;
const ext = '.' + file.originalname.split('.').pop()?.toLowerCase();
if (!ALLOWED_EXTENSIONS.has(ext) && !ALLOWED_MIME_TYPES.has(file.mimetype)) {
return res.status(400).json({
error: `Unsupported file type: ${file.originalname}. Accepted: .txt, .md, .pdf, .docx`,
});
}
}
const [jobDescription, resume] = await Promise.all([
resolveField(jobDescriptionText, jobDescriptionFile),
resolveField(resumeText, resumeFile),
]);
if (!jobDescription) {
return res.status(400).json({
error: 'jobDescription is required (text or file upload)',
});
}
if (!resume) {
return res.status(400).json({
error: 'resume is required (text or file upload)',
});
}
const sessionId = randomUUID();
const roomName = `interview-${sessionId}`;
const participantIdentity = `candidate-${sessionId}`;
sessionStore.create(sessionId, { jobDescription, resume });
const at = new AccessToken(LIVEKIT_API_KEY!, LIVEKIT_API_SECRET!, {
identity: participantIdentity,
ttl: '30m',
});
at.addGrant({
room: roomName,
roomJoin: true,
canPublish: true,
canSubscribe: true,
canPublishData: true,
});
at.roomConfig = new RoomConfiguration({
agents: [
new RoomAgentDispatch({
agentName: AGENT_NAME,
metadata: JSON.stringify({ sessionId }),
}),
],
});
const token = await at.toJwt();
const body: CreateSessionResponse = {
sessionId,
token,
roomName,
serverUrl: LIVEKIT_URL!,
};
return res.status(201).json(body);
},
);
router.get('/session', (_req: Request, res: Response) => {
return res.json(sessionStore.list());
});
router.get('/session/:id', (req: Request, res: Response) => {
const { id } = req.params;
if (!id) return res.status(400).json({ error: 'missing session id' });
const ctx = sessionStore.get(id);
if (!ctx) return res.status(404).json({ error: 'session not found' });
return res.json({
jobDescription: ctx.jobDescription,
resume: ctx.resume,
createdAt: ctx.createdAt,
});
});
router.delete('/session/:id', (req: Request, res: Response) => {
const { id } = req.params;
if (!id) return res.status(400).json({ error: 'missing session id' });
const deleted = sessionStore.delete(id);
return res.status(deleted ? 204 : 404).end();
});
export default router;
The above file actually contains quite a bit. We're going to break it down one endpoint at a time to make sense of it.
const upload = multer({
storage: multer.memoryStorage(),
limits: { fileSize: 5 * 1024 * 1024 }, // 5 MB
});
const ALLOWED_MIME_TYPES = new Set([
'text/plain',
'text/markdown',
'application/pdf',
'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
]);
const ALLOWED_EXTENSIONS = new Set(['.txt', '.md', '.pdf', '.docx']);
We're not shooting for production ready, but we still want some basic validation in place. We're going to limit our file uploads (assuming we're using Apache Tika) to 5MB in size. We also want to make sure we restrict the file types and file extensions.
To accommodate plaintext and document format scenarios, we want to route whatever was provided:
async function resolveField(
textField: string | undefined,
file: Express.Multer.File | undefined,
): Promise<string | undefined> {
if (file) {
return extractText(file.buffer, file.originalname);
}
return textField?.trim() || undefined;
}
If a file was provided, that will take priority and we'll return the plaintext response. Otherwise, we'll just return whatever plaintext or Markdown was provided in an input field.
Most of our logic will be in the POST endpoint:
router.post(
'/session',
upload.fields([
{ name: 'jobDescriptionFile', maxCount: 1 },
{ name: 'resumeFile', maxCount: 1 },
]),
async (req: Request, res: Response) => {
const files = req.files as
| { [fieldname: string]: Express.Multer.File[] }
| undefined;
const jobDescriptionText = req.body.jobDescription as string | undefined;
const resumeText = req.body.resume as string | undefined;
const jobDescriptionFile = files?.jobDescriptionFile?.[0];
const resumeFile = files?.resumeFile?.[0];
for (const file of [jobDescriptionFile, resumeFile]) {
if (!file) continue;
const ext = '.' + file.originalname.split('.').pop()?.toLowerCase();
if (!ALLOWED_EXTENSIONS.has(ext) && !ALLOWED_MIME_TYPES.has(file.mimetype)) {
return res.status(400).json({
error: `Unsupported file type: ${file.originalname}. Accepted: .txt, .md, .pdf, .docx`,
});
}
}
const [jobDescription, resume] = await Promise.all([
resolveField(jobDescriptionText, jobDescriptionFile),
resolveField(resumeText, resumeFile),
]);
if (!jobDescription) {
return res.status(400).json({
error: 'jobDescription is required (text or file upload)',
});
}
if (!resume) {
return res.status(400).json({
error: 'resume is required (text or file upload)',
});
}
const sessionId = randomUUID();
const roomName = `interview-${sessionId}`;
const participantIdentity = `candidate-${sessionId}`;
sessionStore.create(sessionId, { jobDescription, resume });
const at = new AccessToken(LIVEKIT_API_KEY!, LIVEKIT_API_SECRET!, {
identity: participantIdentity,
ttl: '30m',
});
at.addGrant({
room: roomName,
roomJoin: true,
canPublish: true,
canSubscribe: true,
canPublishData: true,
});
at.roomConfig = new RoomConfiguration({
agents: [
new RoomAgentDispatch({
agentName: AGENT_NAME,
metadata: JSON.stringify({ sessionId }),
}),
],
});
const token = await at.toJwt();
const body: CreateSessionResponse = {
sessionId,
token,
roomName,
serverUrl: LIVEKIT_URL!,
};
return res.status(201).json(body);
},
);
In the above endpoint we start by reading the payload, validating it, and passing it to our resolveField function, which gives us the plaintext we'll use as context for the agent.
With the plaintext variants of our user-provided data, we can save it to our session store with a unique identification key. This is where we start to include LiveKit into the discussion.
const at = new AccessToken(LIVEKIT_API_KEY!, LIVEKIT_API_SECRET!, {
identity: participantIdentity,
ttl: '30m',
});
at.addGrant({
room: roomName,
roomJoin: true,
canPublish: true,
canSubscribe: true,
canPublishData: true,
});
at.roomConfig = new RoomConfiguration({
agents: [
new RoomAgentDispatch({
agentName: AGENT_NAME,
metadata: JSON.stringify({ sessionId }),
}),
],
});
const token = await at.toJwt();
We need to create an access token for the candidate (interviewee). We are only using the SDK here to create our token. We are not communicating with an agent and we are not communicating with LiveKit. The big thing here is the RoomAgentDispatch object. We are adding the session id to it, which will later allow a dispatched agent to query our web service for the context data.
So if we're not communicating with LiveKit or an agent from this endpoint, how is this going to work?
The frontend will call this POST endpoint and the response which includes the access token will be sent from the frontend to LiveKit. Remember, the frontend doesn't communicate directly with the agents. The access token is signed with our secret keys, so we don't have to worry about any shenanigans down the road.
Remember, at some point a dispatched agent will know about the session id. That brings us to the following endpoint:
router.get('/session/:id', (req: Request, res: Response) => {
const { id } = req.params;
if (!id) return res.status(400).json({ error: 'missing session id' });
const ctx = sessionStore.get(id);
if (!ctx) return res.status(404).json({ error: 'session not found' });
return res.json({
jobDescription: ctx.jobDescription,
resume: ctx.resume,
createdAt: ctx.createdAt,
});
});
The agent will make a request to our web service with the session id to get the job description and resume information. We don't embed that content directly in the RoomAgentDispatch metadata (and therefore in the access token) because it could produce a very large token and lead to size or performance issues. It's better practice to have the agent request the extra information itself.
When the coaching session ends, the agent will call the following endpoint:
router.delete('/session/:id', (req: Request, res: Response) => {
const { id } = req.params;
if (!id) return res.status(400).json({ error: 'missing session id' });
const deleted = sessionStore.delete(id);
return res.status(deleted ? 204 : 404).end();
});
The above endpoint will remove the session information from memory. If you wanted, you could just let the sweep function in our in-memory datastore take care of it, but it doesn't hurt to have an endpoint as well.
We're almost done! We just need to configure Express.
Open the project's server/index.ts file and include the following TypeScript code:
import 'dotenv/config';
import express from 'express';
import cors from 'cors';
import sessionRouter from './routes/session.js';
const app = express();
app.use(cors());
app.use(express.json({ limit: '1mb' }));
app.get('/health', (_req, res) => res.json({ ok: true }));
app.get('/capabilities', (_req, res) => {
res.json({
fileUpload: !!process.env.TIKA_URL,
});
});
app.use('/', sessionRouter);
const port = Number(process.env.PORT ?? 3000);
app.listen(port, () => {
console.log(`[server] listening on http://localhost:${port}`);
});
For the sake of this tutorial, we're not going to lock down cross-origin resource sharing (CORS). We allow requests from all origins, which covers our frontend since it will be served from a different port.
We have two additional endpoints added to this file. The first is for health checking, something useful if you wanted to package with Docker. The second is a capabilities endpoint that we can use to tell our frontend that we do or don't have Apache Tika available. This is further determined by an environment variable that we'll set in a moment.
When we serve our backend web service we are doing it on port 3000 by default.
You've seen quite a few environment variables that we've skipped over so far. You can set them in the .env file of the project:
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=
LIVEKIT_API_SECRET=
OPENAI_API_KEY=
DEEPGRAM_API_KEY=
PORT=3000
INTERNAL_API_URL=http://localhost:3000
AGENT_NAME=interview-coach
TIKA_URL=http://localhost:9998
This .env file will be shared by the dispatched agents and the web service. If TIKA_URL is set, the capabilities endpoint will report file uploads as available; otherwise it won't. This matters for our frontend later. Take a moment to sign into your LiveKit account to populate the other environment variables, which will be used for access token creation and by our agents.
Make sure AGENT_NAME is set. Both the web service and the agent worker read it from this .env file, and both processes need to agree on the same value. The web service uses it when it embeds the dispatch on the token, and the agent worker uses it when it registers itself with LiveKit. If the values diverge, dispatch will silently fail and the agent will never join the room.
You'll also notice we're listing OPENAI_API_KEY and DEEPGRAM_API_KEY. The shorthand provider strings used later in the agent (openai/gpt-4o, deepgram/nova-3, cartesia/...) route through LiveKit Inference, which means LiveKit handles provider authentication for you. Whether you actually need to set those provider keys depends on your LiveKit project configuration, so check the current LiveKit docs for whether your account is using Inference or your own provider credentials.
Believe it or not, the web service for managing our job descriptions and resumes is complete. We can run the service with the following command:
tsx watch src/server/index.ts
For convenience, you probably want to add the above as a script in your package.json file.
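Once the server is running, you can sanity-check the session endpoint before any frontend exists. Node's built-in fetch and FormData are enough; the field values below are just placeholder text and the script name is arbitrary:
// Quick manual test of POST /session using the plaintext fields (no Tika needed).
// Run with something like: tsx check-session.ts
const form = new FormData();
form.append('jobDescription', 'Senior TypeScript engineer building real-time voice applications.');
form.append('resume', 'Ten years of Node.js, WebRTC, and developer tooling experience.');
const res = await fetch('http://localhost:3000/session', { method: 'POST', body: form });
console.log(res.status, await res.json());
A 201 response containing a token and room name means the store, validation, and LiveKit token signing are all wired up.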
Delegate Dispatched Agents with the LiveKit SDK for Node.js Backends
The web service we just created does not handle any AI or voice interactions. This is where our agent development comes in, and believe it or not it might be simpler than what we had just created.
The following represents the project structure for our agents:
backend/
└── src/
└── agent/
├── coach.ts
├── index.ts
└── prompt.ts
Remember, this lives in the same backend project as our web service, even though the two will run as separate processes.
The dispatched agent has a few jobs:
- It will maintain a conversation history.
- It will generate a summary of the conversation.
- It will interact with LiveKit and the web service.
Let's start with the file that really doesn't have any application logic. Open the project's agent/prompt.ts file and include the following:
export function buildCoachInstructions(
jobDescription: string,
resume: string,
): string {
return `You are an expert interview coach conducting a realistic mock interview.
Your goal is to help the candidate practice for a specific role and give them useful, actionable feedback.
## Role being interviewed for
${jobDescription}
## Candidate resume
${resume}
## How to run the interview
- Open with a brief, warm introduction. Confirm the role you are practicing and set expectations: a mix of behavioral and role-specific technical questions, one at a time.
- Ask ONE question at a time and wait for the candidate to finish before responding. Never stack multiple questions.
- Draw technical questions from the requirements in the job description. Draw behavioral and follow-up questions from specific details in the resume. If the job description is sparse or lacks specific requirements, supplement your questions with your own knowledge of what a typical interview for this type of role would cover.
- After each answer, give brief, encouraging, concrete feedback (one or two sentences). If the answer is shallow or vague, ask a follow-up to probe for more depth before moving on. Only transition to the next question once the candidate has had a real opportunity to elaborate.
- Keep a mental running list of strengths, gaps, and noteworthy answers to inform the written summary at the end.
- Keep your speaking turns short and conversational — this is a voice conversation, not an essay.
## Ending the interview
When the candidate signals they want to stop (phrases like "end the interview", "that's all", "I'm done", "stop here", "wrap up"), OR when you have covered enough ground for a complete session, call the \`endInterview\` tool immediately. Do not verbally acknowledge the request first — call the tool directly without any prior closing statement. The tool handles the closing and will emit the written summary to the candidate and disconnect. Do not end the interview any other way.
## Tone
Supportive, professional, and direct. You are a coach, not a judge. Celebrate strong answers and reframe weak ones as opportunities to improve.`;
}
export function buildSummaryInstructions(): string {
return `The interview is now complete. Based on the full conversation so far, produce a structured written summary for the candidate in Markdown. Use exactly these sections and headings:
# Interview Feedback
## Overall impression
A short paragraph capturing how the candidate performed overall.
## Strengths
A bulleted list of specific strengths observed, citing concrete moments from the conversation.
## Areas for improvement
A bulleted list of specific areas to work on, each paired with a concrete suggestion.
## Question-by-question notes
For each question you asked, one short bullet: the question (paraphrased) followed by a one-line note on the answer.
## Recommended next steps
A short bulleted list of concrete preparation steps tailored to this role and this candidate.
Return ONLY the Markdown. Do not include any preamble, apology, or commentary outside the sections above.`;
}
As you can probably guess from the file name and from eyeballing the content, this file contains nothing but prompts, two of them to be exact.
The buildCoachInstructions function returns a prompt with our job description and resume injected into it. It provides instructions to our agent on how to conduct the interview. The instructions are important, but the real win here is the injection of our user-uploaded material. This will make the interaction with the voice agent that much better.
The buildSummaryInstructions function returns a prompt with instructions on what to do when the conversation ends. It defines how the summary should be presented to the user (interviewee). The gist of it here is that the user will be provided with feedback on how to improve based on the contents of the transcript.
Let's jump into the agent/coach.ts file:
import { voice, llm, inference, type JobContext } from '@livekit/agents';
import { z } from 'zod';
import { SUMMARY_STREAM_TOPIC, STATUS_STREAM_TOPIC } from '../shared/types.js';
import { buildSummaryInstructions } from './prompt.js';
type Room = JobContext['room'];
interface CoachAgentOptions {
instructions: string;
room: Room;
sessionId: string;
onEnd: () => Promise<void> | void;
}
export class CoachAgent extends voice.Agent {
constructor(opts: CoachAgentOptions) {
super({
instructions: opts.instructions,
tools: {
endInterview: llm.tool({
description:
'Call this as soon as the candidate signals they want to end the interview (e.g. "that\'s all", "end the interview", "I\'m done", "stop here") or when the interview has naturally concluded. Do not verbally acknowledge the request before calling — call this tool directly. This tool will emit the written summary to the candidate and disconnect. Do not end the interview any other way.',
parameters: z.object({}),
execute: async (_args, { ctx: runCtx }) => {
runCtx.speechHandle.allowInterruptions = false;
await opts.room.localParticipant?.sendText('generating-summary', {
topic: STATUS_STREAM_TOPIC,
});
const summary = await generateSummary(this);
await publishSummary(opts.room, summary);
await opts.onEnd();
return 'Interview ended. Summary delivered.';
},
}),
},
});
}
}
function buildTranscript(agent: voice.Agent): string {
return agent.chatCtx.items
.filter((item): item is llm.ChatMessage => item.type === 'message')
.map(({ role, content }) => {
const text = Array.isArray(content)
? content.filter((c): c is string => typeof c === 'string').join(' ')
: typeof content === 'string'
? content
: '';
return text ? `${role}: ${text}` : null;
})
.filter((line): line is string => line !== null)
.join('\n');
}
async function generateSummary(agent: voice.Agent): Promise<string> {
const transcript = buildTranscript(agent);
const chatCtx = llm.ChatContext.empty();
chatCtx.addMessage({ role: 'system', content: buildSummaryInstructions() });
chatCtx.addMessage({
role: 'user',
content: `Here is the full interview transcript:\n\n${transcript}`,
});
const summaryLLM = new inference.LLM({
model: 'openai/gpt-4o',
provider: 'openai',
modelOptions: { temperature: 0.4 },
});
try {
const stream = summaryLLM.chat({ chatCtx });
let out = '';
for await (const chunk of stream) {
if (chunk.delta?.content) out += chunk.delta.content;
}
return out.trim() || '# Interview Feedback\n\n_Empty summary._';
} catch (err) {
console.error('[agent] summary generation failed', err);
return '# Interview Feedback\n\n_Sorry — we were unable to generate your written summary. Please try another session._';
}
}
async function publishSummary(room: Room, summary: string): Promise<void> {
try {
await room.localParticipant?.sendText(summary, {
topic: SUMMARY_STREAM_TOPIC,
});
} catch (err) {
console.error('[agent] failed to publish summary', err);
}
}
We have a lot going on here, so we're going to break it down.
We'll start by jumping straight into the buildTranscript function:
function buildTranscript(agent: voice.Agent): string {
return agent.chatCtx.items
.filter((item): item is llm.ChatMessage => item.type === 'message')
.map(({ role, content }) => {
const text = Array.isArray(content)
? content.filter((c): c is string => typeof c === 'string').join(' ')
: typeof content === 'string'
? content
: '';
return text ? `${role}: ${text}` : null;
})
.filter((line): line is string => line !== null)
.join('\n');
}
Remember, the agent will be maintaining a transcript of our conversation. The chat context can include items other than spoken messages such as tool-call records and their results. The filter on item.type === 'message' keeps just the user and assistant messages so the summary prompt sees a clean transcript instead of internal tool plumbing.
We'll see it in a moment, but the generateSummary function will make use of gpt-4o to summarize the transcript:
async function generateSummary(agent: voice.Agent): Promise<string> {
const transcript = buildTranscript(agent);
const chatCtx = llm.ChatContext.empty();
chatCtx.addMessage({ role: 'system', content: buildSummaryInstructions() });
chatCtx.addMessage({
role: 'user',
content: `Here is the full interview transcript:\n\n${transcript}`,
});
const summaryLLM = new inference.LLM({
model: 'openai/gpt-4o',
provider: 'openai',
modelOptions: { temperature: 0.4 },
});
try {
const stream = summaryLLM.chat({ chatCtx });
let out = '';
for await (const chunk of stream) {
if (chunk.delta?.content) out += chunk.delta.content;
}
return out.trim() || '# Interview Feedback\n\n_Empty summary._';
} catch (err) {
console.error('[agent] summary generation failed', err);
return '# Interview Feedback\n\n_Sorry — we were unable to generate your written summary. Please try another session._';
}
}
The summary instructions from our agent/prompt.ts file will be used in the generateSummary function to guide what happens. It's important to recognize that LiveKit is not generating this summary. The LLM we defined is generating the summary and it doesn't have to be gpt-4o if you don't want it to be.
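Swapping models is just a matter of changing the model string passed to inference.LLM. For example, you could use a lighter model for the written summary, assuming your LiveKit project has it available:
// Hypothetical alternative: a smaller model for summary generation.
const summaryLLM = new inference.LLM({
  model: 'openai/gpt-4o-mini',
  provider: 'openai',
  modelOptions: { temperature: 0.4 },
});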
Next we have the publishSummary function:
async function publishSummary(room: Room, summary: string): Promise<void> {
try {
await room.localParticipant?.sendText(summary, {
topic: SUMMARY_STREAM_TOPIC,
});
} catch (err) {
console.error('[agent] failed to publish summary', err);
}
}
The publishSummary function sends the summary as a text stream to LiveKit, and from LiveKit to the frontend. At that point the frontend is listening for it and will render it when it arrives.
So how do we wire it all up? This is where the CoachAgent class comes into play:
export class CoachAgent extends voice.Agent {
constructor(opts: CoachAgentOptions) {
super({
instructions: opts.instructions,
tools: {
endInterview: llm.tool({
description:
'Call this as soon as the candidate signals they want to end the interview (e.g. "that\'s all", "end the interview", "I\'m done", "stop here") or when the interview has naturally concluded. Do not verbally acknowledge the request before calling — call this tool directly. This tool will emit the written summary to the candidate and disconnect. Do not end the interview any other way.',
parameters: z.object({}),
execute: async (_args, { ctx: runCtx }) => {
runCtx.speechHandle.allowInterruptions = false;
await opts.room.localParticipant?.sendText('generating-summary', {
topic: STATUS_STREAM_TOPIC,
});
const summary = await generateSummary(this);
await publishSummary(opts.room, summary);
await opts.onEnd();
return 'Interview ended. Summary delivered.';
},
}),
},
});
}
}
The CoachAgent class will be used in our project's agent/index.ts file, but the foundation we are putting in place here defines what happens when the interview ends and how it ends. When the LLM determines that the interview is ending, based on the instructions, we send a text stream on STATUS_STREAM_TOPIC, a different topic than the one used in publishSummary. This particular text stream tells our frontend that the application didn't freeze; we're just taking a moment to generate the summary, so the frontend can show a spinner until the summary publishes.
We have one file to go to bring it all together!
Open the project's agent/index.ts file and include the following TypeScript code:
import 'dotenv/config';
import {
type JobContext,
type JobProcess,
ServerOptions,
cli,
defineAgent,
voice,
} from '@livekit/agents';
import * as silero from '@livekit/agents-plugin-silero';
import * as livekit from '@livekit/agents-plugin-livekit';
import { fileURLToPath } from 'node:url';
import { CoachAgent } from './coach.js';
import { buildCoachInstructions } from './prompt.js';
import type { JobMetadata, SessionContext } from '../shared/types.js';
const AGENT_NAME = process.env.AGENT_NAME ?? 'interview-coach';
const INTERNAL_API_URL = process.env.INTERNAL_API_URL ?? 'http://localhost:3000';
async function fetchSessionContext(
sessionId: string,
): Promise<Pick<SessionContext, 'jobDescription' | 'resume'>> {
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 10_000);
try {
const res = await fetch(`${INTERNAL_API_URL}/session/${sessionId}`, {
signal: controller.signal,
});
if (!res.ok) {
throw new Error(
`failed to load session ${sessionId}: ${res.status} ${await res.text()}`,
);
}
return (await res.json()) as Pick<SessionContext, 'jobDescription' | 'resume'>;
} finally {
clearTimeout(timeout);
}
}
async function deleteSessionContext(sessionId: string): Promise<void> {
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 10_000);
try {
await fetch(`${INTERNAL_API_URL}/session/${sessionId}`, {
method: 'DELETE',
signal: controller.signal,
});
} catch (err) {
console.error('[agent] failed to delete session', sessionId, err);
} finally {
clearTimeout(timeout);
}
}
export default defineAgent({
prewarm: async (proc: JobProcess) => {
proc.userData.vad = await silero.VAD.load();
},
entry: async (ctx: JobContext) => {
const vad = ctx.proc.userData.vad as silero.VAD;
let sessionId: string | undefined;
try {
const meta = ctx.job.metadata ? (JSON.parse(ctx.job.metadata) as JobMetadata) : undefined;
sessionId = meta?.sessionId;
} catch {
// ignore — validated below
}
if (!sessionId) {
console.error('[agent] missing sessionId in job metadata — aborting');
await ctx.shutdown();
return;
}
const { jobDescription, resume } = await fetchSessionContext(sessionId);
const instructions = buildCoachInstructions(jobDescription, resume);
const session = new voice.AgentSession({
vad,
stt: 'deepgram/nova-3',
llm: 'openai/gpt-4o',
tts: 'cartesia/sonic-2:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc',
turnHandling: {
turnDetection: new livekit.turnDetector.MultilingualModel(),
endpointing: {
minDelay: 1200,
maxDelay: 4000,
},
interruption: {
minWords: 2,
resumeFalseInterruption: true,
falseInterruptionTimeout: 2000,
},
},
});
let ended = false;
const endSession = async () => {
if (ended) return;
ended = true;
await deleteSessionContext(sessionId!);
ctx.shutdown();
};
ctx.addShutdownCallback(async () => {
if (!ended) await deleteSessionContext(sessionId!);
});
const agent = new CoachAgent({
instructions,
room: ctx.room,
sessionId,
onEnd: endSession,
});
await session.start({ agent, room: ctx.room });
await ctx.connect();
session.generateReply({
instructions:
'Greet the candidate warmly, confirm the role being practiced (from the job description in your system prompt), briefly explain that you will ask a mix of behavioral and role-specific questions one at a time, and ask your first question.',
});
},
});
cli.runApp(
new ServerOptions({
agent: fileURLToPath(import.meta.url),
agentName: AGENT_NAME,
}),
);
Once again, we have a lot going on and need to break it down.
Remember that GET endpoint we created in the web service for doing a session lookup by id? Here we're making use of it:
async function fetchSessionContext(
sessionId: string,
): Promise<Pick<SessionContext, 'jobDescription' | 'resume'>> {
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 10_000);
try {
const res = await fetch(`${INTERNAL_API_URL}/session/${sessionId}`, {
signal: controller.signal,
});
if (!res.ok) {
throw new Error(
`failed to load session ${sessionId}: ${res.status} ${await res.text()}`,
);
}
return (await res.json()) as Pick<SessionContext, 'jobDescription' | 'resume'>;
} finally {
clearTimeout(timeout);
}
}
By the time fetchSessionContext is called, the session id has already been extracted from the room's dispatch metadata, and it can be used to find the job description and resume for the person being interviewed.
Speaking of endpoints, remember the session cleanup endpoint? Here we're doing our cleanup:
async function deleteSessionContext(sessionId: string): Promise<void> {
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 10_000);
try {
await fetch(`${INTERNAL_API_URL}/session/${sessionId}`, {
method: 'DELETE',
signal: controller.signal,
});
} catch (err) {
console.error('[agent] failed to delete session', sessionId, err);
} finally {
clearTimeout(timeout);
}
}
We do have a sweep running on our session store in case you forget about the deleteSessionContext function or don't feel like including it. However, it is good practice to clean up on demand rather than relying on a TTL that may not fire until much later.
Next we can define the LiveKit agent worker:
export default defineAgent({
prewarm: async (proc: JobProcess) => {
proc.userData.vad = await silero.VAD.load();
},
entry: async (ctx: JobContext) => {
const vad = ctx.proc.userData.vad as silero.VAD;
// Session ID comes from the dispatch metadata set on the token.
let sessionId: string | undefined;
try {
const meta = ctx.job.metadata ? (JSON.parse(ctx.job.metadata) as JobMetadata) : undefined;
sessionId = meta?.sessionId;
} catch {
// ignore — validated below
}
if (!sessionId) {
console.error('[agent] missing sessionId in job metadata — aborting');
await ctx.shutdown();
return;
}
const { jobDescription, resume } = await fetchSessionContext(sessionId);
const instructions = buildCoachInstructions(jobDescription, resume);
const session = new voice.AgentSession({
vad,
stt: 'deepgram/nova-3',
llm: 'openai/gpt-4o',
tts: 'cartesia/sonic-2:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc',
turnHandling: {
turnDetection: new livekit.turnDetector.MultilingualModel(),
// Interview answers need more pause room than casual chat.
// 1.2 s min gives candidates time to collect thoughts mid-sentence.
endpointing: {
minDelay: 1200,
maxDelay: 4000,
},
// Require at least 2 words before treating user speech as an
// interruption, reducing false positives from filler sounds.
interruption: {
minWords: 2,
resumeFalseInterruption: true,
falseInterruptionTimeout: 2000,
},
},
});
let ended = false;
const endSession = async () => {
if (ended) return;
ended = true;
await deleteSessionContext(sessionId!);
ctx.shutdown();
};
ctx.addShutdownCallback(async () => {
if (!ended) await deleteSessionContext(sessionId!);
});
const agent = new CoachAgent({
instructions,
room: ctx.room,
sessionId,
onEnd: endSession,
});
await session.start({ agent, room: ctx.room });
await ctx.connect();
session.generateReply({
instructions:
'Greet the candidate warmly, confirm the role being practiced (from the job description in your system prompt), briefly explain that you will ask a mix of behavioral and role-specific questions one at a time, and ask your first question.',
});
},
});
The defineAgent call has two functions:
- The prewarm function runs once per worker process before agents join a room. This helps prevent any delay when an agent and interviewee conversation starts.
- The entry function runs for every dispatched room job. So if three candidates use the application to have an interview, the entry function will run three times.
So what is happening in the prewarm function? We're loading the Silero VAD model, which is a voice activity detection model. This model is useful because it will help with a natural conversation between the human and the agent. In an interview it is common for the interviewee to pause frequently, stutter, etc., and you wouldn't want the agent to assume the user is ready for a response.
The entry function extracts the session id from the room's dispatch metadata. Remember, the Express POST endpoint embeds this in the token's RoomAgentDispatch config, and LiveKit hands it to the worker as job metadata when the room is dispatched. Next, the job description and resume are fetched using the session id value. Using all our available information we can construct a voice.AgentSession with various models and MultilingualModel turn detection enabled. MultilingualModel handles turn detection itself; the endpointing and interruption blocks alongside it are tuned independently for interview cadence, with longer minimum pauses so candidates can collect their thoughts, and a 2-word minimum before treating speech as an interruption to avoid false positives from filler sounds.
We are also defining cleanup in the entry function for when the session ends and we are finally starting the agent session. Starting the agent session connects the worker to the room and triggers a greeting from the interview coach.
The agent needs to be run when we start our application, just like starting Express to listen for connections. This can be done through the following:
cli.runApp(
new ServerOptions({
agent: fileURLToPath(import.meta.url),
agentName: AGENT_NAME,
}),
);
To run the agent, we execute the following:
tsx src/agent/index.ts dev
Just like with Express, you'd probably want to add a script to your package.json file.
There is one thing to note. The Silero VAD plugin and the MultilingualModel turn detector both rely on local model assets that ship with their plugins. We need to download those assets before starting the agent for the first time. The STT, LLM, and TTS providers we configured (deepgram/nova-3, openai/gpt-4o, cartesia/...) are remote inference and require no local download. We can fetch the local assets by executing the following:
tsx src/agent/index.ts download-files
The above only needs to be run once, unless you need model updates.
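For reference, the backend's package.json scripts might look something like this at this point (the script names are just suggestions):
"scripts": {
  "dev:server": "tsx watch src/server/index.ts",
  "dev:agent": "tsx src/agent/index.ts dev",
  "download-files": "tsx src/agent/index.ts download-files"
}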
Since the web service and the agent worker are two separate processes that have to run side-by-side, it's worth wiring up a single command to start them both. The concurrently dev dependency we installed earlier is exactly for that. A script like the following in package.json will run both with labelled output:
"dev": "concurrently -n server,agent -c blue,magenta \"tsx watch src/server/index.ts\" \"tsx src/agent/index.ts dev\""
Believe it or not, most of the difficult work is over. When we start working on the frontend, the client SDK does most of the work.
Interact with the Web Server and Voice Agents with the LiveKit Client SDK
As previously mentioned, the frontend will leverage the LiveKit Client SDK to communicate with LiveKit, which relays audio and data between the browser and the dispatched agent. The SDK will handle all of the heavy lifting such as the user's microphone, sending and receiving audio, etc.
The project structure for the frontend will look like the following:
frontend/
├── .env
├── index.html
├── package.json
├── tsconfig.json
├── vite.config.ts
└── src/
├── components/
│ ├── app-shell.ts
│ ├── live-session-view.ts
│ ├── post-session-view.ts
│ └── pre-session-view.ts
├── lib/
│ └── session-client.ts
├── main.ts
├── styles.css
└── vite-env.d.ts
We're trying to keep things as simple as possible, so instead of React or a heavier framework we're using lightweight Lit web components. In a production environment, React is probably the way to go.
The important code in this project exists in the src/components and lib directories. The high-level idea here is that the src/lib/session-client.ts file handles all the LiveKit work and each of the files in the src/components directory represents a different stage in the user interactions. The src/components/app-shell.ts file is a wrapper for the other views and acts as an orchestrator.
Before we can start development, we need to install the various dependencies. With the frontend project as the working directory, execute the following commands from the command line:
pnpm add dompurify lit livekit-client marked
pnpm add @tailwindcss/typography @tailwindcss/vite tailwindcss typescript vite --save-dev
I'm using PNPM, but if you're not, just adjust all the commands to use NPM instead.
We'll start by getting the basic configuration out of the way. If you scaffolded the project with a CLI such as create-vite, you probably won't need to do this manually, but a quick copy and paste won't hurt.
In the project's vite.config.ts file, include the following:
import { defineConfig } from 'vite';
import tailwindcss from '@tailwindcss/vite';
export default defineConfig({
plugins: [tailwindcss()],
server: {
port: 5173,
},
build: {
outDir: 'dist',
emptyOutDir: true,
},
});
The above configuration registers the Tailwind plugin and sets the port Vite will use when serving from your local computer. Next, move into the project's tsconfig.json file and add the following:
{
"compilerOptions": {
"target": "ES2022",
"module": "ESNext",
"moduleResolution": "Bundler",
"lib": ["ES2022", "DOM", "DOM.Iterable"],
"strict": true,
"noUnusedLocals": true,
"noUnusedParameters": true,
"noImplicitReturns": true,
"noFallthroughCasesInSwitch": true,
"skipLibCheck": true,
"esModuleInterop": true,
"resolveJsonModule": true,
"isolatedModules": true,
"verbatimModuleSyntax": true,
"experimentalDecorators": true,
"useDefineForClassFields": false,
"noEmit": true
},
"include": ["src"]
}
Going down the line of project bootstrapping, open the project's index.html file and add the following HTML:
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>LiveKit Interview Assistant</title>
</head>
<body class="bg-slate-950 text-slate-100 antialiased">
<app-shell></app-shell>
<script type="module" src="/src/main.ts"></script>
</body>
</html>
Because the backend and frontend exist separately, we need to tell Vite where to find our backend. In the .env file, add the following:
VITE_BACKEND_URL=http://localhost:3000
If you've chosen to use a different port from the default 3000 on the backend web service, make sure to update it in the environment variable file for the frontend.
We're almost done with the basic setup!
Open the project's src/vite-env.d.ts file and add the following type definitions:
/// <reference types="vite/client" />
interface ImportMetaEnv {
readonly VITE_BACKEND_URL: string;
}
interface ImportMeta {
readonly env: ImportMetaEnv;
}
Again, this is all basic setup for Vite and whatnot. This isn't the fun stuff we'll see for LiveKit. If you're using a different frontend framework, your boilerplate setup will be different.
The UI will use Tailwind, so add the following to the src/styles.css file:
@import "tailwindcss";
@plugin "@tailwindcss/typography";
Finally, we can wire up all of our files in src/main.ts, the same file referenced from index.html. Add the following to it:
import './styles.css';
import './components/app-shell';
import './components/pre-session-view';
import './components/live-session-view';
import './components/post-session-view';
It took a little time to get here, but we're ready to add actual features to our project. We're going to start with the LiveKit library file because it will be referenced in our other files.
It's big, but we're going to break it down after. Add the following to the project's src/lib/session-client.ts file:
import {
Room,
RoomEvent,
Track,
type RemoteTrack,
type RemoteTrackPublication,
type RemoteParticipant,
} from 'livekit-client';
export interface CreateSessionResponse {
sessionId: string;
token: string;
roomName: string;
serverUrl: string;
}
export const SUMMARY_STREAM_TOPIC = 'interview.summary';
export const STATUS_STREAM_TOPIC = 'interview.status';
const BACKEND_URL = import.meta.env.VITE_BACKEND_URL;
export interface Capabilities {
fileUpload: boolean;
}
export async function fetchCapabilities(): Promise<Capabilities> {
const res = await fetch(`${BACKEND_URL}/capabilities`);
if (!res.ok) throw new Error(`Failed to fetch capabilities: ${res.status}`);
return res.json() as Promise<Capabilities>;
}
export interface MixedSessionParams {
jobDescription?: string;
jobDescriptionFile?: File;
resume?: string;
resumeFile?: File;
}
export async function createSession(
params: MixedSessionParams,
): Promise<CreateSessionResponse> {
const form = new FormData();
if (params.jobDescription) {
form.append('jobDescription', params.jobDescription);
}
if (params.jobDescriptionFile) {
form.append('jobDescriptionFile', params.jobDescriptionFile);
}
if (params.resume) {
form.append('resume', params.resume);
}
if (params.resumeFile) {
form.append('resumeFile', params.resumeFile);
}
const res = await fetch(`${BACKEND_URL}/session`, {
method: 'POST',
body: form,
});
if (!res.ok) {
const body = (await res.json().catch(() => ({}))) as { error?: string };
throw new Error(body.error ?? `Server error ${res.status}`);
}
return (await res.json()) as CreateSessionResponse;
}
export type LiveStatus = 'connecting' | 'listening' | 'agent-speaking' | 'disconnected';
export interface LiveSessionCallbacks {
onStatus: (status: LiveStatus) => void;
onSummary: (markdown: string) => void;
onGeneratingSummary: () => void;
onError: (message: string) => void;
}
export class LiveSession {
private room: Room;
private summaryReceived = false;
private audioElements: HTMLElement[] = [];
constructor(private callbacks: LiveSessionCallbacks) {
this.room = new Room({ adaptiveStream: true, dynacast: true });
}
async join(serverUrl: string, token: string): Promise<void> {
this.callbacks.onStatus('connecting');
this.room.registerTextStreamHandler(SUMMARY_STREAM_TOPIC, async (reader) => {
this.summaryReceived = true;
const md = await reader.readAll();
this.callbacks.onSummary(md);
});
this.room.registerTextStreamHandler(STATUS_STREAM_TOPIC, async () => {
this.callbacks.onGeneratingSummary();
});
this.room.on(
RoomEvent.TrackSubscribed,
(track: RemoteTrack, _pub: RemoteTrackPublication, _participant: RemoteParticipant) => {
if (track.kind === Track.Kind.Audio) {
const el = track.attach();
el.style.display = 'none';
document.body.appendChild(el);
this.audioElements.push(el);
}
},
);
this.room.on(RoomEvent.ActiveSpeakersChanged, (speakers) => {
const agentSpeaking = speakers.some((p) => p !== this.room.localParticipant);
this.callbacks.onStatus(agentSpeaking ? 'agent-speaking' : 'listening');
});
this.room.on(RoomEvent.Disconnected, () => {
this.callbacks.onStatus('disconnected');
if (!this.summaryReceived) {
this.callbacks.onSummary(
'# Interview Ended\n\n_The session disconnected before a written summary was generated._',
);
}
});
try {
await this.room.connect(serverUrl, token);
await this.room.localParticipant.setMicrophoneEnabled(true);
this.callbacks.onStatus('listening');
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
this.callbacks.onError(`Connection failed: ${message}`);
}
}
async endInterview(): Promise<void> {
try {
await this.room.localParticipant.sendText(
'End the interview now.',
{ topic: 'lk.chat' },
);
} catch {
// If sending fails, fall back to disconnect
await this.leave();
}
}
async leave(): Promise<void> {
for (const el of this.audioElements) {
el.remove();
}
this.audioElements = [];
try {
await this.room.disconnect();
} catch {
// swallow — Disconnected event handles UI transition
}
}
}
Let's explore the highlights and thought process in the above file.
Remember the capabilities endpoint in the backend web service? We have a fetchCapabilities function that will get the information for us:
export async function fetchCapabilities(): Promise<Capabilities> {
const res = await fetch(`${BACKEND_URL}/capabilities`);
if (!res.ok) throw new Error(`Failed to fetch capabilities: ${res.status}`);
return res.json() as Promise<Capabilities>;
}
This will tell us later if we have access to Apache Tika. It was an optional feature, but I strongly recommend it. Otherwise, we fall back to a plain text input field for the resume and job description, which is usually less convenient for the user.
The createSession function is responsible for sending the user data to the backend web service:
export async function createSession(
params: MixedSessionParams,
): Promise<CreateSessionResponse> {
const form = new FormData();
if (params.jobDescription) {
form.append('jobDescription', params.jobDescription);
}
if (params.jobDescriptionFile) {
form.append('jobDescriptionFile', params.jobDescriptionFile);
}
if (params.resume) {
form.append('resume', params.resume);
}
if (params.resumeFile) {
form.append('resumeFile', params.resumeFile);
}
const res = await fetch(`${BACKEND_URL}/session`, {
method: 'POST',
body: form,
});
if (!res.ok) {
const body = (await res.json().catch(() => ({}))) as { error?: string };
throw new Error(body.error ?? `Server error ${res.status}`);
}
return (await res.json()) as CreateSessionResponse;
}
The function will send over whatever is provided, whether that be plaintext data or files. The backend, as we've seen, will sort out the finer details and determine what to store in our session storage.
The join function within the LiveSession class is where most of the magic happens.
When attempting to connect to a LiveKit room, we first register our TOPIC strings:
this.room.registerTextStreamHandler(SUMMARY_STREAM_TOPIC, async (reader) => {
this.summaryReceived = true;
const md = await reader.readAll();
this.callbacks.onSummary(md);
});
this.room.registerTextStreamHandler(STATUS_STREAM_TOPIC, async () => {
this.callbacks.onGeneratingSummary();
});
Remember, the dispatched agent will send text streams over these named topics, so we're registering what happens when the frontend receives them. In this case, we're passing the information along to the callbacks defined in each component's TypeScript file. We are not defining that logic here, only what we're sending.
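The two topic constants themselves are nothing special — they only need to match whatever topic names the backend agent publishes its text streams on. Something along these lines, with placeholder values:
// Placeholder values — these must match the topics the backend agent streams on.
export const SUMMARY_STREAM_TOPIC = 'interview-summary';
export const STATUS_STREAM_TOPIC = 'interview-status';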
this.room.on(
RoomEvent.TrackSubscribed,
(track: RemoteTrack, _pub: RemoteTrackPublication, _participant: RemoteParticipant) => {
if (track.kind === Track.Kind.Audio) {
const el = track.attach();
el.style.display = 'none';
document.body.appendChild(el);
this.audioElements.push(el);
}
},
);
The RoomEvent.TrackSubscribed event fires when we subscribe to a track published by a remote participant. In this case the only remote participant is the AI voice agent, not another human. So when the agent's voice audio arrives, we attach it to a hidden audio element and the browser plays it.
While not strictly necessary, we can also add the following:
this.room.on(RoomEvent.ActiveSpeakersChanged, (speakers) => {
const agentSpeaking = speakers.some((p) => p !== this.room.localParticipant);
this.callbacks.onStatus(agentSpeaking ? 'agent-speaking' : 'listening');
});
The above event is valuable in case we want to change the UI when the active speaker changes. If the human is speaking, show it in the UI. If the AI agent is speaking, show that. The actual behavior for this event lives in the callback defined in each component's page.
When all of the events are registered, we can make an attempt to connect:
try {
await this.room.connect(serverUrl, token);
await this.room.localParticipant.setMicrophoneEnabled(true);
this.callbacks.onStatus('listening');
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
this.callbacks.onError(`Connection failed: ${message}`);
}
Remember that access token we got from the web service? The token that included our session id? We're using it here when we try to connect. We're also enabling the microphone so our conversation can start.
Now we need to focus on the cleanup found within the src/lib/session-client.ts file.
async endInterview(): Promise<void> {
try {
await this.room.localParticipant.sendText(
'End the interview now.',
{ topic: 'lk.chat' },
);
} catch {
// If sending fails, fall back to disconnect
await this.leave();
}
}
In the endInterview function, we send a message to the agent with the text "End the interview now." In the backend agent code, our LLM logic is looking for phrases like this. When the LLM picks up this message, the backend starts generating a summary.
To clean up the frontend, we have the following:
async leave(): Promise<void> {
for (const el of this.audioElements) {
el.remove();
}
this.audioElements = [];
try {
await this.room.disconnect();
} catch {
// swallow — Disconnected event handles UI transition
}
}
The leave function will remove any audio elements and disconnect from the LiveKit room.
Believe it or not, that was all the LiveKit frontend logic. The rest of the files just construct the UX and interact with our library file.
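For completeness, wiring the components into the page is just a matter of importing their modules, which registers the custom elements through the @customElement decorator, and placing an <app-shell></app-shell> element with a module script in index.html. A minimal sketch of the entry point, assuming typical Vite file names:
// src/main.ts — a sketch; importing each module registers its custom element
import './components/app-shell';
import './components/pre-session-view';
import './components/live-session-view';
import './components/post-session-view';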
The remaining frontend files are going to be a bit messy. Rather than explaining every piece, it may be worth copying them as they are, or exploring the GitHub project listed at the end of the tutorial.
Looking at the src/components/pre-session-view.ts file, we have the following:
import { LitElement, html } from 'lit';
import { customElement, state } from 'lit/decorators.js';
import { createSession, fetchCapabilities, type MixedSessionParams } from '../lib/session-client';
type InputMode = 'paste' | 'upload';
interface FieldState {
mode: InputMode;
textValue: string;
file: File | null;
dragging: boolean;
}
@customElement('pre-session-view')
export class PreSessionView extends LitElement {
protected createRenderRoot() {
return this;
}
@state() private loading = false;
@state() private error = '';
@state() private fileUploadSupported = false;
@state() private jd: FieldState = {
mode: 'paste',
textValue: '',
file: null,
dragging: false,
};
@state() private resume: FieldState = {
mode: 'paste',
textValue: '',
file: null,
dragging: false,
};
override connectedCallback(): void {
super.connectedCallback();
fetchCapabilities()
.then((caps) => {
this.fileUploadSupported = caps.fileUpload;
if (caps.fileUpload) {
this.jd = { ...this.jd, mode: 'upload' };
this.resume = { ...this.resume, mode: 'upload' };
}
})
.catch(() => {
// Capabilities fetch failed — default to paste-only
});
}
private canSubmit(): boolean {
const jdOk = this.jd.mode === 'paste'
? this.jd.textValue.trim().length > 0
: this.jd.file !== null;
const resumeOk = this.resume.mode === 'paste'
? this.resume.textValue.trim().length > 0
: this.resume.file !== null;
return jdOk && resumeOk && !this.loading;
}
private setMode(field: 'jd' | 'resume', mode: InputMode) {
if (field === 'jd') {
this.jd = { ...this.jd, mode };
} else {
this.resume = { ...this.resume, mode };
}
}
private setText(field: 'jd' | 'resume', value: string) {
if (field === 'jd') {
this.jd = { ...this.jd, textValue: value, file: null };
} else {
this.resume = { ...this.resume, textValue: value, file: null };
}
}
private setFile(field: 'jd' | 'resume', file: File | null) {
if (field === 'jd') {
this.jd = { ...this.jd, file, textValue: '' };
} else {
this.resume = { ...this.resume, file, textValue: '' };
}
this.error = '';
}
private setDragging(field: 'jd' | 'resume', dragging: boolean) {
if (field === 'jd') {
this.jd = { ...this.jd, dragging };
} else {
this.resume = { ...this.resume, dragging };
}
}
private async onStart() {
if (!this.canSubmit()) return;
this.loading = true;
this.error = '';
const params: MixedSessionParams = {};
if (this.jd.mode === 'paste') {
params.jobDescription = this.jd.textValue;
} else if (this.jd.file) {
params.jobDescriptionFile = this.jd.file;
}
if (this.resume.mode === 'paste') {
params.resume = this.resume.textValue;
} else if (this.resume.file) {
params.resumeFile = this.resume.file;
}
try {
const session = await createSession(params);
this.dispatchEvent(
new CustomEvent('session-created', { detail: session, bubbles: true, composed: true }),
);
} catch (err) {
this.error = err instanceof Error ? err.message : String(err);
this.loading = false;
}
}
private renderModeToggle(field: 'jd' | 'resume') {
if (!this.fileUploadSupported) return html``;
const state = field === 'jd' ? this.jd : this.resume;
const base = 'px-3 py-1.5 text-xs font-medium rounded-md transition-colors';
const active = 'bg-emerald-500/20 text-emerald-400 border border-emerald-500/40';
const inactive = 'text-slate-400 hover:text-slate-300 border border-transparent';
return html`
<div class="flex gap-1 bg-slate-900/80 rounded-lg p-1">
<button
class="${base} ${state.mode === 'paste' ? active : inactive}"
@click=${() => this.setMode(field, 'paste')}
>Paste</button>
<button
class="${base} ${state.mode === 'upload' ? active : inactive}"
@click=${() => this.setMode(field, 'upload')}
>Upload</button>
</div>
`;
}
private renderTextarea(field: 'jd' | 'resume', id: string, rows: number, placeholder: string) {
const state = field === 'jd' ? this.jd : this.resume;
return html`
<textarea
id="${id}"
rows="${rows}"
placeholder="${placeholder}"
class="w-full rounded-lg bg-slate-900 border border-slate-800 px-4 py-3 text-sm font-mono
focus:outline-none focus:ring-2 focus:ring-emerald-400/50 focus:border-emerald-400/50
placeholder:text-slate-600 resize-y"
.value=${state.textValue}
@input=${(e: Event) => this.setText(field, (e.target as HTMLTextAreaElement).value)}
></textarea>
`;
}
private renderDropZone(field: 'jd' | 'resume') {
const state = field === 'jd' ? this.jd : this.resume;
const border = state.dragging
? 'border-emerald-400 bg-emerald-500/5'
: 'border-slate-700 bg-slate-900/50';
return html`
<div
class="relative rounded-lg border-2 border-dashed ${border} px-4 py-8 text-center transition-colors cursor-pointer"
@dragover=${(e: DragEvent) => { e.preventDefault(); this.setDragging(field, true); }}
@dragleave=${() => this.setDragging(field, false)}
@drop=${(e: DragEvent) => {
e.preventDefault();
this.setDragging(field, false);
const file = e.dataTransfer?.files[0];
if (file) this.setFile(field, file);
}}
@click=${() => {
const input = this.querySelector(`input[data-field="${field}"]`);
(input as HTMLInputElement)?.click();
}}
>
${state.file
? html`
<p class="text-sm text-emerald-400 font-medium">${state.file.name}</p>
<p class="text-xs text-slate-500 mt-1">Click or drop to replace</p>
`
: html`
<p class="text-sm text-slate-400">Drag & drop a file here, or click to browse</p>
<p class="text-xs text-slate-600 mt-1">Accepts .txt, .md, .pdf, .docx</p>
`}
<input
type="file"
data-field="${field}"
accept=".txt,.md,.pdf,.docx"
class="hidden"
@change=${(e: Event) => {
const file = (e.target as HTMLInputElement).files?.[0];
if (file) this.setFile(field, file);
}}
/>
</div>
`;
}
private renderField(
field: 'jd' | 'resume',
id: string,
label: string,
rows: number,
placeholder: string,
) {
const state = field === 'jd' ? this.jd : this.resume;
return html`
<div>
<div class="flex items-center justify-between mb-2">
<label for="${id}" class="block text-sm font-medium text-slate-300">
${label}
</label>
${this.renderModeToggle(field)}
</div>
${state.mode === 'paste'
? this.renderTextarea(field, id, rows, placeholder)
: this.renderDropZone(field)}
</div>
`;
}
render() {
return html`
<section class="space-y-8">
<header class="space-y-2">
<h2 class="text-3xl font-semibold tracking-tight">Practice your next interview</h2>
<p class="text-slate-400">
Provide the job description and your resume below — paste text or upload a file.
Your AI coach will tailor a mock interview to the role and give you written feedback
when you're done.
</p>
</header>
<div class="space-y-5">
${this.renderField('jd', 'jd', 'Job Description', 6,
'Paste the job description (Markdown supported)…')}
${this.renderField('resume', 'resume', 'Your Resume', 8,
'Paste your resume (Markdown supported)…')}
${this.error
? html`<p class="text-sm text-rose-400 bg-rose-950/40 border border-rose-900/60 rounded-md px-4 py-3">
${this.error}
</p>`
: ''}
<button
@click=${this.onStart}
?disabled=${!this.canSubmit()}
class="w-full sm:w-auto px-6 py-3 rounded-lg bg-emerald-500 hover:bg-emerald-400
text-slate-950 font-semibold text-sm transition-colors
disabled:bg-slate-800 disabled:text-slate-500 disabled:cursor-not-allowed"
>
${this.loading ? 'Starting…' : 'Start Interview'}
</button>
</div>
</section>
`;
}
}
The high-level logic in the above file is that we present the user with two fields, a job description and a resume, each with a paste/upload toggle. The user can choose to paste text or flip the toggle to upload a file. If the capabilities check says that Apache Tika is unavailable, the toggle is hidden and only the plain text area is shown.
The most important part of the src/components/pre-session-view.ts file is the following:
const session = await createSession(params);
this.dispatchEvent(
new CustomEvent('session-created', { detail: session, bubbles: true, composed: true }),
);
Using the user-provided information, we call the createSession function, which sends it to our backend and returns the access token. The dispatchEvent call bubbles the data up to the src/components/app-shell.ts file, which acts as the orchestrator of our frontend. This orchestrator will later pass the data to the next step.
Speaking of next step, we have the src/components/live-session-view.ts file:
import { LitElement, html } from 'lit';
import { customElement, property, state } from 'lit/decorators.js';
import {
LiveSession,
type CreateSessionResponse,
type LiveStatus,
} from '../lib/session-client';
@customElement('live-session-view')
export class LiveSessionView extends LitElement {
protected createRenderRoot() {
return this;
}
@property({ attribute: false }) session!: CreateSessionResponse;
@state() private status: LiveStatus = 'connecting';
@state() private error = '';
@state() private ending = false;
@state() private generating = false;
private live?: LiveSession;
private isMounted = false;
connectedCallback(): void {
super.connectedCallback();
this.isMounted = true;
this.live = new LiveSession({
onStatus: (status) => {
if (this.isMounted) this.status = status;
},
onSummary: (markdown) => {
if (!this.isMounted) return;
this.dispatchEvent(
new CustomEvent('session-complete', {
detail: { summary: markdown },
bubbles: true,
composed: true,
}),
);
},
onGeneratingSummary: () => {
if (this.isMounted) this.generating = true;
},
onError: (message) => {
if (this.isMounted) this.error = message;
},
});
void this.live.join(this.session.serverUrl, this.session.token);
}
disconnectedCallback(): void {
this.isMounted = false;
super.disconnectedCallback();
void this.live?.leave();
}
private async onEnd() {
this.ending = true;
await this.live?.endInterview();
}
private statusLabel(): string {
switch (this.status) {
case 'connecting':
return 'Connecting to your coach…';
case 'listening':
return 'Listening';
case 'agent-speaking':
return 'Coach is speaking';
case 'disconnected':
return 'Disconnected';
}
}
private statusColor(): string {
switch (this.status) {
case 'agent-speaking':
return 'bg-sky-400';
case 'listening':
return 'bg-emerald-400';
case 'connecting':
return 'bg-amber-400';
case 'disconnected':
return 'bg-slate-500';
}
}
render() {
return html`
<section class="space-y-10">
<header class="space-y-2 text-center">
<p class="text-xs uppercase tracking-widest text-slate-500">Interview in progress</p>
<h2 class="text-2xl font-semibold tracking-tight">Speak naturally with your coach</h2>
</header>
<div class="flex flex-col items-center gap-6 py-10">
<div class="relative">
<div
class="w-40 h-40 rounded-full border border-slate-800 bg-slate-900/60 flex items-center justify-center"
>
<div
class="w-24 h-24 rounded-full ${this.statusColor()} ${this.status === 'agent-speaking'
? 'animate-pulse'
: ''} opacity-80 transition-colors"
></div>
</div>
${this.status === 'listening'
? html`<div
class="absolute inset-0 rounded-full border-2 border-emerald-400/40 animate-ping"
></div>`
: ''}
</div>
<div class="text-center space-y-1">
<p class="text-lg font-medium">${this.statusLabel()}</p>
<p class="text-sm text-slate-400">
Say "end the interview" when you're done, or click below.
</p>
</div>
${this.generating
? html`<div class="flex items-center gap-2 text-sm text-slate-400">
<svg class="animate-spin h-4 w-4" viewBox="0 0 24 24" fill="none">
<circle class="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" stroke-width="4" />
<path class="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z" />
</svg>
Generating your feedback…
</div>`
: ''}
${this.error
? html`<p class="text-sm text-rose-400 bg-rose-950/40 border border-rose-900/60 rounded-md px-4 py-3 max-w-md">
${this.error}
</p>`
: ''}
<button
@click=${this.onEnd}
?disabled=${this.ending || this.status === 'connecting'}
class="px-6 py-3 rounded-lg bg-slate-800 hover:bg-slate-700 text-slate-100
font-medium text-sm transition-colors border border-slate-700
disabled:opacity-50 disabled:cursor-not-allowed"
>
${this.ending ? 'Ending…' : 'End Interview'}
</button>
</div>
</section>
`;
}
}
Once again, we won't explain every piece here because it is mostly frontend voodoo that isn't directly related to LiveKit or our interview coach functionality.
The big deal here is the LiveSessionView class. Remember all those callbacks our library file expected? This is where we define what they actually do. However, they really just update UI elements on the screen. For example, the onGeneratingSummary callback just sets a boolean that tells our frontend to show a spinner.
We also have the access token that was bubbled up from the previous step. Our orchestrator passed that information into the src/components/live-session-view.ts file, and we use the join function like so:
void this.live.join(this.session.serverUrl, this.session.token);
The join function in our library handles all the LiveKit room joining. When the conversation is complete and the onSummary callback fires, we dispatch the Markdown to the next step:
this.dispatchEvent(
new CustomEvent('session-complete', {
detail: { summary: markdown },
bubbles: true,
composed: true,
}),
);
This brings us to the src/components/post-session-view.ts file:
import { LitElement, html } from 'lit';
import { customElement, property } from 'lit/decorators.js';
import { marked } from 'marked';
import DOMPurify from 'dompurify';
@customElement('post-session-view')
export class PostSessionView extends LitElement {
protected createRenderRoot() {
return this;
}
@property({ type: String }) summary = '';
private renderMarkdown(): string {
return DOMPurify.sanitize(marked.parse(this.summary, { async: false }) as string);
}
private onRestart() {
this.dispatchEvent(new CustomEvent('restart', { bubbles: true, composed: true }));
}
render() {
return html`
<section class="space-y-8">
<header class="space-y-2">
<p class="text-xs uppercase tracking-widest text-emerald-400">Session complete</p>
<h2 class="text-3xl font-semibold tracking-tight">Your interview feedback</h2>
</header>
<article
class="rounded-xl border border-slate-800 bg-slate-900/40 p-8
prose prose-invert prose-slate max-w-none
prose-headings:tracking-tight prose-headings:text-slate-100
prose-h1:text-2xl prose-h2:text-xl prose-h3:text-lg
prose-p:text-slate-300 prose-li:text-slate-300
prose-strong:text-slate-100 prose-code:text-emerald-300"
.innerHTML=${this.renderMarkdown()}
></article>
<div class="flex gap-3">
<button
@click=${this.onRestart}
class="px-6 py-3 rounded-lg bg-emerald-500 hover:bg-emerald-400
text-slate-950 font-semibold text-sm transition-colors"
>
Start a New Session
</button>
</div>
</section>
`;
}
}
The Markdown that was sent to this file is rendered on the screen. Remember, the Markdown in question is the summary of the conversation with feedback included. The voice agent doesn't determine this feedback; the LLM model does, based on the transcript.
To bring closure to the frontend, we have the src/components/app-shell.ts file that acts as our orchestrator. It has the following code:
import { LitElement, html, css } from 'lit';
import { customElement, state } from 'lit/decorators.js';
import type { CreateSessionResponse } from '../lib/session-client';
type Screen = 'pre' | 'live' | 'done';
@customElement('app-shell')
export class AppShell extends LitElement {
protected createRenderRoot() {
return this;
}
@state() private screen: Screen = 'pre';
@state() private session?: CreateSessionResponse;
@state() private summary = '';
static styles = css``;
private onSessionCreated = (e: CustomEvent<CreateSessionResponse>) => {
this.session = e.detail;
this.screen = 'live';
};
private onSessionComplete = (e: CustomEvent<{ summary: string }>) => {
this.summary = e.detail.summary;
this.screen = 'done';
};
private onRestart = () => {
this.session = undefined;
this.summary = '';
this.screen = 'pre';
};
render() {
return html`
<main class="min-h-screen flex flex-col">
<header class="border-b border-slate-800 bg-slate-900/60 backdrop-blur">
<div class="max-w-3xl mx-auto px-6 py-4 flex items-center justify-between">
<div class="flex items-center gap-3">
<div class="w-2 h-2 rounded-full bg-emerald-400 animate-pulse"></div>
<h1 class="text-lg font-semibold tracking-tight">LiveKit Interview Assistant</h1>
</div>
<p class="text-xs text-slate-400 uppercase tracking-widest">AI Interview Coach</p>
</div>
</header>
<div class="flex-1 max-w-3xl w-full mx-auto px-6 py-10">
${this.screen === 'pre'
? html`<pre-session-view
@session-created=${this.onSessionCreated}
></pre-session-view>`
: ''}
${this.screen === 'live' && this.session
? html`<live-session-view
.session=${this.session}
@session-complete=${this.onSessionComplete}
></live-session-view>`
: ''}
${this.screen === 'done'
? html`<post-session-view
.summary=${this.summary}
@restart=${this.onRestart}
></post-session-view>`
: ''}
</div>
<footer class="border-t border-slate-800 py-4 text-center text-xs text-slate-500">
Built with LiveKit Agents · Powered by LiveKit Inference
</footer>
</main>
`;
}
}
If you want to run the frontend, you can execute the following from your command line:
vite
While shorter than the commands we saw in the backend files, you may still want to add it as a script in your package.json file.
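For example, a scripts block along these lines lets you start the dev server with npm run dev using the locally installed Vite binary — a sketch, since your existing package.json will have more in it:
{
  "scripts": {
    "dev": "vite",
    "build": "vite build",
    "preview": "vite preview"
  }
}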
Conclusion
You just saw how to add a LiveKit voice agent to your project! In this example we built an interview coach where a human interviewee can have a voice conversation with an AI voice agent powered by LiveKit. To add realism, we uploaded personal job details, such as a job description and a resume, that the agent draws on to give the user an authentic interview experience.
Here are some reminders:
- The frontend communicates with LiveKit which communicates with the dispatched agent and vice-versa, but the frontend never communicates directly with the agent.
- The frontend communicates with the backend web service.
- The dispatched agent communicates with the backend web service, but the web service never directly interacts with LiveKit.
- LiveKit does not generate the summary, the LLM model does based on the chat transcript maintained by the agent.
If you plan to take this project into production, I strongly recommend that you switch from the in-memory session store to a proper database. This will preserve session state across application restarts, be lighter on resources, and let you load balance if you need to.
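If you go that route, one low-friction approach is to hide the storage behind a small interface so the rest of the backend doesn't care whether sessions live in memory or in a database. A rough sketch with hypothetical names:
// Hypothetical contract — the exact SessionData fields depend on what your web service stores.
export interface SessionStore {
  create(id: string, data: SessionData): Promise<void>;
  get(id: string): Promise<SessionData | null>;
  delete(id: string): Promise<void>;
}

interface SessionData {
  jobDescription: string; // extracted or pasted job description text
  resume: string;         // extracted or pasted resume text
}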
Play around with the prompts and see if you can improve the experience. If you use this project to help you with interviews, let me know if it helped you secure your next dream job!
You can find the full project on GitHub.

Nic Raboy
Nic Raboy is an advocate of modern web and mobile development technologies. He has experience in C#, JavaScript, Golang and a variety of frameworks such as Angular, NativeScript, and Unity. Nic writes about his development experiences related to making web and mobile development easier to understand.