Project · 2025–2026

Parliamo

A voice app for practicing spoken Italian: you talk to an AI tutor, it corrects you, and it remembers your progress. The interesting part was that almost none of the design lived on a screen — it lived in how the tutor behaves.

Voice interface design Conversational AI Agentic systems Human-in-the-loop Claude Code
Screenshots: early build (home screen), second iteration, current build.

A voice tutor you actually talk to

Parliamo lets you practice speaking Italian out loud. You talk to an AI tutor named Giulia; she answers in Italian, corrects your mistakes on the spot, and picks up where you left off last time.

I built it because I'm learning Italian and had a gap in my practice. I could read and listen, but I had nothing to speak with that would correct me in real time. So I made one, and used the build to get hands-on experience designing for voice AI: shaping how an agent behaves, designing around real model limits, and building the memory layer that makes it feel like a tutor instead of a chatbot.

The speech-to-text couldn't understand a beginner — so I designed around it

The first thing I learned when I tested it: the speech-to-text kept mishearing my beginner Italian as other languages (Spanish, Hebrew, Korean, Japanese) about a third of the time. ("Buongiorno" came back as "Bom Joinal.") Tuning didn't fix it. As far as I can tell, it's a limit of current models on accented beginner speech.

Beginner Italian misheard as Spanish, Hebrew, and others in a single session.

So instead of pretending it worked, I designed around it. Giulia always assumes you're attempting Italian, and there's a button to correct her when she mishears. I also tested two ways of building the voice loop and chose the modular one: it's a little slower, but I can swap in a better speech model the moment one ships, without rebuilding the whole thing.
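One way to sketch the "assume Italian, let the learner correct it" design is shown below. It is a hypothetical illustration, not Parliamo's code. `Turn`, `likely_mishear`, and `tutor_input` are invented names, and the non-Latin-script check is an assumed heuristic that only catches mishears into scripts like Hebrew, Korean, or Japanese.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch; none of these names come from Parliamo itself.


@dataclass
class Turn:
    transcript: str                        # what speech-to-text returned
    user_correction: Optional[str] = None  # set when the learner taps the correction button

    @property
    def text_for_tutor(self) -> str:
        # The learner's own correction always wins over the STT guess.
        return self.user_correction or self.transcript


def non_latin_ratio(text: str) -> float:
    """Share of letters outside the Latin script (e.g. Hebrew, Hangul, kana)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    non_latin = sum(1 for ch in letters if "LATIN" not in unicodedata.name(ch, ""))
    return non_latin / len(letters)


def likely_mishear(turn: Turn, threshold: float = 0.5) -> bool:
    # Assumed heuristic: flags Hebrew/Korean/Japanese output, but not Spanish,
    # which shares the Latin alphabet with Italian.
    return non_latin_ratio(turn.transcript) >= threshold


def tutor_input(turn: Turn) -> str:
    text = turn.text_for_tutor
    if turn.user_correction is None and likely_mishear(turn):
        # Ask the model to read the attempt charitably instead of replying to the garble.
        return f"[transcript may be misheard; learner was attempting Italian] {text}"
    return text
```

A mishear into Latin-script text, like "Bom Joinal", passes this check unflagged, which is one reason a manual correction button is still needed.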

Figure: Voice pipeline
Concept: You (mic) → Speech-to-Text → Language Model → Text-to-Speech → Giulia (speaker)
Parliamo Baseline, modular pipeline: You (push-to-talk) → gpt-4o-mini-transcribe (STT) → gpt-4o as Giulia (LLM, persona + memory) → gpt-4o-mini-tts (TTS) → Giulia (speaker)
Parliamo Realtime, end-to-end: You (always-on mic) → gpt-realtime-2 (STT + LLM + TTS in one model) → Giulia (speaker)
All models shown are OpenAI models.
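The modular design can be sketched as three small interfaces composed into one pipeline, so that replacing the speech-to-text model means swapping a single object. This is a minimal Python sketch under assumptions. The interface names, the `language` hint parameter, and the toy components are hypothetical stand-ins, not the OpenAI SDK or Parliamo's code.

```python
from dataclasses import dataclass, replace
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes, language: str) -> str: ...


class LanguageModel(Protocol):
    def reply(self, system_prompt: str, user_text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass(frozen=True)
class VoicePipeline:
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech
    system_prompt: str

    def turn(self, audio: bytes) -> bytes:
        # Always transcribe as Italian (hypothetical hint): the learner is attempting Italian.
        text = self.stt.transcribe(audio, language="it")
        answer = self.llm.reply(self.system_prompt, text)
        return self.tts.synthesize(answer)


# Toy components standing in for real models, so the sketch runs offline.
class EchoSTT:
    def transcribe(self, audio: bytes, language: str) -> str:
        return audio.decode("utf-8")


class UpperSTT:  # pretend "better" speech model
    def transcribe(self, audio: bytes, language: str) -> str:
        return audio.decode("utf-8").upper()


class ParrotLLM:
    def reply(self, system_prompt: str, user_text: str) -> str:
        return f"Hai detto: {user_text}"


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = VoicePipeline(EchoSTT(), ParrotLLM(), BytesTTS(), system_prompt="Sei Giulia.")
# Swapping in a new speech model touches one field; nothing else is rebuilt.
upgraded = replace(pipeline, stt=UpperSTT())
```

The end-to-end Realtime variant folds all three stages into one model, so it can't be upgraded stage by stage in the same way.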

The screen has four parts. The design is how she talks.

A voice tutor barely has an interface — a mic button, a list of what's been said, a mode toggle, a progress bar. The product is the conversation, so the real design work was deciding how Giulia behaves. A few of those decisions:

She corrects beginners directly
A gentle correction slipped into her reply ("oh, you mean vai?") just sails past a beginner in a live conversation. So at my level she stops and says it plainly: "that was sono, not sto — try again."
She keeps her turns short
Two or three sentences, never more. You can't hold five sentences of spoken instruction in your head and still remember what you wanted to say back.
She coaches pronunciation out loud
"Let the R buzz, like a tiny motor." You can only give a cue like that in speech — it's the one thing a voice tutor does that a text app simply can't.
She assumes you meant Italian
Beginner speech gets misheard constantly (see the speech-to-text problem above), so she's built to read your attempt charitably rather than correct you for a word you never said.
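The behaviors above could be written as persona rules in a system prompt, with a small post-processing check as a backstop for turn length. This is a hedged sketch. The rule wording, `build_system_prompt`, `cap_turn`, and the gentler correction style for non-beginners are assumptions, not Parliamo's actual prompt.

```python
import re

# Hypothetical persona rules paraphrasing the behaviors above.
BASE_RULES = [
    "You are Giulia, a friendly Italian tutor. Reply in Italian.",
    "Keep every turn to at most three sentences.",
    "Assume the learner is attempting Italian, even if the transcript looks like another language.",
    "When it helps, give pronunciation cues the learner can hear and imitate.",
]


def build_system_prompt(level: str) -> str:
    rules = list(BASE_RULES)
    if level == "beginner":
        rules.append("Correct mistakes directly: name the wrong word, give the right one, "
                     "and ask the learner to try again.")
    else:
        # Assumption: gentler recasts for learners past the beginner stage.
        rules.append("Correct mistakes lightly, by recasting them inside your reply.")
    return "\n".join(f"- {rule}" for rule in rules)


# Naive sentence boundary: ., ! or ? followed by whitespace (abbreviations will split wrongly).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def cap_turn(reply: str, max_sentences: int = 3) -> str:
    """Backstop in case the model ignores the length rule."""
    sentences = [s for s in _SENTENCE_END.split(reply.strip()) if s]
    return " ".join(sentences[:max_sentences])
```

Keeping the length rule in the prompt and also enforcing it in code means a long reply never reaches text-to-speech even when the model drifts.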

Without memory it's a chatbot; with memory it's a tutor

A plain chatbot starts cold every time — "Ciao, what's your name?" — no matter how long you've been practicing. I gave Giulia a memory: she tracks the words you know, what's due for review, and the mistakes that keep recurring, and she opens each session where the last one ended.
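A memory layer like this can be sketched as a per-learner store that schedules word reviews and summarizes the last session for the model. The sketch assumes a simple Leitner-style rule (double the review gap on success, reset to one day on a miss). Parliamo's actual scheduling isn't described here, and `WordCard`, `TutorMemory`, and `session_context` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass
class WordCard:
    word: str
    due: date               # first day this word should come up again
    interval_days: int = 1  # current gap between reviews

    def review(self, recalled: bool, today: date) -> None:
        # Assumed Leitner-style rule: double the gap on success, reset on a miss.
        self.interval_days = self.interval_days * 2 if recalled else 1
        self.due = today + timedelta(days=self.interval_days)


@dataclass
class TutorMemory:
    cards: Dict[str, WordCard] = field(default_factory=dict)
    last_topic: Optional[str] = None

    def learn(self, word: str, today: date) -> None:
        # A newly met word is first reviewed the next day.
        self.cards.setdefault(word, WordCard(word, due=today + timedelta(days=1)))

    def due_for_review(self, today: date) -> List[str]:
        return sorted(c.word for c in self.cards.values() if c.due <= today)

    def session_context(self, today: date) -> str:
        # Summary handed to the model so Giulia opens where the last session ended,
        # instead of starting cold with "Ciao, what's your name?".
        parts = []
        if self.last_topic:
            parts.append(f"Last session: {self.last_topic}.")
        due = self.due_for_review(today)
        if due:
            parts.append(f"Due for review: {', '.join(due)}.")
        return " ".join(parts) or "New learner: no history yet."
```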

She also notices which grammar mistakes repeat — carefully not counting a mispronounced word as a grammar error — and uses them to decide what to revisit. That memory layer is the whole difference between a demo and something that feels like a real tutor.
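Counting recurring grammar mistakes while ignoring pronunciation slips might look like the sketch below. It assumes each logged mistake has already been tagged with a kind and a rule label (for example, by the language model). `Mistake`, `recurring_grammar`, and the thresholds are illustrative, not Parliamo's implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Literal

MistakeKind = Literal["grammar", "pronunciation", "vocabulary"]


@dataclass(frozen=True)
class Mistake:
    kind: MistakeKind
    rule: str  # hypothetical label, e.g. "essere vs stare"


def recurring_grammar(mistakes: List[Mistake], min_count: int = 2, top_n: int = 3) -> List[str]:
    # Count grammar mistakes only: a mispronounced word is not evidence of a grammar gap.
    counts = Counter(m.rule for m in mistakes if m.kind == "grammar")
    # Most frequent first; keep only rules that actually repeat.
    return [rule for rule, n in counts.most_common(top_n) if n >= min_count]
```

Here a rolled R missed three times never shows up as a grammar topic to revisit, while "sono, not sto" repeated across sessions does.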

What this taught me about designing for AI

01
The behavior is the product
With a voice agent there's almost no screen to design. The work is shaping how it talks, corrects, and decides what's next. It's interaction design — just aimed at a personality instead of a layout.
02
Design around the model, not the demo
My most important design decision came from something the model couldn't do. Designing honestly for AI means building for its real limits, not its best-case behavior.
03
You find the problems by using it
Every real issue — the misheard speech, the repetitive drilling — showed up only once I started talking to the thing. For AI products, using it is the testing method.

Parliamo is the clearest example in my portfolio of designing for AI behavior rather than UI. The decisions that shaped the product — correction strategy, turn length, memory architecture, designing around a model limit rather than ignoring it — are the same class of decisions that come up in any product where AI is doing the talking.

What building this taught me about conversational AI design

Designing for voice is mostly designing behavior, not layout. The decisions that mattered (how Giulia corrects, how long her turns are, how she handles a mishear) all lived in the system prompt and the memory layer, not on a screen. That's a different design practice from screen-based UX, and a more useful one as AI moves into more of the products I want to work on. The open questions (multi-user support, grammar drilling, a fuller curriculum) are sequels to that problem, not prerequisites for what I learned from it.