A voice app for practicing spoken Italian: you talk to an AI tutor, it corrects you, and it remembers your progress. The interesting part was that almost none of the design lived on a screen — it lived in how the tutor behaves.
Overview
Parliamo lets you practice speaking Italian out loud. You talk to an AI tutor named Giulia; she answers in Italian, corrects your mistakes on the spot, and picks up where you left off last time.
I built it because I'm learning Italian and my practice had a gap: I could read and listen, but I had no speaking partner that would correct me in real time. So I built one, and used the project to get hands-on experience designing for voice AI: shaping how an agent behaves, designing around real model limits, and building the memory layer that makes it feel like a tutor instead of a chatbot.
The hard problem
The first thing I learned in testing: the speech-to-text misheard my beginner Italian as other languages (Spanish, Hebrew, Korean, Japanese) about a third of the time. ("Buongiorno" came back as "Bom Joinal.") No tuning I tried fixed it; it appears to be a limit of current speech models on accented beginner speech.
So instead of pretending it worked, I designed around it. Giulia always assumes you're attempting Italian, and there's a button to correct her when she mishears. I also tested two ways of building the voice loop and chose the modular one: it's a little slower, but I can swap in a better speech model the moment one ships, without rebuilding the whole thing.
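A minimal sketch of what a modular loop like this could look like. It is not the actual Parliamo code: the names `SpeechToText`, `VoiceLoop`, `correct_mishear`, and `FakeSTT` are hypothetical, and the three stages are stand-ins for whatever speech and language models are plugged in. The point is that the speech-to-text stage sits behind a small interface and the language is pinned to Italian.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class SpeechToText(Protocol):
    """Any engine with this method can be plugged in (hypothetical interface)."""

    def transcribe(self, audio: bytes, language: str) -> str: ...


@dataclass
class VoiceLoop:
    stt: SpeechToText              # swappable speech-to-text stage
    reply: Callable[[str], str]    # tutor model: learner text -> Italian reply
    speak: Callable[[str], bytes]  # text-to-speech stage
    language: str = "it"           # always assume the learner is attempting Italian

    def turn(self, audio: bytes) -> tuple[str, bytes]:
        """Run one spoken turn; return what was heard and the spoken reply."""
        heard = self.stt.transcribe(audio, language=self.language)
        return heard, self.speak(self.reply(heard))

    def correct_mishear(self, typed: str) -> bytes:
        """The 'she misheard me' button: skip STT and answer the corrected text."""
        return self.speak(self.reply(typed))


class FakeSTT:
    """Stand-in engine for demonstration; a real one would call a speech model."""

    def transcribe(self, audio: bytes, language: str) -> str:
        return audio.decode("utf-8")
```

Swapping in a better speech model then means assigning a different object to `loop.stt`; the reply and speech stages are untouched.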
The design
A voice tutor barely has an interface — a mic button, a list of what's been said, a mode toggle, a progress bar. The product is the conversation, so the real design work was deciding how Giulia behaves. A few of those decisions:
Memory
A plain chatbot starts cold every time — "Ciao, what's your name?" — no matter how long you've been practicing. I gave Giulia a memory: she tracks the words you know, what's due for review, and the mistakes that keep recurring, and she opens each session where the last one ended.
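To make the idea concrete, here is a rough sketch of a memory record that tracks known words, what is due for review, and where the last session ended. Everything in it is an assumption for illustration: the class names, the `last_topic` field, and especially the scheduling rule (double the review interval after a correct answer, reset to one day after a miss), which is a simple stand-in rather than the rule Giulia actually uses. Saving the record between sessions is left out.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class WordRecord:
    word: str
    interval_days: int = 1  # hypothetical spacing: doubles on success
    due: date = field(default_factory=date.today)


@dataclass
class LearnerMemory:
    words: dict[str, WordRecord] = field(default_factory=dict)
    last_topic: str = ""    # where the previous session ended

    def practiced(self, word: str, correct: bool, today: date) -> None:
        """Record one attempt at a word and reschedule its next review."""
        rec = self.words.setdefault(word, WordRecord(word, due=today))
        rec.interval_days = rec.interval_days * 2 if correct else 1
        rec.due = today + timedelta(days=rec.interval_days)

    def due_for_review(self, today: date) -> list[str]:
        """Words whose review date has arrived, in alphabetical order."""
        return sorted(w for w, r in self.words.items() if r.due <= today)

    def opening_context(self, today: date) -> str:
        """One line the tutor can be given so a session does not start cold."""
        review = ", ".join(self.due_for_review(today)) or "nothing"
        topic = self.last_topic or "n/a"
        return f"Last session ended on: {topic}. Due for review: {review}."
```

The string from `opening_context` is the kind of thing that could be fed to the tutor at the start of a session so she resumes instead of reintroducing herself.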
She also notices which grammar mistakes repeat — carefully not counting a mispronounced word as a grammar error — and uses them to decide what to revisit. That memory layer is the whole difference between a demo and something that feels like a real tutor.
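The recurring-mistake logic can be sketched as a filtered count. This assumes each logged mistake already carries a label saying whether it was a grammar or a pronunciation problem; how that label gets assigned is not covered here, and the `Mistake` type, its field names, and the `min_count` threshold are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Mistake:
    kind: str     # "grammar" or "pronunciation" (hypothetical labels)
    pattern: str  # e.g. "article gender"


def recurring_grammar(mistakes: list[Mistake], min_count: int = 2) -> list[str]:
    """Grammar patterns seen at least min_count times, most frequent first.

    Pronunciation slips are filtered out before counting, so a word that was
    merely mispronounced never shows up as a grammar problem to revisit.
    """
    counts = Counter(m.pattern for m in mistakes if m.kind == "grammar")
    return [p for p, n in counts.most_common() if n >= min_count]
```

With a log containing two "article gender" grammar errors, one "verb ending" grammar error, and two mispronunciations of "gli", only "article gender" would come back as something to revisit.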
What I learned
Designing for voice is mostly designing behavior, not layout. The decisions that mattered (how Giulia corrects, how long her turns are, how she handles a mishear) all lived in the system prompt and the memory layer, not on a screen. That's a different design practice from screen-based UX, and a more useful one as AI moves into more of the products I want to work on.
What's next
The open questions (multi-user support, grammar drilling, a fuller curriculum) are sequels to that problem, not prerequisites for what I learned from it.