Week 4 Progress Update

  • Writer: 정원 배
  • May 27
  • 2 min read

Smart Understanding with Our Model


This week, we took a big step forward: our system can now understand typed commands more intelligently.


For example, if a user writes:

“weather of Liverpool”

the model can now detect that the user wants weather information, extract the location, and update the display accordingly. Under the hood, we pair the MobileBERT tokenizer with a custom-built intent recognition model, which classifies user commands like:

  • play music

  • get news

  • get weather


Each intent is mapped to a different response action, and we've fine-tuned the model to accurately identify the user's request from natural language. A rough sketch of the classification step is below.
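
This sketch assumes a Hugging Face transformers setup; "our-intent-model" is a placeholder for our fine-tuned checkpoint, and the label list simply mirrors the three intents above.

# Rough sketch of intent classification with the MobileBERT tokenizer.
# "our-intent-model" is a placeholder, not a published checkpoint.
import torch
from transformers import MobileBertTokenizer, MobileBertForSequenceClassification

INTENTS = ["play_music", "get_news", "get_weather"]

tokenizer = MobileBertTokenizer.from_pretrained("google/mobilebert-uncased")
model = MobileBertForSequenceClassification.from_pretrained(
    "our-intent-model", num_labels=len(INTENTS)
)

def classify(command: str) -> str:
    """Return the most likely intent label for a typed command."""
    inputs = tokenizer(command, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return INTENTS[logits.argmax(dim=-1).item()]

print(classify("weather of Liverpool"))  # -> "get_weather"
# (Pulling "Liverpool" out of the text is a separate extraction step, not shown.)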


Speech-to-Text Prototype Working!


We’ve also built a working speech-to-text system using IBM Watson that runs on a laptop. Using pyaudio, our system captures voice input, streams it to IBM's Speech-to-Text API, and then processes the results in real time.


Here’s how it works (a rough code sketch follows these steps):

  1. The user speaks into the mic.

  2. IBM Watson transcribes the message.

  3. Our assistant processes the intent (e.g., play music or fetch weather).

  4. A response is printed.
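
Putting those four steps together on the laptop looks roughly like this. It follows the queue-based microphone pattern from IBM's Python SDK examples; the API key and service URL are placeholders, and classify() is the intent function sketched earlier.

# Rough sketch: pyaudio pushes mic audio into a queue, and the ibm-watson
# SDK streams it to Watson Speech to Text over a websocket.
from queue import Queue

import pyaudio
from ibm_watson import SpeechToTextV1
from ibm_watson.websocket import RecognizeCallback, AudioSource
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

CHUNK = 1024
q = Queue()
audio_source = AudioSource(q, is_recording=True, is_buffer=True)

def mic_callback(in_data, frame_count, time_info, status):
    q.put(in_data)                        # step 1: the user speaks
    return (None, pyaudio.paContinue)

class AssistantCallback(RecognizeCallback):
    def on_transcription(self, transcript):
        text = transcript[0]["transcript"]      # step 2: Watson transcribes
        intent = classify(text)                 # step 3: intent processing
        print(f"Heard: {text!r} -> {intent}")   # step 4: response printed

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                 input=True, frames_per_buffer=CHUNK,
                 stream_callback=mic_callback)
stream.start_stream()

stt = SpeechToTextV1(authenticator=IAMAuthenticator("YOUR_API_KEY"))
stt.set_service_url("YOUR_SERVICE_URL")
stt.recognize_using_websocket(audio=audio_source,
                              content_type="audio/l16; rate=16000",
                              recognize_callback=AssistantCallback())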


We’re currently using basic microphone input on a laptop, but we’ll soon connect this to the ReSpeaker Dual Mic HAT we’ve integrated into the Raspberry Pi hardware.



What’s Next?


  • Finalize text-based intent triggering with typed queries and button input

  • Adapt the live transcription system to the USB microphone on the Raspberry Pi

  • Begin integrating text-to-speech (TTS) so the assistant can speak its responses aloud instead of only printing them

  • Expand the music API integration so that when a user types (or later, says) “play a song by Bruno Mars”, the system will fetch a list of Bruno Mars songs, allow the user to select from the suggestions, and then play the chosen track
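
We haven't locked in a music API yet, so here's a rough sketch of that search-then-select flow, using the public iTunes Search API as a stand-in; real playback would hook into our audio stack.

# Rough sketch of the "play a song by Bruno Mars" flow. The iTunes Search
# API is a stand-in here, not necessarily the API we'll ship with.
import requests

def search_songs(artist, limit=5):
    """Fetch up to `limit` songs matching the artist name."""
    resp = requests.get(
        "https://itunes.apple.com/search",
        params={"term": artist, "entity": "song", "limit": limit},
    )
    resp.raise_for_status()
    return resp.json()["results"]

songs = search_songs("Bruno Mars")
for i, song in enumerate(songs, 1):
    print(f"{i}. {song['trackName']}")          # show the suggestions

choice = int(input("Pick a track number: ")) - 1
print("Now playing:", songs[choice]["trackName"])
# Real playback would stream songs[choice]["previewUrl"] through our audio stack.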


This will bring us even closer to a fully interactive assistant that responds to natural language — both typed and spoken — in real time.



IBM SkillsBuild + Design Thinking


We’ve leaned on principles from the “Build Your Own AI Assistant” course to create and test our conversation flow. Watson's STT tools helped us build faster and more accurately — and we’re working in Agile cycles with continuous feedback built into each weekly sprint.



 
 
 
