By @ehiber and other contributors at 4Geeks Academy
These instructions are also available in Spanish.
Before you start: read the guide on how to start a coding project.
We need you! These exercises are built and maintained in collaboration with people like you. If you find any bug 🐞 or typo, please contribute and/or report it.
This repository is the starter template for the Voice Command API project.
The frontend is already built. It records up to 20 seconds of audio in the browser, sends that audio to your backend, and shows:
- the transcription returned by the API
- the final task response returned by the API
Your job is to implement the backend so the full voice-to-action flow works end to end.
The frontend uses a single public entry point:
POST /transcribe
The frontend does not resolve intents with the Web Speech API. It only captures audio (up to 20 seconds), sends the file to POST /transcribe, and shows the backend transcription to make debugging easier.
That endpoint must:
- receive recorded audio from the frontend
- transcribe it to text
- reuse the same routing logic as POST /instruction
- execute the corresponding task action in memory
- return the transcription, the instruction payload, and the final result
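The steps above can be sketched as a small pipeline. This is only an illustration, not the template's actual code: the function name run_voice_command and its three injected callables are hypothetical, and the stand-in lambdas replace the real speech-to-text call, the Groq routing call, and the task handlers. Passing the routing step in as a function is one way to make sure POST /transcribe and POST /instruction share the same logic.

```python
from typing import Any, Callable, Dict


def run_voice_command(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    resolve_instruction: Callable[[str], Dict[str, Any]],
    execute_instruction: Callable[[Dict[str, Any]], Any],
) -> Dict[str, Any]:
    """Chain the three stages and return the payload the frontend displays."""
    transcription = speech_to_text(audio)
    # The same callable the POST /instruction handler would use, so routing
    # behaves identically for typed text and for transcribed audio.
    instruction = resolve_instruction(transcription)
    result = execute_instruction(instruction)
    return {
        "transcription": transcription,
        "instruction": instruction,
        "result": result,
    }


# Dry run with stand-in stages (all three lambdas are placeholders).
demo = run_voice_command(
    b"fake-audio",
    speech_to_text=lambda _audio: "add buy groceries to my list",
    resolve_instruction=lambda _text: {
        "endpoint": "/tasks",
        "method": "POST",
        "params": {"title": "Buy groceries"},
    },
    execute_instruction=lambda instr: {
        "id": 1,
        "title": instr["params"]["title"],
        "done": False,
    },
)
```

Because each stage is a separate callable, a wrong transcription and a wrong routing decision show up in different fields of the returned dictionary.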
Your backend must also expose:
- POST /instruction
- GET /tasks
- POST /tasks
- PUT /tasks/{task_id}
- PATCH /tasks/{task_id}
- DELETE /tasks/{task_id}
Important:
- Use in-memory storage only. No database and no files.
- The frontend is provided and should not be modified as part of the exercise.
- The backend included in this repository is only a template. You must implement the missing logic.
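A minimal sketch of in-memory storage, assuming each task is a plain dictionary with id, title, and done fields and that an ID is never reused after a delete. The helper names (create_task, find_task, update_task, delete_task) are illustrative and not part of the template.

```python
from itertools import count
from typing import Any, Dict, List, Optional

# Module-level state: lives only as long as the server process.
tasks: List[Dict[str, Any]] = []
_next_id = count(1)  # monotonically increasing, so deleted IDs are not reused


def create_task(title: str) -> Dict[str, Any]:
    task = {"id": next(_next_id), "title": title, "done": False}
    tasks.append(task)
    return task


def find_task(task_id: int) -> Optional[Dict[str, Any]]:
    return next((t for t in tasks if t["id"] == task_id), None)


def update_task(task_id: int, **changes: Any) -> Optional[Dict[str, Any]]:
    """Apply only the known fields; return None when the task is missing."""
    task = find_task(task_id)
    if task is None:
        return None
    for key in ("title", "done"):
        if key in changes:
            task[key] = changes[key]
    return task


def delete_task(task_id: int) -> bool:
    task = find_task(task_id)
    if task is None:
        return False
    tasks.remove(task)
    return True
```

Using a counter instead of len(tasks) + 1 avoids handing out a duplicate ID after a task has been deleted.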
voice-command-api/
|-- .devcontainer/            # Codespaces setup
|-- frontend/                 # Ready-made frontend
|   |-- public/
|   `-- src/
|-- src/
|   `-- app/
|       |-- api/routes/       # /transcribe, /instruction, /tasks
|       |-- core/             # Settings and config
|       |-- schemas/          # Request and response contracts
|       |-- services/         # Your implementation goes here
|       `-- utils/
|-- Pipfile
|-- README.md
`-- README.es.md
You can open this project in GitHub Codespaces or clone it locally.
If you use Codespaces, the repository already includes a .devcontainer prepared for Python, Node, FastAPI, and Vite.
- Open the repository in Codespaces.
- Wait for the dev container to finish installing dependencies.
- Create .env from .env.example.
- Create frontend/.env from frontend/.env.example.
- Run the backend and frontend from the terminal tabs.
git clone https://github.com/4GeeksAcademy/voice-command-api
cd voice-command-api
Create your own repository and update the remote:
git remote set-url origin https://github.com/YOUR_USERNAME/YOUR_REPOSITORY
Create a .env file from .env.example and fill in your Groq credentials.
Install dependencies and run the API:
pipenv install
pipenv run uvicorn src.main:app --reload
Create frontend/.env from frontend/.env.example.
Run the frontend:
cd frontend
npm install
npm run dev
- Create a module-level tasks list with id, title, and done, using unique incremental IDs.
- Implement GET /tasks, POST /tasks, PUT /tasks/{task_id}, PATCH /tasks/{task_id}, and DELETE /tasks/{task_id} using in-memory state.
- Implement POST /instruction to receive { "transcription": "..." }, call Groq, and return only routing JSON (no task execution):
{
"endpoint": "/tasks",
"method": "POST",
"params": { "title": "Buy groceries" }
}
- Implement POST /transcribe to accept multipart/form-data, convert audio to text, reuse /instruction logic, execute the selected action, and return transcription, instruction, and result.
- Do not hardcode intent matching with manual rules such as if "add" in text.
The frontend only captures audio, sends it to POST /transcribe, and shows the backend transcription to help debug STT vs. routing issues.
{
"transcription": "add buy groceries to my list",
"instruction": {
"endpoint": "/tasks",
"method": "POST",
"params": {
"title": "Buy groceries"
}
},
"result": {
"id": 1,
"title": "Buy groceries",
"done": false
}
}
If the transcription shown in the frontend is already wrong, the problem is in the audio capture or speech-to-text step.
If the transcription is correct but the action is wrong, the problem is in /instruction.
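When the action is wrong, it helps to check the routing JSON before executing it. The sketch below shows one possible way to run an instruction like the one in the example response against in-memory tasks and to reject routes the API does not expose. It is an assumption-laden illustration: execute_instruction, TASK_PATH, and the error types are hypothetical names, and PUT is treated like PATCH for brevity, which a real implementation may want to tighten.

```python
import re
from itertools import count
from typing import Any, Dict, List

tasks: List[Dict[str, Any]] = []  # stand-in for the module-level task list
_next_id = count(1)
TASK_PATH = re.compile(r"^/tasks/(\d+)$")  # matches /tasks/{task_id}


def execute_instruction(instruction: Dict[str, Any]) -> Any:
    """Run a routing instruction of the form {endpoint, method, params}."""
    method = str(instruction.get("method", "")).upper()
    endpoint = str(instruction.get("endpoint", ""))
    params = instruction.get("params") or {}

    if endpoint == "/tasks" and method == "GET":
        return tasks
    if endpoint == "/tasks" and method == "POST":
        task = {"id": next(_next_id), "title": params["title"], "done": False}
        tasks.append(task)
        return task

    match = TASK_PATH.match(endpoint)
    if match and method in ("PUT", "PATCH", "DELETE"):
        task_id = int(match.group(1))
        task = next((t for t in tasks if t["id"] == task_id), None)
        if task is None:
            raise LookupError(f"task {task_id} not found")
        if method == "DELETE":
            tasks.remove(task)
            return task
        # Simplification: PUT and PATCH both update only the fields provided.
        for key in ("title", "done"):
            if key in params:
                task[key] = params[key]
        return task

    # Anything else means the router produced a route the API does not have.
    raise ValueError(f"unsupported route: {method} {endpoint}")
```

A ValueError here points at the routing step rather than at speech-to-text, which matches the debugging split described above.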
- POST /transcribe accepts audio, transcribes it, and reuses /instruction routing logic.
- POST /instruction receives plain text and returns only routing JSON (no action execution).
- GET /tasks, POST /tasks, PUT /tasks/{task_id}, PATCH /tasks/{task_id}, and DELETE /tasks/{task_id} work correctly with in-memory state.
- The frontend displays the transcription returned by the backend to help distinguish STT errors from routing errors.
- Push your solution to your GitHub repository.
- Make sure backend and frontend are included and runnable locally.
- Share the repository URL and a short video/GIF showing:
- audio recording (20 seconds max),
- transcription visible in the frontend,
- correct task action execution.
This and many other projects are built by students as part of the Coding Bootcamps at 4Geeks Academy. Learn more about the Full-Stack Software Developer, Data Science & Machine Learning, Cybersecurity, and AI Engineering programs.