Voice Command API at 4Geeks Academy

By @ehiber and other contributors at 4Geeks Academy

These instructions are also available in Spanish.

Before you start: Read the "How to start a coding project" guide before writing any code.

We need you! These exercises are built and maintained in collaboration with people like you. If you find any bug 🐞 or typo, please contribute and/or report it.

🎯 Your challenge

This repository is the starter template for the Voice Command API project.

The frontend is already built. It records up to 20 seconds of audio in the browser, sends that audio to your backend, and shows:

- the transcription returned by the API
- the final task response returned by the API

Your job is to implement the backend so the full voice-to-action flow works end to end.

How the project works

The frontend uses a single public entry point:

POST /transcribe

The frontend does not resolve intents with the Web Speech API. It only captures audio (up to 20 seconds), sends the file to POST /transcribe, and shows the backend transcription to make debugging easier.

That endpoint must:

- receive recorded audio from the frontend
- transcribe it to text
- reuse the same routing logic as POST /instruction
- execute the corresponding task action in memory
- return the transcription, the instruction payload, and the final result
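
The steps above can be sketched as one function that chains three stages. This is only an illustration of the flow, not the template's actual code: `handle_transcribe` and the three `fake_*` stages are hypothetical names, and the real stages would call your speech-to-text provider, your /instruction logic, and your task store.

```python
from typing import Any, Callable, Dict

Instruction = Dict[str, Any]


def handle_transcribe(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    route: Callable[[str], Instruction],
    execute: Callable[[Instruction], Any],
) -> Dict[str, Any]:
    """Run the three stages in order and return the combined payload."""
    text = transcribe(audio)        # speech-to-text
    instruction = route(text)       # same logic POST /instruction uses
    result = execute(instruction)   # apply the action to in-memory state
    return {"transcription": text, "instruction": instruction, "result": result}


# Stand-in stages (hypothetical) so the sketch runs without audio or network.
def fake_transcribe(audio: bytes) -> str:
    return "add buy groceries to my list"


def fake_route(text: str) -> Instruction:
    return {"endpoint": "/tasks", "method": "POST", "params": {"title": "Buy groceries"}}


def fake_execute(instruction: Instruction) -> Dict[str, Any]:
    return {"id": 1, "title": instruction["params"]["title"], "done": False}


response = handle_transcribe(b"\x00", fake_transcribe, fake_route, fake_execute)
```

Passing the stages in as arguments keeps the routing step shared between /transcribe and /instruction, and makes each stage easy to replace with a stub while debugging.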

Your backend must also expose:

- POST /instruction
- GET /tasks
- POST /tasks
- PUT /tasks/{task_id}
- PATCH /tasks/{task_id}
- DELETE /tasks/{task_id}
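
Because the routing JSON comes from a language model, it may help to check it against the task endpoints listed above before executing anything. The following is a minimal sketch of that check; `ALLOWED_ROUTES` and `match_route` are hypothetical names, and it assumes task IDs are integers embedded in the path.

```python
import re
from typing import Dict, Optional, Tuple

# Method + path template pairs for the task endpoints listed above.
ALLOWED_ROUTES = [
    ("GET", "/tasks"),
    ("POST", "/tasks"),
    ("PUT", "/tasks/{task_id}"),
    ("PATCH", "/tasks/{task_id}"),
    ("DELETE", "/tasks/{task_id}"),
]


def _template_to_regex(template: str) -> "re.Pattern[str]":
    # Turn "/tasks/{task_id}" into a regex with a named integer group.
    pattern = re.sub(r"\{(\w+)\}", lambda m: f"(?P<{m.group(1)}>\\d+)", template)
    return re.compile(f"^{pattern}$")


def match_route(method: str, endpoint: str) -> Optional[Tuple[str, Dict[str, int]]]:
    """Return (template, path_params) if the pair is allowed, else None."""
    for allowed_method, template in ALLOWED_ROUTES:
        if method.upper() != allowed_method:
            continue
        found = _template_to_regex(template).match(endpoint)
        if found:
            return template, {k: int(v) for k, v in found.groupdict().items()}
    return None
```

A call such as `match_route("PATCH", "/tasks/3")` yields the matching template plus the extracted `task_id`, while an unknown pair yields `None` so the caller can return an error instead of guessing.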

Important:

- Use in-memory storage only. No database and no files.
- The frontend is provided and should not be modified as part of the exercise.
- The backend included in this repository is only a template. You must implement the missing logic.

Repository structure

voice-command-api/
|-- .devcontainer/           # Codespaces setup
|-- frontend/                # Ready-made frontend
|   |-- public/
|   `-- src/
|-- src/
|   `-- app/
|       |-- api/routes/      # /transcribe, /instruction, /tasks
|       |-- core/            # Settings and config
|       |-- schemas/         # Request and response contracts
|       |-- services/        # Your implementation goes here
|       `-- utils/
|-- Pipfile
|-- README.md
`-- README.es.md

🌱 How to start the project

You can open this project in GitHub Codespaces or clone it locally.

If you use Codespaces, the repository already includes a .devcontainer prepared for Python, Node, FastAPI, and Vite.

Option A: GitHub Codespaces

- Open the repository in Codespaces.
- Wait for the dev container to finish installing dependencies.
- Create .env from .env.example.
- Create frontend/.env from frontend/.env.example.
- Run the backend and frontend from the terminal tabs.

Option B: Local setup

git clone https://github.com/4GeeksAcademy/voice-command-api
cd voice-command-api

Create your own repository and update the remote:

git remote set-url origin https://github.com/YOUR_USERNAME/YOUR_REPOSITORY

Backend setup

Create a .env file from .env.example and fill in your Groq credentials.

Install dependencies and run the API:

pipenv install
pipenv run uvicorn src.main:app --reload

Frontend setup

Create frontend/.env from frontend/.env.example.

Run the frontend:

cd frontend
npm install
npm run dev

💻 What you need to do

- Create a module-level tasks list whose items have id, title, and done, using unique incremental IDs.
- Implement GET /tasks, POST /tasks, PUT /tasks/{task_id}, PATCH /tasks/{task_id}, and DELETE /tasks/{task_id} using in-memory state.
- Implement POST /instruction to receive { "transcription": "..." }, call Groq, and return only routing JSON (no task execution):

{
  "endpoint": "/tasks",
  "method": "POST",
  "params": { "title": "Buy groceries" }
}

- Implement POST /transcribe to accept multipart/form-data, convert audio to text, reuse /instruction logic, execute the selected action, and return transcription, instruction, and result.
- Do not hardcode intent matching with manual rules such as if "add" in text.
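
As a starting point for the in-memory state, here is a minimal sketch of a module-level list with incremental IDs. The helper names (`create_task`, `find_task`, `replace_task`, `patch_task`, `delete_task`) are hypothetical, and the sketch assumes the usual HTTP convention that PUT supplies every editable field while PATCH changes only the fields it receives; the exercise itself does not spell that out.

```python
from itertools import count
from typing import Dict, Optional, List

tasks: List[Dict] = []   # module-level, lives only while the process runs
_next_id = count(1)      # IDs are never reused, even after a delete


def create_task(title: str) -> Dict:
    task = {"id": next(_next_id), "title": title, "done": False}
    tasks.append(task)
    return task


def find_task(task_id: int) -> Optional[Dict]:
    return next((t for t in tasks if t["id"] == task_id), None)


def replace_task(task_id: int, title: str, done: bool) -> Optional[Dict]:
    """PUT-style update: the caller supplies every editable field."""
    task = find_task(task_id)
    if task is not None:
        task.update(title=title, done=done)
    return task


def patch_task(task_id: int, **changes) -> Optional[Dict]:
    """PATCH-style update: only the supplied fields change."""
    task = find_task(task_id)
    if task is not None:
        task.update({k: v for k, v in changes.items() if k in ("title", "done")})
    return task


def delete_task(task_id: int) -> bool:
    task = find_task(task_id)
    if task is None:
        return False
    tasks.remove(task)
    return True
```

Each helper returns `None` or `False` when the ID is unknown, which a route handler could translate into a 404 response.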

⚠️ IMPORTANT: The frontend does not resolve intents with the Web Speech API. It only captures audio (up to 20 seconds), sends it to POST /transcribe, and shows the backend transcription so you can tell speech-to-text (STT) errors apart from routing errors.

A successful POST /transcribe response has this shape:

{
  "transcription": "add buy groceries to my list",
  "instruction": {
    "endpoint": "/tasks",
    "method": "POST",
    "params": {
      "title": "Buy groceries"
    }
  },
  "result": {
    "id": 1,
    "title": "Buy groceries",
    "done": false
  }
}
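
To show how the instruction part of a response like this could become the result part, here is a minimal sketch of a dispatcher that applies a routing payload to an in-memory list. `execute_instruction` is a hypothetical name. The sketch assumes the task ID arrives inside the endpoint (for example /tasks/1), which the example above does not confirm, and it treats PUT like PATCH for brevity.

```python
from typing import Any, Dict, List

tasks: List[Dict[str, Any]] = []
_last_id = 0


def execute_instruction(instruction: Dict[str, Any]) -> Any:
    """Apply a routing payload to the in-memory list and return the result."""
    global _last_id
    method = instruction["method"].upper()
    endpoint = instruction["endpoint"].rstrip("/")
    params = instruction.get("params") or {}

    # Collection routes: list or create.
    if endpoint == "/tasks":
        if method == "GET":
            return tasks
        if method == "POST":
            _last_id += 1
            task = {"id": _last_id, "title": params["title"], "done": False}
            tasks.append(task)
            return task
        raise ValueError(f"Unsupported method for /tasks: {method}")

    # Item routes: the last path segment must be a numeric task ID.
    prefix, _, raw_id = endpoint.rpartition("/")
    if prefix != "/tasks" or not raw_id.isdigit():
        raise ValueError(f"Unknown endpoint: {endpoint}")
    task = next((t for t in tasks if t["id"] == int(raw_id)), None)
    if task is None:
        raise LookupError(f"Task {raw_id} not found")

    if method in ("PUT", "PATCH"):
        task.update({k: v for k, v in params.items() if k in ("title", "done")})
        return task
    if method == "DELETE":
        tasks.remove(task)
        return task
    raise ValueError(f"Unsupported method for {endpoint}: {method}")
```

Note that the dispatcher branches on the structured endpoint and method fields, not on words in the transcription, so it stays within the rule against hardcoded intent matching.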

Debugging tip

If the transcription shown in the frontend is already wrong, the problem is in the audio capture or speech-to-text step.

If the transcription is correct but the action is wrong, the problem is in /instruction.

✅ What we will evaluate

- POST /transcribe accepts audio, transcribes it, and reuses /instruction routing logic.
- POST /instruction receives plain text and returns only routing JSON (no action execution).
- GET /tasks, POST /tasks, PUT /tasks/{task_id}, PATCH /tasks/{task_id}, and DELETE /tasks/{task_id} work correctly with in-memory state.
- The frontend displays the transcription returned by the backend to help distinguish STT errors from routing errors.

📦 How to submit this project

- Push your solution to your GitHub repository.
- Make sure backend and frontend are included and runnable locally.
- Share the repository URL and a short video/GIF showing:
  - audio recording (20 seconds max),
  - transcription visible in the frontend,
  - correct task action execution.

This and many other projects are built by students as part of the Coding Bootcamps at 4Geeks Academy. Learn more about the Full-Stack Software Developer, Data Science & Machine Learning, Cybersecurity, and AI Engineering programs.
