64 commits
5aad042
fix: merge conflict
Prajna1999 Jan 20, 2026
f47c20c
chore: update dependencies
Prajna1999 Jan 20, 2026
1e03961
feat: add google ai provider for Gemini models
Prajna1999 Jan 20, 2026
4ac4de8
feat: working stt with gemini and hotfixing circular import
Prajna1999 Jan 21, 2026
3c0bae7
Merge branch 'main' into feature/unified-api-stt-new
Prajna1999 Jan 21, 2026
7db94f1
Merge branch 'main' into feature/unified-api-stt-new
Prajna1999 Jan 21, 2026
196eb5c
feat: llm_call table, type enforce gAI stt response
Prajna1999 Jan 21, 2026
dca3139
Merge remote-tracking branch 'refs/remotes/origin/feature/unified-api…
Prajna1999 Jan 21, 2026
271d677
feat: discriminated union type enforcing for stt, tts and text comple…
Prajna1999 Jan 22, 2026
5ae59e5
Merge branch 'main' into feature/unified-api-stt-new
Prajna1999 Jan 22, 2026
250ce9f
fix: type annotation
Prajna1999 Jan 22, 2026
1742a8b
chore: fix alembic revision for shure
Prajna1999 Jan 23, 2026
ebb2394
feat: add google stt task to async job
Prajna1999 Jan 23, 2026
0bcb697
feat: yolo commit and linting issues
Prajna1999 Jan 26, 2026
a6850a3
feat: query input takes audio_url and base64 as audio file input
Prajna1999 Jan 26, 2026
f4693f6
chore: test cases for google ai and async job fixes, supress mappers …
Prajna1999 Jan 26, 2026
909e249
fix: test cases for config
Prajna1999 Jan 27, 2026
a7b0062
chore: clean PLAN.md
Prajna1999 Jan 27, 2026
fa25199
chore: extract stt code into its own
Prajna1999 Jan 28, 2026
24007a2
Merge branch 'main' into feature/unified-api-stt-new
Prajna1999 Jan 29, 2026
bbd2c7f
Refactor evaluation endpoint to use stored configuration and remove a…
avirajsingh7 Dec 9, 2025
b907440
fix: default original provider bug
Prajna1999 Jan 30, 2026
9f38f45
fix: coderrabbit comments
Prajna1999 Jan 31, 2026
f6348b5
Merge branch 'main' into feature/unified-api-stt-new
Prajna1999 Jan 31, 2026
b3ea8ec
fix: migration number
Prajna1999 Jan 31, 2026
26e0a6a
chore: formatting issue solved
Prajna1999 Jan 31, 2026
a623efa
fix: eval core crud test cases
Prajna1999 Jan 31, 2026
5c86cf2
fix: test cases for evaluation and test_llm
Prajna1999 Feb 1, 2026
c8f165a
chore: test formatting reset to main
Prajna1999 Feb 3, 2026
19a6ef7
chore: fix formatting issues
Prajna1999 Feb 3, 2026
9bf057b
chore: squash llm_call table migration to sno.43
Prajna1999 Feb 3, 2026
665102e
chore: change SQL model signature from ConfigVersionCreatePartial to…
Prajna1999 Feb 3, 2026
325ff4d
fix: remove extra imports and add util functions
Prajna1999 Feb 4, 2026
237dd97
fix: change llm_call input type and other changes
Prajna1999 Feb 5, 2026
df920d2
Merge branch 'main' into feature/unified-api-stt-new
Prajna1999 Feb 7, 2026
354a0fc
fix: alembic version for llm_call table
Prajna1999 Feb 7, 2026
37ac37f
fix: test cases llm_call and jobs
Prajna1999 Feb 8, 2026
9b6a829
Merge branch 'main' into feature/unified-api-stt-new
Prajna1999 Feb 9, 2026
321c41c
Merge branch 'main' into feature/unified-api-stt-new
Prajna1999 Feb 11, 2026
e9d60e6
chore: variable reference name change and enforced type safety for re…
Prajna1999 Feb 11, 2026
1943d3d
chore: remove ad hoc testing code
Prajna1999 Feb 12, 2026
7a5d8a1
feat: basic tts implementation with gemini-2.5-pro-preview-tts
Prajna1999 Feb 2, 2026
cd0da46
refactor: use pydub for wav to ogg, mp3 conversion
Prajna1999 Feb 7, 2026
379a132
feat: add tts config fields to mappers function
Prajna1999 Feb 7, 2026
c9be67a
chore: fix test cases
Prajna1999 Feb 8, 2026
3fb6ce6
refactor: fix version crud naming
Prajna1999 Feb 12, 2026
ee3dd60
chore: fix test cases
Prajna1999 Feb 12, 2026
75cccf6
feat: basic tts implementation with gemini-2.5-pro-preview-tts
Prajna1999 Feb 2, 2026
243e6f1
refactor: use pydub for wav to ogg, mp3 conversion
Prajna1999 Feb 7, 2026
cc3fa95
feat: add tts config fields to mappers function
Prajna1999 Feb 7, 2026
f54c1a9
chore: fix test cases
Prajna1999 Feb 8, 2026
c044e66
Merge remote-tracking branch 'refs/remotes/origin/feature/unified-api…
Prajna1999 Feb 16, 2026
065ac74
Merge branch 'main' into feature/unified-api-tts
Prajna1999 Feb 16, 2026
1d953e6
Merge remote-tracking branch 'refs/remotes/origin/feature/unified-api…
Prajna1999 Feb 16, 2026
1f93b0e
fix: comments
Prajna1999 Feb 17, 2026
afe349f
Merge branch 'main' into feature/unified-api-tts
Prajna1999 Feb 19, 2026
2f1e32d
chore: remove unsued imports
Prajna1999 Feb 19, 2026
6553e78
reafactor: refactor execute_job function and resolved other comments
Prajna1999 Feb 19, 2026
9fffc81
fix: test_job test case
Prajna1999 Feb 19, 2026
3a2d625
Merge branch 'main' into feature/unified-api-tts
Prajna1999 Feb 19, 2026
1999cbf
chore: test cases coverage and cleanups
Prajna1999 Feb 20, 2026
3e0f069
Merge remote-tracking branch 'refs/remotes/origin/feature/unified-api…
Prajna1999 Feb 20, 2026
8d9ae7d
fix_name error
Prajna1999 Feb 20, 2026
6245e8e
fix: test cases
Prajna1999 Feb 20, 2026
1 change: 1 addition & 0 deletions backend/Dockerfile
@@ -13,6 +13,7 @@ WORKDIR /app/
RUN apt-get update && apt-get install -y \
curl \
poppler-utils \
ffmpeg \
&& rm -rf /var/lib/apt/lists/*

# Install uv package manager
45 changes: 45 additions & 0 deletions backend/app/core/audio_utils.py
@@ -0,0 +1,45 @@
"""
Audio processing utilities for format conversion.

This module provides utilities for converting audio between different formats,
particularly for TTS output post-processing.
"""
import io
import logging
from pydub import AudioSegment


logger = logging.getLogger(__name__)


def convert_pcm_to_mp3(
    pcm_bytes: bytes, sample_rate: int = 24000
) -> tuple[bytes | None, str | None]:
    try:
        audio = AudioSegment(
            data=pcm_bytes, sample_width=2, frame_rate=sample_rate, channels=1
        )

        output_buffer = io.BytesIO()
        audio.export(output_buffer, format="mp3", bitrate="192k")
        return output_buffer.getvalue(), None
    except Exception as e:
        return None, str(e)


def convert_pcm_to_ogg(
    pcm_bytes: bytes, sample_rate: int = 24000
) -> tuple[bytes | None, str | None]:
    """Convert raw PCM to OGG with Opus codec."""
    try:
        audio = AudioSegment(
            data=pcm_bytes, sample_width=2, frame_rate=sample_rate, channels=1
        )

        output_buffer = io.BytesIO()
        audio.export(
            output_buffer, format="ogg", codec="libopus", parameters=["-b:a", "64k"]
        )
        return output_buffer.getvalue(), None
    except Exception as e:
        return None, str(e)
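Both converters above take raw PCM described as `sample_width=2`, `channels=1`, and a default `sample_rate` of 24000. The sketch below is not part of this PR. It is a minimal, standard-library-only illustration of what input in that shape looks like and how its duration follows from its byte length. The helper names `make_pcm_tone` and `pcm_duration_seconds` are hypothetical, and little-endian sample order is an assumption.

```python
import math
import struct


def make_pcm_tone(
    duration_s: float = 0.5, freq_hz: float = 440.0, sample_rate: int = 24000
) -> bytes:
    """Build raw 16-bit signed mono PCM for a sine tone (hypothetical test helper).

    Little-endian byte order is assumed here; it is not stated in the diff.
    """
    n_samples = int(duration_s * sample_rate)
    samples = (
        # Half of full scale (32767) to stay clear of clipping.
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    )
    return struct.pack(f"<{n_samples}h", *samples)


def pcm_duration_seconds(
    pcm_bytes: bytes, sample_rate: int = 24000, sample_width: int = 2, channels: int = 1
) -> float:
    """Duration implied by the byte length for the given PCM layout."""
    return len(pcm_bytes) / (sample_rate * sample_width * channels)


pcm = make_pcm_tone(0.5)
print(len(pcm), pcm_duration_seconds(pcm))
```

Bytes built this way could then be handed to `convert_pcm_to_mp3(pcm)` or `convert_pcm_to_ogg(pcm)`. Because both return a `(data, error)` tuple instead of raising, a caller would check the second element before using the first.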
40 changes: 25 additions & 15 deletions backend/app/crud/llm.py
@@ -1,18 +1,10 @@
"""
CRUD operations for LLM calls.

This module handles database operations for LLM calls including:
1. Creating new LLM call records
2. Updating LLM call responses
3. Fetching LLM calls by ID
"""

import logging
from typing import Any, Literal

from uuid import UUID
from sqlmodel import Session, select
from app.core.util import now
import base64
import json
from app.models.llm import LlmCall, LLMCallRequest, ConfigBlob
from app.models.llm.request import (
@@ -41,7 +33,8 @@ def serialize_input(query_input: QueryInput | str) -> str:
"type": "audio",
"format": query_input.content.format,
"mime_type": query_input.content.mime_type,
"size_bytes": len(query_input.content.value),
# approximate byte size from b64encoded value
"size_bytes": len(query_input.content.value) * 3 // 4,
}
)
else:
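The hunk above changes `size_bytes` from the length of the base64 string to `len(value) * 3 // 4`, which the comment calls approximate. The sketch below is not code from this PR. It shows one way the exact decoded size could be computed without decoding, assuming canonical padded base64 with no embedded newlines. The function name `decoded_size_from_b64` is hypothetical.

```python
import base64


def decoded_size_from_b64(value: str) -> int:
    """Exact decoded byte count for canonical, padded base64 (hypothetical helper).

    Assumes the string has no embedded whitespace and keeps its '=' padding.
    """
    stripped = value.strip()
    # Each trailing '=' stands for one byte that is absent from the final group.
    padding = len(stripped) - len(stripped.rstrip("="))
    return len(stripped) * 3 // 4 - padding


payload = b"abcd"  # 4 bytes encode to 8 characters, 2 of them padding
encoded = base64.b64encode(payload).decode("ascii")
approx = len(encoded) * 3 // 4
exact = decoded_size_from_b64(encoded)
print(encoded, approx, exact)
```

Under these assumptions the approximation in the diff overestimates by at most two bytes, which is likely fine for logging metadata. The exact form only matters if the stored size is later compared against the decoded payload.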
@@ -74,8 +67,10 @@ def create_llm_call(
"""
# Determine input/output types based on completion config type
completion_config = resolved_config.completion
completion_type = completion_config.type or getattr(
completion_config.params, "type", "text"
completion_type = completion_config.type or (
completion_config.params.get("type", "text")
if isinstance(completion_config.params, dict)
else getattr(completion_config.params, "type", "text")
)

input_type: Literal["text", "audio", "image"]
@@ -92,9 +87,9 @@ def create_llm_call(
output_type = "text"

model = (
completion_config.params.model
if hasattr(completion_config.params, "model")
else completion_config.params.get("model", "")
completion_config.params.get("model", "")
if isinstance(completion_config.params, dict)
else getattr(completion_config.params, "model", "")
)

# Build config dict for storage
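The two hunks above apply the same pattern twice: read a key with `.get()` when `params` is a dict, otherwise fall back to `getattr()` with a default. The sketch below is not part of this PR. It shows how that pattern could be factored into one helper. The name `get_param` is hypothetical, and `SimpleNamespace` stands in for a params model object.

```python
from types import SimpleNamespace
from typing import Any


def get_param(params: Any, name: str, default: Any = None) -> Any:
    """Read `name` from either a dict or an attribute-style object (hypothetical helper)."""
    if isinstance(params, dict):
        return params.get(name, default)
    return getattr(params, name, default)


dict_params = {"model": "gemini-2.5-pro-preview-tts", "type": "tts"}
obj_params = SimpleNamespace(model="gemini-2.5-pro-preview-tts")

print(get_param(dict_params, "type", "text"))  # key found in the dict
print(get_param(obj_params, "type", "text"))   # attribute missing, default used
print(get_param(obj_params, "model", ""))
```

This makes the reason for the change visible. In the removed code, `getattr(params, "type", "text")` on a plain dict always yields the default, because dict keys are not attributes. Likewise, the removed `hasattr(...)` branch for `model` would call `.get()` on an object that has no such method.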
@@ -174,8 +169,23 @@ def update_llm_call_response(

if provider_response_id is not None:
db_llm_call.provider_response_id = provider_response_id

if content is not None:
# For audio outputs (AudioOutput model): calculate size metadata from base64 content
# AudioOutput serializes as: {"type": "audio", "content": {"format": "base64", "value": "...", "mime_type": "..."}}
if content.get("type") == "audio":
audio_value = content.get("content", {}).get("value")
if audio_value:
try:
audio_data = base64.b64decode(audio_value)
content["audio_size_bytes"] = len(audio_data)
except Exception as e:
logger.warning(
f"[update_llm_call_response] Failed to calculate audio size: {e}"
)

db_llm_call.content = content

if usage is not None:
db_llm_call.usage = usage
if conversation_id is not None:
22 changes: 12 additions & 10 deletions backend/app/models/llm/request.py
@@ -1,14 +1,11 @@
import sqlalchemy as sa
from typing import Annotated, Any, Literal, Union

from uuid import UUID, uuid4
from sqlmodel import Field, SQLModel
from pydantic import Discriminator, model_validator, HttpUrl
from pydantic import model_validator, HttpUrl
from datetime import datetime
from app.core.util import now

import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import JSONB
from sqlmodel import Field, SQLModel, Index, text
from app.core.util import now


class TextLLMParams(SQLModel):
@@ -70,8 +67,8 @@ class TextContent(SQLModel):

class AudioContent(SQLModel):
format: Literal["base64"] = "base64"
value: str = Field(..., min_length=1, description="Base64 encoded audio")
# keeping the mime_type liberal here, since does not affect transcription type
value: str = Field(..., description="Base64 encoded audio")
# keeping the mime_type liberal here, since does not affect base64 encoding
mime_type: str | None = Field(
None,
description="MIME type of the audio (e.g., audio/wav, audio/mp3, audio/ogg)",
@@ -487,8 +484,13 @@ class LlmCall(SQLModel, table=True):

updated_at: datetime = Field(
default_factory=now,
nullable=False,
sa_column_kwargs={"comment": "Timestamp when the LLM call was last updated"},
sa_column=sa.Column(
sa.DateTime,
default=now,
nullable=False,
onupdate=now,
comment="Timestamp when the LLM call was last updated",
),
)

deleted_at: datetime | None = Field(
5 changes: 2 additions & 3 deletions backend/app/models/llm/response.py
@@ -3,7 +3,6 @@

This module contains structured response models for LLM API calls.
"""

from sqlmodel import SQLModel, Field
from typing import Literal, Annotated
from app.models.llm.request import AudioContent, TextContent
@@ -27,7 +26,7 @@ class AudioOutput(SQLModel):


# Type alias for LLM output (discriminated union)
LLMOutput = Annotated[TextOutput | AudioOutput | None, Field(discriminator="type")]
LLMOutput = Annotated[TextOutput | AudioOutput, Field(discriminator="type")]


class LLMResponse(SQLModel):
@@ -45,7 +44,7 @@ class LLMResponse(SQLModel):
model: str = Field(
..., description="Model used by the provider (e.g., gpt-4-turbo)."
)
output: LLMOutput = Field(
output: LLMOutput | None = Field(
...,
description="Structured output containing text and optional additional data.",
)
87 changes: 0 additions & 87 deletions backend/app/services/llm/input_resolver.py

This file was deleted.
