This repository is part of the RAGNalysis project to create a RAG-based search engine for corporate data (essentially a private chatGPT).
- LLM Backend: This layer provides the pretrained LLM models for embedding and inference. Three different Azure resources are used for experimentation purposes, but in production only a single provider and model will be used.
- ML Studio: Provides easily deployable endpoints for common LLM models. Most endpoints are real-time and operate 24/7, so these are the most inflexible and expensive models to host. Alternatively, custom models can be deployed via PromptFlow through ML Studio.
- AI Studio: Provides pay-as-you-go endpoints for the far more powerful GPT models that cannot be self-hosted due to their size.
- Containerized App: Provides very flexible access to models via GGUF files from HuggingFace, but takes longer to set up than ML Studio models and serves weaker models than AI Studio.
- RAG Engine: This is the 'heart' that connects to all the infrastructure. It connects to the LLM Backend layer based on the user-specified model to generate an augmented response. It also provides the public endpoints that end users can call.
- Azure Function App: This is an Azure resource specialized for event-driven microservices with short processes. Two alternatives are Azure Container Apps (for standalone applications) and Azure Web App (for websites).
- Accessory Services: A centralized storage for the model files, dataset, and FAISS index.
Refer to the demo notebook for data models and usage. This code is also present in the MS Fabric environment.

`experiment.ipynb`: Demo of the inference API via the `RagnalysisClient` class that wraps around the API.
Running this notebook requires an API key that must be defined in a `.env` file at the project root directory. Create a `.env` file and paste the API key in the following field:

```
FUNCTION_KEY=
```
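For reference, a minimal stdlib-only sketch of reading this key (the `load_env` helper below is illustrative, not part of the repo; the `RagnalysisClient` in `api/app.py` handles this for you):

```python
# Illustrative helper (not part of the repo): parse KEY=VALUE pairs from
# a .env file so FUNCTION_KEY can be passed to the API wrapper.
from pathlib import Path

def load_env(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

# function_key = load_env()["FUNCTION_KEY"]
```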
This repository contains both the Function App code (`app/`) and the user-facing experiment code. An overview of the files:

| File | Purpose |
|---|---|
| `app/.funcignore` | Like `.gitignore` but for excluding files that get pushed to the Function App |
| `app/function_app.py` | API routing for the function app. Routes requests to `backend.py` for processing |
| `app/backend.py` | Processor for the function app. Does the computations and accesses the necessary storage |
| `app/utils.py` | Utility functions to help process requests in `backend.py` |
| `app/host.json` | Configuration for the function app (e.g. bindings to other Azure services) |
| `app/local.settings.json` | (must be manually created) Like a `.env` file. Provides access to private keys that exist in the Function App environment |
| `.github/main_med-app.yml` | Action for building and deploying the Function App to Azure |
| `.github/unit_tests.yml` | Action for running the unit tests in `tests/` |
| `api/app.py` | Wrapper class to simplify usage of the Function App API |
| `setup.py` | Project metadata |
| `experiment.ipynb` | Demo code for using the repo to access the Function App |
Note: For pushing directly from VSCode to the Azure Function App, you will need VSCode v1.87.1. This is not the version on Company Portal.
For setting up the project for development:

- Clone the repository
- Add a `.env` file in the project root directory (for using the experiment notebook) with the below template. Please contact a developer for the API key. Alternatively, you could try to find them through Azure:
  - Function Key (`FUNCTION_KEY`): This is the Azure Functions App key that is found in the `App Keys` section of the Function App

  ```
  FUNCTION_KEY=
  ```

- Create a `local.settings.json` file in the `app` directory. The purpose of this file is to simulate the Azure environment by defining environment variables that are only accessible in the Azure environment. Contact a developer for these values:

  ```json
  {
      "IsEncrypted": false,
      "Values": {
          "AzureWebJobsStorage": "",
          "FUNCTIONS_WORKER_RUNTIME": "python",
          "AzureWebJobsFeatureFlags": "EnableWorkerIndexing",
          "BlobStorageConnectionString": "",
          "OPENAI_KEY": "",
          "CONNECTION_STRING": "",
          "STORAGE_KEY": "",
          "STORAGE_URL": "",
          "STORAGE_CONTAINER": "app-data",
          "MISTRAL_MODEL": "mistralai-mistral-7b-instruct-5",
          "MISTRAL_ENDPOINT": "jludq",
          "MISTRAL_KEY": "",
          "LLAMA_MODEL": "llama-2-7b-chat-18",
          "LLAMA_ENDPOINT": "ebbtk",
          "LLAMA_KEY": ""
      }
  }
  ```

- Install the 'Azure Functions App' extension and sign into Azure via this extension
- Use the extension to locally test the endpoint. This creates endpoints on localhost that you can make `GET` requests to via the URL (or Python code). You may have to:
  - re-open VSCode with the `app` directory as the root directory
  - install the Azure Functions Core Tools through the website or through `npm`. The extension is a wrapper around this toolkit; you can use the toolkit itself to start a local server via `func start` in the `app/` directory
  - manually create a `.venv` in the `app` directory
- Use the extension to publish the code to the Azure Function App OR push to GitHub, get it into the main branch, and run the GitHub Actions workflow to update the Azure Function App
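Once the local server is running, the endpoints can be hit with plain Python. A sketch using only the standard library (the route name and parameters below are placeholders; see `app/function_app.py` for the real routes, and note that `func start` serves on port 7071 by default):

```python
# Sketch of calling a locally running Function App endpoint. The route
# name and parameters are placeholders; see app/function_app.py for the
# real routes. func start serves on http://localhost:7071 by default.
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:7071/api"

def build_url(route: str, params: dict) -> str:
    """Assemble the local endpoint URL with URL-encoded query params."""
    return f"{BASE}/{route}?{urllib.parse.urlencode(params)}"

def query_local(route: str, params: dict) -> dict:
    """Send a GET request to the local server and decode the JSON reply."""
    with urllib.request.urlopen(build_url(route, params)) as resp:
        return json.loads(resp.read())

# query_local("ask", {"question": "What is RAG?", "model": "mistral"})
```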
Unit tests are defined in the `tests/` directory. They call a mocked version of the local function app, which does not require a server to be running. However, this requires a `temp.txt` file be defined in the project root directory that contains all the keys in `local.settings.json`:

```
AzureWebJobsStorage=
FUNCTIONS_WORKER_RUNTIME=
AzureWebJobsFeatureFlags=
BlobStorageConnectionString=
OPENAI_KEY=
FUNCTION_KEY=
CONNECTION_STRING=
STORAGE_KEY=
STORAGE_URL=
STORAGE_CONTAINER=
MISTRAL_MODEL=
MISTRAL_ENDPOINT=
MISTRAL_KEY=
LLAMA_MODEL=
LLAMA_ENDPOINT=
LLAMA_KEY=
```
The reason for this file is that we want to run the function app in CI/CD for unit testing, but we can't push `.env` files to GitHub. Therefore, the private keys live in GitHub secrets, and a `temp.txt` file is produced at runtime by the `unit_tests.yml` GitHub Actions workflow. The `temp.txt` file exists locally for local testing purposes but is not pushed to GitHub. Ultimately, both `temp.txt` and `local.settings.json` accomplish the same task of populating environment variables in `os.environ` to emulate keys that exist in Azure, but `temp.txt` is for local unit testing and `local.settings.json` is for local deployment.
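As an illustration (this is not the actual test fixture, whose details live in `tests/`), the effect of `temp.txt` can be reproduced with a few lines that push each `KEY=VALUE` line into `os.environ` before the function app is imported:

```python
# Illustrative only: load KEY=VALUE lines from temp.txt into os.environ
# so the mocked function app sees the same variables it would in Azure.
import os
from pathlib import Path

def load_keys(path: str = "temp.txt") -> None:
    """Populate os.environ from a temp.txt-style key file."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and "=" in line:
            key, _, value = line.partition("=")
            # setdefault so real environment values are not overwritten
            os.environ.setdefault(key.strip(), value.strip())
```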
Run the following command from the project root directory to execute the tests:

```
pytest --log-cli-level=INFO tests/
```

Remember to add any new private keys to `temp.txt` (for unit testing), `local.settings.json` (for local deployment), the Function App `Configuration` page (for cloud deployment), and GitHub secrets (for CI/CD).
Currently, the LLM models are sourced from either ML Studio, AI Studio, or a Container App. Adding models from additional sources may require new functions. To add a model from one of the existing sources, follow the guide below:
- Add the endpoint to `app/function_app.py`. Choose any name, but be consistent with it
- Add the necessary API keys to `local.settings.json` and also manually add them to the Function App in Azure via the `Configuration` page. The `local.settings.json` file's purpose is to emulate access to these private keys: they live in the cloud, but we want to experiment locally
- In `app/backend.py`:
  - If applicable, add API rate information to the `LLM_RATES` constant. This is for pay-as-you-go APIs
  - Add your model name to the `self._augment(self)` method based on where the model is coming from. If it is a container app, then the `_containerized_model(self)` method must be updated by changing the request URL. One way to do this is by making the request URL dynamic via environment variables, similar to the setup with the `MISTRAL` and `LLAMA` prefixes in `local.settings.json`
- Test the endpoint by running the Function App locally (using the VSCode extension or toolkit) and sending an API request
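The "dynamic request URL" idea in the steps above can be sketched as follows. This is a hypothetical helper, not code from `app/backend.py`; it mirrors the `MISTRAL_*`/`LLAMA_*` naming pattern in `local.settings.json`:

```python
# Hypothetical sketch: resolve a model's connection details from
# environment variables by prefix, mirroring the MISTRAL_*/LLAMA_*
# pattern in local.settings.json. Names are illustrative.
import os

def resolve_model(prefix: str) -> dict:
    """Look up <PREFIX>_MODEL, <PREFIX>_ENDPOINT, and <PREFIX>_KEY."""
    return {
        "model": os.environ[f"{prefix}_MODEL"],
        "endpoint": os.environ[f"{prefix}_ENDPOINT"],
        "key": os.environ[f"{prefix}_KEY"],
    }

# cfg = resolve_model("MISTRAL")
# Adding a new model then only requires NEWMODEL_MODEL, NEWMODEL_ENDPOINT,
# and NEWMODEL_KEY entries rather than a code change.
```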
The container app and ML Studio LLMs are disabled, as they run on real-time endpoints that charge an hourly fee, as opposed to the preferred pay-as-you-go endpoints in AI Studio. To re-enable the ML Studio and container app models, see below:
- ML Studio: These models must be re-created, as their endpoints are auto-deleted after 7 days. Go to the `Model Catalog` in ML Studio and find the text-based LLM of choice. Deploy the endpoint (pay-as-you-go if possible, but real-time otherwise) and copy over the endpoint prefix, API key, and model name to `local.settings.json` as new environment variables. The endpoint takes a while to create (~10 minutes) and is not fully provisioned until the 'Consume' tab is visible on the endpoint page. Remember to also add the keys to the `Configuration` menu on the Function App in Azure.
- Container App: Go to the `localai-selfhost` Container App and turn it on. This will take about 8 minutes to fully wake up, but the model should be accessible afterward.