Standard Occupational Classification (SOC) library, initially developed for the Survey Assist API but usable elsewhere.
The library provides utilities for classifying occupations against the official ONS SOC 2020 structure and coding index:
- SOC Lookup. A utility that uses a well-known set of SOC mappings of job titles to SOC classification codes.
- SOC Classification. A retrieval-augmented generation (RAG) approach to SOC classification using input data, semantic search and an LLM.
- SOC Rephrase. Packaged example data and `SOCRephraseLookup` for mapping `soc_code` values to respondent-friendly rephrased descriptions.
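
To make the three components concrete, here is a minimal, self-contained sketch of the ideas behind them. It does not use this library's API: the function names, the dictionaries and the codes `0001` to `0003` are all hypothetical placeholders, and the token-overlap ranking is only a crude stand-in for the semantic search step of a RAG pipeline (no LLM is called).

```python
import re
from typing import Optional

# Toy data for illustration only: placeholder codes, NOT real SOC 2020 entries.
TITLE_TO_CODE = {
    "software developer": "0001",
    "primary school teacher": "0002",
    "delivery driver": "0003",
}
CODE_TO_REPHRASED = {
    "0001": "Writes and maintains computer software",
    "0002": "Teaches children in a primary school",
    "0003": "Delivers goods by road",
}


def normalise(title: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())


def lookup(title: str) -> Optional[str]:
    """Exact-match lookup of a normalised job title (the 'SOC Lookup' idea)."""
    return TITLE_TO_CODE.get(normalise(title))


def shortlist(title: str, top_n: int = 2) -> list:
    """Rank known titles by shared tokens; a stand-in for semantic search."""
    query = set(normalise(title).split())
    scored = []
    for known, code in TITLE_TO_CODE.items():
        overlap = len(query & set(known.split()))
        if overlap:
            scored.append((overlap, code))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [code for _, code in scored[:top_n]]


def rephrase(code: str) -> Optional[str]:
    """Map a code to a respondent-friendly description (the 'SOC Rephrase' idea)."""
    return CODE_TO_REPHRASED.get(code)


code = lookup("  Software  Developer! ")
print(code, "->", rephrase(code))
print(shortlist("senior software engineer"))
```

In a real RAG flow, the shortlist of candidate codes would be passed to an LLM together with the respondent's input so that the model chooses among a small set of plausible codes rather than the whole index.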
Ensure you have the following installed on your local machine:
- Python 3.12 (Recommended: use `pyenv` to manage versions)
- `poetry` (for dependency management)
- Colima (if running locally with containers)
- Terraform (for infrastructure management)
- Google Cloud SDK (`gcloud`) with appropriate permissions
The Makefile defines a set of commonly used commands and workflows. Where possible, use the targets defined in the Makefile.

    git clone https://github.com/ONSdigital/soc-classification-library.git
    cd soc-classification-library
    poetry install

Git hooks can be used to check code before commit. To install them, run:

    pre-commit install

There is example source for using the SOC Lookup functionality in soc_lookup_example.py. To run it:

    poetry run python src/occupational_classification/lookup/soc_lookup_example.py

The library also ships with small packaged example datasets used by downstream services (e.g. survey-assist-api) for end-to-end testing:
- SOC lookup example CSV: `src/occupational_classification/data/example_soc_lookup_data.csv`
- SOC rephrase example CSV: `src/occupational_classification/data/example_rephrased_soc_data.csv`
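
As a rough illustration of how a downstream service might read a CSV like these, the sketch below loads a two-column file into a dictionary using only the standard library. The helper name `load_code_map` and the column name `rephrased_description` are assumptions made for this example (only `soc_code` is mentioned above), and the inline sample rows are placeholders rather than the contents of the packaged files. Check the real CSV headers before adapting it.

```python
import csv
import io


def load_code_map(handle, key_col="soc_code", value_col="rephrased_description"):
    """Read a CSV from an open text handle into a {code: description} dict.

    The column names are assumptions for this sketch; pass the real headers.
    """
    reader = csv.DictReader(handle)
    missing = {key_col, value_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing expected columns: {sorted(missing)}")
    # csv yields strings, so codes keep any leading zeros.
    return {row[key_col].strip(): row[value_col].strip() for row in reader}


# Placeholder rows standing in for a real file.
SAMPLE = (
    "soc_code,rephrased_description\n"
    "0001,Writes and maintains computer software\n"
    "0002,Teaches children in a primary school\n"
)

code_map = load_code_map(io.StringIO(SAMPLE))
print(code_map["0001"])
```

To read a file on disk instead, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to the same function. Keeping codes as strings avoids the loss of leading zeros that can happen when a CSV reader infers numeric types.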
Code quality and static analysis will be enforced using isort, black, ruff, mypy and pylint. Security checking will be enhanced by running bandit.
To check the code quality and only report errors, without auto-fixing them, run:

    make check-python-nofix

To check the code quality and automatically fix errors where possible, run:

    make check-python

Documentation is available in the docs/ folder and can be viewed using mkdocs:

    make run-docs

Pytest is used for testing alongside pytest-cov for coverage testing. /tests/conftest.py defines config used by the tests.
Unit tests for utility functions are in /tests/tests_utils.py and can be run using:

    make unit-tests

All tests can be run using:

    make all-tests

This library is designed to be consumed as a Python package and does not require
any environment variables on its own. Downstream services (such as survey-assist-api)
may define their own configuration around it.