From a4d60884944b9c4f295ad7ed01cbc7f794249bd2 Mon Sep 17 00:00:00 2001 From: Vladimir Queiroz Sejas Date: Sun, 7 Jun 2026 15:38:56 -0300 Subject: [PATCH] docs: add beginner workflow notebook for PySUS --- .../databases/getting_started_pysus.ipynb | 732 ++++++++++++++++++ 1 file changed, 732 insertions(+) create mode 100644 docs/source/databases/getting_started_pysus.ipynb diff --git a/docs/source/databases/getting_started_pysus.ipynb b/docs/source/databases/getting_started_pysus.ipynb new file mode 100644 index 00000000..8caa0009 --- /dev/null +++ b/docs/source/databases/getting_started_pysus.ipynb @@ -0,0 +1,732 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "md-6993225989846975460", + "metadata": {}, + "source": [ + "# Getting Started with PySUS\n", + "\n", + "*Your complete guide to accessing Brazilian public health data*\n", + "\n", + "**PySUS v2.2.0 · Python 3.10+**\n", + "\n", + "Notebook contribution: [AlertaDengue/PySUS](https://github.com/AlertaDengue/PySUS) · Issue #277\n", + "\n", + "---\n", + "\n", + "## Introduction\n", + "\n", + "PySUS is a Python library that provides easy access to publicly available datasets\n", + "from Brazil's Unified Health System (SUS), published by DATASUS.\n", + "It handles file discovery, downloading, and parsing, so you can focus on data analysis\n", + "rather than dealing with legacy file formats.\n", + "\n", + "This notebook presents a complete beginner-friendly workflow,\n", + "from installation to the first data exploration using real SUS data.\n", + "\n", + "> **No prior knowledge of DATASUS is required.**\n", + "> All datasets are freely available at [datasus.saude.gov.br](https://datasus.saude.gov.br)\n", + "> and fetched directly from official government servers." + ] + }, + { + "cell_type": "markdown", + "id": "md-8239463963230294527", + "metadata": {}, + "source": [ + "---\n", + "## 1. 
Installation\n", + "\n", + "Install PySUS using `pip`.\n", + "If you are working inside a virtual environment (recommended), activate it first.\n", + "\n", + "```\n", + "pip install pysus\n", + "```\n", + "\n", + "> **Tip:** Using a virtual environment keeps your project dependencies isolated.\n", + "> ```\n", + "> python -m venv venv\n", + "> source venv/bin/activate # Linux / macOS\n", + "> venv\\Scripts\\activate # Windows\n", + "> ```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "cd-8334704359266914152", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: pysus in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (2.2.0)\n", + "Requirement already satisfied: Unidecode<2.0.0,>=1.3.6 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (1.4.0)\n", + "Requirement already satisfied: aioftp<0.22.0,>=0.21.4 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (0.21.4)\n", + "Requirement already satisfied: anyio<5.0.0,>=4.13.0 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (4.13.0)\n", + "Requirement already satisfied: bigtree<0.13.0,>=0.12.2 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (0.12.5)\n", + "Requirement already satisfied: boto3<2.0.0,>=1.42.89 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (1.43.24)\n", + "Requirement already satisfied: chardet<8.0.0,>=7.4.0.post2 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (7.4.3)\n", + "Requirement already satisfied: dateparser<2.0.0,>=1.1.8 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (1.4.0)\n", + "Requirement already satisfied: dbfread==2.0.7 
in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (2.0.7)\n", + "Requirement already satisfied: dotenv<0.10.0,>=0.9.9 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (0.9.9)\n", + "Requirement already satisfied: duckdb<2.0.0,>=1.4.4 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (1.5.3)\n", + "Requirement already satisfied: duckdb-engine<0.18.0,>=0.17.0 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (0.17.0)\n", + "Requirement already satisfied: fastparquet<=2024.11.0,>=2023.10.1 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (2024.11.0)\n", + "Requirement already satisfied: httpx>=0.28.0 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (0.28.1)\n", + "Requirement already satisfied: loguru<0.7.0,>=0.6.0 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (0.6.0)\n", + "Requirement already satisfied: numpy<2,>=1.22 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (1.26.4)\n", + "Requirement already satisfied: pandas<3.0.0,>=2.2.2 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (2.3.3)\n", + "Requirement already satisfied: pyarrow>=11.0.0 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (24.0.0)\n", + "Requirement already satisfied: pydantic<3.0.0,>=2.12.5 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (2.13.4)\n", + "Requirement already satisfied: pyreaddbc>=2.0.4 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (2.0.4)\n", + "Requirement already satisfied: python-dateutil==2.8.2 in 
c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (2.8.2)\n", + "Requirement already satisfied: python-magic<0.5.0,>=0.4.27 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (0.4.27)\n", + "Requirement already satisfied: sqlalchemy<3.0.0,>=2.0.48 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (2.0.50)\n", + "Requirement already satisfied: tqdm>=4.67.0 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (4.68.1)\n", + "Requirement already satisfied: typer<0.25.0,>=0.24.1 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (0.24.2)\n", + "Requirement already satisfied: typing-extensions>=4.10.0 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (4.15.0)\n", + "Requirement already satisfied: wget<4.0,>=3.2 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pysus) (3.2)\n", + "Requirement already satisfied: six>=1.5 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from python-dateutil==2.8.2->pysus) (1.17.0)\n", + "Requirement already satisfied: idna>=2.8 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from anyio<5.0.0,>=4.13.0->pysus) (3.18)\n", + "Requirement already satisfied: botocore<1.44.0,>=1.43.24 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from boto3<2.0.0,>=1.42.89->pysus) (1.43.24)\n", + "Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from boto3<2.0.0,>=1.42.89->pysus) (1.1.0)\n", + "Requirement already satisfied: s3transfer<0.19.0,>=0.18.0 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from 
boto3<2.0.0,>=1.42.89->pysus) (0.18.0)\n", + "Requirement already satisfied: urllib3!=2.2.0,<3,>=1.25.4 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from botocore<1.44.0,>=1.43.24->boto3<2.0.0,>=1.42.89->pysus) (2.7.0)\n", + "Requirement already satisfied: pytz>=2024.2 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from dateparser<2.0.0,>=1.1.8->pysus) (2026.2)\n", + "Requirement already satisfied: regex>=2024.9.11 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from dateparser<2.0.0,>=1.1.8->pysus) (2026.5.9)\n", + "Requirement already satisfied: tzlocal>=0.2 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from dateparser<2.0.0,>=1.1.8->pysus) (5.3.1)\n", + "Requirement already satisfied: python-dotenv in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from dotenv<0.10.0,>=0.9.9->pysus) (1.2.2)\n", + "Requirement already satisfied: packaging>=21 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from duckdb-engine<0.18.0,>=0.17.0->pysus) (26.2)\n", + "Requirement already satisfied: cramjam>=2.3 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from fastparquet<=2024.11.0,>=2023.10.1->pysus) (2.11.0)\n", + "Requirement already satisfied: fsspec in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from fastparquet<=2024.11.0,>=2023.10.1->pysus) (2026.4.0)\n", + "Requirement already satisfied: colorama>=0.3.4 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from loguru<0.7.0,>=0.6.0->pysus) (0.4.6)\n", + "Requirement already satisfied: win32-setctime>=1.0.0 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from loguru<0.7.0,>=0.6.0->pysus) (1.2.0)\n", + "Requirement already satisfied: tzdata>=2022.7 in 
c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pandas<3.0.0,>=2.2.2->pysus) (2026.2)\n", + "Requirement already satisfied: annotated-types>=0.6.0 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pydantic<3.0.0,>=2.12.5->pysus) (0.7.0)\n", + "Requirement already satisfied: pydantic-core==2.46.4 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pydantic<3.0.0,>=2.12.5->pysus) (2.46.4)\n", + "Requirement already satisfied: typing-inspection>=0.4.2 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from pydantic<3.0.0,>=2.12.5->pysus) (0.4.2)\n", + "Requirement already satisfied: greenlet>=1 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from sqlalchemy<3.0.0,>=2.0.48->pysus) (3.5.1)\n", + "Requirement already satisfied: click>=8.2.1 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from typer<0.25.0,>=0.24.1->pysus) (8.4.1)\n", + "Requirement already satisfied: shellingham>=1.3.0 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from typer<0.25.0,>=0.24.1->pysus) (1.5.4)\n", + "Requirement already satisfied: rich>=12.3.0 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from typer<0.25.0,>=0.24.1->pysus) (15.0.0)\n", + "Requirement already satisfied: annotated-doc>=0.0.2 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from typer<0.25.0,>=0.24.1->pysus) (0.0.4)\n", + "Requirement already satisfied: certifi in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from httpx>=0.28.0->pysus) (2026.5.20)\n", + "Requirement already satisfied: httpcore==1.* in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from httpx>=0.28.0->pysus) (1.0.9)\n", + "Requirement already 
satisfied: h11>=0.16 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from httpcore==1.*->httpx>=0.28.0->pysus) (0.16.0)\n", + "Requirement already satisfied: markdown-it-py>=2.2.0 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from rich>=12.3.0->typer<0.25.0,>=0.24.1->pysus) (4.2.0)\n", + "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from rich>=12.3.0->typer<0.25.0,>=0.24.1->pysus) (2.20.0)\n", + "Requirement already satisfied: mdurl~=0.1 in c:\\users\\vladi\\appdata\\local\\programs\\python\\python313\\lib\\site-packages (from markdown-it-py>=2.2.0->rich>=12.3.0->typer<0.25.0,>=0.24.1->pysus) (0.1.2)\n", + "Note: you may need to restart the kernel to use updated packages.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "[notice] A new release of pip is available: 25.2 -> 26.1.2\n", + "[notice] To update, run: C:\\Users\\vladi\\AppData\\Local\\Programs\\Python\\Python313\\python.exe -m pip install --upgrade pip\n" + ] + } + ], + "source": [ + "# Run this cell if PySUS is not yet installed\n", + "%pip install pysus" + ] + }, + { + "cell_type": "markdown", + "id": "md-7861405087406607636", + "metadata": {}, + "source": [ + "---\n", + "## 2. Checking Your Installation\n", + "\n", + "After installing PySUS, verify that the package is available and check the installed version.\n", + "If the installation was successful, Python will display the version number." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "cd-2865917029134968551", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2.2.0\n" + ] + } + ], + "source": [ + "import pysus\n", + "\n", + "print(pysus.__version__)" + ] + }, + { + "cell_type": "markdown", + "id": "md-7886435397085505865", + "metadata": {}, + "source": [ + "---\n", + "## 3. 
Exploring the Package\n", + "\n", + "PySUS exposes each health dataset as a simple callable function.\n", + "Use `dir(pysus)` to see everything available:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "cd-3896900325229140893", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['CACHEPATH', 'Final', '__annotations__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'api', 'ciha', 'cnes', 'get_version', 'ibge', 'importlib_metadata', 'list_files', 'os', 'pathlib', 'pni', 'sia', 'sih', 'sim', 'sinan', 'sinasc', 'version']\n" + ] + } + ], + "source": [ + "import pysus\n", + "\n", + "print(dir(pysus))" + ] + }, + { + "cell_type": "markdown", + "id": "md-5907404130116600740", + "metadata": {}, + "source": [ + "The main dataset functions are:\n", + "\n", + "| Function | Dataset | Description |\n", + "|----------|---------|-------------|\n", + "| `sinasc()` | SINASC | Live birth records |\n", + "| `sim()` | SIM | Mortality records |\n", + "| `sinan()` | SINAN | Notifiable diseases |\n", + "| `sih()` | SIH | Hospital admissions |\n", + "| `sia()` | SIA | Outpatient procedures |\n", + "| `cnes()` | CNES | Health facilities |\n", + "| `pni()` | PNI | Immunisation programme |\n", + "| `ibge()` | IBGE | Demographic data |" + ] + }, + { + "cell_type": "markdown", + "id": "md-6714993952696256954", + "metadata": {}, + "source": [ + "---\n", + "## 4. 
Understanding the Parameters\n", + "\n", + "The dataset functions share a common parameter pattern; only `group` is specific to SINAN.\n", + "Here is a typical call to `sinasc()`:\n", + "\n", + "```python\n", + "sinasc(\n", + " state = \"SP\", # two-letter Brazilian state code\n", + " year = 2022, # integer or list of integers\n", + ")\n", + "```\n", + "\n", + "| Parameter | Type | Description | Example |\n", + "|-----------|------|-------------|---------|\n", + "| `state` | `str` | Two-letter state abbreviation | `\"SP\"`, `\"RJ\"`, `\"MG\"` |\n", + "| `year` | `int` or `list[int]` | Single year or list of years | `2022` or `[2021, 2022]` |\n", + "| `group` | `str` or `None` | Sub-group code (SINAN only) | `\"DENG\"` (dengue) |" + ] + }, + { + "cell_type": "markdown", + "id": "md-5061227682282249364", + "metadata": {}, + "source": [ + "---\n", + "## 5. Listing Available Files\n", + "\n", + "Before downloading, use `list_files()` to discover what is available\n", + "for a given dataset, state, and year.\n", + "\n", + "> **Note:** Jupyter already runs an asyncio event loop.\n", + "> `nest_asyncio` patches that loop so PySUS can run its own async calls inside the notebook."
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "cd-974912376655675624", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Exception in callback Task.__step()\n", + "handle: \n", + "Traceback (most recent call last):\n", + " File \"C:\\Users\\vladi\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\asyncio\\events.py\", line 89, in _run\n", + " self._context.run(self._callback, *self._args)\n", + " ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", + "RuntimeError: cannot enter context: <_contextvars.Context object at 0x0000026B7552CC80> is already entered\n", + "Task was destroyed but it is pending!\n", + "task: .run_in_context() done, defined at C:\\Users\\vladi\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\site-packages\\ipykernel\\utils.py:57> wait_for= cb=[Task.__wakeup()]> cb=[ZMQStream._run_callback.._log_error() at C:\\Users\\vladi\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\site-packages\\zmq\\eventloop\\zmqstream.py:563]>\n", + "C:\\Users\\vladi\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\ast.py:50: RuntimeWarning: coroutine 'Kernel.shell_main' was never awaited\n", + " return compile(source, filename, mode, flags,\n", + "RuntimeWarning: Enable tracemalloc to get the object allocation traceback\n", + "Task was destroyed but it is pending!\n", + "task: cb=[Task.__wakeup()]>\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Files found: 1\n" + ] + }, + { + "data": { + "text/html": [ + "
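<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>name</th>\n", + " <th>path</th>\n", + " <th>dataset</th>\n", + " <th>group</th>\n", + " <th>year</th>\n", + " <th>month</th>\n", + " <th>state</th>\n", + " <th>modify</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>0</th>\n", + " <td>public\\data\\ftp\\sinasc\\DNRJ2022.parquet</td>\n", + " <td>public\\data\\ftp\\sinasc\\DNRJ2022.parquet</td>\n", + " <td>sinasc</td>\n", + " <td>None</td>\n", + " <td>2022</td>\n", + " <td>None</td>\n", + " <td>RJ</td>\n", + " <td>2023-12-20 16:45:00</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "</div>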
" + ], + "text/plain": [ + " name \\\n", + "0 public\\data\\ftp\\sinasc\\DNRJ2022.parquet \n", + "\n", + " path dataset group year month state \\\n", + "0 public\\data\\ftp\\sinasc\\DNRJ2022.parquet sinasc None 2022 None RJ \n", + "\n", + " modify \n", + "0 2023-12-20 16:45:00 " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Required to run PySUS async functions inside Jupyter\n", + "import nest_asyncio\n", + "nest_asyncio.apply()\n", + "\n", + "from pysus import list_files\n", + "\n", + "# List SINASC files available for Rio de Janeiro, 2022\n", + "available = list_files(dataset=\"SINASC\", state=\"RJ\", year=2022)\n", + "\n", + "print(f\"Files found: {len(available)}\")\n", + "available" + ] + }, + { + "cell_type": "markdown", + "id": "md-1656243573859326293", + "metadata": {}, + "source": [ + "---\n", + "## 6. Downloading Your First Dataset\n", + "\n", + "Now download SINASC birth records for Rio de Janeiro, 2022.\n", + "The function returns a **pandas DataFrame** directly —\n", + "no manual file handling required.\n", + "\n", + "> **Note:** The first download may take about 30 seconds depending on your connection.\n", + "> PySUS caches files locally, so the next call for the same data will be much faster." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cd-3795208869041744895", + "metadata": {}, + "outputs": [], + "source": [ + "from pysus import sinasc\n", + "\n", + "# Download live birth records for Rio de Janeiro, 2022\n", + "df = sinasc(state=\"RJ\", year=2022)\n", + "\n", + "print(f\"Rows : {len(df):,}\")\n", + "print(f\"Columns: {df.shape[1]}\")\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "id": "md-7974618639364585060", + "metadata": {}, + "source": [ + "> **Tip:** To download multiple years at once, pass a list:\n", + "> ```python\n", + "> df = sinasc(state=\"SP\", year=[2020, 2021, 2022])\n", + "> ```\n", + "> PySUS concatenates the results into a single DataFrame automatically." + ] + }, + { + "cell_type": "markdown", + "id": "md-8849289477909562647", + "metadata": {}, + "source": [ + "---\n", + "## 7. Inspecting the Data\n", + "\n", + "Four essential checks for exploring any new DataFrame:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cd-4586221015219041710", + "metadata": {}, + "outputs": [], + "source": [ + "# Shape: number of rows and columns\n", + "print(df.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cd-6113157604871589284", + "metadata": {}, + "outputs": [], + "source": [ + "# Column names and data types\n", + "df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cd-1603410318595514302", + "metadata": {}, + "outputs": [], + "source": [ + "# Descriptive statistics for numeric columns\n", + "df.describe()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cd-4002606025712713324", + "metadata": {}, + "outputs": [], + "source": [ + "# Check for missing values (top 10 columns with most nulls)\n", + "missing = df.isnull().sum().sort_values(ascending=False)\n", + "print(\"Columns with missing values:\")\n", + "print(missing[missing > 0].head(10))" + ] + }, + { + "cell_type": "markdown", + "id": 
"md-9198316201049293265", + "metadata": {}, + "source": [ + "Key SINASC columns you will encounter:\n", + "\n", + "| Column | Description |\n", + "|--------|-------------|\n", + "| `DTNASC` | Birth date (format: DDMMYYYY) |\n", + "| `IDADEMAE` | Mother's age in years |\n", + "| `ESCMAE` | Mother's schooling (coded ranges of years of study in the raw DATASUS files) |\n", + "| `PARTO` | Type of delivery (1 = vaginal, 2 = caesarean) |\n", + "| `CONSULTAS` | Prenatal visits (coded ranges in the raw DATASUS files, not a raw count) |\n", + "| `SEXO` | Sex of the newborn |\n", + "| `PESO` | Birth weight in grams |" + ] + }, + { + "cell_type": "markdown", + "id": "md-7153773555296681005", + "metadata": {}, + "source": [ + "---\n", + "## 8. Understanding the Local Cache\n", + "\n", + "PySUS caches downloaded files locally.\n", + "The second time you request the same data, it loads from disk instead of\n", + "re-downloading, which makes repeated analysis much faster." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cd-2621658557253256783", + "metadata": {}, + "outputs": [], + "source": [ + "import pysus\n", + "\n", + "# See where PySUS stores its cached files\n", + "print(pysus.CACHEPATH)\n", + "# Typical output: ~/.pysus" + ] + }, + { + "cell_type": "markdown", + "id": "md-5829319428437279100", + "metadata": {}, + "source": [ + "> The cache directory is `~/.pysus` by default on Linux and macOS,\n", + "> and `C:\\Users\\\\.pysus` on Windows.\n", + "> You can safely delete it to free disk space;\n", + "> data will be re-downloaded on the next call." + ] + }, + { + "cell_type": "markdown", + "id": "md-4035189673861699865", + "metadata": {}, + "source": [ + "---\n", + "## 9. Your First Visualisation\n", + "\n", + "Let's plot the **monthly distribution of births** in Rio de Janeiro for 2022.\n", + "\n", + "The column `DTNASC` stores the birth date in the format `DDMMYYYY`.\n", + "We extract the month from it and create a simple bar chart."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cd-7840088167428654087", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "import matplotlib.ticker as ticker\n", + "\n", + "# Parse birth month from DTNASC (format: DDMMYYYY); unparseable dates become NaT\n", + "df[\"birth_month\"] = pd.to_datetime(\n", + " df[\"DTNASC\"], format=\"%d%m%Y\", errors=\"coerce\"\n", + ").dt.month\n", + "\n", + "# Count births per month, ordered by month number\n", + "monthly = df[\"birth_month\"].value_counts().sort_index()\n", + "\n", + "month_names = [\n", + " \"Jan\", \"Feb\", \"Mar\", \"Apr\", \"May\", \"Jun\",\n", + " \"Jul\", \"Aug\", \"Sep\", \"Oct\", \"Nov\", \"Dec\"\n", + "]\n", + "\n", + "# Plot\n", + "fig, ax = plt.subplots(figsize=(10, 5))\n", + "\n", + "ax.bar(\n", + " monthly.index, # real month numbers, so bars match the tick labels even if a month is missing\n", + " monthly.values,\n", + " color=\"steelblue\",\n", + " edgecolor=\"white\",\n", + " width=0.7,\n", + ")\n", + "\n", + "ax.set_xticks(range(1, 13))\n", + "ax.set_xticklabels(month_names)\n", + "ax.yaxis.set_major_formatter(\n", + " ticker.FuncFormatter(lambda x, _: f\"{int(x):,}\")\n", + ")\n", + "ax.set_title(\"Monthly Live Births in Rio de Janeiro (2022)\", fontsize=14, pad=12)\n", + "ax.set_xlabel(\"Month\")\n", + "ax.set_ylabel(\"Number of births\")\n", + "ax.spines[[\"top\", \"right\"]].set_visible(False)\n", + "\n", + "plt.tight_layout()\n", + "plt.show()\n", + "\n", + "print(f\"\\nTotal births plotted: {monthly.sum():,}\")" + ] + }, + { + "cell_type": "markdown", + "id": "md-7639515120531613048", + "metadata": {}, + "source": [ + "---\n", + "## 10. 
Available Data Sources\n", + "\n", + "PySUS gives access to the following official DATASUS datasets:\n", + "\n", + "| Dataset | Full name | Coverage |\n", + "|---------|-----------|----------|\n", + "| SINASC | Sistema de Informação sobre Nascidos Vivos | Live births, all states, 1994–present |\n", + "| SIM | Sistema de Informação sobre Mortalidade | Deaths, all states, 1979–present |\n", + "| SINAN | Sistema de Informação de Agravos de Notificação | Notifiable diseases (dengue, TB, etc.) |\n", + "| SIH | Sistema de Informações Hospitalares | Hospital admissions, 1992–present |\n", + "| SIA | Sistema de Informações Ambulatoriais | Outpatient procedures |\n", + "| CNES | Cadastro Nacional de Estabelecimentos de Saúde | Health facilities registry |\n", + "| PNI | Programa Nacional de Imunizações | Vaccination data |\n", + "| IBGE | Instituto Brasileiro de Geografia e Estatística | Population and demographic data |" + ] + }, + { + "cell_type": "markdown", + "id": "md-334929490140752845", + "metadata": {}, + "source": [ + "---\n", + "## 11. Next Steps\n", + "\n", + "You now have a working PySUS workflow. Here are some directions to explore next:\n", + "\n", + "- **Analyse maternal age** — use the `IDADEMAE` column\n", + "- **Compare delivery types** — vaginal vs. 
caesarean using `PARTO`\n", + "- **Explore prenatal visits** — `CONSULTAS` column\n", + "- **Download mortality data** — `sim(state=\"SP\", year=2022)`\n", + "- **Investigate notifiable diseases** — `sinan(state=\"RJ\", year=2022, group=\"DENG\")`\n", + "- **Compare multiple states** — loop over a list of state codes\n", + "\n", + "---\n", + "\n", + "**Useful resources:**\n", + "\n", + "- [PySUS documentation](https://pysus.readthedocs.io)\n", + "- [PySUS GitHub repository](https://github.com/AlertaDengue/PySUS)\n", + "- [DATASUS portal](https://datasus.saude.gov.br) — official Brazilian health data portal" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}