Changed output format and tmcf for import Eurostatdata_lifeexpectancy #1943
niveditasing wants to merge 6 commits into datacommonsorg:master from
Code Review
This pull request refactors the Eurostat life expectancy data pipeline. The TMCF file is simplified to a single template node, and the preprocessing script is overhauled. The Python script now produces a long-format CSV, improves the StatVar and place mapping logic, and no longer downloads the data automatically. The feedback covers three points: removing an unused `re` import, deleting internal notes left in code comments, and dropping a redundant argument from a `str.split()` call.
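The summary mentions that the script now writes a long-format CSV. The following is a minimal sketch, not the PR's code. It assumes a wide input with one column per year and shows how `pandas.DataFrame.melt` performs that reshape. The `geo`, `sex`, and year column names and the placeholder values are hypothetical.

```python
import pandas as pd

# Hypothetical wide table (placeholder values): one row per (geo, sex), one column per year.
wide = pd.DataFrame({
    "geo": ["AT", "BE"],
    "sex": ["F", "M"],
    "2019": [80.0, 78.0],
    "2020": [81.0, 79.0],
})

# Melt the year columns into (year, value) pairs so each row is one observation.
long_df = wide.melt(id_vars=["geo", "sex"], var_name="year", value_name="value")

# Sort for a stable order before writing, e.g. long_df.to_csv("out.csv", index=False).
long_df = long_df.sort_values(["geo", "sex", "year"]).reset_index(drop=True)
print(long_df)
```

Each output row holds exactly one observation. That shape lets a single template node describe every row of the CSV.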
```python
import pandas as pd
from six.moves import urllib
import numpy as np
import re
```
```python
# But wait, freq is stripped in the original preprocess.py?
# data['unit,sex,age,geo\time'] = data['unit,sex,age,geo\time'].str.slice(2)
# Let's check the first column content
```
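These notes refer to Eurostat's composite first column, `unit,sex,age,geo\time`. The following is a hedged sketch of one way to split that key into one column per dimension. The rows are hypothetical, and this is not the PR's implementation. The sketch uses a raw string for the column name because in an ordinary Python literal the `\t` in `geo\time` is read as a tab character.

```python
import pandas as pd

# Raw string: in a plain literal, "\t" inside "geo\time" would become a tab.
KEY_COL = r"unit,sex,age,geo\time"

# Hypothetical rows in the composite-key layout.
data = pd.DataFrame({
    KEY_COL: ["YR,F,Y_LT1,AT", "YR,M,Y_LT1,BE"],
    "2020": ["80.0", "78.0"],
})

# Dimension names come from the header itself: drop the "\time" suffix, split on commas.
dims = KEY_COL.split("\\")[0].split(",")  # ['unit', 'sex', 'age', 'geo']

# Split each key into separate columns, name them, and replace the composite column.
parts = data[KEY_COL].str.split(",", expand=True)
parts.columns = dims
data = pd.concat([parts, data.drop(columns=[KEY_COL])], axis=1)
print(data)
```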
```python
def obtain_value(entry):
    """Extract value from entry."""
    if pd.isna(entry) or entry == ':':
        return np.nan
    if isinstance(entry, str):
        entry = entry.split(' ', maxsplit=-1)[0]
        if entry == ':':
            return np.nan
        try:
            return float(entry)
        except ValueError:
            return np.nan
    return entry
```
The `maxsplit=-1` argument to `split()` is already the default, so it can be omitted. The handling of the `':'` placeholder and the conversion to `float` stay as they are.
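The following quick check of the reviewer's point uses made-up input strings. Passing `maxsplit=-1` gives the same result as omitting it, so the suggested change does not alter behavior. The sample strings, including the trailing letters that stand in for value annotations, are illustrative assumptions.

```python
# Illustrative inputs: plain numbers, numbers with a trailing annotation, and ':' placeholders.
samples = ["80.6", "80.6 e", "81.2 b", ":", ": c", "1 2 3"]

# maxsplit=-1 is str.split's default, so both calls split on every single space.
for s in samples:
    assert s.split(" ", maxsplit=-1) == s.split(" ")

# obtain_value() keeps only the first token; anything after the first space is dropped.
first_tokens = [s.split(" ")[0] for s in samples]
print(first_tokens)  # ['80.6', '80.6', '81.2', ':', ':', '1']
```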
Suggested change:

```python
def obtain_value(entry):
    """Extract value from entry."""
    if pd.isna(entry) or entry == ':':
        return np.nan
    if isinstance(entry, str):
        entry = entry.split(' ')[0]
        if entry == ':':
            return np.nan
        try:
            return float(entry)
        except ValueError:
            return np.nan
    return entry
```
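The following usage sketch applies the suggested `obtain_value` to a column with `Series.map`. The function body is copied from the suggestion, and the input values are hypothetical. Missing entries, `':'` placeholders, and unparseable strings all become `NaN`. Values that are already numeric pass through unchanged.

```python
import numpy as np
import pandas as pd


def obtain_value(entry):
    """Extract value from entry (body as in the suggested change)."""
    if pd.isna(entry) or entry == ':':
        return np.nan
    if isinstance(entry, str):
        # Keep the first space-separated token, dropping any trailing annotation.
        entry = entry.split(' ')[0]
        if entry == ':':
            return np.nan
        try:
            return float(entry)
        except ValueError:
            return np.nan
    return entry


# Hypothetical raw cells: plain, annotated, placeholder, missing, numeric, unparseable.
raw = pd.Series(["80.6", "80.6 e", ":", ": c", None, 79.1, "n/a"])
values = raw.map(obtain_value)
print(values.tolist())
```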
No description provided.