Conversation
| output_path = os.path.dirname(xml_path) | ||
| output_directory = f'{output_path}/json' | ||
| else: | ||
| output_directory = f'{output_path}/json' |
There was a problem hiding this comment.
If an output path is explicitly provided, will we still need to add the JSON subfolder?
| os.makedirs(output_directory) | ||
|
|
||
| # Write the contents of the JSON file into the output folder path. | ||
| file_path = os.path.join(output_directory, f'{file_name}.json') |
There was a problem hiding this comment.
Same question here. Should a non-empty output_path be a file or a directory?
| file_path = os.path.join(output_directory, f'{file_name}.json') | ||
| try: | ||
| with open(file_path, "w") as json_file: | ||
| json_file.write(json_data) |
There was a problem hiding this comment.
I'm not very familiar with XML. Will this write out rows that are bounded by brackets ([])? If so, we'll want to write out JSON lines by iterating over the dictionary rows and writing them out with a newline at the end.
There was a problem hiding this comment.
Good question about the naming conventions. Because we already have a translate_csv_file_to_jsonl() method in the jsonl callable submodule, I'd argue we should put it there.
|
@AngelicaLastra I left a few comments a while ago. Apologies for not tagging you to elevate them back to attention! If we can resolve them, we can get this merged into 0.3.2. |
Description & motivation
While developing an Earthmover Template for Stadium Texas’ HRIS Staff Data, we encountered challenges with Earthmover’s ability to parse variables within non-flattened XML files. To address this, we created a Python script that converts nested XML files into JSON format. After some consideration, we felt that this script general enough that it can be used by other projects, and thus thought it would be a good opportunity to include this in our
EA_airflow_utilrepo.PR Merge Priority:
New files created:
file_type.py: Not sure if the name of this model is the one we want to go with, but something longer like "file_type_conversion.py" felt out of place given the short and straightforward names in the other files.Tests and QC done: