
Expand Dataset.from_files so it works properly with derived variables#2777

Closed
schlunma wants to merge 95 commits into main from from_files_with_derived_vars


@schlunma
Contributor

@schlunma schlunma commented Jul 16, 2025

Description

This PR expands Dataset.from_files so it works properly with derived variables. In addition, a new attribute, Dataset.input_datasets, is available. It returns the datasets necessary for derivation (or simply the dataset itself if no derivation is required). It can also be used within the derive preprocessor function.

This PR is the second step to make Dataset.load work with derived variables.

Example

dataset_template = Dataset(
    short_name="lwcre",
    mip="Amon",
    project="CMIP6",
    exp="historical",
    dataset="*",
    institute="*",
    ensemble="r1i1p1f1",
    grid="gn",
    derive=True,
    force_derivation=True,
)

datasets = list(dataset_template.from_files())
print(f"Found {len(datasets)} datasets")  # Found 36 datasets

dataset = datasets[0]
dataset.files  # []

for d in dataset.input_datasets:
    print(d["short_name"])
    print(d.files)

# rlut
# [ESGFFile:CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/Amon/rlut/gn/v20200623/rlut_Amon_TaiESM1_historical_r1i1p1f1_gn_185001-201412.nc on hosts ['esgf.ceda.ac.uk', 'esgf.rcec.sinica.edu.tw', 'esgf3.dkrz.de', 'esgf3.dkrz.de']]
# rlutcs
# [ESGFFile:CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/Amon/rlutcs/gn/v20200623/rlutcs_Amon_TaiESM1_historical_r1i1p1f1_gn_185001-201412.nc on hosts ['esgf.ceda.ac.uk', 'esgf.rcec.sinica.edu.tw', 'esgf3.dkrz.de']]
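
The fallback behaviour described for Dataset.input_datasets (the required datasets for a derived variable, otherwise the dataset itself) can be illustrated with a small self-contained sketch. This is not the ESMValCore implementation: SketchDataset, its required field, and all_input_files are hypothetical names invented for illustration, and the file names are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchDataset:
    """Hypothetical stand-in for a dataset; NOT the ESMValCore Dataset class."""

    short_name: str
    files: list[str] = field(default_factory=list)
    # Datasets needed to derive this variable (assumed field, empty if the
    # variable can be read directly from files).
    required: list[SketchDataset] = field(default_factory=list)

    @property
    def input_datasets(self) -> list[SketchDataset]:
        """Return the required datasets, or the dataset itself if there are none."""
        return self.required if self.required else [self]


def all_input_files(dataset: SketchDataset) -> list[str]:
    """Collect the files of every input dataset, preserving order."""
    return [f for d in dataset.input_datasets for f in d.files]


# Placeholder file names; lwcre is derived from rlut and rlutcs as in the example.
rlut = SketchDataset("rlut", files=["rlut.nc"])
rlutcs = SketchDataset("rlutcs", files=["rlutcs.nc"])
lwcre = SketchDataset("lwcre", required=[rlut, rlutcs])

print([d.short_name for d in lwcre.input_datasets])  # ['rlut', 'rlutcs']
print([d.short_name for d in rlut.input_datasets])   # ['rlut']
print(all_input_files(lwcre))                        # ['rlut.nc', 'rlutcs.nc']
```

With this shape, calling code can iterate over input_datasets without checking whether a variable is derived, which is the convenience the description above points at.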

Related to #2769.

Link to documentation:


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number of pull requests:

schlunma and others added 22 commits January 9, 2026 09:59
Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl>
@bouweandela
Member

My apologies for being slow with looking at this. I agree that it would be a great feature, but I don't know if this is the right way to implement it. I would like to investigate if we can find a way to do it without making the Dataset class more complicated. I'll try to find time to do that soon.

@schlunma
Contributor Author

schlunma commented Feb 2, 2026

Thanks for your answer. I am sorry to hear that this is not the "right" way of implementing it.

It would have been nice to receive this kind of feedback after I opened the corresponding issue in July 2025, after opening an associated PR that also clearly outlined this plan in July 2025, after opening this PR in July 2025, or at least after my answer to your comments last month. This would have saved me at least 3 days of work (adapting this to the new data sources configuration alone took me a full day 2 weeks ago).

@schlunma schlunma closed this Feb 2, 2026
@schlunma
Contributor Author

Just ran into this problem again...would be nice to have a proper solution to that at some point.

@bouweandela
Member

Hi Manuel,

I had a look this morning and came up with an alternative that requires fewer changes to the existing code: #3051. Do you think that could work?

@bouweandela
Member

and I did another one in #3053


Labels

variable derivation Related to variable derivation functions
