Add quick-start notebook for Open Data Registry reference #127
CodyCBakerPhD wants to merge 1 commit into master
thanks for this. i would perhaps focus on published dandisets instead of a draft one. perhaps a draft dandiset could be mentioned as something that can change, as an example of why people should publish, since one of the issues brought to us was a draft dandiset being used in a paper that others could not replicate afterwards because it had changed.
@satra Indeed, that is the primary weakness in the choice of 000409. However, I would still defend its use here because
@CodyCBakerPhD - i think we should wait then till it's published. also if these are the only two useful datasets for tutorials then something is amiss in our efforts to run a repository. for example a tutorial could also be about how to replicate the results of a study that was published. finally as a repository instead of using a single dataset, it may also be useful to see how two datasets could be connected, for example similar to sabera's work on creating a timeseries model. this doesn't mean this tutorial isn't useful, but more on the note of how dandi can be used for different purposes, not just how to use dandi.
Thanks for the feedback! I think I forgot to highlight some constraints that were specified regarding the scope of this effort. The most basic requirement is that it ought to be as short as possible to act as an enticing entrypoint to the rest of the vast ecosystem. To that end, we happily embed links to many other resources, including some of your suggested next steps. This design was largely based on the tutorial Brock pointed us to for BossDB when they submitted their answer to the Open Data request. As you can see, they focus on a single dataset despite BossDB hosting many others. Essentially, the prompts given by the Open Data request are to focus on one or a few demonstrations, emphasizing their scientific value, together with the technical possibilities allowed by the S3 bucket.
Apologies, I did not mean to imply that no other datasets are useful - merely that these are among the best examples of their respective approaches, so we thought to 'put our best foot forward' here. Throughout the notebook we tried our best to emphasize that the archive hosts a wide diversity of modalities across hundreds of datasets - if you can think of a better way to de-emphasize the focus, please let us know.
I have always believed that an ideal dataset-specific tutorial (on this example notebook repository) would do exactly what you describe in reproducing results! Unfortunately, I also agree that it truly has yet to be done to the utmost satisfaction... The extended visual coding tutorial, adapted from some of Saskia's courses, is one of the notebooks that comes close to doing just that. That is why I targeted it for the last two scientific sections and redirected to the full notebook at the end of the section.
FWIW, for the upcoming BBQS talk + July workshop I am going as deep as possible into reproducing the thesis work of a recent graduate student here at Dartmouth in a 'fully reproducible' style, including all data hosted on DANDI and all data processing/analysis contained in a BIDS study to either be added or linked here, so keep an eye out for that - it sounds like exactly what you want (but will NOT be a 'quick-start' sub-20-minute tutorial, hah).
Hmmm... we certainly lack a tutorial on meta-analysis of that style - I know Kailin from Kris Bouchard's group was doing something along those lines in a relatively simple study.
Yarik also wanted a tutorial showing how to perform various operations directly on the S3 bucket without needing the DANDI API at all.
Would you like me to raise issues here to request and track the development of those ideas?
thanks for this detailed response. i think we are aligned in our thinking for now.
that would be great. and it's something we can encourage students to perhaps take on. also may make interesting use-cases for us to evaluate agentic systems.

Primarily for the Open Data Registry reference but also for any other purpose
@h-mayorquin and I put this together over the past week - opening up for review. Let's plan to discuss/finalize at the next scientific meeting or before
Guiding philosophy: https://github.com/dandi/example-notebooks/blob/3da8bccae59587bba5e506fe5128401ef0c71ddf/tutorials/open_data_quick_start_2026/README.md
Rendered appearance: https://github.com/dandi/example-notebooks/blob/3da8bccae59587bba5e506fe5128401ef0c71ddf/tutorials/open_data_quick_start_2026/Get-to-know-a-Dandiset.ipynb
Original design inspired by: https://github.com/aplbrain/bossdb_cookbook/blob/main/notebooks/Get-to-know-a-dataset-template.ipynb
Once this is accepted, I will add links/references to both DANDI Docs and the Open Data landing page