Add quick-start notebook for Open Data Registry reference#127

Open
CodyCBakerPhD wants to merge 1 commit into master from quick_start

@CodyCBakerPhD
Contributor

@CodyCBakerPhD CodyCBakerPhD commented Feb 10, 2026

Primarily for the Open Data Registry reference but also for any other purpose

@h-mayorquin and I put this together over the past week and are opening it up for review. Let's plan to discuss/finalize at the next scientific meeting or before

Guiding philosophy: https://github.com/dandi/example-notebooks/blob/3da8bccae59587bba5e506fe5128401ef0c71ddf/tutorials/open_data_quick_start_2026/README.md

Rendered appearance: https://github.com/dandi/example-notebooks/blob/3da8bccae59587bba5e506fe5128401ef0c71ddf/tutorials/open_data_quick_start_2026/Get-to-know-a-Dandiset.ipynb

Original design inspired by: https://github.com/aplbrain/bossdb_cookbook/blob/main/notebooks/Get-to-know-a-dataset-template.ipynb

Once this is accepted, I will add links/references to it from both DANDI Docs and the Open Data landing page

@satra
Member

satra commented Feb 10, 2026

thanks for this.

i would perhaps focus on published dandisets instead of a draft one. perhaps a draft dandiset could be referred to as something that could change as an example of why people should publish. since that was one of the issues brought to us about draft dandisets being used in a paper and others could not replicate it afterwards because it had changed.

@CodyCBakerPhD
Contributor Author

@satra Indeed that is the primary weakness in the choice of 000409

However, I would still defend its use here because

  1. they plan on publishing within the next month or so - we could wait until then to submit this officially if you feel strongly about that

  2. these are among the most popular datasets for re-use

  3. they both have much more detailed 'continued' tutorials we can link to if someone likes what they see here
    • @h-mayorquin is also working on a new tutorial for the dataset based on the new files and the preview he showed looks great

  4. there isn't really another ideal replacement for an ecephys+behavior showcase
    • MAP
      • lacks raw ecephys data
      • lacks pose estimation
      • stores raw video in a way we really don't want to showcase
      • has a pretty decent notebook, but it doesn't employ as many data streams
    • visual coding (ecephys) could potentially work but
      • lacks pose estimation
      • might make DANDI seem more like it only hosts the visual coding collection

@satra
Member

satra commented Feb 11, 2026

@CodyCBakerPhD - i think we should wait then till it's published. also if these are the only two useful datasets for tutorials then something is amiss in our efforts to run a repository.

for example a tutorial could also be about how to replicate the results of a study that was published. finally as a repository instead of using a single dataset, it may also be useful to see how two datasets could be connected, for example similar to sabera's work on creating a timeseries model. this doesn't mean this tutorial isn't useful, but more on the note of how dandi can be used for different purposes, not just how to use dandi.

@CodyCBakerPhD
Contributor Author

Thanks for the feedback! I think I forgot to highlight some constraints that were specified regarding the scope of this effort.

The most basic requirement is that it ought to be as short as possible to act as an enticing entrypoint to the rest of the vast ecosystem. To that end, we happily embed links to many other resources, including some of your suggested next steps.

This design was largely based on the tutorial Brock pointed us to for BossDB when they submitted their answer to the Open Data request. As you can see, they focus on a single dataset despite BossDB hosting many others.

Essentially, the prompts given by the Open Data request are to focus on one or a few demonstrations, emphasizing their scientific value, together with the technical possibilities allowed by the S3 bucket.

> also if these are the only two useful datasets for tutorials then something is amiss in our efforts to run a repository.

Apologies, I did not mean to imply that no other datasets are useful - merely that these are among the best examples of their respective approaches, so we thought to 'put our best foot forward' here.

Throughout the notebook we tried our best to emphasize that the archive hosts a wide diversity of modalities across hundreds of datasets - if you can think of a better way to de-emphasize the focus, please let us know.

> for example a tutorial could also be about how to replicate the results of a study that was published.

I have always believed that an ideal dataset-specific tutorial (on this example notebook repository) would do exactly what you describe in reproducing results!

Unfortunately, I also agree that it truly has yet to be done to the utmost satisfaction...

The extended visual coding tutorial, adapted from some of Saskia's courses, is one of the notebooks that comes close to doing just that. That is why I targeted it for the last two scientific sections and redirected to the full notebook at the end of the section.

FWIW, for the upcoming BBQS talk + July workshop I am going as deep as possible into reproducing the thesis work of a recent graduate student here at Dartmouth in a 'fully reproducible' style, including all data hosted on DANDI and all data processing/analysis contained in a BIDS study to either be added or linked here, so keep an eye out for that - it sounds like exactly what you want (but will NOT be a 'quick-start' sub-20-minute tutorial, hah)

> finally as a repository instead of using a single dataset, it may also be useful to see how two datasets could be connected, for example similar to sabera's work on creating a timeseries model. this doesn't mean this tutorial isn't useful, but more on the note of how dandi can be used for different purposes, not just how to use dandi.

Hmmm... we certainly lack a tutorial on meta-analysis of that style - I know Kailin from Kris Bouchard's group was doing something along those lines in a relatively simple study

Yarik also wanted a tutorial showing how to perform various operations directly on the S3 bucket without needing the DANDI API at all

Would you like me to raise issues here to request and track the development of those ideas?

@satra
Member

satra commented Feb 11, 2026

thanks for this detailed response. i think we are aligned in our thinking for now.

> Would you like me to raise issues here to request and track the development of those ideas?

that would be great. and it's something we can encourage students to perhaps take on. also may make interesting use-cases for us to evaluate agentic systems.
