Support for detached crates#250
Conversation
|
@simleo just letting you know I've seen this and I'm looking at it today |
elichad
left a comment
There was a problem hiding this comment.
Thanks @simleo, I made a few comments. Looks like this will resolve #231 and #232.
This review also got me thinking: do you think it would be useful to implement specific helper functions for converting between attached & detached crates, which follow the processes described in the spec appendix:
Not suggesting we do that in this PR (it's a bunch more work), but could make another issue for it if you think it would be a good idea.
| def write(self, base_path): | ||
| out_file_path = Path(base_path) / unquote(self.id) | ||
| local_path = self.get("localPath") | ||
| out_file_path = Path(base_path) / unquote(local_path or self.id) |
There was a problem hiding this comment.
If self.id is an absolute URI and localPath is not set, then the out_file_path becomes convoluted, for example:
/tmp/crate/https:/raw.githubusercontent.com/ResearchObject/ro-crate-py/master/test/test-data/sample_file.txt
It'd be good to error/warn here if self.id is an absolute URI and fetch_remote is true, and remind the user to set localPath. I would favour an error, but don't know if there are use cases where this would be the desired behavior?
Alternatively, could implement some default behavior where if fetch_remote is true and self.id begins with the RDE @id as a base, then that base can be stripped off and the remaining relative path can be used as the local path. This is the kind of behaviour I would expect.
There was a problem hiding this comment.
Though checking the Converting from Detached to Attached RO-Crate Package section in the spec made me think about how it does get a bit more complex when nested Datasets/Files are involved - need to consider the case where a Dataset has localPath but the Files in its hasPart don't - and probably make the Files follow the Dataset's localPath.
There was a problem hiding this comment.
Alternatively, could implement some default behavior where if
fetch_remoteis true andself.idbegins with the RDE@idas a base, then that base can be stripped off and the remaining relative path can be used as the local path.
Done in 8c16c88
There was a problem hiding this comment.
it does get a bit more complex when nested Datasets/Files are involved - need to consider the case where a Dataset has
localPathbut the Files in itshasPartdon't - and probably make the Files follow the Dataset's localPath
Done in commits deda277 to fc61747, which also include fixes for the way remote Datasets are handled in general.
| rcrate = ROCrate(out_path) | ||
| assert rcrate.get(f"{base_uri}sample_file.txt") | ||
| assert rcrate.get(f"{base_uri}test_file_galaxy.txt") | ||
| # now set fetch_remote to True |
There was a problem hiding this comment.
same as above - this test is quite long which makes it harder to follow - perhaps the part from here onward could be split into a different test
| rp = crate.get(orcid) | ||
| assert rp["name"] == name | ||
|
|
||
| detached_md_path = tmpdir / "example-ro-crate-metadata.json" |
There was a problem hiding this comment.
This section could be a separate test
| with pytest.raises(ValueError): | ||
| ROCrate(root_dataset_id="foo/bar") | ||
| with pytest.raises(ValueError): | ||
| ROCrate(root_dataset_id="/foo/bar") |
There was a problem hiding this comment.
not clear what these are testing - that you can't make a crate with a relative path as the RDE id? Would be clearer as a separate test since these lines don't seem to relate to the stuff above
Co-authored-by: Eli Chadwick <eli.chadwick@manchester.ac.uk>
I think all comments have been addressed at this point. I'm going to merge, then open a new issue for conversion functions. The main difference with respect to using |
Closes #231.
Closes #232.
Adds several new features to support detached crates:
ROCrateinitializer now accepts aroot_dataset_idargument to specify an arbitrary absolute URIfile://) URLNote I didn't add any
is_detachedor similar property: the distinction between attached and detached is quite subtle, and being one or the other depends on the current state of the crate in memory. For instance, if you add a data entity with a relative URI to a crate that had only web-based data entities, it won't be detached anymore.