Draft
changelog: - previously, distinct sample IDs were assigned to samples with identical metadata parameters because duplicate samples were grouped improperly. Sample ID assignment has been corrected to enforce uniqueness.
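For reviewers, here is a minimal sketch of one reading of this fix: rows that share the same metadata values receive the same sample ID instead of a fresh one each. The function name `assign_sample_ids`, the field names `site` and `date`, and the integer ID scheme are all hypothetical and are not taken from the pipeline code.

```python
from itertools import count


def assign_sample_ids(records, key_fields, start=1):
    """Give records that share the same metadata values the same Sample_ID."""
    next_id = count(start)
    seen = {}  # metadata tuple -> Sample_ID already assigned to it
    out = []
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        if key not in seen:
            # First time this metadata combination appears: mint a new ID.
            seen[key] = next(next_id)
        out.append({**rec, "Sample_ID": seen[key]})
    return out


# Hypothetical metadata; the second row duplicates the first.
samples = [
    {"site": "A", "date": "2020-01-01"},
    {"site": "A", "date": "2020-01-01"},
    {"site": "B", "date": "2020-01-01"},
]
ids = [r["Sample_ID"] for r in assign_sample_ids(samples, ["site", "date"])]
```

Keying on a tuple of the metadata fields keeps the grouping explicit, so a duplicate can only get a new ID if one of the listed fields differs.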
changelog: - add bmdrc code to main build pipeline via `fitCurveFiles` function. validated to work locally
changelog:
- style: reformat and lint code using ruff
- refactor: change output filename format
- refactor: file cleanup, e.g. move large list/dict params to separate params file to make code easier to follow
- style: add some documentation
- feat: add new CLI args for specifying filename format
changelog:
- Manifest handler object for interfacing with and downloading files from the manifest now exists.
- Schema parser to pull columns and slots from schema classes.
changelog:
- fixed issue with sample IDs not populating correctly (e.g. 1004 used instead of 1004-1)
- enable FSES files from figshare to also be handled and loaded correctly
changelog:
- feat: `src.data.load_figshare_url` now loads data directly from a Figshare URL via the `FigshareDataLoader` object, for convenience. The helper function `src.figshare_url_to_id` extracts figshare IDs from URLs.
- feat: now requires dotenv to load environment variables (for storing secrets/access tokens locally)
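A rough sketch of how an ID could be pulled out of a URL, for illustration only. It assumes the ID is the trailing all-digit path segment, which may not cover every figshare URL shape (for example, URLs that end in a version number). The function name `url_to_trailing_id` and the example URL are hypothetical; this is not the body of `figshare_url_to_id`.

```python
import re
from urllib.parse import urlparse


def url_to_trailing_id(url):
    """Return the trailing numeric path segment of a URL as an int.

    Assumption: the ID is the last path segment and is all digits.
    """
    # urlparse drops the query string and fragment from .path,
    # so "?private_link=..." style suffixes do not interfere.
    path = urlparse(url).path.rstrip("/")
    match = re.search(r"/(\d+)$", path)
    if match is None:
        raise ValueError(f"no numeric ID at the end of {url!r}")
    return int(match.group(1))


# Hypothetical URL, used only to show the call.
example_id = url_to_trailing_id("https://figshare.com/ndownloader/files/12345678")
```

Raising on a non-matching URL, rather than returning `None`, surfaces a bad manifest entry at the point where it is read.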
changelog:
- refactor: use `src.data.figshare_url_to_id` utility function to load data from figshare. Flexibly modify file retrieval to pull fully from the manifest and load from figshare (this aligns the code with planned future development to pull files only from figshare)
- refactor: minor changes, such as renaming variables (`build_script.runSampMap`) to be more human-readable
- refactor: use dotenv to load variables (for local execution of pipeline)
- fix: enforce "NULL" value for any `NaN` values in tables. fixes #118
- refactor: use token/variable instead of exposing comptox API key
- docs: added missing docstring
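To illustrate the `NaN` fix, here is a small standard-library sketch that swaps float `NaN` cells for the string "NULL" in a table held as a list of dicts. The helper name `null_out_nans` and the column names are hypothetical. If the tables are pandas DataFrames, `DataFrame.fillna("NULL")` would be the analogous call, though the source does not say how the pipeline does it.

```python
import math


def null_out_nans(rows):
    """Replace float NaN cells with the string "NULL"; leave other cells as-is."""

    def fix(value):
        # NaN only occurs as a float; check the type first so that
        # strings and None never reach math.isnan.
        if isinstance(value, float) and math.isnan(value):
            return "NULL"
        return value

    return [{col: fix(val) for col, val in row.items()} for row in rows]


# Hypothetical table with one missing measurement.
cleaned = null_out_nans([
    {"Sample_ID": "1004-1", "measurement": float("nan")},
    {"Sample_ID": "1004-2", "measurement": 0.25},
])
```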
changelog:
- feat: add schema checks and update related functions to correctly parse class name from file name types with and without underscores.
- refactor: rename `res` variable -> `result` for clearer code
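The mapping from file names to schema class names is not spelled out here, so the following is only a guess at the general shape: take the file stem, split on underscores, and join the pieces in CamelCase. The function name `file_name_to_class_name`, the example file names, and the CamelCase convention itself are assumptions, not the PR's actual rules.

```python
from pathlib import Path


def file_name_to_class_name(file_name):
    """Map a file name such as "sample_to_chemical.csv" to "SampleToChemical".

    Assumption: class names are the CamelCase form of the file stem.
    """
    stem = Path(file_name).stem
    # Skip empty pieces so doubled or leading underscores are harmless.
    parts = [part for part in stem.split("_") if part]
    # Upper-case only the first letter so acronyms like "BMDs" survive.
    return "".join(part[:1].upper() + part[1:] for part in parts)


with_underscores = file_name_to_class_name("sample_to_chemical.csv")
without_underscores = file_name_to_class_name("chemicals.tsv")
```

A name without underscores falls through the same path as one with them, which is the property the changelog entry calls out.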
Member
I converted the PR to draft status, just to keep track :)
Note: IN PROGRESS
Ideally, I would like to update the build manifest to use figshare files exclusively, but not all GitHub data has been uploaded to figshare yet. However, if we want to merge these changes and open a separate branch to update the manifest, that's fine too.
Summary
Fixes duplication of `Sample_ID` and updates pipeline to fully produce zebrafish benchmark dose curve response files using new data files. Includes several QOL improvements to the code.
Changelog
- `Sample_ID`s from non-unique samples (i.e. duplicates)
- `is_a` construct as a template for Dose, Fits, and BMD types.
Issues