Uniform labels across dictionary, MarkDown and example CIF files#18

Draft
nicholasfrancia wants to merge 1 commit into main from dictionary_file
@nicholasfrancia

Collaborator

We've created a series of (sub)categories of CSP and COMPCHEM to better divide the data fields by application and to assign them the "Loop" or "Set" class. Markdown files and examples have been changed accordingly.
Changes from the last commit can be seen here:
4dcd407

@vaitkus

vaitkus commented Jun 5, 2026

Contributor

@nicholasfrancia I found a few minor issues introduced by the recent recategorisation. Is this PR the intended place to report them?

Also, I have two additional questions:

  1. The dictionary *.dic files are automagically generated from the *.md file (?), so I reported the issues in a human-readable way instead of opening PRs. What about the CIFs in the Example directory? I found a few syntactic mistakes in them that I would like to report, and the quickest way of doing that would be to open a PR with the corrections. Would that work on your side?
  2. The IUCr has a GitHub Actions workflow that runs checks on the dictionary files (syntactic, semantic, etc.). I am quite familiar with the system and could create a PR that introduces such checks to this repository as well. Would that be welcome?

@nicholasfrancia

Collaborator Author

Hi @vaitkus!
In short, yes to all of your questions, that would be very helpful!

For point 1: for general discussions with the CSP developers and community, we initially decided to use a Markdown file (here), structured into sections with new data fields listed as tables. This format was chosen because it is easy to modify and extend with examples or descriptive text.
During the transition to *.dic files, I wrote a Python script that extracts the data field tables from the cspcore.md file (plus two other *.md files specifically for dft and forcefield parameters) and converts them into the cif_compchem.dic and cif_csp.dic files. This script doesn’t apply to the *.cif files in the Examples directory, so those currently need to be edited manually. Therefore, a PR with corrections to those files would definitely work well.
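
A minimal sketch of how such a table-to-dictionary conversion might look, not the actual script: the column names `Data field` and `Container`, the sample row, and both helper functions are assumptions made for illustration. Escaped pipes inside cells are not handled.

```python
def parse_markdown_table(lines):
    """Parse a pipe-delimited Markdown table into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in lines
        if line.strip().startswith("|")
    ]
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator row
    return [dict(zip(header, row)) for row in body]


def to_save_frame(row):
    """Render one table row as a (heavily simplified) DDLm save frame."""
    name = row["Data field"].lstrip("_")
    lines = [
        f"save_{name}",
        f"    _definition.id     '_{name}'",
        f"    _type.container    {row['Container']}",
    ]
    if row["Container"] == "List":
        # State the dimension explicitly, even for lists of unconstrained size.
        lines.append("    _type.dimension    '[]'")
    lines.append("save_")
    return "\n".join(lines)


# Hypothetical table, standing in for a section of cspcore.md.
SAMPLE_MD = """
| Data field | Container | Description |
|---|---|---|
| _csp_input.atom_types | List | Atom types used in the search |
""".splitlines()

rows = parse_markdown_table(SAMPLE_MD)
print(to_save_frame(rows[0]))
```

Keeping the parsing and the rendering in separate functions would make it easier to add per-attribute checks between the two steps.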

For point 2: That sounds excellent. I’m not very familiar with setting up GitHub Actions myself, so having automated checks ready would be extremely helpful!

@vaitkus

vaitkus commented Jun 5, 2026

Contributor

@nicholasfrancia Thank you for the quick response!

cif_comp.dic:

  1. _theoretical_structure.relative_free_energy save frame appears twice.
  2. The following items are declared as having the List container without specifying the dimension:
  • _forcefield.cross_terms

Even if the list is of unconstrained size, it is still recommended to state this explicitly using
_type.dimension '[]'.
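
Both checks above lend themselves to a short script. The following is a rough sketch, assuming flat (non-nested) save frames and that `save_` never appears on a line of its own inside a text field; the sample dictionary text and all function names are made up for illustration.

```python
import re
from collections import Counter


def split_save_frames(text):
    """Return (name, body) pairs for each flat `save_NAME ... save_` frame."""
    frames, name, body = [], None, []
    for line in text.splitlines():
        stripped = line.strip()
        if name is None:
            match = re.fullmatch(r"save_(\S+)", stripped, flags=re.IGNORECASE)
            if match:
                name, body = match.group(1), []
        elif stripped.lower() == "save_":
            frames.append((name, "\n".join(body)))
            name = None
        else:
            body.append(line)
    return frames


def duplicate_frames(frames):
    """Names of save frames that occur more than once (case-insensitive)."""
    counts = Counter(name.lower() for name, _ in frames)
    return sorted(n for n, c in counts.items() if c > 1)


def lists_without_dimension(frames):
    """Names of frames that declare a List container but no _type.dimension."""
    missing = []
    for name, body in frames:
        is_list = re.search(r"^\s*_type\.container\s+List\b", body, re.M | re.I)
        has_dim = re.search(r"^\s*_type\.dimension\s+\S", body, re.M | re.I)
        if is_list and not has_dim:
            missing.append(name)
    return missing


# Illustrative input only; not copied from the real dictionary.
SAMPLE = """
save_forcefield.cross_terms
    _definition.id      '_forcefield.cross_terms'
    _type.container     List
save_

save_theoretical_structure.relative_free_energy
    _definition.id      '_theoretical_structure.relative_free_energy'
    _type.container     Single
save_

save_theoretical_structure.relative_free_energy
    _definition.id      '_theoretical_structure.relative_free_energy'
    _type.container     Single
save_
"""

frames = split_save_frames(SAMPLE)
print(duplicate_frames(frames))
print(lists_without_dimension(frames))
```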

cif_csp:

  1. _csp_structure_generation.data_block_label, _csp_structure_generation.data_block_description.

These definitions provide several examples, separated by a comma (e.g. "ea", "rs").
However, this is not valid CIF syntax; the values should instead be looped, e.g.

loop_
"ea"
"rs"

Same for example values "Evolutionary Algorithm", "Random Search".
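
For illustration, the comma-separated form could be converted to a looped one with a few lines of Python. This sketch assumes the examples are stored under the DDLm attribute `_description_example.case` and that every value is double-quoted and contains no double quotes itself; the function names are hypothetical.

```python
import re


def split_examples(raw):
    """Extract the double-quoted values from a string such as '"ea", "rs"'."""
    return re.findall(r'"([^"]*)"', raw)


def as_example_loop(values, indent="    "):
    """Render the values as a single-column CIF loop of example cases."""
    lines = [f"{indent}loop_", f"{indent}  _description_example.case"]
    lines += [f'{indent}  "{value}"' for value in values]
    return "\n".join(lines)


print(as_example_loop(split_examples('"ea", "rs"')))
print(as_example_loop(split_examples('"Evolutionary Algorithm", "Random Search"')))
```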

  2. CSP_STRUCTURE_GENERATION_SEARCH_SPACE category

All three _category_key.name values are missing leading underscores, e.g. csp_structure_generation_search_space.space_group_number_list -> _csp_structure_generation_search_space.space_group_number_list
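
A check for this could look roughly like the sketch below. It assumes that the key names are single-quoted and follow `_category_key.name` directly (either as a single value or as a one-column loop); the second key name in the sample is invented for illustration, as are the function names.

```python
import re

# One or more single-quoted values directly following _category_key.name.
KEY_BLOCK = re.compile(r"_category_key\.name((?:\s+'[^']*')+)")


def category_key_names(text):
    """Collect every single-quoted value listed under _category_key.name."""
    names = []
    for block in KEY_BLOCK.findall(text):
        names.extend(re.findall(r"'([^']*)'", block))
    return names


def suggest_fixes(names):
    """Map each key name lacking a leading underscore to its corrected form."""
    return {n: "_" + n for n in names if not n.startswith("_")}


SAMPLE = """
    loop_
      _category_key.name
         'csp_structure_generation_search_space.space_group_number_list'
         '_csp_structure_generation_search_space.id'
"""

for wrong, right in suggest_fixes(category_key_names(SAMPLE)).items():
    print(f"{wrong} -> {right}")
```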

  3. CSP_DATA_BLOCK category

There are several data items from this category, but the category itself is not defined.
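
This kind of mismatch can be detected by comparing the `_name.category_id` of every item definition against the set of defined categories. The sketch below assumes flat save frames, attribute values without embedded whitespace, and category frames marked with `_definition.scope Category`; the sample text and function names are illustrative only.

```python
import re

# A flat save frame: `save_NAME` on its own line up to a bare `save_` line.
FRAME = re.compile(r"^\s*save_(\S+)\s*$(.*?)^\s*save_\s*$", re.M | re.S | re.I)


def attribute(body, name):
    """Return the first value of a DDLm attribute in a frame body, or None."""
    match = re.search(rf"^\s*{re.escape(name)}\s+(\S+)", body, re.M)
    return match.group(1).strip("'\"") if match else None


def undefined_categories(text):
    """Map each undefined category id to the item frames that reference it."""
    defined, used = set(), {}
    for frame_name, body in FRAME.findall(text):
        if (attribute(body, "_definition.scope") or "").lower() == "category":
            defined.add((attribute(body, "_definition.id") or "").lower())
        else:
            category = attribute(body, "_name.category_id")
            if category:
                used.setdefault(category.lower(), []).append(frame_name)
    return {cat: items for cat, items in used.items() if cat not in defined}


SAMPLE = """
save_CSP_INPUT
    _definition.id      CSP_INPUT
    _definition.scope   Category
    _name.category_id   CSP
save_

save_csp_input.atom_types
    _definition.id      '_csp_input.atom_types'
    _name.category_id   csp_input
save_

save_csp_data_block.additional_files
    _definition.id      '_csp_data_block.additional_files'
    _name.category_id   csp_data_block
save_
"""

print(undefined_categories(SAMPLE))
```

Only item frames are checked, since the parent of the head category is the dictionary itself and would otherwise be reported as undefined.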

  4. The following items are declared as having the List container without specifying the dimension:
  • _csp_data_block.additional_files
  • _csp_input.atom_types
  • _csp.structure_generation_space_group_list
  • _csp_structure_generation_stopping_criteria.description
  • _csp_structure_generation_search_space.space_group_list

Even if the list is of unconstrained size, it is still recommended to state this explicitly using
_type.dimension '[]'.

There are also a few minor issues common to both dictionaries (e.g. missing licensing); however, I will open a separate issue for those.
