747: update identifier validation message now that leading colons aren't allowed#795
Conversation
9065ec2 to
3e497c9
Compare
|
Realised there's a |
- organising error messages
- organising error messages
- organising error messages
- organising error messages
- organising error messages
- organising error messages - add test to verify error raised, since no other test seems to check for this error case or message.
- organising error messages
- organising error messages
- organising error messages
- the modified tests pass anyway but would have asserted that each letter in the parameter was in the error e.g. ["M", "i", "s", "s"] - added type check and tests for error/warning contains/not-contains
- organising error messages
- organising error messages
- organising error messages
- organising error messages
- organising error messages
- organising error messages
- organising error messages
- organising error messages
- organising error messages
- organising error messages
- organising error messages
- organising error messages
- organising error messages - reworded error message to be consistent with the underlying XML "Name" rule/token, except that ODK form clients generally don't support having a colon for the first character, and not mentioning non-ascii unicode letters since that's probably too technical. - preceding code applies the additional entity dataset name rules for not allowing names starting with a period or two underscores.
- organising error messages - preceding code applies additional entity save_to name rules for not allowing reserved words or names starting two underscores.
- organising error messages - the replaced validation regex is pretty much the XML name rule and the error message sounds like that was the intent (although the value is put into an attribute like `<value ref="VAL"/>` not a tag).
- organising error messages
- organising error messages - in survey_element.py the code that shows the exact offending character seems to have been added for the purpose of checking the form_name, which corresponds to the Survey.name attribute, which is used as the tag name of the primary instance element. When uploading a XLSForm to an online converter this name defaults to "data", and it is only set to the basename the XLS/X file name when using the CLI, or some other value when using pyxform as a library. This code would not apply to most other names since workbook_to_json (or other) code checks names before survey_element.py sees them. Also the regex INVALID_XFORM_TAG_REGEXP is different to the rule in is_xml_tag. So the per-character check was removed since it seems unlikely to be useful and is inconsistent with other name validation.
- organising error messages
- organising error messages
- organising error messages
- not caught by relevant test since it only requires the substring "must also specify an image".
- it's possible to set the primary instance root node name via the setting "name" (which overrides other ways to set the name), so check the settings code path as well.
51e2de4 to
41ac57b
Compare
|
Force pushed to rebase on master and resolve merge conflicts ( |
Not explicitly, you're right, that's deferred to the underlying specs, in this case going down to the XML spec. I think these two StackOverflow explanations make it clear that colons should not be allowed as leading characters: https://stackoverflow.com/questions/40445735/is-a-colon-a-legal-first-character-in-an-xml-tag-name/40447829 |
lognaturel
left a comment
There was a problem hiding this comment.
Thanks for breaking this down into easy commits -- the overall diff felt impossible to review meaningfully!
Overall having a central error message registry feels like a good idea in this context. I have a few notes inline you may or may not want to act on.
I have some uneasiness about the fact that tests compare against enum references, not the rendered user-facing strings. Things like formatting bugs could ship while all tests still pass and we could fail to notice that an error message became wrong in certain contexts used. Maybe for some really common or critical error messages we could assert on literal strings? I don't have any specific candidates in mind currently but wanted to share this thought in case it gives you ideas.
- format approximately "topic name - description"
Agree it has that potential weakness but each of the tests involves using the same context items, so the risks seem to be a) there is an unused token in the template which makes the result look weird, or b) checking the rendered string with a test case during development was overlooked. Maybe each message could have a "rendering check" test to at least casually confirm what it looks like with realistic context values. Also, the The current approach improves on the previous patterns of a) only checking part of the error/warning string and therefore facing the same rendering problems as above, b) only checking part of the message and matching some other message. By using assigned symbols we can more easily see exactly where an error/warning is used and tested, both for coverage and for not having to manually find and update any whole or partial error/warning messages that may be relevant to current work. One of the next steps could be working on the language of the messages, and so far the format I've been trying to follow is roughly "This is where the problem is. This is why it's a problem. This is what you can do to fix it." Sort of like how a python traceback does context, details, tips in that order. Many messages have been collected in this PR but there are still tons still inline amongst the code, and perhaps they could be reviewed for consistency with Central/Collect/Webforms message phrasing, so maybe that all could be a follow up task? Also could be extended to localisation. |
|
Thanks! |
Closes #747
Why is this the best possible solution? Were any other approaches considered?
The majority of these commits are organising error/warning messages into the
errors.pymodule, which is what I had in mind for theErrorCodepattern which was added to pyxform a few months ago. After this PR about 30 error messages are organised there, but pyxform raises an error at about 170 locations and many of those have distinct messages.The goal of this stage of organisation is to try and identify all the errors/warnings which deal with identifiers, so that the validation approach and message content can be more consistent with each other and with the documentation. Also to take the opportunity to move over error/warning messages that are already using the
errors.Detailclass or seemed easy to move. There are still some name-related errors that have not been moved (e.g.Audits must always be named 'audit.', etc), but they don't specifically deal with identifier grammar. There is also still a name-like regex check for the "app" parameter but that requires an android package name which has different rules to XML names.What are the regression risks?
If library or script usages were looking for specific error strings then these changes would break them. But putting the errors into an enum/object should make it easier to avoid looking for specific strings - as is shown in the tests that use
ErrorCode.Does this change require updates to documentation? If so, please file an issue here and include the link below.
Maybe? #630 says ODK tools don't support having a colon as the first character in identifier names but that doesn't seem to be stated in the ODK XForms Specification. On one hand maybe it is not a hard requirement from a spec perspective, but on the other hand it is functionally required via pyxform validation and not being supported by ODK form clients.
Before submitting this PR, please make sure you have:
testspython -m unittestand verified all tests passruff format pyxform testsandruff check pyxform teststo lint code