747: update identifier validation message now that leading colons aren't allowed by lindsay-stevens · Pull Request #795 · XLSForm/pyxform

lindsay-stevens · 2025-12-16T15:23:17Z

Closes #747

Why is this the best possible solution? Were any other approaches considered?

Most of these commits move error/warning messages into the errors.py module, which is what I had in mind for the ErrorCode pattern that was added to pyxform a few months ago. After this PR about 30 error messages are organised there, but pyxform raises an error at about 170 locations, and many of those have distinct messages.

The goal of this stage of organisation is to identify all the errors/warnings that deal with identifiers, so that the validation approach and message content can be more consistent with each other and with the documentation. I also took the opportunity to move over error/warning messages that already used the errors.Detail class or seemed easy to move. Some name-related errors have not been moved yet (e.g. "Audits must always be named 'audit.'", etc.), but they don't specifically deal with identifier grammar. There is also still a name-like regex check for the "app" parameter, but that requires an Android package name, which has different rules from XML names.
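To make the ErrorCode pattern concrete, here is a minimal sketch of an enum-backed message registry. It assumes each member wraps a `Detail`-like object holding a short name and a `str.format` template. The class layout, the `format` helper, and both member names and messages are hypothetical; this is not pyxform's actual errors.py.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Detail:
    """Hypothetical stand-in for errors.Detail: a short name plus a message template."""

    name: str
    msg: str

    def format(self, **context: str) -> str:
        # Fill the template's {tokens} from keyword arguments.
        return self.msg.format(**context)


class ErrorCode(Enum):
    """Hypothetical registry: each member is a stable symbol for one message."""

    NAMES_INVALID = Detail(
        name="Invalid name",
        msg="[row : {row}] Invalid name '{value}': names must begin with a letter or underscore.",
    )
    CHOICES_DUPLICATE = Detail(
        name="Duplicate choice",
        msg="[row : {row}] The choice name '{value}' is used more than once in list '{list_name}'.",
    )


# Code raises with the symbol; tests can refer to the same symbol instead of
# copying fragments of the message text.
message = ErrorCode.NAMES_INVALID.value.format(row="3", value="1st_q")
print(message)  # [row : 3] Invalid name '1st_q': names must begin with a letter or underscore.
```

Because a test can compare against `ErrorCode.NAMES_INVALID` rather than a literal fragment, searching for the symbol finds every raise site and every test that covers it.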

What are the regression risks?

If library or script usages were looking for specific error strings, these changes would break them. But putting the errors into an enum/object should make it easier to avoid looking for specific strings, as shown in the tests that use ErrorCode.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Maybe? #630 says ODK tools don't support having a colon as the first character in identifier names, but that doesn't seem to be stated in the ODK XForms Specification. On one hand it may not be a hard requirement from a spec perspective; on the other hand it is functionally required, since pyxform validation enforces it and ODK form clients don't support it.
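For reference, the rule being enforced can be approximated with a regular expression. This sketch assumes an ASCII-only reading of the XML 1.0 `Name` production: a name starts with a letter or underscore (the leading colon that XML itself permits is excluded), and later characters may also be digits, hyphens, periods, or colons. The real XML rule also admits many non-ASCII Unicode letters, and `is_valid_identifier` is a hypothetical helper, not pyxform's implementation.

```python
import re

# ASCII-only approximation of the XML 1.0 "Name" rule, minus the leading colon.
IDENTIFIER_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.:\-]*")


def is_valid_identifier(value: str) -> bool:
    """Return True if the whole value matches the simplified identifier rule."""
    return IDENTIFIER_PATTERN.fullmatch(value) is not None


for candidate in ["age", "_hidden", "q1.a", "ns:field", ":field", "1st", "has space"]:
    print(f"{candidate!r}: {is_valid_identifier(candidate)}")
```

Note that a colon is still accepted after the first character, matching the wording that only a leading colon is unsupported.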

Before submitting this PR, please make sure you have:

included test cases for core behavior and edge cases in tests
run python -m unittest and verified all tests pass
run ruff format pyxform tests and ruff check pyxform tests to lint code
verified that any code or assets from external sources are properly credited in comments

lindsay-stevens · 2025-12-19T06:05:58Z

Realised there's a "name" setting, so I added a check and tests for that. It was previously untested, is only documented on xlsform.org, and kind of sneaks in via json_dict.update(settings) on xls2json.py L352.
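The way the "name" setting slips through can be illustrated with plain dictionaries. This is a simplified, hypothetical model of the `json_dict.update(settings)` step: `build_json_dict` and `check_root_name` are invented helpers, and the regex is an ASCII-only approximation of the identifier rule. It shows why validating only the form_name path is not enough once settings can overwrite the root node name.

```python
import re

IDENTIFIER_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.:\-]*")


def build_json_dict(form_name: str, settings: dict) -> dict:
    # The root node name starts out as the form_name...
    json_dict = {"name": form_name, "title": form_name}
    # ...but a "name" key from the settings sheet silently overwrites it.
    json_dict.update(settings)
    return json_dict


def check_root_name(json_dict: dict) -> None:
    # Validate whatever name survived, regardless of where it came from.
    name = json_dict["name"]
    if not IDENTIFIER_PATTERN.fullmatch(name):
        raise ValueError(f"Invalid primary instance name: {name!r}")


ok = build_json_dict("data", {"form_title": "Survey"})
check_root_name(ok)  # passes: the name is still "data"

overridden = build_json_dict("data", {"name": ":bad"})
try:
    check_root_name(overridden)
except ValueError as err:
    print(err)  # Invalid primary instance name: ':bad'
```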

- organising error messages

- organising error messages - add test to verify error raised, since no other test seems to check for this error case or message.

- organising error messages

- the modified tests pass anyway but would have asserted that each letter in the parameter was in the error e.g. ["M", "i", "s", "s"] - added type check and tests for error/warning contains/not-contains

- organising error messages

- organising error messages - reworded the error message to be consistent with the underlying XML "Name" rule/token, except that ODK form clients generally don't support a colon as the first character; it also doesn't mention non-ASCII Unicode letters, since that's probably too technical. - preceding code applies the additional entity dataset name rules that disallow names starting with a period or two underscores.

- organising error messages - preceding code applies additional entity save_to name rules that disallow reserved words or names starting with two underscores.

- organising error messages - the replaced validation regex is pretty much the XML name rule, and the error message sounds like that was the intent (although the value is put into an attribute like `<value ref="VAL"/>`, not a tag).

- organising error messages

- organising error messages - in survey_element.py, the code that shows the exact offending character seems to have been added to check the form_name, which corresponds to the Survey.name attribute, which is used as the tag name of the primary instance element. When uploading an XLSForm to an online converter this name defaults to "data"; it is only set to the basename of the XLS/X file when using the CLI, or to some other value when using pyxform as a library. This code would not apply to most other names, since workbook_to_json (or other) code checks names before survey_element.py sees them. Also, the regex INVALID_XFORM_TAG_REGEXP is different to the rule in is_xml_tag. So the per-character check was removed, since it seems unlikely to be useful and is inconsistent with other name validation.

- organising error messages

- not caught by the relevant test, since it only requires the substring "must also specify an image".

- it's possible to set the primary instance root node name via the setting "name" (which overrides other ways to set the name), so check the settings code path as well.
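The `error__contains` problem noted in these commits comes from Python treating a `str` as an iterable of characters. A minimal sketch of the type guard, assuming a hypothetical helper `assert_error_contains` rather than pyxform's actual test utility:

```python
from collections.abc import Iterable


def assert_error_contains(error_text: str, expected: Iterable[str]) -> None:
    """Hypothetical test helper: every expected fragment must appear in error_text."""
    # A bare str is iterable too, so "Missing" would otherwise be checked
    # letter by letter ("M", "i", "s", "s", ...), which proves very little.
    if isinstance(expected, str):
        raise TypeError("expected must be a list of strings, not a single str")
    for fragment in expected:
        assert fragment in error_text, f"{fragment!r} not found in {error_text!r}"


message = "[row : 2] Missing required column 'name'."
assert_error_contains(message, ["Missing required column"])  # passes

try:
    assert_error_contains(message, "Missing")  # rejected instead of checked per letter
except TypeError as err:
    print(err)
```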

lindsay-stevens · 2026-01-12T12:48:37Z

Force pushed to rebase on master and resolve merge conflicts (errors.py and test_choices_sheet.py).

lognaturel · 2026-01-29T04:49:25Z

that doesn't seem to be stated in the ODK XForms Specification

Not explicitly, you're right, that's deferred to the underlying specs, in this case going down to the XML spec. I think these two StackOverflow explanations make it clear that colons should not be allowed as leading characters: https://stackoverflow.com/questions/40445735/is-a-colon-a-legal-first-character-in-an-xml-tag-name/40447829

lognaturel

Thanks for breaking this down into easy commits -- the overall diff felt impossible to review meaningfully!

Overall having a central error message registry feels like a good idea in this context. I have a few notes inline you may or may not want to act on.

I have some uneasiness about the fact that tests compare against enum references, not the rendered user-facing strings. Formatting bugs could ship while all tests still pass, and we could fail to notice that an error message became wrong in certain contexts. Maybe for some really common or critical error messages we could assert on literal strings? I don't have any specific candidates in mind currently, but wanted to share this thought in case it gives you ideas.

pyxform/errors.py

tests/test_choices_sheet.py

pyxform/survey_element.py


lindsay-stevens · 2026-01-29T17:23:34Z

I have some uneasiness about the fact that tests compare against enum references, not the rendered user-facing strings.

Agreed, it has that potential weakness, but each of the tests uses the same context items, so the risks seem to be a) there is an unused token in the template which makes the result look weird, or b) checking the rendered string with a test case during development was overlooked. Maybe each message could have a "rendering check" test to at least casually confirm what it looks like with realistic context values. Also, the _ErrorFormatter puts "unknown" in place of any missing token values, so for problem a) the user sees a somewhat cryptic but not totally broken-looking message rather than a runtime error.
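A minimal sketch of the "unknown" fallback together with a rendering check, assuming a `dict` subclass with `__missing__` passed to `str.format_map`. `_UnknownDefaults` and `render` are hypothetical names; pyxform's `_ErrorFormatter` may be implemented differently (for example as a `string.Formatter` subclass).

```python
class _UnknownDefaults(dict):
    """Context mapping that returns "unknown" for any token the caller omitted."""

    def __missing__(self, key: str) -> str:
        return "unknown"


def render(template: str, **context: str) -> str:
    # str.format_map looks tokens up via __getitem__, so a missing key falls
    # through to __missing__ instead of raising KeyError.
    return template.format_map(_UnknownDefaults(context))


TEMPLATE = "[row : {row}] Invalid name '{value}'."

# Rendering check: assert the literal user-facing string.
assert render(TEMPLATE, row="4", value=":x") == "[row : 4] Invalid name ':x'."

# A forgotten context value degrades to "unknown" rather than crashing.
print(render(TEMPLATE, value=":x"))  # [row : unknown] Invalid name ':x'.
```

The rendering check is the kind of literal-string assertion suggested for common or critical messages, while the fallback keeps a missed token from becoming a runtime error.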

The current approach improves on the previous patterns of a) only checking part of the error/warning string, and therefore facing the same rendering problems as above, and b) only checking part of the message, which could also match some other message. By using assigned symbols we can more easily see exactly where an error/warning is used and tested, both for coverage and to avoid manually finding and updating any whole or partial error/warning messages that may be relevant to current work.

One of the next steps could be working on the language of the messages. So far the format I've been trying to follow is roughly "This is where the problem is. This is why it's a problem. This is what you can do to fix it.", sort of like how a Python traceback gives context, details, and tips in that order. Many messages have been collected in this PR, but there are still tons inline amongst the code, and perhaps they could be reviewed for consistency with Central/Collect/Webforms message phrasing. Maybe that could all be a follow-up task? It could also be extended to localisation.
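The three-part format could be captured structurally so that every message has the same shape. This is a hypothetical sketch: `Message`, its field names, and the example wording are invented for illustration and are not messages from pyxform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical message shape: where, why, and how to fix."""

    context: str  # where the problem is
    details: str  # why it is a problem
    tip: str  # what the user can do to fix it

    def render(self, **values: str) -> str:
        # Format each part with the same values, then join them in order.
        parts = (self.context, self.details, self.tip)
        return " ".join(part.format(**values) for part in parts)


LEADING_COLON = Message(
    context="[row : {row}] On the 'survey' sheet, the name '{value}' is invalid.",
    details="Names must not start with a colon.",
    tip="Start the name with a letter or an underscore.",
)

print(LEADING_COLON.render(row="5", value=":age"))
```

Keeping the parts separate would also make it easier to review phrasing for consistency or hand individual parts to a translation step later.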

lindsay-stevens · 2026-01-29T19:29:15Z

Thanks!

lindsay-stevens force-pushed the pyxform-747 branch from 9065ec2 to 3e497c9 on December 17, 2025 15:59

lindsay-stevens marked this pull request as ready for review December 17, 2025 16:11

lindsay-stevens requested a review from lognaturel December 17, 2025 16:11

lindsay-stevens added 26 commits January 12, 2026 23:26

chg: move NAMES001 error into ErrorCode enum

86e7479

- organising error messages

chg: move NAMES002 error into ErrorCode enum

c443766

- organising error messages

chg: move NAMES003 error into ErrorCode enum

b8c5265

- organising error messages

chg: move NAMES004 error into ErrorCode enum

15b2283

- organising error messages

chg: move NAMES005 error into ErrorCode enum

c33241e

- organising error messages

chg: move choices.INVALID_NAME error into ErrorCode enum

c0312b1

- organising error messages - add test to verify error raised, since no other test seems to check for this error case or message.

chg: move choices.INVALID_LABEL error into ErrorCode enum

7342b89

- organising error messages

chg: move choices.INVALID_DUPLICATE error into ErrorCode enum

dbeb2c1

- organising error messages

chg: move sheet_headers.INVALID_HEADER error into ErrorCode enum

0fa7cd9

- organising error messages

fix: tests providing single strings to error__contains

555a511

- the modified tests pass anyway but would have asserted that each letter in the parameter was in the error e.g. ["M", "i", "s", "s"] - added type check and tests for error/warning contains/not-contains

chg: move sheet_headers.INVALID_DUPLICATE error into ErrorCode enum

76c1bc9

- organising error messages

chg: move sheet_headers.INVALID_MISSING_REQUIRED error into ErrorCode

f9a7c74

- organising error messages

chg: move choices.INVALID_HEADER error into ErrorCode enum

318ff0b

- organising error messages

chg: sort ErrorCode enum

f37d857

chg: move xls2json.SURVEY_001 error into ErrorCode enum

abc6d11

- organising error messages

chg: move xls2json.SURVEY_002 error into ErrorCode enum

c2969e3

- organising error messages

chg: move entities_parsing.ENTITY_001 error into ErrorCode enum

cd26763

- organising error messages

chg: move entities_parsing.ENTITY_002 error into ErrorCode enum

10e2dff

- organising error messages

chg: move entities_parsing.ENTITY_003 error into ErrorCode enum

cbe3a0d

- organising error messages

chg: move entities_parsing.ENTITY_004 error into ErrorCode enum

d498e59

- organising error messages

chg: move entities_parsing.ENTITY_005 error into ErrorCode enum

751a838

- organising error messages

chg: move entities_parsing.ENTITY_006 error into ErrorCode enum

6d18dc5

- organising error messages

chg: move entities_parsing.ENTITY_007 error into ErrorCode enum

3d3fb24

- organising error messages

chg: replace entity save_to name error with same ErrorCode.NAMES_008

1ab9d88

- organising error messages - preceding code applies additional entity save_to name rules that disallow reserved words or names starting with two underscores.
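The ordering described in these commits (entity-specific rules first, then the shared identifier rule) can be sketched as follows. Assumptions: the regex is an ASCII-only approximation of the name rule, the returned phrases are invented, and `HYPOTHETICAL_RESERVED` is a placeholder rather than pyxform's actual list of reserved save_to words.

```python
import re
from typing import Optional

NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.:\-]*")

# Placeholder reserved words for illustration only; not pyxform's actual list.
HYPOTHETICAL_RESERVED = frozenset({"name", "label"})


def dataset_name_problem(value: str) -> Optional[str]:
    # Entity-specific rules run first so the user gets the most specific message.
    if value.startswith("."):
        return "must not start with a period"
    if value.startswith("__"):
        return "must not start with two underscores"
    # Then the shared identifier rule used for other names.
    if not NAME_PATTERN.fullmatch(value):
        return "is not a valid name"
    return None


def save_to_problem(value: str) -> Optional[str]:
    if value in HYPOTHETICAL_RESERVED:
        return "is a reserved word"
    if value.startswith("__"):
        return "must not start with two underscores"
    if not NAME_PATTERN.fullmatch(value):
        return "is not a valid name"
    return None


print(dataset_name_problem(".trees"))  # must not start with a period
print(save_to_problem("label"))  # is a reserved word
```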

lindsay-stevens added 7 commits January 12, 2026 23:38

chg: replace workbook_to_json name error with same ErrorCode.NAMES_008

0069f41

- organising error messages

chg: move entity name underscores error ErrorCode enum

a06d685

- organising error messages

chg: move entity name period error ErrorCode enum

06607da

- organising error messages

chg: move entity name save_to reserved words error ErrorCode enum

83ca653

- organising error messages

fix: missing f-string prefix on big-image error message

d42a43c

- not caught by the relevant test, since it only requires the substring "must also specify an image".

add: check the "name" setting as well as "form_name"

41ac57b

- it's possible to set the primary instance root node name via the setting "name" (which overrides other ways to set the name), so check the settings code path as well.
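The missing f-string prefix fixed in commit d42a43c is a common Python slip, and it shows why a substring check let it through. The wording below is invented; only the substring "must also specify an image" comes from the PR.

```python
question = "site_photo"

# Without the f prefix the braces are emitted literally instead of being
# replaced by the variable's value.
broken = "Question '{question}' uses big-image, so it must also specify an image."
fixed = f"Question '{question}' uses big-image, so it must also specify an image."

# A substring assertion passes for both strings, so it cannot catch the bug...
assert "must also specify an image" in broken
assert "must also specify an image" in fixed

# ...whereas comparing the full rendered message does.
assert fixed == "Question 'site_photo' uses big-image, so it must also specify an image."
assert broken != fixed
print(broken)
```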

lindsay-stevens force-pushed the pyxform-747 branch from 51e2de4 to 41ac57b on January 12, 2026 12:42

lognaturel reviewed Jan 29, 2026


lindsay-stevens added 3 commits January 30, 2026 03:54

add: comment explaining recently added test for choice name validation

c53b8c1

add: docstrings to explain intended usage of ErrorCode and Detail

187b35d

chg: make format and casing of errorcode names more consistent

05f4fae

- format approximately "topic name - description"

lindsay-stevens requested a review from lognaturel January 29, 2026 17:29

lognaturel approved these changes Jan 29, 2026


lognaturel merged commit 122548c into XLSForm:master Jan 29, 2026
14 checks passed

lindsay-stevens deleted the pyxform-747 branch January 29, 2026 19:28

lognaturel merged 36 commits into XLSForm:master from lindsay-stevens:pyxform-747
2 participants