Skip to content

wit-component, wit-parser: preserve doc comments through embed path#2532

Open
Mees-Molenaar wants to merge 4 commits into
bytecodealliance:mainfrom
Mees-Molenaar:preserve-wit-docs-through-embed
Open

wit-component, wit-parser: preserve doc comments through embed path#2532
Mees-Molenaar wants to merge 4 commits into
bytecodealliance:mainfrom
Mees-Molenaar:preserve-wit-docs-through-embed

Conversation

@Mees-Molenaar
Copy link
Copy Markdown

For our use case, we want to store some metadata in comments in the wit file. According to the documentation, this should be available. However, we encountered that the embedding path (which is used by wit-bindgen) did not include the comments. This PR fixes that.

@Mees-Molenaar Mees-Molenaar requested a review from a team as a code owner May 28, 2026 13:06
@Mees-Molenaar Mees-Molenaar requested review from dicej and removed request for a team May 28, 2026 13:06
@alexcrichton alexcrichton requested review from alexcrichton and removed request for dicej May 28, 2026 14:26
@alexcrichton
Copy link
Copy Markdown
Member

Thanks! Given the test failures I think it might make sense to ignore errors when decoding this section rather than propagating them up? That looks to be masking other errors and/or causing minor issues.

@Mees-Molenaar
Copy link
Copy Markdown
Author

Thanks for quick review, I agree with your comment.
I have adjusted it, is this what you had in mind?

I only see that an offset is now not matching because of an offset. Is that something I need to commit, or do I need to change the implementation so that the offset keeps the same?

@alexcrichton
Copy link
Copy Markdown
Member

Oh offsets changing is fine, that's because there's more stuff in the wasm module now the docs). You can fix that by running the test suite with BLESS=1 for that. Also, could you add some comments for why the errors are ignored here?

@Mees-Molenaar Mees-Molenaar force-pushed the preserve-wit-docs-through-embed branch from e3a8083 to db517c1 Compare June 1, 2026 07:03
@Mees-Molenaar
Copy link
Copy Markdown
Author

Added a comment specifying that it is best effort and only drops the documentation when it fails and commited both offset changes.

@Mees-Molenaar
Copy link
Copy Markdown
Author

Mees-Molenaar commented Jun 1, 2026

I did a sanity check with building a wasm component using cargo and the using wasm-tools component wit to show the comments. However, they didnt show up.

Figured out (together with an AI) that the last commit was necessary, however, I am having more trouble understanding the changes now.

Currently I kow that when building a wasm component with cargo this happens:
wit_bindgen::generate!() --calls--> wit_component::metadata::encode(), therefore the changes in metadata.rs were added to add the docs.

Then wasm-component-ld is called to link. This calls wit_component::metadata::decode() so there we inject metadata (in this case docs). And it also calles ComponentEncoder::encode() inside encoding.rs to add the docs sections to the wasm output.

Then finally we use wasm-tools component wit <wasm> to get the docs in a wit file. That calls wit_parser::decoding::decode()( to get the docs.

And the encoding.rs and decoding.rs both use metadata.rs under the hood, so I guess the changes are correct and necessary for this use case.

Any thoughts on this?

@alexcrichton
Copy link
Copy Markdown
Member

Ah ok yeah I think I see what's going on here, thanks for digging in! This reveals a few things to me and thoughts:

  • One is that the documentation section right now is centered around the documentation for just one package itself and nothing else. This technically isn't appropriate when encoding the documentation for a bindgen-generated world because the the world might refer to other packages. To retain the full set of documentation it'd have to go and pull in docs from other packages as well. For example the wasi:http/service world refers to wasi:random/imports, but won't include documentation for wasi:random interfaces. This technically means that the encoding in crates/wit-component/src/metadata.rs needs to actually find all reachable packages and encode documentation for all of them.
  • As you've found, now that there's the possibility of multiple packages being documented in the same component, the documentation blob needs some sort of identifier to indicate what it's documenting. This rises from either bindings generation for mulitple worlds being union'd together or the case I'm talking about above where documentation for one world may require multiple packages.
  • The error-ing paths in crates/wit-parser/src/metadata.rs, and how this PR starts ignoring some errors, I think all need adjusting now. Due to the pruning behavior of wit-component it's natural to have missing items. For example I think all errors about missing worlds/items/interfaces/functions/etc should all basically go away and be "skip injecting this documentation". That way if the one function is missing it doesn't mean that documentation is skipped for the next function if it's present.

Overall this is unfortunately turning a bit more gnarly than intended. Given all of this there's a few discrete items that need to happen, much of which this PR already covers but I think we'll want to formalize it a bit more:

  • The custom section with docs should include the package name. Omitting it will now be considered "legacy". If the package name is present that should unconditionally be used, not the argument to the inject function. I think the PackageId argument to inject should become optional and it's only present on some paths when the single package is unequivocally known.
  • Errors about injecting docs need to be redesigned. More-or-less they should largely go away which won't require ignoring errors throughout the codebase here. Missing items effectively are no longer fatal, but decoding errors (like json syntax) would still be fatal.
  • As this PR implements, decoding docs now needs to possibly decode/inject/handle multiple sections.
  • When encoding documentation for a world it needs to handle all packages that the world references, not just the package with the world itself. This means that extracting the documentation might extract multiple separate sections.

One possibly unfortunate wrench to throw into all of this is compatibility with other versions of encoding documentation. The struct has deny_unknown_fields, for example, so it's not necessarily going to be easy to add the package name in there. That'll require a bit of a phased transition while newer versions of wasm-tools percolate throughout the ecosystem. For example encoding wit-as-wasm should still leave off the package name (for compat with existing use cases), and only encoding docs in components (or for worlds) would start including the package name. Later the package name will get unconditionally emitted, but if that were done now then it would break compat with previous wasm-tools versions.


I realize that I suspect you didn't think this was going to become such a broad feature, so if you'd prefer to not take this on that's totally understandable. I feel like this is at the very least a good exploration and this can all be recorded in an issue if needed. Would you be interested in still pursuing this though?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants