
Adds module support of includes for Rust code generation#8563

Open
csmulhern wants to merge 5 commits into google:master from csmulhern:master


@csmulhern
Contributor

@csmulhern csmulhern commented Mar 22, 2025

Adds a new option syntax to FlatBuffers schemas. Its first use is setting a rust_module for a schema file, which supports mapping included file paths to existing modules that have already been compiled for those schemas.

For more background and discussion on this approach, see: https://docs.google.com/document/d/1E1eiGlZ0DHMw5a5PvLtVffdfXHz_3Mp0_W4HAjQRPM0/edit.

This is useful for module-based languages (e.g. Swift, Rust), because it allows schemas to be compiled into different units (modules) while still referencing one another.

For more background on what motivated this change, see the discussion in #8273.

Fixes #8273.

@github-actions github-actions bot added the c++, codegen, python, and rust labels Mar 22, 2025
@csmulhern csmulhern force-pushed the master branch 2 times, most recently from b9695b0 to e4bfc82 March 22, 2025 04:54
@csmulhern csmulhern force-pushed the master branch 2 times, most recently from 00f566c to b6974c0 March 22, 2025 21:01
@csmulhern
Contributor Author

Hey @dbaileychess @aardappel, any guidance on how I can get this reviewed?

@aardappel
Collaborator

@CasperN can you have a look?

@csmulhern
Contributor Author

@dbaileychess is there anyone who can look at this?

@csmulhern csmulhern force-pushed the master branch 2 times, most recently from c2b50bc to a2fc889 December 5, 2025 16:02
@csmulhern
Contributor Author

@aardappel I've tried to make this more reviewable, if you're up for some slightly more complex Rust changes.

The first commit simply adds a new --module-mapping argument to flatc, which allows users to map .fbs include paths to "module names".

The second commit updates the Rust code generator to use these mappings when resolving types. The only change in code generation here is changing how types are referenced in the generated code. All existing code generation that doesn't use the --module-mapping flag is unchanged.

For example, imagine something like:

// schema/foo.fbs

table Foo {
  value: string;
}

// schema/bar.fbs
include "schema/foo.fbs";

table Bar {
  value: Foo;
}

Normally, the generated code for Bar would just reference type Foo, including any namespacing from the schema file. With module mapping, you can pass --module-mapping schema/foo.fbs=foo_module when generating code for schema/bar.fbs, and the generated code for Bar will then reference Foo as ::foo_module::Foo instead. This allows generated schema code to reference other generated schema code, even when that code is compiled across multiple modules. I have some detailed examples and discussion of this approach in #8273, including how this is handled for module-based languages in the Protobuf ecosystem.
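To make the lookup concrete, here is a minimal Rust sketch of how a generator could turn a mapping into a type path. The function `rust_type_path` and its signature are hypothetical, not flatc's actual code generator (which is written in C++); it only illustrates the rule described above: a type from a mapped schema file gets a path anchored at `::<module>`, while an unmapped type keeps its namespace-relative path.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not flatc's real code): build the Rust path used to
/// reference a type, given the schema file that defines it and its namespace.
/// `module_map` maps include paths (as given to --module-mapping) to modules.
pub fn rust_type_path(
    module_map: &HashMap<String, String>,
    defining_file: &str,
    namespace: &[&str],
    type_name: &str,
) -> String {
    // Namespace components become nested Rust modules, followed by the type.
    let mut segments: Vec<&str> = namespace.to_vec();
    segments.push(type_name);
    let relative = segments.join("::");

    match module_map.get(defining_file) {
        // Mapped file: anchor the path at the root of the external module.
        Some(module) => format!("::{}::{}", module, relative),
        // Unmapped file: keep the namespace-relative path, as without the flag.
        None => relative,
    }
}
```

With the mapping `schema/foo.fbs=foo_module`, `Foo` resolves to `::foo_module::Foo`, while `Bar` from the unmapped `schema/bar.fbs` stays `Bar`.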

The third commit adds a Rust test for this new functionality.

@aardappel
Collaborator

Not being familiar with the module requirements of these languages, I don't follow why this is necessary. I'd assume Foo would sit in a namespace foo_module, and that bar.fbs includes foo.fbs, and that then this all just works as intended without command-line arguments. Where does this break, exactly?

// Contains a mapping from schema names to module names.
struct ModuleMap {
  // The type used to represent a schema name.
  using Schema = std::string;
Collaborator


As much as I agree that distinguishing different kinds of strings is a good thing, this is not a style we use in most of the code afaik, so using it in only a few spots does not help readability: a reader may see Schema, expect it to be a complex type, and be rather surprised that it is actually just a std::string. To me this kind of coding style only works well if applied consistently.

Copy link
Contributor Author


Agreed, and I am happy to remove this. I'm still not super familiar with the flatbuffers codebase, and was just emulating the style I saw here:

struct ProtobufToFbsIdMap {

Copy link
Collaborator


Ah yes, I thought it must have crept in somewhere :)

@csmulhern
Contributor Author

csmulhern commented Dec 5, 2025

Not being familiar with the module requirements of these languages, I don't follow why this is necessary. I'd assume Foo would sit in a namespace foo_module, and that bar.fbs includes foo.fbs, and that then this all just works as intended without command-line arguments. Where does this break, exactly?

The problem is that the module a piece of code belongs to is an essential part of its namespace, independent of both the file path and the namespace within the module. For example, take the type foo::bar::baz::Qux...

In C++ this is all self contained:

namespace foo {
namespace bar {
namespace baz {
  struct Qux { /* members */ };
}  // namespace baz
}  // namespace bar
}  // namespace foo

// In any compilation unit, you can reference foo::bar::baz::Qux

But in Rust (and e.g. Swift), the module (or crate in Rust parlance) is the compilation unit, containing many sources.

// crate = foo

pub mod bar {
  pub mod baz {
    pub struct Qux { /* fields */ }
  }
}

// Inside this crate, you can refer to Qux as crate::bar::baz::Qux, where crate
// is a special path keyword that refers to the root of the crate this source
// file belongs to. Inside another crate, e.g. waldo, the path to refer to Qux
// is <crate_name>::path::to::Qux, or in this case, foo::bar::baz::Qux. (The
// modules and the struct must be pub to be visible from another crate.)

If you just typed bar::baz::Qux, the compiler would look for baz::Qux inside a crate named bar. Basically, every type is namespaced into the crate that defines it. The problem is how to convey the top-level namespace of that crate (which, in more generic terms, I'm calling a module). It's not part of the FlatBuffers schema. It's not part of the namespace declaration inside the schema. It's not part of the file path used to include the schema. It's an artifact of how you choose to chunk your source files into modules.

In C++, there's no impact to splitting code into many libraries. E.g. I'm sure you're familiar with the common pattern at Google of using one cc_library target per source file. Every file can modify the global namespace. In Rust and Swift, unfortunately, that boundary is implicitly a namespace. So if you try to split flatbuffer source with dependencies that span across modules, you have a problem.

For example, take the example from my original post (a type Bar that depends on a type Foo from another schema) and split it into two compilation units like so:

fbs_library(
  name = "foo",
  srcs = ["foo.fbs"],
)

rust_fbs_library(
  name = "foo_rust",
  deps = [":foo"]
)

fbs_library(
  name = "bar",
  srcs = ["bar.fbs"],
  deps = [":foo"],
)

rust_fbs_library(
  name = "bar_rust",
  deps = [":bar"],
  rust_deps = [":foo_rust"],
)

The only way to refer to Foo from Bar is as foo_rust::Foo.
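A single-file approximation of the two crates above, for illustration only: each `pub mod` stands in for one crate, and the structs are simplified hand-written stand-ins rather than real flatc output (generated tables wrap a flatbuffer and carry lifetimes). In a real two-crate build the field type would be written `::foo_rust::Foo`; inside one file, `crate::foo_rust::Foo` plays that role.

```rust
// Stand-in for the `foo_rust` crate (simplified; not flatc output).
pub mod foo_rust {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Foo {
        pub value: String,
    }
}

// Stand-in for the `bar_rust` crate, which depends on `foo_rust`.
pub mod bar_rust {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Bar {
        // The path must name the crate that defines Foo. Nothing in bar.fbs
        // says "foo_rust"; that name comes only from the build setup.
        pub value: crate::foo_rust::Foo,
    }
}
```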

The approach taken here is to use --module-mapping PATH=MODULE to tell flatc that types inside the schema at PATH are inside the top level namespace MODULE.
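A minimal sketch of how repeated `--module-mapping PATH=MODULE` arguments could be collected into a lookup table. The function `parse_module_mappings` is hypothetical and does not reflect flatc's real argument parser; it assumes the flag and its value arrive as separate arguments, as in the example earlier in this thread.

```rust
use std::collections::HashMap;

/// Hypothetical parser for repeated `--module-mapping PATH=MODULE` flags.
/// Arguments other than --module-mapping and its value are ignored here.
pub fn parse_module_mappings(args: &[&str]) -> Result<HashMap<String, String>, String> {
    let mut map = HashMap::new();
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if *arg != "--module-mapping" {
            continue;
        }
        // The flag's value is the next argument.
        let value = iter
            .next()
            .ok_or_else(|| "--module-mapping requires PATH=MODULE".to_string())?;
        // Split at the last '=': module names cannot contain '=', so any
        // earlier '=' is treated as part of the path.
        let (path, module) = value
            .rsplit_once('=')
            .ok_or_else(|| format!("expected PATH=MODULE, got {value:?}"))?;
        if path.is_empty() || module.is_empty() {
            return Err(format!("expected PATH=MODULE, got {value:?}"));
        }
        // A later mapping for the same path replaces an earlier one.
        map.insert(path.to_string(), module.to_string());
    }
    Ok(map)
}
```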

To be very concrete, you can reference the test I added in the third commit.

tests/rust_module_test/module_b.fbs defines a schema that depends on the type Foo from tests/rust_module_test/module_a.fbs.

The dependency between the Rust code is managed through the build tool Cargo, using this file: https://github.com/csmulhern/flatbuffers/blob/a2fc889a4c4a6b4f9cd7532e49574429484e5f34/tests/rust_module_test/b/Cargo.toml.

Then within the Rust code for module b, we refer to Foo using the module name: https://github.com/csmulhern/flatbuffers/blob/a2fc889a4c4a6b4f9cd7532e49574429484e5f34/tests/rust_module_test/b/module_b_generated.rs#L56.

I really appreciate you taking the time to have a look at this. Let me know if there's any other details I can add color to.

@aardappel
Collaborator

Thanks so much for educating me :)

So I am learning that the design of FlatBuffers schemas is biased towards languages where namespaces and files are orthogonal concerns. Not surprising, since it started with C++.

Also, theoretically it could be made to work directly with Rust, but only if:

  • Empty namespaces are forbidden.
  • Revisiting/reopening a namespace in multiple schema files is forbidden.
  • The code generators use the top level namespace as the crate name.
  • You are supposed to output all schemas/crates in a single dir so they are parallel to each other.
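The naming rule in the list above can be sketched as follows. The Rust function is hypothetical and covers only the first and third bullets: it uses the first component of a dot-separated FlatBuffers namespace as the crate name and rejects an empty namespace, which leaves nothing to use as a crate name. The other two bullets concern schema layout and output directories and are not modeled.

```rust
/// Hypothetical: derive an absolute Rust path by treating the first
/// component of a FlatBuffers namespace (e.g. "foo.bar.baz") as the crate.
pub fn path_from_namespace(namespace: &str, type_name: &str) -> Result<String, String> {
    let parts: Vec<&str> = namespace.split('.').filter(|p| !p.is_empty()).collect();
    // An empty namespace has no component to serve as the crate name.
    let (crate_name, rest) = parts
        .split_first()
        .ok_or_else(|| format!("type {type_name} has no namespace, so no crate name"))?;
    let mut path = format!("::{crate_name}");
    // Remaining namespace components become modules inside that crate.
    for module in rest {
        path.push_str("::");
        path.push_str(module);
    }
    path.push_str("::");
    path.push_str(type_name);
    Ok(path)
}
```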

And I guess that is too much of a restriction given the amount of Rust FlatBuffers already in the wild, hence we need a workaround like this module mapping?

I guess the only remaining question is whether this module mapping would be better specified in the schema, and if so, whether it can be the same for all module-based languages. Any reason the command line is preferable?

mapping schema/foo.fbs = foo_module;

@csmulhern
Contributor Author

Thanks for the engagement. I agree with your breakdown. I think this point is the most problematic:

You are supposed to output all schemas/crates in a single dir so they are parallel to each other.

I think this limitation isn't super great either:

The code generators use the top level namespace as the crate name.

I spent a bit of time writing up a survey of what the solution space might look like here. See: https://docs.google.com/document/d/1E1eiGlZ0DHMw5a5PvLtVffdfXHz_3Mp0_W4HAjQRPM0/edit?tab=t.0

After going through that exercise, I think I agree with you that putting this in the schema file itself might be the better solution. If we can align on the best approach here, I'm happy to update this PR to be in line with that.

I think the open questions would be:

  1. What syntax do we want to use to specify our custom schema data?

My feeling here is that something like "option" style syntax (see my doc) may be more extensible to other future use cases, rather than a "module" keyword.

  2. How generic vs. specific do we want to be in our approach? Should we try to be generic over "module-based languages", or just go language-specific and perhaps accept some redundancy?

My feeling here is that language-specific is probably the least likely to run into future issues where we've painted ourselves into a corner.

@aardappel
Collaborator

Thanks for spelling it out in the doc. My sense is that needing a whole string of command-line args for code to generate correctly is not great, though I see that this is the most flexible.

Specifying it in the schema seems pretty good to me, and I guess it would be fine to do it per language. We don't have such a syntax yet, so you'll have to invent one, and your option syntax seems good.

You could implement this in a very generic way where option foo = "bar" and --foo="bar" are both possible, and the latter overrides the former if present.
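A sketch of that override rule, assuming an in-schema syntax of the form `option NAME = "VALUE";` with one option per line. Both the parsing and the `rust_module` key used in the usage example are assumptions for illustration; this is not the grammar implemented in the PR. Command-line values are merged last so they replace schema values with the same name.

```rust
use std::collections::HashMap;

/// Hypothetical: collect `option NAME = "VALUE";` lines from schema text.
/// Lines that do not start with "option " are ignored.
pub fn parse_schema_options(schema: &str) -> HashMap<String, String> {
    let mut options = HashMap::new();
    for line in schema.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("option ") {
            if let Some((name, value)) = rest.trim_end_matches(';').split_once('=') {
                // Strip surrounding whitespace and quotes from the value.
                let value = value.trim().trim_matches('"');
                options.insert(name.trim().to_string(), value.to_string());
            }
        }
    }
    options
}

/// Command-line options (e.g. a hypothetical --rust_module=...) override
/// options declared in the schema; everything else is kept as declared.
pub fn effective_options(
    schema_opts: &HashMap<String, String>,
    cli_opts: &HashMap<String, String>,
) -> HashMap<String, String> {
    let mut merged = schema_opts.clone();
    merged.extend(cli_opts.iter().map(|(k, v)| (k.clone(), v.clone())));
    merged
}
```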

This syntax currently only supports setting the Rust module associated with a schema file.
@github-actions github-actions bot added the java label Mar 7, 2026
@csmulhern csmulhern changed the title Adds module mapping of includes for Rust code generation Adds module support of includes for Rust code generation Mar 7, 2026
@csmulhern
Contributor Author

I've updated this PR to implement the option syntax we've discussed.

cc @jtdavis777 as well.


Development

Successfully merging this pull request may close these issues.

[Rust] Add better support for "crate-per-schema"
