Adds module support of includes for Rust code generation by csmulhern · Pull Request #8563 · google/flatbuffers

csmulhern · 2025-03-22T04:51:39Z

Adds a new option syntax to flatbuffers. The first usage allows setting a rust_module for the schema file, to support mapping included file paths to existing modules that have been compiled for those schemas.

For more background and discussion on this approach, see: https://docs.google.com/document/d/1E1eiGlZ0DHMw5a5PvLtVffdfXHz_3Mp0_W4HAjQRPM0/edit.

This is useful for module based languages (e.g. Swift, Rust), as it allows schemas to be compiled into different units (modules) while still referencing one another.

For more background on what motivated this change, see the discussion in #8273.

Fixes #8273.

csmulhern · 2025-06-20T00:54:18Z

Hey @dbaileychess @aardappel, any guidance on how I can get this reviewed?

aardappel · 2025-06-22T05:35:44Z

@CasperN can you have a look?

csmulhern · 2025-10-09T22:29:06Z

@dbaileychess is there anyone who can look at this?

csmulhern · 2025-12-05T16:51:46Z

@aardappel I've tried to make this more reviewable, if you're up for some slightly more complex Rust changes.

The first commit is simply adding a new --module-mapping argument to flatc, that allows users to map .fbs include paths to "module names".

The second commit updates the Rust code generator to use these mappings when resolving types. The only change in code generation here is changing how types are referenced in the generated code. All existing code generation that doesn't use the --module-mapping flag is unchanged.

For example, imagine something like:

// schema/foo.fbs

table Foo {
  value: string;
}

// schema/bar.fbs

include "foo.fbs";

table Bar {
  value: Foo;
}

Normally, the generated code for Bar would just reference type Foo, including any namespacing from the schema file. With module mapping, you can say --module-mapping schema/foo.fbs=foo_module when generating code for schema/bar.fbs, and then the generated code for Bar will reference Foo as ::foo_module::Foo instead. This allows generated schema code to reference other generated schema code, even if the generated code is compiled across multiple modules. I have some detailed examples and discussion of this approach in #8273, including a discussion of how this is handled for module-based languages in the Protobuf ecosystem.
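As a rough illustration of the behaviour described above, and not the PR's actual generator code, the lookup could be sketched like this. The function name `rust_type_path`, the map layout, and the simplified unmapped fallback are all assumptions made for this example.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns the Rust path used to reference `type_name`,
/// a type defined in the schema file `defining_schema`.
///
/// `mappings` stands in for the `--module-mapping PATH=MODULE` arguments.
pub fn rust_type_path(
    mappings: &HashMap<String, String>,
    defining_schema: &str,
    type_name: &str,
) -> String {
    match mappings.get(defining_schema) {
        // The schema was compiled into another module: emit an absolute path.
        Some(module) => format!("::{}::{}", module, type_name),
        // No mapping: reference the type as before (namespacing omitted here).
        None => type_name.to_string(),
    }
}
```

With a mapping from schema/foo.fbs to foo_module, calling `rust_type_path(&mappings, "schema/foo.fbs", "Foo")` yields `::foo_module::Foo`, while unmapped schemas keep their existing reference.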

The third commit adds a Rust test for this new functionality.

aardappel · 2025-12-05T19:07:44Z

Not being familiar with the module requirements of these languages, I don't follow why this is necessary. I'd assume Foo would sit in a namespace foo_module, and that bar.fbs includes foo.fbs, and that then this all just works as intended without command-line arguments. Where does this break, exactly?

aardappel · 2025-12-05T19:12:01Z

include/flatbuffers/idl.h

+// Contains a mapping from schema names to module names.
+struct ModuleMap {
+  // The type used to represent a schema name.
+  using Schema = std::string;


As much as I could agree that distinguishing different kinds of strings is a good thing, this is not a style we use in most of the code afaik, so it being used in only few spots is not helping readability: a reader may see Schema and expect it to be a complex type and be rather surprised it is actually just a std::string. This to me seems the kind of coding style that only works well if applied consistently.

Agreed, and I am happy to remove this. I'm still not super familiar with the flatbuffers codebase, and was just emulating the style I saw here:

flatbuffers/src/idl_gen_fbs.cpp, line 192 in cfce38e:

struct ProtobufToFbsIdMap {

Ah yes, I thought it must have crept in somewhere :)

csmulhern · 2025-12-05T21:50:54Z

Not being familiar with the module requirements of these languages, I don't follow why this is necessary. I'd assume Foo would sit in a namespace foo_module, and that bar.fbs includes foo.fbs, and that then this all just works as intended without command-line arguments. Where does this break, exactly?

The problem is that the module that code is a part of is an essential part of the namespace, independent of the file path, and independent of the namespace within a module. For example, with type foo::bar::baz::Qux...

In C++ this is all self contained:

namespace foo {
namespace bar {
namespace baz {
  struct Qux { ... };
}  // namespace baz
}  // namespace bar
}  // namespace foo

// In any compilation unit, you can reference foo::bar::baz::Qux

But in Rust (and e.g. Swift), the module (or crate in Rust parlance) is the compilation unit, containing many sources.

// crate = foo

pub mod bar {
  pub mod baz {
    pub struct Qux { ... }
  }
}

// Inside this crate, you can refer to Qux as crate::bar::baz::Qux, where crate
// is a special keyword that resolves to the name of the crate this source file
// belongs to. Inside another crate, e.g. waldo, the path to refer to Qux is
// <crate_name>::path::to::Qux, or in this case, foo::bar::baz::Qux.

If you just typed bar::baz::Qux, the compiler would be looking for baz::Qux inside a crate named bar. Basically, all types are namespaced into the crate that defines that type. The problem is how to convey the top-level namespace of that crate (I'm using the word module as the more generic term). It's not part of the flatbuffer schema. It's not part of the package declaration inside the schema. It's not part of the file path to import the schema. It's an artifact of how you choose to chunk your source files into modules.

In C++, there's no impact to splitting code into many libraries. E.g. I'm sure you're familiar with the common pattern at Google of using one cc_library target per source file. Every file can modify the global namespace. In Rust and Swift, unfortunately, that boundary is implicitly a namespace. So if you try to split flatbuffer source with dependencies that span across modules, you have a problem.

E.g. imagine the following where you take my example from the original post (a type Bar depends on a type Foo from another schema), and split it into two compilation units like so:

fbs_library(
  name = "foo",
  srcs = ["foo.fbs"],
)

rust_fbs_library(
  name = "foo_rust",
  deps = [":foo"]
)

fbs_library(
  name = "bar",
  srcs = ["bar.fbs"],
  deps = [":foo"],
)

rust_fbs_library(
  name = "bar_rust",
  deps = [":bar"],
  rust_deps = [":foo_rust"],
)

The only way to refer to Foo from Bar is as foo_rust::Foo.

The approach taken here is to use --module-mapping PATH=MODULE to tell flatc that types inside the schema at PATH are inside the top level namespace MODULE.
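To make the PATH=MODULE shape concrete, here is a minimal sketch of how one such argument value could be split. flatc itself is written in C++, so this Rust function is purely illustrative; the name `parse_module_mapping` and the choice to reject empty sides are assumptions, not the PR's behaviour.

```rust
/// Hypothetical parser for a single `PATH=MODULE` value.
/// Splits on the first '=' and returns `None` when the separator is
/// missing or when either side is empty.
pub fn parse_module_mapping(arg: &str) -> Option<(&str, &str)> {
    let (path, module) = arg.split_once('=')?;
    if path.is_empty() || module.is_empty() {
        return None;
    }
    Some((path, module))
}
```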

To be very concrete, you can reference the test I added in the third commit.

tests/rust_module_test/module_b.fbs defines a schema that depends on the type Foo from tests/rust_module_test/module_a.fbs.

The dependency between the Rust code is managed through the build tool Cargo, using this file: https://github.com/csmulhern/flatbuffers/blob/a2fc889a4c4a6b4f9cd7532e49574429484e5f34/tests/rust_module_test/b/Cargo.toml.

Then within the Rust code for module b, we refer to Foo using the module name: https://github.com/csmulhern/flatbuffers/blob/a2fc889a4c4a6b4f9cd7532e49574429484e5f34/tests/rust_module_test/b/module_b_generated.rs#L56.

I really appreciate you taking the time to have a look at this. Let me know if there are any other details I can add color to.

aardappel · 2025-12-07T18:37:52Z

Thanks so much for educating me :)

So I am learning that the design of FlatBuffers schemas is biased towards languages where namespaces and files are orthogonal concerns.. not surprising it started with C++.

Also, theoretically it could be made to work directly with Rust, but only if:

Empty namespaces are forbidden.
Revisiting/reopening namespaces in multiple schema files is forbidden.
The code generators use the top level namespace as the crate name.
You are supposed to output all schemas/crates in a single dir so they are parallel to each other.

And I guess that is too much of a restriction given the amount of Rust FlatBuffers already in the wild, hence we need a workaround like this module mapping?

I guess the only remaining question is whether this module mapping would be better specified in the schema, provided it can be the same for all module-based languages. Is there any reason the command line is preferable?

mapping schema/foo.fbs = foo_module;

csmulhern · 2025-12-08T01:29:48Z

Thanks for the engagement. I agree with your breakdown. I think this point is the most problematic:

You are supposed to output all schemas/crates in a single dir so they are parallel to each other.

I think this limitation isn't super great either:

The code generators use the top level namespace as the crate name.

I spent a bit of time writing up a survey of what the solution space might look like here. See: https://docs.google.com/document/d/1E1eiGlZ0DHMw5a5PvLtVffdfXHz_3Mp0_W4HAjQRPM0/edit?tab=t.0

After going through that exercise, I think I agree with you that putting this in the schema file itself might be the better solution. If we can align on the best approach here, I'm happy to update this PR to be in line with that.

I think the open questions would be:

What syntax do we want to use to specify our custom schema data?

My feeling here is that something like "option" style syntax (see my doc) may be more extensible to other future use cases, rather than a "module" keyword.

How generic vs specific do we want to be in our approach? Should we try to be generic over "module-based languages", or just go language-specific, and perhaps accept some redundancy.

My feeling here is that language-specific is probably the least likely to run into future issues where we've painted ourselves into a corner.

aardappel · 2025-12-17T07:39:35Z

Thanks for spelling it out in the doc. My sense is that needing a whole string of command-line args for code to generate correctly is not great, though I see that this is the most flexible.

Specifying it in the schema seems pretty good to me, and I guess it would be fine to do it per language. We don't already have such a syntax, so you'll have to invent one, and your option syntax seems good.

You could implement this in a very generic way where option foo = "bar" and --foo="bar" are both possible, and the latter overrides the former if present.
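The precedence rule suggested here (a command-line value overrides a schema-level option) could be sketched as below. This is only an illustration under assumed names: `effective_option` and the two string maps are hypothetical and do not reflect how flatc stores options.

```rust
use std::collections::HashMap;

/// Hypothetical lookup: a value given on the command line wins over the
/// value declared with `option` in the schema; `None` if neither sets it.
pub fn effective_option<'a>(
    schema_options: &'a HashMap<String, String>,
    cli_options: &'a HashMap<String, String>,
    key: &str,
) -> Option<&'a str> {
    cli_options
        .get(key)
        // Fall back to the schema only when the command line is silent.
        .or_else(|| schema_options.get(key))
        .map(String::as_str)
}
```

Checking the command-line map first keeps the schema value as a default that a build system can still override per invocation.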


csmulhern · 2026-03-07T17:35:15Z

I've updated this PR to implement the option syntax we've discussed.

cc @jtdavis777 as well.

github-actions bot added c++ codegen Involving generating code from schema python rust labels Mar 22, 2025

csmulhern force-pushed the master branch 2 times, most recently from b9695b0 to e4bfc82 Compare March 22, 2025 04:54

csmulhern mentioned this pull request Mar 22, 2025

[Rust] Add better support for "crate-per-schema" #8273

Open

csmulhern force-pushed the master branch 2 times, most recently from 00f566c to b6974c0 Compare March 22, 2025 21:01

csmulhern force-pushed the master branch from b6974c0 to b871720 Compare June 20, 2025 00:52

aardappel force-pushed the master branch from b871720 to cd58f8d Compare June 22, 2025 05:36

csmulhern force-pushed the master branch from cd58f8d to b871720 Compare July 29, 2025 00:23

csmulhern force-pushed the master branch from 7cd94c5 to b871720 Compare August 5, 2025 02:02

csmulhern force-pushed the master branch from b871720 to 35250fa Compare October 9, 2025 22:14

csmulhern force-pushed the master branch 2 times, most recently from c2b50bc to a2fc889 Compare December 5, 2025 16:02

aardappel reviewed Dec 5, 2025


csmulhern force-pushed the master branch from a2fc889 to 7bfdd28 Compare December 21, 2025 05:38

csmulhern requested a review from dbaileychess as a code owner December 21, 2025 05:38

csmulhern force-pushed the master branch from 7bfdd28 to a2fc889 Compare December 21, 2025 05:41

Adds rust_module to the schema definition

810784e

Adds a new option syntax for flatbuffer schemas

62455ff

This syntax currently only supports setting the Rust module associated with a schema file.

csmulhern force-pushed the master branch from e3d8cd7 to 015711c Compare March 7, 2026 17:31

github-actions bot added the java label Mar 7, 2026

csmulhern changed the title ~~Adds module mapping of includes for Rust code generation~~ Adds module support of includes for Rust code generation Mar 7, 2026

csmulhern added 3 commits March 7, 2026 13:14

Updates Rust code generator to support the rust_module option

57a053a

Use rust_module in usage test

60d82f3

Regenerates Rust schema code

1e422b8

csmulhern force-pushed the master branch from 015711c to 1e422b8 Compare March 7, 2026 18:14
