Stim compiler by joao-boechat · Pull Request #3305 · microsoft/qdk

joao-boechat · 2026-06-11T19:54:20Z

No description provided.

+            .chars
+            .peek()
+            .map_or(self.input_len as usize, |(i, _)| *i);
+        // TODO: What if some identifier starts with "rec" but is not a rec token?


+    }
+}
+
+//TODO: Deal with escaping


+    }
+
+    fn parse_item(&mut self) -> Option<Item> {
+        // TODO WHAT IF IT STARTS WITH A NEWLINE?


+            }
+            return Some(Item::Line(self.parse_line(instruction)));
+        } else {
+            // TODO error! The start of every item should be an instruction;
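A minimal sketch of how `parse_item` might handle both TODOs above: a leading newline, and an item that does not start with an instruction. It assumes the lexer emits explicit newline tokens. The `Token`, `Item`, and `ParseError` types are simplified stand-ins, not the PR's definitions:

```rust
use std::iter::Peekable;
use std::vec::IntoIter;

/// Simplified token stream; assumes the lexer emits explicit newlines.
#[derive(Clone, Debug, PartialEq)]
enum Token {
    Newline,
    Ident(String),
    Int(u32),
}

#[derive(Debug, PartialEq)]
enum Item {
    Line { instruction: String, targets: Vec<u32> },
}

#[derive(Debug, PartialEq)]
enum ParseError {
    /// An item began with something other than an instruction name.
    ExpectedInstruction(Token),
}

/// Returns `None` at end of input, otherwise one parsed item or an error.
fn parse_item(tokens: &mut Peekable<IntoIter<Token>>) -> Option<Result<Item, ParseError>> {
    // Blank lines (including leading ones) carry no item, so skip them.
    while tokens.next_if_eq(&Token::Newline).is_some() {}
    match tokens.next()? {
        Token::Ident(instruction) => {
            // Collect integer targets up to the end of the line.
            let mut targets = Vec::new();
            while let Some(&Token::Int(t)) = tokens.peek() {
                targets.push(t);
                tokens.next();
            }
            Some(Ok(Item::Line { instruction, targets }))
        }
        // Report the offending token instead of silently dropping it.
        other => Some(Err(ParseError::ExpectedInstruction(other))),
    }
}
```

Returning the offending token in the error gives a later diagnostic something concrete to point at; how the parser resynchronizes after the error is left open here.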


+                            lo: span.lo + 5,
+                            hi: span.hi - 1,
+                        }),
+                    ), // Strips 'rec[-' prefix and trailing ']' TODO validate it


+                            hi: span.hi - 1,
+                        }),
+                    ),
+                }, // Strips 'sweep[' prefix and trailing ']' TODO validate it
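The two "TODO validate it" comments above could be addressed by checking the token text before slicing it. This is a minimal sketch that assumes the full token text (for example `rec[-3]` or `sweep[5]`) is available as a `&str`. `parse_rec_offset`, `parse_sweep_index`, and `parse_digits` are hypothetical names and are not part of the PR:

```rust
/// Hypothetical helper: returns N for text of the form "rec[-N]".
/// A stricter validator may also need to reject "rec[-0]"; that is not checked here.
fn parse_rec_offset(text: &str) -> Option<u32> {
    let digits = text.strip_prefix("rec[-")?.strip_suffix(']')?;
    parse_digits(digits)
}

/// Hypothetical helper: returns N for text of the form "sweep[N]".
fn parse_sweep_index(text: &str) -> Option<u32> {
    let digits = text.strip_prefix("sweep[")?.strip_suffix(']')?;
    parse_digits(digits)
}

/// Accepts only a non-empty run of ASCII digits. The explicit check matters
/// because `str::parse::<u32>` would also accept a leading '+'.
fn parse_digits(digits: &str) -> Option<u32> {
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Values that overflow u32 are rejected by `parse` as well.
    digits.parse().ok()
}
```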


billti · 2026-06-12T00:29:05Z

+
+fn main() {
+    let stim_code =
+        fs::read_to_string("examples/example.stim").expect("Failed to read examples/example.stim");


I don't see this file anywhere in the PR. Also, most of our tests verify behavior with inline input/output strings rather than external files.

This is for my manual testing, so I can parse whole files at a time without editing the code every time. I will make sure to remove it before the official PR.
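For reference, here is a sketch of the inline-string style, where the input and the expected tokens sit side by side in the test. `tokenize` is a hypothetical toy stand-in for the real lexer. Rendering tokens as strings like `Int(0)` is an assumption made so the expected value is easy to read:

```rust
/// Hypothetical toy lexer: words that are all digits become Int, anything
/// else becomes Ident, and each line break becomes a Newline token.
fn tokenize(input: &str) -> Vec<String> {
    let mut out = Vec::new();
    for (i, line) in input.lines().enumerate() {
        if i > 0 {
            out.push("Newline".to_string());
        }
        for word in line.split_whitespace() {
            let kind = if word.bytes().all(|b| b.is_ascii_digit()) { "Int" } else { "Ident" };
            out.push(format!("{kind}({word})"));
        }
    }
    out
}

/// Inline-string test helper: no external fixture file is needed.
fn check(input: &str, expected: &[&str]) {
    let actual = tokenize(input);
    assert_eq!(actual, expected, "unexpected tokens for input {input:?}");
}
```

A test then reads as `check("H 0\nCNOT 0 1", &["Ident(H)", "Int(0)", "Newline", "Ident(CNOT)", "Int(0)", "Int(1)"]);`.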

amcasey

Some questions about the lexer. I'm just learning, so don't take any of this as blocking.

amcasey · 2026-06-15T16:37:14Z

+    Rec,             // rec[- ...]
+    Sweep,           // sweep[...]
+    Tag,             // "[...]"
+    Open(Delim),     // ( {


Out of curiosity, why is this one TokenKind with a parameter, rather than two token kinds? Are there places we want to handle either form of bracket?
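One common reason for the parameterized form is shown below. This is a general pattern and not necessarily the author's motivation. Code that treats every opening bracket alike can match with a wildcard, and a bracket-balancing check only has to compare the payloads. The `Delim` variants and the helper names are assumptions for illustration:

```rust
/// Bracket flavors sharing one token kind (variant names assumed).
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
enum Delim {
    Paren, // ( )
    Brace, // { }
}

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
enum TokenKind {
    Open(Delim),
    Close(Delim),
    Ident,
}

/// A single arm handles either opening bracket.
fn is_open(kind: TokenKind) -> bool {
    matches!(kind, TokenKind::Open(_))
}

/// A bracket matcher only needs to compare the two payloads.
fn closes(open: TokenKind, close: TokenKind) -> bool {
    match (open, close) {
        (TokenKind::Open(a), TokenKind::Close(b)) => a == b,
        _ => false,
    }
}
```

With separate `OpenParen`/`OpenBrace` kinds, both helpers would need one arm per bracket type.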

amcasey · 2026-06-15T16:40:13Z

+};
+
+#[derive(Clone, Copy, Debug, Eq, PartialEq)]
+pub struct Token {


Is there a concept of a known-but-erroneous token? For example, in many languages 0.2 is valid but .2 is not, yet you'd want both to appear as Double tokens for error-recovery purposes. (This is almost certainly out of scope for this proof-of-concept implementation.)
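One way to model this, sketched under assumptions: the `LexError` type and this `scan_number` signature are hypothetical and are not the PR's API. The lexer still returns a `Double` for `.2`, so the parser sees the expected token shape, and it reports the problem separately:

```rust
#[derive(Debug, PartialEq)]
enum TokenKind {
    Int,
    Double,
}

/// Hypothetical diagnostic, reported alongside the token rather than instead of it.
#[derive(Debug, PartialEq)]
enum LexError {
    MissingLeadingDigit,
}

/// Scans a number at the start of `s` (assumed to begin with a digit or '.').
/// Returns the token kind, the bytes consumed, and an optional error.
fn scan_number(s: &str) -> (TokenKind, usize, Option<LexError>) {
    let bytes = s.as_bytes();
    let mut i = 0;
    while i < bytes.len() && bytes[i].is_ascii_digit() {
        i += 1;
    }
    let int_digits = i;
    if bytes.get(i) == Some(&b'.') {
        i += 1;
        while i < bytes.len() && bytes[i].is_ascii_digit() {
            i += 1;
        }
        // ".2" still becomes a Double token so the parser can keep going;
        // the missing leading digit is reported on the side.
        let err = (int_digits == 0).then_some(LexError::MissingLeadingDigit);
        return (TokenKind::Double, i, err);
    }
    (TokenKind::Int, i, None)
}
```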

amcasey · 2026-06-15T16:43:26Z

+            input_len: input
+                .len()
+                .try_into()
+                .expect("input length should fit into u32"),


I'm curious about this restriction. Is input capable of holding more than max_int bytes? If so, do we need to restrict it? If not, isn't the restriction implied? (The code might exactly express that it's impossible - I'm still getting used to reading rust.)
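For context, `str::len` returns `usize`. On 64-bit targets `usize` is wider than `u32`, so a `&str` can in principle exceed `u32::MAX` bytes (about 4 GiB), and the conversion is a real check there. On 32-bit targets it can never fail. The restriction presumably exists because spans store `u32` offsets, although that is an inference from `input_len as usize` rather than something stated in the PR. Here is a sketch of the same conversion that returns an error instead of panicking. `span_len` is a hypothetical name:

```rust
use std::num::TryFromIntError;

/// Converts a source length (bytes, as `usize`) into a `u32` span offset.
/// Fails only when the input is longer than `u32::MAX` bytes.
fn span_len(input: &str) -> Result<u32, TryFromIntError> {
    u32::try_from(input.len())
}
```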

amcasey · 2026-06-15T16:43:54Z

+    pub fn new(input: &'a str) -> Self {
+        Self {
+            input,
+            input_len: input


To confirm, this is the length in bytes but chars will be in characters and may be shorter?
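This assumes `chars` is a `Peekable<CharIndices>`, which the `(i, _)` patterns suggest, since they imply it yields `(byte_offset, char)` pairs. Under that assumption, `input_len` and the offsets are both in bytes, while the number of items `chars` yields is a character count that can be smaller. A small illustration using only standard-library calls:

```rust
/// Byte length versus character count for the same string.
fn lengths(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// `char_indices` yields *byte* offsets, so they can be used directly to
/// slice the input, e.g. `&input[lo..hi]`.
fn byte_offsets(s: &str) -> Vec<usize> {
    s.char_indices().map(|(i, _)| i).collect()
}
```

For ASCII-only input the two lengths agree. A non-ASCII character such as `θ` (two bytes in UTF-8) makes them differ.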

amcasey · 2026-06-15T16:47:08Z

+        self.eat_while(|c| c.is_ascii_digit());
+        let mut is_double = false;
+        if self.chars.next_if(|(_, c)| *c == '.').is_some() {
+            self.eat_while(|c| c.is_ascii_digit());


Is this guaranteed to consume at least one digit? Or does the language allow 2. as a valid double?

It looks like we allow scientific notation. Surely 2.e isn't allowed?
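If the grammar is meant to reject `2.` and `2.e5`, the scanner can require at least one digit after `.` and after the exponent marker. This is a self-contained sketch under that assumption (check it against Stim's actual number rules). It takes a `&str` instead of the PR's `&mut self`:

```rust
/// Scans a number at the start of `s`. Returns the bytes consumed and whether
/// it is a double, or `None` if it is malformed.
/// Assumption: a digit is required before '.', after '.', and after the
/// exponent marker (which may carry a sign).
fn scan_number(s: &str) -> Option<(usize, bool)> {
    let b = s.as_bytes();
    // Index just past the run of ASCII digits starting at `i`.
    let digits = |mut i: usize| {
        while i < b.len() && b[i].is_ascii_digit() {
            i += 1;
        }
        i
    };
    let mut i = digits(0);
    if i == 0 {
        return None;
    }
    let mut is_double = false;
    if b.get(i).copied() == Some(b'.') {
        let end = digits(i + 1);
        if end == i + 1 {
            return None; // rejects "2." and "2.e5"
        }
        i = end;
        is_double = true;
    }
    if matches!(b.get(i).copied(), Some(b'e' | b'E')) {
        let mut j = i + 1;
        if matches!(b.get(j).copied(), Some(b'+' | b'-')) {
            j += 1;
        }
        let end = digits(j);
        if end == j {
            return None; // rejects "2e" and "2.5e+"
        }
        i = end;
        is_double = true;
    }
    Some((i, is_double))
}
```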

amcasey · 2026-06-15T16:49:11Z

+        self.whitespace();
+    }
+
+    fn scan_number(&mut self) -> TokenKind {


Can there be a sign for the whole number? +2?

amcasey · 2026-06-15T16:49:41Z

+    }
+
+    fn scan_identifier(&mut self, lo: usize) -> TokenKind {
+        self.eat_while(|c| c.is_alphanumeric() || c == '_');


A lot of languages don't allow identifiers to start with digits. Not sure if that's true of stim.
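If Stim identifiers are meant to start with a letter or underscore (an assumption, not confirmed by the PR), the first character can be checked separately from the rest. This is a self-contained sketch. Unlike the PR's version, this `scan_identifier` takes a `&str` instead of `&mut self` and `lo`:

```rust
/// Assumed rule: first char is a letter or '_', the rest are alphanumeric or '_'.
/// Returns the byte length of the identifier at the start of `s`, if any.
fn scan_identifier(s: &str) -> Option<usize> {
    let mut chars = s.char_indices();
    let (_, first) = chars.next()?;
    if !(first.is_alphabetic() || first == '_') {
        return None;
    }
    // The identifier ends at the first char that cannot continue it.
    let end = chars
        .find(|&(_, c)| !(c.is_alphanumeric() || c == '_'))
        .map_or(s.len(), |(i, _)| i);
    Some(end)
}
```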

amcasey · 2026-06-15T16:51:14Z

+            .map_or(self.input_len as usize, |(i, _)| *i);
+        // TODO: What if some identifier starts with "rec" but is not a rec token?
+        match &self.input[lo..hi] {
+            "rec" => {


I'm probably just blanking, but where did we check for the open [?
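Here is a sketch of the check. It assumes `chars` is a `Peekable<CharIndices>` positioned just after the identifier. Because the identifier scan consumes the whole word, an exact match on `"rec"` already ignores words like `record`. Peeking for `[` covers the remaining case of a bare `rec`, which also speaks to the TODO in the diff. `classify` is a hypothetical helper, and the caller would still need to consume through the closing `]`:

```rust
use std::iter::Peekable;
use std::str::CharIndices;

#[derive(Debug, PartialEq)]
enum TokenKind {
    Ident,
    Rec,
    Sweep,
}

/// `word` is the full identifier just scanned; `chars` sits right after it.
/// Only a word immediately followed by '[' is treated as a rec/sweep prefix.
fn classify(word: &str, chars: &mut Peekable<CharIndices<'_>>) -> TokenKind {
    let is_open = chars.peek().map(|&(_, c)| c) == Some('[');
    match word {
        "rec" if is_open => TokenKind::Rec,
        "sweep" if is_open => TokenKind::Sweep,
        _ => TokenKind::Ident,
    }
}
```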

amcasey · 2026-06-15T16:51:45Z

+    }
+}
+
+impl Iterator for Lexer<'_> {


Basic Rust question: is '_ magic?
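It is not magic: `'_` is Rust's anonymous (elided) lifetime. The minimal example below shows that `impl Iterator for Lexer<'_>` is shorthand for `impl<'a> Iterator for Lexer<'a>` when the body never needs to name the lifetime. This `Lexer` is a toy and not the PR's struct:

```rust
/// A toy lexer that borrows its input for lifetime 'a.
struct Lexer<'a> {
    chars: std::str::Chars<'a>,
}

impl<'a> Lexer<'a> {
    // Here 'a must be named, because it ties `input` to the returned Self.
    fn new(input: &'a str) -> Self {
        Self { chars: input.chars() }
    }
}

// Equivalent to `impl<'a> Iterator for Lexer<'a>`; nothing below refers to
// the lifetime, so it can be left anonymous.
impl Iterator for Lexer<'_> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        self.chars.next()
    }
}
```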

amcasey · 2026-06-15T16:53:11Z

+        while self.chars.next_if(|i| f(i.1)).is_some() {}
+    }
+
+    fn whitespace(&mut self) {


Is this going to consume newlines without creating corresponding tokens?
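Stim circuits are line-oriented. If the lexer is meant to emit newline tokens, one option is to skip only non-newline whitespace. This sketch assumes the same `Peekable<CharIndices>` iterator. `skip_inline_whitespace` is a hypothetical name, and skipping `\r` is an assumption meant to keep `\r\n` line endings to a single newline token:

```rust
use std::iter::Peekable;
use std::str::CharIndices;

/// Skips spaces, tabs, and '\r' but stops at '\n', leaving it for the caller
/// to turn into a newline token.
fn skip_inline_whitespace(chars: &mut Peekable<CharIndices<'_>>) {
    while chars.next_if(|&(_, c)| c.is_whitespace() && c != '\n').is_some() {}
}
```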

joao-boechat added 12 commits June 4, 2026 11:35

initial lexer implementation

cacd362

add display implementation for TokenKind

8716674

temporary commit to save progress

35804ba

update lex

d250224

update package name

5462765

first finalized parser

c66a59d

fix bugs, improve debugging

2c03a10

make tests consume from arbitrary example.stim file

561e3e9

fix repeated newline bug, improve parsing of custom instructions

c07ee11

fix bug of custom function without target

a73fc12

handle scientific notation in the lexer

85b0955

improve lex and parse manual testing

a6cbe23

github-advanced-security AI found potential problems Jun 11, 2026

billti reviewed Jun 12, 2026

joao-boechat added 8 commits June 11, 2026 17:54

save initial qir emitting code

9c724bb

add qirWriter for separating responsibilities

3a30469

file for manual testing e2e compilation

ff29da3

output header, footer, and declarations

8bcb3e7

fix clippy warnings

fa9973d

add run_qir api to python

c0bdb22

use FxHashMap from rustc_hash

c0d7155

finish implementing preselect

fc18cc3

amcasey reviewed Jun 15, 2026

joao-boechat commented Jun 11, 2026

amcasey left a comment

4 participants