Stim compiler #3305

Draft
joao-boechat wants to merge 20 commits into main from joaoboechat/stim-compiler

@joao-boechat

Contributor

No description provided.

.chars
.peek()
.map_or(self.input_len as usize, |(i, _)| *i);
// TODO: What if some identifier starts with "rec" but is not a rec token?
}
}

//TODO: Deal with escaping
}

fn parse_item(&mut self) -> Option<Item> {
// TODO WHAT IF IT STARTS WITH A NEWLINE?
}
return Some(Item::Line(self.parse_line(instruction)));
} else {
// TODO error! The start of every item should be an instruction;
lo: span.lo + 5,
hi: span.hi - 1,
}),
), // Strips 'rec[-' prefix and trailing ']' TODO validate it
hi: span.hi - 1,
}),
),
}, // Strips 'sweep[' prefix and trailing ']' TODO validate it

fn main() {
let stim_code =
fs::read_to_string("examples/example.stim").expect("Failed to read examples/example.stim");

Member

I don't see this file anywhere in the PR. Also, most of our tests verify against inline input/output strings rather than external files.

Contributor Author

This is for my manual testing, just so I can parse whole files at a time without having to edit the code every time. I will make sure to remove it before the official PR.

@amcasey left a comment

Member

Some questions about the lexer. I'm just learning, so don't take any of this as blocking.

Rec, // rec[- ...]
Sweep, // sweep[...]
Tag, // "[...]"
Open(Delim), // ( {

Member

Out of curiosity, why is this one TokenKind with a parameter, rather than two token kinds? Are there places we want to handle either form of bracket?
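
One situation where a parameterized kind can help is delimiter matching, where every opening bracket is handled the same way and only the `Delim` payload differs. The following is a minimal sketch under assumptions: the variant names `Paren` and `Brace`, the `Close(Delim)` variant, and the `balanced` helper are all hypothetical and are not taken from the PR.

```rust
/// Hypothetical delimiter kinds; the PR's comment only shows `(` and `{`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Delim {
    Paren,
    Brace,
}

/// Hypothetical, trimmed-down token kinds for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TokenKind {
    Open(Delim),
    Close(Delim), // assumed counterpart of `Open`
    Other,
}

/// Returns true if every `Close` matches the most recent unmatched `Open`.
fn balanced(tokens: &[TokenKind]) -> bool {
    let mut stack = Vec::new();
    for token in tokens {
        match *token {
            // One arm covers both bracket forms.
            TokenKind::Open(d) => stack.push(d),
            TokenKind::Close(d) => {
                if stack.pop() != Some(d) {
                    return false;
                }
            }
            TokenKind::Other => {}
        }
    }
    stack.is_empty()
}
```

With two separate token kinds per bracket, the `Open` and `Close` arms would each need one pattern per bracket form; whether that trade-off matters for this lexer is a judgment call.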

};

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct Token {

Member

Is there a concept of a known-but-erroneous token? For example, in many languages 0.2 is valid, but .2 is not, but you'd want both to appear as Double tokens for error recovery purposes. (This is almost certainly out of scope for this proof-of-concept implementation.)

input_len: input
.len()
.try_into()
.expect("input length should fit into u32"),

Member

I'm curious about this restriction. Is input capable of holding more than max_int bytes? If so, do we need to restrict it? If not, isn't the restriction implied? (The code might exactly express that it's impossible - I'm still getting used to reading Rust.)
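
For context, `str::len` returns a `usize`, which is 64 bits wide on 64-bit targets, so a string can in principle be longer than `u32::MAX` bytes and the conversion is fallible. A minimal sketch of the same conversion without the panic; the `checked_len` name is hypothetical and not part of the PR:

```rust
/// Converts a byte length to `u32`, returning `None` when it does not fit.
/// The PR's `.try_into().expect(...)` panics in that case instead.
fn checked_len(len: usize) -> Option<u32> {
    u32::try_from(len).ok()
}
```

Calling `checked_len(input.len())` would let a caller report an oversized input as an error rather than aborting.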

pub fn new(input: &'a str) -> Self {
Self {
input,
input_len: input

Member

To confirm, this is the length in bytes but chars will be in characters and may be shorter?
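
A small sketch of the distinction: in Rust, `str::len` counts UTF-8 bytes, `chars()` yields Unicode scalar values, and `char_indices()` pairs each char with the byte offset where it starts. The helper names below are hypothetical, and the idea that the lexer's `chars` field wraps `char_indices()` is an assumption based on the `|(i, _)| *i` pattern in the diff.

```rust
/// Length of `input` in UTF-8 bytes (what `str::len` reports).
fn byte_len(input: &str) -> usize {
    input.len()
}

/// Number of Unicode scalar values in `input`.
fn char_count(input: &str) -> usize {
    input.chars().count()
}

/// Byte offset at which each char starts, as yielded by `char_indices`.
fn char_offsets(input: &str) -> Vec<usize> {
    input.char_indices().map(|(i, _)| i).collect()
}
```

For pure ASCII input the byte length and the char count agree; they diverge as soon as a multi-byte character appears, which is why spans built from `char_indices` offsets can be used to slice the original `&str` directly.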

self.eat_while(|c| c.is_ascii_digit());
let mut is_double = false;
if self.chars.next_if(|(_, c)| *c == '.').is_some() {
self.eat_while(|c| c.is_ascii_digit());

Member

Is this guaranteed to consume at least one digit? Or does the language allow 2. as a valid double?

Member

Looks like we allow scientific notation. Surely, 2.e isn't allowed?
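
To make the two questions above concrete, here is a sketch of a stricter classification that requires at least one digit after `.` and after the exponent marker, and reports violations as a recognized-but-malformed number rather than failing to lex. It assumes the strict reading of the grammar, which the PR does not confirm; the `Number` enum and both functions are hypothetical and unrelated to the PR's `TokenKind`.

```rust
/// Hypothetical classification of a numeric literal.
#[derive(Debug, PartialEq)]
enum Number {
    Int,
    Double,
    /// Looks like a number, but digits are missing after `.` or the exponent.
    Malformed,
}

/// Counts leading ASCII digits of `s` and returns the count and the remainder.
fn digits(s: &str) -> (usize, &str) {
    let n = s.bytes().take_while(u8::is_ascii_digit).count();
    (n, &s[n..])
}

/// Classifies `s` if the whole string is a numeric literal; `None` otherwise.
fn classify(s: &str) -> Option<Number> {
    let (int_digits, mut rest) = digits(s);
    if int_digits == 0 {
        return None; // this sketch requires a leading digit, so `.2` is rejected
    }
    let mut kind = Number::Int;
    if let Some(after_dot) = rest.strip_prefix('.') {
        let (frac_digits, after_frac) = digits(after_dot);
        kind = if frac_digits == 0 { Number::Malformed } else { Number::Double };
        rest = after_frac;
    }
    if let Some(after_e) = rest.strip_prefix(|c: char| c == 'e' || c == 'E') {
        // An optional sign may follow the exponent marker.
        let after_sign = after_e
            .strip_prefix(|c: char| c == '+' || c == '-')
            .unwrap_or(after_e);
        let (exp_digits, after_exp) = digits(after_sign);
        if exp_digits == 0 {
            kind = Number::Malformed;
        } else if kind == Number::Int {
            kind = Number::Double;
        }
        rest = after_exp;
    }
    if rest.is_empty() { Some(kind) } else { None }
}
```

Under these assumptions `2.` and `2.e` both come back as `Malformed`, which gives the parser something to attach a diagnostic to.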

self.whitespace();
}

fn scan_number(&mut self) -> TokenKind {

Member

Can there be a sign for the whole number? +2?

}

fn scan_identifier(&mut self, lo: usize) -> TokenKind {
self.eat_while(|c| c.is_alphanumeric() || c == '_');

Member

A lot of languages don't allow identifiers to start with digits. Not sure if that's true of stim.
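
For reference, `char::is_alphanumeric` returns true for digits, so a predicate like the one in `scan_identifier` accepts them in any position, including the first, unless the caller has already ruled that out. A sketch of a check that treats the first character separately; the `is_identifier` name is hypothetical, and whether Stim actually forbids a leading digit is not established here.

```rust
/// Hypothetical check: true if `s` is non-empty, starts with a letter or `_`,
/// and continues with letters, digits, or `_`.
fn is_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_alphabetic() || c == '_' => {
            chars.all(|c| c.is_alphanumeric() || c == '_')
        }
        _ => false,
    }
}
```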

.map_or(self.input_len as usize, |(i, _)| *i);
// TODO: What if some identifier starts with "rec" but is not a rec token?
match &self.input[lo..hi] {
"rec" => {

Member

I'm probably just blanking, but where did we check for the open [?

}
}

impl Iterator for Lexer<'_> {

Member

Basic Rust question: is '_ magic?
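
For reference, `'_` is Rust's anonymous lifetime: `impl Iterator for Lexer<'_>` is shorthand for `impl<'a> Iterator for Lexer<'a>` when the lifetime never needs to be named in the body. A self-contained sketch with a hypothetical `WordLens` type standing in for the PR's `Lexer`:

```rust
/// Hypothetical stand-in for a lexer: it borrows its input for some lifetime.
struct WordLens<'a> {
    rest: &'a str,
}

// `'_` says "some lifetime, which this impl does not need to name".
// Writing `impl<'a> Iterator for WordLens<'a>` would mean the same thing.
impl Iterator for WordLens<'_> {
    /// Length in bytes of the next whitespace-separated word.
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        let trimmed = self.rest.trim_start();
        if trimmed.is_empty() {
            return None;
        }
        let end = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        self.rest = &trimmed[end..];
        Some(end)
    }
}
```

If the item type had to borrow from the input (for example `type Item = &'a str`), the lifetime would need a name and `'_` could no longer be used in the `impl` header.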

while self.chars.next_if(|i| f(i.1)).is_some() {}
}

fn whitespace(&mut self) {

Member

Is this going to consume newlines without creating corresponding tokens?
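
The body of `whitespace` is not shown in this hunk, so what it skips is unknown here. For reference, `char::is_whitespace` returns true for `'\n'`, so a skip loop built on it alone would swallow line breaks. A sketch of a helper that stops at a newline, assuming the format is line-oriented and the parser needs to see line ends; the `skip_inline_whitespace` name is hypothetical.

```rust
/// Hypothetical helper: drops leading whitespace from `input` but stops at
/// `'\n'`, so the caller can still emit a token for the line break.
fn skip_inline_whitespace(input: &str) -> &str {
    input.trim_start_matches(|c: char| c.is_whitespace() && c != '\n')
}
```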

4 participants