Stim compiler #3305
.chars
.peek()
.map_or(self.input_len as usize, |(i, _)| *i);
// TODO: What if some identifier starts with "rec" but is not a rec token?
}
}

//TODO: Deal with escaping
}

fn parse_item(&mut self) -> Option<Item> {
// TODO WHAT IF IT STARTS WITH A NEWLINE?
}
return Some(Item::Line(self.parse_line(instruction)));
} else {
// TODO error! The start of every item should be an instruction;
lo: span.lo + 5,
hi: span.hi - 1,
}),
), // Strips 'rec[-' prefix and trailing ']' TODO validate it
hi: span.hi - 1,
}),
),
}, // Strips 'sweep[' prefix and trailing ']' TODO validate it

fn main() {
    let stim_code =
        fs::read_to_string("examples/example.stim").expect("Failed to read examples/example.stim");
I don't see this file anywhere in the PR. Also, most of our tests verify against inline strings of input/output rather than external files.
This is for my manual testing, so I can parse whole files at a time without having to edit the code every time. I will make sure to remove it before the official PR.
amcasey left a comment
Some questions about the lexer. I'm just learning, so don't take any of this as blocking.
Rec, // rec[- ...]
Sweep, // sweep[...]
Tag, // "[...]"
Open(Delim), // ( {
Out of curiosity, why is this one TokenKind with a parameter, rather than two token kinds? Are there places we want to handle either form of bracket?
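One possible motivation, sketched under assumptions: with a parameterized kind, code that treats both bracket forms alike can match once and compare the payload. The `Delim` variant names and the `Close(Delim)` counterpart below are hypothetical and are not taken from the PR.

```rust
// Hypothetical delimiter payload; the PR only shows `Open(Delim), // ( {`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Delim {
    Paren,
    Brace,
}

// Trimmed-down token kind; `Close` is an assumed counterpart to `Open`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TokenKind {
    Open(Delim),
    Close(Delim),
}

/// Returns true when `close` is the closing token for `open`,
/// regardless of which bracket form is involved.
fn closes(open: TokenKind, close: TokenKind) -> bool {
    matches!(
        (open, close),
        (TokenKind::Open(a), TokenKind::Close(b)) if a == b
    )
}
```

With two separate kinds per bracket form, the same check would need one match arm per form.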
};

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct Token {
Is there a concept of a known-but-erroneous token? For example, in many languages 0.2 is valid and .2 is not, yet you'd want both to appear as Double tokens for error recovery purposes. (This is almost certainly out of scope for this proof-of-concept implementation.)
input_len: input
    .len()
    .try_into()
    .expect("input length should fit into u32"),
I'm curious about this restriction. Can input hold more than u32::MAX bytes? If so, do we need to restrict it? If not, isn't the restriction implied? (The code might express exactly that it's impossible; I'm still getting used to reading Rust.)
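For context, `str::len()` returns a `usize`, which is 64 bits wide on 64-bit targets, so the conversion to `u32` can fail and is not implied by the types. A minimal sketch, assuming a hypothetical helper `checked_len` that is not part of the PR:

```rust
/// Hypothetical helper (not in the PR): returns the input length in bytes
/// as a `u32`, or `None` if the length does not fit.
fn checked_len(input: &str) -> Option<u32> {
    // `try_from` fails instead of silently truncating the way `as u32` would.
    u32::try_from(input.len()).ok()
}
```

The `.expect(...)` in the PR turns the same failure into a panic; returning an `Option` or `Result` would let the caller report oversized input as an ordinary error instead.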
pub fn new(input: &'a str) -> Self {
    Self {
        input,
        input_len: input
To confirm, this is the length in bytes, but chars will iterate over characters, so there may be fewer of them?
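A small sketch of the distinction, assuming `chars` in the lexer is a `Peekable<CharIndices>` (which the `|(i, _)| *i` closure suggests but the excerpt does not show). `char_indices` yields byte offsets, so the indices stay compatible with the byte length even when there are fewer characters than bytes. The helper name `lengths` is hypothetical.

```rust
/// Returns the byte length, the character count, and the byte offset
/// at which each character starts.
fn lengths(input: &str) -> (usize, usize, Vec<usize>) {
    let offsets = input.char_indices().map(|(i, _)| i).collect();
    (input.len(), input.chars().count(), offsets)
}
```

For the input `"a\u{e9}b"`, where the middle character takes two bytes in UTF-8, this gives a byte length of 4, a character count of 3, and offsets 0, 1, and 3.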
self.eat_while(|c| c.is_ascii_digit());
let mut is_double = false;
if self.chars.next_if(|(_, c)| *c == '.').is_some() {
    self.eat_while(|c| c.is_ascii_digit());
Is this guaranteed to consume at least one digit? Or does the language allow 2. as a valid double?
Looks like we allow scientific notation. Surely, 2.e isn't allowed?
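If the language does turn out to require digits after the decimal point and after the exponent marker, one way to enforce it is to count the digits consumed at each step. The following is a sketch only: it is a free function rather than the PR's method, `NumberKind` and `eat_digits` are hypothetical names, and whether Stim actually rejects `2.` or `2.e` is not established here.

```rust
use std::iter::Peekable;
use std::str::CharIndices;

// Hypothetical result type; `Malformed` stands in for a
// known-but-erroneous number token.
#[derive(Debug, PartialEq)]
enum NumberKind {
    Int,
    Double,
    Malformed,
}

/// Consumes ASCII digits and returns how many were eaten.
fn eat_digits(chars: &mut Peekable<CharIndices<'_>>) -> usize {
    let mut count = 0;
    while chars.next_if(|(_, c)| c.is_ascii_digit()).is_some() {
        count += 1;
    }
    count
}

/// Scans an unsigned number, requiring at least one digit in the integer
/// part, after a '.', and after an exponent marker.
fn scan_number(chars: &mut Peekable<CharIndices<'_>>) -> NumberKind {
    let mut ok = eat_digits(chars) > 0;
    let mut is_double = false;
    if chars.next_if(|(_, c)| *c == '.').is_some() {
        is_double = true;
        ok &= eat_digits(chars) > 0;
    }
    if chars.next_if(|(_, c)| *c == 'e' || *c == 'E').is_some() {
        is_double = true;
        // Optional sign on the exponent.
        let _ = chars.next_if(|(_, c)| *c == '+' || *c == '-');
        ok &= eat_digits(chars) > 0;
    }
    match (ok, is_double) {
        (false, _) => NumberKind::Malformed,
        (true, true) => NumberKind::Double,
        (true, false) => NumberKind::Int,
    }
}
```

The scanner still consumes the whole malformed lexeme, so a parser could report the error and keep going rather than stopping at the first bad character.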
self.whitespace();
}

fn scan_number(&mut self) -> TokenKind {
Can the whole number carry a sign, e.g. +2?
}

fn scan_identifier(&mut self, lo: usize) -> TokenKind {
    self.eat_while(|c| c.is_alphanumeric() || c == '_');
A lot of languages don't allow identifiers to start with digits. Not sure if that's true of Stim.
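If that rule does apply, the first character needs a stricter predicate than the rest. A minimal sketch with a hypothetical standalone function `is_identifier`; the excerpt does not show how the lexer dispatches on the first character, so it may already enforce this elsewhere.

```rust
/// Hypothetical check (not in the PR): an identifier starts with a letter
/// or underscore and continues with alphanumerics or underscores.
fn is_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(first) if first.is_alphabetic() || first == '_' => {
            chars.all(|c| c.is_alphanumeric() || c == '_')
        }
        // Empty input or a disallowed first character.
        _ => false,
    }
}
```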
.map_or(self.input_len as usize, |(i, _)| *i);
// TODO: What if some identifier starts with "rec" but is not a rec token?
match &self.input[lo..hi] {
    "rec" => {
I'm probably just blanking, but where did we check for the open [?
}
}

impl Iterator for Lexer<'_> {

while self.chars.next_if(|i| f(i.1)).is_some() {}
}

fn whitespace(&mut self) {
Is this going to consume newlines without creating corresponding tokens?
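If newlines need to survive as tokens (for example, to separate lines of instructions), the whitespace skipper can stop before `'\n'`. This is a sketch under that assumption, written as a free function with the hypothetical name `skip_inline_whitespace` and assuming the lexer iterates with a `Peekable<CharIndices>`.

```rust
use std::iter::Peekable;
use std::str::CharIndices;

/// Skips whitespace other than '\n', leaving the newline in place
/// for the caller to turn into a token.
fn skip_inline_whitespace(chars: &mut Peekable<CharIndices<'_>>) {
    while chars
        .next_if(|(_, c)| c.is_whitespace() && *c != '\n')
        .is_some()
    {}
}
```

After skipping over `" \t\nH"`, the next character the caller sees is the newline at byte offset 2.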
No description provided.