Separate parsing from reading shell input #263

Merged
jpco merged 16 commits into wryun:master from jpco:scriptparse on Mar 31, 2026

jpco (Collaborator) commented Mar 16, 2026

NOTE: I've been a bit aggressive with unilaterally merging PRs lately, but this one will not be getting merged without explicit feedback.

The short version: This PR changes the $&parse primitive. Previously, $&parse would take an optional set of prompts and use them to read and parse shell input, producing a parsed command. Now, $&parse takes a command and calls that command one or more times in order to read shell input, which it parses. %parse has been rewritten so that its outward-facing behavior is unchanged while using the new $&parse.

The impact of all this is that it decouples prompting, reading, parsing, and writing to history. You can see that in these snippets from initial.es. In this first one, we define the %read-line function, which takes a prompt, prints it, and then reads a line. If it exists, the $&readline primitive can implement this function, or it can be done with an echo and a $&read.

if {~ <=$&primitives readline} {
	fn-%read-line = $&readline
} {
	fn %read-line prompt {
		echo -n $prompt
		$&read
	}
}

The second snippet uses this %read-line function, along with the new $&parse and the %write-history hook, to do all the work that would previously happen within $&parse:

let (in = (); p = $prompt(1))
unwind-protect {
	$&parse {
		let (r = <={%read-line $p}) {
			in = $in $r
			p = $prompt(2)
			result $r
		}
	}
} {
	if {!~ $#fn-%write-history 0 && !~ $#in 0} {
		%write-history <={%flatten \n $in}
	}
}

Doing this allows arbitrary flexibility in writing to history or prompting -- per-line history writing such as before #65 can be done by people who want that, or a different prompt could be used for each and every line. Something like zsh's TRANSIENT_RPROMPT could be added. You could even add a hook to transform read-in text before parsing it, in order to add !!-style history expansion or old-school string-replacement-based aliases.
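
As an illustration of that flexibility, here is a minimal sketch (not code from this PR) of a reader that writes history line by line and rewrites a bare !! before the parser sees it. It reuses %read-line, $&parse, and %write-history as in the snippet above. The variable last-line and the !! handling are hypothetical names invented for the example, and it assumes %write-history is happy to be called with a single line at a time.

```es
# Sketch only. `last-line` is a hypothetical variable, not part of es.
let (p = $prompt(1))
$&parse {
	let (r = <={%read-line $p}) {
		# Hypothetical transform step: replace a bare '!!' with the previous line.
		if {~ $r '!!'} {
			r = $last-line
		}
		# Per-line history: record each non-empty line as soon as it is read.
		if {!~ $#r 0} {
			last-line = $r
			if {!~ $#fn-%write-history 0} {
				%write-history $r
			}
		}
		p = $prompt(2)
		result $r
	}
}
```

Because the rewrite happens inside the reader command, $&parse only ever sees the expanded text.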

This PR also moves both $&parse and $&readline towards being "normal" primitives. In fact, with this PR, all the readline code in the shell is collected into a single file readline.c. Adding support for an alternative library should only require adding similar primitives in a similar file and an es script to tie things together, which is a much simpler integration story than the dozen or so #if HAVE_READLINE blocks the shell has had till now. (This starts to connect to a potential concept of "extensible primitives", or even "modules", which is in my opinion a very interesting way to begin to direct the shell. Es could inherit loadable modules from Inferno's sh, a shell that arguably descended from es!)

Improving the potential to swap out $&parse may sound strange, but could also have practical benefits. In addition to changes to internals (switching to a hand-written parser for better error reporting, or an incremental parser, etc.), swappable parsers could be a coarse way to allow certain changes to syntax.

A note on the particular API of $&parse -- why not go even simpler and have $&parse take an already-read string which it is to parse? This would make $&parse, effectively, a "push-style" parser, where input is read and then "pushed into" the parser by its caller. The major issue with this for es is that push-style parsers must be able to be called multiple times over the course of a single parse: it would be called with one line, return a value indicating it needs more input, and then would be called again with the next, and this would repeat until enough has been fed to it that it can return a fully-parsed statement.

In es, the problem with this pattern is managing parser state across calls. How do we differentiate a second $&parse call for an in-progress bit of code from a second, unrelated $&parse call, especially given es lacks opaque handles, so we can't just say $&parse $parser ...? It is much simpler to use a "pull-style" parser, where we give $&parse a way to read input for itself, and have each $&parse call correspond with a single complete parse run.
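
A sketch of what that correspondence looks like from the caller's side, assuming the reader-command API shown earlier: all per-parse reading state (here just the prompt) lives in a closure created for one $&parse call, so two parses never share state. The names parse-one, first, and second are hypothetical and exist only for this example.

```es
# Sketch only: each call to parse-one is one complete, independent parse run.
fn parse-one {
	let (p = $prompt(1))
	$&parse {
		let (r = <={%read-line $p}) {
			# After the first line, switch to the continuation prompt.
			p = $prompt(2)
			result $r
		}
	}
}

# Two unrelated parses; no parser handle is passed between them.
first = <=parse-one
second = <=parse-one
```

Each call to parse-one starts again at $prompt(1), and nothing has to be threaded between calls to tell $&parse which parse it is continuing.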

One behavior change that comes with this PR affects the -i flag. Previously, the following would happen:

; echo 'echo hello world' | ./es -i
; echo hello world
hello world
; ; 

Now, this happens:

; echo 'echo hello world' | ./es -i
hello world
; 

The difference is that before, es would call the readline library if the shell was interactive and reading from stdin. Now, it calls the readline library if it's interactive and reading from a TTY. This is a corner case, and other shells aren't consistent on the behavior; I believe that the new behavior is easier to model and explain in the shell (given that the reader commands are always reading from their stdin, it breaks some abstractions to make them judge whether it's "really" the shell's stdin).

At least one potential follow-on improvement to consider: we may want to add some information for $&parse to pass to reader commands when calling them. One example would be something indicating whether a heredoc is being read, in order to do something like print a different prompt. I'm not in a hurry to add anything here; I think more time is needed to think of what might be included. (I also imagine that a richer API to a currently-running parser might eventually be useful for something like a tab-completion system, but I don't have any real design for that yet.)

jpco added 7 commits February 22, 2026 06:48
This primitive will be used as a target for $&prompt's new "reader
commands".  We don't shuffle any code around for this yet; consolidating
readline logic into something like a readline.c file will be done later.
$&newparse is like $&parse, except instead of taking prompt arguments
and doing all the reading itself, it takes a reader command as its
arguments and calls out to that to fetch input.

Ideally $&parse should be replaced with $&newparse and then a lot of
code can be cleaned up.
Not all of this actually needs $&newparse as a prerequisite, like
input->eof and removing the various fill functions.
jpco marked this pull request as draft March 17, 2026 04:09
jpco added 4 commits March 18, 2026 07:39
This makes $&parse redirect shell input to stdin as if it's an actual
redirection operator, and has $&readline actually use the stdin and
stdout the shell has given it.
I don't know how ideal this is (or the implementation), but it seems to
work well at least initially.  Also add some tests for $&parse, not all
of which are passing yet.

wryun (Owner) commented Mar 25, 2026

Having quickly read what you've written, this sounds good to me. But ultimately I think you'll have to live with being the only active contributor and make these kinds of calls.

From my perspective, if you're primarily refining/reworking the internals I'm not overly concerned, particularly when you're adding tests. Ok, strictly primitives are user facing, but even advanced es users are going to touch these only once or twice.

This is pretty consistent with other shells' behavior, and makes $&read work as an input to $&parse.

jpco (Collaborator, Author) commented Mar 25, 2026

> Having quickly read what you've written, this sounds good to me. But ultimately I think you'll have to live with being the only active contributor and make these kinds of calls.

I just want to make sure I'm not trying to take things in a direction nobody else likes :) as long as I have some assent on the broad strokes, I'm happy.

> From my perspective, if you're primarily refining/reworking the internals I'm not overly concerned, particularly when you're adding tests.

Reworking the internals is my favorite part!

> Ok, strictly primitives are user facing, but even advanced es users are going to touch these only once or twice.

Yeah, the fact that this PR is user-facing is the thing that distinguishes it from #205, #223, #261, #262 -- even if we leave the %parse function backwards-compatible.

Primitive backward compatibility is something I feel a bit queasy about, and it's also relevant to #210. I think (hopefully for reasons better than the fact that it's convenient for me) that, at least at this point in es' development, primitives should be allowed to change backwards-incompatibly. The fact that the API of each primitive isn't really documented in the man page seems like a point in agreement with me. However, it's not exactly user-friendly to just do this without giving users a prior warning about what behaviors they can or can't expect to have breaking changes.

One of the goals with the "extensible primitives" stuff I've been thinking about is to help make this story better; I'd like to make it possible for users to pick which version of which primitives they want to use. But that's further out.

jpco (Collaborator, Author) commented Mar 28, 2026

I'm separating out the change to $&read, and the addition of $&readline, into their own PRs, #264 and #265 respectively. I want to make sure to get the same level of confidence with how each of those changes look as I have about the changes to $&parse here.

jpco added 2 commits March 30, 2026 07:28
It's not the right thing long-term, but it's the narrowest change that
gives good behavior now.
jpco marked this pull request as ready for review March 30, 2026 15:06
jpco merged commit d77a440 into wryun:master Mar 31, 2026
1 check passed
jpco deleted the scriptparse branch March 31, 2026 15:48