
Support dynamic node edits #77

Open
Idorobots wants to merge 13 commits into karlicoss:master from Idorobots:dynamic-edits



Idorobots commented Feb 22, 2026

This is in reference to #11 and #76

I wanted to be able to modify parsed nodes and dump them to a file for a project I'm building. This code was written with AI (Codex 5.2 thinking primarily) but I was in the loop for every change made.

Here's how this works:

  • A new LineItem abstraction is added that represents one parsed line of an Org node's text. There is a subclass for each supported Org feature (dates, properties, headline, etc.).
  • Initially all lines are TextLines to preserve the 1-to-1 representation when using __str__.
  • When a property is set, the corresponding line is replaced with a semantic instance of LineItem. If no line existed for that property (for example, node.scheduled = date when no scheduled date was parsed from the file), a new one is inserted.
  • If a change is made via the setters, the node's lines are marked as dirty, and the representation is regenerated from the _line_items instead. That's how the edits are made possible.
  • The _lines with the original file contents are still available, as a performance optimization, for the usual use-case of just parsing the nodes without making any modifications to them.
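The mechanism described in the list above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: the `Node` class, the `ScheduledLine` subclass, the `render()` and `set_scheduled()` methods, and the `_dirty` flag are all hypothetical names chosen for the example (the PR itself exposes a `node.scheduled` setter), and timestamps are simplified to date-only form.

```python
from dataclasses import dataclass
from datetime import date


class LineItem:
    """Hypothetical base class: one parsed line of a node's text."""

    def render(self) -> str:
        raise NotImplementedError


@dataclass
class TextLine(LineItem):
    """Opaque line kept verbatim so that str(node) round-trips exactly."""
    raw: str

    def render(self) -> str:
        return self.raw


@dataclass
class ScheduledLine(LineItem):
    """Semantic line for a SCHEDULED timestamp (date-only for brevity)."""
    when: date

    def render(self) -> str:
        return f"SCHEDULED: <{self.when.isoformat()}>"


class Node:
    """Hypothetical node holding both the raw lines and the line items."""

    def __init__(self, lines):
        self._lines = list(lines)  # original text, used while unmodified
        self._line_items = [TextLine(line) for line in self._lines]
        self._dirty = False

    def set_scheduled(self, when: date) -> None:
        new_item = ScheduledLine(when)
        for i, item in enumerate(self._line_items):
            is_raw_scheduled = (
                isinstance(item, TextLine)
                and item.raw.lstrip().startswith("SCHEDULED:")
            )
            if isinstance(item, ScheduledLine) or is_raw_scheduled:
                # Replace the existing line with its semantic counterpart.
                self._line_items[i] = new_item
                break
        else:
            # No scheduled line yet: insert one right after the headline.
            self._line_items.insert(1, new_item)
        self._dirty = True

    def __str__(self) -> str:
        if not self._dirty:
            # Fast path: nothing was edited, return the original text.
            return "\n".join(self._lines)
        return "\n".join(item.render() for item in self._line_items)
```

An unmodified node prints its original lines untouched; after `set_scheduled(...)` the text is regenerated from the line items, so only the edited line differs from the input.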

I also considered creating _line_items lazily, on first write, as another optimization, but didn't do that in the end. Performance seems good enough without it: my 37k-task archive loads in 2.32s on current master and in 3.71s on this branch. Let me know if this is acceptable.

I added unit tests mainly for the new functionality, since the existing functionality is left mostly unchanged.

Body editing is pretty awkward, as it requires a line index: bodies can be non-contiguous and can contain timestamps etc.
Edge cases such as duplicated logbook drawers are similarly awkward.
This PR also permits modifying children, which essentially means adding subtrees and moving them around the file (but only within the same OrgEnv).

I reorganized the code a little bit and reexported the relevant functions back from the node module, to keep the API stable.

Technically, these new LineItems could replace the old attributes such as _heading entirely, but I opted to keep the old attributes in, at least until I get some comments on the general approach here. With a richer representation it will be easier to add support for links etc. as part of the heading, for instance.

I went ahead and made the "new" representation the only available one (without changing the public API). That uncovered a few issues, but these were fixed, and now my archive loads and looks correct. It also sped things up a bit: the archive now loads in 2.68s.


plur9 commented Mar 6, 2026

We've been running this branch in production for the past few weeks via a fork (datacore-one/orgparse@pr-77) as the underlying parser for an org-mode AI workflow library. Wanted to share real-world feedback since you've had no external reviews yet.

What we're doing with it

We built org-workspace on top of this branch — a library for AI agents to read and mutate org files (task state transitions, property updates, LOGBOOK entries, CLOCK entries, refile). It runs overnight processing hundreds of org tasks autonomously. The mutation capabilities in this PR are exactly what made that possible without falling back to fragile regex substitution.

What works well

  • Property editing (node.set_property) is solid and handles the common case cleanly
  • State transitions (node.heading setter) work reliably
  • The _line_items approach is the right abstraction — we built LOGBOOK/CLOCK insertion on top of _insert_line_item and it composes well
  • Roundtrip fidelity — we haven't seen any corruption on files we've modified thousands of times

One fragility we hit

_mark_dirty is a private method we call from our workspace layer to signal that a node has been mutated outside of the official setters (specifically when inserting raw TextLine entries into _line_items). We had to wrap it in a try/except because it's not part of the public API:

try:
    node._mark_dirty()
except AttributeError:
    node._env.reload(node.path)  # fallback

It would be helpful to have a public node.mark_dirty() or a supported node.invalidate() method for cases where callers manipulate _line_items directly. The alternative — always going through setters — doesn't cover the LOGBOOK insertion pattern where you're inserting arbitrary text lines.
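Until such a method exists, a caller-side shim along these lines could narrow the fragility. This is only a sketch: `mark_node_dirty` is a hypothetical helper, and `mark_dirty` and `invalidate` are the proposed public names from this comment, not methods that exist today.

```python
def mark_node_dirty(node) -> bool:
    """Signal that `node` was mutated outside the official setters.

    Tries the proposed (hypothetical) public names first, then falls
    back to the private `_mark_dirty`. Returns True if a hook was
    called, False if the node offers none of them.
    """
    for name in ("mark_dirty", "invalidate", "_mark_dirty"):
        hook = getattr(node, name, None)
        if callable(hook):
            hook()
            return True
    return False
```

When this returns False, the caller can fall back to reloading the file as in the snippet above. Unlike a blanket `except AttributeError`, the `getattr` lookup does not swallow an `AttributeError` raised inside the hook itself.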

Minor notes

  • The body editing awkwardness you mentioned in the PR description is real — we ended up not using it and instead appending to _line_items directly via _insert_line_item, which works but feels fragile
  • find_by_id() (or equivalent) would be very useful — we index nodes by their :ID: property frequently and currently do a linear scan over env nodes
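As a stopgap for the ID lookup, a one-time index can replace the repeated linear scans. The sketch below assumes only that each node offers orgparse's `get_property(key)` accessor; `build_id_index` is a hypothetical helper name, not part of orgparse or this PR.

```python
def build_id_index(nodes):
    """Map each node's :ID: property to the node itself.

    `nodes` is any iterable of objects with a `get_property(key)`
    method. Nodes without an ID are skipped; if an ID is duplicated,
    the first node seen wins.
    """
    index = {}
    for node in nodes:
        node_id = node.get_property("ID")
        if node_id is not None:
            index.setdefault(node_id, node)
    return index
```

With orgparse, `nodes` could be `root[1:]` for a root returned by `orgparse.load(...)`. The index goes stale when nodes are added or removed or when their IDs change, so it has to be rebuilt after such edits.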

Overall

This PR fills a real gap. orgparse is widely used for reading org files but there's no good Python solution for writing them without mangling the file. We'd love to see this merged. Happy to help review specific parts if that's useful.

@Idorobots
Author

Oh wow, nice!
I've been building something very similar actually: https://github.com/Idorobots/org-cli

The goal is to coordinate AI agents around Org-Mode files.
I got a bit frustrated with the lack of response on here, so I've started an alternative Org-Mode parser library instead: https://github.com/Idorobots/org-mode-parser

It's not finished yet, but I'm aiming to make that orgparse-compatible eventually. The current state is that the tree-sitter parser handles a lot more Org-Mode syntax than orgparse and it just needs a Python wrapper.

I'd love your feedback over there, and perhaps we could cooperate, since we seem to have similar projects in mind?
