F062: Built-in Regex Transform Operation
User Stories
US1: Match and Extract Text with Capture Groups (P1 - Must Have)
As a workflow author,
I want to extract structured data from step outputs using regular expressions with named capture groups,
So that I can parse unstructured text (logs, command output, API responses) into discrete values for downstream steps.
Acceptance Scenarios:
- Given a step output containing "version: 3.14.2", when I run regex.match with pattern version:\s+(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+), then the outputs contain major=3, minor=14, patch=2, and matched=true
- Given a step output containing "no version info", when I run regex.match with the same pattern, then matched=false and capture group outputs are empty strings
- Given a pattern with unnamed capture groups (\d+)-(\d+) applied to "42-99", then outputs contain group_1=42, group_2=99
Independent Test: Create a workflow with a command step that echoes structured text, followed by a regex.match operation step that extracts values, followed by a terminal step that interpolates {{states.extract.outputs.major}}
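The extraction behaviour in US1 can be sketched in Go roughly as follows. This is an illustration under stated assumptions, not the project's implementation: matchOutputs is a hypothetical helper, outputs are assumed to be a flat string map, and group_N is assumed to use the group's overall index in the pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchOutputs is a hypothetical helper (not the project's actual API).
// It returns "matched", "full_match", one key per named group, and
// group_N for each unnamed group, where N is the group's overall index
// in the pattern (an assumption; the spec does not pin this down).
func matchOutputs(pattern, text string) (map[string]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}

	out := map[string]string{"matched": "false", "full_match": ""}
	m := re.FindStringSubmatch(text) // nil when there is no match
	if m != nil {
		out["matched"] = "true"
		out["full_match"] = m[0]
	}

	// SubexpNames()[0] is the whole match; unnamed groups have name "".
	names := re.SubexpNames()
	for i := 1; i < len(names); i++ {
		key := names[i]
		if key == "" {
			key = fmt.Sprintf("group_%d", i)
		}
		val := "" // capture outputs stay empty strings on no match
		if m != nil {
			val = m[i]
		}
		out[key] = val
	}
	return out, nil
}

func main() {
	out, err := matchOutputs(
		`version:\s+(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)`,
		"version: 3.14.2",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(out["matched"], out["major"], out["minor"], out["patch"])
}
```

Iterating over SubexpNames rather than over the match slice means every capture output exists even when nothing matched, which is what the second acceptance scenario asks for.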
US2: Replace Text Using Patterns (P1 - Must Have)
As a workflow author,
I want to replace matched text in a string using a regex pattern and replacement template,
So that I can transform step outputs (sanitize data, reformat strings, redact sensitive content) before passing them downstream.
Acceptance Scenarios:
- Given input text "Hello World 2026", when I run regex.replace with pattern \d+ and replacement YEAR, then output contains "Hello World YEAR"
- Given input text "foo-bar-baz", when I run regex.replace with pattern (\w+)-(\w+)-(\w+) and replacement ${3}_${2}_${1}, then output contains "baz_bar_foo" (the braces are required: Go reads $3_ as a reference to a group named 3_, which expands to an empty string)
- Given input text with no matches for the pattern, when I run regex.replace, then the output equals the original input unchanged
Independent Test: Create a workflow with a regex.replace step that redacts email addresses from command output, verify the output contains [REDACTED] instead of email strings
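A minimal Go sketch of the replacement scenarios, assuming the operation delegates to regexp.ReplaceAllString (replaceAll is a hypothetical helper name). It also shows why numbered references followed by an underscore need braces in Go's template syntax.

```go
package main

import (
	"fmt"
	"regexp"
)

// replaceAll is a hypothetical helper: it compiles the pattern and
// replaces every match using Go's $1 / ${name} template syntax.
func replaceAll(pattern, text, replacement string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return re.ReplaceAllString(text, replacement), nil
}

func main() {
	year, _ := replaceAll(`\d+`, "Hello World 2026", "YEAR")
	fmt.Println(year) // Hello World YEAR

	// Braces delimit the group number, so "_" stays a literal.
	swapped, _ := replaceAll(`(\w+)-(\w+)-(\w+)`, "foo-bar-baz", "${3}_${2}_${1}")
	fmt.Println(swapped) // baz_bar_foo

	// Without braces, $3_ and $2_ name nonexistent groups "3_" and "2_",
	// which expand to empty strings; only $1 survives.
	bare, _ := replaceAll(`(\w+)-(\w+)-(\w+)`, "foo-bar-baz", "$3_$2_$1")
	fmt.Println(bare) // foo
}
```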
US3: Find All Matches in Text (P2 - Should Have)
As a workflow author,
I want to find all occurrences of a pattern in text and access them as a list,
So that I can iterate over extracted values using loop constructs (for-each) in subsequent steps.
Acceptance Scenarios:
- Given text "error at line 10, error at line 25, error at line 42", when I run regex.find_all with pattern line (\d+), then matches output is a JSON array ["line 10","line 25","line 42"] and groups output is [["10"],["25"],["42"]]
- Given text with zero matches, when I run regex.find_all, then matches is an empty array [] and count is 0
Independent Test: Create a workflow that extracts all URLs from a text block using regex.find_all, then loops over them with a for-each step
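The find_all outputs could be produced along these lines. This is a sketch only: findAll is a hypothetical helper, and treating a limit of 0 or less as "unlimited" is an assumption about how the default would be encoded.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// findAll is a hypothetical helper returning full matches and, per match,
// the capture group values. limit <= 0 is assumed to mean "unlimited".
func findAll(pattern, text string, limit int) ([]string, [][]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, nil, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	if limit <= 0 {
		limit = -1 // the regexp package treats a negative n as no limit
	}

	// Start from empty, non-nil slices so JSON encodes [] rather than null.
	matches := []string{}
	groups := [][]string{}
	for _, m := range re.FindAllStringSubmatch(text, limit) {
		matches = append(matches, m[0])
		groups = append(groups, m[1:])
	}
	return matches, groups, nil
}

func main() {
	matches, groups, err := findAll(`line (\d+)`,
		"error at line 10, error at line 25, error at line 42", 0)
	if err != nil {
		panic(err)
	}
	mj, _ := json.Marshal(matches)
	gj, _ := json.Marshal(groups)
	fmt.Println(string(mj), string(gj), len(matches))
}
```

Mapping the "unlimited" default to -1 matters because FindAllStringSubmatch returns no matches at all when called with n equal to 0.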
US4: Split Text by Pattern (P3 - Nice to Have)
As a workflow author,
I want to split a string using a regex delimiter pattern,
So that I can break multi-value outputs into individual items for parallel or sequential processing.
Acceptance Scenarios:
- Given text "one::two:::three", when I run regex.split with pattern :+, then parts output is ["one","two","three"] and count is 3
- Given a limit input of 2, when splitting "a-b-c-d" by -, then parts is ["a","b-c-d"]
Independent Test: Create a workflow that splits CSV-like command output by a regex separator and verifies the resulting array length
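Both split scenarios match the behaviour of Go's regexp.Split, where a positive n caps the number of returned substrings. A minimal sketch, assuming a hypothetical splitParts helper and assuming a limit of 0 or less means "no limit":

```go
package main

import (
	"fmt"
	"regexp"
)

// splitParts is a hypothetical helper. A positive limit caps the number
// of parts; the final part holds the unsplit remainder. limit <= 0 is
// assumed to mean "no limit".
func splitParts(pattern, text string, limit int) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	if limit <= 0 {
		limit = -1 // regexp.Split returns all substrings for negative n
	}
	return re.Split(text, limit), nil
}

func main() {
	all, _ := splitParts(`:+`, "one::two:::three", 0)
	fmt.Println(all, len(all)) // [one two three] 3

	capped, _ := splitParts(`-`, "a-b-c-d", 2)
	fmt.Println(capped) // [a b-c-d]
}
```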
Requirements
Functional Requirements
- FR-001: The system shall provide a regex.match operation that applies a Go regexp pattern to input text and returns a boolean matched flag, the full match string, and all named/unnamed capture group values as individual outputs
- FR-002: The system shall provide a regex.replace operation that applies a Go regexp pattern to input text, replaces matches using a replacement template supporting $1/${name} backreferences, and returns the transformed text
- FR-003: The system shall provide a regex.find_all operation that returns all non-overlapping matches of a pattern in the input text as a JSON array, with a configurable limit (default: unlimited) to cap the number of matches
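- FR-004: The system shall provide a regex.split operation that splits input text by a regex delimiter pattern and returns the resulting parts as a JSON array, with a configurable limit on the maximum number of parts returned (a limit of 2 yields at most two parts, with the remainder left unsplit in the last one)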
- FR-005: All regex operations shall validate the pattern input at execution time and return a structured error with code EXEC.PLUGIN.OPERATION if the pattern is invalid, including the regexp compilation error message
- FR-006: All regex operations shall accept a text input (required, string) and a pattern input (required, string) at minimum
- FR-007: The regex.match operation shall support both named groups (?P<name>...) and positional groups (...), exposing named groups as named outputs and positional groups as group_1, group_2, etc.
- FR-008: All regex operations shall register with the CompositeOperationProvider under the regex namespace, following the established F054/F056 built-in provider pattern
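The validation required by FR-005 might look like the sketch below. OperationError and compilePattern are hypothetical stand-ins for the project's structured error type and validation step; only the error code string comes from the requirement.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// OperationError is a hypothetical stand-in for the project's structured
// error type; its fields are assumptions made for illustration.
type OperationError struct {
	Code    string
	Message string
	Cause   error
}

func (e *OperationError) Error() string { return e.Code + ": " + e.Message }
func (e *OperationError) Unwrap() error { return e.Cause }

// compilePattern validates the pattern at execution time. On failure the
// message carries both the original pattern and the regexp compile error.
func compilePattern(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, &OperationError{
			Code:    "EXEC.PLUGIN.OPERATION",
			Message: fmt.Sprintf("invalid regex pattern %q: %v", pattern, err),
			Cause:   err,
		}
	}
	return re, nil
}

func main() {
	_, err := compilePattern(`(unclosed`)
	var opErr *OperationError
	if errors.As(err, &opErr) {
		fmt.Println(opErr.Code)
		fmt.Println(opErr.Message)
	}
}
```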
Non-Functional Requirements
- NFR-001: Regex compilation and execution shall complete in < 10ms for patterns up to 1KB and input text up to 1MB
- NFR-002: All regex types shall remain internal to internal/infrastructure/regex/ with no new domain entities (following F054/F056 no-domain-pollution principle)
- NFR-003: Invalid regex patterns shall produce actionable error messages including the original pattern and the Go regexp compilation error
- NFR-004: The implementation shall use Go's regexp package (RE2 syntax), with no PCRE or backtracking engines, guaranteeing linear-time execution and preventing ReDoS
Success Criteria
- go-arch-lint check passes with the new infra-regex component
Key Entities
| Entity | Description | Attributes |
|---|---|---|
| RegexOperationProvider | Built-in OperationProvider for regex namespace | operations map, Execute dispatch |
| MatchResult | Internal result of a regex.match execution | matched bool, full_match string, groups map |
| ReplaceResult | Internal result of a regex.replace execution | result string, count int |
| FindAllResult | Internal result of a regex.find_all execution | matches []string, groups [][]string, count int |
| SplitResult | Internal result of a regex.split execution | parts []string, count int |
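The "operations map, Execute dispatch" shape of RegexOperationProvider could be sketched as follows. The handler signature, the string-map inputs and outputs, and the error wording are all assumptions for illustration; the real OperationProvider interface is not shown in this spec, and only two operations are wired up here.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// handler is a hypothetical signature for a single regex operation.
type handler func(re *regexp.Regexp, inputs map[string]string) map[string]string

// RegexOperationProvider mirrors the entity in the table: a map of
// operation names to handlers plus an Execute method that dispatches.
type RegexOperationProvider struct {
	operations map[string]handler
}

func NewRegexOperationProvider() *RegexOperationProvider {
	return &RegexOperationProvider{operations: map[string]handler{
		"regex.match": func(re *regexp.Regexp, in map[string]string) map[string]string {
			return map[string]string{
				"matched":    strconv.FormatBool(re.MatchString(in["text"])),
				"full_match": re.FindString(in["text"]),
			}
		},
		"regex.replace": func(re *regexp.Regexp, in map[string]string) map[string]string {
			return map[string]string{
				"result": re.ReplaceAllString(in["text"], in["replacement"]),
			}
		},
	}}
}

// Execute checks the two required inputs, compiles the pattern once, and
// hands off to the handler registered for the operation.
func (p *RegexOperationProvider) Execute(operation string, inputs map[string]string) (map[string]string, error) {
	op, ok := p.operations[operation]
	if !ok {
		return nil, fmt.Errorf("unknown operation %q", operation)
	}
	for _, required := range []string{"text", "pattern"} {
		if _, present := inputs[required]; !present {
			return nil, fmt.Errorf("%s: missing required input %q", operation, required)
		}
	}
	re, err := regexp.Compile(inputs["pattern"])
	if err != nil {
		return nil, fmt.Errorf("%s: invalid pattern %q: %w", operation, inputs["pattern"], err)
	}
	return op(re, inputs), nil
}

func main() {
	p := NewRegexOperationProvider()
	out, err := p.Execute("regex.replace", map[string]string{
		"text": "Hello World 2026", "pattern": `\d+`, "replacement": "YEAR",
	})
	fmt.Println(out["result"], err)
}
```

Compiling the pattern in Execute rather than in each handler keeps the shared validation of text and pattern in one place.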
Metadata
- Status: backlog
- Version: v0.4.0
- Priority: medium
- Estimation: M
Dependencies
- Blocked by: F057
- Unblocks: none
Clarifications
Section populated during clarify step with resolved ambiguities.
Notes
- Uses Go regexp package exclusively (RE2 syntax). This guarantees linear-time matching and eliminates ReDoS risk but means lookaheads/lookbehinds and backreferences in patterns are not supported. This is a deliberate trade-off: safety over power.
- The regex namespace joins github and notify in the CompositeOperationProvider. Adding a third namespace requires only map registration, with no API changes.
- Capture group outputs are strings. Workflow authors needing numeric values can use expression syntax (int(states.step.outputs.group_1)) for type conversion.
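- The regex.replace replacement template uses Go's regexp.Expand syntax: $1, ${name}, and $$ for a literal dollar sign. A bare $name reference takes the longest possible name, so $1_ is read as ${1_}; write ${1}_ when a numbered reference is followed by a letter, digit, or underscore.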
- F057 (file operations) is a prerequisite because regex transforms are most valuable when operating on file content read by file.read. Without F057, regex operations are limited to command step outputs and hardcoded strings.
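The RE2 trade-off in the first note can be checked directly: Go's regexp package rejects lookarounds and in-pattern backreferences at compile time, so unsupported patterns surface as validation errors rather than as slow matches. A small sketch (the supported helper is hypothetical):

```go
package main

import (
	"fmt"
	"regexp"
)

// supported reports whether Go's RE2-based engine accepts the pattern.
func supported(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`(?=foo)bar`,    // lookahead: rejected
		`(?<!x)y`,       // negative lookbehind: rejected
		`(a)\1`,         // backreference inside the pattern: rejected
		`(?P<word>\w+)`, // named capture group: accepted
	}
	for _, p := range patterns {
		fmt.Printf("%-16s supported=%v\n", p, supported(p))
	}
}
```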