Bringing the glob Dependency In-House
Replacing the external glob npm package with stdlib-native equivalents to eliminate a supply-chain risk and align all production and tooling code with stdlib's quality standards (docs, tests, examples, benchmarks, backward compatibility).
Background
What glob Does
A robust glob implementation in JavaScript that matches files using patterns the shell uses (like stars and question marks). It works by walking the filesystem and applying regex-based matching to paths.
More here: https://github.com/isaacs/node-glob
Its core dependencies
minimatch: Converts the glob strings into regular expressions and performs the actual string matching.
fs.realpath / inflight / once: Various utilities used under the hood to manage concurrent filesystem calls and path resolutions.
How stdlib uses it
- 50+ files across @stdlib/_tools/pkgs/*, @stdlib/_tools/lint/*, @stdlib/_tools/static-analysis/*, and various internal bundling or testing scripts.
- Only the core discovery functions are used: glob( pattern, opts, clbk ) and glob.sync( pattern, opts ).
- Primary use cases: finding package.json files, source .js files, or .c files, while explicitly passing an ignore option array (e.g., ['node_modules/**', '.git/**']) to prevent scanning massive dependency directories.
- No advanced API usage: none of the files instantiate the Glob class directly as an event emitter or use advanced bash features like brace expansion ({a,b}).
Proposed Changes
The plan creates 2 new packages mirroring stdlib's decomposable architecture, with each package being independently consumable.
Component 1: @stdlib/utils/regexp-from-glob
Replaces the minimatch npm dependency (transitive via glob).
A utility to convert standard shell wildcard patterns into native JavaScript RegExp objects.
Scope: Limited to the wildcards stdlib actually uses (*, **, ?). Brace expansion and extglobs are intentionally unsupported, which keeps the implementation minimal and fast.
@stdlib/utils/regexp-from-glob/
├── lib/
│ ├── index.js # re-export main
│ └── main.js # regexpFromGlob(str) → RegExp
├── test/
│ └── test.js
├── benchmark/
│ └── benchmark.js
├── docs/
│ ├── repl.txt
│ └── types/
│ ├── index.d.ts
│ └── test.ts
├── examples/
│ └── index.js
├── README.md
└── package.json
API:
var regexpFromGlob = require( '@stdlib/utils/regexp-from-glob' );
var re = regexpFromGlob( '**/*.js' );
// returns RegExp
re.test( 'lib/index.js' ); // => true
re.test( 'lib/index.json' ); // => false
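To make the intended scope concrete, the conversion could be sketched roughly as follows. This is a minimal illustration and not the proposed implementation: the function body, the RE_SPECIAL helper, and the choice to treat a leading "**/" as "zero or more directories" are assumptions, and the sketch ignores details such as dotfile handling that minimatch applies by default.

```javascript
'use strict';

// Characters which are special in a RegExp but literal in this glob subset (assumed list):
var RE_SPECIAL = /[.+^${}()|[\]\\]/;

/**
* Hypothetical sketch: converts a glob containing only `*`, `**`, and `?` to a RegExp.
*
* @param {string} str - glob pattern
* @returns {RegExp} regular expression anchored at both ends
*/
function regexpFromGlob( str ) {
	var out = '^';
	var ch;
	var i;
	for ( i = 0; i < str.length; i++ ) {
		ch = str[ i ];
		if ( ch === '*' ) {
			if ( str[ i+1 ] === '*' ) {
				i += 1;
				if ( str[ i+1 ] === '/' ) {
					// `**/` matches zero or more whole directories:
					i += 1;
					out += '(?:.*/)?';
				} else {
					// A bare `**` matches anything, including separators:
					out += '.*';
				}
			} else {
				// `*` matches within a single path segment:
				out += '[^/]*';
			}
		} else if ( ch === '?' ) {
			// `?` matches exactly one non-separator character:
			out += '[^/]';
		} else if ( RE_SPECIAL.test( ch ) ) {
			// Escape characters which would otherwise be RegExp syntax:
			out += '\\' + ch;
		} else {
			out += ch;
		}
	}
	return new RegExp( out + '$' );
}

var re = regexpFromGlob( '**/*.js' );
console.log( re.test( 'lib/index.js' ) );
console.log( re.test( 'lib/index.json' ) );
```

Anchoring the expression with ^ and $ is what distinguishes .js from .json in the example above; without the trailing anchor, both paths would match.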
Component 2: @stdlib/fs/glob
The core package — the direct replacement for require('glob').
A filesystem traversal utility that applies the generated glob regexes to discover files. This is the main package all _tools files will switch to.
@stdlib/fs/glob/
├── lib/
│ ├── index.js # re-export main
│ ├── main.js # async implementation
│ ├── sync.js # sync implementation
│ ├── walk.js # internal DFS/BFS directory walker
│ └── validate.js # options validation
├── test/
│ ├── test.js # async tests
│ ├── test.sync.js # sync tests
│ └── test.walk.js # directory walker tree-pruning tests
├── benchmark/
│ ├── benchmark.js # benchmark: async traversal
│ └── benchmark.sync.js # benchmark: sync traversal
├── docs/
│ ├── repl.txt
│ └── types/
│ ├── index.d.ts
│ └── test.ts
├── examples/
│ └── index.js
├── README.md
└── package.json
API (drop-in compatible with current usage):
var glob = require( '@stdlib/fs/glob' );
var opts = {
    'cwd': __dirname,
    'ignore': [ 'node_modules/**' ],
    'realpath': true
};
// Async usage
glob( '**/*.js', opts, function onGlob( error, matches ) {
    if ( error ) {
        console.error( error );
        return;
    }
    console.dir( matches );
});
// Sync usage
var matches = glob.sync( '**/*.js', opts );
Key implementation details:
- Uses @stdlib/fs/read-dir and standard fs.stat under the hood.
- Tree-pruning (crucial for performance): the internal walk.js algorithm must evaluate the ignore option before descending into a directory.
- Options support: cwd (defaults to process.cwd()), ignore (array of globs), and realpath (boolean).
- Returns a normalized array of paths containing no duplicates.
- No external dependencies (uses existing stdlib utilities).
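The tree-pruning requirement can be illustrated with a small synchronous walker. This is a sketch under stated assumptions, not the planned walk.js: it uses Node's built-in fs.readdirSync with the withFileTypes option in place of @stdlib/fs/read-dir and fs.stat, and the walkSync name and its predicate arguments (isMatch, isIgnored) are hypothetical.

```javascript
'use strict';

var fs = require( 'fs' );
var path = require( 'path' );

/**
* Hypothetical sketch: depth-first walk which checks the ignore predicate before descending.
*
* @param {string} root - directory in which to search
* @param {Function} isMatch - predicate applied to relative file paths
* @param {Function} isIgnored - predicate applied to relative file and directory paths
* @returns {Array<string>} sorted list of matching relative paths (forward slashes)
*/
function walkSync( root, isMatch, isIgnored ) {
	var out = [];
	visit( '' );
	return out.sort();

	function visit( rel ) {
		var entries = fs.readdirSync( path.join( root, rel ), {
			'withFileTypes': true
		});
		var child;
		var i;
		for ( i = 0; i < entries.length; i++ ) {
			child = ( rel ) ? rel + '/' + entries[ i ].name : entries[ i ].name;
			if ( isIgnored( child ) ) {
				// Prune: an ignored directory is never read, so its contents cost nothing...
				continue;
			}
			if ( entries[ i ].isDirectory() ) {
				visit( child );
			} else if ( isMatch( child ) ) {
				out.push( child );
			}
		}
	}
}
```

The important design point is the order of the checks: because isIgnored runs before the recursive call, a directory such as node_modules is skipped with a single test instead of being enumerated and filtered afterward.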
Component 3: Migration — Updating All Consumers
This is the mechanical bulk change, done after the new packages are created and verified.
[MODIFY] All 50+ files using require( 'glob' )
The change is a single-line replacement per file:
-var glob = require( 'glob' );
+var glob = require( '@stdlib/fs/glob' );
Migration strategy (phased to reduce risk):
| Phase | Scope | Files |
|---|---|---|
| Phase 1 | @stdlib/_tools/pkgs/* (find, deps, clis, etc.) | ~15 files |
| Phase 2 | @stdlib/_tools/static-analysis/* (sloc-glob, etc.) | ~10 files |
| Phase 3 | @stdlib/_tools/lint/* (filenames, pkg-json, etc.) | ~20 files |
| Phase 4 | Remaining bundles and scripts | ~10 files |
Each phase follows the same process:
- Run find + sed to replace require( 'glob' ) → require( '@stdlib/fs/glob' ).
- Run existing internal tool tests for the affected script area.
- Verify output matches the original glob dependency.
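The output comparison in the last step could be automated with a small order-insensitive diff of the two result sets. The helper below is only an illustrative sketch; the diffMatches name and its return shape are assumptions, and the two arrays stand in for the results of the existing glob dependency and of the replacement run with the same pattern and options.

```javascript
'use strict';

/**
* Hypothetical helper: reports paths present in one result set but not the other.
*
* @param {Array<string>} expected - matches returned by the existing dependency
* @param {Array<string>} actual - matches returned by the replacement
* @returns {Object} object with `missing` and `extra` arrays
*/
function diffMatches( expected, actual ) {
	var missing = expected.filter( function notInActual( p ) {
		return actual.indexOf( p ) === -1;
	});
	var extra = actual.filter( function notInExpected( p ) {
		return expected.indexOf( p ) === -1;
	});
	return {
		'missing': missing,
		'extra': extra
	};
}

// Placeholder data standing in for real results; ordering differences are not reported:
var before = [ 'lib/index.js', 'lib/main.js' ];
var after = [ 'lib/main.js', 'lib/index.js' ];

var d = diffMatches( before, after );
console.log( d.missing.length === 0 && d.extra.length === 0 );
```

Reporting missing and extra paths separately makes it easier to tell whether a discrepancy comes from the pattern conversion (missing matches) or from ignore handling (extra matches).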
Uninstall the dependency
After all phases are complete, remove glob from the project's dependencies.
Verification Plan
Automated Tests
1. Unit tests for @stdlib/utils/regexp-from-glob
make TESTS_FILTER=".*/utils/regexp-from-glob/.*" test
Tests should cover:
- Core wildcards (*, **, ?).
- Escaping behavior for dot ., plus +, parentheses (), and brackets [].
- Prefix, suffix, and exact match conditions.
2. Unit tests for @stdlib/fs/glob
make TESTS_FILTER=".*/fs/glob/.*" test
Tests should cover:
- Standard sync and async pattern matching against a mock filesystem.
- ignore arrays effectively pruning traversal.
- cwd option shifting the search base.
- realpath correctly resolving to absolute paths.
- Error handling (e.g., gracefully handling attempts to read restricted directories).
3. Regression tests for migrated modules
Run the full test suite for each phase to ensure the tools operate as expected:
# Phase 1:
make TESTS_FILTER=".*/_tools/pkgs/.*" test
# Phase 2:
make TESTS_FILTER=".*/_tools/static-analysis/.*" test
# Phase 3:
make TESTS_FILTER=".*/_tools/lint/.*" test
# Phase 4: Full internal tools suite
make test
Manual Verification
After migration, verify the tools perform correctly on the monorepo:
- Verify Package Discovery:
node ./lib/node_modules/@stdlib/_tools/pkgs/find/bin/cli
→ Verify: Outputs all valid stdlib sub-packages identically to previous runs, without diving into node_modules.
- Verify SLOC execution:
node ./lib/node_modules/@stdlib/_tools/static-analysis/js/sloc-glob/bin/cli
→ Verify: Accurately computes lines of code for the workspace.
- Performance Check: Time the execution before and after the swap.
time node ./lib/node_modules/@stdlib/_tools/pkgs/find/bin/cli
→ Verify: Our native, tree-pruned walker executes in equivalent or faster time than the external dependency.