Logical FS performance: per-directory file-list cache and inum cache …#39

Open
crayy8 wants to merge 2 commits into SleuthKitLabs:develop-4.1x from crayy8:logical_fs_performance

crayy8 commented May 4, 2026

…improvements

  • Add per-directory file-list cache (DIR_FILE_LIST_CACHE, 500 FIFO slots) to avoid repeatedly enumerating the same directory when resolving K files in one folder during inum-based searches.

  • Store inum_cache paths relative to base_path instead of full OS paths. This saves memory and means the LOGICAL_INUM_CACHE_MAX_PATH_LEN limit applies only to the meaningful portion of the path.

  • Bump LOGICAL_INUM_CACHE_LEN from 3000 to 50000. Combined with a break-on-first-empty-slot optimization in the scan loops, lookup cost remains bounded by the actual fill level, not the array size.

  • Opportunistically cache visited directories during inum searches with always_cache=false (only fills empty slots; never evicts useful ones).

  • Use alloc-before-evict pattern for inum_cache and dir_file_list_cache to avoid leaving slots in a stale "key present, data empty" state on malloc failure.

  • Add get_path_relative_to_base helper with debug assertion to verify the base_path prefix invariant in one place rather than scattering unchecked pointer arithmetic across four call sites.

  • Misc fixes: initialize target_inum and check tsk_malloc result in create_path_search_helper; check GetFullPathNameW return value in create_search_path_long_path.

  • Add comment to load_dir_and_file_lists_win documenting future FindFirstFileEx + FIND_FIRST_EX_LARGE_FETCH optimization (Win7+).
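
The cache-related bullets above can be illustrated with a minimal sketch. This is not the code from this PR: the names `sketch_cache`, `sketch_cache_find`, `sketch_cache_add`, and `sketch_path_relative_to_base` are hypothetical, the cache is shrunk to 4 slots, plain `char` paths stand in for the library's path type, and inum 0 is assumed to mark an empty slot. It shows the break-on-first-empty-slot scan, the `always_cache` flag, the alloc-before-evict ordering, and a prefix check done in one helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CACHE_LEN 4   /* tiny for illustration only */

typedef struct {
    unsigned long inum;      /* assumption: 0 marks an empty slot */
    char *rel_path;          /* heap-owned path relative to base_path */
} sketch_entry;

typedef struct {
    sketch_entry slots[SKETCH_CACHE_LEN];
    size_t next_evict;       /* FIFO cursor, used once every slot is full */
} sketch_cache;

/* Return the part of full_path that follows base_path. The debug
 * assertion checks the prefix invariant in a single place. */
static const char *sketch_path_relative_to_base(const char *base_path,
                                                const char *full_path) {
    size_t base_len = strlen(base_path);
    assert(strncmp(full_path, base_path, base_len) == 0);
    return full_path + base_len;
}

/* Slots fill front to back and are never emptied again, so the first
 * empty slot ends the scan: cost tracks fill level, not array size. */
static const char *sketch_cache_find(const sketch_cache *c, unsigned long inum) {
    for (size_t i = 0; i < SKETCH_CACHE_LEN; i++) {
        if (c->slots[i].inum == 0) break;
        if (c->slots[i].inum == inum) return c->slots[i].rel_path;
    }
    return NULL;
}

/* Returns 1 if the entry was stored, 0 if it was skipped or the
 * allocation failed. With always_cache == 0 only empty slots are used. */
static int sketch_cache_add(sketch_cache *c, unsigned long inum,
                            const char *rel_path, int always_cache) {
    assert(rel_path != NULL && inum != 0);

    size_t i;
    for (i = 0; i < SKETCH_CACHE_LEN; i++) {
        if (c->slots[i].inum == 0) break;
    }
    if (i == SKETCH_CACHE_LEN) {
        if (!always_cache) return 0;   /* opportunistic insert: never evict */
        i = c->next_evict;
    }

    /* Allocate before touching the slot, so a failed malloc cannot
     * leave it in a "key present, data empty" state. */
    size_t len = strlen(rel_path) + 1;
    char *copy = malloc(len);
    if (copy == NULL) return 0;
    memcpy(copy, rel_path, len);

    if (c->slots[i].inum != 0) {       /* evicting the oldest entry */
        free(c->slots[i].rel_path);
        c->next_evict = (c->next_evict + 1) % SKETCH_CACHE_LEN;
    }
    c->slots[i].inum = inum;
    c->slots[i].rel_path = copy;
    return 1;
}
```

A caller would zero-initialize a `sketch_cache`, add entries with `always_cache` set to 0 for directories seen in passing, and set it to 1 only for results worth evicting an older entry for. The sketch does not check for duplicate inums on insert.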

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two fixes to the logical fs cache code from the previous commit:

1. Treat dir_file_list_cache insertion as best-effort. By the time we
   reach the per-name allocations in get_or_load_cached_dir_files,
   file_names (the output parameter) is already populated with valid
   data from the disk enumeration. A malloc failure during cache
   insertion only loses the optimization for next time - the caller
   can proceed with the loaded data. Changed both error branches to
   free the partial temporaries and return TSK_OK instead of TSK_ERR.

2. Fix a pre-existing cache_path leak in load_path_from_inum. If
   find_path_for_inum_in_cache returned a non-NULL cache_path (cache
   hit on the parent dir but a_addr itself wasn't a directory) and
   create_inum_search_helper subsequently failed its tsk_malloc, we
   were returning NULL without freeing cache_path.
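
Both fixes follow a common pattern that can be sketched as below. This is an illustration under stated assumptions, not the code from this commit: `sketch_alloc` is a hypothetical stand-in for `tsk_malloc` with a failure switch for testing, `sketch_name_cache` is a single-slot stand-in for the directory file-list cache, and `sketch_finish_lookup` only mimics the ownership rule described in fix 2.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { SKETCH_OK = 0, SKETCH_ERR = 1 } sketch_status;

/* Hypothetical allocator: fails once the countdown reaches 0.
 * A negative value means "never fail". */
static int allocs_before_failure = -1;
static void *sketch_alloc(size_t n) {
    if (allocs_before_failure == 0) return NULL;
    if (allocs_before_failure > 0) allocs_before_failure--;
    return malloc(n);
}

typedef struct {
    char **names;   /* heap-owned copies of the file names */
    size_t count;
} sketch_name_cache;

/* Fix 1 pattern: the caller already holds valid names, so caching them
 * is best-effort. On allocation failure, free the partial copy, leave
 * the cache untouched, and still report SKETCH_OK. */
static sketch_status sketch_cache_names(sketch_name_cache *cache,
                                        char *const *names, size_t count) {
    assert(cache != NULL);
    char **copy = sketch_alloc(count * sizeof(char *));
    if (copy == NULL) return SKETCH_OK;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(names[i]) + 1;
        copy[i] = sketch_alloc(len);
        if (copy[i] == NULL) {
            for (size_t j = 0; j < i; j++) free(copy[j]);
            free(copy);
            return SKETCH_OK;   /* only the optimization is lost */
        }
        memcpy(copy[i], names[i], len);
    }
    /* Every allocation succeeded: now replace the old contents. */
    for (size_t i = 0; i < cache->count; i++) free(cache->names[i]);
    free(cache->names);
    cache->names = copy;
    cache->count = count;
    return SKETCH_OK;
}

/* Fix 2 pattern: this function owns cache_path, so every early return
 * must free it, including the one taken when the helper allocation fails. */
static char *sketch_finish_lookup(char *cache_path) {
    void *helper = sketch_alloc(64);
    if (helper == NULL) {
        free(cache_path);       /* previously leaked on this path */
        return NULL;
    }
    free(helper);
    return cache_path;          /* ownership passes to the caller */
}
```

Returning `SKETCH_OK` from every branch of `sketch_cache_names` mirrors the change described in fix 1; a `void` return would also work in a standalone design.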