Logical FS performance: per-directory file-list cache and inum cache …#39
Open
crayy8 wants to merge 2 commits into
…improvements

- Add per-directory file-list cache (DIR_FILE_LIST_CACHE, 500 FIFO slots) to avoid repeatedly enumerating the same directory when resolving K files in one folder during inum-based searches.
- Store inum_cache paths relative to base_path instead of full OS paths. This saves memory and means the LOGICAL_INUM_CACHE_MAX_PATH_LEN check applies only to the meaningful portion of the path.
- Bump LOGICAL_INUM_CACHE_LEN from 3000 to 50000. Combined with the break-on-first-empty-slot optimization in scan loops, lookups remain bounded by actual fill level, not array size.
- Opportunistically cache visited directories during inum searches with always_cache=false (only fills empty slots; never evicts useful ones).
- Use alloc-before-evict pattern for inum_cache and dir_file_list_cache to avoid leaving slots in a stale "key present, data empty" state on malloc failure.
- Add get_path_relative_to_base helper with debug assertion to verify the base_path prefix invariant in one place rather than scattering unchecked pointer arithmetic across four call sites.
- Misc fixes: initialize target_inum and check tsk_malloc result in create_path_search_helper; check GetFullPathNameW return value in create_search_path_long_path.
- Add comment to load_dir_and_file_lists_win documenting future FindFirstFileEx + FIND_FIRST_EX_LARGE_FETCH optimization (Win7+).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
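For readers without the diff open, here is a minimal sketch of how the cache ideas in this commit message could fit together: relative-path storage, break-on-first-empty-slot lookup, opportunistic insertion, and alloc-before-evict. It is not the code from this PR. Every `sketch_` name, the 4-slot size, the use of `char` instead of TSK's path type, and the convention that inum 0 marks an empty slot are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CACHE_LEN 4 /* hypothetical; the PR sets LOGICAL_INUM_CACHE_LEN to 50000 */

typedef struct {
    uint64_t inum;   /* assumption: 0 marks an empty slot */
    char *rel_path;  /* heap-owned path, relative to the base path */
} sketch_slot;

typedef struct {
    sketch_slot slots[SKETCH_CACHE_LEN];
    size_t next;     /* FIFO cursor: next slot to fill or evict */
} sketch_cache;

/* Single place that checks the "full path starts with base path" invariant. */
static const char *sketch_relative_to_base(const char *base, const char *full) {
    size_t n = strlen(base);
    assert(strncmp(base, full, n) == 0); /* debug-only check */
    return full + n;
}

/* Slots fill in order from index 0 and are never emptied again, so the
 * first empty slot ends the scan: cost tracks fill level, not array size. */
static const char *sketch_lookup(const sketch_cache *c, uint64_t inum) {
    for (size_t i = 0; i < SKETCH_CACHE_LEN; i++) {
        if (c->slots[i].inum == 0) break;
        if (c->slots[i].inum == inum) return c->slots[i].rel_path;
    }
    return NULL;
}

/* Returns 0 on success or skip, -1 on allocation failure.
 * On failure the cache is left exactly as it was. */
static int sketch_insert(sketch_cache *c, uint64_t inum,
                         const char *rel_path, int always_cache) {
    sketch_slot *s = &c->slots[c->next];
    /* Opportunistic mode: only fill an empty slot, never evict. */
    if (!always_cache && s->inum != 0) return 0;

    /* Allocate the new data first ... */
    size_t len = strlen(rel_path) + 1;
    char *copy = (char *)malloc(len);
    if (copy == NULL) return -1;
    memcpy(copy, rel_path, len);

    /* ... and only then release the old entry and overwrite the slot. */
    free(s->rel_path);
    s->inum = inum;
    s->rel_path = copy;
    c->next = (c->next + 1) % SKETCH_CACHE_LEN;
    return 0;
}
```

Because the old entry is freed only after the new allocation succeeds, a failed `malloc` cannot leave a slot whose key is set but whose data is missing, which is the stale state the commit message describes.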
Two fixes to the logical fs cache code from the previous commit:

1. Treat dir_file_list_cache insertion as best-effort. By the time we reach the per-name allocations in get_or_load_cached_dir_files, file_names (the output parameter) is already populated with valid data from the disk enumeration. A malloc failure during cache insertion only loses the optimization for next time; the caller can proceed with the loaded data. Changed both error branches to free the partial temporaries and return TSK_OK instead of TSK_ERR.

2. Fix a pre-existing cache_path leak in load_path_from_inum. If find_path_for_inum_in_cache returned a non-NULL cache_path (cache hit on the parent dir but a_addr itself wasn't a directory) and create_inum_search_helper subsequently failed its tsk_malloc, we were returning NULL without freeing cache_path.
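The best-effort insertion in fix 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the body of get_or_load_cached_dir_files: the `sketch_` names, the struct layout, and the `sketch_fail_after` hook (there only to simulate a malloc failure) are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical test hook: -1 never fails; n lets n allocations
 * succeed and then fails every later one. */
static int sketch_fail_after = -1;

static void *sketch_alloc(size_t n) {
    if (sketch_fail_after == 0) return NULL;
    if (sketch_fail_after > 0) sketch_fail_after--;
    return malloc(n);
}

typedef struct {
    char **names;  /* heap-owned copies of the file names */
    size_t count;
} sketch_dir_entry; /* one cached directory listing */

/* Try to store a deep copy of `names` in `slot`.
 * Returns 1 if cached, 0 if skipped. It never reports an error, because
 * the caller already holds a valid listing loaded from disk. */
static int sketch_try_cache(sketch_dir_entry *slot,
                            const char *const *names, size_t count) {
    char **copy = (char **)sketch_alloc((count ? count : 1) * sizeof(*copy));
    if (copy == NULL) return 0;

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(names[i]) + 1;
        copy[i] = (char *)sketch_alloc(len);
        if (copy[i] == NULL) {
            /* Free only the partial temporaries; the slot is untouched. */
            while (i > 0) free(copy[--i]);
            free(copy);
            return 0;
        }
        memcpy(copy[i], names[i], len);
    }

    /* Every allocation succeeded, so it is now safe to evict the old data. */
    for (size_t i = 0; i < slot->count; i++) free(slot->names[i]);
    free(slot->names);
    slot->names = copy;
    slot->count = count;
    return 1;
}
```

A caller in this style would ignore the return value for control flow and continue with the listing it already loaded, which corresponds to returning TSK_OK from both error branches.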