Skip to content

Commit 60fee63

Browse files
Fix/consistent path normalization (#73)
* fix: improve path handling in sync client pull operation * Improve and make path handling more consistent in call and log overloads * refactor(tests): update normalize_path test to cover strip_extension parameter * fix: improve path validation for SDK calls - Enhance safety in path extension handling - Fix path validation tests to work with actual error messages - Prioritize extension validation over other path format issues - Ensure consistent error handling across prompt and agent clients * feat: add path utils for centralized path validation * Further refined path handling and added more tests * refactor: use pytest tmp_path fixture to isolate test file operations This change: - Replaces hardcoded directories with pytest's tmp_path fixture - Eliminates filesystem pollution between tests - Enables safe test parallelization - Improves path validation and error messages - Removes redundant cleanup code - Makes tests follow pytest best practices * test: fix type errors in local file operations test by using proper ChatMessageParams type * docs(cli): clarify SyncClient log level control and OpenTelemetry isolation * refactor: simplified path processing to use pathlib where possible * docs: improve comment clarity * test: improve path normalization tests with parametrize and edge cases * docs: clarify pull_file docstring with failure example * docs: expand normalize_path docstring with usage and examples - Add detailed explanation of function's purpose and usage contexts\n- Document rationale for stripping leading/trailing slashes\n- Add comprehensive examples for different path formats\n- Reference SyncClient.pull usage for context * refactor: SyncClient -> FileSyncer * fix(sync): Convert base_dir to string in FileSyncer fixture * fix(dosc): correct logger namespace reference * docs: Improve error messages and comments in FileSyncer * refactor(test): use pytest.mark.parametrize for path validation tests Consolidate the three similar test loops (extension paths, slash paths, and combined paths) into a single parametrized test. This reduces code duplication while making test cases more maintainable and test output more descriptive. - Replace repetitive test loops with @pytest.mark.parametrize - Use descriptive test IDs for better test output - Group test cases by type with clear comments - Make parameter names more explicit (path_generator, test_case_description) * fix(test): use proper typing.Callable for path generator
1 parent e61d4f7 commit 60fee63

File tree

19 files changed

+976
-496
lines changed

19 files changed

+976
-496
lines changed

.fernignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55

66
src/humanloop/evals
77
src/humanloop/prompt_utils.py
8+
src/humanloop/path_utils.py
89
src/humanloop/client.py
910
src/humanloop/overload.py
1011
src/humanloop/context.py

poetry.lock

Lines changed: 109 additions & 20 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

src/humanloop/cli/__main__.py

Lines changed: 9 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@
99
from dotenv import load_dotenv
1010

1111
from humanloop import Humanloop
12-
from humanloop.sync.sync_client import SyncClient
12+
from humanloop.sync.file_syncer import FileSyncer
1313

1414
# Set up logging
1515
logger = logging.getLogger(__name__)
@@ -154,6 +154,7 @@ def cli(): # Does nothing because used as a group for other subcommands (pull,
154154
"-p",
155155
help="Path in the Humanloop workspace to pull from (file or directory). You can pull an entire directory (e.g. 'my/directory') "
156156
"or a specific file (e.g. 'my/directory/my_prompt.prompt'). When pulling a directory, all files within that directory and its subdirectories will be included. "
157+
"Paths should not contain leading or trailing slashes. "
157158
"If not specified, pulls from the root of the remote workspace.",
158159
default=None,
159160
)
@@ -218,7 +219,12 @@ def pull(
218219
219220
Currently only supports syncing Prompt and Agent files. Other file types will be skipped."""
220221
client = get_client(api_key, env_file, base_url)
221-
sync_client = SyncClient(
222+
# Although pull() is available on the Humanloop client, we instantiate FileSyncer separately to control its log level.
223+
# This allows CLI users to toggle between detailed logging (--verbose) and minimal output without affecting the
224+
# main Humanloop client logger. The FileSyncer uses its own logger namespace (humanloop.sdk.file_syncer), making this
225+
# modification isolated from the client's OpenTelemetry setup. This client instance is short-lived and only
226+
# exists for the duration of the CLI command execution.
227+
file_syncer = FileSyncer(
222228
client, base_dir=local_files_directory, log_level=logging.DEBUG if verbose else logging.WARNING
223229
)
224230

@@ -227,7 +233,7 @@ def pull(
227233
click.echo(click.style(f"Environment: {environment or '(default)'}", fg=INFO_COLOR))
228234

229235
start_time = time.time()
230-
successful_files, failed_files = sync_client.pull(path, environment)
236+
successful_files, failed_files = file_syncer.pull(path, environment)
231237
duration_ms = int((time.time() - start_time) * 1000)
232238

233239
# Determine if the operation was successful based on failed_files

src/humanloop/client.py

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,7 @@
2828
from humanloop.overload import overload_client
2929
from humanloop.prompt_utils import populate_template
3030
from humanloop.prompts.client import PromptsClient
31-
from humanloop.sync.sync_client import DEFAULT_CACHE_SIZE, SyncClient
31+
from humanloop.sync.file_syncer import DEFAULT_CACHE_SIZE, FileSyncer
3232

3333
logger = logging.getLogger("humanloop.sdk")
3434

@@ -158,7 +158,7 @@ def __init__(
158158
)
159159

160160
# Check if cache_size is non-default but use_local_files is False
161-
self._sync_client = SyncClient(client=self, base_dir=local_files_directory, cache_size=cache_size)
161+
self._file_syncer = FileSyncer(client=self, base_dir=local_files_directory, cache_size=cache_size)
162162
eval_client = ExtendedEvalsClient(client_wrapper=self._client_wrapper)
163163
eval_client.client = self
164164
self.evaluations = eval_client
@@ -168,10 +168,10 @@ def __init__(
168168
# and the @flow decorator providing the trace_id
169169
# Additionally, call and log methods are overloaded in the prompts and agents client to support the use of local files
170170
self.prompts = overload_client(
171-
client=self.prompts, sync_client=self._sync_client, use_local_files=self.use_local_files
171+
client=self.prompts, file_syncer=self._file_syncer, use_local_files=self.use_local_files
172172
)
173173
self.agents = overload_client(
174-
client=self.agents, sync_client=self._sync_client, use_local_files=self.use_local_files
174+
client=self.agents, file_syncer=self._file_syncer, use_local_files=self.use_local_files
175175
)
176176
self.flows = overload_client(client=self.flows)
177177
self.tools = overload_client(client=self.tools)
@@ -439,7 +439,7 @@ def pull(self, path: Optional[str] = None, environment: Optional[str] = None) ->
439439
or filesystem issues)
440440
:raises HumanloopRuntimeError: If there's an error communicating with the API
441441
"""
442-
return self._sync_client.pull(environment=environment, path=path)
442+
return self._file_syncer.pull(environment=environment, path=path)
443443

444444

445445
class AsyncHumanloop(AsyncBaseHumanloop):

src/humanloop/overload.py

Lines changed: 86 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
import inspect
22
import logging
33
import types
4+
from pathlib import Path
45
from typing import Any, Callable, Dict, Optional, TypeVar, Union
56

67
from humanloop.agents.client import AgentsClient
@@ -14,7 +15,7 @@
1415
from humanloop.evaluators.client import EvaluatorsClient
1516
from humanloop.flows.client import FlowsClient
1617
from humanloop.prompts.client import PromptsClient
17-
from humanloop.sync.sync_client import SyncClient
18+
from humanloop.sync.file_syncer import FileSyncer
1819
from humanloop.tools.client import ToolsClient
1920
from humanloop.types import FileType
2021
from humanloop.types.agent_call_response import AgentCallResponse
@@ -65,7 +66,7 @@ def _get_file_type_from_client(
6566

6667
def _handle_tracing_context(kwargs: Dict[str, Any], client: T) -> Dict[str, Any]:
6768
"""Handle tracing context for both log and call methods."""
68-
trace_id = get_trace_id()
69+
trace_id = get_trace_id()
6970
if trace_id is not None:
7071
if "flow" in str(type(client).__name__).lower():
7172
context = get_decorator_context()
@@ -91,45 +92,81 @@ def _handle_tracing_context(kwargs: Dict[str, Any], client: T) -> Dict[str, Any]
9192
def _handle_local_files(
9293
kwargs: Dict[str, Any],
9394
client: T,
94-
sync_client: Optional[SyncClient],
95-
use_local_files: bool,
95+
file_syncer: FileSyncer,
9696
) -> Dict[str, Any]:
97-
"""Handle local file loading if enabled."""
98-
if not use_local_files or "path" not in kwargs or sync_client is None:
99-
return kwargs
97+
"""Load prompt/agent file content from local filesystem into API request.
98+
99+
Retrieves the file content at the specified path and adds it to kwargs
100+
under the appropriate field ('prompt' or 'agent'), allowing local files
101+
to be used in API calls instead of fetching from Humanloop API.
102+
103+
Args:
104+
kwargs: API call arguments
105+
client: Client instance making the call
106+
file_syncer: FileSyncer handling local file operations
107+
108+
Returns:
109+
Updated kwargs with file content in prompt/agent field
100110
111+
Raises:
112+
HumanloopRuntimeError: On validation or file loading failures.
113+
For example, an invalid path format (absolute paths, leading/trailing slashes, etc.) or a file not being found.
114+
"""
101115
if "id" in kwargs:
102116
raise HumanloopRuntimeError("Can only specify one of `id` or `path`")
103117

118+
path = kwargs["path"]
119+
120+
# First check for path format issues (absolute paths or leading/trailing slashes)
121+
normalized_path = path.strip("/")
122+
if Path(path).is_absolute() or path != normalized_path:
123+
raise HumanloopRuntimeError(
124+
f"Path '{path}' format is invalid. "
125+
f"Paths must follow the standard API format 'path/to/resource' without leading or trailing slashes. "
126+
f"Please use '{normalized_path}' instead."
127+
)
128+
129+
# Then check for file extensions
130+
if file_syncer.is_file(path):
131+
# Extract the path without extension to suggest correct format in the error message
132+
path_without_extension = str(Path(path).with_suffix(""))
133+
134+
# Always raise error when file extension is detected (based on the outer if condition)
135+
raise HumanloopRuntimeError(
136+
f"Path '{path}' includes a file extension which is not supported in API calls. "
137+
f"When referencing files via the `path` parameter, use the path without extensions: '{path_without_extension}'. "
138+
f"Note: File extensions are only used when pulling specific files via the CLI."
139+
)
140+
104141
# Check if version_id or environment is specified
105142
use_remote = any(["version_id" in kwargs, "environment" in kwargs])
106-
normalized_path = sync_client._normalize_path(kwargs["path"])
107143

108144
if use_remote:
109145
raise HumanloopRuntimeError(
110-
f"Cannot use local file for `{normalized_path}` as version_id or environment was specified. "
146+
f"Cannot use local file for `{path}` as version_id or environment was specified. "
111147
"Please either remove version_id/environment to use local files, or set use_local_files=False to use remote files."
112148
)
113149

114150
file_type = _get_file_type_from_client(client)
115-
if file_type not in SyncClient.SERIALIZABLE_FILE_TYPES:
116-
raise HumanloopRuntimeError(f"Local files are not supported for `{file_type}` files.")
151+
if file_type not in FileSyncer.SERIALIZABLE_FILE_TYPES:
152+
raise HumanloopRuntimeError(f"Local files are not supported for `{file_type.capitalize()}` files: '{path}'.")
117153

118-
# If file_type is already specified in kwargs, it means user provided a PromptKernelRequestParams object
154+
# If file_type is already specified in kwargs (`prompt` or `agent`), it means user provided a Prompt- or AgentKernelRequestParams object
155+
# In this case, we should prioritize the user-provided value over the local file content.
119156
if file_type in kwargs and not isinstance(kwargs[file_type], str):
120157
logger.warning(
121-
f"Ignoring local file for `{normalized_path}` as {file_type} parameters were directly provided. "
158+
f"Ignoring local file for `{path}` as {file_type} parameters were directly provided. "
122159
"Using provided parameters instead."
123160
)
124161
return kwargs
125162

126163
try:
127-
file_content = sync_client.get_file_content(normalized_path, file_type) # type: ignore[arg-type] # file_type was checked above
164+
file_content = file_syncer.get_file_content(path, file_type) # type: ignore[arg-type] # file_type was checked above
128165
kwargs[file_type] = file_content
129-
except HumanloopRuntimeError as e:
130-
raise HumanloopRuntimeError(f"Failed to use local file for `{normalized_path}`: {str(e)}")
131166

132-
return kwargs
167+
return kwargs
168+
except HumanloopRuntimeError as e:
169+
raise HumanloopRuntimeError(f"Failed to use local file for `{path}`: {str(e)}")
133170

134171

135172
def _handle_evaluation_context(kwargs: Dict[str, Any]) -> tuple[Dict[str, Any], Optional[Callable[[str], None]]]:
@@ -140,7 +177,7 @@ def _handle_evaluation_context(kwargs: Dict[str, Any]) -> tuple[Dict[str, Any],
140177
return kwargs, None
141178

142179

143-
def _overload_log(self: T, sync_client: Optional[SyncClient], use_local_files: bool, **kwargs) -> LogResponseType:
180+
def _overload_log(self: T, file_syncer: Optional[FileSyncer], use_local_files: bool, **kwargs) -> LogResponseType:
144181
try:
145182
# Special handling for flows - prevent direct log usage
146183
if type(self) is FlowsClient and get_trace_id() is not None:
@@ -154,12 +191,20 @@ def _overload_log(self: T, sync_client: Optional[SyncClient], use_local_files: b
154191

155192
kwargs = _handle_tracing_context(kwargs, self)
156193

157-
# Handle local files for Prompts and Agents clients
158-
if _get_file_type_from_client(self) in ["prompt", "agent"]:
159-
if sync_client is None:
160-
logger.error("sync_client is None but client has log method and use_local_files=%s", use_local_files)
161-
raise HumanloopRuntimeError("sync_client is required for clients that support local file operations")
162-
kwargs = _handle_local_files(kwargs, self, sync_client, use_local_files)
194+
# Handle loading files from local filesystem when using Prompt and Agent clients
195+
# This enables users to define prompts/agents in local files rather than fetching from the Humanloop API
196+
if use_local_files and _get_file_type_from_client(self) in FileSyncer.SERIALIZABLE_FILE_TYPES:
197+
# Developer note: file_syncer should always be provided during SDK initialization when
198+
# use_local_files=True. If we hit this error, there's likely an initialization issue
199+
# in Humanloop.__init__ where the file_syncer wasn't properly created or passed to the
200+
# overload_client function.
201+
if file_syncer is None:
202+
logger.error("file_syncer is None but client has log method and use_local_files=%s", use_local_files)
203+
raise HumanloopRuntimeError(
204+
"SDK initialization error: file_syncer is missing but required for local file operations. "
205+
"This is likely a bug in the SDK initialization - please report this issue to the Humanloop team."
206+
)
207+
kwargs = _handle_local_files(kwargs, self, file_syncer)
163208

164209
kwargs, eval_callback = _handle_evaluation_context(kwargs)
165210
response = self._log(**kwargs) # type: ignore[union-attr] # Use stored original method
@@ -174,10 +219,16 @@ def _overload_log(self: T, sync_client: Optional[SyncClient], use_local_files: b
174219
raise HumanloopRuntimeError from e
175220

176221

177-
def _overload_call(self: T, sync_client: Optional[SyncClient], use_local_files: bool, **kwargs) -> CallResponseType:
222+
def _overload_call(self: T, file_syncer: Optional[FileSyncer], use_local_files: bool, **kwargs) -> CallResponseType:
178223
try:
179224
kwargs = _handle_tracing_context(kwargs, self)
180-
kwargs = _handle_local_files(kwargs, self, sync_client, use_local_files)
225+
# If `use_local_files` flag is True, we should use local file content for `call` operations on Prompt and Agent clients.
226+
if use_local_files and _get_file_type_from_client(self) in FileSyncer.SERIALIZABLE_FILE_TYPES:
227+
# Same file_syncer requirement as in _overload_log - see developer note there
228+
if file_syncer is None:
229+
logger.error("file_syncer is None but client has call method and use_local_files=%s", use_local_files)
230+
raise HumanloopRuntimeError("file_syncer is required for clients that support call operations")
231+
kwargs = _handle_local_files(kwargs, self, file_syncer)
181232
return self._call(**kwargs) # type: ignore[union-attr] # Use stored original method
182233
except HumanloopRuntimeError:
183234
# Re-raise HumanloopRuntimeError without wrapping to preserve the message
@@ -189,7 +240,7 @@ def _overload_call(self: T, sync_client: Optional[SyncClient], use_local_files:
189240

190241
def overload_client(
191242
client: T,
192-
sync_client: Optional[SyncClient] = None,
243+
file_syncer: Optional[FileSyncer] = None,
193244
use_local_files: bool = False,
194245
) -> T:
195246
"""Overloads client methods to add tracing, local file handling, and evaluation context."""
@@ -198,25 +249,25 @@ def overload_client(
198249
# Store original method with type ignore
199250
client._log = client.log # type: ignore
200251

201-
# Create a closure to capture sync_client and use_local_files
252+
# Create a closure to capture file_syncer and use_local_files
202253
def log_wrapper(self: T, **kwargs) -> LogResponseType:
203-
return _overload_log(self, sync_client, use_local_files, **kwargs)
254+
return _overload_log(self, file_syncer, use_local_files, **kwargs)
204255

205256
# Replace the log method with type ignore
206257
client.log = types.MethodType(log_wrapper, client) # type: ignore
207258

208259
# Overload call method for Prompt and Agent clients
209-
if _get_file_type_from_client(client) in ["prompt", "agent"]:
210-
if sync_client is None and use_local_files:
211-
logger.error("sync_client is None but client has call method and use_local_files=%s", use_local_files)
212-
raise HumanloopRuntimeError("sync_client is required for clients that support call operations")
260+
if _get_file_type_from_client(client) in FileSyncer.SERIALIZABLE_FILE_TYPES:
261+
if file_syncer is None and use_local_files:
262+
logger.error("file_syncer is None but client has call method and use_local_files=%s", use_local_files)
263+
raise HumanloopRuntimeError("file_syncer is required for clients that support call operations")
213264
if hasattr(client, "call") and not hasattr(client, "_call"):
214265
# Store original method with type ignore
215266
client._call = client.call # type: ignore
216267

217-
# Create a closure to capture sync_client and use_local_files
268+
# Create a closure to capture file_syncer and use_local_files
218269
def call_wrapper(self: T, **kwargs) -> CallResponseType:
219-
return _overload_call(self, sync_client, use_local_files, **kwargs)
270+
return _overload_call(self, file_syncer, use_local_files, **kwargs)
220271

221272
# Replace the call method with type ignore
222273
client.call = types.MethodType(call_wrapper, client) # type: ignore

src/humanloop/path_utils.py

Lines changed: 54 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,54 @@
1+
from pathlib import Path
2+
3+
4+
def normalize_path(path: str, strip_extension: bool = False) -> str:
5+
"""Normalize a path to the standard Humanloop API format.
6+
7+
This function is primarily used when interacting with the Humanloop API to ensure paths
8+
follow the standard format: 'path/to/resource' without leading/trailing slashes.
9+
It's used when pulling files from Humanloop to local filesystem (see FileSyncer.pull)
10+
11+
The function:
12+
- Converts Windows backslashes to forward slashes
13+
- Normalizes consecutive slashes
14+
- Optionally strips file extensions (e.g. .prompt, .agent)
15+
- Removes leading/trailing slashes to match API conventions
16+
17+
Leading/trailing slashes are stripped because the Humanloop API expects paths in the
18+
format 'path/to/resource' without them. This is consistent with how the API stores
19+
and references files, and ensures paths work correctly in both API calls and local
20+
filesystem operations.
21+
22+
Args:
23+
path: The path to normalize. Can be a Windows or Unix-style path.
24+
strip_extension: If True, removes the file extension (e.g. .prompt, .agent)
25+
26+
Returns:
27+
Normalized path string in the format 'path/to/resource'
28+
29+
Examples:
30+
>>> normalize_path("path/to/file.prompt")
31+
'path/to/file.prompt'
32+
>>> normalize_path("path/to/file.prompt", strip_extension=True)
33+
'path/to/file'
34+
>>> normalize_path("\\windows\\style\\path.prompt")
35+
'windows/style/path.prompt'
36+
>>> normalize_path("/leading/slash/path/")
37+
'leading/slash/path'
38+
>>> normalize_path("multiple//slashes//path")
39+
'multiple/slashes/path'
40+
"""
41+
# Handle backslashes for Windows paths before passing to PurePosixPath
42+
# This is needed because some backslash sequences are treated as escape chars
43+
path = path.replace("\\", "/")
44+
45+
# Use PurePosixPath to normalize the path (handles consecutive slashes)
46+
path_obj = Path(path)
47+
48+
# Strip extension if requested
49+
if strip_extension:
50+
path_obj = path_obj.with_suffix("")
51+
52+
# Convert to string and remove any leading/trailing slashes
53+
# We use the path as a string and not as_posix() since we've already normalized separators
54+
return str(path_obj).strip("/")

src/humanloop/sync/__init__.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,3 @@
1-
from humanloop.sync.sync_client import SyncClient
1+
from humanloop.sync.file_syncer import FileSyncer
22

3-
__all__ = ["SyncClient"]
3+
__all__ = ["FileSyncer"]

0 commit comments

Comments
 (0)