Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 36 additions & 0 deletions MIGRATION.md
Original file line number Diff line number Diff line change
Expand Up @@ -1003,6 +1003,42 @@ new_ids = {doc.id for doc in new_documents}
store.delete_documents([doc.id for doc in old_documents if doc.id not in new_ids])
```

### Haystack logging no longer reconfigures logging for the whole process

**What changed:** Importing Haystack no longer attaches its formatting handler to the root logger, and no longer
configures `structlog` process-wide. The handler is now scoped to Haystack's own logger namespaces (`haystack`,
`haystack_integrations`, `haystack_experimental`), and the global `structlog` configuration is set only when you call
`configure_logging()` explicitly. As a result, importing Haystack no longer reformats the logs of the host application
or other libraries running in the same process.

**Why:** Haystack should behave as a well-mannered library when it runs alongside other services in the same process,
rather than taking over logging for the whole process.

**How to migrate:** If you relied on Haystack formatting every log record in the process, opt back in explicitly.

Before (v2.x):
```python
import haystack # formatted every log record in the process and configured structlog globally
```

After (v3.0):
```python
from haystack import logging

# Restore the old behavior: format every log record in the process (also configures structlog globally).
logging.configure_logging(logger_name="")
```

**Note on duplicate log lines:** Haystack's handler now sits on the `haystack.*` loggers, which still propagate to the
root logger. If the host application also configures a handler on the root logger, Haystack's own logs can appear
twice. To make Haystack fully own its output and stop the duplication, disable propagation:

```python
from haystack import logging

logging.configure_logging(propagate=False)
```

### Components now resolve API keys at warm-up

**What changed:** Components that use external services now create their resources (such as API clients) during `warm_up()` instead of in `__init__`. As a consequence, a missing API key (for example, an unset environment variable behind a `Secret.from_env_var` default) is now reported at warm-up or first run rather than at construction. This affects OpenAI and Azure OpenAI components.
Expand Down
7 changes: 4 additions & 3 deletions haystack/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,10 @@
from haystack.dataclasses import Answer, Document, ExtractedAnswer, GeneratedAnswer
from haystack.version import __version__ # noqa: F401

# Initialize the logging configuration
# This is a no-op unless `structlog` is installed
haystack.logging.configure_logging()
# Initialize the logging configuration.
# This is a no-op unless `structlog` is installed. `configure_structlog=False` means we only install our own scoped
# logging handler (so Haystack's logs are formatted) without touching the process-global `structlog` configuration.
haystack.logging.configure_logging(configure_structlog=False)

__all__ = [
"Answer",
Expand Down
114 changes: 85 additions & 29 deletions haystack/logging.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,14 +8,29 @@
import os
import sys
import typing
from collections.abc import Sequence
from typing import Any

import haystack.utils.jupyter

if typing.TYPE_CHECKING:
from structlog.typing import EventDict, Processor, WrappedLogger

HAYSTACK_LOGGING_USE_JSON_ENV_VAR = "HAYSTACK_LOGGING_USE_JSON"
HAYSTACK_LOGGING_IGNORE_STRUCTLOG_ENV_VAR = "HAYSTACK_LOGGING_IGNORE_STRUCTLOG"

# Attribute set on a logger once we have patched its methods. `logging.getLogger` returns a shared singleton, so we
# use this marker to patch each logger only once and avoid wrapping the already-wrapped methods on repeated calls.
_PATCHED_MARKER = "_haystack_patched"

# Name of the formatting handler we install. We use it to find and remove our own handler across (re)configurations.
_HANDLER_NAME = "HaystackLoggingHandler"


def _is_haystack_logging_handler(handler: logging.Handler) -> bool:
"""Return whether the given handler is the one installed by `configure_logging`."""
return isinstance(handler, logging.StreamHandler) and getattr(handler, "name", None) == _HANDLER_NAME


class PatchedLogger(typing.Protocol):
"""Class which enables using type checkers to find wrong logger usage."""
Expand Down Expand Up @@ -220,16 +235,12 @@ def _patch_structlog_call_information(logger: logging.Logger) -> None:
if not isinstance(logger, _FixedFindCallerLogger):
return

# completely copied from structlog. We only add `haystack.logging` to the list of ignored frames
# Copied from structlog's `_FixedFindCallerLogger.findCaller`, adding `haystack.logging` to the ignored
# frames. We don't forward `stacklevel` to `_find_first_app_frame_and_name` (added in structlog 25.5.0):
# structlog is optional and may be older.
def findCaller(stack_info: bool = False, stacklevel: int = 1) -> tuple[str, int, str, str | None]: # noqa: ARG001
try:
sinfo: str | None
# we need to exclude `haystack.logging` from the stack
f, name = _find_first_app_frame_and_name(["logging", "haystack.logging"])
sinfo = _format_stack(f) if stack_info else None
except Exception as error:
print(f"Error in findCaller: {error}")

f, _name = _find_first_app_frame_and_name(["logging", "haystack.logging"])
sinfo = _format_stack(f) if stack_info else None
return f.f_code.co_filename, f.f_lineno, f.f_code.co_name, sinfo

logger.findCaller = findCaller # type: ignore
Expand All @@ -247,6 +258,11 @@ def getLogger(name: str) -> PatchedLogger:
- it makes structure logging effective, not just an available feature
"""
logger = logging.getLogger(name)
if getattr(logger, _PATCHED_MARKER, False):
# Already patched: `logging.getLogger` returned the same singleton, so re-patching would stack the wrappers
# and interpolate the message more than once.
return typing.cast(PatchedLogger, logger)

logger.debug = patch_log_method_to_kwargs_only(logger.debug) # type: ignore
logger.info = patch_log_method_to_kwargs_only(logger.info) # type: ignore
logger.warn = patch_log_method_to_kwargs_only(logger.warn) # type: ignore
Expand All @@ -262,6 +278,8 @@ def getLogger(name: str) -> PatchedLogger:
# We also patch the `makeRecord` method to use keyword string interpolation
logger.makeRecord = patch_make_records_to_use_kwarg_string_interpolation(logger.makeRecord) # type: ignore

setattr(logger, _PATCHED_MARKER, True)

return typing.cast(PatchedLogger, logger)


Expand Down Expand Up @@ -295,7 +313,12 @@ def correlate_logs_with_traces(_: "WrappedLogger", __: str, event_dict: "EventDi
return event_dict


def configure_logging(use_json: bool | None = None) -> None:
def configure_logging(
use_json: bool | None = None,
logger_name: str | Sequence[str] = ("haystack", "haystack_integrations", "haystack_experimental"),
propagate: bool = True,
configure_structlog: bool = True,
) -> None:
"""
Configure logging for Haystack.

Expand All @@ -306,9 +329,27 @@ def configure_logging(use_json: bool | None = None) -> None:
- If `structlog` is installed, you can JSON format all logs. Enable this by
- setting the `use_json` parameter to `True` when calling this function
- setting the environment variable `HAYSTACK_LOGGING_USE_JSON` to `true`
"""
import haystack.utils.jupyter # to avoid circular imports

:param use_json: Whether to format logs as JSON. If `None`, we try to guess based on the environment.
:param logger_name:
The name (or names) of the logger our formatting handler is attached to. Defaults to Haystack's own
namespaces (`"haystack"`, `"haystack_integrations"` and `"haystack_experimental"`), so that we only touch
Haystack's own loggers and leave the logging configuration of the host application and any other libraries
running in the same process untouched. Pass an empty string (`""`) to attach the handler to the root logger
instead - this restores the legacy behavior of formatting *every* log record in the process.
:param propagate:
Whether the configured loggers should propagate their records to ancestor loggers (ultimately the root
logger). The default (`True`) keeps records flowing to handlers configured by the host application and to
capturing tools such as `pytest`'s `caplog`. Set it to `False` to make Haystack fully own the output of its
own logs - this avoids duplicate log lines when the host application also configures the root logger. It has
no effect when `logger_name=""` (the root logger has no ancestors).
:param configure_structlog:
Whether to configure the process-global `structlog` (thereby taking over any configuration set up by someone
else). The default (`True`) is what an explicit call should do. Pass `False` (as the import-time call in
`haystack/__init__.py` does) to leave the global `structlog` configuration untouched and only install our own
scoped handler so Haystack's own logs are formatted. This keeps merely importing Haystack from reconfiguring
`structlog` for the host application's own native `structlog` loggers.
"""
try:
import structlog
from structlog.processors import ExceptionRenderer
Expand Down Expand Up @@ -351,13 +392,21 @@ def configure_logging(use_json: bool | None = None) -> None:
# We only need that in sophisticated production setups where we want to correlate logs with traces
shared_processors.append(correlate_logs_with_traces)

structlog.configure(
processors=shared_processors + [structlog.stdlib.ProcessorFormatter.wrap_for_formatter],
logger_factory=structlog.stdlib.LoggerFactory(ignore_frame_names=["haystack.logging"]),
cache_logger_on_first_use=True,
# This is a filter that will filter out log entries that are below the log level of the root logger.
wrapper_class=structlog.make_filtering_bound_logger(min_level=logging.root.getEffectiveLevel()),
)
# `structlog.configure` is process-global: it affects every native structlog logger, not just Haystack's. We only
# configure it when explicitly asked (`configure_structlog`).
if configure_structlog:
structlog.configure(
# `filter_by_level` reads the effective level from the underlying stdlib logger on *every* call, so
# changes to the log level made after `configure_logging` runs (e.g. by the host app) are respected.
processors=[
structlog.stdlib.filter_by_level,
*shared_processors,
structlog.stdlib.ProcessorFormatter.wrap_for_formatter,
],
logger_factory=structlog.stdlib.LoggerFactory(ignore_frame_names=["haystack.logging"]),
cache_logger_on_first_use=True,
wrapper_class=structlog.stdlib.BoundLogger,
)

renderers: list[Processor]
if use_json:
Expand Down Expand Up @@ -388,16 +437,23 @@ def configure_logging(use_json: bool | None = None) -> None:
)

handler = logging.StreamHandler()
handler.name = "HaystackLoggingHandler"
handler.name = _HANDLER_NAME
# Use OUR `ProcessorFormatter` to format all `logging` entries.
handler.setFormatter(formatter)

root_logger = logging.getLogger()
# avoid adding our handler twice
old_handlers = [
h
for h in root_logger.handlers
if not (isinstance(h, logging.StreamHandler) and h.name == "HaystackLoggingHandler")
]
new_handlers = [handler, *old_handlers]
root_logger.handlers = new_handlers
# Attach the handler to the target logger(s) - Haystack's own namespaces by default (see `logger_name`).
logger_names = [logger_name] if isinstance(logger_name, str) else list(logger_name)

# Remove our handler from every logger that carries it before re-installing: keeps re-configuration idempotent and
# prevents double emission when the target changes (e.g. switching to the root logger via `logger_name=""`).
existing_loggers = [logging.getLogger(), *logging.Logger.manager.loggerDict.values()]
for existing_logger in existing_loggers:
if isinstance(existing_logger, logging.Logger) and any(
_is_haystack_logging_handler(h) for h in existing_logger.handlers
):
existing_logger.handlers = [h for h in existing_logger.handlers if not _is_haystack_logging_handler(h)]

for name in logger_names:
target_logger = logging.getLogger(name)
target_logger.handlers = [handler, *target_logger.handlers]
target_logger.propagate = propagate
5 changes: 4 additions & 1 deletion haystack/utils/requests_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,10 @@
import httpx
from tenacity import after_log, before_log, retry, retry_if_exception_type, stop_after_attempt, wait_exponential

logger = logging.getLogger(__file__)
# NOTE: this uses the standard library logger (not `haystack.logging`) on purpose: tenacity's `before_log`/`after_log`
# call the logger with positional arguments, which Haystack's keyword-only patched logger would reject. We still name
# it with `__name__` so it lives under the `haystack` namespace and is picked up by `configure_logging`.
logger = logging.getLogger(__name__)


def request_with_retry(
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
---
upgrade:
- |
``haystack.logging.configure_logging`` now attaches its formatting handler to Haystack's own logger namespaces
(``haystack``, ``haystack_integrations`` and ``haystack_experimental``) instead of the root logger. This makes
Haystack a better-behaved library when it runs next to other services in the same process: the log records of the
host application and other libraries are no longer reformatted or duplicated by Haystack. To restore the previous
behavior of formatting every log record in the process, call ``configure_logging(logger_name="")``.
- |
Importing Haystack no longer configures ``structlog`` process-wide. The import-time setup now only installs
Haystack's own scoped logging handler (so Haystack's logs are still formatted) and leaves the global ``structlog``
configuration untouched, so the host application's own native ``structlog`` loggers are no longer reformatted by
merely importing Haystack. ``structlog`` is configured only when you call
``configure_logging(configure_structlog=True)``.
fixes:
- |
``haystack.logging.configure_logging`` no longer freezes the log level at import time. Previously the level was
captured once when the function ran (at ``import haystack``), so a log level set later by the application was
ignored for ``structlog``-native loggers. The effective level is now read on every log call.
- |
Fixed the logger in ``haystack.utils.requests_utils`` being named after the module's file path instead of
``haystack.utils.requests_utils``, which kept its records outside the ``haystack`` logger namespace.
- |
``haystack.logging.getLogger`` is now idempotent. Previously, calling it more than once for the same logger name
wrapped the already-wrapped logger methods again, which caused the log message to be run through ``str.format``
once per call. As a result a field value containing ``{...}`` could be re-interpolated and pull in the value of
another field. Each logger is now patched only once.
- |
The patched ``findCaller`` used to determine the source of a log entry no longer prints to stdout and no longer
masks errors with a misleading ``NameError``, matching structlog's own ``findCaller`` implementation.
features:
- |
``haystack.logging.configure_logging`` gained three parameters: ``logger_name`` to choose which logger(s) the
formatting handler is attached to, ``propagate`` to control whether Haystack's loggers propagate their records to
ancestor loggers, and ``configure_structlog`` to control whether the call configures the process-global
``structlog`` (taking over any existing configuration). Set ``propagate=False`` to let Haystack fully own the
output of its own logs and avoid duplicate log lines when the host application also configures the root logger.
Loading
Loading