Skip to content

Further fixes to job submission#335

Draft
allison-truhlar wants to merge 12 commits intomainfrom
further-fixes-to-job-submission
Draft

Further fixes to job submission#335
allison-truhlar wants to merge 12 commits intomainfrom
further-fixes-to-job-submission

Conversation

@allison-truhlar
Copy link
Collaborator

This PR troubleshoots submitting jobs to the Janelia cluster when running the Fileglancer server as root.

Changes so far:

  1. Work directory fix - Resolved ~ to the actual user's home (~username), not root's home.
  2. seteuid job submission - Wrapped work dir creation, symlink, and executor.submit() in the EffectiveUserContext so filesystem operations happen as the user.
  3. Stale symlink fix - Unlink existing repo symlink before creating a new one (caused an issue when a job failed - the symlink wasn't removed).
  4. bsub -U username flag — Because bsub uses getuid() (real UID = root) not geteuid(), we pass -U username to tell LSF to run the job as the actual user. Note: unsure whether this is actually needed after adding the missing FGC_CLUSTER__EXECUTOR=lsf to the fileglancer-hub .env file — should test removing it once everything else is fixed
  5. cluster.extra_paths config option — Prepends directories to os.environ["PATH"] at server startup so bsub/bjobs/bkill are findable. Needed because systemd services have minimal PATH. Documented in config.yaml.template.

Where we are now (as of release 2.7.0a7, commit eff3c09)

  • bsub is found on PATH (no more FileNotFoundError), but it fails because LSF can't find its configuration

@krokicki

The work directory, repo symlink, and bsub/sbatch submission were all running as root. The repo cache (~/.fileglancer/apps/) must stay as root since users can't write there, so user_context is passed into submit_job and applied only around work_dir creation, symlink, and executor.submit.
os.path.expanduser("~") uses the real UID (root), ignoring seteuid.
Pass username to _build_work_dir so it expands ~username instead.
- work_dir can survive from a previous failed attempt with the same job ID
Adds a cluster.extra_paths config option that prepends directories to the
server's PATH at startup, so scheduler commands (bsub, bjobs, bkill) are
findable even when the system service has a minimal default PATH.
Configurable via .env (FGC_CLUSTER__EXTRA_PATHS) to avoid committing
environment-specific paths.
resources=resource_spec,
)
with user_context if user_context is not None else nullcontext():
work_dir.mkdir(parents=True, exist_ok=True)

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI 2 days ago

In general, to fix “uncontrolled data used in path expression” here, we need to ensure that any path derived from username cannot escape a controlled root directory. The standard pattern is: (1) compute a base directory that is not influenced by untrusted input, (2) build any subpaths under this base using path-joining, (3) normalise (resolve() / os.path.normpath) and verify that the resulting path is still anchored in the base directory. If the check fails, reject the operation.

For this code, the main problem is _build_work_dir, which currently uses home = os.path.expanduser(f"~{username}") and then returns a Path directly built from that. The cleanest fix that does not change external functionality is:

  • Define a fixed jobs root directory, e.g. _JOBS_BASE = Path(os.path.expanduser("~/.fileglancer/jobs")), which is already the conceptual location we use.
  • In _build_work_dir, derive home using the existing logic (so we still respect the username’s home directory), but then build the work directory as a subpath under _JOBS_BASE, not directly from home. For example, create a per-user subdirectory _JOBS_BASE / _sanitize_for_path(username) and then append the job-specific suffix.
  • Alternatively, if you must preserve the idea “jobs go under each user’s home”, you can still normalise and enforce that the final work_dir path is within that user’s home directory: compute base = Path(home).expanduser().resolve(), then work_dir = (base / ".fileglancer" / "jobs" / suffix).resolve(), and finally check if base not in work_dir.parents and base != work_dir: raise ValueError(...). This ensures no traversal outside the home directory even if username contains weird segments.

Because the function docstring explicitly says “Build a working directory path under ~/.fileglancer/jobs/.” and the rest of the code (e.g. comments around the repo cache) assumes a server-owned directory, the first option (anchoring everything under a process-home-rooted ~/.fileglancer/jobs) is more consistent and simpler. To keep backwards compatibility with existing stored work_dir paths that already look like ~user/.fileglancer/jobs/..., we should limit our change to newly-built paths and ensure they are deterministic; switching to a root-based layout is acceptable if consumers treat work_dir as opaque.

Implementation details:

  • In fileglancer/apps/core.py, near _REPO_CACHE_BASE, define _JOBS_BASE = Path(os.path.expanduser("~/.fileglancer/jobs")).

  • Update _build_work_dir to:

    • Keep sanitising app_name, entry_point_id, and optional job_name_prefix as it already does.
    • Derive a safe username component, e.g. safe_user = _sanitize_for_path(username) if username else "default".
    • Build user_jobs_root = (_JOBS_BASE / safe_user).resolve().
    • Build work_dir = (user_jobs_root / f"{prefix}{job_id}-{safe_app}-{safe_ep}").resolve().
    • Optionally, assert that work_dir is under user_jobs_root by checking if user_jobs_root not in work_dir.parents and user_jobs_root != work_dir: raise ValueError(...). Given our construction, this is mostly a sanity check against path injection via _sanitize_for_path changes.
  • All other uses of work_dir (such as work_dir.mkdir(...) and repo_link.symlink_to(cached_repo_dir)) remain unchanged; they simply operate on a now-guaranteed-safe path.

  • No changes are needed in server.py or to imports beyond adding _JOBS_BASE declaration.

This keeps functionality (per-job directories per app/entry point) while ensuring username cannot redirect work directories to arbitrary locations.

Suggested changeset 1
fileglancer/apps/core.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/fileglancer/apps/core.py b/fileglancer/apps/core.py
--- a/fileglancer/apps/core.py
+++ b/fileglancer/apps/core.py
@@ -28,6 +28,7 @@
 _MANIFEST_FILENAME = "runnables.yaml"
 
 _REPO_CACHE_BASE = Path(os.path.expanduser("~/.fileglancer/apps"))
+_JOBS_BASE = Path(os.path.expanduser("~/.fileglancer/jobs"))
 _repo_locks: dict[str, asyncio.Lock] = {}
 
 
@@ -795,16 +796,28 @@
                     username: Optional[str] = None) -> Path:
     """Build a working directory path under ~/.fileglancer/jobs/.
 
-    When username is provided, expands ~username to the user's home directory
-    instead of the server process's home (which is typically root).
+    When username is provided, uses a per-user subdirectory under the
+    server's jobs base directory (~/.fileglancer/jobs) rather than
+    constructing a path directly from the username.
     """
     safe_app = _sanitize_for_path(app_name)
     safe_ep = _sanitize_for_path(entry_point_id)
     prefix = f"{_sanitize_for_path(job_name_prefix)}-" if job_name_prefix else ""
-    home = os.path.expanduser(f"~{username}") if username else os.path.expanduser("~")
-    return Path(f"{home}/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}")
 
+    # Derive a safe per-user root under the fixed jobs base directory.
+    safe_user = _sanitize_for_path(username) if username else "default"
+    user_jobs_root = (_JOBS_BASE / safe_user).resolve()
 
+    work_dir = (user_jobs_root / f"{prefix}{job_id}-{safe_app}-{safe_ep}").resolve()
+
+    # Ensure the final path is contained within the user_jobs_root to guard
+    # against any unexpected path manipulation.
+    if user_jobs_root != work_dir and user_jobs_root not in work_dir.parents:
+        raise ValueError(f"Invalid work directory path constructed for user '{username}'")
+
+    return work_dir
+
+
 async def submit_job(
     username: str,
     app_url: str,
EOF
@@ -28,6 +28,7 @@
_MANIFEST_FILENAME = "runnables.yaml"

_REPO_CACHE_BASE = Path(os.path.expanduser("~/.fileglancer/apps"))
_JOBS_BASE = Path(os.path.expanduser("~/.fileglancer/jobs"))
_repo_locks: dict[str, asyncio.Lock] = {}


@@ -795,16 +796,28 @@
username: Optional[str] = None) -> Path:
"""Build a working directory path under ~/.fileglancer/jobs/.

When username is provided, expands ~username to the user's home directory
instead of the server process's home (which is typically root).
When username is provided, uses a per-user subdirectory under the
server's jobs base directory (~/.fileglancer/jobs) rather than
constructing a path directly from the username.
"""
safe_app = _sanitize_for_path(app_name)
safe_ep = _sanitize_for_path(entry_point_id)
prefix = f"{_sanitize_for_path(job_name_prefix)}-" if job_name_prefix else ""
home = os.path.expanduser(f"~{username}") if username else os.path.expanduser("~")
return Path(f"{home}/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}")

# Derive a safe per-user root under the fixed jobs base directory.
safe_user = _sanitize_for_path(username) if username else "default"
user_jobs_root = (_JOBS_BASE / safe_user).resolve()

work_dir = (user_jobs_root / f"{prefix}{job_id}-{safe_app}-{safe_ep}").resolve()

# Ensure the final path is contained within the user_jobs_root to guard
# against any unexpected path manipulation.
if user_jobs_root != work_dir and user_jobs_root not in work_dir.parents:
raise ValueError(f"Invalid work directory path constructed for user '{username}'")

return work_dir


async def submit_job(
username: str,
app_url: str,
Copilot is powered by AI and may make mistakes. Always verify output.
with user_context if user_context is not None else nullcontext():
work_dir.mkdir(parents=True, exist_ok=True)
repo_link = work_dir / "repo"
if repo_link.is_symlink() or repo_link.exists():

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI 2 days ago

General approach: Ensure that any user-influenced data used in filesystem paths is validated or sanitized so it cannot escape the intended directory structure. In this case, that means constraining username before using it in _build_work_dir, and ensuring that the resulting home directory is a real, safe user home and not a crafted path.

Best concrete fix without changing existing functionality:

  1. Introduce a small helper like _sanitize_username in fileglancer/apps/core.py that enforces a conservative pattern for usernames (e.g., alphanumeric plus -_.), rejects dangerous characters (/, \, whitespace, .., NUL, etc.), and raises ValueError if invalid.
  2. In _build_work_dir, call _sanitize_username(username) (when username is not None) before passing it to os.path.expanduser(f"~{username}"). This ensures that we only ever expand a well-formed username and cannot accidentally interpret a malicious path fragment as part of the tilde expression.
  3. Optionally, resolve the resulting home path with Path(home).expanduser().resolve() and fail early if that resolution behaves unexpectedly; however, given the constraints and the existing behavior, just sanitizing username is sufficient and minimally invasive.
  4. Because submit_job already propagates username into _build_work_dir, we do not need to change call sites—only the implementation of _build_work_dir and the new helper. Raising ValueError on invalid usernames is consistent with other validation patterns in this module (e.g., _parse_github_url uses ValueError), and submit_job already translates ValueError into HTTP 400.

Concretely:

  • In fileglancer/apps/core.py, add _sanitize_username near the other helpers (_sanitize_for_path, _parse_github_url area).
  • Update _build_work_dir so that if a username is given, we first sanitize it and then call os.path.expanduser(f"~{safe_username}"). The rest of _build_work_dir remains unchanged.

No new imports are required; we can use re which is already imported at the top of the file.


Suggested changeset 1
fileglancer/apps/core.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/fileglancer/apps/core.py b/fileglancer/apps/core.py
--- a/fileglancer/apps/core.py
+++ b/fileglancer/apps/core.py
@@ -39,6 +39,26 @@
     return _repo_locks[key]
 
 
+def _sanitize_username(username: str) -> str:
+    """Sanitize a username before using it in filesystem paths.
+
+    Only allow simple POSIX-style usernames (letters, digits, underscore, dash,
+    and dot). Reject values containing path separators, whitespace, or
+    traversal sequences.
+    """
+    if not username:
+        raise ValueError("Invalid username: empty")
+    # Disallow obvious dangerous content
+    if "/" in username or "\\" in username or "\x00" in username:
+        raise ValueError(f"Invalid username: '{username}' contains illegal characters")
+    if ".." in username:
+        raise ValueError(f"Invalid username: '{username}' contains '..'")
+    # Enforce a conservative character set
+    if not re.fullmatch(r"[A-Za-z0-9_.-]+", username):
+        raise ValueError(f"Invalid username: '{username}' has invalid format")
+    return username
+
+
 def _parse_github_url(url: str) -> tuple[str, str, str | None]:
     """Parse a GitHub repo URL into (owner, repo, branch).
 
@@ -801,7 +821,11 @@
     safe_app = _sanitize_for_path(app_name)
     safe_ep = _sanitize_for_path(entry_point_id)
     prefix = f"{_sanitize_for_path(job_name_prefix)}-" if job_name_prefix else ""
-    home = os.path.expanduser(f"~{username}") if username else os.path.expanduser("~")
+    if username:
+        safe_username = _sanitize_username(username)
+        home = os.path.expanduser(f"~{safe_username}")
+    else:
+        home = os.path.expanduser("~")
     return Path(f"{home}/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}")
 
 
EOF
@@ -39,6 +39,26 @@
return _repo_locks[key]


def _sanitize_username(username: str) -> str:
"""Sanitize a username before using it in filesystem paths.

Only allow simple POSIX-style usernames (letters, digits, underscore, dash,
and dot). Reject values containing path separators, whitespace, or
traversal sequences.
"""
if not username:
raise ValueError("Invalid username: empty")
# Disallow obvious dangerous content
if "/" in username or "\\" in username or "\x00" in username:
raise ValueError(f"Invalid username: '{username}' contains illegal characters")
if ".." in username:
raise ValueError(f"Invalid username: '{username}' contains '..'")
# Enforce a conservative character set
if not re.fullmatch(r"[A-Za-z0-9_.-]+", username):
raise ValueError(f"Invalid username: '{username}' has invalid format")
return username


def _parse_github_url(url: str) -> tuple[str, str, str | None]:
"""Parse a GitHub repo URL into (owner, repo, branch).

@@ -801,7 +821,11 @@
safe_app = _sanitize_for_path(app_name)
safe_ep = _sanitize_for_path(entry_point_id)
prefix = f"{_sanitize_for_path(job_name_prefix)}-" if job_name_prefix else ""
home = os.path.expanduser(f"~{username}") if username else os.path.expanduser("~")
if username:
safe_username = _sanitize_username(username)
home = os.path.expanduser(f"~{safe_username}")
else:
home = os.path.expanduser("~")
return Path(f"{home}/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}")


Copilot is powered by AI and may make mistakes. Always verify output.
with user_context if user_context is not None else nullcontext():
work_dir.mkdir(parents=True, exist_ok=True)
repo_link = work_dir / "repo"
if repo_link.is_symlink() or repo_link.exists():

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI 2 days ago

In general, to fix uncontrolled data in path expressions, we should either (1) avoid embedding untrusted data into path components used for filesystem operations, or (2) strictly validate and normalize any such data before constructing paths. For usernames, the typical approach is to restrict them to a safe character set and length and to handle lookup failures or invalid inputs explicitly.

For this specific case, the only problematic use of username is in _build_work_dir, where it is interpolated into ~{username}. We can fix this by introducing a small helper _sanitize_username_for_home(username: str) -> str that enforces a safe pattern and/or safely resolves the user’s home directory using pwd.getpwnam when running on Unix-like systems. However, we are instructed not to add new imports except well-known ones and not to assume much about the environment. A minimal, non-breaking, and portable fix is:

  • Add a helper _sanitize_username_for_home near _build_work_dir that:
    • Rejects usernames containing /, \, whitespace, or NUL, and those starting with ~ or containing ...
    • Optionally constrains to a simple regex like ^[A-Za-z0-9._-]+$.
    • Raises ValueError if the username fails validation.
  • Use this helper inside _build_work_dir before calling expanduser:
    • If username is provided, sanitize it and then compute home = os.path.expanduser(f"~{safe_username}").
    • If not, fall back to the current behavior with os.path.expanduser("~").
  • Keep the rest of the function unchanged so that work_dir structure and behavior remain the same for valid usernames.

This confines the risk to a well-defined, validated subset of usernames and ensures that maliciously crafted values (such as ones containing path separators, additional tildes, or traversal sequences) are rejected early with a clear ValueError. The calling code (submit_job) already translates ValueError into a 400 HTTP response in the API handler, so we don’t need further changes to error handling.

Concretely:

  • File fileglancer/apps/core.py:
    • Add _sanitize_username_for_home above _build_work_dir.
    • Modify _build_work_dir line 804 to sanitize username before using it, and to call expanduser with the sanitized value.

No changes are needed in fileglancer/server.py for this specific issue.


Suggested changeset 1
fileglancer/apps/core.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/fileglancer/apps/core.py b/fileglancer/apps/core.py
--- a/fileglancer/apps/core.py
+++ b/fileglancer/apps/core.py
@@ -790,6 +790,25 @@
     return "\n".join(lines)
 
 
+def _sanitize_username_for_home(username: str) -> str:
+    """Sanitize a username before using it with os.path.expanduser.
+
+    Only allow a conservative set of characters and disallow path traversal
+    or shell-style metacharacters that could affect home resolution.
+    """
+    # Disallow obvious dangerous characters and patterns
+    if not username:
+        raise ValueError("Invalid username: empty")
+    if any(ch in username for ch in ("/", "\\", "\x00", " ")):
+        raise ValueError(f"Invalid username: '{username}' contains invalid characters")
+    if username.startswith("~") or ".." in username:
+        raise ValueError(f"Invalid username: '{username}' contains invalid characters")
+    # Restrict allowed characters to a safe subset commonly used in usernames
+    if not re.fullmatch(r"[A-Za-z0-9._-]+", username):
+        raise ValueError(f"Invalid username: '{username}' contains invalid characters")
+    return username
+
+
 def _build_work_dir(job_id: int, app_name: str, entry_point_id: str,
                     job_name_prefix: Optional[str] = None,
                     username: Optional[str] = None) -> Path:
@@ -801,7 +820,11 @@
     safe_app = _sanitize_for_path(app_name)
     safe_ep = _sanitize_for_path(entry_point_id)
     prefix = f"{_sanitize_for_path(job_name_prefix)}-" if job_name_prefix else ""
-    home = os.path.expanduser(f"~{username}") if username else os.path.expanduser("~")
+    if username:
+        safe_username = _sanitize_username_for_home(username)
+        home = os.path.expanduser(f"~{safe_username}")
+    else:
+        home = os.path.expanduser("~")
     return Path(f"{home}/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}")
 
 
EOF
@@ -790,6 +790,25 @@
return "\n".join(lines)


def _sanitize_username_for_home(username: str) -> str:
"""Sanitize a username before using it with os.path.expanduser.

Only allow a conservative set of characters and disallow path traversal
or shell-style metacharacters that could affect home resolution.
"""
# Disallow obvious dangerous characters and patterns
if not username:
raise ValueError("Invalid username: empty")
if any(ch in username for ch in ("/", "\\", "\x00", " ")):
raise ValueError(f"Invalid username: '{username}' contains invalid characters")
if username.startswith("~") or ".." in username:
raise ValueError(f"Invalid username: '{username}' contains invalid characters")
# Restrict allowed characters to a safe subset commonly used in usernames
if not re.fullmatch(r"[A-Za-z0-9._-]+", username):
raise ValueError(f"Invalid username: '{username}' contains invalid characters")
return username


def _build_work_dir(job_id: int, app_name: str, entry_point_id: str,
job_name_prefix: Optional[str] = None,
username: Optional[str] = None) -> Path:
@@ -801,7 +820,11 @@
safe_app = _sanitize_for_path(app_name)
safe_ep = _sanitize_for_path(entry_point_id)
prefix = f"{_sanitize_for_path(job_name_prefix)}-" if job_name_prefix else ""
home = os.path.expanduser(f"~{username}") if username else os.path.expanduser("~")
if username:
safe_username = _sanitize_username_for_home(username)
home = os.path.expanduser(f"~{safe_username}")
else:
home = os.path.expanduser("~")
return Path(f"{home}/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}")


Copilot is powered by AI and may make mistakes. Always verify output.
work_dir.mkdir(parents=True, exist_ok=True)
repo_link = work_dir / "repo"
if repo_link.is_symlink() or repo_link.exists():
repo_link.unlink()

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI 2 days ago

In general, to fix uncontrolled data in a path expression, we must ensure that untrusted input cannot cause the resulting path to escape a controlled root or perform unintended expansion. Here, the only tainted part is the username used with os.path.expanduser in _build_work_dir. We should (a) avoid using shell-like ~username expansion on untrusted data, and (b) restrict username to a safe form (e.g., a simple account name without path separators, dots, or null bytes). A robust approach is to resolve the user’s home directory using the system user database (pwd.getpwnam) after validating the username format; this prevents path traversal via a malicious "username" and ensures the path corresponds to a real account.

Concretely, we will adjust _build_work_dir in fileglancer/apps/core.py:

  1. Add imports for pwd and re is already imported, so no new import for regex is needed.
  2. Introduce a helper function _get_user_home(username: str) -> str that:
    • Validates username against a conservative regex like r"^[A-Za-z0-9_.-]+$" to disallow slashes and whitespace and reject empty or obviously malicious names.
    • Optionally rejects names containing ".." or NUL for extra safety.
    • Uses pwd.getpwnam(username).pw_dir to get the home directory, raising a clear ValueError if the user does not exist or if the name is invalid.
  3. Modify _build_work_dir so that:
    • If username is provided, it calls _get_user_home(username) instead of os.path.expanduser(f"~{username}").
    • If username is None, it continues to use os.path.expanduser("~") as before.
    • The rest of the path logic (.fileglancer/jobs/...) stays unchanged.

This keeps all existing functionality (per-user job directories under each user’s real home) while ensuring username cannot be abused to craft arbitrary home paths. Since repo_link is derived from work_dir, this change ensures the symlink and directory operations are rooted under a verified, legitimate home directory rather than a path influenced by arbitrary expansion.


Suggested changeset 1
fileglancer/apps/core.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/fileglancer/apps/core.py b/fileglancer/apps/core.py
--- a/fileglancer/apps/core.py
+++ b/fileglancer/apps/core.py
@@ -790,18 +790,39 @@
     return "\n".join(lines)
 
 
+def _get_user_home(username: str) -> str:
+    """Resolve a validated system user's home directory.
+
+    This avoids using os.path.expanduser with untrusted data and ensures
+    that the username corresponds to a real system account.
+    """
+    # Allow only simple account-like usernames: no slashes or whitespace.
+    if not username or not re.fullmatch(r"[A-Za-z0-9_.-]+", username):
+        raise ValueError(f"Invalid username for home directory resolution: {username!r}")
+    if ".." in username or "\x00" in username:
+        raise ValueError(f"Invalid username for home directory resolution: {username!r}")
+    try:
+        import pwd
+        return pwd.getpwnam(username).pw_dir
+    except Exception as e:
+        raise ValueError(f"Unknown or invalid username: {username!r}") from e
+
+
 def _build_work_dir(job_id: int, app_name: str, entry_point_id: str,
                     job_name_prefix: Optional[str] = None,
                     username: Optional[str] = None) -> Path:
     """Build a working directory path under ~/.fileglancer/jobs/.
 
-    When username is provided, expands ~username to the user's home directory
-    instead of the server process's home (which is typically root).
+    When username is provided, use the system user database to resolve the
+    user's home directory instead of relying on shell-style expansion.
     """
     safe_app = _sanitize_for_path(app_name)
     safe_ep = _sanitize_for_path(entry_point_id)
     prefix = f"{_sanitize_for_path(job_name_prefix)}-" if job_name_prefix else ""
-    home = os.path.expanduser(f"~{username}") if username else os.path.expanduser("~")
+    if username:
+        home = _get_user_home(username)
+    else:
+        home = os.path.expanduser("~")
     return Path(f"{home}/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}")
 
 
EOF
@@ -790,18 +790,39 @@
return "\n".join(lines)


def _get_user_home(username: str) -> str:
"""Resolve a validated system user's home directory.

This avoids using os.path.expanduser with untrusted data and ensures
that the username corresponds to a real system account.
"""
# Allow only simple account-like usernames: no slashes or whitespace.
if not username or not re.fullmatch(r"[A-Za-z0-9_.-]+", username):
raise ValueError(f"Invalid username for home directory resolution: {username!r}")
if ".." in username or "\x00" in username:
raise ValueError(f"Invalid username for home directory resolution: {username!r}")
try:
import pwd
return pwd.getpwnam(username).pw_dir
except Exception as e:
raise ValueError(f"Unknown or invalid username: {username!r}") from e


def _build_work_dir(job_id: int, app_name: str, entry_point_id: str,
job_name_prefix: Optional[str] = None,
username: Optional[str] = None) -> Path:
"""Build a working directory path under ~/.fileglancer/jobs/.

When username is provided, expands ~username to the user's home directory
instead of the server process's home (which is typically root).
When username is provided, use the system user database to resolve the
user's home directory instead of relying on shell-style expansion.
"""
safe_app = _sanitize_for_path(app_name)
safe_ep = _sanitize_for_path(entry_point_id)
prefix = f"{_sanitize_for_path(job_name_prefix)}-" if job_name_prefix else ""
home = os.path.expanduser(f"~{username}") if username else os.path.expanduser("~")
if username:
home = _get_user_home(username)
else:
home = os.path.expanduser("~")
return Path(f"{home}/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}")


Copilot is powered by AI and may make mistakes. Always verify output.
repo_link = work_dir / "repo"
if repo_link.is_symlink() or repo_link.exists():
repo_link.unlink()
repo_link.symlink_to(cached_repo_dir)

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI 2 days ago

In general, to fix this type of issue you ensure that any user-derived data used in path construction is validated or constrained so it cannot cause access outside intended locations. For usernames specifically, you can restrict the allowed character set and/or ensure that the resolved path is under a known safe root after os.path.expanduser.

For this code, the minimal and robust fix is to sanitize/validate the username before using it in _build_work_dir. A straightforward approach is:

  1. Add a small helper _sanitize_username_for_home in fileglancer/apps/core.py that:

    • Ensures username is non-empty.
    • Ensures it matches a conservative pattern, e.g. ^[A-Za-z0-9_.-]+$.
    • Optionally resolves ~username with os.path.expanduser and verifies the resulting path is an absolute directory under some expected base, but the character whitelist alone already prevents pathological values like ../../etc/passwd or embedded NULs.
  2. Update _build_work_dir to call this sanitizer whenever a username was provided, and use the sanitized value in os.path.expanduser(f"~{username}").

This keeps the observable behavior identical for normal usernames (which already only use safe characters), but ensures that if any untrusted/invalid username ever reached this point, it will raise a clear ValueError instead of being used in a path. It also makes the data flow obviously safe to CodeQL because the username is explicitly validated before being interpolated into a filesystem path.

Concretely:

  • In fileglancer/apps/core.py, define _sanitize_username_for_home near _build_work_dir.
  • In _build_work_dir, before computing home, if username is not None, pass it through _sanitize_username_for_home and use the sanitized result in the expanduser call.
  • No changes are needed in fileglancer/server.py.

Suggested changeset 1
fileglancer/apps/core.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/fileglancer/apps/core.py b/fileglancer/apps/core.py
--- a/fileglancer/apps/core.py
+++ b/fileglancer/apps/core.py
@@ -790,6 +790,20 @@
     return "\n".join(lines)
 
 
+def _sanitize_username_for_home(username: str) -> str:
+    """Sanitize a username before using it in a '~username' home expression.
+
+    Restrict to a conservative set of characters to avoid path traversal or
+    other unintended filesystem effects when interpolated into a path.
+    """
+    if not username:
+        raise ValueError("Username must not be empty")
+    # Allow common POSIX username characters only
+    if not re.fullmatch(r"[A-Za-z0-9_.-]+", username):
+        raise ValueError(f"Invalid username '{username}'")
+    return username
+
+
 def _build_work_dir(job_id: int, app_name: str, entry_point_id: str,
                     job_name_prefix: Optional[str] = None,
                     username: Optional[str] = None) -> Path:
@@ -801,7 +815,11 @@
     safe_app = _sanitize_for_path(app_name)
     safe_ep = _sanitize_for_path(entry_point_id)
     prefix = f"{_sanitize_for_path(job_name_prefix)}-" if job_name_prefix else ""
-    home = os.path.expanduser(f"~{username}") if username else os.path.expanduser("~")
+    if username:
+        safe_username = _sanitize_username_for_home(username)
+        home = os.path.expanduser(f"~{safe_username}")
+    else:
+        home = os.path.expanduser("~")
     return Path(f"{home}/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}")
 
 
EOF
@@ -790,6 +790,20 @@
return "\n".join(lines)


def _sanitize_username_for_home(username: str) -> str:
"""Sanitize a username before using it in a '~username' home expression.

Restrict to a conservative set of characters to avoid path traversal or
other unintended filesystem effects when interpolated into a path.
"""
if not username:
raise ValueError("Username must not be empty")
# Allow common POSIX username characters only
if not re.fullmatch(r"[A-Za-z0-9_.-]+", username):
raise ValueError(f"Invalid username '{username}'")
return username


def _build_work_dir(job_id: int, app_name: str, entry_point_id: str,
job_name_prefix: Optional[str] = None,
username: Optional[str] = None) -> Path:
@@ -801,7 +815,11 @@
safe_app = _sanitize_for_path(app_name)
safe_ep = _sanitize_for_path(entry_point_id)
prefix = f"{_sanitize_for_path(job_name_prefix)}-" if job_name_prefix else ""
home = os.path.expanduser(f"~{username}") if username else os.path.expanduser("~")
if username:
safe_username = _sanitize_username_for_home(username)
home = os.path.expanduser(f"~{safe_username}")
else:
home = os.path.expanduser("~")
return Path(f"{home}/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}")


Copilot is powered by AI and may make mistakes. Always verify output.
…variables

LSF's bsub requires LSF_ENVDIR to locate lsf.conf. When running as a
systemd service via pixi, this env var does not reach the server process.

Approaches that did NOT work:
- Environment=LSF_ENVDIR=... in the systemd unit file
- Adding LSF_ENVDIR to the .env file (pydantic-settings rejected it
  as an unknown field: "lsf_envdir: Extra inputs are not permitted")
- ExecStart with `source /misc/lsf/conf/profile.lsf && exec pixi run
  start`
- Inspecting /proc/<PID>/environ confirmed LSF_ENVDIR was absent from
  the worker process in all cases

These attempts ruled out systemd misconfiguration and shell sourcing
issues, and suggest that pixi strips inherited environment variables
during activation.

The fix mirrors the existing extra_paths approach: extra_env is a dict
in ClusterSettings that is applied to os.environ inside the Python
process at startup, after pixi has already set up its environment.
- the previous commit fixed the lsf.conf issue, but there is a new error "Failed in LSF library call: External authentication failed", so trying removing the -U flag to see if LSF picks up the euid set by seteuid
cluster:
executor: local # "local" or "lsf"
# extra_paths: # Directories prepended to the server's PATH at
# - /opt/lsf/bin # startup so scheduler commands (bsub, bjobs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this outside the "cluster" settings because there may be other paths we need to add which are not related to the cluster. Also, this is referenced in server.py in a place that shouldn't know about the cluster.

# # bkill) are findable. Useful when running as a
# # systemd service with a minimal default PATH.
# extra_env: # Extra environment variables set at startup.
# LSF_ENVDIR: /misc/lsf/conf # Required for LSF to find lsf.conf.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this won't be necessary when using . /misc/lsf/conf/profile.lsf, but if it is, it should also go outside of the cluster config.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants