Conversation
The work directory, repo symlink, and bsub/sbatch submission were all running as root. The repo cache (~/.fileglancer/apps/) must stay as root since users can't write there, so user_context is passed into submit_job and applied only around work_dir creation, symlink, and executor.submit.
os.path.expanduser("~") uses the real UID (root), ignoring seteuid.
Pass username to _build_work_dir so it expands ~username instead.
- work_dir can survive from a previous failed attempt with the same job ID
Adds a cluster.extra_paths config option that prepends directories to the server's PATH at startup, so scheduler commands (bsub, bjobs, bkill) are findable even when the system service has a minimal default PATH. Configurable via .env (FGC_CLUSTER__EXTRA_PATHS) to avoid committing environment-specific paths.
| resources=resource_spec, | ||
| ) | ||
| with user_context if user_context is not None else nullcontext(): | ||
| work_dir.mkdir(parents=True, exist_ok=True) |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 days ago
In general, to fix “uncontrolled data used in path expression” here, we need to ensure that any path derived from username cannot escape a controlled root directory. The standard pattern is: (1) compute a base directory that is not influenced by untrusted input, (2) build any subpaths under this base using path-joining, (3) normalise (resolve() / os.path.normpath) and verify that the resulting path is still anchored in the base directory. If the check fails, reject the operation.
For this code, the main problem is _build_work_dir, which currently uses home = os.path.expanduser(f"~{username}") and then returns a Path directly built from that. The cleanest fix that does not change external functionality is:
- Define a fixed jobs root directory, e.g.
_JOBS_BASE = Path(os.path.expanduser("~/.fileglancer/jobs")), which is already the conceptual location we use. - In
_build_work_dir, derivehomeusing the existing logic (so we still respect the username’s home directory), but then build the work directory as a subpath under_JOBS_BASE, not directly fromhome. For example, create a per-user subdirectory_JOBS_BASE / _sanitize_for_path(username)and then append the job-specific suffix. - Alternatively, if you must preserve the idea “jobs go under each user’s home”, you can still normalise and enforce that the final
work_dirpath is within that user’s home directory: computebase = Path(home).expanduser().resolve(), thenwork_dir = (base / ".fileglancer" / "jobs" / suffix).resolve(), and finally checkif base not in work_dir.parents and base != work_dir: raise ValueError(...). This ensures no traversal outside the home directory even if username contains weird segments.
Because the function docstring explicitly says “Build a working directory path under ~/.fileglancer/jobs/.” and the rest of the code (e.g. comments around the repo cache) assumes a server-owned directory, the first option (anchoring everything under a process-home-rooted ~/.fileglancer/jobs) is more consistent and simpler. To keep backwards compatibility with existing stored work_dir paths that already look like ~user/.fileglancer/jobs/..., we should limit our change to newly-built paths and ensure they are deterministic; switching to a root-based layout is acceptable if consumers treat work_dir as opaque.
Implementation details:
-
In
fileglancer/apps/core.py, near_REPO_CACHE_BASE, define_JOBS_BASE = Path(os.path.expanduser("~/.fileglancer/jobs")). -
Update
_build_work_dirto:- Keep sanitising
app_name,entry_point_id, and optionaljob_name_prefixas it already does. - Derive a safe username component, e.g.
safe_user = _sanitize_for_path(username) if username else "default". - Build
user_jobs_root = (_JOBS_BASE / safe_user).resolve(). - Build
work_dir = (user_jobs_root / f"{prefix}{job_id}-{safe_app}-{safe_ep}").resolve(). - Optionally, assert that
work_diris underuser_jobs_rootby checkingif user_jobs_root not in work_dir.parents and user_jobs_root != work_dir: raise ValueError(...). Given our construction, this is mostly a sanity check against path injection via_sanitize_for_pathchanges.
- Keep sanitising
-
All other uses of
work_dir(such aswork_dir.mkdir(...)andrepo_link.symlink_to(cached_repo_dir)) remain unchanged; they simply operate on a now-guaranteed-safe path. -
No changes are needed in
server.pyor to imports beyond adding_JOBS_BASEdeclaration.
This keeps functionality (per-job directories per app/entry point) while ensuring username cannot redirect work directories to arbitrary locations.
| @@ -28,6 +28,7 @@ | ||
| _MANIFEST_FILENAME = "runnables.yaml" | ||
|
|
||
| _REPO_CACHE_BASE = Path(os.path.expanduser("~/.fileglancer/apps")) | ||
| _JOBS_BASE = Path(os.path.expanduser("~/.fileglancer/jobs")) | ||
| _repo_locks: dict[str, asyncio.Lock] = {} | ||
|
|
||
|
|
||
| @@ -795,16 +796,28 @@ | ||
| username: Optional[str] = None) -> Path: | ||
| """Build a working directory path under ~/.fileglancer/jobs/. | ||
|
|
||
| When username is provided, expands ~username to the user's home directory | ||
| instead of the server process's home (which is typically root). | ||
| When username is provided, uses a per-user subdirectory under the | ||
| server's jobs base directory (~/.fileglancer/jobs) rather than | ||
| constructing a path directly from the username. | ||
| """ | ||
| safe_app = _sanitize_for_path(app_name) | ||
| safe_ep = _sanitize_for_path(entry_point_id) | ||
| prefix = f"{_sanitize_for_path(job_name_prefix)}-" if job_name_prefix else "" | ||
| home = os.path.expanduser(f"~{username}") if username else os.path.expanduser("~") | ||
| return Path(f"{home}/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}") | ||
|
|
||
| # Derive a safe per-user root under the fixed jobs base directory. | ||
| safe_user = _sanitize_for_path(username) if username else "default" | ||
| user_jobs_root = (_JOBS_BASE / safe_user).resolve() | ||
|
|
||
| work_dir = (user_jobs_root / f"{prefix}{job_id}-{safe_app}-{safe_ep}").resolve() | ||
|
|
||
| # Ensure the final path is contained within the user_jobs_root to guard | ||
| # against any unexpected path manipulation. | ||
| if user_jobs_root != work_dir and user_jobs_root not in work_dir.parents: | ||
| raise ValueError(f"Invalid work directory path constructed for user '{username}'") | ||
|
|
||
| return work_dir | ||
|
|
||
|
|
||
| async def submit_job( | ||
| username: str, | ||
| app_url: str, |
| with user_context if user_context is not None else nullcontext(): | ||
| work_dir.mkdir(parents=True, exist_ok=True) | ||
| repo_link = work_dir / "repo" | ||
| if repo_link.is_symlink() or repo_link.exists(): |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 days ago
General approach: Ensure that any user-influenced data used in filesystem paths is validated or sanitized so it cannot escape the intended directory structure. In this case, that means constraining username before using it in _build_work_dir, and ensuring that the resulting home directory is a real, safe user home and not a crafted path.
Best concrete fix without changing existing functionality:
- Introduce a small helper like
_sanitize_usernameinfileglancer/apps/core.pythat enforces a conservative pattern for usernames (e.g., alphanumeric plus-_.), rejects dangerous characters (/,\, whitespace,.., NUL, etc.), and raisesValueErrorif invalid. - In
_build_work_dir, call_sanitize_username(username)(whenusernameis notNone) before passing it toos.path.expanduser(f"~{username}"). This ensures that we only ever expand a well-formed username and cannot accidentally interpret a malicious path fragment as part of the tilde expression. - Optionally, resolve the resulting
homepath withPath(home).expanduser().resolve()and fail early if that resolution behaves unexpectedly; however, given the constraints and the existing behavior, just sanitizingusernameis sufficient and minimally invasive. - Because
submit_jobalready propagatesusernameinto_build_work_dir, we do not need to change call sites—only the implementation of_build_work_dirand the new helper. RaisingValueErroron invalid usernames is consistent with other validation patterns in this module (e.g.,_parse_github_urlusesValueError), andsubmit_jobalready translatesValueErrorinto HTTP 400.
Concretely:
- In
fileglancer/apps/core.py, add_sanitize_usernamenear the other helpers (_sanitize_for_path,_parse_github_urlarea). - Update
_build_work_dirso that if ausernameis given, we first sanitize it and then callos.path.expanduser(f"~{safe_username}"). The rest of_build_work_dirremains unchanged.
No new imports are required; we can use re which is already imported at the top of the file.
| @@ -39,6 +39,26 @@ | ||
| return _repo_locks[key] | ||
|
|
||
|
|
||
| def _sanitize_username(username: str) -> str: | ||
| """Sanitize a username before using it in filesystem paths. | ||
|
|
||
| Only allow simple POSIX-style usernames (letters, digits, underscore, dash, | ||
| and dot). Reject values containing path separators, whitespace, or | ||
| traversal sequences. | ||
| """ | ||
| if not username: | ||
| raise ValueError("Invalid username: empty") | ||
| # Disallow obvious dangerous content | ||
| if "/" in username or "\\" in username or "\x00" in username: | ||
| raise ValueError(f"Invalid username: '{username}' contains illegal characters") | ||
| if ".." in username: | ||
| raise ValueError(f"Invalid username: '{username}' contains '..'") | ||
| # Enforce a conservative character set | ||
| if not re.fullmatch(r"[A-Za-z0-9_.-]+", username): | ||
| raise ValueError(f"Invalid username: '{username}' has invalid format") | ||
| return username | ||
|
|
||
|
|
||
| def _parse_github_url(url: str) -> tuple[str, str, str | None]: | ||
| """Parse a GitHub repo URL into (owner, repo, branch). | ||
|
|
||
| @@ -801,7 +821,11 @@ | ||
| safe_app = _sanitize_for_path(app_name) | ||
| safe_ep = _sanitize_for_path(entry_point_id) | ||
| prefix = f"{_sanitize_for_path(job_name_prefix)}-" if job_name_prefix else "" | ||
| home = os.path.expanduser(f"~{username}") if username else os.path.expanduser("~") | ||
| if username: | ||
| safe_username = _sanitize_username(username) | ||
| home = os.path.expanduser(f"~{safe_username}") | ||
| else: | ||
| home = os.path.expanduser("~") | ||
| return Path(f"{home}/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}") | ||
|
|
||
|
|
| with user_context if user_context is not None else nullcontext(): | ||
| work_dir.mkdir(parents=True, exist_ok=True) | ||
| repo_link = work_dir / "repo" | ||
| if repo_link.is_symlink() or repo_link.exists(): |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 days ago
In general, to fix uncontrolled data in path expressions, we should either (1) avoid embedding untrusted data into path components used for filesystem operations, or (2) strictly validate and normalize any such data before constructing paths. For usernames, the typical approach is to restrict them to a safe character set and length and to handle lookup failures or invalid inputs explicitly.
For this specific case, the only problematic use of username is in _build_work_dir, where it is interpolated into ~{username}. We can fix this by introducing a small helper _sanitize_username_for_home(username: str) -> str that enforces a safe pattern and/or safely resolves the user’s home directory using pwd.getpwnam when running on Unix-like systems. However, we are instructed not to add new imports except well-known ones and not to assume much about the environment. A minimal, non-breaking, and portable fix is:
- Add a helper
_sanitize_username_for_homenear_build_work_dirthat:- Rejects usernames containing
/,\, whitespace, or NUL, and those starting with~or containing... - Optionally constrains to a simple regex like
^[A-Za-z0-9._-]+$. - Raises
ValueErrorif the username fails validation.
- Rejects usernames containing
- Use this helper inside
_build_work_dirbefore callingexpanduser:- If
usernameis provided, sanitize it and then computehome = os.path.expanduser(f"~{safe_username}"). - If not, fall back to the current behavior with
os.path.expanduser("~").
- If
- Keep the rest of the function unchanged so that
work_dirstructure and behavior remain the same for valid usernames.
This confines the risk to a well-defined, validated subset of usernames and ensures that maliciously crafted values (such as ones containing path separators, additional tildes, or traversal sequences) are rejected early with a clear ValueError. The calling code (submit_job) already translates ValueError into a 400 HTTP response in the API handler, so we don’t need further changes to error handling.
Concretely:
- File
fileglancer/apps/core.py:- Add
_sanitize_username_for_homeabove_build_work_dir. - Modify
_build_work_dirline 804 to sanitizeusernamebefore using it, and to callexpanduserwith the sanitized value.
- Add
No changes are needed in fileglancer/server.py for this specific issue.
| @@ -790,6 +790,25 @@ | ||
| return "\n".join(lines) | ||
|
|
||
|
|
||
| def _sanitize_username_for_home(username: str) -> str: | ||
| """Sanitize a username before using it with os.path.expanduser. | ||
|
|
||
| Only allow a conservative set of characters and disallow path traversal | ||
| or shell-style metacharacters that could affect home resolution. | ||
| """ | ||
| # Disallow obvious dangerous characters and patterns | ||
| if not username: | ||
| raise ValueError("Invalid username: empty") | ||
| if any(ch in username for ch in ("/", "\\", "\x00", " ")): | ||
| raise ValueError(f"Invalid username: '{username}' contains invalid characters") | ||
| if username.startswith("~") or ".." in username: | ||
| raise ValueError(f"Invalid username: '{username}' contains invalid characters") | ||
| # Restrict allowed characters to a safe subset commonly used in usernames | ||
| if not re.fullmatch(r"[A-Za-z0-9._-]+", username): | ||
| raise ValueError(f"Invalid username: '{username}' contains invalid characters") | ||
| return username | ||
|
|
||
|
|
||
| def _build_work_dir(job_id: int, app_name: str, entry_point_id: str, | ||
| job_name_prefix: Optional[str] = None, | ||
| username: Optional[str] = None) -> Path: | ||
| @@ -801,7 +820,11 @@ | ||
| safe_app = _sanitize_for_path(app_name) | ||
| safe_ep = _sanitize_for_path(entry_point_id) | ||
| prefix = f"{_sanitize_for_path(job_name_prefix)}-" if job_name_prefix else "" | ||
| home = os.path.expanduser(f"~{username}") if username else os.path.expanduser("~") | ||
| if username: | ||
| safe_username = _sanitize_username_for_home(username) | ||
| home = os.path.expanduser(f"~{safe_username}") | ||
| else: | ||
| home = os.path.expanduser("~") | ||
| return Path(f"{home}/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}") | ||
|
|
||
|
|
| work_dir.mkdir(parents=True, exist_ok=True) | ||
| repo_link = work_dir / "repo" | ||
| if repo_link.is_symlink() or repo_link.exists(): | ||
| repo_link.unlink() |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 days ago
In general, to fix uncontrolled data in a path expression, we must ensure that untrusted input cannot cause the resulting path to escape a controlled root or perform unintended expansion. Here, the only tainted part is the username used with os.path.expanduser in _build_work_dir. We should (a) avoid using shell-like ~username expansion on untrusted data, and (b) restrict username to a safe form (e.g., a simple account name without path separators, dots, or null bytes). A robust approach is to resolve the user’s home directory using the system user database (pwd.getpwnam) after validating the username format; this prevents path traversal via a malicious "username" and ensures the path corresponds to a real account.
Concretely, we will adjust _build_work_dir in fileglancer/apps/core.py:
- Add imports for
pwdandreis already imported, so no new import for regex is needed. - Introduce a helper function
_get_user_home(username: str) -> strthat:- Validates
usernameagainst a conservative regex liker"^[A-Za-z0-9_.-]+$"to disallow slashes and whitespace and reject empty or obviously malicious names. - Optionally rejects names containing
".."or NUL for extra safety. - Uses
pwd.getpwnam(username).pw_dirto get the home directory, raising a clearValueErrorif the user does not exist or if the name is invalid.
- Validates
- Modify
_build_work_dirso that:- If
usernameis provided, it calls_get_user_home(username)instead ofos.path.expanduser(f"~{username}"). - If
usernameisNone, it continues to useos.path.expanduser("~")as before. - The rest of the path logic (
.fileglancer/jobs/...) stays unchanged.
- If
This keeps all existing functionality (per-user job directories under each user’s real home) while ensuring username cannot be abused to craft arbitrary home paths. Since repo_link is derived from work_dir, this change ensures the symlink and directory operations are rooted under a verified, legitimate home directory rather than a path influenced by arbitrary expansion.
| @@ -790,18 +790,39 @@ | ||
| return "\n".join(lines) | ||
|
|
||
|
|
||
| def _get_user_home(username: str) -> str: | ||
| """Resolve a validated system user's home directory. | ||
|
|
||
| This avoids using os.path.expanduser with untrusted data and ensures | ||
| that the username corresponds to a real system account. | ||
| """ | ||
| # Allow only simple account-like usernames: no slashes or whitespace. | ||
| if not username or not re.fullmatch(r"[A-Za-z0-9_.-]+", username): | ||
| raise ValueError(f"Invalid username for home directory resolution: {username!r}") | ||
| if ".." in username or "\x00" in username: | ||
| raise ValueError(f"Invalid username for home directory resolution: {username!r}") | ||
| try: | ||
| import pwd | ||
| return pwd.getpwnam(username).pw_dir | ||
| except Exception as e: | ||
| raise ValueError(f"Unknown or invalid username: {username!r}") from e | ||
|
|
||
|
|
||
| def _build_work_dir(job_id: int, app_name: str, entry_point_id: str, | ||
| job_name_prefix: Optional[str] = None, | ||
| username: Optional[str] = None) -> Path: | ||
| """Build a working directory path under ~/.fileglancer/jobs/. | ||
|
|
||
| When username is provided, expands ~username to the user's home directory | ||
| instead of the server process's home (which is typically root). | ||
| When username is provided, use the system user database to resolve the | ||
| user's home directory instead of relying on shell-style expansion. | ||
| """ | ||
| safe_app = _sanitize_for_path(app_name) | ||
| safe_ep = _sanitize_for_path(entry_point_id) | ||
| prefix = f"{_sanitize_for_path(job_name_prefix)}-" if job_name_prefix else "" | ||
| home = os.path.expanduser(f"~{username}") if username else os.path.expanduser("~") | ||
| if username: | ||
| home = _get_user_home(username) | ||
| else: | ||
| home = os.path.expanduser("~") | ||
| return Path(f"{home}/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}") | ||
|
|
||
|
|
| repo_link = work_dir / "repo" | ||
| if repo_link.is_symlink() or repo_link.exists(): | ||
| repo_link.unlink() | ||
| repo_link.symlink_to(cached_repo_dir) |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 days ago
In general, to fix this type of issue you ensure that any user-derived data used in path construction is validated or constrained so it cannot cause access outside intended locations. For usernames specifically, you can restrict the allowed character set and/or ensure that the resolved path is under a known safe root after os.path.expanduser.
For this code, the minimal and robust fix is to sanitize/validate the username before using it in _build_work_dir. A straightforward approach is:
-
Add a small helper
_sanitize_username_for_homeinfileglancer/apps/core.pythat:- Ensures
usernameis non-empty. - Ensures it matches a conservative pattern, e.g.
^[A-Za-z0-9_.-]+$. - Optionally resolves
~usernamewithos.path.expanduserand verifies the resulting path is an absolute directory under some expected base, but the character whitelist alone already prevents pathological values like../../etc/passwdor embedded NULs.
- Ensures
-
Update
_build_work_dirto call this sanitizer whenever ausernamewas provided, and use the sanitized value inos.path.expanduser(f"~{username}").
This keeps the observable behavior identical for normal usernames (which already only use safe characters), but ensures that if any untrusted/invalid username ever reached this point, it will raise a clear ValueError instead of being used in a path. It also makes the data flow obviously safe to CodeQL because the username is explicitly validated before being interpolated into a filesystem path.
Concretely:
- In
fileglancer/apps/core.py, define_sanitize_username_for_homenear_build_work_dir. - In
_build_work_dir, before computinghome, ifusernameis notNone, pass it through_sanitize_username_for_homeand use the sanitized result in theexpandusercall. - No changes are needed in
fileglancer/server.py.
| @@ -790,6 +790,20 @@ | ||
| return "\n".join(lines) | ||
|
|
||
|
|
||
| def _sanitize_username_for_home(username: str) -> str: | ||
| """Sanitize a username before using it in a '~username' home expression. | ||
|
|
||
| Restrict to a conservative set of characters to avoid path traversal or | ||
| other unintended filesystem effects when interpolated into a path. | ||
| """ | ||
| if not username: | ||
| raise ValueError("Username must not be empty") | ||
| # Allow common POSIX username characters only | ||
| if not re.fullmatch(r"[A-Za-z0-9_.-]+", username): | ||
| raise ValueError(f"Invalid username '{username}'") | ||
| return username | ||
|
|
||
|
|
||
| def _build_work_dir(job_id: int, app_name: str, entry_point_id: str, | ||
| job_name_prefix: Optional[str] = None, | ||
| username: Optional[str] = None) -> Path: | ||
| @@ -801,7 +815,11 @@ | ||
| safe_app = _sanitize_for_path(app_name) | ||
| safe_ep = _sanitize_for_path(entry_point_id) | ||
| prefix = f"{_sanitize_for_path(job_name_prefix)}-" if job_name_prefix else "" | ||
| home = os.path.expanduser(f"~{username}") if username else os.path.expanduser("~") | ||
| if username: | ||
| safe_username = _sanitize_username_for_home(username) | ||
| home = os.path.expanduser(f"~{safe_username}") | ||
| else: | ||
| home = os.path.expanduser("~") | ||
| return Path(f"{home}/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}") | ||
|
|
||
|
|
…variables LSF's bsub requires LSF_ENVDIR to locate lsf.conf. When running as a systemd service via pixi, this env var does not reach the server process. Approaches that did NOT work: - Environment=LSF_ENVDIR=... in the systemd unit file - Adding LSF_ENVDIR to the .env file (pydantic-settings rejected it as an unknown field: "lsf_envdir: Extra inputs are not permitted") - ExecStart with `source /misc/lsf/conf/profile.lsf && exec pixi run start` - Inspecting /proc/<PID>/environ confirmed LSF_ENVDIR was absent from the worker process in all cases These attempts ruled out systemd misconfiguration and shell sourcing issues, and suggest that pixi strips inherited environment variables during activation. The fix mirrors the existing extra_paths approach: extra_env is a dict in ClusterSettings that is applied to os.environ inside the Python process at startup, after pixi has already set up its environment.
- the previous commit fixed the lsf.conf issue, but there is a new error "Failed in LSF library call: External authentication failed", so trying removing the -U flag to see if LSF picks up the euid set by seteuid
| cluster: | ||
| executor: local # "local" or "lsf" | ||
| # extra_paths: # Directories prepended to the server's PATH at | ||
| # - /opt/lsf/bin # startup so scheduler commands (bsub, bjobs, |
There was a problem hiding this comment.
I would put this outside the "cluster" settings because there may be other paths we need to add which are not related to the cluster. Also, this is referenced in server.py in a place that shouldn't know about the cluster.
| # # bkill) are findable. Useful when running as a | ||
| # # systemd service with a minimal default PATH. | ||
| # extra_env: # Extra environment variables set at startup. | ||
| # LSF_ENVDIR: /misc/lsf/conf # Required for LSF to find lsf.conf. |
There was a problem hiding this comment.
I think this won't be necessary when using . /misc/lsf/conf/profile.lsf, but if it is, it should also go outside of the cluster config.
This PR troubleshoots submitting jobs to the Janelia cluster when running the Fileglancer server as root.
Changes so far:
Where we are now (as of release 2.7.0a7, commit eff3c09)
@krokicki