LogLens

C++20 defensive log analysis CLI for Linux authentication logs, with parser coverage telemetry, configurable detection rules, CI, and CodeQL.

It parses auth.log / secure-style syslog input and journalctl --output=short-full-style input, normalizes authentication evidence, applies configurable rule-based detections, and emits deterministic Markdown and JSON reports, with optional CSV exports for findings and warnings.

Project Status

LogLens is an MVP / early release. The repository is stable enough for public review, local experimentation, and extension, but the parser and detection coverage are intentionally narrow.

Why This Project Exists

Many small security tools can detect a handful of known log patterns. Fewer tools make their parsing limits visible.

LogLens is built around three ideas:

  • detection engineering over offensive functionality
  • parser observability over silent failure
  • repository discipline over throwaway scripts

The project reports suspicious login activity while also surfacing parser coverage, unknown-line buckets, CI status, and code scanning hygiene.

Scope

LogLens is a defensive, public-safe repository. It is intended for log parsing, detection experiments, and engineering practice. It does not provide exploitation, persistence, credential attack automation, or live offensive capability.

Repository Checks

LogLens includes two minimal GitHub Actions workflows:

  • CI builds and tests the project on ubuntu-latest and windows-latest
  • CodeQL runs GitHub code scanning for C/C++ on pushes, pull requests, and a weekly schedule

Both workflows are intended to stay stable enough to serve as required status checks on pull requests to main. Regression coverage comes from sanitized parser fixture matrices plus golden report-contract fixtures for report.md, report.json, and the optional CSV outputs.

Release-facing documentation is split across CHANGELOG.md, docs/release-process.md, docs/release-v0.1.0.md, docs/release-v0.3.0.md, and the repository's GitHub release notes. The repository hardening note is in docs/repo-hardening.md, and vulnerability reporting guidance is in SECURITY.md.

Threat Model

LogLens is designed for offline review of auth.log-style and secure-style text logs collected from systems you own or administer. The MVP focuses on common, high-signal patterns that often appear during credential guessing, username enumeration, or bursty privileged command use.

The current tool helps answer:

  • Is one source IP generating repeated SSH failures in a short window?
  • Is one source IP trying several usernames in a short window?
  • Is one account running sudo unusually often in a short window?

It does not attempt to replace a SIEM, correlate across hosts, enrich IPs, or decide whether a finding is malicious on its own.
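The second question above (one source IP trying several usernames) can be pictured as a sliding window over that IP's attempts that counts distinct usernames. The following is a minimal sketch, not LogLens's actual implementation: the `LoginAttempt` type, the function name, and the use of epoch seconds are all assumptions made for illustration, and the caller is assumed to have already filtered the attempts down to a single source IP.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical event type for illustration; not LogLens's data model.
struct LoginAttempt {
    std::string username;
    std::int64_t epoch_seconds = 0;
};

// Returns true when any window of `window_seconds` contains at least
// `threshold` distinct usernames. `attempts` is assumed to hold the
// attempts of ONE source IP.
bool tries_many_usernames(std::vector<LoginAttempt> attempts,
                          std::size_t threshold,
                          std::int64_t window_seconds) {
    std::sort(attempts.begin(), attempts.end(),
              [](const LoginAttempt& a, const LoginAttempt& b) {
                  return a.epoch_seconds < b.epoch_seconds;
              });

    std::map<std::string, std::size_t> in_window;  // username -> count inside window
    std::size_t left = 0;
    for (std::size_t right = 0; right < attempts.size(); ++right) {
        ++in_window[attempts[right].username];
        // Drop attempts that fell out of the window on the left side.
        while (attempts[right].epoch_seconds - attempts[left].epoch_seconds > window_seconds) {
            auto it = in_window.find(attempts[left].username);
            if (--it->second == 0) in_window.erase(it);
            ++left;
        }
        if (in_window.size() >= threshold) return true;
    }
    return false;
}
```

With a threshold of 3 and a 15-minute window (900 seconds), three different usernames tried five minutes apart would trip the check, while the same three spread over an hour would not.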

Detections

LogLens currently detects:

  • Repeated SSH failed password attempts from the same IP within 10 minutes
  • One IP trying multiple usernames within 15 minutes
  • Bursty sudo activity from the same user within 5 minutes
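The first and third rules above share one shape: count events per subject (an IP or a user) and flag the subject once enough of them fall inside a time window. A minimal sketch of that idea follows. The `FailedLogin` type and `find_brute_force_ips` function are hypothetical names, not LogLens's API, and treating a span exactly equal to the window as "inside" is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical event type for illustration; not LogLens's data model.
struct FailedLogin {
    std::string source_ip;
    std::int64_t epoch_seconds = 0;
};

// Returns the source IPs with at least `threshold` failures inside any
// window of `window_minutes`. Output is sorted by IP string so repeated
// runs produce the same order.
std::vector<std::string> find_brute_force_ips(const std::vector<FailedLogin>& events,
                                              std::size_t threshold,
                                              std::int64_t window_minutes) {
    const std::int64_t window_seconds = window_minutes * 60;

    // Group timestamps by IP; std::map iterates in key order.
    std::map<std::string, std::vector<std::int64_t>> by_ip;
    for (const auto& e : events) by_ip[e.source_ip].push_back(e.epoch_seconds);

    std::vector<std::string> flagged;
    for (auto& entry : by_ip) {
        auto& times = entry.second;
        std::sort(times.begin(), times.end());
        std::size_t left = 0;
        for (std::size_t right = 0; right < times.size(); ++right) {
            // Shrink from the left until the span fits in the window.
            while (times[right] - times[left] > window_seconds) ++left;
            if (right - left + 1 >= threshold) {
                flagged.push_back(entry.first);
                break;
            }
        }
    }
    return flagged;
}
```

Calling it with the thresholds from the sample config (`threshold = 5`, `window_minutes = 10`) mirrors the first rule; the sudo rule would group by user instead of IP.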

Beyond the core detector inputs, LogLens also parses and reports these auth patterns:

  • Accepted publickey SSH successes
  • Failed publickey SSH attempts, which count toward SSH brute-force detection by default
  • pam_unix(...:auth): authentication failure
  • pam_unix(...:session): session opened
  • selected pam_faillock(...:auth) failure variants
  • selected pam_sss(...:auth) failure variants

LogLens also tracks parser coverage telemetry for unsupported or malformed lines, including:

  • total_lines
  • parsed_lines
  • unparsed_lines
  • parse_success_rate
  • top_unknown_patterns
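As a rough illustration of how the telemetry fields above relate to each other, the sketch below derives them from a per-line outcome list. Only the five field names come from this README. The `LineOutcome` type, the bucket labels, the tie-breaking order, and the choice of reporting a rate of 0 for empty input are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-line result; `unknown_pattern` is only meaningful
// when `parsed` is false.
struct LineOutcome {
    bool parsed = false;
    std::string unknown_pattern;
};

struct ParserCoverage {
    std::size_t total_lines = 0;
    std::size_t parsed_lines = 0;
    std::size_t unparsed_lines = 0;
    double parse_success_rate = 0.0;
    std::vector<std::pair<std::string, std::size_t>> top_unknown_patterns;
};

ParserCoverage compute_coverage(const std::vector<LineOutcome>& outcomes, std::size_t top_n) {
    ParserCoverage c;
    std::map<std::string, std::size_t> buckets;
    for (const auto& o : outcomes) {
        ++c.total_lines;
        if (o.parsed) {
            ++c.parsed_lines;
        } else {
            ++c.unparsed_lines;
            ++buckets[o.unknown_pattern];
        }
    }
    if (c.total_lines > 0) {
        c.parse_success_rate =
            static_cast<double>(c.parsed_lines) / static_cast<double>(c.total_lines);
    }
    // Most frequent buckets first; stable_sort keeps the map's label
    // order for ties, so the result is deterministic.
    c.top_unknown_patterns.assign(buckets.begin(), buckets.end());
    std::stable_sort(c.top_unknown_patterns.begin(), c.top_unknown_patterns.end(),
                     [](const auto& a, const auto& b) { return a.second > b.second; });
    if (c.top_unknown_patterns.size() > top_n) c.top_unknown_patterns.resize(top_n);
    return c;
}
```

Keeping `unparsed_lines` as its own counter, rather than deriving it from the other two at report time, makes it easy to check the invariant `parsed_lines + unparsed_lines == total_lines` in tests.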

LogLens does not currently detect:

  • Lateral movement
  • MFA abuse
  • SSH key misuse
  • Many PAM-specific failures beyond the parsed pam_unix, pam_faillock, and pam_sss sample patterns
  • Cross-file or cross-host correlation

Build

cmake -S . -B build
cmake --build build
ctest --test-dir build --output-on-failure

For fresh-machine setup and repeatable local presets, see docs/dev-setup.md.

Run

./build/loglens --mode syslog --year 2026 ./assets/sample_auth.log ./out
./build/loglens --mode journalctl-short-full ./assets/sample_journalctl_short_full.log ./out-journal
./build/loglens --config ./assets/sample_config.json ./assets/sample_auth.log ./out-config
./build/loglens --mode syslog --year 2026 --csv ./assets/sample_auth.log ./out-csv

The CLI writes:

  • report.md
  • report.json

into the output directory you provide. If you omit the output directory, the files are written into the current working directory.

When you add --csv, LogLens also writes:

  • findings.csv
  • warnings.csv

Without --csv, LogLens does not create, overwrite, or delete any existing CSV files in the output directory.

The CSV schema is intentionally small and stable:

  • findings.csv: rule, subject_kind, subject, event_count, window_start, window_end, usernames, summary
  • warnings.csv: kind, message
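To make the findings.csv column order concrete, here is a minimal sketch of serializing one row. The column names and their order come from the schema above. The `Finding` struct, the function names, and the quoting rule (quote a field only when it contains a comma, double quote, or line break, doubling embedded quotes as in RFC 4180) are assumptions; LogLens's actual escaping and its serialization of the usernames column may differ.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical in-memory finding; members mirror the findings.csv columns.
struct Finding {
    std::string rule;
    std::string subject_kind;
    std::string subject;
    std::size_t event_count = 0;
    std::string window_start;
    std::string window_end;
    std::string usernames;  // assumed to be pre-serialized by the caller
    std::string summary;
};

// Quotes a field only when needed and doubles embedded double quotes.
std::string csv_escape(const std::string& field) {
    if (field.find_first_of(",\"\r\n") == std::string::npos) return field;
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') out += '"';
        out += c;
    }
    out += '"';
    return out;
}

const char* findings_csv_header() {
    return "rule,subject_kind,subject,event_count,window_start,window_end,usernames,summary";
}

std::string findings_csv_row(const Finding& f) {
    std::ostringstream row;
    row << csv_escape(f.rule) << ',' << csv_escape(f.subject_kind) << ','
        << csv_escape(f.subject) << ',' << f.event_count << ','
        << csv_escape(f.window_start) << ',' << csv_escape(f.window_end) << ','
        << csv_escape(f.usernames) << ',' << csv_escape(f.summary);
    return row.str();
}
```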

When an input spans multiple hostnames, both reports add compact host-level summaries without changing detector thresholds or introducing cross-host correlation logic. In report.md this appears as a host summary table, and in report.json it appears as a host_summaries array.

Sample Output

For sanitized sample input, see assets/sample_auth.log and assets/sample_journalctl_short_full.log.

report.md summary excerpt:

## Summary
- Input mode: syslog_legacy
- Parsed events: 14
- Findings: 3
- Parser warnings: 2

report.json summary excerpt:

{
  "input_mode": "syslog_legacy",
  "parsed_event_count": 14,
  "finding_count": 3,
  "warning_count": 2
}

The config file schema is intentionally small and strict:

{
  "input_mode": "syslog_legacy",
  "timestamp": {
    "assume_year": 2026
  },
  "brute_force": { "threshold": 5, "window_minutes": 10 },
  "multi_user_probing": { "threshold": 3, "window_minutes": 15 },
  "sudo_burst": { "threshold": 3, "window_minutes": 5 },
  "auth_signal_mappings": {
    "ssh_failed_password": {
      "counts_as_attempt_evidence": true,
      "counts_as_terminal_auth_failure": true
    },
    "ssh_invalid_user": {
      "counts_as_attempt_evidence": true,
      "counts_as_terminal_auth_failure": true
    },
    "ssh_failed_publickey": {
      "counts_as_attempt_evidence": true,
      "counts_as_terminal_auth_failure": true
    },
    "pam_auth_failure": {
      "counts_as_attempt_evidence": true,
      "counts_as_terminal_auth_failure": false
    }
  }
}

This mapping lets LogLens normalize parsed events into detection signals before applying brute-force or multi-user rules. By default, pam_auth_failure is treated as lower-confidence attempt evidence and does not count as a terminal authentication failure unless the config explicitly upgrades it.
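One way to picture the mapping is as a lookup table from event kind to two flags. The sketch below hard-codes the defaults shown in the sample config; the type and function names are hypothetical, and treating an unmapped event kind as "counts for nothing" is an assumption rather than documented LogLens behavior.

```cpp
#include <map>
#include <string>

struct SignalMapping {
    bool counts_as_attempt_evidence = false;
    bool counts_as_terminal_auth_failure = false;
};

using SignalMappings = std::map<std::string, SignalMapping>;

// Defaults copied from the sample config in this README.
SignalMappings default_signal_mappings() {
    return {
        {"ssh_failed_password", {true, true}},
        {"ssh_invalid_user", {true, true}},
        {"ssh_failed_publickey", {true, true}},
        {"pam_auth_failure", {true, false}},  // lower-confidence by default
    };
}

// Assumption: an event kind with no mapping contributes to neither signal.
SignalMapping lookup_signal(const SignalMappings& mappings, const std::string& event_kind) {
    const auto it = mappings.find(event_kind);
    return it == mappings.end() ? SignalMapping{} : it->second;
}
```

A config that sets `counts_as_terminal_auth_failure` to true for `pam_auth_failure` would correspond to overwriting that one entry in the table before detection runs.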

Timestamp handling is explicit:

  • --mode syslog or input_mode: syslog_legacy requires --year or timestamp.assume_year
  • --mode journalctl-short-full or input_mode: journalctl_short_full parses the embedded year and timezone and ignores assume_year
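The year requirement exists because traditional syslog timestamps such as `Mar 10 08:11:22` carry no year at all. The sketch below shows one way to combine such a timestamp with an assumed year to get seconds since the Unix epoch. It is an illustration only: the function names are hypothetical, the timestamp is interpreted as UTC (an assumption), and the day is not checked against the month's real length.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <optional>
#include <string>

// Days since 1970-01-01 for a proleptic Gregorian date
// (the well-known days_from_civil algorithm by Howard Hinnant).
std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Parses "Mon DD HH:MM:SS" and returns epoch seconds, or nullopt when
// the text does not match. `assume_year` plays the role of --year.
std::optional<std::int64_t> parse_syslog_timestamp(const std::string& text, int assume_year) {
    static const std::array<const char*, 12> months = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    char mon[4] = {0};
    int day = 0, hour = 0, minute = 0, second = 0;
    if (std::sscanf(text.c_str(), "%3s %d %d:%d:%d", mon, &day, &hour, &minute, &second) != 5) {
        return std::nullopt;
    }
    unsigned month = 0;
    for (unsigned i = 0; i < months.size(); ++i) {
        if (std::strcmp(mon, months[i]) == 0) month = i + 1;
    }
    if (month == 0 || day < 1 || day > 31 || hour < 0 || hour > 23 ||
        minute < 0 || minute > 59 || second < 0 || second > 59) {
        return std::nullopt;
    }
    const std::int64_t days = days_from_civil(assume_year, month, static_cast<unsigned>(day));
    return days * 86400 + hour * 3600 + minute * 60 + second;
}
```

The same text yields different instants for different years, which is why guessing the year silently would make window-based detections unreliable across a year boundary.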

Example Input

Mar 10 08:11:22 example-host sshd[1234]: Failed password for invalid user admin from 203.0.113.10 port 51022 ssh2
Mar 10 08:12:10 example-host sshd[1235]: Accepted password for alice from 203.0.113.20 port 51111 ssh2
Mar 10 08:15:00 example-host sudo:    alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/systemctl restart ssh
Mar 10 08:27:10 example-host sshd[1243]: Failed publickey for invalid user svc-backup from 203.0.113.40 port 51240 ssh2
Mar 10 08:28:33 example-host pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=203.0.113.41  user=alice
Mar 10 08:29:50 example-host pam_unix(sudo:session): session opened for user root by alice(uid=0)
Mar 10 08:30:12 example-host sshd[1244]: Connection closed by authenticating user alice 203.0.113.50 port 51290 [preauth]
Mar 10 08:31:18 example-host sshd[1245]: Timeout, client not responding from 203.0.113.51 port 51291
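Lines like the first and fourth examples above follow a regular shape, so a single pattern can pull out the method, username, source IP, and port. The following is a sketch using `std::regex`, not the parser LogLens ships; the `SshFailure` struct and the function name are hypothetical.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical normalized form of a failed SSH authentication line.
struct SshFailure {
    std::string method;  // "password" or "publickey"
    bool invalid_user = false;
    std::string username;
    std::string source_ip;
    int port = 0;
};

// Searches anywhere in the line, so the syslog prefix can stay attached.
// Returns nullopt for lines that are not a "Failed ..." sshd message.
std::optional<SshFailure> parse_ssh_failure(const std::string& line) {
    static const std::regex pattern(
        R"(Failed (password|publickey) for (invalid user )?(\S+) from (\S+) port (\d{1,5}))");
    std::smatch m;
    if (!std::regex_search(line, m, pattern)) return std::nullopt;

    SshFailure f;
    f.method = m[1].str();
    f.invalid_user = m[2].matched;  // optional group: present only for unknown accounts
    f.username = m[3].str();
    f.source_ip = m[4].str();
    f.port = std::stoi(m[5].str());
    return f;
}
```

Lines that do not match, such as the `Connection closed` and `Timeout` examples, return no value here; in LogLens terms those are the lines that would end up in parser telemetry rather than in detector input.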

journalctl --output short-full style example:

Tue 2026-03-10 08:11:22 UTC example-host sshd[2234]: Failed password for invalid user admin from 203.0.113.10 port 51022 ssh2
Tue 2026-03-10 08:13:10 UTC example-host sshd[2236]: Failed password for test from 203.0.113.10 port 51040 ssh2
Tue 2026-03-10 08:18:05 UTC example-host sshd[2238]: Failed publickey for invalid user deploy from 203.0.113.10 port 51060 ssh2
Tue 2026-03-10 08:31:18 UTC example-host sshd[2245]: Connection closed by authenticating user alice 203.0.113.51 port 51291 [preauth]
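Each journalctl-style line above carries a timezone token after the time. This README states that UTC, GMT, Z, and numeric offsets are supported, while arbitrary abbreviations are not. A minimal sketch of that rule is shown below. The function name is hypothetical, and accepting both `+HHMM` and `+HH:MM` as the numeric forms is an assumption about what "numeric offsets" covers.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Maps a timezone token to its offset from UTC in minutes.
// Returns nullopt for anything outside the supported set (e.g. "PST").
std::optional<int> utc_offset_minutes(const std::string& tz) {
    if (tz == "UTC" || tz == "GMT" || tz == "Z") return 0;
    if (tz.size() < 5 || (tz[0] != '+' && tz[0] != '-')) return std::nullopt;

    std::string digits;
    for (std::size_t i = 1; i < tz.size(); ++i) {
        const char c = tz[i];
        if (c == ':' && i == 3) continue;  // allow the "+05:30" spelling
        if (c < '0' || c > '9') return std::nullopt;
        digits.push_back(c);
    }
    if (digits.size() != 4) return std::nullopt;

    const int hours = (digits[0] - '0') * 10 + (digits[1] - '0');
    const int minutes = (digits[2] - '0') * 10 + (digits[3] - '0');
    if (hours > 23 || minutes > 59) return std::nullopt;

    const int total = hours * 60 + minutes;
    return tz[0] == '-' ? -total : total;
}
```

Rejecting unknown abbreviations outright, instead of guessing, matches the project's stated preference for visible parsing limits over silent failure.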

Known Limitations

  • syslog_legacy requires an explicit year; LogLens does not guess one implicitly.
  • journalctl_short_full currently supports UTC, GMT, Z, and numeric timezone offsets, not arbitrary timezone abbreviations.
  • Parser coverage is still selective: it covers common sshd, sudo, pam_unix, and selected pam_faillock / pam_sss variants rather than broad Linux auth-family support.
  • Unsupported lines are surfaced as parser telemetry and warnings, not as detector findings.
  • pam_unix auth failures remain lower-confidence by default unless signal mappings explicitly upgrade them.
  • Detector configuration uses a fixed config.json schema rather than partial overrides or alternate config formats.
  • Findings are rule-based triage aids, not incident verdicts or attribution.

Future Roadmap

  • Additional auth patterns and PAM coverage
  • Larger sanitized test corpus