Suhxs-Reddy/README.md

Hi, I'm Suhas

Applied AI engineer. MS Data Science at Arizona State. Tempe, AZ.


let the data lead.


sdonthi4@asu.edu  ·  LinkedIn  ·  GitHub


 




01.  About

The data is always dirtier than you think. The deployment is always weirder than you imagined. Generic models miss the obvious because nobody told them what was obvious.

I work at the seam between AI and the world it has to live in — where 90 traffic cameras include eleven 320×240 relics from the early 2000s, where pipeline incident data from 1986 changes where you'd put a gigawatt plant in 2026, where the privacy policy nobody reads is also the one selling your location.

The fun part isn't tuning the model. It's noticing what the model can't see.




02.  Featured work


'26   CATI  

Context-Aware Traffic Intelligence · Singapore · solo build · live

A traffic detector that adapts to weather, time, camera, and air quality in real time. FiLM layers (Perez et al., 2018) modulate YOLOv11's backbone — one model for all 90 of Singapore's LTA cameras, instead of 90 separate ones. About 130K extra parameters on 9.4M. Live on HuggingFace Spaces, collecting toward a Kaggle dataset publication.

90

live LTA cameras

1.4%

parameter overhead

80K+

records on HF Spaces

2018

FiLM, applied here first

PyTorch  ·  YOLOv11  ·  FiLM  ·  BoT-SORT  ·  data.gov.sg  ·  Docker  ·  HuggingFace Spaces

Read the full story →

Inspiration & approach

Singapore runs ninety traffic cameras around the clock. Some are 1080p. Eleven are 320×240 hardware from the early 2000s. They look out over monsoon storms, midnight glare, peak-hour chaos, and the dead silence at 4am. And every single frame — every camera, every condition — gets piped through the same generic YOLO that treats them all identically. Which is exactly when vehicles get missed.

I kept staring at this and thinking: the model already has access to everything it needs to do better. The camera knows which camera it is. The system has live weather. The clock is right there. The PM2.5 air-quality reading is one public API call away. So why are a clear afternoon highway and a rain-soaked midnight feed processed the same way?

The answer turned out to live in a 2018 paper — FiLM (Feature-wise Linear Modulation, Perez et al.). Tiny modules that take an external signal and use it to scale and shift the network's internal features. Three reasons it was the right tool:

  • Identity at initialization. With γ=1 and β=0, day zero is exactly vanilla YOLO, and the model only specializes where doing so lowers the loss. Nothing regresses at the start.
  • Tiny. ~130K parameters on YOLO's 9.4M. A 1.4% overhead with negligible inference cost.
  • No per-camera retraining. One model, with a dial for context. Not ninety separate models. Not an ensemble.
The full story · how it works · the math

Picture the vision network as layers of pattern detectors stacked on top of each other. With FiLM, I dial each detector's volume up or down based on the current weather, time, camera, and air quality. In monsoon rain, the detectors looking for crisp edges get turned down. The ones looking for blurry motion blobs get turned up. At 2am on Camera #47, the network behaves differently than at 2pm on Camera #12. The model figures out the dialing on its own — I just feed it the current state of the world.

The FiLM transform itself is two affine parameters per channel:

feature_out = γ(context) ⊙ feature_in + β(context)
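A minimal NumPy sketch of the transform above, assuming a single (C, H, W) feature map and one γ and one β per channel. The function name `film` and the toy shapes are illustrative assumptions, not taken from the CATI code.

```python
import numpy as np

def film(features: np.ndarray, gamma: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Apply a per-channel scale and shift to a (C, H, W) feature map."""
    # Broadcast the (C,) parameters across both spatial dimensions.
    return gamma[:, None, None] * features + beta[:, None, None]

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 8))  # toy feature map: 4 channels, 8x8

# Identity initialization: gamma = 1 and beta = 0 leave the features untouched.
identity_out = film(features, np.ones(4), np.zeros(4))

# A context-dependent setting: damp channel 0, boost channel 1, shift channel 3.
gamma = np.array([0.5, 1.5, 1.0, 1.0])
beta = np.array([0.0, 0.0, 0.0, 0.2])
modulated = film(features, gamma, beta)
```

With γ=1 and β=0 the output equals the input, which is why a FiLM-wrapped network can start out behaving like the unmodified detector.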

A tiny encoder (MLP) processes weather code, temperature, sin/cos hour-of-day, a 16-dim camera ID embedding (learned per camera), resolution, and PM2.5, then predicts γ and β for the P3/P4/P5 stages of YOLOv11's backbone. Each of the 90 cameras gets its own learned embedding that captures viewpoint and quirks.
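The context signals listed above could be packed into one flat vector roughly as follows. This is a sketch under assumptions: the function names, the field order, and every scaling constant are hypothetical, and the weather code is passed through as a plain number where a real encoder would more likely embed or one-hot it.

```python
import math

def encode_hour(hour: float) -> tuple[float, float]:
    """Map an hour in [0, 24) onto the unit circle so 23:00 and 00:00 are neighbors."""
    angle = 2.0 * math.pi * (hour % 24.0) / 24.0
    return math.sin(angle), math.cos(angle)

def build_context(weather_code, temperature_c, hour, pm25, width, height, camera_embedding):
    """Concatenate the context signals into one flat vector for an MLP.

    All scaling constants below are illustrative assumptions.
    """
    sin_h, cos_h = encode_hour(hour)
    scalars = [
        float(weather_code),                # categorical in practice; kept numeric here
        temperature_c / 40.0,               # rough scale for tropical temperatures
        sin_h,
        cos_h,
        (width * height) / (1920 * 1080),   # resolution relative to 1080p
        pm25 / 100.0,
    ]
    return scalars + list(camera_embedding)

ctx = build_context(
    weather_code=2, temperature_c=29.5, hour=23.5, pm25=18.0,
    width=320, height=240, camera_embedding=[0.0] * 16,
)
```

The sin/cos pair is what keeps late night and early morning close together in feature space; a raw hour value would put 23:30 and 00:30 at opposite ends of the range.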

%%{init: {'theme':'dark', 'themeVariables': {'primaryColor':'#22d3ee','primaryBorderColor':'#22d3ee','lineColor':'#64ffda','primaryTextColor':'#e5e7eb','background':'#0d1117'}}}%%
flowchart LR
    CTX[live context<br/>weather · time · camera · PM2.5] --> ENC[MLP encoder<br/>γ, β]
    IMG[camera frame] --> BB[YOLOv11 backbone]
    BB --> F1[FiLM P3]
    BB --> F2[FiLM P4]
    BB --> F3[FiLM P5]
    ENC --> F1 & F2 & F3
    F1 & F2 & F3 --> H[detection head]
    H --> OUT[vehicles · plates · tracks]

The system is live on HuggingFace Spaces, polling all 90 cameras every 60 seconds. 80,000+ records and growing. The dataset is headed to Kaggle for open traffic research.



'26   COLLIDE    

AI siting for gigawatt data centers · ASU Energy Hackathon 2026 · solo build

Where do you build a 100 MW behind-the-meter gas plant? Three ML models — Random Forest, GPU Gaussian KDE, and GMM — score land, gas, and power independently, then argue through TOPSIS. A 7-node LangGraph agent on top of Claude answers what-ifs in plain English. Monte Carlo gives P10/P50/P90 over 20 years. Ten public data sources, refreshing every five minutes. What takes consulting firms months happens in seconds.

50–500

MW per site

3 → 7

year grid wait, dodged

10K

Monte Carlo scenarios

5 min

live market refresh

FastAPI  ·  LangGraph  ·  Claude  ·  DuckDB  ·  Leaflet  ·  SHAP  ·  Random Forest  ·  KDE  ·  GMM  ·  Monte Carlo

Read the full story →

Inspiration & approach

Every AI hyperscaler needs power — gigawatts of it, twenty-four seven. Connecting a new data center to the grid takes three to seven years. Your competitors aren't waiting. So you build your own natural-gas plant on-site, behind the meter. The catch is picking where, and the catch is brutal: it's a three-body problem. Land has its own constraints (zoning, fiber, water, flood). Gas supply has its own (pipeline reliability, hub distance, curtailment risk). Electricity markets have their own (LMP spreads, price regimes, scarcity events). Real consultants charge six figures and take months because they evaluate these axes one at a time.

I wanted the three to argue with each other in real time instead of sequentially. A great parcel with bad gas economics shouldn't beat a decent parcel with stellar gas economics, but spreadsheets can't tell you that. So I gave each axis its own ML model — picked deliberately, not by default:

  • Land → Random Forest. Parcel features are tabular and noisy. RFs handle that gracefully and give per-site SHAP attributions, so I can explain why a site scored where it did.
  • Gas → GPU Gaussian KDE. Pipeline incidents cluster spatially. KDE reads that distribution out cleanly without needing labels, and PHMSA's incident database is public and severity-tagged.
  • Power → Random Forest + GMM. The market itself has distinct regimes — normal, wind-curtailment, scarcity. A 3-component GMM finds them in unlabeled ERCOT data. The RF then predicts spread durability conditioned on the regime.
  • Composite → TOPSIS. A multi-criteria decision method from 1981 that's still the cleanest way to combine apples and oranges with user-adjustable weights.
  • Agent → 7-node LangGraph + Claude. Five intents — stress-test, compare, timing, explanation, config — let a human drive the analysis without writing code. Routes through parse_intent → tool node → synthesize and streams back as SSE tokens.
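To make the composite step concrete, here is a small pure-Python TOPSIS sketch. The two sites and their scores are made-up illustrations, and the function signature is an assumption rather than COLLIDE's actual interface.

```python
import math

def topsis(matrix, weights, benefit):
    """Score alternatives with TOPSIS.

    matrix:  rows are alternatives (sites), columns are criteria scores
    weights: one non-negative weight per criterion
    benefit: True where higher is better, False where lower is better
    Returns a closeness score in [0, 1] per alternative (higher is better).
    """
    n_criteria = len(weights)
    total_w = sum(weights)
    # 1. Vector-normalize each column, then apply the normalized weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_criteria)]
    weighted = [
        [weights[j] / total_w * row[j] / norms[j] for j in range(n_criteria)]
        for row in matrix
    ]
    # 2. Ideal and anti-ideal points, one value per criterion.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    # 3. Closeness = distance to worst / (distance to ideal + distance to worst).
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, worst)
        denom = d_best + d_worst
        scores.append(d_worst / denom if denom else 0.5)
    return scores

# Hypothetical sites scored on (land, gas, power), all "higher is better".
sites = {
    "great parcel, bad gas": [0.95, 0.20, 0.60],
    "decent parcel, stellar gas": [0.65, 0.95, 0.60],
}
scores = topsis(list(sites.values()), weights=[1, 1, 1], benefit=[True, True, True])
ranking = sorted(zip(sites, scores), key=lambda pair: pair[1], reverse=True)
```

With equal weights, the decent parcel with stellar gas ranks first here, because its gas advantage is larger than the other site's land advantage once both columns are normalized. Changing `weights` is the user-adjustable part.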
The full story · how it works · agent intents

Click any coordinate across Texas, New Mexico, or Arizona — ERCOT or WECC market — and the system pulls live market data, looks up the parcel, scores all three axes, blends them through TOPSIS, and runs 10,000 Monte Carlo simulations to give you P10/P50/P90 cost ranges over the next 20 years. A LangGraph agent on top of Claude reads all of that — plus 72-hour LMP forecasts from the Moirai foundation model — and answers your follow-up questions in plain English. "What if gas prices spike 30%?" re-runs the analysis on live ERCOT and CAISO feeds.
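The P10/P50/P90 idea can be sketched with the standard library alone. Everything specific here is an assumption for illustration: the lognormal random-walk price model, the base price, the volatility, and the function names. The real cost model is not described in this README.

```python
import random
import statistics

def simulate_cost(rng, years=20, base_gas_price=3.0, volatility=0.25, fuel_per_year=1.0):
    """One scenario: total fuel cost over `years` with a random-walk gas price."""
    price, total = base_gas_price, 0.0
    for _ in range(years):
        # A lognormal yearly shock keeps the price positive.
        price *= rng.lognormvariate(0.0, volatility)
        total += price * fuel_per_year
    return total

def percentile_bands(n_scenarios=10_000, seed=0, **kwargs):
    """Run the scenarios and read off the 10th, 50th, and 90th percentiles."""
    rng = random.Random(seed)
    costs = [simulate_cost(rng, **kwargs) for _ in range(n_scenarios)]
    # quantiles(n=10) returns the 9 cut points at 10%, 20%, ..., 90%.
    deciles = statistics.quantiles(costs, n=10)
    return {"P10": deciles[0], "P50": deciles[4], "P90": deciles[8]}

bands = percentile_bands()
stressed = percentile_bands(base_gas_price=3.0 * 1.3)  # "What if gas prices spike 30%?"
```

Reusing the same seed for the stressed run means both runs see identical shocks, so the difference between the two sets of bands comes only from the changed input.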

Wired to ten public data sources — PHMSA pipeline incidents, ERCOT and CAISO LMP, EIA gas prices, NOAA weather, PERM-A federal land ownership, FCC fiber maps, FEMA flood zones, GridStatus, and live Tavily web enrichment — refreshing every five minutes through APScheduler background jobs. Pandera schema validation gates every row; failures get quarantined, never silently dropped.
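The quarantine-instead-of-drop pattern looks roughly like this. It is a plain-Python stand-in for the idea and deliberately does not use Pandera's API; the check names and the sample rows are hypothetical.

```python
def validate_rows(rows, checks):
    """Split rows into (valid, quarantined); quarantined rows keep their failure reasons."""
    valid, quarantined = [], []
    for row in rows:
        failures = [name for name, check in checks.items() if not check(row)]
        if failures:
            quarantined.append({"row": row, "failed_checks": failures})
        else:
            valid.append(row)
    return valid, quarantined

def _is_number(value):
    return isinstance(value, (int, float))

# Hypothetical checks for an LMP price feed.
checks = {
    "has_timestamp": lambda r: bool(r.get("timestamp")),
    "lmp_is_number": lambda r: _is_number(r.get("lmp")),
    "lmp_in_range": lambda r: _is_number(r.get("lmp")) and -1000 <= r["lmp"] <= 10000,
}
rows = [
    {"timestamp": "2026-01-01T00:00:00Z", "lmp": 42.5},
    {"timestamp": "", "lmp": 38.0},
    {"timestamp": "2026-01-01T00:10:00Z", "lmp": "n/a"},
]
valid, quarantined = validate_rows(rows, checks)
```

Every input row ends up in exactly one of the two lists, so a bad feed shows up as a growing quarantine instead of a silently shrinking dataset.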

| Intent | Tools called | Trigger |
|---|---|---|
| stress_test | evaluate_site, run_monte_carlo | "What if gas spikes 40%?" |
| compare | compare_sites, evaluate_site | "Compare my pinned sites" |
| timing | get_lmp_forecast, get_news_digest | "When should I build?" |
| explanation | SHAP from active scorecard | "Why is land score low?" |
| config | extract config JSON (Sonnet) | "Set gas weight to 50%" |
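The intent-to-tool mapping above can be sketched as a plain dispatch table. This is not LangGraph code: the keyword rules are a hypothetical stand-in for the LLM-based intent parser, and the fallback intent is an assumption. Only the intent names and tool names come from the table.

```python
INTENT_TOOLS = {
    "stress_test": ["evaluate_site", "run_monte_carlo"],
    "compare": ["compare_sites", "evaluate_site"],
    "timing": ["get_lmp_forecast", "get_news_digest"],
    "explanation": [],  # reads SHAP values from the active scorecard
    "config": [],       # extracts a config JSON from the message
}

# Hypothetical keyword rules standing in for the LLM-based intent parser.
KEYWORDS = [
    ("stress_test", ("what if", "spike")),
    ("compare", ("compare",)),
    ("timing", ("when",)),
    ("explanation", ("why",)),
    ("config", ("set ", "weight")),
]

def parse_intent(message: str) -> str:
    """Return the first intent whose keywords appear in the message."""
    text = message.lower()
    for intent, words in KEYWORDS:
        if any(word in text for word in words):
            return intent
    return "explanation"  # assumed fallback when nothing matches

def route(message: str) -> dict:
    """Pair the parsed intent with the tools the table lists for it."""
    intent = parse_intent(message)
    return {"intent": intent, "tools": INTENT_TOOLS[intent]}

result = route("What if gas spikes 40%?")
```

Keeping the mapping in one table makes it easy to see which tools a question can reach, whatever sits in front of it to classify the message.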
%%{init: {'theme':'dark', 'themeVariables': {'primaryColor':'#fb923c','primaryBorderColor':'#fb923c','lineColor':'#64ffda','primaryTextColor':'#e5e7eb','background':'#0d1117'}}}%%
flowchart LR
    SRC[10 public sources<br/>PHMSA · ERCOT · CAISO · EIA · GridStatus<br/>NOAA · PERM-A · FCC · FEMA · Tavily] --> ETL[DuckDB · Parquet · Pandera]
    ETL --> M1[Land<br/>Random Forest]
    ETL --> M2[Gas<br/>GPU KDE]
    ETL --> M3[Power<br/>RF + GMM]
    M1 & M2 & M3 --> TOP[TOPSIS]
    TOP --> NPV[Monte Carlo · 10K]
    NPV --> AGT[LangGraph · Claude]
    AGT --> UI[map · scorecard]

Background jobs: GridStatus, Waha hub gas prices, and the market-regime update every 5 min; Tavily news every 30 min; Moirai forecast every hour. Live ERCOT LMP also pushes via WebSocket at /ws/lmp/stream.




03.  Other things I've built


'25   DataGuard  

I let an LLM read the privacy policy for you. Llama 3.1 parses any site's policy, breach-checks the domain on Have I Been Pwned, and generates one-click GDPR/CCPA opt-outs with pre-filled emails and follow-up calendar reminders.

TypeScript  ·  Chrome MV3  ·  Llama 3.1

Read more →


'25   AI Study Buddy  

A RAG tutor that knows when to shut up. Retrieves only from your real Canvas docs, with a low-relevance threshold that flags out-of-syllabus questions instead of inventing answers.
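A relevance gate of this kind can be sketched in a few lines. The toy vectors, the 0.75 threshold, and the function names are all assumptions for illustration; the project's real embedding and retrieval stack is not shown here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_or_flag(query_vec, docs, threshold=0.75):
    """Return the best-matching doc, or flag the question as out of syllabus."""
    best_name, best_score = None, -1.0
    for name, vec in docs.items():
        score = cosine(query_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        # Nothing in the course material is close enough: refuse instead of guessing.
        return {"status": "out_of_syllabus", "score": best_score}
    return {"status": "answer", "source": best_name, "score": best_score}

# Toy 3-dimensional "embeddings" for two course documents.
docs = {"week3_notes": [1.0, 0.0, 0.0], "syllabus": [0.0, 1.0, 0.0]}
on_topic = retrieve_or_flag([0.9, 0.1, 0.0], docs)
off_topic = retrieve_or_flag([0.0, 0.0, 1.0], docs)
```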

React  ·  FastAPI  ·  Claude  ·  Supermemory

Read more →


'24   Walmart Demand Forecast  

Store-level weekly demand for 45 stores. Lagged/rolling features, holiday flags, chronological splits. The boring forecasting that keeps shelves stocked.
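A sketch of lagged and rolling features with a chronological split, for one store's weekly series. The sales numbers, lag choices, window size, and function names are illustrative assumptions.

```python
def add_lag_features(sales, lags=(1, 2), window=4):
    """Build rows of (lag features, rolling mean, target) from one store's weekly sales."""
    rows = []
    start = max(max(lags), window)
    for t in range(start, len(sales)):
        row = {f"lag_{k}": sales[t - k] for k in lags}
        # The rolling mean uses only weeks strictly before t, so the target never leaks in.
        row["rolling_mean"] = sum(sales[t - window:t]) / window
        row["week"] = t
        row["target"] = sales[t]
        rows.append(row)
    return rows

def chronological_split(rows, train_fraction=0.8):
    """Train on the earliest weeks, test on the latest; no shuffling."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

weekly_sales = [100, 120, 90, 110, 130, 125, 140, 135, 150, 160, 155, 170, 165, 180]
rows = add_lag_features(weekly_sales)
train, test = chronological_split(rows)
```

Splitting by time rather than at random is what keeps the evaluation honest: every test week comes after every training week, as it would in production.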

Python  ·  pandas  ·  time-series

Read more →




04.  Currently

Reading.   Recently finished Liu Cixin's trilogy — The Three-Body Problem, The Dark Forest, Death's End. Currently reading Why Machines Learn (Anil Ananthaswamy) and Absolute Martian Manhunter (Deniz Camp / Javier Rodríguez) — through #10, #11 (April 22) next. On deck: Darwin's Radio (Greg Bear), Annihilation (Jeff VanderMeer).

Building.   CATI is live on HuggingFace Spaces and collecting from 90 cameras every 60 seconds. I'm currently fine-tuning, plugging in the missing pieces, and waiting for the dataset to reach a representative distribution before running real traffic analysis on top. COLLIDE is shipped and parked; I'm not touching it for a while. The Research Success Data Assistant role at ASU College of Health Solutions is the day job.

Thinking about.   The fact that 90% of useful ML signals are public datasets nobody bothered to wire up. The next CATI-shaped project. How to make agents that fail gracefully instead of hallucinating confidently.




05.  Stack

ML / Data.   Python · PyTorch · scikit-learn · Pandas · NumPy · DuckDB · Parquet · HuggingFace · YOLO

Storage.   SQL · MongoDB · AWS (S3, Athena)

Visualization.   Tableau




06.  The path here

2026 →   Research Success Data Assistant — ASU College of Health Solutions

2025 → 27   MS Data Science — Arizona State University

2025   Data Science Intern — GlobalLogic

2024   SWE Intern — GlobalLogic

2021 → 25   B.Tech Computer Science — MIT Manipal




07.  Get in touch

The fastest way is email. I'm always happy to talk applied AI, computer vision, energy infrastructure, or any project where the messy real world is part of the problem.





Popular repositories

  1. hackaroundtheworld2 (Public)

    HTML

  2. prediction (Public)

    Jupyter Notebook

  3. Fast (Public)

    Python

  4. ChatBot (Public)

    Python

  5. DShospitalproject (Public)

    DS project for Unsupervised learning using Hospital Data

    Jupyter Notebook

  6. CBC-Hackathon (Public)

    Forked from vrupak/CBC-Hackathon

    A project for CBC Hackathon ASU

    TypeScript