ML/LLM Engineer — retrieval, LLM serving, and open-source library internals
Previously 5.5 years on the search team at 42Maru — Korean hybrid retrieval (BM25 + dense, learning-to-rank, hard negatives), MRC (machine reading comprehension), RAG, and open-source LLM fine-tuning.
I work on retrieval and LLM systems where real data, much of it Korean, exposes quiet failures deep in the stack: embedding losses, RoPE caches, continuous batching. Tracing those to their source is where my open-source work comes from.
The main testbed is search_system — a Korean insurance-clause retrieval lab with nori BM25 + BGE-M3 hybrid retrieval, real-query failure cases, analyzer probes, and production-style traces. The pattern is usually small, but it matters in production:
Data that is valid on one side of a representation boundary silently breaks the other — NFD Hangul vs. the analyzer, stop strings vs. byte-fragment tokens, a literal </tool_call> vs. the tool-call parser, bf16 logits vs. a float32 loss. Korean hits these boundaries constantly; English-only test suites never do.
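The stop-string boundary can be illustrated with a minimal Python sketch. It assumes a byte-level tokenizer that splits one Hangul syllable across two tokens; the split point below is hypothetical and chosen by hand, not produced by a real tokenizer, and the helper names are illustrative only.

```python
# Hypothetical byte-level split of the Korean stop string "끝" (3 UTF-8 bytes).
stop_string = "끝"
raw = stop_string.encode("utf-8")
fragments = [raw[:2], raw[2:]]  # assumed token boundary inside the character


def decode_each(tokens):
    """Decode every token on its own, as a naive per-token matcher would."""
    return [t.decode("utf-8", errors="replace") for t in tokens]


def decode_joined(tokens):
    """Join the raw bytes first, then decode once."""
    return b"".join(tokens).decode("utf-8", errors="replace")


per_token_text = "".join(decode_each(fragments))
joined_text = decode_joined(fragments)

print(stop_string in per_token_text)  # False: each fragment decodes to U+FFFD
print(stop_string in joined_text)     # True: the bytes only make sense together
```

Neither fragment is valid UTF-8 on its own, so any matcher that compares decoded per-token text against the stop string cannot see it.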
Recent fixes have landed upstream in sentence-transformers, transformers, Elasticsearch, MLflow, and LlamaIndex: embedding-loss correctness, dynamic RoPE cache resets, continuous-batching output snapshots, nori analyzer docs, MLflow logging, and CJK text-splitter recursion.
Retrieval training and embedding losses
- sentence-transformers #3800 — bf16/fp16 training crash across six learning-to-rank losses. (merged)
- sentence-transformers #3817 — on multi-GPU gather_across_devices, gathered positives in GISTEmbedLoss/CachedGISTEmbedLoss were masked as false negatives, so the cross-entropy target collapsed to -inf and the training signal silently vanished on rank > 0. Surfaced with a Korean polarity probe; it also covered a regression the earlier in-batch-negative fix (#3453) had left in the GIST losses. (merged)
- sentence-transformers #3816 — avoid materializing the full non-FAISS hard-negative mining similarity matrix. (merged)
- sentence-transformers #3812 — MPS support for cached-loss RandContext. (merged)
- sentence-transformers #3821 — hard-negative mining's relative-margin threshold was sign-dependent and inverted on negative positive-scores; made it sign-independent (#3819). (merged)
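The sign dependence in the last item can be sketched in a few lines of plain Python. This is an illustration only, not the library's code: both function names are hypothetical, and the shifted formula is one assumed way to make a relative margin sign-independent.

```python
def max_negative_score_scaled(positive_score: float, relative_margin: float) -> float:
    """Sign-dependent form: scale the positive score by (1 - margin)."""
    return positive_score * (1.0 - relative_margin)


def max_negative_score_shifted(positive_score: float, relative_margin: float) -> float:
    """Sign-independent form (assumed): subtract a margin proportional to |score|."""
    return positive_score - relative_margin * abs(positive_score)


margin = 0.25
for pos in (0.8, -0.8):
    scaled = max_negative_score_scaled(pos, margin)
    shifted = max_negative_score_shifted(pos, margin)
    print(f"pos={pos:+.2f} scaled={scaled:+.2f} shifted={shifted:+.2f}")
```

For a positive score the two ceilings agree. For a negative positive-score, scaling moves the ceiling above the positive itself, so candidates that outscore the positive would pass the filter, while the shifted form still sits below it.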
LLM serving and model internals
- huggingface/transformers #46530 — StopStringCriteria misses CJK stop strings on byte-level tokenizers (#46519). (merged)
- huggingface/transformers #46624 — dynamic RoPE never reset inv_freq on the layer_type=None path (it wrote max_seq_len_cached to a stray None_… attribute), so a long sequence followed by a short one kept the scaled frequencies. (merged)
- huggingface/transformers #46670 — continuous batching's output conversion mutated the active request state and returned live aliases of the growing token/logprob buffers; made it a snapshot. (merged)
- run-llama/llama_index #21900 — RecursionError in text splitters when a single CJK/emoji token exceeds chunk_size. (merged)
- huggingface/transformers #46643 — TopHLogitsWarper was built without min_tokens_to_keep, so with peaked logits and beam sampling top-h could keep a single token while the other warpers kept the beam-safe minimum. (open)
- vllm-project/vllm #45168 — Hermes tool parser drops tool calls when a literal </tool_call> appears inside a JSON string argument (#45167). (open)
- NAVER hcx-vllm-plugin #5 — reported the same parser-boundary bug class for literal <|im_end|> inside JSON string arguments. (open issue)
- vllm-project/vllm #45162 — collect_env.py aborted with an AssertionError on non-Linux platforms. (open)
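The parser-boundary bug class in the tool-call items can be reproduced with the standard library alone. The sketch below is not vLLM's parser and not the proposed fix; the sample output, the tool name write_file, and both function names are hypothetical. It only contrasts a tag-delimited regex with letting the JSON decoder decide where the object ends.

```python
import json
import re

OPEN_TAG = "<tool_call>"

# Hypothetical model output: the JSON string argument contains a literal closing tag.
output = (
    '<tool_call>{"name": "write_file", '
    '"arguments": {"text": "end with </tool_call> please"}}</tool_call>'
)


def parse_with_regex(text):
    """Naive parse: the non-greedy match stops at the first closing tag."""
    calls = []
    for body in re.findall(r"<tool_call>(.*?)</tool_call>", text, flags=re.DOTALL):
        try:
            calls.append(json.loads(body))
        except json.JSONDecodeError:
            pass  # truncated JSON: the tool call is silently dropped
    return calls


def parse_with_json_decoder(text):
    """Let the JSON decoder find where each object ends."""
    decoder = json.JSONDecoder()
    calls, pos = [], 0
    while (start := text.find(OPEN_TAG, pos)) != -1:
        body_start = start + len(OPEN_TAG)
        # raw_decode does not skip leading whitespace, so do it here.
        while body_start < len(text) and text[body_start].isspace():
            body_start += 1
        try:
            obj, end = decoder.raw_decode(text, body_start)
        except json.JSONDecodeError:
            pos = body_start
            continue
        calls.append(obj)
        pos = end
    return calls


print(parse_with_regex(output))         # []
print(len(parse_with_json_decoder(output)))  # 1
```

The regex cuts the JSON body at the tag inside the string, so the call fails to parse and disappears; the decoder-driven version treats the tag as ordinary string content.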
Search analyzers and query normalization
- elastic/elasticsearch #151157 — documented that nori's default XPN stop tag silently deletes meaning-bearing Korean prefixes, so 비급여 (non-covered) analyzes to 급여 (covered), from issue #151094. (merged)
- apache/lucene #16242 — new HangulCompositionCharFilter for analysis-nori: NFD-form Hangul was silently unanalyzable as Korean (#16241). (open)
- elastic/elasticsearch #151008 — wildcard queries: re-escape operator characters produced by the normalizer. (open)
- explosion/spaCy #13974 — Korean tokenizer collapsed whitespace runs, breaking doc.text round-trips and offsets. (open)
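The NFD boundary behind the Lucene item is visible at the string level with Python's unicodedata module. This sketch does not reproduce HangulCompositionCharFilter; it only shows that the same word has two code-point sequences, and that NFC composition before analysis is one assumed way to reconcile them.

```python
import unicodedata

word_nfc = "비급여"                                 # precomposed syllables
word_nfd = unicodedata.normalize("NFD", word_nfc)  # conjoining jamo

print(len(word_nfc), len(word_nfd))                        # 3 7
print(word_nfc == word_nfd)                                # False
print(unicodedata.normalize("NFC", word_nfd) == word_nfc)  # True
```

The two forms render identically, which is why a mismatch between them tends to surface only as a silent retrieval miss.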
Production tooling, tracing, and vector search
- facebookresearch/faiss #5272 — diagnosed that musllinux wheels were dropped during the move to official PyPI wheels (*-musllinux_* remained in the cibuildwheel skip list) and outlined the restore path; upstream shipped the fix in faiss-cpu 1.14.3 via #5299. (resolved upstream)
- mlflow #23957 — restored dataset expectation/tag logging in genai.evaluate(scorers=[]). (merged)
- mlflow #23818 — OpenTelemetry retriever-span reassembly on ingest. (open)
- ragas #2759 — make VertexAI imports optional so import ragas does not fail without Vertex dependencies. (open)
- BentoML #5632 / #5633 — proxy-client configurability and monitoring-log span metadata. (open)
Closed-source enterprise systems I worked on at 42Maru, with the research and engineering teams: Korean search quality, semantic QA, retrieval behavior, and OCR/NLP pipelines for real customer workflows.
- AI ship-sales design-support system — Daewoo Shipbuilding (DSME): semantic QA over ~100K historical records for shipowners' pre-contract technical inquiries. (press)
- AML / trade-based transaction detection — Hana Bank: OCR-NLP over cross-border remittance invoices. (press)
Government-published Korean NLP artifacts from 42Maru projects I worked on: five AI Hub releases across news MRC, national-archives LLM instruction data, finance/legal MRC, numeric reasoning MRC, and table QA. ~2.3M labeled QA pairs plus a ~300M-token corpus.
news MRC · national-archives LLM corpus · finance/legal MRC · numeric-reasoning MRC · table QA
- search_system — Korean clause retrieval lab: nori BM25 + BGE-M3 hybrid retrieval, analyzer probes, real-query failures, and traces that feed the upstream work above.
- Selected upstream workspaces — sentence-transformers, transformers, lucene, elasticsearch, vllm: short-lived branches for submitted fixes and repros.
- Domain probes — population-baseline-risk and insurance-bias-probe: focused artifacts around insurance-domain behavior and model/system bias.


