HTML Syntax Highlight

The idea is to use a regex pattern for tokenization and deterministic tagging. A classifier (LSTM etc.) can then fill in the tags on the ambiguous tokens.
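
As a rough illustration of the first stage, here is a minimal sketch of regex tokenization with deterministic tagging. The pattern `TOKEN_RE`, the table `KEYWORD_TAGS`, and the function `tokenize_and_tag` are hypothetical names made up for this example, not the project's actual code. Tokens that cannot be tagged by rule are left as `uk`, which is where a classifier would take over.

```python
import re

# Hypothetical token pattern: each named group is one coarse token kind.
TOKEN_RE = re.compile(
    r"(?P<nl>\n)"
    r"|(?P<ws>[ \t]+)"
    r"|(?P<nu>\d+(?:\.\d+)?)"
    r"|(?P<word>[A-Za-z_]\w*)"
    r"|(?P<other>.)"
)

# Hypothetical deterministic lookup; a real table would differ per language.
KEYWORD_TAGS = {"if": "kwfl", "for": "kwfl", "return": "kwfl", "import": "kwim"}


def tokenize_and_tag(code: str) -> list[tuple[str, str]]:
    """Split code into (token, tag) pairs; ambiguous tokens get 'uk'."""
    pairs = []
    for match in TOKEN_RE.finditer(code):
        kind, text = match.lastgroup, match.group()
        if kind in ("nl", "ws", "nu"):
            tag = kind  # the group name is already the final tag
        elif kind == "word":
            tag = KEYWORD_TAGS.get(text, "uk")  # known keyword, else ambiguous
        else:
            tag = "uk"  # left for a later rule or the classifier
        pairs.append((text, tag))
    return pairs
```

Every input character ends up in exactly one token, so joining the token texts reproduces the original code.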

Classes

We try to define a set of token classes that should work across most languages.

Keywords

kwfl: flow keyword. if, for, return, try, except
kwop: operator keyword. Used like an operator. in, is, select, new, echo
kwmo: modifier keyword. pub, private, static, final, volatile
kwde: declaration keyword. Declares a variable, class, or function
kwim: import keyword. import, from, #include (?), use

Syntax features

id: indentation. space/tab at beginning of line
ws: whitespace. space, tab
nl: new-line.
brop: opening brackets
brcl: closing brackets
sy: syntax features. :, ::, ->, =>, >>>, also <> in types
pu: punctuation.
co: comments (inline/multiline/single line)
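
The difference between `id` and `ws` is purely positional: the same run of spaces or tabs is indentation at the start of a line and plain whitespace elsewhere. A minimal sketch of that rule follows; `CHUNK_RE` and `tag_whitespace` are hypothetical names, and everything that is not whitespace is simply marked `uk` here.

```python
import re

# Runs of spaces/tabs, single newlines, or runs of anything else.
CHUNK_RE = re.compile(r"[ \t]+|\n|[^ \t\n]+")


def tag_whitespace(code: str) -> list[tuple[str, str]]:
    """Tag whitespace chunks as id, ws or nl; all other chunks get 'uk'."""
    pairs = []
    at_line_start = True
    for match in CHUNK_RE.finditer(code):
        text = match.group()
        if text == "\n":
            pairs.append((text, "nl"))
            at_line_start = True
            continue
        if text[0] in " \t":
            # Same characters, different tag depending on position in the line.
            pairs.append((text, "id" if at_line_start else "ws"))
        else:
            pairs.append((text, "uk"))
        at_line_start = False
    return pairs
```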

Literal values

nu: number. dec, int, scientific, hex, bin, percent.
st: string.
bo: boolean literals.
li: other literal. null, None, undefined, built in constant values
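
Number literals are a good candidate for deterministic tagging. The pattern below is a sketch that covers the forms listed for `nu` (integer, decimal, scientific, hex, binary, percent). It is an assumption about what a reasonable pattern looks like, not the project's actual regex, and it ignores language-specific details such as digit separators or type suffixes.

```python
import re

# Hypothetical pattern for the 'nu' class.
NUMBER_RE = re.compile(
    r"""
    0[xX][0-9a-fA-F]+            # hex
    | 0[bB][01]+                 # binary
    | (?:\d+\.?\d*|\.\d+)        # integer or decimal
      (?:[eE][+-]?\d+)?          # optional exponent (scientific)
      %?                         # optional percent sign
    """,
    re.VERBOSE,
)


def is_number(token: str) -> bool:
    """Return True if the whole token looks like a number literal."""
    return NUMBER_RE.fullmatch(token) is not None
```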

Operators

opbi: binary operator. Other binary operators
opun: unary operator. &ref, !not, X', x++, --x
opas: assignment operators. =, <-, +=
opmo: modifier operators. references, pointers etc

Objects and functions

pa: parameter. a variable defined together with a function.
ty: type keyword. int, f64, void
tyco: type keyword constructor.
cl: class. Non-primitive defined type, also traits.
clco: class constructor. class name used as a function
mo: module/namespace.
fnme: method. A function on an object instance
fnas: associated/static method/function. On module or class
fnfr: standalone function.
fnto: function tear-off.
an: annotation. @Override, #[ allow() ], @property, rust lifetimes
va: variable or similar user defined identifier.
at: attribute. a variable/constant on some object or module.
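
Most of the classes in this group are ambiguous at the token level: the same identifier can be a variable, a function, or an attribute depending on its neighbours. The sketch below shows the kind of context a classifier has to pick up, written as a hand-made heuristic. `guess_identifier_tag` is a hypothetical helper for illustration only; it is not the LSTM tagger and it only looks at the tokens directly before and after the identifier.

```python
def guess_identifier_tag(tokens: list[str], i: int) -> str:
    """Guess a tag for the identifier at tokens[i] from its direct neighbours."""
    prev_tok = tokens[i - 1] if i > 0 else ""
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
    after_dot = prev_tok == "."   # accessed on some object or module
    is_called = next_tok == "("   # used like a function
    if is_called:
        return "fnme" if after_dot else "fnfr"
    return "at" if after_dot else "va"


tokens = ["obj", ".", "size", "(", ")"]
print(guess_identifier_tag(tokens, 0))  # obj
print(guess_identifier_tag(tokens, 2))  # size
```

A rule like this cannot tell `fnme` from `fnas`, or `fnfr` from `clco`, since that depends on what the name refers to. Those cases are the ones left to the classifier.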

Other

uk: unknown.
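
Once every token has a tag, producing highlighted HTML is straightforward. The sketch below assumes that the tag names above are used directly as CSS class names on `<span>` elements; that mapping, and the function name `render_html`, are assumptions made for this example.

```python
from html import escape


def render_html(pairs: list[tuple[str, str]]) -> str:
    """Wrap each (token, tag) pair in a span whose CSS class is the tag."""
    spans = "".join(
        f'<span class="{tag}">{escape(text)}</span>' for text, tag in pairs
    )
    return f"<pre><code>{spans}</code></pre>"


print(render_html([("return", "kwfl"), (" ", "ws"), ("a<b", "va")]))
```

Escaping the token text matters because source code routinely contains `<`, `>` and `&`.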

Roadmap

✅ LSTM Tagger 24-12-07
✅ Render HTML preview 25-01-19
✅ NDJSON dataset 25-08-30
✅ Cleanup labels, linting 25-09-03
✅ Optuna, settle for a good LSTM model 25-09-20
❓ Balance dataset split criterion?
❓ Lightweight inference program.
❓ Reset indentation: avoid unnecessary indentation of all lines
❓ RNN variant comparison
❓ Feature based classifier
❓ data augmentation
❓ token LM
❓ character level LM -> "end to end" model
❓ try to catch code fragments in text?
❓ language classifier?
❓ highlighting inside strings?

MarcusSalzer/html_highlight

Folders and files

Latest commit

History

Repository files navigation

Html syntax Highlight

Classes

Keywords

Syntax features

Literal values

Operators

Objects and functions

Other

Roadmap

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages