Skip to content

Make filename-base filetype detection independent of working directory#4112

Open
questionmarkexclamationpoint wants to merge 2 commits into
micro-editor:masterfrom
questionmarkexclamationpoint:consistent-filepath
Open

Make filename-base filetype detection independent of working directory#4112
questionmarkexclamationpoint wants to merge 2 commits into
micro-editor:masterfrom
questionmarkexclamationpoint:consistent-filepath

Conversation

@questionmarkexclamationpoint
Copy link
Copy Markdown

@questionmarkexclamationpoint questionmarkexclamationpoint commented Jun 2, 2026

At the moment, filetype detection based on the filename is rather brittle: for regexes that rely on the path, they rely on the relative path. If the filename regex is, for instance, /privoxy\/config$/, syntax highlighting only works if your relative path includes the parent directory right before the file. So, micro ../privoxy/config, micro privoxy/config, etc. work, but micro config (where config lives in a folder named privoxy), micro privoxy/./config (convoluted but u get my drift), etc. do not.

Currently this issue affects two filetypes in particular, whose regex relies completely on the path: privoxy-config, and, specifically, weechat config files (type ini). A number of filetypes make use of path-based heuristics, however, and those heuristics break down depending upon where you happen to be when you open the file.

I've updated Header.MatchFileName (now MatchFilePath) to take the absolute path to the file (updating callers), to run the check against that, and only return true if a match exists that extends into the filename.

Note that this will both add and remove categories of matches:

  • Previously, if you supplied an absolute path to micro, and your regex matched on the path but not the file, Micro would consider this a match. For instance, /some-file/ matches on /this/is/not/some-file/but/itstillmatches. This seems like unintended behavior to me, but who knows. This wacky behavior can still be achieved by using /some-file.*/, of course.
  • Again, as outlined above, opening privoxy/config from within the privoxy folder (micro config) would not match, and now it does. IMO this seems like the originally intended behavior, and allows for more particular regexes.

I wound up finding this behavior while writing a syntax highlighter for Kitty config files, since I was using a regex of /kitty\/.*\.conf/ to not be overly broad, while still including every .conf file within that folder. I'm about to open a PR with that addition pending this change if that's chill : )

In any case, please let me know what you think! Very happy user of micro for many years : )

(p.s. I've never written Go before 👀 it's probably fine)

@JoeKar
Copy link
Copy Markdown
Member

JoeKar commented Jun 2, 2026

But basically all we need is to exchange b.Path with b.AbsPath in UpdateRules(), right?
Don't ask me why we rely on the relative path for so long when we allow path patterns. 🤔

@questionmarkexclamationpoint
Copy link
Copy Markdown
Author

questionmarkexclamationpoint commented Jun 2, 2026

But basically all we need is to exchange b.Path with b.AbsPath in UpdateRules(), right? Don't ask me why we rely on the relative path for so long when we allow path patterns. 🤔

imma be honest i wanted to make the minimum change given it's a codebase i looked at for the first time yesterday, written in a language i used for the first time yesterday, and changing one parameter and the contract of a function called in two places seemed pretty low risk.

in any case, the change specifically to the MatchFileName/MatchFilesPath is warranted regardless i think, since it avoids the poteeeeential for matches on the file's ancestor dorectories without maching the file itself, which seems undesireable

lemme see what u mean about that function tho

ty for getting back so fast homie

- Since the regex is compared against the absolute path, start of string (`^`) replaced with `(^|/|\\)`
- Replaced exclusively unix style paths in a couple places with unix or dos
- Updated filename detection based purely on the directory to extend into the filename (`.+`) so that our new matcher succeeds
- Fixed a random issue with apache confs allowing matching any file whose path included apache.d (with or without infinite backslashes)
@questionmarkexclamationpoint
Copy link
Copy Markdown
Author

Ok sorry I thought you were requesting a change and not asking for a clarification(?) and then i got distracted.

Went to triple double check that I wouldn't be breaking any regexes, and yep, i was. Caught a few other minor issues, too.


detect:
filename: "httpd\\.conf|mime\\.types|vhosts\\.d\\\\*|\\.htaccess"
filename: "httpd\\.conf|mime\\.types|vhosts\\.d(/|\\\\).*|\\.htaccess"
Copy link
Copy Markdown
Author

@questionmarkexclamationpoint questionmarkexclamationpoint Jun 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vhosts\\.d\\\\* previously would've matched:

  • vhosts.d (as in a file named vhosts.d)
  • vhosts.d\
  • vhosts.deez
  • vhosts.d\eez
  • vhosts.d\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\eez

With the addition of .* onto the end of the path, the matched substring of this regex extends into the name of the file, so it passes the new function definition.

Comment thread runtime/syntax/groff.yaml

detect:
filename: "\\.m[ems]$|\\.rof|\\.tmac$|^tmac."
filename: "\\.m[ems]$|\\.rof|\\.tmac$|(^|/|\\\\)tmac."
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This situation covers most of the issues addressed in this commit: start of string. Previously this regex also would've failed if you opened dir/.tmac. I hadn't considered start-of-string based matching, so, there's a few more previously-failing-now-fixed regexes with this patch.

Comment thread runtime/syntax/ini.yaml

detect:
filename: "\\.(ini|desktop|lfl|override|tscn|tres)$|(mimeapps\\.list|pinforc|setup\\.cfg|project\\.godot)$|weechat/.+\\.conf$"
filename: "\\.(ini|desktop|lfl|override|tscn|tres)$|(mimeapps\\.list|pinforc|setup\\.cfg|project\\.godot)$|weechat(/|\\\\).+\\.conf$"
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated: fixing for DOS users.


detect:
filename: "(nftables\\.(conf|rules)$|nftables(\\.rules)?\\.d/)"
filename: "(nftables\\.(conf|rules)$|nftables(\\.rules)?\\.d(/|\\\\).+)"
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, note the need for the .+ (.* also fine) so that it picks up part of the filename.

Comment thread runtime/syntax/vi.yaml

detect:
filename: "(^|/|\\.)(ex|vim)rc$|\\.vim"
filename: "(^|/|\\\\)\\.?(ex|vim)rc$|\\.vim"
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what the intention here was, but with this change this no longer matches dir/something.vimrc etc. I don't know if people name their vimrcs that way. I use micro.

@questionmarkexclamationpoint
Copy link
Copy Markdown
Author

But also: MatchFilePath is only called from that one path, so in that regard I believe I am gucci.

I itch to rename filename in the syntax files to filepath for correctness, but alas, backwards compatibility. In any case, it never truly was matching on the filename alone.

Comment thread pkg/highlight/parser.go
Comment on lines +157 to +159
// MatchFilePath will check the given file's absolute path with the stored regex;
// Only matches which include at least part of the filename are considered.
func (header *Header) MatchFilePath(absoluteFilePath string) bool {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really convinced, that all of this is really needed.
Using header.MatchFileName(b.AbsPath) to check against header.FileNameRegex seems feasible combined with your syntax files changes, unless proven wrong.
Did I oversee something?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants