Remove RFC 2047 encoding from Content-Disposition filename #36328
tobifasc wants to merge 1 commit into spring-projects:main
```java
}
return PRINTABLE.get(b);
private static String toIso88591(String input) {
    return new String(input.getBytes(StandardCharsets.ISO_8859_1));
```
This would lead to question marks ("?") in the proposed filename, and "?" is not allowed in filenames on some filesystems (notably on Windows).
I would rather propose an underscore in place of non-ASCII characters.
Suggestion:
```java
private static String toIso88591(String input) {
    // NFD decomposition splits characters like ä into base character 'a' + combining diacritic.
    // Removing the combining diacritics (the Combining Diacritical Marks block, U+0300..U+036F)
    // gives a readable ASCII approximation (e.g. "Schöne Äpfel" → "Schone Apfel") without
    // resorting to '?', which is a forbidden filename character on Windows.
    String decomposed = java.text.Normalizer.normalize(input, java.text.Normalizer.Form.NFD);
    return decomposed
            .replaceAll("\\p{InCombiningDiacriticalMarks}", "")
            // Anything still outside printable ASCII (0x20-0x7E) becomes '_'.
            .replaceAll("[^\\x20-\\x7E]", "_");
}
```
Fair point about question marks in filenames. I changed the replacement character to "_".
Regarding the removal of combining diacritics:
Since this targets ISO-8859-1, characters like ä, ö, û, ñ, ... are supported and shouldn't cause issues.
I wanted to "fail fast" by replacing unavailable characters instead of trying to approximate them. Does that sound reasonable, or would you still prefer stripping the diacritical marks from characters that are not in ISO-8859-1?
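To make the trade-off concrete, here is a minimal, self-contained sketch contrasting the two strategies discussed above. `asciiApproximation` follows the reviewer's suggestion (NFD, strip combining marks, replace anything outside printable ASCII with `_`), while `iso88591FailFast` is a hypothetical helper that keeps every character ISO-8859-1 can represent and replaces the rest with `_`. Both names are illustrative only and are not part of Spring's `ContentDisposition` API.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class FallbackFilenameDemo {

    // Reviewer's approach: approximate accented characters with their base letter,
    // then replace anything outside printable ASCII (0x20-0x7E) with '_'.
    static String asciiApproximation(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed
                .replaceAll("\\p{InCombiningDiacriticalMarks}", "")
                .replaceAll("[^\\x20-\\x7E]", "_");
    }

    // "Fail fast" approach (hypothetical helper): keep every character that ISO-8859-1
    // can encode and replace all others with '_' instead of approximating them.
    static String iso88591FailFast(String input) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder sb = new StringBuilder(input.length());
        // Iterate by code point so a surrogate pair (e.g. an emoji) becomes a single '_'.
        input.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            sb.append(encoder.canEncode(ch) ? ch : "_");
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Schöne Äpfel € ž.txt": '€' and 'ž' are not in ISO-8859-1, 'ö' and 'Ä' are.
        String name = "Sch\u00f6ne \u00c4pfel \u20ac \u017e.txt";
        System.out.println(asciiApproximation(name)); // Schone Apfel _ z.txt
        System.out.println(iso88591FailFast(name));   // returns "Schöne Äpfel _ _.txt"
    }
}
```

The ASCII approximation keeps more characters readable for non-Latin-1 letters like `ž`, at the cost of also rewriting `ö` and `Ä`, which ISO-8859-1 could carry unchanged.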
Updates the Content-Disposition header creation logic to use only ISO-8859-1 characters for the fallback 'filename' parameter instead of RFC 2047 encoded strings. Non-compatible characters are replaced with '_'. This does not remove the ability to parse RFC 2047 encoded filenames.

Signed-off-by: Tobias Fasching <tobias.fasching@outlook.com>
Appendix C.1 of RFC 6266 and Section 5 of RFC 2047 state that an "encoded-word" (as defined in RFC 2047) must not be used as a parameter value in a `Content-Disposition` header.
The current implementation in `ContentDisposition` does, however, encode the fallback `filename` parameter using the mechanism described in RFC 2047 (given that the charset is set to something other than `US_ASCII`).

Related discussion: #29861
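For context, an RFC 2047 "encoded-word" wraps the charset name and the encoded bytes as `=?charset?Q?encoded-text?=`. The sketch below is a hypothetical, deliberately simplified "Q" encoder (`qEncode` is an illustrative name, not Spring's implementation): it passes ASCII letters, digits, `.` and `-` through, writes spaces as `_`, and writes every other byte as `=` plus two uppercase hex digits. A user agent that does not decode encoded-words inside parameters may show this raw string as the filename.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedWordDemo {

    // Simplified RFC 2047 "Q" encoding (hypothetical helper, for illustration only).
    static String qEncode(String text, Charset charset) {
        StringBuilder sb = new StringBuilder("=?").append(charset.name()).append("?Q?");
        for (byte b : text.getBytes(charset)) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v == ' ') {
                sb.append('_'); // RFC 2047, Section 4.2: a space may be written as '_'
            } else if ((v >= 'A' && v <= 'Z') || (v >= 'a' && v <= 'z')
                    || (v >= '0' && v <= '9') || v == '.' || v == '-') {
                sb.append((char) v); // safe printable ASCII passes through unchanged
            } else {
                sb.append(String.format("=%02X", v)); // '=' followed by two uppercase hex digits
            }
        }
        return sb.append("?=").toString();
    }

    public static void main(String[] args) {
        String name = "Sch\u00f6ne \u00c4pfel.txt"; // "Schöne Äpfel.txt"
        // Prints: attachment; filename="=?UTF-8?Q?Sch=C3=B6ne_=C3=84pfel.txt?="
        System.out.println("attachment; filename=\"" + qEncode(name, StandardCharsets.UTF_8) + "\"");
    }
}
```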
This PR updates the `Content-Disposition` header creation logic to use only ISO-8859-1 characters for the fallback `filename` parameter instead. Non-compatible characters are replaced with `_`. The "full" filename is still present in the `filename*` parameter.

This does not remove the ability to parse RFC 2047 encoded headers.
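The resulting header shape can be sketched as follows. This is a hypothetical, self-contained approximation of the behavior described above, not the actual `ContentDisposition` code: `fallback` keeps ISO-8859-1 characters (U+0000..U+00FF), replaces the rest with `_`, and backslash-escapes `"` and `\` so the value is a valid quoted-string; `rfc5987` builds the `filename*` value by percent-encoding the UTF-8 bytes of every character outside the RFC 5987 `attr-char` set. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class DispositionHeaderDemo {

    // Hypothetical fallback: keep ISO-8859-1 characters, replace others with '_',
    // and backslash-escape the double quote and backslash for the quoted-string.
    static String fallback(String filename) {
        StringBuilder sb = new StringBuilder();
        filename.codePoints().forEach(cp -> {
            if (cp == '"' || cp == '\\') {
                sb.append('\\').append((char) cp);
            } else if (cp <= 0xFF) {
                sb.append((char) cp); // representable in ISO-8859-1
            } else {
                sb.append('_');
            }
        });
        return sb.toString();
    }

    // RFC 5987 ext-value: charset, empty language tag, then percent-encoded UTF-8 bytes.
    static String rfc5987(String filename) {
        StringBuilder sb = new StringBuilder("UTF-8''");
        for (byte b : filename.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean attrChar = (v >= 'A' && v <= 'Z') || (v >= 'a' && v <= 'z')
                    || (v >= '0' && v <= '9') || "!#$&+-.^_`|~".indexOf(v) >= 0;
            if (attrChar) {
                sb.append((char) v);
            } else {
                sb.append('%').append(String.format("%02X", v));
            }
        }
        return sb.toString();
    }

    static String header(String filename) {
        return "attachment; filename=\"" + fallback(filename) + "\"; filename*=" + rfc5987(filename);
    }

    public static void main(String[] args) {
        // "Schöne Äpfel €.txt"; header(...) returns:
        // attachment; filename="Schöne Äpfel _.txt"; filename*=UTF-8''Sch%C3%B6ne%20%C3%84pfel%20%E2%82%AC.txt
        System.out.println(header("Sch\u00f6ne \u00c4pfel \u20ac.txt"));
    }
}
```

A recipient that understands `filename*` uses the full name; older clients fall back to `filename`, which now contains plain ISO-8859-1 text instead of a literal encoded-word.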