author: Daniel Schadt <kingdread@gmx.de> 2021-06-30 14:31:59 +0200
committer: Daniel Schadt <kingdread@gmx.de> 2021-06-30 14:31:59 +0200
commit: a087209121285d924456ecdd850e84b33f80726f
tree: 39c6243077fdbde90d5197c9727a709bbeaf93ef /README.md
parent: 04292d18c8ddd394acd683662caa9f848eced9d3
CGI: Properly relay UTF-8 encoded text

The culprit was using "pack" to go from the String (which we get from readProcess) to the ByteString (which we need to have in the response). Since "pack" truncates every character to its lowest 8 bits instead of encoding it, we ended up with a minor case of mojibake.

There are two possible solutions to this issue:

1) Properly encode the text from String to ByteString using utf8-string's fromString function, or
2) Read a ByteString from the process by using process-extras.

Solution 1 works, but it has the side effect of assuming the process's output encoding (based on the locale). Furthermore, we cannot even be sure that the script doesn't send a completely different encoding via the charset header field. Therefore, this would introduce 2 points at which character encoding mismatches could happen, which is not something we're looking forward to.

Solution 2 has the benefit of passing the data through basically unchanged. We don't need to convert the ByteString to a textual string ourselves, so no decoding step can go wrong on our side.

The only place where we now have UTF-8 conversion is when giving the request URI to the CGI script, but

1) URIs should be ASCII & percent encoded anyway, so there's no big chance of failure, and
2) we are now using UTF8.fromString properly.
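
To illustrate the difference described above, here is a minimal sketch rather than the actual Cana code. It assumes that the "pack" in question is Data.ByteString.Char8.pack, that the utf8-string and process-extras packages are installed, and that a POSIX "cat" is on the PATH. The names "sample" and "encoded" are hypothetical and exist only for this example.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.UTF8 as UTF8
import System.Process.ByteString (readProcessWithExitCode)

-- Hypothetical sample text: the euro sign, U+20AC (decimal 8364).
sample :: String
sample = "\8364"

main :: IO ()
main = do
  -- Char8.pack keeps only the low 8 bits of each Char, so U+20AC
  -- collapses to the single byte 0xAC.
  print (B.unpack (BC.pack sample))      -- [172]

  -- Solution 1: utf8-string encodes the String as proper UTF-8.
  let encoded = UTF8.fromString sample
  print (B.unpack encoded)               -- [226,130,172]

  -- Solution 2: process-extras hands back the raw bytes written by the
  -- child process, so nothing is decoded or re-encoded on our side.
  -- "cat" stands in for a CGI script here (an assumption of this sketch).
  (_exitCode, out, _err) <- readProcessWithExitCode "cat" [] encoded
  print (B.unpack out)                   -- [226,130,172]
```

The last step shows why solution 2 is attractive: the bytes produced by the child process reach the response untouched, whatever encoding the script chose.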
Diffstat (limited to 'README.md')
0 files changed, 0 insertions, 0 deletions