Cana - Project Gemini server written in Haskell


author	Daniel Schadt <kingdread@gmx.de>	2021-06-30 14:31:59 +0200
committer	Daniel Schadt <kingdread@gmx.de>	2021-06-30 14:31:59 +0200
commit	a087209121285d924456ecdd850e84b33f80726f (patch)
tree	39c6243077fdbde90d5197c9727a709bbeaf93ef /app
parent	04292d18c8ddd394acd683662caa9f848eced9d3 (diff)

CGI: Properly relay UTF-8 encoded text

The culprit was using "pack" to go from the String (which we get from readProcess) to the ByteString (which we need to have in the response). Since "pack" (presumably the Char8 variant) keeps only the low 8 bits of each character and throws away the rest of the code point, non-ASCII text came out wrong and we ended up with a minor case of mojibake.

There are two possible solutions to this issue:

1) Properly encode the text from String to ByteString using utf8-string's function, or
2) Read a ByteString from the process by using process-extras.

Solution 1 works, but it has the side effect of assuming the process's output encoding (based on the locale). Furthermore, we cannot even be sure that the script doesn't send a completely different encoding via the charset header field. Therefore, this would introduce 2 points at which character encoding mismatches could happen, which is not something we're looking forward to.

Solution 2 has the benefit of just passing the data through basically unchanged. We don't need to convert the ByteString to a textual string ourselves, so that is a big benefit.

The only place where we now have UTF-8 conversion is when giving the request URI to the CGI script, but

1) URIs should be ASCII and percent-encoded anyway, so there's no big chance of failure, and
2) we are now using UTF8.fromString properly.
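To illustrate the difference between the two conversions, here is a minimal sketch. It assumes that "pack" refers to Data.ByteString.Char8.pack and that the bytestring and utf8-string packages are available; the name `sample` is made up for the example and does not come from the Cana source.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.UTF8 as UTF8 -- from the utf8-string package

-- Hypothetical sample input: a single euro sign, code point U+20AC.
sample :: String
sample = "\x20AC"

main :: IO ()
main = do
  -- Char8.pack keeps only the low 8 bits of each Char,
  -- so U+20AC collapses to the single byte 0xAC.
  print (BS.unpack (BC.pack sample))
  -- UTF8.fromString encodes the code point as UTF-8: E2 82 AC.
  print (BS.unpack (UTF8.fromString sample))
```

The first line printed is `[172]` and the second is `[226,130,172]`. Only the second is valid UTF-8 for the euro sign. Reading a ByteString straight from the process (solution 2) avoids the question entirely, because the bytes are never decoded into a String.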

Diffstat (limited to 'app')

0 files changed, 0 insertions, 0 deletions

