|
The culprit was using "pack" to go from the String (which we get from
readProcess) to the ByteString (which we need to have in the response).
Since "pack" truncates every Char to its low 8 bits instead of encoding
it, non-ASCII characters were mangled and we ended up with a minor case
of mojibake.
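The truncation can be reproduced with a minimal sketch like the one
below. It assumes the "pack" in question is Data.ByteString.Char8.pack;
the helper names (packedBytes, eAcute, euro) are made up for
illustration and are not part of the actual change.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- Bytes produced by Char8.pack, which keeps only the low 8 bits of
-- each Char and performs no real character encoding.
packedBytes :: String -> [Word8]
packedBytes = B.unpack . BC.pack

-- U+00E9 (e with acute accent): pack yields the single byte 0xE9,
-- whereas the UTF-8 encoding would be 0xC3 0xA9.
eAcute :: [Word8]
eAcute = packedBytes "\233"

-- U+20AC (euro sign): pack yields 0xAC, the low byte of 0x20AC,
-- whereas the UTF-8 encoding would be 0xE2 0x82 0xAC.
euro :: [Word8]
euro = packedBytes "\8364"
```

Loading this into GHCi and evaluating eAcute and euro shows one byte
per character in both cases, which a client expecting UTF-8 cannot
decode correctly.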
There are two possible solutions to this issue:
1) Properly encode the text from String to ByteString using
utf8-string's UTF8.fromString
or
2) Read a ByteString directly from the process by using process-extras
Solution 1 works, but it forces us to assume the encoding of the
process's output (based on the locale). Furthermore, we cannot even be
sure that the script doesn't declare a completely different encoding via
the charset parameter of its Content-Type header. This would therefore
introduce two points at which character encoding mismatches could
happen, which is something we would rather avoid.
Solution 2 has the benefit of passing the data through essentially
unchanged: we never need to convert the ByteString to a textual string
ourselves. The only place where we still do a UTF-8 conversion is when
giving the request URI to the CGI script, but
1) URIs should be ASCII and percent-encoded anyway, so there is little
chance of failure
and
2) we now use UTF8.fromString properly there
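
For comparison, the two solutions could look roughly like the sketch
below. This is not the actual code of this change: the function names
outputViaString and outputRaw are hypothetical, and using
readProcessWithExitCode from process-extras' System.Process.ByteString
is an assumption about which process-extras function is involved.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.UTF8 as UTF8     -- from utf8-string
import qualified System.Process as P              -- from process
import qualified System.Process.ByteString as PB  -- from process-extras

-- Solution 1 (hypothetical name): read the output as a String, which is
-- decoded according to the locale, then re-encode it as UTF-8.
outputViaString :: FilePath -> [String] -> IO ByteString
outputViaString cmd args = UTF8.fromString <$> P.readProcess cmd args ""

-- Solution 2 (hypothetical name): read the raw bytes and pass them on
-- without any decoding or re-encoding step.
outputRaw :: FilePath -> [String] -> IO ByteString
outputRaw cmd args = do
  (_exitCode, out, _err) <- PB.readProcessWithExitCode cmd args B.empty
  pure out
```

With outputRaw the bytes reach the response exactly as the script wrote
them, so whatever charset the script declares stays consistent with the
body. Note that this sketch ignores the exit code and stderr, which real
code would likely want to handle.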
|