Look at the problem: Normally, in the interactive Haskell environment, non-Latin Unicode characters (that make a part of the results) are printed escaped, even if the locale allows such characters (as opposed to direct output through putStrLn
, putChar
which looks fine and readable)--the examples show GHCi and Hugs98:
$ ghci
GHCi, version 7.0.1: http://www.haskell.org/ghc/ :? for help
Prelude> "hello: привет"
"hello: 108710881080107410771090"
Prelude> 'Я'
'1071'
Prelude> putStrLn "hello: привет"
hello: привет
Prelude> :q
Leaving GHCi.
$ hugs -98
__ __ __ __ ____ ___ _________________________________________
|| || || || || || ||__ Hugs 98: Based on the Haskell 98 standard
||___|| ||__|| ||__|| __|| Copyright (c) 1994-2005
||---|| ___|| World Wide Web: http://haskell.org/hugs
|| || Bugs: http://hackage.haskell.org/trac/hugs
|| || Version: September 2006 _________________________________________
Hugs mode: Restart with command line option +98 for Haskell 98 mode
Type :? for help
Hugs> "hello: привет"
"hello: 108710881080107410771090"
Hugs> 'Я'
'1071'
Hugs> putStrLn "hello: привет"
hello: привет
Hugs> :q
[Leaving Hugs]
$ locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
$
We can guess that it's because print
and show
are used to format the result, and these functions do their best to format the data in a canonical, maximally portable way -- so they prefer to escape the strange characters (perhaps, it's even spelled out in a standard for Haskell):
$ ghci
GHCi, version 7.0.1: http://www.haskell.org/ghc/ :? for help
Prelude> show 'Я'
"'\1071'"
Prelude> :q
Leaving GHCi.
$ hugs -98
Type :? for help
Hugs> show 'Я'
"'\1071'"
Hugs> :q
[Leaving Hugs]
$
But still it would be nice if we knew how to hack GHCi or Hugs to print these characters in the pretty human-readable way, i.e. directly, unescaped. This can be appreciated when using the interactive Haskell environment in educational purposes, for a tutorial/demonstration of Haskell in front of a non-English audience whom you want to show some Haskell on data in their human language.
Actually, it's not only useful for educational purposes but for debugging, as well! When you have functions that are defined on strings representing words of other languages, with non-ASCII characters. So, if the program is language-specific, and only words of another language make sense as the data, and you have functions that are defined only on such words, it's important for debugging in GHCi to see this data.
To sum up my question: What ways to hack the existing interactive Haskell environments for a friendlier printing of Unicode in the results are there? ("Friendlier" means even "simpler" in my case: I'd like print
in GHCi or Hugs to show non-Latin characters the simple direct way as done by putChar
, putStrLn
, i.e. unescaped.)
(Perhaps, besides GHCi and Hugs98, I'll also have a look at existing Emacs modes for interacting with Haskell to see if they can present the results in the pretty, unescaped fashion.)
See Question&Answers more detail:
os