The Mandarin


drewprops
2013-12-07, 20:06
No, not the guy from Iron Man... the language.

Some folks have axed me to put some Mandarin text on their web page and I'm wondering if there's anything special I need to do when using these non-Western characters.

You know, like turning in a circle three times or something.

I can look it up, but if you have anecdotes on this topic it would be appreciated.

:)

...

Brad
2013-12-07, 21:05
For plain old HTML, UTF-8 should be good enough, and chances are you're already using it! :) It's the default encoding for all Mac apps I've used that deal with text. Same for many Linux tools. I can't say what your typical Windows apps do.
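
Here's a rough sketch of the plain-HTML case, in Python only because it's handy for poking at bytes. The file name, the `build_page`/`write_page` helpers, and the sample phrase are all made up for illustration. The two things it tries to show: save the file as UTF-8, and say so in the page with a `<meta charset>` tag so the browser doesn't have to guess.

```python
from pathlib import Path
import tempfile

# Sample text, chosen only for illustration ("Hello, world").
MANDARIN = "你好，世界"


def build_page(text: str) -> str:
    """Return a minimal HTML5 page that declares UTF-8 in its <head>."""
    return (
        "<!DOCTYPE html>\n"
        '<html lang="zh">\n'
        "<head>\n"
        '  <meta charset="utf-8">\n'
        "  <title>Mandarin test</title>\n"
        "</head>\n"
        f"<body><p>{text}</p></body>\n"
        "</html>\n"
    )


def write_page(path: Path, text: str) -> None:
    """Write the page to disk, explicitly encoded as UTF-8."""
    path.write_text(build_page(text), encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical output location; any path would do.
    out = Path(tempfile.gettempdir()) / "mandarin_test.html"
    write_page(out, MANDARIN)
    print(f"wrote {out}")
```

Open the result in a browser; if the characters show up as Chinese rather than accented gibberish, the editor and the page agree on the encoding.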

If you're rolling your own PHP pages or using some PHP CMS, you may hit some pitfalls. PHP's string encoding strategy is all over the map and varies wildly between versions of PHP and different functions in the core library. Some things deal with raw bytes, some Latin-1, and some UTF-8. So, this is a place for potential investigation.
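
To make the "raw bytes vs. Latin-1 vs. UTF-8" confusion concrete, here's a small sketch. It's Python rather than PHP, so it only illustrates the general failure mode, not any particular PHP function; the helper names are invented for the example.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_latin1_misread(garbled: str) -> str:
    """Undo the misread above by reversing the two steps."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    text = "中文"
    raw = text.encode("utf-8")

    # Two characters, but six bytes: byte-oriented string functions
    # will report the larger number.
    print(len(text), len(raw))

    garbled = misread_as_latin1(text)
    # Six junk characters instead of two Chinese ones.
    print(len(garbled))

    # The damage is reversible only if nothing mangled the bytes in between.
    print(repair_latin1_misread(garbled) == text)
```

If a page shows a run of accented Western characters where the Mandarin should be, some layer is probably doing the equivalent of `misread_as_latin1`.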

Modern Python (3.x) strings are Unicode, and UTF-8 is the usual default when encoding them, but most websites aren't written in Python. Java strings are internally UTF-16 (IIRC), but there are enough layers of obfuscation and translation in Java apps that things tend to work out to UTF-8 on the screen on their own, and nobody writes (normal) websites in Java anyway. I don't have enough Ruby experience to cite examples there.
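
A quick sketch of the difference between "a string" and "its encoding", assuming Python 3. The `encoded_sizes` helper is invented for the example, and `utf-16-le` is used so that no byte-order mark gets counted.

```python
def encoded_sizes(text: str) -> dict:
    """Return the character count and the byte counts in two encodings."""
    return {
        "characters": len(text),
        "utf-8 bytes": len(text.encode("utf-8")),
        "utf-16 bytes": len(text.encode("utf-16-le")),
    }


if __name__ == "__main__":
    # "Putonghua", i.e. Mandarin; sample text chosen for illustration.
    for name, size in encoded_sizes("普通话").items():
        print(f"{name}: {size}")
```

For these three characters that works out to 9 bytes in UTF-8 and 6 in UTF-16. The text is the same either way; what matters is that every layer agrees on which encoding is on the wire.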

The only other really tricky point could be if you have text going into or coming out of a database. Current versions of Postgres and Mongo use UTF-8 by default. (edit: correction… Postgres picks its default encoding from your locale when the database cluster is initialized, but it often works out to UTF-8.) MySQL uses latin1 by default (grumble grumble). Third-party software installers may initialize the DB or tables to specific character sets on their own, too. So, there's almost certainly some hands-on investigation to be done there.
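
One way to do that hands-on investigation is a round-trip test: store a known Mandarin string, read it back, and compare. The sketch below uses Python's built-in `sqlite3` purely as a stand-in so it runs anywhere; it is not Postgres, Mongo, or MySQL, and the table name and helpers are hypothetical. The `fits_in_charset` check just shows why a latin1 column has no way to hold this text.

```python
import sqlite3


def fits_in_charset(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


def round_trips(conn: sqlite3.Connection, text: str) -> bool:
    """Store text, read it back, and report whether it came back unchanged."""
    conn.execute("CREATE TABLE IF NOT EXISTS greetings (body TEXT)")
    conn.execute("DELETE FROM greetings")
    conn.execute("INSERT INTO greetings (body) VALUES (?)", (text,))
    (stored,) = conn.execute("SELECT body FROM greetings").fetchone()
    return stored == text


if __name__ == "__main__":
    sample = "你好"
    print("fits in latin-1:", fits_in_charset(sample, "latin-1"))
    print("fits in utf-8:", fits_in_charset(sample, "utf-8"))

    conn = sqlite3.connect(":memory:")  # stand-in for the real database
    print("round trip ok:", round_trips(conn, sample))
    conn.close()
```

Against a real server you'd swap in that database's own driver and connection settings, and run the same compare through the actual web stack, since the connection's character set can differ from the table's.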