The Chinese language has Traditional and Simplified written variants. These used to be two different wikis but were merged at some point, so we will need to handle input in both variants.
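Fortunately, Traditional and Simplified characters map onto each other almost one-to-one, so we can convert between the two. A few Simplified characters correspond to more than one Traditional character, which makes Traditional to Simplified the less ambiguous direction. In practice this means that when reading and training we should normalize everything to a single variant. Furthermore, wikilabels should translate between the two variants during the hand-coding phase.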
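
A minimal sketch of the normalization step described above, assuming we settle on Simplified as the single variant. The mapping table here is a tiny hand-written sample for illustration only, and the function name `to_simplified` is hypothetical; a real implementation would load a complete conversion table rather than the five characters shown.

```python
# Illustrative sample only: a real table would cover thousands of characters.
# Each key is a Traditional character, each value its Simplified form.
_SAMPLE_TRAD_TO_SIMP = str.maketrans({
    "國": "国",
    "語": "语",
    "學": "学",
    "漢": "汉",
    "體": "体",
})


def to_simplified(text: str) -> str:
    """Map every known Traditional character to its Simplified form.

    Characters that are not in the table (including text that is already
    Simplified, Latin letters, and punctuation) pass through unchanged.
    """
    return text.translate(_SAMPLE_TRAD_TO_SIMP)


if __name__ == "__main__":
    print(to_simplified("漢語"))
```

Running the same function on every revision before feature extraction would mean the model only ever sees one variant, regardless of which one the editor typed.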
Chinese would also benefit from the n-gram approach we are working on, since a good share of the Chinese badwords (as they appear in the filter) consist of two or more characters.
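
To illustrate why n-grams help here, the following is a rough sketch of character n-gram matching against a badword list. It is not the approach being worked on, only one simple way it could look; the names `char_ngrams` and `find_badwords`, the `max_n` parameter, and the sample word are all assumptions made for the example.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def find_badwords(text: str, badwords: set[str], max_n: int = 3) -> list[tuple[int, str]]:
    """Return (offset, match) pairs for every n-gram found in badwords.

    Chinese text has no spaces between words, so instead of tokenizing
    we slide windows of 1..max_n characters over the text.
    """
    hits = []
    for n in range(1, max_n + 1):
        for offset, gram in enumerate(char_ngrams(text, n)):
            if gram in badwords:
                hits.append((offset, gram))
    return hits


if __name__ == "__main__":
    # Hypothetical one-entry list; the real entries would come from the filter.
    print(find_badwords("你是笨蛋", {"笨蛋"}))
```

With a single-character lookup neither character of the sample word would match on its own, while the two-character window finds it at offset 2.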