Problem
(copy and paste from https://www.mediawiki.org/wiki/Toolserver:Code_snippets#Fix_UTF-8_encoded_as_latin-1 )
Versions of MySQL before 5 did not have binary nor UTF-8 support (or at least nobody cared). A common workaround was to encode the text as a UTF-8 byte sequence and give that to MySQL which read the byte in as if they were latin-1 characters. This worked alright until MySQL and VARCHAR became more Unicode aware. Now depending on settings, it may convert the latin-1 bytes to Unicode code points then encode to your specified encoding. So è (U+00E8, UTF-8: c3 a8) becomes è (U+00C3 U+00A8, UTF-8: c3 83 c2 a8), the equivalent python transformation is u'è'.encode('utf-8').decode('latin').encode('utf-8'). This double encoding is the reason why JOINs will fail with the newer VARBINARY fields used everywhere by MediaWiki now.
This is quite complicated and annoying to work with. Source of bugs.
Who would benefit
Anyone who uses database access on Toollabs to do queries.
Proposed solution
Convert the remaining legacy field and tables so we don't have to do tricks like CONVERT( CONVERT( CONVERT( your_column USING latin1) USING binary) USING utf8); any more?