Special:Export reports the wrong length in the exported XML.
It seems to me that HTML entities cause this problem. For example, the export of the article "Vergleich (Zahlen)" from the German Wikipedia gives the following in the text element: <text xml:space="preserve" bytes="23353">
Including the HTML entities, the exported text is 25659 bytes long; after converting all HTML entities into their ASCII representations, the article text is 23253 bytes long.
I would prefer the attribute to give the length of the text as it appears within the XML, as that would make it easier to retrieve the content.
Description
Event Timeline
The export is done in UTF-8 with Unix newlines (only \n, not \r\n as on Windows).
The bytes attribute gives the actual number of bytes, not the number of characters in the document. German umlauts take 2 bytes each, for example.
From a quick check, the current byte count of the page matches the export (after decoding the entities, which is normal processing in XML).
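
To make that check reproducible, here is a minimal sketch in Python. It assumes the bytes attribute counts the UTF-8 bytes of the text after the XML parser has decoded the entities. The sample fragment and the helper name check_text_length are made up for illustration and are not taken from the real export.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical, shortened fragment in the shape of a Special:Export <text> element.
SAMPLE = '<text xml:space="preserve" bytes="14">Größe: a &lt; b</text>'


def check_text_length(xml_fragment):
    """Return (declared bytes, decoded UTF-8 bytes, escaped UTF-8 bytes)."""
    elem = ET.fromstring(xml_fragment)
    declared = int(elem.get("bytes"))

    # The parser has already turned &lt; back into "<" at this point.
    decoded = elem.text or ""
    decoded_len = len(decoded.encode("utf-8"))

    # Re-escaping approximates the length of the text as stored in the XML file.
    escaped_len = len(escape(decoded).encode("utf-8"))
    return declared, decoded_len, escaped_len


if __name__ == "__main__":
    declared, decoded_len, escaped_len = check_text_length(SAMPLE)
    print("declared:", declared)
    print("decoded UTF-8 bytes:", decoded_len)
    print("escaped UTF-8 bytes:", escaped_len)
    # An umlaut is one character but two bytes in UTF-8.
    print(len("ö"), len("ö".encode("utf-8")))
```

In this sample the declared value matches the decoded length, while the escaped length is larger because &lt; takes four bytes instead of one. Counting bytes in the raw file would therefore overshoot in the same way as described in the report.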