
IABot is counting string length in bytes rather than unicode symbols
Closed, Resolved · Public · Bug Report

Description

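On non-English wikis, pages that use pretty formatting (template parameters padded so the "=" signs line up vertically) come out misaligned: non-ASCII characters take more than one byte in UTF-8, so they throw off the string length count and the bot inserts the wrong number of spaces.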

See: https://ru.wikipedia.org/w/index.php?title=Соловецкий_камень_(Москва)&diff=124619756&oldid=124259457
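A minimal PHP sketch of the symptom, assuming a UTF-8 source file and the mbstring extension. The parameter names and values are made up for illustration; the width calculation mimics the strlen()/str_pad() pattern quoted in the comments below.

```php
<?php
// Two hypothetical parameter names: one ASCII, one Cyrillic.
$ascii = 'url';
$cyr   = 'название'; // 8 letters, but each takes 2 bytes in UTF-8

var_dump(strlen($cyr));             // int(16): counts bytes
var_dump(mb_strlen($cyr, 'UTF-8')); // int(8): counts characters

// Width computed in bytes, as strlen() does.
$width = max(strlen($ascii), strlen($cyr));

// str_pad() also pads to a byte length: "url" gets 13 spaces, the
// Cyrillic name gets none, so the "=" signs end up 8 columns apart.
echo ' |' . str_pad($ascii, $width) . " = a\n";
echo ' |' . str_pad($cyr, $width) . " = b\n";
```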

Event Timeline

Cyberpower678 triaged this task as Medium priority.
Cyberpower678 moved this task from Inbox to Backlog: Syntax on the InternetArchiveBot board.

Looks like the problem is in Core/generator.php: here, here and here. If I understand correctly, replacing in each case

$strlen = max( $strlen, strlen( $parameter ) );

by

$strlen = max( $strlen, mb_strlen( $parameter ) );

and

" |" . str_pad( $parameter, $strlen, " " ) . " = $value\n";

by something like

" |$parameter" . str_repeat( " ", $strlen - mb_strlen( $parameter ) ) . " = $value\n";

should solve the issue.
(It might also be a good idea to refactor this code into a separate function instead of copy-pasting it in three places.)
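A rough sketch of what such a shared function could look like, assuming the mbstring extension is available. The name formatAlignedParams() and its array-based signature are hypothetical and not taken from the IABot codebase; it measures names with mb_strlen() and pads with str_repeat() as suggested above.

```php
<?php
/**
 * Hypothetical helper: render template parameters with the "=" signs
 * vertically aligned, measuring names in characters (mb_strlen) rather
 * than bytes (strlen).
 *
 * @param array<string, string> $params parameter name => value
 */
function formatAlignedParams(array $params): string
{
    // Width of the longest parameter name, in characters.
    $width = 0;
    foreach ($params as $name => $value) {
        $width = max($width, mb_strlen((string) $name, 'UTF-8'));
    }

    $out = '';
    foreach ($params as $name => $value) {
        $name = (string) $name; // numeric-string keys become ints in PHP arrays
        // Pad by the number of missing characters, not missing bytes.
        $pad = str_repeat(' ', $width - mb_strlen($name, 'UTF-8'));
        $out .= ' |' . $name . $pad . ' = ' . $value . "\n";
    }
    return $out;
}

echo formatAlignedParams([
    'url'      => 'https://example.org/',
    'название' => 'Соловецкий камень',
]);
```

Note that mb_strlen() counts code points, so names containing combining characters or wide (e.g. CJK) glyphs could still look slightly off in a monospaced editor; for Cyrillic and most Latin-script names, code points match visible columns.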

Well, I've created a pull request on GitHub. Hopefully this will speed up the process.

Cyberpower678 claimed this task.