Description
Since the first batch of RESTBase hosts (1 host per DC, roughly 4% of traffic for eqiad and 6% for codfw) was switched from parsoid-php to mw-parsoid at ~2024-02-2024T13:52Z (T357392#9580018), we have been seeing a number of info-level messages in Logstash that appear somewhat worrying:
[info] Wikitext for this page has duplicate ids: Summary
[info] Wikitext for this page has duplicate ids: fileinfotpl_creator_image
[info] Wikitext for this page has duplicate ids: creator
[info] Wikitext for this page has duplicate ids: fileinfotpl_desc
[info] Wikitext for this page has duplicate ids: rationale_repl
No trace available.
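To see which ids dominate, the messages could be tallied per id. The following is a minimal sketch, assuming the log lines have been exported from Logstash as plain text and that each message names a single id, as in the samples above. The function name `count_duplicate_ids` and the regular expression are illustrative and not part of Parsoid or Logstash.

```python
import re
from collections import Counter

# Matches the message format quoted in this task; the id is assumed to be
# a single whitespace-free token following the colon.
DUPLICATE_ID_RE = re.compile(r"Wikitext for this page has duplicate ids: (\S+)")


def count_duplicate_ids(lines):
    """Return a Counter mapping each reported id to its number of messages."""
    counts = Counter()
    for line in lines:
        # findall also copes with several messages flattened onto one line.
        counts.update(DUPLICATE_ID_RE.findall(line))
    return counts


# The five sample messages from this task description.
sample = [
    "[info] Wikitext for this page has duplicate ids: Summary",
    "[info] Wikitext for this page has duplicate ids: fileinfotpl_creator_image",
    "[info] Wikitext for this page has duplicate ids: creator",
    "[info] Wikitext for this page has duplicate ids: fileinfotpl_desc",
    "[info] Wikitext for this page has duplicate ids: rationale_repl",
]

for dup_id, n in count_duplicate_ids(sample).most_common():
    print(f"{n:6d}  {dup_id}")
```

A skewed distribution would suggest that a few widely used templates account for most of the volume.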
Impact
Probably minimal since the messages are logged at info level. However, the only result in codesearch that returns something relevant is
https://gerrit.wikimedia.org/g/mediawiki/services/parsoid/+/master/src/Utils/DOMDataUtils.php#452
which has a big FIXME above it.
The number of messages is worrying. At 4% traffic, we log ~10k messages per hour.
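For a rough sense of scale, the observed rate can be extrapolated to full traffic. This is only a sketch: it assumes the message volume scales linearly with the share of traffic served by mw-parsoid, which may not hold, and the function name `extrapolate_hourly_rate` is illustrative.

```python
def extrapolate_hourly_rate(observed_per_hour, traffic_fraction):
    """Scale an observed hourly message count up to 100% of traffic.

    Assumes messages are proportional to traffic share (an assumption,
    not something measured in this task).
    """
    if not 0 < traffic_fraction <= 1:
        raise ValueError("traffic_fraction must be in (0, 1]")
    return round(observed_per_hour / traffic_fraction)


# ~10k messages per hour at ~4% of traffic, as reported above.
estimate = extrapolate_hourly_rate(10_000, 0.04)
print(f"Estimated messages per hour at full traffic: {estimate}")
```

Under that assumption, a full switchover would produce on the order of 250k messages per hour.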
Notes
A quick grep across all parse* hosts shows that no similar messages were emitted in production.