A lot of `[info] Wikitext for this page has duplicate ids:` in logstash for mw-parsoid. Possibly related to PageBundle
Open, High, Public

Description

Since the first batch of RESTBase hosts (1 host per DC, roughly 4% for eqiad and 6% for codfw) was switched from parsoid-php to mw-parsoid at ~2024-02-2024T13:52Z (T357392#9580018), Logstash has been showing a number of info-level messages that appear somewhat worrying:

[info] Wikitext for this page has duplicate ids: Summary
[info] Wikitext for this page has duplicate ids: fileinfotpl_creator_image
[info] Wikitext for this page has duplicate ids: creator
[info] Wikitext for this page has duplicate ids: fileinfotpl_desc
[info] Wikitext for this page has duplicate ids: rationale_repl

No trace available.
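To see which ids account for most of the volume, the messages can be tallied per id. The following is a minimal sketch, assuming the log lines have been exported as plain text and that each message names exactly one id, as in the samples above. The function name `tally_duplicate_ids` is hypothetical and not part of Parsoid or Logstash.

```python
import re
from collections import Counter

# Matches the message format quoted in this task; assumes one id per message.
DUPLICATE_ID_RE = re.compile(r"Wikitext for this page has duplicate ids: (.+)")


def tally_duplicate_ids(lines):
    """Count how often each id is reported as duplicated."""
    counts = Counter()
    for line in lines:
        match = DUPLICATE_ID_RE.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts


# The five sample messages from the task description.
sample = [
    "[info] Wikitext for this page has duplicate ids: Summary",
    "[info] Wikitext for this page has duplicate ids: fileinfotpl_creator_image",
    "[info] Wikitext for this page has duplicate ids: creator",
    "[info] Wikitext for this page has duplicate ids: fileinfotpl_desc",
    "[info] Wikitext for this page has duplicate ids: rationale_repl",
]

counts = tally_duplicate_ids(sample)
for dup_id, n in counts.most_common():
    print(f"{n:6d}  {dup_id}")
```

Run against a larger export, the same tally would show whether a handful of template-generated ids (such as the `fileinfotpl_*` ones) dominate the log volume.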

Impact

Probably minimal, since the messages are logged at info level. However, the only relevant result in codesearch is

https://gerrit.wikimedia.org/g/mediawiki/services/parsoid/+/master/src/Utils/DOMDataUtils.php#452

which has a big FIXME above it.

The number of messages is worrying. At 4% traffic, we log ~10k messages per hour.
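For a rough sense of scale, the observed rate can be extrapolated to full traffic. This is only a back-of-the-envelope sketch: it assumes the message rate scales linearly with the share of traffic, which the task does not confirm, and the helper name `extrapolate_hourly_rate` is hypothetical.

```python
def extrapolate_hourly_rate(observed_per_hour, traffic_percent):
    """Scale an observed message rate linearly to 100% of traffic.

    Assumption: messages are spread evenly across traffic, so the rate
    grows in proportion to the traffic share.
    """
    if not 0 < traffic_percent <= 100:
        raise ValueError("traffic_percent must be in (0, 100]")
    return observed_per_hour * 100 / traffic_percent


# Figures from the task: ~10k messages per hour at 4% of traffic.
full_rate = extrapolate_hourly_rate(10_000, 4)
print(f"~{full_rate:,.0f} messages/hour, ~{full_rate * 24:,.0f} messages/day")
```

Under that linear assumption, the estimate comes to roughly 250,000 messages per hour once all traffic is switched.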

Notes

A quick grep across all parse* hosts shows that no similar messages were emitted in production.

Event Timeline

It doesn't show up in production because the logging level is set to warn or higher there. Separately, we should probably suppress non-actionable logspam like this (there are a few of those in Parsoid).
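The effect of the logging threshold can be illustrated with a small, generic example. The sketch below uses Python's standard `logging` module purely as an analogy; Parsoid is written in PHP and uses its own logger, and the logger names and the warning text here are made up for the demonstration.

```python
import io
import logging


def emitted_messages(threshold):
    """Log one info and one warning message; return the lines actually emitted."""
    stream = io.StringIO()
    # Hypothetical logger name, unique per threshold to keep runs independent.
    logger = logging.getLogger(f"parsoid-demo-{threshold}")
    logger.setLevel(threshold)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
    logger.addHandler(handler)

    logger.info("Wikitext for this page has duplicate ids: Summary")
    logger.warning("example warning (illustrative only)")

    logger.removeHandler(handler)
    return stream.getvalue().splitlines()


# With an info threshold both messages appear; with warn only the warning does.
print(emitted_messages(logging.INFO))
print(emitted_messages(logging.WARNING))
```

With the threshold at warn or higher, the duplicate-ids message is dropped before it reaches any handler, which matches the absence of these messages on the existing production hosts.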

T200517 ("Emit lint error or category when a page uses duplicate HTML IDs") has had some work done on it that could change where this is being 'logged'. :)
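For readers unfamiliar with the underlying condition, here is a minimal sketch of detecting duplicate `id` attributes in an HTML fragment. It is a standalone illustration using Python's standard `html.parser`, not Parsoid's implementation (which lives in PHP) and not the lint rule proposed in T200517; the names `IdCollector` and `find_duplicate_ids` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collects every id attribute value seen on a start tag."""

    def __init__(self):
        super().__init__()
        self.id_counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.id_counts[value] += 1


def find_duplicate_ids(html):
    """Return the sorted list of ids that occur more than once."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return sorted(i for i, n in collector.id_counts.items() if n > 1)


# Made-up fragment: two headings share the id "Summary".
fragment = (
    '<h2 id="Summary">Summary</h2><p id="intro">text</p>'
    '<h2 id="Summary">Summary</h2>'
)
print(find_duplicate_ids(fragment))
```

A lint-style report, as opposed to a log line, would attach this list to the page rather than emitting one message per parse.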

Change 1007031 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Remove bogus comment in storeInPageBundle

https://gerrit.wikimedia.org/r/1007031

Change 1007031 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Remove bogus comment in storeInPageBundle

https://gerrit.wikimedia.org/r/1007031

Change 1008400 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.19.0-a20

https://gerrit.wikimedia.org/r/1008400

Change 1008400 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.19.0-a20

https://gerrit.wikimedia.org/r/1008400