
Make "about" attribute IDs deterministic
Open, Medium, Public

Description

As shown by the tools/regen-transformTests.sh script, every time we reparse a large page (such as [[en:Barack_Obama]]), we generate different ID values for the about attributes.

This seems to be because the token pipeline is async and thus non-deterministic, and we don't have a final DOM post-pass to ensure deterministic numbering.
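One way to get deterministic numbering is a final pass over the finished DOM that renumbers about IDs in document order. That way the result no longer depends on the order in which the async pipeline happened to complete. The following is a minimal Python sketch, not Parsoid code. It assumes well-formed markup and a `#mwtN` ID format, and `renumber_about_ids` is a hypothetical helper:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def renumber_about_ids(root: ET.Element, prefix: str = "#mwt") -> dict[str, str]:
    """Rewrite every about attribute so IDs are numbered by first appearance
    in document order. Returns the old -> new mapping."""
    mapping: dict[str, str] = {}
    # Element.iter() walks the tree depth-first in document order, so the new
    # numbering depends only on the final DOM, not on token-processing order.
    for el in root.iter():
        old = el.get("about")
        if old is None:
            continue
        if old not in mapping:
            mapping[old] = f"{prefix}{len(mapping) + 1}"
        # Elements that shared an about ID (e.g. one template's output)
        # keep sharing a single new ID.
        el.set("about", mapping[old])
    return mapping


if __name__ == "__main__":
    html = (
        "<body>"
        '<p about="#mwt7" typeof="mw:Transclusion">A</p>'
        '<p about="#mwt7">A, continued</p>'
        '<span about="#mwt3" typeof="mw:Extension/ref">B</span>'
        "</body>"
    )
    root = ET.fromstring(html)
    renumber_about_ids(root)
    print(ET.tostring(root, encoding="unicode"))
```

Grouping is preserved because elements that shared an about ID before the pass still share one afterwards. If anything else refers to the old IDs, the returned mapping could be used to rewrite those references too.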

See also T87556: Thoughts on element IDs, sections, incremental parsing and fast section editing.

Event Timeline

cscott renamed this task from Make about IDs deterministic to Make "about" attribute IDs deterministic. Oct 4 2018, 2:09 PM
cscott triaged this task as Medium priority.
cscott created this task.
cscott updated the task description.

Change 476582 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Ignore failures from latest node in travis

https://gerrit.wikimedia.org/r/476582

Change 476582 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Ignore failures from latest node in travis

https://gerrit.wikimedia.org/r/476582

Does this still happen in the PHP parser?

Depends on what 'deterministic' means. What templates and extensions generate also influences about ID assignment, even when tokens are processed sequentially (as they are in the Parsoid/PHP world). Reparses of a page aren't guaranteed to generate the same IDs, because a template could have changed what it outputs in the meantime: for example, it might have added or removed an extension that requires an about ID, or anything else that gets an about ID assigned to it.
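To make this concrete, here is a toy Python model in which IDs come from a single sequential counter. It is not Parsoid code. The template names, the `assign_about_ids` helper, the `#mwtN` format, and the idea of counting about-ID-bearing nodes per template are all illustrative assumptions. Even with fully in-order processing, a change in one template's output shifts every ID after it:

```python
from __future__ import annotations


def assign_about_ids(page: list[str], outputs: dict[str, int]) -> list[list[str]]:
    """Assign about IDs strictly sequentially, in page order (toy model).

    page    -- transcluded template names in the order they appear (hypothetical)
    outputs -- for each template, how many nodes in its expansion need their own
               about ID (a simplification: the transclusion plus any extensions)
    """
    counter = 0
    assigned: list[list[str]] = []
    for name in page:
        ids = []
        for _ in range(outputs[name]):
            counter += 1
            ids.append(f"#mwt{counter}")
        assigned.append(ids)
    return assigned


page = ["Infobox", "Cite", "Navbox"]
# First parse: each template yields one about-ID-bearing node.
before = assign_about_ids(page, {"Infobox": 1, "Cite": 1, "Navbox": 1})
# Template edited in between: Infobox now also emits an extension tag.
after = assign_about_ids(page, {"Infobox": 2, "Cite": 1, "Navbox": 1})

print(before[1], after[1])  # → ['#mwt2'] ['#mwt3']
```

The page itself is identical in both runs, and only one template's output changed. Even so, the ID assigned to the `Cite` transclusion moves from `#mwt2` to `#mwt3`.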