
Fix subref_reuse_count calculation
Open, Needs Triage, Public

Description

We currently calculate the subref_reuse_count field in a way that produces obviously incorrect values, such as -2 for the page
https://de.wikipedia.org/wiki/Jos%C3%A9_Nunes_(Herrscher)

This task is complete when the calculation is fixed.

  • Import the above page as a test fixture.
  • Write a test verifying that subref_reuse_count is currently miscalculated as -2 for this page.
    • Yes, but the count is currently 1, which is also wrong. The change from -2 is probably due to recent page edits.
  • Fix the calculation, probably by iterating through the subref markers and counting unique identifiers. The result should be 3 for the sample page.
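The proposed fix (iterate over the subref markers, count unique identifiers) could look roughly like the sketch below. It is only an illustration: the `RefMarker` type, its fields, and the function name are hypothetical and do not reflect the project's actual data model or the Parsoid markup, and the sample identifiers are made up rather than taken from the page above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RefMarker:
    """Hypothetical, simplified view of one footnote marker in the page HTML."""
    ref_id: str      # identifier of the footnote this marker points to (assumed field)
    is_subref: bool  # True if the marker belongs to a sub-ref (assumed field)


def count_unique_subrefs(markers: Iterable[RefMarker]) -> int:
    """Count distinct sub-ref identifiers among the markers.

    Because the result is the size of a set, it can never be negative.
    """
    return len({m.ref_id for m in markers if m.is_subref})


# Made-up sample: three distinct sub-refs, one of them referenced twice,
# plus one ordinary ref that must not be counted.
sample = [
    RefMarker("book-p1", True),
    RefMarker("book-p2", True),
    RefMarker("book-p2", True),
    RefMarker("book-p7", True),
    RefMarker("website", False),
]
subref_count = count_unique_subrefs(sample)
```

Counting via a set avoids any subtraction step, so a negative value like -2 cannot occur with this approach, whatever the correct definition of "reuse" turns out to be.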
Implementation
  • Don't distinguish between valid and invalid sub-refs
Open question
  • How should reuse counting be handled for sub-refs?
Review

Event Timeline

awight updated the task description.
awight updated the task description.

After looking into the issue and the code, it seems that the detection of "valid subrefs" is the reason for the confusion in the numbers. We're trying to detect "invalid" / "broken" sub-refs by looking for an error code in the ref tag. These refs are then used to recalculate some of the numbers.

However, some (sub-)refs have an error code but are still valid and good enough to be rendered.

See, for example, https://de.wikipedia.org/api/rest_v1/page/html/Jos%C3%A9_Nunes_(Herrscher)/262052622 where there are errors in the Parsoid metadata due to some additional quotes in the wikitext, but the sub-refs are rendered fine.

So we need to either

  • improve the valid / invalid detection, or
  • stop trying to remove "invalid" sub-refs from the calculation.
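To make the difference between the two options concrete, here is a minimal sketch. The marker dictionaries, the `error` key, the error string, and the `skip_errors` flag are all assumptions for illustration; they are not the real Parsoid metadata format or the project's code, and the sample data is invented.

```python
from typing import Iterable, Mapping, Optional


def count_subrefs(markers: Iterable[Mapping[str, Optional[str]]],
                  skip_errors: bool) -> int:
    """Count distinct sub-ref ids, optionally dropping markers that carry an error code."""
    ids = set()
    for marker in markers:
        if skip_errors and marker.get("error") is not None:
            continue  # treated as "invalid", even if it still renders fine
        ids.add(marker["id"])
    return len(ids)


# Invented sample: sub-ref "c" carries an error code but would still be rendered.
markers = [
    {"id": "a", "error": None},
    {"id": "b", "error": None},
    {"id": "c", "error": "hypothetical_error_code"},
]

filtered = count_subrefs(markers, skip_errors=True)     # drops "c"
unfiltered = count_subrefs(markers, skip_errors=False)  # keeps "c"
```

With the error-based filter, a sub-ref that renders correctly is still dropped from the count, which is the kind of mismatch described above. Not filtering at all is the simpler option, but it would also count sub-refs that are genuinely broken.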
WMDE-Fisch updated the task description.
WMDE-Fisch subscribed.