WikiRun reference statistics undercount by a large margin
Open, Needs Triage, Public

Description

The end-of-game article statistics count references with logic that sometimes dramatically undercounts them. For instance, the app claims that en:Hide-and-seek has 4 references when it has 22.

My summary:

  • Current WikiRun code: counts how often {{cite appears in the article's wikitext. The problem is that not all references use templates, let alone templates whose names start with "cite", so it can miss the vast majority of references in an article. It would also break further if you extended the game to more language editions, which have their own template names.
  • Cheap fix: switch to this regex. It counts <ref> tags, and only the first time each one is used, so you get the number of unique sources (which I think is what you want) rather than the number of in-line citations.
  • Proper fix: wikitext is a bad place to count references because even <ref> tags aren't required to produce a reference (more details if you're curious). Instead, you should request the article HTML (API endpoints) and use this logic to count all the references listed at the end of the article. That logic comes from a Python library that works on already-parsed HTML (it's not regex-based), so you would also need to do that parsing step. In principle this is the right solution, but the cheap fix above is simpler given the existing code and should get you 95% of the way there, at least to counts that make sense at a glance.
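
To make the two fixes concrete, here is a rough Python sketch of each. It is an illustration only, not the linked regex or the linked library logic: the function names are made up for this example, the wikitext version assumes reused references are identified by their name attribute, and the HTML version assumes each entry in the rendered reference list is an <li> whose id starts with "cite_note", which should be checked against the actual API output.

```python
import re
from html.parser import HTMLParser

# Opening or self-closing <ref ...> tags; \b keeps <references /> from matching.
REF_TAG = re.compile(r"<ref\b([^>]*?)(/?)>", re.IGNORECASE)
# name="x", name='x', or unquoted name=x inside the tag's attributes.
NAME_ATTR = re.compile(r"""name\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s/>]+))""",
                       re.IGNORECASE)


def count_unique_refs_wikitext(wikitext: str) -> int:
    """Cheap fix (sketch): count unique <ref> sources in wikitext."""
    names = set()
    unnamed = 0
    for match in REF_TAG.finditer(wikitext):
        attrs, self_closing = match.group(1), match.group(2)
        name_match = NAME_ATTR.search(attrs)
        if name_match:
            # A named ref counts once, however many times it is reused.
            name = next(g for g in name_match.groups() if g is not None)
            names.add(name.strip())
        elif not self_closing:
            # Each unnamed <ref>...</ref> is its own source.
            unnamed += 1
    return unnamed + len(names)


class _RefListCounter(HTMLParser):
    """Counts <li id="cite_note..."> items (assumed reference-list markup)."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            ident = dict(attrs).get("id") or ""
            if ident.startswith("cite_note"):
                self.count += 1


def count_refs_html(html: str) -> int:
    """Proper fix (sketch): count reference-list entries in article HTML."""
    parser = _RefListCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```

The wikitext version still misses references that are generated by templates without a literal <ref> tag, which is the reason the HTML-based count is the more reliable of the two.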

P.S. Love the game, thanks!