
[8 hours] PageAssessments can leave behind invalid orphaned assessments
Open, Low · Public · Spike

Description

Discovered this while working on T326387: Deploy PageAssessments to Chinese Wikipedia.

Timeline
  1. https://gerrit.wikimedia.org/r/c/876196 was deployed
  2. The {{#assessment}} parser function was added to the relevant templates
  3. We realized zhwiki wanted assessments for subprojects as well, so we enabled $wgPageAssessmentsSubprojects with https://gerrit.wikimedia.org/r/c/884474
  4. After waiting some days, no subprojects were appearing in the database. Null edits and changes to assessment data didn't fix it. I investigated and found that PageAssessments only creates projects when given a name that doesn't already exist (ref).
    1. This means all projects that are actually subprojects are forever stuck as normal projects, which in itself is a bug (T330789)
  5. To remedy the issue, we removed the {{#assessment}} calls from the templates, and waited for all assessments to disappear.
  6. The purgeUnusedProjects.php script was run
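
To illustrate why step 4 leaves subprojects stuck, here is a toy model of "only create a project when the name doesn't already exist". It is a sketch under assumptions, not the extension's actual code: `ProjectTable`, `get_or_create`, and the `parent_id` field are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Project:
    project_id: int
    title: str
    parent_id: Optional[int] = None  # hypothetical field: None means "normal project"


class ProjectTable:
    """In-memory stand-in for the projects table (illustration only)."""

    def __init__(self) -> None:
        self._by_title: dict[str, Project] = {}
        self._next_id = 1

    def get_or_create(self, title: str, parent_id: Optional[int] = None) -> Project:
        existing = self._by_title.get(title)
        if existing is not None:
            # Create-only behaviour: the existing row is returned untouched,
            # so a parent passed in later is silently ignored.
            return existing
        project = Project(self._next_id, title, parent_id)
        self._by_title[title] = project
        self._next_id += 1
        return project


projects = ProjectTable()

# Before subprojects were enabled: everything is stored as a normal project.
parent = projects.get_or_create("Parent")
projects.get_or_create("Child")

# After subprojects were enabled: the same name is seen again, now with a parent.
stuck = projects.get_or_create("Child", parent_id=parent.project_id)

# A name that never existed before is created correctly as a subproject.
fresh = projects.get_or_create("New child", parent_id=parent.project_id)

print(stuck.parent_id)  # None: still a normal project
print(fresh.parent_id)  # 1
```

In this model, the only way to turn "Child" into a subproject is to delete its row first, which matches the remedy attempted in steps 5 and 6.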
Orphaned assessments

After doing the above steps, we were still left with 7 rows in page_assessments, which shouldn't be there because the parser function was removed from the templates.

mysql:research@dbstore1007.eqiad.wmnet [zhwiki]> SELECT * FROM page_assessments;
+------------+---------------+-----------+---------------+------------------+
| pa_page_id | pa_project_id | pa_class  | pa_importance | pa_page_revision |
+------------+---------------+-----------+---------------+------------------+
|     540955 |           134 | 丙        | 高            |          4096641 |
|     540955 |           166 |           | 极高          |          4096641 |
|    7650411 |           143 |           | 未知          |         67735432 |
|    8264268 |            33 | 丙        | 高            |         75754885 |
|    8264268 |            81 | 丙        | 高            |         75754885 |
|    8264268 |            98 | 丙        | 高            |         75754885 |
|    8270915 |            32 | 小作品    | 极低          |         75851073 |
+------------+---------------+-----------+---------------+------------------+
7 rows in set (0.001 sec)
mysql:research@dbstore1007.eqiad.wmnet [zhwiki]> SELECT page_id, page_title, page_namespace FROM page_assessments JOIN page ON page_id = pa_page_id;
+---------+-----------------------+----------------+
| page_id | page_title            | page_namespace |
+---------+-----------------------+----------------+
| 8270915 | 大屯海水库            |              0 |
|  540955 | 符拉迪沃斯托克        |              0 |
|  540955 | 符拉迪沃斯托克        |              0 |
+---------+-----------------------+----------------+
3 rows in set (0.001 sec)

The two pages that still exist (which account for three of the seven rows) were apparently recently moved without a redirect, and the others were deleted entirely. Yet the assessments remain.
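
A LEFT JOIN against `page` would list the rows whose page no longer exists directly, instead of inferring them from what the inner join leaves out. The sketch below replays the data shown above in an in-memory SQLite database, so it only approximates the production MariaDB schema; the cut-down table definitions are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified schemas (assumption): only the columns shown in the output above.
conn.executescript("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY, page_title TEXT, page_namespace INTEGER
    );
    CREATE TABLE page_assessments (
        pa_page_id INTEGER, pa_project_id INTEGER,
        pa_class TEXT, pa_importance TEXT, pa_page_revision INTEGER
    );
""")
conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
    (8270915, "大屯海水库", 0),
    (540955, "符拉迪沃斯托克", 0),
])
conn.executemany("INSERT INTO page_assessments VALUES (?, ?, ?, ?, ?)", [
    (540955, 134, "丙", "高", 4096641),
    (540955, 166, "", "极高", 4096641),
    (7650411, 143, "", "未知", 67735432),
    (8264268, 33, "丙", "高", 75754885),
    (8264268, 81, "丙", "高", 75754885),
    (8264268, 98, "丙", "高", 75754885),
    (8270915, 32, "小作品", "极低", 75851073),
])

# Assessment rows with no matching page row, grouped per page ID.
orphans = conn.execute("""
    SELECT pa_page_id, COUNT(*) AS assessment_rows
    FROM page_assessments
    LEFT JOIN page ON page_id = pa_page_id
    WHERE page_id IS NULL
    GROUP BY pa_page_id
    ORDER BY pa_page_id
""").fetchall()

# The inner join returns one row per assessment, so count distinct pages.
existing_pages = conn.execute("""
    SELECT COUNT(DISTINCT pa_page_id)
    FROM page_assessments
    JOIN page ON page_id = pa_page_id
""").fetchone()[0]

print(orphans)         # [(7650411, 1), (8264268, 3)]
print(existing_pages)  # 2
```

The same `LEFT JOIN ... WHERE page_id IS NULL` query could be run on a larger wiki to gauge how many assessments point at deleted pages, though it would not catch rows left behind on pages that still exist.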

Conclusions

I tried to reproduce the bug locally, but had no success. For at least some of the rows, I've ruled out any clash with deployments or template changes. PageAssessments uses the page ID, not the page title, so the act of moving a page shouldn't matter by itself, either. Also, some of the affected pages were deleted entirely rather than moved.

This is no big deal because it's only 7 rows, but I thought I'd still file a bug. PageAssessments was only live on zhwiki for a few weeks. If the issue is caused by something unrelated to the deployments and/or template changes, this could be a bigger problem on larger wikis that have had PageAssessments since it was first released, such as English Wikipedia. Fortunately the storage footprint is pretty small, so it's probably not a big concern. However, it could throw off research analysis on the assessment data.

More investigation is needed.

Event Timeline

Restricted Application added subscribers: Ericliu1912, Stang, Aklapper.
MusikAnimal renamed this task from "PageAssessments can leave behind invalid and/or outdated assessments" to "PageAssessments can leave behind invalid orphaned assessments". · Feb 9 2023, 4:47 AM
MusikAnimal updated the task description.
MusikAnimal changed the subtype of this task from "Bug Report" to "Spike". · Feb 9 2023, 4:59 AM
JMcLeod_WMF renamed this task from "PageAssessments can leave behind invalid orphaned assessments" to "[8 hours] PageAssessments can leave behind invalid orphaned assessments". · Feb 13 2023, 3:08 PM