Page MenuHomePhabricator

templatelinks table on Commons SQL database is not updating properly
Closed, InvalidPublic

Description

Template:Information/author_processing was retired on November 11 2019, when Template:Information switched to Lua version. However now two months latter there are still 28,836,449 entries in templatelinks table on Commons SQL database, indicating number of templates transcluding that template. That number can be looked up several different ways:

After retiring the template the content of the template was replaced so the only thing it does now is to add c:Category:Pages using "Information/author processing" template. That category reminded empty since its creation.

There is similar issue with c:Module:Date which was exported to many wikipedias under name Module:DateI18n. For consistence, the template was renamed and all the templates using the old name were corrected in April 2019. However according to c:Special:MostTranscludedPages it is still "used on 16,467,713 pages". We do have c:Category:Pages_using_Module:Date to track those pages (added by the module to all pages calling it), but it is mostly empty.

Event Timeline

@jcrespo maybe you can point this in the right direction?

Hi, @Multichill,

I might be able to tell you who to contact: [disclaimer- please note DBA take care only of server maintenance, things like "are databases up?" "Do al nodes have the same data on them?" "Do developers have enough servers?", but cannot help with debugging mediawiki-related bugs (e.g. we don't handle Wikimedia-database-issue tickets)].

I believe the Contributors-Team are the right team to rise bugs on edit-related issues (specially parsing ones) and the team taking care of editor needs. :-D Please add their tag and provide all information to them. If they need technology teams' support they will be the ones to ask further. Cheers.

Adding Contributors-Team per above suggestion.

I looked up the number of transclussions of Template:Information/author_processing and Module:Date at https://commons.wikimedia.org/wiki/Special:MostTranscludedPages and the numbers dropped in last month:

  • from 28.8M to 26.7M out of original 53M, for Template:Information/author_processing
  • from 16.5M to 15.6M out of original ~53M for Module:Date
Umherirrender added a subscriber: Umherirrender.

The deletion of templatelinks is done via the job queue, because reparsing many pages takes some time. The counts are at 0 now.

Just wait until the jobqueue is processed, doing null edits is not a good idea on that high number - https://en.wikipedia.org/wiki/Wikipedia:Purge#Null_edit

The deletion of templatelinks is done via the job queue, because reparsing many pages takes some time. The counts are at 0 now.

Just wait until the jobqueue is processed, doing null edits is not a good idea on that high number - https://en.wikipedia.org/wiki/Wikipedia:Purge#Null_edit

You assume that it even ends up in the job queue at all. Take for example https://commons.wikimedia.org/w/index.php?title=Template%3AKremlin.ru&type=revision&diff=674341865&oldid=640550655 , if the job queue is empty, why aren't the subcategories at https://commons.wikimedia.org/wiki/Category:Files_from_Kremlin.ru empty? Anyway. Both me and @Jarekt (and others have filled) plenty of bugs about these issues and didn't really get any support so we just do a ton of null edits to do what the software should be doing.

The job queue is designed to do the reparses, because that does not fit into one only transaction, wikitext processing is expensive.

When looking at the database there was not really refreshes after the edit (14 July)

select SUBSTR( page_links_updated, 1, 8) as day
     , count(*) as count
  from categorylinks
  inner join page
          on cl_from = page_id
 where cl_to like 'Files\_from\_Kremlin.ru,\_%' ESCAPE '\\'
 group by SUBSTR( page_links_updated, 1, 8);
+----------+-------+
| day      | count |
+----------+-------+
| 20220321 |  2288 |
| 20220322 |  2997 |
| 20220323 |  2868 |
| 20220324 |     6 |
| 20220325 |   247 |
| 20220326 |   263 |
| 20220327 |     5 |
| 20220328 |     4 |
| 20220329 |    17 |
| 20220330 |     9 |
| 20220331 |     4 |
| 20220401 |    54 |
| 20220402 |    17 |
| 20220403 |    11 |
| 20220404 |    15 |
| 20220405 |    15 |
| 20220406 |     8 |
| 20220407 |    14 |
| 20220408 |     5 |
| 20220409 |    12 |
| 20220410 |    10 |
| 20220411 |    30 |
| 20220412 |    19 |
| 20220413 |    10 |
| 20220414 |     3 |
| 20220415 |     1 |
| 20220416 |     2 |
| 20220417 |    22 |
| 20220418 |     1 |
| 20220419 |     8 |
| 20220420 |     1 |
| 20220421 |     1 |
| 20220422 |    11 |
| 20220423 |    16 |
| 20220424 |    22 |
| 20220425 |     6 |
| 20220427 |     2 |
| 20220428 |     1 |
| 20220429 |     4 |
| 20220430 |     9 |
| 20220501 |     7 |
| 20220502 |     6 |
| 20220503 |     3 |
| 20220504 |     7 |
| 20220506 |     6 |
| 20220507 |     1 |
| 20220508 |     4 |
| 20220509 |    50 |
| 20220510 |     3 |
| 20220511 |    16 |
| 20220512 |     8 |
| 20220513 |     4 |
| 20220514 |    16 |
| 20220515 |    15 |
| 20220517 |     3 |
| 20220518 |     2 |
| 20220519 |     7 |
| 20220520 |    13 |
| 20220521 |    12 |
| 20220522 |    32 |
| 20220523 |    36 |
| 20220524 |     9 |
| 20220525 |    12 |
| 20220526 |     8 |
| 20220527 |   100 |
| 20220528 |    43 |
| 20220529 |    14 |
| 20220530 |     5 |
| 20220531 |    20 |
| 20220601 |     3 |
| 20220602 |     6 |
| 20220603 |     6 |
| 20220604 |    15 |
| 20220605 |    28 |
| 20220606 |    14 |
| 20220607 |    24 |
| 20220608 |    26 |
| 20220609 |     2 |
| 20220610 |     8 |
| 20220611 |    20 |
| 20220612 |    14 |
| 20220613 |    13 |
| 20220614 |    21 |
| 20220615 |    36 |
| 20220616 |     6 |
| 20220617 |     9 |
| 20220618 |    11 |
| 20220619 |    14 |
| 20220620 |    24 |
| 20220621 |    24 |
| 20220622 |     7 |
| 20220623 |     9 |
| 20220624 |     7 |
| 20220626 |     4 |
| 20220627 |     6 |
| 20220628 |     5 |
| 20220629 |     4 |
| 20220630 |     9 |
| 20220701 |    12 |
| 20220702 |    17 |
| 20220703 |     5 |
| 20220704 |    10 |
| 20220705 |    33 |
| 20220706 |     6 |
| 20220707 |    13 |
| 20220708 |    10 |
| 20220709 |    10 |
| 20220710 |    41 |
| 20220711 |    35 |
| 20220712 |     6 |
| 20220713 |   430 |
| 20220714 |   276 |
| 20220727 |     1 |
+----------+-------+
113 rows in set (0.056 sec)

Even when looking at the day of the edit, there is nothing after the edit ("2022-07-14T21:03:20Z")

select page_links_updated
  from categorylinks
  inner join page
          on cl_from = page_id
 where cl_to like 'Files\_from\_Kremlin.ru,\_%' ESCAPE '\\'
 and SUBSTR( page_links_updated, 1, 8) = '20220714'
  ORDER BY page_links_updated DESC
  limit 10;
+--------------------+
| page_links_updated |
+--------------------+
| 20220714202542     |
| 20220714083321     |
| 20220714083303     |
| 20220714083253     |
| 20220714083220     |
| 20220714083205     |
| 20220714083107     |
| 20220714083038     |
| 20220714083028     |
| 20220714074335     |
+--------------------+
10 rows in set (0.043 sec)

I am not sure if a refresh links update was queued here or if all jobs are deduplicated or in error. Maybe the jobs are missed in this special case due to being a undo/revert. Not sure if that is enough information to reopen or get a new task for looking at the code.

https://grafana.wikimedia.org/d/CbmStnlGk/jobqueue-job?orgId=1&var-dc=eqiad%20prometheus%2Fk8s&var-job=refreshLinks

The backlog for that template is now processed after a fresh recursive link update has been started via api's action=purge

 select tl_title, rev_timestamp as lastEdit, p1.page_touched, COUNT(*) as total, SUM(p2.page_links_updated < rev_timestamp) as backlog
  from templatelinks
    inner join page p2 on tl_from = p2.page_id
	inner join page p1 on p1.page_namespace = tl_namespace and p1.page_title = tl_title
	inner join revision on p1.page_latest = rev_id
 where tl_namespace = 10 and tl_title = 'Kremlin.ru';

+------------+----------------+----------------+-------+---------+
| tl_title   | lastEdit       | page_touched   | total | backlog |
+------------+----------------+----------------+-------+---------+
| Kremlin.ru | 20220714210320 | 20220806121439 | 22516 |       0 |
+------------+----------------+----------------+-------+---------+
1 row in set (0.069 sec)