
Recentchanges and cu_changes tables are occasionally missing revisions on multiple wikis
Open, Unbreak Now! · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Within 1 month of 2024-05-04....
  • Run SELECT * FROM revision WHERE rev_id = 1222189171
  • Exists
  • Run SELECT * FROM recentchanges WHERE rc_this_oldid = 1222189171

What happens?:

  • Zero results

What should have happened instead?:

  • 1 result
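The two lookups above can be combined into a single query; a minimal sketch where a NULL rc_id flags the missing recentchanges row:

```sql
-- NULL rc_id means the revision exists but has no recentchanges row.
SELECT rev_id, rev_timestamp, rc_id
FROM revision
LEFT JOIN recentchanges ON rc_this_oldid = rev_id
WHERE rev_id = 1222189171;
```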

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

https://en.wikipedia.org/w/api.php?action=query&list=recentchanges&formatversion=2&rclimit=500&rctitle=MediaWiki%3AGadget-popups.js:

{
	"batchcomplete": true,
	"query": {
		"recentchanges": [
			{
				"type": "edit",
				"ns": 8,
				"title": "MediaWiki:Gadget-popups.js",
				"pageid": 14548523,
				"revid": 1222213787,
				"old_revid": 1222203194,
				"rcid": 1772542572,
				"timestamp": "2024-05-04T16:33:57Z"
			},
			{
				"type": "edit",
				"ns": 8,
				"title": "MediaWiki:Gadget-popups.js",
				"pageid": 14548523,
				"revid": 1222203194,
				"old_revid": 1222201928,
				"rcid": 1772521250,
				"timestamp": "2024-05-04T15:11:44Z"
			},
			{
				"type": "edit",
				"ns": 8,
				"title": "MediaWiki:Gadget-popups.js",
				"pageid": 14548523,
				"revid": 1222201928,
				"old_revid": 1222201694,
				"rcid": 1772518905,
				"timestamp": "2024-05-04T15:01:27Z"
			},
			{
				"type": "edit",
				"ns": 8,
				"title": "MediaWiki:Gadget-popups.js",
				"pageid": 14548523,
				"revid": 1222201694,
				"old_revid": 1222200569,
				"rcid": 1772518444,
				"timestamp": "2024-05-04T14:59:29Z"
			},
			{
				"type": "edit",
				"ns": 8,
				"title": "MediaWiki:Gadget-popups.js",
				"pageid": 14548523,
				"revid": 1222200569,
				"old_revid": 1222195654,
				"rcid": 1772516346,
				"timestamp": "2024-05-04T14:50:52Z"
			},
			{
				"type": "edit",
				"ns": 8,
				"title": "MediaWiki:Gadget-popups.js",
				"pageid": 14548523,
				"revid": 1222195654,
				"old_revid": 1222191607,
				"rcid": 1772507180,
				"timestamp": "2024-05-04T14:16:51Z"
			},
			{
				"type": "edit",
				"ns": 8,
				"title": "MediaWiki:Gadget-popups.js",
				"pageid": 14548523,
				"revid": 1222191607,
				"old_revid": 1222191358,
				"rcid": 1772499599,
				"timestamp": "2024-05-04T13:43:06Z"
			},
			{
				"type": "edit",
				"ns": 8,
				"title": "MediaWiki:Gadget-popups.js",
				"pageid": 14548523,
				"revid": 1222191358,
				"old_revid": 1222190966,
				"rcid": 1772499204,
				"timestamp": "2024-05-04T13:40:49Z"
			},
			{
				"type": "edit",
				"ns": 8,
				"title": "MediaWiki:Gadget-popups.js",
				"pageid": 14548523,
				"revid": 1222190444,
				"old_revid": 1222190276,
				"rcid": 1772498030,
				"timestamp": "2024-05-04T13:32:03Z"
			},
			{
				"type": "edit",
				"ns": 8,
				"title": "MediaWiki:Gadget-popups.js",
				"pageid": 14548523,
				"revid": 1222190276,
				"old_revid": 1222189654,
				"rcid": 1772497930,
				"timestamp": "2024-05-04T13:31:06Z"
			},
			{
				"type": "edit",
				"ns": 8,
				"title": "MediaWiki:Gadget-popups.js",
				"pageid": 14548523,
				"revid": 1222189347,
				"old_revid": 1222189171,
				"rcid": 1772497077,
				"timestamp": "2024-05-04T13:23:01Z"
			},
			{
				"type": "edit",
				"ns": 8,
				"title": "MediaWiki:Gadget-popups.js",
				"pageid": 14548523,
				"revid": 1222188881,
				"old_revid": 1211031259,
				"rcid": 1772496609,
				"timestamp": "2024-05-04T13:18:22Z"
			}
		]
	}
}

https://en.wikipedia.org/w/index.php?title=MediaWiki:Gadget-popups.js&action=history:

Notice the revisions 1222189171 at 13:21, 1222189654 at 13:26, and 1222190966 at 13:36 are missing from the recentchanges results.

Related Objects

Event Timeline

Restricted Application added a subscriber: Aklapper.

Mh, while looking into this and documenting possible/probable causes sounds like a good idea, one should keep in mind that Special:RecentChanges (or the api endpoint) are not a view of the Revision table of the last 30 days, but get their data from a separate table: https://www.mediawiki.org/wiki/Manual:Recentchanges_table
So the likely question is: what disruption allowed the page save to complete but caused the recentchanges table to miss a few rows?

Maybe an outcome of this task could be to document that on https://www.mediawiki.org/wiki/Manual:Recentchanges_table (and maybe other recentchanges-related pages, but I'm less sure which: https://www.mediawiki.org/wiki/Recent_changes).
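Since recentchanges only retains roughly the last 30 days of events (controlled by $wgRCMaxAge), any revision-vs-RC comparison has to restrict the revision side to the same window; a sketch:

```sql
-- Count revisions inside the RC retention window that lack an RC row.
SELECT COUNT(*) AS missing
FROM revision
LEFT JOIN recentchanges ON rc_this_oldid = rev_id
WHERE rev_timestamp > DATE_FORMAT(NOW() - INTERVAL 30 DAY, '%Y%m%d%H%i%s')
  AND rc_id IS NULL;
```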

After another similar report T372556: Article creation not recorded in RecentChanges database table, I tried to find all possibly missing recent changes entries to look for clues and how often it probably happens. I did it on Wikidata as it's the most heavily edited wiki.

Query:

SELECT rev_timestamp, rev_id, rev_len, actor_name, page_namespace AS ns, page_title
FROM revision
STRAIGHT JOIN page ON page_id = rev_page
JOIN actor_revision ON actor_id = rev_actor
LEFT JOIN change_tag ON ct_rev_id = rev_id AND ct_tag_id IN (SELECT ctd_id FROM change_tag_def WHERE ctd_name = 'mw-new-redirect')
LEFT JOIN recentchanges ON rc_this_oldid = rev_id AND rc_type NOT IN (5, 6) /* RC_EXTERNAL, RC_CATEGORIZE */
WHERE rc_id IS NULL
AND ct_tag_id IS NULL /* leftover redirects don't get own RC entry */
AND rev_timestamp > '202408'
AND page_namespace <> 2600 /* exclude Flow */
AND NOT EXISTS (
  /* restored revisions are not re-inserted to RC */
  SELECT 1 FROM logging_logindex
  WHERE log_type = 'delete' AND log_action = 'restore'
  AND log_namespace = page_namespace AND log_title = page_title
  AND log_timestamp > rev_timestamp
)
ORDER BY rev_timestamp DESC;

This query found around 260 potentially missing rows in August (so far). Out of the ~13M revisions made in August so far, that is a miss rate of ~0.002% (1 in 50,000).
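The rate itself can be computed in one pass; a sketch using the same join conditions (omitting the redirect/restore exclusions for brevity):

```sql
-- Miss rate: revisions with no matching RC entry over all revisions since August.
SELECT SUM(rc_id IS NULL) AS missing,
       COUNT(*) AS total,
       100.0 * SUM(rc_id IS NULL) / COUNT(*) AS pct_missing
FROM revision
LEFT JOIN recentchanges ON rc_this_oldid = rev_id AND rc_type NOT IN (5, 6)
WHERE rev_timestamp > '202408';
```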

On further examination, I found the results are not independent.
Omissions are often concentrated in a short time period in which only one or a few users are involved:

+----------------+------------+---------+----------------+----+------------------------------------------------+
| rev_timestamp  | rev_id     | rev_len | actor_name     | ns | page_title                                     |
+----------------+------------+---------+----------------+----+------------------------------------------------+

[skipped some rows]

| 20240813185314 | 2226732447 |   95129 | 白布飘扬       |  0 | Q87533546                                      |
[50 minutes]
| 20240813180402 | 2226710252 |    1799 | Salino01       |  0 | Q85409163                                      |
| 20240813180401 | 2226710248 |   28774 | Rorschach      |  0 | Q1656514                                       |
| 20240813180348 | 2226710214 |   10066 | APTEM          |  0 | Q4131516                                       |
| 20240813180334 | 2226710188 |   26684 | Rorschach      |  0 | Q16188999                                      |
| 20240813180308 | 2226710126 |   31918 | Rorschach      |  0 | Q16010080                                      |
| 20240813180256 | 2226710097 |   29204 | Rorschach      |  0 | Q15480197                                      |
| 20240813180240 | 2226710053 |   26362 | Rorschach      |  0 | Q15427379                                      |
| 20240813180232 | 2226710034 |   19589 | Rorschach      |  0 | Q1541966                                       |
| 20240813180225 | 2226710013 |   23382 | Rorschach      |  0 | Q15381100                                      |
| 20240813180216 | 2226709971 |   31088 | Rorschach      |  0 | Q15133089                                      |
| 20240813180215 | 2226709969 |    9828 | Charp238       |  0 | Q17299700                                      |
| 20240813180211 | 2226709924 |   28115 | Rorschach      |  0 | Q15087787                                      |
| 20240813180200 | 2226709832 |   25712 | Rorschach      |  0 | Q1501373                                       |
| 20240813180158 | 2226709814 |    2681 | Beavercount    |  0 | Q126905503                                     |
| 20240813180149 | 2226709731 |   18871 | Rorschach      |  0 | Q14951435                                      |
| 20240813180142 | 2226709681 |    7184 | Charp238       |  0 | Q28105759                                      |
| 20240813180136 | 2226709618 |   24419 | Rorschach      |  0 | Q14933734                                      |
| 20240813180133 | 2226709588 |   24396 | Rorschach      |  0 | Q14861560                                      |
| 20240813180124 | 2226709509 |   32251 | Rorschach      |  0 | Q11866758                                      |
| 20240813180122 | 2226709488 |    3099 | Lahsim Niasoh  |  0 | Q128605504                                     |
| 20240813180122 | 2226709481 |   47182 | Rorschach      |  0 | Q1174759                                       |
| 20240813180111 | 2226709375 |   11205 | Rorschach      |  0 | Q115031609                                     |
| 20240813180057 | 2226709226 |    8290 | Rorschach      |  0 | Q111694350                                     |
| 20240813180055 | 2226709204 |    2678 | Lahsim Niasoh  |  0 | Q128605504                                     |
| 20240813180051 | 2226709160 |   15699 | Rorschach      |  0 | Q111472806                                     |
| 20240813180047 | 2226709113 |    9530 | Rorschach      |  0 | Q107406126                                     |
| 20240813180043 | 2226709077 |   11129 | Rorschach      |  0 | Q107042458                                     |
| 20240813180040 | 2226709036 |   61747 | Nikolay Omonov |  0 | Q657866                                        |
| 20240813180035 | 2226708999 |   20214 | Rorschach      |  0 | Q106231436                                     |
| 20240813180021 | 2226708908 |    4695 | Charp238       |  0 | Q56054989                                      |
| 20240813180015 | 2226708860 |   13639 | Rorschach      |  0 | Q102259740                                     |
| 20240813180005 | 2226708782 |    2240 | Lahsim Niasoh  |  0 | Q128605504                                     |
| 20240813180001 | 2226708753 |    8522 | AlbertRA       |  0 | Q123167118                                     |
[4 hours]
| 20240813133000 | 2226552340 | 2083948 | KrBot2         |  4 | Database_reports/Constraint_violations/Summary |

[skipped some rows]

| 20240801144340 | 2218147760 |    9769 | DaxServer      |  0 | Q128234266                                     |
[1 hour]
| 20240801134400 | 2218107347 |   10426 | DaxServer      |  0 | Q128230709                                     |
| 20240801134358 | 2218107329 |   20828 | DaxServer      |  0 | Q128230707                                     |
| 20240801134355 | 2218107297 |    9241 | DaxServer      |  0 | Q128230706                                     |
| 20240801134354 | 2218107285 |   12293 | DaxServer      |  0 | Q128230705                                     |
| 20240801134331 | 2218107106 |   20845 | DaxServer      |  0 | Q128230691                                     |
| 20240801134325 | 2218107032 |    9215 | DaxServer      |  0 | Q128230687                                     |
| 20240801134307 | 2218106926 |    7917 | DaxServer      |  0 | Q128230681                                     |
| 20240801134304 | 2218106891 |   20720 | DaxServer      |  0 | Q128230679                                     |
| 20240801134233 | 2218106641 |   15427 | DaxServer      |  0 | Q128230660                                     |
| 20240801134233 | 2218106630 |    9289 | DaxServer      |  0 | Q128230659                                     |
| 20240801134220 | 2218106523 |   11148 | DaxServer      |  0 | Q128230647                                     |
| 20240801134203 | 2218106389 |   11433 | DaxServer      |  0 | Q128230635                                     |
| 20240801134201 | 2218106377 |   12138 | DaxServer      |  0 | Q128230633                                     |
| 20240801134151 | 2218106263 |   50641 | ԱշբոտՏՆՂ       |  0 | Q44560503                                      |
| 20240801134033 | 2218105215 |    9264 | DaxServer      |  0 | Q128230541                                     |
[beginning of August]

These two "streaks" of missing rows lasted less than five minutes.
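Such streaks can be found automatically by starting a new group whenever the gap to the previous miss exceeds a few minutes. A sketch using MySQL 8+ window functions, where `missing_revs` is a hypothetical name standing for the result of the query above:

```sql
-- Flag the start of a new streak when the gap to the previous miss exceeds 5 minutes.
WITH ordered AS (
  SELECT rev_timestamp,
         LAG(rev_timestamp) OVER (ORDER BY rev_timestamp) AS prev_ts
  FROM missing_revs
)
SELECT rev_timestamp,
       (prev_ts IS NULL
        OR TIMESTAMPDIFF(MINUTE,
             STR_TO_DATE(prev_ts, '%Y%m%d%H%i%s'),
             STR_TO_DATE(rev_timestamp, '%Y%m%d%H%i%s')) > 5) AS new_streak
FROM ordered;
```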

Another clue could be recurring misses of KrBot2's edits:

| 20240812152154 | 2225820074 |   11486 | DaxServer      |  0 | Q128920297                                     |
| 20240812132830 | 2225762637 | 2083911 | KrBot2         |  4 | Database_reports/Constraint_violations/Summary |
| 20240812091427 | 2225619651 |  582812 | KrBot2         |  4 | Database_reports/Constraint_violations/P1454   |
| 20240812091355 | 2225619112 |   12429 | MatSuBot       |  0 | Q18390189                                      |
| 20240811125916 | 2224923984 | 2083907 | KrBot2         |  4 | Database_reports/Constraint_violations/Summary |
| 20240810113741 | 2224358566 | 2083901 | KrBot2         |  4 | Database_reports/Constraint_violations/Summary |
| 20240809124133 | 2223776710 | 2082868 | KrBot2         |  4 | Database_reports/Constraint_violations/Summary |
| 20240808140657 | 2223318053 |   11022 | DaxServer      |  0 | Q128777309                                     |
| 20240808130754 | 2223290743 | 2082842 | KrBot2         |  4 | Database_reports/Constraint_violations/Summary |
| 20240808130410 | 2223289240 | 1424439 | KrBot2         |  4 | Database_reports/Constraint_violations/P17     |
| 20240808124532 | 2223280098 |    8936 | DaxServer      |  0 | Q128772068                                     |
| 20240808121008 | 2223265130 |   10284 | DaxServer      |  0 | Q128769804                                     |
| 20240808103856 | 2223223738 |   10331 | DaxServer      |  0 | Q128763685                                     |
| 20240808103614 | 2223222925 |  724193 | KrBot2         |  4 | Database_reports/Constraint_violations/P735    |
| 20240808103359 | 2223221740 |   11202 | DaxServer      |  0 | Q128763377                                     |
| 20240808095223 | 2223205980 |  191144 | KrBot2         |  4 | Database_reports/Constraint_violations/P1329   |
| 20240808095158 | 2223205804 |   20337 | KrBot2         |  4 | Database_reports/Constraint_violations/P1336   |
| 20240808095050 | 2223205344 |   11330 | DaxServer      |  0 | Q128760525                                     |
| 20240808093037 | 2223198277 |   17713 | Mr.Ibrahembot  |  0 | Q128759215                                     |
| 20240808092917 | 2223197945 |   57540 | KrBot2         |  4 | Database_reports/Constraint_violations/P1547   |
| 20240808071031 | 2223144177 |   51584 | KrBot2         |  4 | Database_reports/Constraint_violations/P5407   |
| 20240807153433 | 2222701694 | 2082855 | KrBot2         |  4 | Database_reports/Constraint_violations/Summary |
| 20240807152710 | 2222699151 | 1423804 | KrBot2         |  4 | Database_reports/Constraint_violations/P17     |
| 20240807114548 | 2222599713 |  497070 | KrBot2         |  4 | Database_reports/Constraint_violations/P1308   |
| 20240807114513 | 2222599425 |  218601 | KrBot2         |  4 | Database_reports/Constraint_violations/P1315   |

The bot updates various report pages, some of which are very large (Wikidata:Database_reports/Constraint_violations/Summary has almost reached the limit of 2MB) and known to time out when you attempt to load or edit them (T357792). Also, it is suspicious that some other rows are skipped when the bot is active.

Maybe the bot's activity puts too much strain on the servers, resulting in interrupts or timeouts? Maybe this is generally caused by some "timeouts" on the server side?

Unrelated: The 2MB limit for page size doesn't mean it's okay to have 1.9MB pages. That's the hard limit, but it's much, much better to keep page size as low as possible (fragmented parsing will eventually help, but we are not there yet, and even if we were, it would help in one aspect, not all). Basically, anything larger than 0.5MB is better split unless you really have to keep it large.

Interestingly, occurrences of missing-revision windows seem to be correlated between wikis. Based on @matej_suchanek's query, I've checked the bigger wikis from s2 (plwiki, itwiki, nlwiki, svwiki), and it turns out that often (especially during the severe windows), when there are missing RC entries on one wiki, there are also missing ones on other wikis within a few minutes.

My results are available here: https://docs.google.com/spreadsheets/d/1_ZCv6ARiN061ywru-NedJo3w_EU_ZyCA-JMTVPdRmi0/edit?usp=sharing (alternating colors mark particular windows of missing RC entries).

I've also browsed through enwiki data, and it suggests that the times may align as well, but due to the number of rows for enwiki (1.3k), I didn't include them in the spreadsheet or run a full alignment.

The query I used is available here: https://quarry.wmcloud.org/query/85618 (it's basically the same as Matej's, but it excludes some system accounts that may legitimately not be included in RC – or are at least out of scope for this task).

I re-ran the queries (note: they should also exclude imported revisions), and this still seems to be an issue: enwiki – one in 4,000 edits; wikidata – one in 140,000 edits. On Wikidata, edits to large reports by KrBot2 and ListeriaBot make up the majority of the missing recent changes.
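The import exclusion could be expressed in the same style as the restore check in the earlier query; a sketch of an extra predicate, assuming page imports appear in `logging` with `log_type = 'import'`:

```sql
AND NOT EXISTS (
  /* imported revisions are not inserted into RC */
  SELECT 1 FROM logging_logindex
  WHERE log_type = 'import'
  AND log_namespace = page_namespace AND log_title = page_title
  AND log_timestamp > rev_timestamp
)
```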

There is also a probably duplicate report T373805, with one particular remark:

I clicked on "Publish" and I got an error "HTTP server returned 502".

T279018 describes likely the same problem in 2021.

I was wondering if someone was interested in making a dashboard in the new Superset...


T407651: IPs can't be revealed for some temporary accounts on itwiki may be another report? However the revision also isn't available to cu_changes which is a larger scope than this ticket mentions.

Just to report that this happened again with https://it.wikipedia.org/wiki/Speciale:Contributi/~2025-36382-71.

https://de.wikipedia.org/wiki/Spezial:Beiträge/~2025-42582-52 might be related – IP is unavailable for the second revision (and therefore no IP info as well)

Note the header saying "A user with 1 edit. Account created on 2025-12-23." even though there are clearly two edits. The IP info tool correctly shows 2 local edits but a global edit count of 1.
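The stale count could be cross-checked against the revision table directly; a hedged sketch comparing the cached `user_editcount` with the actual number of live revisions (the two can legitimately differ for deleted revisions, which live in `archive`):

```sql
-- Compare the cached edit count with the actual number of live revisions.
SELECT user_name, user_editcount, COUNT(rev_id) AS actual_edits
FROM user
JOIN actor ON actor_user = user_id
LEFT JOIN revision ON rev_actor = actor_id
WHERE user_name = '~2025-42582-52'
GROUP BY user_id, user_name, user_editcount;
```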

Screenshot 2025-12-28 at 13.27.54.png (1×1 px, 377 KB)

AntiCompositeNumber triaged this task as Unbreak Now! priority.Dec 29 2025, 6:56 PM

Production data loss -> UBN.

Novem_Linguae renamed this task from English Wikipedia recentchanges table missing some revisions to Recentchanges table is occasionally missing revisions on multiple wikis.Dec 29 2025, 6:58 PM
AntiCompositeNumber renamed this task from Recentchanges table is occasionally missing revisions on multiple wikis to Recentchanges and cu_changes tables are occasionally missing revisions on multiple wikis.Dec 29 2025, 6:59 PM

Something has gone really wrong. It seems we have been missing a lot of edits:

mysql:research@dbstore1008.eqiad.wmnet [enwiki]> select count(*) from revision left join recentchanges on rev_id = rc_this_oldid and rc_source = 'mw.edit' where rev_timestamp like '202512%' and rc_id is null;
+----------+
| count(*) |
+----------+
|   297489 |
+----------+
1 row in set (57.068 sec)

mysql:research@dbstore1008.eqiad.wmnet [enwiki]> select count(*) from revision left join recentchanges on rev_id = rc_this_oldid and rc_source = 'mw.edit' where rev_timestamp like '202512%';
+----------+
| count(*) |
+----------+
|  5193756 |
+----------+
1 row in set (22.973 sec)

Checking one of the ids, I get a warning in ParserCache about an inconsistent ID: https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2025.12.01?id=MnwE2poBVE0pYbVvgfJu – I can't say what it even means.

I'm not sure if it's related to the issue. I've checked enwiki revisions 1330322336, 1330322337, 1330322339, 1330322340, which generated the "Inconsistent revision ID" warning, but they are present in the recentchanges and cu_changes tables. (nb. ...38 didn't raise this warning – suspicious, isn't it? ;p)

A couple more of the rev ids that are not in RC table:
https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2025.12.25?id=WJrOUpsBlLUf3R62hUIr
https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2025.12.25?id=uWHOUpsBD5ntDmnmrW3k
https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2025.12.25?id=4v3PUpsB9tkJMVA5t3dv

Basically no exceptions; all of them trigger the inconsistent ID warning in PC. That could be a symptom, though. Maybe PC reads the revision from somewhere and, since it's missing, we are not seeing it.

Here is a list of a couple of ids from 25 December:

+------------+
| rev_id     |
+------------+
| 1329280169 |
| 1329280188 |
| 1329280199 |
| 1329280220 |
| 1329280221 |
| 1329280229 |
| 1329280241 |
| 1329280250 |
| 1329280258 |
| 1329280272 |
| 1329280286 |
| 1329280289 |
| 1329280296 |
| 1329280301 |
| 1329280307 |
| 1329280322 |
| 1329280328 |
| 1329280338 |
| 1329280350 |
| 1329280355 |
| 1329280360 |
| 1329280364 |
| 1329280372 |
| 1329280382 |
| 1329280385 |
| 1329280387 |
| 1329280388 |
| 1329280394 |
| 1329280398 |
| 1329280405 |
| 1329280412 |
| 1329280413 |
| 1329280417 |
| 1329280418 |
| 1329280419 |
| 1329280425 |
| 1329280435 |
| 1329280442 |
| 1329280453 |
| 1329280459 |
| 1329280472 |
| 1329280479 |
| 1329280485 |
| 1329280488 |
| 1329280495 |
| 1329280497 |
| 1329280503 |
| 1329280504 |
| 1329280509 |
| 1329280518 |
+------------+
[enwiki]> select count(*) from revision left join recentchanges on rev_id = rc_this_oldid and rc_source = 'mw.edit' where rev_timestamp like '202512%' and rc_id is null;
+----------+
| count(*) |
+----------+
|   297489 |
+----------+

[enwiki]> select count(*) from revision left join recentchanges on rev_id = rc_this_oldid and rc_source = 'mw.edit' where rev_timestamp like '202512%';
+----------+
| count(*) |
+----------+
|  5193756 |
+----------+

This implies a loss of 5%. Looking at nlwiki for comparison, I see a similar gap of 5%:

(nlwiki)> SELECT COUNT(*) FROM revision LEFT JOIN recentchanges ON rev_id = rc_this_oldid AND rc_source = 'mw.edit' WHERE rev_timestamp LIKE '202512%';
+----------+
| COUNT(*) |
+----------+
|   171965 |
+----------+

(nlwiki)> SELECT COUNT(*) FROM revision LEFT JOIN recentchanges ON rev_id = rc_this_oldid AND rc_source = 'mw.edit' WHERE rev_timestamp LIKE '202512%' AND rc_id IS NULL;

+----------+
| COUNT(*) |
+----------+
|     9934 |
+----------+

Assuming that this is not a completely random and evenly distributed timeout or other race condition, there are probably some common factors.

Looking at the date: https://quarry.wmcloud.org/query/100476, it looks evenly distributed across each day from December 1 to December 30.

Looking at namespace: https://phabricator.wikimedia.org/P86760, focussing on those with more than 100 failures per day, we're starting to see some hot spots that might point at a cause.

  • Main namespace: 5% loss
  • Talk: 20% loss
  • User: 4% loss
  • User_talk: 25% loss
  • Project: 2% loss
  • Template: 10% loss
  • Category: 42% loss

Let's look at another wiki, to see if the distribution is similar there. The above is on nlwiki, let's look at enwiktionary. https://phabricator.wikimedia.org/P86761

  • Main: 8% loss
  • Talk: 42% loss
  • User: 13% loss
  • Template: 8% loss
  • Category: 99% loss

So, what's special about Talk, User_talk, and Category? I suppose they're more heavily edited by logged-in users and/or bots.

Looking at user type (join via actor), I see some bias but nothing obvious. But this inspired me to look at the kind of edit (page creation, vs regular edit).
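The user-type breakdown can be sketched by grouping misses on whether the actor is in the `bot` group (same window and rc_source join as the queries above; this is an illustration, not the exact query I ran):

```sql
-- Missing-RC breakdown by whether the actor is in the bot group.
SELECT (ug_group IS NOT NULL) AS is_bot,
       SUM(rc_id IS NULL) AS missing,
       COUNT(*) AS total
FROM revision
JOIN actor ON actor_id = rev_actor
LEFT JOIN user_groups ON ug_user = actor_user AND ug_group = 'bot'
LEFT JOIN recentchanges ON rc_this_oldid = rev_id AND rc_source = 'mw.edit'
WHERE rev_timestamp LIKE '202512%'
GROUP BY is_bot;
```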

Looking at rev_parent_id (https://www.mediawiki.org/wiki/Manual:Revision_table):

enwiktionary
SELECT count(*) FROM revision LEFT JOIN recentchanges ON rev_id = rc_this_oldid AND rc_source = 'mw.edit' LEFT JOIN page ON page_id = rev_page WHERE rev_timestamp LIKE '202512%';
-- Result: 639,237

SELECT count(*) FROM revision LEFT JOIN recentchanges ON rev_id = rc_this_oldid AND rc_source = 'mw.edit' LEFT JOIN page ON page_id = rev_page WHERE rev_timestamp LIKE '202512%' AND rc_id IS NULL;
-- Result: 69,971 = 10.9%

SELECT count(*) FROM revision LEFT JOIN recentchanges ON rev_id = rc_this_oldid AND rc_source = 'mw.edit' LEFT JOIN page ON page_id = rev_page WHERE rev_timestamp LIKE '202512%' AND rev_parent_id=0;
-- Result: 64,735

SELECT count(*) FROM revision LEFT JOIN recentchanges ON rev_id = rc_this_oldid AND rc_source = 'mw.edit' LEFT JOIN page ON page_id = rev_page WHERE rev_timestamp LIKE '202512%' AND rev_parent_id=0 AND rc_id IS NULL;
-- Result: 64,736 = ~100%

A 100% correlation? Depending on when you run the queries, it may be off by a few in either direction since pages are being created all the time and these run without a transaction (no repeatable-read snapshot).
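To make the counts mutually consistent, one could run them inside a single REPEATABLE READ transaction so both statements see the same snapshot; a sketch:

```sql
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SELECT COUNT(*) FROM revision
WHERE rev_timestamp LIKE '202512%' AND rev_parent_id = 0;
SELECT COUNT(*) FROM revision
LEFT JOIN recentchanges ON rev_id = rc_this_oldid AND rc_source = 'mw.edit'
WHERE rev_timestamp LIKE '202512%' AND rev_parent_id = 0 AND rc_id IS NULL;
COMMIT;
```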

Anyway, this was all a distraction because page creations are recorded with rc_source = 'mw.new' instead of rc_source = 'mw.edit', so much of the apparent loss was just doing the recentchanges join incorrectly and not seeing the revisions that are in fact there. Including mw.new:

enwiki
SELECT COUNT(*) FROM revision LEFT JOIN recentchanges ON rev_id = rc_this_oldid AND (rc_source = 'mw.edit' OR rc_source = 'mw.new') LEFT JOIN page ON page_id = rev_page WHERE rev_timestamp LIKE '202512%';
-- Result: 5,295,851

SELECT COUNT(*) FROM revision LEFT JOIN recentchanges ON rev_id = rc_this_oldid AND (rc_source = 'mw.edit' OR rc_source = 'mw.new') LEFT JOIN page ON page_id = rev_page WHERE rev_timestamp LIKE '202512%' AND rc_id IS NULL;
-- Result: 152,168 = 2.8%

So this is not a 5% problem, but a 2.8% problem. Still huge. Let's look at namespaces, again. https://phabricator.wikimedia.org/P86763

  • nlwiki: Main 1% gap, Talk 12% gap, User 1% gap, User talk 1% gap, Project 0.6% gap, Template 4% gap, Category 8% gap
  • enwiktionary: Main 0.5% gap, User 2% gap, User_talk 3% gap, Template 4% gap, Category 6% gap
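The per-namespace numbers can come from a single grouped query; a sketch in the style of the queries above:

```sql
-- Per-namespace gap, counting both edit and page-creation RC sources.
SELECT page_namespace,
       SUM(rc_id IS NULL) AS missing,
       COUNT(*) AS total,
       ROUND(100.0 * SUM(rc_id IS NULL) / COUNT(*), 1) AS gap_pct
FROM revision
JOIN page ON page_id = rev_page
LEFT JOIN recentchanges ON rev_id = rc_this_oldid
  AND rc_source IN ('mw.edit', 'mw.new')
WHERE rev_timestamp LIKE '202512%'
GROUP BY page_namespace
ORDER BY gap_pct DESC;
```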

This open task has been marked as UBN! priority with no updates for three weeks. Should its priority be lowered? If not, who is supposed to tackle this? Thanks.