In T411803 (Fix reconcile bug where user_id is not being populated correctly), we introduced reconciliation for rows flagged with mismatch_user_id and mismatch_user_text.
Months later, we would expect to have reconciled the vast majority of such rows. However, looking at the last three monthly reconcile runs, we find the following:
spark.sql("""
SELECT count(1), computation_dt
FROM wmf_content.inconsistent_rows_of_mediawiki_content_history_v1
WHERE computation_class = 'all-of-wiki-time'
GROUP BY computation_dt
ORDER BY computation_dt DESC
""").show()
+---------+-------------------+
| count(1)| computation_dt|
+---------+-------------------+
|179893699|2026-03-01 00:00:00|
|245030841|2026-02-01 00:00:00|
|361884464|2026-01-01 00:00:00| <<< start of changes from T411803
| 2882708|2025-12-01 00:00:00|
| 1134547|2025-11-01 00:00:00|
| 1087936|2025-10-01 00:00:00|
| 2326440|2025-09-01 00:00:00|
| 3072885|2025-08-01 00:00:00|
| 2873142|2025-07-01 00:00:00|
| 2116596|2025-06-01 00:00:00|
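+---------+-------------------+

Starting with the 2026-01-01 run, the first one after the T411803 changes, the number of inconsistent rows jumps by roughly two orders of magnitude (from ~2.9M to ~362M). It decreases in the following months but remains far above the pre-T411803 level (~180M in 2026-03 vs. 1 to 3M per month before).
Let's investigate why this is the case and whether we are dropping reconcile events.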
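To put numbers on the jump, here is a minimal sketch in plain Python that computes month-over-month ratios and compares each run to the pre-T411803 average. It only uses the counts copied from the query output above; the helper name `month_over_month` and the choice of baseline (mean of the 2025-06 to 2025-12 runs) are illustrative assumptions, not part of the reconcile pipeline.

```python
# Monthly counts of inconsistent rows (computation_class = 'all-of-wiki-time'),
# copied by hand from the query output above.
counts = {
    "2025-06": 2116596,
    "2025-07": 2873142,
    "2025-08": 3072885,
    "2025-09": 2326440,
    "2025-10": 1087936,
    "2025-11": 1134547,
    "2025-12": 2882708,
    "2026-01": 361884464,  # first run after the T411803 changes
    "2026-02": 245030841,
    "2026-03": 179893699,
}


def month_over_month(counts):
    """Return (month, count / previous month's count) pairs in chronological order."""
    months = sorted(counts)  # "YYYY-MM" strings sort chronologically
    return [(m, counts[m] / counts[prev]) for prev, m in zip(months, months[1:])]


# Hypothetical baseline: mean of all runs before the T411803 changes landed.
baseline_months = [m for m in counts if m < "2026-01"]
baseline = sum(counts[m] for m in baseline_months) / len(baseline_months)

for month, ratio in month_over_month(counts):
    print(f"{month}: {counts[month]:>11,} rows, "
          f"{ratio:7.2f}x previous month, {counts[month] / baseline:6.1f}x baseline")
```

The ratios make the shape of the problem explicit: a ~125x step in 2026-01, followed by a decay of only ~25 to 35% per month, which is much slower than we would expect if most reconcile events were being applied.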