User Details
- User Since
- Jun 9 2022, 6:42 PM (200 w, 2 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- XCollazo-WMF
Fri, Apr 10
I think we implemented this via T373693. @KCVelaga_WMF can you confirm?
Reopening and tagging Data-Engineering since we do still need to make the pipeline changes that @JAllemandou suggests:
Thu, Apr 9
Wed, Apr 8
@SNowick_WMF and @phuedx:
We investigated two angles: the top offenders by file count and size (from the ticket description), and all schemas previously removed from the sanitization allowlist (identified via git log). This is not an exhaustive scan of the full namespace — there may be additional datasets worth reviewing. Findings are grouped by suggested action.
Tue, Apr 7
Mon, Apr 6
Boldly closing this as out of date. Other tickets like T419055 have better context.
CC @BTullis
I concur that this looks like a bug.
OK, confirming that this particular issue seems to be solved now. Note how overall inconsistencies are way down:
I confirm that the dump is being produced and exists on phab1004.
Fri, Mar 27
I am now of the opinion that we should just sunset these metrics:
Noting that this has only happened once so far.
- T420787 — MWCH Data Quality Dashboard: Summary
Wed, Mar 25
Note further that there seem to be other streams with ValidationErrors over the last 30 days, but I did not dig deeper than the following query:
More serious EventBus error rate: T421257: EventBus: Unable to deliver all events: 503: Service Unavailable.
One (redacted) example from logstash:
Minor bug on EventGate found: T421237: `mediawiki.page_change.v1`: two schema validation errors causing events to be silently dropped by EventGate.
Tue, Mar 24
(We are now waiting on the next monthly reconcile to happen to see where we are at.)
Mon, Mar 23
The option of telling the few consumers of the rsync to change the hostname/IP they're using to mirror us (to separate it from HTTP via the CDN) would be far simpler and avoid the need for such a proxy.
Fri, Mar 20
- Make local development compatible with git worktrees
File export successful. All files now exposed publicly at https://dumps.wikimedia.org/other/mediawiki_content_history/enwiki/2026-03-01/xml/bzip2/.
Thu, Mar 19
spark_process_reconciliation_events ingest for 2026-03-16 failed multiple times. I suspect this happened due to the same cluster issues reported on T419291#11723404: T420168 and T415002.
For completeness, a link to the dashboard that was fixed: https://superset.wikimedia.org/superset/dashboard/409
Opened T420582: Migrate Airflow Search instance code away from deprecated VariableProperties to tackle that last bit of VariableProperties dependency.
All usage of VariableProperties from within Airflow DAGs has now been migrated to DagProperties.
Wed, Mar 18
My way of dealing with that would be to change the puppet code to use 32 mappers.
Failed again. There was cluster instability due to T420168 and T415002 so retrying as is.
Tue, Mar 17
Ran the following as a one off:
$ hostname -f
an-launcher1003.eqiad.wmnet
$ whoami
analytics
This continues to be an issue:
Context: this dashboard was put together as part of T381707: Low available space on Hadoop / HDFS.
Mon, Mar 16
Being bold and reverting changes to mw-page-content-change-enrich to avoid inadvertently repeating T408918#11715866.