User Details
- User Since
- Nov 2 2020, 1:15 PM (181 w, 5 d)
- Availability
- Available
- IRC Nick
- gmodena
- LDAP User
- Gmodena
- MediaWiki User
- GModena (WMF)
Thu, Apr 25
The haproxy_id field has been added to messages.
Fri, Apr 19
Thu, Apr 18
About the sequence issue, that's the most plausible hypothesis. We could append (or prepend) other pieces of information to the sequence number (such as the HAProxy process id) to avoid duplicates, but in that case we couldn't guarantee a monotonic increase (or any increase at all). I suggest keeping the current approach for the moment and possibly reworking it later.
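A minimal sketch of the trade-off described above. The bit layout and pid values are illustrative only, not what HAProxy actually emits: combining a worker pid with a per-worker counter yields unique values, but interleaved workers break global ordering.

```python
# Illustrative sketch: pack a (pid, per-process sequence) pair into one
# integer. Values are unique across workers, but the resulting stream is
# no longer monotonically increasing once workers interleave.

def combined_sequence(pid, seq, seq_bits=48):
    """Place the pid in the high bits and the counter in the low bits."""
    return (pid << seq_bits) | seq

# Two hypothetical workers, each with its own monotonic counter:
events = [(101, 1), (102, 1), (101, 2), (102, 2)]  # (pid, per-process seq)
values = [combined_sequence(pid, seq) for pid, seq in events]

unique = len(set(values)) == len(values)                     # no duplicates
monotonic = all(a < b for a, b in zip(values, values[1:]))   # ordering lost
```

Here `unique` is True while `monotonic` is False, which is exactly the caveat above: deduplication is solved, ordering is not.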
Wed, Apr 17
Fri, Apr 5
Thu, Apr 4
Mar 27 2024
Mar 26 2024
Mar 25 2024
I need to add a wrapper to the Alert generation SerDe
Mar 22 2024
These are the fields that are sent from Benthos that aren't present in the current webrequest stream:
Mar 21 2024
Mar 19 2024
For context: this is the approach we follow with other producers, e.g. Java.
Mar 8 2024
Mar 7 2024
We can integrate our DQ framework with Python by piggybacking on PySpark's py4j gateway. Below is a rudimentary example that produces metrics in the data_quality_metrics table format:
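The example announced above did not survive in this excerpt; the sketch below reconstructs its general shape under stated assumptions. The column names in `to_metric_row` and the `MetricsWriter` JVM class path are hypothetical placeholders, not the real DQ framework API; only `spark.sparkContext._jvm` (py4j's standard entry point into the driver JVM) is taken as given.

```python
# Sketch (assumptions flagged): reaching a JVM-side DQ metrics writer
# from PySpark through the py4j gateway that PySpark already maintains.

def to_metric_row(dataset, metric, value, partition_ts):
    """Build a row in an assumed data_quality_metrics layout.

    Column names here are illustrative, not the real table schema.
    """
    return {
        "dataset": dataset,
        "metric": metric,
        "value": float(value),
        "partition_ts": partition_ts,
    }

def emit_metrics(spark, rows):
    # spark.sparkContext._jvm is py4j's entry point into the driver JVM;
    # any class on the driver classpath is reachable through it.
    jvm = spark.sparkContext._jvm
    # Hypothetical JVM-side writer; the real class path would come from
    # the DQ framework jar shipped with --jars.
    writer = jvm.org.wikimedia.analytics.dataquality.MetricsWriter
    for row in rows:
        writer.write(row["dataset"], row["metric"],
                     row["value"], row["partition_ts"])

if __name__ == "__main__":
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("dq-metrics-sketch").getOrCreate()
    rows = [to_metric_row("webrequest", "row_count", 12345,
                          "2024-03-07T00:00:00Z")]
    emit_metrics(spark, rows)
```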
Mar 6 2024
Mar 5 2024
Mar 4 2024
@Ottomata @Jdforrester-WMF there's a caveat wrt using collectDefaultMetrics: the method call does not allow setting custom labels. If I understand the docs correctly, we can still define them at the registry level, but that would clash with the current implementation. I'm not keen on refactoring the current behaviour given the state of the codebase, so I'd lean towards avoiding custom labels if possible. Are they used at all?
I spent some time learning this codebase, and am touching base to validate the direction. If this makes sense, I'll open a PR. My proposal would be to add a new collect_default option to the prometheus metrics option block.
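As a sketch only, the proposed option might look like the following in a service-runner config. collect_default is the new, not-yet-merged option name proposed above, and the surrounding keys follow the general shape of service-runner's metrics configuration rather than any verified file:

```yaml
# Hypothetical sketch: collect_default is the proposed (not merged) option.
metrics:
  - type: prometheus
    port: 9090
    collect_default: true   # would enable prom-client's collectDefaultMetrics()
```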
Feb 28 2024
FYI: this is a list of metrics reported with a local run:
I was on PTO last week and trying to piece together what happened and how the UBN was mitigated.
Update: We've got access back, and v4.0.0 is finally released. Still happy to break whatever we need to, and help people migrate.
Feb 27 2024
@Jdforrester-WMF FWIW I saw you started deprecation work in https://github.com/wikimedia/service-runner/pull/249/files.
I am taking a stab at this task, because we need gc and memory info to help track T357005: eventstreams regularly uses more than 95% of its memory limit.
Feb 26 2024
Feb 15 2024
Spoke a bit about this with @xcollazo.
Open question: do we want webrequest.frontend (or whatever we settle on) to be a versioned stream? https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration#Stream_versioning
The currently suggested one is webrequest.frontend. @gmodena, the idea there is to group all webrequest topics into the same stream, by setting topics manually in stream config. Gobblin will ingest the topics configured in stream config.
Data has been deleted from HDFS. It will be quarantined in hdfs://analytics-hadoop/user/hdfs/.Trash/Current/wmf/data/event_sanitized for a period longer than the one-week grace period required by this task.
Feb 14 2024
May I proceed with deleting the tables from the Hive metastore for the impacted datasets?
Feb 13 2024
@Fabfur and I would like to start some integration tests in the short term. I moved the webrequest schema from GA to development in the primary repo. This follows the same process we adopted with page_change, and should allow for faster iteration speed without messing around with schema versions.
Feb 12 2024
TBD on final stream name in T314956: [Event Platform] Declare webrequest as an Event Platform stream, but the currently suggested one is webrequest.frontend
Feb 9 2024
Both approaches are feasible (even simultaneously, if we accept increasing the payload a little)...
@Fabfur here is example payload with added meta, as we'd expect to receive according to the WIP webrequest event schema.
{
  "meta": {
    dt: "2023-11-23T16:04:17Z",   # value set by Benthos
    stream: "webrequest_text",    # value set by Benthos
    domain: "en.wikipedia.org",   # can we get this from HAProxy?
    request_id: request-uuid      # can we get this from HAProxy?
    id: "event-uuid"              # value set by Benthos?
  },
  "accept": "application/json; charset=utf-8; profile=\"https://www.mediawiki.org/wiki/Specs/Summary/1.2.0\"",
  "accept_language": "en",
  "backend": "ATS/9.1.4",
  "cache_status": "hit-front",
  "content_type": "application/json; charset=utf-8; profile=\"https://www.mediawiki.org/wiki/Specs/Summary/1.5.0\"",
  "dt": "2023-11-23T16:04:17Z",   # value recorded by HAProxy
  "hostname": "cp3067.esams.wmnet",
  "http_method": "GET",
  "http_status": "200",
  "ip": "<REDACTED>",
  "range": "-",
  "referer": "https://en.wikipedia.org/w/index.php?title=Category:Films_based_on_non-fiction_books&pagefrom=Power+Play+%281978+film%29%0APower+Play+%281978+film%29",
  "response_size": 987,
  "sequence": 10558502962,
  "time_firstbyte": 0.000201,
  "tls": "vers=TLSv1.3;keyx=UNKNOWN;auth=ECDSA;ciph=AES-256-GCM-SHA384;prot=h2;sess=new",
  "uri_host": "en.wikipedia.org",
  "uri_path": "/api/rest_v1/page/summary/Secretariat_(film)",
  "uri_query": "",
  "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0",
  "x_analytics": "WMF-Last-Access=23-Nov-2023;WMF-Last-Access-Global=23-Nov-2023;include_pv=0;https=1;client_port=33126",
  "x_cache": "cp3067 miss, cp3067 hit/5"
}
Feb 8 2024
@Fabfur nice!
tl;dr: our approach to address this spike is currently documented at https://wikitech.wikimedia.org/wiki/Data_Engineering/Data_Quality.
db and tables have been created:
spark-sql (default)> use wmf_data_ops;
Response code
Time taken: 2.698 seconds
spark-sql (default)> show tables;
database        tableName               isTemporary
wmf_data_ops    data_quality_alerts     false
wmf_data_ops    data_quality_metrics    false
Time taken: 0.569 seconds, Fetched 2 row(s)
Feb 6 2024
Feb 5 2024
Feb 1 2024
On prod (an-launcher1002, job submitted with user analytics) missing databases are not created.
Investigating.
...
- On dev environments (stat1005, job submitted with user gmodena) missing databases are created.
- On prod (an-launcher1002, job submitted with user analytics) missing databases are not created.