Client-side error logging should use Elastic Common Schema (ECS) fields when possible
Open, Low, Public

Assigned To

None

Authored By

	colewhite
	Nov 9 2020, 8:23 PM

Description

We met with @Ottomata, who brought the work on client-side error logging to our attention. It was suggested that this task be filed to serve as notice of upcoming changes to logstash that may affect this work.

As part of T234565, logstash will adopt Elastic Common Schema as the schema for log events. Client error logging should use ECS-defined fields when possible, both to reduce the likelihood of dropped fields and to ease the migration of the stream to the new schema once it is ratified.

Unfortunately, because of mapping conflicts with the current mapping configuration, ECS cannot be fully adopted until the legacy logstash cluster is decommissioned. Judging by jsonschema/mediawiki/client/error/1.2.0.yaml, only the url field appears to be affected.

Related Objects

Status	Assigned	Task
Open	None	T49145 Formally deprecate jQuery UI after we've stopped using jQuery UI in extensions and core
Open	None	T100270 Replace use of jQuery UI and MW UI with OOUI across all Wikimedia-deployed extensions and core
Open	None	T85394 Use OOUI suggestions/autocompletion components only (instead of jquery.suggestions, jquery.ui.autocomplete)
Open	None	T125725 [epic] Update autocomplete search box with metadata and remove and delete the old searchSuggest system
Open	None	T177251 Dead keys prevent autocomplete in search box
Resolved	ovasileva	T244392 [GOAL] Deploy the new Vue.js search experience
Resolved	ovasileva	T275200 Analyze results of A/B test for new search widget
Resolved	ovasileva	T249297 Deploy the new Vue.js search experience
Resolved	• jlinehan	T255585 [EPIC] Extend client-side error logging coverage to include English Wikipedia
Open	None	T281999 Metrics Platform Schema: Define & Model Event Level Fields
Open	None	T267602 Client-side error logging should use Elastic Common Schema (ECS) fields when possible

Event Timeline

colewhite created this task. Nov 9 2020, 8:23 PM

Restricted Application added a subscriber: Aklapper. Nov 9 2020, 8:23 PM

colewhite added a parent task: T255585: [EPIC] Extend client-side error logging coverage to include English Wikipedia. Nov 9 2020, 8:24 PM

colewhite updated the task description.

Ottomata added projects: Event-Platform, Product-Data-Infrastructure. Nov 9 2020, 9:13 PM

Ottomata added subscribers: jlinehan, Jdlrobson.

Restricted Application added a project: Analytics. Nov 9 2020, 9:13 PM

I think the http field might also be affected, and that one will be a bit trickier to reconcile.

Our Event Schema:
https://schema.wikimedia.org/repositories//primary/jsonschema/mediawiki/client/error/1.1.0

ECS http:
https://doc.wikimedia.org/ecs/#ecs-http

Ottomata added a subscriber: Mholloway. Nov 9 2020, 9:15 PM

Mholloway renamed this task from Client-side error logging should use ECS fields when possible to Client-side error logging should use Elastic Common Schema (ECS) fields when possible. Nov 16 2020, 4:27 PM

sdkim moved this task from Inbox to Task Backlog on the Product-Data-Infrastructure board. Nov 16 2020, 4:28 PM

sdkim moved this task from Task Backlog to Watching on the Product-Data-Infrastructure board.

fdans triaged this task as Medium priority. Nov 16 2020, 4:35 PM

fdans moved this task from Incoming to Event Platform on the Analytics board.

I think the http field might also be affected, and that one will be a bit trickier to reconcile.

Just talked with @colewhite in IRC.

We'll either need to

A. set up a logstash filter to transform our http object into the ECS http object and run that forever
or
B. Alter our common http object schema to match ECS's.

A. is easy to do now, but requires maintenance and special casing.

B. is hard to do and requires a lot of coordination. But we could do it slowly, one schema at a time, starting with the ones we want to import into logstash. We'd make a fragment/http/2.0.0, or maybe a fragment/ecs/http/1.0.0, and then include it in mediawiki/client/error. To do this we'd need to make eventgate-wikimedia aware of the new convention so that it sets the fields appropriately. Ungh, and if we hoped to eventually migrate ALL existent schemas to ECS's http, the Hive tables would have both http subschema fields (e.g. http.request_headers and http.request.headers), probably forever (unless we manually intervened).
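
For illustration only, here is a rough Python sketch of the kind of reshaping option A implies. It is not an actual logstash filter, and the function name to_ecs_http is hypothetical. It assumes the only difference to reconcile is the one named above (http.request_headers versus http.request.headers); the real set of conflicting fields may be larger.

```python
import copy


def to_ecs_http(event):
    """Return a copy of `event` with http.request_headers moved to
    http.request.headers. Hypothetical helper, not part of logstash
    or EventGate; other http subfields are left untouched."""
    out = copy.deepcopy(event)
    http = out.get("http")
    if not isinstance(http, dict):
        # Nothing to reshape if the event has no http object.
        return out
    headers = http.pop("request_headers", None)
    if headers is not None:
        # Assumes http.request, if already present, is an object.
        request = http.setdefault("request", {})
        # Do not clobber an ECS-style field that is already set.
        request.setdefault("headers", headers)
    return out


example = {"http": {"method": "GET", "request_headers": {"user-agent": "UA"}}}
reshaped = to_ecs_http(example)
```

The input event is not modified, so the original shape stays available to any consumer that still expects it.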

@jlinehan @Mholloway, thoughts?

I'm not sure what is best.

In T267602#6677656, @Ottomata wrote:

B. is hard to do and requires a lot of coordination. But we could do it slowly, one schema at a time, starting with the ones we want to import into logstash. We'd make a fragment/http/2.0.0, or maybe a fragment/ecs/http/1.0.0, and then include it in mediawiki/client/error. To do this we'd need to make eventgate-wikimedia aware of the new convention so that it sets the fields appropriately. Ungh, and if we hoped to eventually migrate ALL existent schemas to ECS's http, the Hive tables would have both http subschema fields (e.g. http.request_headers and http.request.headers), probably forever (unless we manually intervened).

What if we create an ECS-specific schema that has everything laid out exactly the way ECS would want it laid out? ECS from what I can tell is a one-schema-to-rule-them-all approach, so in *theory*, having one ECS schema would cover everything. We could then just have a client_error stream, which is using the ECS schema.

Are we planning to have a level of compatibility between events going into Logstash and into other back-ends?

Interesting idea! However, there are some Event Platform specifics that we'd need to handle, mainly meta.stream, meta.dt, $schema, http.client_ip (not in this schema) and http.request_headers['user-agent']. These are all touched by EventGate and/or the Hive ingestion pipeline.

We can't do much about $schema and meta.* fields, but we could potentially refactor all schemas to conform to ECS for http.* and also any other future conventions we might need to adopt.

Refactoring http.* would be a lot of work, but not toooooooo bad. We'd probably have to have EventGate and Hive ingestion support both formats for a very long time, and fill in e.g. both request_headers['user-agent'] and request.headers['user-agent'] if they exist. We already do something similar to handle the differences in legacy EventLogging schemas, I guess we can just keep tacking on more conditional logic. :/
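
As a rough sketch of what "support both formats" could look like, the Python below fills in the user-agent under whichever header layouts a schema's http object declares. The function set_user_agent and the schema_http_properties argument are hypothetical and do not reflect EventGate's actual code; leaving client-supplied values untouched is also an assumption.

```python
def set_user_agent(event, schema_http_properties, user_agent):
    """Fill in user-agent for each header layout the schema declares.

    `schema_http_properties` is assumed to be the `properties` dict of
    the schema's http object. Existing values are not overwritten.
    """
    http = event.setdefault("http", {})
    if "request_headers" in schema_http_properties:
        # Current Event Platform layout: http.request_headers
        http.setdefault("request_headers", {}).setdefault("user-agent", user_agent)
    if "request" in schema_http_properties:
        # ECS-style layout: http.request.headers
        request = http.setdefault("request", {})
        request.setdefault("headers", {}).setdefault("user-agent", user_agent)
    return event


event = set_user_agent({}, {"request_headers": {}, "request": {}}, "UA")
```

A schema that declares only one layout gets only that field set, which is the conditional logic that would have to be kept around for as long as both conventions exist.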

jlinehan added a project: Better Use Of Data. Feb 8 2021, 4:40 PM

ldelench_wmf added a parent task: T281999: Metrics Platform Schema: Define & Model Event Level Fields. May 10 2021, 3:54 PM

ldelench_wmf added a project: Metrics Platform Backlog (Metrics-Platform-MVP-Release-1). May 10 2021, 7:55 PM

@jlinehan @Ottomata I see we haven't touched this ticket since December. Is there anything we need to action here, or can we close it out?

DAbad moved this task from Metrics-Platform-MVP-Release-1 to Backlog on the Metrics Platform Backlog board. Jul 16 2021, 5:18 PM

DAbad edited projects, added Metrics Platform Backlog; removed Metrics Platform Backlog (Metrics-Platform-MVP-Release-1).

I can't recall if we made any real decisions on what to do here. There is an issue with which Elasticsearch index is used for the event streams that go there: we need to make sure that no index ends up with field naming conflicts. This means that we can't use the Elasticsearch index that is used for regular logs for the Event Platform streams, as the http field (and maybe others?) conflicts. There was an idea of making a dedicated index for Event Platform streams, but I'm not sure if that was agreed upon.
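
To make the kind of conflict concrete, here is a small Python sketch that compares two Elasticsearch-style `properties` mappings and reports field paths whose types differ. The helper names and both example mappings are made up for illustration; they are not the real index mappings.

```python
def flatten(properties, prefix=""):
    """Flatten a mapping `properties` dict into {dotted.path: type}."""
    flat = {}
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # A field with sub-properties is an object.
            flat[path] = "object"
            flat.update(flatten(spec["properties"], path + "."))
        else:
            flat[path] = spec.get("type", "object")
    return flat


def mapping_conflicts(a, b):
    """Return {path: (type_in_a, type_in_b)} for paths typed differently."""
    fa, fb = flatten(a), flatten(b)
    return {p: (fa[p], fb[p]) for p in fa.keys() & fb.keys() if fa[p] != fb[p]}


# Illustrative mappings only: a flat `url` string versus an object-style `url`.
flat_url = {"url": {"type": "keyword"}}
object_url = {"url": {"properties": {"full": {"type": "keyword"}}}}
conflicts = mapping_conflicts(flat_url, object_url)
```

A single index cannot hold the same field path as both a scalar and an object, which is why a dedicated index (or a reshaped schema) comes up as an option.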

I'd personally prefer not to change existent Event Platform schema fields to conform to ECS (BTW there is also a similar effort to rename some things driven by the Arch and Enterprise teams), but we should do our best to conform as much as possible for new fields, especially those in Metrics Platform's 'monoschema'.

Thinking back to the last discussion, I was under the impression that it was better to update the eventstreams schema to fit ECS more closely sooner rather than later. The risk, IIRC, was that it would be more difficult to change later, once adoption had ramped up.

Some change had been proposed but it hasn't seen much love for a while: https://gerrit.wikimedia.org/r/c/schemas/event/primary/+/647025

Are we past the point where the schema can be amended? Is there external schema work ("monoschema"?) pushing for a specific way of organizing log data?

There are still things we can do on our end to help, but I am out of the loop if the plan has evolved since December.

Right, I think the work on that just stalled and never got done. @jlinehan?

That patch will still have a conflict with the http field. That one is harder to resolve since it is used by a lot of other event schemas too. I remember now: this patch was meant to bring the client error logging schema as closely in line with ECS as possible.

jlinehan moved this task from Backlog to Discussed on the Metrics Platform Backlog board. Jul 26 2021, 4:37 PM

DAbad assigned this task to jlinehan. Aug 2 2021, 4:16 PM

Does T272238: Elasticsearch and Kibana are switching to non-OSI-approved SSPL licence affect whether we want to move forward with this?

ldelench_wmf lowered the priority of this task from Medium to Low. Aug 23 2021, 2:14 PM

In T267602#7294918, @Mholloway wrote:

Does T272238: Elasticsearch and Kibana are switching to non-OSI-approved SSPL licence affect whether we want to move forward with this?

At this time, the Observability team has no plans to abandon the ECS project.

jlinehan moved this task from Discussed to To be discussed on the Metrics Platform Backlog board. Oct 4 2021, 2:23 PM

DAbad moved this task from To be discussed to Done on the Metrics Platform Backlog board. Oct 6 2021, 2:12 PM

jlinehan moved this task from Done to To be discussed on the Metrics Platform Backlog board. Oct 6 2021, 2:14 PM

Removing inactive assignee from this open task. (Please update assignees on open tasks after offboarding. Thanks.)

Restricted Application added a project: Data-Engineering. Jul 25 2022, 9:13 AM

EChetty edited projects, added Data-Engineering-Planning; removed Data-Engineering. Sep 6 2022, 10:42 AM

EChetty moved this task from Backlog to Radar on the Data-Engineering-Planning board. Nov 7 2022, 11:10 AM

phuedx moved this task from To be discussed to Backlog on the Metrics Platform Backlog board. Mar 16 2023, 1:35 PM

JArguello-WMF removed a project: Data-Engineering-Planning. Jun 29 2023, 10:03 PM

Restricted Application added a project: Data-Engineering. Jun 29 2023, 10:03 PM

JArguello-WMF moved this task from Incoming (new tickets) to Event Platform Backlog on the Data-Engineering board. Jun 29 2023, 10:29 PM

JArguello-WMF added a project: Data Engineering and Event Platform Team. Jun 30 2023, 4:31 PM

JArguello-WMF moved this task from Data Eng Backlog to Event Platform Backlog on the Data Engineering and Event Platform Team board. Jun 30 2023, 4:38 PM

lbowmaker removed a project: Data Engineering and Event Platform Team. Nov 10 2023, 2:29 PM
