Wed, Mar 3

jlinehan renamed T275070: [Metrics Platform] Finalize Metrics Platform API across platforms from Finalize client library API across platforms to [Metrics Platform] Finalize client library API across platforms.
Wed, Mar 3, 7:00 PM · Better Use Of Data
jlinehan added a subtask for T276378: Release Metrics Platform v1: T275070: [Metrics Platform] Finalize Metrics Platform API across platforms.
Wed, Mar 3, 7:00 PM · Epic, Better Use Of Data
jlinehan added a parent task for T275070: [Metrics Platform] Finalize Metrics Platform API across platforms: T276378: Release Metrics Platform v1.
Wed, Mar 3, 7:00 PM · Better Use Of Data
jlinehan claimed T276378: Release Metrics Platform v1.
Wed, Mar 3, 6:58 PM · Epic, Better Use Of Data
jlinehan added a subtask for T276379: [Metrics Platform] Create schema fragment for standard fields: T275420: [Metrics Platform] Define first set of standard fields for metrics platform.
Wed, Mar 3, 6:57 PM · Better Use Of Data
jlinehan edited parent tasks for T275420: [Metrics Platform] Define first set of standard fields for metrics platform, added: T276379: [Metrics Platform] Create schema fragment for standard fields; removed: T276378: Release Metrics Platform v1.
Wed, Mar 3, 6:57 PM · Product-Analytics (Kanban), Better Use Of Data
jlinehan removed a subtask for T276378: Release Metrics Platform v1: T275420: [Metrics Platform] Define first set of standard fields for metrics platform.
Wed, Mar 3, 6:57 PM · Epic, Better Use Of Data
jlinehan added a subtask for T276378: Release Metrics Platform v1: T276379: [Metrics Platform] Create schema fragment for standard fields.
Wed, Mar 3, 6:57 PM · Epic, Better Use Of Data
jlinehan added a parent task for T276379: [Metrics Platform] Create schema fragment for standard fields: T276378: Release Metrics Platform v1.
Wed, Mar 3, 6:57 PM · Better Use Of Data
jlinehan created T276379: [Metrics Platform] Create schema fragment for standard fields.
Wed, Mar 3, 6:57 PM · Better Use Of Data
jlinehan added a parent task for T275420: [Metrics Platform] Define first set of standard fields for metrics platform: T276378: Release Metrics Platform v1.
Wed, Mar 3, 6:55 PM · Product-Analytics (Kanban), Better Use Of Data
jlinehan added a subtask for T276378: Release Metrics Platform v1: T275420: [Metrics Platform] Define first set of standard fields for metrics platform.
Wed, Mar 3, 6:55 PM · Epic, Better Use Of Data
jlinehan renamed T275420: [Metrics Platform] Define first set of standard fields for metrics platform from Requirements gathering for standard fields in event libraries to [Metrics Platform] Define first set of standard fields for metrics platform.
Wed, Mar 3, 6:55 PM · Product-Analytics (Kanban), Better Use Of Data
jlinehan added a parent task for T273235: [Metrics Platform] Define stream configuration syntax relevant to v1 release: T276378: Release Metrics Platform v1.
Wed, Mar 3, 6:54 PM · Better Use Of Data, Product-Analytics, Analytics, Product-Data-Infrastructure
jlinehan added a subtask for T276378: Release Metrics Platform v1: T273235: [Metrics Platform] Define stream configuration syntax relevant to v1 release.
Wed, Mar 3, 6:54 PM · Epic, Better Use Of Data
jlinehan renamed T273235: [Metrics Platform] Define stream configuration syntax relevant to v1 release from Define event stream configuration syntax to [Metrics Platform] Define stream configuration syntax relevant to v1 release.
Wed, Mar 3, 6:54 PM · Better Use Of Data, Product-Analytics, Analytics, Product-Data-Infrastructure
jlinehan renamed T269774: [MEP] Determine how stream configuration is authored and deployed from [Metrics Platform] Specify stream configuration syntax relevant to Metrics Platform to [MEP] Determine how stream configuration is authored and deployed.
Wed, Mar 3, 6:53 PM · Better Use Of Data, Analytics, Product-Analytics, Event-Platform, Product-Data-Infrastructure
jlinehan renamed T269774: [MEP] Determine how stream configuration is authored and deployed from MEP: Should stream configurations be written in YAML? to [Metrics Platform] Specify stream configuration syntax relevant to Metrics Platform.
Wed, Mar 3, 6:52 PM · Better Use Of Data, Analytics, Product-Analytics, Event-Platform, Product-Data-Infrastructure
jlinehan added a subtask for T276378: Release Metrics Platform v1: T269774: [MEP] Determine how stream configuration is authored and deployed.
Wed, Mar 3, 6:51 PM · Epic, Better Use Of Data
jlinehan added a parent task for T269774: [MEP] Determine how stream configuration is authored and deployed: T276378: Release Metrics Platform v1.
Wed, Mar 3, 6:51 PM · Better Use Of Data, Analytics, Product-Analytics, Event-Platform, Product-Data-Infrastructure
jlinehan created T276378: Release Metrics Platform v1.
Wed, Mar 3, 6:50 PM · Epic, Better Use Of Data

Wed, Feb 24

jlinehan updated the task description for T275070: [Metrics Platform] Finalize Metrics Platform API across platforms.
Wed, Feb 24, 6:42 PM · Better Use Of Data
jlinehan updated the task description for T275070: [Metrics Platform] Finalize Metrics Platform API across platforms.
Wed, Feb 24, 5:58 PM · Better Use Of Data
jlinehan moved T275070: [Metrics Platform] Finalize Metrics Platform API across platforms from To Do to Doing on the Better Use Of Data board.
Wed, Feb 24, 5:23 PM · Better Use Of Data

Wed, Feb 17

jlinehan added a comment to T267408: [MEP Client Library] Write User-facing Documentation.

Marked as invalid, using a different task for this

Wed, Feb 17, 8:53 PM · Documentation, Product-Data-Infrastructure, Better Use Of Data
jlinehan closed T267408: [MEP Client Library] Write User-facing Documentation as Invalid.
Wed, Feb 17, 8:52 PM · Documentation, Product-Data-Infrastructure, Better Use Of Data
jlinehan moved T275070: [Metrics Platform] Finalize Metrics Platform API across platforms from Inbox to To Do on the Better Use Of Data board.
Wed, Feb 17, 7:37 PM · Better Use Of Data
jlinehan created T275070: [Metrics Platform] Finalize Metrics Platform API across platforms.
Wed, Feb 17, 7:36 PM · Better Use Of Data
jlinehan moved T274172: [Session Length] Complete sessionTick deployment to all wikis from Inbox to To Do on the Better Use Of Data board.
Wed, Feb 17, 7:02 PM · Better Use Of Data
jlinehan moved T274264: [Session Length] Evaluate session length distribution for newer vs. older browsers and remove the supportsPassive check from Doing to Sign-off on the Better Use Of Data board.
Wed, Feb 17, 7:02 PM · MW-1.36-notes (1.36.0-wmf.32; 2021-02-23), Product-Data-Infrastructure, Better Use Of Data

Wed, Feb 10

jlinehan moved T274264: [Session Length] Evaluate session length distribution for newer vs. older browsers and remove the supportsPassive check from Inbox to Doing on the Better Use Of Data board.
Wed, Feb 10, 7:24 PM · MW-1.36-notes (1.36.0-wmf.32; 2021-02-23), Product-Data-Infrastructure, Better Use Of Data
jlinehan moved T271456: Enable 'skin' dimension using stream configuration from Doing to Sign-off on the Better Use Of Data board.
Wed, Feb 10, 7:06 PM · MW-1.36-notes (1.36.0-wmf.30; 2021-02-09), Patch-For-Review, Product-Data-Infrastructure, Product-Analytics, Better Use Of Data
jlinehan moved T267218: MediaWiki Session ID should persist according to user inactivity from QA/Review to Sign-off on the Better Use Of Data board.
Wed, Feb 10, 7:04 PM · MW-1.36-notes (1.36.0-wmf.30; 2021-02-09), Better Use Of Data, MediaWiki-User-management, Product-Data-Infrastructure

Mon, Feb 8

jlinehan closed T240460: Clients need to generate an ISO 8601 formatted timestamp, a subtask of T240462: Review and evolve client environment around EventLogging, as Resolved.
Mon, Feb 8, 7:36 PM · Epic, Better Use Of Data
jlinehan closed T240460: Clients need to generate an ISO 8601 formatted timestamp as Resolved.
Mon, Feb 8, 7:36 PM · MW-1.36-notes (1.36.0-wmf.22; 2020-12-15), Analytics, Event-Platform, MW-1.35-notes (1.35.0-wmf.37; 2020-06-16), Patch-For-Review, Better Use Of Data
jlinehan triaged T274175: [Client libraries] Demonstrate new capabilities as Medium priority.
Mon, Feb 8, 7:28 PM · Patch-For-Review, Better Use Of Data
jlinehan moved T274175: [Client libraries] Demonstrate new capabilities from Inbox to Doing on the Better Use Of Data board.
Mon, Feb 8, 7:28 PM · Patch-For-Review, Better Use Of Data
jlinehan moved T259734: BUOD-KR1-Q4+: Certify that analytics schema and instruments have been upgraded to use the MEP system (clearing the legacy system for sunsetting) from Inbox to Epics on the Better Use Of Data board.
Mon, Feb 8, 7:10 PM · Better Use Of Data, Product-Data-Infrastructure, Goal
jlinehan moved T259157: BUOD-KR1-Q3: Require that all new schema/instruments are created with the MEP system from Inbox to Epics on the Better Use Of Data board.
Mon, Feb 8, 7:09 PM · Goal, Product-Data-Infrastructure, Analytics, Better Use Of Data, Event-Platform
jlinehan moved T259704: BUOD-KR1-Q2: Upgrade MEP clients to full release status from Inbox to Epics on the Better Use Of Data board.
Mon, Feb 8, 7:09 PM · Goal, Better Use Of Data, Product-Data-Infrastructure
jlinehan added a comment to T263505: Create logging instrumentation for Wikitext editor not affected by ad blockers.

If we are interested in tracking completed edits, why not place the instrument on the server? Then the client's disposition re: JavaScript doesn't matter. Is the problem to accurately measure the number of editors with JavaScript disabled? Or to flag them somehow? Or are their edits not being counted at all currently?

Mon, Feb 8, 6:59 PM · Better Use Of Data, Product-Analytics, WikiEditor
jlinehan added a project to T263505: Create logging instrumentation for Wikitext editor not affected by ad blockers: Better Use Of Data.
Mon, Feb 8, 6:57 PM · Better Use Of Data, Product-Analytics, WikiEditor
jlinehan created T274175: [Client libraries] Demonstrate new capabilities.
Mon, Feb 8, 6:56 PM · Patch-For-Review, Better Use Of Data
jlinehan created T274172: [Session Length] Complete sessionTick deployment to all wikis.
Mon, Feb 8, 6:49 PM · Better Use Of Data
jlinehan moved T256169: Stream cc map should not be generated on every pageload from Inbox to Sign-off on the Better Use Of Data board.
Mon, Feb 8, 6:41 PM · Patch-For-Review, Event-Platform, Analytics, Better Use Of Data
jlinehan moved T263875: Develop a new schema for MediaSearch analytics or adapt an existing one from Inbox to Sign-off on the Better Use Of Data board.
Mon, Feb 8, 6:40 PM · Better Use Of Data, Product-Data-Infrastructure, SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Analytics-Radar, Patch-For-Review, Product-Analytics, Structured-Data-Backlog (Current Work), Structured Data Engineering
jlinehan added a project to T259163: Migrate legacy metawiki schemas to Event Platform: Better Use Of Data.
Mon, Feb 8, 4:41 PM · Better Use Of Data, Product-Analytics, MW-1.36-notes (1.36.0-wmf.18; 2020-11-17), Patch-For-Review, Product-Data-Infrastructure, Analytics-Kanban, Analytics, Analytics-EventLogging, Event-Platform
jlinehan added a project to T258692: Monitoring/Alerting for Wikipedia mobile app errors: Better Use Of Data.
Mon, Feb 8, 4:41 PM · Better Use Of Data, Product-Data-Infrastructure, Sustainability (Incident Followup), Wikifeeds, Mobile, Mobile-Content-Service
jlinehan added a project to T249164: RFC: Better interface for generating metrics in MediaWiki: Better Use Of Data.
Mon, Feb 8, 4:41 PM · Better Use Of Data, Product-Data-Infrastructure, TechCom-RFC
jlinehan added a project to T263466: EventGate idea: use presence of schema properties in http.(request|response)_headers to automatically set header values in event data: Better Use Of Data.
Mon, Feb 8, 4:41 PM · Better Use Of Data, Patch-For-Review, Product-Data-Infrastructure, Event-Platform, Analytics
jlinehan added a project to T263049: Research and consider network connections made due to Event Platform: Better Use Of Data.
Mon, Feb 8, 4:41 PM · Better Use Of Data, Performance-Team (Radar), Product-Data-Infrastructure, Analytics, Event-Platform
jlinehan added a project to T263041: OperationError: The operation failed for an operation-specific reason in generateRandomSessionId : Better Use Of Data.
Mon, Feb 8, 4:40 PM · MW-1.36-notes (1.36.0-wmf.33; 2021-03-02), Better Use Of Data, Analytics-Radar, Product-Data-Infrastructure, Event-Platform, JavaScript, Analytics-EventLogging, Wikimedia-production-error
jlinehan added a project to T267602: Client-side error logging should use Elastic Common Schema (ECS) fields when possible: Better Use Of Data.
Mon, Feb 8, 4:40 PM · Better Use Of Data, Analytics, Product-Data-Infrastructure, Event-Platform
jlinehan added a project to T268027: Automate EventGate validation error reporting: Better Use Of Data.
Mon, Feb 8, 4:40 PM · Better Use Of Data, Product-Data-Infrastructure, Analytics, Event-Platform
jlinehan added a project to T267215: Changes to Session ID interface for MediaWiki event instrumentation: Better Use Of Data.
Mon, Feb 8, 4:40 PM · Better Use Of Data, Performance-Team (Radar), MediaWiki-User-management, Product-Data-Infrastructure
jlinehan added a project to T259712: Allow disabling/enabling configured streams via wgEventStreams config: Better Use Of Data.
Mon, Feb 8, 4:40 PM · Better Use Of Data, Analytics, Product-Data-Infrastructure, Platform Team Initiatives (Modern Event Platform (TEC2)), Event-Platform
jlinehan added a project to T263503: Document how ad blockers / tracking blockers interact with EventLogging: Better Use Of Data.
Mon, Feb 8, 4:39 PM · Better Use Of Data, Product-Data-Infrastructure, Analytics, Product-Analytics, Analytics-EventLogging, Documentation
jlinehan added a project to T263452: Codify a standard deployment ramp for new instrumentation: Better Use Of Data.
Mon, Feb 8, 4:39 PM · Better Use Of Data, Product-Data-Infrastructure
jlinehan added a project to T267217: MediaWiki Session ID should have per-subdomain and cross-subdomain variants: Better Use Of Data.
Mon, Feb 8, 4:39 PM · Better Use Of Data, MediaWiki-User-management, Product-Data-Infrastructure
jlinehan added a project to T273219: KaiOS / Inuka Event Platform client: Better Use Of Data.
Mon, Feb 8, 4:39 PM · Better Use Of Data, KaiOS-Wikipedia-app, Inuka-Team, Product-Analytics, Product-Data-Infrastructure, Analytics-Kanban, Analytics, Analytics-EventLogging, Event-Platform
jlinehan added a project to T262626: Remove http.client_ip from EventGate default schema (again): Better Use Of Data.
Mon, Feb 8, 4:39 PM · Better Use Of Data, Analytics-Kanban, Product-Analytics, Patch-For-Review, Product-Data-Infrastructure, observability, Privacy Engineering, Analytics, Event-Platform
jlinehan added a project to T262663: Dashboard for monitoring product data traffic: Better Use Of Data.
Mon, Feb 8, 4:38 PM · Better Use Of Data, Product-Data-Infrastructure
jlinehan added a project to T218835: prefUpdate schema contains multiple identical events for the same preference update: Better Use Of Data.
Mon, Feb 8, 4:38 PM · Readers-Web-Backlog (Kanbanana-FY-2020-21), Patch-For-Review, Better Use Of Data, Product-Data-Infrastructure, Analytics-Radar, Product-Analytics
jlinehan added a project to T263875: Develop a new schema for MediaSearch analytics or adapt an existing one: Better Use Of Data.
Mon, Feb 8, 4:38 PM · Better Use Of Data, Product-Data-Infrastructure, SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Analytics-Radar, Patch-For-Review, Product-Analytics, Structured-Data-Backlog (Current Work), Structured Data Engineering

Feb 3 2021

jlinehan added a comment to T273293: Define acceptable usage of the `meta` object in event schemas.

@Ottomata, @Mholloway and I had a chance to sit down and dive into this. Notes from our discussion:

  • Likely undesirable to namespace all non-producer-managed fields (what we refer to as 'metadata' in above discussion) under any common namespace, meta.* or otherwise
    • Main reason: that namespace would need to be known to all elements of the data pipeline which set these field values
    • A body of pre-existing top-level fields already exists, and moving or replicating them would be undesirable because it would necessitate changes in behavior at various pipeline stages
    • Possible solution: an indirection layer that defines managed fields as 'entities' (e.g. user_agent), each mapped to a field name or list of possible field names, e.g. ['http.user_agent','meta.user_agent',...]. Various agents acting within the pipeline would use this canonical mapping to determine which field to operate on when they want to operate on user_agent
    • While interesting, we can find our way there naturally; this isn't the task to launch such an effort
  • Examination of e.g. https://schema.wikimedia.org/repositories//primary/jsonschema/fragment/mediawiki/common/current.yaml etc. showed that we need more awareness of what already exists in the primary repository, and need to decide whether we should standardize on a single set of fields, how management of those fields should be shared between analytics and production use cases, and how much of that existing work we should reuse. This will need a closer look.
  • In terms of this task, the default remains to not extend meta. We had some other ideas that I will write up in a full response.
  • In terms of what we call things that are 'metadata' versus 'data', the distinction may still be useful to talk about, despite the fact that the 'meta' field is named what it is. However, the convention would not necessarily be reflected in the field structure for now.
    • An option would be to use an approach where 'metadata' consists of all top-level fields except for a distinguished namespace event.* or data.* or similar, under which all the fields defined by the instrument (indeed, particular to its schema) would reside. It's not clear that this would give the same benefits, however, in terms of reminding users of provenance.
    • Perhaps all fields (aside from subfields of some structures such as http.*, etc) being top-level is the better approach. In that view, meta.* is a problematic artifact because it collects as sub-fields things that should (under this convention) be top-level.
      • This "all fields are top-level" seems to be a reasonable approach, especially under a regime like the one we expect, in which a meaningful amount (if not a majority) of the fields and their values will be 'metadata,' managed exclusively by automated processes, meaning that any namespace they did fall under would constitute the bulk of the entire event structure, if not the entirety.
  • We can assess the best way to design around meta in the short-term.
  • In the mid-term, we should consider identifying points of control in the data pipeline that rely on implicit conventions related to field names and schema structure, and better understand how this logic affects the design of our schemas and conventions, as well as the technical debt we are accumulating.
  • The other mid-term goal is to consider the extent to which we should use a single set of conventions for all 'metadata' between production and analytics events, something we can consider as part of our schema fragment design process for the analytics events.
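The indirection-layer idea in the notes above could look roughly like the following Python sketch. It assumes events are nested dicts addressed by dotted field paths; ENTITY_FIELDS, get_path, set_path, resolve_entity and set_entity are hypothetical names for illustration, not an existing Event Platform API.

```python
from typing import Any, Mapping, Optional

# Hypothetical canonical mapping: each managed 'entity' lists the dotted
# field paths where it may live in an event, most preferred first.
ENTITY_FIELDS: dict[str, list[str]] = {
    "user_agent": ["http.user_agent", "meta.user_agent"],
}


def get_path(event: Mapping[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts; return None if absent."""
    node: Any = event
    for key in path.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


def set_path(event: dict, path: str, value: Any) -> None:
    """Write value at a dotted path, creating intermediate dicts as needed."""
    *parents, leaf = path.split(".")
    node = event
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def resolve_entity(event: Mapping[str, Any], entity: str,
                   mapping: Mapping[str, list[str]] = ENTITY_FIELDS
                   ) -> Optional[tuple[str, Any]]:
    """Return (path, value) for the first candidate path present in the event."""
    for path in mapping.get(entity, []):
        value = get_path(event, path)
        if value is not None:
            return path, value
    return None


def set_entity(event: dict, entity: str, value: Any,
               mapping: Mapping[str, list[str]] = ENTITY_FIELDS) -> str:
    """Overwrite the field the event already uses for this entity, or fall
    back to the most preferred path. Returns the path that was written."""
    found = resolve_entity(event, entity, mapping)
    path = found[0] if found else mapping[entity][0]
    set_path(event, path, value)
    return path
```

A pipeline agent that only knows about the entity "user_agent" can then handle both an event with http.user_agent and one with meta.user_agent without hard-coding either name, which is the decoupling the notes describe.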
Feb 3 2021, 10:52 PM · Product-Data-Infrastructure, Analytics, Better Use Of Data

Feb 1 2021

jlinehan added a comment to T271456: Enable 'skin' dimension using stream configuration.

I think for this spike it's okay to just name it something and go from there. I think 'labels' is not very suggestive of either purpose or provenance in the way that meta(data) is, but this task should *not* block on that bikeshed, for sure. Also, colliding with ECS names could be complicated, as the error logging work showed.

Feb 1 2021, 8:07 PM · MW-1.36-notes (1.36.0-wmf.30; 2021-02-09), Patch-For-Review, Product-Data-Infrastructure, Product-Analytics, Better Use Of Data
jlinehan added a comment to T273293: Define acceptable usage of the `meta` object in event schemas.

Can you give an example maybe? There is the thing-being-measured (data), and the things-about-the-thing-being-measured (metadata), including any properties of the data itself, but also contextual information about who, what, when, why, and how it was created.

I'd say metadata is data about the data. The example I gave was that meta.stream could probably be metadata. So could data ownership, as you say. But things like timestamps and ids and domains and mediawiki skins are actual data, not data about the data.

Feb 1 2021, 6:12 PM · Product-Data-Infrastructure, Analytics, Better Use Of Data
jlinehan added a comment to T273293: Define acceptable usage of the `meta` object in event schemas.

BTW, here's the first reference to meta I can find: https://github.com/wikimedia/restevent/pull/5/files


Hm, I like the motivation here: somehow clearly delineating what can/should be set by instruments and what is set by libraries. I think in practice this is going to be hard, but we can do our best.

Can we use something other than meta? I think the term 'meta' or 'metadata' here is pretty overloaded and extra confusing.

I think I'd disagree: in terms of comprehensibility, this distinction is pretty common. Can you give an example maybe? There is the thing-being-measured (data), and the things-about-the-thing-being-measured (metadata), including any properties of the data itself, but also contextual information about who, what, when, why, and how it was created. This intuition seems to match what we're trying to achieve by having instruments focus only on the thing-being-measured.

Feb 1 2021, 3:58 PM · Product-Data-Infrastructure, Analytics, Better Use Of Data
jlinehan added a comment to T273219: KaiOS / Inuka Event Platform client.

Nice! Interesting, the KaiOS app is JS? Cool.

@jlinehan maybe we should one day consider having language-specific client libraries (JS, PHP, Java, etc.) rather than app-specific ones (MW, Android, iOS, KaiOS).

Feb 1 2021, 3:02 PM · Better Use Of Data, KaiOS-Wikipedia-app, Inuka-Team, Product-Analytics, Product-Data-Infrastructure, Analytics-Kanban, Analytics, Analytics-EventLogging, Event-Platform

Jan 29 2021

jlinehan added a comment to T273293: Define acceptable usage of the `meta` object in event schemas.

meta is a vestigial historical compromise, and if we could get rid of it I would. It isn't impossible to get rid of it, it would just be a bit of work.

Jan 29 2021, 5:30 PM · Product-Data-Infrastructure, Analytics, Better Use Of Data

Jan 28 2021

jlinehan added a comment to T271456: Enable 'skin' dimension using stream configuration.

@Ottomata @Mholloway Made a dedicated bikeshed task to capture syntax discussion, see: T273235: [Metrics Platform] Define stream configuration syntax relevant to v1 release

Jan 28 2021, 10:08 PM · MW-1.36-notes (1.36.0-wmf.30; 2021-02-09), Patch-For-Review, Product-Data-Infrastructure, Product-Analytics, Better Use Of Data
jlinehan added a subtask for T273235: [Metrics Platform] Define stream configuration syntax relevant to v1 release: T271456: Enable 'skin' dimension using stream configuration.
Jan 28 2021, 10:06 PM · Better Use Of Data, Product-Analytics, Analytics, Product-Data-Infrastructure
jlinehan added a parent task for T271456: Enable 'skin' dimension using stream configuration: T273235: [Metrics Platform] Define stream configuration syntax relevant to v1 release.
Jan 28 2021, 10:06 PM · MW-1.36-notes (1.36.0-wmf.30; 2021-02-09), Patch-For-Review, Product-Data-Infrastructure, Product-Analytics, Better Use Of Data
jlinehan created T273235: [Metrics Platform] Define stream configuration syntax relevant to v1 release.
Jan 28 2021, 10:06 PM · Better Use Of Data, Product-Analytics, Analytics, Product-Data-Infrastructure
jlinehan added a comment to T269774: [MEP] Determine how stream configuration is authored and deployed.

Having a separate repo might make it easier for us to adapt to any changes in how operations/mediawiki-config itself changes over the coming years, as well as make it easier to add hooks etc. and expose the repo for public browsing, as with schema.wikimedia.org. For me at least, fewer repos is better, mostly from the usability perspective of not needing to clone and keep track of more repositories in order to make a change, but here you've got to clone something either way (a standalone repo or operations/mediawiki-config). A small, clean repo that only does one thing would probably make it easier for us to build an interface on top of it if we ever go that way, and would be more approachable for users either way. If "deployment" consists of pulling the submodule update into mediawiki-config, that seems kind of neat as well. I'd vote for a separate repo.

Jan 28 2021, 3:29 PM · Better Use Of Data, Analytics, Product-Analytics, Event-Platform, Product-Data-Infrastructure
jlinehan moved T265101: Instrument event logging for VE's image search from Doing to Done on the Product-Data-Infrastructure board.
Jan 28 2021, 12:55 PM · User-Ryasmeen, MW-1.36-notes (1.36.0-wmf.26; 2021-01-12), Editing-team (FY2020-21 Kanban Board), Better Use Of Data, SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Product-Data-Infrastructure, Structured-Data-Backlog (Current Work), Editing-Team-Request, VisualEditor
jlinehan moved T267592: Updated schema strategy for analytics events from Doing to Stalled on the Product-Data-Infrastructure board.
Jan 28 2021, 12:55 PM · Better Use Of Data, Patch-For-Review, Product-Analytics, Product-Data-Infrastructure
jlinehan moved T269936: Schema repository structure, naming from Doing to Stalled on the Product-Data-Infrastructure board.
Jan 28 2021, 12:55 PM · Better Use Of Data, Product-Analytics, Analytics, Product-Data-Infrastructure
jlinehan moved T262626: Remove http.client_ip from EventGate default schema (again) from Reviewing to Done on the Product-Data-Infrastructure board.
Jan 28 2021, 12:53 PM · Better Use Of Data, Analytics-Kanban, Product-Analytics, Patch-For-Review, Product-Data-Infrastructure, observability, Privacy Engineering, Analytics, Event-Platform
jlinehan moved T228179: Event Platform Client — Android from Reviewing to Done on the Product-Data-Infrastructure board.
Jan 28 2021, 12:53 PM · Wikipedia-Android-App-Backlog (Android-app-release-v2.7.33x-R-Rosgulla), Patch-For-Review, Product-Data-Infrastructure, Epic, Better Use Of Data
jlinehan changed hashtags for Product-Data-Infrastructure, added #product-data-infrastructure; removed #product-infrastructure-data.
Jan 28 2021, 12:47 PM
jlinehan renamed Product-Data-Infrastructure from Product-Infrastructure-Data to Product-Data-Infrastructure.
Jan 28 2021, 12:47 PM

Jan 27 2021

jlinehan added a comment to T267218: MediaWiki Session ID should persist according to user inactivity.

Uploaded a WIP patch for engineers to discuss at the BUOD meeting if desired. Not tested yet while I wrestle with my Vagrant install, but the point is clear. Will test, iterate on patchsets and add tests from here.

Jan 27 2021, 6:57 PM · MW-1.36-notes (1.36.0-wmf.30; 2021-02-09), Better Use Of Data, MediaWiki-User-management, Product-Data-Infrastructure
jlinehan added a comment to T271456: Enable 'skin' dimension using stream configuration.

We never really figured out a good smart way to do this in stream config. Is it worth trying to solve this for all these use cases, or is that too cumbersome?

Jan 27 2021, 2:39 PM · MW-1.36-notes (1.36.0-wmf.30; 2021-02-09), Patch-For-Review, Product-Data-Infrastructure, Product-Analytics, Better Use Of Data

Jan 21 2021

jlinehan triaged T271456: Enable 'skin' dimension using stream configuration as Medium priority.
Jan 21 2021, 6:20 PM · MW-1.36-notes (1.36.0-wmf.30; 2021-02-09), Patch-For-Review, Product-Data-Infrastructure, Product-Analytics, Better Use Of Data
jlinehan renamed T271456: Enable 'skin' dimension using stream configuration from [SPIKE] Session Length of Logged in/Logged out users to [SPIKE] Enable 'skin' dimension using stream configuration.
Jan 21 2021, 6:20 PM · MW-1.36-notes (1.36.0-wmf.30; 2021-02-09), Patch-For-Review, Product-Data-Infrastructure, Product-Analytics, Better Use Of Data

Jan 14 2021

jlinehan added a comment to T267218: MediaWiki Session ID should persist according to user inactivity.

Sampling session lengths should be done with a token that uses the same semantics as the sessions themselves, so this is a dependency of sampling the session tick data stream.
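To make the dependency concrete, here is a minimal Python sketch of sampling keyed on the session ID, so that every tick from one session is either kept or dropped together. It assumes the session ID is available on each event as a string; in_sample is a hypothetical helper, and SHA-256 with a 64-bit bucket is an illustrative choice, not necessarily how EventLogging implements sampling.

```python
import hashlib


def in_sample(session_id: str, rate: float) -> bool:
    """Decide sampling membership from the session ID alone.

    Every event carrying the same session ID gets the same answer, so a
    session's ticks are either all kept or all dropped.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer in [0, 2**64).
    value = int.from_bytes(digest[:8], "big")
    # Python compares int and float exactly, so rate=1.0 keeps everything
    # and rate=0.0 keeps nothing.
    return value < rate * 2**64
```

A 10% sample of a tick stream would then be something like `[e for e in events if in_sample(e["session_id"], 0.1)]`, where the `session_id` field name is an assumption. If the session ID were reset on a different schedule than the sessions being measured, this sampling would cut sessions apart, which is why the persistence semantics matter here.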

Jan 14 2021, 5:30 PM · MW-1.36-notes (1.36.0-wmf.30; 2021-02-09), Better Use Of Data, MediaWiki-User-management, Product-Data-Infrastructure
jlinehan added a subtask for T271455: Roll-up raw sessionTick data into distribution: T267218: MediaWiki Session ID should persist according to user inactivity.
Jan 14 2021, 5:29 PM · Analytics-Kanban, Product-Data-Infrastructure, Product-Analytics, Better Use Of Data
jlinehan added a parent task for T267218: MediaWiki Session ID should persist according to user inactivity: T271455: Roll-up raw sessionTick data into distribution.
Jan 14 2021, 5:29 PM · MW-1.36-notes (1.36.0-wmf.30; 2021-02-09), Better Use Of Data, MediaWiki-User-management, Product-Data-Infrastructure

Jan 13 2021

jlinehan added a comment to T271455: Roll-up raw sessionTick data into distribution.

Hey all :]
I looked a bit into the size and length of the session_tick data that we're collecting right now, to determine what sampling rate we'll need to use.

Thank you @mforns for crunching the numbers and writing all of this up

Jan 13 2021, 5:29 PM · Analytics-Kanban, Product-Data-Infrastructure, Product-Analytics, Better Use Of Data

Dec 16 2020

jlinehan assigned T270226: HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. Use MockHttpTrait. to Mholloway.
Dec 16 2020, 3:53 PM · Patch-For-Review, Wikidata, ci-test-error (WMF-deployed Build Failure), Analytics, Analytics-EventLogging
jlinehan added a comment to T270226: HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. Use MockHttpTrait..

Seems to be fixed for now. @jlinehan do you want to keep this task open to track your work on it (I assume you’ll eventually want to un-revert that change in some form), or is it okay to close?

Dec 16 2020, 3:49 PM · Patch-For-Review, Wikidata, ci-test-error (WMF-deployed Build Failure), Analytics, Analytics-EventLogging
jlinehan added a comment to T270226: HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. Use MockHttpTrait..

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventLogging/+/645430 was reverted while we examine the source of the error. Sorry for the inconvenience.

Dec 16 2020, 3:49 PM · Patch-For-Review, Wikidata, ci-test-error (WMF-deployed Build Failure), Analytics, Analytics-EventLogging
jlinehan added a comment to T270226: HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. Use MockHttpTrait..

Alright, then I’ll stop tinkering with my above changes for the time being :) thanks!

Dec 16 2020, 1:57 PM · Patch-For-Review, Wikidata, ci-test-error (WMF-deployed Build Failure), Analytics, Analytics-EventLogging
jlinehan added a comment to T270226: HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. Use MockHttpTrait..

Thanks for creating this ticket, we're looking into this now and should resolve soon.

Dec 16 2020, 1:48 PM · Patch-For-Review, Wikidata, ci-test-error (WMF-deployed Build Failure), Analytics, Analytics-EventLogging

Dec 11 2020

jlinehan added a comment to T269936: Schema repository structure, naming.

For app-specific event schemas, prefix with the app name:

  • analytics/mediawiki/mediasearch_interaction
  • analytics/wikipedia_ios/button_click
  • analytics/wikipedia_android/button_click
  • analytics/wikivoyage_ios/search_request

For anything that might be shared across apps, don't prefix:

  • analytics/session_tick
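The naming convention above can be captured in a small Python helper, sketched here as an illustration. schema_path is a hypothetical name, and the snake_case check is an assumption inferred from the examples rather than a stated rule.

```python
import re
from typing import Optional

# Assumption: app prefixes and schema names are lowercase snake_case,
# as in the examples above.
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9_]*")


def schema_path(name: str, app: Optional[str] = None) -> str:
    """Build an analytics schema path: app-specific schemas get an app
    prefix, schemas shared across apps do not."""
    parts = ["analytics"] + ([app] if app else []) + [name]
    for part in parts[1:]:
        if not _SNAKE_CASE.fullmatch(part):
            raise ValueError(f"not snake_case: {part!r}")
    return "/".join(parts)
```

For example, `schema_path("button_click", "wikipedia_android")` gives `analytics/wikipedia_android/button_click`, while `schema_path("session_tick")` gives `analytics/session_tick`.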
Dec 11 2020, 3:39 PM · Better Use Of Data, Product-Analytics, Analytics, Product-Data-Infrastructure
jlinehan updated the task description for T269936: Schema repository structure, naming.
Dec 11 2020, 3:03 PM · Better Use Of Data, Product-Analytics, Analytics, Product-Data-Infrastructure
jlinehan created T269936: Schema repository structure, naming.
Dec 11 2020, 2:58 PM · Better Use Of Data, Product-Analytics, Analytics, Product-Data-Infrastructure

Dec 10 2020

jlinehan added a comment to T267602: Client-side error logging should use Elastic Common Schema (ECS) fields when possible.

B. is hard to do, and requires a lot of coordination. But we could do it slowly, one schema at a time, and start with the ones we want to import into Logstash. We'd make a fragment/http/2.0.0, or maybe a fragment/ecs/http/1.0.0, and then include it in mediawiki/client/error. To do this we'd need to make eventgate-wikimedia aware of this new convention and set the fields appropriately. Ungh, and if we hoped to eventually migrate ALL existing schemas to ECS's http, the Hive tables would probably have both http subschema fields (e.g. http.request_headers and http.request.headers) forever (unless we manually intervened).
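The "both fields forever" concern can be illustrated with a short Python sketch. It assumes, for illustration only, that a table's columns are the union of the leaf fields of every schema version written to it and that columns are never dropped; the trimmed schema fragments and the helper names are hypothetical and do not reflect the real fragments or the actual Hive refinement code.

```python
from typing import Any, Iterable, Mapping


def flatten_fields(properties: Mapping[str, Any], prefix: str = "") -> set[str]:
    """Return dotted leaf field names from a JSONSchema 'properties' mapping.
    Objects with explicit 'properties' are recursed into; anything else
    (including map-like objects such as header dictionaries) is a leaf."""
    names: set[str] = set()
    for key, spec in properties.items():
        path = prefix + key
        if spec.get("type") == "object" and "properties" in spec:
            names |= flatten_fields(spec["properties"], path + ".")
        else:
            names.add(path)
    return names


def evolve_table_columns(schema_versions: Iterable[Mapping[str, Any]]) -> set[str]:
    """Assumed additive evolution: new fields add columns, none are dropped."""
    columns: set[str] = set()
    for properties in schema_versions:
        columns |= flatten_fields(properties)
    return columns


# Hypothetical, heavily trimmed fragments for illustration only.
LEGACY_HTTP = {
    "http": {"type": "object", "properties": {
        "request_headers": {"type": "object",
                            "additionalProperties": {"type": "string"}},
    }},
}
ECS_STYLE_HTTP = {
    "http": {"type": "object", "properties": {
        "request": {"type": "object", "properties": {
            "headers": {"type": "object",
                        "additionalProperties": {"type": "string"}},
        }},
    }},
}
```

Under that assumption, `evolve_table_columns([LEGACY_HTTP, ECS_STYLE_HTTP])` contains both `http.request_headers` and `http.request.headers`, and nothing in the additive process ever removes the legacy column.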

Dec 10 2020, 7:10 PM · Better Use Of Data, Analytics, Product-Data-Infrastructure, Event-Platform
jlinehan updated the task description for T269774: [MEP] Determine how stream configuration is authored and deployed.
Dec 10 2020, 3:33 PM · Better Use Of Data, Analytics, Product-Analytics, Event-Platform, Product-Data-Infrastructure