We will need to create a schema fragment in the secondary repository to hold the fields agreed on in T275420. This fragment can live alongside the other fragments and does not need to belong to the analytics common fragment.
| Status | Assignee | Task |
|---|---|---|
| Open | jlinehan | T276378 Release Metrics Platform v1 |
| Open | DAbad | T281999 Metrics Platform Schema: Define & Model Event Level Fields |
| Duplicate | jlinehan | T277092 [Metrics Platform] Standard base schema |
| In Progress | jlinehan | T276379 [Metrics Platform] Create Metrics Platform Schema |
| Resolved | DAbad | T275420 [Metrics Platform] Define first set of standard fields for metrics platform |
| Duplicate | kzimmerman | T278595 [Metrics Platform] Evaluate preliminary list of standard fields for feasibility |
| Resolved | Mholloway | T280195 Create a device detection strategy for the device_type context field |
| Resolved | DAbad | T278597 [Metrics Platform] Standardize Timestamps across all datetime fields |
| Open | jlinehan | T277090 [Metrics Platform] Standard base schema should include a 'test' or 'debug' field for QA and other purposes |
The code in the patch defines a session_id identifier. As of today, we do not use such an identifier in webrequest traffic data, only in some events.
The decision not to include a user_id or session_id in webrequest, and therefore in pageviews (which are derived from webrequest), was discussed and agreed upon a while back, when there was a request to add this field to the webrequest dataset (I can't recall exactly when, but it was at least 3 years ago).
Back to events: given the very broad use case of the new event type, I have the feeling that we will move away from webrequest as the source of traffic data for metrics. While I think this move is great, I would also like the resulting shift in privacy posture to be thoroughly acknowledged and broadly discussed.
I second @JAllemandou!
One related question: Are all fields specified in the schema going to be collected by default?
As I recall, the collected fields would be specified in the extension's configuration. Is that correct?
Will they be enabled in groups (e.g. add all user fields), or individually?
I mention this because the schema contains a lot of privacy-sensitive fields:
pageview_id, session_id, and app_install_id, page.id, page.title, page.wikidata_id, page.revision_id, user.id, user.name, user.groups, user.edit_count, user.registration_dt.
I would argue in favor of not collecting any of those by default, even if we can delete them later, following the privacy-by-design principle.
Yes, the system has been very deliberately set up to support a privacy-by-design process. None of these fields are collected by default. The purpose of defining them is so that we can also define in advance the code that provides their values, and have a set menu of options that we have studied and understand from a privacy perspective. This allows a more rigorous approach using one of the many concepts of "privacy budget" out there.
I agree with both of you that it is a *great* idea to set up a privacy-focused discussion; I'll make sure we do that and bother both of you to take part!!
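To illustrate the opt-in model described above, here is a minimal Python sketch. Only the field names come from this thread; the per-stream configuration shape, the `requested_fields` key, and the `select_context` function are hypothetical and are not the Metrics Platform's actual API.

```python
# Hypothetical sketch: field names are taken from the discussion; the config
# shape ("requested_fields") and select_context() are illustrative assumptions.
from typing import Any, Mapping

# The "set menu" of pre-defined context fields a stream may opt into.
KNOWN_FIELDS = frozenset({
    "pageview_id", "session_id", "app_install_id",
    "page.id", "page.title", "page.wikidata_id", "page.revision_id",
    "user.id", "user.name", "user.groups", "user.edit_count",
    "user.registration_dt",
})


def select_context(stream_config: Mapping[str, Any],
                   context: Mapping[str, Any]) -> dict:
    """Return only the context fields that the stream explicitly requests.

    A stream whose config has no "requested_fields" entry gets no context
    fields at all, i.e. nothing is collected by default.
    """
    requested = set(stream_config.get("requested_fields", []))
    # Reject anything outside the pre-approved menu of fields.
    unknown = requested - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown context fields: {sorted(unknown)}")
    # Copy over requested fields that actually have a value available.
    return {name: context[name] for name in sorted(requested) if name in context}
```

With an empty configuration the function returns `{}`, so no context field is included in an event unless the stream names it explicitly. Group-level opt-in (for example, all `user.*` fields) would be a small extension of the same lookup.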