Two opportunities identified by Security during threat modeling were:
- Deterring instrumentation users from over-collecting data, including PII, in experiment definitions and via JS components
- Deterring schema changes that drift outside the data collection policy under which the schema was originally vetted
A few simple mitigations can address both. The suggested approach is:
- Enhance https://gitlab.wikimedia.org/repos/data-engineering/schemas-event-secondary with the two-approver rule. The Airflow DAGs repo already has this rule ( see https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/933/diffs ). Under this rule, the merge requester may be one of the two approvers, but the other approver must be someone else.
- Add a default merge request template ( https://docs.gitlab.com/user/project/description_templates/#create-a-merge-request-template ) to https://gitlab.wikimedia.org/repos/data-engineering/schemas-event-secondary that asks the merge requester to confirm or provide the following:
  - The commit message references the associated Phabricator task(s) if the change creates, updates, or deletes schemas or otherwise changes software functionality in this repo.
  - The guidance at https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To#Writing_MediaWiki_instrumentation_code was followed for the schema(s) in the merge request. This includes the case where the merge request introduces a new medium- or high-risk schema, or alters a schema in a way that raises its risk level under the Data Collection Guidelines ( https://foundation.wikimedia.org/wiki/Legal:Data_Collection_Guidelines ). In either of those cases, the merge request should also add @sguebo_WMF as a reviewer.
  - Links to any other data specifications that exist, such as a measurement plan ( https://wikitech.wikimedia.org/wiki/Metrics_Platform/Measure_product_health#Measurement_plan ) or an instrumentation specification ( https://wikitech.wikimedia.org/wiki/Metrics_Platform/Measure_product_health#Instrumentation_spec ); these are optional. Links to the related code (Gerrit or GitLab patches) for this schema change, if it exists yet. If it doesn't exist yet, that is okay, but the merge requester should make sure the code is reachable from the Phabricator task(s) later on.
- Review, and update if needed, the permissions in https://gitlab.wikimedia.org/repos/data-engineering/schemas-event-secondary so that the effective permissions actually support independent review.
- Update README.md in https://gitlab.wikimedia.org/repos/data-engineering/schemas-event-secondary to explain what contributors need to know about the three changes above. Its current wording signals relaxed expectations for review and approval, whereas these changes raise those expectations.
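To make the commit-message and two-approver expectations concrete, here is a minimal Python sketch of the checks a reviewer (or a CI job) might apply before merging. Everything in it is hypothetical: `MergeRequest`, `check_merge_request`, and the other names are invented for illustration. The task-ID pattern assumes IDs of the form `T123456` (for example in a `Bug: T123456` footer). The sketch does not talk to GitLab and is not a description of how the Airflow DAGs repo enforces its rule.

```python
import re
from dataclasses import dataclass, field

# Assumption: Phabricator task IDs look like "T123456", e.g. in a
# "Bug: T123456" footer line of the commit message.
PHAB_TASK_RE = re.compile(r"\bT\d+\b")


@dataclass
class MergeRequest:
    """Hypothetical, minimal view of a merge request for this sketch."""
    author: str
    commit_messages: list[str]
    approvers: set[str] = field(default_factory=set)
    # True if the MR creates/updates/deletes schemas or otherwise changes
    # software function in the repo (the case where a task is required).
    affects_schemas_or_code: bool = True


def references_phab_task(message: str) -> bool:
    """Return True if the commit message mentions at least one task ID."""
    return PHAB_TASK_RE.search(message) is not None


def satisfies_two_approver_rule(mr: MergeRequest) -> bool:
    """Two distinct approvers; the author may be one, but not the only one."""
    distinct = set(mr.approvers)
    non_author = distinct - {mr.author}
    return len(distinct) >= 2 and len(non_author) >= 1


def check_merge_request(mr: MergeRequest) -> list[str]:
    """Collect human-readable problems; an empty list means the checks pass."""
    problems = []
    if mr.affects_schemas_or_code and not all(
        references_phab_task(msg) for msg in mr.commit_messages
    ):
        problems.append("commit message is missing a Phabricator task reference")
    if not satisfies_two_approver_rule(mr):
        problems.append("needs two distinct approvers, at least one not the author")
    return problems


if __name__ == "__main__":
    mr = MergeRequest(
        author="alice",
        commit_messages=["Add button click schema\n\nBug: T123456"],
        approvers={"alice", "bob"},
    )
    print(check_merge_request(mr))  # prints []
```

Requiring at least two distinct approvers already implies that one of them is not the author. The explicit non-author check is redundant but makes the intent of the rule visible to readers.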



