## Background
In 2020, the Product Analytics, Data Engineering, and Product Infrastructure (since disbanded) teams collaborated on T267494, which produced a dataset that would enable us to start understanding how long users interact with our products on the web. The work was part of a program called #better_use_of_data. A side effect of this work was a session identifier that resets after a certain period of inactivity; we use this identifier today in the `performer_active_browsing_session_token` contextual attribute.
Version 1 of the session length dataset was minimal and mostly a proof of concept for a privacy-preserving, identifier-less way to calculate what was at the time a very standard metric of Internet user behavior.
Furthermore, the metric as-is is not viable as a core or essential metric for Consumer Experience (WE3) work for the following reasons:
> - Limited history available; the data only goes back to 2021 and that makes it hard to understand changing trends
> - Low granularity: we currently can’t segment the data by region (this is the biggest blocker to using it now), type of visitor (e.g. new vs returning), pages visited, features used, etc.
> - The data needs more analysis and maintenance.
See also: [[ https://wikimedia.slack.com/archives/C05FWANFT8X/p1692639419449019 | thread in #product-tech-dept Slack channel ]].
Wiki Experiences 3 (Consumer Experiences) is more interested in reader retention, and their experiments are focused on increasing the retention rate. Session length may well be correlated with retention, but the high-level dataset will not help us establish that. Instead, we would need to measure session length within the same experiments in which we measure reader retention, and analyze that data. We do not need Session Length v1 data to do that.
We collect **95M session tick events //per day//**. We should stop that data collection and we should reconsider:
- Whether session length is an important means for us to understand user engagement
- What session length data should look like (what dimensions it should have, how experiment-friendly it is) to provide product managers what they need to understand user engagement
NOTE: We still want the session tick instrument's **regulator** to be active, because the `performer.active_browsing_session_token` contextual attribute is a session identifier that resets after a certain period of inactivity, unlike `performer.session_id`, which is based on the MW session ID. So we should disable data collection, but not the tick and session reset browser events that instruments can and do subscribe to. In fact, we want to bring the core logic of the instrument into our SDKs and tools. This work is captured in {T284223}, as part of {T406261}.
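To illustrate the behavior we want to keep, here is a minimal sketch of an inactivity-based session token with a reset event that other code can subscribe to. It is not the code in sessionTick.js: the class name `BrowsingSessionRegulator`, its methods, and the 30-minute timeout are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical timeout; the value the real instrument uses is not stated in this task.
const INACTIVITY_TIMEOUT_MS = 30 * 60 * 1000;

type ResetListener = (newToken: string) => void;

// Generates a random hex string to use as a session token (illustrative only).
function defaultToken(): string {
  let token = '';
  for (let i = 0; i < 20; i++) {
    token += Math.floor(Math.random() * 16).toString(16);
  }
  return token;
}

// Keeps a session token alive while there is activity and replaces it
// once the gap between two activities reaches the timeout.
class BrowsingSessionRegulator {
  private token: string;
  private lastActivityMs: number;
  private timeoutMs: number;
  private makeToken: () => string;
  private listeners: ResetListener[];

  constructor(
    timeoutMs: number = INACTIVITY_TIMEOUT_MS,
    nowMs: number = Date.now(),
    makeToken: () => string = defaultToken
  ) {
    this.timeoutMs = timeoutMs;
    this.makeToken = makeToken;
    this.listeners = [];
    this.token = makeToken();
    this.lastActivityMs = nowMs;
  }

  // Lets other instruments react to a session reset without any event being submitted.
  onReset(listener: ResetListener): void {
    this.listeners.push(listener);
  }

  // Call on user activity; returns the token that applies to that activity.
  recordActivity(nowMs: number = Date.now()): string {
    if (nowMs - this.lastActivityMs >= this.timeoutMs) {
      this.token = this.makeToken();
      for (const listener of this.listeners) {
        listener(this.token);
      }
    }
    this.lastActivityMs = nowMs;
    return this.token;
  }
}
```

Nothing in this sketch sends data anywhere, which is the point of the note above: the token and its reset notifications can keep working after event submission is disabled.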
## Acceptance criteria
[ ] Stakeholders signed off on the proposal
  [ ] Julie van der Hoop, Senior Product Manager, Experiment Platform
  [ ] Kate Zimmerman, Senior Director of Research and Data Science
  [ ] Chris Albon, Director of Machine Learning and Data Engineering
  [ ] Marshall Miller, Senior Director of Product, Core Experiences
[ ] Decommissioning announced in relevant Slack channels (`#insights-and-data`, `#working-with-data`)
[ ] Data collection stopped
  [ ] Event submission disabled in [[ https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/WikimediaEvents/+/master/modules/ext.wikimediaEvents/sessionTick.js | sessionTick.js ]]
  [ ] `mediawiki.client.session_tick` stream removed from [[ https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/refs/heads/master/wmf-config/ext-EventStreamConfig.php | ext-EventStreamConfig.php ]]
[ ] Data aggregation stopped
  [ ] `session_length_daily` DAG is turned off in Airflow UI
  [ ] [[ https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/main/dags/session_length/session_length_daily_dag.py | session_length_daily_dag.py ]], [[ https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/tests/main/session_length/session_length_daily_dag_test.py | session_length_daily_dag_test.py ]], and [[ https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/tests/main/fixtures/spark_skein_specs/main_dags_session_length_session_length_daily_dag.py-session_length_daily-process_session_length.expected?ref_type=heads | fixture 1 ]] & [[ https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/tests/main/fixtures/spark_skein_specs/main_dags_session_length_session_length_daily_dag.py-session_length_daily-process_iceberg_session_length.expected?ref_type=heads | fixture 2 ]] removed
[ ] Decommission documented
  [ ] https://wikitech.wikimedia.org/wiki/Data_Platform/Data_Lake/Traffic/SessionLength
  [ ] `wmf_traffic.session_length` ([[ https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,wmf_traffic.session_length,PROD)/Schema?is_lineage_mode=false&schemaFilter= | DataHub ]])
  [ ] `wmf.session_length_daily` ([[ https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,wmf.session_length_daily,PROD)/Schema?is_lineage_mode=false&schemaFilter= | DataHub ]])
[ ] [[ https://superset.wikimedia.org/superset/dashboard/232/ | Superset dashboard ]] either un-published or deleted entirely