Productionize JA3N-UA table to improve bot detection
Closed, Resolved, Public

Description

This task is about making the previous work from T409577 a permanent dataset in the Data Lake. It includes:

  • Adapt the queries that generate the JA3N-UA table to the Airflow-SparkSQL paradigm and put them under version control. Add a create-table statement if necessary.
  • Write an Airflow DAG that populates the table periodically on the given schedule.
  • Deploy everything.
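
As a rough illustration of the first two steps, the sketch below shows how an hourly job might turn each scheduled hour into parameters for a templated SparkSQL query. It is not the query or DAG from the patches linked below: the template text, the column names (ja3n, user_agent, request_count), the table names passed in, and the helper functions are all hypothetical. It uses only the Python standard library, so it runs without Airflow or Spark.

```python
from datetime import datetime, timedelta, timezone
from string import Template

# Hypothetical query template. The column names and the shape of the query
# are assumptions for illustration, not the real refinery query.
HOURLY_QUERY = Template(
    "INSERT INTO ${destination_table}\n"
    "SELECT ja3n, user_agent, COUNT(*) AS request_count\n"
    "FROM ${source_table}\n"
    "WHERE year = ${year} AND month = ${month} AND day = ${day} AND hour = ${hour}\n"
    "GROUP BY ja3n, user_agent"
)


def hourly_params(hour_start: datetime) -> dict:
    """Map the start of an hourly interval to the query's partition parameters."""
    return {
        "year": hour_start.year,
        "month": hour_start.month,
        "day": hour_start.day,
        "hour": hour_start.hour,
    }


def render_hourly_query(hour_start: datetime, source_table: str, destination_table: str) -> str:
    """Fill the template for one hour of data."""
    return HOURLY_QUERY.substitute(
        source_table=source_table,
        destination_table=destination_table,
        **hourly_params(hour_start),
    )


def scheduled_hours(start_date: datetime, end_date: datetime) -> list:
    """List the hourly interval starts in [start_date, end_date)."""
    hours = []
    current = start_date
    while current < end_date:
        hours.append(current)
        current += timedelta(hours=1)
    return hours


if __name__ == "__main__":
    # Placeholder table names; the real databases are not given in this task.
    start = datetime(2025, 12, 1, 22, tzinfo=timezone.utc)
    for hour_start in scheduled_hours(start, start + timedelta(hours=3)):
        print(render_hourly_query(hour_start, "source_db.requests", "target_db.ja3n_ua_hourly"))
        print()
```

In a scheduler such as Airflow, the list of hours would come from the DAG's start date and schedule rather than from a helper like scheduled_hours, so moving the start date earlier adds earlier hours to the set of runs.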

Event Timeline

Change #1212214 had a related patch set uploaded (by Mforns; author: Mforns):

[analytics/refinery@master] Add JA3N User-Agent queries

https://gerrit.wikimedia.org/r/1212214

Change #1212626 had a related patch set uploaded (by Mforns; author: Mforns):

[operations/puppet@production] analytics::refinery::job::data_purge: Add drop-ja3n-ua-hourly job

https://gerrit.wikimedia.org/r/1212626

Change #1212214 merged by Mforns:

[analytics/refinery@master] Add JA3N User-Agent queries

https://gerrit.wikimedia.org/r/1212214

Change #1213488 had a related patch set uploaded (by Mforns; author: Mforns):

[analytics/refinery@master] Add user_agent_map to the ja3n_ua_hourly table

https://gerrit.wikimedia.org/r/1213488

Change #1213488 merged by Mforns:

[analytics/refinery@master] Add user_agent_map to the ja3n_ua_hourly table

https://gerrit.wikimedia.org/r/1213488

Change #1213522 had a related patch set uploaded (by Mforns; author: Mforns):

[analytics/refinery@master] Fix issue in ja3n_ua_hourly query

https://gerrit.wikimedia.org/r/1213522

Change #1213522 merged by Mforns:

[analytics/refinery@master] Fix issue in ja3n_ua_hourly query

https://gerrit.wikimedia.org/r/1213522

Change #1213572 had a related patch set uploaded (by Mforns; author: Mforns):

[analytics/refinery@master] Modify ja3n_ua_hourly queries to migrate the table to Iceberg

https://gerrit.wikimedia.org/r/1213572

Change #1213572 merged by Mforns:

[analytics/refinery@master] Modify ja3n_ua_hourly queries to migrate the table to Iceberg

https://gerrit.wikimedia.org/r/1213572

mforns opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1852

Shift the start_date of the ja3n_ua_hourly DAG back so that we include an...

mforns merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1852

Shift the start_date of the ja3n_ua_hourly DAG back so that we include an...