[Analytics] Monthly repeating tasks (next: May 2024)
Open, Low, Public

Description

Next

May 2024

  • Query wmde.wd_rest_api_metrics_monthly generated by Airflow and add the results to the data report

June 2024

  • Query wmde.wd_rest_api_metrics_monthly generated by Airflow and add the results to the data report

July 2024

  • Query wmde.wd_rest_api_metrics_monthly generated by Airflow and add the results to the data report
  • Update LOD health metrics data report T341327

Archive

March 2024

  • Query wmde.wd_rest_api_metrics_monthly generated by Airflow and add the results to the data report

February 2024

  • Query wmde.wd_rest_api_metrics_monthly generated by Airflow and add the results to the data report

January 2024

  • Run REST API notebook T334558 and add manually refined results to the data report
  • Update LOD health metrics data report T341327

December 2023

October 2023

  • Run REST API notebook T334558 and add manually refined results to the data report
  • Update LOD health metrics data report T341327

September 2023 (not done!)

April 2024

  • Query wmde.wd_rest_api_metrics_monthly generated by Airflow and add the results to the data report
  • Update LOD health metrics data report T341327

Event Timeline

Manuel renamed this task from [Analytics] Monthly repeating tasks (next: August 2023) to [Analytics] Monthly repeating tasks (next: September 2023). Sep 20 2023, 8:36 AM
Manuel updated the task description.
Manuel triaged this task as High priority. Sep 25 2023, 11:17 AM
Manuel renamed this task from [Analytics] Monthly repeating tasks (next: September 2023) to [Analytics] Monthly repeating tasks (next: October 2023). Oct 10 2023, 11:57 AM
Manuel renamed this task from [Analytics] Monthly repeating tasks (next: October 2023) to [Analytics] Monthly repeating tasks (next: December 2023).
Manuel updated the task description.
Manuel updated the task description.
Manuel changed the task status from Open to Stalled. Oct 31 2023, 9:29 AM
Manuel changed the task status from Stalled to Open. Dec 6 2023, 4:33 PM

Numbers for October and November have been added to the data report.

AndrewTavis_WMDE changed the task status from Open to Stalled. Dec 22 2023, 3:33 PM
AndrewTavis_WMDE updated the task description.
Manuel renamed this task from [Analytics] Monthly repeating tasks (next: December 2023) to [Analytics] Monthly repeating tasks (next: January 2023). Jan 8 2024, 1:53 PM

@AndrewTavis_WMDE: Thx! Reporting is soon due, so could you pls also add December?

@Manuel: the numbers for December were added in last night :)

Manuel renamed this task from [Analytics] Monthly repeating tasks (next: January 2023) to [Analytics] Monthly repeating tasks (next: February 2024). Jan 29 2024, 9:10 AM
Manuel updated the task description.
AndrewTavis_WMDE changed the task status from Stalled to In Progress. Feb 8 2024, 12:48 PM
AndrewTavis_WMDE claimed this task.
AndrewTavis_WMDE lowered the priority of this task from High to Medium.
AndrewTavis_WMDE renamed this task from [Analytics] Monthly repeating tasks (next: February 2024) to [Analytics] Monthly repeating tasks (next: March 2024). Feb 8 2024, 3:26 PM
AndrewTavis_WMDE changed the task status from In Progress to Stalled.
AndrewTavis_WMDE updated the task description.

Sheet has been updated with the numbers for January. Note that we have fewer unique user agents than last month's local maximum, but the number of IPs continues to grow. It seems like adoption is picking up, but we won't necessarily see this in the user agent counts, as many of the requests would be coming from Python-based libraries that share user agents.

Manuel changed the task status from Stalled to Open. Mar 14 2024, 9:51 AM

@AndrewTavis_WMDE Reporting time is coming: In the first week of April, please ensure that we have both February and March data available. Ideally, test the results from T341330 by also running the original notebooks for comparison. Thx!

I've added the numbers for February to the sheet based on the first DAG run and also just went through the query job one final time to check. The queries that are being run by the job are taken directly from the original queries with only a few minor changes:

For counting the filtered user agents we're doing the following:

count(
    DISTINCT CASE
        WHEN user_agent
        NOT LIKE 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/% (KHTML, like Gecko) Chrome/% Safari/%'
        THEN user_agent
    END
) AS total_filtered_user_agents,

... instead of:

SELECT
    count(DISTINCT user_agent) AS total_filtered_user_agents

...

WHERE
    ...
    AND user_agent NOT LIKE 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/% (KHTML, like Gecko) Chrome/% Safari/%'
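
To make the comparison concrete, here is a minimal sketch of why the two query shapes above should give the same filtered count: the CASE expression yields NULL for browser-like user agents, and count(DISTINCT ...) ignores NULLs. It uses Python's built-in sqlite3 as a stand-in for the engine the Airflow job actually runs on, and the table name requests, the function name compare_counts, and the sample user agents are all made up for illustration.

```python
import sqlite3

# Pattern copied from the queries in this thread.
BROWSER_PATTERN = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/% "
    "(KHTML, like Gecko) Chrome/% Safari/%"
)


def compare_counts(user_agents):
    """Return (total, filtered via CASE, filtered via WHERE) distinct counts."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical one-column table; the real source table has many more fields.
    conn.execute("CREATE TABLE requests (user_agent TEXT)")
    conn.executemany(
        "INSERT INTO requests VALUES (?)", [(ua,) for ua in user_agents]
    )

    # Conditional aggregation: one pass yields both the unfiltered and the
    # filtered distinct count, because CASE returns NULL for excluded rows.
    total, filtered_case = conn.execute(
        """
        SELECT
            count(DISTINCT user_agent),
            count(DISTINCT CASE
                WHEN user_agent NOT LIKE ? THEN user_agent
            END)
        FROM requests
        """,
        (BROWSER_PATTERN,),
    ).fetchone()

    # Original shape: the filter sits in WHERE, so the query can only
    # return the filtered count.
    (filtered_where,) = conn.execute(
        "SELECT count(DISTINCT user_agent) FROM requests "
        "WHERE user_agent NOT LIKE ?",
        (BROWSER_PATTERN,),
    ).fetchone()

    conn.close()
    return total, filtered_case, filtered_where


# Made-up sample data: one browser-like agent, three tool-like agents.
sample = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "python-requests/2.31.0",
    "python-requests/2.31.0",
    "curl/8.4.0",
    "my-wikidata-tool/1.0",
]
print(compare_counts(sample))
```

The practical benefit of the CASE form is that filtered and unfiltered metrics can live in the same SELECT, which is presumably why it suits a single monthly job. LIKE semantics (for example case sensitivity) differ between engines, so this only illustrates the logic, not the production behavior.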

Within the WHERE clause we are further adding webrequest_source = 'text' as discussed. This was suggested by WMF data engineering and means that we are not losing any information; rather, we are querying a subset of the data that includes our original results.

I'll update the numbers for March once the next DAG run is finished at the start of next week!

AndrewTavis_WMDE lowered the priority of this task from Medium to Low. Thu, Mar 28, 2:54 PM

Sheet has been updated for March via a query of wmde.wd_rest_api_metrics_monthly that's generated by Airflow. Slightly lower user agents than last month, but IPs doubled 📈

AndrewTavis_WMDE renamed this task from [Analytics] Monthly repeating tasks (next: March 2024) to [Analytics] Monthly repeating tasks (next: April 2024). Mon, Apr 8, 4:47 PM
Manuel renamed this task from [Analytics] Monthly repeating tasks (next: April 2024) to [Analytics] Monthly repeating tasks (next: May 2024). Tue, Apr 9, 8:27 AM
Manuel updated the task description.

Looking good, thx!