
Baseline metrics for logo detection on upload wizard
Closed, Resolved (Public)

Description

Please refer to https://www.mediawiki.org/wiki/Product_Analytics#How_to_get_help_with_data_or_analysis for help answering these prompts

What team/program is this request for?
Structured Content

What are you requesting?

Get baselines for potential success metrics for logo detection:

  • Proportion of DRs mentioning logo within 30 days, all uploads to Commons
  • Proportion of copyright DRs mentioning logo within 30 days, all uploads to Commons
  • Proportion of DRs mentioning logo within 30 days, Upload Wizard uploads
  • Proportion of copyright DRs mentioning logo within 30 days, Upload Wizard uploads

For DRs related to logo uploads, refer to T340546.

What is the problem you're trying to solve?
Decrease uploads of logos that violate copyright.
Decrease uploads of logos that violate copyright through Upload Wizard.

What decision will you make or action will you take with the deliverable?
Understand the baselines for the ongoing project and decide which metric to use to measure success.

Additional details

Event Timeline

cchen triaged this task as Medium priority. Apr 10 2024, 5:52 AM
mpopov removed cchen as the assignee of this task. May 2 2024, 6:56 PM
mpopov raised the priority of this task from Medium to Needs Triage.
mpopov updated the task description.

@AUgolnikova-WMF: Can you please fill out the details in the description to help me understand if/how this should be prioritized?

Also, it looks like the logo detection work is part of WE 1.2.1, but didn't you already settle on a metric (deletion rate of copyright violation-related uploads) for measuring progress/success of the hypothesis?

@cchen I am not sure why the description got updated and some metrics were taken out, but we do want to get data on all the specified metrics:

  • Proportion of logos in undeleted content
  • % copyright DR mentioning logo within 30 days
  • Proportion of logos in overall Commons files
  • Time to deletion for logos

In the conversation we started back in February, we talked about the need to look at different metrics to understand logo behavior and to determine the best metric for measuring the success of logo detection. A prominent suggestion was time to deletion for logos. So can we please make sure to get the baselines for all the metrics initially mentioned? Thank you.

@AUgolnikova-WMF Following the approach Marco used to find logo-related deletions and uploads, we can only identify these deletions from the deletion reason. This means we cannot compute metrics like "Proportion of logos in overall Commons files" or "Proportion of logos in undeleted content", since there is no way to find these files unless they have been deleted. Also, since this is part of the copyright-related metrics, I used metrics that align more closely with our current ones.
If you also want to check the time to deletion for logos, I can add it back.

  • % logo DRs within 30 days = UW_with_logo_drs_within_30_days / UW_all
  • % of DRs mentioning logo within 30 days = UW_with_logo_drs_within_30_days / UW_with_drs_within_30_days
  • % of copyright DRs mentioning logo within 30 days = UW_with_logo_drs_within_30_days / UW_with_COPYRIGHT_drs_within_30_days
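
| month | UW_all | % deleted | UW_drs_within_30_days | UW_COPYRIGHT_drs_within_30_days | UW_logo_drs_within_30_days | % logo DRs within 30 days | % of DRs mentioning logo within 30 days | % of copyright DRs mentioning logo within 30 days |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2024-03 | 406777 | 3.39% | 3395 | 1799 | 102 | 0.025% | 3.00% | 5.67% |
| 2024-02 | 353249 | 3.15% | 2568 | 1086 | 135 | 0.038% | 5.26% | 12.43% |
| 2024-01 | 352467 | 3.73% | 5356 | 1374 | 371 | 0.105% | 6.93% | 27.00% |
| 2023-12 | 310798 | 3.87% | 2730 | 1156 | 148 | 0.048% | 5.42% | 12.80% |
| 2023-11 | 337588 | 4.81% | 2910 | 1270 | 131 | 0.039% | 4.50% | 10.31% |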
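
As a cross-check of the three metric definitions above, here is a minimal Python sketch that recomputes the percentages from the monthly counts. The class, function, and field names (`MonthlyCounts`, `logo_metrics`, `pct`, and so on) are illustrative assumptions and do not come from the actual analysis code; only the counts are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthlyCounts:
    """Upload Wizard (UW) counts for one month. Field names are hypothetical."""
    month: str
    uw_all: int            # all UW uploads in the month
    uw_drs: int            # UW uploads with a DR within 30 days
    uw_copyright_drs: int  # UW uploads with a copyright DR within 30 days
    uw_logo_drs: int       # UW uploads with a DR mentioning "logo" within 30 days


def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, or NaN if undefined."""
    return 100.0 * numerator / denominator if denominator else float("nan")


def logo_metrics(c: MonthlyCounts) -> dict[str, float]:
    """Compute the three logo metrics as defined in the bullet list."""
    return {
        "pct_logo_drs": pct(c.uw_logo_drs, c.uw_all),
        "pct_drs_mentioning_logo": pct(c.uw_logo_drs, c.uw_drs),
        "pct_copyright_drs_mentioning_logo": pct(c.uw_logo_drs, c.uw_copyright_drs),
    }


# Counts copied from the 2024-03 and 2024-01 rows of the table.
ROWS = [
    MonthlyCounts("2024-03", 406777, 3395, 1799, 102),
    MonthlyCounts("2024-01", 352467, 5356, 1374, 371),
]

if __name__ == "__main__":
    for row in ROWS:
        m = logo_metrics(row)
        print(
            row.month,
            f"{m['pct_logo_drs']:.3f}%",
            f"{m['pct_drs_mentioning_logo']:.2f}%",
            f"{m['pct_copyright_drs_mentioning_logo']:.2f}%",
        )
```

For the 2024-03 row this gives roughly 0.025%, 3.00%, and 5.67%, which agrees with the reported values. All three metrics share the same numerator, so they differ only in which population of uploads is used as the denominator.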