
Get baselines for Commons APP hypothesis for upload wizard improvements
Closed, Resolved, Public

Description

As part of APP, the SD team submitted the following hypothesis:

If we make improvements to the Commons upload wizard that minimize one of the most common problems that cause future deletion requests, we will decrease moderator burden as measured by a 2% decrease of the ratio of newly uploaded media that become deletion requests (as per KR WE1.2). One improvement will be to encourage users to select the right option when uploading not their “own work”.  We will identify other improvements and measurable goals based on an analysis of a sample of 1000 deletion requests.

The goal of this task is to investigate the available data and calculate baselines so that Product can validate the choice of success metrics and goals.
This ticket will lead to defining a dashboard for Commons moderator workflows, so that metrics can be monitored on a regular basis in the next FY.

Some decisions:

Requirements

Step 1
Calculate the baselines (with absolute numbers) and create a report table. While we focus on the upload wizard, it would be interesting to see a comparison with other upload methods:

  • Total number of media uploaded within a month (through the upload wizard), with a filter for own work and not own work
  • Total number of filed deletion requests within a month and total number of speedy deletions, with a filter for own work and not own work
  • Main metric: the deletion rate of uploaded media within a month, with a filter for own work and not own work. This does not include speedy deletions.
  • Metric of interest: the speedy deletion rate of uploaded media within a month, with a filter for own work and not own work (as it was suggested that it might go up)

Additionally look at the deletion request queue: How long does it take a DR to get resolved?

  • e.g. the ratio of not closed DRs after x months (3 months as per suggestion from Legal). Look at distribution, bring to discussion with the PM and team.
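One possible way to compute the "not closed after x months" ratio is sketched below. It is only an illustration: the function name `share_still_open`, the (filed, closed) record layout, and the 90-day approximation of "3 months" are assumptions, not part of the actual notebook or data schema.

```python
from datetime import date, timedelta


def share_still_open(requests, as_of, days=90):
    """Share of deletion requests not closed within `days` of being filed.

    `requests` holds (filed, closed) date pairs; `closed` is None for a
    request that is still in the queue. Only requests filed at least
    `days` before `as_of` are counted, so each one had a full window.
    """
    window = timedelta(days=days)
    eligible = [(f, c) for f, c in requests if as_of - f >= window]
    if not eligible:
        return None
    still_open = sum(1 for f, c in eligible if c is None or c - f > window)
    return still_open / len(eligible)


# Hypothetical sample data, for illustration only.
sample = [
    (date(2023, 1, 5), date(2023, 1, 20)),  # closed after 15 days
    (date(2023, 1, 10), date(2023, 5, 1)),  # closed after 111 days
    (date(2023, 2, 1), None),               # still open
    (date(2023, 6, 1), None),               # filed too recently to count
]
ratio = share_still_open(sample, as_of=date(2023, 6, 30))
print(f"{ratio:.1%}")
```

Restricting the denominator to requests that are old enough avoids understating the ratio with requests that have not yet had x months to be resolved.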

Note:

Step 2:

  • Success metric: calculate % of media uploaded within a given month (calculate for the past 12 months) that is flagged for deletion (DRs) within 30 days of being uploaded.
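A minimal sketch of the Step 2 success metric, assuming each upload is available as a pair of timestamps (upload time, and the time a DR was filed, if any). The function name `dr_rate_within_window` and the record layout are hypothetical and only serve to illustrate the definition.

```python
from datetime import datetime, timedelta


def dr_rate_within_window(uploads, year, month, window_days=30):
    """Percentage of files uploaded in (year, month) that were flagged
    for deletion within `window_days` of being uploaded.

    `uploads` holds (uploaded_at, dr_filed_at) pairs; `dr_filed_at` is
    None when no deletion request was filed for the file.
    """
    window = timedelta(days=window_days)
    cohort = [(u, d) for u, d in uploads if (u.year, u.month) == (year, month)]
    if not cohort:
        return None
    flagged = sum(
        1 for u, d in cohort
        if d is not None and timedelta(0) <= d - u <= window
    )
    return 100.0 * flagged / len(cohort)


# Hypothetical sample data, for illustration only.
records = [
    (datetime(2023, 5, 2), datetime(2023, 5, 12)),  # flagged after 10 days
    (datetime(2023, 5, 3), datetime(2023, 6, 17)),  # flagged after 45 days
    (datetime(2023, 5, 20), None),                  # never flagged
    (datetime(2023, 5, 28), None),                  # never flagged
    (datetime(2023, 6, 1), datetime(2023, 6, 2)),   # different upload month
]
print(dr_rate_within_window(records, 2023, 5))
```

Fixing the window per upload keeps the months comparable: older cohorts no longer accumulate extra deletions simply because more time has passed.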

Data sheet https://docs.google.com/spreadsheets/d/1qR6yPFktt-DTfETFJD50a2ooVbPe6Ad9m1S3fEdK2kE/edit#gid=0

Event Timeline

cchen triaged this task as Medium priority.

@AUgolnikova-WMF I see that deletion rates increase over time, because a file may be deleted two months or longer after it was uploaded. To account for this, we can set a fixed deletion timeframe for the metrics, similar to what is done for revisions. For example: the monthly total number of deletion requests filed within 30 days (or 60 days) of upload.

@AUgolnikova-WMF Please find the baselines for upload wizard improvements below. The data covers April 2023 and May 2023; the detailed numbers can be found in this notebook.

Here are some notes and findings:

  • The data excludes mobile and bot uploads.
  • In both months, there were more uploads through the upload wizard than through other upload methods.
  • For the upload wizard, over 70% of uploads are own work. For other upload methods, it is hard to determine whether files are own work because, unlike with the upload wizard, the edit comments are not standardized or are in other languages. We could identify only very few (< 0.02%) of those uploads as own work.
  • Uploads that were own work have a lower deletion rate than uploads that were not own work. Uploads through the upload wizard have a higher deletion rate than uploads through other methods.
  • I used edit comments to determine whether deletions were speedy deletions, but a large number of comments are empty or non-standard. As a result, the actual number of speedy deletions may be greater than what we calculated.

Below is the data:
April

  • File Uploads Metrics:
    • There were 579,666 uploads in Commons: 303,901 (52.3%) were from the upload wizard, and 276,038 (47.6%) were from other upload methods.
    • Of all the uploads from the upload wizard, 71.1% were own work, and 28.9% were not.
  • Deletion Metrics:
    • There were 16,285 deletion requests in total, 271 of which were speedy deletions. 60.9% of them were for uploads from the upload wizard, and 38.9% were for uploads from other upload methods.
    • Among upload wizard uploads, there were 9,923 deletions: 4,847 (48.8%) were own work and 5,076 (51.2%) were not own work. 170 of them were speedy deletions.
  • Deletion Rate:
    • The overall deletion rate was 2.76%. Upload wizard uploads (3.21%) had a higher deletion rate than uploads through other methods (2.27%).
    • Among the uploads through the upload wizard, own work (2.21%) had a lower deletion rate than not own work (5.66%).

May

  • File Uploads Metrics:
    • There were 585,035 uploads in Commons: 313,852 (53.6%) were from the upload wizard, and 271,183 (46.4%) were from other upload methods.
    • Of all the uploads from the upload wizard, 74.6% were own work, and 25.2% were not own work.
  • Deletion Metrics:
    • There were 13,074 deletion requests in total, 256 of which were speedy deletions. 60.9% of them were for uploads from the upload wizard, and 39% were for uploads from other upload methods.
    • Among upload wizard uploads, there were 7,959 deletions: 4,993 (63%) were own work and 2,966 (37%) were not own work. 177 of them were speedy deletions.
  • Deletion Rate:
    • The overall deletion rate was 2.19%. Upload wizard uploads (2.48%) had a higher deletion rate than uploads through other methods (1.85%).
    • Among the uploads through the upload wizard, own work (2.09%) had a lower deletion rate than not own work (3.65%).
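As a consistency check, the overall and upload wizard rates in the month-over-month tables can be reproduced from the April and May counts if speedy deletions are subtracted before dividing by uploads. This reading is inferred from the numbers and from the main metric's definition ("does not include speedy deletions"); it is not taken from the notebook, and the function name `rate_excluding_speedy` is illustrative.

```python
def rate_excluding_speedy(deletions, speedy_deletions, uploads):
    """Deletion rate in percent, leaving speedy deletions out of the numerator."""
    return 100.0 * (deletions - speedy_deletions) / uploads


# Counts as reported for April and May 2023.
rates = {
    "April overall": rate_excluding_speedy(16_285, 271, 579_666),
    "April upload wizard": rate_excluding_speedy(9_923, 170, 303_901),
    "May overall": rate_excluding_speedy(13_074, 256, 585_035),
    "May upload wizard": rate_excluding_speedy(7_959, 177, 313_852),
}
for label, rate in rates.items():
    print(f"{label}: {rate:.2f}%")
# April overall: 2.76%
# April upload wizard: 3.21%
# May overall: 2.19%
# May upload wizard: 2.48%
```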

Month over month deletion rate from January 2023 to May 2023:

Overall Deletion Rate

| month | deletion rate |
| ----- | ----- |
| 2023-01-01 | 4.69% |
| 2023-02-01 | 4.58% |
| 2023-03-01 | 3.36% |
| 2023-04-01 | 2.76% |
| 2023-05-01 | 2.19% |

Upload Wizard Deletion Rate

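| month | overall | own work | not own work |
| ----- | ----- | ----- | ----- |
| 2023-01-01 | 4.78% | 3.60% | 7.84% |
| 2023-02-01 | 4.30% | 3.63% | 5.97% |
| 2023-03-01 | 3.83% | 2.73% | 6.39% |
| 2023-04-01 | 3.21% | 2.21% | 5.66% |
| 2023-05-01 | 2.48% | 2.09% | 3.65% |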

Other Upload Methods Deletion Rate

| month | deletion rate |
| ----- | ----- |
| 2023-01-01 | 4.60% |
| 2023-02-01 | 4.92% |
| 2023-03-01 | 2.90% |
| 2023-04-01 | 2.27% |
| 2023-05-01 | 1.86% |

Some notes from the data:

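  • Deletion rates are higher for earlier upload months, because files continue to be deleted in the months after they are uploaded.
  • The deletion rate of own work doesn't change as much as that of not own work, maybe because the problems that cause the deletion of own work are easier to detect than the problems of not own work.
  • The deletion rate for uploads through the upload wizard is higher than for other upload methods in every month except February 2023.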

Adding data for deletion requests in the queue:

April
The number of deletion requests in the queue for files uploaded in April by the end of April is 2,675.

May
The number of deletion requests in the queue for files uploaded in May by the end of May is 7,881.

Adding data for the overall deletion rate for the last 12 months as of the end of June 2023, including deleted files and outstanding deletion requests:

uploads means the number of files uploaded within a given month.
deletions means the number of files uploaded within a given month that had been deleted by the end of June 2023. The deletion could have happened at any time between the file's creation and the end of June 2023.
outstanding_deletions means the number of files uploaded within a given month that were still in the DR queue at the end of June 2023. The deletion request could have been submitted at any time between the file's creation and the end of June 2023.
deletion_rate is (deletions+outstanding_deletions)/uploads

Using April 2023 as an example: 579,707 files were uploaded from April 1, 2023 to April 30, 2023. Of these 579,707 files, 17,741 were deleted between April 1, 2023 and June 30, 2023, and 488 were still in the deletion request queue on June 30, 2023. These deletion requests could have been submitted at any time between April 1, 2023 and June 30, 2023.
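The April 2023 example can be worked through with the deletion_rate definition above. The snippet below is a small illustrative sketch; the function name `cohort_deletion_rate` is an assumption and does not come from the notebook.

```python
def cohort_deletion_rate(uploads, deletions, outstanding_deletions):
    """deletion_rate = (deletions + outstanding_deletions) / uploads, in percent."""
    return 100.0 * (deletions + outstanding_deletions) / uploads


# April 2023 cohort, measured at the end of June 2023.
april_rate = cohort_deletion_rate(
    uploads=579_707, deletions=17_741, outstanding_deletions=488
)
print(f"{april_rate:.2f}%")  # 3.14%
```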

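| month | uploads | deletions | outstanding_deletions | deletion_rate |
| ----- | ----- | ----- | ----- | ----- |
| 2022-07 | 527,735 | 24,143 | 176 | 4.61% |
| 2022-08 | 480,412 | 20,520 | 192 | 4.31% |
| 2022-09 | 597,635 | 23,327 | 203 | 3.94% |
| 2022-10 | 465,663 | 23,568 | 253 | 5.12% |
| 2022-11 | 466,831 | 18,902 | 262 | 4.11% |
| 2022-12 | 498,167 | 20,744 | 310 | 4.23% |
| 2023-01 | 509,707 | 24,705 | 474 | 4.94% |
| 2023-02 | 484,785 | 23,067 | 558 | 4.87% |
| 2023-03 | 588,060 | 21,161 | 1,349 | 3.83% |
| 2023-04 | 579,707 | 17,741 | 488 | 3.14% |
| 2023-05 | 585,052 | 18,041 | 5,429 | 4.01% |
| 2023-06 | 594,387 | 26,151 | 1,083 | 4.58% |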

Below are the deletion rates for media uploaded in May, June and July 2023 that were flagged for deletion within 30 days of being uploaded.

Deletion rate = (deletions + outstanding deletions) / uploads

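| month | uploads | deletions | outstanding_deletions | deletion_rate |
| ----- | ----- | ----- | ----- | ----- |
| 2023-05 | 585,052 | 18,041 | 5,429 | 4.01% |
| 2023-06 | 594,387 | 26,151 | 1,083 | 4.58% |
| 2023-07 | 668,472 | 14,752 | 780 | 2.32% |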

I also added the 30-day deletion figures for uploads through the upload wizard only:

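| month | uploads | deletions | outstanding_deletions | deletion_rate |
| ----- | ----- | ----- | ----- | ----- |
| 2023-05 | 313,861 | 11,717 | 445 | 3.87% |
| 2023-06 | 319,729 | 14,818 | 605 | 4.82% |
| 2023-07 | 371,712 | 8,778 | 476 | 2.49% |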

cchen updated the task description.