
Image Recommendations MEP Data QA
Closed, Resolved (Public)

Description

For table android_image_recommendation_interaction

  • The field reasons should be renamed to reason to match the Legacy data.
  • In meta, the wiki field is null. Let's discuss whether it needs to be added (we have it in meta for the iOS MEP table).
  • The table is missing geocoded_data. This is true of all MEP tables, so it is a bigger discussion. It is not necessary now, but as we move forward we will want to include it for some datasets.

Event Timeline

LGoto triaged this task as High priority. May 4 2021, 4:10 PM
LGoto lowered the priority of this task from High to Medium. May 4 2021, 4:23 PM

For tables event.android_image_recommendation_interaction and event.mobilewikiappimagerecommendations: we are seeing event count discrepancies and missing data in both tables, with significantly more missing on the MEP side. As of 2021-06-07:

Daily raw event counts by table are in the spreadsheet here.


There are 612 participating users who appear in MEP but not in Legacy, and 169 users who appear in Legacy but not in MEP.

Table     Users Unique to Dataset    Images Annotated
MEP       612                        7191
Legacy    169                        697
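The "unique to dataset" counts above amount to a set difference on a per-user identifier. The actual queries live in the spreadsheet and are not reproduced here, so the following is only a minimal sketch: the function name and the toy IDs are hypothetical, and which field identifies a user in each table is an assumption left open.

```python
def split_users(mep_ids, legacy_ids):
    """Partition user IDs into MEP-only, Legacy-only, and shared sets.

    Both arguments are iterables of user identifiers (hypothetical here);
    duplicates are collapsed because a user may log many events.
    """
    mep, legacy = set(mep_ids), set(legacy_ids)
    return {
        "mep_only": mep - legacy,      # in MEP, missing from Legacy
        "legacy_only": legacy - mep,   # in Legacy, missing from MEP
        "both": mep & legacy,          # present in both tables
    }


# Toy identifiers for illustration only, not real data.
mep_ids = ["a", "b", "c", "c"]
legacy_ids = ["c", "d"]

result = split_users(mep_ids, legacy_ids)
print({name: len(ids) for name, ids in result.items()})
```

Keeping the "both" set around is useful for the follow-up check on users who appear in both tables but with different activity counts.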

App version counts for users tracked in these datasets:

MEP Only Users App Versions

version                         COUNT
2.7.50359-r-2021-05-13          164
2.7.50355-r-2021-05-05          442
2.7.50359-huawei-2021-05-13     1
2.7.50355-samsung-2021-05-05    5

Legacy Only Users App Versions

version                         COUNT
2.7.50355-r-2021-05-05          111
2.7.50359-r-2021-05-13          58

Data instrumentation and functionality changed between versions, so app version is an important dimension to track. The latest release, 50359 (2.7.50359-r-2021-05-13), accounts for fewer of the discrepant records, but that may be because it has been live for a shorter period of time.
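To compare releases regardless of distribution channel (r, huawei, samsung), the per-version counts listed above can be rolled up by build number. This is a rough sketch using only the counts from this comment; the helper name by_build and the assumption that the build number is the third dotted component of the version string are mine.

```python
# Counts copied from the two version tables in this comment.
mep_only = {
    "2.7.50359-r-2021-05-13": 164,
    "2.7.50355-r-2021-05-05": 442,
    "2.7.50359-huawei-2021-05-13": 1,
    "2.7.50355-samsung-2021-05-05": 5,
}
legacy_only = {
    "2.7.50355-r-2021-05-05": 111,
    "2.7.50359-r-2021-05-13": 58,
}


def by_build(counts):
    """Sum user counts per build number, e.g. '2.7.50359-r-...' -> '50359'."""
    totals = {}
    for version, n in counts.items():
        build = version.split("-")[0].split(".")[-1]
        totals[build] = totals.get(build, 0) + n
    return totals


for label, counts in (("MEP only", mep_only), ("Legacy only", legacy_only)):
    total = sum(counts.values())
    for build, n in sorted(by_build(counts).items()):
        print(f"{label}: build {build}: {n} users ({n / total:.1%} of {total})")
```

The totals come out to 612 and 169, matching the unique-user counts reported earlier, which is a quick sanity check that no version rows were dropped.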

The queries used for the initial check are noted in the spreadsheet and are also documented in a Jupyter notebook, which I can share on request. Further investigation into commonalities between the missing datasets and user activity can be found here as well. Event dates and user activity: here. User appearance in the other tables, mobilewikiappedit and mobilewikiappdailystats: here.


For users who appeared in both tables, activity counts differ, as seen here.

Note: There are some errors (32) showing up in Logstash for schema MobileWikiAppImageRecommendations

@SNowick_WMF Continuing our conversation about the 'suggestion_source' numbers... just out of curiosity:

  1. Is 'commons' the only other suggestion source found in Legacy for IR images, other than 'wikipedia', 'wikidata', and 'mediasearch'?
  2. Since we know that we have lost all rows for 'commons' [or any other source than the above 3] will it be possible to have a breakdown for suggestion_source, so we can add that number to MEP and see if they come any closer to legacy?
  1. The suggestion sources in Legacy are:

commons
wikipedia
wikidata

Suggestion sources in MEP:
wikipedia
wikidata

  2. Since there are more records in MEP that aren't in Legacy, adding the missing commons-sourced images would probably just increase the number of records missing from Legacy, but that's just a guess.

Count of images annotated by suggestion source in Legacy:

source       COUNT
wikidata     4992
commons      9284
wikipedia    55998

Count of images annotated by suggestion source in MEP:

source       COUNT
wikidata     5663
wikipedia    63440
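The question of whether adding the commons rows to MEP would bring the two tables closer can be checked arithmetically from the counts above. This is only a sketch on the reported totals; it assumes both sets of counts cover the same date range, which is not stated here.

```python
# Images annotated by suggestion source, copied from the counts above.
legacy = {"wikidata": 4992, "commons": 9284, "wikipedia": 55998}
mep = {"wikidata": 5663, "wikipedia": 63440}

legacy_total = sum(legacy.values())
mep_total = sum(mep.values())

# Gap per source that exists in both tables (positive means MEP has more).
shared = sorted(set(legacy) & set(mep))
per_source_gap = {source: mep[source] - legacy[source] for source in shared}

# Hypothetical MEP total if the missing commons rows had been recorded.
mep_plus_commons = mep_total + legacy["commons"]

print("Legacy total:", legacy_total)
print("MEP total:", mep_total)
print("MEP minus Legacy, per shared source:", per_source_gap)
print("MEP + commons minus Legacy:", mep_plus_commons - legacy_total)
```

On these figures MEP already exceeds Legacy for both shared sources, so adding the commons count to MEP widens the overall gap rather than closing it, which is in line with the guess above.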

Out of curiosity, I ran a user retention analysis on active users from each table (date range 2021-05-06 to 2021-06-24). Directionally the results are fairly close. Keep in mind that the Image Recs task has a lower return rate than regular editing tasks.

Days Retained    Legacy    MEP
1                9.7 %     10.7 %
3                6.8 %     7.5 %
7                4.6 %     5.1 %
14               2.6 %     2.9 %
30               0.79 %    0.91 %
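One way to put a number on "directionally close" is to look at the MEP-to-Legacy ratio at each retention window. The sketch below uses only the percentages reported above; it does not reproduce the retention calculation itself, whose exact definition is not given here.

```python
# Retention rates in percent, copied from the table above: day -> (Legacy, MEP).
retention = {
    1: (9.7, 10.7),
    3: (6.8, 7.5),
    7: (4.6, 5.1),
    14: (2.6, 2.9),
    30: (0.79, 0.91),
}

# Ratio above 1 means MEP reports higher retention than Legacy for that window.
ratios = {day: mep / legacy for day, (legacy, mep) in retention.items()}

for day, ratio in ratios.items():
    legacy, mep = retention[day]
    print(f"day {day:>2}: Legacy {legacy}% vs MEP {mep}% (MEP/Legacy = {ratio:.2f})")
```

MEP comes out higher than Legacy at every window, so the two series track each other but with a consistent offset in one direction.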