Measure the surviving translations by newcomers
Closed, Resolved, Public

Description

As part of the success metrics identified for Content Translation version 2, we want the number of articles translated by new editors that aren’t deleted in 30 days to increase.

Measurement

We want to measure the number of articles translated by (new) editors that aren’t deleted in 30 days.

  • We want to have separate measurements for (a) "new editors", and (b) "existing editors" as a reference.
  • We want to have separate measurements for activity from (a) all versions, and (b) version 2 of the tool. That will allow us to understand the overall impact as version 2 becomes the default, and also to "zoom in" on version 2 alone (where the improvements are made) when needed.
  • New editors are defined as users who created their account during the last 6 months (the criterion initially used for new editor experiences research).
  • We want to count the number of articles that were successfully published and survived after the given time period, compared with those that were deleted (more in the Representation section below).
  • Time period: We want to group the results into 30-day periods, analysing the translations that were published in a given month. We may need to add a delay to give communities enough time to review them.
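One way to handle the delay mentioned in the last bullet is to report a month only once every translation published in it has had the full 30 days. A minimal Python sketch, assuming you know the exclusive end date of the data (`data_end`); the function names are hypothetical:

```python
from datetime import date, timedelta

# Survival window from the task description (30 days).
SURVIVAL_WINDOW = timedelta(days=30)


def next_month_start(year: int, month: int) -> date:
    """Return the first day of the month after (year, month)."""
    if month == 12:
        return date(year + 1, 1, 1)
    return date(year, month + 1, 1)


def month_is_reportable(year: int, month: int, data_end: date) -> bool:
    """True if every translation published in (year, month) has had the full
    survival window before data_end, the (assumed) exclusive end of the data."""
    # The latest possible publication is just before the next month starts,
    # so the last survival window closes at next_month_start + 30 days.
    return next_month_start(year, month) + SURVIVAL_WINDOW <= data_end
```

For example, if the data runs through 30 June 2019 (`data_end = date(2019, 7, 1)`), May 2019 is reportable but June 2019 is not.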

Representation

The example below shows the number of articles published each month, distinguishing those that survived 30 days from those that were deleted.

[Figure: newcomer-metrics-sketches 5.png, sketch of the proposed monthly chart]
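The chart only needs, for each month, the number of translations that survived and the number that were deleted. A minimal sketch of that aggregation, assuming the input is a list of (month, status) pairs shaped like the query's month and status columns (a hypothetical input; the real query already returns counts):

```python
from collections import Counter, defaultdict


def monthly_counts(translations):
    """Count translations per month and status.

    translations: iterable of (month, status) pairs, where month is a
    "yyyy-MM" string and status is "survived" or "deleted".
    """
    counts = defaultdict(Counter)
    for month, status in translations:
        counts[month][status] += 1
    # "yyyy-MM" strings sort chronologically; a missing status counts as 0.
    return {
        month: {"survived": counts[month]["survived"], "deleted": counts[month]["deleted"]}
        for month in sorted(counts)
    }
```

Each month's entry maps directly onto one stacked bar: the survived and deleted segments.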

Event Timeline

Pginer-WMF triaged this task as Medium priority (May 14 2018, 5:31 PM).
Pginer-WMF moved this task from Needs Triage to CX2 on the ContentTranslation board.
Pginer-WMF raised the priority of this task from Medium to High (Jun 25 2018, 10:13 AM).
nshahquinn-wmf lowered the priority of this task from High to Medium (Sep 7 2018, 6:52 PM).

Hi @Neil_P._Quinn_WMF, this task description says that we want to measure the number of articles translated by (new) editors that aren’t deleted in 30 days. In the "Surviving translations by newcomers" section of your notebook (T199342#5290129), you're using the revision_is_deleted field to identify whether a revision is deleted. I'm wondering whether this needs to be changed to meet the 30-day survival requirement. I'm also wondering whether the revision_is_deleted field would be updated if the page is deleted later.

Yeah, you are correct that I didn't do it exactly to spec! I think the difference is likely to be small, since a translation is likely to be deleted quickly if at all: the main reason would be simple (too much unmodified machine translation) rather than complex (notability). But it would definitely be more correct to fix that. There is the revision_deleted_by_page_deletion_timestamp field, which should make that relatively easy.

Also, revision_is_deleted has now changed meaning as part of the big restructuring of how mediawiki_history handles deleted content: you'll want either page_is_deleted or revision_is_deleted_by_page_deletion. The two are close to the same, except that revision_is_deleted_by_page_deletion covers edge cases where the page was deleted and some revisions were undeleted, which are not likely to come up in this case. And, yes, if a revision is deleted by page deletion after a given snapshot, the field will be updated in future snapshots.

Hope that helps!
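Following the suggestion above, the 30-day rule amounts to comparing the deletion timestamp with the publication time. A sketch in Python, where `deleted_at` stands in for revision_deleted_by_page_deletion_timestamp (None when the page was not deleted) and `data_end` is an assumed cutoff for the snapshot; the function name is hypothetical. Translations whose window hasn't closed yet are left unclassified rather than counted as survived:

```python
from datetime import datetime, timedelta
from typing import Optional

SURVIVAL_WINDOW = timedelta(days=30)


def survival_status(
    published: datetime, deleted_at: Optional[datetime], data_end: datetime
) -> Optional[str]:
    """Return "deleted" if the page was deleted within 30 days of publication,
    "survived" if it lasted the full window, or None if it is too recent."""
    window_end = published + SURVIVAL_WINDOW
    if deleted_at is not None and deleted_at <= window_end:
        return "deleted"
    if window_end > data_end:
        # Window still open at the end of the data: cannot judge yet.
        return None
    # Never deleted, or deleted only after surviving 30 days.
    return "survived"
```

A page deleted after its first 30 days still counts as survived, which is the difference from using revision_is_deleted_by_page_deletion alone.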

Thanks so much @Neil_P._Quinn_WMF ! Please let me know if you see any issues in the query below.
(I had some problems connecting to SWAP, so I ran these in separate Python scripts. I'll see if I can solve the problem and update the notebook later.)

Results:

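[Figure: translation_counts_new_all_version.png, translations by new editors, all versions]

[Figure: translation_counts_new_version2.png, translations by new editors, version 2]

[Figure: translation_counts_exp_all_version.png, translations by experienced editors, all versions]

[Figure: translation_counts_exp_version2.png, translations by experienced editors, version 2]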

Query:

select
    -- "yyyy" is the calendar year; "YYYY" is the week-based year in Java date
    -- patterns and can put dates around New Year in the wrong year
    date_format(event_timestamp, "yyyy-MM") as month,
    if(array_contains(revision_tags, "contenttranslation-v2"), 2, 1) as cx_version,
    -- new editor: account at most 6 months old; 6 months ≈ 26 weeks = 182 days.
    -- Users with no matching account-creation event count as experienced.
    if(
        coalesce(datediff(event_timestamp, ssac.dt) > 182, true),
        "experienced",
        "new"
    ) as user_experience,
    -- counts any page deletion up to the snapshot, not only deletions within 30 days
    if(revision_is_deleted_by_page_deletion, "deleted", "survived") as status,
    count(*) as n_translations
from wmf.mediawiki_history mh
left join event_sanitized.serversideaccountcreation ssac
on
    ssac.event.username = mh.event_user_text and
    -- matches every partition, so the query passes Hive's strict-mode partition check
    ssac.year >= 0
where
    mh.snapshot = "2019-06" and
    mh.event_timestamp >= "2018-07" and
    event_entity = "revision" and
    event_type = "create" and
    array_contains(revision_tags, "contenttranslation")
group by
    date_format(event_timestamp, "yyyy-MM"),
    if(array_contains(revision_tags, "contenttranslation-v2"), 2, 1),
    if(
        coalesce(datediff(event_timestamp, ssac.dt) > 182, true),
        "experienced",
        "new"
    ),
    if(revision_is_deleted_by_page_deletion, "deleted", "survived")
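The user_experience expression in the query can be read as the following Python sketch, assuming the task's definition of a new editor (account at most 6 months old, taken here as 26 weeks × 7 = 182 days). The `None` branch corresponds to the coalesce(..., true) default for users with no matching account-creation event (for example, accounts created before the event data begins); the function name is hypothetical:

```python
from datetime import date
from typing import Optional

# Assumed threshold: 6 months ≈ 26 weeks = 182 days.
NEW_EDITOR_DAYS = 182


def user_experience(edit_date: date, account_created: Optional[date]) -> str:
    """Classify the editor the way the query's coalesce/datediff expression does."""
    if account_created is None:
        # No account-creation event found: coalesce(..., true) -> experienced.
        return "experienced"
    # Like Hive's datediff(end, start): whole days between the two dates.
    age_days = (edit_date - account_created).days
    return "experienced" if age_days > NEW_EDITOR_DAYS else "new"
```

Note that an account exactly 182 days old still counts as new, because the comparison is a strict `>`.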

The query looks right!

Looking at the results, it seems odd that 2019-06 sees a big jump in translations by experienced users and a big drop in translations by new users. That makes me wonder whether there's some sort of classification issue (maybe that's why you asked). But I don't see any issues with the query, and the numbers don't line up: there are ~7000 more translations by experienced users but only ~1000 fewer by new users, so a simple shift from one group to the other can't explain the whole change.
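The reasoning in the last sentence can be made explicit: reclassifying users from new to experienced moves translations between the two groups without changing their total, so it can account for at most the smaller of the experienced gain and the new-editor loss. A small sketch using the approximate figures above (the function name is hypothetical, and only the new-to-experienced direction is modelled):

```python
def split_change(delta_experienced: int, delta_new: int) -> tuple[int, int]:
    """Return (largest change explainable by new -> experienced reclassification,
    net change in total translations that reclassification cannot explain)."""
    gain = max(delta_experienced, 0)
    loss = max(-delta_new, 0)
    # Reclassification moves translations one-for-one between the groups.
    explainable = min(gain, loss)
    # Reclassification conserves the total, so any net change needs another cause.
    net_change = delta_experienced + delta_new
    return explainable, net_change
```

With a gain of about 7000 and a loss of about 1000, `split_change(7000, -1000)` gives `(1000, 6000)`: at most about 1000 could come from reclassification, leaving a net increase of about 6000.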

Thanks @Neil_P._Quinn_WMF for the review!
I've pushed the code and results to GitHub: https://github.com/wikimedia-research/2018-19-Language-annual-plan-metrics.