Page MenuHomePhabricator

[add a link] Create a list of new accounts that used Add a link
Closed, ResolvedPublicFeb 22 2022

Description

As part of the feedback loop we have regarding "Add a link" feature, we will ask a few questions to newcomers who used this features at our pilot Wikipedias (ar, bn, cs, es and also fr).

We will target accounts who respond to all these 3 definitions:

  • Only people who have created their account in the past 6 months
  • Only people who have been active in the last two weeks
  • Of those people top ten who have done the most “add a link” edits

This task is to coordinate the creation of the lists we need.

Details

Due Date
Feb 22 2022, 5:00 PM

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Urbanecm_WMF changed the subtype of this task from "Task" to "Deadline".
Urbanecm_WMF set Due Date to Feb 22 2022, 5:00 PM.

I created a spreadsheet that has the data requested. The data list was generated through the following Spark query, the rest is Google Spreadsheets magic.

Spark query
WITH users_active_14_days_ago AS (
    SELECT
        rc.database,
        rc.performer.user_text,
        max(rc.meta.dt) AS latest_edit
    FROM event.mediawiki_revision_create AS rc
    WHERE
        -- Feb 2022 is fine, because this is executed on 2022-02-17, so all edits made at most 14 days ago were made in February
            rc.year = 2022
        AND rc.month = 2

        AND rc.database IN ('arwiki', 'bnwiki', 'cswiki', 'eswiki', 'frwiki')

        -- only users registered in last 6 months
        AND rc.performer.user_registration_dt >= current_timestamp - interval '6' month

    GROUP BY rc.database, rc.performer.user_text

    -- active in the last 14 days
    HAVING max(rc.meta.dt) >= current_timestamp - interval '14' day
)

SELECT
    rtc.database,
    meta.dt,
    rtc.rev_id,
    rtc.performer.user_text,
    rtc.performer.user_registration_dt,
    performer.user_edit_count

FROM event_sanitized.mediawiki_revision_tags_change AS rtc

-- active in the last 14 days
JOIN users_active_14_days_ago AS ar ON ((ar.database = rtc.database) AND (ar.user_text = rtc.performer.user_text))

WHERE
    -- we're only interested in edits made by users created less than 6 months ago, ie. the first edit can be in August 2021
    (
        rtc.year = 2022 OR
        (
                rtc.year = 2021
            AND rtc.month >= 8
        )
    )
    
    -- made in one of the required wikis
    AND rtc.database IN ('arwiki', 'bnwiki', 'cswiki', 'eswiki', 'frwiki')
    -- only add a link edits
    AND array_contains(rtc.tags, 'newcomer task add link')
    -- user registered in last 6 months
    AND rtc.performer.user_registration_dt >= current_timestamp - interval '6' month

ORDER BY rtc.database, rtc.meta.dt

@Urbanecm_WMF, is it possible to:

  • only have each name once and
  • filter out blocked users?

Thank you!

@Urbanecm_WMF, is it possible to:

  • only have each name once and

They should be listed only once. already You need to look at the "users to contact" sheet (where they're listed together with their number of add a link edits).

  • filter out blocked users?

Good point. I'll do that later today and update the spreadsheet.

Thank you!

[...]

  • filter out blocked users?

Good point. I'll do that later today and update the spreadsheet.

Done -- blocked users should now be filtered out in the spreadsheet. For posterity, source code is here.

Thank you Martin! I checked on the "user to contact" tab (that I missed), where I managed to create a list of users. :)

Great Trizek! Closing this as Resolved, as it appears we now have what was specified in the task's description. Please feel free to reopen if there are any other changes I should make!