Document how to pull affiliate data for the campaigns product extension
Closed, ResolvedPublic

Description

Is there a way to pull Affiliate data through an API from the campaigns product extension?

Use case: As part of the event registration creation process (see the campaigns product team page), we are asking campaign organizers whether the event page registration that they are setting up is associated with a chapter, organizing partner, user group, affiliate, or other organization outside Wikimedia (T322332). Currently this is a yes/no question with an open text field that unfurls if the organizer chooses yes. Open text answers pose a few challenges.
Ideally we could pull affiliate names and give organizers a way to select which affiliates are connected to an event.
Note: There is a precedent for events including this information on their related event pages as images or free text; see the bottom of Projeto Mais Teoria da História na Wiki/Mais Pretas and Alto!_Mujeres_trabajando.
We'd like to allow organizers to log this information in the campaigns product extension's registration tool. We'd also like to capture this information in a way that assures data quality and improves the tool and the organizer experience, in addition to providing the affiliate information.

For example, this WikiGap RDC event, which was set up on the registration tool, had a few fields that organizers filled out: name, organizers, start/end time, location, event_type (online, in-person, hybrid), and group chat link. We would like organizers to also be able to add related affiliates to the event. This will help us understand the impact of campaigns, grants, and affiliates, and will improve discoverability for events set up using the campaigns extension tooling.

Current ways to pull affiliate information:

  • Through Hive, by querying GDI.affiliate_data_input_metrics. The campaigns extension, and MediaWiki extensions generally, cannot access data behind Kerberos, so for now this option is off the table unless the GDI data is not behind Kerberos.
  • Through Quarry. I see existing queries on affiliates and user groups: https://quarry.wmcloud.org/query/runs/all?search_term=affiliate

Are there current wmf_powered tools that access Quarry data through an API? Connecting with someone who uses that data for a tool would be helpful. Are there issues to keep in mind with this Quarry endpoint data access method?

I'm also seeing from the Quarry main page that work is going into replacing Quarry with a Superset instance. When might the Quarry-to-Superset transition happen? In that scenario, might affiliate and user group data be available to a public Superset instance? (We're interested in the names of the broad groups, such as Wikimedia Brazil or Egypt_Wikimedians, NOT user x in either of those groups.) With Superset we can access a virtual dataset using a /dataset/export endpoint.
Are there issues to keep in mind with this Superset endpoint data access method?

See the related conversation on Slack's Working with Data channel

Event Timeline

Iflorez triaged this task as Medium priority. May 12 2023, 7:42 PM

@cmelo the closest thing to an API output is the JSON query results from Quarry.
Quarry is a web SQL client for running queries against the Wiki Replicas (https://wikitech.wikimedia.org/wiki/Wiki_Replicas), which are sanitized clones of the production MediaWiki metadata tables.

Does Quarry JSON output suffice?
https://quarry.wmcloud.org/run/1534/output/0/json

I've reached out to Jaime Anstee, Derick Alangi, and @AliceChina about querying for affiliate names.

Thanks @Iflorez. I am working on some other PII tasks and haven't yet had time to look at how we could get the list of affiliates, but I saw the query you shared. It returned 24791 rows; is this the correct number of affiliates?

I also checked the one you mention in the task description, but I think the JSON result does not contain the data we need:

{"meta": {"run_id": 1534, "rev_id": 1534, "query_id": 541}, "headers": ["COUNT(*)"], "rows": [[88]]}

At the moment I am trying to prepare all the PII tasks on Phab because I will be OOO from May 24 to May 31, but when I am back I will take another look at this.

@cmelo It sounds like you could use the JSON output from Quarry's quasi-API setup (the JSON output link). In that case, a functional Quarry query could work. Unfortunately, I cannot recommend an easy-to-serve solution here as I don't have a working query yet.

If I hear back from @AliceChina I will update here (she's been working to automate this affiliate list on office from the Meta Lua table data).

@JAnstee_WMF shares: On Meta, the Lua tables update the m:Reports tables for affiliates, which I imagine you could query through Quarry given the structure of the markup, but you'd need to avoid the tables for non-affiliate reporters.

Derick Alangi worked previously on the Affiliates Data Portal. He doesn't have a recommendation for tables to query.

Liam Wyatt notes: The list of officially recognized affiliates is always changing, because the accreditation and disaccreditation processes for user groups are deliberately meant to be lightweight.

@DAlangi_WMF notes:
Humans can query for the up-to-date names of affiliates via the "Advance mode" of the search form here: https://meta.wikimedia.org/wiki/Wikimedia_Affiliates_Data_Portal.
(1) Visit that URL (https://meta.wikimedia.org/wiki/Wikimedia_Affiliates_Data_Portal).
(2) Hit the search affiliate data button.
(3) Toggle to the “Advance mode”.
(4) Fill in the fields (steps 1 - 5) from the top down with the values below. For dates, you can just enter 2000 to today (present) and you should get an updated list.

For code: The data is already in the WADP in Lua tables. In the context of on-wiki gadgets, you can use the MW API to fetch those tables, parse the result into an abstract syntax tree (AST) with a library such as luaparse, then walk the tree to extract the data you need.
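
A rough sketch of that flow in Python, under several assumptions: the module title `Module:Example_affiliate_table` and the field name `group_name` are hypothetical placeholders (the actual WADP module and field names are not given in this task), and a simple regular expression stands in for the luaparse AST step, so it only handles plain double-quoted string fields. The MediaWiki Action API parameters (`action=query`, `prop=revisions`, `rvprop=content`, `rvslots=main`, `formatversion=2`) are the standard ones for fetching page content.

```python
import json
import re
from urllib.parse import urlencode

API_URL = "https://meta.wikimedia.org/w/api.php"  # Meta-Wiki Action API endpoint


def build_content_url(title: str) -> str:
    """Build an Action API URL that returns the latest wikitext/Lua of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return f"{API_URL}?{urlencode(params)}"


def extract_content(response_text: str) -> str:
    """Pull the page source out of a formatversion=2 revisions response."""
    page = json.loads(response_text)["query"]["pages"][0]
    return page["revisions"][0]["slots"]["main"]["content"]


def extract_field(lua_source: str, field: str) -> list[str]:
    """Regex stand-in for AST parsing: collect `field = "value"` string values."""
    pattern = re.compile(rf'\b{re.escape(field)}\s*=\s*"([^"]*)"')
    return pattern.findall(lua_source)


# Hypothetical module title; replace with the real WADP Lua table page.
url = build_content_url("Module:Example_affiliate_table")

# Made-up response mimicking the API shape, so the sketch runs without network access.
fake_lua = 'return { { group_name = "Example User Group" }, { group_name = "Another Example" } }'
fake_response = json.dumps(
    {"query": {"pages": [{"revisions": [{"slots": {"main": {"content": fake_lua}}}]}]}}
)

names = extract_field(extract_content(fake_response), "group_name")
print(names)  # ['Example User Group', 'Another Example']
```

In a real run, the text fetched from `url` would replace `fake_response`. A proper Lua parser would be more robust than the regex if the tables contain escaped quotes or nested structures.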

Iflorez updated the task description.

@DNdubane_WMF confirms using the MediaWiki API to pull the data of the Affiliates Data Portal on Meta.

If the MW API approach does not work for this use case, we'd look at potential CRM-driven solutions. FYI: The CRM is not yet deployed but is in planning. @Sadads notes that the CRM will have an "Organizations" group that will include all the affiliates. @Qgil is best placed to talk through the challenges of pulling data from the CRM.

Quick update: there's no API envisioned for the CRM.