
Add more popular articles per country data to AQS
Closed, Resolved, Public

Description

Develop API endpoint with most popular articles per country.

  • Let's write a small design document for the API to make sure we have a sensible endpoint.
  • A top 50 per wiki is probably sufficient.
  • Should only include agent_type="user" (both "spider" and "bot" should be excluded)
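
To illustrate the requirements above, here is a minimal Python sketch of the selection logic: drop every row whose agent_type is not "user", sum views per article, and keep the top N per group. It is not the production job. The row layout and the field names `country`, `article`, and `views` are hypothetical; only `agent_type` and its values come from this task. The sketch groups by country, following the task title, and defaults to a limit of 50.

```python
from collections import defaultdict


def top_articles_per_country(rows, limit=50):
    """Return {country: [(article, views), ...]} with at most `limit` entries each.

    `rows` is an iterable of dicts with the hypothetical keys
    "country", "article", "agent_type" and "views".
    """
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        # Keep human traffic only: "spider" and "bot" rows are excluded.
        if row["agent_type"] != "user":
            continue
        totals[row["country"]][row["article"]] += row["views"]

    result = {}
    for country, articles in totals.items():
        # Sort by views descending, then by title so ties are deterministic.
        ranked = sorted(articles.items(), key=lambda kv: (-kv[1], kv[0]))
        result[country] = ranked[:limit]
    return result


# Made-up sample data, for illustration only.
sample_rows = [
    {"country": "FR", "article": "Paris", "agent_type": "user", "views": 10},
    {"country": "FR", "article": "Paris", "agent_type": "spider", "views": 500},
    {"country": "FR", "article": "Lyon", "agent_type": "user", "views": 7},
    {"country": "FR", "article": "Nice", "agent_type": "user", "views": 3},
    {"country": "JP", "article": "Tokyo", "agent_type": "bot", "views": 90},
    {"country": "JP", "article": "Tokyo", "agent_type": "user", "views": 4},
]

top = top_articles_per_country(sample_rows, limit=2)
```

With the sample data, the spider and bot rows contribute nothing, so "Paris" ranks on its 10 user views rather than 510.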

Event Timeline

Hi @Nuria,
This is Megha Jain. I am a newbie to Wikimedia, but would love to contribute to the open source community.
I am proficient with SQL, Python, Perl, and data analytics, and have developed basic web apps in JS and PHP. Currently I am a Data Science student, exploring new things.
If this seems like a good first issue for a newbie to understand Wiki datasets, could I look into this? I will be grateful for any guidance or pointers :)

@Meghajain171192 thanks for your interest. This ticket requires access to private data and our computation environment, which we cannot give to volunteers. However, we have an all-JS project that you might be interested in contributing to; see for example: https://phabricator.wikimedia.org/T263973 and https://wikitech.wikimedia.org/wiki/Analytics/Systems/Wikistats_2

Thanks a ton for your reply @Nuria :) I will look into "Wikistats Bug - easy to understand language for pageviews" and will also work on getting access to Gerrit!

@lexnasser As a reminder, here is the design document for the prior AQS endpoint: https://drive.google.com/drive/u/0/folders/1bcy6Iyb_bLwD1jcfjhL4vtKZvD-CN22L

Let's start on this task in the same way, by building a design document that specifies the query semantics for the API. The parent task can be used to get input from stakeholders if needed.

fdans moved this task from Incoming to Analytics Query Service on the Analytics board.

Change 654924 had a related patch set uploaded (by Lex Nasser; owner: Lex Nasser):
[analytics/refinery@master] Create and configure Oozie job to load 'Top Articles by Country Pageviews API' data into Cassandra

https://gerrit.wikimedia.org/r/654924

Change 657228 had a related patch set uploaded (by Lex Nasser; owner: Lex Nasser):
[analytics/aqs@master] Create pageviews 'top-per-country' endpoint with tests

https://gerrit.wikimedia.org/r/657228

Change 654924 merged by Joal:
[analytics/refinery@master] Create and configure Oozie job to load data into Cassandra for pageviews 'top-per-country' AQS endpoint

https://gerrit.wikimedia.org/r/654924

Change 668236 had a related patch set uploaded (by Lex Nasser; owner: Lex Nasser):
[analytics/refinery@master] Add double quote when constructing JSON in Hive query and change field names in properties file for top-per-country job

https://gerrit.wikimedia.org/r/668236

Change 668236 merged by Joal:
[analytics/refinery@master] Fix and optimize Hive query and change field names in properties file for top-per-country job

https://gerrit.wikimedia.org/r/668236

Change 657228 merged by jenkins-bot:
[analytics/aqs@master] Create pageviews 'top-per-country' endpoint with tests

https://gerrit.wikimedia.org/r/657228

Mentioned in SAL (#wikimedia-operations) [2021-03-17T13:27:40Z] <otto@deploy1002> Started deploy [analytics/aqs/deploy@3e92346]: deploy aqs as part of train - T207171, T263697

Mentioned in SAL (#wikimedia-operations) [2021-03-17T13:31:04Z] <otto@deploy1002> Finished deploy [analytics/aqs/deploy@3e92346]: deploy aqs as part of train - T207171, T263697 (duration: 03m 24s)

Per the parent task, the pageviews/top-per-country endpoint is now public! Take a look at that parent task for relevant info.