
Automate daily publication of trending article lists for experiment countries
Closed, Resolved (Public)

Description

The lists will be published on wiki pages like https://www.mediawiki.org/wiki/Wikipedia_for_KaiOS/engagement1/trending/en/ng/2021-01-01 (this page has a sample of the desired format). The first five entries will be shown in order, but I plan to provide ten entries per list just in case. We will need to protect the pages and set them to use the JSON content model, which will require administrator rights on MediaWiki.org.
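For illustration, here is a minimal sketch of how a page title and a ten-entry JSON body could be built. It is not the code from the repository. It assumes the URL segments after `trending/` are the wiki language, the country code, and the date, and it uses a plain JSON array of titles as a placeholder, since the real layout is whatever the sample page shows. The names `page_title` and `page_body` are hypothetical.

```python
import datetime
import json

# Prefix taken from the sample URL in the description.
PAGE_PREFIX = "Wikipedia_for_KaiOS/engagement1/trending"


def page_title(lang, country, day):
    """Build the wiki page title for one list.

    Assumption: the segments are language / country / ISO date,
    as in .../trending/en/ng/2021-01-01.
    """
    return f"{PAGE_PREFIX}/{lang}/{country}/{day.isoformat()}"


def page_body(ranked_titles, limit=10):
    """Serialize the top `limit` titles as JSON text.

    Assumption: a flat array of titles. The real format is the one
    on the sample page and may carry more fields per entry.
    """
    return json.dumps(ranked_titles[:limit], ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(page_title("en", "ng", datetime.date(2021, 1, 1)))
    print(page_body([f"Article {i}" for i in range(1, 13)]))
```

Keeping ten entries in the body while the app shows only the first five leaves spares in case some entries turn out to be unusable.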

In addition to the article names, the lists should contain the lead image and the description from the page summary API (https://en.wikipedia.org/api/rest_v1/page/summary/Cat).
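As a rough sketch of that lookup (again, not the repository's code), the snippet below requests a page summary and keeps only the title, description, and lead image URL. The function names, the output keys, and the User-Agent string are hypothetical. It assumes the lead image is the `thumbnail.source` field of the summary response, which may be absent for pages without an image.

```python
import json
import urllib.parse
import urllib.request

SUMMARY_URL = "https://en.wikipedia.org/api/rest_v1/page/summary/{title}"


def fetch_summary(title):
    """Fetch the page summary for one article as a dict (needs network access)."""
    encoded = urllib.parse.quote(title.replace(" ", "_"), safe="")
    request = urllib.request.Request(
        SUMMARY_URL.format(title=encoded),
        # Placeholder User-Agent; a real job should identify itself properly.
        headers={"User-Agent": "trending-list-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def list_entry(summary):
    """Reduce a summary response to the fields wanted in the list.

    Missing fields become None instead of raising an error.
    """
    thumbnail = summary.get("thumbnail") or {}
    return {
        "title": summary.get("title"),
        "description": summary.get("description"),
        "image": thumbnail.get("source"),
    }


if __name__ == "__main__":
    print(list_entry(fetch_summary("Cat")))
```

Separating the network call from the field extraction makes the extraction easy to check against a saved response.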

Event Timeline

nshahquinn-wmf renamed this task from Automate publication of a daily list of top-read articles in India to Automate daily publication of trending article lists for experiment countries. Mar 9 2021, 11:04 AM
nshahquinn-wmf updated the task description.
nshahquinn-wmf raised the priority of this task from Medium to High. Mar 23 2021, 4:29 PM

Okay, I believe I've finally got everything set up properly.

The code is at https://github.com/wikimedia-research/2021-KaiOS-app-homepage-content-suggestions.

I've set trending_articles.py to run on stat1008 every day at 03:00 UTC to produce the lists based on the previous day's data. It will run as user analytics-product (so it can automatically authenticate with Kerberos). It posts on MediaWiki.org using my account, authenticated as an OAuth 1 owner-only consumer.
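To make the "previous day's data" part concrete, here is a small sketch of how a run at 03:00 UTC could pick its target date. The function name `target_day` is hypothetical and the real script may compute this differently. It assumes the caller passes a timezone-aware datetime.

```python
import datetime


def target_day(now=None):
    """Return the UTC calendar day before `now`.

    `now` must be timezone-aware; it defaults to the current time in UTC.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    utc_now = now.astimezone(datetime.timezone.utc)
    return (utc_now - datetime.timedelta(days=1)).date()


if __name__ == "__main__":
    run_time = datetime.datetime(2021, 3, 24, 3, 0, tzinfo=datetime.timezone.utc)
    print(target_day(run_time))
```

Converting to UTC before subtracting the day keeps the result stable even if the host's local timezone changes.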

Specifically, I changed all the files in the experiment folder to mode 775 and group analytics-product-users. Then, I opened the crontab with sudo -u analytics-product crontab -e and inserted the following:

http_proxy="http://webproxy:8080"
https_proxy="http://webproxy:8080"
no_proxy=127.0.0.1,localhost,.wmnet
0 3 * * * /home/neilpquinn-wmf/2021-KaiOS-app-homepage-content-suggestions/trending_articles.py >> /home/neilpquinn-wmf/2021-KaiOS-app-homepage-content-suggestions/logs.txt 2>&1

The proxy lines are required here: without them, the script cannot reach the internet, and therefore cannot reach the APIs it needs.
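One way to catch a missing proxy setting early is a check at the top of the script, sketched below. This is a hypothetical addition, not something the current script is known to do. It only looks at the `http_proxy` and `https_proxy` variables from the crontab above.

```python
import os
import sys

# The two variables set in the crontab; no_proxy is optional for this check.
REQUIRED_PROXY_VARS = ("http_proxy", "https_proxy")


def missing_proxy_vars(env=None):
    """Return the names of required proxy variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_PROXY_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_proxy_vars()
    if missing:
        # Exit with a clear message in the log rather than a network timeout.
        sys.exit("Missing proxy settings: " + ", ".join(missing))
    print("Proxy settings found.")
```

With a check like this, a misconfigured crontab would show up in logs.txt as a one-line message.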

I will leave this open, since I want to monitor the job over the next couple of days in case unexpected bugs come up.

It looks like the job ran just fine today! I'll check again tomorrow before closing this.

The job has now run correctly for two days in a row, so I think we can now rely on it for the experiment.