Email sent on July 11, 2016 to gather thoughts:
Within the Discovery Portal team, we have a large and potentially difficult goal: automatically updating the statistics (number of articles and pageviews for each language) on the www.wikipedia.org portal page.
This email summarizes what we've brainstormed so far. We'd like your feedback, and your participation if you have time and want to help out. If you know someone who isn't on this email and might want to help, please forward it to them.
Background on the existing manual process:
The wikipedia.org portal is a static HTML page that is compiled during development using several build scripts. In the current process for updating these stats, a developer working on their local machine carries out the following steps about every two weeks:
- a script to pull down new pageview/article-count stats from various API endpoints
- a script to merge those stats with the text translations and feed the combined data into Handlebars templates
- a script to compile the templates to create the final HTML page
- git commit and merge to get the new HTML page (and updated support files) into the repo
- then, a deployment script is run during SWAT to push it all to production
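To make the first three steps concrete, here is a minimal Python sketch of fetching an article count and rendering it into a page fragment. It is an illustration under stated assumptions, not the portal's actual build code: it uses the MediaWiki API's `meta=siteinfo&siprop=statistics` query as one possible source for article counts, it swaps the Handlebars templates for Python's `string.Template`, and the function names, the `ROW` template, and the User-Agent string are hypothetical.

```python
import json
import urllib.request
from string import Template
from typing import Dict
from urllib.parse import urlencode


def siteinfo_url(lang: str) -> str:
    """Build a MediaWiki API URL that asks a wiki for its site statistics."""
    params = {"action": "query", "meta": "siteinfo", "siprop": "statistics", "format": "json"}
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def parse_article_count(raw_json: str) -> int:
    """Extract the article count from a siteinfo statistics response."""
    return int(json.loads(raw_json)["query"]["statistics"]["articles"])


def fetch_article_count(lang: str) -> int:
    """Step 1 (network): download and parse the stats for one language.

    The User-Agent value is a hypothetical placeholder; Wikimedia asks
    API clients to identify themselves with a descriptive one.
    """
    req = urllib.request.Request(siteinfo_url(lang),
                                 headers={"User-Agent": "portal-stats-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return parse_article_count(resp.read().decode("utf-8"))


# Hypothetical stand-in for the real Handlebars templates.
ROW = Template('<li lang="$lang">$count articles</li>')


def render_rows(counts: Dict[str, int]) -> str:
    """Steps 2-3: merge the stats into the template, largest wiki first."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return "\n".join(ROW.substitute(lang=lang, count=f"{count:,}") for lang, count in ordered)
```

Keeping parsing and rendering separate from the network call means the build logic can be tested against a saved API response, e.g. `render_rows({"en": parse_article_count(saved_json), "de": 2000000})`, without contacting any wiki.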
Brainstormed ideas for automating this process:
- could the portal page use dynamic code to pull stats directly from a static data page that volunteers can freely edit?
  - maybe PHP, JavaScript, or Lua
  - would need caching set up so that the code isn't executed on every request
- could the portal page make a REST call to a service that returns the stats?
  - the service could parse a volunteer-edited static data page
- could a background server process:
  1) detect changes to the git repo, or to static stats data files/pages,
  2) compile the template with the new data, and
  3) deploy the resulting static HTML?
  - ideally we could automate all three parts (detect/compile/deploy), but even automating just step 1 or 2 would be helpful
- could the deployment script merge the page code (from git) with the static stats data?
- if all coding happened on a separate (non-master) branch, could an automated script push the stats into the master branch, and could the master branch then be deployed automatically by a cron job?
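For the idea of having the portal page pull stats dynamically, the caching requirement could look roughly like the time-based (TTL) cache below. This is a minimal Python sketch under assumptions: `TTLCache` is a hypothetical helper, the `loader` stands in for whatever would read the volunteer-edited data page, and a production setup would more likely rely on existing HTTP or object caching than on an in-process class.

```python
import time
from typing import Callable, Dict, Optional


class TTLCache:
    """Cache the result of an expensive loader for ttl_seconds.

    The loader (hypothetically, code that reads a volunteer-edited
    stats page) only runs on the first request or after expiry,
    rather than on every page request.
    """

    def __init__(self, loader: Callable[[], Dict[str, int]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._value: Optional[Dict[str, int]] = None
        self._expires_at = 0.0

    def get(self) -> Dict[str, int]:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            # Cache miss or expired entry: reload and restart the timer.
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value
```

With a TTL of, say, one hour, the stats source is read at most once per hour per process, no matter how many page requests arrive in between.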
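For the background-server idea, step 1 (detect changes) could be as simple as comparing a fingerprint of the current stats with the one from the previous run, and only compiling and deploying when it differs. The Python sketch below assumes the stats are already loaded as a language-to-count mapping; `stats_fingerprint`, `rebuild_if_changed`, and the `compile_page`/`deploy` callbacks are hypothetical names standing in for the existing build and deployment scripts.

```python
import hashlib
import json
from typing import Callable, Dict, Optional


def stats_fingerprint(stats: Dict[str, int]) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(stats, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def rebuild_if_changed(stats: Dict[str, int],
                       last_fingerprint: Optional[str],
                       compile_page: Callable[[Dict[str, int]], str],
                       deploy: Callable[[str], None]) -> str:
    """Step 1: detect a change; steps 2-3 run only when one is found.

    Returns the new fingerprint, which the caller stores for the next run
    (for example, between runs of a cron job).
    """
    fingerprint = stats_fingerprint(stats)
    if fingerprint != last_fingerprint:
        html = compile_page(stats)  # step 2: compile the template with new data
        deploy(html)                # step 3: push the resulting static HTML
    return fingerprint
```

Because unchanged data produces the same fingerprint, a frequent cron schedule would not cause redundant deploys; only real stats changes trigger steps 2 and 3.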
Note: while it would be nice to automagically merge/deploy new text translations, that is not the focus for this particular email thread.
We'd love to hear your thoughts!