Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug}
Closed, Resolved · Public

Assigned To

Authored By

	Milimetric
	Jul 27 2015, 5:45 PM

Description

Background

Some members of the community (most notably Magnus Manske and GLAM folks) have been asking us for a pageview API for what I understand is about 11 years at this point.

WMF has been releasing page view data since 2011, aggregated hourly by page title and zipped: http://dumps.wikimedia.org/other/pagecounts-ez/merged/. Some volunteers take this data and serve it up in a pageview API; the most prominent of these is http://stats.grok.se/. A lot of people rely on this service, but it can be unreliable at times.

Recently, we've been getting more and more internal requests for a pageview API. Different readership teams want to analyze different types of pageviews. Some people are talking about serving per-article pageviews as part of the front-end interface. These internal requests are not currently addressed by our solution, but we have them in the back of our minds.

Proposed Solution

The main RESTBase instance proxies to our RESTBase cluster, to a new "pageviews" module (done, but waiting on finalized plans before submitting a pull request)
Three servers will be needed to run Cassandra and RESTBase (added hardware-requests task as a blocker)
This ticket will serve as the coordinating ticket and the one linked to the puppetization change in gerrit
Hadoop pushes data into Cassandra (this code is done and working, just need to open the necessary ports once we stand up the servers)

Details

	Subject	Repo	Branch	Lines +/-
	aqs: Allow CQL access from analytics	operations/puppet	production	+4 -1
	Add Analytics Query Service role	operations/puppet	production	+257 -0

Related Objects

Status	Subtype	Assigned	Task
Resolved		• bd808	T131288 Make Cloud Services shared HTTP proxies enforce TLS
Resolved		• bd808	T102367 Migrate tools.wmflabs.org to https only (and set HSTS)
Resolved		Magnus	T102457 Make Magnus tools on tools.wmflabs.org work in HTTPS
Open	Feature	None	T40450 Reimplement MediaWiki's info action (tracking)
Declined		None	T43326 Incorporate analytics into MediaWiki's info action
Resolved		Tgr	T43327 Add page views graph(s) to MediaWiki's info action for Wikimedia wikis
Resolved		• Lea_WMDE	T143664 Improve page view statistics (#15)
Resolved		None	T120497 Pageview Stats tool
Open	Feature	None	T56184 Fix TreeViews to provide pageviews statistics for all articles of any wikiproject etc.
Resolved		JAllemandou	T118931 Wikimedia "top" pageviews API has problematic double-encoded JSON [8 pts] {melc}
Duplicate		• mobrovac	T125345 Many error 500 from pageviews API "Error in Cassandra table storage backend"
Resolved		Milimetric	T44259 Make domas' pageviews data available in semi-publicly queryable database format
Resolved		Milimetric	T107056 Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug}
Resolved		Ottomata	T111053 Request three servers for Pageview API
Resolved		• mobrovac	T114742 restbase is not listening on port 7231 on aqs*

Event Timeline

Milimetric created this task. (Jul 27 2015, 5:45 PM)

Milimetric set the priority of this task to Needs Triage.

Milimetric triaged this task as Medium priority.

Milimetric updated the task description.

Milimetric added a project: Analytics-Backlog.

Milimetric set Security to None.

Milimetric moved this task from Incoming to Medium on the Analytics-Backlog board.

Milimetric moved this task from Medium to Tasked on the Analytics-Backlog board.

Milimetric added subscribers: Milimetric, Aklapper.

Milimetric claimed this task. (Aug 3 2015, 3:45 PM)

Milimetric removed a project: Analytics-Backlog.

Milimetric added a project: Analytics-Kanban.

Milimetric moved this task from Next Up to In Progress on the Analytics-Kanban board. (Aug 10 2015, 9:53 PM)

Milimetric moved this task from In Progress to Paused on the Analytics-Kanban board. (Aug 11 2015, 3:32 PM)

• ggellerman moved this task from Paused to In Progress on the Analytics-Kanban board. (Aug 12 2015, 3:47 PM)

Change 231574 had a related patch set uploaded (by Milimetric):
[WIP] Add an Analytics specific instance of RESTBase

https://gerrit.wikimedia.org/r/231574

gerritbot added a project: Patch-For-Review. (Aug 14 2015, 3:27 PM)

Milimetric moved this task from In Progress to In Code Review on the Analytics-Kanban board. (Aug 14 2015, 7:15 PM)

• ggellerman moved this task from In Code Review to Paused on the Analytics-Kanban board. (Aug 26 2015, 3:33 PM)

Milimetric added a subtask: T111053: Request three servers for Pageview API. (Sep 1 2015, 2:41 PM)

Milimetric updated the task description. (Sep 1 2015, 2:47 PM)

Milimetric added a parent task: T44259: Make domas' pageviews data available in semi-publicly queryable database format. (Sep 1 2015, 2:50 PM)

• dcausse subscribed. (Sep 1 2015, 3:09 PM)

Milimetric moved this task from Paused to In Progress on the Analytics-Kanban board. (Sep 1 2015, 3:39 PM)

• ggellerman moved this task from In Progress to Paused on the Analytics-Kanban board. (Sep 2 2015, 3:35 PM)

Milimetric moved this task from Paused to In Progress on the Analytics-Kanban board. (Sep 8 2015, 3:38 PM)

Milimetric mentioned this in T110147: Add page view statistics to page information pages (action=info) [AOI]. (Sep 9 2015, 9:08 PM)

• kevinator moved this task from In Progress to In Code Review on the Analytics-Kanban board. (Sep 18 2015, 3:44 PM)

• kevinator closed subtask T111053: Request three servers for Pageview API as Resolved. (Sep 21 2015, 3:40 PM)

Any update on this?

Milimetric moved this task from In Code Review to Ready to Deploy on the Analytics-Kanban board. (Oct 1 2015, 3:51 PM)

Hey DevOps Guys,
As part of this task, we need the Cassandra cluster to be accessible from the Hadoop cluster so we can load the data.
We would access Cassandra using the CQL native protocol on port 9042.
Thanks!

@akosiaris ^

Change 231574 merged by Ottomata:
Add Analytics Query Service role

https://gerrit.wikimedia.org/r/231574

Ottomata mentioned this in rOPUP717e9d5d5b29: Add Analytics Query Service role. (Oct 2 2015, 6:48 PM)

akosiaris added a project: netops. (Oct 5 2015, 11:35 AM)

Restricted Application added a project: acl*sre-team. (Oct 5 2015, 11:35 AM)

In T107056#1697401, @JAllemandou wrote:

Hey DevOps Guys,
As part of this task, we need the Cassandra cluster to be accessible from the Hadoop cluster so we can load the data.
We would access Cassandra using the CQL native protocol on port 9042.
Thanks!

The network part of the configuration is done. TCP port 9042 on the aqs cluster is accessible to machines in the analytics subnet. However, the ferm firewall on the hosts still needs to be configured.
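
For anyone who wants to confirm reachability from a machine in the analytics subnet once the firewall rules are in place, a minimal sketch in Python follows. It is an illustration rather than anything used in this task: the function name `port_is_open` is made up here, and it only checks that a TCP connection to the port can be opened. It does not speak the CQL protocol or validate credentials.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for illustration; it only tests the TCP handshake
    and says nothing about whether Cassandra accepts CQL queries.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (e.g. a firewall silently
        # dropping packets), and name-resolution failures.
        return False
```

A call such as `port_is_open("aqs-host.example", 9042)` (the hostname is a placeholder) returning False after the network change would be consistent with the host firewall still blocking the port, although a stopped Cassandra service would produce the same result.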

Restricted Application added a subscriber: Matanya. (Oct 5 2015, 11:37 AM)

Change 243635 had a related patch set uploaded (by Alexandros Kosiaris):
aqs: Allow CQL access from analytics

https://gerrit.wikimedia.org/r/243635

akosiaris added a subtask: T114742: restbase is not listening on port 7231 on aqs*. (Oct 6 2015, 9:15 AM)

Change 243635 merged by Alexandros Kosiaris:
aqs: Allow CQL access from analytics

https://gerrit.wikimedia.org/r/243635

akosiaris mentioned this in rOPUP22ead0c75c0e: aqs: Allow CQL access from analytics. (Oct 6 2015, 9:16 AM)

• mobrovac closed subtask T114742: restbase is not listening on port 7231 on aqs* as Resolved. (Oct 7 2015, 2:17 PM)

Milimetric moved this task from Ready to Deploy to Done on the Analytics-Kanban board. (Oct 7 2015, 2:53 PM)

• kevinator closed this task as Resolved. (Oct 9 2015, 4:06 PM)

• kevinator subscribed.

Dzahn reopened subtask T114742: restbase is not listening on port 7231 on aqs* as Open. (Oct 14 2015, 2:27 PM)

• mobrovac closed subtask T114742: restbase is not listening on port 7231 on aqs* as Resolved. (Oct 15 2015, 2:06 PM)

Nemo_bis added a project: Datasets-General-or-Unknown. (Feb 2 2016, 7:41 AM)

Nemo_bis mentioned this in T44259: Make domas' pageviews data available in semi-publicly queryable database format. (Feb 2 2016, 7:43 AM)

ArielGlenn moved this task from Backlog to Done on the Datasets-General-or-Unknown board. (Mar 16 2016, 5:16 PM)
