
Instrument the Proton service to match mediawiki-services-electron-render
Closed, Resolved · Public · 5 Story Points

Description

ElectronPDF came with a stats dashboard (available here: https://grafana.wikimedia.org/dashboard/db/mediawiki-electronpdfservice?orgId=1)

The new Chromium-PDF service should provide at least the same set of stats:

Acceptance criteria

We would be tracking the following:

  • rejected jobs
  • queued jobs by type (desktop|mobile)
  • size of the queue when new job comes in
  • number of rendered jobs (daily, monthly)
    NOTE: StatsD flushes metrics to Graphite every 10 s (the default value of config.flushInterval). Aggregation of time series is done in Graphite. Grafana is a frontend for Graphite. Just increment the num_rendered_jobs metric whenever a job is rendered and rely on the Grafana/Graphite/StatsD pipeline to do the rest 💪
  • number of failed renderings
  • time each job spends in the queue
  • time each job spends in the rendering state
  • might be helpful: generated pdf size

[Please do not add to this list; once this task is done, additional analytics will be easier to add.]
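The acceptance criteria map onto the basic StatsD metric types: counters (rejected, queued, rendered and failed jobs), a gauge (queue size when a job comes in) and timers (time in the queue, time rendering). Below is a minimal Python sketch, assuming a simple in-process FIFO queue with a maximum size. The `proton.` prefix, the metric names and the `StatsdLines`/`InstrumentedQueue`/`Job` classes are hypothetical and not taken from the chromium-render code; instead of sending UDP packets, the sketch just collects StatsD line-protocol strings so it can run anywhere.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional

PREFIX = "proton"  # hypothetical metric prefix


def to_ms(seconds: float) -> int:
    """Convert a duration in seconds to whole milliseconds."""
    return round(seconds * 1000)


@dataclass
class StatsdLines:
    """Records metrics as StatsD line-protocol strings instead of sending them."""
    lines: List[str] = field(default_factory=list)

    def increment(self, name: str, value: int = 1) -> None:
        self.lines.append(f"{PREFIX}.{name}:{value}|c")  # counter

    def gauge(self, name: str, value: int) -> None:
        self.lines.append(f"{PREFIX}.{name}:{value}|g")  # gauge

    def timing(self, name: str, value: int) -> None:
        self.lines.append(f"{PREFIX}.{name}:{value}|ms")  # timer


@dataclass
class Job:
    job_type: str  # "desktop" or "mobile"
    enqueued_at: float = 0.0


class InstrumentedQueue:
    """Hypothetical FIFO render queue that emits the metrics listed in the task."""

    def __init__(self, stats: StatsdLines, max_size: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.stats = stats
        self.max_size = max_size
        self.clock = clock
        self.jobs: List[Job] = []

    def push(self, job: Job) -> bool:
        # Size of the queue when a new job comes in (recorded even if rejected).
        self.stats.gauge("queue.size", len(self.jobs))
        if len(self.jobs) >= self.max_size:
            self.stats.increment("jobs.rejected")
            return False
        job.enqueued_at = self.clock()
        self.jobs.append(job)
        self.stats.increment(f"jobs.queued.{job.job_type}")  # queued jobs by type
        return True

    def process_next(self, render: Callable[[Job], bytes]) -> Optional[bytes]:
        job = self.jobs.pop(0)
        started = self.clock()
        self.stats.timing("job.queue_time", to_ms(started - job.enqueued_at))
        try:
            pdf = render(job)
        except Exception:
            self.stats.increment("jobs.failed")
            return None
        # In this sketch, render time is only recorded for successful renders.
        self.stats.timing("job.render_time", to_ms(self.clock() - started))
        self.stats.increment("jobs.rendered")
        # Assumption: the PDF size is sent as a timer so StatsD computes
        # min/max/percentiles for it; plain StatsD only has counters, gauges,
        # sets and timers, so there is no dedicated "size" type.
        self.stats.timing("job.pdf_size_bytes", len(pdf))
        return pdf
```

Injecting the clock keeps the timings deterministic in tests, e.g. `InstrumentedQueue(StatsdLines(), max_size=1, clock=iter([0.0, 0.5, 2.0]).__next__)`. Recording the gauge before the admission check means the "size of the queue when a new job comes in" is also captured for jobs that end up rejected.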

Sign off steps

  • Set up dashboard.
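The NOTE in the acceptance criteria leaves the daily and monthly totals for the dashboard to the StatsD/Graphite pipeline. A minimal Python sketch of that roll-up, assuming the default 10 s flush interval: counter increments are summed per flush window (what a StatsD counter does before each flush), and the 10 s buckets are then summed per day (roughly what a Graphite function such as `summarize(<series>, "1d", "sum")` does for a panel). The function names are hypothetical; the point is only to show why incrementing a counter once per rendered job is enough.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

FLUSH_INTERVAL_S = 10  # StatsD's default flushInterval is 10000 ms
SECONDS_PER_DAY = 86_400


def flush_buckets(increments: Iterable[Tuple[float, int]]) -> Dict[int, int]:
    """Sum counter increments per flush window, as a StatsD counter would.

    Each increment is (unix_timestamp_seconds, value); the result maps each
    window's start time to the total flushed for that window.
    """
    buckets: Dict[int, int] = defaultdict(int)
    for timestamp, value in increments:
        window_start = int(timestamp // FLUSH_INTERVAL_S) * FLUSH_INTERVAL_S
        buckets[window_start] += value
    return dict(buckets)


def daily_totals(buckets: Dict[int, int]) -> Dict[int, int]:
    """Sum flushed buckets per UTC day (day number since the Unix epoch)."""
    days: Dict[int, int] = defaultdict(int)
    for window_start, count in buckets.items():
        days[window_start // SECONDS_PER_DAY] += count
    return dict(days)
```

For example, `flush_buckets([(1, 1), (3, 1), (12, 1), (86405, 1)])` returns `{0: 2, 10: 1, 86400: 1}`, and `daily_totals` of that returns `{0: 3, 1: 1}`. When building the panel, it matters whether the stored series is the per-flush count or a per-second rate: summing a per-second rate over 10 s buckets would undercount by a factor of 10.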

Event Timeline

pmiazga created this task. Mar 9 2018, 3:33 PM
Restricted Application added a subscriber: Aklapper. Mar 9 2018, 3:33 PM
phuedx renamed this task from ChromiumPDF to Instrument the Proton service to match mediawiki-services-electron-render. Mar 9 2018, 6:14 PM
ovasileva triaged this task as Normal priority. Mar 11 2018, 5:13 PM
ovasileva updated the task description.

Potentially for the future, but could we also have the numbers rendered by skin?

phuedx updated the task description. Mar 12 2018, 2:36 PM
Jdlrobson updated the task description. Mar 13 2018, 4:40 PM
Jdlrobson set the point value for this task to 5.

@pmiazga, @phuedx - is this something we can start work on now or should we wait until we're closer to deploying the service?

We could. Given the work that's on #readers-web-kanbanana-board and the fact that this has been pushed back to "early Q4", we could wait a little while so as to avoid too much context switching.

@Jdlrobson - why are we moving this to tracking?

I thought all proton work was on ice till further notice? Sorry if I misunderstood.

Jdlrobson moved this task from Incoming to Upcoming on the Readers-Web-Backlog board.

Piotr to comment with a status update.

The sooner we do this task, the better. Currently, the Proton renderer provides really nice logging: when something fails, it's pretty easy to find out what's wrong. Sadly, there is no easy way to visualize how the service performs. In the logs we can check the queue size or job render time, but we should provide a dashboard that presents all of this in easy-to-read graphs.

If we have graphs, anyone will be able to check the service health without digging into a big pile of renderer-pdf logs (we log a lot).

Per standup, @pmiazga attended a sync with Services, and this is needed for the handover, so it is actually a higher priority than I originally realised. We are pulling it into the sprint.

pmiazga claimed this task. Apr 30 2018, 4:59 PM
pmiazga moved this task from To Do to Doing on the Readers-Web-Kanbanana-Board-Old board.
pmiazga added a comment (edited). May 7 2018, 5:07 PM

@ovasileva - is there anything else you'd like to track?

@pmiazga - if possible, could we also track the number of views of the "Download as PDF" page (daily, monthly)? It would also be interesting to put these on the same graph as the number of rendered jobs.

Change 432335 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/services/chromium-render@master] Use metrics to provide some basic stats about the service

https://gerrit.wikimedia.org/r/432335

Change 432335 merged by Ppchelko:
[mediawiki/services/chromium-render@master] Use metrics to provide some basic stats about the service

https://gerrit.wikimedia.org/r/432335

Change 433969 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/services/chromium-render@master] Fix failing unit-tests

https://gerrit.wikimedia.org/r/433969

Change 433969 merged by jenkins-bot:
[mediawiki/services/chromium-render@master] Fix failing unit-tests

https://gerrit.wikimedia.org/r/433969

Code is merged; the only remaining thing is to create a dashboard.

pmiazga updated the task description. May 19 2018, 5:20 PM
pmiazga removed a project: Patch-For-Review.

The code is live in beta, so you should be able to create a dashboard in BetaCluster's Grafana.

pmiazga removed pmiazga as the assignee of this task. May 30 2018, 4:21 PM
Jdlrobson assigned this task to phuedx. Jun 4 2018, 5:07 PM
Niedzielski added a subscriber: Niedzielski.

@Tbayer is going to see if any clarification is needed and either sign off directly or assign to @phuedx.

Tbayer reassigned this task from Tbayer to phuedx. Jun 12 2018, 4:06 PM
phuedx closed this task as Resolved. Jun 13 2018, 11:01 AM

@pmiazga I've made a few minor tweaks to the dashboard, which mostly make metric labels a little more readable. I think that all of the AC are met by the dashboard as it is defined. It's just a little unfortunate that there's not more data (hopefully the latest issue in T186748 will be resolved soon!).

phuedx updated the task description. Jun 13 2018, 11:01 AM

@phuedx thx. During one standup I mentioned that the dashboard is ready, but it contains only an "example"; once Proton is live, we will probably have to revisit the dashboard and change it so that it meets our expectations. I had no idea how to organize everything properly so that it's readable to non-technical people.

Agreed. I should've added a note to that effect in my comment above. The same happened with the Page Previews dashboard, for example.