
Provide service alerting/statistics for the citoid and zotero services
Closed, Resolved (Public)

Description

So we notice if it falls over.

Event Timeline

Jdforrester-WMF raised the priority of this task from to High.
Jdforrester-WMF updated the task description.
Jdforrester-WMF added a project: Citoid.

Something like

https://wikitech.wikimedia.org/wiki/Parsoid#Monitoring ?

I don't have access to the current live instances of citoid, only to the one in the services project on novawiki (i.e. not the one Roan set up for deployment), so I'm not sure how much I can do here, although I am happy to be given access and mess around :).

Also, we probably need to monitor Zotero too. Even more so, I think. Zotero has died on me at random a few times, and I was never able to reproduce the circumstances that caused it.

Mvolz renamed this task from Provide service monitoring for the citoid service to Provide service monitoring for the citoid and zotero services. Jan 25 2015, 8:42 PM
Mvolz set Security to None.

James, should this be tagged SRE?

There are different ways to monitor it: at a high level, by putting metrics in Ganglia and then checking for changes there, or at a somewhat lower level, with Icinga, to check things like "is the process even running". I can do the latter, while others are more familiar with the former. If you want something in Icinga, please let me know a typical line from the output of "ps" that indicates the zotero process is running normally.
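To illustrate the lower-level option, here is a minimal sketch of a Nagios/Icinga-style process check in Python. It is not the check that was eventually deployed. The marker string "zotero" is a placeholder, since the thread does not include a real "ps" line for a healthy zotero process, and the function names are hypothetical.

```python
import subprocess

# Hypothetical marker: the real value would be taken from an actual `ps`
# line of a healthy zotero process, which this thread does not provide.
PROCESS_MARKER = "zotero"

# Conventional Nagios/Icinga plugin exit codes.
OK, CRITICAL = 0, 2


def check_process(ps_output, marker=PROCESS_MARKER):
    """Return (exit_code, message) depending on whether any line matches."""
    matches = [line for line in ps_output.splitlines() if marker in line]
    if matches:
        return OK, f"OK: {len(matches)} process(es) matching '{marker}'"
    return CRITICAL, f"CRITICAL: no process matching '{marker}'"


def run_check():
    """Collect the command lines of all processes and evaluate them."""
    # `ps -eo args` prints only the command line of each process.
    # Caveat: the checking script itself matches if its own command line
    # contains the marker (e.g. a script named after the service).
    ps_output = subprocess.run(
        ["ps", "-eo", "args"], capture_output=True, text=True, check=True
    ).stdout
    return check_process(ps_output)
```

A wrapper script would call run_check(), print the message, and exit with the returned code so that Icinga can interpret the result.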

The icinga stuff for zotero is done in https://gerrit.wikimedia.org/r/#/c/194495/10/modules/nagios_common/files/checkcommands.cfg,cm

It is an HTTP check: it tries to convert (yes, it is actually converting, not exporting as the /export URL suggests) an empty citation to the "wikipedia" format. I don't think getting metrics out of zotero in order to have meaningful graphs in Graphite or Ganglia is doable. I haven't yet found a way to query zotero about its internal state, so I am guessing that if we want something we will have to rely on log parsing.
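For readers who want to see the shape of such a check, the following is a rough Python sketch, not the Icinga command from the Gerrit change. The host, port, request body (an empty JSON array standing in for "an empty citation"), and expected status code are all assumptions made for illustration; only the /export path and the "wikipedia" format come from the description above.

```python
import json
import urllib.error
import urllib.request

# Assumptions: host and port are placeholders, not the production values.
ZOTERO_URL = "http://localhost:1969/export?format=wikipedia"

# Conventional Nagios/Icinga plugin exit codes.
OK, CRITICAL = 0, 2


def evaluate(status, expected_status=200):
    """Map an HTTP status code to (exit_code, message)."""
    # Assumption: a healthy service answers the probe with HTTP 200.
    if status == expected_status:
        return OK, f"OK: HTTP {status}"
    return CRITICAL, f"CRITICAL: HTTP {status}, expected {expected_status}"


def probe(url=ZOTERO_URL, timeout=10.0):
    """POST an empty citation list and evaluate the response status."""
    payload = json.dumps([]).encode("utf-8")  # assumed "empty citation" body
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return evaluate(response.status)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return evaluate(err.code)
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, DNS failure, timeout, and similar.
        return CRITICAL, f"CRITICAL: request failed: {err}"
```

Because the probe exercises a real conversion rather than just opening a TCP connection, it would also catch the case where the process is running but no longer answering requests.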

akosiaris renamed this task from Provide service monitoring for the citoid and zotero services to Provide service alerting/statistics for the citoid and zotero services. Mar 16 2015, 7:29 PM

As mentioned above, the alerting part is done.

Change 197126 had a related patch set uploaded (by Mobrovac):
Report metrics using StatsD

https://gerrit.wikimedia.org/r/197126

Change 197310 had a related patch set uploaded (by Mobrovac):
Citoid: set the StatsD host to statsd.eqiad.wmnet

https://gerrit.wikimedia.org/r/197310

Change 197310 merged by Alexandros Kosiaris:
Citoid: set the StatsD host to statsd.eqiad.wmnet

https://gerrit.wikimedia.org/r/197310

Change 197126 merged by jenkins-bot:
Report metrics using StatsD

https://gerrit.wikimedia.org/r/197126

Stats collection will go live in the next citoid service production deploy.
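As background on what reporting metrics using StatsD involves, here is a small Python sketch of the StatsD plain-text protocol, where each metric is sent as a UDP datagram of the form name:value|type. This is not the code from the patches above, which is not shown in this thread. The metric names and the port (8125, the conventional StatsD default) are assumptions; the host name is the one from the configuration change.

```python
import socket

STATSD_HOST = "statsd.eqiad.wmnet"  # host named in the change above
STATSD_PORT = 8125  # assumption: the conventional StatsD UDP port


def format_metric(name, value, metric_type):
    """Build one StatsD datagram of the form <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload, host=STATSD_HOST, port=STATSD_PORT):
    """Send one datagram over UDP; nothing is read back from the server."""
    # Raises socket.gaierror if the host name cannot be resolved.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical metric names, for illustration only.
request_counter = format_metric("citoid.requests", 1, "c")      # counter
request_timing = format_metric("citoid.request_time", 250, "ms")  # timer
```

Since UDP is connectionless, a service reporting this way does not block on, or fail because of, an unavailable metrics server, which suits instrumentation inside a request path.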