The last data was on Dec 22.
Description
Details
Subject | Repo | Branch | Lines +/- | |
---|---|---|---|---|
Correct StatsFormatString so it emits valid statsd data | operations/mediawiki-config | master | +1 -1 |
Status | Assigned | Task |
---|---|---|
Resolved | fgiunchedi | T92322 Add monitoring of upload rate on commons to icinga alerts |
Resolved | ori | T85641 Graphite stopped collecting MW profiling information |
Resolved | GWicke | T89846 Investigate apparent restbase request rate under-reporting in graphite: statsd issue? |
Invalid | fgiunchedi | T89857 scale statsd reporting/aggregation (plan) |
Resolved | fgiunchedi | T97509 deprecate mwprof from puppet and gerrit |
Event Timeline
There seems to be a related (unpuppetized, afaict) instance of mwprof specifically for hhvm:
root@tungsten:/var/log/upstart# service hhvm-collector start
hhvm-collector start/running, process 23791
root@tungsten:/var/log/upstart# service hhvm-profiler-to-carbon start
hhvm-profiler-to-carbon start/running, process 23801
root@tungsten:/var/log/upstart#
However, hhvm-profiler-to-carbon doesn't seem to start properly:
root@tungsten:/var/log/upstart# cat hhvm-profiler-to-carbon.log
[2015-01-06 09:40:17,483] Failed to extract data from collector.
Traceback (most recent call last):
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 135, in <module>
    fullprofile = get_profiling_data()
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 122, in get_profiling_data
    collector_socket.connect((collector_host,collector_port))
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
Traceback (most recent call last):
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 154, in <module>
    current[db] = BuildStats(db, fullprofile)
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 40, in BuildStats
    events=fullprofile[db]["-"].items()
KeyError: 'all'
Traceback (most recent call last):
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 154, in <module>
    current[db] = BuildStats(db, fullprofile)
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 40, in BuildStats
    events=fullprofile[db]["-"].items()
KeyError: 'all'
Traceback (most recent call last):
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 154, in <module>
    current[db] = BuildStats(db, fullprofile)
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 40, in BuildStats
    events=fullprofile[db]["-"].items()
KeyError: 'all'
Traceback (most recent call last):
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 154, in <module>
    current[db] = BuildStats(db, fullprofile)
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 40, in BuildStats
    events=fullprofile[db]["-"].items()
KeyError: 'all'
root@tungsten:/var/log/upstart# cat
paging @ori since he might know what's the right action here
Any news? Still an issue?
So far I've tracked this down to profiler-to-carbon, the component that pulls XML from mwprof and transforms it into graphite metrics, not pushing to graphite. Working on a fix. Currently the script lives outside its git version in /srv/deployment/mwprof/mwprof, so it is hard to tell what changed when.
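For readers unfamiliar with the last hop of that pipeline, here is a minimal sketch of how a metric is usually handed to Graphite, assuming Carbon's plaintext protocol (one `path value timestamp` line per metric). The function names and the metric paths are hypothetical and are not taken from profiler-to-carbon itself.

```python
import time


def carbon_line(path, value, timestamp=None):
    """Format one metric as a Carbon plaintext line: 'path value timestamp\\n'."""
    if timestamp is None:
        # Carbon expects seconds since the Unix epoch.
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)


def carbon_payload(metrics, timestamp=None):
    """Join several (path, value) pairs into one payload for the Carbon socket."""
    return "".join(carbon_line(path, value, timestamp) for path, value in metrics)


if __name__ == "__main__":
    # Hypothetical metric names, fixed timestamp for a reproducible example.
    print(carbon_payload([("mw.example.count", 3), ("mw.example.time", 2.5)], 1420537217))
```

A script like the one discussed would write such a payload to the Carbon listener over TCP (port 2003 in a default Carbon setup). If nothing is written, the graphs simply stop updating, which matches the symptom reported here.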
current status as I understand it:
- profiler traffic has moved host without coordination in https://gerrit.wikimedia.org/r/#/c/188724/2
- even if the profiler was running it wouldn't work because the format changed without coordination in https://gerrit.wikimedia.org/r/#/c/188734/1
- profiler-to-carbon won't work anyway on graphite1001 because the upstart script doesn't call the right script (/srv/deployment/mwprof/mwprof/profiler-to-carbon vs /srv/deployment/reporter/reporter/profiler-to-carbon)
- virt1000 is still sending mediawiki stats to tungsten on port 3811
- the profiler-to-carbon copy on tungsten under /srv/deployment/mwprof/mwprof/profiler-to-carbon had the following local diff, which is probably what prevented it from reporting some stats before traffic was moved:
 if(db.startswith('stats')):
-    name = 'stats.' + invalid.sub('_', str(event[0])).rstrip('_')
+    continue
 else:
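To illustrate what the removed line in that diff did, here is a small sketch of the name sanitization. The real `invalid` pattern is not shown in this thread, so the regular expression below is an assumption, and `stats_metric_name` is a hypothetical helper name.

```python
import re

# Assumption: the actual `invalid` pattern is not quoted in the thread. This one
# treats any run of characters other than letters, digits, '_', '-' and '.'
# as invalid.
invalid = re.compile(r"[^A-Za-z0-9_.\-]+")


def stats_metric_name(event_name):
    """Mirror the removed line: prefix with 'stats.' and sanitize the event name."""
    return "stats." + invalid.sub("_", str(event_name)).rstrip("_")


if __name__ == "__main__":
    print(stats_metric_name("API::query (hits)"))
```

With the local change, the branch hits `continue` instead of building such a name, so entries whose db starts with `stats` would be skipped rather than reported.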
Most of this is on me now. I have to check that the format emitted by MediaWiki is actually compatible with StatsD, and then coordinate a configuration change with Filippo to send it to the statsd port.
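As a rough sketch of what "compatible with StatsD" means, the check below tests a line against the common `name:value|type` wire format with an optional `|@rate` suffix. The set of characters allowed in names and the restriction to numeric values are simplifying assumptions, and the helper names and metric names are hypothetical.

```python
import re

# StatsD line: <name>:<value>|<type>, optionally followed by |@<sample rate>.
# Types covered: c (counter), ms (timer), g (gauge), s (set).
# Assumption: names use only letters, digits, '_', '-' and '.'; values are numeric.
STATSD_LINE = re.compile(
    r"^[A-Za-z0-9_.\-]+:[+-]?\d+(\.\d+)?\|(c|ms|g|s)(\|@\d*\.?\d+)?$"
)


def is_valid_statsd(line):
    """Return True if `line` looks like a well-formed StatsD datagram line."""
    return STATSD_LINE.match(line) is not None


def format_statsd(name, value, metric_type="c"):
    """Build a StatsD line such as 'mw.example.edits:1|c'."""
    return "%s:%s|%s" % (name, value, metric_type)


if __name__ == "__main__":
    for sample in ("mw.example.edits:1|c", "mw.example.edits 1 1420537217"):
        print(sample, is_valid_statsd(sample))
```

The second sample is in Carbon's plaintext format rather than StatsD's, so the check rejects it. A format string that emits anything other than `name:value|type` would be dropped or misread by a StatsD daemon in the same way.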
Change 191087 had a related patch set uploaded (by Ori.livneh):
Correct StatsFormatString so it emits valid statsd data
Change 191087 merged by jenkins-bot:
Correct StatsFormatString so it emits valid statsd data
Metrics are now back. The mwprof / profiler-to-carbon stack has been eliminated, too, so MediaWiki is now "speaking" StatsD (or a heavily accented variant thereof, until https://gerrit.wikimedia.org/r/#/c/191854/ is merged).
Because metrics are no longer going through mwprof / profiler-to-carbon, some metrics may have changed names slightly. You can expect the gdash dashboards to be incomplete until all the kinks have been smoothed out.
The profiler needs to be updated to use the new metrics interface, introduced in Ie10db1c15: Add StatsD metric logging.