Graphite stopped collecting MW profiling information
Closed, Resolved, Public

Description

The last data was on Dec 22.

Event Timeline

MaxSem set the priority of this task to Needs Triage.
MaxSem updated the task description.
MaxSem added a project: Grafana.
MaxSem set Security to None.
MaxSem added subscribers: Joe, ori.
MaxSem subscribed.

There seems to be a related (unpuppetized, afaict) instance of mwprof specifically for HHVM:

root@tungsten:/var/log/upstart# service hhvm-collector start
hhvm-collector start/running, process 23791
root@tungsten:/var/log/upstart# service hhvm-profiler-to-carbon start
hhvm-profiler-to-carbon start/running, process 23801
root@tungsten:/var/log/upstart#

However, hhvm-profiler-to-carbon doesn't seem to actually start:

root@tungsten:/var/log/upstart# cat hhvm-profiler-to-carbon.log
[2015-01-06 09:40:17,483] Failed to extract data from collector.
Traceback (most recent call last):
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 135, in <module>
    fullprofile = get_profiling_data()
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 122, in get_profiling_data
    collector_socket.connect((collector_host,collector_port))
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
Traceback (most recent call last):
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 154, in <module>
    current[db] = BuildStats(db, fullprofile)
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 40, in BuildStats
    events=fullprofile[db]["-"].items()
KeyError: 'all'
Traceback (most recent call last):
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 154, in <module>
    current[db] = BuildStats(db, fullprofile)
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 40, in BuildStats
    events=fullprofile[db]["-"].items()
KeyError: 'all'
Traceback (most recent call last):
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 154, in <module>
    current[db] = BuildStats(db, fullprofile)
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 40, in BuildStats
    events=fullprofile[db]["-"].items()
KeyError: 'all'
Traceback (most recent call last):
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 154, in <module>
    current[db] = BuildStats(db, fullprofile)
  File "/srv/deployment/mwprof/mwprof/hhvm-profiler-to-carbon", line 40, in BuildStats
    events=fullprofile[db]["-"].items()
KeyError: 'all'
root@tungsten:/var/log/upstart# cat
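
Reading the log, the first failure is the refused connection to the collector; the later `KeyError: 'all'` failures suggest the loop then kept indexing a profile dict that had no `'all'` entry. A defensive version of that loop might look like the sketch below. It is Python 3, not the deployed Python 2.7 script, and `fetch_profile`, `build_all_stats`, the `parse` callback and the `build_stats` callback are hypothetical names; the real collector wire format is not shown in this task, so parsing is left to the caller.

```python
import logging
import socket

log = logging.getLogger("profiler-to-carbon-sketch")


def fetch_profile(host, port, parse, timeout=5.0):
    """Fetch and parse the collector dump; return {} if the collector is unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            chunks = []
            while True:
                chunk = sock.recv(65536)
                if not chunk:  # collector closed the connection: dump complete
                    break
                chunks.append(chunk)
    except OSError as exc:  # ConnectionRefusedError (errno 111) is an OSError
        log.warning("Failed to extract data from collector: %s", exc)
        return {}
    return parse(b"".join(chunks))


def build_all_stats(fullprofile, dbs, build_stats):
    """Build stats per db, skipping dbs missing from the profile instead of raising KeyError."""
    current = {}
    for db in dbs:
        events = fullprofile.get(db, {}).get("-")
        if events is None:
            log.warning("no profiling data for %r, skipping", db)
            continue
        current[db] = build_stats(db, events.items())
    return current
```

With an empty profile (collector down), `build_all_stats` returns `{}` and logs a warning per db rather than crashing on every iteration.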

Paging @ori, since he might know what the right action is here.

Any news? Still an issue?

So far I've tracked this down to profiler-to-carbon (the component that pulls the XML from mwprof and transforms it into Graphite metrics) not pushing to Graphite. I'm working on a fix; the live script in /srv/deployment/mwprof/mwprof has drifted from its git version, so it is hard to tell what changed and when.
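
For context, the last step of that pipeline is writing lines in Graphite's plaintext protocol (`<metric path> <value> <timestamp>\n`, conventionally over TCP to carbon's port 2003). A minimal sketch of just that step, assuming a local carbon listener; the XML parsing from mwprof is not shown in this task and is omitted, and the `MediaWiki.*` metric names are made up:

```python
import socket
import time


def carbon_lines(metrics, timestamp=None):
    """Format (path, value) pairs in Graphite's plaintext protocol."""
    ts = int(time.time() if timestamp is None else timestamp)
    return "".join(f"{path} {value} {ts}\n" for path, value in metrics)


def send_to_carbon(metrics, host="127.0.0.1", port=2003, timestamp=None, timeout=5.0):
    """Send one batch of metrics to carbon over TCP (host/port are assumptions)."""
    payload = carbon_lines(metrics, timestamp).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

For example, `carbon_lines([("MediaWiki.foo.count", 3)], timestamp=1420537217)` yields `"MediaWiki.foo.count 3 1420537217\n"`.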

current status as I understand it:

  • profiler traffic was moved to another host without coordination in https://gerrit.wikimedia.org/r/#/c/188724/2
  • even if the profiler were running, it wouldn't work, because the format was changed without coordination in https://gerrit.wikimedia.org/r/#/c/188734/1
  • profiler-to-carbon won't work anyway on graphite1001 because the upstart script doesn't call the right script (/srv/deployment/mwprof/mwprof/profiler-to-carbon vs /srv/deployment/reporter/reporter/profiler-to-carbon)
  • virt1000 is still sending mediawiki stats to tungsten on port 3811
  • the profiler-to-carbon copy on tungsten under /srv/deployment/mwprof/mwprof/profiler-to-carbon had the following local diff, which is probably what prevented it from reporting some stats before traffic was moved:
         if(db.startswith('stats')):
-            name = 'stats.' + invalid.sub('_', str(event[0])).rstrip('_')
+            continue
         else:
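
To make the effect of that diff concrete, here is a small sketch of the `stats.*` branch with and without the local change. The task does not show the script's `invalid` pattern, so the one below is an assumption (it collapses each run of characters outside `[A-Za-z0-9_.-]` into one underscore), and `stats_metric_names` is a hypothetical wrapper; the non-stats `else:` branch is not shown in the diff and is left out.

```python
import re

# Assumption: the real `invalid` regex is not shown; this one replaces each run of
# characters outside [A-Za-z0-9_.-] with a single underscore.
invalid = re.compile(r"[^A-Za-z0-9_.\-]+")


def stats_metric_names(db, events, apply_local_patch):
    """Return the 'stats.*' names produced for `events` (pairs of name, data)."""
    names = []
    for event in events:
        if db.startswith("stats"):
            if apply_local_patch:
                # The tungsten-local change: every stats.* event is silently skipped.
                continue
            # The upstream line that the local diff removed.
            names.append("stats." + invalid.sub("_", str(event[0])).rstrip("_"))
        # The else-branch is not shown in the diff, so it is omitted here.
    return names
```

With the patch applied, every event from a `stats*` db yields no metric at all, which matches the symptom of some stats never being reported.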

Most of this is on me now. I have to check that the format emitted by MediaWiki is actually compatible with StatsD, and then coordinate a configuration change with Filippo to send it to the StatsD port.
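
One quick way to do that compatibility check is to validate each emitted line against the basic StatsD line format, `<name>:<value>|<type>` with an optional `|@<sample rate>`. The regex below is a sketch that assumes only numeric values and the common Etsy StatsD types (`c`, `ms`, `g`, `s`, plus `h`, which some servers accept) need to pass; `check_packet` is a hypothetical helper and the `MediaWiki.*` names are made up.

```python
import re

STATSD_LINE = re.compile(
    r"^(?P<name>[^:|@\s]+)"              # bucket name: no ':', '|', '@' or whitespace
    r":(?P<value>[+-]?\d+(?:\.\d+)?)"     # numeric value (gauges may carry a sign)
    r"\|(?P<type>c|ms|g|s|h)"             # metric type
    r"(?:\|@(?P<rate>\d*\.?\d+))?$"       # optional sample rate
)


def check_packet(packet):
    """Return the lines of a newline-delimited packet that are not valid StatsD."""
    return [line for line in packet.splitlines() if line and not STATSD_LINE.match(line)]
```

An empty result means every line in the packet looks like well-formed StatsD.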

gerritbot subscribed.

Change 191087 had a related patch set uploaded (by Ori.livneh):
Correct StatsFormatString so it emits valid statsd data

https://gerrit.wikimedia.org/r/191087
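
For illustration only, a formatter that refuses to emit malformed StatsD lines could look like the following. This is a Python sketch of the idea, not the PHP StatsFormatString code from change 191087; `format_statsd`, the set of rejected name characters and the `:g` formatting of the sample rate are choices made for this sketch.

```python
def format_statsd(name, value, metric_type, sample_rate=1.0):
    """Format one StatsD line, raising ValueError instead of emitting invalid data."""
    if metric_type not in {"c", "ms", "g", "s"}:
        raise ValueError(f"unsupported StatsD type: {metric_type!r}")
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    if not name or any(ch in name for ch in ":|@\n "):
        raise ValueError(f"invalid character in metric name: {name!r}")
    line = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        # Only annotate sampled metrics; rate 1.0 is the implicit default.
        line += f"|@{sample_rate:g}"
    return line
```

For example, `format_statsd("MediaWiki.foo", 12.5, "ms", 0.1)` gives `"MediaWiki.foo:12.5|ms|@0.1"`.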

Change 191087 merged by jenkins-bot:
Correct StatsFormatString so it emits valid statsd data

https://gerrit.wikimedia.org/r/191087

On hold, since T89846 might have been a side effect of this; I'm working with @ori to get it fixed.

Metrics are now back. The mwprof / profiler-to-carbon stack has been eliminated, too, so MediaWiki is now "speaking" StatsD (or a heavily accented variant thereof, until https://gerrit.wikimedia.org/r/#/c/191854/ is merged).
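
StatsD is normally spoken over UDP, with one or more newline-separated lines per datagram. As a rough sketch of what "speaking StatsD" means on the wire (Python rather than MediaWiki's PHP; `send_statsd` is hypothetical, and the host and port are assumptions, 8125 being the default port of Etsy's statsd rather than anything stated in this task):

```python
import socket


def send_statsd(lines, host="127.0.0.1", port=8125):
    """Send StatsD lines as a single UDP datagram; returns the payload size in bytes."""
    payload = "\n".join(lines).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Fire-and-forget: UDP gives no delivery guarantee, which StatsD accepts by design.
        sock.sendto(payload, (host, port))
    return len(payload)
```

Because UDP is fire-and-forget, a misconfigured host or port fails silently, which is one reason metrics can quietly stop showing up in Graphite.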

Because metrics are no longer going through mwprof / profiler-to-carbon, some metrics may have changed names slightly. You can expect the gdash dashboards to be incomplete until all the kinks have been smoothed out.

The profiler needs to be updated to use the new metrics interface, introduced in Ie10db1c15: Add StatsD metric logging.