Maniphest T195312

Move flame graphs hosting from mwlog1001 to webperf-2 and enable in Beta Cluster
Closed, Resolved, Public

Assigned To: Krinkle

Authored By: Krinkle, May 22 2018, 4:40 PM

Description

Follow-up from T180766: Make MediaWiki profiler in Beta match production. Potentially depends on some of T158837.

Specifically, we should:

Set up the Redis instance in Beta for stack logs (in prod on mwlog1001).
Set up the daemon that writes from Redis to log files on disk (in prod on mwlog1001).
Set up the daemon that generates SVG flame graphs from the stack log files on disk (in prod on mwlog1001).
Set up the Apache that serves the flame graphs and stack logs (in prod on mwlog1001).
Set up a public proxy to expose that Apache (either a custom one for Beta, or by deploying performance-site to Beta Cluster; the latter would be nice).
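The second item, a daemon that copies stack samples from Redis into log files on disk, can be sketched in Python as follows. This is a hypothetical illustration, not the code that runs on mwlog1001: the channel name xenon matches the redis-cli session later in this task, but the entry-point mapping (ENTRY_POINTS) and the <date>.<name>.log file naming are assumptions. The subscription uses the redis-py client.

```python
from __future__ import annotations

import datetime
import os

# Hypothetical mapping from a sample's first frame to a per-entry-point log.
ENTRY_POINTS = {"index.php": "index", "load.php": "load", "api.php": "api"}


def log_names_for(message: str) -> list[str]:
    """Every sample goes to 'all'; known entry points also get their own log."""
    first_frame = message.split(";", 1)[0]
    names = ["all"]
    if first_frame in ENTRY_POINTS:
        names.append(ENTRY_POINTS[first_frame])
    return names


def log_path(log_dir: str, day: datetime.date, name: str) -> str:
    # Assumed naming scheme: <log_dir>/<YYYY-MM-DD>.<name>.log
    return os.path.join(log_dir, f"{day.isoformat()}.{name}.log")


def append_sample(log_dir: str, day: datetime.date, message: str) -> None:
    """Append one sample line to each log it belongs to."""
    line = message.rstrip("\n") + "\n"
    for name in log_names_for(message):
        with open(log_path(log_dir, day, name), "a", encoding="utf-8") as f:
            f.write(line)


def run(log_dir: str, host: str = "localhost", channel: str = "xenon") -> None:
    """Subscribe to the Redis channel and write every message to disk, forever."""
    import redis  # redis-py; imported here so the helpers above work without it

    pubsub = redis.Redis(host=host).pubsub()
    pubsub.subscribe(channel)
    for item in pubsub.listen():
        if item["type"] != "message":
            continue  # skip the initial subscribe confirmation
        data = item["data"]
        if isinstance(data, bytes):
            data = data.decode("utf-8", "replace")
        append_sample(log_dir, datetime.date.today(), data)
```

Putting the date in the file name makes daily rotation fall out naturally; reopening the file per sample keeps the sketch short, though a real daemon would likely buffer writes.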

Doing this would help with T176916: Set up sampling profiler for PHP 7 (alternative to HHVM Xenon).

For T176916, I'd like to set up a copy of the flame graph pipeline for xhprof-sampled data, kept separate from the existing one. This lets us 1) avoid corruption and duplication from mixing different data sources (Xenon vs xhprof) that also use different sampling intervals, 2) compare them side by side, and 3) ramp up the new one slowly without it getting lost in the noise.

We could do all of that directly in production, but I'm proposing we take this opportunity to clean up the flame graph stack in production. Part of it currently runs on hardware not intended for this purpose. Once that is done, it should be easy to recreate the setup in Beta Cluster using the same puppet profiles, probably by adding a webperf VM to the Beta Cluster.

That would also let us try out puppet-level changes (e.g. to the Apache configuration or otherwise) in Beta before production.

This should also make it easier to experiment with xhprof-based sampling, which we could leave running in Beta for a while to compare results side by side.

Details

Subject	Repo	Branch	Lines +/-
webperf: Remove arclamp subscriber from mwlog servers	operations/puppet	production	+3 -16
webperf: Switch webperf::site to use arclamp from webperf-2	operations/puppet	production	+10 -3
webperf: Split Redis from the rest of the arclamp profile	operations/puppet	production	+90 -61
webperf: Add arclamp profile to webperf::profiling_tools role	operations/puppet	production	+19 -5
webperf: Switch arclamp_host in Beta from mwlog host to webperf12	operations/puppet	production	+1 -1
webperf: Enable xenondata_host on perfsite in Beta Cluster	operations/puppet	production	+1 -0
webperf: Rename role::xenon to profile::webperf::xenon	operations/puppet	production	+9 -6
profiler: Enable xenon collection in labs (same as prod)	operations/mediawiki-config	master	+71 -3


Related Objects

Status	Assigned	Task
Resolved	dpifke	T158837 Consolidate performance website and related software
Resolved	Krinkle	T195312 Move flame graphs hosting from mwlog1001 to webperf-2 and enable in Beta Cluster
Resolved	Krinkle	T195314 Set up webperf-1 node in Beta Cluster

Event Timeline

Krinkle created this task. May 22 2018, 4:40 PM

Restricted Application added a subscriber: Aklapper. May 22 2018, 4:40 PM

Krinkle mentioned this in T180766: Make MediaWiki profiler in Beta match production. May 22 2018, 4:40 PM

Mentioned in SAL (#wikimedia-releng) [2018-05-22T16:53:28Z] <Krinkle> Created deployment-webperf instance (m1.small) - ref T195312

Krinkle mentioned this in T195314: Set up webperf-1 node in Beta Cluster. May 22 2018, 4:58 PM

Krinkle added a subtask: T195314: Set up webperf-1 node in Beta Cluster.

Imarlier moved this task from Inbox, needs triage to To-do: Goals prioritized current Quarter on the Performance-Team board. May 29 2018, 8:48 PM

Krinkle added a parent task: T158837: Consolidate performance website and related software. Jun 26 2018, 10:14 PM

Vvjjkkii renamed this task from Set up PHP flame graphs for Beta Cluster to zhcaaaaaaa. Jul 1 2018, 1:08 AM

Vvjjkkii triaged this task as High priority.

Vvjjkkii added projects: CheckUser, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), Tamil-Sites, Gamepress, Hashtags, Jade, KartoEditor, Language-2018-Apr-June, New-Editor-Experiences, Mail, TCB-Team (now WMDE-TechWish).

Vvjjkkii updated the task description.

Vvjjkkii removed a subscriber: Aklapper.

CommunityTechBot renamed this task from zhcaaaaaaa to Set up PHP flame graphs for Beta Cluster. Jul 2 2018, 3:58 PM

CommunityTechBot raised the priority of this task from High to Needs Triage.

CommunityTechBot updated the task description.

CommunityTechBot removed projects: TCB-Team (now WMDE-TechWish), Mail, New-Editor-Experiences, Language-2018-Apr-June, KartoEditor, Jade, Hashtags, Gamepress, Tamil-Sites, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), CheckUser.

CommunityTechBot added a subscriber: Aklapper.

Krinkle renamed this task from Set up PHP flame graphs for Beta Cluster to Move flame graphs hosting from mwlog1001 to webperf-2 and enable in Beta Cluster. Jul 3 2018, 3:48 AM

Mentioned in SAL (#wikimedia-releng) [2018-07-03T03:49:07Z] <Krinkle> Create deployment-webperf12 as equivalent of webperf1002/webperf2002 in prod (T195312, T194390)

Stashbot mentioned this in T194390: EQIAD & CODFW: 1 VM in each data center for xhprof/xhgui/other profiling tools. Jul 3 2018, 3:49 AM

Mentioned in SAL (#wikimedia-releng) [2018-07-03T04:31:21Z] <Krinkle> Setting up puppetmaster/cert for deployment-webperf12 (T195312)

Krinkle triaged this task as High priority. Jul 3 2018, 10:55 PM

Krinkle closed subtask T195314: Set up webperf-1 node in Beta Cluster as Resolved.

Change 443757 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] webperf: Rename role::xenon to profile::webperf::xenon

https://gerrit.wikimedia.org/r/443757

gerritbot added a project: Patch-For-Review. Jul 4 2018, 1:53 AM

Change 443760 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] profiler: Enable xenon collection in labs (same as prod)

https://gerrit.wikimedia.org/r/443760

Change 443760 merged by jenkins-bot:
[operations/mediawiki-config@master] profiler: Enable xenon collection in labs (same as prod)

https://gerrit.wikimedia.org/r/443760

Change 443762 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] mediawiki: Change xenon interval for Beta Cluster from 10min to 30s

https://gerrit.wikimedia.org/r/443762

In T195312#4396645, @gerritbot wrote:

[operations/puppet@production] webperf: Rename role::xenon to profile::webperf::xenon
https://gerrit.wikimedia.org/r/443757

In T195312#4396677, @gerritbot wrote:

[operations/mediawiki-config@master] profiler: Enable xenon collection in labs (same as prod)
https://gerrit.wikimedia.org/r/443760

In T195312#4396695, @gerritbot wrote:

[operations/puppet@production] mediawiki: Change xenon interval for Beta Cluster from 10min to 30s
https://gerrit.wikimedia.org/r/443762

Now that these patches are merged and/or cherry-picked on Beta's puppetmaster, we have a Redis pub/sub channel running in Beta Cluster on its deployment-fluorine02 host (equivalent of mwlog1001), receiving data from HHVM on Beta's app servers.

krinkle@deployment-fluorine02:~$ redis-cli subscribe xenon
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "xenon"
3) (integer) 1

1) "message"
2) "xenon"
3) "index.php;{GET};/srv/mediawiki/php-master/index.php;/srv/mediawiki/php-master/includes/WebStart.php;MediaWiki\\Ses...\\JCSingleton::init;JsonConfig\\JCSingleton::parseConfiguration 1"

1) "message"
2) "xenon"
3) "load.php;{GET};/srv/mediawiki/php-master/load.php;..;/srv/mediawiki/php-master/vendor/composer/autoload_real.php 1"
1) "message"

[..]
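The messages above appear to follow the "folded stack" format that flamegraph.pl consumes: frames joined by semicolons, then a space and a sample count. Assuming that format, a minimal sketch of how such samples could be aggregated before rendering (the function names are hypothetical, not part of the actual pipeline):

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def parse_sample(message: str) -> tuple[str, int]:
    """Split 'frame;frame;... count' into (stack, count)."""
    stack, _, count = message.rstrip("\n").rpartition(" ")
    if not stack:
        raise ValueError(f"no sample count in {message!r}")
    return stack, int(count)


def fold(messages: Iterable[str]) -> Counter:
    """Sum the counts of identical stacks."""
    totals: Counter = Counter()
    for message in messages:
        stack, count = parse_sample(message)
        totals[stack] += count
    return totals


def to_folded_lines(totals: Counter) -> list[str]:
    """Serialize back to one 'stack count' line per unique stack, sorted."""
    return [f"{stack} {count}" for stack, count in sorted(totals.items())]
```

Splitting on the last space (rpartition) leaves any earlier spaces inside frame names intact; each resulting line has the same shape as the incoming messages, so it can be written to a log or piped to a flame graph renderer.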

Change 443764 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] webperf: Enable xenondata_host on perfsite in Beta Cluster

https://gerrit.wikimedia.org/r/443764

Change 444331 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] webperf: Split Redis from the rest of the arclamp profile

https://gerrit.wikimedia.org/r/444331

Change 445066 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] webperf: Add arclamp profile to webperf::profiling_tools role

https://gerrit.wikimedia.org/r/445066

Krinkle moved this task from To-do: Goals prioritized current Quarter to Doing (old) on the Performance-Team board. Jul 11 2018, 4:42 AM

Mentioned in SAL (#wikimedia-releng) [2018-07-17T01:36:42Z] <Krinkle> Applying role::webperf::profiling_tools class to webperf12 in Beta Cluster - T195312, T180761.

Stashbot mentioned this in T180761: Move XHGui from tungsten to xhgui-001. Jul 17 2018, 1:36 AM

Krinkle mentioned this in T199853: Increase webperf1002/webperf2002 space from 50GB to 150GB (Ganeti). Jul 18 2018, 2:47 AM

Mentioned in SAL (#wikimedia-releng) [2018-07-23T15:34:15Z] <Krinkle> Deleting deployment-webperf12 - T195312

Mentioned in SAL (#wikimedia-releng) [2018-07-23T15:34:55Z] <Krinkle> Creating deployment-webperf13 - T195312

Mentioned in SAL (#wikimedia-releng) [2018-07-23T15:57:09Z] <Krinkle> Set 'puppetmaster' Hiera for deployment-webperf13 / T195312

Mentioned in SAL (#wikimedia-releng) [2018-07-23T16:04:21Z] <Krinkle> Set up puppet cert stuff on deployment-webperf13 T195312

Mentioned in SAL (#wikimedia-releng) [2018-07-23T16:25:06Z] <Krinkle> Applying role::webperf::profiling_tools class to deployment-webperf13, T195312

Krinkle added a comment. Jul 23 2018, 4:33 PM

This comment was removed by Krinkle.

Mentioned in SAL (#wikimedia-releng) [2018-07-30T23:03:41Z] <Krinkle> Delete and recreate deployment-webperf13 (T195312 / T180761)

Mentioned in SAL (#wikimedia-releng) [2018-07-30T23:04:52Z] <Krinkle> Create instance deployment-webperf13 (deployment-webperf13 ) - T195312 / T180761

Mentioned in SAL (#wikimedia-releng) [2018-07-30T23:28:19Z] <Krinkle> Setting up puppet cert for deployment-webperf12; T195312 / T180761

Mentioned in SAL (#wikimedia-releng) [2018-07-31T17:50:49Z] <Krinkle> Apply role::webperf::profiling_tools to deployment-webperf12; T195312 / T180761

deployment-webperf12 in Beta is now working as an Arc Lamp host. I've confirmed that it has xenon-log running, subscribed to the Redis instance on deployment-fluorine02 and storing data in /srv/xenon, and that this data is exposed over HTTP at http://deployment-webperf12/xenon/.

Krinkle updated the task description. Aug 7 2018, 8:28 PM

Change 451107 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] webperf: Switch arclamp_host in Beta from mwlog host to webperf13

https://gerrit.wikimedia.org/r/451107

In T195312#4486486, @gerritbot wrote:

[operations/puppet@production] webperf: Switch arclamp_host in Beta from mwlog host to webperf13
https://gerrit.wikimedia.org/r/451107

syslog for puppet-agent

Info: Caching catalog for deployment-webperf11.deployment-prep.eqiad.wmflabs
Info: Applying configuration version '1533674995'
Notice: /Stage[main]/Profile::Webperf::Site/Httpd::Site[performance-wikimedia-org]/Httpd::Conf[performance-wikimedia-org]/File[/etc/apache2/sites-available/50-performance-wikimedia-org.conf]/content: 
--- /etc/apache2/sites-available/50-performance-wikimedia-org.conf	2018-07-04 05:05:42.664611765 +0000
+++ /tmp/puppet-file20180807-1476-3aept6	2018-08-07 20:50:14.419982978 +0000
@@ -22,8 +22,8 @@
         Require all granted
     </Directory>
 
-    ProxyPass /xenon http://deployment-fluorine02.deployment-prep.eqiad.wmflabs/xenon
-    ProxyPassReverse /xenon http://deployment-fluorine02.deployment-prep.eqiad.wmflabs/xenon
+    ProxyPass /xenon http://deployment-webperf12.deployment-prep.eqiad.wmflabs/xenon
+    ProxyPassReverse /xenon http://deployment-webperf12.deployment-prep.eqiad.wmflabs/xenon
 
 
 </VirtualHost>

Info: Computing checksum on file /etc/apache2/sites-available/50-performance-wikimedia-org.conf
Info: /Stage[main]/Profile::Webperf::Site/Httpd::Site[performance-wikimedia-org]/Httpd::Conf[performance-wikimedia-org]/File[/etc/apache2/sites-available/50-performance-wikimedia-org.conf]: Filebucketed /etc/apache2/sites-available/50-performance-wikimedia-org.conf to puppet with sum e129f79e85bc324aa2c11235d49fd679
Notice: /Stage[main]/Profile::Webperf::Site/Httpd::Site[performance-wikimedia-org]/Httpd::Conf[performance-wikimedia-org]/File[/etc/apache2/sites-available/50-performance-wikimedia-org.conf]/content: content changed '{md5}e129f79e85bc324aa2c11235d49fd679' to '{md5}9434e76f7ea8f02e4bc14028537ebe6f'
Info: /Stage[main]/Profile::Webperf::Site/Httpd::Site[performance-wikimedia-org]/Httpd::Conf[performance-wikimedia-org]/File[/etc/apache2/sites-available/50-performance-wikimedia-org.conf]: Scheduling refresh of Service[apache2]
Notice: /Stage[main]/Httpd/Service[apache2]: Triggered 'refresh' from 1 events
Notice: Applied catalog in 9.47 seconds

Krinkle updated the task description. Aug 7 2018, 10:52 PM

Change 443757 merged by Giuseppe Lavagetto:
[operations/puppet@production] webperf: Rename role::xenon to profile::webperf::xenon

https://gerrit.wikimedia.org/r/443757

Change 443764 merged by Giuseppe Lavagetto:
[operations/puppet@production] webperf: Enable xenondata_host on perfsite in Beta Cluster

https://gerrit.wikimedia.org/r/443764

Krinkle claimed this task. Aug 12 2018, 12:50 PM

Krinkle moved this task from Doing (old) to Blocked (old) on the Performance-Team board.

Change 452449 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] webperf: Switch webperf::site to use arclamp from webperf-2

https://gerrit.wikimedia.org/r/452449

Change 444331 merged by Giuseppe Lavagetto:
[operations/puppet@production] webperf: Split Redis from the rest of the arclamp profile

https://gerrit.wikimedia.org/r/444331

Change 445066 merged by Giuseppe Lavagetto:
[operations/puppet@production] webperf: Add arclamp profile to webperf::profiling_tools role

https://gerrit.wikimedia.org/r/445066

Change 451107 merged by Giuseppe Lavagetto:
[operations/puppet@production] webperf: Switch arclamp_host in Beta from mwlog host to webperf12

https://gerrit.wikimedia.org/r/451107

Change 452449 merged by Giuseppe Lavagetto:
[operations/puppet@production] webperf: Switch webperf::site to use arclamp from webperf-2

https://gerrit.wikimedia.org/r/452449

Copied xenon/logs/daily/2018-*{all,load,index,api,RunSingleJob}.log from mwlog1001 to webperfX002 hosts.

Then, in batches of a dozen files, copied them from my home directory to /srv/xenon/logs/daily as the xenon user, and ran sudo -u xenon /usr/local/bin/xenon-generate-svgs.
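The backfill above can be sketched as a small Python script. This is a hypothetical reconstruction, not the commands actually used: it assumes the generator was run after each batch, and it mirrors the shell glob with a prefix and suffix check because Python's glob has no brace expansion. Running the copy as the xenon user is left to whoever invokes it; with dry_run=True it only prints what it would do.

```python
import shutil
import subprocess
from pathlib import Path

BATCH_SIZE = 12  # "batches of a dozen files"
# Mirrors the shell glob 2018-*{all,load,index,api,RunSingleJob}.log
SUFFIXES = ("all.log", "load.log", "index.log", "api.log", "RunSingleJob.log")
GENERATE_CMD = ["sudo", "-u", "xenon", "/usr/local/bin/xenon-generate-svgs"]


def select_logs(src_dir):
    """Return matching daily logs, oldest first (names start with the date)."""
    return sorted(
        p for p in Path(src_dir).iterdir()
        if p.is_file() and p.name.startswith("2018-") and p.name.endswith(SUFFIXES)
    )


def plan_batches(files, size=BATCH_SIZE):
    """Split a list of files into consecutive chunks of at most `size`."""
    files = list(files)
    return [files[i:i + size] for i in range(0, len(files), size)]


def backfill(src_dir, dest_dir, dry_run=True):
    """Copy logs into dest_dir one batch at a time, regenerating SVGs after each."""
    batches = plan_batches(select_logs(src_dir))
    for batch in batches:
        for path in batch:
            if dry_run:
                print(f"copy {path} -> {dest_dir}")
            else:
                shutil.copy2(path, dest_dir)
        if dry_run:
            print("run " + " ".join(GENERATE_CMD))
        else:
            subprocess.run(GENERATE_CMD, check=True)
    return len(batches)
```

Small batches keep each generator run short and make it easy to stop part-way if something looks wrong.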

Krinkle moved this task from Blocked (old) to Doing (old) on the Performance-Team board. Aug 16 2018, 5:55 PM

Krinkle mentioned this in T176916: Set up sampling profiler for PHP 7 (alternative to HHVM Xenon). Oct 3 2018, 6:18 PM

Krinkle mentioned this in T196406: Decom "xhprof" viewer. Dec 22 2018, 12:34 AM

Change 503675 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] webperf: Remove arclamp subscriber from mwlog servers

https://gerrit.wikimedia.org/r/503675

Restricted Application added a subscriber: Gilles. Apr 13 2019, 10:04 PM

gerritbot added a project: Patch-For-Review. Apr 13 2019, 10:04 PM

Change 503675 merged by Effie Mouzeli:
[operations/puppet@production] webperf: Remove arclamp subscriber from mwlog servers