While moving graphite1001 to jessie, I noticed that the performance website is also hosted there, together with `coal`. `xenon` is instead hosted on fluorine (now mwlog1001), and `xhgui` and its `mongodb` are on tungsten. Since planning for next year's capex/opex is under way, I propose consolidating all performance-related software onto a VM or bare metal, depending on the required specs. Thoughts?
-------
### Sub tasks
* {icon check color=green} Write up understanding of the current (scattered) services. (Below: //Current services//)
* {icon check color=green} Figure out overall migration plan in terms of hardware. – (Below: //Proposed topology//)
* New stack:
* {icon check color=green} Request production VM for webperf (metrics processing). – T179036
* {icon check color=green} Migrate webperf from hafnium to webperf#001. – T186774
* {icon check color=green} Figure out how to migrate performance.wikimedia.org site.
* {icon check color=green} Migrate coal from graphite1001 to webperf#001. – T159354
* {icon check color=green} Request production VM for web apps (xhgui, xenon). – T194390
* {icon unlock color=orange} Migrate xhgui from tungsten to new VM.
* {icon lock color=red} Update wmf-config to write XWD profiles to new XHGUI location.
* {icon check color=green} Figure out how to migrate Xenon processing. (See "Moveability" below)
* {icon unlock color=orange} Move Xenon's ArcLamp and Apache-for-xenon from mwlog1001 to webperf1002.
* Old stack:
* {icon check color=green} Decom webperf/asset-check service. – T164419
* {icon check color=green} Decom webperf/ve service. – T175083
* {icon check color=green} Decom old xhprof viewer. – T196406
* {icon check color=green} Decom osmium.eqiad host. – T175093
* {icon unlock color=orange} Decom hafnium.eqiad host. – T193420
* {icon lock color=red} Decom tungsten.eqiad host. (Blocked - xhgui runs here)
### Current services
* performance.wikimedia.org: Static website. Proxied services: coal (local), xenon (mwlog1001), xhgui/xhprof (tungsten)
* Currently hosted on webperf1001.
* Code: [puppet:/role/performance/site](https://github.com/wikimedia/puppet/blob/57a55806d2/modules/role/manifests/performance/site.pp), [puppet:../apache/sites/performance.wikimedia.org](https://github.com/wikimedia/puppet/blob/57a55806d2/modules/role/templates/apache/sites/performance.wikimedia.org.erb)
* Code: [performance/docroot.git](https://github.com/wikimedia/performance-docroot)
* coal: EventLogging subscriber (Kafka) that processes Navigation Timing events and stores aggregated data in a dedicated Graphite backend.
* Currently on webperf1001.
* Code: [performance/coal](https://github.com/wikimedia/performance-coal)
* See {T159354}
* coal-web: Python HTTP service that serves JSON-formatted data from the coal storage.
* Currently on webperf1001
* Uses the Graphite HTTP API to pull raw data and format it correctly. The HTTP service is exposed via a private socket file, proxied from the performance.wikimedia.org Apache config (currently on the same server).
* Code: [performance/coal](https://github.com/wikimedia/performance-coal)
* xenon: Receives data from all app servers on a Redis instance (see [operations/mediawiki-config.git:/StartProfiler](https://github.com/wikimedia/operations-mediawiki-config/blob/7d3c586359d4a59841ce9432a7e67628a406b176/wmf-config/StartProfiler.php#L75-L137)). The xenon-log Python service subscribes to this Redis feed and produces searchable log files. A cron job (xenon-generate-svgs) periodically produces SVGs, which are stored in a local directory and made available on performance.wikimedia.org through a local Apache proxy.
* Aside from feeding the published SVGs (which don't need to be on mwlog1001), these log files are also manually searchable through a command-line tool, `xenon-grep`.
* Currently on mwlog1001.
* Code: [puppet:/role/xenon](https://github.com/wikimedia/puppet/blob/57a55806d2/modules/role/manifests/xenon.pp), [puppet:/modules/xenon](https://github.com/wikimedia/puppet/blob/f05eed605a09eb47afd04d5217e6ace34768733c/modules/xenon/manifests/init.pp), [puppet:/files/xenon-log](https://github.com/wikimedia/puppet/blob/57a55806d2f64c35ec2e66a4d84f4e3787d79844/modules/xenon/files/xenon-log), [../apache/sites/xenon](https://github.com/wikimedia/puppet/blob/57a55806d2f64c35ec2e66a4d84f4e3787d79844/modules/role/templates/apache/sites/xenon.erb)
* xhgui: A webapp for viewing and analyzing PHP profiling data. [Wikimedia-Debug](https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug) requests can generate a profile that is submitted to the XHGui's MongoDB database. ([StartProfiler](https://github.com/wikimedia/operations-mediawiki-config/blob/7d3c586359d4a59841ce9432a7e67628a406b176/wmf-config/StartProfiler.php#L141-L200)).
* Currently on tungsten.
* Code: [puppet:/role/xhgui](https://github.com/wikimedia/puppet/blob/f05eed605a09eb47afd04d5217e6ace34768733c/modules/role/manifests/xhgui/app.pp).
* webperf: EventLogging subscriber daemons that send data to Statsd/Graphite. `webperf::statsv`, `webperf::ve`, `webperf::navtiming`.
* Currently on hafnium.
* Code: [puppet:/role/webperf](https://github.com/wikimedia/puppet/blob/7a5f538f63574af5040cdc5b671dec76561b84a2/modules/role/manifests/webperf.pp)
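The xenon flow described above (samples arrive on a Redis feed, get written to searchable log files, and are later searched from the command line) can be sketched roughly as follows. This is only an illustration, not the actual `xenon-log` or `xenon-grep` code: the function names, the one-file-per-hour naming scheme, and the semicolon-separated sample format are all assumptions made for the example, and the Redis subscription is replaced by a plain iterable of messages so the sketch needs no external services.

```python
import datetime
import os


def hourly_log_path(base_dir, when):
    # Hypothetical naming scheme: one log file per UTC hour.
    return os.path.join(base_dir, when.strftime("%Y-%m-%d_%H") + ".log")


def append_samples(messages, base_dir, clock=None):
    """Append each incoming message to the log file for the hour it arrived in.

    `messages` stands in for the Redis feed; any iterable of strings works.
    Returns the number of samples written.
    """
    clock = clock or (lambda: datetime.datetime.now(datetime.timezone.utc))
    written = 0
    for message in messages:
        path = hourly_log_path(base_dir, clock())
        with open(path, "a", encoding="utf-8") as log:
            # Store one sample per line so the files stay grep-friendly.
            log.write(message.rstrip("\n") + "\n")
        written += 1
    return written


def grep_samples(base_dir, needle):
    """Count, per log file, how many logged samples mention `needle`."""
    counts = {}
    for name in sorted(os.listdir(base_dir)):
        with open(os.path.join(base_dir, name), encoding="utf-8") as log:
            counts[name] = sum(1 for line in log if needle in line)
    return counts
```

In this sketch the writer and the search tool only share a directory of text files, which is consistent with the note under "Moveability" that the log, SVG, and Apache parts can move independently of the Redis endpoint.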
### Moveability
>>! In T158837#3349354, @Krinkle wrote:
>
> * performance.wikimedia.org:
> * Currently on graphite1001.
> * Should be trivial to move to another server.
> * coal and coal-web:
> * Currently on graphite1001.
> * Could be moved to a separate server if that separate server becomes a secondary graphite backend (like graphite1003). But I'd rather not make our perf/misc server a production graphite backend. So, unless we allocate two servers, probably makes sense to keep coal-web on graphite1001 for now.
> * xenon:
> * Currently on mwlog1001. 4 things: 1) A Redis server for incoming data from app servers, 2) Process to create text files from Redis data, 3) Process to create SVG files, 4) Apache to serve these files.
> * 2, 3 and 4 are easy to move to another server. The Redis server, and therefore the part that is interacted with from production I'd prefer to keep on mwlog1001.
> * xhgui:
> * Currently on tungsten. Requires MongoDB, PHP, Apache.
> * Easily kept or moved, if another server would become the perf server.
> * webperf
> * Currently on hafnium.
> * Easily moved.
### Old topology
Based on T158837#3368030:
**graphite1001**: (Our stuff is minor/secondary)
* (Our) services: coal, coal-web, perf-site.
**mwlog1001**: (Our stuff is minor/secondary)
* (Our) services: Redis (endpoint receiving xenon data), xenon-log (reads Redis, writes TXT), xenon-generate-svgs (reads TXT, writes SVG), Apache (serves TXT and SVG, proxied from perf-site)
**tungsten** (former db server; old, should be decom)
* Spec: 16 cores, 64G RAM, 40G and 1.6T HDD
* Services: XHGui (MongoDB, Apache)
**osmium** (former app server):
* Spec: 16 cores, 64G RAM, 2x 500G HDD
* Services: – (unused, previously: visualeditor, devwiki, jsbench)
**hafnium** (misc; old; should be decom/replaced):
* Spec: 24 cores, 32G RAM, 50G HDD
* Services: webperf (navtiming, statsv)
### New topology
Based on T158837#3582514:
**webperf x001** (Ganeti VM - multi-dc)
* Specs: 4 vCPU, 8GB RAM, 50GB HDD
* Services: webperf/processors_and_site (perf-site, coal::processor, coal::web, statsv, navtiming)
**webperf x002** (Ganeti VM - multi-dc)
* Specs: 4 vCPU, 8GB RAM, 50GB HDD
* Services: webperf/profiling_tools (xhgui, xenon-log, xenon-generate-svgs, Apache-for-xenon)
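As a quick sanity check of the two topologies, the host-to-service lists above can be written down as plain mappings and compared. The sketch below is illustrative only: the normalized service names (for example `xenon-redis` for the Redis endpoint, and `coal`/`coal-web` for `coal::processor`/`coal::web`) are assumptions made for this example, not identifiers from Puppet.

```python
# Old topology, transcribed from the lists above (service names normalized by hand).
OLD_TOPOLOGY = {
    "graphite1001": {"coal", "coal-web", "perf-site"},
    "mwlog1001": {"xenon-redis", "xenon-log", "xenon-generate-svgs", "apache-for-xenon"},
    "tungsten": {"xhgui"},
    "osmium": set(),  # unused
    "hafnium": {"navtiming", "statsv"},
}

# New topology: two Ganeti VMs per data center.
NEW_TOPOLOGY = {
    "webperf x001": {"perf-site", "coal", "coal-web", "statsv", "navtiming"},
    "webperf x002": {"xhgui", "xenon-log", "xenon-generate-svgs", "apache-for-xenon"},
}


def unplaced_services(old, new):
    """Return {service: old_host} for services with no home in the new topology."""
    placed = set().union(*new.values())
    return {
        service: host
        for host, services in old.items()
        for service in services
        if service not in placed
    }


if __name__ == "__main__":
    print(unplaced_services(OLD_TOPOLOGY, NEW_TOPOLOGY))
```

With the mappings as transcribed here, the only service left without a new home is the Redis endpoint on mwlog1001, which matches the preference stated under "Moveability" to keep it there.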