When moving graphite1001 to jessie I noticed the performance website is also hosted there, together with coal. Xenon is instead hosted on fluorine (now mwlog1001), and xhgui and its MongoDB are on tungsten. Since capex/opex planning for next year is under way, I'm proposing to consolidate all performance-related software onto VMs or bare metal, depending on required specs. Thoughts?
Sub tasks
- Write up understanding of the current (scattered) services. (Below: Current services)
- Figure out overall migration plan in terms of hardware. (Below: New topology)
- New stack:
- Request production VM for webperf (metrics processing). – T179036
- Migrate webperf from hafnium to webperf#001. – T186774
- Figure out how to migrate performance.wikimedia.org site.
- Migrate coal from graphite1001 to webperf#001. – T159354
- Request production VM for web apps (xhgui, xenon). - T194390
- make xhgui::app role support stretch/buster and deploy on new xhgui machines - T238788
- Migrate xhgui from tungsten to new VM. - VMs created in T238098
- Update wmf-config to write XWD profiles to new XHGUI location.
- Figure out how to migrate Xenon processing. (See "Moveability" below)
- Move Xenon's ArcLamp and Apache-for-xenon from mwlog1001 to webperf1002.
- Old stack:
Current services
- performance.wikimedia.org: Static website. Proxied services: coal (local), xenon (mwlog1001), xhgui/xhprof (tungsten)
- Currently hosted on webperf1001.
- Code: puppet:/role/performance/site, puppet:../apache/sites/performance.wikimedia.org
- Code: performance/docroot.git
- coal: EventLogging subscriber (Kafka) that processes Navigation Timing events and stores aggregated data in a dedicated Graphite backend.
- Currently on webperf1001.
- Code: performance/coal
- See T159354: Move coal from graphite#001 nodes to webperf#001
- coal-web: Python HTTP service that serves JSON-formatted data from the coal storage.
- Currently on webperf1001
- Uses the Graphite HTTP API to pull raw data and format it. The HTTP service is exposed via a private socket file, proxied from the performance.wikimedia.org Apache config (currently on the same server).
- Code: performance/coal
- xenon: Receives data from all app servers on a Redis instance (see operations/mediawiki-config.git:/StartProfiler). The xenon-log Python service subscribes to this Redis feed and produces searchable log files. A cron job (xenon-generate-svgs) periodically produces SVGs, which are stored in a local directory and made available on performance.wikimedia.org through a local Apache proxy.
- Aside from the published SVGs (which don't need to be on mwlog1001), these log files are also manually searchable through the command-line tool xenon-grep.
- Currently on mwlog1001.
- Code: puppet:/role/xenon, puppet:/modules/xenon, puppet:/files/xenon-log, ../apache/sites/xenon
- xhgui: A webapp for viewing and analyzing PHP profiling data. Wikimedia-Debug requests can generate a profile that is submitted to XHGui's MongoDB database (see StartProfiler).
- Currently on tungsten.
- Code: puppet:/role/xhgui.
- webperf: EventLogging subscriber daemons that send data to Statsd/Graphite. webperf::statsv, webperf::ve, webperf::navtiming.
- Currently on hafnium.
- Code: puppet:/role/webperf
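As a rough illustration of the coal-web item above (pull raw data from the Graphite HTTP API, then format it), here is a minimal sketch of the formatting step only. It relies on the shape Graphite's render API returns with `format=json`: a list of series, each with a `target` and `datapoints` given as `[value, timestamp]` pairs. The function name `format_series`, the output layout, and the metric name `coal.responseStart` are hypothetical and are not taken from the performance/coal code.

```python
import json


def format_series(render_response):
    """Reshape a Graphite render (format=json) payload.

    Input:  [{"target": name, "datapoints": [[value, timestamp], ...]}, ...]
    Output: {name: [[timestamp, value], ...]} with null datapoints dropped.
    The output layout is an assumption made for this sketch.
    """
    formatted = {}
    for series in render_response:
        formatted[series["target"]] = [
            [timestamp, value]
            for value, timestamp in series["datapoints"]
            if value is not None  # Graphite reports gaps as null
        ]
    return formatted


if __name__ == "__main__":
    # Hypothetical payload; "coal.responseStart" is an illustrative metric name.
    sample = [
        {
            "target": "coal.responseStart",
            "datapoints": [[120.0, 1500000000], [None, 1500000060], [130.5, 1500000120]],
        }
    ]
    print(json.dumps(format_series(sample)))
```

Keeping the reshaping in a pure function like this would let it be tested without a Graphite backend, regardless of which host ends up serving it.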
Moveability
Old topology
Based on T158837#3368030:
graphite1001: (Our stuff is minor/secondary)
- (Our) services: coal, coal-web, perf-site.
mwlog1001: (Our stuff is minor/secondary)
- (Our) services: Redis (endpoint receiving xenon data), xenon-log (reads Redis, writes TXT), xenon-generate-svgs (reads TXT, writes SVG), Apache (serves TXT and SVG, proxied from perf-site)
tungsten (former db server; old; should be decom):
- Spec: 16 cores, 64G RAM, 40G and 1.6T HDD
- Services: XHGui (MongoDB, Apache)
osmium (former app server):
- Spec: 16 cores, 64G RAM, 2x 500G HDD
- Services: – (unused, previously: visualeditor, devwiki, jsbench)
hafnium (misc; old; should be decom/replaced):
- Spec: 24 cores, 32G RAM, 50G HDD
- Services: webperf (navtiming, statsv)
New topology
Based on T158837#3582514:
webperf x001 (Ganeti VM - multi-dc)
- Specs: 4 vCPU, 8GB RAM, 50GB HDD
- Services: webperf/processors_and_site (perf-site, coal::processor, coal::web, statsv, navtiming)
webperf x002 (Ganeti VM - multi-dc)
- Specs: 4 vCPU, 8GB RAM, 50GB HDD
- Services: webperf/profiling_tools (xhgui, arc-lamp, Apache for arc-lamp)
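To sanity-check that every service in the old topology has a home in the new one, the two lists can be written down as data and compared. The sketch below is only an illustration: host and service names are transcribed from the lists above, but service names are normalised by hand (for example, coal::processor is treated as coal and arc-lamp as xenon), and that normalisation is an assumption of this sketch. The helper names `invert` and `migration_plan` are hypothetical.

```python
# Old and new placement, transcribed from the topology lists above.
# Hand-normalised service names are an assumption of this sketch.
OLD = {
    "graphite1001": ["coal", "coal-web", "perf-site"],
    "mwlog1001": ["xenon"],
    "tungsten": ["xhgui"],
    "hafnium": ["navtiming", "statsv"],
}
NEW = {
    "webperf x001": ["perf-site", "coal", "coal-web", "statsv", "navtiming"],
    "webperf x002": ["xhgui", "xenon"],
}


def invert(topology):
    """Return {service: host}; fail if a service is listed on two hosts."""
    placement = {}
    for host, services in topology.items():
        for service in services:
            if service in placement:
                raise ValueError(f"{service} listed on {placement[service]} and {host}")
            placement[service] = host
    return placement


def migration_plan(old, new):
    """Return sorted (service, old_host, new_host) tuples for every old service."""
    old_placement, new_placement = invert(old), invert(new)
    missing = sorted(set(old_placement) - set(new_placement))
    if missing:
        raise ValueError(f"no new host for: {', '.join(missing)}")
    return sorted(
        (service, old_placement[service], new_placement[service])
        for service in old_placement
    )


if __name__ == "__main__":
    for service, source, destination in migration_plan(OLD, NEW):
        print(f"{service}: {source} -> {destination}")
```

If a service were dropped from the new layout by mistake, `migration_plan` would raise instead of silently omitting it.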