EQIAD & CODFW: 1 VM in each data center for xhprof/xhgui/other profiling tools
Closed, Resolved, Public

Description

Labs Project Tested: N/A
Site/Location: EQIAD & CODFW
Number of systems: 1 VM per site
Service: xhprof, xhgui, etc
Networking Requirements: internal:

  • will need complete access to/from webperfX001 machines
  • should be accessible on port 80 from all internal machines
  • should be accessible on port 27017 from all internal machines
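
One way to check the two port requirements above from another internal host is a plain TCP connect test. This is a minimal sketch using only Python's standard library; the helper name `port_open` and the hostname in the usage note are illustrative assumptions, not an existing tool.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

For example, `port_open("webperf1002.eqiad.wmnet", 80)` and `port_open("webperf1002.eqiad.wmnet", 27017)` would be expected to return True from an internal machine once the firewall rules are in place. A successful connect only shows the port is reachable, not that the service behind it is healthy.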

Processor Requirements: 4 VCPUs
Memory: 8GB
Disks: 50GB
Other Requirements:

I'm guessing it will make the most sense to label these webperf1002/webperf2002

Restricted Application added a subscriber: Aklapper. May 10 2018, 12:34 PM
Dzahn claimed this task. May 11 2018, 2:52 AM

Change 433287 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] introduce webperf1002 & webperf2002

https://gerrit.wikimedia.org/r/433287

Change 433287 merged by Dzahn:
[operations/dns@master] introduce webperf1002 & webperf2002

https://gerrit.wikimedia.org/r/433287

assigned IPs to webperf1002 and webperf2002:

eqiad forward
webperf1001.eqiad.wmnet has address 10.64.0.215
webperf1002.eqiad.wmnet has address 10.64.0.216

eqiad reverse
215.0.64.10.in-addr.arpa domain name pointer webperf1001.eqiad.wmnet.
216.0.64.10.in-addr.arpa domain name pointer webperf1002.eqiad.wmnet.

codfw forward
webperf2001.codfw.wmnet has address 10.192.0.96
webperf2002.codfw.wmnet has address 10.192.0.100

codfw reverse
96.0.192.10.in-addr.arpa domain name pointer webperf2001.codfw.wmnet.
100.0.192.10.in-addr.arpa domain name pointer webperf2002.codfw.wmnet.
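
The forward and reverse records above should mirror each other. As a rough sketch of how the reverse names follow from the assigned IPv4 addresses, the snippet below uses Python's standard `ipaddress` module. The `ASSIGNED` mapping and the helper names are illustrative only and are not taken from the operations/dns repository.

```python
import ipaddress

# Hostname -> IPv4 address, copied from the forward records in this task.
ASSIGNED = {
    "webperf1001.eqiad.wmnet": "10.64.0.215",
    "webperf1002.eqiad.wmnet": "10.64.0.216",
    "webperf2001.codfw.wmnet": "10.192.0.96",
    "webperf2002.codfw.wmnet": "10.192.0.100",
}


def reverse_name(ipv4: str) -> str:
    """Return the in-addr.arpa name used by the PTR record for this address."""
    return ipaddress.ip_address(ipv4).reverse_pointer


def ptr_records(assigned: dict) -> dict:
    """Map each reverse name to its fully qualified target (trailing dot)."""
    return {reverse_name(ip): host + "." for host, ip in assigned.items()}


if __name__ == "__main__":
    for name, target in ptr_records(ASSIGNED).items():
        print(f"{name} domain name pointer {target}")
```

Comparing this output with the `host` lookups above is a quick way to spot a forward record that lacks a matching PTR.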

Mentioned in SAL (#wikimedia-operations) [2018-05-15T23:06:18Z] <mutante> creating ganeti VM webperf1002.eqiad.wmnet on ganeti1004 (link: private, row: A, cpus: 4, ram: 8, disk: 50) (T194390)

Mentioned in SAL (#wikimedia-operations) [2018-05-15T23:33:35Z] <mutante> creating ganeti VM webperf2002.eqiad.wmnet on ganeti2004 (link: private, row: A, cpus: 4, ram: 8, disk: 50) (T194390)

Change 433298 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] add webperf1002/2002 as spare systems with IPv6

https://gerrit.wikimedia.org/r/433298

Change 433301 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] installserver: add webperf1002,webperf2002 to DHCP

https://gerrit.wikimedia.org/r/433301

Change 433301 merged by Dzahn:
[operations/puppet@production] installserver: add webperf1002,webperf2002 to DHCP

https://gerrit.wikimedia.org/r/433301

Change 433298 merged by Dzahn:
[operations/puppet@production] add webperf1002/2002 as spare systems with IPv6

https://gerrit.wikimedia.org/r/433298

Change 433575 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add IPv6 records for webperf1002/webperf2002

https://gerrit.wikimedia.org/r/433575

Change 433575 merged by Dzahn:
[operations/dns@master] add IPv6 records for webperf1002/webperf2002

https://gerrit.wikimedia.org/r/433575

Change 433590 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] webperf: basic role for profiling tools, add ferm rules

https://gerrit.wikimedia.org/r/433590

Change 433590 merged by Dzahn:
[operations/puppet@production] webperf: add ferm rules and basic role/profile for xhgui

https://gerrit.wikimedia.org/r/433590

Dzahn added a comment. May 17 2018, 4:33 PM

Hi @Imarlier,

So, see all of the above:

  • VMs created with the specs as requested (4 vCPUs, 8G RAM, 50G disk), one in each DC, same partman recipe as webperf1001/2001
  • added to DNS as webperf1002 and webperf2002, on private networks, with IPv6 records as well (with mapped addresses)
  • added to puppet / site.pp with firewalling; they got all the standard packages etc.
  • created a new basic role/profile webperf::profiling_tools that opens the requested firewall holes and installs Apache:

should be accessible on port 80 from all internal machines
should be accessible on port 27017 from all internal machines

[webperf1002:~] $ sudo iptables -L | grep "dpt:http"
ACCEPT     tcp  --  10.0.0.0/8           anywhere             tcp dpt:http
[webperf1002:~] $ sudo iptables -L | grep "dpt:27017"
ACCEPT     tcp  --  10.0.0.0/8           anywhere             tcp dpt:27017

Comment: so far I took "from all internal machines" literally, even though port 80 might only need "$CACHE_MISC" (only the Varnish caching servers talking to backends).
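
To make the effect of the two ACCEPT rules concrete, here is a small sketch that models them with Python's standard `ipaddress` module. It is a simplification that covers only the 10.0.0.0/8 source range and the two ports shown in the iptables output. The names `INTERNAL`, `OPEN_PORTS`, and `is_allowed` are illustrative, and other rules on the host (for example, for ssh) are deliberately not modelled.

```python
import ipaddress

# Source range shown in the iptables output above.
INTERNAL = ipaddress.ip_network("10.0.0.0/8")
# Ports requested in the task description.
OPEN_PORTS = {80, 27017}


def is_allowed(src_ip: str, dst_port: int) -> bool:
    """Return True if the two rules above would accept this TCP connection."""
    return dst_port in OPEN_PORTS and ipaddress.ip_address(src_ip) in INTERNAL
```

Under this model, `is_allowed("10.192.0.100", 27017)` is True, while a source outside 10.0.0.0/8 is not matched by either rule. Narrowing port 80 to the cache servers would mean replacing `INTERNAL` with a smaller set of networks for that port.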

  • added Apache with the modules php7.0, authnz_ldap, rewrite (because I saw that xhgui on the old server needs them; it uses php5, authnz_ldap, rewrite)
  • I'm aware of the existing "role::xhgui::app", but it requires php5 and would have to be refactored into a profile anyway, so I created this new one as a starting point and placeholder. The expectation is that the rest of the old role class moves over, and that xhprof and the other profiling tools each get their own profile, all included in the new role(webperf::profiling_host). There should be just this one role per node, and the httpd class must be declared in it.

This leaves me with one question:

will need complete access to/from webperfX001 machines

What kind of access did you mean here? ssh? rsync of files?

That said, I think this ticket is resolved: the VMs are ready to be used, and the rest should continue on T158837, since this was technically just the request to create them. I wanted to check all the boxes, including the ferm rules. Seems good?

Dzahn closed this task as Resolved. May 17 2018, 4:33 PM

Change 433738 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] webperf::profiling_tools: add webperf admin groups

https://gerrit.wikimedia.org/r/433738

Change 433738 merged by Dzahn:
[operations/puppet@production] webperf::profiling_tools: add perfteam admins

https://gerrit.wikimedia.org/r/433738

Mentioned in SAL (#wikimedia-releng) [2018-07-03T03:49:07Z] <Krinkle> Create deployment-webperf12 as equivalent of webperf1002/webperf2002 in prod (T195312, T194390)