MediaWiki deploy servers should not be mediawiki installation targets
Closed, ResolvedPublic

Description

Right now the MediaWiki deploy servers (e.g., deploy1002) are included in /etc/dsh/group/mediawiki-installation. This means deploy servers have both /srv/mediawiki-staging and /srv/mediawiki directories. This can result in confusion of the type described in T253547.

Proposal:

  • Exclude the deploy servers from /etc/dsh/group/mediawiki-installation.
  • On deploy servers: Move the existing /srv/mediawiki directory out of the way and make /srv/mediawiki a symlink to /srv/mediawiki-staging.
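
A minimal sketch of the second step of the proposal, assuming it is done by hand rather than through Puppet (the Puppet patches in the timeline below are what was actually deployed). The function name `replace_dir_with_symlink` and the `.old` backup suffix are hypothetical, chosen for illustration only. The demo runs in a temporary directory, not in /srv.

```python
import os
import tempfile


def replace_dir_with_symlink(path, target, backup_suffix=".old"):
    """Move whatever is at `path` aside, then make `path` a symlink to `target`.

    Returns the backup path, or None if nothing had to be moved.
    Hypothetical helper for illustration; not taken from Scap or Puppet.
    """
    if os.path.islink(path):
        return None  # Already a symlink; leave it untouched.
    backup = None
    if os.path.lexists(path):
        backup = path + backup_suffix
        if os.path.lexists(backup):
            raise FileExistsError(f"backup path already exists: {backup}")
        os.rename(path, backup)  # Keep the old tree until someone verifies the result.
    os.symlink(target, path)
    return backup


if __name__ == "__main__":
    # Demonstrate on a throwaway tree instead of the real /srv directories.
    with tempfile.TemporaryDirectory() as root:
        staging = os.path.join(root, "mediawiki-staging")
        live = os.path.join(root, "mediawiki")
        os.mkdir(staging)
        os.mkdir(live)
        backup = replace_dir_with_symlink(live, staging)
        print("backup:", backup)
        print("symlink target:", os.readlink(live))
```

The rename followed by the symlink creation is not atomic, so there is a brief window in which `path` does not exist. The sketch accepts that in exchange for keeping a backup that can be removed once the result has been checked.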

Event Timeline

Krinkle renamed this task from Mediawiki deploy servers should not be mediawiki installation targets to MediaWiki deploy servers should not be mediawiki installation targets. Feb 16 2023, 6:13 PM
Krinkle updated the task description.
Krinkle moved this task from Limbo to Perf recommendation on the Performance-Team (Radar) board.

Mentioned in SAL (#wikimedia-operations) [2023-03-13T15:21:19Z] <dancy@deploy2002> Started scap: testing T329857

Mentioned in SAL (#wikimedia-operations) [2023-03-13T15:31:28Z] <dancy@deploy2002> Finished scap: testing T329857 (duration: 10m 08s)

dancy opened https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/103

Exclude deploy servers from target list if /srv/mediawiki is a symlink

jnuche merged https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/103

Exclude deploy servers from target list if /srv/mediawiki is a symlink
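
The idea named in the merge request title can be sketched roughly as follows. This is not the code from the merge request: `filter_targets`, its parameters, and the host names in the demo are assumptions made for illustration. The sketch only shows the rule "if /srv/mediawiki is a symlink on this host, do not sync to this host".

```python
import os
import tempfile


def filter_targets(targets, local_host, mediawiki_dir="/srv/mediawiki"):
    """Return `targets` without `local_host` when `mediawiki_dir` is a symlink.

    Hypothetical helper: a symlinked deploy directory already points at the
    staging tree, so syncing to the local host would be redundant.
    """
    if os.path.islink(mediawiki_dir):
        return [host for host in targets if host != local_host]
    return list(targets)


if __name__ == "__main__":
    hosts = ["deploy1002.eqiad.wmnet", "mwmaint1002.eqiad.wmnet"]
    with tempfile.TemporaryDirectory() as root:
        staging = os.path.join(root, "mediawiki-staging")
        link = os.path.join(root, "mediawiki")
        os.mkdir(staging)
        os.symlink(staging, link)
        # The local deploy host is dropped because `link` is a symlink.
        print(filter_targets(hosts, "deploy1002.eqiad.wmnet", link))
```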

Mentioned in SAL (#wikimedia-operations) [2023-03-13T17:15:27Z] <dancy@deploy2002> Started scap: testing T329857

Mentioned in SAL (#wikimedia-operations) [2023-03-13T17:22:22Z] <dancy@deploy2002> Finished scap: testing T329857 (duration: 06m 54s)

Change 901667 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] Add setting to make /srv/mediawiki -> /srv/mediawiki-staging on deploy servers

https://gerrit.wikimedia.org/r/901667

Change 901676 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] Experiment: Remove deploy1002/deploy2002 from mediawiki-installation dsh group

https://gerrit.wikimedia.org/r/901676

Change 901667 merged by Clément Goubert:

[operations/puppet@production] Add setting to make /srv/mediawiki -> /srv/mediawiki-staging on deploy servers

https://gerrit.wikimedia.org/r/901667

Mentioned in SAL (#wikimedia-operations) [2023-04-03T13:50:54Z] <claime> Testing deploy server dsh group inclusion - T329857

Change 901676 merged by Clément Goubert:

[operations/puppet@production] Experiment: Remove deploy1002/deploy2002 from mediawiki-installation dsh group

https://gerrit.wikimedia.org/r/901676

deploy2002:

Notice: /Stage[main]/Scap::Dsh/Scap::Dsh::Group[mediawiki-installation]/Confd::File[/etc/dsh/group/mediawiki-installation]/File[/etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl]/content: 
--- /etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl     2023-03-20 20:29:22.934389074 +0000
+++ /tmp/puppet-file20230403-32678-1okp5h1      2023-04-03 13:53:13.660934333 +0000
@@ -7,8 +7,6 @@
 cloudweb1003.wikimedia.org
 cloudweb1004.wikimedia.org
 cloudweb2002-dev.wikimedia.org
-deploy1002.eqiad.wmnet
-deploy2002.codfw.wmnet
 mwmaint1002.eqiad.wmnet
 mwmaint2002.codfw.wmnet
 scandium.eqiad.wmnet

Info: Computing checksum on file /etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl
Info: /Stage[main]/Scap::Dsh/Scap::Dsh::Group[mediawiki-installation]/Confd::File[/etc/dsh/group/mediawiki-installation]/File[/etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl]: Filebucketed /etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl to puppet with sum ab57d93fdbf1598fb7078912cf2b272a
Notice: /Stage[main]/Scap::Dsh/Scap::Dsh::Group[mediawiki-installation]/Confd::File[/etc/dsh/group/mediawiki-installation]/File[/etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl]/content: content changed '{md5}ab57d93fdbf1598fb7078912cf2b272a' to '{md5}4922782aad3f36dc2d0f2fe62372e2a3'
Info: Confd::File[/etc/dsh/group/mediawiki-installation]: Scheduling refresh of Service[confd]
Notice: /Stage[main]/Confd/Base::Service_unit[confd]/Service[confd]: Triggered 'refresh' from 1 event
Notice: Applied catalog in 40.76 seconds
cgoubert@deploy2002:~$ grep deploy /etc/dsh/group/mediawiki-installation

deploy1002:

cgoubert@deploy1002:~$ grep deploy /etc/dsh/group/mediawiki-installation

Test done, patch reverted.
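
The grep checks above could also be scripted. This is a minimal sketch, assuming the dsh group file lists one host per line and that lines starting with `#` are comments. The helper name `hosts_matching` is hypothetical, and the sample data is a subset of the hosts visible in the Puppet diff. Unlike a plain grep, the sketch ignores comment lines.

```python
def hosts_matching(group_text, needle):
    """Return hosts in a dsh group file whose names contain `needle`."""
    hosts = []
    for line in group_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and (assumed) comment lines.
        if needle in line:
            hosts.append(line)
    return hosts


# Sample contents modelled on the hosts shown in the Puppet diff.
GROUP_WITH_DEPLOY = """\
cloudweb2002-dev.wikimedia.org
deploy1002.eqiad.wmnet
deploy2002.codfw.wmnet
mwmaint1002.eqiad.wmnet
"""

GROUP_WITHOUT_DEPLOY = """\
cloudweb2002-dev.wikimedia.org
mwmaint1002.eqiad.wmnet
"""

if __name__ == "__main__":
    print(hosts_matching(GROUP_WITH_DEPLOY, "deploy"))
    print(hosts_matching(GROUP_WITHOUT_DEPLOY, "deploy"))
```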

deploy2002:

Notice: /Stage[main]/Scap::Dsh/Scap::Dsh::Group[mediawiki-installation]/Confd::File[/etc/dsh/group/mediawiki-installation]/File[/etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl]/content:
--- /etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl     2023-04-03 13:53:13.700934508 +0000
+++ /tmp/puppet-file20230403-5284-14h3zc8       2023-04-03 13:58:12.614235535 +0000
@@ -7,6 +7,8 @@
 cloudweb1003.wikimedia.org
 cloudweb1004.wikimedia.org
 cloudweb2002-dev.wikimedia.org
+deploy1002.eqiad.wmnet
+deploy2002.codfw.wmnet
 mwmaint1002.eqiad.wmnet
 mwmaint2002.codfw.wmnet
 scandium.eqiad.wmnet

Info: Computing checksum on file /etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl
Info: /Stage[main]/Scap::Dsh/Scap::Dsh::Group[mediawiki-installation]/Confd::File[/etc/dsh/group/mediawiki-installation]/File[/etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl]: Filebucketed /etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl to puppet with sum 4922782aad3f36dc2d0f2fe62372e2a3
Notice: /Stage[main]/Scap::Dsh/Scap::Dsh::Group[mediawiki-installation]/Confd::File[/etc/dsh/group/mediawiki-installation]/File[/etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl]/content: content changed '{md5}4922782aad3f36dc2d0f2fe62372e2a3' to '{md5}ab57d93fdbf1598fb7078912cf2b272a'
Info: Confd::File[/etc/dsh/group/mediawiki-installation]: Scheduling refresh of Service[confd]
Notice: /Stage[main]/Confd/Base::Service_unit[confd]/Service[confd]: Triggered 'refresh' from 1 event
Notice: Applied catalog in 42.01 seconds
cgoubert@deploy2002:~$ grep deploy /etc/dsh/group/mediawiki-installation
deploy1002.eqiad.wmnet
deploy2002.codfw.wmnet

deploy1002:

Notice: /Stage[main]/Scap::Dsh/Scap::Dsh::Group[mediawiki-installation]/Confd::File[/etc/dsh/group/mediawiki-installation]/File[/etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl]/content: 
--- /etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl     2023-04-03 13:53:43.292813651 +0000
+++ /tmp/puppet-file20230403-17648-15z6t1l      2023-04-03 13:58:27.258069371 +0000
@@ -7,6 +7,8 @@
 cloudweb1003.wikimedia.org
 cloudweb1004.wikimedia.org
 cloudweb2002-dev.wikimedia.org
+deploy1002.eqiad.wmnet
+deploy2002.codfw.wmnet
 mwmaint1002.eqiad.wmnet
 mwmaint2002.codfw.wmnet
 scandium.eqiad.wmnet

Info: Computing checksum on file /etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl
Info: /Stage[main]/Scap::Dsh/Scap::Dsh::Group[mediawiki-installation]/Confd::File[/etc/dsh/group/mediawiki-installation]/File[/etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl]: Filebucketed /etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl to puppet with sum 4922782aad3f36dc2d0f2fe62372e2a3
Notice: /Stage[main]/Scap::Dsh/Scap::Dsh::Group[mediawiki-installation]/Confd::File[/etc/dsh/group/mediawiki-installation]/File[/etc/confd/templates/_etc_dsh_group_mediawiki-installation.tmpl]/content: content changed '{md5}4922782aad3f36dc2d0f2fe62372e2a3' to '{md5}ab57d93fdbf1598fb7078912cf2b272a'
Info: Confd::File[/etc/dsh/group/mediawiki-installation]: Scheduling refresh of Service[confd]
Notice: /Stage[main]/Confd/Base::Service_unit[confd]/Service[confd]: Triggered 'refresh' from 1 event
Notice: Applied catalog in 46.29 seconds
cgoubert@deploy1002:~$ grep deploy /etc/dsh/group/mediawiki-installation
deploy1002.eqiad.wmnet
deploy2002.codfw.wmnet

Change 905297 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] beta: Enable /srv/mediawiki symlink on deployment-deploy03

https://gerrit.wikimedia.org/r/905297

Change 905304 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] mediawiki::scap: Ensure Exec['fetch_mediawiki'] resource always exists

https://gerrit.wikimedia.org/r/905304

Change 905304 merged by Clément Goubert:

[operations/puppet@production] mediawiki::scap: Ensure Exec['fetch_mediawiki'] resource always exists

https://gerrit.wikimedia.org/r/905304

Change 905297 merged by Clément Goubert:

[operations/puppet@production] beta: Enable /srv/mediawiki symlink on deployment-deploy03

https://gerrit.wikimedia.org/r/905297

Change 906051 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] mediawiki::scap: force creation of the symlink when enabled

https://gerrit.wikimedia.org/r/906051

Change 906051 merged by Clément Goubert:

[operations/puppet@production] mediawiki::scap: force creation of the symlink when enabled

https://gerrit.wikimedia.org/r/906051

dancy triaged this task as Medium priority. Apr 5 2023, 3:52 PM

Changes have been deployed to beta cluster.

Change 905983 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] Revert "mediawiki::scap: force creation of the symlink when enabled"

https://gerrit.wikimedia.org/r/905983

Change 905983 merged by Clément Goubert:

[operations/puppet@production] Revert "mediawiki::scap: force creation of the symlink when enabled"

https://gerrit.wikimedia.org/r/905983

@dancy I've tested this in the Beta Cluster as follows, on the deployment-deploy03 host:

krinkle@deployment-deploy03:~$ PHP='php -d auto_prepend_file=/srv/mediawiki/wmf-config/PhpAutoPrepend.php' mwscript showJobs.php --wiki testwiki --profiler text
0
<!--
100.00% 130.019      1 - main()
  ..
  5.72% 7.439      1 - ShowJobs::execute
  2.20% 2.856      1 - Maintenance::shutdown
  0.80% 1.039     99 - JobQueue::__construct
-->

It 1) doesn't crash, and 2) also produces profiling output, given that this host doubles as both the mwdeploy and mwdebug roles. The same command currently crashes in production on deploy2002, as per T253547#8576045. After this change, I expect that it will not crash in production and will produce profiling output only on mwdebug.

For context, the -d parameter is a php.ini override that we're looking to puppetize by default on all MW servers (and indirectly also on the deploy host). Hence it's important that it not crash, though it only needs to produce profiling output on mwdebug, where php-tideways is installed. We already puppetize this INI setting for all web-based php-fpm contexts. What we're adding is puppetizing it for php-cli as well, so it will affect the deployment server and Scap's use of maintenance scripts.

Long story short: LGTM, achieves the intended effect from my POV!

Great news @Krinkle! I will advance this change to production.

Change 909302 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] Enable /srv/mediawiki symlink on prod deploy servers

https://gerrit.wikimedia.org/r/909302

Mentioned in SAL (#wikimedia-operations) [2023-04-24T08:09:34Z] <claime> Deploying 909302 on deploy1002 for T329857

Mentioned in SAL (#wikimedia-operations) [2023-04-24T08:10:29Z] <claime> Disabling puppet on deploy2002 - T329857

Change 909302 merged by Clément Goubert:

[operations/puppet@production] Enable /srv/mediawiki symlink on prod deploy servers

https://gerrit.wikimedia.org/r/909302

Mentioned in SAL (#wikimedia-operations) [2023-04-24T08:14:23Z] <claime> Deploying 909302 on deploy2002 for T329857

cgoubert@deploy1002:~$ PHP='php -d auto_prepend_file=/srv/mediawiki/wmf-config/PhpAutoPrepend.php' \
>    mwscript showJobs.php --wiki testwiki --profiler text
0
cgoubert@deploy1002:~$ ls -l /srv/mediawiki
lrwxrwxrwx 1 root root 22 Apr 24 08:13 /srv/mediawiki -> /srv/mediawiki-staging
cgoubert@deploy2002:~$ PHP='php -d auto_prepend_file=/srv/mediawiki/wmf-config/PhpAutoPrepend.php' \
>    mwscript showJobs.php --wiki testwiki --profiler text
0
cgoubert@deploy2002:~$ ls -l /srv/mediawiki
lrwxrwxrwx 1 root root 22 Apr 24 08:15 /srv/mediawiki -> /srv/mediawiki-staging

Mentioned in SAL (#wikimedia-operations) [2023-04-24T08:18:16Z] <cgoubert@deploy2002> Started scap: testing T329857

Mentioned in SAL (#wikimedia-operations) [2023-04-24T08:32:45Z] <cgoubert@deploy2002> Finished scap: testing T329857 (duration: 14m 29s)

Everything looks good from my side. Tell me when you have a chance to check, and I can remove the backup dirs.

Change 911259 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] P:mediawiki::common: Remove deploy check_dsh_group

https://gerrit.wikimedia.org/r/911259

Change 911259 merged by Clément Goubert:

[operations/puppet@production] P:mediawiki::common: Remove deploy check_dsh_group

https://gerrit.wikimedia.org/r/911259

dancy opened https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/128

Revert "Exclude deploy servers from target list if /srv/mediawiki is a symlink"

dancy merged https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/128

Revert "Exclude deploy servers from target list if /srv/mediawiki is a symlink"

@Clement_Goubert I noticed the /srv/mediawiki.old.20230424.T329857 directory on deploy1002.eqiad.wmnet today. It's safe to delete.

Done.