eqiad: 1 VM request for doc.wikimedia.org
Closed, Resolved (Public)

Description

Labs Project Tested: production migration (T137890)
Site/Location: eqiad
Number of systems: 1
Service: https://doc.wikimedia.org
Networking Requirements: ssh/rsync from contint1001.wikimedia.org (208.80.154.17), HTTP for Varnish caches (text-lb?)
Processor Requirements: 4
Memory: 4 GBytes
Disks: 150 GBytes
Other Requirements: Stretch for php7.0

https://doc.wikimedia.org/ hosts documentation and coverage reports for several software projects. It is currently hosted on contint1001.wikimedia.org, which also hosts Jenkins, Zuul and Docker for the deployment pipeline.

The content is generated by CI jobs on WMCS instances, which rsync the artifacts to a proxy instance on WMCS: integration-publishing02. A job is then triggered on contint1001 to fetch the artifacts and copy them to the Apache docroot, effectively publishing them.

When overhauling the CI stack in 2016, we identified the need to move doc.wikimedia.org to a machine other than the one running the CI stack (Jenkins, Zuul). Notably:

  • Only doc.wikimedia.org requires PHP, but the machine runs Jessie and lacks php7.0. That breaks the oojs demos and probably others (T206046).
  • Although the content is reviewed and given Code-Review +2 by project owners, the code runs on a production machine that also hosts Jenkins/Zuul/Docker, which might be a security risk. It seems safer to have the code executed on a different machine.
  • Whenever the CI machine is under maintenance, doc.wikimedia.org is unavailable.

On contint1001.wikimedia.org, /srv/org/wikimedia/doc occupies 32 GBytes. I have requested 150 GBytes to accommodate the operating system and potential future growth.
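As a rough check of that sizing, here is a minimal sketch of the arithmetic. The variable names are illustrative only, the figures are the two quoted above, and the space taken by the operating system is not known here, so the headroom is an upper bound.

```python
# Figures quoted in this request, in GBytes.
current_usage = 32    # /srv/org/wikimedia/doc on contint1001
requested_disk = 150  # disk size requested for the new VM

# Space left for the operating system and future growth.
headroom = requested_disk - current_usage
# How many times the current content would fit on the requested disk.
growth_factor = requested_disk / current_usage

print(f"headroom: {headroom} GBytes")
print(f"requested disk is {growth_factor:.1f}x the current usage")
```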

4 CPUs and 4 GBytes of RAM sound sufficient. Not much server-side activity is expected; most of the content is static files.

We need it to be Stretch to get php7.0.

If at all possible, I would like the instance to be fairly isolated from the rest of the network (prevent egress).

We would need Bacula backups.

The envisioned flow:

hashar created this task. Dec 14 2018, 1:52 PM
hashar renamed this task from eqiad: 1 VM %request for doc.wikimedia.org to eqiad: 1 VM request for doc.wikimedia.org.
hashar updated the task description.
Dzahn added a subscriber: Dzahn. Dec 14 2018, 2:03 PM
Dzahn moved this task from Backlog to Doing on the serviceops board.
Dzahn claimed this task. Dec 17 2018, 6:32 PM
Dzahn triaged this task as Normal priority.

Change 480151 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] introduce doc1001.eqiad.wmnet, assign 10.64.0.142

https://gerrit.wikimedia.org/r/480151

Change 480151 merged by Dzahn:
[operations/dns@master] introduce doc1001.eqiad.wmnet, assign 10.64.0.142

https://gerrit.wikimedia.org/r/480151

Mentioned in SAL (#wikimedia-operations) [2018-12-17T20:17:53Z] <mutante> creating new ganeti VM doc1001.eqiad.wmnet for doc.wikimedia.org - specs as requested by hashar on T211974

Change 480247 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: add doc1001 to DHCP and netboot

https://gerrit.wikimedia.org/r/480247

Change 480247 merged by Dzahn:
[operations/puppet@production] install_server: add doc1001 to DHCP and netboot

https://gerrit.wikimedia.org/r/480247

Change 480254 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] create skeleton role for docs.wikimedia.org

https://gerrit.wikimedia.org/r/480254

Change 480254 merged by Dzahn:
[operations/puppet@production] create skeleton role for docs.wikimedia.org

https://gerrit.wikimedia.org/r/480254

Change 480261 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add IPv6 records for doc1001.eqiad.wmnet.

https://gerrit.wikimedia.org/r/480261

Change 480261 merged by Dzahn:
[operations/dns@master] add IPv6 records for doc1001.eqiad.wmnet.

https://gerrit.wikimedia.org/r/480261

Change 480270 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] add httpd for doc.wm.org, using php-fpm, php7.2

https://gerrit.wikimedia.org/r/480270

Change 480270 merged by Dzahn:
[operations/puppet@production] add httpd for doc.wm.org, using php-fpm, php7.2

https://gerrit.wikimedia.org/r/480270

Change 480536 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] cache/trafficserver: switch doc.wikimedia.org to doc1001 backend

https://gerrit.wikimedia.org/r/480536

Change 480539 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] doc: add ensure /srv/org/wikimedia/doc exists and has backup

https://gerrit.wikimedia.org/r/480539

Change 480539 merged by Dzahn:
[operations/puppet@production] doc: ensure /srv/org/wikimedia/doc exists and has backup

https://gerrit.wikimedia.org/r/480539

Dzahn added a comment. Edited Dec 18 2018, 5:37 PM

@hashar Here you go:

  • doc1001.eqiad.wmnet
  • stretch
  • php 7.2, not just 7.0
  • php-fpm, not mod_php anymore
  • /srv/org/wikimedia/doc/ exists, > 130G free
  • /srv/org/wikimedia in Bacula
  • ferm for HTTP from only CACHE servers
  • varnish / trafficserver patch pending
  • I was planning to also add "rsync from contint1001 to doc1001", but you explained to me that contint1001 should push instead, so I skipped that
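The free-space item in the list above can be spot-checked from the VM itself. This is a minimal sketch, not part of the actual Puppet work: the helper names `free_gib` and `has_room` are hypothetical, the 32 threshold is the current docs size quoted in the task description, and the script falls back to the current directory when the docroot does not exist on the machine running it.

```python
import os
import shutil

DOCROOT = "/srv/org/wikimedia/doc"  # docroot path from the list above
NEEDED_GIB = 32                     # approximate size of the docs on contint1001


def free_gib(path):
    """Return the free space of the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path, needed_gib):
    """Return True if `path` has at least `needed_gib` GiB free."""
    return free_gib(path) >= needed_gib


if __name__ == "__main__":
    # Fall back to the current directory so the sketch runs anywhere.
    target = DOCROOT if os.path.isdir(DOCROOT) else "."
    print(f"{target}: {free_gib(target):.1f} GiB free")
    print(f"room for {NEEDED_GIB} GiB: {has_room(target, NEEDED_GIB)}")
```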

Change 480573 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] doc: add rsyncd config to let contint servers push docs

https://gerrit.wikimedia.org/r/480573

Change 480657 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Overhaul publishing

https://gerrit.wikimedia.org/r/480657

Change 480573 merged by Dzahn:
[operations/puppet@production] doc: add rsyncd config to let contint servers push docs

https://gerrit.wikimedia.org/r/480573

Mentioned in SAL (#wikimedia-operations) [2018-12-19T00:11:18Z] <mutante> contint1001 - rsyncing /srv/org/wikimedia/docs to rsync://docs1001.eqiad.wmnet/docs T211974

With Gerrit #480573 doc1001.eqiad.wmnet now has rsyncd with a doc module. Tested by @Dzahn :

[contint1001:~] $ rsync -avp ./foo/ rsync://doc1001.eqiad.wmnet/doc
sending incremental file list
./
test

sent 125 bytes  received 46 bytes  342.00 bytes/sec
total size is 0  speedup is 0.00


   0 -rw-r--r-- 1 doc-uploader wikidev    0 Dec 19 00:05 test
root@doc1001:/srv/org/wikimedia/doc#

He then rsynced all the data from contint1001 with rsync -avp /srv/org/wikimedia/doc/ rsync://doc1001.eqiad.wmnet/doc.
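For scripted pushes, the same invocation could be wrapped as follows. This is only a sketch under stated assumptions: the function names `build_push_command` and `push` are hypothetical and not taken from integration/config, the optional `--dry-run` flag is added here for safe testing, and `-p` is dropped because rsync's `-a` already implies it.

```python
import shlex
import subprocess


def build_push_command(source_dir, host, module, dry_run=False):
    """Build an rsync command that pushes source_dir to an rsyncd module."""
    # The trailing slash makes rsync copy the directory contents,
    # not the directory itself, as in the command quoted above.
    source = source_dir.rstrip("/") + "/"
    command = ["rsync", "-av"]
    if dry_run:
        command.append("--dry-run")  # list what would be sent, copy nothing
    command += [source, f"rsync://{host}/{module}"]
    return command


def push(source_dir, host, module, dry_run=False):
    """Run the push and raise CalledProcessError if rsync fails."""
    subprocess.run(build_push_command(source_dir, host, module, dry_run), check=True)


if __name__ == "__main__":
    # Only print the command here; call push() to actually run it.
    cmd = build_push_command(
        "/srv/org/wikimedia/doc", "doc1001.eqiad.wmnet", "doc", dry_run=True
    )
    print(shlex.join(cmd))
```

Building the argument list separately from running it keeps the command easy to inspect before anything is sent to doc1001.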

Change 480715 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] doc: set cluster and notification groups

https://gerrit.wikimedia.org/r/480715

Change 480716 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] doc: grant doc-uploader access to contint users

https://gerrit.wikimedia.org/r/480716

Change 480715 merged by Dzahn:
[operations/puppet@production] doc: set cluster and notification groups

https://gerrit.wikimedia.org/r/480715

The VM is working and the basic service (rsyncd) is there. I will complete the service implementation via the parent task T137890.

What is left to be done is to grant us some kind of shell access: https://gerrit.wikimedia.org/r/480716

Might want to verify that the https header is properly supported :)

Change 480798 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] doc: grant doc-uploader access to contint users

https://gerrit.wikimedia.org/r/480798

Change 480716 merged by Dzahn:
[operations/puppet@production] doc: grant access to contint-admins to doc1001.eqiad.wmnet

https://gerrit.wikimedia.org/r/480716

Change 480802 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] doc: add Apache config for doc.wikimedia.org

https://gerrit.wikimedia.org/r/480802

Change 480802 merged by Dzahn:
[operations/puppet@production] doc: add Apache config for doc.wikimedia.org

https://gerrit.wikimedia.org/r/480802

Dzahn added a comment. Dec 19 2018, 6:43 PM

Shell access for the existing groups contint-admins and contint-users has been granted (the same access people had before on contint*).

Dzahn closed this task as Resolved. Dec 19 2018, 7:46 PM

Yes, the VM has been created, the basic role has been created, users added, httpd installed and data rsynced. Per Hashar in T211974#4834808, I am going to call this part resolved here and continue on T137890.

Thank you for spinning up the instance so quickly, as well as for all the preliminary Puppet work. Much appreciated :)