
Create a production test wiki in group0 to parallel Wikimedia Commons
Closed, ResolvedPublic

Description

The Structured Data on Commons (SDC) project will change the setup of Commons considerably. Most importantly, Commons will be the only wiki in the wmf cluster running the WikibaseMediaInfo extension, and it will be the only wiki that is a Wikibase repo and also a client to another Wikibase repo.

  • Domain: test-commons.wikimedia.org
  • Database: testcommonswiki
  • Config: as similar to commonswiki as possible

Having a test-commons.wikimedia.org in deployment group 0 on the live cluster will allow us to do two things:

  1. Try out the new configuration before deploying it to Commons. This also includes testing the MCR schema migration before enabling the extension.
  2. Later, once SDC is deployed on Commons, having test-commons.wikimedia.org will ensure that weekly deployments won't break the unique setup on Commons.


Event Timeline


So, summary: given those limitations of the production testwiki setup (mostly the lack of real user traffic), what is the plan for testing against this wiki, and could it also (or instead) be done in the Beta Cluster?

There is no concrete plan for testing against this wiki. Ideally, we'd have a bunch of browser tests running against all the wikis on the beta cluster. Ideally, we'd have browser tests covering the SDC functionality there. That's up to Multimedia though.

Agreed, let us know if you/whoever needs a hand at getting started on those. I *think* there's some knowledge within the Multimedia team on browser testing already.

My intention behind requesting this to be created was to have some kind of early warning in case some deploy later could break the special functionality on commons. But the decision of whether this kind of safeguard is needed is really up to the Release-Engineering-Team. If you say beta is enough, that's fine with me. For MCR, this is not needed. Perhaps consult with @Abit and @Ramsey-WMF before making a call on this.

Mostly it's: If there's testing being done then it should be done in Beta initially and we'll see if production testing is needed on top of that. Most times it isn't if the Beta wiki is set up the same (we all remember situations where it wasn't quite the same :) ).

> Do you believe that the test wikis on the live cluster should be removed [..]

Yes and no.

Yes, I'd prefer we not see test wikis as a way to catch regressions during normal deployments, because almost none of our procedures (apart from train/group0) include test wikis. And for train/group0 we have other wikis that serve this purpose better, e.g. www.mediawiki.org. I don't see this as a problem, because test wikis are in production; that's too late for testing. At this point, we might as well perform QA against an actual production wiki that doesn't have different or dated configuration.

No, I'd prefer we keep at least test.wikipedia and test2.wikipedia for testing with multiversion. This is why test2.wikipedia is on group2 (or group1?) instead of group0. Although I do think that in the long term we should phase this out, especially now that we have a Beta Cluster and X-Wikimedia-Debug. Beta can be used to enable upcoming features before their deployment in prod, and XWD can be used to verify changes in prod during a deployment.
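As a side note, the X-Wikimedia-Debug mechanism mentioned above works by sending a request header that routes the request to a chosen debug backend. A minimal sketch: the header name is real, but the backend hostname and URL below are placeholders, and the request is only constructed here, never sent.

```python
import urllib.request

# Sketch only: build a request carrying the X-Wikimedia-Debug header.
# The backend value is a placeholder hostname, not a recommendation.
req = urllib.request.Request(
    "https://test.wikipedia.org/wiki/Main_Page",
    headers={"X-Wikimedia-Debug": "backend=mwdebug1001.eqiad.wmnet"},
)

# urllib normalizes stored header names via str.capitalize()
print(req.get_header("X-wikimedia-debug"))
```

A real deployment check would send this via a browser extension or curl so the response comes from the named debug server instead of the general pool.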

@MarkTraceur, any movement on this after you talked to Greg? Anything I can do to help?

After speaking with @MarkTraceur we agreed to setup Beta Cluster in such a way to mirror what you'll be doing in production. Happy to help think that through.

Thanks all.

> After speaking with @MarkTraceur we agreed to setup Beta Cluster in such a way to mirror what you'll be doing in production. Happy to help think that through.

Is there a task for this? I'd be interested to read what the plan is to have a Beta Cluster wiki running fundamentally different code months in advance of production.

> After speaking with @MarkTraceur we agreed to setup Beta Cluster in such a way to mirror what you'll be doing in production. Happy to help think that through.

And now, after speaking with @Jdforrester-WMF and @MarkTraceur and others and learning more about the situation (the multiple config changes being interdependent in weird ways for SDC, MCR, etc) I am un-declining this request.

The agreed-upon plan is to create this test wiki the week of January 2nd (probably on that date, a Wednesday) to give time to test before rolling out the week after.

James also plans to remove this newly created wiki by the end of this fiscal year (i.e. the end of June 2019).

Change 481795 had a related patch set uploaded (by Reedy; owner: Reedy):
[operations/dns@master] Add testcommons.wikimedia.org

https://gerrit.wikimedia.org/r/481795

Change 481796 had a related patch set uploaded (by Reedy; owner: Reedy):
[operations/puppet@production] Add testcommons.wikimedia.org to prod_sites.pp

https://gerrit.wikimedia.org/r/481796

Change 482019 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[operations/mediawiki-config@master] Initial configuration for test-commons.wikimedia.org

https://gerrit.wikimedia.org/r/482019

Change 481795 merged by Effie Mouzeli:
[operations/dns@master] Add test-commons.wikimedia.org

https://gerrit.wikimedia.org/r/481795

Mentioned in SAL (#wikimedia-operations) [2019-01-03T15:31:01Z] <jijiki> Disabling puppet on mw servers to test 481796 - T197616

Change 481796 merged by Effie Mouzeli:
[operations/puppet@production] Add test-commons.wikimedia.org to prod_sites.pp

https://gerrit.wikimedia.org/r/481796

Mentioned in SAL (#wikimedia-operations) [2019-01-03T15:43:19Z] <jijiki> Enabled puppet on mw servers after merging 481796 - T197616

Status?

https://test-commons.wikimedia.org/ is live (DNS and apache are up)

James looks to have made the mw-config patch. Just addWiki.php to run at this point

He was trying to nerd snipe me into creating the wiki, which is when I made the dns and the apache patches

I don't mind running addWiki.php (though it looks like James has a question over whether we put it on s4 like commons, or just s3 - one for the DBAs, I guess) to finish it off

Obviously some follow-up config will then be needed to make it into a media repo properly, and to make other wikis use test-commons as their media repo

CC'ing DBA for them to confirm whether we should use s3 or s4

Change 482019 merged by jenkins-bot:
[operations/mediawiki-config@master] Initial configuration for test-commons.wikimedia.org

https://gerrit.wikimedia.org/r/482019

Mentioned in SAL (#wikimedia-operations) [2019-01-03T20:09:31Z] <reedy@deploy1001> Synchronized dblists/: T197616 (duration: 00m 45s)

Mentioned in SAL (#wikimedia-operations) [2019-01-03T20:11:08Z] <reedy@deploy1001> rebuilt and synchronized wikiversions files: T197616

Mentioned in SAL (#wikimedia-operations) [2019-01-03T20:12:25Z] <reedy@deploy1001> Synchronized multiversion/MWMultiVersion.php: T197616 (duration: 00m 44s)

Mentioned in SAL (#wikimedia-operations) [2019-01-03T20:13:37Z] <reedy@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T197616 (duration: 00m 44s)

Change 482134 had a related patch set uploaded (by Reedy; owner: Reedy):
[operations/mediawiki-config@master] Remove excess wiki suffix

https://gerrit.wikimedia.org/r/482134

Change 482134 merged by jenkins-bot:
[operations/mediawiki-config@master] Remove excess wiki suffix

https://gerrit.wikimedia.org/r/482134

Change 482135 had a related patch set uploaded (by Reedy; owner: Reedy):
[operations/mediawiki-config@master] Add testcommonswiki to db-*.php

https://gerrit.wikimedia.org/r/482135

Change 482135 merged by jenkins-bot:
[operations/mediawiki-config@master] Add testcommonswiki to db-*.php

https://gerrit.wikimedia.org/r/482135

Mentioned in SAL (#wikimedia-operations) [2019-01-03T20:23:31Z] <reedy@deploy1001> Synchronized wmf-config/db-eqiad.php: T197616 (duration: 00m 44s)

Mentioned in SAL (#wikimedia-operations) [2019-01-03T20:24:29Z] <reedy@deploy1001> Synchronized wmf-config/db-codfw.php: T197616 (duration: 00m 44s)

Change 482139 had a related patch set uploaded (by Reedy; owner: Reedy):
[operations/mediawiki-config@master] Set $wgMultiContentRevisionSchemaMigrationStage = SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_OLD

https://gerrit.wikimedia.org/r/482139
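The migration stage named in that patch is a bitfield: separate write and read flags combined with bitwise OR. A small sketch of how such a combined stage behaves — the flag values below are illustrative, not necessarily the constants MediaWiki core defines:

```python
# Illustrative SCHEMA_COMPAT_* flag values; the real constants are
# defined in MediaWiki core and may differ.
SCHEMA_COMPAT_WRITE_OLD = 0x01
SCHEMA_COMPAT_READ_OLD = 0x02
SCHEMA_COMPAT_WRITE_NEW = 0x10
SCHEMA_COMPAT_READ_NEW = 0x20
SCHEMA_COMPAT_WRITE_BOTH = SCHEMA_COMPAT_WRITE_OLD | SCHEMA_COMPAT_WRITE_NEW

# The stage from the patch: write both schemas, read the old one.
stage = SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_OLD

# Code paths test individual bits, so during migration both schemas are
# kept in sync while readers still use the old schema.
writes_old = bool(stage & SCHEMA_COMPAT_WRITE_OLD)  # True
writes_new = bool(stage & SCHEMA_COMPAT_WRITE_NEW)  # True
reads_new = bool(stage & SCHEMA_COMPAT_READ_NEW)    # False
print(writes_old, writes_new, reads_new)
```

This is why the stage can be advanced one bit at a time (write both → read new → write new only) without a flag day.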

Change 482257 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/services/parsoid@master] Add test-commons.wikimedia.org

https://gerrit.wikimedia.org/r/482257

Change 482274 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/restbase/deploy@master] Config: Add test-commons.wikimedia.org

https://gerrit.wikimedia.org/r/482274

> Status?
>
> https://test-commons.wikimedia.org/ is live (DNS and apache are up)
>
> James looks to have made the mw-config patch. Just addWiki.php to run at this point
>
> He was trying to nerd snipe me into creating the wiki, which is when I made the dns and the apache patches
>
> I don't mind running addWiki.php (though, looks like James has a ? over whether we put it on s4 like commons, or just s3 - question for DBA I guess) to finish it off
>
> Obviously then some followup config will be needed to make it into a media repo properly, and make other wikis into using test-commons as their media repo

I guess s3 should be the place, as we already have there:

testwiki
test2wiki
testwikidatawiki

Change 482257 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Add test-commons.wikimedia.org

https://gerrit.wikimedia.org/r/482257

I agree with Manuel T197616, I would have preferred creating it on s3 for isolation reasons: enwiki, commons and wikidata require more resources than the typical project due to their high throughput, and they were on purpose set on dedicated hardware. I understand that you want a setup as similar as possible to the actual commonswiki, but from our point of view, s0 deployments are the ones more likely to create outages, and the above wikis, plus metawiki and centralauth, are on purpose kept separate from group0 ones to minimize impact. Also, the above 3 wikis have a large amount of hardware behind them, which makes testcommonswiki overprovisioned in some aspects.

Based on comments that removing the wiki is going to happen, I won't ask to move it, but please note that removing wikis is also not a trivial task due to out-of-band changes (ES, wikidata, etc.). I would strongly recommend at the very least moving it to s3 (or a potential future s0) before archival, when that happens. [*s0 is the idea (not yet a firm proposal) of an extra group for closed and other low-bandwidth wikis, to avoid taking resources from active ones]

The main issue I want to bring up here is, I think, a lack of communication: I (and the other DBAs) only learned about this when we got an alert about "data on labsdbs that shouldn't be there" (the new wiki).

> Based on comments that removing the wiki is going to happen, I won't ask to move it, but please note that removing wikis is also not a trivial task due to out-of-band changes (ES, wikidata, etc.). I would strongly recommend at the very least moving it to s3 (or a potential future s0) before archival, when that happens.

I'm happy to do the work to move it to s3 right now. We could just drop all the tables in s4 and create it anew in s3, as the wiki doesn't have any content yet.

Sorry about this.

I wonder why this was created on s4 when we were asked if we preferred s3, and I replied a day after that question that s3 would be the best place from our point of view?

I've been told there was some breakage based on assuming s4 ==> commons, or commons ==> s4. As I said, I am not too worried about a temporary project, but the assumption of that in code or configuration is worrying, as it would not be unthinkable that we move commonswiki to a separate group in the future.

> I wonder why this was created on s4 when we were asked if we preferred s3, and I replied a day after that question that s3 would be the best place from our point of view?

Question was asked at ~19:00 UTC on 2019-01-03, you replied at ~15:00 on 2019-01-04, but it was created an hour after you were asked at ~20:15 UTC on 2019-01-03.

> I wonder why this was created on s4 when we were asked if we preferred s3, and I replied a day after that question that s3 would be the best place from our point of view?

It was already created by that time

Partially, but not completely; see also T213096: Fix places where mw-config assumes "s4" means Commons

Waiting for only one hour is not realistic, especially if asked at 19:00 UTC.
On the 4th I was on holiday (I am still on holiday), but I replied to it as I didn't want to block this for too long. So we replied in less than 24h, but I didn't notice it was already done.

Change 482139 abandoned by Jforrester:
Set $wgMultiContentRevisionSchemaMigrationStage = SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_OLD

Reason:
Not needed.

https://gerrit.wikimedia.org/r/482139

Jdforrester-WMF claimed this task.
Jdforrester-WMF removed a project: Patch-For-Review.
Jdforrester-WMF updated the task description. (Show Details)

Change 482274 merged by Mobrovac:
[mediawiki/services/restbase/deploy@master] Config: Add test-commons.wikimedia.org

https://gerrit.wikimedia.org/r/482274

The database is still present on the s4 servers, and I would like to clean it up.
It cannot be done directly on the master, as it would break the multisource slaves (dbstore1002 and the labs hosts).
I can delete it host by host, that is fine, but I would like @Jdforrester-WMF to confirm it is already on s3 and can be safely deleted from the s4 servers.

By the way, are people aware that a shard called "test-s4" has 2 dedicated large hosts and is ready to be used for production? I think it was used by Anomie and DanielK to test MCR; could it be shared for whatever testcommonswiki is being used for?

Mentioned in SAL (#wikimedia-operations) [2019-01-08T16:00:10Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@503b29c]: Add test-commons and nap.wikisource - T210752 T197616

Mentioned in SAL (#wikimedia-operations) [2019-01-08T17:37:00Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@503b29c]: Add test-commons and nap.wikisource - T210752 T197616 (duration: 96m 50s)

I just synced up with @Jdforrester-WMF; we are re-closing this. We will have a proper task to delete the wiki once we are ready for it.
Thanks!

Is this intentionally missing from the replicas?

MariaDB [testwiki_p]> USE testcommonswiki_p;
ERROR 1044 (42000): Access denied for user 's52657'@'%' to database 'testcommonswiki_p'
MariaDB [testwiki_p]> SELECT * FROM meta_p.wiki WHERE url LIKE "%test-commons%";
Empty set (0.04 sec)

This breaks some tools that go off of the CentralAuth API and then attempt to query the replicas. But, maybe such tools should instead use meta_p.wiki :)

> Is this intentionally missing from the replicas?
>
> MariaDB [testwiki_p]> USE testcommonswiki_p;
> ERROR 1044 (42000): Access denied for user 's52657'@'%' to database 'testcommonswiki_p'
> MariaDB [testwiki_p]> SELECT * FROM meta_p.wiki WHERE url LIKE "%test-commons%";
> Empty set (0.04 sec)
>
> This breaks some tools that go off of the CentralAuth API and then attempt to query the replicas. But, maybe such tools should instead use meta_p.wiki :)

Don’t think anyone has filed a task requesting the views on labs to be created. It doesn’t happen automatically, hence it’s not working

I don’t know if it is needed/wanted

> Is this intentionally missing from the replicas?
>
> MariaDB [testwiki_p]> USE testcommonswiki_p;
> ERROR 1044 (42000): Access denied for user 's52657'@'%' to database 'testcommonswiki_p'
> MariaDB [testwiki_p]> SELECT * FROM meta_p.wiki WHERE url LIKE "%test-commons%";
> Empty set (0.04 sec)
>
> This breaks some tools that go off of the CentralAuth API and then attempt to query the replicas. But, maybe such tools should instead use meta_p.wiki :)
>
> Don’t think anyone has filed a task requesting the views on labs to be created. It doesn’t happen automatically, so hence it not working
>
> I don’t know if it is needed/wanted

It probably should be treated as automatically required by procedure for all non-private wikis

> Don’t think anyone has filed a task requesting the views on labs to be created. It doesn’t happen automatically, so hence it not working

I'm aware :) I saw T213295 so wasn't sure if we were intentionally skipping this step (I say "step" because it seems this is seldom forgotten).

> I don’t know if it is needed/wanted

It would be of narrow interest, for sure. No one asked me about it, I just saw the errors in my logs. For now I'm skipping over projects that can't be found in meta_p.wiki. Nothing urgent.
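The "skipping over projects that can't be found in meta_p.wiki" approach can be sketched as filtering candidate databases against the wikis meta_p actually lists. This is a sketch with sample data: `known_wikis` stands in for the result of a real `SELECT dbname FROM meta_p.wiki` query on the replicas, and the function name is mine.

```python
# Sketch: skip wikis whose _p views don't exist on the replicas yet,
# avoiding the "Access denied" error shown above for testcommonswiki_p.
def queryable_wikis(candidates, known_wikis):
    """Return only the databases that meta_p.wiki lists as queryable."""
    return [db for db in candidates if db in known_wikis]

# Sample data standing in for "SELECT dbname FROM meta_p.wiki".
known_wikis = {"testwiki", "test2wiki", "testwikidatawiki"}
print(queryable_wikis(["testwiki", "testcommonswiki"], known_wikis))
```

A tool driven by the CentralAuth API would run this filter before opening a replica connection, rather than treating the access error as fatal.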

First of all, I am only commenting because I have more information; access handling is owned by the cloud team.

> It probably should be treated as automatically required by procedure for all non-private wikis

This is on purpose not automatic, so that a human has to manually grant permissions to access new wikis. In other words, the permissions are a whitelist to prevent, generically, exposing information that should not be shared. There are specific examples where this created problems in the past, but to give one of many examples: sometimes databases or wikis are created by mistake, and an automatic process would not take that into account. Of course, there are other mechanisms to prevent private data leaking, but this is one of many layers. I know it creates annoyances, but please realize that a service can "fail" many times and, as long as it eventually works, that is ok; private data only has to fail once to be leaked forever.

Having said that, we can do better, and you can help with that, for example by setting up some monitoring on tools, or with reports like this one. I am not sure of the status of the documentation, but last time I checked there was a part about "notifying the DBAs and cloud for wiki sanitization", the lack of which I complained about at T197616#4859391.

Whatever decision you take, if it should be added, file a ticket with Data-Services / cloud-services-team requesting the exposure of the new wiki.

> Don’t think anyone has filed a task requesting the views on labs to be created. It doesn’t happen automatically, so hence it not working
>
> I'm aware :) I saw T213295 so wasn't sure if we intentionally were skipping this step (I say "step" because it seems this is seldom forgotten).
>
> I don’t know if it is needed/wanted
>
> It would be of narrow interest, for sure. No one asked me about it, I just saw the errors in my logs. For now I'm skipping over projects that can't be found in meta_p.wiki. Nothing urgent.

Yeah, as the wiki is for testing only and is going to be closed soon anyway, I had vaguely decided that it wasn't valuable. However, if there's a real need I'd be happy to make the Cloud/DBA tasks and pester people. :-)