Deploy ORES extension to beta cluster
Closed, Resolved (Public)

Description

As pointed out by @hashar, we should deploy the ORES extension to the beta cluster before setting it up in production.

https://www.mediawiki.org/wiki/Writing_an_extension_for_deployment#Deploy_to_beta_cluster_on_Labs

AIUI, the ores.wmflabs.org service is based on data from production wikis, so if we wanted to truly set this up in beta cluster, we'd need to have an ORES instance that uses data from the beta wikis...and I don't know how difficult that will be.

Event Timeline

We can use "testwiki" (e.g. http://ores.wmflabs.org/scores/testwiki/damaging/34567/ ). The score is the last two digits of the rev id, reversed. I think that would be enough for the beta cluster.
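To make the testwiki rule concrete, here is a minimal Python sketch of "last two digits of the rev id, backwards". The function names are hypothetical, and reading the two digits as a fraction (e.g. "76" as 0.76) is an assumption; the comment above only says the score is those digits reversed.

```python
def reversed_last_two_digits(rev_id: int) -> str:
    """Return the last two decimal digits of rev_id in reverse order."""
    if rev_id < 0:
        raise ValueError("rev_id must be non-negative")
    # Zero-pad so single-digit rev ids still yield two digits.
    last_two = f"{rev_id % 100:02d}"
    return last_two[::-1]


def expected_testwiki_score(rev_id: int) -> float:
    """Assumption: the reversed digits are read as a fraction of 100."""
    return int(reversed_last_two_digits(rev_id)) / 100


if __name__ == "__main__":
    # Rev id from the example URL: last two digits "67", reversed "76".
    print(reversed_last_two_digits(34567))
    print(expected_testwiki_score(34567))
```

A check like this could be used to verify that the extension stores what the fake testwiki model returns, without depending on real production data.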

Change 272466 had a related patch set uploaded (by Ladsgroup):
Deploy ORES extension to Wikipedia project in beta cluster

https://gerrit.wikimedia.org/r/272466

Hello! My mail to wikitech-l was a bit too short; I lacked the time to lay out what I am going to write down here. Thanks @Legoktm for the task.

The idea of the beta cluster is to provide a platform on which teams can prepare a service before it is included in production. Since it runs on labs instances and has no impact on the real sites, it is much more liberal and lets you iterate / refine easily. A key point is that more people can get root, and pushing a change is more or less automated.

I was looking at the T106867: [Epic] Deploy Revscoring/ORES service in Prod task, which has a few child tasks. They are meant for production but can just as well be duplicated for the beta cluster if we want to set up an instance of ORES there. Taking them one by one:

T124203: Setup varnish endpoint for ORES

Beta has Varnish caches for mobile/text/upload/parsoid which are set up using the puppet recipes from production. They run the same version of Varnish and the same VCL. Each instance has a frontend varnish and a backend varnish, and events are sent to kafka/eventlogging just like in production.

There is no misc-varnish on the beta cluster though.

T124202: Setup LVS for ORES

No LVS on the beta cluster due to labs network limitations. There is no IPv6 either. I don't think it is going to be much of a concern, and we can rule LVS out for ORES on the beta cluster.

Backends

Then we have two tasks that are about setting up the ORES backends.

One is to have ORES on the scb cluster (T124201). The services team has set up an scb on the beta cluster and has migrated / is migrating its services to it. The services run from the master branches of their /deploy repos, which is a way for teams to verify on beta that a change works fine before deploying to production. You could also run directly from the source 'master' branch to track the development branch continuously.

Later the deployment tool scap3 will be used to automatically update the services on the beta cluster whenever a change is merged (either in the source repo or the /deploy repo, up to the teams).

The other is about the Redis servers for oresdb. They can be spawned as standalone instances in the beta cluster.

For both, you will most probably need puppet manifests to implement the definition of the service. Doing it on beta lets you use the platform as a sandbox to refine your puppet manifests. You can deploy the puppet patches before they are reviewed/merged by ops, and you have root on the instances. That eases the grunt work.

Once you achieve a workable setup, you can drop all the instances and rebuild them from scratch, relying solely on the puppet recipes. That helps dramatically when the service is later deployed to prod, since 99% of mistakes/issues are dealt with before even starting the prod deployment.

T107493

ORES has a bunch of Debian package dependencies. That is a requirement for production: we don't just pip install. Whenever the packages are ready, they can be installed on beta cluster instances and the service fully reinstalled to make sure both the puppet manifests and the dependencies are in order.

All of the above sounds scary, but all the work done on the beta cluster is a step toward production. I believe it is easier to iterate on labs instances with root access than directly on production, where a lot of actions would depend on having shell access and most probably root on the servers. It is also usually much more comfortable to know you are not going to break prod while doing integration work :-)

Maybe setting up a whole ORES cluster on the beta cluster is overkill, though I am sure it pays off in the long term and reduces the chance that an incident will screw up the production setup. An alternative could be to have the beta cluster hit the currently existing ORES labs project. That is simpler to set up (just set $wgOresBaseUrl) but does not offer an env close to production.

Don't get me wrong: I love ORES, and I don't want its deployment to be delayed. But I believe a "short" sprint to get it on the beta cluster will give us much more freedom in the long term, will clarify a lot of questions still pending, and will help make the actual production deployment as flawless as possible.

Oh @hashar: This task is about deploying the ORES extension into prod, not the ORES service itself. The ORES service is being moved by the operations team and I have no clue how they are doing it (so ores.wmflabs.org goes into ores.wikimedia.org), but this task is about the ORES extension, which uses the ORES service, stores its data in a table, shows it, etc.

Yup, I did the comparison with the production tasks on purpose. If we want to set up ORES on the beta cluster, we will basically need to replicate the same setup production is going to use. My point above is that most of the work can be handled on the beta cluster first, which would then ease the production deployment.

OK, here is a summary of my discussion with @hashar:

  • There are two products that are moving to production: one is the ORES service (currently at ores.wmflabs.org) and the other one is the ORES extension
  • The extension only gets the data from the ORES service, stores it, and makes a GUI out of it
  • So the extension relies on the service and should not be moved to production until the service is there
  • Here is the question: what if we want the extension in the beta cluster? Is it really necessary to wait until the service is moved to the production cluster, or at least to the beta cluster?
  • Here is the second question: should we have the service tested in the beta cluster, or move directly to prod?

@Ladsgroup and @Halfak further clarified on IRC. There is an ORES service on labs reachable at http://ores.wmflabs.org/ . The ORES MediaWiki extension uses it by default.

Since it takes rather long to set up a new ORES service on beta, it seems easier to have the extension use that labs service right now. This way the code is running on the beta cluster.

Whenever a new ORES service is set up on the beta cluster, we can switch the URL easily. We are not there yet.
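As an illustration of why switching is cheap, here is a minimal Python sketch (not the extension's actual code) that builds a score URL from a configurable base URL, playing the role of $wgOresBaseUrl. The path layout is taken from the testwiki example URL earlier in this task; the function name and the beta host below are hypothetical.

```python
from urllib.parse import urljoin


def score_url(base_url: str, wiki: str, model: str, rev_id: int) -> str:
    """Build <base>/scores/<wiki>/<model>/<rev_id>/ from a base URL."""
    # Ensure a trailing slash so urljoin appends instead of replacing
    # the last path segment of the base URL.
    base = base_url if base_url.endswith("/") else base_url + "/"
    return urljoin(base, f"scores/{wiki}/{model}/{rev_id}/")


if __name__ == "__main__":
    # Today: the existing labs service.
    print(score_url("http://ores.wmflabs.org/", "testwiki", "damaging", 34567))
    # Later: a hypothetical beta cluster host; only the base URL changes.
    print(score_url("http://ores.beta.example/", "testwiki", "damaging", 34567))
```

Only the base URL differs between the two calls, which is the point: the extension side would not need to change when a beta ORES service appears.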

Change 272466 had a related patch set uploaded (by Hashar):
Deploy ORES extension to Wikipedia project in beta cluster

https://gerrit.wikimedia.org/r/272466

Change 272466 merged by jenkins-bot:
Deploy ORES extension to Wikipedia project in beta cluster

https://gerrit.wikimedia.org/r/272466

OK, everything is there but we can't see it in beta features because of caching. I hope we can fix it soon.

What is the caching issue about? It seems to me beta features are registered via the GetBetaFeaturePreferences hook. If BetaFeatures caches the list somehow and does not invalidate its cache when a new hook is registered, that is worth filing a bug against BetaFeatures.

@Halfak @Ladsgroup I guess we want another task to get the ORES service deployed to beta as well, with a copy-paste of my Monday comment at T127661#2051073