
ORES services should have vagrant roles
Closed, ResolvedPublic

Description

ORES and wikilabels should have MediaWiki-Vagrant roles to ease setup.

  • merge https://gerrit.wikimedia.org/r/#/c/341294/
  • change virtualenv defaults to use the shared user (no reason not to, and installing to shared folders with the root user causes weird bugs)
  • follow-up patch for jessie branch (for upstart changes), try to do wheel-based install
  • make sure dependencies are represented correctly in all packages (esp. these), maybe figure out what went wrong with recursive installation of dependencies
  • add a --non-interactive option to wikilabels' load_schema
  • make ORES extension use local ORES server when present
  • use ChangePropagation when present

ORES components supported by our vagrant roles

  • Scoring API service
  • Extension:ORES (todo: point at internal ORES service)
  • Celery
  • Wiki Labels
  • Reference UI

Event Timeline

Restricted Application added a subscriber: Aklapper.

Wikilabels is a stand-alone tool in labs. I'm not sure that it makes sense to have a vagrant role.

ORES could have a vagrant role. I'm guessing we'll want to set up a service for ORES. We could also just allow the ORES extension to hit ores.wikimedia.org for "testwiki".

> Wikilabels is a stand-alone tool in labs. I'm not sure that it makes sense to have a vagrant role.

How do you test changes to it? Easy setup of a development environment for various tools is exactly what vagrant is for. (Could be a vanilla vagrant box and not MediaWiki-Vagrant, but putting it into something that already exists is less effort.)

> I'm guessing we'll want to set up a service for ORES. We could also just allow the ORES extension to hit ores.wikimedia.org for "testwiki".

I was thinking of putting the ORES extension and the ORES service in separate roles. The extension would use ores.wikimedia.org with "testwiki" if the service role is not enabled, and the local service if it is.

Halfak triaged this task as Low priority.Mar 4 2017, 5:55 PM
Tgr updated the task description.
Tgr updated the task description.

Change 351153 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/vagrant@master] Trusty backcompat patch for ORES

https://gerrit.wikimedia.org/r/351153

Change 351153 abandoned by Gergő Tisza:
Trusty backcompat patch for ORES

Reason:
Wrong branch.

https://gerrit.wikimedia.org/r/351153

@awight, I think you're the only ORES local who uses vagrant on a regular basis. Could you give this patchset a test and report back?

Change 341294 merged by jenkins-bot:
[mediawiki/vagrant@master] Add role for ORES and Wikilabels services

https://gerrit.wikimedia.org/r/341294

I think I mentioned this in code review, but with a fresh vagrant checkout and VM, I get:

> default: Execution of '/bin/systemctl start ores-wsgi' returned 6: Failed to start ores-wsgi.service: Unit mediawiki-ready.service failed to load: No such file or directory.

Running provision a second time works, so this is probably just a missing dependency.

I think the ORES service might not be running.

vagrant roles list -e
Enabled roles:

ores                       ores_service               wikilabels
grep -r ores::port puppet/hieradata/common.yaml 
ores::port: 18880

Inside the box:

sudo netstat -anp | grep 1888
tcp        0      0 0.0.0.0:18881           0.0.0.0:*               LISTEN      2978/python3
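The same check can be scripted instead of reading netstat output. This is a minimal sketch, not part of the vagrant role: `port_is_open` is a hypothetical helper name, and the two port numbers are simply the ones quoted above (18880 from hieradata, 18881 from netstat).

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 18880 is the configured ores::port; 18881 is what netstat showed.
    for port in (18880, 18881):
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(port, state)
```

Run inside the box, this only says whether something accepts connections on each port, not which service it is.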

> default: Execution of '/bin/systemctl start ores-wsgi' returned 6: Failed to start ores-wsgi.service: Unit mediawiki-ready.service failed to load: No such file or directory.

> Running provision a second time works, so this is probably just a missing dependency.

Interesting. The documentation for mediawiki::ready_service does say that "The puppet manifest that provisions the service should require this class as well", but none of the puppet classes seem to do that. Did you get the error by enabling the role before the initial vagrant up?

Yep, the process was roughly,

  • vagrant role enable ...
  • vagrant destroy
  • vagrant box update
  • vagrant up

> I think the ORES service might not be running.

After a successful provisioning? Can you check systemctl status ores-wsgi / systemctl status ores-celery?

vagrant provision
==> default: Running provisioner: lsb_check...
==> default: Running provisioner: shell...
    default: Running: /tmp/vagrant-shell20170626-2651-kriyco.sh
==> default: Running provisioner: puppet...
==> default: Running Puppet with site.pp...
==> default: Info: Loading facts
==> default: Notice: Compiled catalog for mediawiki-vagrant.dev in environment production in 2.46 seconds
==> default: Info: Applying configuration version '1498513608.8cd77a3c'
==> default: Notice: /Stage[main]/Ores/Exec[pip_install_revscoring_dependencies_hack]/returns: executed successfully
==> default: Notice: /Stage[main]/Ores/Systemd::Service[ores-celery]/Service[ores-celery]/ensure: ensure changed 'stopped' to 'running'
==> default: Info: /Stage[main]/Ores/Systemd::Service[ores-celery]/Service[ores-celery]: Unscheduling refresh on Service[ores-celery]
==> default: Notice: /Stage[main]/Ores/Systemd::Service[ores-wsgi]/Service[ores-wsgi]/ensure: ensure changed 'stopped' to 'running'
==> default: Info: /Stage[main]/Ores/Systemd::Service[ores-wsgi]/Service[ores-wsgi]: Unscheduling refresh on Service[ores-wsgi]
==> default: Notice: /Stage[main]/Role::Ores/Mediawiki::Maintenance[check ORES model versions]/Exec[check ORES model versions]/returns: executed successfully
==> default: Notice: Finished catalog run in 5.69 seconds

Suspiciously, I get that same output each time I try to provision, so the service must be dying immediately. Investigating...

Looks pretty simple, actually! Missing Python dependency? Here are snippets from syslog,

Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: Traceback (most recent call last):
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 645, in _build_master
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: ws.require(__requires__)
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 946, in require
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: needed = self.resolve(parse_requirements(requirements))
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 838, in resolve
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: raise VersionConflict(dist, req).with_context(dependent_req)
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: pkg_resources.ContextualVersionConflict: (pytz 2017.2 (/vagrant/srv/ores/lib/python3.4/site-packages), Requirement.parse('pytz==2012c'), {'revscoring'})
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: During handling of the above exception, another exception occurred:
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: Traceback (most recent call last):
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: File "/vagrant/srv/ores/bin/ores", line 5, in <module>
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: from pkg_resources import load_entry_point
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 3070, in <module>
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: @_call_aside
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 3056, in _call_aside
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: f(*args, **kwargs)
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 3083, in _initialize_master_working_set
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: working_set = WorkingSet._build_master()
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 647, in _build_master
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: return cls._build_from_requirements(__requires__)
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 660, in _build_from_requirements
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: dists = ws.resolve(reqs, Environment())
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 833, in resolve
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: raise DistributionNotFound(req, requirers)
Jun 26 21:47:21 mediawiki-vagrant ores-celery[5693]: pkg_resources.DistributionNotFound: The 'pytz==2012c' distribution was not found and is required by revscoring
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: Traceback (most recent call last):
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 645, in _build_master
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: ws.require(__requires__)
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 946, in require
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: needed = self.resolve(parse_requirements(requirements))
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 838, in resolve
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: raise VersionConflict(dist, req).with_context(dependent_req)
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: pkg_resources.ContextualVersionConflict: (pytz 2017.2 (/vagrant/srv/ores/lib/python3.4/site-packages), Requirement.parse('pytz==2012c'), {'revscoring'})
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: During handling of the above exception, another exception occurred:
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: Traceback (most recent call last):
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: File "/vagrant/srv/ores/bin/ores", line 5, in <module>
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: from pkg_resources import load_entry_point
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 3070, in <module>
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: @_call_aside
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 3056, in _call_aside
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: f(*args, **kwargs)
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 3083, in _initialize_master_working_set
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: working_set = WorkingSet._build_master()
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 647, in _build_master
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: return cls._build_from_requirements(__requires__)
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 660, in _build_from_requirements
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: dists = ws.resolve(reqs, Environment())
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 833, in resolve
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: raise DistributionNotFound(req, requirers)
Jun 26 21:47:22 mediawiki-vagrant ores-wsgi[5692]: pkg_resources.DistributionNotFound: The 'pytz==2012c' distribution was not found and is required by revscoring
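Both tracebacks come down to the same thing: an exact pin (`pytz==2012c`) that does not match the installed version (`2017.2`). A minimal sketch of that comparison follows. It assumes a modern Python (3.8+, for `importlib.metadata`) rather than the 3.4 in the VM, and `installed_version` and `check_pin` are hypothetical helper names, not anything from ORES or pkg_resources.

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pin(requirement, have):
    """Classify an exact pin such as 'pytz==2012c' against a version string."""
    _, sep, wanted = requirement.partition("==")
    if not sep:
        raise ValueError("only exact '==' pins are handled in this sketch")
    if have is None:
        return "missing"
    return "ok" if have == wanted.strip() else "conflict"


# Values taken from the traceback above.
print(check_pin("pytz==2012c", "2017.2"))  # conflict
```

Inside the virtualenv, `check_pin("pytz==2012c", installed_version("pytz"))` would run the same comparison against whatever is actually installed.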

I have no idea what's going on there. The pip_install_revscoring_dependencies_hack resource should manually install all revscoring dependencies (I don't quite understand why that was necessary in the first place, but it seemed to be), and pytz is in there. Network error? Or maybe provisioning failed the first time and that role did not get executed on subsequent runs?

That sounds reasonable. I tried provisioning again just to get a clean bug report, using mediawiki-vagrant commit 8cd77a3c98b4c9788ade2881e9bc156089dd5cc1. I've made sure that the ores and wikilabels repos are up-to-date. My steps to reproduce are bulleted below:

  • vagrant destroy
  • vagrant up

Fails with

> default: Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install postgresql' returned 100: Reading package lists...

Seems to be a regression of T160660: the openssl package is not updating correctly.

  • vagrant provision

SSL succeeds this time, but we run into the problem I documented above:

> default: Error: Could not start Service[ores-celery]: Execution of '/bin/systemctl start ores-celery' returned 6: Failed to start ores-celery.service: Unit mediawiki-ready.service failed to load: No such file or directory.

  • vagrant provision

Seems to succeed after starting the ores-* services, but logs mention that pytz is missing.

I'll poke at my VM today and see whether there's anything interesting to report about the state of the virtualenv...

> SSL succeeds this time, but we run into the problem I documented above:

Did you use the latest version of the patch? I added Puppet dependencies on mediawiki-ready, which I hoped would prevent that error.

It behaved better with the patch, first giving the SSL error but on the second provision, I got:

==> default: Notice: /Stage[main]/Role::Ores/Mediawiki::Maintenance[check ORES model versions]/Exec[check ORES model versions]/returns: Starting...[bab3ad4ea2be8fcbbd623825] [no req]   RuntimeException from line 52 of /vagrant/mediawiki/extensions/ORES/includes/Api.php: Failed to make ORES request to [http://localhost:18880/scores/wiki/?20170627165631=1&format=json], Error fetching URL: Failed to connect to localhost port 18880: Connection refused
==> default: Notice: /Stage[main]/Role::Ores/Mediawiki::Maintenance[check ORES model versions]/Exec[check ORES model versions]/returns: Backtrace:
==> default: Notice: /Stage[main]/Role::Ores/Mediawiki::Maintenance[check ORES model versions]/Exec[check ORES model versions]/returns: #0 /vagrant/mediawiki/extensions/ORES/maintenance/CheckModelVersions.php(65): ORES\Api->request(array)                                                        
==> default: Notice: /Stage[main]/Role::Ores/Mediawiki::Maintenance[check ORES model versions]/Exec[check ORES model versions]/returns: #1 /vagrant/mediawiki/extensions/ORES/maintenance/CheckModelVersions.php(24): ORES\CheckModelVersions->getModels()                                            
==> default: Notice: /Stage[main]/Role::Ores/Mediawiki::Maintenance[check ORES model versions]/Exec[check ORES model versions]/returns: #2 /vagrant/mediawiki/maintenance/doMaintenance.php(111): ORES\CheckModelVersions->execute()                                                                  
==> default: Notice: /Stage[main]/Role::Ores/Mediawiki::Maintenance[check ORES model versions]/Exec[check ORES model versions]/returns: #3 /vagrant/mediawiki/extensions/ORES/maintenance/CheckModelVersions.php(76): require_once(string)                                                            
==> default: Notice: /Stage[main]/Role::Ores/Mediawiki::Maintenance[check ORES model versions]/Exec[check ORES model versions]/returns: #4 /var/www/w/MWScript.php(95): require_once(string)        
==> default: Notice: /Stage[main]/Role::Ores/Mediawiki::Maintenance[check ORES model versions]/Exec[check ORES model versions]/returns: #5 {main}                                                   
==> default: Error: /usr/local/bin/mwscript extensions/ORES/maintenance/CheckModelVersions.php --wiki=wiki returned 1 instead of one of [0]                                                         
==> default: Error: /Stage[main]/Role::Ores/Mediawiki::Maintenance[check ORES model versions]/Exec[check ORES model versions]/returns: change from notrun to 0 failed: /usr/local/bin/mwscript extensions/ORES/maintenance/CheckModelVersions.php --wiki=wiki returned 1 instead of one of [0]

A third provision gives the same error. The ores-* services are still down due to missing pytz. pytz is in fact not available in /vagrant/srv/ores/lib. But... are you ready for this?

vagrant@mediawiki-vagrant:/vagrant/srv/ores$ ./bin/python
Python 3.4.2 (default, Oct  8 2014, 10:45:20) 
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pytz
>>>

Succeeds. Importing pytz from the system python, however, fails. Perhaps we're running the system python when trying to start the service?

> Perhaps we're running the system python when trying to start the service?

My guess was wrong: the stack trace includes virtualenv libs.

Jun 27 17:00:26 mediawiki-vagrant ores-wsgi[3290]: File "/vagrant/srv/ores/lib/python3.4/site-packages/pkg_resources/__init__.py", line 833, in resolve
Jun 27 17:00:26 mediawiki-vagrant ores-wsgi[3290]: raise DistributionNotFound(req, requirers)
Jun 27 17:00:26 mediawiki-vagrant ores-wsgi[3290]: pkg_resources.DistributionNotFound: The 'pytz==2012c' distribution was not found and is required by revscoring

Following up on my interpreter dance above,

vagrant@mediawiki-vagrant:/vagrant/srv/ores$ ./bin/python
Python 3.4.2 (default, Oct  8 2014, 10:45:20)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pytz
>>> pytz
<module 'pytz' from '/vagrant/srv/ores/lib/python3.4/site-packages/pytz/__init__.py'>
>>> pytz.VERSION
'2017.2'
>>> pytz.OLSON_VERSION
'2017b'

The fact that the failure comes from pkg_resources gives me the willies. Out of curiosity, I rounded up all the places where we depend on different versions of pytz:

  • lib/python3.4/site-packages/celery-3.1.25.dist-info/METADATA:Requires-Dist: pytz (>dev)
  • lib/python3.4/site-packages/revscoring-1.3.15.dist-info/METADATA:Requires-Dist: pytz (==2017.2)
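The round-up above was done by hand from the `METADATA` files. A rough way to automate it is sketched below, assuming Python 3.8+ for `importlib.metadata`. The helper names (`requirement_name`, `who_requires`, `installed_requirements`) are hypothetical, and the name regex is a simplification that ignores extras and markers.

```python
import re
from importlib import metadata


def requirement_name(req):
    """Extract the project name from a string like 'pytz (==2017.2)'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    return match.group(1).lower() if match else None


def who_requires(target, requires_by_dist):
    """Map each distribution to its requirement strings naming target."""
    hits = {}
    for dist, reqs in requires_by_dist.items():
        matching = [r for r in reqs if requirement_name(r) == target.lower()]
        if matching:
            hits[dist] = matching
    return hits


def installed_requirements():
    """Collect Requires-Dist entries for every installed distribution."""
    return {
        d.metadata["Name"]: (d.requires or [])
        for d in metadata.distributions()
    }


# The two entries found by hand above, fed through the same filter.
found = {"celery": ["pytz (>dev)"], "revscoring": ["pytz (==2017.2)"]}
print(who_requires("pytz", found))
```

Calling `who_requires("pytz", installed_requirements())` with the virtualenv's interpreter would list every installed package that declares a pytz requirement.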

It's not clear why pkg_resources assumes revscoring requires "pytz==2017c":

vagrant@mediawiki-vagrant:/vagrant/srv/ores$ grep -r 2017c .
vagrant@mediawiki-vagrant:/vagrant/srv/ores$

It's utterly mysterious where the 2017c requirement is coming from.


Oh, dear... I'm finally catching up with the pip_install_revscoring_dependencies_hack issue. The trick is that redis and pylru shouldn't always be required, as they are optional backends that we enable in site-specific configuration. There are some docs about how to specify extras in requirements.txt, but it doesn't look like the best practice. We shouldn't have to hack in the revscoring dependencies, but maybe we really do need to pip install redis and pylru?

pytz 2012c was required by revscoring until some days ago. The hack is taking the requirements from master which is why it ended up with the wrong version. pip helpfully omits requirements.txt from locally installed packages... I could probably parse the requires from the metadata.json file but that's an even more horrible hack.
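For what it's worth, the metadata.json route would look roughly like the sketch below. It assumes the layout that older versions of the wheel tool wrote (a top-level `run_requires` list of groups, each with a `requires` list and optionally an `extra` or `environment` key); I haven't checked that against the files in this virtualenv, and `run_requires` is just a hypothetical helper name.

```python
import json


def run_requires(metadata_json_text):
    """Return the unconditional requirement strings from a metadata.json."""
    data = json.loads(metadata_json_text)
    reqs = []
    for group in data.get("run_requires", []):
        # Groups tied to an extra or an environment marker are conditional,
        # so this sketch skips them.
        if "extra" in group or "environment" in group:
            continue
        reqs.extend(group.get("requires", []))
    return reqs


# Hand-written sample in the assumed layout, not copied from a real package.
sample = '{"name": "revscoring", "run_requires": [{"requires": ["pytz (==2017.2)"]}]}'
print(run_requires(sample))
```

This would at least read the pins from the installed package rather than from master, which is where the version mismatch came from.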

What I totally don't get is why I need to install revscoring dependencies in the first place, when I have already installed ores and revscoring is a dependency of ores and pip should be able to handle recursive dependencies.

redis and pylru are a different issue since they are only declared as dependencies in one of the deploy packages, not in revscoring itself. Extras would be a nicer way to handle that, yeah.

Change 343593 abandoned by Gergő Tisza:
Trusty backcompat patch for ORES

Reason:
It doesn't merge because the parent is not in trusty-compat. More generally, the compat branch seems not very useful due to lack of backports. So let's abandon this.

https://gerrit.wikimedia.org/r/343593

I appreciate this mega-task format, so please don't think that I'm lobbying for subtasks, but I wanted to somehow keep track of which components now have vagrant roles, and which are still TODO. I'll add that in the task description for now, feel free to reword or move of course!

Wikilabels is done in https://gerrit.wikimedia.org/r/#/c/341294/ I think? The todo part in the extension is still todo (it's in https://gerrit.wikimedia.org/r/#/c/344814/). Is "Reference UI" the same thing as Meta-ORES?

Cool!

The reference UI I'm talking about is at https://ores.wikimedia.org/ui/. I hadn't realized it until now, but that means it's built into the server and I can confirm that it currently works under vagrant.

I ran through some of the features and found small things to fix up...

Wikilabels is working for me this time, but opening the enwiki campaign fails with,

PermissionError: [Errno 13] Permission denied: '/vagrant/srv/wikilabels/src/wikilabels/wikilabels/wsgi/static/.webassets-cache'

The Celery configuration might need a tweak: I'm seeing the following in syslog, and it seems to break the API service.

Sep 6 23:22:20 mediawiki-vagrant ores-celery[1751]: [2017-09-06 23:22:20,237: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@127.0.0.1:5672//: [Errno 111] Connection refused.

Change 344814 merged by jenkins-bot:
[mediawiki/vagrant@master] Use local ORES service when both ores and ores_services roles are enabled

https://gerrit.wikimedia.org/r/344814

Change 377655 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/vagrant@master] Use the celery scoring backend

https://gerrit.wikimedia.org/r/377655

Change 377655 merged by jenkins-bot:
[mediawiki/vagrant@master] Use the celery scoring backend

https://gerrit.wikimedia.org/r/377655