
Fix broken beta-labs deploy
Closed, Resolved, Public

Event Timeline

Halfak triaged this task as High priority. (Feb 3 2017, 3:42 PM)

Looks like this is the startup error:

[2017-02-03T15:30:13] Traceback (most recent call last):
[2017-02-03T15:30:13]   File "/srv/deployment/ores/deploy/ores_wsgi.py", line 6, in <module>
[2017-02-03T15:30:13]     application = wsgi.build()
[2017-02-03T15:30:13]   File "./ores/applications/wsgi.py", line 71, in build
[2017-02-03T15:30:13]     return server.configure(config)
[2017-02-03T15:30:13]   File "./ores/wsgi/server.py", line 28, in configure
[2017-02-03T15:30:13]     scoring_system = ScoringSystem.from_config(config, ss_name)
[2017-02-03T15:30:13]   File "./ores/scoring_systems/scoring_system.py", line 329, in from_config
[2017-02-03T15:30:13]     return Class.from_config(config, name)
[2017-02-03T15:30:13]   File "./ores/scoring_systems/celery_queue.py", line 242, in from_config
[2017-02-03T15:30:13]     config, name, section_key=section_key)
[2017-02-03T15:30:13]   File "./ores/scoring_systems/scoring_system.py", line 298, in _kwargs_from_config
[2017-02-03T15:30:13]     config, name, section_key=section_key)
[2017-02-03T15:30:13]   File "./ores/scoring_systems/celery_queue.py", line 234, in _build_context_map
[2017-02-03T15:30:13]     for name in section['scoring_contexts']}
[2017-02-03T15:30:13]   File "./ores/scoring_systems/celery_queue.py", line 234, in <dictcomp>
[2017-02-03T15:30:13]     for name in section['scoring_contexts']}
[2017-02-03T15:30:13]   File "./ores/scoring_context.py", line 227, in from_config
[2017-02-03T15:30:13]     return cls(name, model_map=model_map, extractor=extractor)
[2017-02-03T15:30:13]   File "./ores/scoring_context.py", line 248, in __init__
[2017-02-03T15:30:13]     for model_name, model in model_map.items()}
[2017-02-03T15:30:13]   File "./ores/scoring_context.py", line 248, in <dictcomp>
[2017-02-03T15:30:13]     for model_name, model in model_map.items()}
[2017-02-03T15:30:13]   File "/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scorer_models/sklearn_classifier.py", line 235, in format_info
[2017-02-03T15:30:13]     return self.format_info_json()
[2017-02-03T15:30:13]   File "/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scorer_models/sklearn_classifier.py", line 268, in format_info_json
[2017-02-03T15:30:13]     params.update(self.estimator.get_params())
[2017-02-03T15:30:13]   File "/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scorer_models/sklearn_classifier.py", line 69, in __getattr__
[2017-02-03T15:30:13]     raise AttributeError(attr)
[2017-02-03T15:30:13] AttributeError: estimator

I'm checking on the versions we have deployed. I think this is an issue with the model files.
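Here is one plausible way a model file can produce `AttributeError: estimator`, sketched under assumptions. Pickle restores an object's `__dict__` without calling `__init__`. So a model file written by an older library version can lack an attribute (here `estimator`) that the newer class code expects. The `__getattr__` that re-raises `AttributeError(attr)` mirrors the last frame of the traceback. `ModelV2`, `restore_like_pickle`, and the old attribute name `classifier` are hypothetical; this is not revscoring's actual code.

```python
class ModelV2:
    """Hypothetical stand-in for a model class after a library upgrade."""

    def __init__(self, estimator):
        self.estimator = estimator

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup fails, like the
        # __getattr__ frame at the bottom of the traceback.
        raise AttributeError(attr)

    def format_info(self):
        return {"estimator": repr(self.estimator)}


def restore_like_pickle(cls, state):
    """Rebuild an instance the way pickle does by default for plain classes:
    allocate with __new__ (no __init__ call), then copy in the saved __dict__."""
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


# Hypothetical state as an older version might have saved it,
# under a different attribute name.
old_state = {"classifier": "GradientBoostingClassifier(...)"}

model = restore_like_pickle(ModelV2, old_state)
try:
    model.format_info()
except AttributeError as e:
    print("AttributeError:", e)  # AttributeError: estimator
```

If this is the cause, the remedy is to rebuild the model files with the deployed library version, or to deploy the version that built them.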

https://gerrit.wikimedia.org/r/335998: here are some changes that implement draft quality and switch us to git-ssh to get around the 500 error.

New error!

[2017-02-04T21:26:08] Traceback (most recent call last):
[2017-02-04T21:26:08]   File "/srv/deployment/ores/deploy/ores_wsgi.py", line 6, in <module>
[2017-02-04T21:26:08]     application = wsgi.build()
[2017-02-04T21:26:08]   File "./ores/applications/wsgi.py", line 71, in build
[2017-02-04T21:26:08]     return server.configure(config)
[2017-02-04T21:26:08]   File "./ores/wsgi/server.py", line 28, in configure
[2017-02-04T21:26:08]     scoring_system = ScoringSystem.from_config(config, ss_name)
[2017-02-04T21:26:08]   File "./ores/scoring_systems/scoring_system.py", line 329, in from_config
[2017-02-04T21:26:08]     return Class.from_config(config, name)
[2017-02-04T21:26:08]   File "./ores/scoring_systems/celery_queue.py", line 242, in from_config
[2017-02-04T21:26:08]     config, name, section_key=section_key)
[2017-02-04T21:26:08]   File "./ores/scoring_systems/scoring_system.py", line 298, in _kwargs_from_config
[2017-02-04T21:26:08]     config, name, section_key=section_key)
[2017-02-04T21:26:08]   File "./ores/scoring_systems/celery_queue.py", line 234, in _build_context_map
[2017-02-04T21:26:08]     for name in section['scoring_contexts']}
[2017-02-04T21:26:08]   File "./ores/scoring_systems/celery_queue.py", line 234, in <dictcomp>
[2017-02-04T21:26:08]     for name in section['scoring_contexts']}
[2017-02-04T21:26:08]   File "./ores/scoring_context.py", line 222, in from_config
[2017-02-04T21:26:08]     scorer_model = ScorerModel.from_config(config, key)
[2017-02-04T21:26:08]   File "/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scorer_models/scorer_model.py", line 96, in from_config
[2017-02-04T21:26:08]     return Class.from_config(config, name, section_key=section_key)
[2017-02-04T21:26:08]   File "/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scorer_models/scorer_model.py", line 160, in from_config
[2017-02-04T21:26:08]     return cls.load(open(section['model_file'], 'rb'))
[2017-02-04T21:26:08]   File "/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scorer_models/scorer_model.py", line 73, in load
[2017-02-04T21:26:08]     return pickle.load(f)
[2017-02-04T21:26:08] ImportError: No module named 'editquality'

So it looks like unpickling the models needs the editquality module, and we're not finding it inside the repo.
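This minimal sketch shows why a missing package surfaces as an `ImportError` inside `pickle.load`. Pickle records classes and other globals by module path, so loading has to import the defining module. `demo_pkg_xyz` is a hypothetical stand-in for `editquality`. Putting its directory on `sys.path` is one illustrative way to make it importable, not necessarily what the ORES deploy does.

```python
import importlib
import os
import pickle
import sys
import tempfile

# A hand-written protocol-0 pickle: "c" is the GLOBAL opcode (module name,
# newline, attribute name, newline) and "." is STOP. Loading it makes pickle
# import `demo_pkg_xyz` and fetch `VALUE`, much as a model file makes pickle
# import the package that defines the model's classes.
PAYLOAD = b"cdemo_pkg_xyz\nVALUE\n."


def try_load(data):
    """Unpickle data, reporting a missing module instead of raising."""
    try:
        return pickle.loads(data)
    except ImportError as e:
        return "missing module: {}".format(e.name)


def demo():
    before = try_load(PAYLOAD)
    with tempfile.TemporaryDirectory() as repo_dir:
        # Hypothetical stand-in for a checked-out submodule directory.
        with open(os.path.join(repo_dir, "demo_pkg_xyz.py"), "w") as f:
            f.write("VALUE = 42\n")
        sys.path.insert(0, repo_dir)
        importlib.invalidate_caches()
        try:
            after = try_load(PAYLOAD)
        finally:
            # Undo the path change and forget the module so demo() is repeatable.
            sys.path.remove(repo_dir)
            sys.modules.pop("demo_pkg_xyz", None)
    return before, after


print(demo())  # ('missing module: demo_pkg_xyz', 42)
```

The same reasoning applies to the real model files. Whatever process loads them needs the package that defined the model's classes on its import path.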

Ha! Now I'm running out of disk space on sca03.

halfak@deployment-tin:/srv/deployment/ores/deploy$ scap deploy -fv
21:44:05 Started deploy [ores/deploy@7c80636]
21:44:05 Deploying Rev: 7c80636313b088928c8eba5d5bdf0b62b8db7f76
21:44:05 Update DEPLOY_HEAD
21:44:05 Creating /srv/deployment/ores/deploy/.git/DEPLOY_HEAD
Deleted tag 'scap/sync/2016-08-08/0007' (was 8257f35)
21:44:05 Update server info
Entering 'submodules/draftquality'
Entering 'submodules/editquality'
Entering 'submodules/ores'
Entering 'submodules/wheels'
Entering 'submodules/wikiclass'
21:44:05 Started deploy [ores/deploy@7c80636]: (no justification provided)
21:44:05 
== WORKER ==
:* deployment-sca03.deployment-prep.eqiad.wmflabs
21:44:05 Running remote deploy cmd ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '--force', '-g', 'worker', 'fetch', '--refresh-config']
21:45:13 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '--force', '-g', 'worker', 'fetch', '--refresh-config'] on deployment-sca03.deployment-prep.eqiad.wmflabs returned [70]: http://deployment-tin.deployment-prep.eqiad.wmflabs/ores/deploy/.git
From http://deployment-tin.deployment-prep.eqiad.wmflabs/ores/deploy/
 * [new branch]      master     -> origin/master
 * [new tag]         scap/sync/2017-02-04/0005 -> scap/sync/2017-02-04/0005
/srv/deployment/ores/deploy-cache/cache
From /srv/deployment/ores/deploy-cache/cache
 * [new branch]      master     -> origin/master
 * [new tag]         scap/sync/2017-02-04/0003 -> scap/sync/2017-02-04/0003
 * [new tag]         scap/sync/2017-02-04/0004 -> scap/sync/2017-02-04/0004
 * [new tag]         scap/sync/2017-02-04/0005 -> scap/sync/2017-02-04/0005
Cloning into 'submodules/editquality'...
error: unable to write file models/arwiki.reverted.gradient_boosting.model
error: unable to write file models/cswiki.damaging.gradient_boosting.model
error: unable to write file models/cswiki.goodfaith.gradient_boosting.model
<... snip ...>
error: unable to write file models/wikidatawiki.goodfaith.gradient_boosting.model
error: unable to write file models/wikidatawiki.reverted.gradient_boosting.model
error: unable to write file requirements.txt
error: unable to write file setup.cfg
error: unable to write file setup.py
error: unable to write file tox.ini
fatal: cannot create directory at 'tuning_reports': No space left on device
Cloning into 'submodules/ores'...
/srv/deployment/ores/deploy-cache/revs/7c80636313b088928c8eba5d5bdf0b62b8db7f76/.git/modules/submodules/ores: No space left on device
Clone of 'http://deployment-tin.deployment-prep.eqiad.wmflabs/ores/deploy/.git/modules/submodules/ores' into submodule path 'submodules/ores' failed

ores/deploy: fetch stage(s): 100% (ok: 0; fail: 1; left: 0)                     
21:45:13 1 targets had deploy errors
21:45:13 1 targets failed
21:45:13 1 of 1 worker targets failed, exceeding limit
Rollback all deployed groups? [Y/n]: Y
21:46:51 
== WORKER ==
:* deployment-sca03.deployment-prep.eqiad.wmflabs
21:46:51 Running remote deploy cmd ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '--force', '-g', 'worker', 'rollback', '--refresh-config']
ores/deploy: rollback stage(s): 100% (ok: 1; fail: 0; left: 0)                  
21:46:58 Finished deploy [ores/deploy@7c80636]: (no justification provided) (duration: 02m 52s)
21:46:58 Finished deploy [ores/deploy@7c80636] (duration: 02m 52s)
halfak@deployment-sca03:/srv/deployment/ores/deploy$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda3        19G   15G  3.4G  81% /
halfak@deployment-sca03:/srv/deployment/ores/deploy$ cd ../../
halfak@deployment-sca03:/srv/deployment$ du -hs * 
580M	eventstreams
11G	ores
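The `df -h .` check above can be scripted as a pre-flight step before a deploy. This is a minimal sketch using `shutil.disk_usage`, whose `free` field is the space available to unprivileged users, the same figure `df` shows as Avail. The 5 GiB threshold is a hypothetical placeholder, not a measured size of the ORES checkout.

```python
import shutil


def has_room(path, needed_bytes):
    """Return True if the filesystem holding `path` has at least
    `needed_bytes` available to unprivileged users."""
    return shutil.disk_usage(path).free >= needed_bytes


def describe(path):
    """Format total/used/free for the filesystem holding `path`, in GiB."""
    usage = shutil.disk_usage(path)
    gib = 2 ** 30
    return "{:.1f}G total, {:.1f}G used, {:.1f}G free".format(
        usage.total / gib, usage.used / gib, usage.free / gib)


if __name__ == "__main__":
    # Hypothetical threshold: the space a fresh checkout is assumed to need.
    NEEDED = 5 * 2 ** 30
    print(describe("."), "ok" if has_room(".", NEEDED) else "not enough space")
```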

WTF. ores takes 11 GB on deployment-sca03, but the repo directory is only 2.5 GB on deployment-tin:

halfak@deployment-tin:/srv/deployment/ores$ du -hs .
2.5G	.
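A per-directory breakdown like `du -hs *` can show which copies account for the gap between 11 GB and 2.5 GB. This is a minimal sketch. It sums apparent file sizes, so its totals can differ from `du`, which counts allocated blocks and counts hard-linked files once. The `deploy-cache/revs` default path comes from the clone error above. That old revision checkouts accumulate there is an assumption.

```python
import os
import stat
import sys


def tree_size(path):
    """Sum the apparent sizes of regular files under `path`, without following symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):  # os.walk does not descend into symlinked dirs
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file disappeared mid-walk
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def largest_children(path, top=10):
    """Return (bytes, name) pairs for the `top` largest subdirectories of `path`."""
    with os.scandir(path) as entries:
        sizes = [
            (tree_size(entry.path), entry.name)
            for entry in entries
            if entry.is_dir(follow_symlinks=False)
        ]
    sizes.sort(reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/srv/deployment/ores/deploy-cache/revs"
    if os.path.isdir(target):
        for size, name in largest_children(target):
            print("{:.1f}G\t{}".format(size / 2 ** 30, name))
    else:
        print("no such directory:", target)
```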

It's alive! Thanks to @20after4 doing some cleanup of deploy-cache, we now have a working deploy in beta labs. See http://ores-beta.wmflabs.org/v2/scores/enwiki/draftquality/1 for an example scoring of one of the new models.

Halfak claimed this task.