Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Halfak | T157135 Fix broken beta-labs deploy
Resolved | | mmodell | T137124 Scap3 submodule space issues
Event Timeline
Looks like this is the startup error:
[2017-02-03T15:30:13] Traceback (most recent call last):
[2017-02-03T15:30:13]   File "/srv/deployment/ores/deploy/ores_wsgi.py", line 6, in <module>
[2017-02-03T15:30:13]     application = wsgi.build()
[2017-02-03T15:30:13]   File "./ores/applications/wsgi.py", line 71, in build
[2017-02-03T15:30:13]     return server.configure(config)
[2017-02-03T15:30:13]   File "./ores/wsgi/server.py", line 28, in configure
[2017-02-03T15:30:13]     scoring_system = ScoringSystem.from_config(config, ss_name)
[2017-02-03T15:30:13]   File "./ores/scoring_systems/scoring_system.py", line 329, in from_config
[2017-02-03T15:30:13]     return Class.from_config(config, name)
[2017-02-03T15:30:13]   File "./ores/scoring_systems/celery_queue.py", line 242, in from_config
[2017-02-03T15:30:13]     config, name, section_key=section_key)
[2017-02-03T15:30:13]   File "./ores/scoring_systems/scoring_system.py", line 298, in _kwargs_from_config
[2017-02-03T15:30:13]     config, name, section_key=section_key)
[2017-02-03T15:30:13]   File "./ores/scoring_systems/celery_queue.py", line 234, in _build_context_map
[2017-02-03T15:30:13]     for name in section['scoring_contexts']}
[2017-02-03T15:30:13]   File "./ores/scoring_systems/celery_queue.py", line 234, in <dictcomp>
[2017-02-03T15:30:13]     for name in section['scoring_contexts']}
[2017-02-03T15:30:13]   File "./ores/scoring_context.py", line 227, in from_config
[2017-02-03T15:30:13]     return cls(name, model_map=model_map, extractor=extractor)
[2017-02-03T15:30:13]   File "./ores/scoring_context.py", line 248, in __init__
[2017-02-03T15:30:13]     for model_name, model in model_map.items()}
[2017-02-03T15:30:13]   File "./ores/scoring_context.py", line 248, in <dictcomp>
[2017-02-03T15:30:13]     for model_name, model in model_map.items()}
[2017-02-03T15:30:13]   File "/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scorer_models/sklearn_classifier.py", line 235, in format_info
[2017-02-03T15:30:13]     return self.format_info_json()
[2017-02-03T15:30:13]   File "/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scorer_models/sklearn_classifier.py", line 268, in format_info_json
[2017-02-03T15:30:13]     params.update(self.estimator.get_params())
[2017-02-03T15:30:13]   File "/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scorer_models/sklearn_classifier.py", line 69, in __getattr__
[2017-02-03T15:30:13]     raise AttributeError(attr)
[2017-02-03T15:30:13] AttributeError: estimator
I'm checking on the versions we have deployed. I think this is an issue with the model files.
This issue might be related to T157136: Error after "Finished deploy": <ValueError> xrange() arg 3 must not be zero, but that's unlikely.
It's definitely related to T157141: Diffusion repository can't be cloned: 500 errors (research-ores-editquality)
https://gerrit.wikimedia.org/r/335998 Here are some changes that implement draft quality and switch us to git-ssh to get around the 500 error.
New error!
[2017-02-04T21:26:08] Traceback (most recent call last):
[2017-02-04T21:26:08]   File "/srv/deployment/ores/deploy/ores_wsgi.py", line 6, in <module>
[2017-02-04T21:26:08]     application = wsgi.build()
[2017-02-04T21:26:08]   File "./ores/applications/wsgi.py", line 71, in build
[2017-02-04T21:26:08]     return server.configure(config)
[2017-02-04T21:26:08]   File "./ores/wsgi/server.py", line 28, in configure
[2017-02-04T21:26:08]     scoring_system = ScoringSystem.from_config(config, ss_name)
[2017-02-04T21:26:08]   File "./ores/scoring_systems/scoring_system.py", line 329, in from_config
[2017-02-04T21:26:08]     return Class.from_config(config, name)
[2017-02-04T21:26:08]   File "./ores/scoring_systems/celery_queue.py", line 242, in from_config
[2017-02-04T21:26:08]     config, name, section_key=section_key)
[2017-02-04T21:26:08]   File "./ores/scoring_systems/scoring_system.py", line 298, in _kwargs_from_config
[2017-02-04T21:26:08]     config, name, section_key=section_key)
[2017-02-04T21:26:08]   File "./ores/scoring_systems/celery_queue.py", line 234, in _build_context_map
[2017-02-04T21:26:08]     for name in section['scoring_contexts']}
[2017-02-04T21:26:08]   File "./ores/scoring_systems/celery_queue.py", line 234, in <dictcomp>
[2017-02-04T21:26:08]     for name in section['scoring_contexts']}
[2017-02-04T21:26:08]   File "./ores/scoring_context.py", line 222, in from_config
[2017-02-04T21:26:08]     scorer_model = ScorerModel.from_config(config, key)
[2017-02-04T21:26:08]   File "/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scorer_models/scorer_model.py", line 96, in from_config
[2017-02-04T21:26:08]     return Class.from_config(config, name, section_key=section_key)
[2017-02-04T21:26:08]   File "/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scorer_models/scorer_model.py", line 160, in from_config
[2017-02-04T21:26:08]     return cls.load(open(section['model_file'], 'rb'))
[2017-02-04T21:26:08]   File "/srv/deployment/ores/venv/lib/python3.4/site-packages/revscoring/scorer_models/scorer_model.py", line 73, in load
[2017-02-04T21:26:08]     return pickle.load(f)
[2017-02-04T21:26:08] ImportError: No module named 'editquality'
So it looks like we're not finding the editquality module inside the repo.
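For anyone wondering why a missing package breaks loading a model file: pickle stores classes by module path and name, so the module has to be importable on the host that unpickles. Here is a minimal sketch of that failure mode. `fake_editquality` and `Feature` are made-up stand-ins for illustration, not the real editquality code.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the real `editquality` package: a module that
# exists only in this process and defines a single class.
MODULE_NAME = "fake_editquality"
module = types.ModuleType(MODULE_NAME)
exec(
    "class Feature:\n"
    "    def __init__(self, name):\n"
    "        self.name = name\n",
    module.__dict__,
)
sys.modules[MODULE_NAME] = module

# Pickle records the class as "fake_editquality.Feature" (by reference);
# the class body itself is not stored in the payload.
payload = pickle.dumps(module.Feature("revision.text.chars"))


def try_unpickle(data):
    """Return (object, None) on success, or (None, error) if an import fails."""
    try:
        return pickle.loads(data), None
    except ImportError as error:
        return None, error


# While the module is importable, loading works.
loaded, error = try_unpickle(payload)
print(type(loaded).__name__, error)

# Simulate a host where the package is not installed or not on the path.
del sys.modules[MODULE_NAME]
loaded, error = try_unpickle(payload)
print(loaded, repr(error))
```

The second load fails during the import step, which matches the shape of the `ImportError` in the traceback above.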
Ha! Now I'm running out of disk space on sca03.
halfak@deployment-tin:/srv/deployment/ores/deploy$ scap deploy -fv
21:44:05 Started deploy [ores/deploy@7c80636]
21:44:05 Deploying Rev: 7c80636313b088928c8eba5d5bdf0b62b8db7f76
21:44:05 Update DEPLOY_HEAD
21:44:05 Creating /srv/deployment/ores/deploy/.git/DEPLOY_HEAD
Deleted tag 'scap/sync/2016-08-08/0007' (was 8257f35)
21:44:05 Update server info
Entering 'submodules/draftquality'
Entering 'submodules/editquality'
Entering 'submodules/ores'
Entering 'submodules/wheels'
Entering 'submodules/wikiclass'
21:44:05 Started deploy [ores/deploy@7c80636]: (no justification provided)
21:44:05 == WORKER == :* deployment-sca03.deployment-prep.eqiad.wmflabs
21:44:05 Running remote deploy cmd ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '--force', '-g', 'worker', 'fetch', '--refresh-config']
21:45:13 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '--force', '-g', 'worker', 'fetch', '--refresh-config'] on deployment-sca03.deployment-prep.eqiad.wmflabs returned [70]: http://deployment-tin.deployment-prep.eqiad.wmflabs/ores/deploy/.git
From http://deployment-tin.deployment-prep.eqiad.wmflabs/ores/deploy/
 * [new branch] master -> origin/master
 * [new tag] scap/sync/2017-02-04/0005 -> scap/sync/2017-02-04/0005
/srv/deployment/ores/deploy-cache/cache
From /srv/deployment/ores/deploy-cache/cache
 * [new branch] master -> origin/master
 * [new tag] scap/sync/2017-02-04/0003 -> scap/sync/2017-02-04/0003
 * [new tag] scap/sync/2017-02-04/0004 -> scap/sync/2017-02-04/0004
 * [new tag] scap/sync/2017-02-04/0005 -> scap/sync/2017-02-04/0005
Cloning into 'submodules/editquality'...
error: unable to write file models/arwiki.reverted.gradient_boosting.model
error: unable to write file models/cswiki.damaging.gradient_boosting.model
error: unable to write file models/cswiki.goodfaith.gradient_boosting.model
<... snip ...>
error: unable to write file models/wikidatawiki.goodfaith.gradient_boosting.model
error: unable to write file models/wikidatawiki.reverted.gradient_boosting.model
error: unable to write file requirements.txt
error: unable to write file setup.cfg
error: unable to write file setup.py
error: unable to write file tox.ini
fatal: cannot create directory at 'tuning_reports': No space left on device
Cloning into 'submodules/ores'...
/srv/deployment/ores/deploy-cache/revs/7c80636313b088928c8eba5d5bdf0b62b8db7f76/.git/modules/submodules/ores: No space left on device
Clone of 'http://deployment-tin.deployment-prep.eqiad.wmflabs/ores/deploy/.git/modules/submodules/ores' into submodule path 'submodules/ores' failed
ores/deploy: fetch stage(s): 100% (ok: 0; fail: 1; left: 0)
21:45:13 1 targets had deploy errors
21:45:13 1 targets failed
21:45:13 1 of 1 worker targets failed, exceeding limit
Rollback all deployed groups? [Y/n]: Y
21:46:51 == WORKER == :* deployment-sca03.deployment-prep.eqiad.wmflabs
21:46:51 Running remote deploy cmd ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '--force', '-g', 'worker', 'rollback', '--refresh-config']
ores/deploy: rollback stage(s): 100% (ok: 1; fail: 0; left: 0)
21:46:58 Finished deploy [ores/deploy@7c80636]: (no justification provided) (duration: 02m 52s)
21:46:58 Finished deploy [ores/deploy@7c80636] (duration: 02m 52s)
halfak@deployment-sca03:/srv/deployment/ores/deploy$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda3        19G   15G  3.4G  81% /
halfak@deployment-sca03:/srv/deployment/ores/deploy$ cd ../../
halfak@deployment-sca03:/srv/deployment$ du -hs *
580M    eventstreams
11G     ores
WTF. The repo directory is only 2.5 GB on deployment-tin
halfak@deployment-tin:/srv/deployment/ores$ du -hs .
2.5G    .
It's alive! Thanks to @20after4 doing some cleanup of deploy-cache, we now have a working deploy in beta labs. See http://ores-beta.wmflabs.org/v2/scores/enwiki/draftquality/1 for an example scoring of one of the new models.
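For next time the target fills up, a rough sketch of how one might see which cached revisions are eating the space before deleting anything. The `revs` path is taken from the scap output in this thread; the function names are made up for illustration, and the numbers are apparent file sizes, so they can differ somewhat from what `du` reports.

```python
import os


def tree_size(path):
    """Sum the apparent sizes of regular files under path (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total


def largest_children(root, top=10):
    """Return up to `top` (name, bytes) pairs for entries directly under root."""
    sizes = []
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            size = tree_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            size = entry.stat(follow_symlinks=False).st_size
        else:
            continue  # skip symlinks and special files
        sizes.append((entry.name, size))
    # Largest first, so stale heavyweight revisions show up at the top.
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)[:top]


if __name__ == "__main__":
    # Path as it appears in the scap output above; adjust for other repos.
    revs_dir = "/srv/deployment/ores/deploy-cache/revs"
    if os.path.isdir(revs_dir):
        for name, size in largest_children(revs_dir):
            print(f"{size / 2**20:10.1f} MiB  {name}")
```

This only reports sizes. Which revisions are safe to remove is a separate question, so check with whoever maintains scap on that host before deleting anything.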