Once we have improved the signals and gathered enough new training data, we need to retrain the model and deploy it.
Acceptance criteria:
- ORES uses the new improved Item quality model on Wikidata
Lydia_Pintscher | Aug 26 2020, 4:19 PM
Change 636463 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/services/ores/deploy@master] Upgrade articlequality to master
This git lfs thing is a mess... I hope T264651: Migrate ORES/Revscoring/etc. repos to Gitlab or Gerrit gets done ASAP.
Change 636463 merged by Ladsgroup:
[mediawiki/services/ores/deploy@master] Upgrade articlequality to master
Trying to deploy to beta:
17:13:56 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '-g', 'default', 'fetch', '--refresh-config'] on deployment-ores01.deployment-prep.eqiad.wmflabs returned [255]: Permission denied (publickey).
17:13:56 connection to deployment-ores01.deployment-prep.eqiad.wmflabs failed and future stages will not be attempted for this target
ores/deploy: fetch stage(s): 100% (ok: 0; fail: 1; left: 0)
17:13:56 1 targets had deploy errors
17:13:56 1 targets failed
17:13:56 1 of 1 default targets failed, exceeding limit
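A `Permission denied (publickey)` from scap's remote fetch stage usually means the deploy user's key never reaches the target. A minimal, read-only diagnostic sketch (the hostname comes from the log above; everything else is generic OpenSSH, and none of these commands actually connects to the host):

```shell
# Read-only SSH diagnostics for a failing scap target.
target=deployment-ores01.deployment-prep.eqiad.wmflabs

# 1. Does the agent hold any identities? scap's fetch stage cannot
#    authenticate without one.
ssh-add -l || echo "no identities loaded in ssh-agent"

# 2. Which user and key would ssh resolve for this host?
#    (-G only prints the resolved config; it never opens a connection)
ssh -G "$target" | grep -E '^(user|identityfile) '
```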
What?
Logging in:
$ ssh deployment-ores01.deployment-prep.eqiad.wmflabs
Linux deployment-ores01 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64
Debian GNU/Linux 9.3 (stretch)
The last Puppet run was at Tue Sep 8 22:26:12 UTC 2020 (68806 minutes ago).
Puppet is not working...
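For scale, the "68806 minutes ago" in the login banner works out to almost 48 days without a successful Puppet run:

```shell
# Convert the banner's "68806 minutes ago" into days (68806 / 60 / 24).
awk 'BEGIN { printf "%.1f days\n", 68806 / 60 / 24 }'
# → 47.8 days
```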
ladsgroup@deployment-ores01:~$ sudo puppet agent -tv
2020-10-26 17:18:17.532026 WARN puppetlabs.facter - locale environment variables were bad; continuing with LANG=C LC_ALL=C
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Could not find class role::ores::redis for deployment-ores01.deployment-prep.eqiad.wmflabs on node deployment-ores01.deployment-prep.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
It seems putting ORES behind envoy broke ORES on the beta cluster:
ladsgroup@deployment-ores01:~$ sudo puppet agent -tv
2020-10-26 17:27:00.475030 WARN puppetlabs.facter - locale environment variables were bad; continuing with LANG=C LC_ALL=C
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, If you want non-sni TLS to be supported, you need to define profile::tlsproxy::envoy::global_cert_name or profile::tlsproxy::envoy::acme_cert_name (file: /etc/puppet/modules/profile/manifests/tlsproxy/envoy.pp, line: 144, column: 13) on node deployment-ores01.deployment-prep.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
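The envoy tlsproxy profile refuses to compile the catalog unless one of the two certificate parameters it names is defined. On a real cluster the fix would be a hiera entry along these lines (only the key name is taken from the Puppet error above; the value and the exact hiera file it belongs in are hypothetical):

```yaml
# Hypothetical hiera fragment for the beta host; the key name comes from the
# Puppet error, the certificate name is a placeholder.
profile::tlsproxy::envoy::global_cert_name: "deployment-ores01.deployment-prep.eqiad.wmflabs"
```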
Change 636492 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/services/ores/deploy@master] Bump to HEAD of articlequality again
Change 636492 merged by Ladsgroup:
[mediawiki/services/ores/deploy@master] Bump to HEAD of articlequality again
Mentioned in SAL (#wikimedia-operations) [2020-10-26T20:08:37Z] <ladsgroup@deploy1001> Started deploy [ores/deploy@6912889]: Deploy new version of articlequality for wikidata (T261326)
Mentioned in SAL (#wikimedia-operations) [2020-10-26T20:15:30Z] <ladsgroup@deploy1001> Finished deploy [ores/deploy@6912889]: Deploy new version of articlequality for wikidata (T261326) (duration: 06m 53s)
The timing of precaching requests dropped by around 20%:
[Before / After graphs]
For precaching requests on Wikidata the drop is much bigger, around half:
[Before / After graphs]
This should help with the capacity issues as well.