awight (Adam Roses Wight)
User

Projects (12)

User Details

User Since
Oct 12 2014, 9:02 PM (157 w, 2 d)
Availability
Available
IRC Nick
awight
LDAP User
Awight
MediaWiki User
Awight (WMF)

Recent Activity

Sat, Oct 14

MarcoAurelio awarded T125618: Deprecate EducationProgram extension a Burninate token.
Sat, Oct 14, 7:50 PM · Epic, Education-Program-Dashboard, MediaWiki-extensions-EducationProgram

Thu, Oct 12

awight added a comment to T174402: Review and fix file handle management in worker and celery processes.

Sounds good. It also sounds like the Celery developers would be open to a contributed backport.

Thu, Oct 12, 9:14 PM · Scoring-platform-team (Current), Operations, Patch-For-Review, User-Ladsgroup, ORES
awight added a comment to T174402: Review and fix file handle management in worker and celery processes.

Now *this* is interesting. The Celery code involved in the failure, https://github.com/celery/celery/blob/3.1/celery/concurrency/asynpool.py#L141
has been rewritten to use poll() instead of select().
https://github.com/celery/celery/blob/master/celery/concurrency/asynpool.py#L121
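For context, here is a minimal sketch of why that switch could matter. It uses Python's standard select module, not Celery's actual code, and the helper name wait_readable is hypothetical. select() can only watch descriptors numbered below FD_SETSIZE (commonly 1024), whereas poll() has no such ceiling, so a process holding thousands of handles can still wait on any of them.

```python
import os
import select


def wait_readable(fd, timeout_ms=1000):
    """Return True if fd becomes readable within timeout_ms, using poll()."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    # poll() returns a list of (fd, event_mask) pairs; empty on timeout.
    events = poller.poll(timeout_ms)
    return any(ready == fd and bool(mask & select.POLLIN) for ready, mask in events)


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    os.write(write_end, b"x")
    print(wait_readable(read_end))
    os.close(read_end)
    os.close(write_end)
```

select.poll is not available on every platform (for example, Windows), so this is a Unix-only illustration.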

Thu, Oct 12, 9:12 PM · Scoring-platform-team (Current), Operations, Patch-For-Review, User-Ladsgroup, ORES
awight added a comment to T174402: Review and fix file handle management in worker and celery processes.

I used lsof to watch filehandle usage over the lifecycle of the celery service: P6112
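A rough per-process alternative to lsof, sketched here under the assumption of a Linux host with /proc mounted. The function name count_open_fds is hypothetical and this is not the contents of P6112. It counts only true file descriptors, so it will report fewer entries than lsof, which also lists things like memory-mapped files and the working directory.

```python
import os


def count_open_fds(pid):
    """Count entries in /proc/<pid>/fd (Linux only).

    Reading another user's process usually needs elevated permissions.
    The directory listing itself briefly holds one descriptor when pid
    is the current process, so compare deltas rather than absolutes.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    print(count_open_fds(os.getpid()))
```

Sampling this in a loop over the lifetime of each worker PID would give a per-process curve rather than a system-wide total.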

Thu, Oct 12, 8:58 PM · Scoring-platform-team (Current), Operations, Patch-For-Review, User-Ladsgroup, ORES
awight edited P6112 (An Untitled Masterwork).
Thu, Oct 12, 8:16 PM
awight created P6112 (An Untitled Masterwork).
Thu, Oct 12, 7:50 PM
awight committed rORESDEPLOYeb712bc8578a: Update config, keep in sync with the set of revscoring 2 models (authored by awight).
Thu, Oct 12, 7:35 PM
awight committed rORESDEPLOYf9dfcf20e666: Include new cluster as deployment targets (authored by awight).
Thu, Oct 12, 5:45 PM
awight added a comment to T174402: Review and fix file handle management in worker and celery processes.

@akosiaris Good questions! I only just now found good documentation for the LimitNOFILE configuration variable, and you're right that I had given the limit as the expected total across all workers in the service. 8,192 should indeed be a good per-process limit. We'll get back to you here, after discussing some more.
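To make the per-process point concrete, a small sketch using Python's standard resource module. The helper name nofile_limits is hypothetical. The open-file limit is an rlimit that each process carries individually, so every worker gets its own allowance rather than drawing from a shared service-wide pool.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={soft} hard={hard}")
```

Running this inside a worker would show the value that process actually inherited, which is a quick way to confirm a unit-file change took effect.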

Thu, Oct 12, 4:35 PM · Scoring-platform-team (Current), Operations, Patch-For-Review, User-Ladsgroup, ORES
awight added a comment to T171027: "Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis.

Thanks, I'll experiment with removing the STRAIGHT_JOIN, as well as with @awight's suggestion at T164796#3560530 of breaking out the change tags rollup.

Thu, Oct 12, 6:12 AM · MW-1.31-release-notes (WMF-deploy-2017-10-03 (1.31.0-wmf.2)), User-notice, MediaWiki-extensions-WikibaseRepository, Wikidata-Sprint, Patch-For-Review, Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), DBA, Wikidata, Commons, Contributors-Team, Wikimedia-log-errors, MW-1.30-release-notes (WMF-deploy-2017-08-08_(1.30.0-wmf.13)), Russian-Sites, Wikimedia-General-or-Unknown, Performance, MediaWiki-Watchlist
awight added a comment to T175053: Make RCFilters compatible with both the old and new thresholds APIs.

I've left the beta cluster in another pickle for the night. The service is rolled back to 1.3, using scap to roll back to -r 42c56632e. Extension:ORES has the new compatibility code; however, its FetchScoreJob is broken. The request is made using the v3 API, but we parse the response as v1. I missed this in testing.

Thu, Oct 12, 5:55 AM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES
awight committed rORESDEPLOY510d415f3fb5: Update config, mostly to drop revert models (authored by awight).
Thu, Oct 12, 4:29 AM
awight committed rORESDEPLOY21c526944838: Update config, mostly to drop revert models (authored by awight).
Thu, Oct 12, 3:20 AM
awight created P6108 scap log.
Thu, Oct 12, 2:37 AM
awight committed rORESDEPLOY2eb73aa2af82: Add required scap config `keyholder_key` (authored by awight).
Thu, Oct 12, 2:26 AM

Wed, Oct 11

awight added a comment to T164796: Very long search times on RC Page for "Very likely good faith" + "Likely have problems" (on en.wiki only?).

@Bawolff Thanks for taking a look! I think you're right that my initial analysis was wrong, and I was just seeing cached/paged-in improvements. I made a pretty random guess in related bug T176456#3661739, that the multi-column indexes were forcing a suboptimal ordering of the conditions, but I'll also experiment with your suggestions.

Wed, Oct 11, 10:50 PM · MW-1.31-release-notes (WMF-deploy-2017-10-10 (1.31.0-wmf.3)), Patch-For-Review, Scoring-platform-team, Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), Edit-Review-Improvements-RC-Page, MediaWiki-extensions-ORES
awight added a comment to T174402: Review and fix file handle management in worker and celery processes.

@akosiaris We're now estimating that 480 workers will eventually use 220k–1.2M file handles due to a glitch in Celery that may take some time to fix. How about we set the fileno limit on the service to 2M, and continue testing?

Wed, Oct 11, 10:04 PM · Scoring-platform-team (Current), Operations, Patch-For-Review, User-Ladsgroup, ORES
awight added a comment to T177036: Clean up file handle and Redis connection management in ORES worker and celery processes.

By the N^2 guess, we would need 480 x 480 = 230,400 file handles. Cool, so let's take the max of our guesses and try a limit of 2M?

Wed, Oct 11, 9:30 PM · Scoring-platform-team, ORES
awight added a comment to T177036: Clean up file handle and Redis connection management in ORES worker and celery processes.

It looks like this is a known pattern in how the number of connections between processes grows, cf. the "handshake problem", if each process holds a full set of connections to all previously launched workers. That count would increase as the triangular numbers. The 8th triangular number is 36, the 32nd is 528, and the 480th is 115,440, so not a perfect fit, but it looks like a closely related curve.
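The two growth guesses can be checked with a few lines of arithmetic. This is only a sketch of the formulas: triangular is a hypothetical helper name, and the worker counts are the ones mentioned in this thread.

```python
def triangular(n):
    """n-th triangular number: 1 + 2 + ... + n."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    for workers in (8, 32, 480):
        # Columns: worker count, triangular guess, N^2 guess.
        print(workers, triangular(workers), workers * workers)
    # 8 36 64
    # 32 528 1024
    # 480 115440 230400
```

The triangular figure is always roughly half of the N^2 figure, so either guess leads to the same order of magnitude for a limit.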

Wed, Oct 11, 9:14 PM · Scoring-platform-team, ORES
awight updated subscribers of T175180: Deploy ORES (revscoring 2.0).
Wed, Oct 11, 5:44 PM · Patch-For-Review, ORES, Scoring-platform-team (Current)
awight added a comment to T177967: FetchScoreJob is trying to update scores for nonexistent models.

Probably unrelated, Beta Redis needs some more configuration?

Wed, Oct 11, 5:13 PM · Scoring-platform-team
awight created T177967: FetchScoreJob is trying to update scores for nonexistent models.
Wed, Oct 11, 5:11 PM · Scoring-platform-team
awight added a comment to T175053: Make RCFilters compatible with both the old and new thresholds APIs.

JobRunner exception coming from the new code,

2017-10-11 06:52:30 [Wd2jjQpEFhUAAELgsiMAAAAE] deployment-jobrunner02 wikidatawiki 1.31.0-alpha exception ERROR: [Wd2jjQpEFhUAAELgsiMAAAAE] /rpc/RunJobs.php?wiki=wikidatawiki&type=ORESFetchScoreJob&maxtime=30&maxmem=300M   RuntimeException from line 172 of /srv/mediawiki/php-master/extensions/ORES/includes/Cache.php: No model available for [models] {"exception_id":"Wd2jjQpEFhUAAELgsiMAAAAE","exception_url":"/rpc/RunJobs.php?wiki=wikidatawiki&type=ORESFetchScoreJob&maxtime=30&maxmem=300M","caught_by":"mwe_handler"}
[Exception RuntimeException] (/srv/mediawiki/php-master/extensions/ORES/includes/Cache.php:172) No model available for [models]
  #0 /srv/mediawiki/php-master/extensions/ORES/includes/Cache.php(207): ORES\Cache->getModelId(string)
  #1 /srv/mediawiki/php-master/extensions/ORES/includes/Cache.php(49): ORES\Cache->processRevision(array, string, array)
  #2 /srv/mediawiki/php-master/extensions/ORES/includes/FetchScoreJob.php(72): ORES\Cache->storeScores(array)
  #3 /srv/mediawiki/php-master/includes/jobqueue/JobRunner.php(295): ORES\FetchScoreJob->run()
  #4 /srv/mediawiki/php-master/includes/jobqueue/JobRunner.php(193): JobRunner->executeJob(ORES\FetchScoreJob, Wikimedia\Rdbms\LBFactoryMulti, BufferingStatsdDataFactory, integer)
  #5 /srv/mediawiki/rpc/RunJobs.php(47): JobRunner->run(array)
  #6 {main}
Wed, Oct 11, 5:03 PM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES

Tue, Oct 10

awight added a comment to T175053: Make RCFilters compatible with both the old and new thresholds APIs.

The ORES extension code merged to master doesn't quite check out on beta Wikipedia. It seems to work still, although I can't say why. There seems to be no attempt to use the new-style API, and caching doesn't seem to work because there are requests for test_stats every few seconds.

Tue, Oct 10, 9:50 PM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES
awight moved T175053: Make RCFilters compatible with both the old and new thresholds APIs from Review to Active on the Scoring-platform-team (Current) board.
Tue, Oct 10, 9:47 PM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES
GitHub <noreply@github.com> committed rOEQ892dcd631055: Merge 87c57003d3dcc6989cbfb5ff86b9f11908565b64 into… (authored by awight).
Tue, Oct 10, 7:38 PM

Fri, Oct 6

awight moved T177544: Revscoring 2.0 takes up too much memory from Review to Pending deployment on the Scoring-platform-team (Current) board.
Fri, Oct 6, 7:53 PM · Scoring-platform-team (Current), ORES, revscoring, artificial-intelligence
awight moved T177636: Reduce label_thresholds granularity from Review to Pending deployment on the Scoring-platform-team (Current) board.
Fri, Oct 6, 7:53 PM · Scoring-platform-team (Current), ORES, revscoring, artificial-intelligence
awight moved T159105: ORES services should have vagrant roles from Active to Done on the Scoring-platform-team (Current) board.
Fri, Oct 6, 7:46 PM · Patch-For-Review, Scoring-platform-team (Current), Wikilabels, ORES, MediaWiki-Vagrant
awight moved T153152: Design JADE data storage schema from Active to Review on the Scoring-platform-team (Current) board.
Fri, Oct 6, 7:46 PM · Scoring-platform-team (Current), ORES
awight closed T175736: Give ores admins read access to /srv/log/ores/main.log* as Resolved.
Fri, Oct 6, 7:46 PM · Patch-For-Review, Operations, Scoring-platform-team (Current), ORES
awight closed T175736: Give ores admins read access to /srv/log/ores/main.log*, a subtask of T169246: Stress/capacity test new ores* cluster, as Resolved.
Fri, Oct 6, 7:46 PM · User-Ladsgroup, Operations, User-Joe, Scoring-platform-team (Current), Patch-For-Review, ORES
awight moved T175736: Give ores admins read access to /srv/log/ores/main.log* from Active to Done on the Scoring-platform-team (Current) board.
Fri, Oct 6, 7:46 PM · Patch-For-Review, Operations, Scoring-platform-team (Current), ORES
awight moved T177544: Revscoring 2.0 takes up too much memory from Active to Review on the Scoring-platform-team (Current) board.
Fri, Oct 6, 7:44 PM · Scoring-platform-team (Current), ORES, revscoring, artificial-intelligence
awight moved T177636: Reduce label_thresholds granularity from Active to Review on the Scoring-platform-team (Current) board.
Fri, Oct 6, 7:44 PM · Scoring-platform-team (Current), ORES, revscoring, artificial-intelligence
awight triaged T177649: revscoring model_info display should include target prediction value as Lowest priority.
Fri, Oct 6, 6:40 PM · Scoring-platform-team
awight created T177649: revscoring model_info display should include target prediction value.
Fri, Oct 6, 6:39 PM · Scoring-platform-team
awight created T177636: Reduce label_thresholds granularity.
Fri, Oct 6, 5:18 PM · Scoring-platform-team (Current), ORES, revscoring, artificial-intelligence
awight added a comment to T177544: Revscoring 2.0 takes up too much memory.

Just playing around, I dumped the thresholds table to json:

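import json
from revscoring import Model  # import not shown in the original session

m = Model.load(open("models/enwiki.damaging.gradient_boosting.model", "r"))
# Serialize the full label_thresholds table to a JSON string.
o = json.dumps(m.info['statistics'].label_thresholds.format(formatting='json'))
with open("out", "w") as f:
    f.write(o)  # returned 3598434, i.e. about 3.6 MB of JSON written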
Fri, Oct 6, 4:13 PM · Scoring-platform-team (Current), ORES, revscoring, artificial-intelligence
awight added a comment to T175053: Make RCFilters compatible with both the old and new thresholds APIs.

The code is in good shape and ready for another review.

Fri, Oct 6, 1:43 AM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES

Thu, Oct 5

awight moved T174402: Review and fix file handle management in worker and celery processes from Done to Active on the Scoring-platform-team (Current) board.
Thu, Oct 5, 11:23 PM · Scoring-platform-team (Current), Operations, Patch-For-Review, User-Ladsgroup, ORES
awight reopened T174402: Review and fix file handle management in worker and celery processes as "Open".

Reopening, I saw this error kill the celery worker on a few of our nodes during stress testing.

Thu, Oct 5, 11:23 PM · Scoring-platform-team (Current), Operations, Patch-For-Review, User-Ladsgroup, ORES
awight reopened T174402: Review and fix file handle management in worker and celery processes, a subtask of T169246: Stress/capacity test new ores* cluster, as Open.
Thu, Oct 5, 11:22 PM · User-Ladsgroup, Operations, User-Joe, Scoring-platform-team (Current), Patch-For-Review, ORES
awight added a comment to T169246: Stress/capacity test new ores* cluster.

Ran a few tests today, and found that the filehandle issue is not solved. The celery service died on several nodes, with the file descriptor error. Good news is that even with a degraded cluster, we can easily serve 2k requests/minute, at 6% CPU usage per machine.

Thu, Oct 5, 11:22 PM · User-Ladsgroup, Operations, User-Joe, Scoring-platform-team (Current), Patch-For-Review, ORES
awight created P6087 (An Untitled Masterwork).
Thu, Oct 5, 11:21 PM
awight added a comment to T176456: ORES on Watchlist causes big slowdown—especially with 'Last revision' filter turned on.

I'll just admit right here that I'm not particularly good at query optimization. But still, I'll go ahead and make the obvious comment here. I'm not sure we're using multi-column indexes correctly. For example, this is the only index that covers the oresc_probability column,

CREATE INDEX /*i*/oresc_model_class_prob ON /*_*/ores_classification (oresc_model, oresc_class, oresc_probability);

The order of columns matters here, so in my understanding, we can only filter by probability after we've already narrowed by model and class. Similarly, we can't filter by class until model is locked in. Something like that. So I'm imagining that our multi-column key is causing the conditions to be evaluated in a difficult order.
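A small experiment along these lines, sketched with SQLite from Python's standard library purely because it runs anywhere. The production database is MySQL, whose planner differs, and the table here is a simplified stand-in for ores_classification, so treat the output as an illustration of the leftmost-prefix idea rather than evidence about the real query.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Simplified stand-in schema; column types are assumptions.
conn.execute(
    "CREATE TABLE ores_classification ("
    "oresc_rev INTEGER, oresc_model INTEGER, "
    "oresc_class INTEGER, oresc_probability REAL)"
)
conn.execute(
    "CREATE INDEX oresc_model_class_prob ON ores_classification "
    "(oresc_model, oresc_class, oresc_probability)"
)

# All three index columns constrained, in index order: an index search.
full_prefix = query_plan(
    conn,
    "SELECT oresc_rev FROM ores_classification "
    "WHERE oresc_model = 1 AND oresc_class = 1 AND oresc_probability > 0.9",
)
# Only the last index column constrained: the index cannot narrow the rows.
prob_only = query_plan(
    conn,
    "SELECT oresc_rev FROM ores_classification WHERE oresc_probability > 0.9",
)

if __name__ == "__main__":
    print(full_prefix)
    print(prob_only)
```

The first plan should report a SEARCH using oresc_model_class_prob and the second a SCAN, which matches the intuition that probability can only be range-filtered through this index once model and class are fixed.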

Thu, Oct 5, 5:15 PM · User-notice-collaboration, Patch-For-Review, Collaboration-Feature-Rollouts (Collaboration-WL-Graduated-Everywhere), Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), Performance, Edit-Review-Improvements
awight added a comment to T176456: ORES on Watchlist causes big slowdown—especially with 'Last revision' filter turned on.

There's a shortcut: add ?rcfilters=1 to the URL.

Thu, Oct 5, 3:49 AM · User-notice-collaboration, Patch-For-Review, Collaboration-Feature-Rollouts (Collaboration-WL-Graduated-Everywhere), Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), Performance, Edit-Review-Improvements

Wed, Oct 4

awight added a comment to T177440: Determine on which wikis ORES filters cause significant Watchlist slowdowns.

@jmatazzoni Can you confirm whether ORES filters would be omitted just on the Watchlist page, or from all RCFilters displays? If the latter, this seems dire and perhaps it's better to delay the RCFilters deployment?

Wed, Oct 4, 8:13 PM · Collaboration-Feature-Rollouts (Collaboration-WL-Graduated-Everywhere), Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), Performance, Edit-Review-Improvements
awight added a comment to T176456: ORES on Watchlist causes big slowdown—especially with 'Last revision' filter turned on.

I told @jmatazzoni I would try to take a look at this, by capturing the query on my mw-vagrant dev wiki and then explaining the query plan on production MySQL. However, I'm stuck at making RCFilters appear on my Watchlist page. I've enabled $wgStructuredChangeFiltersOnWatchlist, I see new filters on my Recent Changes page, but nothing on WL yet. Anyone able to help me?

Wed, Oct 4, 8:09 PM · User-notice-collaboration, Patch-For-Review, Collaboration-Feature-Rollouts (Collaboration-WL-Graduated-Everywhere), Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), Performance, Edit-Review-Improvements
awight created T177421: Cached thresholds should be invalidated for new model versions..
Wed, Oct 4, 5:03 PM · MediaWiki-extensions-ORES, Scoring-platform-team
awight added a comment to T175053: Make RCFilters compatible with both the old and new thresholds APIs.

Looks like there is a misunderstanding here. There are not "TWO breaking changes". The only breaking change is how model information is reported.

Wed, Oct 4, 4:36 PM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES

Tue, Oct 3

awight added a comment to T175053: Make RCFilters compatible with both the old and new thresholds APIs.

A bit of discussion from IRC,

[18:12:46] <awight|afk>	 RoanKattouw: o/ I realized the thresholds deployment will still require down time.  Donno why I keep overlooking this key point:
[18:13:10] <awight>	 revscoring 2.x breaks the v1 thresholds API
[18:13:18] <RoanKattouw>	 Wait what
[18:13:23] <RoanKattouw>	 There are TWO breaking changes?
[18:13:31] <awight>	 Which means, no matter what back compat exists in ext-ORES we’re hosed
[18:13:45] <awight>	 lemme double-check that now
[18:14:06] <awight>	 https://ores-misc.wmflabs.org/v2/scores/enwiki/draftquality/?model_info=test_stats
[18:14:29] <awight>	 That’s the old thresholds style, called against revscoring 2.0
[18:15:18] <awight>	 However, that specific bug is masking what the server *should* do.  I’ll see if I can patch the bug and make the old-style call.
[18:15:30] <awight>	 (working on that under T176830)
[18:15:30] <stashbot>	 T176830: Bug: ORES thresholds API fails - https://phabricator.wikimedia.org/T176830
[18:19:24] <awight>	 RoanKattouw: Down time wouldn’t be too horrible, I would disable the ORES UI by config and somehow stop the FetchScoreJob.  Not totally sure how to backfill the missed RCs, but that might be an acceptable loss.
[18:19:50] <RoanKattouw>	 There's a script for that
[18:19:54] <RoanKattouw>	 Backfilling I mean
[18:20:17] <RoanKattouw>	 Also we drop scores on RCs every now and then already, presumably due to ORES errors/timeouts
[18:22:40] <awight>	 k thanks for knowing that.  I won’t worry too much about downtime, if that turns out to be necessary.
Tue, Oct 3, 9:56 PM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES
awight updated the task description for T174629: Use Themis bias detector on edit quality and draft quality models.
Tue, Oct 3, 8:21 PM · Scoring-platform-team

Mon, Oct 2

awight reassigned T166235: Flagged revs approve model to fiwiki from awight to Halfak.
Mon, Oct 2, 4:45 PM · Scoring-platform-team (Current), User-Ladsgroup, artificial-intelligence, editquality-modeling
awight added a comment to T169969: Regularly purge old ores graphite metrics.

Copying an inventory of metrics here might help.

Mon, Oct 2, 4:42 PM · Scoring-platform-team (Current), ORES, User-fgiunchedi, Operations, Graphite
awight added a comment to T174685: Create list of ORES collaborators (focus on language asset helpers).

@awight to provide list of people submitting false positive reports, and Phabricator queries to identify wiki community members participating there.

Mon, Oct 2, 4:41 PM · JADE, Scoring-platform-team (Current)
awight added a comment to T174558: Deploy damaging/goodfaith model for svwiki.

We aren't blocked on deploying to labs (@Halfak).

Mon, Oct 2, 4:39 PM · Patch-For-Review, WMSE-Development-Support-2017 (Support-for-fighting-vandalism), Scoring-platform-team (Current), editquality-modeling, ORES, artificial-intelligence
awight moved T174558: Deploy damaging/goodfaith model for svwiki from Active to Review on the Scoring-platform-team (Current) board.
Mon, Oct 2, 4:38 PM · Patch-For-Review, WMSE-Development-Support-2017 (Support-for-fighting-vandalism), Scoring-platform-team (Current), editquality-modeling, ORES, artificial-intelligence
awight edited projects for T174403: [Investigate] ORES worker threads shouldn't use Redis connection pool, added: Scoring-platform-team; removed Scoring-platform-team (Current).
Mon, Oct 2, 4:37 PM · Scoring-platform-team
awight closed T175628: Add LV dictionary to install. as Resolved.
Mon, Oct 2, 4:35 PM · Scoring-platform-team (Current), Patch-For-Review, ORES
awight added a comment to T175627: UK dictionary broken in production.

@Halfak mentioned that we might need to retrain models to use the new dict pkg.

Mon, Oct 2, 4:35 PM · Scoring-platform-team (Current), Patch-For-Review, revscoring, artificial-intelligence
awight closed T155440: Add notice to on-wiki labeling pages (e.g. en:WP:Labels) about deprecation. as Resolved.
Mon, Oct 2, 4:33 PM · Scoring-platform-team (Current), Wikilabels, User-Ladsgroup
awight moved T155440: Add notice to on-wiki labeling pages (e.g. en:WP:Labels) about deprecation. from Review to Done on the Scoring-platform-team (Current) board.
Mon, Oct 2, 4:33 PM · Scoring-platform-team (Current), Wikilabels, User-Ladsgroup
awight updated subscribers of T176134: Train & test damaging/goodfaith model for eswiki.

@Halfak Good one for you to review?

Mon, Oct 2, 4:32 PM · artificial-intelligence, User-Ladsgroup, Scoring-platform-team (Current), editquality-modeling, ORES
awight added a comment to T175053: Make RCFilters compatible with both the old and new thresholds APIs.

@Halfak is suggesting that we have the frontend fall back from v3 to v1 if it's getting bad results. This could be cached and might allow us a graceful upgrade.
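One way the fallback-with-caching idea could look, as a language-neutral sketch in Python rather than the extension's PHP. Everything here is hypothetical: fetch_with_fallback, the shape of the fetchers list, and the dict standing in for a real cache are illustrations, not ORES or ext-ORES APIs.

```python
def fetch_with_fallback(fetchers, cache):
    """Try API versions in order and remember which one worked.

    fetchers: list of (version_name, zero-argument callable) pairs,
              ordered by preference, e.g. [("v3", ...), ("v1", ...)].
    cache:    dict standing in for whatever cache the frontend uses.
    """
    # Stable sort: the cached version (if any) moves to the front,
    # everything else keeps its preference order.
    ordered = sorted(fetchers, key=lambda pair: pair[0] != cache.get("api_version"))
    last_error = None
    for version, fetch in ordered:
        try:
            result = fetch()
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
            continue
        cache["api_version"] = version
        return version, result
    raise RuntimeError("all API versions failed") from last_error
```

Caching the working version means only the first request after an upgrade or rollback pays for the failed attempt. The cache entry would need an expiry so the frontend eventually retries the preferred version.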

Mon, Oct 2, 4:30 PM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES
awight added a comment to T175053: Make RCFilters compatible with both the old and new thresholds APIs.

@Halfak This is the MW thresholds work.

Mon, Oct 2, 4:05 PM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES

Thu, Sep 28

awight added a comment to T176805: Grant Zppix access to ORES project.

@Zppix We discussed this a bit and are agreed that the shared labs VPS is not a good place to do development.

Thu, Sep 28, 11:54 PM · User-Zppix, Scoring-platform-team
awight closed T176830: Bug: ORES thresholds API fails as Invalid.

I was missing the models parameter.

Thu, Sep 28, 11:51 PM · Scoring-platform-team
awight created T177036: Clean up file handle and Redis connection management in ORES worker and celery processes.
Thu, Sep 28, 11:48 PM · Scoring-platform-team, ORES
awight closed T174402: Review and fix file handle management in worker and celery processes as Resolved.

@akosiaris This fixed the problem, thanks!

Thu, Sep 28, 11:47 PM · Scoring-platform-team (Current), Operations, Patch-For-Review, User-Ladsgroup, ORES
awight closed T174402: Review and fix file handle management in worker and celery processes, a subtask of T169246: Stress/capacity test new ores* cluster, as Resolved.
Thu, Sep 28, 11:47 PM · User-Ladsgroup, Operations, User-Joe, Scoring-platform-team (Current), Patch-For-Review, ORES
awight updated the task description for T158909: Automatically detect spambot registration using machine learning (like invisible reCAPTCHA) .
Thu, Sep 28, 11:25 PM · Patch-For-Review, Outreachy (Round-15), Outreach-Programs-Projects, Stewards-and-global-tools, User-Tgr, MediaWiki-extension-requests, artificial-intelligence
awight created T177034: Outreachy microtask: write a CAPTCHA plugin that can fall back to another algorithm.
Thu, Sep 28, 11:25 PM · ConfirmEdit (CAPTCHA extension), Outreachy
awight updated the task description for T158909: Automatically detect spambot registration using machine learning (like invisible reCAPTCHA) .
Thu, Sep 28, 11:02 PM · Patch-For-Review, Outreachy (Round-15), Outreach-Programs-Projects, Stewards-and-global-tools, User-Tgr, MediaWiki-extension-requests, artificial-intelligence
awight created T177033: Outreachy microtask: analyze sample mouse movement data and extract feature vectors.
Thu, Sep 28, 11:01 PM · Outreachy
awight awarded T175454: PAWS - Redirect loop detected a Baby Tequila token.
Thu, Sep 28, 9:55 PM · PAWS
awight moved T175053: Make RCFilters compatible with both the old and new thresholds APIs from Active to Review on the Scoring-platform-team (Current) board.
Thu, Sep 28, 9:30 PM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES
awight added a comment to T175053: Make RCFilters compatible with both the old and new thresholds APIs.

Actually, Revscoring 2.x doesn't support the API v1 stats so this change cannot be gracefully deployed. We'll need to schedule an hour or two of downtime to deploy.

Thu, Sep 28, 12:23 AM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES

Wed, Sep 27

awight created T176914: Wire statistics into test model included with our Vagrant role..
Wed, Sep 27, 9:45 PM · ORES, Scoring-platform-team
awight added a comment to T176456: ORES on Watchlist causes big slowdown—especially with 'Last revision' filter turned on.

This may be a duplicate of T168096.

Wed, Sep 27, 5:53 PM · User-notice-collaboration, Patch-For-Review, Collaboration-Feature-Rollouts (Collaboration-WL-Graduated-Everywhere), Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), Performance, Edit-Review-Improvements
awight added a comment to T175053: Make RCFilters compatible with both the old and new thresholds APIs.

Back compat turned out to be easy :-D.

Wed, Sep 27, 7:46 AM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES
awight updated the task description for T176830: Bug: ORES thresholds API fails.
Wed, Sep 27, 7:08 AM · Scoring-platform-team
awight added a comment to T176830: Bug: ORES thresholds API fails.

I also run into consistent crashes when running the master code, trying simple requests like model_info=statistics.

Wed, Sep 27, 12:31 AM · Scoring-platform-team
awight created T176830: Bug: ORES thresholds API fails.
Wed, Sep 27, 12:31 AM · Scoring-platform-team
awight added a comment to T175053: Make RCFilters compatible with both the old and new thresholds APIs.

No apologies needed, it's shallow water under the bridge! I'm mostly just taking notes about the horrors that lie ahead :-). FWIW, I think it was a perfectly reasonable call to break API v1 and v2 compatibility, while we're the only consumer of the broken feature. But I did want to call it out and encourage us to either maintain or deprecate in the future.

Wed, Sep 27, 12:25 AM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES

Tue, Sep 26

awight added a comment to T175053: Make RCFilters compatible with both the old and new thresholds APIs.

Deploying this without a break in service might be tricky. The new API cannot return the old format test_stats, even on the v1 route, so the ext-ORES code will have to accept both old and new formats, and somehow automatically detect which request format is appropriate. It would make this easier if we could have both revscoring 1 and 2 servers available in production, and switch ext-ORES between them using configuration. IMO we've broken the concept of versioned API endpoints.

Tue, Sep 26, 11:56 PM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES
awight moved T175053: Make RCFilters compatible with both the old and new thresholds APIs from Monitor (long term) to Active on the Scoring-platform-team (Current) board.
Tue, Sep 26, 10:17 PM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES
awight moved T175053: Make RCFilters compatible with both the old and new thresholds APIs from Monitor to Current on the Scoring-platform-team board.
Tue, Sep 26, 6:33 PM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES
awight triaged T175053: Make RCFilters compatible with both the old and new thresholds APIs as High priority.
Tue, Sep 26, 6:26 PM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES
awight claimed T175053: Make RCFilters compatible with both the old and new thresholds APIs.
Tue, Sep 26, 6:26 PM · MW-1.31-release-notes (WMF-deploy-2017-10-17 (1.31.0-wmf.4)), Patch-For-Review, Scoring-platform-team (Current), Collaboration-Team-Triage, ORES

Fri, Sep 22

awight added a comment to T176333: Deploy JADE MVP API in labs.

Regarding the comment suppression concerns, it might be fair to have our service create Flow posts from the comments, then store the URL in our database. This might even be acceptable in the medium-term, beyond MVP.

Fri, Sep 22, 8:08 AM · Scoring-platform-team (Current), Epic, JADE
awight added a comment to T176333: Deploy JADE MVP API in labs.

This is great, thanks for roughing out the MVP!

Fri, Sep 22, 8:00 AM · Scoring-platform-team (Current), Epic, JADE

Sep 15 2017

awight added a comment to T174402: Review and fix file handle management in worker and celery processes.

Here's where it gets crazy, though. I was able to clone the deployment directory and run the celery server as my user, with horrifying results. It ran!

sudo lsof | wc -l
54280
Sep 15 2017, 2:22 AM · Scoring-platform-team (Current), Operations, Patch-For-Review, User-Ladsgroup, ORES
awight added a comment to T174402: Review and fix file handle management in worker and celery processes.

Fruitless adventure into the bowels of os.pipe

Sep 15 2017, 1:54 AM · Scoring-platform-team (Current), Operations, Patch-For-Review, User-Ladsgroup, ORES
awight added a comment to T174402: Review and fix file handle management in worker and celery processes.

Thanks for correcting my misunderstanding of ulimit!

Sep 15 2017, 1:53 AM · Scoring-platform-team (Current), Operations, Patch-For-Review, User-Ladsgroup, ORES
awight created P6011 Test for pipe2.
Sep 15 2017, 1:05 AM

Sep 14 2017

awight renamed T175875: Celery task pool doesn't degrade nicely. from [Investigate] Does alarm timeout break Celery? to Celery task pool doesn't degrade nicely..
Sep 14 2017, 4:18 AM · Scoring-platform-team
awight added a comment to T174402: Review and fix file handle management in worker and celery processes.

I think I've been wrong all this time. The system's total open file count is suspiciously close to the configured limit of 65536.

@ores1002:~$ sudo lsof | wc -l
54246
@ores1002:~$ sudo service celery-ores-worker restart;
@ores1002:~$ sudo lsof | wc -l
60007
@ores1002:~$ sudo lsof | wc -l
60127
@ores1002:~$ sudo lsof | wc -l
60278
Sep 14 2017, 4:03 AM · Scoring-platform-team (Current), Operations, Patch-For-Review, User-Ladsgroup, ORES
awight added a comment to T175875: Celery task pool doesn't degrade nicely..

This might be invalid, I need to set more appropriate limits and see how the pool behaves.

Sep 14 2017, 12:00 AM · Scoring-platform-team

Sep 13 2017

awight created T175875: Celery task pool doesn't degrade nicely..
Sep 13 2017, 11:09 PM · Scoring-platform-team