
Halfak (Aaron Halfaker, EpochFail, halfak)
Principal Research Scientist

Projects (20)

User Details

User Since
Oct 21 2014, 6:05 PM (277 w, 5 d)
Availability
Available
IRC Nick
halfak
LDAP User
Halfak
MediaWiki User
EpochFail [ Global Accounts ]

Hi! I'm a socio-technologist. I do science so that I can build new technologies for social systems.

You can find me as:

Recent Activity

Fri, Feb 14

Halfak added a comment to T245311: Address Jade UI issues. .

For the help information for edit quality:

Fri, Feb 14, 10:18 PM · Jade, Scoring-platform-team (Current)
Halfak added a comment to T245311: Address Jade UI issues. .

Looks like the message we want for U1 is jade-nochange.

Fri, Feb 14, 10:10 PM · Jade, Scoring-platform-team (Current)
Halfak created T245311: Address Jade UI issues. .
Fri, Feb 14, 9:20 PM · Jade, Scoring-platform-team (Current)

Thu, Feb 13

Halfak added a comment to T234839: Review Adam's Topic Dataset.

In our current taxonomy, here are the projects related to "society":

  • WikiProject Awards
  • WikiProject Gender Studies
  • WikiProject LGBT studies
  • WikiProject Modern Western Europe
  • WikiProject Pakistani history
  • WikiProject Russian history
  • WikiProject Sexology and sexuality
  • WikiProject Ageing and culture
  • WikiProject Agriculture
  • WikiProject Alternative views
  • WikiProject Animal rights
  • WikiProject Arab world
  • WikiProject Corruption
  • WikiProject Cultural Evolution
  • WikiProject Disability
  • WikiProject Environment
  • WikiProject Fisheries and Fishing
  • WikiProject Forestry
  • WikiProject Globalization
  • WikiProject Home Living
  • WikiProject Human rights
  • WikiProject Human Rights in Sri Lanka
  • WikiProject Nonviolence
  • WikiProject Ethnic groups
  • WikiProject African diaspora
  • WikiProject Asian Americans
  • WikiProject Anthropology
  • WikiProject Assyria
  • WikiProject Azerbaijan
  • WikiProject Basque
  • WikiProject Berbers
  • WikiProject Clans of Scotland
  • WikiProject Igbo
  • WikiProject Indian caste system
  • WikiProject Franco-Americans
  • WikiProject Pashtun
  • WikiProject Taiwan
  • WikiProject Tamil civilization
  • WikiProject Israel Palestine Collaboration
  • WikiProject Sociology
  • WikiProject Feminism
Thu, Feb 13, 4:13 PM · Product-Analytics
Halfak added a comment to T233448: Review prometheus ORES rules for completeness.

OK, that sounds good. What are the next steps for updating the dashboards? How do we map our current metrics onto Prometheus-generated metrics?

Thu, Feb 13, 2:31 PM · Patch-For-Review, ORES, Scoring-platform-team

Wed, Feb 12

Halfak added a comment to T209884: Content quality scale translatable strings might not work as implemented.

Enwiki has a 6-item scale.
Wikidatawiki has a 5-item scale.

Wed, Feb 12, 9:59 PM · I18n, Scoring-platform-team, Jade
Halfak closed T210804: Regression: Judgment validation allows for multiple judgments with the same value e.g. 2x {damaging, badfaith} as Resolved.
Wed, Feb 12, 9:58 PM · Scoring-platform-team (Current), Regression, Jade
Halfak moved T210804: Regression: Judgment validation allows for multiple judgments with the same value e.g. 2x {damaging, badfaith} from Active to Done on the Scoring-platform-team (Current) board.
Wed, Feb 12, 9:58 PM · Scoring-platform-team (Current), Regression, Jade
Halfak edited projects for T210804: Regression: Judgment validation allows for multiple judgments with the same value e.g. 2x {damaging, badfaith}, added: Scoring-platform-team (Current); removed Scoring-platform-team.
Wed, Feb 12, 9:57 PM · Scoring-platform-team (Current), Regression, Jade
Halfak closed T235183: Experiment with different vector lengths for ar, cs, en, and kowiki topic models. as Resolved.
Wed, Feb 12, 9:56 PM · Scoring-platform-team (Current), artificial-intelligence, drafttopic-modeling, revscoring
Halfak closed T235183: Experiment with different vector lengths for ar, cs, en, and kowiki topic models. , a subtask of T235181: Build WikiProject directory topic models for ar, cs, and kowiki, as Resolved.
Wed, Feb 12, 9:56 PM · Scoring-platform-team (Current), artificial-intelligence, drafttopic-modeling, revscoring
Halfak edited projects for T235183: Experiment with different vector lengths for ar, cs, en, and kowiki topic models. , added: Scoring-platform-team (Current); removed Scoring-platform-team.
Wed, Feb 12, 9:56 PM · Scoring-platform-team (Current), artificial-intelligence, drafttopic-modeling, revscoring
Halfak added a parent task for T222271: Document and share operational details of ores-support-checklist: T245068: Add topic information to the ores-support-checklist.
Wed, Feb 12, 9:55 PM · ORES-Support-Checklist, Scoring-platform-team
Halfak added a subtask for T245068: Add topic information to the ores-support-checklist: T222271: Document and share operational details of ores-support-checklist.
Wed, Feb 12, 9:55 PM · drafttopic-modeling, Scoring-platform-team
Halfak created T245068: Add topic information to the ores-support-checklist.
Wed, Feb 12, 9:55 PM · drafttopic-modeling, Scoring-platform-team
Halfak closed T229401: Key-value extraction misses on Wikipedia:WikiProject Council/Directory/WikiProject template invocations as Declined.

Thanks for your work on this, @dr0ptp4kt. Since we transitioned to a manual taxonomy, this isn't a problem anymore, so I'd like to decline this task.

Wed, Feb 12, 9:51 PM · Patch-For-Review, drafttopic-modeling, Scoring-platform-team
Halfak added a comment to T233448: Review prometheus ORES rules for completeness.

Should we do the cleanup that I proposed above?

Wed, Feb 12, 9:50 PM · Patch-For-Review, ORES, Scoring-platform-team
Halfak lowered the priority of T223313: Add legal language to wikilabels from High to Low.
Wed, Feb 12, 9:47 PM · WMF-Legal, Wikilabels, Scoring-platform-team
Halfak added a comment to T223313: Add legal language to wikilabels.

Ping.

Wed, Feb 12, 9:47 PM · WMF-Legal, Wikilabels, Scoring-platform-team
Halfak edited projects for T180822: Improve ORES articlequality feature extraction for images, added: Scoring-platform-team (Current); removed Scoring-platform-team.
Wed, Feb 12, 9:44 PM · Scoring-platform-team (Current), artificial-intelligence, articlequality-modeling
Halfak lowered the priority of T221640: Move fiwiki from custom to config-based Makfile from Medium to Low.
Wed, Feb 12, 9:44 PM · Scoring-platform-team, artificial-intelligence, editquality-modeling
Halfak moved T242013: Implement native NN model in revscoring from Untriaged to Research on the Scoring-platform-team board.
Wed, Feb 12, 9:43 PM · Scoring-platform-team (Research), artificial-intelligence, revscoring
Halfak moved T242013: Implement native NN model in revscoring from Untriaged to Research on the Scoring-platform-team board.
Wed, Feb 12, 9:42 PM · Scoring-platform-team (Research), artificial-intelligence, revscoring
Halfak renamed T242013: Implement native NN model in revscoring from Implement native NN topic model in revscoring to Implement native NN model in revscoring.
Wed, Feb 12, 9:42 PM · Scoring-platform-team (Research), artificial-intelligence, revscoring
Halfak moved T243357: Once the ORES drafttopic - ElasticSearch pipeline is set up, update data about all articles from Untriaged to Monitor on the Scoring-platform-team board.
Wed, Feb 12, 9:41 PM · Scoring-platform-team, Discovery-Search, Growth-Team (Current Sprint), NewcomerTasks 1.1
Halfak moved T243359: Define configuration for ORES articletopic search from Untriaged to Monitor on the Scoring-platform-team board.
Wed, Feb 12, 9:40 PM · Patch-For-Review, NewcomerTasks 1.1, Scoring-platform-team, Growth-Team (Current Sprint)
Halfak added a comment to T243359: Define configuration for ORES articletopic search.

The models use the exact same names in other wikis. There will need to be a localized mapping at the UI level.

Wed, Feb 12, 9:40 PM · Patch-For-Review, NewcomerTasks 1.1, Scoring-platform-team, Growth-Team (Current Sprint)
Halfak moved T244039: Api tests: Hard deprecate $this->doLogin, remove calls in favor of passing a user where needed from Untriaged to Monitor on the Scoring-platform-team board.
Wed, Feb 12, 9:39 PM · MW-1.35-notes (1.35.0-wmf.20; 2020-02-18), Patch-For-Review, Scoring-platform-team, Wikidata, Growth-Team, TitleBlacklist, TimedMediaHandler, ORES, MediaWiki-extensions-CodeReview, MediaWiki-extensions-BounceHandler, MediaWiki-extensions-WikibaseRepository, Thanks, MediaWiki-extensions-Newsletter, MassMessage, Technical-Debt (Deprecation process), MediaWiki-General, User-DannyS712
Halfak added a comment to T244192: Newcomer tasks: ORES ontology mapping and score thresholds.

Looks like this is done from the Scoring-platform-team side. Let me know if you need anything else from us.

Wed, Feb 12, 9:38 PM · Scoring-platform-team (Current), Discovery-Search (Current work), Growth-Team (Current Sprint)
Halfak moved T244192: Newcomer tasks: ORES ontology mapping and score thresholds from Active to Done on the Scoring-platform-team (Current) board.
Wed, Feb 12, 9:37 PM · Scoring-platform-team (Current), Discovery-Search (Current work), Growth-Team (Current Sprint)
Halfak edited projects for T244192: Newcomer tasks: ORES ontology mapping and score thresholds, added: Scoring-platform-team (Current); removed Scoring-platform-team.
Wed, Feb 12, 9:37 PM · Scoring-platform-team (Current), Discovery-Search (Current work), Growth-Team (Current Sprint)
Halfak edited projects for T244297: Newcomer tasks: set initial thresholds for ORES articletopic, added: Scoring-platform-team (Current); removed Scoring-platform-team.
Wed, Feb 12, 9:37 PM · Scoring-platform-team (Current), Patch-For-Review, Discovery-Search (Current work), Growth-Team (Current Sprint)
Halfak moved T244421: Newcomer tasks: UX changes for ORES topics from Untriaged to Monitor on the Scoring-platform-team board.
Wed, Feb 12, 9:36 PM · Growth Design, Growth-Team (Current Sprint), Scoring-platform-team, Discovery-Search
Halfak moved T244569: SpecialRecentChanges::doMainQuery needs tunning from Untriaged to Monitor on the Scoring-platform-team board.
Wed, Feb 12, 9:36 PM · Growth-Team, Core Platform Team Workboards (Clinic Duty Team), Scoring-platform-team, Performance Issue, MediaWiki-Special-pages, ORES
Halfak added a comment to T244569: SpecialRecentChanges::doMainQuery needs tunning.

Adding Growth-Team because this is related to their RecentChanges Filters.

Wed, Feb 12, 9:36 PM · Growth-Team, Core Platform Team Workboards (Clinic Duty Team), Scoring-platform-team, Performance Issue, MediaWiki-Special-pages, ORES
Halfak added a project to T244569: SpecialRecentChanges::doMainQuery needs tunning: Growth-Team.
Wed, Feb 12, 9:35 PM · Growth-Team, Core Platform Team Workboards (Clinic Duty Team), Scoring-platform-team, Performance Issue, MediaWiki-Special-pages, ORES
Halfak added a comment to T242705: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart).

Here's an strace of one of the child processes that goes berserk:

Wed, Feb 12, 8:35 PM · Scoring-platform-team (Current), Operations, ORES
Halfak added a comment to T242705: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart).

I tried removing the --die-on-term option and I get the same behavior.

Wed, Feb 12, 7:56 PM · Scoring-platform-team (Current), Operations, ORES
Halfak added a comment to T242705: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart).

From https://uwsgi-docs.readthedocs.io/en/latest/ThingsToKnow.html

To shutdown uWSGI use SIGINT or SIGQUIT instead. If you absolutely can not live with uWSGI being so disrespectful towards SIGTERM, by all means enable the die-on-term option.

Wed, Feb 12, 7:52 PM · Scoring-platform-team (Current), Operations, ORES
Halfak added a comment to T242705: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart).

OK, so I've done some tests. It's clear that we can see this CPU/memory spike when shutting down uwsgi. Essentially, all of the child processes (workers) suddenly use as much CPU as they can. top doesn't report more memory being used by the workers, but it does report a precipitous drop in available memory while shutting down. Further, the logs report:

Wed, Feb 12, 7:43 PM · Scoring-platform-team (Current), Operations, ORES
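To put numbers on the drop in available memory that top shows, one could sample the kernel's `MemAvailable` estimate from `/proc/meminfo` while the stop is in progress. A minimal Linux-only sketch; the helper names and the sampling interval are hypothetical:

```python
import time


def mem_available_kib(path: str = "/proc/meminfo") -> int:
    """Return the kernel's MemAvailable estimate in KiB (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                # Line format: "MemAvailable:   12345678 kB"
                return int(line.split()[1])
    raise RuntimeError("MemAvailable not found in " + path)


def sample_mem_available(duration: float, interval: float = 0.5) -> list:
    """Record (elapsed_seconds, MemAvailable_KiB) pairs for `duration` seconds."""
    samples = []
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        samples.append((round(elapsed, 2), mem_available_kib()))
        if elapsed >= duration:
            return samples
        time.sleep(interval)
```

Starting the sampler just before stopping uwsgi and printing the samples afterwards gives a timeline of the drop to attach to the task.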
Halfak renamed T242705: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) from Ores celery OOM events in all hosts to ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart).
Wed, Feb 12, 7:37 PM · Scoring-platform-team (Current), Operations, ORES
Halfak moved T242705: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) from Pending deployment to Active on the Scoring-platform-team (Current) board.
Wed, Feb 12, 2:00 PM · Scoring-platform-team (Current), Operations, ORES

Tue, Feb 11

Halfak added a comment to T242705: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart).

Could it be that the listen queue is filling up when we stop uwsgi?

Tue, Feb 11, 3:17 PM · Scoring-platform-team (Current), Operations, ORES
Halfak added a comment to T242705: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart).

Maybe related:

Tue, Feb 11, 3:02 PM · Scoring-platform-team (Current), Operations, ORES

Mon, Feb 10

Halfak added a comment to T244297: Newcomer tasks: set initial thresholds for ORES articletopic.

That's a good question. If they are using the enwiki model (even cross-wiki), they should probably use enwiki thresholds.

Mon, Feb 10, 9:56 PM · Scoring-platform-team (Current), Patch-For-Review, Discovery-Search (Current work), Growth-Team (Current Sprint)
Halfak added a comment to T244297: Newcomer tasks: set initial thresholds for ORES articletopic.

+1 to @Tgr. "Useful threshold" depends on what you are optimizing for.

Mon, Feb 10, 9:32 PM · Scoring-platform-team (Current), Patch-For-Review, Discovery-Search (Current work), Growth-Team (Current Sprint)
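The point that a useful threshold depends on the objective can be made concrete. One common objective is "maximize recall subject to a minimum precision". A minimal, self-contained sketch over labeled scores; this is not ORES's own threshold-optimization code, and the function name and toy data are hypothetical:

```python
def max_recall_at_precision(scores, labels, min_precision):
    """Among thresholds whose precision is >= `min_precision`, return the one
    with the highest recall as (threshold, precision, recall).

    `scores` are model probabilities for the positive class; `labels` are bools.
    Returns None if no threshold reaches `min_precision`.
    """
    total_pos = sum(1 for y in labels if y)
    best = None
    for t in sorted(set(scores)):
        # Everything scoring at or above `t` is predicted positive.
        predicted = [y for s, y in zip(scores, labels) if s >= t]
        tp = sum(1 for y in predicted if y)
        precision = tp / len(predicted)
        recall = tp / total_pos if total_pos else 0.0
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (t, precision, recall)
    return best


# Toy data, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False, False]
print(max_recall_at_precision(scores, labels, 0.75))  # (0.6, 0.75, 1.0)
```

Because recall can only fall as the threshold rises, the qualifying threshold with the highest recall is also the lowest one. A different objective (e.g. a minimum recall, or maximum F1) would pick a different threshold from the same scores.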
Halfak moved T242648: Implement CSS styles for Jade Entity UI from Active to Review on the Scoring-platform-team (Current) board.
Mon, Feb 10, 5:55 PM · MW-1.35-notes (1.35.0-wmf.20; 2020-02-18), Patch-For-Review, Scoring-platform-team (Current), Design, Jade
Halfak moved T208819: Implement Jade Entity UI from Active to Review on the Scoring-platform-team (Current) board.
Mon, Feb 10, 5:54 PM · Scoring-platform-team (Current), Design, Jade
Halfak moved T205545: Add English Language idioms to revscoring from Active to Review on the Scoring-platform-team (Current) board.
Mon, Feb 10, 5:54 PM · Scoring-platform-team (Current), good first task, artificial-intelligence, articlequality-modeling, editquality-modeling, revscoring

Fri, Feb 7

Halfak moved T242705: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) from Active to Pending deployment on the Scoring-platform-team (Current) board.
Fri, Feb 7, 5:03 PM · Scoring-platform-team (Current), Operations, ORES
Halfak added a comment to T242705: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart).

I just deployed a change to beta that dramatically reduced the memory usage of uwsgi processes.

Fri, Feb 7, 5:01 PM · Scoring-platform-team (Current), Operations, ORES
Halfak created P10347 ORES beta deployment fail.
Fri, Feb 7, 3:27 PM

Thu, Feb 6

Halfak committed rORES6ed308c5150c: Adds a test to check on our build_event_set method. (authored by Halfak).
Adds a test to check on our build_event_set method.
Thu, Feb 6, 10:40 PM
Halfak committed rORESc6d4aaea8b46: Adds a test to check on our build_event_set method. (authored by Halfak).
Adds a test to check on our build_event_set method.
Thu, Feb 6, 9:36 PM
Halfak committed rORESf69989cf436b: reverts werkzeug requirement to 0.16.1 (authored by Halfak).
reverts werkzeug requirement to 0.16.1
Thu, Feb 6, 9:27 PM
Halfak committed rORESf3b1ca65bfe9: Implements ModelLoader for ScoringContext to control server/client memory (authored by Halfak).
Implements ModelLoader for ScoringContext to control server/client memory
Thu, Feb 6, 8:43 PM
Halfak added a comment to T242705: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart).

When I start up the deployment ORES config locally with 4 workers, I can see that we are using ~2516000 bytes of RES in each of two processes. My available RAM goes down by ~5000000 bytes, so that lines up with actual memory usage.

Thu, Feb 6, 4:17 PM · Scoring-platform-team (Current), Operations, ORES
Halfak added a comment to T242705: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart).

OK rolled back.

Thu, Feb 6, 4:10 PM · Scoring-platform-team (Current), Operations, ORES

Wed, Feb 5

Halfak closed T235181: Build WikiProject directory topic models for ar, cs, and kowiki, a subtask of T243451: Deploy ORES -- Late Jan 2020, as Resolved.
Wed, Feb 5, 4:27 PM · ORES, Scoring-platform-team (Current)
Halfak closed T235181: Build WikiProject directory topic models for ar, cs, and kowiki as Resolved.
Wed, Feb 5, 4:27 PM · Scoring-platform-team (Current), artificial-intelligence, drafttopic-modeling, revscoring
Halfak closed T235184: Generate word vectors for ar, cs, en, and ko using FastText, a subtask of T235183: Experiment with different vector lengths for ar, cs, en, and kowiki topic models. , as Resolved.
Wed, Feb 5, 4:27 PM · Scoring-platform-team (Current), artificial-intelligence, drafttopic-modeling, revscoring
Halfak closed T235184: Generate word vectors for ar, cs, en, and ko using FastText, a subtask of T243451: Deploy ORES -- Late Jan 2020, as Resolved.
Wed, Feb 5, 4:27 PM · ORES, Scoring-platform-team (Current)
Halfak closed T235184: Generate word vectors for ar, cs, en, and ko using FastText as Resolved.
Wed, Feb 5, 4:27 PM · Scoring-platform-team (Current), artificial-intelligence, drafttopic-modeling, revscoring
Halfak closed T242345: Implement English pronoun count features in topic models, a subtask of T243451: Deploy ORES -- Late Jan 2020, as Resolved.
Wed, Feb 5, 4:27 PM · ORES, Scoring-platform-team (Current)
Halfak closed T242345: Implement English pronoun count features in topic models as Resolved.
Wed, Feb 5, 4:27 PM · drafttopic-modeling, Scoring-platform-team (Current)
Halfak closed T243107: Retrain enwiki drafttopic models on supervised vectors, a subtask of T243451: Deploy ORES -- Late Jan 2020, as Resolved.
Wed, Feb 5, 4:27 PM · ORES, Scoring-platform-team (Current)
Halfak closed T243107: Retrain enwiki drafttopic models on supervised vectors as Resolved.
Wed, Feb 5, 4:27 PM · drafttopic-modeling, Scoring-platform-team (Current)
Halfak closed T243108: Add new vectors to deployment assets as Resolved.
Wed, Feb 5, 4:27 PM · ORES, drafttopic-modeling, Scoring-platform-team (Current)
Halfak closed T243108: Add new vectors to deployment assets, a subtask of T243451: Deploy ORES -- Late Jan 2020, as Resolved.
Wed, Feb 5, 4:27 PM · ORES, Scoring-platform-team (Current)
Halfak closed T242647: Implement common lib for text preproccessing as Resolved.
Wed, Feb 5, 4:27 PM · drafttopic-modeling, revscoring, artificial-intelligence, Scoring-platform-team (Current)
Halfak closed T243522: Reduce memory footprint of topic models as Resolved.
Wed, Feb 5, 4:27 PM · Patch-For-Review, ORES, Scoring-platform-team (Current)
Halfak closed T243522: Reduce memory footprint of topic models, a subtask of T243451: Deploy ORES -- Late Jan 2020, as Resolved.
Wed, Feb 5, 4:27 PM · ORES, Scoring-platform-team (Current)
Halfak closed T243451: Deploy ORES -- Late Jan 2020 as Resolved.
Wed, Feb 5, 4:27 PM · ORES, Scoring-platform-team (Current)
Halfak moved T243451: Deploy ORES -- Late Jan 2020 from Pending deployment to Done on the Scoring-platform-team (Current) board.
Wed, Feb 5, 4:26 PM · ORES, Scoring-platform-team (Current)
Halfak moved T243522: Reduce memory footprint of topic models from Pending deployment to Done on the Scoring-platform-team (Current) board.
Wed, Feb 5, 4:26 PM · Patch-For-Review, ORES, Scoring-platform-team (Current)
Halfak moved T235181: Build WikiProject directory topic models for ar, cs, and kowiki from Pending deployment to Done on the Scoring-platform-team (Current) board.
Wed, Feb 5, 4:26 PM · Scoring-platform-team (Current), artificial-intelligence, drafttopic-modeling, revscoring
Halfak moved T242345: Implement English pronoun count features in topic models from Pending deployment to Done on the Scoring-platform-team (Current) board.
Wed, Feb 5, 4:26 PM · drafttopic-modeling, Scoring-platform-team (Current)
Halfak moved T243108: Add new vectors to deployment assets from Pending deployment to Done on the Scoring-platform-team (Current) board.
Wed, Feb 5, 4:26 PM · ORES, drafttopic-modeling, Scoring-platform-team (Current)
Halfak moved T243107: Retrain enwiki drafttopic models on supervised vectors from Pending deployment to Done on the Scoring-platform-team (Current) board.
Wed, Feb 5, 4:26 PM · drafttopic-modeling, Scoring-platform-team (Current)
Halfak moved T235184: Generate word vectors for ar, cs, en, and ko using FastText from Pending deployment to Done on the Scoring-platform-team (Current) board.
Wed, Feb 5, 4:26 PM · Scoring-platform-team (Current), artificial-intelligence, drafttopic-modeling, revscoring

Tue, Feb 4

Halfak added a comment to T244297: Newcomer tasks: set initial thresholds for ORES articletopic.

Here's a gist that I put together with my initial explorations and discussions of choosing thresholds: https://gist.github.com/halfak/630dc3fd811995c2a0260d43da462645

Tue, Feb 4, 9:58 PM · Scoring-platform-team (Current), Patch-For-Review, Discovery-Search (Current work), Growth-Team (Current Sprint)
Halfak created P10310 Get ORES topic thresholds. .
Tue, Feb 4, 9:16 PM · Growth-Team, Scoring-platform-team

Mon, Feb 3

Halfak created T244151: Wikilabels docs -- Make install docs better.
Mon, Feb 3, 5:44 PM · Wikilabels, Scoring-platform-team (Current)
Halfak placed T242013: Implement native NN model in revscoring up for grabs.
Mon, Feb 3, 5:43 PM · Scoring-platform-team (Research), artificial-intelligence, revscoring

Fri, Jan 24

Halfak reassigned T242648: Implement CSS styles for Jade Entity UI from Halfak to kevinbazira.
Fri, Jan 24, 9:14 PM · MW-1.35-notes (1.35.0-wmf.20; 2020-02-18), Patch-For-Review, Scoring-platform-team (Current), Design, Jade
Halfak moved T242647: Implement common lib for text preproccessing from Review to Done on the Scoring-platform-team (Current) board.
Fri, Jan 24, 9:14 PM · drafttopic-modeling, revscoring, artificial-intelligence, Scoring-platform-team (Current)
Halfak moved T243522: Reduce memory footprint of topic models from Review to Pending deployment on the Scoring-platform-team (Current) board.
Fri, Jan 24, 9:14 PM · Patch-For-Review, ORES, Scoring-platform-team (Current)
Halfak added a comment to T243522: Reduce memory footprint of topic models.

All models updated. Looks good: https://github.com/wikimedia/drafttopic/pull/47

Fri, Jan 24, 6:55 PM · Patch-For-Review, ORES, Scoring-platform-team (Current)
Halfak claimed T243522: Reduce memory footprint of topic models.
Fri, Jan 24, 6:55 PM · Patch-For-Review, ORES, Scoring-platform-team (Current)
Halfak moved T243522: Reduce memory footprint of topic models from Active to Review on the Scoring-platform-team (Current) board.
Fri, Jan 24, 6:55 PM · Patch-For-Review, ORES, Scoring-platform-team (Current)

Thu, Jan 23

Halfak added a comment to T243522: Reduce memory footprint of topic models.

OK so I've now generated learned vectors for 50c/100k vocab. I just trained the enwiki articletopic model.

Thu, Jan 23, 10:10 PM · Patch-For-Review, ORES, Scoring-platform-team (Current)
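A back-of-the-envelope estimate shows why smaller vectors help. Assuming `100_cell.300k` means 300,000 words × 100 dimensions, `50c/100k` means 100,000 words × 50 dimensions, and vectors are stored as 4-byte float32 values (gensim's default), the raw embedding matrix shrinks about sixfold. Vocabulary strings and index overhead are ignored:

```python
BYTES_PER_FLOAT32 = 4


def embedding_matrix_bytes(vocab_size: int, dimensions: int) -> int:
    """Raw size of a dense float32 embedding matrix (no vocabulary overhead)."""
    return vocab_size * dimensions * BYTES_PER_FLOAT32


before = embedding_matrix_bytes(300_000, 100)  # assumed "100_cell.300k" vectors
after = embedding_matrix_bytes(100_000, 50)    # assumed "50c/100k" vectors

print(f"before: {before / 2**20:.1f} MiB")  # before: 114.4 MiB
print(f"after: {after / 2**20:.1f} MiB")    # after: 19.1 MiB
print(f"reduction: {before / after:.0f}x")  # reduction: 6x
```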
Halfak added a comment to T243522: Reduce memory footprint of topic models.

Aha! It looks like memory usage is greater when we do not use the mmap='r' option. Here's what I see after I run model = KeyedVectors.load("enwiki-20191201-learned_vectors.100_cell.300k.kv").

Thu, Jan 23, 9:57 PM · Patch-For-Review, ORES, Scoring-platform-team (Current)
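Gensim's `mmap='r'` option memory-maps the stored arrays read-only instead of reading them into each process's private heap. The same mechanism can be illustrated with only the standard library; a minimal sketch in which the raw float32 file layout and helper names are hypothetical (this is not gensim's implementation):

```python
import array
import mmap


def write_vectors(path: str, values) -> None:
    """Write float32 values to `path` as raw native-endian bytes."""
    with open(path, "wb") as f:
        array.array("f", values).tofile(f)


def map_vectors_readonly(path: str):
    """Map `path` read-only and expose it as a float32 memoryview.

    Pages are loaded on demand from the OS page cache, so several worker
    processes mapping the same file can share one physical copy instead of
    each holding a private one.
    """
    with open(path, "rb") as f:
        # mmap keeps its own duplicate of the descriptor, so closing `f` is safe.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return mm, memoryview(mm).cast("f")
```

Release the memoryview before calling `mm.close()`; an mmap cannot be closed while a view into it is still exported.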
Halfak added a comment to T243553: Failed executing job: ORESFetchScoreJob.

Should be back online now.

Thu, Jan 23, 9:52 PM · Scoring-platform-team, ORES, Beta-Cluster-Infrastructure
Halfak added a comment to T243553: Failed executing job: ORESFetchScoreJob.

We had a failed deploy earlier today. I'll go clean it up.

Thu, Jan 23, 9:48 PM · Scoring-platform-team, ORES, Beta-Cluster-Infrastructure
Halfak added a comment to T217232: Outreach campaign to raise awareness of Scoring Platform.

Indeed. I think a good next step is to get a rep from Audiences (who have promised us some product support) and Keegan into a meeting so that we can review and re-hash. I'll keep that on my to-do list to set up after All Hands.

Thu, Jan 23, 6:32 PM · CommRel-Specialists-Support (Jan-Mar-2020), Community comms and outreach, Scoring-platform-team
Halfak added a comment to T243522: Reduce memory footprint of topic models.

First, I'm trying out memory-maps. I converted our word2vec-format vectors into gensim's KeyedVectors (KV) objects with:

>>> from gensim.models import KeyedVectors
>>> model = KeyedVectors.load_word2vec_format("enwiki-20191201-learned_vectors.100_cell.300k.vec")
>>> model.save("enwiki-20191201-learned_vectors.100_cell.300k.kv")
Thu, Jan 23, 4:37 PM · Patch-For-Review, ORES, Scoring-platform-team (Current)
Halfak added a comment to T217232: Outreach campaign to raise awareness of Scoring Platform.

Hey folks! Sorry to not chime in sooner. We're blocked right now on product support from Audiences. In the meantime, if you see my question from Aug 26th, that is still unanswered.

Thu, Jan 23, 4:12 PM · CommRel-Specialists-Support (Jan-Mar-2020), Community comms and outreach, Scoring-platform-team
Halfak added a comment to T243522: Reduce memory footprint of topic models.

I'm investigating memory usage. I'm working from a python terminal on my dev laptop. Essentially, I'm tracking VSZ and RSS while running commands.
Before loading anything:

  • VSZ: 35600
  • RSS: 9340

After from revscoring import Model:

  • VSZ: 495752
  • RSS: 76216

After enwiki = Model.load(open("models/enwiki.articletopic.gradient_boosting.model")):

  • VSZ: 1010852
  • RSS: 567348

After arwiki = Model.load(open("models/arwiki.articletopic.gradient_boosting.model")):

  • VSZ: 1385732
  • RSS: 941856

After enwiki2 = Model.load(open("models/enwiki.articletopic.gradient_boosting.model")):

  • VSZ: 1464596
  • RSS: 1020768

This is higher memory usage than I think we are really prepared for. After loading all of the models, it ends up being about 3x as much memory as we needed before. As we can see from the final load, that memory gets shared relatively straightforwardly, but it is still too much.
I wonder if we can use gensim's memory-map mode to get around this. Alternatively, we can reduce the dimensions of our vectors or reduce the size of the vocabulary.

Thu, Jan 23, 4:03 PM · Patch-For-Review, ORES, Scoring-platform-team (Current)
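The per-step cost is easier to see as differences between consecutive readings. A small sketch using the figures reported above, assuming they are in KiB as `ps` reports VSZ and RSS; the labels are paraphrases of the commands:

```python
# (label, VSZ, RSS) as reported above; units assumed to be KiB.
measurements = [
    ("baseline", 35600, 9340),
    ("import revscoring.Model", 495752, 76216),
    ("load enwiki", 1010852, 567348),
    ("load arwiki", 1385732, 941856),
    ("load enwiki again", 1464596, 1020768),
]


def rss_deltas(rows):
    """Return (label, RSS increase) for each step relative to the previous one."""
    return [
        (label, rss - prev_rss)
        for (_, _, prev_rss), (label, _, rss) in zip(rows, rows[1:])
    ]


for label, delta in rss_deltas(measurements):
    print(f"{label:>24}: +{delta / 1024:.0f} MiB RSS")
```

Under that KiB assumption the increases are roughly 65, 480, 366, and 77 MiB, so the second enwiki load adds far less resident memory than the first.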
Halfak created T243522: Reduce memory footprint of topic models.
Thu, Jan 23, 4:03 PM · Patch-For-Review, ORES, Scoring-platform-team (Current)
Halfak added a comment to T243451: Deploy ORES -- Late Jan 2020.

I'm investigating memory usage. I'm working from a python terminal on my dev laptop. Essentially, I'm tracking VSZ and RSS while running commands.

Thu, Jan 23, 4:00 PM · ORES, Scoring-platform-team (Current)
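On Linux, the VSZ and RSS figures that `ps` shows can also be read from inside the Python session via `/proc/self/status` (`VmSize` and `VmRSS`, both in KiB), which makes it easy to print them after each command. A minimal sketch; the helper name and the 50 MiB stand-in allocation (used in place of loading a model) are hypothetical:

```python
def vsz_rss_kib(pid="self"):
    """Return (VSZ, RSS) in KiB for `pid` from /proc/<pid>/status (Linux only).

    VmSize corresponds to ps's VSZ column and VmRSS to its RSS column.
    """
    values = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                values[key] = int(rest.split()[0])  # e.g. "VmRSS:   76216 kB"
    return values["VmSize"], values["VmRSS"]


before = vsz_rss_kib()
blob = b"\x01" * (50 * 1024 * 1024)  # stand-in for loading a model; touches every page
after = vsz_rss_kib()
print(f"VSZ +{after[0] - before[0]} KiB, RSS +{after[1] - before[1]} KiB")
```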