bmansurov (Baha)
Engineering

User Details

User Since
Nov 24 2014, 11:16 PM (207 w, 5 d)
Availability
Available
IRC Nick
bmansurov
LDAP User
Bmansurov
MediaWiki User
Bmansurov (WMF)

Recent Activity

Fri, Nov 16

bmansurov added a comment to T209655: Copy Wikidata dumps to HDFs.

Thanks, @JAllemandou! The more recent dumps are very useful.

Fri, Nov 16, 2:00 PM · Wikidata, Research, Analytics

Thu, Nov 15

bmansurov created T209655: Copy Wikidata dumps to HDFs.
Thu, Nov 15, 10:22 PM · Wikidata, Research, Analytics
bmansurov added a comment to T208896: Cannot connect to Spark with Jupyter notebook on stat1007.

Just a follow-up: Python kernels won't start on stat1007 for some reason. I've given up trying to fix the issue for now.

Thu, Nov 15, 2:54 PM · Analytics-Kanban, Research, Analytics

Wed, Nov 14

bmansurov closed T208896: Cannot connect to Spark with Jupyter notebook on stat1007 as Resolved.

@Ottomata was able to figure it out: https://gist.github.com/ottomata/7651d0f008aa18dcd948ef3636424b23

Wed, Nov 14, 8:38 PM · Analytics-Kanban, Research, Analytics
bmansurov added a comment to T208896: Cannot connect to Spark with Jupyter notebook on stat1007.

Also, I connect to notebooks using an Emacs plugin, and SWAP has authentication enabled, which prevents Emacs from connecting to Jupyter.

Wed, Nov 14, 8:05 PM · Analytics-Kanban, Research, Analytics
bmansurov updated the task description for T208896: Cannot connect to Spark with Jupyter notebook on stat1007.
Wed, Nov 14, 7:59 PM · Analytics-Kanban, Research, Analytics
bmansurov added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

@Miriam @DarTar please see T209503#4747097.

Wed, Nov 14, 5:47 PM · Analytics-EventLogging, Analytics-Kanban
bmansurov updated subscribers of T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.
Wed, Nov 14, 5:46 PM · Analytics-EventLogging, Analytics-Kanban
bmansurov updated subscribers of T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.
Wed, Nov 14, 5:30 PM · Analytics-EventLogging, Analytics-Kanban

Tue, Nov 13

bmansurov added a comment to T208896: Cannot connect to Spark with Jupyter notebook on stat1007.

@fdans thanks for looking into this issue. I need to use Spark because my work is fairly heavy on the computation side and would adversely affect other SWAP users if I ran it without Spark. One of our researchers had to go back to a stat machine for this reason.

Tue, Nov 13, 4:00 PM · Analytics-Kanban, Research, Analytics

Thu, Nov 8

bmansurov added a comment to T209050: Print schema is whitelisting both session ids and page ids.

Sorry, @fdans, I won't be able to help. I no longer maintain that schema. I'll update the wiki page.

Thu, Nov 8, 5:19 PM · Readers-Web-Backlog, Analytics

Tue, Nov 6

bmansurov updated subscribers of T208896: Cannot connect to Spark with Jupyter notebook on stat1007.
Tue, Nov 6, 8:58 PM · Analytics-Kanban, Research, Analytics
bmansurov created T208896: Cannot connect to Spark with Jupyter notebook on stat1007.
Tue, Nov 6, 8:57 PM · Analytics-Kanban, Research, Analytics
bmansurov added a comment to T208622: Import recommendations into production database.

@faidon thanks! Next week sounds good. We're having our offsite this week, so there's no rush.

Tue, Nov 6, 6:25 PM · Operations, Research
bmansurov created P7767 pyspark woes.
Tue, Nov 6, 6:22 PM

Fri, Nov 2

bmansurov moved T208622: Import recommendations into production database from Staged to In Progress on the Research board.
Fri, Nov 2, 8:24 PM · Operations, Research
bmansurov updated subscribers of T208622: Import recommendations into production database.
Fri, Nov 2, 8:23 PM · Operations, Research
bmansurov created T208622: Import recommendations into production database.
Fri, Nov 2, 8:21 PM · Operations, Research

Fri, Oct 26

bmansurov committed rMSRA2b29ed032134: Add translation based 'morelike' API for missing articles (authored by bmansurov).
Add translation based 'morelike' API for missing articles
Fri, Oct 26, 6:38 PM
bmansurov added a comment to T205294: Request to create database and account for recommendation API.

@jcrespo can you share the password for the 'recommendationapi' user so that I can load some data into the database (I don't have access to the private puppet repo)? Also, can you tell me which hosts allow me to connect to the database? Thanks!

Fri, Oct 26, 1:07 PM · Patch-For-Review, DBA, Research

Thu, Oct 25

bmansurov added a comment to T207795: Create the recommendation api DB in Beta.

> Just be careful not to show/store anything private to/on it as it's on a labs system.

OK!

Thu, Oct 25, 3:24 PM · Release-Engineering-Team, Core Platform Team Backlog (Watching / External), Services (watching), Research, Recommendation-API, Beta-Cluster-Infrastructure
bmansurov added a comment to T207795: Create the recommendation api DB in Beta.

> You shouldn't need to directly interact with the password yourself, as I imagine puppet will just deploy it into the configuration for your service.

Thu, Oct 25, 3:19 PM · Release-Engineering-Team, Core Platform Team Backlog (Watching / External), Services (watching), Research, Recommendation-API, Beta-Cluster-Infrastructure
bmansurov added a comment to T207795: Create the recommendation api DB in Beta.

Thanks, @Krenair. This is very helpful. Where's the password stored? How can I get it? For the tools database it's stored at $HOME/replica.my.cnf, but this is presumably different?
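For comparison, Toolforge's `replica.my.cnf` is an INI-style MySQL option file whose `[client]` section holds `user` and `password`. A minimal sketch of reading such a file with the standard library, assuming that layout; the Beta database credentials may well be stored or deployed differently, which is exactly the open question here. The function name is illustrative.

```python
import configparser


def read_mysql_credentials(path):
    """Return (user, password) from the [client] section of a MySQL option file.

    Assumes an INI-style layout like Toolforge's replica.my.cnf:

        [client]
        user = u1234
        password = secret
    """
    # interpolation=None so a '%' inside a password is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    with open(path, encoding="utf-8") as fh:
        parser.read_file(fh)
    client = parser["client"]
    # Option files sometimes quote values; strip surrounding quotes if present.
    user = client["user"].strip("'\"")
    password = client["password"].strip("'\"")
    return user, password
```

Usage would look like `read_mysql_credentials(os.path.expanduser("~/replica.my.cnf"))`, with the result passed to whatever MySQL client library the service uses.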

Thu, Oct 25, 2:28 PM · Release-Engineering-Team, Core Platform Team Backlog (Watching / External), Services (watching), Research, Recommendation-API, Beta-Cluster-Infrastructure
bmansurov added a comment to T207795: Create the recommendation api DB in Beta.

Thanks, @Krenair. Can you also share any documentation on how to connect to the database?

Thu, Oct 25, 1:30 PM · Release-Engineering-Team, Core Platform Team Backlog (Watching / External), Services (watching), Research, Recommendation-API, Beta-Cluster-Infrastructure
bmansurov added a comment to T203253: Run a Second Round of Data Collection.

We've stopped data collection as of now.

Thu, Oct 25, 11:09 AM · Patch-For-Review, Research-Archive, Performance-Team (Radar), MW-1.32-notes (WMF-deploy-2018-06-26 (1.32.0-wmf.10)), Research-2017-18-Q4
bmansurov updated the task description for T203253: Run a Second Round of Data Collection.
Thu, Oct 25, 11:08 AM · Patch-For-Review, Research-Archive, Performance-Team (Radar), MW-1.32-notes (WMF-deploy-2018-06-26 (1.32.0-wmf.10)), Research-2017-18-Q4

Wed, Oct 24

bmansurov committed rMSRA9fd5d81ac2fa: Add translation based 'morelike' API for missing articles (authored by bmansurov).
Add translation based 'morelike' API for missing articles
Wed, Oct 24, 8:49 PM

Tue, Oct 23

bmansurov added a comment to T207795: Create the recommendation api DB in Beta.

OK, removed the backup part.

Tue, Oct 23, 9:25 PM · Release-Engineering-Team, Core Platform Team Backlog (Watching / External), Services (watching), Research, Recommendation-API, Beta-Cluster-Infrastructure
bmansurov updated the task description for T207795: Create the recommendation api DB in Beta.
Tue, Oct 23, 9:24 PM · Release-Engineering-Team, Core Platform Team Backlog (Watching / External), Services (watching), Research, Recommendation-API, Beta-Cluster-Infrastructure
bmansurov updated the task description for T207795: Create the recommendation api DB in Beta.
Tue, Oct 23, 9:13 PM · Release-Engineering-Team, Core Platform Team Backlog (Watching / External), Services (watching), Research, Recommendation-API, Beta-Cluster-Infrastructure
bmansurov merged T207756: Beta labs: Request to create database and account for recommendation API into T207795: Create the recommendation api DB in Beta.
Tue, Oct 23, 9:11 PM · Release-Engineering-Team, Core Platform Team Backlog (Watching / External), Services (watching), Research, Recommendation-API, Beta-Cluster-Infrastructure
bmansurov merged task T207756: Beta labs: Request to create database and account for recommendation API into T207795: Create the recommendation api DB in Beta.
Tue, Oct 23, 9:11 PM · DBA, Research
bmansurov removed projects from T206083: Many client side errors on citation data, significant percentages of data lost : Discovery-Search (Current work), Patch-For-Review.

Thanks, @Nuria!

Tue, Oct 23, 5:29 PM · MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Analytics
bmansurov created T207756: Beta labs: Request to create database and account for recommendation API.
Tue, Oct 23, 3:14 PM · DBA, Research
bmansurov added a comment to T205452: Setup access from service to mysql.

@jcrespo can you please help out with T205452#4674282? Thanks!

Tue, Oct 23, 1:52 PM · Core Platform Team Kanban (Done with CPT), Services (done), Recommendation-API, SCB, Operations, Research
bmansurov moved T206083: Many client side errors on citation data, significant percentages of data lost from Backlog to Needs review on the Discovery-Search (Current work) board.
Tue, Oct 23, 1:28 PM · MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Analytics
bmansurov added a project to T206083: Many client side errors on citation data, significant percentages of data lost : Discovery-Search (Current work).
Tue, Oct 23, 1:28 PM · MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Analytics
bmansurov added a comment to T206083: Many client side errors on citation data, significant percentages of data lost .

@Nuria I'd appreciate your review of https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikimediaEvents/+/468490/ before the branch cut today. I'd like the fix to go out with this Thursday's deployment. Thanks!

Tue, Oct 23, 1:27 PM · MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Analytics

Oct 18 2018

bmansurov added a comment to T206083: Many client side errors on citation data, significant percentages of data lost .

@Miriam I've submitted a patch to limit the link text to 100 characters and page title to 200 characters. Let me know if these numbers need to change. Thanks!
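The limits above amount to capping two string fields before an event is logged. A minimal Python sketch, assuming hypothetical field names `link_text` and `page_title`; the actual patch lives in the WikimediaEvents extension (JavaScript), and its field names may differ.

```python
MAX_LINK_TEXT = 100   # characters, per the proposed limit
MAX_PAGE_TITLE = 200  # characters, per the proposed limit


def truncate_event_fields(event):
    """Return a copy of `event` with over-long text fields cut to their limits.

    `link_text` and `page_title` are hypothetical field names for illustration.
    """
    capped = dict(event)  # leave the caller's event untouched
    if "link_text" in capped:
        capped["link_text"] = capped["link_text"][:MAX_LINK_TEXT]
    if "page_title" in capped:
        capped["page_title"] = capped["page_title"][:MAX_PAGE_TITLE]
    return capped
```

Keeping the limits as named constants makes it easy to adjust them if the numbers need to change.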

Oct 18 2018, 10:54 PM · MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Analytics
bmansurov added a comment to T183039: Gather labels as ground truth for translation and synonym section classifiers.

OK, added the link.

Oct 18 2018, 9:09 PM · Research-2017-18-Q3, Research
bmansurov added a comment to T183039: Gather labels as ground truth for translation and synonym section classifiers.

@leila should I turn off https://gapfinder-tools.wmflabs.org/section-alignment/ then?

Oct 18 2018, 6:33 PM · Research-2017-18-Q3, Research
bmansurov updated the task description for T207406: Recommendation API: resolve interlanguage confclits.
Oct 18 2018, 6:29 PM · Research
bmansurov triaged T207406: Recommendation API: resolve interlanguage confclits as High priority.
Oct 18 2018, 6:29 PM · Research

Oct 17 2018

bmansurov added a comment to T205452: Setup access from service to mysql.

Looks like this is done, @mobrovac?

Oct 17 2018, 2:50 PM · Core Platform Team Kanban (Done with CPT), Services (done), Recommendation-API, SCB, Operations, Research

Oct 15 2018

bmansurov updated the task description for T205452: Setup access from service to mysql.
Oct 15 2018, 1:44 PM · Core Platform Team Kanban (Done with CPT), Services (done), Recommendation-API, SCB, Operations, Research

Oct 11 2018

bmansurov added a comment to T203046: Output 1.4: Public test APIs corresponding to section recommendation algorithms.

OK

Oct 11 2018, 11:26 PM · Epic, address-knowledge-gaps
bmansurov updated subscribers of T203046: Output 1.4: Public test APIs corresponding to section recommendation algorithms.

@leila, oops, I mixed up section recommendations with article recommendations. Since this task was assigned to me while @diego is working on it, I got confused. Diego should probably claim this task, IMO. What you're saying makes sense.

Oct 11 2018, 9:36 PM · Epic, address-knowledge-gaps

Oct 10 2018

bmansurov added a comment to T178925: Review Korean Morphological Libraries.

\o/ I see you got some input from a native speaker for the remaining sections, @TJones.

Oct 10 2018, 10:20 PM · Discovery-Search (Current work), Discovery

Oct 9 2018

bmansurov added a comment to T206083: Many client side errors on citation data, significant percentages of data lost .

According to Grafana, on average we're getting 613 events/second for the CitationUsagePageLoad schema. We're also getting about 41 client-side errors/minute.
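These two rates can be turned into an approximate error ratio by putting both on a per-minute basis. A small Python sketch; the function name is illustrative, and the ratio is only approximate since errors and events are counted separately.

```python
def error_ratio(events_per_second, errors_per_minute):
    """Approximate fraction of logged events accompanied by a client-side error."""
    events_per_minute = events_per_second * 60
    return errors_per_minute / events_per_minute


ratio = error_ratio(613, 41)
print(f"{ratio:.3%} of events, i.e. one error per {1 / ratio:.0f} events")
```

With these figures the result is roughly 0.11%, or about one error per 897 events, noticeably better than the one error per 187.5 events estimated on Oct 2.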

Oct 9 2018, 3:10 PM · MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Analytics
bmansurov closed T199807: Schema:CitationUsage improvements as Resolved.

Further improvements will be done as part of T206083.

Oct 9 2018, 1:26 PM · MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Patch-For-Review, Research, Research-Archive
bmansurov closed T199807: Schema:CitationUsage improvements, a subtask of T190437: Analyze the first round of data about readers' usage of references, as Resolved.
Oct 9 2018, 1:26 PM · Research-2017-18-Q4, Epic, Research-Programs
bmansurov added a comment to T203046: Output 1.4: Public test APIs corresponding to section recommendation algorithms.

@leila is this task similar to T203263? Maybe I should merge them?

Oct 9 2018, 1:23 PM · Epic, address-knowledge-gaps
bmansurov updated the task description for T203253: Run a Second Round of Data Collection.
Oct 9 2018, 1:02 PM · Patch-For-Review, Research-Archive, Performance-Team (Radar), MW-1.32-notes (WMF-deploy-2018-06-26 (1.32.0-wmf.10)), Research-2017-18-Q4
bmansurov added a comment to T206083: Many client side errors on citation data, significant percentages of data lost .

@Miriam any updates on this? Did you get a chance to talk with Michele and Tiziano?

Oct 9 2018, 1:01 PM · MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Analytics

Oct 5 2018

bmansurov closed T187957: Set up Github -> Diffusion -> Deployment workflow for Research landing page as Resolved.
Oct 5 2018, 7:57 PM · Research-landing-page, Research
bmansurov moved T187957: Set up Github -> Diffusion -> Deployment workflow for Research landing page from Staged to Done (current quarter) on the Research board.

We discussed T187957#4146727 with Dario and decided to keep things as they are for now. Changes can be created on both Gerrit and GitHub. Changes created in Gerrit will be merged using the Gerrit workflow (+2'ing). Changes created on GitHub will be pushed to Gerrit manually.

Oct 5 2018, 6:40 PM · Research-landing-page, Research
bmansurov added a comment to T203041: Output 2.1: An improved task recommendation API.

@Amire80

Resolving these conflicts is challenging and time-consuming, but it's nevertheless feasible.

Oct 5 2018, 5:43 PM · Epic, address-knowledge-gaps

Oct 4 2018

bmansurov added a comment to T178925: Review Korean Morphological Libraries.

@TJones, OK, I'll wait for your reply and see what I should do differently while doing the rest. (Thanks for the compliment.)

Oct 4 2018, 9:14 PM · Discovery-Search (Current work), Discovery
bmansurov added a comment to T203041: Output 2.1: An improved task recommendation API.

@Amire80 thanks for chiming in. I think we'll all benefit from identifying these problematic interlanguage links and fixing them. Hopefully we can publish a list of issues.

Oct 4 2018, 2:26 PM · Epic, address-knowledge-gaps
bmansurov added a comment to T178925: Review Korean Morphological Libraries.

I've left some notes on the talk page. I'll do the remaining bits as I find some spare time.

Oct 4 2018, 1:47 PM · Discovery-Search (Current work), Discovery

Oct 3 2018

bmansurov added a comment to T203041: Output 2.1: An improved task recommendation API.

@leila that's what I understood too. So in order to link Neoplasm (en) to Tumor (de), we'd go from Neoplasm (en) to Neoplasma (de) and then from that article to Tumor (de).
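The two-hop lookup described here (follow the interlanguage link, then resolve the redirect in the target wiki) can be sketched in Python. The in-memory `langlinks` and `redirects` dictionaries are hypothetical stand-ins for the real `langlinks` table and redirect data; they only mirror the Neoplasm example, which may not match the current state of the wikis.

```python
def resolve_cross_language(title, source_lang, target_lang, langlinks, redirects,
                           max_hops=5):
    """Map `title` in `source_lang` to its final article in `target_lang`.

    langlinks: {(lang, title): {target_lang: target_title}}  (hypothetical shape)
    redirects: {(lang, title): redirect_target_title}        (hypothetical shape)
    Returns None if there is no interlanguage link. Follows at most `max_hops`
    redirects and stops early on a redirect loop.
    """
    target = langlinks.get((source_lang, title), {}).get(target_lang)
    if target is None:
        return None
    seen = {target}
    for _ in range(max_hops):
        nxt = redirects.get((target_lang, target))
        if nxt is None or nxt in seen:
            break
        seen.add(nxt)
        target = nxt
    return target


langlinks = {("en", "Neoplasm"): {"de": "Neoplasma"}}
redirects = {("de", "Neoplasma"): "Tumor"}
print(resolve_cross_language("Neoplasm", "en", "de", langlinks, redirects))  # Tumor
```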

Oct 3 2018, 10:01 PM · Epic, address-knowledge-gaps
bmansurov added a comment to T203041: Output 2.1: An improved task recommendation API.

@leila thanks for the lead. Do you remember whether, in 2015 (when the scripts were written), Neoplasm (en) was linked to Neoplasma (de) in langlinks? Right now, it seems that's not the case:

Oct 3 2018, 9:30 PM · Epic, address-knowledge-gaps
bmansurov added a comment to T206142: Quarterly check-in report - TEC-9.

@leila, I'm not sure which slide is best for these, but we also:

  1. Implemented the paper (Growing Wikipedia Across Languages via Recommendation);
  2. Generated article recommendations for the top 50 language pairs used in ContentTranslation;
  3. Created a morelike API for missing articles (still WIP);
  4. Made ongoing progress on taking the article creation API to production (sorted out the database issue).
Oct 3 2018, 8:36 PM · Research
bmansurov added a comment to T178925: Review Korean Morphological Libraries.

@TJones, OK, I'll take a look. I'll leave a comment here when I'm done.

Oct 3 2018, 4:39 PM · Discovery-Search (Current work), Discovery
bmansurov added a comment to T178925: Review Korean Morphological Libraries.

I know some Korean and I'd be happy to help with this task if you don't hear from native Korean speakers.

Oct 3 2018, 3:48 PM · Discovery-Search (Current work), Discovery
bmansurov added a comment to T206083: Many client side errors on citation data, significant percentages of data lost .

@Nuria that makes sense. Rather than limiting URL length (which would give us incomplete data), would it be a good idea to avoid these errors by not sending such events at all? I'd detect long URLs and not have EL send them. Would that work?
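A minimal sketch of this alternative: build the request URL first and skip the event when it exceeds a length limit, instead of truncating fields. The 2000-character limit, the endpoint URL, and the helper names are assumptions for illustration, not EventLogging's actual client code.

```python
import json
from urllib.parse import quote

MAX_URL_LENGTH = 2000  # assumed limit; check the EventLogging client for the real value
BEACON_BASE = "https://example.org/beacon/event?"  # hypothetical endpoint


def build_event_url(event):
    """Serialize an event as URL-encoded JSON appended to the beacon URL."""
    return BEACON_BASE + quote(json.dumps(event, separators=(",", ":")))


def maybe_send(event, send):
    """Call `send(url)` only if the URL fits; return True if the event was sent."""
    url = build_event_url(event)
    if len(url) > MAX_URL_LENGTH:
        return False  # drop the event rather than trigger a client-side error
    send(url)
    return True
```

The trade-off is that events with long URLs disappear entirely, which may bias the data toward pages with shorter titles and link texts.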

Oct 3 2018, 12:31 PM · MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Analytics

Oct 2 2018

bmansurov added a comment to T206083: Many client side errors on citation data, significant percentages of data lost .

For CitationUsagePageLoad we're getting about 450-800 events per second; taking the midpoint (625 events/s), that's about 37,500 events per minute. At 200 errors per minute, we get one error every 187.5 events. @Miriam and I didn't consider this significant, which is why we submitted this patch.
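The 37,500 figure comes from the midpoint of the observed range. A short Python sketch that reproduces the arithmetic and also shows the spread across the whole 450-800 events/s range; the function name is illustrative.

```python
def events_per_error(events_per_second, errors_per_minute):
    """How many events are logged, on average, for each client-side error."""
    return events_per_second * 60 / errors_per_minute


ERRORS_PER_MINUTE = 200
for label, eps in [("low", 450), ("midpoint", (450 + 800) / 2), ("high", 800)]:
    print(f"{label}: {eps * 60:,.0f} events/min, "
          f"one error per {events_per_error(eps, ERRORS_PER_MINUTE):.1f} events")
```

Across the range, that works out to between one error per 135 events and one per 240 events.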

Oct 2 2018, 11:10 PM · MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Analytics

Oct 1 2018

bmansurov added a comment to T205452: Setup access from service to mysql.

@mobrovac no blockers left?

Oct 1 2018, 12:20 PM · Core Platform Team Kanban (Done with CPT), Services (done), Recommendation-API, SCB, Operations, Research

Sep 28 2018

bmansurov added a comment to T203041: Output 2.1: An improved task recommendation API.

It turns out we cannot reliably detect redirects across languages. For example, '"Them"' redirects to 'Them_(King_Diamond_album)' (Q1756739). Since we're trying to figure out the Wikidata ID of '"Them"', we can only search Wikidata items by English labels. There are many items with that label:

  • Them (Q1338638)
  • Them (Q37545106)
  • Them (Q1112469)
  • Them (Q3591139)
  • etc.
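One way to make this ambiguity explicit is to accept a label-based Wikidata match only when exactly one item carries that label. A minimal sketch; `label_index` is a hypothetical stand-in for a real label lookup and lists only the items mentioned above.

```python
def unique_item_for_label(label, label_index):
    """Return the single Wikidata ID for `label`, or None if missing or ambiguous."""
    candidates = label_index.get(label, [])
    return candidates[0] if len(candidates) == 1 else None


label_index = {
    "Them": ["Q1338638", "Q37545106", "Q1112469", "Q3591139"],
}
print(unique_item_for_label("Them", label_index))  # None: too many candidates
```

This does not recover the correct item (Q1756739); it only prevents silently picking a wrong one.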
Sep 28 2018, 3:04 PM · Epic, address-knowledge-gaps
bmansurov added a comment to T205452: Setup access from service to mysql.

Friendly ping @Joe @fgiunchedi. Can you please help with this task? Thanks!

Sep 28 2018, 1:40 PM · Core Platform Team Kanban (Done with CPT), Services (done), Recommendation-API, SCB, Operations, Research

Sep 26 2018

bmansurov updated subscribers of T203041: Output 2.1: An improved task recommendation API.
Sep 26 2018, 5:09 PM · Epic, address-knowledge-gaps
bmansurov added a comment to T203041: Output 2.1: An improved task recommendation API.

@leila I've been experimenting with implementing section 2.1 of the paper. We can get redirects from Hive (prod.redirect), but I'm not sure how to retrieve interlanguage links, as they are not being used in Wikipedia according to this (see the intro). Do you know how?

Sep 26 2018, 4:28 PM · Epic, address-knowledge-gaps

Sep 25 2018

bmansurov added a comment to T190772: Build the first version of section recommender by fusing the synonym and translator models.

BTW, try adding ensure_ascii=False to json.dumps for easier debugging.
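By default Python's `json.dumps` escapes every non-ASCII character as `\uXXXX`, which makes Cyrillic output (e.g. from ruwiki) hard to read. A small illustration; the sample row is made up.

```python
import json

row = {"title": "Москва", "section": "История"}

# Default: non-ASCII characters come out as \uXXXX escapes.
print(json.dumps(row))

# With ensure_ascii=False the text stays readable.
print(json.dumps(row, ensure_ascii=False))
# {"title": "Москва", "section": "История"}
```

When writing this output to a file, open it with `encoding="utf-8"` so the unescaped characters are stored correctly.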

Sep 25 2018, 10:50 PM · Research-2017-18-Q4, Research
bmansurov added a comment to T190772: Build the first version of section recommender by fusing the synonym and translator models.

I think here's why it's happening. You'll see that articles appear in both current.xml and current[N].xml. Here's an example:
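If the same article is parsed from both `current.xml` and a `current[N].xml` part, one safeguard is to keep a single record per page ID. A sketch, assuming each parsed record is a dict with hypothetical `page_id` and `rev_id` keys; reading only one set of files would avoid the duplication at the source.

```python
def deduplicate_pages(records):
    """Keep one record per page_id, preferring the highest rev_id.

    `page_id` and `rev_id` are assumed keys of each parsed record.
    Output order follows the first appearance of each page_id.
    """
    best = {}
    for rec in records:
        pid = rec["page_id"]
        # On a tie the first record seen is kept.
        if pid not in best or rec["rev_id"] > best[pid]["rev_id"]:
            best[pid] = rec
    return list(best.values())
```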

Sep 25 2018, 10:42 PM · Research-2017-18-Q4, Research
bmansurov added a comment to T190772: Build the first version of section recommender by fusing the synonym and translator models.

@diego I looked at your code briefly and tested it with lang=uz, and the output JSON didn't contain any duplicate rows. Can you paste one of the duplicate rows from ruwiki maybe?

Sep 25 2018, 8:56 PM · Research-2017-18-Q4, Research
bmansurov moved T205452: Setup access from service to mysql from Staged to In Progress on the Research board.
Sep 25 2018, 3:53 PM · Core Platform Team Kanban (Done with CPT), Services (done), Recommendation-API, SCB, Operations, Research
bmansurov created T205452: Setup access from service to mysql.
Sep 25 2018, 3:53 PM · Core Platform Team Kanban (Done with CPT), Services (done), Recommendation-API, SCB, Operations, Research
bmansurov closed T203039: Storage of data for recommendation API as Resolved.

@Pchelolo the database has been set up (T205294). I think this task is complete as far as storage is concerned. I'll create another task for setting up access to the database from a service.

Sep 25 2018, 3:44 PM · Operations, DBA, Services (designing), Research
bmansurov closed T203039: Storage of data for recommendation API, a subtask of T203041: Output 2.1: An improved task recommendation API, as Resolved.
Sep 25 2018, 3:44 PM · Epic, address-knowledge-gaps
bmansurov added a comment to T205294: Request to create database and account for recommendation API.

@jcrespo thanks! Looks like I misunderstood you. If DB creation is done, then I'll talk to the Services team about the productionizing part.

Sep 25 2018, 3:38 PM · Patch-For-Review, DBA, Research
bmansurov added a comment to T205294: Request to create database and account for recommendation API.

@jcrespo could you please create an account with username 'recommendationapiservice' with the 'SELECT' right only?

Sep 25 2018, 2:45 PM · Patch-For-Review, DBA, Research
bmansurov updated subscribers of T205294: Request to create database and account for recommendation API.

@Pchelolo what do you think about T205294#4613658?

Sep 25 2018, 12:19 PM · Patch-For-Review, DBA, Research

Sep 24 2018

bmansurov removed a project from T191086: Instrument and collect data via CitationUsage schema: Patch-For-Review.
Sep 24 2018, 6:23 PM · Discovery-Search, Patch-For-Review, MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Research-Archive, Performance-Team (Radar), Research-2017-18-Q4
bmansurov updated the task description for T205294: Request to create database and account for recommendation API.
Sep 24 2018, 4:09 PM · Patch-For-Review, DBA, Research
bmansurov added a comment to T205294: Request to create database and account for recommendation API.

@jcrespo, good call. I've updated the task description.

Sep 24 2018, 3:52 PM · Patch-For-Review, DBA, Research
bmansurov updated the task description for T205294: Request to create database and account for recommendation API.
Sep 24 2018, 3:52 PM · Patch-For-Review, DBA, Research
bmansurov added a comment to T191086: Instrument and collect data via CitationUsage schema.

We're increasing the sampling rate for CitationUsagePageLoad from 10% to 33.3% in a few hours.

Sep 24 2018, 3:46 PM · Discovery-Search, Patch-For-Review, MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Research-Archive, Performance-Team (Radar), Research-2017-18-Q4
bmansurov updated the task description for T205294: Request to create database and account for recommendation API.
Sep 24 2018, 3:32 PM · Patch-For-Review, DBA, Research
bmansurov created T205294: Request to create database and account for recommendation API.
Sep 24 2018, 3:31 PM · Patch-For-Review, DBA, Research
bmansurov added a comment to T203039: Storage of data for recommendation API.

@jcrespo anything else blocking us from importing data to the database? Any documentation on connecting to the database from the services?

Sep 24 2018, 3:02 PM · Operations, DBA, Services (designing), Research

Sep 18 2018

bmansurov updated the task description for T203041: Output 2.1: An improved task recommendation API.
Sep 18 2018, 6:47 PM · Epic, address-knowledge-gaps

Sep 17 2018

bmansurov added a comment to T191086: Instrument and collect data via CitationUsage schema.

Apparently, there was no train last week so our changes didn't make it to production. I'm delaying data collection until Thursday.

Sep 17 2018, 6:34 PM · Discovery-Search, Patch-For-Review, MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Research-Archive, Performance-Team (Radar), Research-2017-18-Q4
bmansurov added a comment to T191086: Instrument and collect data via CitationUsage schema.

Heads-up for Analytics: we're deploying CitationUsage at 100% and CitationUsagePageLoad at 10% (per our conversation with @Nuria on IRC) in about two hours. This should yield about 150 req/sec and 250 req/sec, respectively. Tomorrow, if these numbers hold, we'd like to increase the 10% to 33.3%, which will increase 250 req/sec to around 800 req/sec.
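The projected increase follows from scaling the observed rate linearly with the sampling rate. A Python sketch of a per-pageview sampling decision and that projection, assuming event volume is proportional to the number of sampled pageviews; the function names are illustrative and not EventLogging's API.

```python
import random


def in_sample(rate, rng=random.random):
    """Per-pageview sampling decision: True with probability `rate`."""
    return rng() < rate


def projected_rate(current_rate, current_sampling, new_sampling):
    """Scale an observed event rate linearly to a new sampling rate."""
    return current_rate * new_sampling / current_sampling


# 33.3% is roughly one pageview in three.
print(projected_rate(250, 0.10, 1 / 3))
```

This gives roughly 830 req/sec, in line with the "around 800" estimate.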

Sep 17 2018, 4:10 PM · Discovery-Search, Patch-For-Review, MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Research-Archive, Performance-Team (Radar), Research-2017-18-Q4
bmansurov added a comment to T203039: Storage of data for recommendation API.

@jcrespo 250K rows/sec sounds great. Batch import speed per se is not too important; I just don't want to wait hours to load data like I did in a labs instance. And yes, starting with the m2 section looks like a good idea.

Sep 17 2018, 3:03 PM · Operations, DBA, Services (designing), Research
bmansurov added a comment to T203039: Storage of data for recommendation API.

> Another consideration is that here we're distributing a pre-learned AI model. I believe there should be industry standards or best practices on how to deploy such data; it's not my area of expertise, though. @bmansurov are you aware of any?

Sep 17 2018, 1:40 PM · Operations, DBA, Services (designing), Research

Sep 11 2018

bmansurov added a comment to T191086: Instrument and collect data via CitationUsage schema.

Good catch, @mforns!

Sep 11 2018, 4:31 PM · Discovery-Search, Patch-For-Review, MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Research-Archive, Performance-Team (Radar), Research-2017-18-Q4

Sep 6 2018

bmansurov updated subscribers of T191086: Instrument and collect data via CitationUsage schema.

@EBernhardson would you please review T191086#4564047? Thanks!

Sep 6 2018, 7:14 PM · Discovery-Search, Patch-For-Review, MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Research-Archive, Performance-Team (Radar), Research-2017-18-Q4
bmansurov moved T191086: Instrument and collect data via CitationUsage schema from Backlog to Needs review on the Discovery-Search (Current work) board.
Sep 6 2018, 7:13 PM · Discovery-Search, Patch-For-Review, MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Research-Archive, Performance-Team (Radar), Research-2017-18-Q4
bmansurov edited projects for T191086: Instrument and collect data via CitationUsage schema, added: Discovery-Search (Current work); removed Discovery-Search.
Sep 6 2018, 7:13 PM · Discovery-Search, Patch-For-Review, MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Research-Archive, Performance-Team (Radar), Research-2017-18-Q4
bmansurov added a project to T191086: Instrument and collect data via CitationUsage schema: Discovery-Search.
Sep 6 2018, 7:11 PM · Discovery-Search, Patch-For-Review, MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Research-Archive, Performance-Team (Radar), Research-2017-18-Q4