mpopov (Mikhail Popov)
Data Analyst

User Details

User Since
Jul 27 2015, 4:15 PM (129 w, 23 h)
Availability
Available
IRC Nick
bearloga
LDAP User
Bearloga
MediaWiki User
MPopov (WMF)

Data Analyst in Reading (formerly of Discovery) | User:MPopov (WMF) | Highlighted Works

Recent Activity

Fri, Jan 12

mpopov awarded T184768: Bug behavior of QTree[Long] for quantileBounds a Mountain of Wealth token.
Fri, Jan 12, 10:44 PM · MobileApp, Analytics, Wikipedia-Android-App-Backlog, Discovery-Analysis

Tue, Jan 9

mpopov updated subscribers of T172581: Set up mechanism for archiving Google Search Console data.

Call me crazy, but I bet if we ask Google for this data they will be happy to give it to us without our having to set up web scraping/downloads.
Again, call me crazy, but I bet Google could make this data 100% public, such that no authentication is needed to query it; we would be able to get it, and so would any other interested party. It seems that it would require a few conversations but little actual hands-on work.

Tue, Jan 9, 11:53 PM · Discovery-Analysis (Current work), Discovery, SEO, Reading-analysis

Wed, Jan 3

mpopov added a comment to T184019: Run search relevance survey on enwiki and frwiki.

@mpopov I wasn't quite sure from https://wikimedia-research.github.io/Discovery-Search-Adhoc-RelevanceSurveys/#responses_required whether 40 to 70 responses means the number of impressions (yes+no+dismiss+timeout), the number of clicks (yes+no+dismiss), or the number of yes+no. I think it was yes+no+dismiss, but it might have been yes+no+dismiss+timeout?

Closer reading of the report:

the model is very accurate with at least 40 yes/no/unsure/dismiss responses and the most accurate with at least 70 responses

I think this is saying that we are not considering timeouts here, which means that with a ~30% response rate, to get 70 responses we need 210 impressions?
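
The arithmetic behind that estimate, as a small sketch that assumes a fixed response rate (the function name `impressions_needed` is ours, not from the survey code): divide the target number of responses by the expected response rate and round up. At exactly 30%, 70 responses works out to about 234 impressions. 210 corresponds to a response rate of one in three.

```python
import math


def impressions_needed(target_responses: int, response_rate: float) -> int:
    """Impressions needed so the *expected* number of responses reaches the target.

    Assumes every impression independently yields a response with the same
    probability; it ignores variance, so a safety margin may be wise.
    """
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(target_responses / response_rate)


print(impressions_needed(40, 0.30))  # lower end of the 40-70 range: 134
print(impressions_needed(70, 0.30))  # upper end of the 40-70 range: 234
```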

Wed, Jan 3, 6:56 PM · Patch-For-Review, Discovery-Search (Current work)

Tue, Jan 2

chelsyx awarded T179528: Investigate full-text searches in event logging vs SRP pageviews a Like token.
Tue, Jan 2, 11:49 PM · Discovery-Analysis (Current work), Discovery
mpopov added a comment to T142795: Offer interwiki search with language detection functionality over the API.

Just tried the second link (API results for "sistema parlamentario con sede de gobierno" on enwiki (including results from eswiki)) and got the following:

Tue, Jan 2, 8:28 PM · MW-1.29-release (WMF-deploy-2017-01-03_(1.29.0-wmf.7)), MW-1.29-release-notes, Discovery-Search (Current work), Patch-For-Review, Easy, Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog, Discovery
mpopov updated the task description for T183984: [WMF All Hands 2018] Dataviz Literacy Workshop.
Tue, Jan 2, 8:04 PM · Discovery, Discovery-Analysis (Current work)
mpopov moved T183984: [WMF All Hands 2018] Dataviz Literacy Workshop from Backlog to In progress on the Discovery-Analysis (Current work) board.
Tue, Jan 2, 8:04 PM · Discovery, Discovery-Analysis (Current work)
mpopov triaged T183984: [WMF All Hands 2018] Dataviz Literacy Workshop as Normal priority.
Tue, Jan 2, 8:03 PM · Discovery, Discovery-Analysis (Current work)

Wed, Dec 20

debt awarded T175048: Search Relevance Survey test #3: analysis of test a Like token.
Wed, Dec 20, 10:24 PM · Discovery-Analysis (Current work), Discovery
mpopov added a comment to T175048: Search Relevance Survey test #3: analysis of test.

Added Python version into the production instructions for @EBernhardson's convenience :) https://github.com/wikimedia-research/Discovery-Search-Adhoc-RelevanceSurveys/tree/master/production#predicting-rank

Wed, Dec 20, 9:49 PM · Discovery-Analysis (Current work), Discovery
mpopov added a comment to T175048: Search Relevance Survey test #3: analysis of test.

Thanks! I'm not sure what I was expecting, but it is interesting to see. It seems to like giving scores of 0.5, but a lot of models end up with a sort of "default" score they like best. I am surprised that it doesn't show any scores above 0.75. Should we map scores from a 0-0.75 range, rather than 0-1? Or, based on the low end of the trend line, maybe even 0.25-0.75?
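
A minimal sketch of the remapping being proposed, assuming a plain linear (min-max) rescale with clipping; `low` and `high` stand for the observed score range and are illustrative parameters, not part of the actual model.

```python
def rescale(score: float, low: float = 0.0, high: float = 0.75) -> float:
    """Linearly map [low, high] onto [0, 1], clipping anything outside the range."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = (score - low) / (high - low)
    return min(1.0, max(0.0, scaled))


print(rescale(0.75))             # top of the observed range: 1.0
print(rescale(0.5, 0.25, 0.75))  # midpoint of 0.25-0.75: 0.5
```

Clipping keeps out-of-range scores valid at the cost of collapsing them onto the endpoints; if the model later produces scores above 0.75, the range would need to be re-estimated.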

Wed, Dec 20, 7:25 PM · Discovery-Analysis (Current work), Discovery
TJones awarded F11851691: plot.png a Pterodactyl token.
Wed, Dec 20, 4:37 PM

Tue, Dec 19

mpopov moved T175048: Search Relevance Survey test #3: analysis of test from In progress to Done on the Discovery-Analysis (Current work) board.

Alrighty, here ya go! It's not as pretty as you were probably expecting!

Tue, Dec 19, 10:52 PM · Discovery-Analysis (Current work), Discovery

Mon, Dec 18

dcausse awarded F11851691: plot.png a Love token.
Mon, Dec 18, 3:55 PM

Dec 15 2017

mpopov moved T179528: Investigate full-text searches in event logging vs SRP pageviews from In progress to Needs review on the Discovery-Analysis (Current work) board.

@EBernhardson @chelsyx do you see any errors?

Dec 15 2017, 10:06 PM · Discovery-Analysis (Current work), Discovery
mpopov moved T175048: Search Relevance Survey test #3: analysis of test from Needs review to In progress on the Discovery-Analysis (Current work) board.
Dec 15 2017, 5:58 PM · Discovery-Analysis (Current work), Discovery
mpopov added a comment to T175048: Search Relevance Survey test #3: analysis of test.

I was worried about only having a binary classifier, but I see in the conclusion that it can get mapped to a 0-10 scale. Have you looked at the distribution (or the distribution when mapped to a 0-3 scale) to see if it matches the distribution of Discernatron scores in a reasonable way? I don't recall whether the Discernatron scores were, for example, strongly unimodal, or strongly bimodal, or just generally lumpy.
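
One way to run the suggested comparison, sketched under assumptions: bin the classifier's probabilities into four equal-width bins labelled 0-3 and tabulate the share in each bin, which can then be set beside the same table for Discernatron scores. The equal-width binning is our choice; the report's own 0-10 mapping may work differently.

```python
from collections import Counter


def to_four_point_scale(probability: float) -> int:
    """Equal-width binning of [0, 1] into 0, 1, 2, 3 (1.0 falls into bin 3)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return min(3, int(probability * 4))


def score_distribution(probabilities):
    """Share of predictions falling on each point of the 0-3 scale."""
    counts = Counter(to_four_point_scale(p) for p in probabilities)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no probabilities given")
    return {point: counts.get(point, 0) / total for point in range(4)}


print(score_distribution([0.1, 0.3, 0.5, 0.55, 1.0]))
```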

Dec 15 2017, 5:58 PM · Discovery-Analysis (Current work), Discovery

Dec 14 2017

mpopov added a comment to T172581: Set up mechanism for archiving Google Search Console data.

Furthermore, it occurred to me to check the Google APIs Terms of Service, specifically the section on content:

Dec 14 2017, 7:07 PM · Discovery-Analysis (Current work), Discovery, SEO, Reading-analysis
mpopov added a comment to T172581: Set up mechanism for archiving Google Search Console data.

Okay, I've got some code working for fetching this data, which means it's time to consider aspects we haven't considered before.

Dec 14 2017, 12:47 AM · Discovery-Analysis (Current work), Discovery, SEO, Reading-analysis

Dec 13 2017

mpopov moved T172581: Set up mechanism for archiving Google Search Console data from Up Next to Current work on the Discovery-Analysis board.
Dec 13 2017, 6:41 PM · Discovery-Analysis (Current work), Discovery, SEO, Reading-analysis
mpopov moved T177358: Metrics for SDoC: translations from In progress to Needs review on the Discovery-Analysis (Current work) board.

Search query language breakdown note & results at https://github.com/wikimedia-research/SDoC-Initial-Metrics/tree/master/T177358-2

Dec 13 2017, 6:41 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
mpopov updated the task description for T177358: Metrics for SDoC: translations.
Dec 13 2017, 6:39 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata

Dec 12 2017

Ramsey-WMF awarded T177358: Metrics for SDoC: translations a Like token.
Dec 12 2017, 11:03 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata

Dec 8 2017

mpopov added a comment to T182352: UDF for language detection.

I wonder if it's possible to make a Hive UDF that uses Google's Compact Language Detector v3 (CLD3) library & model https://github.com/google/cld3

Dec 8 2017, 12:20 AM · Discovery-Search, Discovery-Analysis, Analytics, Discovery

Dec 7 2017

mpopov updated the task description for T182352: UDF for language detection.
Dec 7 2017, 8:48 PM · Discovery-Search, Discovery-Analysis, Analytics, Discovery
mpopov triaged T182352: UDF for language detection as Normal priority.
Dec 7 2017, 8:44 PM · Discovery-Search, Discovery-Analysis, Analytics, Discovery
mpopov claimed T177358: Metrics for SDoC: translations.
Dec 7 2017, 7:36 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata

Dec 5 2017

mpopov moved T179528: Investigate full-text searches in event logging vs SRP pageviews from Needs review to In progress on the Discovery-Analysis (Current work) board.
Dec 5 2017, 9:28 PM · Discovery-Analysis (Current work), Discovery
mpopov moved T179528: Investigate full-text searches in event logging vs SRP pageviews from In progress to Needs review on the Discovery-Analysis (Current work) board.

Final numbers from English Wikipedia: https://github.com/wikimedia-research/Discovery-Search-Adhoc-SearchResultPageInvestigation#findings

Dec 5 2017, 8:55 PM · Discovery-Analysis (Current work), Discovery

Nov 30 2017

mpopov added a comment to T179528: Investigate full-text searches in event logging vs SRP pageviews.
Nov 30 2017, 9:13 PM · Discovery-Analysis (Current work), Discovery

Nov 28 2017

mpopov moved T153856: Add lint/CI to all wikimedia/discovery analytics repositories from Backlog to Stalled/Waiting on the Discovery-Analysis (Current work) board.
Nov 28 2017, 9:14 PM · Patch-For-Review, Release-Engineering-Team (Watching / External), Discovery-Analysis (Current work), Discovery, Continuous-Integration-Config
mpopov added a comment to T174946: R execution on stat1005 -> 'stack smashing error'.

Might seem like a silly question and I hope this doesn't come off as offensive, but did you re-install all the R packages you were using before? Because even though your R package library was copied over from stat1002 and R will technically see those packages as available, they all need to be re-installed because it's a new machine. Copying libraries only works when it's the same OS & configuration. The packages that include C/C++ code especially require re-compilation.

Nov 28 2017, 6:37 PM · Analytics
mpopov added a comment to T174946: R execution on stat1005 -> 'stack smashing error'.

@mpopov, you use R on stat1005, yes? Have you ever had this problem?

Nov 28 2017, 6:31 PM · Analytics

Nov 27 2017

mpopov moved T179528: Investigate full-text searches in event logging vs SRP pageviews from Backlog to In progress on the Discovery-Analysis (Current work) board.
Nov 27 2017, 7:40 PM · Discovery-Analysis (Current work), Discovery

Nov 22 2017

mpopov moved T175048: Search Relevance Survey test #3: analysis of test from In progress to Needs review on the Discovery-Analysis (Current work) board.

HOLY MOLEY I'M FINALLY DONE: https://wikimedia-research.github.io/Discovery-Search-Adhoc-RelevanceSurveys/

Nov 22 2017, 8:14 PM · Discovery-Analysis (Current work), Discovery

Nov 21 2017

mpopov moved T177357: Metrics for SDoC: future work of interest (templates and licensing) from Current work to Up Next on the Discovery-Analysis board.
Nov 21 2017, 9:11 PM · Discovery-Analysis, Structured-Data-Commons, Discovery, Wikidata

Nov 14 2017

mpopov moved T177357: Metrics for SDoC: future work of interest (templates and licensing) from Needs triage to Current work on the Discovery-Analysis board.
Nov 14 2017, 10:54 PM · Discovery-Analysis, Structured-Data-Commons, Discovery, Wikidata
mpopov added a comment to T170468: Dashboard: Search results page - dwell time metric.

@debt: Does the comment above (T170468#3745776) satisfy your concern?

Nov 14 2017, 10:46 PM · Discovery-Analysis (Current work), Discovery
mpopov added a comment to T179850: Run analysis of WDQS internal and external traffic.
  • If I read correctly, the only difference between screenshots 3 and 4 is bot vs. spider. I'm wondering why we get spiders from internal traffic (just curiosity).
Nov 14 2017, 10:40 PM · Discovery, Discovery-Analysis (Current work), Discovery-Search (Current work)
mpopov moved T170022: Map analytics from In progress to Needs review on the Discovery-Analysis (Current work) board.
Nov 14 2017, 8:12 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery
mpopov added a comment to T180492: Create Wikipedia Slack bot.

I might be able to work on this in my free time at some point :) seems like a fun project

Nov 14 2017, 5:51 PM · Possible-Tech-Projects, Technical-Tool-Request

Nov 13 2017

mpopov committed R1821:39e5297ded97: db1047 => db1108 (authored by mpopov).
db1047 => db1108
Nov 13 2017, 8:09 PM
mpopov committed R1821:d0af60b49dd0: db1047 => db1108 (authored by mpopov).
db1047 => db1108
Nov 13 2017, 8:09 PM

Nov 10 2017

mpopov created T180270: RStudio web version on SWAP.
Nov 10 2017, 9:36 PM · Analytics

Nov 9 2017

mpopov moved T170022: Map analytics from Backlog to In progress on the Discovery-Analysis (Current work) board.
Nov 9 2017, 10:03 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery

Nov 8 2017

mpopov moved T179850: Run analysis of WDQS internal and external traffic from In progress to Needs review on the Discovery-Analysis (Current work) board.

Query & code at https://github.com/wikimedia-research/Discovery-WDQS-Adhoc-InternalExternal

Nov 8 2017, 11:40 PM · Discovery, Discovery-Analysis (Current work), Discovery-Search (Current work)
mpopov added a comment to T170468: Dashboard: Search results page - dwell time metric.

Re-opening this ticket. The numbers all seem to be the same for FR/CA and Other Languages, no matter what settings are selected, which seems really odd.

Nov 8 2017, 8:02 PM · Discovery-Analysis (Current work), Discovery
mpopov added a comment to T153856: Add lint/CI to all wikimedia/discovery analytics repositories.

@mpopov is the current job working properly? Should we move it to the gate-and-submit queue so it has to pass before patches can be merged? I'm probably going to refactor the existing jobs over to our new Docker-based system, but wanted to make sure it's actually useful and works before doing so.

Nov 8 2017, 7:19 PM · Patch-For-Review, Release-Engineering-Team (Watching / External), Discovery-Analysis (Current work), Discovery, Continuous-Integration-Config

Nov 7 2017

mpopov claimed T179850: Run analysis of WDQS internal and external traffic.
Nov 7 2017, 9:16 PM · Discovery, Discovery-Analysis (Current work), Discovery-Search (Current work)

Nov 2 2017

mpopov closed T112170: Model user behavior and detect when reality heavily deviated from expectation as Declined.

Dashboards removed and golden is no longer generating daily forecasts. RIP cool idea

Nov 2 2017, 7:25 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery
mpopov closed T112170: Model user behavior and detect when reality heavily deviated from expectation, a subtask of T147682: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003), as Declined.
Nov 2 2017, 7:25 PM · Analytics-Kanban, Patch-For-Review, Discovery-Analysis (Current work), Tracking, Discovery, Operations
mpopov committed R1821:0b3cca60f152: Add functions for working with interleaved experiments (authored by mpopov).
Add functions for working with interleaved experiments
Nov 2 2017, 4:59 PM
mpopov committed R1821:39f4533c6887: Add functions for working with interleaved experiments (authored by mpopov).
Add functions for working with interleaved experiments
Nov 2 2017, 4:51 PM
mpopov committed R1821:bb7affbcda22: [WIP] Add functions for working with interleaved experiments (authored by mpopov).
[WIP] Add functions for working with interleaved experiments
Nov 2 2017, 2:21 AM

Nov 1 2017

mpopov moved T179528: Investigate full-text searches in event logging vs SRP pageviews from Needs triage to Up Next on the Discovery-Analysis board.
Nov 1 2017, 6:53 PM · Discovery-Analysis (Current work), Discovery
mpopov created T179528: Investigate full-text searches in event logging vs SRP pageviews.
Nov 1 2017, 6:49 PM · Discovery-Analysis (Current work), Discovery
mpopov moved T178096: Make a Puppet profile/role for doing R-based heavy stats/ML on Wikimedia Cloud from Needs review to Done on the Discovery-Analysis (Current work) board.
  • discovery::deep_learner role works (makes TensorFlow, Keras, and Caffe available, including the R wrappers for TF & Keras)
  • discovery::learner role works (makes a bunch of ML packages available – in R & Python)
  • discovery::forecaster role works (makes a bunch of time series / forecasting R packages available: prophet, bsts, forecast)
  • discovery::bayes role mostly works (for some reason it doesn't install rstan but discovery::forecaster – which also includes the discovery_computing::bayesian_statistics profile – does)
  • discovery::allstar_cruncher role works (multi-purpose, has ALL of the packages from the other roles, so get your game on, go play)
Nov 1 2017, 5:00 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery

Oct 31 2017

mpopov updated subscribers of T112170: Model user behavior and detect when reality heavily deviated from expectation.

Making the forecasts has become too computationally intensive (takes a while and hogs up some resources on the stat box), so I think I need to disable them and remove the dashboard. The source code (for both the dashboard and the forecasting) will still be up and existing forecasts will be preserved, but no new ones will be made.

Oct 31 2017, 7:09 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery

Oct 24 2017

mpopov updated subscribers of T171258: WDCM: Process Module Scaling/migrate to Production.
  • wdcm_base.pp. This is the base WDCM profile and it encompasses (a) R packages to be installed, (b) a call to the r_lang module with a non-default value of the timeout parameter (for {dplyr} and {tidyr}, of which at least the first one can take some time to install), and (c) setting up the Shiny Server Portal (the index.html file).

Oct 24 2017, 10:09 PM · User-Addshore, Patch-For-Review, WMDE-Analytics-Engineering, User-GoranSMilovanovic
mpopov updated subscribers of T178096: Make a Puppet profile/role for doing R-based heavy stats/ML on Wikimedia Cloud.

@Gehel: getting the following errors. test-bayes, test-learner, test-deep-learner, test-forecaster, and test-allstar-cruncher all use Stretch.

Oct 24 2017, 8:41 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery

Oct 23 2017

mpopov lowered the priority of T168967: Upload shiny-server .deb to our Jessie apt repository from Normal to Lowest.

Ish? Until this is done, we're limited to using Ubuntu for the VMs that host our dashboards. Since the WM Cloud team (formerly WM Labs) is deprecating Ubuntu Trusty in favor of only offering Debian for VMs, we'll have to file a Phab ticket requesting a Trusty instance if we have to shut one down and launch a replacement. I don't think this task should be declined, but I am gonna adjust the priority to reflect where we are on this.

Oct 23 2017, 8:28 PM · Discovery-Analysis, Discovery, Operations, Discovery-Search (Current work)

Oct 17 2017

debt awarded T177356: Metrics for SDoC: look at querying databases a Party Time token.
Oct 17 2017, 9:11 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
mpopov added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

@chelsyx I don't think the ltr-i-1024 bucket should be included in this first look; it's an interleaved result set that can't really be interpreted with our standard metrics.
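
For readers unfamiliar with interleaving, here is a minimal sketch of team-draft interleaving, one common scheme. We are assuming it is representative; this feed does not say exactly how the ltr-i-1024 bucket was built. Each result on the page is credited to the ranker that contributed it, so clicks measure a head-to-head preference between two rankers rather than a property of one result set, which is why per-bucket metrics like clickthrough don't carry over.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, k, rng=None):
    """Interleave two rankings with team-draft; return (results, team).

    team maps each shown document to "A" or "B", the ranker credited with it.
    """
    rng = rng or random.Random()
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    results, team = [], {}

    def next_unused(name):
        # Highest-ranked document from this ranker that isn't shown yet.
        for doc in rankings[name]:
            if doc not in team:
                return doc
        return None

    while len(results) < k:
        # The team with fewer picks goes first; ties are broken by a coin flip.
        if picks["A"] < picks["B"] or (picks["A"] == picks["B"] and rng.random() < 0.5):
            order = ("A", "B")
        else:
            order = ("B", "A")
        for name in order:
            doc = next_unused(name)
            if doc is not None:
                results.append(doc)
                team[doc] = name
                picks[name] += 1
                break
        else:
            break  # both rankings are exhausted
    return results, team


def credit_clicks(clicked_docs, team):
    """Count clicks per team; the team with more clicks wins this impression."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in team:
            wins[team[doc]] += 1
    return wins


results, team = team_draft_interleave(["a", "b", "c"], ["b", "d", "e"], k=4, rng=random.Random(0))
print(results, credit_clicks(["d"], team))
```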

Oct 17 2017, 12:20 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery

Oct 13 2017

mpopov added a comment to T177354: Metrics for SDoC: look at contributions.

@chelsyx do you wanna add your stuff to https://github.com/wikimedia-research/SDoC-Initial-Metrics ?

Oct 13 2017, 7:45 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
mpopov moved T177356: Metrics for SDoC: look at querying databases from In progress to Done on the Discovery-Analysis (Current work) board.

Queries & data uploaded to https://github.com/wikimedia-research/SDoC-Initial-Metrics

Oct 13 2017, 7:44 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
mpopov updated the task description for T177356: Metrics for SDoC: look at querying databases.
Oct 13 2017, 7:42 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
mpopov added a comment to T177356: Metrics for SDoC: look at querying databases.

Growth in the number of deleters over time:

Oct 13 2017, 7:31 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
mpopov updated the task description for T177356: Metrics for SDoC: look at querying databases.
Oct 13 2017, 6:18 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
mpopov added a comment to T177356: Metrics for SDoC: look at querying databases.
  1. Historical trends
Oct 13 2017, 6:18 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata

Oct 12 2017

mpopov moved T178096: Make a Puppet profile/role for doing R-based heavy stats/ML on Wikimedia Cloud from In progress to Needs review on the Discovery-Analysis (Current work) board.
Oct 12 2017, 8:49 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery
mpopov moved T178096: Make a Puppet profile/role for doing R-based heavy stats/ML on Wikimedia Cloud from Backlog to In progress on the Discovery-Analysis (Current work) board.
Oct 12 2017, 5:30 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery
mpopov created T178096: Make a Puppet profile/role for doing R-based heavy stats/ML on Wikimedia Cloud.
Oct 12 2017, 5:29 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery

Oct 11 2017

mpopov updated the task description for T177356: Metrics for SDoC: look at querying databases.
Oct 11 2017, 11:28 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
mpopov added a comment to T177356: Metrics for SDoC: look at querying databases.
  • Most copyright-related deletions happen within 1 day of upload across almost all media types, with the exception of 'drawing' (SVGs)
  • A lot of audio files are deleted within 1 minute or 1 week of upload
  • Half of all images and PDFs deleted were deleted within 1 month of upload for non-copyright reasons
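
The time-to-deletion buckets used in these bullets can be computed along these lines. This is a sketch, not the query behind the findings; it assumes upload and deletion timestamps are available as `datetime` values and treats "1 month" as 30 days.

```python
from datetime import datetime, timedelta

# Upper bounds for each bucket, checked in order; "1 month" is approximated as 30 days.
BUCKETS = [
    (timedelta(minutes=1), "within 1 minute"),
    (timedelta(days=1), "within 1 day"),
    (timedelta(weeks=1), "within 1 week"),
    (timedelta(days=30), "within 1 month"),
]


def deletion_bucket(uploaded: datetime, deleted: datetime) -> str:
    """Label how soon after upload a file was deleted."""
    age = deleted - uploaded
    if age < timedelta(0):
        raise ValueError("deletion precedes upload")
    for limit, label in BUCKETS:
        if age <= limit:
            return label
    return "after 1 month"


print(deletion_bucket(datetime(2017, 10, 1, 12, 0), datetime(2017, 10, 1, 12, 0, 30)))
```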
Oct 11 2017, 11:27 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
mpopov updated the task description for T177356: Metrics for SDoC: look at querying databases.
Oct 11 2017, 9:57 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
mpopov added a comment to T177356: Metrics for SDoC: look at querying databases.

Reasons for files deleted in 2017:

Oct 11 2017, 9:00 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
mpopov added a comment to T177354: Metrics for SDoC: look at contributions.

Unfortunately, the MediaWiki snapshot doesn't have the image table, which describes images and other uploaded files.

Oct 11 2017, 6:19 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
mpopov added a comment to T177354: Metrics for SDoC: look at contributions.

Hey @chelsyx - what time frame does this cover?

Oct 11 2017, 5:05 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
mpopov claimed T177356: Metrics for SDoC: look at querying databases.
Oct 11 2017, 4:00 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
mpopov moved T177356: Metrics for SDoC: look at querying databases from Needs triage to Current work on the Discovery-Analysis board.
Oct 11 2017, 3:59 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata

Oct 10 2017

mpopov updated subscribers of T171652: Language Analysis Morphological Library Research Spike.

Perhaps worth noting that I'm pretty sure http://discovery.wmflabs.org/metrics/#langproj_breakdown isn't a true breakdown of search volume, although I should double-check with @mpopov. I think that's a proportion of events in the TestSearchSatisfaction schema. The sampling on low-volume wikis is all the same, but the top 20 or so have custom sampling rates, which means we can't directly compare the numbers.
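
The sampling point can be illustrated with a short sketch of inverse-probability weighting: to compare wikis, each wiki's event count is scaled up by the inverse of its sampling rate before computing shares. The wiki names, counts, and rates below are made up for illustration.

```python
def estimated_volume(event_counts, sampling_rates):
    """Scale sampled event counts back up by the inverse of each wiki's sampling rate."""
    return {wiki: count / sampling_rates[wiki] for wiki, count in event_counts.items()}


def shares(volumes):
    """Each wiki's share of the total volume."""
    total = sum(volumes.values())
    return {wiki: volume / total for wiki, volume in volumes.items()}


# Hypothetical numbers: wiki_b is sampled at half the rate of wiki_a.
counts = {"wiki_a": 100, "wiki_b": 150}
rates = {"wiki_a": 0.5, "wiki_b": 0.25}
print(shares(counts))                           # naive: 0.4 vs 0.6
print(shares(estimated_volume(counts, rates)))  # corrected: 0.25 vs 0.75
```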

Oct 10 2017, 11:42 PM · I18n, Discovery-Search (Current work), Tamil-Sites, Malayalam-Sites, Bengali-Sites, Discovery

Oct 6 2017

mpopov moved T176811: [Dashboard] Count the number of user session tokens by volume for mobile web search from Needs review to Done on the Discovery-Analysis (Current work) board.

@chelsyx: thanks and good job!

Oct 6 2017, 11:22 PM · Discovery-Analysis (Current work)
mpopov moved T171215: Interleaved results A/B test: analysis of data from Needs review to Done on the Discovery-Analysis (Current work) board.

Final draft up at https://wikimedia-research.github.io/Discovery-Search-Test-InterleavedLTR/

Oct 6 2017, 11:14 PM · Discovery-Search (Current work), Discovery-Analysis (Current work), Discovery, CirrusSearch

Oct 5 2017

mpopov committed R1821:2fced58f4463: [WIP] Add functions for working with interleaved experiments (authored by mpopov).
[WIP] Add functions for working with interleaved experiments
Oct 5 2017, 5:43 PM
mpopov committed R1821:1b8b395c0df9: Fix variable name (authored by mpopov).
Fix variable name
Oct 5 2017, 5:38 PM
mpopov committed R1821:0449576e628d: [WIP] Add functions for working with interleaved experiments (authored by mpopov).
[WIP] Add functions for working with interleaved experiments
Oct 5 2017, 5:34 PM
mpopov committed R1821:adcc85c94664: Switch to fetching EL data from db1047 (authored by mpopov).
Switch to fetching EL data from db1047
Oct 5 2017, 5:17 PM

Oct 3 2017

mpopov added a comment to T162369: Evaluate rescore windows for learning to rank.

It would depend on how often things below the top 20 move into the top 20 in practice, not just in theory. We can use the search logs to find this out, no?
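
A sketch of the log-based check suggested here, assuming (hypothetically) that for each query we can recover each result's rank before and after rescoring: count how many of the final top-20 slots are filled by results that started below rank 20. Aggregated over many logged queries, this gives the in-practice number rather than the theoretical one.

```python
def promotion_rate(rank_pairs, cutoff=20):
    """Fraction of final top-`cutoff` results whose original rank was below the cutoff.

    rank_pairs: iterable of (original_rank, final_rank) tuples, 1-based.
    """
    final_top = [(orig, final) for orig, final in rank_pairs if final <= cutoff]
    if not final_top:
        return 0.0
    promoted = sum(1 for orig, _ in final_top if orig > cutoff)
    return promoted / len(final_top)


pairs = [(1, 1), (2, 3), (25, 2), (3, 4), (30, 40)]
print(promotion_rate(pairs))  # 1 of the 4 final top-20 results came from below rank 20: 0.25
```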

Oct 3 2017, 5:57 PM · Discovery-Search (Current work), Discovery

Oct 2 2017

mpopov added a comment to T176997: Extract a set of a few hundred most popular abandoned queries.

A few things that come to mind:

  • A nice large list would give us a better idea of the distribution of queries. Are there some really common things that people bail on, or is it all low frequency? One day isn't enough to tell, though it looks like the long tail is very long since @mpopov dropped the unique items, leaving a pretty short list.
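
A minimal sketch of the distribution check described in that bullet, assuming queries are normalized by trimming whitespace and lowercasing (our choice, not necessarily what was done): count each distinct query and measure how much of the list consists of queries seen only once.

```python
from collections import Counter


def query_frequencies(queries):
    """Count distinct queries after light normalization (strip + lowercase)."""
    return Counter(q.strip().lower() for q in queries)


def singleton_share(frequencies):
    """Fraction of distinct queries that occur exactly once (the long tail)."""
    if not frequencies:
        return 0.0
    return sum(1 for n in frequencies.values() if n == 1) / len(frequencies)


freqs = query_frequencies(["Foo", "foo ", "bar", "baz"])
print(freqs.most_common(1), singleton_share(freqs))
```

A high singleton share over a longer window would confirm the very long tail; a few queries with large counts would point to common things people bail on.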
Oct 2 2017, 7:19 PM · Discovery-Analysis, Discovery-Search (Current work), Discovery, CirrusSearch

Sep 28 2017

mpopov added a comment to T176997: Extract a set of a few hundred most popular abandoned queries.

Using just the event logging data from 2017-08-01 to today (2017-09-28), here's a glimpse at queries from abandoned full-text searches:

Sep 28 2017, 6:36 PM · Discovery-Analysis, Discovery-Search (Current work), Discovery, CirrusSearch
mpopov claimed T175048: Search Relevance Survey test #3: analysis of test.
Sep 28 2017, 4:43 PM · Discovery-Analysis (Current work), Discovery
mpopov moved T171215: Interleaved results A/B test: analysis of data from In progress to Needs review on the Discovery-Analysis (Current work) board.

Bootstrapping finally finished -_- second draft up at https://people.wikimedia.org/~bearloga/reports/ltr-test.html

Sep 28 2017, 4:39 PM · Discovery-Search (Current work), Discovery-Analysis (Current work), Discovery, CirrusSearch
mpopov removed a project from T170022: Map analytics : Patch-For-Review.

In beta:

Sep 28 2017, 1:21 AM · Patch-For-Review, Discovery-Analysis (Current work), Discovery

Sep 27 2017

Quiddity awarded T150215: [Dashboard][Search] Sparklines for KPIs a Love token.
Sep 27 2017, 10:26 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery
mpopov added a comment to T176639: Replace references to dbstore1002 by db1047 in reportupdater jobs.

@mforns: we specify the analytics-store hostname in our R package (the function that makes SQL queries: https://github.com/wikimedia/wikimedia-discovery-wmf/blob/master/R/mysql.R#L39--L76), which is used for querying both the wiki content DBs and the log DB. If we add a type argument that sets the hostname ("db1047" in case of type == "events", for example), what hostname should we use for non-EventLogging queries?
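
The proposed `type` argument amounts to a lookup from query type to hostname. Here it is sketched in Python (the actual package is written in R), with the open question left explicit: the fallback simply keeps the current analytics-store hostname as a placeholder, which is an assumption, not an answer from the Analytics team. `pick_host` and the dictionary names are hypothetical.

```python
# Hostname per query type; only the "events" entry comes from the proposal above.
HOSTS_BY_TYPE = {"events": "db1047"}

# Placeholder for non-EventLogging queries: keeps the current hostname until
# the open question is answered (assumption, not a confirmed value).
DEFAULT_HOST = "analytics-store"


def pick_host(query_type: str = "content") -> str:
    """Return the database host to connect to for the given query type."""
    return HOSTS_BY_TYPE.get(query_type, DEFAULT_HOST)


print(pick_host("events"), pick_host())
```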

Sep 27 2017, 10:06 PM · Patch-For-Review, Analytics-Kanban
mpopov added a comment to T112170: Model user behavior and detect when reality heavily deviated from expectation.

Deployed at https://discovery.wmflabs.org/forecasts/

Sep 27 2017, 4:14 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery
mpopov committed rWDDE4c990fcea85d: Tab doc path fix (authored by mpopov).
Tab doc path fix
Sep 27 2017, 4:08 PM

Sep 26 2017

mpopov committed rWDDE9c6181d3cbcd: Make develop the default branch (authored by mpopov).
Make develop the default branch
Sep 26 2017, 4:32 PM
mpopov committed rWDDE825202e79278: Edit Project Config (authored by mpopov).
Edit Project Config
Sep 26 2017, 4:23 PM
mpopov committed rWDDEe7d35d4e2195: Edit Project Config (authored by mpopov).
Edit Project Config
Sep 26 2017, 4:21 PM