chelsyx (Chelsy Xie)
User

User Details

User Since
Aug 9 2016, 7:00 PM (70 w, 4 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
CXie (WMF)

Recent Activity

Thu, Dec 14

chelsyx moved T177353: Metrics for SDoC: look at search hits based on which element the search is hitting from In progress to Needs review on the Discovery-Analysis (Current work) board.

All results and analysis codebase can be found here: https://github.com/wikimedia-research/SDoC-Initial-Metrics/tree/master/T177353

Thu, Dec 14, 8:59 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx added a comment to T177353: Metrics for SDoC: look at search hits based on which element the search is hitting.

Categorization
Excluding hidden categories and 'needing_category' categories, 1,629,592 (3.73%) files don't belong to any category and 22,492,880 (51.55%) files belong to only 1 category, as of December 12, 2017.

Thu, Dec 14, 8:56 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx triaged T182849: Identify unhelpful file names on commons as Low priority.
Thu, Dec 14, 7:24 PM · Wikidata, Discovery-Analysis, Structured-Data-Commons
chelsyx added a subtask for T177353: Metrics for SDoC: look at search hits based on which element the search is hitting: T182849: Identify unhelpful file names on commons.
Thu, Dec 14, 7:23 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx added a parent task for T182849: Identify unhelpful file names on commons: T177353: Metrics for SDoC: look at search hits based on which element the search is hitting.
Thu, Dec 14, 7:23 PM · Wikidata, Discovery-Analysis, Structured-Data-Commons
chelsyx added a comment to T182849: Identify unhelpful file names on commons.

Hello @thiemowmde ! The purpose of T177353 and its parent ticket T174519: [epic] SDoC: Determine baseline for metrics is to figure out a baseline for metrics on Commons in order to measure future successes for the Structured-Data-Commons (SDoC) project. The SDoC team and we (Discovery-Analysis) came up with a list of things that would be interesting to measure, and created T177353 and other child tickets (see T174519 for more details). This work is exploratory in nature: some metrics in the list are clearly defined, while some (for example, the exact meaning of "unhelpful") are not. Any ideas and comments are very welcome!

Thu, Dec 14, 7:17 PM · Wikidata, Discovery-Analysis, Structured-Data-Commons
chelsyx claimed T179450: Documentation of SDoC findings.
Thu, Dec 14, 8:09 AM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx moved T182849: Identify unhelpful file names on commons from Needs triage to Up Next on the Discovery-Analysis board.
Thu, Dec 14, 7:54 AM · Wikidata, Discovery-Analysis, Structured-Data-Commons
chelsyx created T182849: Identify unhelpful file names on commons.
Thu, Dec 14, 7:54 AM · Wikidata, Discovery-Analysis, Structured-Data-Commons

Tue, Dec 12

chelsyx updated the task description for T177358: Metrics for SDoC: translations.
Tue, Dec 12, 10:42 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx added a comment to T177358: Metrics for SDoC: translations.

We parsed the wikitext of all files in the Commons XML data dumps of November 20, 2017, and extracted the language templates in them (e.g. {{en}}, {{LangSwitch}}). Out of the total 43,268,565 files, 14,848,551 (34.32%) files don't have any language templates and 23,780,247 (54.96%) files use only 1 language.
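
As a rough illustration of this kind of extraction, the sketch below collects the language codes used as templates in one file's wikitext. The regular expression, the tiny `LANG_CODES` set, and the function name are assumptions made for illustration, not the code used in the analysis. `{{LangSwitch}}` keeps its languages in named parameters and is not handled here.

```python
import re

# Hypothetical, deliberately tiny set of language codes; a real run would
# need the full list of language templates used on Commons.
LANG_CODES = {"en", "de", "fr", "es", "ru", "ja", "zh"}

# Matches the template name at the start of "{{name|...}}" or "{{name}}".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")

def languages_in_wikitext(wikitext):
    """Return the set of language codes used as templates in the wikitext."""
    found = set()
    for name in TEMPLATE_NAME.findall(wikitext):
        code = name.strip().lower()
        if code in LANG_CODES:
            found.add(code)
    return found
```

A file would then count as "no language templates" when the returned set is empty, and as "only 1 language" when it has exactly one element.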

Tue, Dec 12, 10:41 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx moved T167824: Test the importance of search result position vs quality from Needs triage to Up Next on the Discovery-Analysis board.
Tue, Dec 12, 9:29 PM · Discovery-Analysis, Discovery
chelsyx edited projects for T167824: Test the importance of search result position vs quality, added: Discovery-Analysis; removed Discovery-Analysis (Current work).
Tue, Dec 12, 9:26 PM · Discovery-Analysis, Discovery
chelsyx changed the status of T177353: Metrics for SDoC: look at search hits based on which element the search is hitting from Stalled to Open.

We parsed the wikitext of all files in the Commons XML data dumps of November 20, 2017. Out of the total 43,268,565 files, 41,796,560 (96.6%) files have an infobox and 41,309,028 (95.47%) have some content in their description fields (description, title, depicted people, depicted place, etc.).

Tue, Dec 12, 7:10 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx changed the status of T177353: Metrics for SDoC: look at search hits based on which element the search is hitting, a subtask of T174519: [epic] SDoC: Determine baseline for metrics, from Stalled to Open.
Tue, Dec 12, 7:10 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata

Fri, Dec 1

debt awarded T176493: Analysis of testing on 18 wikis with > 1% of search traffic a Party Time token.
Fri, Dec 1, 4:54 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery
chelsyx edited projects for T177353: Metrics for SDoC: look at search hits based on which element the search is hitting, added: Discovery-Analysis (Current work); removed Discovery-Analysis.
Fri, Dec 1, 12:40 AM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata

Thu, Nov 30

chelsyx claimed T177358: Metrics for SDoC: translations.
Thu, Nov 30, 6:06 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx moved T177534: Search Metrics for SDoC: eventlogging from In progress to Needs review on the Discovery-Analysis (Current work) board.

We computed several search metrics with event logging data in November 2017 and compared them with English Wikipedia. They cover searches on desktop only, since we have very few searches on mobile web on Commons (fewer than 100 search result pages daily).

Thu, Nov 30, 5:56 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata

Wed, Nov 22

chelsyx committed rWDAR2c154e446de0: small fix of survival plots (authored by chelsyx).
small fix of survival plots
Wed, Nov 22, 1:33 AM

Tue, Nov 21

chelsyx moved T177534: Search Metrics for SDoC: eventlogging from Backlog to In progress on the Discovery-Analysis (Current work) board.
Tue, Nov 21, 8:54 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx claimed T177534: Search Metrics for SDoC: eventlogging.
Tue, Nov 21, 8:53 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx committed rWDAR1a8e80d1f66a: Change grouping color in survival plots (authored by chelsyx).
Change grouping color in survival plots
Tue, Nov 21, 7:51 PM
chelsyx added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

I have one UI suggestion for Page Visit Times in general: would it be possible to consistently use two colors (say, red for A and blue for B) on each sub-chart?

Thanks @TJones ! Fixing it.

Tue, Nov 21, 6:02 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery

Mon, Nov 20

chelsyx added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

Report for the DBN test: https://analytics.wikimedia.org/datasets/discovery/reports/Experiement_with_different_grouping_of_queries_that_get_fed_into_the_DBN.html

Mon, Nov 20, 10:03 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery
chelsyx committed rWDARf32c3f97a9a4: Small fixes (authored by chelsyx).
Small fixes
Mon, Nov 20, 9:30 PM
chelsyx added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

Report of test on 18 languages is updated with interleaved results: https://analytics.wikimedia.org/datasets/discovery/reports/CirrusSearch_MLR_AB_test_on_18_wikis.html

Mon, Nov 20, 5:43 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery

Sun, Nov 19

chelsyx committed rWDARd8a5c7359f73: Add interleaved test analysis (authored by chelsyx).
Add interleaved test analysis
Sun, Nov 19, 7:59 AM

Fri, Nov 17

chelsyx committed rWDARb99a3a3ad9d2: Add interleaved test analysis (authored by chelsyx).
Add interleaved test analysis
Fri, Nov 17, 10:59 PM
chelsyx committed rWDAR1f69e1ecb056: Add interleaved test analysis (authored by chelsyx).
Add interleaved test analysis
Fri, Nov 17, 9:00 PM

Nov 15 2017

chelsyx committed rWDARf156ecf8a0ea: Modularize autoreporter (authored by chelsyx).
Modularize autoreporter
Nov 15 2017, 8:33 PM

Nov 9 2017

chelsyx added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

@chelsyx I pulled the data out for 11/2 00:00 to 11/9 00:00 into a single tsv file at stat1005.eqiad.wmnet:/mnt/hdfs/user/ebernhardson/tss_tsv/part-00000-7faa8246-4477-421e-8c91-df291eec70cc.csv.gz. This is about 234M compressed and 1.18G uncompressed. If necessary I can re-sample this on session ids to get smaller data.

Nov 9 2017, 8:39 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery

Nov 8 2017

chelsyx added a comment to T177353: Metrics for SDoC: look at search hits based on which element the search is hitting.

Status of tasks of this ticket:

Nov 8 2017, 12:25 AM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata

Nov 7 2017

chelsyx added a comment to T177353: Metrics for SDoC: look at search hits based on which element the search is hitting.

On November 7, the number of files having a "needing categories" category is 4,268,386 (10%). The following table breaks down the counts by media type:

Nov 7 2017, 11:58 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx created P6283 count the number of files by number of categories, category type, media type.
Nov 7 2017, 8:42 PM

Nov 2 2017

chelsyx moved T177957: Analysis of: A/B test to test relaxing the retrieval query filter from In progress to Needs review on the Discovery-Analysis (Current work) board.

Report: https://analytics.wikimedia.org/datasets/discovery/reports/AB_test_to_test_relaxing_the_retrieval_query_filter.html

Nov 2 2017, 11:17 PM · Discovery-Analysis (Current work), Discovery
chelsyx claimed T177957: Analysis of: A/B test to test relaxing the retrieval query filter.
Nov 2 2017, 9:59 PM · Discovery-Analysis (Current work), Discovery
chelsyx moved T177957: Analysis of: A/B test to test relaxing the retrieval query filter from Backlog to In progress on the Discovery-Analysis (Current work) board.
Nov 2 2017, 9:59 PM · Discovery-Analysis (Current work), Discovery
chelsyx moved T179449: Follow-up from Metrics from In progress to Done on the Discovery-Analysis (Current work) board.

At the quarterly metrics meeting, we showed the clickthrough rate on desktop by search type: fulltext ~21.3%, autocomplete ~95%. This is the proportion of pageviews that have at least one clickthrough. For fulltext search, each search result page is a pageview; for autocomplete, as long as users don't click through, the pageview ID remains the same no matter how many queries they type into the search box.
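
To make the definition concrete, here is a minimal sketch of a clickthrough rate computed as the share of pageviews with at least one click. The event format (pairs of pageview ID and an action label such as "serp" or "click") and the function name are hypothetical, not the schema used on the dashboards.

```python
from collections import defaultdict

def clickthrough_rate(events):
    """Share of pageviews with at least one click.

    `events` is an iterable of (pageview_id, action) pairs, where action is
    a hypothetical label such as "serp" or "click".
    """
    clicked = defaultdict(bool)
    for pageview_id, action in events:
        # A pageview counts once, however many clicks it contains.
        clicked[pageview_id] = clicked[pageview_id] or action == "click"
    if not clicked:
        return 0.0
    return sum(clicked.values()) / len(clicked)
```

Because an autocomplete pageview ID stays the same across many typed queries, all of those queries collapse into one pageview under this definition, which is one reason the autocomplete figure is so much higher than the fulltext one.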

Nov 2 2017, 4:31 AM · Discovery-Analysis (Current work), Discovery

Nov 1 2017

chelsyx added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

@EBernhardson Thanks for offering to get the data into a tsv file! :) Are you going to parse the json string in hadoop eventlogging? Because doing this in R would take a long time, unless we can get sparklyR or sparkR to work.

Nov 1 2017, 6:35 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery
chelsyx moved T179449: Follow-up from Metrics from Backlog to In progress on the Discovery-Analysis (Current work) board.
Nov 1 2017, 5:59 PM · Discovery-Analysis (Current work), Discovery
Liuxinyu970226 awarded T178097: Dashboard annotations needed for fix of mw.track bug a Heartbreak token.
Nov 1 2017, 1:48 PM · Discovery-Analysis (Current work), Discovery

Oct 30 2017

chelsyx added a comment to T177353: Metrics for SDoC: look at search hits based on which element the search is hitting.

Oh, that looks like that will be quite interesting, @chelsyx, although it looks like it might be a bit of manual work involved.

Oct 30 2017, 10:51 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx added a comment to T177353: Metrics for SDoC: look at search hits based on which element the search is hitting.

Great idea, @EBernhardson, let's do it! @chelsyx can you get that sampling from the data we already have?

Oct 30 2017, 10:47 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata

Oct 28 2017

chelsyx added a comment to T178958: Metrics check-in for Q1 2017/18.

@debt I've put the search part into the slide deck. Let me know if you have any questions. :)

Oct 28 2017, 1:57 AM · Discovery-Analysis (Current work), Discovery
chelsyx updated the task description for T178958: Metrics check-in for Q1 2017/18.
Oct 28 2017, 1:54 AM · Discovery-Analysis (Current work), Discovery
chelsyx added a comment to T178097: Dashboard annotations needed for fix of mw.track bug .

Done. http://discovery.wmflabs.org/metrics/#mobile_events

Oct 28 2017, 1:31 AM · Discovery-Analysis (Current work), Discovery

Oct 26 2017

chelsyx moved T178097: Dashboard annotations needed for fix of mw.track bug from Needs review to Done on the Discovery-Analysis (Current work) board.
Oct 26 2017, 8:38 PM · Discovery-Analysis (Current work), Discovery
chelsyx removed a project from T178097: Dashboard annotations needed for fix of mw.track bug : Patch-For-Review.

Live on beta: http://discovery-beta.wmflabs.org/metrics/#mobile_events

Oct 26 2017, 8:37 PM · Discovery-Analysis (Current work), Discovery
chelsyx moved T178958: Metrics check-in for Q1 2017/18 from Backlog to In progress on the Discovery-Analysis (Current work) board.
Oct 26 2017, 5:23 PM · Discovery-Analysis (Current work), Discovery
chelsyx added a comment to T177353: Metrics for SDoC: look at search hits based on which element the search is hitting.

For unhelpful file names, I want to extract from the move log the old and new file names whose change reason is meaningless or ambiguous, and then train a model to classify these file names. As far as I know, short text classification like this is a bit tricky. @mpopov do you have any suggestions?

Oct 26 2017, 5:01 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx updated the task description for T177353: Metrics for SDoC: look at search hits based on which element the search is hitting.
Oct 26 2017, 3:28 AM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx updated the task description for T177353: Metrics for SDoC: look at search hits based on which element the search is hitting.
Oct 26 2017, 3:27 AM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx added a comment to T177353: Metrics for SDoC: look at search hits based on which element the search is hitting.

There are 142,994 files with annotations (ImageNote); follow this link for the most current count.

Oct 26 2017, 12:17 AM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata

Oct 24 2017

chelsyx updated the task description for T177353: Metrics for SDoC: look at search hits based on which element the search is hitting.
Oct 24 2017, 6:41 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx added a comment to T177354: Metrics for SDoC: look at contributions.

Good idea! Thanks @Nuria !

Oct 24 2017, 5:38 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata

Oct 23 2017

chelsyx added a comment to T177354: Metrics for SDoC: look at contributions.

Hi @Nuria , the numbers I showed above are cumulative sums at the end of each month, while the numbers you talked about are new uploads for each month. From my query, for Dec 2016, the number of newly uploaded files by bots is 392,566, and by users 392,786. This is close to what is shown on https://stats.wikimedia.org/wikispecial/EN/TablesWikipediaCOMMONS.htm.

Oct 23 2017, 6:42 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata

Oct 18 2017

chelsyx moved T177353: Metrics for SDoC: look at search hits based on which element the search is hitting from Backlog to In progress on the Discovery-Analysis (Current work) board.
Oct 18 2017, 6:39 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx moved T178097: Dashboard annotations needed for fix of mw.track bug from In progress to Needs review on the Discovery-Analysis (Current work) board.
Oct 18 2017, 6:38 PM · Discovery-Analysis (Current work), Discovery
chelsyx claimed T177353: Metrics for SDoC: look at search hits based on which element the search is hitting.
Oct 18 2017, 4:53 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx claimed T178097: Dashboard annotations needed for fix of mw.track bug .
Oct 18 2017, 4:52 PM · Discovery-Analysis (Current work), Discovery
chelsyx added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

Reports: https://analytics.wikimedia.org/datasets/discovery/reports/CirrusSearch_MLR_AB_test_on_18_wikis.html and joyplots: https://people.wikimedia.org/~chelsyx/ctr_distribution.html are updated.

Oct 18 2017, 4:49 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery
chelsyx committed rWDARd883a8e33145: Fix UI stuff (authored by chelsyx).
Fix UI stuff
Oct 18 2017, 7:42 AM
chelsyx committed rWDAR350c7eb4cb5e: Fix UI stuff (authored by chelsyx).
Fix UI stuff
Oct 18 2017, 7:42 AM

Oct 17 2017

chelsyx added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

@chelsyx I'll try to carve some time out this week to add interleaved CIs to wmf (there's currently a patch that adds interleaved preference calculation) so that the report can also output stuff for interleaved groups (defined as having "-i-" in the name).

Thanks @mpopov ! No rush!

Oct 17 2017, 6:40 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery
chelsyx added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.
  • "This test ran from 19 September 2017 to 05 October 2017 on all wikis"
    • It actually ran on 18 wikis, not "all"
  • would it be possible to show the browser and OS in a table that can be sorted? It's really difficult to look at that table and make sense out of what is more popular.
  • under searches and number of searches - the long table isn't horrible, but a bit hard to read. For the charts--the legend is all the way at the bottom and requires a lot of scrolling to figure out what you're looking at, then scrolling back up to determine how to read the charts. Can the legend be at the top and bottom of the overall large chart image? (searches, searches with clicks, searches with results)
  • for the data summary of sister project snippets, yay! can the legend for the various projects be on each of the charts? Otherwise, you have to do a lot of scrolling to remember which project is being shown for each wiki, especially since each wiki has their 'favorite' sister project to click through to! :) (Same for inter-wiki charts re: legend for each chart)

Yep, I'm working on it.

  • for the sister projects—do you know why there were more clickthroughs on about half the wikis for the test group than for the control group?

Which chart are you referring to? I didn't see a big difference in sister projects clicks between the two groups.

Oct 17 2017, 6:16 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery
chelsyx added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

@EBernhardson @TJones For hewiki, I fetched several query strings with zero results from the ltr-1024 group of hewiki on 9/20 and 9/21 (the first two days of this experiment, when ltr-1024 had very high zero result rates). I ran them in hewiki and most of them return some results. So I think there may have been some bugs in the test configuration for those days on hewiki, which resulted in null event_hitsReturned.

Oct 17 2017, 3:47 AM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery
chelsyx added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

I removed all sessions with more than 100 searches (previously, I removed sessions only when they had more than 100 searches AND only SERP events). Now the dewiki distribution is not bimodal anymore. Yay!
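
A minimal sketch of this kind of session filter, assuming each event is a dict with hypothetical `session_id` and `event` keys and that one "serp" event stands for one search. This is an illustration rather than the code in the autoreporter.

```python
from collections import Counter

def drop_high_volume_sessions(events, max_searches=100):
    """Drop every event from sessions with more than `max_searches` searches.

    `events` is a list of dicts with hypothetical keys "session_id" and
    "event" ("serp" marks one search result page, i.e. one search).
    """
    searches = Counter(e["session_id"] for e in events if e["event"] == "serp")
    return [e for e in events if searches[e["session_id"]] <= max_searches]
```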

Oct 17 2017, 12:24 AM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery

Oct 16 2017

chelsyx added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.
  • @EBernhardson—looking at hewiki ZRR results. Is it possible for the LTR to ditch results? Like, if something scores so low it gets dropped? With a 95% credible interval and 18 data sets, one showing statistically significant differences in ZRR is not a shock, but the fact that it is Hebrew is suspicious, since that's the one where the training data and the production data actually differ because of the language analyzer.

Looking at the daily graphs, it seems there was an abnormally high ZRR on the first day or two of the test. I wonder if there is perhaps a very high volume session throwing things off? IIUC this is per-search ZRR. Adding in a per-session ZRR (does at least 1 query have results) might remove that. Completely removing high volume sessions as "probably bots", like we did for the enwiki MLR analysis, might also do the trick?
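
The difference between the two rates can be sketched as follows. The input format (pairs of session ID and hits returned) and the function name are assumptions for illustration; null hit counts, which came up for hewiki, are not handled.

```python
from collections import defaultdict

def zero_result_rates(searches):
    """Return (per_search_zrr, per_session_zrr).

    `searches` is an iterable of (session_id, hits_returned) pairs.
    Per-session ZRR counts a session as zero-result only if none of its
    queries returned results.
    """
    total = zero = 0
    session_has_results = defaultdict(bool)
    for session_id, hits in searches:
        total += 1
        zero += hits == 0
        session_has_results[session_id] |= hits > 0
    if total == 0:
        return 0.0, 0.0
    sessions = len(session_has_results)
    zero_sessions = sessions - sum(session_has_results.values())
    return zero / total, zero_sessions / sessions
```

A single session that issues many zero-result queries pushes the per-search rate up sharply while barely moving the per-session rate.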

Oct 16 2017, 10:26 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery
chelsyx added a comment to T177354: Metrics for SDoC: look at contributions.

Codebase and output: https://github.com/wikimedia-research/SDoC-Initial-Metrics/tree/master/T177354

Oct 16 2017, 6:43 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx updated subscribers of T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

I bootstrapped from the preprocessed data 1,000 times and computed the distribution of the search-wise CTR. Then I changed the re-sample size from 1k to 10k and created joy plots for every wiki. Here is the most interesting one:
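
A minimal sketch of bootstrapping a search-wise CTR, assuming the preprocessed data can be reduced to one 0/1 flag per search. The function name, the parameters, and the reading of "re-sample size" as the number of searches drawn per replicate are assumptions, not the analysis code.

```python
import random

def bootstrap_ctr(clicked, n_boot=1000, sample_size=None, seed=0):
    """Bootstrap distribution of the search-wise clickthrough rate.

    `clicked` is a list of 0/1 flags, one per search (1 = search got a click).
    Each replicate resamples `sample_size` searches with replacement.
    """
    rng = random.Random(seed)
    size = sample_size or len(clicked)
    return [sum(rng.choices(clicked, k=size)) / size for _ in range(n_boot)]
```

The returned list of replicate CTRs, computed once per wiki and group, is the kind of input a joy plot would be drawn from.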

Oct 16 2017, 4:30 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery

Oct 13 2017

chelsyx added a comment to T177354: Metrics for SDoC: look at contributions.

@mpopov yup, I will put my stuff in the repo.

Oct 13 2017, 7:48 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx added a comment to T178097: Dashboard annotations needed for fix of mw.track bug .

According to T176464#3669451, this bug didn't cause the decrease in logged events on March 29th 2017.

Oct 13 2017, 7:11 PM · Discovery-Analysis (Current work), Discovery
chelsyx added a comment to T176464: [Spike 2.5h] Did the new mobile header treatment break the search experience?.

@Niedzielski Looks ok to me. Thank you all very much for the help! :D

Oct 13 2017, 6:47 PM · Discovery-Analysis, Discovery, Readers-Web-Backlog (Tracking), Spike, MobileFrontend
chelsyx added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

Might also be worth looking into: I increased the sampling rates significantly for this test. This new test ran for 16 days and contains 1.4M SERP events from 683k sessions, significantly higher than anything we've collected before. Is this increase in event counts useful in making the buckets differentiable, or is it simply more data to store and process? I realize though that because the data is split between so many wikis it may not be as useful as having 700k sessions all from a single busy site like dewiki or enwiki.

Oct 13 2017, 5:42 AM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery
chelsyx claimed T176493: Analysis of testing on 18 wikis with > 1% of search traffic.
Oct 13 2017, 1:15 AM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery
chelsyx moved T177354: Metrics for SDoC: look at contributions from In progress to Needs review on the Discovery-Analysis (Current work) board.
Oct 13 2017, 1:09 AM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx updated the task description for T177354: Metrics for SDoC: look at contributions.
Oct 13 2017, 1:00 AM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx added a comment to T177354: Metrics for SDoC: look at contributions.

The following two graphs break down the numbers by month:

Oct 13 2017, 12:53 AM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx added a comment to T177354: Metrics for SDoC: look at contributions.

Updated: On Oct 12, 2017, the number of files uploaded by bots is 9,390,721 (22.03%), and the number of files uploaded by users is 33,241,541 (77.97%). The following table breaks down the counts by media type:

Oct 13 2017, 12:48 AM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx committed rWDARa96493568a46: Bug fixes (authored by chelsyx).
Bug fixes
Oct 13 2017, 12:36 AM
chelsyx placed T176493: Analysis of testing on 18 wikis with > 1% of search traffic up for grabs.
Oct 13 2017, 12:23 AM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery
chelsyx added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

The auto-report is updated: https://analytics.wikimedia.org/datasets/discovery/reports/CirrusSearch_MLR_AB_test_on_18_wikis.html

Oct 13 2017, 12:23 AM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery

Oct 12 2017

chelsyx added a comment to T176493: Analysis of testing on 18 wikis with > 1% of search traffic.

Auto-generated report is up: https://analytics.wikimedia.org/datasets/discovery/reports/CirrusSearch_MLR_AB_test_on_18_wikis.html. There are still some bugs in the report I need to fix and I will update the report later.

Oct 12 2017, 4:36 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery
chelsyx moved T177354: Metrics for SDoC: look at contributions from Needs review to In progress on the Discovery-Analysis (Current work) board.
Oct 12 2017, 5:59 AM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata

Oct 11 2017

chelsyx added a comment to T177354: Metrics for SDoC: look at contributions.

@mpopov Looks like the file type categorization on Commons is messier than we thought...
For example, File:Krazy_Kat_Bugolist_1916_silent.ogv is an ogv file, but its img_minor_mime is ogg, img_major_mime is application, and img_media_type is video. The same holds for other ogv files, whereas for ogg files like File:Whitenoisesound.ogg, img_minor_mime is ogg, img_major_mime is application, and img_media_type is audio.

Oct 11 2017, 10:08 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx claimed T176493: Analysis of testing on 18 wikis with > 1% of search traffic.
Oct 11 2017, 6:19 PM · Patch-For-Review, Discovery-Analysis (Current work), Discovery-Search (Current work), Discovery
chelsyx added a comment to T177354: Metrics for SDoC: look at contributions.

Hey @chelsyx - what time frame does this cover?

Jumping in to say this looks like it's from launch of Commons to now.

Thanks @mpopov ! Yes, these are the file counts on Oct 10.

Oct 11 2017, 5:56 PM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx added a comment to T171652: Language Analysis Morphological Library Research Spike.

@mpopov Agree. That would be less confusing as well.

Oct 11 2017, 4:01 PM · I18n, Discovery-Search (Current work), Tamil-Sites, Malayalam-Sites, Bengali-Sites, Discovery
chelsyx moved T177354: Metrics for SDoC: look at contributions from In progress to Needs review on the Discovery-Analysis (Current work) board.

The number of files uploaded by bots is 9,390,408 (22.04%), and the number of files uploaded by users is 33,222,838 (77.96%). The following table breaks down the counts by major mime:

Oct 11 2017, 5:23 AM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata

Oct 7 2017

chelsyx moved T177354: Metrics for SDoC: look at contributions from Backlog to In progress on the Discovery-Analysis (Current work) board.
Oct 7 2017, 12:52 AM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx moved T177354: Metrics for SDoC: look at contributions from Needs triage to Current work on the Discovery-Analysis board.
Oct 7 2017, 12:52 AM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata
chelsyx claimed T177354: Metrics for SDoC: look at contributions.
Oct 7 2017, 12:51 AM · Discovery-Analysis (Current work), Structured-Data-Commons, Discovery, Wikidata

Oct 6 2017

chelsyx added a comment to T176464: [Spike 2.5h] Did the new mobile header treatment break the search experience?.

Update: From the dashboard, we noticed that MobileWebSearch events increased drastically on Sep 29, back to the same level as before March 29. We will keep watching.

Oct 6 2017, 11:25 PM · Discovery-Analysis, Discovery, Readers-Web-Backlog (Tracking), Spike, MobileFrontend
chelsyx removed a project from T176811: [Dashboard] Count the number of user session tokens by volume for mobile web search: Patch-For-Review.
Oct 6 2017, 11:20 PM · Discovery-Analysis (Current work)
chelsyx updated subscribers of T176811: [Dashboard] Count the number of user session tokens by volume for mobile web search.

Now live on beta: http://discovery-beta.wmflabs.org/metrics/#mobile_events

Oct 6 2017, 11:19 PM · Discovery-Analysis (Current work)

Oct 4 2017

chelsyx moved T176815: Investigate the full-text search pattern on mobile web from In progress to Done on the Discovery-Analysis (Current work) board.

We examined the query P5973 carefully and didn't find anything that would change the full-text search usage pattern on mobile web. More interestingly, when we focus on users who went through the "prefix -> full-text" funnel, we can see that while the number of users and the number of prefix searches are higher on weekends, this same group of users opens more full-text search result pages on weekdays:

Oct 4 2017, 10:15 PM · Discovery-Analysis (Current work)
chelsyx awarded T176464: [Spike 2.5h] Did the new mobile header treatment break the search experience? a Love token.
Oct 4 2017, 8:10 PM · Discovery-Analysis, Discovery, Readers-Web-Backlog (Tracking), Spike, MobileFrontend
chelsyx lowered the priority of T176464: [Spike 2.5h] Did the new mobile header treatment break the search experience? from High to Normal.

Sorry I'm late for the party.

Oct 4 2017, 8:09 PM · Discovery-Analysis, Discovery, Readers-Web-Backlog (Tracking), Spike, MobileFrontend

Oct 2 2017

chelsyx moved T131795: Create a parameterized report template for search team's A/B tests from Stalled/Waiting to Done on the Discovery-Analysis (Current work) board.

Done. https://gerrit.wikimedia.org/r/#/admin/projects/wikimedia/discovery/autoreporter and https://phabricator.wikimedia.org/diffusion/WDAR/
Thank you @mpopov !

Oct 2 2017, 6:47 PM · Discovery-Analysis (Current work), Discovery