Fri, Jan 12
Tue, Jan 9
Wed, Jan 3
Tue, Jan 2
Just tried the second link (API results for "sistema parlamentario con sede de gobierno" on enwiki (including results from eswiki)) and got the following:
Wed, Dec 20
Added Python version into the production instructions for @EBernhardson's convenience :) https://github.com/wikimedia-research/Discovery-Search-Adhoc-RelevanceSurveys/tree/master/production#predicting-rank
Tue, Dec 19
Alrighty, here ya go! It's not as pretty as you were probably expecting!
Mon, Dec 18
Dec 15 2017
Dec 14 2017
Okay, I've got some stuff working right now for fetching this stuff which means it's time to consider aspects we haven't before.
Dec 13 2017
Search query language breakdown note & results at https://github.com/wikimedia-research/SDoC-Initial-Metrics/tree/master/T177358-2
Dec 12 2017
Dec 8 2017
I wonder if it's possible to make a Hive UDF that uses Google's Compact Language Detector v3 (CLD3) library & model https://github.com/google/cld3
Dec 7 2017
Dec 5 2017
Final numbers from English Wikipedia: https://github.com/wikimedia-research/Discovery-Search-Adhoc-SearchResultPageInvestigation#findings
Nov 30 2017
Nov 28 2017
This might seem like a silly question, and I hope it doesn't come off as offensive, but did you re-install all the R packages you were using before? Even though your R package library was copied over from stat1002 and R will technically see those packages as available, they all need to be re-installed because it's a new machine. Copying libraries only works when the OS & configuration are the same. Packages that include C/C++ code especially require re-compilation.
Nov 27 2017
Nov 22 2017
HOLY MOLEY I'M FINALLY DONE: https://wikimedia-research.github.io/Discovery-Search-Adhoc-RelevanceSurveys/
Nov 21 2017
Nov 14 2017
I might be able to work on this in my free time at some point :) seems like a fun project
Nov 13 2017
Nov 10 2017
Nov 9 2017
Nov 8 2017
Nov 7 2017
Nov 2 2017
Dashboards removed and golden is no longer generating daily forecasts. RIP cool idea
Nov 1 2017
- discovery::deep_learner role works (makes TensorFlow, Keras, and Caffe available, including the R wrappers for TF & Keras)
- discovery::learner role works (makes a bunch of ML packages available – in R & Python)
- discovery::forecaster role works (makes a bunch of time series / forecasting R packages available: prophet, bsts, forecast)
- discovery::bayes role mostly works (for some reason it doesn't install rstan, even though discovery::forecaster, which also includes the discovery_computing::bayesian_statistics profile, does)
- discovery::allstar_cruncher role works (multi-purpose, has ALL of the packages from the other roles, so get your game on, go play)
Oct 31 2017
Making the forecasts has become too computationally intensive (it takes a while and hogs resources on the stat box), so I think I need to disable them and remove the dashboard. The source code (for both the dashboard and the forecasting) will stay up and existing forecasts will be preserved, but no new ones will be made.
Oct 24 2017
@Gehel: getting the following errors. test-bayes, test-learner, test-deep-learner, test-forecaster, and test-allstar-cruncher all use Stretch.
Oct 23 2017
Ish? Until this is done, we're limited to using Ubuntu for the VMs that host our dashboards. Since the WM Cloud team (formerly WM Labs) is deprecating Ubuntu Trusty in favor of only offering Debian for VMs, we'll have to file a Phab ticket requesting a Trusty instance if we have to shut one down and launch a replacement. I don't think this task should be declined, but I am gonna adjust the priority to reflect where we are on this.
Oct 17 2017
Oct 13 2017
@chelsyx do you wanna add your stuff to https://github.com/wikimedia-research/SDoC-Initial-Metrics ?
Queries & data uploaded to https://github.com/wikimedia-research/SDoC-Initial-Metrics
Growth of number of deleters over time:
- Historical trends
Oct 12 2017
Oct 11 2017
- Most copyright-related deletions happen within 1 day of upload across almost all media types, with the exception of 'drawing' (SVGs)
- A lot of audio files are deleted within 1 minute or 1 week of upload
- Half of all images and PDFs deleted were deleted within 1 month of upload for non-copyright reasons
Reasons for files deleted in 2017:
Oct 10 2017
Oct 6 2017
@chelsyx: thanks and good job!
Oct 5 2017
Oct 3 2017
It would depend on how often things below the top 20 move into the top 20 in practice, not just in theory. We can use the search logs to find this out, no?
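As a rough illustration of what that check could look like, here is a minimal Python sketch. It assumes (hypothetically) that the search logs can be reduced to, per query, a list of (rank before, rank after) pairs for each result; the record layout and the name `promotion_rate` are made up for this example and are not taken from the actual logs.

```python
def promotion_rate(records, cutoff=20):
    """Fraction of logged queries in which at least one result that started
    below the cutoff (rank > cutoff) ended up within it (rank <= cutoff).

    `records` is a list with one entry per query; each entry is a list of
    (rank_before, rank_after) pairs using 1-based ranks. This layout is an
    assumption made for the sketch.
    """
    if not records:
        return 0.0
    promoted = sum(
        1
        for ranks in records
        if any(before > cutoff >= after for before, after in ranks)
    )
    return promoted / len(records)


# Toy data, purely illustrative (not real search logs).
example = [
    [(25, 3), (1, 1)],   # a result moved from rank 25 to rank 3
    [(1, 2), (2, 1)],    # reshuffle inside the top 20 only
    [(21, 20)],          # moved just inside the cutoff
    [(30, 40)],          # stayed outside
]
rate = promotion_rate(example)
```

If the rate turns out to be tiny in practice, that would suggest the theoretical concern rarely matters; if not, it would be worth looking at which queries are affected.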
Oct 2 2017
Sep 28 2017
Using just the event logging data from 2017-08-01 to today (2017-09-28), here's a glimpse at queries from abandoned full-text searches:
Bootstrapping finally finished -_- second draft up at https://people.wikimedia.org/~bearloga/reports/ltr-test.html
Sep 27 2017
@mforns: we specify the analytics-store hostname in our R package (in the function that makes SQL queries: https://github.com/wikimedia/wikimedia-discovery-wmf/blob/master/R/mysql.R#L39--L76), which is used for querying both the wiki content dbs and the log db. If we add a type argument that sets the hostname ("db1047" in case of type == "events", for example), what hostname should we use for non-eventlogging queries?
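To make the proposed type argument concrete, here is a minimal sketch in Python (the package itself is R, so this is only an illustration of the lookup, not its code). The names `resolve_host` and `HOSTS_BY_TYPE` are hypothetical, and falling back to "analytics-store" for everything else is an assumption, since which hostname non-eventlogging queries should use is exactly the open question.

```python
# Hypothetical sketch: none of these names come from the wmf R package.
DEFAULT_HOST = "analytics-store"  # hostname as referred to in the comment above

HOSTS_BY_TYPE = {
    "events": "db1047",  # the example given for type == "events"
}


def resolve_host(query_type="content", default_host=DEFAULT_HOST):
    """Return the hostname for a query type, falling back to the default
    for any type without an explicit mapping (an assumption of this sketch)."""
    return HOSTS_BY_TYPE.get(query_type, default_host)


events_host = resolve_host("events")
content_host = resolve_host("content")
```

Keeping the mapping in one table would mean a future hostname change only touches one place rather than every query call.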
Deployed at https://discovery.wmflabs.org/forecasts/