User Details
- User Since
- Aug 9 2016, 1:58 PM (509 w, 1 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- Muwnd [ Global Accounts ]
Mar 21 2020
Jun 6 2017
@Capt_Swing I'm very interested in talking to you about it, but I guess this ticket is the wrong place for it. I already sent you an email a few weeks ago: ms (a) mieo.de.
May 26 2017
May 12 2017
@Capt_Swing Will you continue the research on this topic? I'm asking because I plan to do a similar evaluation for a link-based recommendation algorithm, but instead of a user study I want to use the Android app to conduct an online evaluation. (See: T142477)
Apr 20 2017
Apr 13 2017
Mar 29 2017
CirrusSearch API is used. See T143197
Mar 27 2017
@bd808 Thanks. I added the file:
Mar 11 2017
@Aklapper Is there any way to proceed with this?
Mar 3 2017
I mixed something up. The phab account is linked to my MediaWiki profile. But yes, the wikitech user page refers to a ticket of this phab account. Sorry for the confusion.
Mar 1 2017
Feb 4 2017
@dcausse Thank you very much! In MediaWiki, everything seems to work correctly. However, it does not work in the Android app: I cannot use citolytics-en.wmflabs.org as the mediaWikiBaseUri / API endpoint. I keep getting these error messages when opening a Wiki article from within the Android app:
Jan 29 2017
Now you should have SSH access to hadoop000.math.eqiad.wmflabs. The ES dumps are located in /srv/wikisim/data/results/:
I'll prepare the ES bulk dumps for enwiki, simplewiki and ndswiki and upload them to hadoop000.math.eqiad:/srv/wikisim/data/results/.
Jan 27 2017
- I uploaded a result file to one of our lab instances (hadoop000.math.eqiad). What would be the best approach so you can access it for testing?
- For the Oozie workflow integration, I already prepared a PySpark script that reads the data from HDFS and sends updates in bulk to ES ( https://gerrit.wikimedia.org/r/#/c/334130/4/oozie/citolytics/transferCitolyticsToES.py - it mainly reuses the code from the popularity_score script). If this script is not suitable for testing, I can also prepare the data in the Elasticsearch bulk format.
- Regarding languages, it depends on what would be simplest for testing. I can generate recommendations for a single language, but also for several or all languages that are available as XML dumps.
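For reference, the Elasticsearch bulk format mentioned above is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. The sketch below is a minimal illustration, assuming recommendations are keyed by page ID and stored as a list of titles; the index name, the `citolytics` field name and the optional mapping type are hypothetical, not taken from the actual script.

```python
import json
from typing import Dict, List, Optional


def to_bulk_lines(
    recommendations: Dict[int, List[str]],
    index: str,
    field: str = "citolytics",  # hypothetical field name
    doc_type: Optional[str] = None,
) -> str:
    """Serialize per-page recommendations as an Elasticsearch bulk body.

    Each page yields two NDJSON lines: an "update" action and a partial
    document that sets only `field`, leaving the other fields untouched.
    """
    lines = []
    for page_id, titles in recommendations.items():
        action = {"_index": index, "_id": str(page_id)}
        if doc_type is not None:
            # Older Elasticsearch versions require a mapping type per action.
            action["_type"] = doc_type
        lines.append(json.dumps({"update": action}))
        lines.append(json.dumps({"doc": {field: titles}}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"
```

Using partial `update` actions rather than `index` actions means existing page documents keep their other fields; whether the real dumps use that approach is not stated here.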
Jan 26 2017
@dcausse What needs to be done after the code review? Or what are the next steps to get the code deployed?
Jan 25 2017
@Physikerwelt The uncompressed results for enwiki are around 50 GB (~10 GB compressed); other languages will be smaller. So they won't fit on GitHub (max. 2 GB per file) or OneDrive. For now, I'll start uploading them to a lab instance.
Hi. I'm having the same problem when using Puppet to install role::elasticsearch::cirrus on a labs instance (hadoop000.math.eqiad.wmflabs). What should I git clone? Or is there any other workaround?
Jan 24 2017
Where can I upload the data? The data from all requested wikis won't fit on our labs instances.
Jan 12 2017
Jan 5 2017
Jan 4 2017
Dec 30 2016
Thanks! Can you recommend any Oozie starting point? Is there already a workflow that uses Wikipedia XML dumps? Or one that writes to ES?
Dec 20 2016
Can you provide the database schema where the data is stored? Then, I can create a query for the aggregation.
Thanks for setting it up.
Dec 19 2016
Dec 17 2016
@Physikerwelt @Ottomata I was able to run Flink jobs on YARN (see https://wikitech.wikimedia.org/wiki/Flink ). However, I could not enable Oozie / Hive using these instructions:
Dec 13 2016
Without having HDFS mounted Oozie fails, because it cannot access HDFS:
Dec 12 2016
Same problem when only enabling the hadoop role :/
Vagrant seems to call the lxc-attach help function:
Due to the error with --provision the Hadoop ports weren't set up correctly:
Not rebooting is not really a suitable solution when using the VM for development, since I also need to enable other roles or change port-forwarding.
Dec 1 2016
mschwarzer@mlp:/srv/mediawiki-vagrant$ vagrant --version
Vagrant 1.7.4
Nov 29 2016
Nov 21 2016
@Physikerwelt The data is currently not available, but I already requested its release (see https://phabricator.wikimedia.org/T125393 ). As soon as the data becomes public, I'll do the analysis.
Nov 15 2016
@EBernhardson @mpopov Any news on releasing the data?
Nov 3 2016
@Deskana Most of the work regarding CirrusSearch and Android is already done. Thus, I think we can keep your team's workload to a minimum. In other words, I would be happy to do as much of the work as possible.
Nov 2 2016
As @leila said in T143197#2752764, this experiment requires someone at the WMF to "own" it. To make that happen, I would like to know how we can support this or whom to contact.
Nov 1 2016
@Nuria Thanks for the clarification. I'll review the project and update the corresponding tickets.
Oct 28 2016
@EBernhardson Thanks for pointing me to the spreadsheet. It would be really great if you could make the (anonymized) raw data available so that we can prepare our study.
@leila Thanks for your questions!
Oct 27 2016
Is the outcome (raw data/evaluation) of the A/B test still available? We would like to use it as reference for our Citolytics A/B test.
Oct 26 2016
@Physikerwelt do you have access to WMF resources where we can store the recommendations?
Oct 25 2016
@dcausse I currently do not have access to the analytics cluster. Is it possible to upload it somewhere else?
@Dbrant Yes, that's correct!
@Physikerwelt The JSON output for the top-10 recommendations (including scores) is around 2 GB in size (1.3 GB without scores).
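To make the "top-10 with or without scores" distinction concrete, here is a minimal sketch, assuming each article's candidate scores are available as a title-to-score mapping. The one-JSON-object-per-article layout with `title` and `recommendations` keys is hypothetical, not the actual Citolytics output schema.

```python
import heapq
import json
from typing import Dict, List, Tuple


def top_k(scores: Dict[str, float], k: int = 10) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (title, score) pairs, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


def to_json_record(title: str, scores: Dict[str, float], k: int = 10,
                   with_scores: bool = True) -> str:
    """Serialize one article's top-k recommendations as a JSON line."""
    best = top_k(scores, k)
    if with_scores:
        recs = [{"title": t, "score": s} for t, s in best]
    else:
        # Omitting scores (and the per-entry keys) shrinks every record.
        recs = [t for t, _ in best]
    return json.dumps({"title": title, "recommendations": recs})
```

`heapq.nlargest` avoids sorting the full candidate list per article, which matters when an article has many co-cited candidates but only ten are kept.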
@dcausse The transferToES.py script should work for writing the JSON data from HDFS to ES. But what, in general, would be the best approach for getting the Citolytics recommendations into the CirrusSearch ES instance?
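Whatever transfer route is chosen, a multi-gigabyte dump is usually sent as many moderately sized bulk requests rather than one. The helper below is a minimal sketch of that batching step; the batch size is an assumption to be tuned, and this is not necessarily how transferToES.py or the popularity_score script split their requests.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` records.

    Each list can be serialized into one bulk request, keeping every
    HTTP payload well below the cluster's maximum request size.
    """
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Because it consumes an iterator lazily, the helper works on a stream of records read from HDFS without loading the whole dump into memory.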
Oct 22 2016
@Dbrant How did you evaluate your morelike A/B testing? ( https://phabricator.wikimedia.org/T125393 ) Is it possible to re-use your system for Citolytics?
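One common way to compare two recommendation variants online is click-through rate per test bucket. The sketch below assumes hypothetical `(bucket, action)` event pairs with actions `"impression"` and `"click"`; it is not the event schema or metric actually used in the morelike A/B test.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def click_through_rates(events: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Compute clicks / impressions per test bucket.

    Each event is a (bucket, action) pair, where action is either
    "impression" (recommendations were shown) or "click".
    """
    impressions: Counter = Counter()
    clicks: Counter = Counter()
    for bucket, action in events:
        if action == "impression":
            impressions[bucket] += 1
        elif action == "click":
            clicks[bucket] += 1
    # Buckets without impressions are skipped to avoid division by zero.
    return {b: clicks[b] / n for b, n in impressions.items() if n > 0}
```

A real comparison would also need a significance test on the two rates; this only computes the raw ratios.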
Oct 21 2016
Sep 27 2016
@dcausse, can you point me to the CirrusSearch process you mentioned for writing data from Hadoop/HDFS to Elasticsearch? I could only find a class for writing to Elasticsearch without HDFS.
Aug 18 2016
Thanks! This way sounds more feasible. I'll add an extra query prefix (citolytics:) to CirrusSearch.
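The idea of a dedicated query prefix can be illustrated by how a search string is split into a keyword and its argument. CirrusSearch itself is written in PHP, so the Python below only sketches the dispatch logic; the `PREFIXES` tuple and `split_prefix` function are hypothetical, with `morelike:` being the existing keyword and `citolytics:` the proposed one.

```python
from typing import Optional, Tuple

# Keyword prefixes recognized by this sketch: "morelike:" already exists in
# CirrusSearch, "citolytics:" is the proposed addition.
PREFIXES = ("morelike:", "citolytics:")


def split_prefix(query: str) -> Tuple[Optional[str], str]:
    """Split a search string into (keyword, argument).

    Returns (None, query) when no known prefix is present.
    """
    stripped = query.strip()
    for prefix in PREFIXES:
        if stripped.startswith(prefix):
            return prefix[:-1], stripped[len(prefix):].strip()
    return None, stripped
```

For example, `split_prefix("citolytics:Albert Einstein")` yields `("citolytics", "Albert Einstein")`, which would then be routed to the precomputed recommendations instead of a full-text search.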
Aug 17 2016
Aug 16 2016
The Wikipedia API with the CirrusSearch extension is used:
Aug 15 2016
Clicks on article recommendations are tracked via the Analytics funnel API:
Aug 13 2016
The current automatic recommendation mechanism uses the MediaWikiApi ("morelike:"-query).
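A "morelike:" lookup can be issued through the standard MediaWiki search API. The sketch below builds such a request URL using the documented `action=query`, `list=search`, `srsearch` and `srlimit` parameters; the exact parameters the Android app sends are not shown here, so treat this as an illustration rather than the app's actual request.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def morelike_url(title: str, limit: int = 3) -> str:
    """Build a MediaWiki API search request for articles similar to `title`."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"morelike:{title}",  # CirrusSearch similarity keyword
        "srlimit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"
```

Swapping the keyword in `srsearch` is all that would change if a different recommendation source were exposed through the same search API.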
Aug 12 2016
@Physikerwelt can you tell me where I can find the respective source code?