Mon, Jul 9
Regarding the visualization problems (autocomplete, but especially the inline plots), here is a possible approach to explore:
After restarting the notebook, I now get this error. I get the same error if I use pyspark from stat1005. The solution I found is to work with python2.7 in pyspark.
Sat, Jul 7
I'm OK with changing to another language. I don't have a strong preference between Korean and Vietnamese; I think we should select the one with the higher probability of getting good/fast translations.
Thu, Jul 5
@Trizek-WMF , thanks for your efforts.
Can we push more for translators from/to Japanese? This is the main missing piece.
Mon, Jul 2
Mon, Jun 18
Jun 13 2018
@Trizek-WMF, I agree with you that the instructions could be improved. I had to struggle a bit myself to translate them to Spanish. But we have already done many iterations to reach this quality of instructions, and although they are far from perfect, I don't think that we should be blocked on this. So, please, let's just use these instructions.
Jun 12 2018
@Trizek-WMF : I have completed the translation to Spanish.
Jun 11 2018
@Trizek-WMF : you mean a subpage in Meta? We can create a new one under this one:
Jun 1 2018
Thanks for these stats @bmansurov
May 24 2018
@Trizek-WMF : I've updated the list of candidates.
For candidates coming from the Babel Template, I'm filtering out the ones with fewer than 11 revisions in each Wiki.
For the candidates coming from the CX, I'm filtering out the ones with fewer than 11 translations. I'm also providing the number of translations that they have already done.
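The two filters described above can be sketched roughly as follows. This is only an illustration, not the script actually used: the data layout, the usernames and the helper names (`keep_babel_candidate`, `keep_cx_candidate`) are all hypothetical, and "in each Wiki" is read here as "every wiki listed for the user must reach the threshold".

```python
MIN_COUNT = 11  # threshold from the comment: fewer than 11 is filtered out


def keep_babel_candidate(revisions_per_wiki):
    """Keep a Babel candidate only if every listed wiki has >= MIN_COUNT revisions.

    Assumption: a candidate with no wikis listed is dropped.
    """
    return bool(revisions_per_wiki) and all(
        count >= MIN_COUNT for count in revisions_per_wiki.values()
    )


def keep_cx_candidate(translation_count):
    """Keep a CX candidate only if they have done >= MIN_COUNT translations."""
    return translation_count >= MIN_COUNT


# Hypothetical sample data, invented for this sketch.
babel = {
    "UserA": {"enwiki": 250, "eswiki": 40},
    "UserB": {"enwiki": 500, "frwiki": 3},
}
cx = {"UserC": 25, "UserD": 10}

babel_kept = sorted(user for user, revs in babel.items() if keep_babel_candidate(revs))
# Keep the translation count next to each CX candidate, as the comment mentions.
cx_kept = {user: n for user, n in cx.items() if keep_cx_candidate(n)}
```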
May 23 2018
Hi @Trizek-WMF! Thanks for all the work you have done in so little time.
May 19 2018
I have no experience with A/B testing. I could help later with some data processing.
May 16 2018
May 14 2018
We are considering collaborating on this with EPFL. To proceed, we need to:
May 3 2018
May 2 2018
@bmansurov , ok let's try to solve that.
Here are the remaining languages:
May 1 2018
Any reason for doing this manually? Is there no way to automate this process?
en: Help request for mapping section titles
es: Solicitud de ayuda para mapear títulos de secciones
fr: Aidez-nous à associer des titres de section
ru: Просьба помочь с переводом заголовков разделов
Apr 23 2018
Please find the list of people to be contacted here: https://docs.google.com/spreadsheets/d/1vmTvSFitmsbpFKagLVxR2c2VcY8mBIVRdd_KUa_cIc4/edit?usp=sharing
Apr 17 2018
Thanks @bmansurov. Let's wait until tomorrow and see how it works. Then, I'll give you 30 more usernames.
@bmansurov, we need to change the spreadsheet permissions so that non-logged-in people can edit the document. Do you know how to do it?
Apr 10 2018
- @bmansurov you can find the list of users speaking English and another language here: https://docs.google.com/spreadsheets/d/1Xzy9yd3lC8yMJm8fGGGe2vGCeMmLwN700iRROHi0jCs/edit?usp=sharing, we can already contact them.
Mar 27 2018
We (@leila and I) have updated the labels; we will now use: same, overlap and different. We have translated these into Spanish and requested help from staff and the community to translate the labels into the other 4 languages.
We have also added 3 columns for collecting the different assessments in case reviewers disagree.
Mar 22 2018
@leila , done
Mar 20 2018
Please find the data to be labeled here: https://drive.google.com/drive/folders/1pzR3P16ck7FyrE7QgIpcSx1TPumTGA9u?usp=sharing
Mar 19 2018
Mar 13 2018
@JAllemandou : just for the record, in this case I meant the parquet partitions. See you in IRC
@JAllemandou : My understanding is that if you partition by a unique id, you sort by that key, and then all the contiguous ids end up in the same partition, as explained here: https://hackernoon.com/managing-en,spark-partitions-with-coalesce-and-repartition-4050c57ad5c4
@JAllemandou : as we discussed on IRC, could you please add the timestamp for each revision?
Also, it would be good to have the data partitioned by revision_id, because this would make future joins to get additional information (e.g. user) easier.
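To illustrate the idea of "sort by the key, then contiguous ids share a partition", here is a toy sketch in plain Python. It is not how Spark or parquet implement partitioning; the function name `range_partition` and the sample ids are made up for the example.

```python
def range_partition(ids, num_partitions):
    """Sort the ids and split them into contiguous, roughly equal chunks."""
    ordered = sorted(ids)
    size, extra = divmod(len(ordered), num_partitions)
    partitions = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional element each.
        end = start + size + (1 if i < extra else 0)
        partitions.append(ordered[start:end])
        start = end
    return partitions


# Hypothetical revision ids, deliberately out of order.
revision_ids = [107, 101, 105, 103, 102, 108, 104, 106]
parts = range_partition(revision_ids, 3)
for index, chunk in enumerate(parts):
    print(index, chunk)
```

With this layout, a lookup or join on a given revision_id only needs to touch the one partition whose id range contains it.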
now 66G /home/dsaez/
Mar 12 2018
@Ottomata and @JAllemandou I found a workaround by creating a python2.7 virtualenv on stat1005.
I think that is the easiest solution right now. Updating python3 on the workers might be a good idea for the future :)
You can find the candidates for synonyms here:
Mar 11 2018
@JAllemandou : just came back to this. The parquet version is amazing!! Thank you very much!
Feb 27 2018
Feb 26 2018
@DarTar: Do you have any preference for the format of this dataset? I can think of two ways to present it:
Feb 15 2018
to make stat1004 we need to solve this: https://phabricator.wikimedia.org/T187178
Feb 14 2018
Feb 13 2018
Feb 12 2018
@bmansurov please find the candidates here: @stat1005:/home/dsaez/code/alignment/resultsMapping
Feb 10 2018
Feb 9 2018
@bmansurov could you try to upload the 20180201 dumps for en,ru,ar,jp,fr,es in parquet_
This is not urgent but might be useful for the section recommendations project.
Feb 8 2018
quick comment: from these results http://gapfinder.wmflabs.org/fr.wikipedia.org/v1/section/article/Barack_Obama
Feb 7 2018
Feb 5 2018
Jan 18 2018
X is the number of people who speak 'en' and 'uz', Y is the number of people who speak 'en' and 'fr', etc.
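A minimal sketch of how such per-pair counts could be computed, assuming a mapping from username to the list of languages each user speaks. The mapping, the usernames and the function name `count_english_pairs` are hypothetical and only serve the illustration.

```python
from collections import Counter


def count_english_pairs(user_languages):
    """Count, for each other language, how many users speak both 'en' and it."""
    counts = Counter()
    for languages in user_languages.values():
        if "en" not in languages:
            continue
        # Use a set so a language listed twice is only counted once per user.
        for lang in set(languages) - {"en"}:
            counts[("en", lang)] += 1
    return counts


# Hypothetical sample data, invented for this sketch.
users = {
    "UserA": ["en", "uz"],
    "UserB": ["en", "fr", "uz"],
    "UserC": ["fr", "es"],
}
pair_counts = count_english_pairs(users)
```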
this is great @bmansurov !
Jan 3 2018
The results are interesting for understanding the topic span of edit wars. Cross-topic edit wars are rare, and usually associated with very active users.
The viability of applying this approach for detecting harassment or more specifically wikihounding requires deeper analysis.
Jan 2 2018
Dec 6 2017
Nov 14 2017
Nov 8 2017
Sep 27 2017
Installing sshfs would also be a good solution for this and for https://phabricator.wikimedia.org/T176093
Sep 17 2017
@Aklapper, sorry, it is analytics. Tagged.
Sep 16 2017
Sep 6 2017
Sep 5 2017
Sep 1 2017
Aug 18 2017
Aug 17 2017
new production key:
my ssh config
Aug 9 2017
@RobH: I would prefer to have just one account, with 'diego' as username. I can delete the personal one.
just FYI my username in wikitech is diego, but my 'Instance shell account name:' is dsaez (diego was not available)