Aug 16 2022
Thank you very much! :) I still see 147G. When will I see the new capacity, and what will it be?
The machine is: wcdo02.wcdo.eqiad1.wikimedia.cloud
Aug 5 2022
Aug 4 2022
Hi @fnegri. We managed to fix the issue. Now the new machine runs with an up-to-date version of Debian.
In any case, thank you very much. We're checking that everything is working. If there's anything we need to consult you about, we'll reach you through IRC.
Jul 28 2022
Hi! Sorry, I had not opened Phabricator for months. I did not realize there was this problem, and I never received an e-mail. What should I do? I am freaking out at the possibility of losing all the data/processes on the server.
Oct 20 2021
Document with the different types of visualizations that are currently used to depict the content gaps:
Oct 19 2021
Yes! Good idea. Here they go.
Sep 8 2021
you're welcome! if I can help in any way, let me know! :)
Aug 20 2021
Aug 16 2021
The session went very well. We spent more time on the first part (presentations and Q&A) than we expected, but we had the opportunity to engage with the audience at a Remo table (Building 6) without any time constraint.
Jul 16 2021
Hi @nskaggs. Sorry to ping you again. I was discussing the set-up with the rest of the team, and we'd like to know if you could give us a hand in re-arranging the server capacities. We think it would be better to split it into two servers: front-end and back-end.
Thank you very much for granting us these resources and for your kind words, @nskaggs. We are totally aware of the circumstances, and we hope to get more clarity on the final resources that will be necessary for the project to run and be sustainable. We will get back to you with a longer message about our future/long-term plans so that we can discuss them.
Jul 13 2021
Sorry to insist, but it has been a little more than a month since our first request, and we need an infrastructure to process, store, and visualize this data for our research. Cristian's estimates are the minimum needed to get started on the Wikimedia server.
Apr 16 2021
Yes, I set up Puppet and now it works. Thank you, Brooke!
Apr 15 2021
Hi. I'm sorry, but I've just logged into the server and tried to access /public/dumps/, and I cannot see the dumps. I'd like to access them locally in the same way I can on the wcdo.eqiad.wmflabs server. Would that be possible? Thanks.
Apr 13 2021
Not sure I added the right tags or mentioned who could take this task. Could anyone give us a hand? Thanks.
Apr 7 2021
Apr 6 2021
Hi @Bstorm. Could you please create the same link for the server (chm)? The reason is the same: we need to access /public/dumps directly to avoid downloading the dumps to the server and taking up space.
Thank you very much.
Nov 9 2020
OK. I deleted the task, created it again, and it worked. But there's something wrong with the characters that made that translation get stuck.
Nov 5 2020
I managed to add him (davids). Thanks.
Oct 22 2020
I just tried to do this on:
Oct 12 2020
Ah, I see what you mean. It's good I have this workaround with the
I hope that at some point these truncated values will be fixed in the dump.
But meanwhile I went with errors="ignore" in .decode() and it works!
I think I understand what you mean, but I am not entirely familiar with encoding.
So, if the cl_sortkey field is breaking the conversion to UTF-8, can't I just avoid this field somehow? Actually, I don't need it; I only need cl_from and cl_to.
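For reference, a minimal sketch of the errors="ignore" workaround mentioned above (the sample bytes are made up; only cl_from and cl_to are kept):

```python
# Hypothetical rows with invalid trailing bytes in cl_sortkey, standing in
# for the truncated values found in the dump.
raw_rows = [
    (b"12345", b"Living_people", b"broken\xe2 sortkey"),
]

decoded = []
for cl_from, cl_to, cl_sortkey in raw_rows:
    # errors="ignore" drops undecodable bytes instead of raising
    # UnicodeDecodeError; cl_sortkey is skipped entirely, since only
    # cl_from and cl_to are needed.
    decoded.append((cl_from.decode("utf-8", errors="ignore"),
                    cl_to.decode("utf-8", errors="ignore")))

print(decoded)  # → [('12345', 'Living_people')]
```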
Oct 9 2020
Oct 8 2020
I've noticed that other languages like Russian or Macedonian have the same problem.
Oct 7 2020
Aug 19 2020
By the way, I just found that in the cawiki files for the years 2011 and 2016 (I've just checked the latter) the rows are not sorted. The file starts with timestamps from 2016-01, then follows with 2016-04, and then continues with another from 2016-01. Since this happened with a one-file dump like gnwiki and with a one-file-per-year dump, I guess the bug might also affect one-file-per-month dumps, as is the case for enwiki.
Please, let me know when it is fixed, as I rely on it for creating a dashboard.
Thank you very much.
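A quick sketch of how the out-of-order timestamps described above could be detected programmatically (the sample rows are illustrative):

```python
def first_unsorted_index(timestamps):
    """Return the index of the first timestamp that is earlier than the
    one before it, or None if the column is sorted ascending."""
    for i in range(1, len(timestamps)):
        if timestamps[i] < timestamps[i - 1]:
            return i
    return None

# Illustrative pattern from the cawiki 2016 file: 2016-01, then 2016-04,
# then back to 2016-01.
rows = ["2016-01-03", "2016-04-12", "2016-01-20"]
print(first_unsorted_index(rows))  # → 2
```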
Aug 18 2020
Aug 13 2020
Sorry to bother you, but I am using the dumps and I see the same problem on gnwiki. This is a really small Wikipedia whose dump is all in one file. The rows are not sorted by timestamp, and the last ones are from 2010. Maybe the problem is fixed for other sizes but not for the all-in-one-file case. Would you please take a look at it? Thank you.
Jun 5 2020
I totally understand. I would only ask you to consider releasing files (datasets or dumps of any kind) in addition to the queryable version, because I need to use the data from the 300 languages and would prefer "not querying this much". Perhaps, if the cluster is powerful enough, it could do this once and leave the data there? In any case, it is just a suggestion; you take into account many more factors and the interests of other users, which may or may not coincide with mine.
Jun 4 2020
Oh, I see. But these seem like two different things. The API will increase the use of some data contained in this dataset, but making it queryable does not mean it will become easy to obtain the number of editors. It would take querying the whole dataset to obtain the number of editors for all pages, which could be a bit costly, especially for English Wikipedia, or when doing it repeatedly, as I have to do for all languages.
Jun 2 2020
What Dan is suggesting is the number of editors who have edited an article. That's very useful.
May 20 2020
Oh, wow, it is almost 10x faster! Thanks.
Message from bd808 <firstname.lastname@example.org> on Wed, 20 May 2020 at 1:09:
May 19 2020
Sure! No problem. Please, let me know when it is done. :) Thanks.
May 14 2020
@bd808 thank you for the news. The downtime is not a problem at all. If the migration could be done on Sunday or Monday, that would be great.
Thank you very much for taking care of it and finding a solution.
May 13 2020
Thanks for your answer and for dedicating this time to look at the capacity dashboard.
May 11 2020
I did a couple of tests to understand better how it is working. The test consisted of running an estimation on 1% of the Wikidata dump (900,000 lines), assuming that it has 85,696,352 qitems. (If I remember correctly, last October it was around 60M; the increase is notable, but not comparable to the increase in the time it takes to read the dump.)
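The extrapolation behind that kind of test is simple scaling; a sketch with illustrative numbers (the measured sample time here is an assumption, not a figure from the test above):

```python
# Illustrative: scale the measured time for a ~1% sample (900,000 lines)
# up to the full dump of 85,696,352 qitems.
sample_lines = 900_000
total_qitems = 85_696_352
sample_seconds = 1_800.0  # assumed: measured wall time for the sample

estimated_hours = sample_seconds * (total_qitems / sample_lines) / 3600
print(round(estimated_hours, 1))  # → 47.6
```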
May 5 2020
Thanks for looking at it. I remember it worked very well last July. Even in November, I am not sure it was working as fast as in July.
I'll do one more test and I'll get back to you. Thanks.
May 4 2020
I assumed that Categories and Wikipedia: pages coming from language editions would maintain the ns from their origin wiki. We can close this. Now it is clear. Thanks, Addshore.
Apr 30 2020
The wikidata dump still does not include the namespace tag.
Apr 26 2020
I did create this bug report: https://phabricator.wikimedia.org/T251065
Apr 25 2020
I noticed that reading the dumps at /public/dumps from the wcdo.eqiad.wmflabs server is much slower now than the months after this NFS share was created.
Jul 18 2019
It works! Thanks!
Feb 15 2019
I need all the Wikidata qitems that relate to Wikipedia articles. If I understand it correctly, these are the qitems in namespace 0, although not all qitems in namespace 0 necessarily have sitelinks (they could just be qitems without an article).
Feb 8 2019
The use case is to process the dumps and filter out qitems that do not relate to articles; this is why we filter on NS0. The JSON dump sample says there is an ns field, but in the final dump there is no such field.
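A hedged sketch of that filtering step, assuming the line-per-entity JSON dump layout and using the presence of Wikipedia sitelinks as a proxy while the ns field is absent (the sample entities are made up):

```python
import json

def article_qitems(lines):
    """Yield qitem ids that have at least one Wikipedia sitelink.
    Stand-in filter while the dump lacks the documented ns field."""
    for line in lines:
        line = line.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue  # skip the array brackets wrapping the dump
        entity = json.loads(line)
        sitelinks = entity.get("sitelinks", {})
        # Keep entities linked to at least one "*wiki" (Wikipedia) site.
        if any(site.endswith("wiki") for site in sitelinks):
            yield entity["id"]

# Two hypothetical dump lines: one with an enwiki sitelink, one without.
dump = [
    '{"id": "Q1", "sitelinks": {"enwiki": {"title": "Universe"}}},',
    '{"id": "Q2", "sitelinks": {}},',
]
print(list(article_qitems(dump)))  # → ['Q1']
```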
Jan 11 2019
Current version of the Top CCC articles is at https://wcdo.wmflabs.org/databases/ with the name top_ccc_articles_current.db.
Dec 12 2018
Nope. I haven't for days. Since your last email.
Dec 2 2018
Thanks for the message akosiaris. I'm sorry the posts were heavy. This process runs only once a month.
May 6 2018
The solution was to do the join in application code, even though this meant the script had to run for a longer period of time.
Apr 27 2018
Apr 26 2018
Apr 24 2018
The size/number of parameters depends on the max_allowed_packet value.
I checked with SELECT @@max_allowed_packet; and it is 33554432 (32M).
It can go as high as 1G. The good news is that this buffer is only allocated as needed, so setting it to 1G is fairly harmless. 1G will allow you to insert all those 2 million tuples at once.
a) OK. Bad news.
p.pl_title IN (SELECT page_title FROM u3532__.'+item+'_page_titleswithredirects)
- OK, so it is not possible. I had already assumed that I could not do that join.
I previously selected these two million articles with several criteria. I have them in a Sqlite3 database in the VPS.
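One way to insert the two million tuples without raising max_allowed_packet is to batch the INSERTs. A minimal sketch, using an in-memory SQLite database as a stand-in for the MySQL server (the table name and batch size are illustrative; a MySQL driver would use %s placeholders instead of ?):

```python
import sqlite3
from itertools import islice

def insert_in_batches(conn, rows, batch_size=10_000):
    """Insert rows in fixed-size batches so each INSERT statement stays
    well under the server's packet limit (batch size is a guess)."""
    cur = conn.cursor()
    it = iter(rows)
    total = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        cur.executemany("INSERT INTO page_titles (title) VALUES (?)", batch)
        total += len(batch)
    conn.commit()
    return total

# Demo with a generator of 25,000 hypothetical titles.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_titles (title TEXT)")
n = insert_in_batches(conn, ((f"Article_{i}",) for i in range(25_000)))
print(n)  # → 25000
```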
Apr 23 2018
Apr 6 2018
Mar 17 2018
I had no problem accessing tools-login.wmflabs.org before. I created the 2fa_reset.txt in the marcmiquel account on Toolforge, not in the wcdo tool account.
Mar 16 2018
Mar 11 2018
OK. Sorry if this is getting tedious, but let me reframe it again and explain it a bit better. I'll describe the current situation along with some new experiments with MySQL in Toolforge.
Mar 7 2018
Mar 6 2018
Aug 2 2017
Nope, closed. Thank you
Mar 28 2017
You can delete the databases. I am sorry for not replying, I did not see the e-mail. Thanks.
Feb 6 2017
Aug 9 2016
I just fixed the problem, and now I'm going to code a workaround so as not to saturate the server. Thanks.
Jun 5 2016
Hi chasemp, I cleaned up 30 GB more. In a few weeks I'll clean up more. Thanks! :)
Apr 22 2016
Done! I cleaned up more than 50 GB. I hope it's enough. However, I will clean up more in the following weeks. Cheers
Apr 21 2016
Thanks for the message. I created new files a few days ago, in particular a very big one from the English Wikipedia. I will free the space as soon as I can. I hope I can be done by next week :)
Apr 14 2016
I requested a slot for the showcase, as @Halfak said. However, I am super busy in these last months of my thesis, and I am not able to prepare it properly. So I'd prefer to wait for another occasion to present my research to you, when my material is a bit more mature :) Thank you very much.
Mar 15 2016
@DarTar no problem. I am quite busy, so waiting until April or May won't be bad. I will let you know! Thanks! :)
Feb 24 2016
My apologies, I ran a script on the bastion when I should have sent it to the grid instead. They pointed this out to me in the IRC channel, and now it is clear.
Feb 21 2016
Aug 11 2015
Excellent! I see some differences and more information in it. Great. Thank you!
Jul 16 2015
Thanks for checking, MZMcBride!
SELECT "eswiki", user_editcount FROM eswiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "enwiki", user_editcount FROM enwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "frwiki", user_editcount FROM frwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "itwiki", user_editcount FROM itwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "plwiki", user_editcount FROM plwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "nlwiki", user_editcount FROM nlwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "dewiki", user_editcount FROM dewiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "ruwiki", user_editcount FROM ruwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "ptwiki", user_editcount FROM ptwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "euwiki", user_editcount FROM euwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "zhwiki", user_editcount FROM zhwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "glwiki", user_editcount FROM glwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "ocwiki", user_editcount FROM ocwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "anwiki", user_editcount FROM anwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "huwiki", user_editcount FROM huwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "ukwiki", user_editcount FROM ukwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "warwiki", user_editcount FROM warwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "arwiki", user_editcount FROM arwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "viwiki", user_editcount FROM viwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "svwiki", user_editcount FROM svwiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "rowiki", user_editcount FROM rowiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "mswiki", user_editcount FROM mswiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "fawiki", user_editcount FROM fawiki_p.user WHERE user_name LIKE %s
UNION ALL SELECT "cawiki", user_editcount FROM cawiki_p.user WHERE user_name LIKE %s
ORDER BY user_editcount DESC
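Rather than hand-writing one SELECT per wiki, the repeated UNION ALL blocks could be generated. A small sketch (the wiki list is shortened for illustration):

```python
# Build the cross-wiki editcount query from a list of database prefixes.
wikis = ["eswiki", "enwiki", "frwiki"]  # extend with the full list as needed

parts = [
    f'SELECT "{w}", user_editcount FROM {w}_p.user WHERE user_name LIKE %s'
    for w in wikis
]
query = "\nUNION ALL\n".join(parts) + "\nORDER BY user_editcount DESC"
print(query)
```

The %s placeholders remain, so the same user_name parameter is still bound once per wiki when the query is executed.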
Jul 15 2015
Jul 13 2015
Apparently all issues are solved. I'm a bit surprised by all these changes; I haven't changed a single line of code.
Now it does work. Thanks!
Jul 11 2015
A while ago I couldn't connect...