Sat, Nov 14
This last remark reminds me of something I wanted to add: pages are numbered in the order in which they were created (except for the first few thousand, as a restart lost track of the original ordering). Since file history5 ends with pages created in February 2014, the more recent opinions on when a page is a 'Beginnetje' are likely to be found in the newer pages, so focusing on history6 might be wise. There will also be removals of 'Beginnetje' from older pages, contained in history1..5, but those are probably fewer pages. RevisionId 40371396 is an edit of 14 February 2014, 45656960 was made on 1 January 2016 and 50643818 on 1 January 2018.
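For anyone scripting against this, the revision-id boundaries quoted above can be turned into a rough date bucket. A minimal sketch — the boundary ids are the ones from this comment; the function name and bucket labels are illustrative, not from the actual script:

```python
# Rough mapping from revision id to time period, using the boundary
# revision ids quoted above (14 Feb 2014, 1 Jan 2016, 1 Jan 2018).
def revision_period(rev_id: int) -> str:
    if rev_id < 40371396:   # before the 14 February 2014 edit
        return "pre-2014-02"
    if rev_id < 45656960:   # before the 1 January 2016 edit
        return "2014-02..2015"
    if rev_id < 50643818:   # before the 1 January 2018 edit
        return "2016..2017"
    return "2018-or-later"

print(revision_period(41000000))  # → 2014-02..2015
```

This would let a pass over history1..5 skip straight to the revisions recent enough to matter for the 'Beginnetje' question.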
Thu, Nov 12
The results from files 4 and 6 have been added. These gave .json files of 1.9 and 2.3 MB. The results from history5 are a bit too big to add: a .json file of 27.4 MB is hard to add to GitHub.
History5 contains pages added between December 2010 and February 2014.
Wed, Nov 11
Files history2 and history3 have been processed. See
- https://gist.github.com/Ronnie-V/5abd8aa3dd4518e580b652a178495965#file-nlwiki-20201101-pages-meta-history2-json (>380 kB)
- https://gist.github.com/Ronnie-V/5abd8aa3dd4518e580b652a178495965#file-nlwiki-20201101-pages-meta-history3-json (± 1 MB)
The other three files will be done tomorrow.
@Chtnnh Fine that you are running it from a utility script. The __name__ == "__main__" check won't be an issue then, but it won't hurt either, and it makes it possible to run the script without the utility.
Your source file seems to be somewhere on a central computer. That's fine; I'm running the script on my home computer. In the public dumps there are six files for groups of pages, with a combined (bz2-compressed) size of approx. 36 GB. Can you confirm that size for your /mnt/data/xmldatadumps/public/nlwikimedia/latest/nlwikimedia-latest-pages-meta-history.xml.bz2 file, so we can be sure we are using the same source? pages-meta-history.xml sounds good to me.
Sat, Nov 7
Running the script takes (or at least seems to take) a lot of time. I ran it against the nlwiki-20201101-pages-meta-history2.xml-p134539p484052 file for page numbers up to 136000, and got a result in JSON. I added it to the gist mentioned in the previous message, so we can see what result can be obtained.
@Chtnnh, will this be enough for you to run it, see what the outcome is and do something with that?
Fri, Nov 6
My version, to run the first file, is at https://gist.github.com/Ronnie-V/5abd8aa3dd4518e580b652a178495965
Unfortunately, it will only give a result after processing the whole file, 150 GB of (unzipped) data.
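One way around waiting for the whole 150 GB pass would be to flush a result per page instead of dumping a single JSON object at the end. A minimal sketch of that idea — the function name, output path and record shape are assumptions, not the gist's actual code:

```python
import json

def write_results_incrementally(pages, out_path):
    """Write one JSON object per line as each page is processed,
    so partial results survive an interrupted or still-running pass."""
    with open(out_path, "w", encoding="utf-8") as out:
        for page in pages:            # 'pages' is any iterable of dicts
            out.write(json.dumps(page) + "\n")
            out.flush()               # make partial output visible immediately

# Illustrative call with a dummy record:
write_results_incrementally([{"id": 1, "title": "Nelson Mandela"}], "demo.jsonl")
```

The resulting .jsonl file can be inspected while the run is still going, which also makes it easier to sanity-check the output early.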
The attribute errors seem to be thrown for hidden versions, like https://nl.wikipedia.org/w/index.php?title=Nelson_Mandela&oldid=39191825. That seems like a good reason to exclude the page from the zipped version.
I have been looking at this script. It needs two more lines at the end:
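The code that followed this comment appears to be lost. Based on the later remark about __name__ == "__main__", the two missing lines were presumably the standard entry-point guard — this is an assumption, and the main name is illustrative of whatever top-level function the script defines:

```python
def main():
    # Stand-in for the script's actual top-level work.
    return "done"

# The two lines the script needs at the end: run main() only when the
# file is executed directly, not when it is imported as a module.
if __name__ == "__main__":
    print(main())
```

This guard is also what makes the later "run it from a utility script" setup harmless: importing the module does nothing, running it directly does the work.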
Aug 10 2020
Good to hear you are going to change it.
And very good to hear that input from the past is not lost, but simply not adequately transferred to Commons. For those interested, it will be possible to restore the missing data manually, and it might even be possible to look at re-exporting them.
Aug 6 2020
Maybe not everyone is aware of this pipe trick; maybe everyone else is even more into previewing the result of their edit than I am. In the Dutch community, there are six usernames with a ( in them. Four of them have a space before it, two do not. Of those four, three are ' (WMF)' accounts.
May 27 2020
I bet there is a big gap between 2 and 3, and most articles will be in there. But if ORES could help identify articles which belong in one of these four categories, I'd be happy if the remainder is in that gap. ORES could then, later on, help categorise the articles from the in-between group and might identify candidates for the four categories.
Hey folks, Thanks for the nice meeting and your information.
The template name is 'Etalage'; 42311910 is the revision number that got approved, and then follow the year, month and day of the decision to recognise the article as 'Etalage'.
The template name is 'Beginnetje'. It is followed by a category (one out of a fixed list of 46), and then follow the year, month and day of the decision.
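Both template descriptions above can be matched with one pattern. A sketch, assuming the templates appear in wikitext roughly as {{Etalage|&lt;revid&gt;|&lt;yyyy&gt;|&lt;mm&gt;|&lt;dd&gt;}} and {{Beginnetje|&lt;category&gt;|&lt;yyyy&gt;|&lt;mm&gt;|&lt;dd&gt;}} — the exact parameter syntax and whitespace on nlwiki may differ:

```python
import re

# Assumed wikitext shape; the real templates on nlwiki may use extra
# whitespace or named parameters, in which case the regex needs widening.
TEMPLATE_RE = re.compile(
    r"\{\{(Etalage|Beginnetje)\|([^|}]+)\|(\d{4})\|(\d{1,2})\|(\d{1,2})\}\}"
)

def find_status_templates(wikitext):
    """Return (template name, first argument, ISO date) tuples."""
    return [
        (name, arg, f"{y}-{int(m):02d}-{int(d):02d}")
        for name, arg, y, m, d in TEMPLATE_RE.findall(wikitext)
    ]

print(find_status_templates("{{Beginnetje|biologie|2014|2|14}}"))
# → [('Beginnetje', 'biologie', '2014-02-14')]
```

For 'Etalage', the first argument would come out as the approved revision number instead of a category.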
May 13 2020
Dec 4 2019
Images are downloaded.
Now looking for a convenient way to upload all of these.
Nov 27 2019
Nov 23 2019
Thanks to Anton for realising this.
The infobox has been created and is used on some pages about plantations.
Anton is working on it at Wiki Techstorm 2019.
Nov 20 2019
Nov 13 2019
Nov 6 2019
@Ecritures, the file mentioned above, which you call 'metadata' (it's just the data belonging to each specific picture), only contains 200 pictures, not the 10,000+ you said it would contain.
I can make four sets of 50 items each, but that is it. Please give me a file containing all 10k+ records.
Nov 5 2019
By the way, these images are from the 1980s and 1990s. Are they freely available? I have my doubts about that.
There is now a small set of 200 records at https://maior.memorix.nl/api/oai/raa/key/Elsinga/?verb=ListRecords&metadataPrefix=ese . I can cut that into four parts, but it comes nowhere near the 10,000 images.
Uploading the whole lot is not a task for this moment anyway, so it shouldn't be on my plate.
As I understood it, Ecritures would take care of the downloading. Nice that Ecritures suddenly assigned this to me, but that is not how it works.
Nov 4 2019
What are you doing?
Oct 29 2019
How is this progressing?
Oct 25 2019
Do you expect collecting this information to be the first task during the workshop, or will you obtain it beforehand, preferably via the project, so that (small) batches can be made for use during the workshops and (larger) batches for later use?
Oct 23 2019
Oct 21 2019
I will divide it into batches, smaller ones and bigger ones, once I have received the results of T236032.
Sep 6 2019
I hope it will help the community with rethinking the blocking policy. The large number of indefinite blocks for anons and ranges should be a reason to rethink this, especially with [https://wikimania.wikimedia.org/wiki/2019:Research/Despite_the_ban:_doing_good_work_anonymously_on_Wikipedia this in mind]. The quality of edits by users of tor is about the same as the quality of all anonymous contributions.
Aug 21 2019
Thank you both. Where can I find this translatewiki, so I can put in a language reference instead of the (hardcoded?) 'Engels'?
And which placeholder is for the introductory text ('You are setting label, description and aliases in <Language> for Schema <Entitynumber>')? That one is in English and should be translated (made available in Dutch too).
Hi @Lydia_Pintscher , thanks for testing. I retested it. The problem remains the same, when my language preferences are set to Dutch (nl). [https://www.wikidata.org/wiki/Special:SetEntitySchemaLabelDescriptionAliases/E105/de] then gives me 'You are setting label, description and aliases in Duits for Schema E105.' on top of the page, but 'Het label van het schema in het Engels' in grey in the box for the label.
The first sentence should be in Dutch, the second one should mention 'Duits' (instead of 'Engels').
When I switch to English or German as language, it seems to be working fine.
Aug 16 2019
Aug 15 2019
Any clues where these texts are coming from would be appreciated. I might want to dive into it in order to get this fixed, but I got no clue where to start looking.
May 30 2019
@SIryn, @DanielleJWiki, @Ecritures: If you would like to add more languages, the current language templates are on Google Drive. Translations are welcome.
Please note that there are differences between the languages. Maybe we should move towards a more uniform text, for instance pointing to both the institution's own URL and the Commons page for the institution (which does not always exist).
@SIryn, @DanielleJWiki, @Ecritures : Most of the work is done, see [https://commons.wikimedia.org/wiki/Special:Contributions/RonnieBot] and [https://commons.wikimedia.org/wiki/Template:RonnieVKoninklijke_Bibliotheek].
May 28 2019
May 27 2019
I've just been looking at this task and got some questions/remarks (sorry, in Dutch):
May 23 2019
It seems the first batch has been imported, but not the rest of 'monumenten.xls'.
Not sure about the status of other files.
The script for creating pages on the Dutch Wikipedia based on the information in 'Kopie van Erfgoed van Strijd Bevrijding Verzet.xls' was kind of working at the end of WTS2018. Might have another look at it, so we can make it work before WTS2019 and have all these articles created.
May 18 2019
Oct 28 2018
Good to see that you found a solution for the strange addition which changes integers to floats. New imports will be fine, so it's just a one-time clean-up for 100 records. Manual handling is a quick solution; writing a script might be nicer and help in other cases.
The spreadsheet which Slryn made available also has a description of each monument. If this information is available under the right licence, we could use the spreadsheet to create pages for monuments and use the infobox and these descriptions to start articles at a reasonable level. A script that uses a CSV to create and fill these pages was already made yesterday. I'll upload it to GitHub.
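The CSV-to-page idea can be sketched as follows — the column names ('naam', 'plaats', 'beschrijving') and the infobox fields are made up for illustration; the real spreadsheet columns and the actual script from yesterday will differ:

```python
import csv
import io

def row_to_wikitext(row):
    """Build a stub article from one spreadsheet row.
    Field names here are illustrative, not the real column headers."""
    return (
        "{{Infobox monument\n"
        f"| naam = {row['naam']}\n"
        f"| plaats = {row['plaats']}\n"
        "}}\n"
        f"{row['beschrijving']}\n"
    )

# Illustrative semicolon-separated export, as spreadsheets often produce:
sample = io.StringIO("naam;plaats;beschrijving\nMonumentX;Arnhem;Een monument.\n")
for row in csv.DictReader(sample, delimiter=";"):
    print(row_to_wikitext(row))
```

The generated wikitext would then be saved to the wiki page per row (e.g. via pywikibot), which is the part that was still being debugged at the end of WTS2018.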
If we get so much detailed information into Wikidata, we should try to use a Wikidata-driven template to show this information on Wikipedia. Will we use a special template for this, or update the template [https://nl.wikipedia.org/wiki/Sjabloon:Infobox_beeld]? Not all monuments are sculptures.
Should we make a subtask or a separate task for this?
Oct 24 2018
Looking at places like Bushiribana and their definition of 'population', this doesn't look like valuable information.
To preserve some quality, I'd recommend finding better info or refraining from importing.
Jul 14 2018
Jan 21 2016
I'd like to ask you to reconsider this decline.