Pywikibot, Wikidata, i18n, GLAM stuff
Sat, Nov 28
Tue, Nov 24
Hi folks, happy to see activity on this task. Current code at https://github.com/multichill/toollabs/blob/master/bot/wikidata/find_painting_images.py#L704
Fri, Nov 13
@Cparle @matthiasmullie I noticed the change was merged. Do you have a pointer where the actual mappings are stored? See https://github.com/multichill/toollabs/blob/master/bot/commons/own_work_sdoc.py#L66 for a list of mappings that I would expect (minus the variants).
Sat, Nov 7
Thanks for adding screen. Can you add /usr/bin/mysql to the whitelist too? Just the client part. The MariaDB server will automatically disconnect any long-open sessions and the client will just reconnect when a session is needed again.
Nov 1 2020
I just noticed this also breaks https://commons.wikimedia.org/wiki/Special:MediaSearch if you sort it by "recency".
Oct 29 2020
I looked around in old bugs and found T129046 . I think it went like this:
Oct 27 2020
Very confusing indeed. I updated https://www.wikidata.org/wiki/MediaWiki:Protect-text
That still means the wording of the protect page is incorrect. Currently it says "all users" and "allow only administrators". The "all users" part is not correct because changing the protection to that won't make it possible for all users to edit, only for autoconfirmed users and above. I see we have T266394 for that.
Oct 25 2020
Forked the Commons part in T266407 and left this one for the Wikidata part.
Oct 22 2020
Thanks for your explanation Gilles. You can see the bug in action at https://commons.wikimedia.org/wiki/Special:NewFiles
Oct 21 2020
@Gilles how did you come up with the number 50? The standard number of thumbnails in a Commons category is 200 and it's very common for galleries to have more thumbnails than that. I've hit this limit many times over the last couple of months. This is a source of much annoyance.
Not sure if this is related: https://commons.wikimedia.org/wiki/Category:Creative_Commons_Attribution-Share_Alike_4.0_International_missing_SDC_copyright_license is empty now, but https://commons.wikimedia.org/w/index.php?sort=last_edit_desc&search=incategory%3ACreative_Commons_Attribution-Share_Alike_4.0_International_missing_SDC_copyright_license&title=Special%3ASearch&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns0=1&ns6=1&ns9=1&ns12=1&ns14=1&ns100=1&ns106=1 returns over 400,000 hits.
This was just a task for one of the in person events (feels like ages ago!). Online discussion continued on https://commons.wikimedia.org/wiki/Commons:Structured_data/Modeling
Oct 18 2020
Oct 17 2020
Oct 13 2020
Stepping back a bit: wbeditentity should work more like a normal edit (action=edit) in things like how it handles edit conflicts and minor edits.
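For comparison, this is roughly what action=edit already offers. A minimal sketch with plain requests (the helper name and session handling are mine; basetimestamp is a real action=edit parameter):

```python
# Sketch: action=edit detects mid-air collisions via basetimestamp and
# refuses the save with an 'editconflict' error. wbeditentity has no
# equally usable equivalent, and no minor-edit flag either.
import requests

API = 'https://commons.wikimedia.org/w/api.php'

def edit_with_conflict_detection(session, title, new_text,
                                 base_timestamp, csrf_token):
    return session.post(API, data={
        'action': 'edit',
        'title': title,
        'text': new_text,
        'basetimestamp': base_timestamp,  # used to detect edit conflicts
        'minor': 1,                       # flag the edit as minor
        'token': csrf_token,
        'format': 'json',
    }).json()
```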
Oct 11 2020
I just noticed this mid-air collision: https://commons.wikimedia.org/w/index.php?title=File%3ANSG_Salmorth_PM19-09.jpg&type=revision&diff=485124008&oldid=485009736 .
If I understand Adman correctly, adding the option to use baserevid won't solve this. Using wbsetclaim isn't really an option either, because edits like this one would take 8 edits instead of one. Maybe introduce baserevid plus some kind of option for the bot to indicate that it wants strict checking instead of a warning?
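A sketch of what I have in mind; baserevid is an existing wbeditentity parameter, the strict flag below is hypothetical:

```python
import json
import requests

API = 'https://commons.wikimedia.org/w/api.php'

def edit_entity_strict(session, media_id, data, base_revid, csrf_token):
    """All statements in a single wbeditentity edit, checked against the
    revision the bot based its edit on."""
    return session.post(API, data={
        'action': 'wbeditentity',
        'id': media_id,            # e.g. 'M12345678'
        'data': json.dumps(data),  # one payload instead of 8 wbsetclaim edits
        'baserevid': base_revid,   # the revision we read before editing
        # 'strict': 1,             # hypothetical: fail hard instead of warning
        'token': csrf_token,
        'format': 'json',
    }).json()
```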
Oct 10 2020
Raising this to high because more than half the files on Commons have structured data now so the chance of running into this is much higher.
The easiest solution is probably to just show the wikitext in view mode, like when you open a protected page for editing without having the right to edit it. That way we can at least access the wikitext and copy it.
http://www.wikidata.org/entity/Q269728 is the entity URI (not https://www.wikidata.org/wiki/Q269728), and http://www.wikidata.org/entity/Q269728.json redirects to https://www.wikidata.org/wiki/Special:EntityData/Q269728.json, which returns the JSON. Marking this one as resolved.
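A quick sketch to verify the redirect behaviour (requests follows the redirect for you; assumes the usual Special:EntityData JSON shape):

```python
import requests

r = requests.get('http://www.wikidata.org/entity/Q269728.json')
print(r.url)  # https://www.wikidata.org/wiki/Special:EntityData/Q269728.json
print(r.json()['entities']['Q269728']['labels']['en']['value'])
```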
Oct 6 2020
@CDanis based on the webserver logs we should know which domains get the most hits. Can you share a list of the domains that are not on the whitelist already? That would give a good indication of what might be missing before we break things.
Oct 5 2020
Oct 1 2020
That would imply deleting things. The goal is just to identify images of paintings so that they can be linked to the correct Wikidata item.
Sep 30 2020
I had about 20 bots running and they all crashed with errors like:
Sep 28 2020
Ok, I see, we only put misbehaving bots in this group.
Sep 24 2020
It's only on Commons, and to reproduce it you need a file with no structured data yet. First try doing an edit with "baserevid"; you'll get a nasty API error (that's why I'm unable to use that).
To reproduce, just do two edits close to each other from different jobs.
Sep 18 2020
Sep 16 2020
IIIF is a whole framework (the F in IIIF). Which API or APIs are you planning to work on, and which parts? There are quite a few of them, see https://iiif.io/technical-details/#information-gathering . In the past we implemented small parts, like https://zoomviewer.toolforge.org/index.php?f=M104_ngc4594_sombrero_galaxy_hi-res.jpg&flash=no as part of T89552 (currently slightly broken). And of course https://mirador.toolforge.org/?manifest=https://wd-image-positions.toolforge.org//iiif/Q1231009/P18/manifest.json is nice too.
Sep 14 2020
Caused by this extension: https://github.com/Smile4ever/link-investigator . It will investigate all the links, so it will also follow the rollback links on your watchlist, causing a mass revert on all recently edited pages on your watchlist.
Sep 12 2020
Sep 7 2020
Bumping this to high because this makes longer pywikibot bot runs crash.
Aug 30 2020
I think this happens:
- Bot fires up and gets some tokens
- Bot does a lot of edits all with the same token
- For some reason after a long time the token is not valid anymore
- The bot continues to try to edit with the old invalid token instead of getting a new one
How long is a token supposed to be valid? Forever? The Pywikibot code seems to assume that it can use the same token for all edits in one run, and one run can take days or weeks. As a workaround I now force the site to get a new token when I run into a problem. That seems to be stable.
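The workaround looks roughly like this. A sketch against the Pywikibot 3.x TokenWallet (load_tokens is what actually refreshes the cached state; the retry helper is mine):

```python
import pywikibot
from pywikibot.data.api import APIError

site = pywikibot.Site('commons', 'commons')

def save_with_fresh_token(page, summary):
    """Retry once with a reloaded csrf token when the cached one went stale."""
    try:
        page.save(summary=summary)
    except APIError as error:
        if error.code != 'badtoken':
            raise
        site.tokens.load_tokens(['csrf'])  # replace the stale cached token
        page.save(summary=summary)
```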
I ran into these missing tables. Can the view be added, please?
Aug 29 2020
I think I found what is going wrong. site.get_tokens(['csrf']) gets you a new token, but doesn't update the internal token state:
Aug 28 2020
I noticed the announcement on the maps-l list and I also noticed https://twitter.com/krmaher/status/1299203640188690434 where @Katherine-WMF replied. Someone recently mentioned to me that "the only way to prevent WMF people from making stupid mistakes these days is tagging Katherine on Twitter".
Aug 26 2020
This basically broke the search for me, see https://commons.wikimedia.org/wiki/Commons:Village_pump#cirrusUserTesting=mediasearch_commons_breaks_all_my_queries,_how_to_turn_it_off? .
What is wrong with you people? Why do you unleash poorly tested junk on us without announcing it and without offering an option to opt out?
Aug 24 2020
Technically this is already possible; the question is whether we actually want to use it. It makes the data model a lot more complicated to work with.
@CBogen directly using the Commons logo? For the Wikidata Query Service we use https://commons.wikimedia.org/wiki/File:Wikidata_Query_Service_Favicon.svg so you can easily see which tab it is. I would like to have the same for Commons.
@Xqt what is the correct way to force the bot to get a new token? Just site.get_tokens() or something else? I noticed some of my bots are now stuck with hundreds of errors in a row.
Aug 22 2020
Aug 16 2020
Aug 15 2020
Aug 12 2020
Aug 7 2020
This bug was filed quite some time ago, but it seems it hasn't been noticed yet, so I'm tagging some people on it. It just got mentioned again on the village pump on Commons ( https://commons.wikimedia.org/wiki/Commons:Village_pump#Strange_WD_link_below_the_edit_window_of_a_file ).
I checked the specs at http://www.cipa.jp/std/documents/e/DC-008-2012_E.pdf and it looks like the EXIF data is at the start of the file. I wonder if you can speed up your bot by just downloading the first part of the file instead of the whole file.
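Something along these lines. A sketch assuming the server honours Range requests and the EXIF segment fits in the first 64 KB; the URL is hypothetical and exifread is just one parser that copes with truncated input:

```python
import io

import exifread  # third-party EXIF parser
import requests

url = 'https://upload.wikimedia.org/wikipedia/commons/x/xy/Example.jpg'  # hypothetical
head = requests.get(url, headers={'Range': 'bytes=0-65535'}).content
tags = exifread.process_file(io.BytesIO(head), details=False)
print(tags.get('GPS GPSLatitude'), tags.get('GPS GPSLongitude'))
```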
Aug 6 2020
@dschwen usually your bot picks up missing coordinates ( https://commons.wikimedia.org/w/index.php?title=Special:Contributions/DschwenBot&dir=prev&offset=20200806083129&target=DschwenBot ). Any idea why it didn't do it for these example files? I assume it's because you look at the EXIF info from the database and not from the raw file?
I think the approach is wrong. Let's start with the first assumption: Nuke ( https://www.wikidata.org/wiki/Special:Nuke ) runs under your own account. I've used it plenty of times. As an admin I have noratelimit, so having noratelimit on a completely different group is not relevant. MassMessage is just an extension that runs in the background. It is a bot: it shouldn't go too fast and it shouldn't crash on ratelimits. Relevant historic bugs: T192690 and T184948 .
Aug 5 2020
It would be two edits: one for the wikitext and one for adding the structured data.
Aug 2 2020
Isn't this something new being rolled out? I would expect beta, test, group1/2/3. So on https://test.wikipedia.org/w/api.php?action=help&modules=edit I currently see:
Jul 28 2020
Jul 26 2020
Jul 24 2020
I don't see any mention of copyright status. A lot of files on Commons don't have a license because these files are in the public domain, see for example https://commons.wikimedia.org/wiki/File:Georges_Ricard-Cordingley_(1873-1939)_-_Deep_Sea_Fishing_(morning)_-_RCIN_406333_-_Royal_Collection.jpg or https://commons.wikimedia.org/wiki/Special:ListFiles/BotMultichillT . Not sure how to fit that exactly into the wording. Flickr has something similar, see https://www.flickr.com/search/?text=house , going from the most restrictive to the most liberal. I like that approach.
Removed reference to T226453 . This is about concept URIs on Commons (https) and not about concept URIs on Wikidata (http).
Jul 23 2020
Wait, what? Didn't we have this discussion for Commons quite some time ago and decide it would be https from the start? How did the http slip back in? If I look at a not-so-random item https://commons.wikimedia.org/wiki/Special:EntityData/M90544172.rdf it says:
Jun 21 2020
Code to modify: https://commons.wikimedia.org/wiki/MediaWiki:Gadget-PermissionOTRS.js . The code in https://www.wikidata.org/wiki/MediaWiki:AddQuickClaim.js can probably be used as a starting point.
Jun 6 2020
Never mind, this user is blocked on two Wikipedias, see https://meta.wikimedia.org/wiki/Special:CentralAuth?target=Lazy-restless
Jun 1 2020
May 31 2020
May 25 2020
Notes for future reference:
May 22 2020
May 16 2020
@Lokal_Profil @JeanFred any news on this? I haven't done this in a while and the documentation at https://commons.wikimedia.org/wiki/Commons:Monuments_database/Harvesting looks outdated. The implication is that you two are currently the only people who are able to add sources.
I'm just going to close this one as invalid because of the lack of information and nothing happening for several years. Feel free to add more information and reopen it.