Pywikibot, Wikidata, i18n, GLAM stuff
Apr 29 2021
As a workaround on pages like https://www.wikidata.org/wiki/Wikidata:WikiProject_sum_of_all_paintings/Old_European_art_missing_genre/Sweden where most thumbnails don't work:
wget -E -H -k -K -p https://www.wikidata.org/wiki/Wikidata:WikiProject_sum_of_all_paintings/Old_European_art_missing_genre/Sweden
Mar 20 2021
File names are bad URIs. Files get renamed all the time (see https://commons.wikimedia.org/w/index.php?title=Special:Log&offset=&limit=500&type=move ), causing all sorts of breakage. The pageid stays the same, so the mediaid also stays the same. That's a much more stable identifier.
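A minimal sketch of why the mediaid is stable (the helper names here are mine, not from any existing tool): the MediaInfo entity ID is just the letter M prepended to the Commons page ID, so it survives file renames.

```python
# Sketch, helper names are hypothetical: the MediaInfo entity ID is
# simply "M" + the Commons page ID, so it stays the same when the
# file gets renamed.

def mediaid_from_pageid(pageid: int) -> str:
    """Return the stable MediaInfo entity ID for a Commons page ID."""
    return f"M{pageid}"

def mediainfo_entity_uri(mediaid: str) -> str:
    """Canonical entity URI on Commons for a MediaInfo ID."""
    return f"https://commons.wikimedia.org/entity/{mediaid}"
```

For example, `mediaid_from_pageid(6919529)` gives `"M6919529"`, no matter what the file is called today.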
Mar 2 2021
Why are we not using the description field for this? Seems more sensible to me than creating new properties
The URI for the image is https://commons.wikimedia.org/entity/M6919529 (yes, https, not http; that got messed up, see T258590). You can see an RDF representation of that at https://commons.wikimedia.org/entity/M6919529.rdf . We should somehow include this URI on Wikidata too. The royal way would be to update the image data type so that it accepts the Mediainfo ID and still does all the logic the current image data type does.
Feb 27 2021
We now have more than 1 million files with location of creation on Commons, see https://commons.wikimedia.org/w/index.php?search=haswbstatement%3AP1071&title=Special%3ASearch
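A hedged sketch of how that number can be checked against the search API (the helper names are mine; the `haswbstatement` keyword is the CirrusSearch feature used in the search link above):

```python
import json
import urllib.parse
import urllib.request

COMMONS_API = "https://commons.wikimedia.org/w/api.php"

def build_search_params(prop: str) -> dict:
    """Build MediaWiki search API parameters for a haswbstatement query."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": f"haswbstatement:{prop}",  # CirrusSearch SDC keyword
        "srnamespace": 6,                      # File: namespace
        "srinfo": "totalhits",
        "srlimit": 1,
        "format": "json",
    }

def count_files_with_statement(prop: str) -> int:
    """Return the total number of files carrying a statement for prop."""
    url = COMMONS_API + "?" + urllib.parse.urlencode(build_search_params(prop))
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    return data["query"]["searchinfo"]["totalhits"]
```

Calling `count_files_with_statement("P1071")` should report the same total as the Special:Search link.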
Feb 21 2021
If it ain't broke, don't fix it? Let's just see what happens and if anything explodes, then focus on fixing that.
Feb 13 2021
Wrote https://www.wikidata.org/wiki/Wikidata:WikiProject_sum_of_all_paintings/Automated_image_uploads and shared the link by email some time ago.
Feb 8 2021
This change is subject to the https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy. Please complete the steps listed there.
Feb 2 2021
I obviously like getting more structured data and better usage, but I see some issues with this task: The scope seems to be very broad. Is this intentional? What kind of time investment is expected? It might have a higher chance of making a difference when it's more tightly scoped. We have several steps in the process, and possible tasks related to it:
- Convert existing data from wikitext to structured data. This is already happening on a small subset, but not yet on a large scale. Not controversial as long as you don't remove any data. Still a ton of work to do here, but quite a few things are not clear yet on how to model things. Building some kind of workflow to extract knowledge from the category tree and show it to users for approval would be extremely useful.
- Show the structured data in a pretty way. You already mentioned some examples, https://commons.wikimedia.org/wiki/Module:Artwork and https://commons.wikimedia.org/wiki/Module:Information . The current approach here is to do incremental improvements without changing the look and feel from what it used to be. You can also take a radically different approach, like skins: you make another implementation for the same data with a completely different look and feel. With some logic, logged-in users can enable this new template skin. That way you can experiment quite freely without disturbing the Monobook people. This task would be fun for someone who is on the edge of development and design.
Jan 21 2021
The reasoning for why the convention is suddenly being enforced is missing. Instructions on how to fix it are also missing.
Jan 6 2021
Just had it again:
pywikibot.data.api.APIMWException: internal_api_error_JobQueueError: [X-VEtQpAIDkAAHaGUqkAAADW] Caught exception of type JobQueueError
Jan 5 2021
Ha, right after I posted that my bot crashed twice. Now with internal API errors:
One of my robots ran non-stop for the last week, so it looks like it's not happening at the moment. You've got to love intermittent problems...
Dec 29 2020
Thanks Sam, can you do a per-user, per-year breakdown? I expect a small number of users with a lot of uploads.
I've observed quite a lot of inconsistencies over the past two weeks. I haven't looked at it very extensively, but I'm getting the impression that blocks of edits are missed and the timestamps cluster around a spike at https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&viewPanel=8 . The order of magnitude is several hundreds of edits.
Dec 15 2020
Tested on production and works as expected. See for example https://commons.wikimedia.org/w/index.php?title=File:John_Quinton_Pringle_(1864-1925)_-_Poultry_Yard,_Gartcosh_-_GMA_37_-_National_Galleries_of_Scotland.jpg&action=edit&oldid=518956340
Nov 24 2020
Hi folks, happy to see activity on this task. Current code at https://github.com/multichill/toollabs/blob/master/bot/wikidata/find_painting_images.py#L704
Nov 13 2020
@Cparle @matthiasmullie I noticed the change was merged. Do you have a pointer where the actual mappings are stored? See https://github.com/multichill/toollabs/blob/master/bot/commons/own_work_sdoc.py#L66 for a list of mappings that I would expect (minus the variants).
Nov 7 2020
Thanks for adding screen. Can you add /usr/bin/mysql to the whitelist too? Just the client part. Mariadb server will auto disconnect any long open sessions and the client will just reconnect when a session is needed again.
Nov 1 2020
I just noticed this also breaks https://commons.wikimedia.org/wiki/Special:MediaSearch if you sort it by "recency".
Oct 29 2020
I looked around in old bugs and found T129046. I think it went like this:
Oct 27 2020
Very confusing indeed. I updated https://www.wikidata.org/wiki/MediaWiki:Protect-text
That still means the wording of the protect page is incorrect. Currently it says "all users" and "allow only administrators". The "all users" is not correct because changing the protection to that won't make it possible for all users to edit, only for autoconfirmed users and above. I see we have T266394 for that.
Oct 25 2020
Forked the Commons part in T266407 and left this one for the Wikidata part.
Oct 22 2020
Thanks for your explanation Gilles. You can see the bug in action at https://commons.wikimedia.org/wiki/Special:NewFiles
Oct 21 2020
@Gilles how did you come up with the number 50? The standard number of thumbnails in a Commons category is 200 and it's very common for galleries to have more thumbnails than that. I've hit this limit many times over the last couple of months. This is a source of much annoyance.
Not sure if this is related: https://commons.wikimedia.org/wiki/Category:Creative_Commons_Attribution-Share_Alike_4.0_International_missing_SDC_copyright_license is empty now, but https://commons.wikimedia.org/w/index.php?sort=last_edit_desc&search=incategory%3ACreative_Commons_Attribution-Share_Alike_4.0_International_missing_SDC_copyright_license&title=Special%3ASearch&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns0=1&ns6=1&ns9=1&ns12=1&ns14=1&ns100=1&ns106=1 returns over 400,000 hits.
This was just a task for one of the in-person events (feels like ages ago!). Online discussion continued on https://commons.wikimedia.org/wiki/Commons:Structured_data/Modeling
Oct 13 2020
Stepping back a bit: wbeditentity should work more like the normal edit (action=edit) in things like how edit conflicts are handled, and it should also support minor edits.
Oct 11 2020
I just noticed this mid-air collision: https://commons.wikimedia.org/w/index.php?title=File%3ANSG_Salmorth_PM19-09.jpg&type=revision&diff=485124008&oldid=485009736 .
If I understand Adman correctly, adding the option to use baserevid won't solve this. Using wbsetclaim isn't really an option because edits like this would take 8 edits instead of one. Maybe introduce baserevid and some kind of option for the bot to indicate that it wants strict checking instead of a warning?
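To make the suggestion concrete, here is a hypothetical sketch of what the bot side could look like. The `baserevid` parameter exists for other Wikibase modules; the `strictconflictcheck` flag is invented here purely to illustrate the "strict checking instead of a warning" idea and does not exist in the API.

```python
# Hypothetical request parameters; "strictconflictcheck" does not exist
# today and only illustrates the suggestion above: fail the edit when
# the entity changed since baserevid, instead of silently merging.

def build_wbeditentity_params(mediaid: str, data_json: str,
                              baserevid: int, token: str) -> dict:
    """Assemble one wbeditentity call that applies everything in a single edit."""
    return {
        "action": "wbeditentity",
        "id": mediaid,
        "data": data_json,          # full entity JSON, applied in one edit
        "baserevid": baserevid,     # revision this edit was computed against
        "strictconflictcheck": 1,   # invented flag: reject instead of warn
        "token": token,
        "format": "json",
    }
```

The point of keeping everything in one wbeditentity call is exactly the 8-edits-versus-one problem with wbsetclaim mentioned above.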
Oct 10 2020
Raising this to high because more than half the files on Commons have structured data now so the chance of running into this is much higher.
The easiest solution is probably to just show the wikitext in view mode only like when you try to edit a protected page, but you don't have the right to do so. That way we can at least access the wikitext and copy it.
http://www.wikidata.org/entity/Q269728 is the entity uri (not https://www.wikidata.org/wiki/Q269728), http://www.wikidata.org/entity/Q269728.json redirects to https://www.wikidata.org/wiki/Special:EntityData/Q269728.json which returns the json. Marking this one as resolved.
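The URL scheme above can be summarized in a small sketch (helper names are mine): the entity URI itself uses http:// and no file extension, while a request with an extension ends up at Special:EntityData, which serves the concrete serialization.

```python
# Sketch of the URL scheme described above. The entity URI is an
# identifier (http, no extension); concrete serializations such as
# .json live under Special:EntityData after the redirect.

def wikidata_entity_uri(qid: str) -> str:
    """The canonical (http) entity URI for a Wikidata item."""
    return f"http://www.wikidata.org/entity/{qid}"

def wikidata_entity_data_url(qid: str, fmt: str = "json") -> str:
    """Where a {qid}.{fmt} request ends up after the redirect."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.{fmt}"
```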
Oct 6 2020
@CDanis based on the webserver logs we should know what domains give the most hits. Can you share a list of these domains which are not on the whitelist already? That would give a good indication of what might be missing before breaking it.
Oct 1 2020
That would imply deleting things. Goal is just to identify images of paintings so that they can be linked to the correct Wikidata item.
Sep 30 2020
I had about 20 bots running and all crashed with errors like:
Sep 28 2020
Ok, I see, we only put misbehaving bots in this group.
Sep 24 2020
It's only on Commons, and to reproduce it you need a file with no structured data yet. First try doing an edit with "baserevid"; you'll get a nasty API error (that's why I'm unable to use that).
To reproduce just do two edits close to each other from different jobs.
Sep 16 2020
IIIF is a whole framework (the F in IIIF). Which API or APIs are you planning to work on, and what parts? There are quite a few of them, see https://iiif.io/technical-details/#information-gathering . In the past we implemented small parts, like https://zoomviewer.toolforge.org/index.php?f=M104_ngc4594_sombrero_galaxy_hi-res.jpg&flash=no as part of T89552 (currently slightly broken). And of course https://mirador.toolforge.org/?manifest=https://wd-image-positions.toolforge.org//iiif/Q1231009/P18/manifest.json is nice too.
Sep 14 2020
Caused by this extension: https://github.com/Smile4ever/link-investigator . It investigates all the links, so it will also follow the rollback links on your watchlist, causing a mass revert on all recently edited pages on your watchlist.
Sep 7 2020
Bumping this to high because this makes longer pywikibot bot runs crash.
Aug 30 2020
I think this happens:
- Bot fires up and gets some tokens
- Bot does a lot of edits all with the same token
- For some reason after a long time the token is not valid anymore
- The bot continues to try to edit with the old invalid token instead of getting a new one
How long is a token supposed to be valid? Forever? The Pywikibot code seems to assume that it can use the same token for all edits in one run. One run can take days or weeks. As a workaround I now force the site to get a new token when I run into a problem. That seems to be stable.
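A framework-agnostic sketch of that workaround (the three callables are placeholders for whatever the bot framework provides, e.g. an edit call, a token refresh, and a bad-token error check):

```python
# Sketch of the workaround described above: when an edit fails with a
# bad-token error, fetch a fresh token and retry exactly once. The
# callables are placeholders, not a real Pywikibot API.

def edit_with_token_retry(do_edit, refresh_token, is_badtoken_error):
    """Run do_edit; on a bad-token error, refresh the token and retry once."""
    try:
        return do_edit()
    except Exception as exc:
        if is_badtoken_error(exc):
            refresh_token()     # force-fetch a fresh CSRF token
            return do_edit()    # a second failure propagates normally
        raise
```

In a Pywikibot run this would wrap the actual edit call, with `is_badtoken_error` checking for an API error with code 'badtoken' and `refresh_token` forcing the site to fetch a new CSRF token.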
I ran into this missing table. Can the view be added, please?
Aug 29 2020
I think I found what is going wrong. site.get_tokens(['csrf']) gets you a new token, but doesn't update the internal token state:
Aug 28 2020
I noticed the announcement on the maps-l list, and I also noticed https://twitter.com/krmaher/status/1299203640188690434 where @Katherine-WMF replied. Someone recently mentioned to me that "the only way to prevent WMF people from making stupid mistakes these days is tagging Katherine on Twitter".