Commonsist. Mostly harmless.
- User Since
- Dec 7 2014, 3:49 PM (157 w, 1 d)
Wed, Nov 29
How can we heat this up? Would kicking off a Commons proposal make any difference?
Nov 10 2017
Project page and a link for initial uploads is at https://commons.wikimedia.org/wiki/User:F%C3%A6/Project_list/Cooper-Hewitt
Oct 19 2017
A new Wikimedia Commons proposal has been created to allow for additional licenses for Data files. This would reduce the confusion about whether data imported from elsewhere needs attribution or can be redefined as CC0.
Oct 17 2017
I had not caught on that as well as templates, it's not possible to add data files to categories (unless I'm missing a way to do it). Again an unsatisfying workaround is to use Data talk pages, with a current example being the maintenance category: https://commons.wikimedia.org/wiki/Category:Data_files_with_Open_Street_Map_coordinates.
Oct 12 2017
As Data_talk pages are ordinary pages, for the example case I have raised https://commons.wikimedia.org/wiki/Commons:Deletion_requests/Data_talk:Kuala_Lumpur_Districts.map.
Update: I may be quite wrong about readonly. Checking my upload logs, Wikimedia Commons has been reporting as read-only in response to attempted API uploads since 6:34 through to now (10:40) UK time.
Sep 14 2017
Good! However, this task should not be marked as 'resolved', because the more general point still stands: the WMF should be thinking of providing ZoomViewer facilities as part of the media viewer... or at least something that gets maintained long term in the same way.
@dschwen is away, and has been for a long time. ZoomViewer should be migrated to being WMF supported. Without it, Commons is not a suitable platform for high resolution images which are now the norm for digital archives, such as high resolution scans of oil paintings.
Sep 13 2017
Sep 4 2017
Aug 20 2017
Jul 17 2017
Faebot was creating these tables using SQL, the same query across several projects. It stopped working due to time-outs after some WMF changes. To fix it I would need to break up the query so it can work within the more limited query times available. I might get around to fixing it, but it's floating in my sub-watermargin pile.
Jun 18 2017
This is indeed a resurrection of the two-year-old T121797; however, that got waylaid by the same "bigger question" of creating an independent database to return general Hamming distances. This proposal to make image hashes available (whether perceptual, difference or others) has little chance of getting anywhere if we don't at least take the first step of being able to return the image hash on an API request or database query for an image. This minimal change does not require much smart programming or creative design. With the hashes available, anyone can immediately search for hash matches, and if they wish to compare Hamming distances for non-matches, they can write separate scripts or tools to do it far more easily, the bit-wise difference being extremely simple. In my experiments with greater-than-zero distances, the results have much narrower potential utility, leading me to believe that this would be for analysing rather specialized collections and questions, which means only having to process a constrained sample space. Simple matches, where the Hamming distance is zero, across all Commons images offer immediate benefits, namely finding duplicates and detecting copyright violations by matching new uploads against the hashes of already deleted images, rather than only doing a comparison with the SHA1 cryptographic hash.
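To illustrate how little code the client side would need once hashes are exposed, here is a minimal sketch. It assumes the hashes arrive as equal-length hexadecimal strings; the function names and the sample file names are hypothetical, and nothing here reflects an existing MediaWiki API.

```python
from collections import defaultdict
from typing import Dict, List


def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hexadecimal hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def exact_matches(hashes: Dict[str, str]) -> List[List[str]]:
    """Group file names sharing an identical hash (Hamming distance zero)."""
    groups = defaultdict(list)
    for name, image_hash in hashes.items():
        groups[image_hash].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical sample data: file name -> image hash as a hex string.
sample = {"a.jpg": "ff00", "b.jpg": "ff00", "c.jpg": "ff01"}
duplicates = exact_matches(sample)
near_miss = hamming_distance(sample["a.jpg"], sample["c.jpg"])
```

Exact matching is a plain dictionary grouping, so it scales to a full dump of hashes, whereas pairwise distances are only practical on a constrained sample.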
Jun 15 2017
Jun 13 2017
Jun 5 2017
It's working, thanks!
Jun 3 2017
Anyway, why do we actually whitelist? Is a DoS attack possible? Only
from our infrastructure, I think, but this can be solved by throttling
upload-by-URL actions per user. After some time, maybe autoblock them.
May 31 2017
May 30 2017
I'm unclear as to why we are worried about tokens. If url upload is allowed, then the URL with a token passed as a parameter looks like:
May 24 2017
May 15 2017
May 11 2017
May 9 2017
*.esa.int works for what we know. The only images I've seen so far have been on the two specific domains listed.
May 6 2017
Apr 26 2017
I have no idea if the same person is behind this, or if it's just a bit of haphazard pointy trolling, but it seems far too easy to disrupt email lists with cross-posted spam:
Example from today, directed at me.
Apr 21 2017
Yes, the inconsistency is worrying. However I'm also concerned that the recommended "fix" is slightly stupid from the GLAM uploads perspective. I am not going to tamper with perfectly okay original EXIF data, that matches the EXIF data in external archives, just because on Commons we invented an arbitrary and non-intelligent filter.
Feb 15 2017
I suggest this task is closed. There's too much push back against the task description for this to be realistic. If someone wishes to propose a new task aimed at the blacklist filter process helpfully parsing url redirects, perhaps for a limited number of iterations, and checking those against the blacklist again before rejecting a text, that would be positive.
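A rough sketch of what such a redirect-aware check could look like, under stated assumptions: the blacklist is reduced to a set of domains, and `resolve_redirect` is a hypothetical callable that returns the redirect target of a URL or `None`. It is injected so the logic can be shown without network access, and this is not how the existing blacklist filter is implemented.

```python
from typing import Callable, Optional, Set
from urllib.parse import urlparse


def is_blacklisted(url: str, blacklisted_domains: Set[str]) -> bool:
    """True if the URL's host is a blacklisted domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in blacklisted_domains)


def accept_link(
    url: str,
    blacklisted_domains: Set[str],
    resolve_redirect: Callable[[str], Optional[str]],
    max_hops: int = 3,
) -> bool:
    """Follow redirects from a blacklisted URL for at most max_hops iterations,
    re-checking each target against the blacklist before rejecting."""
    current = url
    hops = 0
    while is_blacklisted(current, blacklisted_domains):
        if hops >= max_hops:
            return False  # gave up: still blacklisted after the allowed hops
        target = resolve_redirect(current)
        if target is None:
            return False  # blacklisted and not a redirect
        current = target
        hops += 1
    return True


# Hypothetical redirect table standing in for real HTTP lookups.
redirects = {
    "https://bit.ly/good": "https://example.org/report",
    "https://bit.ly/bad": "https://spam.example/page",
    "https://bit.ly/loop": "https://bit.ly/loop",
}
blacklist = {"bit.ly", "spam.example"}
```

The hop limit is what keeps redirect loops and long chains from stalling the filter.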
Feb 14 2017
@Billinghurst See the discussion at https://commons.wikimedia.org/wiki/Commons:Bureaucrats%27_noticeboard#Upload_project_spam_blacklist_exception_.27right.27 which was started at the same time this task was opened.
Feb 13 2017
@matmarex This task is not about adding links to the spam-whitelist for a single GLAM upload project that has already completed. It is pointless to list individual bit.ly links, when what is requested is a generic solution in order to support and encourage "officially agreed" batch upload projects.
@Billinghurst That's the point of this task: to avoid volunteers like me having to write ever-increasing amounts of code to bypass blacklists. We have the same problem with filename blacklists. I already have around 10 types of error trap in my upload process, and I see little benefit in creating my own unique parser for all metadata fields on a GLAM import when the results pose absolutely no risk whatsoever to Wikimedia Commons or our reusers and readers.
Feb 12 2017
With regard to later edit, my edit today to https://commons.wikimedia.org/w/index.php?title=File:Marcha_das_Mulheres_Negras_(23137414611).jpg&action=history would have been impossible as the text contained a bit.ly link. So, I dispute impossible.
Feb 1 2017
As Pharos and his folks are working to a deadline, I heartily endorse white-listing the site. This is no-risk as Pharos can ensure that test runs prove to everyone's confidence that licensing has been addressed and that the metadata is nicely handled, including credit templates and links to the license evidence, even if that requires an OTRS ticket.
Jan 31 2017
Jan 30 2017
Nudge. Why has this taken over 10 weeks?
Jan 23 2017
Uploads will be from https://finds.org.uk/, will *.finds.org.uk cater for that?
Jan 20 2017
The licence is CC-BY as stated at https://finds.org.uk/info/termsandconditions.
Nov 29 2016
Nov 26 2016
I mentioned BotPasswords as my understanding of https://www.mediawiki.org/wiki/Manual:Pywikibot/BotPasswords was that it's just a password like any other, hence only as secure as the conventional login. However, OAuth uses access tokens, which provide an additional level of security. If the WMF is recommending that sysop accounts use 2FA, then my presumption would be that BotPasswords should be avoided on bot accounts with sysop rights for the same reasons.
Nov 24 2016
Thanks, I was unsure if those were the response. I do not see any moves to ensure the advice to administrators is enforced. I doubt there will be any new policies until the analysis itself.
Nudge. It's coming up to 2 weeks since the OurMine hack became public knowledge. Please at least issue an interim analysis; I'm sure there is a good understanding of what happened and how. In the absence of any official analysis, volunteers are working on assumptions right now, such as whether administrators using longer passwords is sufficient protection and whether BotPasswords is okay to use on bot accounts with sysop rights, rather than OAuth.
Nov 22 2016
Nov 16 2016
It may be that publishing dates of password changes would be more than can be queried from the public database, however a table of admins showing which had adopted 2FA is the type of thing that I would struggle to imagine as any significant extra risk and has good value as part of the community agreeing new policies for trusted accounts. In terms of targeting, this is probably a lot less significant than sharing user_properties or analysing edit patterns, which are available to anyone.
Nov 15 2016
After digging into login.py, I'm wondering if we should be recommending that BotPasswords is avoided. Authorizing OAuth for user accounts wanting to simply run scripts for themselves, rather than making apps for others, happens automatically without needing any human approvals. For Pywikibot, once the user has their credentials, they simply paste them into their local user-config.py; no other steps are needed to get them to work.
Nov 14 2016
Nov 13 2016
Oct 29 2016
Sep 30 2016
More examples from different sources:
Sep 16 2016
Jul 13 2016
This is due to the NYPL uploads, which are coming to an end, though maybe with some final runs and housekeeping. As it happens, I've started a wikibreak and am packing to travel away on holiday, so my activity drops to zero from this weekend, when I'll be forced by my husband to stay AFK until I get back, probably from 25 July. As you've raised this ticket, it's forced my hand to drop any idea of maintaining remote access. :-)
Jul 3 2016
Jun 24 2016
This was a major headache for my NYPL uploads, due to being unable to get ['exists-normalized'] ignored. This bug is a real bear trap for batch uploads and is quite unobvious to debug.
Jun 22 2016
With respect to the photolib, I say yes, let's add it to the whitelist too. The images are described as /mostly/ public domain (i.e. NOAA); however, some have restrictions. So far I have only noticed these where there are credits to donors. Anyway, checking copyright carefully is a matter for a future batch uploader to investigate, which may well end up being me in a few months.
Jun 20 2016
Jun 13 2016
Jun 11 2016
Jun 6 2016
May 23 2016
Apr 8 2016
@AWossink Thanks for the reply. Even for a modest 500-ish images, as the intention is for an ongoing partnership, please consider running an on-Commons project page to help with future feedback and with working collegially with Commonsists. There are many examples under https://commons.wikimedia.org/wiki/Commons:Batch_uploading, or, if you have several partnerships you want to collect together and maintain under your own area, you could do as I do with https://commons.wikimedia.org/wiki/User:Fæ/Project_list. Drop a note on the GLAMtools email list if you set something up.
Apr 5 2016
Is there a project page that explains why the chapter website is being used for training?