Jun 21 2021
In fact, this is not a problem, see https://phabricator.wikimedia.org/T222985#7164507
OK, so it seems the problem is in pbzip2: it cannot decompress in parallel unless the file was also compressed with pbzip2. But lbzip2 can decompress all of them in parallel.
Jun 20 2021
Are you saying that existing wikidata json dumps can be decompressed in parallel if using lbzip2, but not pbzip2?
I am realizing that maybe the problem is just that the bzip2 compression is singlestream rather than multistream. Moreover, using newer compression algorithms like zstd might reduce decompression time even further, removing the need for multiple files altogether. See https://phabricator.wikimedia.org/T222985#7163885
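For illustration, here is a minimal sketch using Python's bz2 module (not pbzip2/lbzip2 themselves) of why the multistream distinction matters: a multistream file is just a concatenation of independent bzip2 streams, so the stream boundaries can be located and each stream handed to a separate worker, while a singlestream file offers no such split points.

```python
import bz2

# A multistream file is a concatenation of independent bzip2 streams.
multistream = bz2.compress(b"first chunk\n" * 100) + bz2.compress(b"second chunk\n" * 100)

# Find the stream boundaries: BZ2Decompressor consumes exactly one stream
# and exposes any trailing bytes via .unused_data.
streams = []
data = multistream
while data:
    d = bz2.BZ2Decompressor()
    d.decompress(data)
    end = len(data) - len(d.unused_data)
    streams.append(data[:end])
    data = d.unused_data

# Each stream is self-contained and could be decompressed by a separate
# worker; here we just decompress them one by one to show it works.
parts = [bz2.decompress(s) for s in streams]
print(len(streams), len(parts[0]), len(parts[1]))
```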
As a reference see also this discussion.
Apr 28 2021
Are you sure lastrevid works like that for the whole dump? I think the dump is made from multiple shards, so lastrevid might not be consistent across all items.
Apr 4 2021
Thank you for redirecting me to this issue. As I mentioned in T278204 my main motivation is in fact not downloading in parallel, but processing in parallel. Just decompressing that large file takes half a day on my machine. If I can instead use 12 machines on 12 splits, for example, I can do that decompression (or some other processing) in one hour instead.
Mar 24 2021
To me the most important thing is that I can have a uniform way to obtain a thumbnail for all content on Commons, so that I can make a browsing interface using the API without special cases. What is returned is of secondary interest to me.
I do not know. To some icon representing tabular data? Similar to how OGG files have one?
Mar 23 2021
I realized I have exactly the same need as the poster on StackOverflow: get a dump and then use the real-time feed to keep it updated. But you have to know where to start with the real-time feed through EventStreams, using historical consumption to resume from the point at which the dump was made.
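A sketch of that resume logic, assuming EventStreams' `since` query parameter for historical consumption and the `meta.dt` timestamp field of the recentchange stream; the Server-Sent Events input here is a canned sample, not a live connection.

```python
import json
from urllib.parse import urlencode

def parse_sse(text):
    """Minimal Server-Sent Events parser: yield the JSON body of each event."""
    for block in text.split("\n\n"):
        data_lines = [l[len("data: "):] for l in block.splitlines() if l.startswith("data: ")]
        if data_lines:
            yield json.loads("\n".join(data_lines))

# Canned sample in the shape of the recentchange stream (field values made up).
sample = (
    'event: message\n'
    'data: {"title": "Q1", "meta": {"dt": "2021-03-23T10:00:00Z"}}\n'
    '\n'
    'event: message\n'
    'data: {"title": "Q2", "meta": {"dt": "2021-03-23T10:00:05Z"}}\n'
)
events = list(parse_sse(sample))
last_seen = events[-1]["meta"]["dt"]

# To resume, pass the dump timestamp (or the last processed event's
# timestamp) as `since` when reconnecting.
resume_url = "https://stream.wikimedia.org/v2/stream/recentchange?" + urlencode({"since": last_seen})
print(resume_url)
```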
Mar 21 2021
I see that the API does return the modified field: https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q1
Personally, I would love to have for each item in the dump a timestamp when it was created and a timestamp when it was last modified.
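A small sketch of reading that field, using a canned response fragment in the `entities -> <id> -> modified` shape the wbgetentities call above returns (the timestamp value here is made up):

```python
import json
from urllib.parse import urlencode

# The request URL from the comment above, built programmatically.
url = "https://www.wikidata.org/w/api.php?" + urlencode(
    {"action": "wbgetentities", "format": "json", "ids": "Q1"}
)

# Canned response fragment; a real call would fetch `url` instead.
response = json.loads('{"entities": {"Q1": {"id": "Q1", "modified": "2021-03-20T12:34:56Z"}}}')
modified = {qid: entity["modified"] for qid, entity in response["entities"].items()}
print(modified)
```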
May 8 2017
I would still prefer just getting rid of manual reauthentications though.
You do not have to update the database on all API requests, only on calls to authenticate and related OAuth calls (or do you count OAuth calls as API requests?). In any case, I do not mind having no expiration time; this is just for somebody who really wants some dialogs. I am for fewer of them anyway. :-)
May 2 2017
I still don't understand what the abuses are here. I mean, the rest of the world uses OAuth with "authentication". Why do they not have to worry about this type of abuse, and what is the difference in trade-offs and threat models?
Jun 26 2016
Also, here I am trying to address a general case, not my specific case.
But in some cases it is clear that content was deleted for reasons other than containing sensitive information. We know why things get deleted based on the rules referenced in the deletion. Why could it not be made available based on those?
Apr 10 2015
Security is always a trade-off: usability, security, attention, habituation, all those things. The issue is that even when asking the user for permission every time, you can introduce security issues. Maybe it makes you feel better because it becomes the user's responsibility, but it is also a problem with how you designed the system.
Mar 11 2015
Mar 1 2015
Feb 6 2015
Hm, why only permissionless apps? Other big sites that use OAuth do not make this distinction. Take GitHub: you get asked for permissions the first time, but after that you just get redirected. You can also make the token expire after some time, after which the user has to reauthorize. But having to authorize every time there are any permissions is not good security practice either. Security research shows that the danger is user habituation to the prompt: they stop reading it, and the next time an evil developer changes the requested permissions, they will just click OK. Habituation to prompts is a big problem. So it is better to show a prompt only when something important happens: the requested permissions change, the token expired, or there was some other security issue (a login from a strange IP, geoIP showing connections from multiple continents, etc.).
I think there are two things here. One is authenticate-only permissions. The other is the confusion in MediaWiki OAuth between the "authorize" OAuth flow and the "authenticate" OAuth flow. In "authorize", the user should be prompted with a dialog to confirm the permissions. But in "authenticate", they should not be, provided they have confirmed the same permissions in the past. The user is then just redirected through the flow, which makes it feel like they stayed on the original site the whole time.
Dec 15 2014
Was this now deployed or reverted? Was the schema applied?
Dec 13 2014
The latest socket.io-client is currently 1.2.1, so 0.9.17 is quite old. :-)