Wed, Feb 5
I would like to split this apart into distinct issues:
Tue, Jan 28
Only just getting around to answering this. Will have a look this week and get back to you.
Dec 11 2019
Sorry for not responding to this earlier: Yes, most of these are known, but this overview and the examples are certainly helpful. See also: https://github.com/wikiwho/WhoColor/issues
And the assessment of this having to do with the regexes used is also correct.
With regard to fixing it: We do not have the (wo)manpower on our side at this time to fix this. You are certainly invited to become a collaborator on (or submit pull requests to) the GitHub repo, and we can deploy these changes if they were tested beforehand, i.e. after checking whether the parsing of the HTML breaks (more) for whatever fixes are applied. So I do not see why your team could not make and test these changes, or at least estimate them, but maybe I'm missing something.
Jul 16 2019
Sorry for the ambiguous wording. I meant if you are at revision 5 of an article with 10 revisions, and looking at a token that existed in revision 5 (e.g. with the call https://api.wikiwho.net/en/api/v1.0.0-beta/rev_content/OUR_DUMMY_ARTICLE/5/?......), and if that token has been deleted in, say, revision 8, then you would already see that in its "out" list, although you are at revision 5 currently.
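To make that concrete, here is a minimal sketch in Python. It assumes a token is a dict whose "out" list holds the revision IDs in which the token was removed, and that the article's revision IDs are available in chronological order. The helper name `removals_after` and the sample data are hypothetical and only serve the illustration.

```python
def removals_after(token, revision_ids, current_rev_id):
    """Return the removals of `token` that happen later than `current_rev_id`.

    Assumptions for this sketch: `token["out"]` lists the revision IDs in
    which the token was deleted, and `revision_ids` is the article history
    in chronological order.
    """
    # Map each revision ID to its position in the history.
    position = {rev_id: i for i, rev_id in enumerate(revision_ids)}
    current = position[current_rev_id]
    return [rev for rev in token["out"] if position[rev] > current]


# Hypothetical article with 10 revisions, numbered 1..10 for readability.
history = list(range(1, 11))
token = {"str": "example", "out": [8]}  # deleted in revision 8

# Viewed at revision 5, the "out" list already reveals the later deletion.
print(removals_after(token, history, 5))
```

So a client that only wants the state "as of revision 5" would have to filter the "out" list itself, along the lines of the helper above.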
Jul 11 2019
@Mooeypoo , is the memory error in the output still occurring in your testing? We tried a fix.
@ifried Hey Ifried, nice to meet you and good to hear :)
Jun 26 2019
Sorry for the late reply. Valid point regarding the memory, will look into it.
Apr 9 2019
@Niharika Alright. I won't be at the Hackathon unfortunately. But I'm in San Francisco in the week of May 13-17, so if you guys are based at the SF offices, I could simply swing by to talk about what is already there and what we could provide. Not strictly necessary, but might be helpful.
Apr 3 2019
@Niharika what is the current status of this project, do you need any input? Do you need the token ids in the output as you requested?
Feb 19 2019
Now that I had time to look at it: the
in the extended HTML is *not* actually the WikiWho token ID, but simply a positional index for the token for that revision. (I do admit that we have to update the documentation to that effect...)
The WhoColor userscript goes through the extended HTML in the conflict and age views and retrieves the respective conflict and age scores by order from the list in the inline model 2.
(the "class name" is simply the user id - or an ad-hoc user hash for IPs - but I guess you figured that out already)
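As a rough illustration of this lookup by order, here is a minimal Python sketch. It assumes that every token in the extended HTML is one `<span>` whose class is the user id, and that the scores arrive as a plain list in the same order. The markup, the sample values and the names `SpanCollector` and `pair_scores` are hypothetical, not the actual userscript code.

```python
from html.parser import HTMLParser


class SpanCollector(HTMLParser):
    """Collect the class attribute of every <span> in document order."""

    def __init__(self):
        super().__init__()
        self.span_classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.span_classes.append(dict(attrs).get("class"))


def pair_scores(extended_html, scores):
    """Pair the i-th token span with the i-th score (purely positional)."""
    collector = SpanCollector()
    collector.feed(extended_html)
    if len(collector.span_classes) != len(scores):
        raise ValueError("number of token spans and scores differ")
    return list(zip(collector.span_classes, scores))


# Hypothetical extended HTML with two tokens and their conflict scores.
html = '<span class="123">Hello</span> <span class="456">world</span>'
conflict_scores = [0, 3]
print(pair_scores(html, conflict_scores))
```

Because the pairing is purely positional, it only works as long as the span order and the score list stay in sync, which is why the sketch checks the lengths first.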
Feb 15 2019
Sorry, I didn't see the notification for that message...
Jan 20 2019
Jan 16 2019
The annotations per token that the WikiWho APIs produce are always for the wiki markup and cover all tokens, including those in tables and references. It does *not* expand templates or anything transcluded, which means that the content of those elements is not annotated for now, only the wiki markup that they are called with. This limitation therefore does not affect tables, references and infoboxes in general, as long as nothing is transcluded. I.e. that is the first source of "error", in the sense that the API simply has no annotations for the transcluded content. (It could be added of course, but that would mean a couple more steps, including processing all templates. In practice I would rather add some nice-looking HTML in the frontend that says something along the lines of "could not color this template".)
Jan 7 2019
Hi, nice to see this is getting traction again, a proper browser extension/better interface/cleaner highlighting would be great!
Oct 9 2018
Of course checksums make a lot of sense for countless use cases, including many in research (the mentioned paper was never intended to make a sweeping point to the contrary, but yes, a discussion for another time).
And I think MCR is awesome, JFTR.
Apr 21 2018
Hi, I can only comment on how we implemented it for api.wikiwho.net, but these are good points in general as well:
- revisions/"when" vs. "who" : Regarding the actual implementation, you get the rev-ids of the origin and change revisions for a token and then fetch the meta-info for that revision in a second step, such as the editor and timestamp. So like you say, it is not "who" in the first instance, that is just derived.
- whitespace: we split the text into tokens at the whitespace (and other special chars), so you would not attribute changes where someone just adds/removes whitespace (i.e., whitespaces are not tokens) without altering other text pieces. Alas, if someone splits a word into two via a whitespace (or, conversely, concatenates two words), we would attribute the "new" tokens to that editor. If we talk about "formatting" in a wiki-markup sense, one would probably have to ex-post filter those changes that touched "cosmetic" markup like section headers or hrules, which is doable, but more tricky. Or simply run the whole thing in parallel on the parsed, front-end text with formatting ignored.
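
The whitespace point can be sketched in a few lines of Python. This is only an illustration under assumptions: the splitting rule below (word runs are tokens, every other non-whitespace character is its own token) and the bag-of-tokens comparison are simplifications chosen for the example, not the actual WikiWho tokenizer or attribution algorithm.

```python
import re
from collections import Counter

# Assumed splitting rule: runs of word characters are tokens, every other
# non-whitespace character is a token of its own, whitespace is dropped.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    return TOKEN_PATTERN.findall(text)


def added_tokens(old_text, new_text):
    """Tokens present in the new text but not in the old one (as a multiset)."""
    old = Counter(tokenize(old_text))
    new = Counter(tokenize(new_text))
    return sorted((new - old).elements())


# A pure whitespace edit yields no new tokens, so nothing is attributed.
print(added_tokens("some text", "some   text\n"))

# Splitting a word creates "new" tokens that go to the splitting editor.
print(added_tokens("wikiwho rocks", "wiki who rocks"))
```

The first call prints an empty list, the second one prints the two tokens that result from the split.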
Jan 17 2018
Can you tell us why not all languages can be added to the service? Technically speaking, why is the tool dependent on which language it is being used for?
Jan 13 2018
Hi, as an author of WikiWho/ WhoColor:
- Great that this is being picked up, I would be happy to be of assistance
- "Note that unlike bisecting, blaming - essentially, content persistence - is a complex problem, which has been the topic of several research projects in the past. Trying to write a new tool from scratch is probably not a good idea" --> yup, and we have evaluated WikiWho in that regard, showing high accuracy especially also for longer, more complex revision histories, although only for English so far (see the paper)
- Regarding speed: we are processing the EventStreams of several languages on the fly, that is not an issue. We just don't have any caching layer for the materialized json yet, but that is on the to do list. For the mid-term future (2-3 years), the upkeep and further development of the service is secured at GESIS (my employer) and also the extension to more languages (although maybe not all). But for the long term I also think hosting it at the WMF might make more sense.
Aug 2 2017
Seems to work - thanks :)
Jul 31 2017
any news on this?
Mar 30 2017
so, is there a solution? sounds like a quick fix would be replacing the name in the paws record. I would also be fine with my user being deleted from paws and to then do a new first login (I don't have anything important on there yet).
Mar 29 2017
Mar 2 2016
So I'm not certain how "demand" is usually measured, but based on discussions I witnessed, and my understanding of the challenges Wikipedia faces regarding quality, I would concur with @Qgil that content curation and bringing in more eyes to spot errors is something that is certainly helpful for the editor community.
Nov 3 2015
Oct 4 2015
hi, what i certainly can do is to provide mentoring needed regarding the implementation/extension of wikiwho in this setting and some general feedback
Jun 4 2015
hi, I just now became aware of this thread. Several things: (i) a collaborator just recently updated wikiwho to run with the newest version of the wikimedia-utilities and python3 (see https://github.com/maribelacosta/wikiwho/tree/python3 ). (ii) Be aware that wikiwho is the *only* solution (apart from Luca de Alfaro's A3 algo that we evaluated as well) that was soundly tested for the accuracy of the provenance attribution and that the solution is not trivial in many instances. Or I'm not aware of accuracy testing that might have been done so far for other approaches. This can be critical if used in a real editing scenario by an end-user. (iii) Not so related but maybe helpful: our API is already giving authorship information and you can use that too; example: wikiwho.net/wikiwho/wikiwho_api_api.py?revid=649876382&name=Laura_Bush&format=json&params=author. (iv) if anyone needs input or has feature requests we are happy to respond to them. it's just that the whole phabricator line of communication was not on my radar, but I will have an eye on it. Else, drop me a talk page message or an email to email@example.com or at github