
Tag all edits by editor used (WikiEditor, API, etc), not just VE
Open, Needs Triage, Public

Description

Having voluntary/manual tags such as envisioned in T185382: Add manual tag- messages to Wikimedia Messages for common tools/gadgets would be nice, but could we please have server-side tags for edits made with the API? This could be handy for identifying new tools, unflagged bots, etc. (Maybe even for finding some promising volunteer devs. :-)

My goal is: When I go to Special:RecentChanges, I want to be able to see which edits were not made through the traditional methods of a human clicking an edit button, typing something, and saving the changes. And I want this to be the case even if – perhaps especially if – the person using the API would rather that editors didn't notice that he's running an unauthorized spam bot, and therefore would not be willing to add a manual tag.

Event Timeline

Restricted Application added a subscriber: Aklapper. Feb 27 2018, 8:12 PM

The API can also be used from the browser session by gadgets or user scripts.
I would say that humans will not be happy to have these edits marked as automated.

Using server-side tags for new features like the new wikitext editor seems okay to me, but not for splitting edits into two classes.

Reedy added a subscriber: Reedy. Edited Feb 27 2018, 8:45 PM

I'm guessing we don't want this to be a MediaWiki core "feature", but something we could do with a hook in CommonSettings.php.

We did something similar for HHVM edits, but I don't know where that was implemented. It did result in a lot of backlash, such as T75181 and many of the comments on https://www.mediawiki.org/wiki/Talk:HHVM

Legoktm closed this task as Declined. Feb 27 2018, 8:51 PM
Legoktm added a subscriber: Legoktm.

This would be overkill, and the proportion of edits made via the API would vastly outnumber index.php edits to the point it would be useless. For example nearly every edit made on Wikidata is via the API. Most edits on Commons are through gadgets and scripts that use the API.

Anomie added a subscriber: Anomie. Feb 27 2018, 9:22 PM

It might also make the change_tag database table excessively large, particularly before T185355: Normalize change tag schema gets done.

Also note that every edit made with VE goes through the API, as do most Flow posts, although neither goes through the action=edit action as far as I know. But WikiLove for example actually does go through action=edit internally.

And if spambot runners found that they were being caught by this (and cared), they could turn to screen-scraping the web UI edit form. Some may be doing that anyway, if they're old or if they're not MediaWiki-specific.

@Legoktm, if you think that tagging "non-traditional" edits would tag too many edits, then why don't we turn this around and tag the "normal" ones? Both modes of VisualEditor are already being tagged, so it wouldn't be that much of an expansion. The two modes of VisualEditor (the visual mode and the 2017 wikitext mode) appear to account for more than half of non-automated, non-script-based mainspace edits on most of the larger Wikipedias.

@Anomie, https://en.wikipedia.org/wiki/User:Whatamidoing_(WMF)/sandbox?action=edit opens VisualEditor's 2017 wikitext mode for me. What that link does depends upon your preference settings. (I don't mind having this blocked by T185355, if that makes the sites less likely to break. I also wouldn't mind turning it off at Wikidata.)

In relation to https://www.mediawiki.org/wiki/Contributors/Projects/Editing_performance I'm looking at some numbers that indicate that in some larger Wikipedias, more than half of the edits aren't using any of the regular editing environments. There is no "speed for opening the wikitext editing window" in Twinkle, so we have to drop those edits. But experienced editors have no way to see, for themselves, just how many edits are being made without using scripts like this. They shouldn't have to rely on someone with database access to find out how edits at their wikis are being made. They also need to know just how heavily their communities depend on those tools.

In any given sample of Special:RecentChanges, there is no way for normal editors to see what's "traditional" and what's "script-based". If you are a normal, non-technical editor, you are going to see "stuff that's tagged for VE" and "stuff that's not tagged". If you want to know how many of those edits are script-based, you have to have a friend with database access. That's not fair, it's not transparent, and it's not providing equal information to everyone.

Real example: An editor at dewiki has made more than 2,000 edits so far today. This one editor has made about a quarter of all of the mainspace edits in the last two hours. There are no tags or other information on those edits about the script being used. There is no way for a normal editor to discover that these edits are script-based at all, by looking at the edits. The fact that he's using a script should be available, transparently, to everyone. It should not be something that only people with access to special knowledge/knowledgeable people can find out.
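For a sense of what a normal editor can check today: the Action API's `list=recentchanges` module accepts `rcprop=tags`, so tallying tags over a sample of recent changes shows how much is tagged for VisualEditor and how much carries no editor tag at all. This is a minimal sketch. It assumes the JSON reply has already been fetched and parsed. The sample data is invented, and treating `visualeditor` and `visualeditor-wikitext` as the VisualEditor tag names is an assumption.

```python
from collections import Counter

# Tags treated as "editor" tags in this sketch (assumed names for
# VisualEditor's visual mode and its 2017 wikitext mode).
EDITOR_TAGS = {"visualeditor", "visualeditor-wikitext"}


def tally_editor_tags(response: dict) -> Counter:
    """Count recent changes by editor tag; everything else is 'untagged'.

    `response` is a parsed reply from
    api.php?action=query&list=recentchanges&rcprop=tags&format=json
    """
    counts = Counter()
    for change in response["query"]["recentchanges"]:
        editor_tags = EDITOR_TAGS.intersection(change.get("tags", []))
        if editor_tags:
            counts.update(editor_tags)
        else:
            counts["untagged"] += 1
    return counts


# Invented sample in the shape of a recentchanges reply.
sample = {
    "query": {
        "recentchanges": [
            {"title": "A", "tags": ["visualeditor"]},
            {"title": "B", "tags": []},
            {"title": "C", "tags": ["visualeditor-wikitext"]},
            {"title": "D", "tags": ["mobile edit"]},
        ]
    }
}
print(tally_editor_tags(sample))
```

Edit D lands in "untagged" even though it has a tag. That is the ambiguity being described: "untagged" lumps together WikiEditor, API-based tools, and everything else.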

Real example: An editor at dewiki has made more than 2,000 edits so far today. This one editor has made about a quarter of all of the mainspace edits in the last two hours. There are no tags or other information on those edits about the script being used. There is no way for a normal editor to discover that these edits are script-based at all, by looking at the edits. The fact that he's using a script should be available, transparently, to everyone. It should not be something that only people with access to special knowledge/knowledgeable people can find out.

How is an "API" tag, that would be present on a huge proportion of all edits, a useful answer to that? What the community probably actually wants to know, assuming they care in the first place, is what script is being used rather than just "OMGAPI!".

Reedy added a comment. Feb 28 2018, 8:31 PM

Can someone pull some stats from the API? What user-agents etc are making the most edits?

Then we can contact the authors/maintainers and look at getting them to add tags?

(Yes, I know I still need to make a release of AWB with the tagging enabled)
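For reference, a tool opts in to tagging by passing the `tags` parameter to `action=edit` in the Action API. The tag must already be defined and active on the wiki (listed at Special:Tags) before the API will accept it. Below is a minimal sketch of building such a request. It assumes a CSRF token has already been obtained. The helper name `build_tagged_edit` and the tag name `ExampleTool` are hypothetical.

```python
def build_tagged_edit(title: str, text: str, summary: str,
                      tags: list[str], csrf_token: str) -> dict:
    """Build POST parameters for api.php?action=edit that apply change tags.

    Multiple tags are pipe-separated, like other multi-value
    Action API parameters.
    """
    return {
        "action": "edit",
        "title": title,
        "text": text,
        "summary": summary,
        "tags": "|".join(tags),
        "token": csrf_token,
        "format": "json",
    }


params = build_tagged_edit(
    title="Sandbox",
    text="Hello",
    summary="test edit",
    tags=["ExampleTool"],  # hypothetical tag; must be defined and active on the wiki
    csrf_token="<token from action=query&meta=tokens>",
)
```

The resulting dict would then be POSTed to the wiki's api.php from a logged-in session.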

Reedy added a comment. Feb 28 2018, 8:56 PM


Actually, I should fork that into a separate task.
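Once someone with access has the request data, Reedy's question about which user agents make the most edits reduces to a simple aggregation. This sketch assumes hypothetical records that pair each API edit with its User-Agent header. The field names and sample values are invented for illustration.

```python
from collections import Counter


def top_user_agents(edit_records: list[dict], n: int = 3) -> list[tuple[str, int]]:
    """Return the n User-Agent strings responsible for the most API edits.

    Each record is assumed (hypothetically) to look like
    {"user_agent": "...", "user": "..."}; a missing or empty
    User-Agent is counted as "(none)".
    """
    counts = Counter(rec.get("user_agent") or "(none)" for rec in edit_records)
    return counts.most_common(n)


records = [
    {"user_agent": "ToolA/1.0", "user": "Alice"},
    {"user_agent": "ToolA/1.0", "user": "Bob"},
    {"user_agent": "ToolB/2.3", "user": "Carol"},
    {"user_agent": "", "user": "Dave"},
]
print(top_user_agents(records, n=2))
```

The top of that list would identify the tool maintainers to contact about adding a tag.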

How is an "API" tag, that would be present on a huge proportion of all edits, a useful answer to that? What the community probably actually wants to know, assuming they care in the first place, is what script is being used rather than just "OMGAPI!".

Identifying which script would be lovely for the major scripts, but probably not as useful as just "yes, it's a script" for others. For example, I use https://en.wikipedia.org/wiki/User:Kephir/gadgets/rater.js in my real-wiki-life. It probably accounts for ~1% of all (article-associated) talk page edits at enwiki, and approximately 0% of edits on any other wiki. I think there's value to other editors in knowing that these aren't normal edits – even if I fork the script and remove the edit summary. I think there's no need to have a separate entry in Special:Tags for a smaller tool.

Part of the value is knowing that there is more to maintaining the wikis than manual editing. The mere fact that millions of edits happen without the editor ever seeing a normal edit window is precisely what some core community members need to be learning.

At the risk of an unpalatable analogy: When you try to divide up money for medical research, rare diseases never look like they're worth any investment. Why would you spend any money on something that affects 0.001% of people? The answer is: Because there are so many of them (about seven thousand, but it depends how you count). About 10% of Americans have a rare disease, so a fair division would mean that the US government should spend 10% of its research effort on rare diseases.

I'm basically looking for a method that doesn't overlook or exclude these "rare" editors. Authorized bots (outside of Wikidata) are already flagged. Popular scripts such as AWB, Twinkle, and HotCat can have their own big-script tags. What can you do to identify that the rest of our tools exist?

Anomie added a comment. Mar 1 2018, 8:22 PM

I think there's value to other editors in knowing that these aren't normal edits

Do other editors think there's value in it? Or might they think this tag is clutter like the "hhvm" tag was?

Part of the value is knowing that there is more to maintaining the wikis than manual editing. The mere fact that millions of edits happen without the editor ever seeing a normal edit window is precisely what some core community members need to be learning.

We don't need a tag to tell us that millions of edits happen via the API. We already know that qualitatively, and Analytics probably has better tools to report quantitative numbers.

I'm basically looking for a method that doesn't overlook or exclude these "rare" editors. Authorized bots (outside of Wikidata) are already flagged. Popular scripts such as AWB, Twinkle, and HotCat can have their own big-script tags. What can you do to identify that the rest of our tools exist?

To really show that "rare editors" exist, you'd either want a tag that says "api edit not by a flagged bot or popular script" or you'd need a tool that takes the "all API edits" firehose and filters out the bots and popular scripts. I wonder whether Analytics could do that better too by filtering on the user agent.

Yes, I note that neither of those will show which specific edits were made via the API, but that's not one of the use cases suggested in T188433#4012411.
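The second option above can be sketched as a filter over the "all API edits" firehose that drops flagged bots and edits carrying a popular-script tag, then counts what remains per user. The record fields, the set of "popular script" tags, and the sample data are assumptions for illustration, not an existing Analytics schema. This sketch filters on the bot flag and tags; filtering on the user agent, as suggested, would follow the same pattern.

```python
from collections import Counter

# Hypothetical set of tags applied by well-known tools.
POPULAR_SCRIPT_TAGS = {"AWB", "twinkle", "HotCat"}


def rare_api_edits(edits: list[dict]) -> Counter:
    """Count API edits per user, skipping flagged bots and popular-script tags.

    Each edit is assumed to look like
    {"user": str, "via_api": bool, "is_bot": bool, "tags": list[str]}.
    """
    remaining = Counter()
    for edit in edits:
        if not edit["via_api"] or edit["is_bot"]:
            continue  # not an API edit, or already identifiable as a bot
        if POPULAR_SCRIPT_TAGS.intersection(edit["tags"]):
            continue  # already identifiable by its tool tag
        remaining[edit["user"]] += 1
    return remaining


edits = [
    {"user": "Alice", "via_api": True, "is_bot": False, "tags": ["twinkle"]},
    {"user": "ExampleBot", "via_api": True, "is_bot": True, "tags": []},
    {"user": "Bob", "via_api": True, "is_bot": False, "tags": []},
    {"user": "Bob", "via_api": True, "is_bot": False, "tags": []},
    {"user": "Carol", "via_api": False, "is_bot": False, "tags": []},
]
print(rare_api_edits(edits))
```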

I remember the HHVM thing, and I remember it being rather – small potatoes. A couple of people wondered where it came from, were told, and were satisfied. One or two wanted better (i.e., intelligible to non-technical people) documentation. And that was about it.

We don't need a tag to tell us that millions of edits happen via the API. We already know that...

That's the problem: You know this. I know this. Normal editors do not know this. There is absolutely no way for a normal editor to figure out that an innocent-looking edit is an unflagged bot. There is no way for a normal editor to figure out that I edited the talk page ratings but never saw the page, and therefore could never have read any of the comments on it.

The label could say something like "API" or "script", and link to a page that says "BTW, this excludes anything tagged with the following". But it could also include everything that doesn't seem to be a manual edit, and just double-tag things that happen to be both AWB (a voluntary, manual tag) and script-based (an involuntary, server-based tag).

Anomie added a comment. Mar 2 2018, 6:58 PM

We don't need a tag to tell us that millions of edits happen via the API. We already know that...

That's the problem: You know this. I know this. Normal editors do not know this.

True. Although I'm not sure spamming a tag is a good way to tell them. Getting Analytics to put together useful stats and presenting them in a blog post that you then publicize (to mailing lists, on Facebook, in the Tech News, and/or in pages like enwiki's Signpost or dewiki's Kurier) might work better.

There is absolutely no way for a normal editor to figure out that an innocent-looking edit is an unflagged bot.

Also true. And also very different from telling normal editors that there are lots of API edits.

And an "API" tag isn't going to tell the normal editor that it's an unflagged bot, versus an edit by a human using a script or tool, versus an edit by a flagged bot that didn't apply the "bot" flag (unless they recognize the username, or look up the groups the user is in).

There is no way for a normal editor to figure out that I edited the talk page ratings but never saw the page, and therefore could never have read any of the comments on it.

An "API" tag wouldn't tell them whether you did or did not see the page. In fact, it's pretty likely that a script for editing talk page ratings via the API was specifically created to allow people to edit them while viewing the article instead of having to open the talk page.

Nor would the absence of an "API" tag, for that matter. You might have gone directly to the talk page and edited the ratings without ever looking at the article.

Tgr added a subscriber: Tgr. Mar 5 2018, 11:30 PM

And if spambot runners found that they were being caught by this (and cared), they could turn to screen-scraping the web UI edit form. Some may be doing that anyway, if they're old or if they're not MediaWiki-specific.

Haven't looked at edits so far but on registration pretty much all spambots use the web UI (I'd guess because spambots tend to be generic and it's a lot easier to write code for handling a wide range of web forms than a wide range of APIs).

We don't need a tag to tell us that millions of edits happen via the API. We already know that...

That's the problem: You know this. I know this. Normal editors do not know this.

True. Although I'm not sure spamming a tag is a good way to tell them. Getting Analytics to put together useful stats and presenting them in a blog post that you then publicize (to mailing lists, on Facebook, in the Tech News, and/or in pages like enwiki's Signpost or dewiki's Kurier) might work better.

That sounds like a nice little project, but a one-time blog post doesn't provide ongoing education. It doesn't help the newbie who joins the week/month/year/decade after the blog post. It doesn't help the ~99% of editors who don't read any mailing lists (a group that usually includes me), aren't on Facebook (me again), don't read Tech News (probably 99% of users, unfortunately), and don't read enwiki's Signpost (Number of page views in the last month: 6400. Number of active users at enwiki in the last month: 140,000).

Special:RecentChanges probably gets a million times more page views (across all the projects) than any blog post, and it will do so approximately forever. Information about what is going on with RecentChanges should be in RecentChanges (or linked to it).

Neil_P._Quinn_WMF added a comment. Edited Mar 6 2018, 7:52 AM

We actually have a very strong need for better edit tagging for analysis purposes. For example, right now we can't effectively distinguish between edits made with the 2010 wikitext editor and edits made with non-bot editing tools which use the API. This is a problem, because when evaluating the need for and effects of interface changes, we obviously want to exclude edits made from interfaces which we don't control.

@Whatamidoing-WMF, this analysis-focused need seems pretty different from the editing-focused need you're discussing here: I would be satisfied by data that's accessible to analysts but invisible to average editors, so I'm planning to create a separate task for my request. Does that make sense to you?

That's fine, Neil. I'll be interested in seeing your (cleaned-up, vetted, and representative) data when you get it.

Esanders reopened this task as Open. Apr 2 2018, 9:47 PM
Esanders added a subscriber: Esanders.

Lots of interesting discussion still going on.

I agree with @Whatamidoing-WMF that there is a big visibility difference between recent changes, and blogs/newsletters. It seems odd that we tag VE, but not WikiEditor, or any API-based editors. If it is undesirable to have tags on every edit in RC, then perhaps we can hide all editor tags by default, and have an option to show them? At the moment, because we are only tagging VE, a lot of users will assume that the 90% of untagged edits are all WikiEditor.

As for users using WikiEditor in a bot-like way (e.g. with a user script), we could probably detect 99% of cases by looking for a lack of real input events.
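The heuristic above (a save with no real input events is probably scripted) can be expressed as a small check. This is only a sketch. It assumes the client reports hypothetical counts of keyboard and paste events alongside the save, which, as far as I know, MediaWiki does not send today. Any real version would also need the privacy review raised later in this discussion.

```python
from dataclasses import dataclass


@dataclass
class SaveSignals:
    """Hypothetical client-side counters submitted with a save."""
    key_events: int
    paste_events: int
    chars_changed: int


def looks_scripted(signals: SaveSignals) -> bool:
    """Flag a save that changed text without any real user input."""
    had_input = signals.key_events > 0 or signals.paste_events > 0
    return signals.chars_changed > 0 and not had_input


print(looks_scripted(SaveSignals(key_events=0, paste_events=0, chars_changed=120)))
print(looks_scripted(SaveSignals(key_events=35, paste_events=0, chars_changed=30)))
```

A user script that fills the textarea and submits the form would produce the first case. A person typing would produce the second.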

Esanders renamed this task from Please tag all edits made via the API (on the server side) to Tag all edits by editor used (WikiEditor, API, etc), not just VE. Apr 2 2018, 9:48 PM

@Esanders renamed this task from Please tag all edits made via the API (on the server side) to Tag all edits by editor used (WikiEditor, API, etc), not just VE.

The API is not an "editor".

If you want to tag edits made by "WikiEditor", you can do that without spamming an "OMGAPI" tag on every API edit.

As I said earlier, I see no point in trying to tag "API" edits when that could mean VE, WikiLove, AWB, Huggle, Twinkle, various pywikibot-using tools, or a million other things.

Anomie moved this task from Unsorted to Blocked on the MediaWiki-API board. Apr 3 2018, 2:51 PM

@Anomie, is there a better/more practical way to differentiate manual and script-based edits?

Anomie added a comment. Apr 4 2018, 5:17 PM

The goalposts seem to keep moving here.

An "API" tag would not differentiate bots from automated scripts from manual edits that happen to be submitted via a script or tool. So if that's your goal now, the "API" tag you propose won't solve it in the first place.

Nor, for that matter, would tags for specific tools in all cases. AWB, for example, can be, and is, used to make fully automatic bot edits, semi-automated script edits where a human reviews each change before saving, and fully manual edits where AWB simply loads pages in succession and maybe applies some general fixes.

Whatamidoing-WMF added a comment. Edited Apr 9 2018, 3:13 PM

Anomie, I don't feel like the goalposts have moved at all. I thought I had explained them pretty clearly in the second paragraph, which begins "My goal is:". In particular, "the traditional methods of a human clicking an edit button, typing something, and saving the changes" is what I'm calling a "manual" edit, and anything that doesn't use that method is what I'm calling a "script-based" edit.

I apologize if my definitions don't line up precisely with the way you use those words. If the stated goal is still unclear to you, then please let me know, and I'll try to explain it in more detail.

Anomie added a comment. Apr 9 2018, 6:20 PM

There have been several goals proposed in the discussion of this task, and none seem to be well-satisfied by the request to tag every API edit.

  • From the description: "This could be handy for" a bunch of vague things.
    • Vague maybe use cases are hard to evaluate. These seem to have been largely ignored.
  • From the description: "I want to be able to see which edits were not made through the traditional methods of a human clicking an edit button, typing something, and saving the changes."
    • Depending on how you interpret "edit button", this might or might not include HotCat, the WikiLove extension, editing of Wikidata, Flow, various other user scripts and gadgets, the mobile apps, and applications like AWB. None use the standard "edit" tab on the web UI to open an editor, but all have some sort of button you click to start editing, then you might type something, and then you save.
    • Regardless, tagging "all edits made via the API" would include VE, and wouldn't include some spammer's bot that uses the EditPage form.
  • T188433#4011662 drifted into "people on-wiki might want to know how edits are being made".
  • T188433#4012411 then drifted that into "people need to be shown how many edits are made via the API".
    • Spamming a tag seems like an annoying way to do that. Proper analytics are more likely to give useful numbers than qualitatively seeing "lots of edits seem to have this OMGAPI tag".
  • T188433#4018516 brought up "seeing if something might be an unflagged bot" as a possible use case.
    • But many other things use the API, so that would be a very noisy channel.
  • T188433#4018516 also mentioned "editing the talk page without seeing the article".
    • Which is totally confused as a justification for tagging API edits, since someone can easily go straight to the talk page with a "manual" edit or use a script in a context that shows the original page.
  • T188433#4098701 turned around and asked that normal EditPage edits be tagged so people wouldn't assume every untagged edit was via that editor, in addition to tagging all API edits.
    • Tagging normal EditPage edits isn't a bad idea, if someone is interested in tracking that. But the API isn't an "editor", it's a thing many different editors use to save their edits.
    • The sort of behavior tracking necessary to differentiate "human using EditPage" and "spambot using EditPage" might be problematic from a privacy perspective.
  • T188433#4102689 jumped back to trying to tell the difference between "manual" and "script-based" edits.
    • We've come full circle, I guess. And we're back to the same points as in bullet #2.

I think this response from early in the discussion is worth highlighting:

@Legoktm, if you think that tagging "non-traditional" edits would tag too many edits, then why don't we turn this around and tag the "normal" ones?

If we remove "API" from the current title of this task, then it fits pretty well with that suggestion. Perhaps we should finally drop the idea of trying to generically tag all API edits in favor of doing that.