RFC: Ditch wddx, dump, yaml, dbg, txt API formats
Closed, Resolved · Public

Description

https://www.mediawiki.org/wiki/Requests_for_comment/Ditch_crappy_API_formats
Accepted on November 12, 2014.

MaxSem created this task. Apr 10 2015, 4:22 PM
MaxSem updated the task description.
MaxSem raised the priority of this task to Normal.
MaxSem claimed this task.
MaxSem added a project: TechCom-RfC.
MaxSem added a subscriber: MaxSem.
Restricted Application added a subscriber: Aklapper. Apr 10 2015, 4:22 PM

Well, there's plenty of time to fix it ;)

@MaxSem: Please use descriptive task summaries.

Aklapper renamed this task from RFC: ditch crappy API formats to RFC: Ditch wddx, dump, yaml, dbg, txt API formats. Apr 10 2015, 7:44 PM
Aklapper set Security to None.
Anomie moved this task from Unsorted to Non-Code on the MediaWiki-API board.
daniel moved this task from Inbox to Approved on the TechCom-RfC board. Apr 15 2015, 8:55 PM
Legoktm closed this task as Resolved. Edited · Jul 2 2015, 4:44 AM
Legoktm added a subscriber: Legoktm.
Legoktm reopened this task as Open. Jul 2 2015, 4:45 AM

Uhh, forgot about yaml, dbg and txt.

Legoktm updated the task description. Jul 2 2015, 4:46 AM

We briefly discussed this in the archcom today. The second step of execution here was scheduled for 12 November 2015, which is tomorrow. There doesn't seem to be a commit yet for this, is there?

I would recommend we announce this to the relevant mailing lists as a reminder, and let it ride the deployment train over the course of next week. For third-party wikis, it will be released as part of MediaWiki 1.27.0.

Some numbers for reference:

https://logstash.wikimedia.org/#/dashboard/elasticsearch/mediawiki channel:api-feature-usage message:format

Last 7 days:

Term             Count
format=txt       267564
format=dbg       138104
format=txtfm     56095
format=yaml      21914
format=dbgfm     19
format=yamlfm    13
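
For a rough sense of scale, the counts above can be tallied with a few lines of Python. This is only a sketch: the `counts` dict is copied by hand from the table, and the names `counts`, `total` and `share` are made up here, not anything exposed by the logstash dashboard.

```python
# Seven-day hit counts, copied by hand from the table above.
counts = {
    "txt": 267564,
    "dbg": 138104,
    "txtfm": 56095,
    "yaml": 21914,
    "dbgfm": 19,
    "yamlfm": 13,
}

# Total hits across all of the deprecated formats listed.
total = sum(counts.values())


def share(fmt: str) -> float:
    """Return the fraction of deprecated-format hits that used `fmt`."""
    return counts[fmt] / total


if __name__ == "__main__":
    print(f"total hits: {total}")
    # Largest consumers first.
    for fmt, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        print(f"format={fmt:<7} {n:>7}  {share(fmt):6.1%}")
```

Plain txt alone accounts for more than half of the hits, which is why most of the analysis effort goes there.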

For context, here is the overall top 10:

https://logstash.wikimedia.org/#/dashboard/elasticsearch/mediawiki channel:api-feature-usage

Top 10 last 7 days:

Term                                            Count
prop=langlinks&llurl                            9318987
action=query&prop=revisions+base&generatexml    1807708
action=query&list=deletedrevs                   1142690
unclear-"now"-timestamp                         559538
action=tokens                                   477398
action=expandtemplates&!prop                    333121
format=txt                                      267581
action=search&srprop=score                      170086
format=dbg                                      138072
action=parse&generatexml                        75473
Anomie added a subscriber: Anomie. Nov 12 2015, 3:09 AM

https://logstash.wikimedia.org/#/dashboard/elasticsearch/api-feature-usage might be slightly more useful, although in the end it's all the same data.

TL;DR: the hits are mostly from bare IPs, with little opportunity to contact whoever is using these formats. There are maybe 4 cases where we have enough information that contact could even be possible.

Some analysis:

format=txt 267564

The top 100 is almost entirely IPs; I see one human user (who has tons of user scripts in their user .js) and one logged-in bot.

43% is coming from one IP with a generic agent, fetching extracts and pageinfo for seemingly-random articles.

10% is coming from various IPv6s that seem to belong to Facebook (they share a prefix and all include ":face:b00c:", and spot checking whois is consistent), generic agent exporting pages by pageid.

Another 9% is from one IP with a browser-like agent (probably fake), apparently fetching section 0 for US cities.

Another 8% is posts from an IP with an actually useful agent, best guess is it's a backend loading data for a phone app that matches the agent.
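
The ":face:b00c:" spot check mentioned above could be scripted roughly as follows. This is a sketch under assumptions: the function name `looks_like_face_b00c` is made up, the sample address in the usage note uses the 2001:db8::/32 documentation prefix rather than a real client address, and matching on the two groups is only a heuristic that does not replace the whois check.

```python
import ipaddress


def looks_like_face_b00c(addr: str) -> bool:
    """Heuristic: True if `addr` is an IPv6 address containing the
    adjacent 16-bit groups face:b00c."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Not a parseable IP address at all.
        return False
    if ip.version != 6:
        return False
    # `exploded` expands "::" and pads every group to four lowercase hex
    # digits, so the comparison does not depend on how the log wrote it.
    groups = ip.exploded.split(":")
    return any(a == "face" and b == "b00c" for a, b in zip(groups, groups[1:]))
```

For example, `looks_like_face_b00c("2001:db8:0:1:face:b00c:0:1")` is true, while an IPv4 address or an IPv6 address without those groups is not matched.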

format=dbg 138104

99% is from one IP fetching page content with various agents, many from common web spiders. Almost certainly a live mirror of some sort.

format=txtfm 56095

96% from one IP with a generic agent, that seems to be fetching the top-revision timestamps for biographies on one wiki.

format=yaml 21914

82% is requests with a "contact@" email address as the agent, at a domain that seems to be a brand monitoring/management company. Queries look like a strange way of getting HTML for various logos.

Another 9% has an agent attributing it to a particular bot. Actual queries seem to be just parsing the same page every 5 minutes.

format=dbgfm 19
format=yamlfm 13

So low it's not worth caring about.

> We briefly discussed this in the archcom today. The second step of execution here was scheduled for 12 November 2015, which is tomorrow. There doesn't seem to be a commit yet for this, is there?

Not that I know of. I'll make one.

> I would recommend we announce this to the relevant mailing lists as a reminder, and let it ride the deployment train over the course of next week. For third-party wikis, it will be released as part of MediaWiki 1.27.0.

I'd rather give slightly more notice on the reminder: let's let it ride the train for 1.27.0-wmf.8 rather than 1.27.0-wmf.7.

Change 252742 had a related patch set uploaded (by Anomie):
Stop testing deprecated API formats

https://gerrit.wikimedia.org/r/252742

Change 252743 had a related patch set uploaded (by Anomie):
API: Remove dbg, txt, and yaml formats

https://gerrit.wikimedia.org/r/252743

Change 252742 merged by jenkins-bot:
Stop testing deprecated API formats

https://gerrit.wikimedia.org/r/252742

Change 252743 merged by jenkins-bot:
API: Remove dbg, txt, and yaml formats

https://gerrit.wikimedia.org/r/252743
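
For anyone still hitting the removed formats, the request side of the fix is usually just a different `format` value. Below is a minimal client-side sketch, not anything from the patches above: `switch_to_json` and `REMOVED_FORMATS` are made-up names, example.org is a placeholder host, and it assumes `json`/`jsonfm` is an acceptable replacement for the client. Whatever parses the response still has to be changed to read JSON.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Formats named in this task's title; the "fm" (pretty-printed HTML)
# variants are handled by stripping the suffix below.
REMOVED_FORMATS = {"wddx", "dump", "yaml", "dbg", "txt"}


def switch_to_json(url: str) -> str:
    """Rewrite a removed `format=` value in an api.php URL to json/jsonfm.

    Other parameters and non-removed formats are left as they are,
    apart from being re-encoded by urlencode().
    """
    parts = urlsplit(url)
    rewritten = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "format":
            is_fm = value.endswith("fm")
            base = value[:-2] if is_fm else value
            if base in REMOVED_FORMATS:
                value = "jsonfm" if is_fm else "json"
        rewritten.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(rewritten)))
```

Usage: `switch_to_json("https://example.org/w/api.php?action=query&format=yaml&titles=Foo")` gives the same URL with `format=json`.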

Legoktm updated the task description. Nov 18 2015, 5:51 PM
Legoktm closed this task as Resolved.
Krinkle moved this task from Approved to Implemented on the TechCom-RfC board. Feb 10 2016, 9:33 PM
Ricordisamoa awarded a token.
Ricordisamoa removed a subscriber: gerritbot.