
Proposal: add "action=readraw" to API since the non-API call "action=raw" is dangerous
Closed, Invalid · Public

Description

BACKGROUND

There is an "API" call to save edits (" http://mediawiki.org/wiki/API:Edit " "action=edit") but there does not seem to be any "API" call to read wikitext. Apparently the only way to do so is the "non-API" call "action=raw". This has some minor and some major problems.

MAJOR PROBLEMS

Recently (ca 2019-Nov-20) the Wikimedia servers began to aggressively deprecate agents with "poor crypto". This apparently affects ALL "API" and "non-API" calls. This is what I sometimes get instead of a login token or raw wikitext:

<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8">
<title>Browser Connection Security Issues</title>

...

<a href="https://www.wikimedia.org"><img src="https://www.wikimed
ia.org/static/images/wmf-logo.png" srcset="https://www.wikimedia.org/stat
ic/images/wmf-logo-2x.png 2x" alt="Wikimedia" width="135" height="101">
</a>
<h1>Your Browser's Connection Security is Outdated</h1>
<div class="content-text">
<hr>
<p lang="en" dir="ltr"><strong>English:</strong> Wikipedia is making the
site more secure. You are using an old web browser that will not be
able to connect to Wikipedia in the future. Please update your
device or contact your IT administrator.</p>

The probability of getting this ugly hint seems to be ca 5%. If it arrives instead of a login token, the bot fails to log in. A sane bot will then abort instead of trying to edit (mine does). If it arrives instead of raw wikitext, the problem is worse: a bot could take the "crypto whining" for valid wikitext, perform some edits on it and submit the result, with a ca 95% chance to "succeed" and thus accidentally vandalise the page. I wonder how many wiki pages have been vandalised due to this bad strategy to make public wikis "more secure".

Fortunately the "crypto whining" contains neither trailing spaces nor surplus blank lines (which my bot loves to remove), nor anything that resembles template calls (which I was fixing), and my bot is smart enough to refrain from saving if the patching results in "no changes needed". I have not yet discovered a single page vandalised by my bot. The result is ca 95% "done" (good) and ca 5% "no changes needed" (bad, not done, but at least no vandalism). Still, this is inherently dangerous and IMHO bad design.
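Until something better exists, a bot can at least refuse to treat such a reply as wikitext. The following is only a minimal sketch of that guard. The function name is hypothetical, and it assumes that a genuine "action=raw" reply is labelled with a non-HTML content type (such as text/x-wiki) while the security notice is served as HTML. The doctype check is a heuristic and would also reject a page whose wikitext really starts with an HTML doctype.

```python
def is_plausible_raw_wikitext(status: int, content_type: str, body: str) -> bool:
    """Heuristic guard: False when the reply looks like an HTML notice."""
    # Anything other than a plain success is not usable wikitext.
    if status != 200:
        return False
    # Assumption: genuine raw wikitext is never labelled text/html.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return False
    # Fallback in case the reply is mislabelled: look for an HTML doctype.
    if body.lstrip().lower().startswith("<!doctype html"):
        return False
    return True
```

A bot would call this before patching anything and abort the edit when it returns False, in the same way it aborts after a failed login.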

MINOR PROBLEMS

There are 3 minor problems with the "non-API" call "action=raw" unrelated to the recent "crypto whining":

  • No integrity check for the wikitext. An empty reply can mean a genuinely empty page or a communication problem, and the wikitext can arrive truncated or corrupted. As above, this can cause accidental vandalism.
  • No way to read a page identified by "pageid" ("action=edit" supports "pageid" but that is useless as long as "action=raw" doesn't).
  • Need to mix "API" and "non-API" calls.

PROPOSAL

Create an "API" call "action=readraw" and then deprecate the "non-API" call "action=raw".

  • accepts either "title" or "pageid"
  • always returns raw wikitext, UTF-8 without BOM, LF EOL
  • the actual wikitext is preceded by 4 header lines, each terminated with LF EOL:
    • fullpagename (raw UTF-8 without BOM) (particularly useful if called with "pageid")
    • number of the last revision (variable-size DEC number or fixed-size 8-digit HEX number)
    • size of the wikitext (variable-size DEC number or fixed-size 8-digit HEX number) (can be ZERO)
    • MD5 of the wikitext (fixed-size 32-digit HEX number) (" https://mediawiki.org/wiki/API:Edit " "action=edit" already supports an optional parameter "md5", and given the facts presented above it is not sane to assume that saving an edit needs it more than reading raw wikitext does)

Every sane bot should validate at least the size of the wikitext; better bots can validate the MD5 too. Fullpagename and last revision can be useful in some situations and save some extra calls.
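To illustrate the validation a bot would do, here is a rough sketch of a parser for the proposed reply. Since "action=readraw" does not exist, everything here is hypothetical: the function name, the choice of the DEC variant for revision and size, and the assumption that the size counts bytes of the UTF-8 wikitext.

```python
import hashlib


def parse_readraw(payload: bytes) -> tuple[str, int, str]:
    """Split the 4 header lines from the wikitext and validate size and MD5."""
    parts = payload.split(b"\n", 4)
    if len(parts) != 5:
        raise ValueError("reply is shorter than the 4 header lines")
    name_raw, rev_raw, size_raw, md5_raw, wikitext = parts

    # Assumption: revision and size use the variable-size DEC variant.
    # A non-numeric line (for example HTML) makes int() raise ValueError.
    size = int(size_raw.decode("ascii"))
    revision = int(rev_raw.decode("ascii"))

    # Assumption: the size is counted in bytes of the UTF-8 wikitext.
    if len(wikitext) != size:
        raise ValueError("size mismatch: wikitext truncated or corrupted")
    if hashlib.md5(wikitext).hexdigest() != md5_raw.decode("ascii").lower():
        raise ValueError("MD5 mismatch: wikitext corrupted")

    return name_raw.decode("utf-8"), revision, wikitext.decode("utf-8")
```

With such a header an HTML notice can no longer pass for wikitext, because its lines fail the numeric checks, and a truly empty page (size ZERO with a matching MD5) can be told apart from a lost reply.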

I think this proposal would solve the problems mentioned above without being unreasonably difficult to implement.

Event Timeline

Taylor created this task. Dec 1 2019, 3:13 PM
Restricted Application added a subscriber: Aklapper. Dec 1 2019, 3:13 PM
Taylor updated the task description. Dec 1 2019, 3:15 PM
Restricted Application added a project: Platform Engineering. Dec 1 2019, 4:10 PM
Anomie closed this task as Invalid. Dec 2 2019, 1:35 PM
Anomie added a subscriber: Anomie.

There is an "API" call to save edits (" http://mediawiki.org/wiki/API:Edit " "action=edit") but there does not seem to be any "API" call to read wikitext.

See action=query&prop=revisions with rvprop=content.
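As a rough illustration of that suggestion, the sketch below builds such a query and pulls the wikitext out of the JSON reply. The helper names are hypothetical. The extra parameters (rvslots=main, formatversion=2, rvprop=ids|size|content) and the reply layout they imply are assumptions to check against the API documentation, as is the idea that the reported size is the byte length of the main slot.

```python
from urllib.parse import urlencode

# Assumed endpoint; other wikis expose api.php under their own script path.
API_ENDPOINT = "https://www.mediawiki.org/w/api.php"


def build_read_url(title=None, pageid=None) -> str:
    """Build a revisions query for exactly one page, by title or by pageid."""
    if (title is None) == (pageid is None):
        raise ValueError("pass exactly one of title or pageid")
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "ids|size|content",
        "rvslots": "main",
        "format": "json",
        "formatversion": "2",
    }
    if title is not None:
        params["titles"] = title
    else:
        params["pageids"] = str(pageid)
    return API_ENDPOINT + "?" + urlencode(params)


def extract_wikitext(reply: dict) -> tuple[str, int, str]:
    """Return (title, revid, wikitext) from a decoded reply, checking the size."""
    page = reply["query"]["pages"][0]
    if page.get("missing"):
        raise LookupError("page does not exist")
    rev = page["revisions"][0]
    text = rev["slots"]["main"]["content"]
    # Assumption: "size" is the byte length of the main slot's wikitext.
    if len(text.encode("utf-8")) != rev["size"]:
        raise ValueError("size mismatch: content truncated or corrupted")
    return page["title"], rev["revid"], text
```

Because the reply is JSON, an HTML notice fails at the JSON decoding step before it can be mistaken for wikitext, and the query accepts "pageids" as well as "titles".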