ProofreadPage index JSON access not working
Open, Needs Triage | Public | BUG REPORT

Description

T291167, released with 1.38.wmf0, added access to ProofreadPage index fields as JSON. This no longer appears to work.

This URL should return the index content as JSON, but it does not:

https://en.wikisource.org/w/api.php?action=parse&format=json&page=Index%3ASandbox.djvu&prop=wikitext&contentformat=application%2Fjson&formatversion=2

The response uses the default wikitext serialisation instead:

{
    "parse": {
        "title": "Index:Sandbox.djvu",
        "pageid": 779217,
        "wikitext": "{{:MediaWiki:Proofreadpage_index_template\n|Type=phdthesis\n|Title=Proofreading Sandbox\n|Language=fr\n|Volume=\n|Author=Wikisource\n|Translator=\n|Editor=\n|Illustrator=\n|School=\n|Publisher=\n|Address=\n|Year=2011\n|Key=\n|ISBN=\n|OCLC=\n|LCCN=\n|BNF_ARK=ark:/13960/t3903nz7b\n|ARC=\n|DOI=\n|Source=djvu\n|Image=[[Image:Historical Lectures and Addresses.djvu|page=7|300px]]\n|Progress=C\n|Transclusion=no\n|Validation_date=\n|Pages=<pagelist \n9=\"Illu s.\"\n/>\n|Volumes=\n|Remarks=\n|Width=900\n|Css=\n|Header=\n|Footer=\n}}"
    }
}

It should return something like this:

{
    "parse": {
        "title": "Index:Sandbox.djvu",
        "pageid": 3,
        "wikitext": "{\"fields\":{\"Type\":\"book\",\"wikidata_item\":\"\",\"Title\":\"Pericles\",\"Language\":\"en\",\"Volume\":\"1\",\"Author\":\"[[Pericles]]\",\"Translator\":\"\",\"Editor\":\"Editor name\",\"Illustrator\":\"\",\"School\":\"\",\"Publisher\":\"\",\"Address\":\"\",\"Year\":\"\",\"Key\":\"\",\"ISBN\":\"\",\"OCLC\":\"\",\"LCCN\":\"\",\"BNF_ARK\":\"\",\"ARC\":\"\",\"Source\":\"_empty_\",\"Image\":\"1\",\"Progress\":\"X\",\"Pages\":\"<pagelist\\n1=Cover\\n2=1 \\/>\",\"Volumes\":\"\",\"Remarks\":\"\",\"Width\":\"\",\"Css\":\"\",\"Header\":\"\",\"Footer\":\"\"},\"categories\":[\"Other cat\"]}"
    }
}

where the content is serialised JSON.

This has broken at least one tool, which expected JSON and cannot deserialise the wikitext it now receives.
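Until the API behaviour is settled, a tool could guard against receiving either serialisation. The following is only a minimal sketch: the function name `parse_index_content` is hypothetical, and it assumes the JSON form always carries a top-level "fields" key, as in the expected response above.

```python
import json


def parse_index_content(text):
    """Classify index content as serialised JSON or plain wikitext.

    Returns ("json", dict) when the text decodes to an object with a
    "fields" key, otherwise ("wikitext", original_text).
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Template wikitext such as "{{:MediaWiki:...}}" is not valid JSON.
        return ("wikitext", text)
    if isinstance(data, dict) and "fields" in data:
        return ("json", data)
    # Valid JSON, but not the shape assumed for index content.
    return ("wikitext", text)
```

A caller can then branch on the first element of the result rather than failing on unexpected wikitext.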

Event Timeline

This is probably an expected result of gerrit:767799 (T206253).

The documentation of the parameter at https://en.wikisource.org/w/api.php?modules=parse says:

contentformat
Content serialization format used for the input text. Only valid when used with text.

That is why it was fixed: to match the documentation (and to avoid fatal errors when the parameter is used incorrectly).
Using action=parse only to retrieve the content of a revision is not the correct approach; use prop=revisions instead. The deprecated rvcontentformat parameter supports this, but only with the old, deprecated output format. I will add a patch that lets rvcontentformat-{slot} control the output format, as rvcontentformat already does.
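For illustration, a request through prop=revisions might be built and read as sketched below. The helper names are hypothetical, and the response layout assumes formatversion=2 with rvslots=main. Without a per-slot format parameter, the content comes back in the slot's default serialisation.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikisource.org/w/api.php"


def build_revision_query(title):
    """Build a prop=revisions URL that fetches the main slot's content."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "revisions",
        "titles": title,
        "rvprop": "content",
        "rvslots": "main",  # requests the non-deprecated, slot-based output
    }
    return API_URL + "?" + urlencode(params)


def extract_slot_content(response, slot="main"):
    """Pull the raw content string out of a decoded formatversion=2 response."""
    page = response["query"]["pages"][0]
    return page["revisions"][0]["slots"][slot]["content"]
```

The URL can be fetched with any HTTP client; `extract_slot_content` then takes the decoded JSON body.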

Change 850641 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] api: Add rvcontentformat-{slot} to define output format per slot

https://gerrit.wikimedia.org/r/850641

@Umherirrender: Thank you! Will this default to the main slot like other API calls, or will the slot need to always be specified explicitly?

I wonder if we should add a separate index-data API to ProofreadPage (similar to how templatedata works)? @Samwilson @Tpt

@Xover mentioned using Derived-MCR for this, but I'm not sure yet how stable that API is or how well it is integrated with existing MediaWiki hooks and APIs.

@Umherirrender: Thank you! Will this default to the main slot like other API calls, or will the slot need to always be specified explicitly?

The rvslots parameter must be given to get the non-deprecated format and, if the patch set gets merged, to use rvcontentformat-main. I am not sure whether rvslots=main will become the default once the deprecated format is removed.
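As a rough sketch of that constraint, a client could refuse to add the per-slot format parameter unless rvslots names the slot. This assumes the patch set is merged as described; rvcontentformat-{slot} is only proposed at this point, and the helper `with_slot_format` is hypothetical.

```python
def with_slot_format(params, slot, content_format):
    """Return a copy of params with rvcontentformat-{slot} set.

    Raises ValueError when rvslots does not request the slot, since the
    per-slot parameter is only meaningful with the slot-based output.
    """
    slots = params.get("rvslots", "")
    requested = slots.split("|") if slots else []
    if slot not in requested and "*" not in requested:
        raise ValueError(f"rvslots must include {slot!r} to set its format")
    updated = dict(params)  # leave the caller's dict untouched
    # Assumption: this parameter exists only once the proposed patch lands.
    updated[f"rvcontentformat-{slot}"] = content_format
    return updated
```

For example, calling it with rvslots=main, slot "main" and "application/json" yields parameters that include rvcontentformat-main=application/json.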