
What's the reason for the `excerpt` property in the search/title endpoint?
Closed, Resolved · Public

Description

The search/title endpoint seems to always return the title in the excerpt property. The schema at https://www.mediawiki.org/wiki/API:REST_API/Reference#Schema documents this behavior, but it's not clear what purpose the field serves if its value is always the same as the title field. Is it there only so that the endpoint has something non-null to return, in other words, so that the field can be marked as required in the schema?

https://en.wikipedia.org/w/rest.php/v1/search/title?q=the%20rebel&limit=3

{
    "pages": [
        {
            "id": 52294167,
            "key": "The_Rebel_(South_Korean_TV_series)",
            "title": "The Rebel (South Korean TV series)",
            "excerpt": "The Rebel (South Korean TV series)",
            "description": "television series",
            "thumbnail": {
                "mimetype": "image/jpeg",
                "size": null,
                "width": 133,
                "height": 200,
                "duration": null,
                "url": "//upload.wikimedia.org/wikipedia/en/thumb/8/85/Poster_for_Thief_Who_Stole_the_People.jpg/133px-Poster_for_Thief_Who_Stole_the_People.jpg"
            }
        },
        {
            "id": 29730098,
            "key": "The_Rebel_Flesh",
            "title": "The Rebel Flesh",
            "excerpt": "The Rebel Flesh",
            "description": "episode of Doctor Who",
            "thumbnail": null
        },
        {
            "id": 5308866,
            "key": "The_Rebel_(TV_series)",
            "title": "The Rebel (TV series)",
            "excerpt": "The Rebel (TV series)",
            "description": "TV series",
            "thumbnail": {
                "mimetype": "image/jpeg",
                "size": null,
                "width": 150,
                "height": 200,
                "duration": null,
                "url": "//upload.wikimedia.org/wikipedia/commons/thumb/a/ac/Nick_Adams_The_Rebel.JPG/150px-Nick_Adams_The_Rebel.JPG"
            }
        }
    ]
}
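The observation can be checked mechanically. The following is a minimal sketch, not part of the original report: it works on a trimmed copy of two entries from the response above instead of making a live request, and the helper name `excerpt_matches_title` is hypothetical.

```python
import json

# Trimmed copy of two entries from the response above
# (the first entry's thumbnail object is omitted for brevity).
SAMPLE_RESPONSE = json.loads("""
{
    "pages": [
        {
            "id": 52294167,
            "key": "The_Rebel_(South_Korean_TV_series)",
            "title": "The Rebel (South Korean TV series)",
            "excerpt": "The Rebel (South Korean TV series)",
            "description": "television series"
        },
        {
            "id": 29730098,
            "key": "The_Rebel_Flesh",
            "title": "The Rebel Flesh",
            "excerpt": "The Rebel Flesh",
            "description": "episode of Doctor Who",
            "thumbnail": null
        }
    ]
}
""")


def excerpt_matches_title(response):
    """Return True if every page's excerpt is identical to its title."""
    return all(page["excerpt"] == page["title"] for page in response["pages"])


if __name__ == "__main__":
    print(excerpt_matches_title(SAMPLE_RESPONSE))
```

Running the same check over a larger result set from the live endpoint would show whether the duplication holds beyond this one query.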

Event Timeline

Restricted Application added a subscriber: revi. · Aug 3 2020, 6:12 PM

Similarly, I'm not sure why null is returned for some fields that could instead be omitted. Returning them adds extra values that have to be transferred to the client and parsed there.

Update: filed T259837 for this.
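To make the point about null fields concrete, here is a small sketch that strips null-valued keys from the first thumbnail object in the task description and compares the serialized sizes. The helper `drop_nulls` is hypothetical and runs on the client side purely for illustration; it does not describe how the API would omit the fields.

```python
import json

# Thumbnail object of the first result in the task description.
THUMBNAIL = {
    "mimetype": "image/jpeg",
    "size": None,
    "width": 133,
    "height": 200,
    "duration": None,
    "url": "//upload.wikimedia.org/wikipedia/en/thumb/8/85/Poster_for_Thief_Who_Stole_the_People.jpg/133px-Poster_for_Thief_Who_Stole_the_People.jpg",
}


def drop_nulls(value):
    """Recursively remove dict keys whose value is None (JSON null)."""
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nulls(v) for v in value]
    return value


if __name__ == "__main__":
    full = json.dumps(THUMBNAIL)
    compact = json.dumps(drop_nulls(THUMBNAIL))
    # Print the serialized sizes with and without the null-valued fields.
    print(len(full), len(compact))
```

The saving per object is small, so whether it matters depends on how many results a client typically requests.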

Change #1268264 had a related patch set uploaded (by Alex Paskulin; author: Alex Paskulin):

[mediawiki/core@master] api-docs: Clarify behavior of excerpt property

https://gerrit.wikimedia.org/r/1268264

BPirkle triaged this task as Medium priority. · Apr 6 2026, 8:54 PM
BPirkle moved this task from Incoming (Needs Triage) to Bugs & Chores on the MW-Interfaces-Team board.
BPirkle added subscribers: apaskulin, BPirkle.

The attached patch includes a wording improvement that is as good as can be done with the current code structure.

However, the wording is a little awkward and also a little brittle, because the same response schema is used in two different contexts (/search/title and /search/page). To be clear, that's a weakness of the code, not of the wording.

It would be possible to restructure the code to use separate schemas. The smallest code change would be to:

  • add another json schema file (well, probably delete the current one and add SearchResultsTitle.json and SearchResultsPage.json)
  • add a new message key to en.json and qqq.json, so that each schema file can describe (only) its format for the excerpt field
  • change SearchHandler::getResponseBodySchemaFileName() to load the appropriate json schema file based on the path. SearchHandler's $mode value can be used to distinguish between the two endpoints - it is already used elsewhere in that handler.
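The third step amounts to a small lookup from mode to schema file. The sketch below shows the idea in Python for illustration only; the real handler is PHP and this is not its code. The two file names come from the first bullet above, while the mode strings, the constant names, and the Python function name are assumptions.

```python
# Assumed mode values; the actual values of SearchHandler's $mode may differ.
FULLTEXT_MODE = "fulltext"      # assumed to back /search/page
COMPLETION_MODE = "completion"  # assumed to back /search/title

# File names as proposed in the first bullet above.
SCHEMA_FILES = {
    FULLTEXT_MODE: "SearchResultsPage.json",
    COMPLETION_MODE: "SearchResultsTitle.json",
}


def get_response_body_schema_file_name(mode):
    """Pick the response schema file for the given search mode."""
    try:
        return SCHEMA_FILES[mode]
    except KeyError:
        # Fail loudly on an unexpected mode instead of guessing a schema.
        raise ValueError(f"Unknown search mode: {mode!r}") from None


if __name__ == "__main__":
    print(get_response_body_schema_file_name(COMPLETION_MODE))
```

Keeping the mapping in one table means a third mode, if one were ever added, would need only one new entry plus its schema file.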

In a quick response comparison (https://en.wikipedia.org/w/rest.php/v1/search/page?q=Earth&limit=50 vs https://en.wikipedia.org/w/rest.php/v1/search/title?q=Earth&limit=50) I didn't see any differences other than the excerpt field. I could be missing something, but I think the only difference between those two .json schema files would be that the excerpt field would point at different message keys for its description.
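A comparison like this can also be scripted. The following is a rough sketch, assuming both responses have already been fetched and parsed; the function name `differing_fields` is hypothetical, and the sample /search/page excerpt is made up for illustration and is not a real API response.

```python
def differing_fields(title_pages, page_pages):
    """Map each page key found in both lists to the fields whose values differ.

    A field missing from one side is treated the same as a null value.
    """
    by_key = {page["key"]: page for page in page_pages}
    diffs = {}
    for title_page in title_pages:
        other = by_key.get(title_page["key"])
        if other is None:
            continue  # page appears in only one result list
        changed = sorted(
            field
            for field in title_page.keys() | other.keys()
            if title_page.get(field) != other.get(field)
        )
        if changed:
            diffs[title_page["key"]] = changed
    return diffs


# Entry taken from the /search/title response in the task description.
TITLE_RESULT = [{
    "id": 29730098, "key": "The_Rebel_Flesh", "title": "The Rebel Flesh",
    "excerpt": "The Rebel Flesh", "description": "episode of Doctor Who",
    "thumbnail": None,
}]

# Hypothetical /search/page counterpart; the excerpt text is invented.
PAGE_RESULT = [dict(TITLE_RESULT[0], excerpt="(illustrative snippet text)")]

if __name__ == "__main__":
    print(differing_fields(TITLE_RESULT, PAGE_RESULT))
```

Pages that appear in only one of the two result lists are skipped, so the output says nothing about ranking differences between the endpoints.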

That's really not all that much code. I mostly listed all the steps in case we decide to go in that direction and someone other than me does the work.

Think that's worth it, @apaskulin, or are you content with your wording-only solution from the patch?

Thanks for your response, @BPirkle!

I could be missing something, but I think the only difference between those two .json schema files would be that the excerpt field would point at different message keys for its description.

I believe this is correct based on my testing.

Think that's worth it, @apaskulin, or are you content with your wording-only solution from the patch?

Totally up to you and the team. Separate schemas would definitely be a better experience for the docs, but having two schemas will be more work to maintain, and there's the engineering time required to set up the new schemas. Since this is probably low priority, maybe merge the wording-only patch now and leave this task open to do the split schemas later as a docs-experience improvement?

Since this is probably low priority, maybe merge the wording-only patch now and leave this task open to do the split schemas later as a docs-experience improvement?

I like that idea. Patch +2'd.

Change #1268264 merged by jenkins-bot:

[mediawiki/core@master] api-docs: Clarify behavior of excerpt property

https://gerrit.wikimedia.org/r/1268264

BPirkle claimed this task.

Patch merged. Further improvements, if we choose to make any, can be done under a separate task.