
[mediawiki-api] Add method to PageListGetter to get list of outgoing links from a page
Closed, Resolved, Public

Description

Currently, we have PageListGetter::getFromWhatLinksHere() to get in-bound links to a page. I think it'd be good to add PageListGetter::getLinksFromHere( $pageName, $targetNamespaces, $limit, $targetTitles, $dir ) to get a Pages object containing all pages that are linked to from the given page.

What do you think?
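For reference, the arguments proposed for getLinksFromHere() appear to line up with parameters of the action API's prop=links module (plnamespace, pllimit, pltitles, pldir). The sketch below is a hypothetical illustration in Python, not mediawiki-api's PHP code. The function name links_from_here_params is made up, and continuation (plcontinue) is left out.

```python
# Hypothetical sketch: map the arguments proposed for getLinksFromHere()
# onto query parameters for action=query&prop=links. Illustrative only;
# mediawiki-api itself is a PHP library and may structure this differently.
from typing import Dict, List, Optional


def links_from_here_params(
    page_name: str,
    target_namespaces: Optional[List[int]] = None,
    limit: Optional[int] = None,
    target_titles: Optional[List[str]] = None,
    direction: Optional[str] = None,
) -> Dict[str, str]:
    """Build query-string parameters for listing a page's outgoing links."""
    params = {
        "action": "query",
        "prop": "links",
        "titles": page_name,
        "format": "json",
    }
    if target_namespaces:
        # Multi-value API parameters are joined with "|".
        params["plnamespace"] = "|".join(str(ns) for ns in target_namespaces)
    if limit is not None:
        params["pllimit"] = str(limit)
    if target_titles:
        params["pltitles"] = "|".join(target_titles)
    if direction is not None:
        # prop=links accepts "ascending" or "descending" for pldir.
        if direction not in ("ascending", "descending"):
            raise ValueError("direction must be 'ascending' or 'descending'")
        params["pldir"] = direction
    return params


print(links_from_here_params("Main Page", target_namespaces=[0, 14], limit=50))
```

A real implementation would also need to follow plcontinue until all links have been fetched.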

Event Timeline

Samwilson created this task. Feb 6 2018, 7:16 AM
Restricted Application added a subscriber: Aklapper. Feb 6 2018, 7:16 AM

Mediawiki\DataModel\Pages doesn't support Page objects that don't have IDs, so outbound links to non-existent pages can't be recorded. Also, the links API query doesn't return page IDs, so even for existing pages we'd have to make another API request.

One solution might be to not index the internal array of Pages with the page ID, but instead use some sort of normalized title, or combination of the two.


That would work. Alternatively, we might just want a TitleListGetter or something like that? What API module would getLinksFromHere be using? What other data do we have?

@Samwilson @Addshore I'm a new contributor interested in working on this :). I tried using API:Links, and it does return page IDs. Maybe I've misunderstood what's required?

@jeropbrenda: This is about adding code to https://github.com/addwiki/mediawiki-api - see the Project tags of this task.

@Aklapper Thanks for the reply! Yes, I've written and tested the code on my local copy of the repo, and it produced the expected output, which @Samwilson said wouldn't be possible with API:Links. So I was just sending the output of a simple GET request for confirmation in case I missed something.

@jeropbrenda thank you for looking into this!

I think the problem isn't with retrieving the list of outward links from the API (it sounds like that works correctly), but rather with representing them in mediawiki-api's Pages class. It's been a while since I looked into this, so I might be wrong (or this task might be out of date).

Perhaps it's an easy fix to add a new method for this. It'd be great if you could submit a patch! Are you familiar with submitting pull requests to GitHub?

jeropbrenda added a comment. Mar 23 2019, 10:19 AM

@Samwilson I get it now :). Yes, the response can be represented as a Pages object for existing pages. For non-existent pages, "missing": "" is returned instead of a pageid. Do you want non-existent pages to be included in the output?
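One way to get page IDs alongside the links is to use links as a generator (action=query&generator=links&titles=...) instead of prop=links, which lists only each link's namespace and title. That is an assumption about the request used here, since the comment doesn't show it. With the default JSON output (formatversion=1), existing pages carry a pageid and missing ones carry "missing": "" instead. A minimal Python sketch of separating the two, using a made-up sample response (split_pages is a hypothetical name):

```python
from typing import Any, Dict, List, Optional, Tuple

# Made-up sample shaped like a formatversion=1 generator=links response.
SAMPLE_RESPONSE: Dict[str, Any] = {
    "query": {
        "pages": {
            "42": {"pageid": 42, "ns": 0, "title": "Existing page"},
            "-1": {"ns": 0, "title": "Some missing page", "missing": ""},
        }
    }
}


def split_pages(
    response: Dict[str, Any],
) -> Tuple[List[Tuple[int, int, str]], List[Tuple[Optional[int], str]]]:
    """Separate pages that have an ID from pages that don't."""
    existing: List[Tuple[int, int, str]] = []
    missing: List[Tuple[Optional[int], str]] = []
    for page in response.get("query", {}).get("pages", {}).values():
        if "pageid" in page:
            existing.append((page["pageid"], page["ns"], page["title"]))
        else:
            # Missing pages have no pageid; they are marked "missing": "".
            missing.append((page.get("ns"), page["title"]))
    return existing, missing


existing, missing = split_pages(SAMPLE_RESPONSE)
print(existing)
print(missing)
```

Only the first list fits the current, ID-indexed Pages class. The second list is the part that has nowhere to go.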

Yep, they should be included as well. But they can't be referred to by an ID (because they don't have one). My idea above was to change that to use a composite of the ID and title, or maybe just the title. I'm not really sure what the best way is. The internals of Pages shouldn't really be exposed to the user, but perhaps they are (e.g. via Pages::toArray()). It feels like Pages::get() could be generalized to include missing pages, although there would have to be some way to distinguish between an integer page name and a page ID.
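A rough sketch of the title-keyed idea, in Python for brevity rather than PHP. PageRef, TitleKeyedPages and normalize_title are hypothetical names, and the normalization is deliberately simplified. Keying on (namespace, normalized title) lets missing pages in. Separate get_by_id() and get_by_title() lookups sidestep the integer-title-versus-page-ID ambiguity mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PageRef:
    """Hypothetical stand-in for a Page whose ID is optional."""
    ns: int
    title: str
    page_id: Optional[int] = None  # None for pages that don't exist


def normalize_title(title: str) -> str:
    # Simplified: MediaWiki's real normalization also handles namespace
    # prefixes, per-wiki capitalization rules, Unicode and more.
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


class TitleKeyedPages:
    """Collection keyed by (namespace, normalized title), not page ID."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[int, str], PageRef] = {}
        self._by_id: Dict[int, PageRef] = {}

    def add(self, page: PageRef) -> None:
        key = (page.ns, normalize_title(page.title))
        old = self._by_key.get(key)
        if old is not None and old.page_id is not None:
            # Drop the stale ID mapping if this title is being replaced.
            self._by_id.pop(old.page_id, None)
        self._by_key[key] = page
        if page.page_id is not None:
            self._by_id[page.page_id] = page

    def get_by_title(self, ns: int, title: str) -> Optional[PageRef]:
        return self._by_key.get((ns, normalize_title(title)))

    def get_by_id(self, page_id: int) -> Optional[PageRef]:
        # A separate method means "123" is never guessed to be an ID.
        return self._by_id.get(page_id)

    def __len__(self) -> int:
        return len(self._by_key)


pages = TitleKeyedPages()
pages.add(PageRef(0, "Existing page", 42))
pages.add(PageRef(0, "Some missing page"))
pages.add(PageRef(0, "123"))  # a missing page whose title is a number
print(len(pages), pages.get_by_title(0, "123"), pages.get_by_id(123))
```

The trade-off is that callers have to say which kind of lookup they mean, instead of a single get() inferring it from the argument.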

Yes, using the title would work, because all pages have one. Would assigning negative page IDs to pages with "missing": "" be a dirty workaround?
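For comparison, the negative-ID idea is close to what the API itself does with formatversion=1, where missing pages in query.pages are keyed "-1", "-2" and so on. Below is a hypothetical Python sketch (assign_ids is an illustrative name, and pages are given as (id, namespace, title) tuples with None for missing IDs):

```python
from itertools import count
from typing import Dict, Iterable, Optional, Tuple


def assign_ids(
    pages: Iterable[Tuple[Optional[int], int, str]],
) -> Dict[int, Tuple[int, str]]:
    """Index pages by ID, giving missing pages temporary negative IDs."""
    placeholder = count(-1, -1)  # yields -1, -2, -3, ...
    indexed: Dict[int, Tuple[int, str]] = {}
    for page_id, ns, title in pages:
        key = page_id if page_id is not None else next(placeholder)
        indexed[key] = (ns, title)
    return indexed


print(assign_ids([(42, 0, "Existing page"), (None, 0, "Missing A"), (None, 0, "Missing B")]))
```

One drawback this makes visible: placeholders depend on the order in which pages are added, so the same missing page can get a different ID in another collection or request. That instability is one argument for keying on titles instead.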