Page MenuHomePhabricator

RFC: PageLookup service and PageRecord object
Closed, DeclinedPublic

Description

This RFC proposes to introduce:

  • PageRecord: a lightweight data object for representing wiki pages (that is, entries in the page table).
  • PageLookup: a narrow service interface for looking up such page entries by id or title.

Proto-Code of for PageRecord and PageLookup can be found towards the end of the proposal.

Rationale

Reduce usage of the Title object. Title is a heavy weight smart record with many dependencies, and tightly bound to the database layer. Replacing it with something more lightweight would improve modularity and testability.

TitleValue was introduced as a replacement of Title in cases where only the namespace and title text are known or needed. Such cases however are rather rare, in most cases more information is desired: we need a lightweight representation of existing wiki pages, not just of titles of pages that may exist.

PageLookup would also complement the infrastructure proposed at T107595: [RFC] Multi-Content Revisions.

Proto-Code

class PageRecord {
  public function getTitle(); // @return TitleValue
  public function getId(); // @return int
  public function isRedirect(); // @return bool
  public function getLatestRevisionId(); // @return int
  public function getTouchedDate(); // @return string
  public function getLanguageCode(); // @return string
  public function getContentModel(); // @return string
}

class PageLookup {
  public function findPageById( $id );
  public function findPageByTitle( $ns, $title );
}

Other services that list pages by some property can be added as need be. A generic mechanism for listing PageRecords could be implemented to replace QueryPage (or QueryPage could be re-implemented to use it).

Migration

Migration from Title to PageRecord would be done incrementally over time. For compatibility, Title would get two new methods:

class Title {
  ...
  public static function newFromPageRecord( PageRecord $page ); // @return Title
  public function getPageRecord(); // @return PageRecord
  ...
}

Event Timeline

daniel raised the priority of this task from to Needs Triage.
daniel updated the task description. (Show Details)
daniel subscribed.
daniel moved this task from P1: Define to Old on the TechCom-RFC board.

Congratulations! This is one of the 52 proposals that made it through the first deadline of the Wikimedia-Developer-Summit-2016 selection process. Please pay attention to the next one: > By 6 Nov 2015, all Summit proposals must have active discussions and a Summit plan documented in the description. Proposals not reaching this critical mass can continue at their own path out of the Summit.

Krinkle set Security to None.

November 6, and this proposal doesn't seem to have much traction, it is not on track. Unless there is a sudden change, I will leave the ultimate decision of pre-scheduling it for the Wikimedia-Developer-Summit-2016 to @RobLa-WMF and the Architecture Committee.

This has been scheduled for discussion on IRC #wikimedia-office on November 11, 22:00 UTC (2pm PST), see E89: RFC Meeting: Parser::getTargetLanguage / PageRecord (2015-11-11)

Is "id" in getId() and findPageById( $id ) a page ID, revision ID, or articleID? The proliferation of IDs in core is confusing.

Is "id" in getId() and findPageById( $id ) a page ID, revision ID, or articleID? The proliferation of IDs in core is confusing.

getId() returns the page ID. In the context of PageRecord and PageLookup, that seems obvious. What Title calls the "article ID" actually is also the page ID. The revision ID is returned by getLatestRevisionId().

This was discussed at the RFC IRC meeting on November 11 as the second topic. See the log at https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-11-11-21.59.log.html, starting at 22:43. The minutes features of MeetBot weren't really used, so here is an attempt at a summary:

The main objection came from @ori who doesn't think adding more classes is going to make the code base simpler. Money quote:

23:01:13 <ori> Joe Armstrong (Erlang guy), talking about the dangers of OOP, once compared it to "You wanted a banana but what you got was a gorilla holding the banana and the entire jungle. "
23:01:25 <ori> I agree that Title is now a jungle with a gorilla holding a banana, rather than just a banana
23:01:34 <ori> but i don't think adding a Banana class is going to fix that

Krinkle also seemed worried about a "Java style Class Explosion". Tim seemed more amiable to an approach with higher granularity: <TimStarling> the complexity doesn't go away when you mix everything together in a single class

There was agreement that Title vs WikiPage vs Page vs Article etc is confusing, and we should get rid of at least some of these. E.g. <Krinkle> Article as view object? That's WikiPage and ViewAction. Afaik all uses of Article can be removed. The discussion was mainly about the best strategy for refactoring / migration.

In the end, Tim suggested to write some code, so we can discuss a more concrete proposal.

Personal conclusion: Perhaps it would be useful to reframe the proposal: PageRecord and PageLookup will not replace Title. PageLookup is basically a refactoring of LinkCache, PageRecord is an auxilliary object that allows us to remove the circular dependency with Title. Currently, Title depends on LinkCache, and LinkCache depends on Title. That would be solved by PageLookup depending on PageRecord, and Title depending on PageRecord and PageLookup.

Next step: reframe proposal in terms of replacing LinkCache, as described above. Write some code for PageLookup and PageRecord, as Tim suggested.

Wikimedia Developer Summit 2016 ended two weeks ago. This task is still open. If the session in this task took place, please make sure 1) that the session Etherpad notes are linked from this task, 2) that followup tasks for any actions identified have been created and linked from this task, 3) to change the status of this task to "resolved". If this session did not take place, change the task status to "declined". If this task itself has become a well-defined action which is not finished yet, drag and drop this task into the "Work continues after Summit" column on the project workboard. Thank you for your help!

daniel moved this task from Old to (unused) on the TechCom-RFC board.

Unassigning myself. Needs shepherd.

kchapman subscribed.

This RFC is being declined as obsolete. It was intended to be an RFC on part of the refactoring for MCR. If it comes up during the MCR refactoring other new RFCs will be created.

Krinkle subscribed.

Closing. And per @daniel, as author of this RFC, foregoing Last Call.