
Add support for wikidata summaries in the /page/summary/ endpoint
Open, Low, Public

Description

What

Add support for wikidata summaries in the page/summary endpoint.

Depends on language variant support in RESTBase (T159985: Implement language variant support in the REST API).

This is a goal of the RI team for Q1.

POC

https://gerrit.wikimedia.org/r/434193

Relevant info

Mocks

Initial mocks by @Nirzar to see what information we need to expose in the service:

Comments

We said it'd be nice to display the number of statements and sitelinks in order to give people a quick way to see how "good/useful" the item behind the link is. If you see that an item has no sitelinks or statements, then it's probably not worth clicking because it won't tell you much more. But it's totally ok from my side to leave this out for now.

Event Timeline

Jhernandez triaged this task as High priority. Jul 9 2018, 11:11 AM
Jhernandez created this task.
Restricted Application removed a project: Patch-For-Review. Jul 9 2018, 11:11 AM
Restricted Application added a subscriber: Aklapper.

@Lydia_Pintscher @ovasileva @Nirzar Please, could you have a look at the task and review the mocks and information we should expose?

Once that is 100% clear, we can proceed to reviewing the POC and working on it until it can be merged.

@bearND Could you please have a look at the patch and spec too, and write down what we expose so that we can come to an agreement?

Pchelolo moved this task from Backlog to watching on the Services board. Jul 9 2018, 12:15 PM
Pchelolo edited projects, added Services (watching); removed Services.

From the Services side there are 2 things to consider:

  1. Invalidation. Right now we do not have summaries for wikidata, so new rules will need to be added to ChangeProp for cache invalidation. If we include information that is not actually changed by editing the item, such as the number of sitelinks, we would need to think about how to track that information and invalidate it. Also, the normal summary endpoint is based on Parsoid HTML while the wikidata endpoint will likely be based on the MW API, so it will need special invalidation rules.
  2. Storage. If we want to use the endpoint for Page Previews, we either need to store it in all the languages, or we need to make it fast enough that a cache miss is acceptably quick to generate on the fly. Let's benchmark it when the endpoint is done, but regardless of the result, most likely we will not be able to store it in Cassandra right away because of capacity issues.

I might be missing something but sitelinks are stored and edited in the item. And their number is stored in page props and changed as edits are made afaik.
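
For illustration, reading those counts from page props could look roughly like the sketch below. It assumes the props are named `wb-sitelinks` and `wb-claims` (an assumption on my part, not confirmed in this task), and it parses a hand-written sample response with made-up numbers instead of calling the live API. The helper names are hypothetical.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def pageprops_url(title: str) -> str:
    """Build an action API query URL for the two page props (prop names assumed)."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wb-sitelinks|wb-claims",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return f"{API}?{urlencode(params)}"


def extract_counts(response: dict) -> dict:
    """Pull the counts out of a formatversion=2 response; missing props count as 0."""
    page = response["query"]["pages"][0]
    props = page.get("pageprops", {})
    return {
        "sitelinks": int(props.get("wb-sitelinks", 0)),
        "statements": int(props.get("wb-claims", 0)),
    }


# Hand-written sample response; the numbers are invented for the example.
sample = {
    "query": {
        "pages": [
            {"title": "Q42", "pageprops": {"wb-sitelinks": "120", "wb-claims": "300"}}
        ]
    }
}
counts = extract_counts(sample)
```

If the props are indeed updated on every edit, the normal edit-driven invalidation would cover these two numbers without extra tracking.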

Update: Design is going to be looking into the reqs and discussing a bit more before going for implementation.

daniel added a subscriber: daniel. Aug 7 2018, 3:57 PM

A quick thought on how events for purging the cached preview could be triggered by Wikibase:

  • Any Wikibase client (including wikidata.org itself) receives "change notifications", which are picked up by the ChangeHandler, and can be intercepted by the WikibaseHandleChange hook.
  • A handler for the WikibaseHandleChange hook could determine whether a given change requires purging the cached preview of the respective entity. Wikibase clients use a similar mechanism for purging pages when data that is used on these pages changes. The relevant code is in AffectedPagesFinder. I suppose at least the code of the getChangedAspects() method could be re-used.
  • Once it is known that the cached preview needs purging, the appropriate event can be pushed out to the EventBus using a DeferredUpdate.
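
The flow in the three points above can be sketched as follows. This is only a language-neutral illustration in Python; the real handler would be PHP inside Wikibase. `EntityChange`, `EventQueue`, `PREVIEW_ASPECTS`, and the event shape are all hypothetical stand-ins, not actual Wikibase or EventBus APIs.

```python
from dataclasses import dataclass, field

# Hypothetical: aspects of an entity that the cached preview depends on.
PREVIEW_ASPECTS = {"label", "description", "sitelinks", "statements"}


@dataclass
class EntityChange:
    """Stand-in for a change notification received by a Wikibase client."""
    entity_id: str
    changed_aspects: set


@dataclass
class EventQueue:
    """Stand-in for deferring an event until after the main request work."""
    pending: list = field(default_factory=list)

    def defer(self, event: dict) -> None:
        self.pending.append(event)


def handle_change(change: EntityChange, queue: EventQueue) -> bool:
    """Queue a purge event if the change touches anything the preview shows."""
    relevant = change.changed_aspects & PREVIEW_ASPECTS
    if not relevant:
        return False  # nothing the preview displays has changed
    queue.defer({
        "type": "summary-purge",  # hypothetical event type
        "entity": change.entity_id,
        "aspects": sorted(relevant),
    })
    return True
```

Keeping the "which aspects matter" set in one place makes it easy to extend when the preview starts showing more fields.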

This should be straightforward for any information local to the entity being changed (e.g. the description in different languages, or specific statements on that entity). Non-local information, such as the number of incoming links, can also be updated through that mechanism, but the count would need to be maintained explicitly in RESTBase somewhere.

Alternatively, such numbers could be re-calculated periodically instead of trying to keep them up to date continuously. E.g. if previews are cached for 24 hours, the number of references could be taken from Elastic when the preview is re-rendered after that time.
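
A minimal sketch of that periodic approach, assuming a 24-hour lifetime and a caller-supplied `fetch` function (hypothetical; in practice it would query Elastic or whichever store holds the count). The class name and the injectable clock are illustration only.

```python
import time
from typing import Callable, Dict, Tuple


class PeriodicCount:
    """Recompute a non-local count only when the cached value is older than the TTL."""

    def __init__(
        self,
        fetch: Callable[[str], int],
        ttl_seconds: float = 24 * 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, int]] = {}

    def get(self, entity_id: str) -> int:
        now = self._clock()
        cached = self._cache.get(entity_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh, no recomputation
        value = self._fetch(entity_id)  # expired or never fetched
        self._cache[entity_id] = (now, value)
        return value
```

The trade-off is that the number can be stale for up to one TTL, in exchange for not having to track every change that affects it.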

Product requirements from Olga and Lydia have been posted into parent task T111231: Page previews for Wikidata in case you want to have a look.

As I mentioned above, Design (Nirzar and Alex) are looking into it so we're leaving some more time before we act on it.

Aklapper changed the edit policy from "Custom Policy" to "All Users". Sep 17 2018, 5:50 PM
Aklapper changed Risk Rating from N/A to default.
Jhernandez lowered the priority of this task from High to Low. Sep 20 2018, 12:58 PM

After consulting, the teams won't be able to work on this in the near term, but will get to it in the future, so we shouldn't rush to implement the API when it is not going to be used.

I'm bringing this down to Low for now, and will raise prio again once we get resourcing to work on the frontend part (design and client team) to implement the API.