
[Spike 2hr] Investigate ability for page previews to appear in wikidata
Closed, ResolvedPublic

Description

Background

We would like to customize page previews so that they appear on Wikidata. This spike investigates the best way for these previews to appear, based on the requirements below.

Mocks

missing desc.png (1×2 px, 322 KB)

only desc.png (1×2 px, 325 KB)

missing desc with image.png (1×2 px, 331 KB)

Acceptance criteria

Investigate the following questions:

Blocked

  • Can we include the following in Wikidata previews (currently accessible via API calls): the total number of statements; the total number of labels; the total number of external identifiers; and the total number of site links?
  • Can we provide Wikidata previews that include Wikidata label and description in the user's preferred language using RESTBase? If so, how?
  • Would this be a problem from a performance perspective?

Event Timeline

ovasileva renamed this task from Investigate ability for page previews in wikidata to appear in user's preferred language to [Spike] Investigate ability for page previews in wikidata to appear in user's preferred language. Jun 13 2017, 4:09 PM
ovasileva triaged this task as High priority.
ovasileva renamed this task from [Spike] Investigate ability for page previews in wikidata to appear in user's preferred language to [Spike 1hr] Investigate ability for page previews in wikidata to appear in user's preferred language. Jun 13 2017, 4:15 PM
ovasileva updated the task description.
ovasileva updated the task description.
ovasileva renamed this task from [Spike 1hr] Investigate ability for page previews in wikidata to appear in user's preferred language to [Spike 2hr] Investigate ability for page previews in wikidata to appear in user's preferred language. Jun 14 2017, 5:23 PM

Would this use the fallback tree if the item description is not available in the user's language? (Probably easier to fall back to English for now.)

Discussed in the Reading/Services sync.

@Nirzar, do you have mockups for the Wikidata previews so we can see what we would actually need? (There is some info about suggested data in the description.)

Some info on a "page-summary" rest endpoint:

Making a Wikidata "page-summary" REST endpoint seems reasonable. The thing to consider is whether we would store language variants separately (different URLs or headers) and query them from the client, which would require the language fallback tree on the endpoints. Stats information would then be duplicated in the caches.

Alternatively, if there isn't that much data, the endpoint would be unique and serve all languages for the description/name along with the stats: less cache duplication, bigger responses, and the language fallback tree processed on the client.
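
For the second option (one response carrying all languages, fallback resolved on the client), a rough sketch of what the client-side lookup could look like. The FALLBACKS table and the pick_term name are hypothetical; the real fallback tree is defined in MediaWiki's language configuration and is not reproduced here.

```python
from typing import Dict, List, Optional

# Hypothetical fallback chains, for illustration only. The real fallback
# tree comes from MediaWiki's language configuration.
FALLBACKS: Dict[str, List[str]] = {
    "pt-br": ["pt", "en"],
    "de-at": ["de", "en"],
}


def pick_term(terms: Dict[str, str], user_lang: str) -> Optional[str]:
    """Return the term in the user's language, or in the first fallback
    language that has one. Languages without an entry fall back to English."""
    chain = [user_lang] + FALLBACKS.get(user_lang, ["en"])
    for lang in chain:
        if lang in terms:
            return terms[lang]
    return None


# Example: an all-languages description map, as the single endpoint might serve it.
descriptions = {"en": "capital of Germany", "de": "Hauptstadt von Deutschland"}
print(pick_term(descriptions, "de-at"))
print(pick_term(descriptions, "fr"))
```

With the first option (one cached response per language), the same lookup would have to run on the endpoint instead.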


To the taker, have a look at the options and the available data on wikidata to inform the tradeoffs, and let's ping services too on review.

@Jhernandez https://phabricator.wikimedia.org/T111231#3345326

Those are some rough ideas. I also wanted to include the Q id but didn't know if that was possible.

@Nirzar, in what situations does a user see the third mock-up (the one with the image of a back of an envelope)?

@ovasileva, can you clarify what the following means?

Can we include the following in wikidata previews (currently accessible via api calls): total number of statements, total number of labels, total number of external identifiers, total number of site links

Where would these numbers (total number of statements, etc.) appear?

Can we provide wikidata previews that include wikidata label and description in the user's preferred language using RESTBase? If so, how?

This will need a meeting with @GWicke and @mobrovac/@Pchelolo.

Jdlrobson subscribed.

Blocked on:

  • Input from @ovasileva and @Nirzar
  • Having a meeting with services to discuss how different languages should work on single language domains e.g. wikidata

We discussed this in the Reading / Services sync meeting. One question that came up in the discussion is whether including all languages in the response would be feasible from a performance perspective. The advantage of this direction would be no cache fragmentation, the downside a larger response.

If we determine that returning all languages does not make sense, then we could consider using the accept-language header as the general language selection mechanism, in line with T122942: RFC: Support language variants in the REST API.
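
To make the Accept-Language option concrete, here is a minimal sketch of how a service might turn the header into an ordered list of language tags. The function name parse_accept_language is hypothetical, wildcard (`*`) handling is left out, and this is not how RESTBase is known to implement it.

```python
from typing import List


def parse_accept_language(header: str) -> List[str]:
    """Return language tags from an Accept-Language header value,
    highest quality first; ties keep the order given by the client."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without a q parameter has quality 1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat malformed weights as "not acceptable"
        if quality > 0:
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


print(parse_accept_language("de-AT, de;q=0.8, en;q=0.5"))
```

A cache in front of such an endpoint would presumably have to vary on this header, which is where the cache fragmentation concern comes from.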

We discussed this in the Reading / Services sync meeting. One question that came up in the discussion is whether including all languages in the response would be feasible from a performance perspective. The advantage of this direction would be no cache fragmentation, the downside a larger response.

Which is the more expensive of the two? Would you like us to spend time figuring out the p50, p75, p95 number of descriptions and labels per item to get a better understanding of the expected size of the response?

If we determine that returning all languages does not make sense, then we could consider using the accept-language header as the general language selection mechanism, in line with T122942: RFC: Support language variants in the REST API.

Reading through that RFC, it feels like this is the accepted solution. Would it really make sense to special-case this endpoint? (I guess this depends on your answer to the first question).

@Nirzar, in what situations does a user see the third mock-up (the one with the image of a back of an envelope)?

@ovasileva, can you clarify what the following means?

Can we include the following in wikidata previews (currently accessible via api calls): total number of statements, total number of labels, total number of external identifiers, total number of site links

Where would these numbers (total number of statements, etc.) appear?

@bmansurov - up to @Nirzar, but I'm guessing at the bottom of the preview? "X statements", "Y labels", etc

Thanks, I should have been clearer. I meant to ask whether these numbers would appear in every preview. This information seems excessive, but I'm not a designer. ;)

@bmansurov - yup, they would appear in every preview.

We discussed this in the Reading / Services sync meeting. One question that came up in the discussion is whether including all languages in the response would be feasible from a performance perspective. The advantage of this direction would be no cache fragmentation, the downside a larger response.

Which is the more expensive of the two? Would you like us to spend time figuring out the p50, p75, p95 number of descriptions and labels per item to get a better understanding of the expected size of the response?

Ultimately, I think the median / p99 compressed sizes of responses with a single vs. all languages would be helpful. It's really more about getting an idea of the ballpark we are talking about -- is this < 16 KB, or are we talking about > 100 KB?
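
One way to get those numbers, as a sketch only: gzip each candidate response body and summarise the sizes. The helper names compressed_size and size_summary are hypothetical, and the payloads would have to come from a real sample of Wikidata items; no measurements are implied here.

```python
import gzip
import json
import statistics
from typing import Dict, List


def compressed_size(payload: dict) -> int:
    """Size in bytes of the gzip-compressed JSON encoding of a payload."""
    raw = json.dumps(payload, ensure_ascii=False, sort_keys=True).encode("utf-8")
    return len(gzip.compress(raw))


def size_summary(payloads: List[dict]) -> Dict[str, float]:
    """Median and 99th percentile of compressed sizes.
    Expects at least two payloads."""
    sizes = [compressed_size(p) for p in payloads]
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    p99 = statistics.quantiles(sizes, n=100, method="inclusive")[98]
    return {"median": statistics.median(sizes), "p99": p99}
```

Running size_summary once over single-language payloads and once over all-language payloads for the same sample of items would give the two ballpark figures to compare.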

If we determine that returning all languages does not make sense, then we could consider using the accept-language header as the general language selection mechanism, in line with T122942: RFC: Support language variants in the REST API.

Reading through that RFC, it feels like this is the accepted solution. Would it really make sense to special-case this endpoint? (I guess this depends on your answer to the first question).

The caveat is that we discussed Accept-Language in the context of supporting language variants in the REST API. There is also T114662: RFC: Per-language URLs for multilingual wiki pages, which is focused on Wikidata, but does not consider APIs so far.

The caveat is that we discussed Accept-Language in the context of supporting language variants in the REST API.

Correct, but we discussed language selection in general for that RfC, and consequently I think that this should be the way to go, as intuitively language selection is to a language-independent project what language-variant selection is to a language-scoped one.

We will meet on Thursday to talk about this.

We discussed this on Thursday. It seems we can use request headers to signal the language for the Wikidata use case. The big issue here is how much storage this would require. We need to provide some data about how much data would be in a Wikidata preview response, the amount of processing it would require, and the sort of traffic it would get. This will help the services team make a decision on the correct approach.

Agreed! Sounds like the same problem to me.

@ovasileva: I was going to be bold but I notice that the mocks in this task aren't on any other tasks (including T111231: Page previews for Wikidata). Could/should we refer to them there?

There are some extra questions at the bottom of the description that are Wikidata-specific (I'm not sure whether the title of this task reflects them, however), namely:

  • Can we include the following in Wikidata previews (currently accessible via API calls): the total number of statements; the total number of labels; the total number of external identifiers; and the total number of site links?

I'd say let's keep both open but maybe keep this as a subtask of the other task just so we can have all the wikidata info in one place?

Perhaps we can remove the language variant part of this and just leave the Wikidata-specific part?

ovasileva renamed this task from [Spike 2hr] Investigate ability for page previews in wikidata to appear in user's preferred language to [Spike 2hr] Investigate ability for page previews to appear in wikidata. Feb 23 2018, 3:21 PM
ovasileva updated the task description. (Show Details)

Perhaps we can remove the language variant part of this and just leave the Wikidata-specific part?

Done. Left a note on the language just so we can capture the wikidata-specific parts of this as well.

Can we include the following in Wikidata previews (currently accessible via API calls): the total number of statements; the total number of labels; the total number of external identifiers; and the total number of site links?

Yes. The API will return this information (the patch returns total site links and labels). I'm not sure what "number of external identifiers" means or whether it's available, but if it is available in the API I can return it as well. The client, however, would need to be updated to display this information.
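
For reference, a rough sketch of how these totals could be derived from an item's entity JSON. It assumes the shape returned by the wbgetentities API (top-level "labels", "claims" and "sitelinks" maps) and assumes that "external identifiers" means statements whose main value has the "external-id" datatype. The preview_counts name is hypothetical and this is not the code in the patch.

```python
from typing import Dict


def preview_counts(entity: dict) -> Dict[str, int]:
    """Totals for a preview footer, computed from one entity's JSON."""
    claims = entity.get("claims", {})
    # "claims" maps a property id to the list of statements for that property.
    statements = [s for group in claims.values() for s in group]
    external_ids = [
        s for s in statements
        if s.get("mainsnak", {}).get("datatype") == "external-id"
    ]
    return {
        "statements": len(statements),  # includes external identifiers
        "labels": len(entity.get("labels", {})),
        "external_identifiers": len(external_ids),
        "sitelinks": len(entity.get("sitelinks", {})),
    }
```

Whether external identifiers should be counted inside the statement total or reported separately is a design question for the preview, not something this sketch settles.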

Can we provide Wikidata previews that include Wikidata label and description in the user's preferred language using RESTBase? If so, how?
Would this be a problem from a performance perspective?

Per https://phabricator.wikimedia.org/T188164#4324348 we'll use an Accept-Language header.

@ovasileva, can we close this spike with the above information? I've updated T111231 with what needs to be done to implement this.

ovasileva claimed this task.

Sounds good, resolving