
[Task] research how to surface usage tracking data for editors
Closed, ResolvedPublic

Description

We are tracking which data from Wikidata is used where. This is only available in a database table right now. We need to find ways to surface this information for editors and make it useful for them.

STORY1: As a Wikipedia editor, I want to know where information is transcluded from so I can check and edit the data. For this issue, that data may come from Wikidata.
STORY2: As a Wikidata editor, I want to split an item that is actually about two concepts. For this issue, this means that I want to check which (Wikipedia) pages transclude information from that item, so I can prevent wrong data and/or correct the affected (Wikipedia) pages to link to the right one of the two concepts.

Event Timeline

Lydia_Pintscher raised the priority of this task from to Medium.
Lydia_Pintscher updated the task description. (Show Details)
Jonas renamed this task from research how to surface usage tracking data for editors to [Task] research how to surface usage tracking data for editors. Aug 13 2015, 4:36 PM
Jonas set Security to None.

You could also think about some result filtering. For example, I would like to know which pages are using Wikidata, excluding all those authority control IDs. You could keep that in mind.
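
One way to picture such a filter: treat each usage record as a page plus the entity and property it draws on, and drop records whose property is on an exclusion list. This is only a sketch. The `UsageRecord` shape and the helper name are hypothetical, the property IDs are just two examples of authority control identifiers, and the tracking table may not record usage at this level of detail.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical usage record: one page using one entity (and property)."""
    page_title: str
    entity_id: str
    property_id: Optional[str]  # None when usage is not tied to a property


# Example exclusion list (assumption): P214 is VIAF ID, P227 is GND ID.
AUTHORITY_CONTROL_PROPERTIES: Set[str] = {"P214", "P227"}


def pages_excluding(records: Iterable[UsageRecord],
                    excluded_properties: Set[str]) -> List[str]:
    """Return titles of pages that have at least one usage record whose
    property is not in excluded_properties, in first-seen order."""
    seen = set()
    titles = []
    for record in records:
        if record.property_id in excluded_properties:
            continue  # skip usages we want to filter out
        if record.page_title not in seen:
            seen.add(record.page_title)
            titles.append(record.page_title)
    return titles
```

A page that only uses excluded properties drops out of the result entirely, which matches the "except all those authority control IDs" request.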

The Wikimedia-Hackathon-2016 starts tomorrow and this task is featured at T119703. We want to use T130776: Wikimedia Hackathon 2016 Opening Session to promote these projects and help recruit volunteers to work on them.

If this task is ripe for hackathon work, please follow these instructions. If it is not ready, remove it from T119703 in order to avoid volunteers' frustration. Thank you!

Change 299284 had a related patch set uploaded (by Ladsgroup):
Introduce prop=wbcentityusage in API to expose wbc_entity_usage

https://gerrit.wikimedia.org/r/299284

Lydia_Pintscher subscribed.

(Adding to next sprint to get review of Amir's patch.)

During today's sprint start there was quite some confusion because this ticket says it's about "research", but there is no discussion and no outcome of any research documented. We realize that the existing patch proposes a specific solution, but there are several problems with that:

  • During sprint start we realized we do not have consensus on whether the proposed solution is what we want to support.
  • We already spent a lot of time discussing the patch in Gerrit, which is the wrong place for such a discussion.
  • It's only a partial solution anyway, because it's only an API module, while this ticket talks about "surfacing to editors". That means there will be at least one other sub-ticket introducing a special page or something like that.

We decided to split the review of the existing patch to T143118: [Task] Introduce wbentityusage API, leaving this as an unresolved "research" task. We picked only the "research" task for the current sprint to find answers for the questions above.

I was hesitant to make sub-tasks, but you are definitely right. Here is what I've got so far from our discussion in Gerrit and with Lydia.

In client:

  • We should have two API modules, one as a prop and one as a list. The prop module takes a generator and returns the entities that have been used (T143118: [Task] Introduce wbentityusage API). The list module takes a list of entities and returns the pages that use these entities. (<put the phab number here>)
  • We should have a simple GUI in action=info (one or two rows in here, for example). I think @Jan_Dittrich is on it. (<put the phab number here>)
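
As a rough illustration of the two modules described above, the following Python sketch builds the query URLs a caller might use. The module name `wbentityusage` comes from T143118; the list module name, its entities parameter, and the endpoint URL are placeholders assumed for illustration, not something this discussion has settled.

```python
from urllib.parse import urlencode

# Assumed endpoint of a client wiki; any MediaWiki api.php URL has this shape.
API_URL = "https://en.wikipedia.org/w/api.php"

# "wbentityusage" is taken from T143118. The list module name and its
# entities parameter are hypothetical placeholders.
PROP_MODULE = "wbentityusage"
LIST_MODULE = "wblistentityusage"
LIST_ENTITIES_PARAM = "entities"


def entities_used_by_pages(titles):
    """Build a prop-style query URL: pages in, used entities out."""
    params = {
        "action": "query",
        "format": "json",
        "prop": PROP_MODULE,
        "titles": "|".join(titles),
    }
    return API_URL + "?" + urlencode(params)


def pages_using_entities(entity_ids):
    """Build a list-style query URL: entities in, pages using them out."""
    params = {
        "action": "query",
        "format": "json",
        "list": LIST_MODULE,
        LIST_ENTITIES_PARAM: "|".join(entity_ids),
    }
    return API_URL + "?" + urlencode(params)
```

The two functions mirror the two directions of lookup: the first serves the Wikipedia editor in STORY1, the second the Wikidata editor in STORY2.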

In Repo:

  • Since we don't have this data in the database, we can't do anything for now. We first need a table to store it; after that we can talk about how to use it.

Sounds good?

If you agree on the scheme, I'll make the Phab cards.

@Ladsgroup Thanks for structuring this!

I know the broad topic, but I would need to know a bit more about the assumed use cases to do a proper design (I basically only know what is in the brief description of this issue):

  • Who would be interested in the information?
  • In which situations would the information be interesting?
  • What are technical difficulties I need to know about?

I suppose it is easiest to do that with Lydia in person, but if you have more information you could already post, that would also be helpful.

Hey,
I guess Lydia would be a better person to answer those, but as a Wikipedian I think of cases where there is an incorrect datum in an article and I want to fix it but don't know where it comes from. So I need a list of the items used in each article in order to find and fix it.

Speaking of technical difficulties, I can think of lots of them, but it really depends on your design. May I suggest that you design it and/or build a mock, and we alter that design (with your approval) if a technical difficulty keeps us from implementing it?

We have discussed this some more in story time last week. Here is what we discussed:

  • We start by showing it in action=edit on the client.
  • We try to add it to the existing transclusion list there.
  • As a next step we can add it to the transclusion list at the bottom of action=edit
  • On the repository we can just give a list of all clients that make use of the item and a link to a special page on each client that gives the list of pages using that particular item.

Amir: Does that help you?

Correction: "We start by showing it in action=info", which is T143148: Put entity usage data in action=info. The code already written in T143118: [Task] Introduce wbentityusage API should be split and restructured into a class that can be used in all three possible places (action=info, below the edit window and in the proposed API module).

It sounds good. Should we get this patch merged and then I start on the refactor, or should we start splitting them now?

On second thought, I'm not sure it would be a good idea to make the API module use the class, since we only alter an existing query and don't define a new one. But for use in action=info and action=edit, one class is a good idea.

We picked this during today's sprint start. We agreed we want to merge the two patches (see T143118) first, and then split the code in later patches. @Ladsgroup, we picked T143148 for the current sprint and already moved it to "doing". Feel free to work on this and let us know what the state is. Thanks for your patience.

The action=info bit is done, and we also have an API module that exposes the usage information.

We still want to see it on action=edit, and we still want action=info on the repo. Do we have tickets for these? Once these tickets exist, this ticket can be closed, since the investigation is complete and documented.

Okay, I made two Phab cards, but one question: where in the repo are we going to do this?

On the repository we can just give a list of all clients that make use of the item and a link to a special page on each client that gives the list of pages using that particular item.

Special page? Entity page? action=info?

action=info for now. We add links there to a special page on all the clients that use the particular item. These special pages then give a list of articles on that particular client using that particular item. Does that make sense?
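
A minimal sketch of that link list, assuming a hypothetical special page name (`Special:EntityUsage`) and a hand-written mapping of client wikis to base URLs. Neither the page name nor the mapping is fixed by this discussion.

```python
from urllib.parse import quote

# Hypothetical name of the per-client special page; not decided in this task.
SPECIAL_PAGE = "Special:EntityUsage"


def usage_links(item_id, client_base_urls):
    """Map each client wiki to the URL of its usage list for item_id.

    client_base_urls is an assumed mapping such as
    {"enwiki": "https://en.wikipedia.org/"}.
    """
    links = {}
    for wiki, base_url in client_base_urls.items():
        # Keep ":" and "/" readable in the page title part of the URL.
        page = quote(f"{SPECIAL_PAGE}/{item_id}", safe=":/")
        links[wiki] = f"{base_url.rstrip('/')}/wiki/{page}"
    return links
```

On the repo, action=info would then only need the list of clients using the item; the per-page detail stays on each client.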

Ladsgroup moved this task from Doing to Review on the Wikidata-Sprint-2016-08-30 board.
Ladsgroup moved this task from Review to Done on the Wikidata-Sprint-2016-08-30 board.