Wikimedia Technical Conference 2018 Session - Integrating data into our products
Closed, Resolved; Public


Session Themes and Topics

  • Theme: Increasing our technical capabilities to achieve our strategy
  • Topic: Where do we need to integrate data?

Session Leader

  • Ramsey Isler


  • Joaquin Hernandez


Content is the key offering on Wikimedia projects, but it is also important to provide useful data about that content. Metadata, usage metrics, and content analysis are just a few areas where data can enhance our projects. This session will explore methods and motivations for using data of various types to expand and improve Wikimedia content and tools.

Questions to answer during this session

Each question is followed by its significance: why the question is important and what is blocked by it remaining unanswered.

  • Question: What are the use cases for structured data / metadata / semantic data on Wikipedia and other content wikis? How are these use cases served now? What data types need support for curation, and what does not? Specifically mention categories and infoboxes.
    Significance: While we know we want to use more structured data on our content wikis, we haven’t clarified where and how we want to enable this. Understanding these use cases and the needs for curation will help us design ways to include data.
  • Question: What type of semantic data can/do we want to attach to pages? What type of data do we need to attach to non-page entities like revisions, diffs, paragraphs, sentences, users, citations, etc.?
    Significance: Given the use cases above, it should be obvious that we need to attach data to certain types of entities within MediaWiki. While most data may need to be associated with a page, sometimes we need to attach data to a revision (JADE) or another type of entity.
  • Question: For which use cases should data be stored in a specific content wiki? For which use cases should data be stored on Wikidata and “imported” from there?
    Significance: Some data types may only be needed within a specific project, but others may be central and should be stored in Wikidata. Identifying the rules for how we choose will guide our architecture and provide a best practice for product owners/engineers.
  • Question: Is it necessary for this data to be curated separately on the Wikidata client wiki (like en.wp.o), or only within Wikidata, with affordances to curate Wikidata from within the client wiki? Do all Wikimedia wikis need the ability to consume and integrate data from Wikidata?
    Significance: When using data from Wikidata in other wikis, how should we support curation of that data? Do we build in a standard way to curate Wikidata from client wikis? Do we support some sort of “forking” of the data, and do we need to support upstreaming changes from the client wiki? These questions help us understand the needs of Data Federation.
  • Question: What other sorts of data would be useful for end users to have? Perhaps info about how many times their content has been seen (or reused in the case of multimedia)? What kinds of data might be useful for volunteer devs?
    Significance: We need to think beyond just structured/semantic data. What sorts of other data about our content and how it is used do we have? Is it in a usable format? If not, what would it take to make it so?

Facilitator and Scribe notes

Facilitator reminders

Session Structure

  • Define session scope, clarify desired outcomes, present agenda
  • Discuss Focus Areas
    • Discuss and Adjust. Note that we are not trying to come to a final agreement; we are just prioritizing and assigning responsibilities!
    • For each proposition [add etherpad link here]
      • Decide whether there is (mostly) agreement or disagreement on the proposition(s).
      • Decide whether there is more need for discussion on the topic, and how urgent or important that is.
      • Identify any open questions that need answering from others, and from whom (product, ops, etc.)
      • Decide who will drive the further discussion/decision process (e.g., with a four-month deadline)
  • Discuss additional strategy questions [add etherpad link here]. For each question:
    • Decide whether it is considered important.
    • Discuss who should answer it.
    • Decide who will follow up on it.
  • Wrap up

Session Leaders please:

  • Add more details to this task description.
  • Coordinate any pre-event discussions (here on Phab, IRC, email, hangout, etc).
  • Outline the plan for discussing this topic at the event.
  • Optionally, include what it will not try to solve.
  • Update this task with summaries of any pre-event discussions.
  • Include ways for people not attending to be involved in discussions before the event and afterwards.

Post-event Summary:

  • ...

Action items:

  • ...

Event Timeline

kchapman renamed this task from "Wikimedia Technical Conference 2018 Session - Where do we need to integrate data?" to "Wikimedia Technical Conference 2018 Session - Integrating data into our products". Oct 3 2018, 2:40 AM
debt updated the task description.
debt edited subscribers, added: Jhernandez; removed: Ramsey-WMF.

Hello! We are starting to ramp up on session creation for the 2019 Wikimedia Technical Conference. If there is no longer anything remaining to do here please close this task to avoid confusion.

@Ramsey-WMF: No reply hence resolving. If there is work left in this task, feel free to either set the status of this report back to "Open" via the Add Action...Change Status dropdown and associate an active project tag to this task, or create separate followup tasks. Thanks.