
[Research Area] Determine source(s) of truth for Toolhub data and requisite editing experiences
Closed, Resolved, Public

Description

Research Area: Data Storage

Data will be available through some combination of git (toolinfo files), on-wiki pages (other pieces of information), and Striker (tools hosted on Toolforge that were created through the Striker workflow). Given this, how do we unify the data in one place, split it strategically (e.g. keyword information lives only on-wiki), or at least make the editing experience seamless enough that it doesn't feel like you're editing data in several different places?

This problem is solved when:

  • We know, from talking to tool developers, whether they prefer writing their own toolinfo files, using the Striker interface, or neither
  • We know how to split the data between different sources according to a clear, defensible rationale
  • We know what the data editing experience will look like, including how it degrades gracefully depending on where the data is hosted

Event Timeline

Three types of data attributes:

  • Toolinfo – Basic information outlining the tool, what it is, and who made it. This is the baseline, "authoritative" information for the catalog.
    • Abstractly speaking, the tool creator has "authority" over this content, since they know what their own tool is. However, this does not have to mean exclusivity. We can design a system that technically allows anyone to edit, and leave the question of when an edit is appropriate to rules and discretion. This, I think, is a basic expectation that Wikimedians have. (It's certainly one I have as a Wikimedian.)
    • Conflicting with this, the current directory allows tool-related data to come from anywhere, including non-wiki sources. However, the majority of current toolinfo data is hosted either on Striker or on some kind of wiki page.
    • To balance these two things, we can make the primary source of truth a data store controlled by the catalog app, and design our workflows to prefer it over other sources. This makes it easier to put the data in the hands of the community.
    • However, toolinfo data hosted in git or at endpoints our community cannot edit will be treated as not user-editable, and the UI will reflect this. This should generally be acceptable, since it applies only to basic information such as name, author, description, and license.
  • Annotation layer – additional descriptive metadata.
    • This information will only be read out of the catalog app. It will not be read out of toolinfo. This is both to ensure community ownership of the data and to abstract away implementation details like Wikidata IDs and TWN message string identifiers.
  • Reviews and endorsements – these can essentially be thought of as annotations and stored accordingly.
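A minimal sketch of the split described above, assuming hypothetical names throughout (`Source`, `ToolRecord`, `PRECEDENCE`, `build_catalog_entry` are illustrative only, not part of any existing Toolhub or Striker code). It shows three points from the comment: the catalog-controlled store is preferred over other sources, records from git or other endpoints the community cannot edit come back read-only, and annotations are taken only from the catalog, never from toolinfo. The exact precedence order and which sources count as editable are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Hypothetical labels for where a toolinfo record is hosted."""
    CATALOG = "catalog"    # data store controlled by the catalog app
    WIKI = "wiki"          # toolinfo kept on a wiki page
    STRIKER = "striker"    # Toolforge tool created via the Striker workflow
    GIT = "git"            # toolinfo file committed to a git repository
    EXTERNAL = "external"  # any other endpoint our community cannot edit


# Assumption: which sources count as community-editable is a policy choice.
# This sketch treats only the catalog store and wiki pages as editable.
COMMUNITY_EDITABLE = frozenset({Source.CATALOG, Source.WIKI})

# Assumption: the catalog store wins; the order of the other sources is illustrative.
PRECEDENCE = [Source.CATALOG, Source.WIKI, Source.STRIKER, Source.GIT, Source.EXTERNAL]

# Basic toolinfo fields named in the comment above.
TOOLINFO_FIELDS = ("name", "author", "description", "license")


@dataclass
class ToolRecord:
    source: Source
    toolinfo: dict

    @property
    def editable(self) -> bool:
        # The UI would offer edit controls only when this is True.
        return self.source in COMMUNITY_EDITABLE


def pick_authoritative(records: list[ToolRecord]) -> ToolRecord:
    """Return the record from the most preferred source."""
    if not records:
        raise ValueError("no toolinfo records for this tool")
    return min(records, key=lambda r: PRECEDENCE.index(r.source))


def build_catalog_entry(records: list[ToolRecord], annotations: dict) -> dict:
    """Merge basic toolinfo from the preferred source with catalog-only annotations."""
    primary = pick_authoritative(records)
    # Copy only the basic toolinfo fields; anything else in the toolinfo
    # payload (e.g. keywords) is ignored, because annotations come only
    # from the catalog.
    entry = {key: primary.toolinfo.get(key) for key in TOOLINFO_FIELDS}
    entry["editable"] = primary.editable
    # Reviews and endorsements would be stored here too, like any other annotation.
    entry["annotations"] = dict(annotations)
    return entry
```

For example, a tool known only through a git-hosted toolinfo file would come back with `editable` set to False, and a `keywords` key in that file would not appear in the entry; keywords would have to be added as a catalog annotation instead.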
Harej renamed this task from [Research Area] Determine source(s) of truth for tool catalog data and requisite editing experiences to [Research Area] Determine source(s) of truth for Toolhub data and requisite editing experiences. (Mar 29 2018, 12:39 AM)