Allow multiple authors in toolinfo.json
Open, Stalled, Medium, Public

Description

The schema currently allows author to be just a string, but for tools with multiple authors it'd be great to be able to optionally provide an array of strings.

I think these strings could also be defined to be the same as the author field in extension.json, i.e. wikitext if people want to link elsewhere (although that's a separate issue to allow multiple).

(This is a follow-up to https://meta.wikimedia.org/wiki/Talk:Toolhub#Multiple_authors)

Event Timeline

> I think these strings could also be defined to be the same as the author field in extension.json, i.e. wikitext if people want to link elsewhere (although that's a separate issue to allow multiple).

Toolhub is not itself built using MediaWiki, so any wikitext processing we introduce in our UI would be a partial implementation at best. It would also have to be re-implemented by every API consumer which seems like a difficult burden to place on users. I do like the idea of allowing links to some external "profile" or contact location for an author, but I think that in the Toolhub data model this would be better represented with a structured "person" object of some sort.

For inspiration, https://getcomposer.org/doc/04-schema.md#authors allows specifying an author's name, email, homepage, and role (free form text I guess)

> Toolhub is not itself built using MediaWiki, so any wikitext processing we introduce in our UI would be a partial implementation at best. It would also have to be re-implemented by every API consumer which seems like a difficult burden to place on users. I do like the idea of allowing links to some external "profile" or contact location for an author, but I think that in the Toolhub data model this would be better represented with a structured "person" object of some sort.

Good points. Then I think @Legoktm's idea of matching the structure of Composer is a great one. Even if it was just name and homepage (I assume we wouldn't want to add email addresses, for privacy, and because there are other fields with details of how to contact maintainers).

But anyway, if there's a choice about prioritisation, then I think just allowing an array of strings would be great, and handling each string in the same way as the current author field.

@Samwilson thanks for raising this. I agree that an array of authors is preferable to a single string. However, I'm still pondering which use cases we would be solving for.

  1. Provide recognition to tool author(s)

I'd say we currently satisfy this as the usernames are provided on the author page.

  2. Allow end-users to contact tool author(s) for help/questions

As you mentioned, an email address is probably not ideal, but routing to a user's talk page could be an alternative? Likewise, if someone has a question related to the tool itself, maybe routing the user to the tool's talk page? I think we need to better understand the end user's intentions in order to provide the best route.

  3. View all tools for a given author

This would require us to explicitly define authors per tool so that we have the relation between the two. Again, I think we should find out who would find this feature useful before moving toward an array of authors.

Let me know if you think of any other use cases! I think given the impact @bd808 stated it would have on API consumers, I'd hold off on this for now unless we find evidence that this would provide more value than the effort to update our model.

sdkim triaged this task as Low priority. Nov 23 2021, 6:44 PM

Thanks for thinking about it thoroughly!

> 1. Provide recognition to tool author(s)
>
> I'd say we currently satisfy this as the usernames are provided on the author page.

I'm not sure I follow. Where is the author page? I think this is what first prompted me to want this feature, because at the moment the author string isn't linked. For example, the page for ia-upload lists me and Tpt, but there's no way to see that my name is not a username and his is (of course, I could change mine to be samwilson but it still wouldn't be unambiguous).

> 2. Allow end-users to contact tool author(s) for help/questions

I'd been thinking that this would be good, but maybe you're right and actually it's best to just have the bug tracker links etc. and not make it obvious that the authors can be hassled directly (e.g. via on-wiki email).

> 3. View all tools for a given author

This would make it easy for tool authors to link to a nice overview of their tools, from their wiki user pages etc.

I can see this isn't a simple change, so yeah totally agree that it isn't urgent. Maybe an easy workaround would be to document the form of author, which currently is just described as "Name of the tool developer", to make it clearer what should be done in the case of multiple authors.

> I'm not sure I follow. Where is the author page? I think this is what first prompted me to want this feature, because at the moment the author string isn't linked. For example, the page for ia-upload lists me and Tpt, but there's no way to see that my name is not a username and his is (of course, I could change mine to be samwilson but it still wouldn't be unambiguous).

Sorry, I meant to say tool info page. An author page currently does not exist. If I understand you correctly, the expected behavior you would want is to have samwilson link to either a toolhub author page or to your userpage onwiki?

> Maybe an easy workaround would be to document the form of author, which currently is just described as "Name of the tool developer", to make it clearer what should be done in the case of multiple authors.

Would you mind linking me to where this is? The add or remove tool form lists author, but maybe I'm missing something.

> If I understand you correctly, the expected behavior you would want is to have samwilson link to either a toolhub author page or to your userpage onwiki?

Yep, that's right. It could even link to Toolhub search results showing all tools for that author, if that's easier.

> Would you mind linking me to where this is?

Under the 'Adding tools to the catalog' section on meta:Toolhub it says to "Create a JSON file conforming to the schema" and links to the schema at https://meta.wikimedia.org/wiki/Toolhub/Data_model where it has author defined as both "Name of the tool developer" (in the summary section) and "The primary tool developer" (in the schema).

>> If I understand you correctly, the expected behavior you would want is to have samwilson link to either a toolhub author page or to your userpage onwiki?
>
> Yep, that's right. It could even link to Toolhub search results showing all tools for that author, if that's easier.

The technical backend for that exists (https://toolhub.wikimedia.org/search?author__term=Magnus+Manske), and T293194 ("Tags and author should link to filtered search queries") documents the idea of connecting the displayed author information to search.
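
As a rough illustration of how a displayed author name could be turned into such a filtered search link, here is a minimal Python sketch. The base URL and the author__term query parameter are taken from the example link above; the function name author_search_url is hypothetical and is not part of Toolhub.

```python
from urllib.parse import urlencode

# Base URL and parameter name come from the example link in this comment.
TOOLHUB_SEARCH_URL = "https://toolhub.wikimedia.org/search"


def author_search_url(author_name: str) -> str:
    """Return a Toolhub search URL filtered to a single author name.

    Hypothetical helper for illustration only.
    """
    # urlencode escapes the value and encodes spaces as "+".
    query = urlencode({"author__term": author_name})
    return f"{TOOLHUB_SEARCH_URL}?{query}"


if __name__ == "__main__":
    print(author_search_url("Magnus Manske"))
```

Building the query with urlencode rather than string concatenation keeps names containing spaces or non-ASCII characters safe to embed in the link.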

I'm going to work on a proposal for an alternate method of richly describing multiple humans involved in creating and maintaining a tool and their role in that particular tool. I am currently imagining a new JSON object to describe a human similar to Composer's authors collection or PECL's <lead>, <developer>, <contributor>, and <helper> roles. The majority use case in Toolhub is expected to be describing a Wikimedian, but we should also allow for describing humans who have authored a generic tool such as matterbridge which is used to power a tool like bridgebot.

bd808 changed the task status from Open to In Progress. Thu, Jan 13, 6:29 PM
bd808 raised the priority of this task from Low to Medium.

Here's the jsonschema for a new "person" object:

person:
  type: object
  properties:
    name:
      type: string
      maxLength: 255
      description: The full/formatted name of the person.
    wiki_username:
      type: string
      maxLength: 255
      description: The person's Wikimedia username.
    developer_username:
      type: string
      maxLength: 255
      description: The person's Wikimedia Developer account username.
    email:
      type: string
      maxLength: 255
      format: email
      description: Email address
    url:
      $ref: "#/definitions/url"
      description: Home page or other URL representing the person.
  required:
    - name

An associated change to the "author" property:

author:
  oneOf:
    - type: string
      maxLength: 255
    - type: array
      items:
        $ref: "#/definitions/person"
  description: The primary tool developers.

The oneOf here allows the crawler to continue to process the legacy single string implementation of authors. On the backend when it receives a single string it will split the string on commas and create a minimal {name: <value>} Person record for each found name. The array of Person objects variant becomes the canonical storage and API response form for the Toolhub backend.
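
The legacy-string handling described above could look roughly like the following Python sketch. This is not the actual Toolhub backend code: the function name normalize_author and the type aliases are hypothetical, and trimming whitespace and dropping empty segments are assumptions that the description does not spell out.

```python
from typing import Dict, List, Union

# Hypothetical aliases: a Person is a mapping with at least a "name" key.
Person = Dict[str, str]
AuthorField = Union[str, List[Person]]


def normalize_author(author: AuthorField) -> List[Person]:
    """Convert either accepted form of "author" into a list of Person records."""
    if isinstance(author, str):
        # Legacy form: split on commas, one minimal {name: <value>} per name.
        # Assumption: surrounding whitespace is trimmed and empty parts skipped.
        names = (part.strip() for part in author.split(","))
        return [{"name": name} for name in names if name]
    # Array form: already a list of Person objects; return shallow copies.
    return [dict(person) for person in author]


if __name__ == "__main__":
    print(normalize_author("Sam Wilson, Tpt"))
    print(normalize_author([{"name": "Tpt", "wiki_username": "Tpt"}]))
```

One consequence of splitting on commas is that a legacy string containing a comma inside a single name (for example "Family, Given") would become two Person records, so existing records written that way might need manual review.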

I have local patches implementing the jsonschema change, the backend changes, and a minimally working update to the frontend for creating/editing toolinfo records. The frontend solution is currently minimal in that it does not yet implement the full range of properties for a Person or allow adding more than one Person as an author. I am going to hold the changes locally until I can put more effort into the frontend, or determine that I can hand that implementation off to someone else.

bd808 changed the task status from In Progress to Stalled. Wed, Jan 19, 6:28 PM