
Create an authoritative and well promoted catalog of Wikimedia tools
Open, Medium, Public

Event Timeline

Abbe98 added a subscriber: Abbe98. Feb 6 2017, 11:40 AM
zhuyifei1999 updated the task description. Feb 12 2017, 7:35 PM

I have planned work for Striker that is related (T149458: Manage shared tool accounts via Striker). From that task:

The collected metadata should be stored somewhere that is accessible outside of Striker to allow others to experiment with different interfaces for searching. Ideally this exposed storage would be in Elasticsearch which provides a lot of features for filtering and other aggregate operations.

Basically I would like to allow controlled collaborative editing of meta data about tools hosted in Toolforge that is a superset of the existing toolinfo.json standard developed by @Husky. I would then like this data set to be published in the Tool Labs Elasticsearch service so that Striker or any other tool could use the power of Elasticsearch to provide a nice interface for finding tools.

Tgr added a comment. Feb 15 2017, 3:44 AM

Ideally such a catalog would include all tools (users don't care that much about Tool Labs vs Labs vs gadgets vs somebody's personal server; and it would be great to have a library/bot catalog as well, although the audience is different for that). Maybe it could be done packagist-style, where you have to expose a JSON file but it can be at any URL, which you can just register in some web interface.

Ideally such a catalog would include all tools (users don't care that much about Tool Labs vs Labs vs gadgets vs somebody's personal server; and it would be great to have a library/bot catalog as well, although the audience is different for that).

I do understand that tool labs isn't everything; that is why I didn't say that Striker would be the complete solution to this.

Maybe it could be done packagist-style, where you have to expose a json file but it can be at any URL which you can just register in some web interface.

Hay's directory already works like that, but in my mind it has a major flaw in that there is no collaborative editing of json files at random URLs. I can't easily add a new tag or fix the description of a published tool.

This class of problem is the sort of thing that Semantic MediaWiki, Wikibase, and Cargo seek to solve. All of those are potential solutions, but not likely to be quickly adopted on the Wikimedia production cluster.

Tgr added a comment. Feb 15 2017, 5:21 AM

This class of problem is the sort of thing that Semantic MediaWiki, Wikibase, and Cargo seek to solve. All of those are potential solutions, but not likely to be quickly adopted on the Wikimedia production cluster.

I wonder about that. I would like to push T155024: Store structured data needed for MediaWiki documentation as an RfC now that T155029: MediaWiki.org: Generate infoboxes from extension.json in git finished #11 on the wishlist. Daniel for example seemed convinced that installing Wikibase for mediawiki.org is the way to go and it could be done quickly (once the ability to use remote properties, being developed now for Commons, is in place).

"being developed now for Commons" with an expected first ship date about a year from today based on the timelines I've seen.

Tgr added a comment. Feb 15 2017, 6:06 AM

Sure but federation is not the main blocker for that (MCR is).

Tgr added a comment. Feb 15 2017, 9:43 AM

This was the fourth most popular item in the Developer Wishlist results. @bd808, is this something you plan to do as part of the Striker project, or are you planning to do a part of this task? In the latter case, could it be broken up into subtasks so it is clear which tasks are still looking for an owner?

I'm really concerned that Hay hasn't commented on this task.

The tool produced is awesome for several reasons:

  • it supports a good metadata format
  • it allows integrating Tool Labs, other Labs, or external tools, and so can be a more comprehensive catalogue
  • the UI is efficient and works, both for the search and to present data

We need to reach Hay and see whether there is some possibility of having them contribute.

Husky added a comment. Edited Feb 15 2017, 5:35 PM

Hey,
thanks @Dereckson for pointing me towards this discussion, didn't even know there was this discussion on upgrading the tool directory.

When I designed the tool directory I made a couple of design decisions that contributed to the relatively high usage and quality of the directory (there are almost 400 tools indexed now). The most important one is coupling the metadata of the tool to the source/repository of the tool itself. This is very much inspired by how package managers like npm and packagist work. In my experience, other approaches like editing data on a wiki page or in another silo simply don't work, because that is completely separate from the act of editing the source code of the app. Editing the toolinfo.json is far less cumbersome because it's located right next to all your other source code. A key to a successful tool directory is making the barrier to entry as low as possible.
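To make the "metadata next to the source" idea concrete, here is a minimal sketch of how an aggregator might read a toolinfo.json document published alongside a tool. The field names in `REQUIRED_FIELDS` and the example record are assumptions chosen for illustration, not a statement of the actual toolinfo.json standard, and `load_toolinfo` is a hypothetical helper.

```python
import json

# Assumed minimal field set for this sketch; the real standard may differ.
REQUIRED_FIELDS = ("name", "title", "description", "url")


def load_toolinfo(text):
    """Parse a toolinfo.json document and return the records that look valid.

    A document is assumed to hold either one record (a JSON object)
    or several records (a JSON array of objects).
    """
    data = json.loads(text)
    records = data if isinstance(data, list) else [data]
    valid = []
    for record in records:
        # Keep only objects in which every required field is present and non-empty.
        if isinstance(record, dict) and all(record.get(f) for f in REQUIRED_FIELDS):
            valid.append(record)
    return valid


# Hypothetical record, as a tool author might keep it in their repository.
EXAMPLE = """
{
  "name": "example-tool",
  "title": "Example Tool",
  "description": "A hypothetical tool used only for this sketch.",
  "url": "https://example.org/example-tool",
  "keywords": "example, sketch"
}
"""

RECORDS = load_toolinfo(EXAMPLE)
```

The point of the design is that the author edits one small file in the same commit as the code, and the directory only has to fetch and validate it.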

Considering @scfc's comment: I very much agree that you should use all possible methods to advertise your tool; the tool directory is just another way to make your work known to the world. The fact that this Phabricator issue exists, and all the mail I get whenever the directory doesn't work, proves it's useful to at least some people. Note what Magnus writes at the top of his own toolpage.

Considering @bd808's comment: of course it would be nice to give the directory an upgrade with fancy stuff like Elasticsearch, Striker and such, but really, what are we talking about here? It's just 400 tools, and history has proven that this number won't grow exponentially. I would be wary of designing a new version that's overly complicated for such a small dataset. And like I said above, integrating the metadata right there with the source code, combined with easy publishing of the toolinfo.json file, is very much the reason why the tool directory is such a success.

For me the way forward would simply be to maybe rewrite the frontend a bit (I would probably use Vue.js instead of the current jQuery codebase) and maybe to find a way to directly put it on the homepage of Tool Labs?

Unfortunately I don't have the time to contribute much to a rewrite / upgrade of the existing solution, and I won't be attending the Vienna hackathon.

It's just 400 tools, and history has proven that this number won't grow exponentially.

There are currently 1641 shared tool accounts on Tool Labs alone. As @Tgr points out in T115650#3028309, there are a lot of other things besides those that live in Tool Labs. I really don't think it is a stretch to think that the current list of 400 tools that people have submitted via toolinfo.json files represents a very small fraction of all tools that an "authoritative" catalog would contain, especially if gadgets, user scripts, and other types of solutions are included. Will it grow to 4,000,000? Probably not. Will it grow to 4,000? That would not surprise me.


Any talk of the specific technical solutions needed is probably premature. I apologize for leading the discussion in that direction myself a few posts above. In my experience these discussions quickly devolve into experienced insiders debating the relative merits of existing solutions instead of discussing who the audience is and what their needs are. See past threads for examples:

The real place to start if we want a different outcome is with someone taking on the role of Product Owner and doing the initial work of describing the general problems that should be addressed and how solving those problems will result in a net positive change for the Wikimedia movement. We need to know what problem we are trying to solve and who we are trying to solve it for before deciding on the technical details of how to actually produce the desired result.

Tgr added a comment. Feb 15 2017, 11:37 PM

Quite a few people voted on this task, so surely they wanted something more than a facelift. It would be nice to hear what their problems were. Is the existing catalog not well-known / well-promoted? Does it have too little information about its entries? Does it not include many tools people are looking for? Those would be three fairly different directions to start in.

IMO there is a lot of data that needs to be managed centrally and not in the json files. For example an "abandoned" flag would be fairly pointless if the author has to set it. Ranking would improve the directory a lot. Tagging is also better done centrally so that a standard vocabulary can emerge.
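As a rough illustration of that split, the sketch below merges an author-maintained record with centrally managed annotations. The field names (`abandoned`, `rank`, `tags`) and the `merge_record` helper are hypothetical; this is one possible arrangement under those assumptions, not a description of any existing system.

```python
# Fields that only the central catalog may set (hypothetical names).
CENTRAL_FIELDS = {"abandoned", "rank", "tags"}


def merge_record(author_record, central_annotations):
    """Combine an author-maintained record with central annotations."""
    merged = dict(author_record)
    # Authors cannot set centrally managed fields themselves.
    for field in CENTRAL_FIELDS:
        merged.pop(field, None)
    # The central catalog may only contribute its own fields.
    for field, value in central_annotations.items():
        if field in CENTRAL_FIELDS:
            merged[field] = value
    return merged


AUTHOR_RECORD = {
    "name": "example-tool",
    "description": "A hypothetical tool.",
    "abandoned": False,  # ignored: not the author's call in this sketch
}
CENTRAL = {
    "abandoned": True,
    "tags": ["category"],
    "description": "ignored: descriptions stay with the author here",
}
MERGED = merge_record(AUTHOR_RECORD, CENTRAL)
```

Whether descriptions should also be centrally editable is a separate policy question; the sketch keeps them with the author only to show the mechanism.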

Abit added a comment. Feb 22 2017, 11:35 PM

I can share what I've heard from the mostly-non-technical program organizers I work with about the problems that should be addressed.

Is the existing catalog not well-known / well-promoted?

Yes. First, which existing catalog? The list of tools on tool labs, Hay's tool directory, or Magnus' remix of Hay's directory? From what I know, it seems Hay's is most well known, the remix seems best liked, but most people are not aware of any of them. It hurts to see people manually count/collect/track things when tools to do so already exist because they are unaware of them.

Does it have too little information about its entries?

Yes. I think part of the problem is that many tools have unanticipated uses by users unknown to the tool creator. For example, GLAMify is useful for GLAM campaigns, for follow ups to photo competitions such as Wiki Loves Earth, and for article improvement competitions. Wikimetrics is useful for counting number of pages created at an event but also survival of a cohort of editors.
Another part is that one person may say a tool is good for judging writing competitions and another may be looking for tools good for scoring editing contests, and the description and the search will pass like ships in the night.

Does it not include many tools people are looking for?

Yes. There are plenty of useful tools that aren't included on any of the existing lists. Some of these are the tools people are looking for, but I suspect there are more that are useful that people are not looking for, since the people do not know those tools exist.

Ricordisamoa updated the task description. Feb 23 2017, 1:45 AM
Ricordisamoa updated the task description.

I would like:

  • A way to add keywords, and improve the descriptions, for each tool. [problem#1]
  • Either a semi-formal ontology, or at least some sort of guidance/suggestions with keywords, to reduce some of the problems like plurals and synonyms. [problem#1]
  • A way to add tools myself, without having to sign-up for x different repositories/trackers.
  • A way to add/show screenshots, for us visual-thinkers, who don't remember what the tool was called, but do remember what it looks like. [Perhaps, via Extension:PageImages accessing any existing screenshots (which we'd encourage/add) in the primary documentation pages at mediawiki/wikitech?]
  • The ability to add tools that are not hosted on our infrastructure. But clearly marking them as "off-wikimedia". [to avoid privacy confusions]
  • The ability to add links to documentation about historic/defunct tools, so that people can at least learn about them. But clearly marking them as "not available". [cf. some great items in Atlasowa's page, linked in the task description.]
  • A link pointing to the sourcecode for each, and perhaps something indicating what license it is under.
  • A link pointing to a feedback page for each, to encourage wikilove from users.
  • A link to where we (especially non-developers) can help with UI translation for each, if the tool is configured for that.
  • A pony. [tradition]

_
If we include Gadgets, it gets more complicated, because many are inherently limited to particular wikis (some by design, some by accident). Plus Gadgets as a whole will hopefully be further complexified soon, with "Global Gadgets". (Plus I'm still rooting for my proposed design upgrade.) But it would be nice to have them searchable, too.

_
n.b. I thought this page [that I stumbled upon] was starting to work out some of that, but I haven't dared experiment with it yet... https://wikitech.wikimedia.org/wiki/User:Magnus_Manske/hay_directory - or maybe that's for the remix directory?

_
problem#1:

  • e.g.1. From a search, there are 39 tools in Hay's directory that have "category", but only 22 have "categories" - if I type "categor" I get the union, 44.
  • e.g.2. Similar problems exist for "image", "images", "file", "files", "media", "multimedia", etc.
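One way to picture the guidance asked for under problem#1 is a small canonical-keyword map applied both when indexing and when searching. The sketch below assumes keywords are available as lists of strings and uses a hand-written `CANONICAL` table; the table contents, the tool names, and both helper functions are hypothetical. Whether "file", "image", and "media" should be merged is an editorial decision that the sketch deliberately leaves open.

```python
# Hypothetical mapping from variant spellings to one canonical keyword.
CANONICAL = {
    "categories": "category",
    "images": "image",
    "files": "file",
}


def normalize_keyword(keyword):
    """Lowercase, trim, and map a keyword to its canonical form."""
    word = keyword.strip().lower()
    return CANONICAL.get(word, word)


def search(tools, query):
    """Return names of tools whose normalized keywords include the query."""
    wanted = normalize_keyword(query)
    return [
        tool["name"]
        for tool in tools
        if wanted in {normalize_keyword(k) for k in tool["keywords"]}
    ]


# Made-up records for illustration only.
TOOLS = [
    {"name": "tool-a", "keywords": ["category", "Images"]},
    {"name": "tool-b", "keywords": ["categories"]},
    {"name": "tool-c", "keywords": ["files"]},
]
```

With this in place, searching for "category" or "categories" returns the same set of tools, which is the behaviour the partial-word search "categor" currently approximates.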

This class of problem is the sort of thing that Semantic MediaWiki, Wikibase, and Cargo seek to solve. All of those are potential solutions, but not likely to be quickly adopted on the Wikimedia production cluster.

But SMW is already installed on Wikitech; isn't that where tools are meant to be documented?

If the idea is that every tool (whether hosted on Tool Labs, or elsewhere) has a page on Wikitech (which seems a sensible idea to me), then would it not be reasonably straightforward to just use Semantic MediaWiki and PageForms to build a system of browsing and editing tools' metadata? It seems to me that the building blocks for such a thing are already in place.

For reference, T122865: Create a wiki documentation page for each tool is the ticket regarding wikitech pages for tools.

This class of problem is the sort of thing that Semantic MediaWiki, Wikibase, and Cargo seek to solve. All of those are potential solutions, but not likely to be quickly adopted on the Wikimedia production cluster.

But SMW is already installed on Wikitech; isn't that where tools are meant to be documented?

T53642: Get rid of SemanticMediaWiki/SRF/SF from wikitech.wikimedia.org is something that we have been working towards over the last 6-9 months and actually expect to complete in the coming fiscal year. One reason for this is mentioned in T62886#2909198: modern SMW versions are dependent on Composer in a way that is not trivial to deploy on the Wikimedia production cluster.

Qgil added a comment. Edited Aug 30 2017, 6:49 AM

I wonder whether the progress on Toolsadmin now supports creating and documenting tools and https://toolsadmin.wikimedia.org/tools/ are steps in the direction of "an authoritative and well promoted catalog of Wikimedia tools" or a separate development.

I wonder whether the progress on Toolsadmin now supports creating and documenting tools and https://toolsadmin.wikimedia.org/tools/ are steps in the direction of "an authoritative and well promoted catalog of Wikimedia tools" or a separate development.

They are at least steps towards making it easier for things on Toolforge to be documented and for that documentation to be both collaboratively edited and shared. The sharing is currently done with https://toolsadmin.wikimedia.org/tools/toolinfo/v1/toolinfo.json, which is only a subset of the data that Striker can collect now, but a good start.

I volunteer to help with any kind of content needs for this project, whether it's creating descriptions, keywords, a taxonomy, etc.

I worked on this at my previous job, where we used .yml files for each repo to keep track of information:

https://github.com/openopps/openopps-platform/blob/dev/.about.yml - example
Example of tool directory: http://brigade.codeforamerica.org/brigade/projects
Example of tool directory: https://18f.gsa.gov/what-we-deliver/

Cool directories for external collections that @MelodyKramer showed me:

The interesting thing about both of these is that they appear to be curated, in that there is a use-case-driven organization, which is a bit different from the freeform tags of toolinfo.json.

Qgil added a comment. Sep 12 2017, 8:33 AM

In relation to T158149: Find an owner for top 10 Developer Wishlist 2017 proposals, I dare to ask: what is the current status? :)

I would be happy to assist in some attempt at this problem, but I do not have the free time to actually commit to doing anything close to the majority of the work.


This wish is still missing a reasonable description of what use cases need to be solved and in what order. As it stands now there is a very large problem space that has a few partial solutions, but no clear description of what would be better. That in my mind is the first part of the problem that needs to be tackled.

  • Hay's Directory solves a problem: it allows the developer of a tool to maintain a standardized collection of metadata that can be aggregated by a central system and displayed to others.
  • The toolinfo system in toolsadmin (Striker) solves a related problem: it allows the technical community to collaborate on adding tags and updating the description of a toolinfo.json record for a given tool hosted on Toolforge if the tool's maintainers have made an initial record.
  • Magnus' hay directory user page solves a similar problem to the one solved by toolsadmin: it allows the technical community (users with accounts on Wikitech) to add and edit toolinfo.json metadata in a central wiki page. This allows collaborative editing, but with a user interface that is somewhat lacking.
  • Neither toolsadmin nor Magnus' page help typical Wikimedia users who do not have a technical contributor account participate in creating or curating toolinfo records.
  • None of these solutions enforce a common taxonomy for tools.
  • None of these solutions are particularly good at answering human questions like "How can I do X?" or "What is the most powerful Y?"

Many good points have been made thus far about why a better system would be nice. There are also some well-informed opinions here about how making Yet Another Thing to solve the problem is likely to fail without a larger social component. This does not seem to me like the kind of problem that can be solved purely with software. It's just as much a culture and time problem. Tool creators/maintainers need to want to advertise. Tool users need to want to find new/different solutions for their workflows. Everyone needs to want to keep the information up to date.

I recently listed and researched GLAM-, Commons- and Wikidata-oriented tools in the context of SDC General (see T180197: [Epic] Support needed changes to volunteer tools for Wikimedia Commons and Wikidata that will benefit from operating with structured data on Commons). FWIW, I also categorized these myself in order to be able to group them better by functionality and their place in general workflows on Commons and Wikidata. Categories I outlined are:

  • get source media / metadata
  • source data cleaning
  • matching with Wikidata
  • media upload
  • data upload
  • "enhance - categorization"
  • admin / moderation
  • curation / organization
  • bulk / quick editing
  • generate attribution
  • statistics (Commons)
  • statistics (Wikidata)
  • reuse / visualization
  • search

You can explore the entire tool spreadsheet here; might be helpful. https://docs.google.com/spreadsheets/d/1GVR0jghBWuAGqJaT7KVXigMYWWNzdnrnwI9nWqfJrCo/edit#gid=0

You can explore the entire tool spreadsheet here; might be helpful. https://docs.google.com/spreadsheets/d/1GVR0jghBWuAGqJaT7KVXigMYWWNzdnrnwI9nWqfJrCo/edit#gid=0

@SandraF_WMF , the spreadsheet is not "shared". I'm assuming you meant to turn on public visibility for that spreadsheet.

You can explore the entire tool spreadsheet here; might be helpful. https://docs.google.com/spreadsheets/d/1GVR0jghBWuAGqJaT7KVXigMYWWNzdnrnwI9nWqfJrCo/edit#gid=0

@SandraF_WMF , the spreadsheet is not "shared". I'm assuming you meant to turn on public visibility for that spreadsheet.

Thanks for the heads up! I have made the spreadsheet publicly accessible now.

chasemp added a subscriber: chasemp. Jan 2 2018, 3:31 PM

Thanks for the heads up! I have made the spreadsheet publicly accessible now.

Really nice, thanks for sharing.

@Abit You may want to share the Sheet you created a few years back!

Abit added a comment. Jan 3 2018, 12:43 AM

You may want to share the Sheet you created a few years back!

How have I not shared that here yet? This list is incomplete and mostly out of date, but I was interested in tools that program organizers used to manage, track, and measure their programs. It is here: https://docs.google.com/spreadsheets/d/1iUvZZStf8k6RYdJYl2DLIxRdtCgWSYzEay5cCQg8MyE/edit?usp=sharing

Harej added a subscriber: Harej. Jan 4 2018, 11:51 PM

Ha, came here to add James :)

D3r1ck01 added a subscriber: D3r1ck01.
D3r1ck01 updated the task description.
D3r1ck01 edited projects, added Cloud-Services; removed Tools.

How the heck did cloud service remove itself by me trying to improve on the text in the task? :(. Adding it back.

Hmmm... This is weird, now the "Tools" project tag has been removed :(, why is this happening? Is it that the "Tools" tag and the "Cloud-Services" tag can't be on the same ticket? I didn't deliberately remove these tags, I just tried editing the task to improve it and then they got removed on their own? :(

<threadjack>

How the heck did cloud service remove itself by me trying to improve on the text in the task? :(. Adding it back.

This is a Phabricator "feature" that is not obvious at all, but makes some sense once it is explained. The Cloud-Services project is an umbrella project with things like Toolforge, Cloud-VPS, and Tools as sub projects. This nesting can go down multiple additional levels (e.g. Cloud-VPS (Quota-requests) is a child of the Cloud-VPS project). When a child project is on a task it shows up in the search results (and workboards if the child project is also a milestone project) for all of the parent and grandparent projects as well. Phabricator only shows the most deeply nested child project on the task itself.
</threadjack>
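
The display rule described in the threadjack can be sketched as follows. This is only a model of the behaviour as explained above, not Phabricator's implementation; the `PARENT` table mirrors the projects named in the comment, and `ancestors` and `visible_tags` are hypothetical helpers.

```python
# Child project -> parent project, as described in the comment above.
PARENT = {
    "Toolforge": "Cloud-Services",
    "Cloud-VPS": "Cloud-Services",
    "Tools": "Cloud-Services",
    "Cloud-VPS (Quota-requests)": "Cloud-VPS",
}


def ancestors(project):
    """Return the parent, grandparent, and so on of a project, nearest first."""
    result = []
    while project in PARENT:
        project = PARENT[project]
        result.append(project)
    return result


def visible_tags(tags):
    """Hide any tag that is an ancestor of another tag on the same task."""
    hidden = {a for tag in tags for a in ancestors(tag)}
    return [tag for tag in tags if tag not in hidden]
```

Under this model a task tagged with both Cloud-Services and Tools displays only Tools, while still appearing in searches for Cloud-Services through the ancestor chain.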

Harej claimed this task. Feb 3 2018, 3:23 AM
Harej edited projects, added Toolhub; removed Cloud-Services. Feb 3 2018, 3:42 AM
Harej raised the priority of this task from Low to Medium. Feb 7 2018, 3:18 AM
Harej added a comment. May 19 2018, 8:42 AM

Hey everyone, there's a page on Meta about Toolhub, https://meta.wikimedia.org/wiki/Toolhub

Of note, we have published a data model here: https://meta.wikimedia.org/wiki/Toolhub/Data_model. The data model is the list of different ways to describe each tool. Can you think of more ways tools can be described? What pieces of information help let you know that you've found the tool you're looking for?

Hey everyone, there's a page on Meta about Toolhub, https://meta.wikimedia.org/wiki/Toolhub
Of note, we have published a data model here: https://meta.wikimedia.org/wiki/Toolhub/Data_model. The data model is the list of different ways to describe each tool. Can you think of more ways tools can be described? What pieces of information help let you know that you've found the tool you're looking for?

Do you have a "Collect feedback around the data model page" kind of task? That'd be easier to point people to, add to Tech News, etc. Thanks.

T186382 I think would be the most relevant task.

Harej moved this task from Backlog to Radar on the Toolhub board. Jun 20 2018, 3:49 AM
Magol added a subscriber: Magol. Dec 4 2018, 5:43 PM
Quiddity updated the task description. Jan 16 2019, 8:39 AM
Dvorapa added a comment. Edited Mar 22 2019, 9:05 PM

What's up? (@Harej)

What's up? (@Harej)

This project stalled out due to lack of software engineers to work on implementation. The Cloud Services team asked for Software Engineers in both the fiscal year 2017-2018 and 2018-2019 Wikimedia Foundation annual planning cycles, but did not win the "requisition number lottery" either time. I will be asking again in the 2019-2020 annual planning process for new staff to help build projects like this one and to take care of existing Cloud Services software projects that also have no assigned staff like Quarry and PAWS. Maybe the 3rd time I will find a way to be more persuasive. :)

I see, I wish you luck then :)

Harej moved this task from Inbox to Paused on the User-Harej board. May 1 2019, 5:20 PM
Quiddity removed Harej as the assignee of this task. Oct 13 2019, 9:17 PM
Quiddity removed a subscriber: MelodyKramer.