
Investigate options to mark file imports with their source wiki, make searchable
Closed, Resolved | Public | 5 Estimated Story Points

Description

Make it possible to mop up file imports, by adding some type of marker to imported files. This should be joinable with other maintenance criteria (e.g. copyright templates), and specific to each source wiki.

Possible ideas:

Event Timeline

awight renamed this task from Mark file imports with their source wiki to Mark file imports with their source wiki, make searchable. Sep 10 2019, 9:57 PM
awight updated the task description.

We should rule out tags, because they aren't currently created dynamically. There are only a handful of tags, and we would be diluting that system if we added a new tag for each source wiki.

Lena_WMDE renamed this task from Mark file imports with their source wiki, make searchable to Investigate options to mark file imports with their source wiki, make searchable. May 13 2020, 11:45 AM
Lena_WMDE set the point value for this task to 5.

Investigation:

  • Understanding the workflow of users
  • Understanding what tools they use to search
  • Understanding the opportunities/limitations of the options listed

General considerations:

Tags

When looking at Special:Tags, you realize the number of tags is limited. It might be a bad idea to automatically create possibly hundreds of tags, one for each source wiki.

It might be possible to hide auto-generated tags, while still having them in the database. But this would greatly reduce their usefulness as a filter in tools.

Templates

It's possible to search for pages that contain a template, e.g. via hastemplate: or Special:WhatLinksHere. However, as of now there is only 1 template. Filtering per template parameter is possible, but hard to use, and hard or even impossible to combine with other tools.

Exactly the same is already possible without a template, utilizing the HTML comment FileImporter adds.

Introducing individual templates per source makes this significantly easier, but still can't be easily combined with most tools.

Note that a template typically needs to be placed at a specific position on the page. Making that position configurable is hard to implement.

Categories

Note that the software allows adding red categories (ones that don't exist) and still using them as a filter. This might be a workaround for the current, inconsistent naming.

Unfortunately categories have pretty much the same issues as templates, and can't be easily combined with other tools.

However, in contrast to templates it doesn't matter where categories are placed in the wikitext.

What review tools are used?
  • Special:RecentChanges
    • Can list "upload log" entries as well.
    • Allows filtering by File: namespace.
    • Allows filtering by user experience (e.g. "newbies") as well as "IP users" (anonymous users).
    • Allows filtering by tag.
    • It's already possible to filter by the existing "fileimporter" tag.
    • The source wiki is visible in the comment, but there is no filter for this.
    • No way to search for templates or categories.
  • Special:RecentChangesLinked
    • Same as above, but allows filtering by template or category.
    • Allows filtering by non-existing (red) categories and templates.
    • Does not allow filtering by template parameter.
  • Special:Log
    • Can show only the "import log", specifically "transwiki imports", which is exactly what FileImporter does.
    • Filter by tag.
    • As above, the comment mentioning the source wiki is visible, but can't be used as a filter.
  • Special:NewFiles
    • Typically only used to review uploads by a specific user.
    • No tags, no templates, no categories.
  • Special:NewPages
    • Filter by namespace and tag. Nothing else we can use.
  • Special:Contributions
    • Requires a user name.
    • Filter by namespace and tag.
  • Special:ListFiles
    • No filter we can use.
  • Special:WhatLinksHere, Special:UnusedFiles, Special:UncategorizedFiles, …
    • Not useful.
  • Special:AllPages, Special:PrefixIndex, …
    • Filter by namespace. Otherwise not useful.
  • https://tools.wmflabs.org/newbie-uploads/
    • Provides a few filters that look like they are based on categories. But very limited.
    • No tags, no templates, no custom categories.
  • Other
    • Tools have easy access to tags, categories, and whether a page contains a template.
    • Tools typically don't have easy access to template parameters. This requires parsing the wikitext.
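To illustrate what "easy access to tags" could look like for an external tool, here is a minimal sketch that builds an Action API request for recent changes in the File: namespace carrying the existing "fileimporter" tag, then narrows the results by source wiki on the client side (since, as noted above, the comment cannot be used as a server-side filter). The helper names and the sample comment texts are assumptions made up for illustration; the real comment format written by FileImporter may differ.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def tagged_changes_url(tag="fileimporter", limit=50):
    """Build an API URL listing recent changes in File: (namespace 6) with a tag."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctag": tag,
        "rcnamespace": 6,
        "rcprop": "title|timestamp|comment|tags",
        "rclimit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def from_source(changes, source_host):
    """Keep only entries whose edit comment mentions the source wiki.

    This is a plain substring match done client-side; it assumes the
    comment contains the source wiki's host name.
    """
    return [c for c in changes if source_host in c.get("comment", "")]


# Illustrative entries only; the comment wording is an assumption.
sample = [
    {"title": "File:A.jpg", "comment": "Imported from https://de.wikipedia.org/wiki/Datei:A.jpg"},
    {"title": "File:B.jpg", "comment": "Imported from https://en.wikipedia.org/wiki/File:B.jpg"},
]
german_imports = from_source(sample, "de.wikipedia.org")
```

A substring match on the comment is fragile, which is part of why a dedicated tag or category per source wiki is being discussed here.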

Conclusion so far

  • The absolute best support is for tags. Having an "Imported from de.wikipedia" tag would be amazing. Unfortunately it looks like we can't do this.
  • The next best solution seems to be to rely on Special:RecentChangesLinked, add a category following a strict naming scheme, and not care whether that category exists. Why?

Possible to-dos
  • Ask the Commons community if the list above misses relevant review tools.
  • Interview devs recently working on the tag schema (see T185355), e.g. @Ladsgroup.
  • Check if it's even possible to dynamically create tags.
  • Ask if the Commons community is willing to settle on a category naming scheme, or introduce individual templates per source wiki.
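As a rough sketch of the category approach, the snippet below derives a category title from the source wiki's host name and builds a Special:RecentChangesLinked URL that targets it. The naming scheme "Files imported from <host>" and both function names are purely hypothetical; no scheme has been agreed on in this task. The `target` and `namespace` URL parameters are used as I understand them and should be checked against the special page before relying on them.

```python
from urllib.parse import urlencode

RCL = "https://commons.wikimedia.org/wiki/Special:RecentChangesLinked"


def import_category(source_host):
    """Return a category title for one source wiki (hypothetical naming scheme)."""
    return f"Category:Files imported from {source_host}"


def recent_changes_linked_url(source_host):
    """Build a RecentChangesLinked URL for file pages in that category."""
    params = {"target": import_category(source_host), "namespace": 6}
    return f"{RCL}?{urlencode(params)}"


url = recent_changes_linked_url("de.wikipedia.org")
```

Because the category title is computed from the host name alone, the same function works for any source wiki, whether or not the category page has been created.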

I still wonder if a solution like using CirrusSearch with insource: is not already sufficient.

Maybe: A search like https://commons.wikimedia.org/wiki/Special:Search?sort=last_edit_desc&search=insource:%22FileImporter+from+//en.wikipedia.org%22&ns6=1 is certainly a nice additional tool. But I don't think it can solve all use cases:

  • It can't be used on special pages. Even if Special:Search can be tuned to replicate the result of some special pages, I don't think it can replace all of them.
  • Sorting search results by "creation" will sort most imported files at the very end, because it uses the original creation date.
  • Sorting by "edited" doesn't necessarily give the most recent imports, but all file description pages that have been edited recently.
  • The comment is not meant to be removed after an import was reviewed.
  • The comment will accidentally be copy-pasted to files that are not imported. This already happened (see my example search above).
  • The comment is not visible on the file description page.
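For reference, the example search above can be generated per source wiki. This is a minimal sketch that reuses the query parameters from that URL (`sort`, `search`, `ns6`); the function name is an assumption, and the phrase inside `insource:` is copied from the example rather than from FileImporter's code, so it may need adjusting.

```python
from urllib.parse import urlencode

COMMONS_SEARCH = "https://commons.wikimedia.org/wiki/Special:Search"


def import_search_url(source_host, sort="last_edit_desc"):
    """Build a Special:Search URL for file pages containing the import comment."""
    # Phrase taken from the example search in this thread.
    query = f'insource:"FileImporter from //{source_host}"'
    # ns6=1 restricts the search to the File: namespace.
    params = {"sort": sort, "search": query, "ns6": "1"}
    return f"{COMMONS_SEARCH}?{urlencode(params)}"


url = import_search_url("en.wikipedia.org")
```

The generated URL percent-encodes the colon and slashes, which is equivalent to the hand-written link. The limitations listed above still apply to any URL built this way.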

Just for documentation: we want to avoid using tags because at the moment it is not clear whether they are meant to be used at large scale, supporting possibly hundreds of combinations of the form imported from {en|de|ar|..}.wikipedia.org. So the most feasible remaining options are searching with the insource keyword or adding categories, the latter probably using the format described.

I'm pinged here to give an opinion on the tags (I guess, given my work on the tags backend). In the backend, tags are okay-ishly robust, especially with regard to potentially hundreds of different tags (AbuseFilter creates similar kinds of tags), and as long as you don't put five tags on every edit in Commons, we should be fine*. Also, I'm not sure it would actually end up with a tag for each of the 900 wikis: lots of wikis have local upload disabled, and lots of them are too small to be able to contribute to Commons. But Special:Tags isn't built for this UX-wise. It doesn't have continuation (it just shows all tags), grouping (based on prefix, for example), filtering (in or out), or any basic user functionality in that regard. So my suggestion is to either improve the UX of Special:Tags or let the Commons community know and ask them how they feel about it.

  • Database-wise: Commons is pretty stressed in terms of storage, but the change_tag table is pretty small. The *links tables and the revision table are causing issues atm.

HTH

This task was written as an investigation, but I think we started transitioning into making a decision about how to implement searching by source wiki. I'm going to move this to "closed" because we've completed the technical investigation. As I understand it, we're now waiting for on-wiki communication and feedback from specific users to learn whether the workarounds are acceptable. The next steps are to reevaluate based on this feedback, and possibly do specific implementation.

awight claimed this task.