
Add configurable template to imported file info
Closed, Resolved · Public · 2 Estimated Story Points

Description

In T232481: Investigate options to mark file imports with their source wiki, make searchable, we discovered that searching for recent imports from a given source wiki is difficult. Let's solve this by providing a means to mark imports with a category specific to the source wiki, for example "Imported from English Wikipedia". This category assignment should be highly customizable by the community.

Acceptance criteria:

  • Add a custom message which will be substituted into imported file info. This message should default to empty.
  • The message will take two parameters, the full URL of the source file and the import timestamp.
  • Must play well with messages that use {{subst:...}}.
  • Update documentation to explain how this message can be used, with example message and module source.
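A minimal sketch of what the parameter contract means in practice, written in Python under stated assumptions: the message text is treated as a plain string in which $1 and $2 are replaced positionally. This is not MediaWiki's Message class (it does no escaping, no {{PLURAL}} handling and no template expansion), and the template name "Imported from" and the 14-digit timestamp in the usage lines are hypothetical.

```python
import re


def fill_message(template: str, *params: str) -> str:
    """Replace $1, $2, ... in a message text with positional parameters.

    A simplified stand-in for MediaWiki message parameter handling: it does no
    escaping, no {{PLURAL}}/{{GRAMMAR}} processing and no template expansion.
    Placeholders without a matching parameter are left as they are.
    """
    def substitute(match: re.Match) -> str:
        index = int(match.group(1)) - 1  # $1 is the first parameter
        if 0 <= index < len(params):
            return params[index]
        return match.group(0)

    # A function replacement inserts parameter values literally, so a "$"
    # inside a value is never re-interpreted as a placeholder.
    return re.sub(r"\$(\d+)", substitute, template)


# The message is empty by default, so nothing is added to the file info.
print(repr(fill_message("", "https://en.wikipedia.org/wiki/File:Example.jpg", "20200706112300")))

# A hypothetical community-provided message using both parameters.
print(fill_message("{{Imported from|$1|$2}}",
                   "https://en.wikipedia.org/wiki/File:Example.jpg",
                   "20200706112300"))
```

Because the default message is empty, the default result is an empty string, so nothing changes for wikis that do not customize the message.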

Open questions:

  • Would the community prefer natural-language site names as part of the category, like "English Wikipedia"? If so, is there already a module providing the mapping from fully-qualified hostname to natural-language display name? We might offer to write this module if it doesn't exist. There's already Module:Interwiki and tools based on it.
  • We'll probably include a default message with something like the text which is currently hardcoded: "<!--This file was moved here using FileImporter from $1-->\n". This message is always used in the content language, so for Wikimedia Commons it wouldn't require translation. Should we add a note discouraging translation? Or would it be useful to hypothetical third-party wiki families using this extension with a different content language? Should we bundle an empty message for this reason?
  • If file info text exists or has been customized, and if no template fixups will be made (TODO: this is strange), we skip annotating with our message. This seems wrong. Should we prepend the message on entering the import workflow, then let the user edit file info even including removing the message?
  • If we include the import annotation in the file info preview, and it transcludes automatic categories, then it will break our "no categories" detection.
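Regarding the first open question, a rough Python sketch of a hostname-to-display-name mapping, assuming hosts of the form <language>.<project>.org. The two lookup tables are hypothetical, hand-written subsets; on-wiki the data would more plausibly come from an existing module such as Module:Interwiki, and the function names are illustrative only.

```python
from typing import Optional

# Hypothetical, hand-maintained subsets. On a wiki this data would more likely
# come from an existing module such as Module:Interwiki.
LANGUAGE_NAMES = {"en": "English", "de": "German", "fr": "French"}
PROJECT_NAMES = {
    "wikipedia.org": "Wikipedia",
    "wiktionary.org": "Wiktionary",
    "wikivoyage.org": "Wikivoyage",
}


def site_display_name(hostname: str) -> Optional[str]:
    """Map e.g. 'en.wikipedia.org' to 'English Wikipedia'; None if unknown."""
    parts = hostname.lower().split(".")
    if len(parts) != 3:
        return None  # not of the form <language>.<project>.org
    language, project = parts[0], ".".join(parts[1:])
    if language not in LANGUAGE_NAMES or project not in PROJECT_NAMES:
        return None
    return f"{LANGUAGE_NAMES[language]} {PROJECT_NAMES[project]}"


def source_category(hostname: str) -> str:
    # Fall back to the bare hostname when no display name is known.
    name = site_display_name(hostname) or hostname
    return f"Imported from {name}"


print(source_category("en.wikipedia.org"))       # Imported from English Wikipedia
print(source_category("commons.wikimedia.org"))  # Imported from commons.wikimedia.org
```

Falling back to the bare hostname keeps the category predictable for hosts the tables do not cover, such as mobile domains (en.m.wikipedia.org).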

Event Timeline

Lena_WMDE updated the task description. Jun 24 2020, 8:25 AM
Lena_WMDE set the point value for this task to 5.

I would like to explore our possibilities here a bit.

We have a $wgFileImporterTextForPostImportRevision setting already that contains the "<!--This file was moved here using FileImporter from $1-->\n" comment we add to the top of all file description pages during the import (example). We could just add something like …\n{{subst:Imported with FileImporter|$2}}\n to the end of this and be pretty much done.

Well, except that this template should be created. We could do that, but leave it empty and let the community decide what should be in there.

Not optimal. Instead, these are my suggestions:

  • Categories pretty much always go to the end of a page. We should respect this convention. This means we should not re-use the existing setting.
  • However, both the comment and the category should be added at the same time (currently done in ImportPlan::addImportComment).
  • I suggest not introducing yet another setting, but using a message that is empty by default. The main advantage is that the Commons admins can update it much more easily. The message would contain something similar to what I showed above. But we don't decide on this.
  • What we need to decide is which parameters we provide to the message:
    • The full URL of the source file should be one of the parameters.
    • For convenience, it might be nice to provide the hostname as a separate parameter (e.g. en.wikipedia.org). But this can just as well be extracted from the full URL with a bit of Lua code.
    • We could, in theory, provide the full URL of the config page that was used during the import. Would this be helpful?
    • We could provide the user name of the user doing the import. But I'm not sure if this is a good idea, or even needed.
    • We don't need the date, as the template can just use the current date.
  • We need to make sure the process works as expected when {{subst:…}} is used.
  • The message should not be localized. We should mark it accordingly, and make sure our code never tries to use a localization – only the canonical English one.
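On the point that the hostname can be derived from the full URL: the thread expects this to happen in Lua on the wiki, but the logic is small enough to show as a Python sketch using the standard library's URL parser. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def source_hostname(source_url: str) -> str:
    """Return the hostname of a source file URL, e.g. 'en.wikipedia.org'.

    urlsplit() lowercases the hostname and drops any port. Protocol-relative
    URLs such as '//en.wikipedia.org/wiki/...' are handled too.
    """
    hostname = urlsplit(source_url).hostname
    if hostname is None:
        raise ValueError(f"not an absolute or protocol-relative URL: {source_url!r}")
    return hostname


print(source_hostname("https://en.wikipedia.org/wiki/File:Example.jpg"))  # en.wikipedia.org
```

Passing only the full URL and deriving the hostname where it is needed keeps the message parameter list short, which is the trade-off weighed in the list above.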

Don't forget we need to document and announce it.

What do you think?

We have a $wgFileImporterTextForPostImportRevision setting already that contains the "<!--This file was moved here using FileImporter from $1-->\n" comment we add to the top of all file description pages during the import

This will be redundant if we introduce a template, so I recommend we delete the config and supporting code (or repurpose for our new message).

{{subst:Imported with FileImporter|$2}}\n to the end of this and be pretty much done.

I'm ignorant about the reasons for using subst here. Reading through the manual for the Nth time, I still don't see any reason we would want the immediate substitution rather than transclusion. Transclusion on the other hand will allow future flexibility, improvements to categorization and so on.

Well, except that this template should be created. We could do that, but leave it empty and let the community decide what should be in there.

This part seems tricky. My instinct would be to provide a rough pass Lua implementation of our template, and ask the Commons community to suggest improvements or approve.

  • Categories pretty much always go to the end of a page. We should respect this convention.

I don't think this is a consideration when using transclusion. Templates which add categories can be transcluded anywhere in the text.

  • I suggest not introducing yet another setting, but using a message that is empty by default. The main advantage is that the Commons admins can update it much more easily. The message would contain something similar to what I showed above. But we don't decide on this.

Interesting! I was leaning towards providing a configuration setting where we supply a bare template name. In any case, there's no natural place to document the parameter contract with any of the solutions we've explored so far.

  • But this can just as well be extracted from the full URL with a bit of Lua code.

+1 from me.

This will be redundant […]

How? If we still want the comment on the top, but the category at the bottom, we need two different things. I don't think this ticket here allows us to remove the comment.

I'm ignorant about the reasons for using subst here.

We are not going to decide on this. But I would love to give the community the possibility to use subst: in this context, if they want.

My instinct would be to provide a rough pass Lua implementation […]

Sure, that's probably even better. The next question is where this Lua should live: in our codebase or on Commons?

I don't think this is a consideration when using transclusion.

True. But it is one when using subst:. I bring up subst: because pretty much all maintenance tools relevant to the workflow we are discussing here (bots, the HotCat gadget) cannot do anything with a "magic" category that is hidden in a template. These tools expect the category to be visible in the wikitext of the file description page.
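To illustrate why purely wikitext-based tools miss a category added by a template, here is a deliberately naive Python scanner for explicit [[Category:...]] links. It is a hypothetical helper, not how HotCat or any particular bot works, and it ignores comments, <nowiki> and localized namespace aliases. A tool that reads the rendered page's categories instead would also see whatever the template adds.

```python
import re
from typing import List

# Explicit category links: [[Category:Name]] or [[Category:Name|sort key]].
# A leading colon ([[:Category:Name]]) is a plain link, not a categorization,
# so it is deliberately not matched.
CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]", re.IGNORECASE
)


def visible_categories(wikitext: str) -> List[str]:
    """Categories a purely wikitext-based tool can see on the page."""
    return [match.group(1) for match in CATEGORY_LINK.finditer(wikitext)]


page = (
    "== Summary ==\n"
    "{{Imported from|en.wikipedia.org}}\n"  # hypothetical template; may add a category when rendered
    "A bridge at dusk.\n"
    "[[Category:Bridges]]\n"
)
print(visible_categories(page))  # ['Bridges']
```

The category the template would add never appears in the list, which is exactly the gap that substituting the template (so the category lands in the saved wikitext) would close.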

there's no natural place to document the parameter contract […]

That's one of the reasons why I suggest a message. It will have $1, $2, … parameters, and these will be documented in qqq.json.

This will be redundant […]

How? If we still want the comment on the top, but the category at the bottom, we need two different things. I don't think this ticket here allows us to remove the comment.

Yes, if we still want both, we'll need two different things. My point is just that the comment is much less useful, to the point that I consider it redundant. The only reason to keep it, IMO, would be if any scripts depended on that text being present, but since we also use change tags, these hypothetical legacy scripts should be fixed anyway.

We are not going to decide on this. But I would love to give the community the possibility to use subst: in this context, if they want.

+1

My instinct would be to provide a rough pass Lua implementation […]

Sure, that's probably even better. The next question is where this Lua should live: in our codebase or on Commons?

+1, it's too bad that there's no standard for bundling Lua modules and templates with an extension, the way messages can be bundled. It seems this Lua will have to live on Commons.

awight added a comment. Jul 2 2020, 9:42 AM

I bring up subst: because pretty much all maintenance tools relevant to the workflow we are discussing here (bots, the HotCat gadget) cannot do anything with a "magic" category that is hidden in a template. These tools expect the category to be visible in the wikitext of the file description page.

Is it possible this limitation has already been overcome? Reading the HotCat source, I see that it makes an API request to query-categories, which reads from the categorylinks table. These are the categories of the final document after transclusion.

Not to challenge the point that we should be compatible with "subst". We should be: for example, subst can be used to produce both a static part, such as a timestamp, and a dynamic part that is still transcluded.

But I believe the situation isn't so dire: we can provide a message which may subst or transclude a template that includes calculated categories, and many tools will be able to operate on those resulting categories.

That's one of the reasons why I suggest a message. It will have $1, $2, … parameters, and these will be documented in qqq.json.

Thank you for this brilliant suggestion! I'll update the task description with a summary of what we've settled on so far, and the remaining open questions.

awight updated the task description. Jul 2 2020, 11:48 AM
awight updated the task description. Jul 2 2020, 11:58 AM

Change 609157 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/FileImporter@master] [WIP] Experiment with post-import revision annotation

https://gerrit.wikimedia.org/r/609157

awight updated the task description. Jul 3 2020, 8:57 AM
awight updated the task description. Jul 3 2020, 9:27 AM

From today's discussion:

  • We decided to use a message.
  • Not discussed: What should be the name of this message? All we know is that it must start with fileimporter-….
  • It's empty by default. Which means the FileImporter code will not contain anything about per-source categories – not even an "imported from" template name! It's up to the Commons community to create a template (or reuse an existing one), and add it to this message.
  • The name of the message can be hard-coded.
  • Mark the message with {{notranslate}} in qqq.json.
  • Do not touch the existing $wgFileImporterTextForPostImportRevision setting and the <!--This file was moved here using FileImporter from $1--> comment for now. We can merge this later into the new message, if we want.
  • Place the content of this message directly after the comment, so that both end up at the top of the file description page.
  • Make sure the message is used either in English (hard-coded) or in the target wiki's content language, never in the user's interface language.
  • The message should ideally be inserted at the same time as the comment, before the import form is shown (which means the user can edit both), if that's possible.
  • Needs investigation: We have a service that's called CategoryExtractor. It uses the current wikitext and parses it to see if it contains categories, or not. The wikitext passed to this service needs to exclude the extra wikitext snippet from the message! A super trivial solution might be to search for the string and replace it with an empty string. This way the user is free to edit the snippet – but as long as it's not edited it's excluded.
  • I suggest to pass the domain and the full source URL as parameters to the message, in this order.
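A minimal Python sketch of two of the mechanical points above: placing the message output directly after the existing comment, and removing the unedited snippet from the text that is checked for categories. All function names and sample strings are hypothetical; the real code lives in FileImporter (ImportPlan, CategoryExtractor) and parses the wikitext rather than using a regex.

```python
import re

CATEGORY_LINK = re.compile(r"\[\[\s*Category\s*:", re.IGNORECASE)


def has_categories(wikitext: str) -> bool:
    """Crude stand-in for a category check on raw wikitext."""
    return CATEGORY_LINK.search(wikitext) is not None


def build_import_text(comment: str, annotation: str, file_info: str) -> str:
    """Put the existing comment and the message-based annotation at the top."""
    return comment + annotation + file_info


def text_for_category_check(wikitext: str, annotation: str) -> str:
    """Remove the unedited annotation so it cannot count as a 'real' category."""
    if not annotation:
        return wikitext  # empty message (the default): nothing to strip
    return wikitext.replace(annotation, "", 1)


comment = "<!--This file was moved here using FileImporter from https://en.wikipedia.org/wiki/File:Example.jpg-->\n"
annotation = "[[Category:Imported from en.wikipedia.org]]\n"  # hypothetical message output
file_info = "== Summary ==\nA bridge at dusk.\n"

text = build_import_text(comment, annotation, file_info)
print(has_categories(text))                                       # True
print(has_categories(text_for_category_check(text, annotation)))  # False
```

Since only the exact, unedited snippet is stripped, an annotation the user edits during the import counts towards the category check like any other text.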

Change 609582 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/FileImporter@master] Move join glue to code

https://gerrit.wikimedia.org/r/609582

This might just be my ignorance about subst, but I think I've written an impossible acceptance criterion:

Must play well with messages that use {{subst:...}}

There's no such thing as a message that includes subst:, because it would have been evaluated at the time the message was saved. The desired effect, expanding a template immediately, might be impossible to control using the custom message scheme as we've designed it so far. It would be possible to evaluate the message with Message#text, which expands templates, rather than Message#plain, but I don't think it's a good idea to do that unconditionally. My only idea so far is to ask the community what they want to accomplish with "subst". If it's to include something calculated from a timestamp, for example an "Imported in June 2020" category, then maybe we should provide the import timestamp as a message parameter (I'll do that preemptively).
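If the import timestamp is passed as a parameter, deriving a category such as "Imported in June 2020" from it is simple. A Python sketch follows, assuming the timestamp arrives in MediaWiki's 14-digit YYYYMMDDHHMMSS form (the thread does not fix a format); on-wiki this would be done by the community's template or module, and the function name is hypothetical.

```python
from datetime import datetime

MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def import_month_category(timestamp: str) -> str:
    """Turn an import timestamp into an 'Imported in <Month> <Year>' category.

    Assumes a 14-digit YYYYMMDDHHMMSS timestamp; raises ValueError otherwise.
    Month names come from a fixed English table rather than strftime('%B'),
    whose output depends on the process locale.
    """
    imported_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    month = MONTH_NAMES[imported_at.month - 1]
    return f"[[Category:Imported in {month} {imported_at.year}]]"


print(import_month_category("20200624082500"))  # [[Category:Imported in June 2020]]
```

Computing the month from the parameter rather than from the current date keeps the category tied to the moment of import, even if the wikitext is re-parsed later.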

I've adjusted the acceptance criteria to reflect changes in my understanding.

awight updated the task description. Jul 6 2020, 11:23 AM
Lena_WMDE changed the point value for this task from 5 to 2.

Change 610794 had a related patch set uploaded (by Andrew-WMDE; owner: Andrew-WMDE):
[mediawiki/extensions/FileImporter@master] [WIP] Message adding a category shouldn't break empty category detection

https://gerrit.wikimedia.org/r/610794

Change 609582 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Move join glue to code

https://gerrit.wikimedia.org/r/609582

Change 611267 had a related patch set uploaded (by Andrew-WMDE; owner: Andrew-WMDE):
[mediawiki/extensions/FileImporter@master] Tests for post-import text customizable with message

https://gerrit.wikimedia.org/r/611267

Change 609157 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Post-import text customizable with a message

https://gerrit.wikimedia.org/r/609157

Change 611267 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Tests for post-import text customizable with message

https://gerrit.wikimedia.org/r/611267

Change 611306 had a related patch set uploaded (by Andrew-WMDE; owner: Andrew-WMDE):
[mediawiki/extensions/FileImporter@master] Test that inContentLanguage() is called for post-import customizable message

https://gerrit.wikimedia.org/r/611306

Change 611306 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Test that inContentLanguage() is called for post-import customizable message

https://gerrit.wikimedia.org/r/611306

There's no such thing as a message that includes subst:, because it would have been evaluated at the time the message was saved.

It is possible to delay the effect of subst: for one save operation: https://meta.wikimedia.org/wiki/Help:Recursive_conversion_of_wikitext#Delaying_substitution_with_Template:subst. I just tested it locally and it works as I, as a user, would expect it to work. I can create the message fileimporter-post-import-revision-annotation as described on the help page. During the import, I can see the {{subst:…}} in the wikitext, and can edit it if I want. When the import is actually performed, the substitution is executed and the subst: disappears.

In other words: we can tick off this acceptance criterion as well.

thiemowmde updated the task description. Jul 20 2020, 10:51 AM
Lena_WMDE closed this task as Resolved.Jul 23 2020, 10:15 AM
Lena_WMDE claimed this task.