
Clean up bad wikitext caused by T314836
Open, Medium, Public

Description

E.g.

[[Αρχείο:Tselina.png|σύνδεσμος=//upload.wikimedia.org/wikipedia/commons/thumb/3/31/Tselina.png/220px-Tselina.png|μικρογραφία| Σελινόριζα που για φαγητό]]

should be changed to

[[Αρχείο:Tselina.png|μικρογραφία| Σελινόριζα που για φαγητό]]

Query to find such pages: https://global-search.toolforge.org/?q=%22%3D%2F%2Fupload.wikimedia.org%22&namespaces=0&title=

Summarizing investigation so far:

  • Per the query, thousands of pages across many (hundreds of?) languages are affected.
  • The replacement is pretty simple, removing link=XXX from image syntax.
    • This is complicated by localized namespaces and link attributes.
    • There is a risk of false positive matches, but it can be reduced by filtering the results down to pages created by CX.
  • Due to the number of pages affected, automation is needed.
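
The replacement described above could be sketched roughly as follows in Python. This is only an illustration, not a tested clean-up script: `LINK_ALIASES` and `strip_bad_link` are hypothetical names, and the alias list is deliberately incomplete (only English `link` and the Greek `σύνδεσмος`-style name from the example). A real run would need the localized parameter names for every affected wiki. The pattern also does not check that the match sits inside image syntax, so the false-positive concern still applies.

```python
import re

# Hypothetical, incomplete list of localized names for the "link" image
# parameter. Assumption: a real run needs the full list per wiki.
LINK_ALIASES = ["link", "σύνδεσμος"]

# Matches "|<alias>=//upload.wikimedia.org/..." up to the next "|" or "]".
BAD_LINK = re.compile(
    r"\|\s*(?:" + "|".join(re.escape(a) for a in LINK_ALIASES) + r")\s*="
    r"\s*(?:https?:)?//upload\.wikimedia\.org/[^|\]]*"
)


def strip_bad_link(wikitext: str) -> str:
    """Remove link parameters that point at upload.wikimedia.org."""
    return BAD_LINK.sub("", wikitext)


if __name__ == "__main__":
    before = (
        "[[Αρχείο:Tselina.png|σύνδεσμος=//upload.wikimedia.org/wikipedia/"
        "commons/thumb/3/31/Tselina.png/220px-Tselina.png|μικρογραφία| "
        "Σελινόριζα που για φαγητό]]"
    )
    print(strip_bad_link(before))
```

Legitimate link values such as `link=Main Page` are left untouched, because the pattern only matches values that start with the `upload.wikimedia.org` host.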

Possible solutions

  • Maintenance script
    • Language-team has experience with scripts and could use a common maintenance account
  • Bot
    • E.g. pywikibot
    • Language-team has less experience with bots
    • We don't have a pre-existing bot account, so we would need to create a new one and request global edit/bot rights
      • Community process is at https://meta.wikimedia.org/wiki/Steward_requests/Bot_status
        • Is the community process appropriate for WMF staff? It would be good for community review.
        • Requesting access takes at least two weeks
        • Under this policy, there are some Wikipedias where we could not run the clean-up
      • WMF process?
    • Are there any existing bot operators who could incorporate this clean-up if requested?

So far I haven't found any good precedents to guide us in choosing a solution.