Remove GWToolset extension from Wikimedia Commons
Closed, ResolvedPublic

Description

We need to accept failure and remove this technical debt from Commons. It's barely used and we've been advising GLAM partners to not use it at all. See also the big alert at https://commons.wikimedia.org/wiki/Commons:GLAMwiki_Toolset

The current status of the tool:

  • Never got past initial prototype
  • Full of unresolved bugs
  • No team at the WMF that has active ownership

Normally I would like to have some form of replacement before removing something. In this case we have Pattypan.

Note: I was on the steering group for the development so I'm killing my own baby here.

https://www.mediawiki.org/wiki/Extension:GWToolset

For establishing community consensus: https://commons.wikimedia.org/wiki/Commons_talk:GLAMwiki_Toolset#Removing_this_tool_from_Commons

Event Timeline

Totally on board with killing this poor sickly baby.

That said, I do sincerely hope that the Wikimedia Foundation or another movement entity will, as soon as possible, seriously support a proper batch upload tool for Wikimedia Commons. Batch GLAM uploads are a primary source of high-quality educational media on Commons, usually accompanied by equally high-quality data (which is currently also more and more often linked/structured). A replacement would definitely need to support StructuredDataOnCommons.

Linking several kinda related tickets for reference:

https://www.mediawiki.org/wiki/Developers/Maintainers lists Structured Data as code stewards.
For potential sunsetting, please see https://www.mediawiki.org/wiki/Code_stewardship_reviews as a process, and what is needed.

Also, given the data in T270911#6713949, do we know who (still) promotes this tool to its partners, and should there be a heads-up about this task?

I would suggest proactively trimming down or emptying that group, especially of those who haven't used it in 2020 (I can definitely be removed, for example).

Only 20 have log entries

ETH-Bibliothek has successfully migrated to Pattypan.
I've just informed the Swiss National Library about the plans to decommission the GWT.
I've also informed the PdProject, but they have probably migrated already.

So much for the Swiss institutions, which have been on board since the inception of the GWT.

Similarly "I was on the steering group for the development", and I'm probably still the largest user of the tool.

There's no consensus to remove and Pattypan is not a functional replacement.

Similarly "I was on the steering group for the development", and I'm probably still the largest user of the tool.

That really doesn't seem to be the case, at least, not on your personal account. As far as the log shows, you haven't used it since 2016, never mind in the last 2 years.

Then for users that have used it in the last two years...

MariaDB [commonswiki]> select distinct actor_name from change_tag INNER JOIN logging ON (ct_log_id=log_id) INNER JOIN actor ON (log_actor=actor_id) WHERE ct_tag_id = 71 AND log_timestamp > 20190101000000;
+----------------------------+
| actor_name                 |
+----------------------------+
| Pharos                     |
| Ndalyrose                  |
| ETH-Bibliothek             |
| Christian Ferrer           |
| OlafJanssen                |
| Beeld en Geluid Collecties |
| Swiss National Library     |
| Namrood                    |
+----------------------------+
8 rows in set (0.70 sec)

And then looking more in absolute numbers...

MariaDB [commonswiki]> select actor_name, count(ct_id) as total from change_tag INNER JOIN logging ON (ct_log_id=log_id) INNER JOIN actor ON (log_actor=actor_id) WHERE ct_tag_id = 71 GROUP BY log_actor ASC ORDER BY total desc;
+----------------------------------+--------+
| actor_name                       | total  |
+----------------------------------+--------+
| Pharos                           | 368523 |
| Christian Ferrer                 |  60770 |
| ETH-Bibliothek                   |  52935 |
| Ebastia1                         |  26523 |
| Swiss National Library           |   8375 |
| OlafJanssen                      |   8159 |
| MartinPoulter                    |   6387 |
| Hansmuller                       |   3853 |
| Beeld en Geluid Collecties       |   3635 |
| Jason.nlw                        |   3366 |
| Archives cantonales jurassiennes |   3087 |
|                                |   2208 |
| Ndalyrose                        |   1718 |
| TeklaLilith                      |    789 |
| Pdproject                        |    268 |
| Mmason23                         |    247 |
| 85jesse                          |    115 |
| Namrood                          |     51 |
| Timmietovenaar                   |     34 |
| Steinsplitter                    |      2 |
+----------------------------------+--------+
20 rows in set (0.75 sec)

And for clarity

MariaDB [commonswiki]> select * from change_tag_def where ctd_id = 71;
+--------+-----------+------------------+-----------+
| ctd_id | ctd_name  | ctd_user_defined | ctd_count |
+--------+-----------+------------------+-----------+
|     71 | gwtoolset |                0 |    551045 |
+--------+-----------+------------------+-----------+
1 row in set (0.01 sec)
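As a cross-check on the three result sets above, the per-user totals can be summed and compared against ctd_count. The following is only a sketch in Python: the numbers are copied by hand from the query output, the "(blank)" key is a placeholder I chose for the row whose actor_name is empty in the output, and the helper name share is hypothetical.

```python
# Per-user totals copied from the second query above (tag id 71, "gwtoolset").
# "(blank)" is a placeholder key for the row whose actor_name is empty.
totals = {
    "Pharos": 368523,
    "Christian Ferrer": 60770,
    "ETH-Bibliothek": 52935,
    "Ebastia1": 26523,
    "Swiss National Library": 8375,
    "OlafJanssen": 8159,
    "MartinPoulter": 6387,
    "Hansmuller": 3853,
    "Beeld en Geluid Collecties": 3635,
    "Jason.nlw": 3366,
    "Archives cantonales jurassiennes": 3087,
    "(blank)": 2208,
    "Ndalyrose": 1718,
    "TeklaLilith": 789,
    "Pdproject": 268,
    "Mmason23": 247,
    "85jesse": 115,
    "Namrood": 51,
    "Timmietovenaar": 34,
    "Steinsplitter": 2,
}

# ctd_count for ctd_id 71 as reported by change_tag_def.
CTD_COUNT = 551045


def share(name: str) -> float:
    """Fraction of all tagged log entries attributed to one user."""
    return totals[name] / CTD_COUNT


grand_total = sum(totals.values())
print("per-user totals add up to ctd_count:", grand_total == CTD_COUNT)

# Show the three largest users and their share of all tagged entries.
for name in sorted(totals, key=totals.get, reverse=True)[:3]:
    print(f"{name}: {share(name):.1%}")
```

The per-user totals add up exactly to the ctd_count value, so the grouped query accounts for every tagged entry; it says nothing about uploads made before the tag existed.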

adding David Haskiya - the Project Manager for the GWT at Europeana (now working at WM-Sweden).

Sorry you spent time on this analysis.

It's wrong because you are relying on the log, which, as far as I remember, was only introduced a fairly long time after the tool became active. For example, in 2014 the HABS uploads were over 300,000 photographs, and that was just one of the projects I used GWT on.

That really doesn't seem to be the case, at least, not on your personal account. As far as the log shows, you haven't used it since 2016, never mind in the last 2 years.

It doesn't stop it being right for the time it's been active; ie since the tagging was added.

Which is part of the point here. It doesn't matter how much it was used 5+ years ago, it's about how much it's being used currently.

Uploads you did in 2014 don't mean the extension has been used massively recently, do they?

"probably still the largest user of the tool" is accurate, that's what I wrote, not "used massively recently".

Thanks for your interest. I don't think there's any point in analysing this further, as there's nothing being proved.

But you're not actively using the extension.

Just because you were the most active user many years ago doesn't make your input any more important than anyone else's.

I did not say that I had used GWT "massively recently".

I did not say that my input is "more important than anyone else's".

Is there any point to this? I had made my point, there's no need to put words in my mouth I have not written here. Nor is it helpful to paint a picture that I'm weirdly puffing myself up like a complete idiot in a Phabricator discussion in order to feel important. Who would care?

Fact: I have been a significant user of this tool.
Fact: I know about GWT.
Fact: I know how much money was invested in it as I was part of establishing it and the negotiations from the outset.
Fact: I believe in the Wikimedia Commons project.

I no longer understand what point you are making, or why you appear determined to marginalize my point of view.

Thanks for your critical observations about me. I am not seeking election, money, or glory. I win nothing by participating, in fact, it costs me my volunteer time and mental space that probably should be better spent rationalizing the news that one of my elderly relatives died of Covid19 in hospital yesterday. So how about dropping the stick you seem determined to beat me with, we all have other things to worry about.

If you want to disable GWT, do as you please, but you don't do it with my support because nothing here has convinced me that there is a process for replacing it with something easier or better for the same job.

Fact: I have been a significant user of this tool.

"Significant" is a subjective term and thus can't be used to constitute a "fact"

Fact: I believe in the Wikimedia Commons project.

"Believe" is a vague and incomplete term and also can't be used to constitute a "fact"

"probably still the largest user of the tool" is accurate

If so, I suggest that you file a separate task with the requirements for you to not be a "large user", or a user at all, ever again. You could then propose it as a blocker of this task, if needed.

The largest recent user by far was Pharos, I suppose for the MET uploads, so it would be useful to hear from him. I've left a message. From what I've seen, it should be easy to help him switch to a basic pywikibot upload script or similar, if still needed.
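For reference, a basic Pywikibot upload-by-URL script could look roughly like the sketch below. This is an illustration under assumptions, not a tested migration path: the helper names build_description and upload_from_url are hypothetical, the description layout is just one plausible use of the Information template, and upload by URL may additionally require the relevant user right and an allowed source domain on the wiki. Pywikibot is imported lazily so that the description helper can be tried without it installed.

```python
def build_description(description, source_url, author, date, license_template):
    """Build a minimal file description page (hypothetical layout)."""
    return (
        "== {{int:filedesc}} ==\n"
        "{{Information\n"
        f"|description={description}\n"
        f"|date={date}\n"
        f"|source={source_url}\n"
        f"|author={author}\n"
        "}}\n\n"
        "== {{int:license-header}} ==\n"
        f"{{{{{license_template}}}}}\n"
    )


def upload_from_url(title, source_url, wikitext, summary):
    """Ask the wiki to fetch source_url and store it under the given title.

    Requires a configured Pywikibot installation and a logged-in account;
    nothing is uploaded unless this function is called explicitly.
    """
    import pywikibot  # third-party; imported here so the helper above works without it

    site = pywikibot.Site("commons", "commons")
    file_page = pywikibot.FilePage(site, title)
    return site.upload(
        file_page,
        source_url=source_url,
        text=wikitext,
        comment=summary,
    )


# Example of the generated wikitext, using made-up metadata.
print(build_description(
    description="{{en|1=Example object}}",
    source_url="https://example.org/images/1.jpg",
    author="Example Museum",
    date="1900",
    license_template="Cc-zero",
))
```

A real batch would loop over metadata records and call upload_from_url once per record, with whatever throttling and error handling the uploader needs.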

@Fae: Hi, it would be welcome to follow https://www.mediawiki.org/wiki/Bug_management/Phabricator_etiquette and not assume that folks use a "stick" but that folks try to understand what software-related points there are, apart from "I used GWToolset a lot; I have not used it recently; an alternative lacks some functionality but I don't tell what functionality" (but you mentioned one functionality aspect now on Commons; thanks). That's why folks try to reach out to currently and recently active GWToolset users to better understand the consequences of a removal. (And regarding past development costs, they are rather irrelevant.)

Hi, I have not managed to find a suitable alternative, so I continue to use GWToolset at an average of 500 files per month. The way I work, I always have XML files ready to be uploaded and others in preparation, so if a date is fixed for the end of the tool, I would be glad to be told a few days beforehand so that I can upload everything I have ready on my PC and my work is not wasted. Thank you, regards.

@ChristianFerrer could you exemplify what functionality other tooling is missing for your particular use-case? Sometimes it's a matter of lacking documentation, for example, Pattypan can upload files from URLs but few people know about that capability.

@Abbe98 yes, I know Pattypan can upload by URL, but I did not manage to do it. I have learnt to use Pattypan for files stored on my PC, but despite several attempts I never managed to upload files by URL, and indeed there is no documentation.

The flow is also a bit of a concern, e.g. it took me 12 min to upload the 75 files available at https://commons.wikimedia.org/wiki/Category:Media_from_Gerken_2018_-_10.11646/zootaxa.4428.1.1, and this morning I uploaded a small batch of nearly 150 files within 9 min, the first file being https://commons.wikimedia.org/wiki/File:Sperchon_fuxiensis_(10.3897-zookeys.707.13493)_Figures_1%E2%80%933.jpg and the last being https://commons.wikimedia.org/wiki/File:Callosa_baiseensis_(10.3897-zookeys.703.13641)_Figure_7.jpg. That is three-quarters of the time for double the files, which is roughly 2.7 times faster.

That is not a serious issue for 100 or 200 files, but for e.g. 15,000 files (https://commons.wikimedia.org/wiki/Category:Mollusca_in_the_MNHN), it is a different matter.
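Taking the figures quoted above at face value (75 files in 12 min for the first batch, about 150 files in 9 min for the second), the throughput comparison can be sketched as follows. The function names are hypothetical, and the extrapolation assumes throughput stays constant for a 15,000-file batch, which is my simplification rather than anything measured.

```python
def files_per_minute(files: int, minutes: float) -> float:
    """Average upload throughput for one batch."""
    return files / minutes


def minutes_for(n_files: int, rate: float) -> float:
    """Time needed for n_files at a constant rate (files per minute)."""
    return n_files / rate


first_batch = files_per_minute(75, 12)    # 75 files in 12 minutes
second_batch = files_per_minute(150, 9)   # about 150 files in 9 minutes

speedup = second_batch / first_batch
print(f"first batch:  {first_batch:.2f} files/min")
print(f"second batch: {second_batch:.2f} files/min")
print(f"speed-up:     {speedup:.2f}x")

# Extrapolate both rates to a 15,000-file collection.
for label, rate in (("first", first_batch), ("second", second_batch)):
    hours = minutes_for(15000, rate) / 60
    print(f"15000 files at the {label} batch's rate: {hours:.0f} h")
```

At these rates the difference for 15,000 files is on the order of a full day of uploading, which is why the gap matters far more for large collections than for a batch of 100 or 200 files.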

So honestly even if the maintenance of the tool is abandoned so that it does not cost any more money to maintain it, I would be rather happy to be able to use it as long as it works. But clearly, yes, I would also be happy to know how to upload by URL with Pattypan. Especially if the fate of the GWtoolset is sealed.

Also, is Pattypan able to avoid uploading duplicates?
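Independently of what either tool does internally (which this thread does not establish), an uploader can check for byte-identical duplicates before uploading by hashing the file and asking the MediaWiki API for existing files with the same SHA-1, using list=allimages with the aisha1 parameter. The sketch below only builds the query URL and reads a response; the function names are hypothetical and no network request is made.

```python
import hashlib
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def sha1_hex(data: bytes) -> str:
    """SHA-1 of the file contents as a hex string, as expected by aisha1."""
    return hashlib.sha1(data).hexdigest()


def duplicate_query_url(data: bytes) -> str:
    """URL of an API query listing existing files with identical contents."""
    params = {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1_hex(data),
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def titles_from_response(response: dict) -> list:
    """Extract file titles from a decoded allimages response."""
    images = response.get("query", {}).get("allimages", [])
    return [image["title"] for image in images]


# Demonstration with placeholder bytes instead of a real image file.
print(duplicate_query_url(b"abc"))
```

An empty result from titles_from_response means no byte-identical file was found; it does not catch re-encoded or resized copies, nor clashes on the file title.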

Change 676911 had a related patch set uploaded (by Amire80; author: Amire80):

[translatewiki@master] Move GWToolset to Wikimedia Legacy

https://gerrit.wikimedia.org/r/676911

Change 676911 merged by jenkins-bot:

[translatewiki@master] Move GWToolset to Wikimedia Legacy

https://gerrit.wikimedia.org/r/676911

For everyone's info, currently no Code-Stewardship-Reviews are taking place as there is no clear path forward and as this is not prioritized work.
(Entirely personal opinion: I also assume lack of decision authority due to WMF not having a CTO currently. However, discussing this is off-topic for this task.)

Well, other than GWToolset running remotely, and Pattypan requiring image downloads and spreadsheet wrangling.
An interesting Christmas present to Commons, to turn off a tool that was developed by WMUK and unsupported for years, in favor of a Java tool supported by a single volunteer. Perhaps you would care to consider a general tool management process?

At the risk of getting off-topic: this is not surprising at all. GWToolset (with all respect to the people who developed it; I know you tried hard in a difficult situation and were generally set up for failure) is a design-by-committee tool where the committee did not adequately understand the requirements. A single developer working alone and understanding what needs to be done will beat that every time, hands down. It is likely that the main reason this got deployed at all was internal politics, the sunk cost fallacy and the bad optics of WMF causing WM-UK/Europeana to spend a lot of money for nothing. If this had been an extension written for "free" by a volunteer, there is a 0% chance it would have been deployed in the first place.

Not saying this to be mean: I think everyone involved really did try their best, and communication failures, especially on the WMF side, set things up for failure. However, I think a lot of interesting lessons could be learned from GWToolset if anyone ever wants to try again with something similar, which would be cool. So far it seems like WM-DE (unless I am misinformed) is really the only chapter that has successfully collaborated on MediaWiki development. I think Wikimedia would be a lot healthier if more groups participated on the technical side.

The result wasn't really what we expected and hoped for. This was learning money for Wikimedia. There is no reason to keep this tool around. Having a broken tool instead of none at all is worse because it lowers the incentive to get a new tool. Accept failure, remove the tool, move on.

At the moment the Commons extension for OpenRefine is in development, see https://github.com/OpenRefine/CommonsExtension . The first file was uploaded today. This should become a very good alternative.

Though not perfect, the tool doesn't seem to be broken to me; I still use it:
https://commons.wikimedia.org/w/index.php?hidebots=1&translations=filter&hidecategorization=1&hideWikibase=1&tagfilter=gwtoolset&limit=1000&days=30&enhanced=1&title=Special:RecentChanges&urlversion=2
The last files were uploaded this morning.
As long as I can use it, and as long as OpenRefine is not really used for batch uploading, I will be happy for the tool to stay available even if it is not maintained.

For a few days now I have not been able to use the GWToolset. The circle is complete. Bye.

If you give an actionable bug report, maybe someone could help fix it….

@Reedy thank you, that's kind, but since the tool is destined to be shut down, no one should waste time on it. My stopping batch uploading is not a big issue; I have a ton of other things to do.
For the record, and if I'm not wrong, this file has been the last one uploaded with the GWToolset.

@ChristianFerrer Would you be interested in an introduction to OpenRefine (perhaps together with a few other former GWtoolset users)? I'm very interested to figure out together with you how OpenRefine can support the kind of uploads you want to do.

Batch uploading is now possible (although it is still with experimental code that hasn't been merged yet, but since you are a GWtoolset user, I think you will be able to do it just fine).

@Spinster No, thank you. I am starting a period of increased professional activity and cannot invest in this subject for the moment. When my schedule allows it in a few months, and if I'm still interested in mass uploads, I'll take a look at this tool and contact you if needed. Thank you very much.

Hi. It has been more than two years since this task was created. The log doesn't work (T326177), so I cannot easily tell how much GWToolset is used now. The last time it did work, the only person who was using it was @ChristianFerrer. As far as I can see, he's uploading many more files now, which is great. I might be wrong, but I guess he's not using GWToolset for that.

Is it time to undeploy it? :)

@Amire80, indeed I don't use it anymore; a few months ago it stopped working for me and I have not been able to use it since.

Change 921252 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/mediawiki-config@master] Disable GWToolset from Commons

https://gerrit.wikimedia.org/r/921252

Change 921253 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/mediawiki-config@master] Remove GWToolset configuration (1/2)

https://gerrit.wikimedia.org/r/921253

Change 921254 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/mediawiki-config@master] Remove GWToolset configuration (2/2)

https://gerrit.wikimedia.org/r/921254

Legoktm subscribed.

We're going to move forward with removing this, given it's broken and well, long overdue. Given this is Commons-only we don't really need to use User-notice I think, a post on the village pump there should be good enough.

There haven't been any new log entries in basically a year:

MariaDB [commonswiki_p]> select * from logging where log_type="gwtoolset" order by log_timestamp desc limit 5;
+-----------+-----------+----------------------+----------------+-----------+---------------+-----------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+----------+
| log_id    | log_type  | log_action           | log_timestamp  | log_actor | log_namespace | log_title | log_comment_id | log_params                                                                                                                                                                                                                                                                                                                                                                                                                                    | log_deleted | log_page |
+-----------+-----------+----------------------+----------------+-----------+---------------+-----------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+----------+
| 324989278 | gwtoolset | mediafile-job-failed | 20220527112107 |      1026 |             6 | No_title  |              4 | a:2:{s:21:"4::metadata-record-nr";i:194;s:10:"5::message";s:361:"Other contributors: A media file with the identical title "File:Symphylella communa (10.3897-zookeys.1003.60210) Figure 5.jpg" already exists in the wiki. It was edited or created by someone other than you.
original URL: https://zookeys.pensoft.net/showimg.php?filename=oo_486497.jpg
evaluated URL: https://zookeys.pensoft.net/showimg.php?filename=oo_486497.jpg";} |           0 |        0 |
| 324989277 | gwtoolset | mediafile-job-failed | 20220527112107 |      1026 |             6 | No_title  |              4 | a:2:{s:21:"4::metadata-record-nr";i:195;s:10:"5::message";s:361:"Other contributors: A media file with the identical title "File:Symphylella communa (10.3897-zookeys.1003.60210) Figure 6.jpg" already exists in the wiki. It was edited or created by someone other than you.
original URL: https://zookeys.pensoft.net/showimg.php?filename=oo_486498.jpg
evaluated URL: https://zookeys.pensoft.net/showimg.php?filename=oo_486498.jpg";} |           0 |        0 |
| 324989276 | gwtoolset | mediafile-job-failed | 20220527112107 |      1026 |             6 | No_title  |              4 | a:2:{s:21:"4::metadata-record-nr";i:194;s:10:"5::message";s:361:"Other contributors: A media file with the identical title "File:Symphylella communa (10.3897-zookeys.1003.60210) Figure 5.jpg" already exists in the wiki. It was edited or created by someone other than you.
original URL: https://zookeys.pensoft.net/showimg.php?filename=oo_486497.jpg
evaluated URL: https://zookeys.pensoft.net/showimg.php?filename=oo_486497.jpg";} |           0 |        0 |
| 324989274 | gwtoolset | mediafile-job-failed | 20220527112107 |      1026 |             6 | No_title  |              4 | a:2:{s:21:"4::metadata-record-nr";i:195;s:10:"5::message";s:361:"Other contributors: A media file with the identical title "File:Symphylella communa (10.3897-zookeys.1003.60210) Figure 6.jpg" already exists in the wiki. It was edited or created by someone other than you.
original URL: https://zookeys.pensoft.net/showimg.php?filename=oo_486498.jpg
evaluated URL: https://zookeys.pensoft.net/showimg.php?filename=oo_486498.jpg";} |           0 |        0 |
| 324989139 | gwtoolset | mediafile-job-failed | 20220527111507 |      1026 |             6 | No_title  |              4 | a:2:{s:21:"4::metadata-record-nr";i:195;s:10:"5::message";s:361:"Other contributors: A media file with the identical title "File:Symphylella communa (10.3897-zookeys.1003.60210) Figure 6.jpg" already exists in the wiki. It was edited or created by someone other than you.
original URL: https://zookeys.pensoft.net/showimg.php?filename=oo_486498.jpg
evaluated URL: https://zookeys.pensoft.net/showimg.php?filename=oo_486498.jpg";} |           0 |        0 |
+-----------+-----------+----------------------+----------------+-----------+---------------+-----------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+----------+
5 rows in set (1 min 9.834 sec)
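The "basically a year" estimate can be checked from the log_timestamp column, which uses MediaWiki's 14-digit YYYYMMDDHHMMSS format in UTC. The sketch below compares the newest entry in the output above with the start of the deployment that disabled the extension, as recorded in the SAL entry further down this task; picking that moment as the reference point is my assumption, and the helper name is hypothetical.

```python
from datetime import datetime, timezone

# MediaWiki stores timestamps as 14 digits: YYYYMMDDHHMMSS, in UTC.
MW_FORMAT = "%Y%m%d%H%M%S"


def parse_mw_timestamp(ts: str) -> datetime:
    """Convert a MediaWiki database timestamp into an aware UTC datetime."""
    return datetime.strptime(ts, MW_FORMAT).replace(tzinfo=timezone.utc)


# Newest log_timestamp in the query output above.
last_entry = parse_mw_timestamp("20220527112107")

# Start of the scap backport that disabled GWToolset (2023-05-19T14:57:11Z).
disabled = datetime(2023, 5, 19, 14, 57, 11, tzinfo=timezone.utc)

gap = disabled - last_entry
print(f"days without a new gwtoolset log entry: {gap.days}")
```

Note that the newest entries are all mediafile-job-failed records, so the last successful upload may be older still.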

To do:

  • Log messages need to be copied on-wiki
  • Run namespaceDupes(??) to move pages out of the GWToolset namespace
  • Remove users from the gwtoolset group

^ all done. I ended up moving the pages "manually" with moveBatch so they could end up at a nicer title than the autogenerated one.

Change 921252 merged by jenkins-bot:

[operations/mediawiki-config@master] Disable GWToolset from Commons

https://gerrit.wikimedia.org/r/921252

Mentioned in SAL (#wikimedia-operations) [2023-05-19T14:57:11Z] <legoktm@deploy1002> Started scap: Backport for [[gerrit:921252|Disable GWToolset from Commons (T270911)]]

Mentioned in SAL (#wikimedia-operations) [2023-05-19T14:58:38Z] <legoktm@deploy1002> legoktm: Backport for [[gerrit:921252|Disable GWToolset from Commons (T270911)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-05-19T15:06:57Z] <legoktm@deploy1002> Finished scap: Backport for [[gerrit:921252|Disable GWToolset from Commons (T270911)]] (duration: 09m 46s)

Change 921380 had a related patch set uploaded (by Amire80; author: Amire80):

[translatewiki@master] Remove GW Toolset

https://gerrit.wikimedia.org/r/921380

Change 921380 merged by jenkins-bot:

[translatewiki@master] Remove GW Toolset

https://gerrit.wikimedia.org/r/921380

Legoktm claimed this task.

Gone! Will file a ticket for archival.

And so the story of the GWToolset ends. Thanks to all who helped on maintaining it and also giving it its deserved resting place. We did a farewell ceremony at the autumn Wikimedia NL hackathon last year, for those interested, here are the pictures.

Change 921253 merged by jenkins-bot:

[operations/mediawiki-config@master] Remove GWToolset configuration (1/2)

https://gerrit.wikimedia.org/r/921253

Mentioned in SAL (#wikimedia-operations) [2023-05-31T07:07:51Z] <legoktm@deploy1002> Started scap: Backport for [[gerrit:921253|Remove GWToolset configuration (1/2) (T270911)]]

Mentioned in SAL (#wikimedia-operations) [2023-05-31T07:09:57Z] <legoktm@deploy1002> legoktm: Backport for [[gerrit:921253|Remove GWToolset configuration (1/2) (T270911)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-05-31T07:17:42Z] <legoktm@deploy1002> Finished scap: Backport for [[gerrit:921253|Remove GWToolset configuration (1/2) (T270911)]] (duration: 09m 51s)

Change 921254 merged by jenkins-bot:

[operations/mediawiki-config@master] Remove GWToolset configuration (2/2)

https://gerrit.wikimedia.org/r/921254

Mentioned in SAL (#wikimedia-operations) [2023-05-31T07:19:12Z] <legoktm@deploy1002> Started scap: Backport for [[gerrit:921254|Remove GWToolset configuration (2/2) (T270911)]]

Mentioned in SAL (#wikimedia-operations) [2023-05-31T07:41:24Z] <legoktm@deploy1002> legoktm: Backport for [[gerrit:921254|Remove GWToolset configuration (2/2) (T270911)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-05-31T07:58:11Z] <legoktm@deploy1002> Finished scap: Backport for [[gerrit:921254|Remove GWToolset configuration (2/2) (T270911)]] (duration: 38m 58s)

Hi. It has been more than two years since this task was created. The log doesn't work (T326177) [...]

Should Wikimedia-production-error be removed from reports about undeployed extensions?

Yes, except that per T337062, in this case they should be marked as Invalid instead.