
Remove GWToolset extension from Wikimedia Commons
Open, Needs Triage, Public

Description

We need to accept failure and remove this technical debt from Commons. It's barely used and we've been advising GLAM partners to not use it at all. See also the big alert at https://commons.wikimedia.org/wiki/Commons:GLAMwiki_Toolset

The current status of the tool:

  • Never got past initial prototype
  • Full of unresolved bugs
  • No team at the WMF that has active ownership

Normally I would like to have some form of replacement before removing something. In this case we have Pattypan.

Note: I was on the steering group for the development so I'm killing my own baby here.

https://www.mediawiki.org/wiki/Extension:GWToolset

For establishing community consensus: https://commons.wikimedia.org/wiki/Commons_talk:GLAMwiki_Toolset#Removing_this_tool_from_Commons

Event Timeline

I concur with this - for the GLAM community, we have not recommended GWToolset at all in the last two years, and have been telling folks to use Pattypan as a general bulk upload tool, or to use one of many well-documented bot scripts.

I support this generally.

Do we know how widely it's actually used? Obviously that needs to be a factor, along with making sure people are made aware, can migrate, etc.

Do we know how widely it's actually used?

It is rarely used, and has not been recommended by the GLAMWIKI community to new users for several years.

The user group is currently granted to about 50 users. Looking at recent edits made with it I only see some uploads by @ChristianFerrer.
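
For reference, a sketch of how that membership count could be checked on the replicas. This is untested and assumes the group is simply named gwtoolset:

-- Untested sketch: count current members of the gwtoolset user group
SELECT COUNT(*)
FROM user_groups
WHERE ug_group = 'gwtoolset';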

Do we know how widely it's actually used?

It is rarely used, and has not been recommended by the GLAMWIKI community to new users for several years.

We all know "Do not use" (of any form) doesn't stop people actually using it. :)

So since 2016, ~550K log entries:

MariaDB [commonswiki]> select COUNT(ct_id) from change_tag where ct_tag_id = 71;
+--------------+
| COUNT(ct_id) |
+--------------+
|       550791 |
+--------------+
1 row in set (0.16 sec)

48K in 2020

MariaDB [commonswiki]> select count(ct_id) from change_tag INNER JOIN logging ON (ct_log_id=log_id) WHERE ct_tag_id = 71 AND log_timestamp > 20200101000000;
+--------------+
| count(ct_id) |
+--------------+
|        48050 |
+--------------+
1 row in set (5.65 sec)

2020 count by month

MariaDB [commonswiki]> select MONTH(log_timestamp), count(ct_id) from change_tag INNER JOIN logging ON (ct_log_id=log_id) WHERE ct_tag_id = 71 AND log_timestamp > 20200101000000 GROUP BY YEAR(log_timestamp), MONTH(log_timestamp) ASC;
+----------------------+--------------+
| MONTH(log_timestamp) | count(ct_id) |
+----------------------+--------------+
|                    1 |         3424 |
|                    2 |         2853 |
|                    3 |        14696 |
|                    4 |         4038 |
|                    5 |        17768 |
|                    6 |          582 |
|                    7 |          700 |
|                    8 |          564 |
|                    9 |         1004 |
|                   10 |          668 |
|                   11 |         1133 |
|                   12 |          620 |
+----------------------+--------------+
12 rows in set (0.69 sec)

Count by year and month for the full length of the logs...

MariaDB [commonswiki]> select YEAR(log_timestamp), MONTH(log_timestamp), count(ct_id) from change_tag INNER JOIN logging ON (ct_log_id=log_id) WHERE ct_tag_id = 71 GROUP BY YEAR(log_timestamp), MONTH(log_timestamp) ASC;
+---------------------+----------------------+--------------+
| YEAR(log_timestamp) | MONTH(log_timestamp) | count(ct_id) |
+---------------------+----------------------+--------------+
|                2016 |                    2 |         7578 |
|                2016 |                    3 |         3907 |
|                2016 |                    4 |         4889 |
|                2016 |                    5 |         1087 |
|                2016 |                    6 |         2746 |
|                2016 |                    7 |          507 |
|                2016 |                    8 |         4813 |
|                2016 |                    9 |          304 |
|                2016 |                   12 |         3996 |
|                2017 |                    1 |        15269 |
|                2017 |                    2 |        15412 |
|                2017 |                    3 |        16106 |
|                2017 |                    4 |        16512 |
|                2017 |                    5 |        71745 |
|                2017 |                    6 |       139145 |
|                2017 |                    7 |       127458 |
|                2017 |                    8 |         8163 |
|                2017 |                    9 |         1168 |
|                2017 |                   12 |        22099 |
|                2018 |                    1 |         2034 |
|                2018 |                    2 |         1365 |
|                2018 |                    3 |         1101 |
|                2018 |                    4 |        11998 |
|                2018 |                    6 |            4 |
|                2018 |                    7 |           84 |
|                2018 |                    8 |         1008 |
|                2018 |                    9 |         1150 |
|                2018 |                   10 |            2 |
|                2018 |                   12 |           24 |
|                2019 |                    1 |         2334 |
|                2019 |                    3 |          160 |
|                2019 |                    5 |         1079 |
|                2019 |                    6 |         5669 |
|                2019 |                    7 |         2419 |
|                2019 |                    8 |         4058 |
|                2019 |                    9 |          113 |
|                2019 |                   10 |         1793 |
|                2019 |                   11 |         1502 |
|                2019 |                   12 |         1940 |
|                2020 |                    1 |         3424 |
|                2020 |                    2 |         2853 |
|                2020 |                    3 |        14696 |
|                2020 |                    4 |         4038 |
|                2020 |                    5 |        17768 |
|                2020 |                    6 |          582 |
|                2020 |                    7 |          700 |
|                2020 |                    8 |          564 |
|                2020 |                    9 |         1004 |
|                2020 |                   10 |          668 |
|                2020 |                   11 |         1133 |
|                2020 |                   12 |          620 |
+---------------------+----------------------+--------------+
51 rows in set (0.85 sec)

Thanks Sam, can you do a per-user, per-year breakdown? I expect a small number of users with a lot of uploads.

And in a bit more of a graphical format

2016-02, 7578
2016-03, 3907
2016-04, 4889
2016-05, 1087
2016-06, 2746
2016-07, 507
2016-08, 4813
2016-09, 304
2016-12, 3996
2017-01, 15269
2017-02, 15412
2017-03, 16106
2017-04, 16512
2017-05, 71745
2017-06, 139145
2017-07, 127458
2017-08, 8163
2017-09, 1168
2017-12, 22099
2018-01, 2034
2018-02, 1365
2018-03, 1101
2018-04, 11998
2018-06, 4
2018-07, 84
2018-08, 1008
2018-09, 1150
2018-10, 2
2018-12, 24
2019-01, 2334
2019-03, 160
2019-05, 1079
2019-06, 5669
2019-07, 2419
2019-08, 4058
2019-09, 113
2019-10, 1793
2019-11, 1502
2019-12, 1940
2020-01, 3424
2020-02, 2853
2020-03, 14696
2020-04, 4038
2020-05, 17768
2020-06, 582
2020-07, 700
2020-08, 564
2020-09, 1004
2020-10, 668
2020-11, 1133
2020-12, 620

[Screenshot 2020-12-29 at 18.16.28.png: chart of the monthly gwtoolset-tagged log counts above]

Totals per year...

MariaDB [commonswiki]> select YEAR(log_timestamp), count(ct_id) from change_tag INNER JOIN logging ON (ct_log_id=log_id) WHERE ct_tag_id = 71 GROUP BY YEAR(log_timestamp) ASC;
+---------------------+--------------+
| YEAR(log_timestamp) | count(ct_id) |
+---------------------+--------------+
|                2016 |        29827 |
|                2017 |       433077 |
|                2018 |        18770 |
|                2019 |        21067 |
|                2020 |        48050 |
+---------------------+--------------+
5 rows in set (0.81 sec)

Totals per year per User...

MariaDB [commonswiki]> select YEAR(log_timestamp), actor_name, count(ct_id) from change_tag INNER JOIN logging ON (ct_log_id=log_id) INNER JOIN actor ON (log_actor=actor_id) WHERE ct_tag_id = 71 GROUP BY YEAR(log_timestamp), log_actor ASC;
+---------------------+----------------------------------+--------------+
| YEAR(log_timestamp) | actor_name                       | count(ct_id) |
+---------------------+----------------------------------+--------------+
|                2016 | Fæ                               |         2208 |
|                2016 | Pdproject                        |           59 |
|                2016 | Hansmuller                       |         3853 |
|                2016 | Ebastia1                         |          110 |
|                2016 | Timmietovenaar                   |           34 |
|                2016 | OlafJanssen                      |          142 |
|                2016 | MartinPoulter                    |         6387 |
|                2016 | Jason.nlw                        |         3366 |
|                2016 | ETH-Bibliothek                   |         9553 |
|                2016 | Beeld en Geluid Collecties       |          304 |
|                2016 | 85jesse                          |          112 |
|                2016 | Ndalyrose                        |          365 |
|                2016 | Archives cantonales jurassiennes |         3087 |
|                2016 | Mmason23                         |          247 |
|                2017 | Steinsplitter                    |            2 |
|                2017 | Pdproject                        |           21 |
|                2017 | Ebastia1                         |        26413 |
|                2017 | Pharos                           |       364741 |
|                2017 | OlafJanssen                      |         5717 |
|                2017 | ETH-Bibliothek                   |        30339 |
|                2017 | Swiss National Library           |         4976 |
|                2017 | Beeld en Geluid Collecties       |           76 |
|                2017 | 85jesse                          |            3 |
|                2017 | TeklaLilith                      |          789 |
|                2018 | Pdproject                        |          188 |
|                2018 | Pharos                           |         1448 |
|                2018 | OlafJanssen                      |         1021 |
|                2018 | ETH-Bibliothek                   |        11906 |
|                2018 | Swiss National Library           |         3014 |
|                2018 | Ndalyrose                        |         1193 |
|                2019 | Christian Ferrer                 |        14920 |
|                2019 | Pharos                           |         2334 |
|                2019 | OlafJanssen                      |         1279 |
|                2019 | ETH-Bibliothek                   |         1137 |
|                2019 | Beeld en Geluid Collecties       |         1237 |
|                2019 | Ndalyrose                        |          160 |
|                2020 | Christian Ferrer                 |        45596 |
|                2020 | Swiss National Library           |          385 |
|                2020 | Beeld en Geluid Collecties       |         2018 |
|                2020 | Namrood                          |           51 |
+---------------------+----------------------------------+--------------+
40 rows in set (0.89 sec)

And with a bit more sorting...

MariaDB [commonswiki]> select YEAR(log_timestamp), actor_name, count(ct_id) from change_tag INNER JOIN logging ON (ct_log_id=log_id) INNER JOIN actor ON (log_actor=actor_id) WHERE ct_tag_id = 71 GROUP BY YEAR(log_timestamp), log_actor ASC ORDER BY YEAR(log_timestamp), actor_name;
+---------------------+----------------------------------+--------------+
| YEAR(log_timestamp) | actor_name                       | count(ct_id) |
+---------------------+----------------------------------+--------------+
|                2016 | 85jesse                          |          112 |
|                2016 | Archives cantonales jurassiennes |         3087 |
|                2016 | Beeld en Geluid Collecties       |          304 |
|                2016 | ETH-Bibliothek                   |         9553 |
|                2016 | Ebastia1                         |          110 |
|                2016 | Fæ                               |         2208 |
|                2016 | Hansmuller                       |         3853 |
|                2016 | Jason.nlw                        |         3366 |
|                2016 | MartinPoulter                    |         6387 |
|                2016 | Mmason23                         |          247 |
|                2016 | Ndalyrose                        |          365 |
|                2016 | OlafJanssen                      |          142 |
|                2016 | Pdproject                        |           59 |
|                2016 | Timmietovenaar                   |           34 |
|                2017 | 85jesse                          |            3 |
|                2017 | Beeld en Geluid Collecties       |           76 |
|                2017 | ETH-Bibliothek                   |        30339 |
|                2017 | Ebastia1                         |        26413 |
|                2017 | OlafJanssen                      |         5717 |
|                2017 | Pdproject                        |           21 |
|                2017 | Pharos                           |       364741 |
|                2017 | Steinsplitter                    |            2 |
|                2017 | Swiss National Library           |         4976 |
|                2017 | TeklaLilith                      |          789 |
|                2018 | ETH-Bibliothek                   |        11906 |
|                2018 | Ndalyrose                        |         1193 |
|                2018 | OlafJanssen                      |         1021 |
|                2018 | Pdproject                        |          188 |
|                2018 | Pharos                           |         1448 |
|                2018 | Swiss National Library           |         3014 |
|                2019 | Beeld en Geluid Collecties       |         1237 |
|                2019 | Christian Ferrer                 |        14920 |
|                2019 | ETH-Bibliothek                   |         1137 |
|                2019 | Ndalyrose                        |          160 |
|                2019 | OlafJanssen                      |         1279 |
|                2019 | Pharos                           |         2334 |
|                2020 | Beeld en Geluid Collecties       |         2018 |
|                2020 | Christian Ferrer                 |        45596 |
|                2020 | Namrood                          |           51 |
|                2020 | Swiss National Library           |          385 |
+---------------------+----------------------------------+--------------+
40 rows in set (0.87 sec)
In T270911#6713923, @Majavah wrote:

The user group is currently granted to about 50 users.

43 actually :)

I would suggest proactively trimming down or emptying that group, especially for those who haven't used it in 2020 (I can definitely be removed, for example).

Only 20 have log entries

MariaDB [commonswiki]> select distinct actor_name from change_tag INNER JOIN logging ON (ct_log_id=log_id) INNER JOIN actor ON (log_actor=actor_id) WHERE ct_tag_id = 71;
+----------------------------------+
| actor_name                       |
+----------------------------------+
| Archives cantonales jurassiennes |
| 85jesse                          |
| Jason.nlw                        |
| MartinPoulter                    |
| Timmietovenaar                   |
| Mmason23                         |
| Pdproject                        |
| Hansmuller                       |
| ETH-Bibliothek                   |
| Fæ                               |
| OlafJanssen                      |
| Ndalyrose                        |
| Beeld en Geluid Collecties       |
| Ebastia1                         |
| Pharos                           |
| Swiss National Library           |
| TeklaLilith                      |
| Steinsplitter                    |
| Christian Ferrer                 |
| Namrood                          |
+----------------------------------+
20 rows in set (0.78 sec)

So we can definitely remove at least 23, halving the size of the group.
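
A sketch of how the removal candidates could be listed, i.e. group members with no gwtoolset-tagged log entries. Untested, and it assumes the group name gwtoolset and the tag id 71 used above:

-- Untested sketch: group members who have never made a gwtoolset-tagged log entry
SELECT user_name
FROM user_groups
INNER JOIN user ON ug_user = user_id
LEFT JOIN actor ON actor_user = user_id
LEFT JOIN logging ON log_actor = actor_id
LEFT JOIN change_tag ON ct_log_id = log_id AND ct_tag_id = 71
WHERE ug_group = 'gwtoolset'
GROUP BY user_id, user_name
HAVING COUNT(ct_id) = 0;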

Then for users that have used it in the last two years...

MariaDB [commonswiki]> select distinct actor_name from change_tag INNER JOIN logging ON (ct_log_id=log_id) INNER JOIN actor ON (log_actor=actor_id) WHERE ct_tag_id = 71 AND log_timestamp > 20190101000000;
+----------------------------+
| actor_name                 |
+----------------------------+
| Pharos                     |
| Ndalyrose                  |
| ETH-Bibliothek             |
| Christian Ferrer           |
| OlafJanssen                |
| Beeld en Geluid Collecties |
| Swiss National Library     |
| Namrood                    |
+----------------------------+
8 rows in set (0.70 sec)

Is there much work to migrate from GWToolset to Pattypan?

It feels like one of those cases where, if there isn't, and there are no real objections to doing so... once we have agreement from the Commons community at large, we make sure the right people are notified, and then set a deadline for turning it off in maybe 2-3 months (i.e. end of Feb or March)?

Is there much work to migrate from GWToolset to Pattypan?

What do you mean?
Do you mean to get the people who use it to change their workflow? Yeah, anyone who's competent to use this tool is competent to use other tools (and most of them can probably code their own bots for the purpose too).
But if you mean to get the code migrated and its functionality merged into Pattypan... No. They're completely different things that attack the same problem in quite opposite ways.

Is there much work to migrate from GWToolset to Pattypan?

It feels like one of those cases where, if there isn't, and there are no real objections to doing so... once we have agreement from the Commons community at large, we make sure the right people are notified, and then set a deadline for turning it off in maybe 2-3 months (i.e. end of Feb or March)?

Well, other than GWToolset running remotely, and Pattypan requiring image downloads and spreadsheet wrangling.
An interesting Christmas present to Commons, to turn off a tool that was developed by WMUK and unsupported for years, in favor of a Java tool supported by a single volunteer. Perhaps you would care to consider a general tool management process?

a tool that was developed by WMUK

It was developed by Europeana.
Several European chapters provided the initial funding to Europeana to develop it.
I would know: I brokered that deal at Wikimania Haifa as a WMF contractor, announced the beginning of the project during GLAMcamp Amsterdam as a volunteer, and (several years later) was responsible for launching the tool upon the world as a Europeana contractor.

Totally on board with killing this poor sickly baby.

That said, I do sincerely hope that the Wikimedia Foundation or another movement entity will, as soon as possible, seriously support a proper batch upload tool for Wikimedia Commons. Batch GLAM uploads are a primary source of high-quality educational media on Commons, usually accompanied by equally high-quality data (which is currently also more and more often linked/structured). A replacement would definitely need to support StructuredDataOnCommons.

Linking several kinda related tickets for reference:

https://www.mediawiki.org/wiki/Developers/Maintainers lists Structured Data as code stewards.
For potential sunsetting, please see https://www.mediawiki.org/wiki/Code_stewardship_reviews as a process, and what is needed.

Also, given the data in T270911#6713949, do we know who (still) promotes this tool to its partners, and should there be a heads-up about this task?

I would suggest proactively trimming down or emptying that group, especially for those who haven't used it in 2020 (I can definitely be removed, for example).

Only 20 have log entries

ETH-Bibliothek has successfully migrated to Pattypan.
I've just informed the Swiss National Library about the plans to decommission the GWT.
And I've also informed the PdProject, but they have probably migrated already.

So much for the Swiss institutions, which have been on board since the inception of the GWT.

Similarly "I was on the steering group for the development", and I'm probably still the largest user of the tool.

There's no consensus to remove it, and Pattypan is not a functional replacement.

Similarly "I was on the steering group for the development", and I'm probably still the largest user of the tool.

That really doesn't seem to be the case, at least not on your personal account. As far as the logs show, you haven't used it since 2016, never mind in the last 2 years.

Then for users that have used it in the last two years...

MariaDB [commonswiki]> select distinct actor_name from change_tag INNER JOIN logging ON (ct_log_id=log_id) INNER JOIN actor ON (log_actor=actor_id) WHERE ct_tag_id = 71 AND log_timestamp > 20190101000000;
+----------------------------+
| actor_name                 |
+----------------------------+
| Pharos                     |
| Ndalyrose                  |
| ETH-Bibliothek             |
| Christian Ferrer           |
| OlafJanssen                |
| Beeld en Geluid Collecties |
| Swiss National Library     |
| Namrood                    |
+----------------------------+
8 rows in set (0.70 sec)

And then looking more at absolute numbers...

MariaDB [commonswiki]> select actor_name, count(ct_id) as total from change_tag INNER JOIN logging ON (ct_log_id=log_id) INNER JOIN actor ON (log_actor=actor_id) WHERE ct_tag_id = 71 GROUP BY log_actor ASC ORDER BY total desc;
+----------------------------------+--------+
| actor_name                       | total  |
+----------------------------------+--------+
| Pharos                           | 368523 |
| Christian Ferrer                 |  60770 |
| ETH-Bibliothek                   |  52935 |
| Ebastia1                         |  26523 |
| Swiss National Library           |   8375 |
| OlafJanssen                      |   8159 |
| MartinPoulter                    |   6387 |
| Hansmuller                       |   3853 |
| Beeld en Geluid Collecties       |   3635 |
| Jason.nlw                        |   3366 |
| Archives cantonales jurassiennes |   3087 |
| Fæ                               |   2208 |
| Ndalyrose                        |   1718 |
| TeklaLilith                      |    789 |
| Pdproject                        |    268 |
| Mmason23                         |    247 |
| 85jesse                          |    115 |
| Namrood                          |     51 |
| Timmietovenaar                   |     34 |
| Steinsplitter                    |      2 |
+----------------------------------+--------+
20 rows in set (0.75 sec)

And for clarity

MariaDB [commonswiki]> select * from change_tag_def where ctd_id = 71;
+--------+-----------+------------------+-----------+
| ctd_id | ctd_name  | ctd_user_defined | ctd_count |
+--------+-----------+------------------+-----------+
|     71 | gwtoolset |                0 |    551045 |
+--------+-----------+------------------+-----------+
1 row in set (0.01 sec)

Adding David Haskiya, the Project Manager for the GWT at Europeana (now working at WM-Sweden).

Sorry you spent time on this analysis.

It's wrong because you are relying on the log, which was only introduced a fairly long time after the tool became active, as far as I recall. For example, in 2014 the HABS uploads were over 300,000 photographs, and that was just one of the projects I used GWT on.

That really doesn't seem to be the case, at least, not on your personal account. As far as the log show, you haven't used it since 2016, never mind in the last 2 years.

It doesn't stop it being right for the time it's been active, i.e. since the tagging was added.

Which is part of the point here. It doesn't matter how much it was used 5+ years ago, it's about how much it's being used currently.

Uploads you did in 2014 don't make the extension massively used recently, do they?
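
If it helps settle when the logs become meaningful, the date the gwtoolset tagging began could be checked directly. An untested sketch, using the same tag id as above:

-- Untested sketch: earliest gwtoolset-tagged log entry, i.e. when tagging started
SELECT MIN(log_timestamp)
FROM change_tag
INNER JOIN logging ON ct_log_id = log_id
WHERE ct_tag_id = 71;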

"probably still the largest user of the tool" is accurate, that's what I wrote, not "used massively recently".

Thanks for your interest. I don't think there's any point in analysing this further, as there's nothing being proved.

But you're not actively using the extension.

Just because you were the most active user many years ago doesn't make your input any more important than anyone else's.

I did not say that I had used GWT "massively recently".

I did not say that my input is "more important than anyone else's".

Is there any point to this? I have made my point; there's no need to put words in my mouth that I have not written here. Nor is it helpful to paint a picture that I'm weirdly puffing myself up like a complete idiot in a Phabricator discussion in order to feel important. Who would care?

Fact: I have been a significant user of this tool.
Fact: I know about GWT.
Fact: I know how much money was invested in it as I was part of establishing it and the negotiations from the outset.
Fact: I believe in the Wikimedia Commons project.

I no longer understand what point you are making, or why you appear determined to marginalize my point of view.

Thanks for your critical observations about me. I am not seeking election, money, or glory. I win nothing by participating; in fact, it costs me my volunteer time and mental space that should probably be better spent rationalizing the news that one of my elderly relatives died of Covid-19 in hospital yesterday. So how about dropping the stick you seem determined to beat me with; we all have other things to worry about.

If you want to disable GWT, do as you please, but you don't do it with my support because nothing here has convinced me that there is a process for replacing it with something easier or better for the same job.

Fact: I have been a significant user of this tool,.

"Significant" is a subjective term and thus can't be used to constitute a "fact"

Fact: I believe in the Wikimedia Commons project.

"Believe" is a vague and incomplete term and also can't be used to constitute a "fact"

"probably still the largest user of the tool" is accurate

If so, I suggest that you file a separate task with the requirements for you not to be a "large user", or a user at all, ever again. You could then propose it as a blocker of this task, if needed.

The largest recent user by far was Pharos, I suppose for the MET uploads, so it would be useful to hear from him. I've left a message. From what I've seen, it should be easy to help him switch to a basic pywikibot upload script or similar, if still needed.

@Fae: Hi, it would be welcome to follow https://www.mediawiki.org/wiki/Bug_management/Phabricator_etiquette and not to assume that folks use a "stick", but rather that folks try to understand what software-related points there are, apart from "I used GWToolset a lot; I have not used it recently; an alternative lacks some functionality but I don't say which functionality" (but you mentioned one functionality aspect now on Commons; thanks). That's why folks are trying to reach out to currently and recently active GWToolset users to better understand the consequences of a removal. (And regarding past development costs, they are rather irrelevant.)

Hi, I have not managed to find a suitable alternative, so I continue to use GWToolset at an average of 500 files per month. In the way I work I always have XML files ready to be uploaded and others in preparation, so if a date is ever fixed for the end of the tool, I would be happy to be told a few days in advance, so that I can upload everything I have ready on my PC and the work is not wasted. Thank you, regards.

@ChristianFerrer could you give examples of what functionality other tooling is missing for your particular use case? Sometimes it's a matter of lacking documentation; for example, Pattypan can upload files from URLs, but few people know about that capability.

@Abbe98 yes, I know Pattypan can upload by URL, but I did not manage to do it. I have learnt to use Pattypan for files stored on my PC, but despite several attempts I never managed to upload files by URL, and indeed there is no documentation.

The upload speed is also a bit of a concern: e.g. it took me 12 min to upload the 75 files available at https://commons.wikimedia.org/wiki/Category:Media_from_Gerken_2018_-_10.11646/zootaxa.4428.1.1, while this morning I uploaded a small batch of nearly 150 files within 9 min, the first file being https://commons.wikimedia.org/wiki/File:Sperchon_fuxiensis_(10.3897-zookeys.707.13493)_Figures_1%E2%80%933.jpg and the last being https://commons.wikimedia.org/wiki/File:Callosa_baiseensis_(10.3897-zookeys.703.13641)_Figure_7.jpg. That is roughly double the files in three quarters of the time, i.e. roughly 2.5 to 3 times faster.

That is not a serious issue for 100 or 200 files, but for e.g. 15,000 files (https://commons.wikimedia.org/wiki/Category:Mollusca_in_the_MNHN) it is not the same thing.

So honestly, even if maintenance of the tool is abandoned so that it no longer costs any money to maintain, I would be rather happy to be able to keep using it as long as it works. But clearly, yes, I would also be happy to learn how to upload by URL with Pattypan, especially if the fate of GWToolset is sealed.

Also, is Pattypan able to avoid uploading duplicates?
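
On duplicates: MediaWiki itself warns about exact duplicates by file hash at upload time, so a batch tool (or the uploader) could also pre-check candidates against the replicas. An untested sketch; the hash value is a placeholder:

-- Untested sketch: does a file with this content hash already exist on Commons?
-- img_sha1 holds the base-36 SHA-1 of the file contents; replace the placeholder value.
SELECT img_name
FROM image
WHERE img_sha1 = '<base36-sha1-of-candidate-file>';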

Change 676911 had a related patch set uploaded (by Amire80; author: Amire80):

[translatewiki@master] Move GWToolset to Wikimedia Legacy

https://gerrit.wikimedia.org/r/676911

Change 676911 merged by jenkins-bot:

[translatewiki@master] Move GWToolset to Wikimedia Legacy

https://gerrit.wikimedia.org/r/676911

For everyone's info, currently no Code-Stewardship-Reviews are taking place as there is no clear path forward and as this is not prioritized work.
(Entirely personal opinion: I also assume lack of decision authority due to WMF not having a CTO currently. However, discussing this is off-topic for this task.)

Well, other than GWToolset running remotely, and Pattypan requiring image downloads and spreadsheet wrangling.
An interesting Christmas present to Commons, to turn off a tool that was developed by WMUK and unsupported for years, in favor of a Java tool supported by a single volunteer. Perhaps you would care to consider a general tool management process?

At the risk of getting off-topic: this is not surprising at all. GWToolset (with all respect to the people who developed it; I know you tried hard in a difficult situation and were generally set up for failure) is a design-by-committee tool where the committee did not adequately understand the requirements. A single developer working alone and understanding what needs to be done will beat that every time, hands down. It is likely that the main reason this got deployed at all was internal politics, the sunk-cost fallacy, and the bad optics of the WMF causing WM-UK/Europeana to have spent a lot of money for nothing. If this were an extension written for "free" by a volunteer, there is a 0% chance it would have been deployed in the first place.

Not saying this to be mean: I think everyone involved really did try their best, and communication failures, especially on the WMF side, set things up for failure. However, I think a lot of interesting lessons could be learned from GWToolset if anyone ever wants to try again with something similar. Which would be cool. So far it seems like WM-DE (unless I am misinformed) is really the only chapter that has successfully collaborated on MediaWiki development. I think Wikimedia would be a lot healthier if more groups participated on the technical side.

Not saying this to be mean: I think everyone involved really did try their best, and communication failures, especially on the WMF side, set things up for failure. However, I think a lot of interesting lessons could be learned from GWToolset if anyone ever wants to try again with something similar. Which would be cool. So far it seems like WM-DE (unless I am misinformed) is really the only chapter that has successfully collaborated on MediaWiki development. I think Wikimedia would be a lot healthier if more groups participated on the technical side.

The result wasn't really what we expected and hoped for. This was learning money for Wikimedia. There is no reason to keep this tool around. Having a broken tool instead of none at all is worse, because it lowers the incentive to build a new tool. Accept failure, remove the tool, move on.

At the moment the Commons extension for OpenRefine is in development; see https://github.com/OpenRefine/CommonsExtension . The first file was uploaded with it today. This should become a very good alternative.

Though not perfect, the tool doesn't seem broken to me; I still use it:
https://commons.wikimedia.org/w/index.php?hidebots=1&translations=filter&hidecategorization=1&hideWikibase=1&tagfilter=gwtoolset&limit=1000&days=30&enhanced=1&title=Special:RecentChanges&urlversion=2
The last files were uploaded this morning.
As long as I can use it, and as long as OpenRefine cannot really be used for batch uploading, I will be happy if the tool stays available, even if it is not maintained.

For a few days now I have not been able to use GWToolset. The circle is complete. Bye.

If you give an actionable bug report, maybe someone could help fix it…

@Reedy thank you, that's kind, but since the tool is destined to be shut down, no one should waste time on it. That I stop batch uploading is not especially a big issue; I have a ton of things to do.
For the record, and if I'm not mistaken, this file was the last one uploaded with GWToolset.

@ChristianFerrer Would you be interested in an introduction to OpenRefine (perhaps together with a few other former GWToolset users)? I'm very interested in figuring out together with you how OpenRefine can support the kind of uploads you want to do.

Batch uploading is now possible (although it still relies on experimental code that hasn't been merged yet; but since you are a GWToolset user, I think you will be able to do it just fine).

@Spinster No thank you, I am starting a period of increased professional activity and cannot invest time in this subject for the moment. When my schedule allows it in a few months, and if I'm still interested in mass uploading, I'll take a look at this tool and contact you if needed. Thank you very much.