Page MenuHomePhabricator

Multiple DB-related errors during harvesting
Closed, ResolvedPublic

Description

84 errors (for 117 datasets?

ERROR: Unknown error occurred when processing country ir in lang fa
(1366, "Incorrect double value: '' for column `s51138__heritage_p`.`monuments_ir_(fa)`.`lat` at row 1")
--
ERROR: Unknown error occurred when processing country se-arbetsl in lang sv
(1406, "Data too long for column 'id' at row 1")
--
ERROR: Unknown error occurred when processing country de-nrw-bm in lang de
(1406, "Data too long for column 'beschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country es-ct in lang ca
(1265, "Data truncated for column 'prot' at row 1")
--
ERROR: Unknown error occurred when processing country ro in lang ro
(1406, "Data too long for column 'adresa' at row 1")
--
ERROR: Unknown error occurred when processing country be-vlg in lang fr
(1406, "Data too long for column 'classement' at row 1")
--
ERROR: Unknown error occurred when processing country ie in lang en
(1265, "Data truncated for column 'number' at row 1")
--
ERROR: Unknown error occurred when processing country hu in lang hu
(1265, "Data truncated for column 'site' at row 1")
--
ERROR: Unknown error occurred when processing country ch2 in lang de
(1406, "Data too long for column 'fotobeschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country gb-eng in lang en
(1406, "Data too long for column 'name' at row 1")
--
ERROR: Unknown error occurred when processing country rs in lang sr
(1366, "Incorrect double value: '' for column `s51138__heritage_p`.`monuments_rs_(sr)`.`lon` at row 1")
--
ERROR: Unknown error occurred when processing country be-wal in lang fr
(1406, "Data too long for column 'nom_objet' at row 1")
--
ERROR: Unknown error occurred when processing country al in lang sq
(1366, "Incorrect double value: '' for column `s51138__heritage_p`.`monuments_al_(sq)`.`lat` at row 1")
--
ERROR: Unknown error occurred when processing country uy in lang es
(1406, "Data too long for column 'monumento' at row 1")
--
ERROR: Unknown error occurred when processing country es in lang ca
(1265, "Data truncated for column 'prot' at row 1")
--
ERROR: Unknown error occurred when processing country gb-nir in lang en
(1406, "Data too long for column 'hb' at row 1")
--
ERROR: Unknown error occurred when processing country aq in lang en
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country ch-old in lang en
(1265, "Data truncated for column 'kgs_nr' at row 1")
--
ERROR: Unknown error occurred when processing country no in lang no
(1265, "Data truncated for column 'id' at row 1")
--
ERROR: Unknown error occurred when processing country us-ca in lang en
(1265, "Data truncated for column 'refnum' at row 1")
--
ERROR: Unknown error occurred when processing country fr in lang fr
(1406, "Data too long for column 'notice' at row 1")
--
ERROR: Unknown error occurred when processing country it-bz in lang de
(1406, "Data too long for column 'beschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country ca-prov in lang en
(1366, "Incorrect integer value: '––' for column `s51138__heritage_p`.`monuments_ca-prov_(en)`.`idm` at row 1")
--
ERROR: Unknown error occurred when processing country th in lang th
(1265, "Data truncated for column 'site' at row 1")
--
ERROR: Unknown error occurred when processing country il in lang he
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country de-he in lang de
(1406, "Data too long for column 'beschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country pt in lang pt
(1406, "Data too long for column 'designacoes' at row 1")
--
ERROR: Unknown error occurred when processing country fr-object in lang fr
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country mx in lang es
(1406, "Data too long for column 'id' at row 1")
--
ERROR: Unknown error occurred when processing country be-bru in lang nl
(1406, "Data too long for column 'bouwdoor' at row 1")
--
ERROR: Unknown error occurred when processing country au in lang en
(1366, "Incorrect double value: '' for column `s51138__heritage_p`.`monuments_au_(en)`.`lon` at row 33")
--
ERROR: Unknown error occurred when processing country be-wal in lang nl
(1406, "Data too long for column 'descr_nl' at row 1")
--
ERROR: Unknown error occurred when processing country gb-sct in lang en
(1265, "Data truncated for column 'hb' at row 1")
--
ERROR: Unknown error occurred when processing country es-vc in lang ca
(1265, "Data truncated for column 'prot' at row 1")
--
ERROR: Unknown error occurred when processing country hr in lang hr
(1406, "Data too long for column 'arhitekt' at row 1")
--
ERROR: Unknown error occurred when processing country jp-nhs in lang en
(1406, "Data too long for column 'comments' at row 1")
--
ERROR: Unknown error occurred when processing country za in lang en
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country pa in lang es
(1406, "Data too long for column 'descripcion' at row 1")
--
ERROR: Unknown error occurred when processing country ar in lang es
(1406, "Data too long for column 'direccion' at row 1")
--
ERROR: Unknown error occurred when processing country sr in lang commons
(1366, "Incorrect double value: '' for column `s51138__heritage_p`.`monuments_sr_(nl)`.`lon` at row 2")
--
ERROR: Unknown error occurred when processing country pl in lang pl
(1406, "Data too long for column 'nazwa' at row 1")
--
ERROR: Unknown error occurred when processing country in in lang en
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country es in lang es
(1366, "Incorrect double value: '' for column `s51138__heritage_p`.`monuments_es_(es)`.`lat` at row 1")
--
ERROR: Unknown error occurred when processing country wlpa-es-ct in lang ca
(1406, "Data too long for column 'descripcio' at row 1")
--
ERROR: Unknown error occurred when processing country pk in lang en
(1366, "Incorrect double value: '' for column `s51138__heritage_p`.`monuments_pk_(en)`.`lon` at row 1")
--
ERROR: Unknown error occurred when processing country be-wal in lang en
(1406, "Data too long for column 'descr_nl' at row 1")
--
ERROR: Unknown error occurred when processing country us in lang en
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country at in lang de
(1406, "Data too long for column 'beschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country dk-bygning in lang da
(1265, "Data truncated for column 'systemnrbyg' at row 1")
--
ERROR: Unknown error occurred when processing country by in lang be-tarask
(1406, "Data too long for column 'name' at row 1")
--
ERROR: Unknown error occurred when processing country eg in lang ar
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country de-nrw-k in lang de
(1406, "Data too long for column 'beschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country mt in lang de
(1406, "Data too long for column 'beschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country ug in lang en
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country ca-fed in lang en
(1406, "Data too long for column 'address' at row 1")
--
ERROR: Unknown error occurred when processing country tn in lang fr
(1406, "Data too long for column 'monument' at row 1")
--
ERROR: Unknown error occurred when processing country il-npa in lang he
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country nl in lang nl
(1265, "Data truncated for column 'type_obj' at row 1")
--
ERROR: Unknown error occurred when processing country cn in lang en
(1406, "Data too long for column 'designation' at row 1")
--
ERROR: Unknown error occurred when processing country be-vlg in lang en
(1406, "Data too long for column 'address' at row 1")
--
ERROR: Unknown error occurred when processing country pe in lang es
(1406, "Data too long for column 'direccion' at row 1")
--
ERROR: Unknown error occurred when processing country ee in lang et
(1366, "Incorrect double value: '' for column `s51138__heritage_p`.`monuments_ee_(et)`.`lat` at row 1")
--
ERROR: Unknown error occurred when processing country nl-gem in lang nl
(1406, "Data too long for column 'objnr' at row 1")
--
ERROR: Unknown error occurred when processing country ph in lang en
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country it in lang it
(1265, "Data truncated for column 'site' at row 1")
--
ERROR: Unknown error occurred when processing country gh in lang en
(1406, "Data too long for column 'id' at row 1")
--
ERROR: Unknown error occurred when processing country sk in lang de
(1406, "Data too long for column 'beschreibung-de' at row 1")
--
ERROR: Unknown error occurred when processing country be-vlg in lang nl
(1406, "Data too long for column 'adres' at row 1")
--
ERROR: Unknown error occurred when processing country de-by in lang de
(1406, "Data too long for column 'beschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country iq in lang ar
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country es-gl in lang gl
(1406, "Data too long for column 'notas' at row 1")
--
ERROR: Unknown error occurred when processing country fr in lang ca
(1265, "Data truncated for column 'prot' at row 1")
--
ERROR: Unknown error occurred when processing country cz in lang cs
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country dk-fortids in lang da
(1265, "Data truncated for column 'fredningsnummer' at row 1")
--
ERROR: Unknown error occurred when processing country am in lang hy
(1406, "Data too long for column 'id' at row 1")
--
ERROR: Unknown error occurred when processing country gb-wls in lang en
(1406, "Data too long for column 'notes' at row 1")
--
ERROR: Unknown error occurred when processing country wlpa-at in lang de
(1406, "Data too long for column 'Beschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country ru in lang ru
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country ua in lang uk
(1265, "Data truncated for column 'site' at row 1")
--
ERROR: Unknown error occurred when processing country cl in lang es
(1366, "Incorrect integer value: 'S/N' for column `s51138__heritage_p`.`monuments_cl_(es)`.`id` at row 1")
--
ERROR: Unknown error occurred when processing country ch in lang de
(1406, "Data too long for column 'anzeige-adresse' at row 1")
--
ERROR: Unknown error occurred when processing country pt-wd in lang pt
(1366, "Incorrect double value: '' for column `s51138__heritage_p`.`monuments_pt-wd_(pt)`.`lon` at row 26")
--
ERROR: Unknown error occurred when processing country co in lang es
(1406, "Data too long for column 'id' at row 1")
--
ERROR: Unknown error occurred when processing country ca-muni in lang en
(1366, "Incorrect integer value: '––' for column `s51138__heritage_p`.`monuments_ca-muni_(en)`.`idm` at row 1")
--

Then we also have

2023-06-13_04:29:40 Update monuments_all table...
ERROR 1292 (22007) at line 514: Truncated incorrect DECIMAL value: ''
2023-06-13_04:30:50 Make more detailed statistics...

Fatal error: Uncaught DBException: SQL error 1366: Incorrect string value: '\xD0\xE2\x80\xA6' for column `s51138__heritage_p`.`statistics`.`muni` at row 1
Last query was: INSERT INTO statistics VALUES("2022-10-15","st_total","by","[[Навасёлкі (Навасёлкаўскі сельсавет, Ляхавіцкі раён)|�…","be-tarask","wikipedia","1") ON DUPLICATE KEY UPDATE value = "1" in /
mnt/nfs/labstore-secondary-tools-project/heritage/heritage/api/includes/Database.php on line 138

DBException: SQL error 1366: Incorrect string value: '\xD0\xE2\x80\xA6' for column `s51138__heritage_p`.`statistics`.`muni` at row 1
Last query was: INSERT INTO statistics VALUES("2022-10-15","st_total","by","[[Навасёлкі (Навасёлкаўскі сельсавет, Ляхавіцкі раён)|�…","be-tarask","wikipedia","1") ON DUPLICATE KEY UPDATE value = "1" in /
mnt/nfs/labstore-secondary-tools-project/heritage/heritage/api/includes/Database.php on line 138
2023-06-13_04:51:33 Make a list of images without id...
ERROR: Unknown error occurred when processing country ir in lang fa
(1054, "Unknown column 'tl_namespace' in 'where clause'")
ERROR: Unknown error occurred when processing country se-arbetsl in lang sv
(1054, "Unknown column 'tl_namespace' in 'where clause'")
... (77 times)

Event Timeline

Looks like this is happening since April 7th

Likely related to the MariaDB upgrade T301949

Looks like there are a couple of changes.

  • inserting empty strings no longer seems to be interpreted as NULL (all errors about "Incorrect double value" for lat/lon)
  • Inserting too long values seem to no longer result in them getting truncated, instead resulting in an error (the errors in statistics, likely most "Data too long for column" errors)
    • Unclear why some are still getting truncated (but raising other errors), a quick glance suggests that this is happening for int values (the rest being varchar)

Some worrying things:

  • Looking at the ua_uk error it references the column "site" which doesn't appear in either monuments_config/ua_uk.json or sql/create_table_monuments_ua_(uk).sql or any template/*.sql (unless this is an error from filling the comonscat table).

The last set of errors are independent of the database change, but instead due to tl_namespace (and tl_title) being removed in mw 1.39. We would instead need to go via tl_target_id > lt_namespace/lt_title. Breaking this out into a separate task T341773

Maybe @fnegri or @Ladsgroup can shine a light on if any of the changes are due to the MariaDB upgrade and what the recommended migration pathways are in that case?

Mentioned in SAL (#wikimedia-cloud) [2023-08-21T16:00:30Z] <wm-bot> <jeanfred> Set EMPTY_STRING_IS_NULL SQL mode (using ) for T338987

Mentioned in SAL (#wikimedia-cloud) [2023-08-21T16:00:33Z] <wm-bot> <jeanfred> Set EMPTY_STRING_IS_NULL SQL mode (using SET sql_mode = 'STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION,EMPTY_STRING_IS_NULL';) for T338987

Not sure that EMPTY_STRING_IS_NULL will fix it, but it's worth a try. This is probably the related change in MariaDB:

https://mariadb.com/kb/en/upgrading-from-mariadb-101-to-mariadb-102/#sql-mode

You can also check the guides for upgrading to 10.3 and 10.4:

From a quick look, I don't see anything there that would explain the "Data too long for column" error.

We did not change any configuration option during the upgrade, as far as I remember, so the only differences should be the ones listed in the upgrade guides, but it's very possible I'm missing something.

If you can't find a way to fix those errors, please paste some example queries that are failing and I will have a look.

One example of

(1366, "Incorrect double value: '' for column s51138__heritage_p.monuments_pk_(en).lon at row 1")

would be

REPLACE INTO monuments_pk_(en) (source, number, prov_iso, description, address, district, lon, monument_article, registrant_url) VALUES (//en.wikipedia.org/w/index.php?title=List_of_cultural_heritage_sites_in_Balochistan,_Pakistan&oldid=1139504719, BA-2, PK-BA, [[Nindo Damb]], Ornach Valley, Tehsil Wadh, [[Killa Abdullah District]], , Nindo_Damb, BA-2)

The problem could be that innodb_strict_mode changed from OFF to ON in MariaDB 10.2. Try set innodb_strict_mode = OFF; before issuing your query.

Change 951174 had a related patch set uploaded (by Jean-Frédéric; author: Jean-Frédéric):

[labs/tools/heritage@master] Use None as empty value in SQL queries

https://gerrit.wikimedia.org/r/951174

One example of

(1406, "Data too long for column 'image' at row 1")

REPLACE INTO `monuments_ir_(fa)` (`source`, `id`, `name`, `city`, `address`, `registration_date`, `age`, `image`, `ISO`, `province`, `monument_article`) VALUES (//fa.wikipedia.org/w/index.php?title=ویکی‌پدیا:فهرست_یادمان‌های_ملی_ایران/اصفهان&oldid=35389593, 115, مناره‌های دردشت, اصفهان, خ ابن سینا، بازارچه دردشت، کوی دردشت, ۱۳۱۰/۱۰/۱۵, قرن ۷ و ۸ ق, %D9%85%D9%86%D8%A7%D8%B1%D9%87_%D9%87%D8%A7%DB%8C_%D8%AF%D8%B1%D8%AF%D8%B4%D8%AA_%D8%A7%D8%B2_%D8%AF%D8%A7%D8%AE%D9%84_%DB%8C%DA%A9%DB%8C_%D8%A7%D8%B2_%D8%AE%D8%A7%D9%86%D9%87_%D9%87%D8%A7%DB%8C_%D8%AA%D8%A7%D8%B1%DB%8C%D8%AE%DB%8C_%D8%A7%D8%B7%D8%B1%D8%A7%D9%81.jpg, IR-04, اصفهان, )

Contrast with a successful query:

REPLACE INTO `monuments_ir_(fa)` (`source`, `id`, `name`, `city`, `address`, `registration_date`, `age`, `image`, `ISO`, `province`, `monument_article`) VALUES (//fa.wikipedia.org/w/index.php?title=ویکی‌پدیا:فهرست_یادمان‌های_ملی_ایران/اصفهان&oldid=35389593, 110, پل الله وردیخان/ سی‌وسه پل/ پل جل, اصفهان, م انقلاب، زاینده رود, ۱۳۱۰/۱۰/۱۵, صفویه, %D8%B3%DB%8C_%D9%88_%D8%B3%D9%87_%D9%BE%D9%84.JPG, IR-04, اصفهان, )

Another example of (1406, "Data too long for column 'classement' at row 1") in be-vlg:fr:

REPLACE INTO `monuments_be-vlg_(fr)` (`source`, `id`, `classement`, `commune`, `section_communale`, `section_communale_id`, `adresse`, `nom_objet`, `lat`, `lon`, `image`, `monument_article`, `registrant_url`) VALUES (//fr.wikipedia.org/w/index.php?title=Liste_des_monuments_historiques_de_Borsbeek&oldid=147591324, 12759, [http://onroerenderfgoed.ruimte-erfgoed.be/Default.aspx?tabid=14603&ObjectID=OA002314 OA002314]<br />[[File:Monumentenschildje blauw wit.svg|18px]] [http://onroerenderfgoed.ruimte-erfgoed.be/Default.aspx?tabid=14603&ObjectID=OA002315 OA002315]<br />[[File:Monumentenschildje blauw wit.svg|18px]] [http://onroerenderfgoed.ruimte-erfgoed.be/Default.aspx?tabid=14603&ObjectID=OA003545 OA003545]<br />[[File:Monumentenschildje blauw wit.svg|18px]] [http://onroerenderfgoed.ruimte-erfgoed.be/Default.aspx?tabid=14603&ObjectID=OA003546 OA003546], Borsbeek, Borsbeek, 11007A, Frans Beirenslaan 2, Ferme dite du Saint-Esprit (''Heilige Geest'') 
{{nl}} ''"Heilige Geesthoeve"'', 51.190763, 4.476851, Borsbeek Heilige-Geesthoeve 2.JPG, , https://inventaris.onroerenderfgoed.be/dibe/relict/12759)

vs.

REPLACE INTO `monuments_be-vlg_(fr)` (`source`, `id`, `classement`, `commune`, `section_communale`, `section_communale_id`, `adresse`, `nom_objet`, `lat`, `lon`, `monument_article`, `registrant_url`) VALUES (//fr.wikipedia.org/w/index.php?title=Liste_des_monuments_historiques_de_Bornem&oldid=204796004, 1907, Oui, Bornem, Bornem, 12007A, Kloosterstraat 37, {{nl}}  Kasteel Dulft, 51.095954, 4.230427, , https://inventaris.onroerenderfgoed.be/dibe/relict/1907)

It seems to me that "Data too long for column X" means bad data in the source tables. The harvesting used to ignore that, now it does not. The proper fix is to correct the source data, but we can’t do all that.

Taking care of it before the query run is annoying, because we would need to reimplement the type check (if varchar do this, if int do that etc.)

We could just catch pymysql.err.DataError and skip the monument? That feels a bit heavyhanded − one bad field and the whole row is skipped

Change 951180 had a related patch set uploaded (by Jean-Frédéric; author: Jean-Frédéric):

[labs/tools/heritage@master] Guard against empty primary key during monument update

https://gerrit.wikimedia.org/r/951180

This comment was removed by JeanFred.

Change 951428 had a related patch set uploaded (by Jean-Frédéric; author: Jean-Frédéric):

[labs/tools/heritage@master] Upgrade MariaDB Docker image from 10.0 to 10.4.29

https://gerrit.wikimedia.org/r/951428

The problem could be that innodb_strict_mode changed from OFF to ON in MariaDB 10.2. Try set innodb_strict_mode = OFF; before issuing your query.

That sounds promising, thanks! Bit of a n00b question, but is that the same thing as sql_mode which I could set in the pymysql object?

That sounds promising, thanks! Bit of a n00b question, but is that the same thing as sql_mode which I could set in the pymysql object?

This is also at the edge of my mariadb knowledge. The documentation says they are "similar", but I'm not sure what happens if you set only one of them. You can try removing all "STRICT_*" options from sql_mode, something like

sql_mode = 'ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';

According to the examples in this page, SQL_MODE can also explain the errors "Data too long for column..."

Reading again the docs for innodb_strict_mode I think enabling that variable will cause a different set of errors (the ones listed in the doc page). So my understanding is that innodb_strict_mode and sql_mode are similar only in the sense they change "warnings" into "errors" when enabled, but they apply to two different sets of errors.

The errors that you are seeing seem related to the sql_mode variable, and to the STRICT_TRANS_TABLES option in it, that was not present by default in MariaDB 10.1, but is present since MariaDB 10.2.4. So removing STRICT_TRANS_TABLES from sql_mode should hopefully fix them.

Tried

STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION

ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION

(The annoying thing is that I don’t have yet support to test such things in my local docker-compose setup)

(The annoying thing is that I don’t have yet support to test such things in my local docker-compose setup)

Why not? It should work if you use a similar MariaDB version.

(The annoying thing is that I don’t have yet support to test such things in my local docker-compose setup)

Why not? It should work if you use a similar MariaDB version.

Just was not sure how MariaDB would read that configuration :) but I figured it out − overriding the command.

The errors that you are seeing seem related to the sql_mode variable, and to the STRICT_TRANS_TABLES option in it, that was not present by default in MariaDB 10.1, but is present since MariaDB 10.2.4. So removing STRICT_TRANS_TABLES from sql_mode should hopefully fix them.

Yep, I can confirm locally that removing STRICT_TRANS_TABLES from sql_mode does fix the issue. Thanks for the guidance!

Nice one! Can this task be resolved, or is there something else that you would like to fix?

Change 951542 had a related patch set uploaded (by Jean-Frédéric; author: Jean-Frédéric):

[labs/tools/heritage@master] Remove `STRICT_TRANS_TABLES` SQL mode from our MariaDB database

https://gerrit.wikimedia.org/r/951542

Change 951174 merged by jenkins-bot:

[labs/tools/heritage@master] Use None as empty value in SQL queries

https://gerrit.wikimedia.org/r/951174

Change 951180 merged by jenkins-bot:

[labs/tools/heritage@master] Guard against empty primary key during monument update

https://gerrit.wikimedia.org/r/951180

Change 951428 merged by jenkins-bot:

[labs/tools/heritage@master] Upgrade MariaDB Docker image from 10.0 to 10.4.29

https://gerrit.wikimedia.org/r/951428

Change 951542 merged by jenkins-bot:

[labs/tools/heritage@master] Remove `STRICT_TRANS_TABLES` SQL mode from our MariaDB database

https://gerrit.wikimedia.org/r/951542

Nice one! Can this task be resolved, or is there something else that you would like to fix?

All patches are now merged and deployed. Let's wait overnight to see whether harvesting succeeds. :)

Harvesting ran over night − still 84 errors :/

ERROR: Unknown error occurred when processing country ir in lang fa
(1406, "Data too long for column 'image' at row 1")
--
ERROR: Unknown error occurred when processing country se-arbetsl in lang sv
(1406, "Data too long for column 'id' at row 1")
--
ERROR: Unknown error occurred when processing country de-nrw-bm in lang de
(1406, "Data too long for column 'beschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country es-ct in lang ca
(1265, "Data truncated for column 'prot' at row 1")
--
ERROR: Unknown error occurred when processing country ro in lang ro
(1406, "Data too long for column 'adresa' at row 1")
--
ERROR: Unknown error occurred when processing country be-vlg in lang fr
(1406, "Data too long for column 'classement' at row 1")
--
ERROR: Unknown error occurred when processing country ie in lang en
(1265, "Data truncated for column 'number' at row 1")
--
ERROR: Unknown error occurred when processing country hu in lang hu
(1265, "Data truncated for column 'site' at row 1")
--
ERROR: Unknown error occurred when processing country ch2 in lang de
(1406, "Data too long for column 'fotobeschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country gb-eng in lang en
(1406, "Data too long for column 'name' at row 1")
--
ERROR: Unknown error occurred when processing country rs in lang sr
(1265, "Data truncated for column 'site' at row 1")
--
ERROR: Unknown error occurred when processing country be-wal in lang fr
(1406, "Data too long for column 'nom_objet' at row 1")
--
ERROR: Unknown error occurred when processing country uy in lang es
(1406, "Data too long for column 'monumento' at row 1")
--
ERROR: Unknown error occurred when processing country es in lang ca
(1265, "Data truncated for column 'prot' at row 1")
--
ERROR: Unknown error occurred when processing country gb-nir in lang en
(1406, "Data too long for column 'hb' at row 1")
--
ERROR: Unknown error occurred when processing country aq in lang en
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country ch-old in lang en
(1265, "Data truncated for column 'kgs_nr' at row 1")
--
ERROR: Unknown error occurred when processing country no in lang no
(1265, "Data truncated for column 'id' at row 1")
--
ERROR: Unknown error occurred when processing country us-ca in lang en
(1265, "Data truncated for column 'refnum' at row 1")
--
ERROR: Unknown error occurred when processing country fr in lang fr
(1406, "Data too long for column 'notice' at row 1")
--
ERROR: Unknown error occurred when processing country it-bz in lang de
(1406, "Data too long for column 'beschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country ca-prov in lang en
(1366, "Incorrect integer value: '––' for column `s51138__heritage_p`.`monuments_ca-prov_(en)`.`idm` at row 1")
--
ERROR: Unknown error occurred when processing country th in lang th
(1265, "Data truncated for column 'site' at row 1")
--
ERROR: Unknown error occurred when processing country il in lang he
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country de-he in lang de
(1406, "Data too long for column 'beschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country pt in lang pt
(1406, "Data too long for column 'designacoes' at row 1")
--
ERROR: Unknown error occurred when processing country fr-object in lang fr
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country mx in lang es
(1406, "Data too long for column 'id' at row 1")
--
ERROR: Unknown error occurred when processing country be-bru in lang nl
(1406, "Data too long for column 'bouwdoor' at row 1")
--
ERROR: Unknown error occurred when processing country au in lang en
(1366, "Incorrect double value: '' for column `s51138__heritage_p`.`monuments_au_(en)`.`lon` at row 35")
--
ERROR: Unknown error occurred when processing country be-wal in lang nl
(1406, "Data too long for column 'descr_nl' at row 1")
--
ERROR: Unknown error occurred when processing country gb-sct in lang en
(1265, "Data truncated for column 'hb' at row 1")
--
ERROR: Unknown error occurred when processing country es-vc in lang ca
(1265, "Data truncated for column 'prot' at row 1")
--
ERROR: Unknown error occurred when processing country hr in lang hr
(1406, "Data too long for column 'arhitekt' at row 1")
--
ERROR: Unknown error occurred when processing country jp-nhs in lang en
(1406, "Data too long for column 'comments' at row 1")
--
ERROR: Unknown error occurred when processing country za in lang en
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country pa in lang es
(1406, "Data too long for column 'descripcion' at row 1")
--
ERROR: Unknown error occurred when processing country ar in lang es
(1406, "Data too long for column 'direccion' at row 1")
--
ERROR: Unknown error occurred when processing country sr in lang commons
(1366, "Incorrect double value: '' for column `s51138__heritage_p`.`monuments_sr_(nl)`.`lon` at row 2")
--
ERROR: Unknown error occurred when processing country pl in lang pl
(1406, "Data too long for column 'nazwa' at row 1")
--
ERROR: Unknown error occurred when processing country in in lang en
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country es in lang es
(1406, "Data too long for column 'lugar' at row 1")
--
ERROR: Unknown error occurred when processing country wlpa-es-ct in lang ca
(1406, "Data too long for column 'descripcio' at row 1")
--
ERROR: Unknown error occurred when processing country be-wal in lang en
(1406, "Data too long for column 'descr_nl' at row 1")
--
ERROR: Unknown error occurred when processing country us in lang en
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country at in lang de
(1406, "Data too long for column 'beschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country dk-bygning in lang da
(1265, "Data truncated for column 'systemnrbyg' at row 1")
--
ERROR: Unknown error occurred when processing country by in lang be-tarask
(1406, "Data too long for column 'name' at row 1")
--
ERROR: Unknown error occurred when processing country eg in lang ar
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country de-nrw-k in lang de
(1406, "Data too long for column 'beschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country mt in lang de
(1406, "Data too long for column 'beschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country ug in lang en
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country ca-fed in lang en
(1406, "Data too long for column 'address' at row 1")
--
ERROR: Unknown error occurred when processing country tn in lang fr
(1406, "Data too long for column 'monument' at row 1")
--
ERROR: Unknown error occurred when processing country il-npa in lang he
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country nl in lang nl
(1265, "Data truncated for column 'type_obj' at row 1")
--
ERROR: Unknown error occurred when processing country cn in lang en
(1406, "Data too long for column 'designation' at row 1")
--
ERROR: Unknown error occurred when processing country be-vlg in lang en
(1406, "Data too long for column 'address' at row 1")
--
ERROR: Unknown error occurred when processing country pe in lang es
(1406, "Data too long for column 'direccion' at row 1")
--
ERROR: Unknown error occurred when processing country ee in lang et
(1406, "Data too long for column 'aadress' at row 1")
--
ERROR: Unknown error occurred when processing country nl-gem in lang nl
(1406, "Data too long for column 'objnr' at row 1")
--
ERROR: Unknown error occurred when processing country ph in lang en
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country it in lang it
(1265, "Data truncated for column 'site' at row 1")
--
ERROR: Unknown error occurred when processing country gh in lang en
(1406, "Data too long for column 'id' at row 1")
--
ERROR: Unknown error occurred when processing country sk in lang de
(1406, "Data too long for column 'beschreibung-de' at row 1")
--
ERROR: Unknown error occurred when processing country be-vlg in lang nl
(1406, "Data too long for column 'adres' at row 1")
--
ERROR: Unknown error occurred when processing country de-by in lang de
(1406, "Data too long for column 'beschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country iq in lang ar
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country es-gl in lang gl
(1406, "Data too long for column 'notas' at row 1")
--
ERROR: Unknown error occurred when processing country fr in lang ca
(1265, "Data truncated for column 'prot' at row 1")
--
ERROR: Unknown error occurred when processing country cz in lang cs
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country dk-fortids in lang da
(1265, "Data truncated for column 'fredningsnummer' at row 1")
--
ERROR: Unknown error occurred when processing country am in lang hy
(1406, "Data too long for column 'id' at row 1")
--
ERROR: Unknown error occurred when processing country gb-wls in lang en
(1406, "Data too long for column 'notes' at row 1")
--
ERROR: Unknown error occurred when processing country wlpa-at in lang de
(1406, "Data too long for column 'Beschreibung' at row 1")
--
ERROR: Unknown error occurred when processing country ru in lang ru
(1406, "Data too long for column 'description' at row 1")
--
ERROR: Unknown error occurred when processing country ua in lang uk
(1265, "Data truncated for column 'site' at row 1")
--
ERROR: Unknown error occurred when processing country cl in lang es
(1366, "Incorrect integer value: 'S/N' for column `s51138__heritage_p`.`monuments_cl_(es)`.`id` at row 1")
--
ERROR: Unknown error occurred when processing country ch in lang de
(1406, "Data too long for column 'anzeige-adresse' at row 1")
--
ERROR: Unknown error occurred when processing country pt-wd in lang pt
(1366, "Incorrect double value: '' for column `s51138__heritage_p`.`monuments_pt-wd_(pt)`.`lon` at row 3")
--
ERROR: Unknown error occurred when processing country co in lang es
(1406, "Data too long for column 'id' at row 1")
--
ERROR: Unknown error occurred when processing country ca-muni in lang en
(1366, "Incorrect integer value: '––' for column `s51138__heritage_p`.`monuments_ca-muni_(en)`.`idm` at row 1")
--
ERROR 1292 (22007) at line 514: Truncated incorrect DECIMAL value: ''
2023-08-23_04:30:04 Restart the categorization job...
--
ERROR: Unknown error occurred when processing country in-com in lang commons
Language 'commons' does not exist in family wikipedia

Hmmm, running SELECT @@SQL_MODE, @@GLOBAL.SQL_MODE; again in s51138__heritage_p I’m getting

STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION

Did I mess up the set? Let’s try again.

Mentioned in SAL (#wikimedia-cloud) [2023-08-23T07:13:45Z] <wm-bot> <jeanfred> Removed STRICT_TRANS_TABLES from SQL_MODE (again) for T338987

Ah, so the SQL mode setting simply does not stick. Running again:

MariaDB [s51138__heritage_p]> SELECT @@SQL_MODE, @@GLOBAL.SQL_MODE;
+-------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------+
| @@SQL_MODE                                                                                | @@GLOBAL.SQL_MODE                                                                         |
+-------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------+
| STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION | STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
+-------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------+

Edited the code in place on Toolforge to add a sql_mode argument to the pymysql connection object. If that works out, I’ll submit a Gerrit patch.

Change 951840 had a related patch set uploaded (by Jean-Frédéric; author: Jean-Frédéric):

[labs/tools/heritage@master] Remove `STRICT_TRANS_TABLES` SQL mode from our MariaDB database

https://gerrit.wikimedia.org/r/951840

Ah, so the SQL mode setting simply does not stick.

Yes, variables like SQL_MODE are known as "session variables" and are only set for that connection, and lost when you close the connection.

Global variables like @@GLOBAL.SQL_MODE do stick, but you need extra permissions to modify them, and we don't want to change it because it would affect all users and all databases.

Harvesting ran today without issue − until the very last step:

2023-08-23_18:22:51 Update monuments_all table...
ERROR 1292 (22007) at line 514: Truncated incorrect DECIMAL value: ''

Line 514 of that one is

/* Canada in English # Federal */
REPLACE INTO
  `monuments_all_tmp` (
    `country`, `lang`, `id`, `adm0`, `adm1`, `adm2`, `name`, `address`, `municipality`, `lat`, `lon`, `lat_int`, `lon_int`, `image`, `wd_item`, `commonscat`, `source`, `changed`, `monument_article`, `registrant_url`
) SELECT
    'ca-fed' AS `country`,
    'en' AS `lang`,
    `idf` AS `id`,
    'ca' AS `adm0`,
    LOWER(`prov_iso`) AS `adm1`,
    `municipality` AS `adm2`,
    `name` AS `name`,
    `address` AS `address`,
    `municipality` AS `municipality`,
    `lat` AS `lat`,
    `lon` AS `lon`,
    ROUND(`lat` * @granularity) AS `lat_int`,
    ROUND(`lon` * @granularity) AS `lon_int`,
    `image` AS `image`,
    `wd_item` AS `wd_item`,
    `commonscat` AS `commonscat`,
    `source` AS `source`,
    `changed` AS `changed`,
    `monument_article` AS `monument_article`,
    `registrant_url` AS `registrant_url`
    FROM `monuments_ca-fed_(en)`
    WHERE NOT (`idf` IS NULL OR `idf`='');

Change 951981 had a related patch set uploaded (by Jean-Frédéric; author: Jean-Frédéric):

[labs/tools/heritage@master] Do not use `STRICT_TRANS_TABLES` SQL mode when filling monuments_all table

https://gerrit.wikimedia.org/r/951981

Grepping through the logs for errors, only 2:

ERROR: Unknown error occurred when processing country de-he in lang de
(1048, "Column 'wd_item' cannot be null")

ERROR: Unknown error occurred when processing country in-com in lang commons
Language 'commons' does not exist in family wikipedia
ERROR: Unknown error occurred when processing country de-he in lang de
(1048, "Column 'wd_item' cannot be null")

This one is a bit weird since as per templates/monuments_all_intro.sql:

`wd_item` varchar(255) DEFAULT NULL,

only the wikidata_table requires wd_item to not be NULL. And this should only be used for sparql confis which de-he_de isn't.

ERROR: Unknown error occurred when processing country in-com in lang commons
Language 'commons' does not exist in family wikipedia

Weird that it flags this but not my or sr which also run on Commons. Unless it is remembering some setting from in_en as it is processing in-com_en

ERROR: Unknown error occurred when processing country in-com in lang commons
Language 'commons' does not exist in family wikipedia

Weird that it flags this but not my or sr which also run on Commons. Unless it is remembering some setting from in_en as it is processing in-com_en

It's because it's the only one with a missingCommonscatPage. Fixing it.

Change 952313 had a related patch set uploaded (by Lokal Profil; author: Lokal Profil):

[labs/tools/heritage@master] Do not assume wikipedia for project

https://gerrit.wikimedia.org/r/952313

Change 952313 merged by jenkins-bot:

[labs/tools/heritage@master] Do not assume wikipedia for project

https://gerrit.wikimedia.org/r/952313

@Lokal_Profil Thanks! I have the two STRICT_TRANS_TABLES open patches manually applied on the server, so I’ll only be able to deploy your changes once they are merged.

@Lokal_Profil Thanks! I have the two STRICT_TRANS_TABLES open patches manually applied on the server, so I’ll only be able to deploy your changes once they are merged.

I'll +2 them. Wanted to check if they were what you had been running on when you made it work

Change 951981 merged by jenkins-bot:

[labs/tools/heritage@master] Do not use `STRICT_TRANS_TABLES` SQL mode when filling monuments_all table

https://gerrit.wikimedia.org/r/951981

Change 951840 merged by jenkins-bot:

[labs/tools/heritage@master] Do not use `STRICT_TRANS_TABLES` SQL mode when connecting to database

https://gerrit.wikimedia.org/r/951840

Should the "SET sql_mode =" patch also be applied to the other sql templates (classic_table.sql.template and wikidata_table.sql.template)?

Should the "SET sql_mode =" patch also be applied to the other sql templates (classic_table.sql.template and wikidata_table.sql.template)?

These are not failing, and are only CREATE statements, so I think they’re fine to keep as is.

JeanFred claimed this task.

Harvesting has been stable for a few days now − closing as Resolved 🎉

Great work @JeanFred ! And many thanks @fnegri for the assist!