At the moment it's 0.
Version: unspecified
Severity: enhancement
At the moment it's 0.
Version: unspecified
Severity: enhancement
Heh, you were the first one adding tables with those fields as NOT NULL DEFAULT '0'.
I agree. Most other fields are NOT NULL with not-set stored as an empty string, but lat and long are doubles.
https://fisheye.toolserver.org/browse/erfgoed/erfgoedbot/sql/fill_table_monuments_all.sql?r1=736&r2=737 isn't very nice. Should probably fix it on the python import bot. If lat=0 and lon=0 insert NULL
Still need to fix some leftovers:
mysql> SELECT COUNT(*) FROM monuments_all WHERE lat='' OR lon='' OR lat=0 OR lon=0;
+----------+
COUNT(*) |
+----------+
186 |
+----------+
1 row in set (12.71 sec)
@Multichill: Is this still valid? Is it simply the scraper which puts empty values as '0'?
Ran the query again
MariaDB [s51138__heritage_p]> SELECT COUNT(*) FROM monuments_all WHERE lat='' OR lon='' OR lat=0 OR lon=0;
+----------+
COUNT(*) |
+----------+
237 |
+----------+
1 row in set (3.20 sec)
So yes, it's still seems to be valid. Judging from the low number this is probably a fault in the source. Split out a bit more:
MariaDB [s51138__heritage_p]> SELECT adm0, COUNT(adm0) FROM monuments_all WHERE lat='' OR lon='' OR lat=0 OR lon=0 GROUP BY adm0;
+------+-------------+
adm0 | COUNT(adm0) |
+------+-------------+
am | 50 |
aq | 3 |
at | 4 |
be | 1 |
de | 56 |
es | 8 |
fr | 52 |
mx | 1 |
nl | 41 |
pa | 2 |
pk | 2 |
pl | 3 |
pt | 1 |
ru | 10 |
sk | 1 |
us | 2 |
+------+-------------+
16 rows in set (1.41 sec)
Change 281930 had a related patch set uploaded (by Lokal Profil):
[NOT TESTED] Re-implement checks to not use globals
One source for this was identified in T134727. Together with the increased use of checkLat/checkLon (in https://gerrit.wikimedia.org/r/281930 once tables get recreated) this should make it easier to identify the problematic values.
The problem still remains though that even after checkLat/checkLon (or any future checks) fail we still have fieldValue = u'' and so it gets added to the SQL (which interprets this as 0).
The easiest solution would probably be to add a loop in fill_table_monuments_all to check if lat=0 and lon=0 (or lat=0 or lon=0 if we believe this is unlikely to happen naturally) and if so replace these by NULL.
UPDATE `monuments_all_tmp` SET `lat`=NULL,`lon`=NULL, `lat_int`=NULL, `lon_int`=NULL WHERE `lat`=0 and `lon`=0;
A nicer solution would be to catch these even before they get added to the country tables.
Ok. This is now better but many are still failing.
I'll add the cleanup bit to monuments_all. I've also taken a look at the source pages which have triggered this. In many the cause is simply that coordinates have been added in dms format or with a template.
MariaDB [s51138__heritage_p]> SELECT COUNT(source), source FROM monuments_all WHERE (lat='' AND lon='') OR (lat=0 AND lon=0) GROUP BY source;
@JeanFred The biggest miscreant here are the French lists. Would you mind either fixing them or finding someone on fr.wiki who would be willing to do it. The two issues I've identified is that people use 46/33/49.37/N and {{Coordonnées/dms2dec|46/33/49.37/N}} instead of 46.563714.
Per wiki failing values:
SELECT COUNT(country) as count, CONCAT(country, '-', project) as wiki FROM monuments_all WHERE (lat='' AND lon='') OR (lat=0 AND lon=0) GROUP BY wiki ORDER BY count DESC; +-------+------------------+ | count | wiki | +-------+------------------+ | 219 | fr-wikipedia | | 56 | de-by-wikipedia | | 36 | nl-gem-wikipedia | | 15 | sk-wikipedia | | 6 | ru-wikivoyage | | 2 | ua-wikipedia | | 2 | es-wikipedia | | 1 | ch-wikipedia | | 1 | uy-wikipedia | | 1 | mx-wikipedia | +-------+------------------+
Change 289359 had a related patch set uploaded (by Lokal Profil):
Set empty lat lon to NULL in monuments_all (and wlpa_all)
Change 289359 merged by jenkins-bot:
Set empty lat lon to NULL in monuments_all (and wlpa_all)
SELECT COUNT(source), source FROM monuments_all WHERE (lat='' AND lon='') OR (lat=0 AND lon=0) GROUP BY source; now returns an empty set so this task can be closed.
Failing values are reported on discussion pages (example) and still appear in the source tables if one wants to tackle them systematically e.g.
MariaDB [s51138__heritage_p]> SELECT COUNT(source), source FROM `monuments_fr_(fr)` WHERE (lat='' AND lon='') OR (lat=0 AND lon=0) GROUP BY source; +---------------+----------------------------------------------------------------------------------------------------------------------------+ | COUNT(source) | source | +---------------+----------------------------------------------------------------------------------------------------------------------------+ | 1 | //fr.wikipedia.org/w/index.php?title=Liste_des_monuments_historiques_de_l%27Ard%C3%A8che&oldid=126335492 | | 10 | //fr.wikipedia.org/w/index.php?title=Liste_des_monuments_historiques_de_l%27Aube&oldid=125071516 | | 57 | //fr.wikipedia.org/w/index.php?title=Liste_des_monuments_historiques_de_La_R%C3%A9union&oldid=125660928 | | 50 | //fr.wikipedia.org/w/index.php?title=Liste_des_monuments_historiques_de_Moulins&oldid=122244819 | | 51 | //fr.wikipedia.org/w/index.php?title=Liste_des_monuments_historiques_de_Saint-Denis_%28La_R%C3%A9union%29&oldid=125660953 | | 20 | //fr.wikipedia.org/w/index.php?title=Liste_des_monuments_historiques_de_Saint-Paul_%28La_R%C3%A9union%29&oldid=125811460 | | 30 | //fr.wikipedia.org/w/index.php?title=Liste_des_monuments_historiques_de_Saint-Pierre_%28La_R%C3%A9union%29&oldid=125811503 | +---------------+----------------------------------------------------------------------------------------------------------------------------+
I'll open tasks for the top 3-4 of these and manually handle the others.