Page MenuHomePhabricator

Lat/lon should be NULL when empty
Closed, ResolvedPublic

Description

At the moment it's 0.


Version: unspecified
Severity: enhancement

Related Objects

Mentioned In
T135990: Fix wrongly formatted coordinates in fr_(fr)
T135989: Fix wrongly formatted coordinates in nl-gem_(nl)
T135987: Fix wrongly formatted coordinates in de-by_(de)
T135988: Fix wrongly formatted coordinates in sk_(de)
T55688: Statistics module uses country field instead of lang field to link to Wikipedia
T135502: Undefined index: project in /data/project/heritage/heritage/api/includes/FormatHtml.php
T134727: Strip comments from fields during harvest
T39974: <s>Add</s> Remove monument_random field + index to the monuments_all table
Mentioned Here
T55688: Statistics module uses country field instead of lang field to link to Wikipedia
T135502: Undefined index: project in /data/project/heritage/heritage/api/includes/FormatHtml.php
rTHER39780e212438: Correcting field matchings for two fr.wiki templates
rTHER977c07f10f57: Change i18n message to allow non-Wikipedia projects
rTHER5f4532ca3a69: Standardise php whitespace to tab
rTHERb7b297baf633: Add lang and project to statistic reports
rTHER476267f1d9f7: Set empty lat lon to NULL in monuments_all (and wlpa_all)
T134727: Strip comments from fields during harvest
T39974: <s>Add</s> Remove monument_random field + index to the monuments_all table
rTHER2d3ee40f672b: [NOT TESTED] Remove monument_random from database and api.
rTHER766d814b90ef: Add lang argument where pywikibot.Site().language() was used
rTHERe2fac07f9ba8: Add test for checkLat/checkLon and test checkers/converters are triggered
rTHERd2c242ac804b: Re-implement checks to not use globals

Event Timeline

bzimport raised the priority of this task from to Low.Nov 22 2014, 12:22 AM
bzimport set Reference to bz37422.
bzimport added a subscriber: Unknown Object (MLST).

Heh, you were the first one adding tables with those fields as NOT NULL DEFAULT '0'.

I agree. Most other fields are NOT NULL with not-set stored as an empty string, but lat and long are doubles.

Still need to fix some leftovers:

mysql> SELECT COUNT(*) FROM monuments_all WHERE lat='' OR lon='' OR lat=0 OR lon=0;
+----------+

COUNT(*)

+----------+

186

+----------+
1 row in set (12.71 sec)

@Multichill: Is this still valid? Is it simply the scraper which puts empty values as '0'?

Ran the query again

MariaDB [s51138__heritage_p]> SELECT COUNT(*) FROM monuments_all WHERE lat='' OR lon='' OR lat=0 OR lon=0;
+----------+

COUNT(*)

+----------+

237

+----------+
1 row in set (3.20 sec)

So yes, it's still seems to be valid. Judging from the low number this is probably a fault in the source. Split out a bit more:

MariaDB [s51138__heritage_p]> SELECT adm0, COUNT(adm0) FROM monuments_all WHERE lat='' OR lon='' OR lat=0 OR lon=0 GROUP BY adm0;
+------+-------------+

adm0COUNT(adm0)

+------+-------------+

am50
aq3
at4
be1
de56
es8
fr52
mx1
nl41
pa2
pk2
pl3
pt1
ru10
sk1
us2

+------+-------------+
16 rows in set (1.41 sec)

Change 281930 had a related patch set uploaded (by Lokal Profil):
[NOT TESTED] Re-implement checks to not use globals

https://gerrit.wikimedia.org/r/281930

Change 281930 merged by jenkins-bot:
Re-implement checks to not use globals

https://gerrit.wikimedia.org/r/281930

One source for this was identified in T134727. Together with the increased use of checkLat/checkLon (in https://gerrit.wikimedia.org/r/281930 once tables get recreated) this should make it easier to identify the problematic values.

The problem still remains though that even after checkLat/checkLon (or any future checks) fail we still have fieldValue = u'' and so it gets added to the SQL (which interprets this as 0).

The easiest solution would probably be to add a loop in fill_table_monuments_all to check if lat=0 and lon=0 (or lat=0 or lon=0 if we believe this is unlikely to happen naturally) and if so replace these by NULL.

UPDATE `monuments_all_tmp` SET `lat`=NULL,`lon`=NULL, `lat_int`=NULL, `lon_int`=NULL WHERE `lat`=0 and `lon`=0;

A nicer solution would be to catch these even before they get added to the country tables.

Ok. This is now better but many are still failing.

I'll add the cleanup bit to monuments_all. I've also taken a look at the source pages which have triggered this. In many the cause is simply that coordinates have been added in dms format or with a template.

MariaDB [s51138__heritage_p]> SELECT COUNT(source), source FROM monuments_all WHERE (lat='' AND lon='') OR (lat=0 AND lon=0) GROUP BY source;

@JeanFred The biggest miscreant here are the French lists. Would you mind either fixing them or finding someone on fr.wiki who would be willing to do it. The two issues I've identified is that people use 46/33/49.37/N and {{Coordonnées/dms2dec|46/33/49.37/N}} instead of 46.563714.

Per wiki failing values:

SELECT COUNT(country) as count, CONCAT(country, '-', project) as wiki FROM monuments_all
WHERE (lat='' AND lon='') OR (lat=0 AND lon=0) GROUP BY wiki ORDER BY count DESC;
+-------+------------------+
| count | wiki             |
+-------+------------------+
|   219 | fr-wikipedia     |
|    56 | de-by-wikipedia  |
|    36 | nl-gem-wikipedia |
|    15 | sk-wikipedia     |
|     6 | ru-wikivoyage    |
|     2 | ua-wikipedia     |
|     2 | es-wikipedia     |
|     1 | ch-wikipedia     |
|     1 | uy-wikipedia     |
|     1 | mx-wikipedia     |
+-------+------------------+

Change 289359 had a related patch set uploaded (by Lokal Profil):
Set empty lat lon to NULL in monuments_all (and wlpa_all)

https://gerrit.wikimedia.org/r/289359

Change 289359 merged by jenkins-bot:
Set empty lat lon to NULL in monuments_all (and wlpa_all)

https://gerrit.wikimedia.org/r/289359

SELECT COUNT(source), source FROM monuments_all WHERE (lat='' AND lon='') OR (lat=0 AND lon=0) GROUP BY source; now returns an empty set so this task can be closed.

Failing values are reported on discussion pages (example) and still appear in the source tables if one wants to tackle them systematically e.g.

MariaDB [s51138__heritage_p]> SELECT COUNT(source), source FROM `monuments_fr_(fr)` WHERE (lat='' AND lon='') OR (lat=0 AND lon=0) GROUP BY source;
+---------------+----------------------------------------------------------------------------------------------------------------------------+
| COUNT(source) | source                                                                                                                     |
+---------------+----------------------------------------------------------------------------------------------------------------------------+
|             1 | //fr.wikipedia.org/w/index.php?title=Liste_des_monuments_historiques_de_l%27Ard%C3%A8che&oldid=126335492                   |
|            10 | //fr.wikipedia.org/w/index.php?title=Liste_des_monuments_historiques_de_l%27Aube&oldid=125071516                           |
|            57 | //fr.wikipedia.org/w/index.php?title=Liste_des_monuments_historiques_de_La_R%C3%A9union&oldid=125660928                    |
|            50 | //fr.wikipedia.org/w/index.php?title=Liste_des_monuments_historiques_de_Moulins&oldid=122244819                            |
|            51 | //fr.wikipedia.org/w/index.php?title=Liste_des_monuments_historiques_de_Saint-Denis_%28La_R%C3%A9union%29&oldid=125660953  |
|            20 | //fr.wikipedia.org/w/index.php?title=Liste_des_monuments_historiques_de_Saint-Paul_%28La_R%C3%A9union%29&oldid=125811460   |
|            30 | //fr.wikipedia.org/w/index.php?title=Liste_des_monuments_historiques_de_Saint-Pierre_%28La_R%C3%A9union%29&oldid=125811503 |
+---------------+----------------------------------------------------------------------------------------------------------------------------+

I'll open tasks for the top 3-4 of these and manually handle the others.