Page MenuHomePhabricator

Statistics for Canada give errors monuments database
Closed, ResolvedPublic

Description

Canadian monuments give errors when trying to approach statistics:
https://tools.wmflabs.org/heritage/api/api.php?action=statistics&stcountry=ca&format=html&limit=0

linked from https://commons.wikimedia.org/wiki/Commons:Monuments_database/Statistics

(for comparison, ch works fine)

Errors:
Notice: Undefined offset: 3 in /data/project/heritage/heritage/api/includes/Statistics.php on line 109
Notice: Undefined offset: 2 in /data/project/heritage/heritage/api/includes/Statistics.php on line 109
etc (many times)
Warning: Cannot modify header information - headers already sent by (output started at /data/project/heritage/heritage/api/includes/Statistics.php:109) in /data/project/heritage/heritage/api/includes/FormatBase.php on line 22

then below that, some table is printed.

Event Timeline

Can’t reproduce locally because the Statistics table is not exported in the dump, as it seems :-(

Change 378975 had a related patch set uploaded (by Jean-Frédéric; owner: Jean-Frédéric):
[labs/tools/heritage@master] Remove warnings on statistics page when there are undefined fields

https://gerrit.wikimedia.org/r/378975

I have reproduced locally − key is to create the table and run _buildStats.php. Found the issues, submitted a patch.

I've investigated the root cause a bit further. Running the following gives me the occurrences causing the problems

select * from `statistics` where (length(`idx`) - length(replace(`idx`, ':', ''))) < 3 LIMIT 5;

a few examples are:


+------------+------------+------------------------------------------------------------------------------------------------------+-------+
| day | item | idx | value |
+------------+------------+------------------------------------------------------------------------------------------------------+-------+
| 2017-11-13 | st_address | ca:[[Joe Batt's Arm-Barr'd Islands-Shoal Bay, Newfoundland and Labrador|Joe Batt's Arm-Barr'd Island | 0 |
| 2017-11-13 | st_address | ca:[[Lieu historique national du Canada de la Ferme-Experimentale-Centrale / Central Experimental Fa | 0 |
| 2017-11-13 | st_address | ca:[[Lushes Bight-Beaumont-Beaumont North, Newfoundland and Labrador|Lushes Bight-Beaumont-Beaumont | 0 |
+------------+------------+------------------------------------------------------------------------------------------------------+-------+

Essentially all of them have an idx which is larger than the max value of 100 characters.

So our choice is either to bump the idx size from varchar(100) to something large enough (probably a start) or to restrict the allowed size of the municipality field already in makeIdx (the non-municipality bits normally take 16 characters with 25 being the likely max for be-tarask).

Looking at some of the examples (linebreaks in municipalities etc.) just bumping the size will probably still result in breakages.

Change 391139 had a related patch set uploaded (by Lokal Profil; owner: Lokal Profil):
[labs/tools/heritage@master] [WIP] Fix bad idx entries

https://gerrit.wikimedia.org/r/391139

I added the following to the StatsBuilder.storeValue() function (which should be the only way of writing to the statistics table) but it doesn't seem to catch anything, indicating that the designed way of building the statistics table should work correctly. And an earlier debug also shows that the value of $index is passed correctly to this function with muni cropped correctly per https://gerrit.wikimedia.org/r/391139.)

if ( ( substr_count( $index, ':' ) != 3 ) ) {
	$this->debug( $index );
}

Since the actual entries found in the db table are still incorrect there must ether be some other function modifying the table or something going wrong on the db side.

@JeanFred: Could you use your docker magic to activate general logging (query logging) in the dev environment? I manage to get docker to find the .cnf file and create the output dir but can't seem to get the logging to actually happen.

@JeanFred: Could you use your docker magic to activate general logging (query logging) in the dev environment? I manage to get docker to find the .cnf file and create the output dir but can't seem to get the logging to actually happen.

Clarification. I get the mariadb to read my .cnf file and with show GLOBAL variables like "general_log%";/show variables like "general_log%"; I see that the settings have been loaded. I get docker to map the output location to an accessible volume. But no log files seem to be created. Neither accessible from outside the container nor when using docker-compose run --rm db bash then navigating to the output directory. I've stuck my settings in a patch.

Edit: Spotted the following in docker-compose logs:
[ERROR] mysqld: File '/var/log/mysql/query.log' not found (Errcode: 13 "Permission denied") so this is back to being a docker issue =)

Change 392652 had a related patch set uploaded (by Lokal Profil; owner: Lokal Profil):
[labs/tools/heritage@master] [WIP]Activate general query log in dev mode

https://gerrit.wikimedia.org/r/392652

Change 392652 abandoned by Lokal Profil:
[WIP]Activate general query log in dev mode

Reason:
Never worked and not needed anymore

https://gerrit.wikimedia.org/r/392652

Change 391139 merged by jenkins-bot:
[labs/tools/heritage@master] Fix bad idx entries

https://gerrit.wikimedia.org/r/391139

The coding part should be done.

This still needs:

  • code to be deployed
  • old table to be droped
  • new table to be created and populated

Code deployed. I don't knwo where my brain is misfiring but when looking in the db on toolforge (mysqluse s53806__heritage_p;show tables;) I only see monuments and monuments_cache. Did we at some point move to a different db?@JeanFred

Change 378975 abandoned by Lokal Profil:
Remove warnings on statistics page when there are undefined fields

Reason:
No longer needed with https://gerrit.wikimedia.org/r/#/c/391139/

https://gerrit.wikimedia.org/r/378975

Code deployed. I don't knwo where my brain is misfiring but when looking in the db on toolforge (mysqluse s53806__heritage_p;show tables;) I only see monuments and monuments_cache. Did we at some point move to a different db?@JeanFred

It should have been s51138__heritage_p of course, how could I ever have confused them (or u13367__heritage)

JeanFred triaged this task as Medium priority.Sep 20 2018, 7:05 AM