"PHP Warning: data error" from gzdecode() in ApiGraph.php and ApiQueryMapData.php
Closed, ResolvedPublic

Description

Error Message

Warning: data error in /srv/mediawiki/php-1.31.0-wmf.12/extensions/Graph/includes/ApiGraph.php on line 125

Stack Trace

n/a

Notes

116           $ppValue = $this->getDB()->selectField( 'page_props', 'pp_value', [
117                'pp_page' => $title->getArticleID(),
118                'pp_propname' => 'graph_specs',
119            ], __METHOD__ );
120
121            if ( $ppValue ) {
122                // Copied from TemplateDataBlob.php:newFromDatabase()
123                // Handle GZIP compression. \037\213 is the header for GZIP files.
124                if ( substr( $ppValue, 0, 2 ) === "\037\213" ) {
125                     $ppValue = gzdecode( $ppValue );
                }
mmodell created this task.Jan 3 2018, 10:44 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 3 2018, 10:44 PM
greg triaged this task as High priority.

-> Contributors-Team per [[mw:Dev/Maintainers]]

-> High, this is spamming a lot and looks worrisome at first glance/hard to know what the cause is (if the train or not).

Krinkle renamed this task from data error in ApiGraph.php on line 125 to "PHP Warning: data error" from gzdecode() in ApiGraph.php on line 125.EditedSep 17 2018, 4:17 PM
Krinkle changed Risk Rating from N/A to default.
Krinkle added a subscriber: Krinkle.

Still seen. About 5,000 hits in the last 7 days. Mostly on ruwiki.

Sample
wiki: ruwiki
url: /w/api.php?format=json&formatversion=2&action=graph&title=***%2Fpageview_peaks&hash=***
http_method: GET

PHP Warning: data error


#0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array, array)
#1 /srv/mediawiki/php-1.32.0-wmf.20/extensions/Graph/includes/ApiGraph.php(125): gzdecode(string)
#2 /srv/mediawiki/php-1.32.0-wmf.20/extensions/Graph/includes/ApiGraph.php(35): Graph\ApiGraph->getFromStorage(string, string)
#3 /srv/mediawiki/php-1.32.0-wmf.20/includes/api/ApiMain.php(1587): Graph\ApiGraph->execute()
#4 /srv/mediawiki/php-1.32.0-wmf.20/includes/api/ApiMain.php(531): ApiMain->executeAction()
#5 /srv/mediawiki/php-1.32.0-wmf.20/includes/api/ApiMain.php(502): ApiMain->executeActionWithErrorHandling()
#6 /srv/mediawiki/php-1.32.0-wmf.20/api.php(87): ApiMain->execute()
Krinkle renamed this task from "PHP Warning: data error" from gzdecode() in ApiGraph.php on line 125 to "PHP Warning: data error" from gzdecode() in ApiGraph.php and ApiQueryMapData.php.Sep 25 2018, 3:17 AM

Similar, presumably related:

url: /w/api.php?format=json&formatversion=2&action=query&titles=**.map&prop=mapdata&mpdlimit=max&mpdgroups=_***

PHP Warning: data error

#1 /srv/mediawiki/php-1.32.0-wmf.22/extensions/Kartographer/includes/ApiQueryMapData.php(47): gzdecode(string)
#2 /srv/mediawiki/php-1.32.0-wmf.22/includes/api/ApiQuery.php(249): Kartographer\ApiQueryMapData->execute()
#3 /srv/mediawiki/php-1.32.0-wmf.22/includes/api/ApiMain.php(1587): ApiQuery->execute()
#4 /srv/mediawiki/php-1.32.0-wmf.22/includes/api/ApiMain.php(531): ApiMain->executeAction()
#5 /srv/mediawiki/php-1.32.0-wmf.22/includes/api/ApiMain.php(502): ApiMain->executeActionWithErrorHandling()
#6 /srv/mediawiki/php-1.32.0-wmf.22/api.php(87): ApiMain->execute()

It just happened again during EU SWAT.

Still seen from /srv/mediawiki/php-1.32.0-wmf.24/extensions/Kartographer/includes/ApiQueryMapData.php:47.

#1 /srv/mediawiki/php-1.32.0-wmf.24/extensions/Kartographer/includes/ApiQueryMapData.php(47): gzdecode(string)
#2 /srv/mediawiki/php-1.32.0-wmf.24/includes/api/ApiQuery.php(249): Kartographer\ApiQueryMapData->execute()
#3 /srv/mediawiki/php-1.32.0-wmf.24/includes/api/ApiMain.php(1587): ApiQuery->execute()
#4 /srv/mediawiki/php-1.32.0-wmf.24/includes/api/ApiMain.php(531): ApiMain->executeAction()
#5 /srv/mediawiki/php-1.32.0-wmf.24/includes/api/ApiMain.php(502): ApiMain->executeActionWithErrorHandling()
#6 /srv/mediawiki/php-1.32.0-wmf.24/api.php(87): ApiMain->execute()
#7 /srv/mediawiki/w/api.php(3): include(string)
TheDJ added a subscriber: TheDJ.EditedOct 5 2018, 10:44 AM

I guess we don't check for max size of property field, the encoded data gets automatically truncated during saving of the varbinary (do we have strict sql?)
I guess that would cause decoding errors. The pageview_peaks incident that @Krinkle listed earlier is a pretty large page, it contains about 1700 graphs......

Bit difficult to figure out though. gzdecode documentation is pretty sucky on warnings and errors.

MSantos moved this task from Unsorted to General on the Maps (Kartographer) board.Oct 9 2018, 3:11 PM
MSantos added a subscriber: MSantos.EditedOct 10 2018, 9:39 PM

I experimented something but it didn't help me to see any solution so far, I am now thinking it can have more than one case and I just got one randomly.

  1. I got a test case from the logs.
  2. Removed the part where it gets external data from the old Map Data structure to the new one
  3. The error wasn't appearing on the logs anymore (Search for Kartographer from 2018-10-10 20:00 to 2018-10-10 21:00)

I kept the old wiki page here for comparison on the logs.

The fun part is that I changed only one of the two maps in the page and the error stopped occurring, so the problem is not the map data source and the new external source didn't decrease its size but maybe it has something to do with JsonConfig data sanitization and compression.

My kibana/logstash skills are limited and I don't know how to find the incidents that @Krinkle listed but I am going to try other ways to debug that. If you have any suggestions, I appreciate.

MSantos claimed this task.Oct 10 2018, 9:39 PM

Here's a few data points.

From https://logstash.wikimedia.org/app/kibana#/dashboard/mediawiki-errors, querying message:"ApiQueryMapData.php", last 7 days:

  • By channel: mediawiki/error: 1476 count.
  • By host: (fairly evenly distributed, nothing stands out).
  • By wiki: Only 3 wikis: enwiki (1421), enwikivoyage (52), commonswiki (3).

Then, looking at individual events, here's a few example urls that triggered it recently:

[W75jvQpAAE4AAC0yq3MAAACI] en.wikipedia.org
/w/api.php?format=json&formatversion=2&action=query&titles=User%3A*****&prop=mapdata&mpdlimit=max&mpdgroups=_*******` 

[W75aOApAIDkAADCXYtAAAABY] en.wikipedia.org
/w/api.php?format=json&formatversion=2&action=query&titles=Interstate%2076%20%28Ohio%E2%80%93New%20Jersey%29&prop=mapdata&mpdlimit=max&mpdgroups=_*******

[W71BywrAIFoAAJQ4HxoAAABC] commons.wikimedai.org
/w/api.php?format=json&formatversion=2&action=query&titles=Data%3AInterstate%2095.map&prop=mapdata&mpdlimit=max&mpdgroups=_*******

For more information about any of these, query reqId: followed by the request ID from above.

I guess we don't check for max size of property field, the encoded data gets automatically truncated during saving of the varbinary (do we have strict sql?)
I guess that would cause decoding errors. The pageview_peaks incident that @Krinkle listed earlier is a pretty large page, it contains about 1700 graphs......

Bit difficult to figure out though. gzdecode documentation is pretty sucky on warnings and errors.

We don't have strict sql yet [1] [2]. Confirming what @TheDJ guessed, these pages populate pp_value on page_props table with content that are greater than the max value for blob. From an example [3] we can see that the value truncates on the max possible value.

root@localhost[(none)]> select OCTET_LENGTH(pp_value) from commonswiki.page_props where pp_page = 63398 and pp_propname = 'kartographer';
+------------------------+
| OCTET_LENGTH(pp_value) |
+------------------------+
|                  65535 |
+------------------------+
1 row in set (0.00 sec)

To be honest I don't know what is the best way to handle it. Here are some questions:

  1. Is it desirable to give a feedback to the user and deny data submission that is too long to be stored?
  2. Should the column be changed to mediumblob or longblob? (I don't put faith in that solution)
  3. Or the extensions should be handling large data better?

In Kartographer case it breaks with the snapshot map but works fine with the interactive map.

[1] https://phabricator.wikimedia.org/T108255
[2] https://www.mediawiki.org/wiki/Development_policy#Database_policy
[3] https://commons.wikimedia.beta.wmflabs.org/wiki/Data:Mitchell_Map_1850_Persian_Empire_Map.map

To be honest I don't know what is the best way to handle it. Here are some questions:

  1. Is it desirable to give a feedback to the user and deny data submission that is too long to be stored?

Yes.

  1. Should the column be changed to mediumblob or longblob? (I don't put faith in that solution)

No, that kind of schema change would be very disruptive and that isn't what this column is for.

  1. Or the extensions should be handling large data better?

Yes, but mostly by doing (1).

Yurik added a subscriber: Yurik.Oct 24 2018, 9:09 PM

(1) is not feasible -- if I build a page with the geojson or a complex graph, and some article adds multiple templates (e.g. an article with 10 graphs), the article blob may not fit all the data, causing this issue. In many cases, there is no way to even give feedback to the user, e.g. if the template gets updated after the article is created.

Yes, another reason why using these via templates was always a bad idea, as I said at the time. :-)

Yurik added a comment.Oct 24 2018, 9:19 PM

Ahem... you did not offer any better alternatives :-) The proper solution is to simply change blob to large storage. That said, I don't think page props should be used for it. Instead, it should be a general "large blob storage" key-value store, where different extensions could store hash-based data. This way multiple page revisions could reuse the same blob, and at the same time updates to templates would create new blobs. This would also require usage counting (tricky). There is another big task about this.

TheDJ added a comment.Oct 25 2018, 6:58 AM

@MSantos I guess the most elementary step to be taken here, is that we should handle this more gracefully at write, rather than failing in gzdecode at read, with an error that is hard to trace back to the path causing the error.

@MSantos I guess the most elementary step to be taken here, is that we should handle this more gracefully at write, rather than failing in gzdecode at read, with an error that is hard to trace back to the path causing the error.

^ this

The only graceful thing we can do is something like this:

while (True) {
  gzData = gzip(rawData);
  if (length(gzData) < XXX) break;
  unset(rawData[length(rawData)-1]);  // delete last element
}

Could be optimized further.

Having data that are too large to our database isn't some sort of abuse? If so, shouldn't the user be warned about that? I think that changing the input randomly before storing it can fix the problem of not breaking, but could create another on the UX side.

Yurik added a comment.EditedOct 26 2018, 7:33 PM

More than 64KB is too large? That sounds ... strange when discussing modern systems that are perfectly capable of storing much bigger data rows. Especially when we clearly have a valid use case for it.

We currently do not have any good way to report when page's pageprops exceed this limit because the limit could be exceeded after a template page is saved and the primary page is updated in a follow-up batch job. That's why I proposed as a temporary workaround to remove some json entries, while ensuring valid JSON structure. We may also want to add another page prop or a category to make it easy to find these pages.

More than 64KB is too large?

That's the current limit, we should play with that until it changes.

We currently do not have any good way to report when page's pageprops exceed this limit because the limit could be exceeded after a template page is saved and the primary page is updated in a follow-up batch job.

I understand now why we need a temporary workaround, thank you for the explanation.

That's why I proposed as a temporary workaround to remove some json entries, while ensuring valid JSON structure

One more question: changing the data wouldn't be worst than not storing it at all? To be honest, I can't see value in the data manipulation.

Yurik added a subscriber: MaxSem.Oct 26 2018, 10:02 PM

Adding a new table shouldn't be too hard, just requires someone to work on it :) @MaxSem actually had some prototype of this a few years back :)
Changing data would help because that data is essentially a key-value dictionary of objects. Each object is independent, and represents a separate graph. If there is a giant single graph on the page, this won't help. But if there are multiple large graphs, deleting some of them would make it possible for at least some of them to show up, rather than the whole page to fail.

Krinkle added a comment.EditedOct 26 2018, 10:16 PM

(1) is not feasible --

I would agree that limiting the size isn't always pretty, but it is not infeasible. It's actually quite simple.

Even in the case of a template update, the blob doesn't write itself. The extension explicitly assembles the value and decides to store it. You have control there. Maybew we should throw an exception, or abort the parse, or omit the graph with an error message (for preview) after we accumulate a certain amount of data (like we do with transclusion limits in core), e.g. print only the earlier graphs but not after a certain number of graphs of certain size are reached etc.

TemplateData does this already (example). We can keep looking for long-term solutions, but for the time being, storing more than is allowed is a breach of contract. We need to either explicitly disallow it (disable/abort), or refuse it (gracefully show no graphs), or limit it (no additional graphs after a certain amount has been reached).

Right now the lack of explicit handling for this is implicitly resulting in the second option (refusal, show no graphs), because we write more than fits, the blob is trimmed and becomes invalid, and on retrieval a non-fatal error is encountered.

It's been 10 months since this error was reported. If none of the recommended options appeal to you, I suggest we stick with what we've been doing for the last few years (no graphs shown if size is reached), but do so in a way that doesn't produce errors in the logs that indicate application failure.

@Krinkle the (1) was to give instant feedback to the user -- that's the part that I don't think is feasible in all cases -- because template could grow in size, and blob formation happens post-save. In some cases it might be possible when we are doing page preview to show a warning in case templates already exceed blob size. I do agree the extension should handle it gracefully, and remove some of the data to keep the JSON structure.

Change 470222 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/Graph@master] Suppress gzdecode() warning

https://gerrit.wikimedia.org/r/470222

Change 470222 merged by jenkins-bot:
[mediawiki/extensions/Graph@master] Suppress gzdecode() warning

https://gerrit.wikimedia.org/r/470222

Krinkle closed this task as Resolved.Oct 28 2018, 10:42 PM
Krinkle removed a project: Patch-For-Review.
Krinkle claimed this task.

note that this won't fix the issue, just hide it from the logs :)

Indeed. The logs are a shared environment (by default anyway). The benefit of centralised monitoring in prod is that problematic deployments can automatically be detected as such, and that unattended spikes result in alarms for ops.

I'd like to formalise that these benefits are given in return for not letting non-issues pollute the logs for many weeks (or months, in this case). This is problematic because it hurts productivity for everyone else. E.g. falsely aborted deployments, false alarms in weekends for ops, stress and uncertainty during maintenance due to errors popping up as potential problems when they are actually known non-issues appearing at low frequency (such as after deployments every week, or during server maintenance or switchovers).

If the cost of these shared benefits can't be paid, we may need to invest in an ability to wholesale disable monitoring for warnings/errors from certain extensions based on paths in the stacktrace or some such. (We can't do it for fatals, but it'd be a start.)

Change 470292 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/Graph@wmf/1.33.0-wmf.1] Suppress gzdecode() warning

https://gerrit.wikimedia.org/r/470292

Change 470292 merged by jenkins-bot:
[mediawiki/extensions/Graph@wmf/1.33.0-wmf.1] Suppress gzdecode() warning

https://gerrit.wikimedia.org/r/470292

Mentioned in SAL (#wikimedia-operations) [2018-10-28T23:36:28Z] <krinkle@deploy1001> Synchronized php-1.33.0-wmf.1/extensions/Graph: T184128 - I02da92de33 (duration: 00m 58s)

Change 471493 had a related patch set uploaded (by MSantos; owner: MSantos):
[mediawiki/extensions/Kartographer@master] Handle gracefully oversized map data

https://gerrit.wikimedia.org/r/471493

Change 479455 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/extensions/Kartographer@master] Suppress gzdecode() warning

https://gerrit.wikimedia.org/r/479455

Change 479455 merged by jenkins-bot:
[mediawiki/extensions/Kartographer@master] Suppress gzdecode() warning

https://gerrit.wikimedia.org/r/479455

Change 479475 had a related patch set uploaded (by Jforrester; owner: Mholloway):
[mediawiki/extensions/Kartographer@wmf/1.33.0-wmf.8] Suppress gzdecode() warning

https://gerrit.wikimedia.org/r/479475