Graph not displayed if linked to a wikidata query
Open, High, Public

Description

Hello, I'd like to report that graphs stopped working a few days ago.
If a graph uses raw data, it displays correctly (as in this example: Original MediaWiki Template).

If a graph is linked to a Wikidata query (like the other examples on the same page, Original MediaWiki Template), it is not displayed: the graph appears when you hit preview, but disappears once the wiki page is saved.

The problem can be reproduced on wikis in any language.

Event Timeline

Restricted Application added a subscriber: Aklapper.Jun 21 2019, 11:07 AM
TheDJ added a subscriber: TheDJ.Jun 21 2019, 12:18 PM

Please provide a link to a page where you experience the issue.

TheDJ added a comment.EditedJun 21 2019, 12:56 PM

The image returns:

{"type":"https://mediawiki.org/wiki/HyperSwitch/errors/unknown_error","method":"get","uri":"/www.mediawiki.org/v1/page/graph/png/Template%3AGraph%3ALines/0/1d5fea924d176cd28d92d2c76a7084a5e83c9563.png"}
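
When checking many pages, it can help to tell a real PNG apart from a JSON error body like the one above. The following is a minimal sketch, not part of Graphoid or RESTBase: the function name `classify_graph_response` is hypothetical, and it assumes that an error body is a JSON object with a `type` field, as in the response shown here.

```python
import json

# The first 8 bytes of every valid PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def classify_graph_response(body: bytes) -> str:
    """Return 'png', 'error:<type>' or 'unknown' for a graph image response body.

    Hypothetical helper: assumes errors are JSON objects with a 'type' URL,
    as in the HyperSwitch error quoted in this task.
    """
    if body.startswith(PNG_SIGNATURE):
        return "png"
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return "unknown"
    if isinstance(payload, dict) and "type" in payload:
        # Keep only the last path segment of the error type URL.
        return "error:" + str(payload["type"]).rsplit("/", 1)[-1]
    return "unknown"
```

Applied to the body above, this reports an `unknown_error` rather than an image, which matches the blank graph seen after saving.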

Do you see why, when you modify something minor and hit "preview", the graph is built and visible, but when you save, you get a blank image?

TheDJ added a comment.Jun 21 2019, 1:10 PM

In preview you always get JS-based graphs, while after saving, you get a prerendered graph. Something is wrong in the prerendering, not sure what.

On wikis in any language, the code doesn't work and the image links to something like "https://mediawiki.org/wiki/HyperSwitch/errors/unknown_error".

Bouzinac removed Yurik as the assignee of this task.Jun 22 2019, 12:03 PM
Bouzinac updated the task description.
Bouzinac added a subscriber: Yurik.
Bouzinac updated the task description.Jun 22 2019, 12:06 PM
Restricted Application added a subscriber: Liuxinyu970226.Jun 22 2019, 12:06 PM
Aklapper renamed this task from Extension Graph won't show if linked to a wikidata query to Graph not displayed if linked to a wikidata query.Jun 22 2019, 1:04 PM
Ayack added a subscriber: Ayack.Jun 23 2019, 4:18 PM

@Smalyshev Can you have a look at this? Is it maybe related to the recent changes to the JSON output format?

The output format change was undone, so I don't think it is related to the output format. I don't have much visibility into what's happening inside the graph parts though, so I'm not sure. Maybe @Yurik can help?

Yurik added a comment.Jul 2 2019, 8:21 PM

@Smalyshev I would love to help, but it is a bit hard to say without access to the logs or the servers.

I am not even sure who to ask... Whoever owns Graphoid service I guess? Maybe Services team?

Please see https://www.mediawiki.org/wiki/Developers/Maintainers

So, the Maintainers page points to https://www.mediawiki.org/wiki/Wikimedia_Services, which is listed as obsolete, has no mention of Graphoid, and refers to https://www.mediawiki.org/wiki/Core_Platform_Team.

Also, judging from T211881 (graphoid: Code stewardship request), there's no real maintainer for this component now.

So, right now I am not sure what to do, unfortunately. I would be glad to investigate it on the Wikidata-Query-Service side, but for that I'd need to know which query is issued and what happens with it, and I have no visibility into or knowledge of that. Sorry.

Yurik added a comment.Jul 2 2019, 9:36 PM

@Smalyshev the query should be identical to the one issued from the browser when you do a "page preview" with a graph that uses a WDQS query. Graphoid does exactly the same steps as the browser: essentially making all the requests and putting together the resulting image.

The devil is in the details: how exactly does Graphoid do it? Does it provide a proper user agent? Which headers does it send? What data does it receive and how does it parse it? The problem that leads to the breakage could be at any of these stages, but I am not sure which one or how to check.

Yurik added a comment.Jul 2 2019, 9:49 PM

Thanks, that makes sense. I was only suggesting searching the logs for the same query as the one run from the in-browser page preview.

Hello, it's all the more bizarre as it works here: https://fr.wikipedia.org/wiki/Modèle:Aéroport-Statistiques/Sandbox (with a play button; it is linked to a Wikidata query and calls the Graph extension directly rather than the Graph templates). This case is all the more puzzling because https://fr.wikipedia.org/wiki/Modèle:Aéroport-Statistiques/Sandbox2, a very simple Vega graph with raw fixed data and a play button, does not work.

Pamputt added a subscriber: Pamputt.Jul 3 2019, 6:08 PM
Pchelolo moved this task from Backlog to watching on the Services board.Jul 9 2019, 12:02 PM
Pchelolo edited projects, added Services (watching); removed Services.

I couldn't debug for long, but it looks to me like either the spec is wrong or the wrong version of Vega is being loaded to handle it, because the parser is doing a forEach on what should be a JSON object.

EvanProdromou added a subscriber: EvanProdromou.

For now, I'll take a look to see who on our team can take care of this.

Fetching the graph from the referenced page makes a request to https://www.mediawiki.org/api/rest_v1/page/graph/png/Template%3AGraph%3ALines/0/1d5fea924d176cd28d92d2c76a7084a5e83c9563.png

It creates the following log in graphoid:

[2019-07-24T18:40:21.050Z] ERROR: graphoid/62 on scb1001: Message not supplied (domain=www.mediawiki.org, format=png, title=Template:Graph:Lines, revid=0, id=1d5fea924d176cd28d92d2c76a7084a5e83c9563.png, levelPath=error/vega, request_id=7d9d19b0-ae42-11e9-8ac6-97f379e2517d)
    apicall: {
      "format": "json",
      "formatversion": "2",
      "action": "graph",
      "title": "Template:Graph:Lines",
      "hash": "1d5fea924d176cd28d92d2c76a7084a5e83c9563"
    }
    --
    vegaErr: Error: Load failed with response code 403.
        at maybeWrapAsError (/srv/deployment/graphoid/deploy-cache/revs/7979a400e0042f1bf568d6638f62058a60cde288/node_modules/bluebird/js/release/util.js:61:12)
        at /srv/deployment/graphoid/deploy-cache/revs/7979a400e0042f1bf568d6638f62058a60cde288/node_modules/bluebird/js/release/nodeback.js:38:50
        at done (/srv/deployment/graphoid/deploy-cache/revs/7979a400e0042f1bf568d6638f62058a60cde288/node_modules/vega/src/parse/spec.js:102:26)
        at onError (/srv/deployment/graphoid/deploy-cache/revs/7979a400e0042f1bf568d6638f62058a60cde288/node_modules/vega/src/parse/data.js:13:5)
        at /srv/deployment/graphoid/deploy-cache/revs/7979a400e0042f1bf568d6638f62058a60cde288/node_modules/vega/src/parse/data.js:19:9
        at VegaWrapper.dataParser (/srv/deployment/graphoid/deploy-cache/revs/7979a400e0042f1bf568d6638f62058a60cde288/node_modules/mw-graph-shared/src/VegaWrapper.js:384:5)
        at cb (/srv/deployment/graphoid/deploy-cache/revs/7979a400e0042f1bf568d6638f62058a60cde288/node_modules/mw-graph-shared/src/VegaWrapper.js:56:25)
        at Request._callback (/srv/deployment/graphoid/deploy-cache/revs/7979a400e0042f1bf568d6638f62058a60cde288/node_modules/datalib/src/import/load.js:167:7)
        at Request.self.callback (/srv/deployment/graphoid/deploy-cache/revs/7979a400e0042f1bf568d6638f62058a60cde288/node_modules/request/request.js:186:22)
        at emitTwo (events.js:106:13)
        at Request.emit (events.js:191:7)
        at Request.<anonymous> (/srv/deployment/graphoid/deploy-cache/revs/7979a400e0042f1bf568d6638f62058a60cde288/node_modules/request/request.js:1163:10)
        at emitOne (events.js:96:13)
        at Request.emit (events.js:188:7)
        at IncomingMessage.<anonymous> (/srv/deployment/graphoid/deploy-cache/revs/7979a400e0042f1bf568d6638f62058a60cde288/node_modules/request/request.js:1085:12)
        at IncomingMessage.g (events.js:292:16)
    --
    request: {
      "url": "/www.mediawiki.org/v1/png/Template%3AGraph%3ALines/0/1d5fea924d176cd28d92d2c76a7084a5e83c9563.png",
      "headers": {
        "user-agent": "curl/7.38.0",
        "x-request-id": "7d9d19b0-ae42-11e9-8ac6-97f379e2517d"
      },
      "method": "GET",
      "params": {
        "0": "/www.mediawiki.org/v1/png/Template:Graph:Lines/0/1d5fea924d176cd28d92d2c76a7084a5e83c9563.png"
      },
      "query": {},
      "remoteAddress": "127.0.0.1",
      "remotePort": 50932
    }

I don't know enough about graphoid to try digging deeper.
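
For anyone who wants to reproduce the failing request for another graph, the public URL above appears to follow the pattern `/api/rest_v1/page/graph/png/{title}/{revision}/{hash}.png`. This is a sketch under that assumption: the pattern is inferred only from the single URL in this task, and `graph_png_url` is a hypothetical helper name.

```python
from urllib.parse import quote


def graph_png_url(domain: str, title: str, revision: int, graph_hash: str) -> str:
    """Build the REST URL of a prerendered graph image.

    Assumption: the path layout is inferred from the one URL quoted in this
    task; it is not taken from the service's documentation.
    """
    # Percent-encode everything, including ':' and '/', so the title stays
    # a single path segment (Template:Graph:Lines -> Template%3AGraph%3ALines).
    encoded_title = quote(title, safe="")
    return (
        f"https://{domain}/api/rest_v1/page/graph/png/"
        f"{encoded_title}/{revision}/{graph_hash}.png"
    )
```

Calling it with `www.mediawiki.org`, `Template:Graph:Lines`, revision `0` and the hash from the log reproduces the URL that triggered the error.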

Yurik added a comment.Jul 24 2019, 7:17 PM

vegaErr: Error: Load failed with response code 403. -- Vega attempts to call the Wikidata API to get the needed data, and I suspect that the API returns 403. I would look at the HTTP request Vega makes (it should be very similar to the query stored in the graph on the wiki page), and try to find it in the WDQS logs. Perhaps WDQS now blocks some HTTP requests that do not appear to originate from a browser (i.e. have fewer headers than expected)?

Can we verify that Vega sets a proper user agent (and by that I mean some string that identifies it) when sending queries to WDQS?
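
To test the user-agent hypothesis outside Graphoid, one could build the same query by hand with an explicit identifying header. This is only a sketch: it is not how Vega or Graphoid issue requests, the `https://query.wikidata.org/sparql` endpoint and the user-agent string are assumptions for illustration, and `build_wdqs_request` is a hypothetical helper. It builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed public SPARQL endpoint; the task itself only names the domain.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
# Hypothetical identifying string, for illustration only.
USER_AGENT = "graph-debug-sketch/0.1 (hypothetical; contact: ops@example.org)"


def build_wdqs_request(sparql: str, user_agent: str = USER_AGENT) -> Request:
    """Build (but do not send) a WDQS request with an explicit User-Agent."""
    if not user_agent or user_agent.strip() in ("", "-"):
        # An empty or "-" user agent is what this task suspects leads to 403s.
        raise ValueError("a descriptive User-Agent is required")
    url = WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})
    return Request(
        url,
        headers={
            "User-Agent": user_agent,
            "Accept": "application/sparql-results+json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; comparing the response status with and without the header would show whether the header alone decides between 200 and 403.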

Smalyshev triaged this task as High priority.Jul 25 2019, 12:28 AM

Hello @Yurik, is there any update on your side? Thanks :)

Yurik added a comment.Mon, Jul 29, 2:57 PM

@Lea_Lacroix_WMDE not from my side - I'm a bit overbooked at the moment with my main job (elastic.co) and family. It will take me some effort to get the system running again on my laptop to see what Graphoid sends to the servers. It might be easier to track it from the server side logs if anyone has that access.

WDoranWMF removed EvanProdromou as the assignee of this task.Mon, Jul 29, 2:58 PM

@Yurik Thanks for your answer. We can start by investigating the possibility of a user-agent header issue. I created a ticket: T229236.

Change 526442 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/services/graphoid@master] Improve user agent

https://gerrit.wikimedia.org/r/526442

I looked at webrequest for the four hours around the time of Petr's post, and I couldn't see any 403s to wikidata.org/w/api.php. If someone knows when the error shows up, it can be found in the webrequest table very easily:

select *
  from wmf.webrequest
 where uri_host = 'www.wikidata.org'
   and uri_path = '/w/api.php'
   and http_status = '403'
   and year=2019 and month=7 and day=24 and hour in (16, 17, 18, 19)
 limit 200;

@Milimetric I believe graphoid will not go via varnish, but directly to an app server, so no surprise you didn't find the request.

Ok, I went through the Kafka stream and this is what Graphoid is calling in terms of the MW API:

curl 'https://mediawiki.org/w/api.php?action=graph&format=json&hash=1d5fea924d176cd28d92d2c76a7084a5e83c9563&title=Template:Graph:Lines&formatversion=2'

which returns some JSON with the following URI:

wikidatasparql:///?query=SELECT%20%3Fdecade%20%28COUNT%28%3Fdecade%29%20AS%20%3Fcount%29%20WHERE%20%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ3305213%20.%0A%20%20%3Fitem%20wdt%3AP571%20%3Finception%20.%0A%20%20BIND%28%20year%28%3Finception%29%20as%20%3Fyear%20%29.%20%0A%20%20BIND%28%20ROUND%28%3Fyear%2F10%29%2A10%20as%20%3Fdecade%20%29%20.%0A%20%20FILTER%28%20%3Fyear%20%3E%201400%29%0A%7D%20GROUP%20BY%20%3Fdecade%20ORDER%20BY%20%3Fdecade
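
The URI above is hard to read in its percent-encoded form. As a small sketch, the SPARQL text can be recovered with the standard library; `extract_sparql` is a hypothetical helper name, and the only assumption is that the query is carried in the `query` parameter, as it is here.

```python
from urllib.parse import parse_qs, urlsplit

# The URI returned by action=graph, copied from this task.
GRAPH_URI = (
    "wikidatasparql:///?query=SELECT%20%3Fdecade%20%28COUNT%28%3Fdecade%29"
    "%20AS%20%3Fcount%29%20WHERE%20%7B"
    "%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ3305213%20."
    "%0A%20%20%3Fitem%20wdt%3AP571%20%3Finception%20."
    "%0A%20%20BIND%28%20year%28%3Finception%29%20as%20%3Fyear%20%29.%20"
    "%0A%20%20BIND%28%20ROUND%28%3Fyear%2F10%29%2A10%20as%20%3Fdecade%20%29%20."
    "%0A%20%20FILTER%28%20%3Fyear%20%3E%201400%29"
    "%0A%7D%20GROUP%20BY%20%3Fdecade%20ORDER%20BY%20%3Fdecade"
)


def extract_sparql(uri: str) -> str:
    """Return the decoded SPARQL query carried by a wikidatasparql:// URI."""
    parts = urlsplit(uri)
    if parts.scheme != "wikidatasparql":
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    values = parse_qs(parts.query).get("query")
    if not values:
        raise ValueError("URI has no 'query' parameter")
    return values[0]


if __name__ == "__main__":
    print(extract_sparql(GRAPH_URI))
```

The decoded text is the query that counts items by decade of inception, which is also why the later log searches filter on `%decade%`.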

Yurik added a comment.Thu, Aug 1, 5:28 PM

@Pchelolo Graphoid first calls action=graph to get the data, but then it should also call WDQS directly using that query. Also, you can see what that request looks like if you go to the wiki page with a graph, click edit source, and do a page preview: your browser should make a very similar request to WDQS, except that, unlike Graphoid, the browser forces a few headers like the user agent (IIRC).

TL;DR: the user agent is not set; it just shows up as "-", so that's what WDQS sees. From what @Smalyshev says above, this is what's causing the 403s, right?

Queries to find this: doh, my bad, of course these are on the query.wikidata.org domain, so:

 select *
   from wmf.webrequest
  where uri_host = 'query.wikidata.org'
    and http_status = '403'
    and uri_query like '?query=%decade%'
    and year=2019 and month=7 and day=24 and hour in (16, 17, 18, 19)
    and webrequest_source='text'
    limit 200
;

Which shows me 14 requests, I'm assuming the ones made above for testing. The user agent didn't get parsed, so I sudo -u analytics and look in wmf_raw.webrequest for just hour 18 which had the most hits:

ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;

 select *
   from wmf_raw.webrequest
  where uri_host = 'query.wikidata.org'
    and http_status = '403'
    and uri_query like '?query=%decade%'
    and year=2019 and month=7 and day=24 and hour = 18
    and webrequest_source='text'
    limit 200
;

"the user agent is not set, it just shows up as - so that's what WDQS sees"

In this case you'd get 403, yes. This needs to be fixed.

Milimetric added a comment.EditedThu, Aug 1, 8:13 PM

Vega seems to allow you to control headers via the dataHeaders property, but it's not really documented; I found it here: https://github.com/vega/vega/blob/af5cc1df42eb5aaf2f478d0bda69313643fe0532/docs/releases/v1.5.4/vega.js#L378

If that's right, then I guess the code to update would be initVega for v1 and v2:

@Yurik does that sound right? I've no time to babysit a patch, but it seems easy to try/test.

Yurik added a comment.Thu, Aug 1, 8:44 PM

Vega1 doesn't need it - it doesn't support external URLs. Vega2 approach sounds correct. Thx for digging into it!