graphoid: Code stewardship request
Open, Normal, Public

Description

Intro

The Graphoid service [1] was introduced into the infrastructure a few years ago (3 years 8 months, per f56e53bb2d). It is meant as a fallback mechanism for clients that don't support javascript, or that lack the new javascript and HTML5 features that the vega library [2] requires. As a fallback mechanism it generates a PNG out of a graph that would normally be served as a set of HTML elements. It is also very useful for keeping the data a client needs to download from our servers to a minimum, as it allows displaying a graph as a PNG and avoids having to download the entire vega library.

Issues

During the migration of the service to the kubernetes-based deployment pipeline, a number of issues became evident that hindered and effectively paused the migration.

The issues identified were:

An unorthodox architecture of the API of the service.

See https://www.mediawiki.org/api/rest_v1/#!/Page_content/get_page_graph_png_title_revision_graph_id for the specification.

The graphoid service fetches the graph from the mediawiki API, essentially using mediawiki pages as a data store. However, to fetch the required graph it needs to know the graph's identifier, which is essentially a hash of the graph definition.

That raises a question, though. If the caller of the graphoid service API already knows the hash of the graph, that means they either got it by talking to mediawiki, or they calculated it themselves (i.e. they already have the entire graph). If they already have the entire graph, why not POST it to the service and obtain the resulting PNG directly? Things become even more convoluted when the caller is mediawiki itself (to my knowledge, that is actually the case), in which case having mediawiki instruct another service to create requests back to it (to the API endpoint, specifically) causes a cascade of requests hitting the mediawiki API. Up to now this hasn't caused an outage, in my opinion simply because of the low traffic graphoid receives.
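To illustrate the point, here is a rough sketch of what a POST-based interaction could look like. The /render endpoint and host below are hypothetical assumptions for the example, not part of graphoid's actual API:

```js
// Hypothetical sketch: the caller already holds the full graph spec, so it
// could POST it directly instead of passing a hash that forces graphoid to
// call back into the mediawiki API. The /render route and host are made up.
const fetch = require('node-fetch');

async function renderGraph(graphSpec) {
    const res = await fetch('https://graphoid.example.org/render', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(graphSpec),
    });
    if (!res.ok) {
        throw new Error(`render failed: ${res.status}`);
    }
    return res.buffer(); // PNG bytes
}
```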

This tight coupling of the two components (graphoid + mediawiki) makes benchmarking the service unnecessarily risky and difficult. Having to obtain a number of titles+revisions+hashes beforehand is an unnecessary nuisance. Add in the risk that benchmarking graphoid might cause undue load on the mediawiki API, and the operating parameters of the service are effectively unknown, meaning that any kind of support for it by anyone can only be best-effort.

Connections to the public API endpoints

Graphoid does not support our discovery endpoints (e.g. api-rw.discovery.wmnet) but rather talks directly to the public LVS endpoints (en.wikipedia.org). That has a number of consequences (a sketch of the preferred pattern follows the list):

  • It makes it difficult to point graphoid at another mediawiki cluster/DC/availability zone, complicating operations.
  • It potentially populates the public edge caches with content that may or may not be requested.
  • It is prone to cache invalidation issues.
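For illustration, a minimal sketch of the pattern we would want instead: a request to the internal discovery endpoint that preserves the project's Host header. The host names mirror the ones mentioned above, but this is not graphoid's current code:

```js
// Illustrative sketch only: service-to-service call via the internal
// discovery endpoint rather than the public edge.
const https = require('https');

const req = https.request({
    host: 'api-rw.discovery.wmnet',        // internal discovery endpoint
    servername: 'en.wikipedia.org',        // SNI for TLS certificate matching
    path: '/w/api.php?action=query&format=json',
    headers: { Host: 'en.wikipedia.org' }, // project routing at the MW layer
}, (res) => {
    res.on('data', () => {});              // consume the body
    res.on('end', () => console.log('status:', res.statusCode));
});
req.end();
```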

Unconventional protocol schemes

The need to talk to the mediawiki API while at the same time supporting external data has created an unwieldy number of network protocol schemes. For reasons yet unknown to me, HTTPS URLs are (were?) not supported in trusted graphs by vega, nor were protocol-relative URLs (e.g. //url_to_whatever). Protocol-relative URLs have been considered an anti-pattern since 2014 anyway [3], but for some reason support for them was implemented.

An example can be seen at https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/graphoid/+/refs/heads/master/config.dev.yaml#72 where two extra protocols are defined (wikiuploadraw and wikidatasparql). The only documentation for these is at https://www.mediawiki.org/wiki/Extension:Graph#External_data, which does not explain what they are or why they exist. Looking at the code [4] reveals that a whole set of additional schemes were originally conceived but are not documented (and probably not used).
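To give a sense of what these custom schemes involve, here is a heavily simplified sketch of the kind of scheme-to-URL rewriting that VegaWrapper [4] performs. The scheme names come from the config above, but the URL templates and mapping table are illustrative assumptions, not the real configuration:

```js
// Simplified illustration of custom protocol scheme expansion; the real
// mapping lives in VegaWrapper and differs in detail.
const schemeMap = {
    wikiraw: (host, path) =>
        `https://${host}/w/index.php?action=raw&title=${encodeURIComponent(path)}`,
    wikidatasparql: (host, path) =>
        `https://query.wikidata.org/sparql?format=json&query=${encodeURIComponent(path)}`,
};

function expandCustomUrl(url) {
    const match = /^([a-z]+):\/\/([^/]*)\/?(.*)$/.exec(url);
    if (!match) throw new Error(`unparsable url: ${url}`);
    const [, scheme, host, path] = match;
    const expand = schemeMap[scheme];
    if (!expand) throw new Error(`unsupported scheme: ${scheme}`);
    return expand(host, path);
}
```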

Dependencies are not versioned

Not all dependencies in package.json are version pinned [5]. This, combined with the fact that the vega library had undergone significant changes, meant that the first efforts to add graphoid to the service pipeline, in May 2018, failed. In the end the npm shrinkwrap approach was followed in order to succeed, but it's not exactly great.
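To make the pinning problem concrete, a hypothetical package.json fragment (the package names and versions below are made up for the example): the caret range and the git URL can resolve to different code between installs, while the exact version cannot.

```json
{
    "dependencies": {
        "vega": "^2.6.0",
        "some-plugin": "git+https://github.com/example/some-plugin.git",
        "pinned-dep": "1.4.2"
    }
}
```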

Rubric

  • Current maintainer

Unclear. @Yurik is the original author and was the maintainer in the past.

  • Number, severity and age of known and confirmed security issues

0

  • Was it a cause of production outages or incidents? List them

No, but with such a low request rate that is to be expected.

  • Does it have sufficient hardware resources for now and the near future (to take into account expected usage growth)?

No. The scb cluster where graphoid currently resides is already deprecated and slated to be decommissioned. Services on that cluster are meant to be deployed on the deployment pipeline and kubernetes.

  • Is it a frequent cause of monitoring alerts that need action, and are they addressed timely and appropriately?

No

  • When it was first deployed to Wikimedia production

2015

  • Usage statistics based on audiences served

https://grafana.wikimedia.org/d/000000068/restbase?panelId=15&fullscreen&orgId=1 points to < 1 rps; https://grafana.wikimedia.org/d/000000021/service-graphoid?panelId=11&fullscreen&orgId=1 points to more, ~50 rps (that's excluding monitoring).

  • Changes committed in the last 1, 3, 6 and 12 months

1 month => 1
3 months => 3
6 months => 3
12 months => 6

  • Reliance on outdated platforms (e.g. operating systems)

The service relies on old versions of the vega [2] library, namely vega 1 (1.5.3) and vega 2 (2.6.4), as well as a few vega plugins. Vega 3 is out and is being evaluated at T172938, which hasn't seen any activity in almost a year.

  • Number of developers who committed code in the last 1, 3, 6, and 12 months

2: me (@akosiaris) and @mobrovac, for the process of migrating the service to the deployment pipeline. No functionality was added or removed, no bugs were fixed, nor anything else.
  • Number and age of open patches

1 and it's a deployment pipeline change

  • Number and age of open bugs

34 tasks at https://phabricator.wikimedia.org/project/board/1191/

  • Number of known dependencies

None.

  • Is there a replacement/alternative for the feature? Is there a plan for replacement?

Graphoid, per [1], is already a fallback mechanism that was meant to address old and non-javascript-supporting browsers. To my understanding, the plan was always to use it as a stopgap while the browser population caught up with the new features required by vega [2], and then remove it from the infrastructure.

  • Submitter's recommendation (what do you propose be done?)

My personal take would be to investigate whether the service still serves a valuable purpose and, based on the numbers, decide whether to remove it from the infrastructure or not. If it has served its function, that's great. If it's still useful and we do decide to keep it around, a significant amount of work will have to be done to address the issues I've highlighted above. Whether that is worth it or not is a good question.

[1] https://www.mediawiki.org/wiki/Extension:Graph#Graphoid_service
[2] http://vega.github.io/
[3] https://www.paulirish.com/2010/the-protocol-relative-url/
[4] https://github.com/nyurik/mw-graph-shared/blob/master/src/VegaWrapper.js#L193
[5] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/graphoid/+/refs/heads/master/package.json

Milimetric added a comment. Edited Dec 13 2018, 4:48 PM

The reason Graphoid was initially developed was to provide a static image so we wouldn't have to bundle Vega and d3 resource loader modules with every page response. Those would add significant weight, especially now that we support multiple versions. This still seems to be the case. If you load https://www.mediawiki.org/wiki/Extension:Graph and look at the Network traffic, the vega RL module is not fetched. But when you click the "play" button on the interactive graph, then vega is downloaded.

So I think Graphoid is essential for graphs on mediawiki. Also, I'm biased because I wrote the first prototype for this, but I think graphs are a much-needed way to display data. Without them, data would be rendered as static images that make collaboration and updating hard.

So, I think Graphoid and the Graph extension need a permanent home. And I agree the current implementation is too brittle to invest in. If someone decides to adopt Graphs, I'll help in any way I can. If nobody does, I'll attempt to rewrite it by myself and take into consideration all the good points that @akosiaris made here, as well as one additional major design goal (however, this is not ideal, having me as yet another single point of failure for this service):

Static images should be updated as data is updated. So when the dependency service is implemented and goes live, graphs and graphoid should use it to regenerate graphs. Any new implementation should loosely coordinate with the folks working on the dependency service to make sure this is possible.

Yurik added a comment. Dec 13 2018, 5:38 PM

That raises a question, though. If the caller of the graphoid service API already knows the hash of the graph, that means they either got it by talking to mediawiki, or they calculated it themselves (i.e. they already have the entire graph). If they already have the entire graph, why not POST it to the service and obtain the resulting PNG directly? Things become even more convoluted when the caller is mediawiki itself (to my knowledge, that is actually the case), in which case having mediawiki instruct another service to create requests back to it (to the API endpoint, specifically) causes a cascade of requests hitting the mediawiki API. Up to now this hasn't caused an outage, in my opinion simply because of the low traffic graphoid receives.

@akosiaris, the logic in <An unorthodox architecture of the API of the service> is fundamentally flawed. The client only knows about the graph's hash because the MediaWiki parser knew the exact graph data, calculated a hash, stored that data under that hash in a key-value store (page_props), and included that hash in the HTML. Also, this structure is identical to the way maps function -- map data is calculated by the parser and stored in the key-value page_props with the hash as the key, and the kartotherian service does exactly the same thing as graphoid: it pulls that data out of page_props to render a static image (so that Leaflet libs are not downloaded until the user interacts with the map). In short -- the ONLY component that knows what the user wants to draw is the MediaWiki parser.

In short -- every service that wants to draw user data must be somehow aware of what the MediaWiki parser does. If the data is small enough, you can decouple it by passing it in the URL (e.g. math). Graph and map data is much bigger -- hence the only possible solution for the parser is to store the data and pass the key/hash in the URL.
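To illustrate the flow described above, a minimal sketch in JavaScript (the real implementation is PHP in the Graph extension's parser integration; the storage call and names here are simplified stand-ins):

```js
// Simplified sketch of the parser-side flow: hash the spec, store it,
// emit HTML that references graphoid by the hash. Details differ in reality.
const crypto = require('crypto');

// Stand-in for the page_props key-value store.
const pageProps = new Map();

function parseGraphTag(graphSpec, title, revision) {
    // 1. Hash the full graph definition to get a stable identifier.
    const hash = crypto.createHash('sha1')
        .update(JSON.stringify(graphSpec))
        .digest('hex');

    // 2. Store the spec keyed by the hash (page_props in the real code).
    pageProps.set(hash, graphSpec);

    // 3. Emit HTML pointing at graphoid, which looks the spec back up later.
    return `<img src="/api/rest_v1/page/graph/png/${title}/${revision}/${hash}.png">`;
}
```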

Per what @Milimetric has said -- static image generation is a must for speedy page rendering. I do think that wiki pages should be more interactive from the start, but unless you want to significantly increase download size, we need a static image until the user actually scrolls to some graph half-way down the page. So your proposal to remove it is clearly incorrect, unless you want to force every client to support recent javascript and good bandwidth.

Yurik added a comment. Dec 13 2018, 7:48 PM

@akosiaris also, please add usage before Varnish -- to see how often graphoid objects are actually requested by the user, rather than how often there is a cache miss. Plus similar stats for getting map snapshots.

Tgr added a subscriber: Tgr. Dec 13 2018, 10:08 PM

The reason Graphoid was initially developed was to provide a static image so we wouldn't have to bundle Vega and d3 resource loader modules with every page response.

Also, it is generally a requirement for page content to be readable without Javascript, something even our own tooling sometimes relies on; e.g. the PDF rendering service loads pages with Javascript disabled.

greg triaged this task as Normal priority. Dec 18 2018, 7:50 PM
akosiaris added a comment. Edited Jan 14 2019, 6:33 PM

Some numbers to help inform the decision

Graph usage

Using WMCS resources I extracted the following numbers for the graph extension. Making the (slightly false [1]) assumption that there is a 1:1 correlation between the presence of the graph_specs property and the page having a graph, we obtain the following table showing the population of pages with a graph for the top 50 projects.

| Wiki | Number of pages |
| --- | --- |
| arwiki | 569237 |
| afwiki | 66446 |
| huwiki | 56926 |
| ruwikinews | 38375 |
| enwiki | 13760 |
| rowiki | 11379 |
| ruwiki | 8670 |
| nowiki | 8537 |
| urwiki | 5004 |
| plwiki | 2835 |
| jawiki | 2049 |
| dewiki | 1599 |
| myvwiki | 1199 |
| frwiki | 917 |
| azbwiki | 916 |
| metawiki | 867 |
| kowiki | 681 |
| ptwiki | 656 |
| glwiki | 402 |
| itwiki | 375 |
| eswiki | 312 |
| zhwiki | 306 |
| cswiki | 265 |
| mediawikiwiki | 211 |
| hewiki | 157 |
| bawiki | 139 |
| mswiki | 130 |
| hywiki | 126 |
| azwiki | 106 |
| ukwiki | 100 |
| ckbwiki | 77 |
| euwiki | 70 |
| bewiki | 68 |
| fawiki | 59 |
| svwiki | 53 |
| cawiki | 41 |
| fiwiki | 37 |
| ttwiki | 36 |
| simplewiki | 36 |
| etwiki | 34 |
| frwiktionary | 33 |
| thwiki | 25 |
| nlwiki | 25 |
| lvwiki | 25 |
| enwiktionary | 20 |
| hiwiki | 19 |
| bnwiki | 18 |
| testwiki | 17 |
| bswiki | 17 |
| frwikiversity | 15 |

I've tried getting a visualization of the entire data set (all 893 projects/dbs my WMCS account had access to), but it's not really useful as it almost immediately drops to 0. Someone with better visualization skills than me, feel free to create something more meaningful. Anyway, my take is that about two dozen wikis incorporate graphs, with a handful being heavy users, and the first one being an order of magnitude larger than the runners-up.

Note that the data above is here just for informational purposes; this code stewardship request is JUST for graphoid, not the graph extension.

Graphoid usage

In the time period from 2019-01-08T00:00:06.000Z to 2019-01-14T15:59:59.000Z, druid's webrequest_sampled_128 returns the following table (this is the entirety of the data, no top 50 this time around):

| Wiki | graphoid requests |
| --- | --- |
| en.wikipedia.org | 21249 |
| ru.wikipedia.org | 18697 |
| ja.wikipedia.org | 16795 |
| de.wikipedia.org | 4190 |
| fr.wikipedia.org | 3914 |
| pl.wikipedia.org | 2756 |
| es.wikipedia.org | 2155 |
| it.wikipedia.org | 1850 |
| hu.wikipedia.org | 1279 |
| ro.wikipedia.org | 818 |
| zh.wikipedia.org | 425 |
| el.wikipedia.org | 280 |
| cs.wikipedia.org | 269 |
| pt.wikipedia.org | 171 |
| ko.wikipedia.org | 136 |
| he.wikipedia.org | 109 |
| ar.wikipedia.org | 99 |
| sv.wikipedia.org | 96 |
| vi.wikipedia.org | 81 |
| nl.wikipedia.org | 62 |
| hy.wikipedia.org | 61 |
| www.mediawiki.org | 32 |
| uk.wikipedia.org | 29 |
| eu.wikipedia.org | 22 |
| bn.wikipedia.org | 19 |
| fa.wikipedia.org | 14 |
| id.wikipedia.org | 14 |
| no.wikipedia.org | 14 |
| www.wikidata.org | 14 |
| be.wikipedia.org | 13 |
| ca.wikipedia.org | 12 |
| ur.wikipedia.org | 12 |
| meta.wikimedia.org | 10 |
| wikipedia.org | 10 |
| pcd.wikipedia.org | 9 |
| ru.wikinews.org | 9 |
| bg.wikipedia.org | 8 |
| sr.wikipedia.org | 8 |
| et.wikipedia.org | 7 |
| af.wikipedia.org | 6 |
| lv.wikipedia.org | 5 |
| en.wiktionary.org | 4 |
| ps.wikipedia.org | 4 |
| en.wikisource.org | 3 |
| kk.wikipedia.org | 3 |
| en.m.wikipedia.org | 2 |
| fi.wikipedia.org | 2 |
| gl.wikipedia.org | 2 |
| hi.wikipedia.org | 2 |
| se.wikimedia.org | 2 |
| azb.wikipedia.org | 1 |
| bcl.wikipedia.org | 1 |
| bs.wikipedia.org | 1 |
| ckb.wikipedia.org | 1 |
| commons.wikimedia.org | 1 |
| da.wikipedia.org | 1 |
| fr.wikisource.org | 1 |
| fr.wikiversity.org | 1 |
| ka.wiktionary.org | 1 |
| lt.wikipedia.org | 1 |
| myv.wikipedia.org | 1 |
| pl.wikimedia.org | 1 |
| simple.wikipedia.org | 1 |
| sk.wikipedia.org | 1 |
| su.wikipedia.org | 1 |

On an unrelated note, I find it funny that the top wiki in terms of graphs is very low in graphoid requests. In fact, I get the impression (which could be false) that the two ordered datasets don't correlate very well with one another, but I think this is a question best left to be answered in some other task.

@Milimetric, I am guessing from the name of the table that I should multiply by 128 and divide by ~7 to get the daily number of requests. Does that sound correct?
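(If that interpretation is correct, then as a rough, back-of-the-envelope sanity check: en.wikipedia.org's 21249 sampled requests would scale to 21249 × 128 ≈ 2.7M requests over the ~6.7-day window above, i.e. roughly 400k/day, or about 4.7 rps from enwiki alone.)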

The plot this time around is more telling: some 10 wikis make use of the graphoid service, and after that adoption drops sharply.

[1] There are a few pages (my tests at the very least) where older revisions contain graphs but the current one does not, in which case the graph_specs page property does not exist for the page. I know of no way to easily estimate the number of such pages.

akosiaris added a comment. Edited Jan 14 2019, 6:38 PM

The reason Graphoid was initially developed was to provide a static image so we wouldn't have to bundle Vega and d3 resource loader modules with every page response. Those would add significant weight, especially now that we support multiple versions. This still seems to be the case. If you load https://www.mediawiki.org/wiki/Extension:Graph and look at the Network traffic, the vega RL module is not fetched. But when you click the "play" button on the interactive graph, then vega is downloaded.

@Milimetric you, sir, add knowledge that I was lacking. Thanks for that. This information should definitely make it to https://www.mediawiki.org/wiki/Extension:Graph#Graphoid_service.

So I think Graphoid is essential for graphs on mediawiki. Also, I'm biased because I wrote the first prototype for this, but I think graphs are a much-needed way to display data. Without them, data would be rendered as static images that make collaboration and updating hard.

For what it's worth, I agree.

So, I think Graphoid and the Graph extension need a permanent home. And I agree the current implementation is too brittle to invest in. If someone decides to adopt Graphs, I'll help in any way I can. If nobody does, I'll attempt to rewrite it by myself and take into consideration all the good points that @akosiaris made here, as well as one additional major design goal (however, this is not ideal, having me as yet another single point of failure for this service):

Which major design goal would that be? /me genuinely interested

I do agree with you that taking over the entirety of the project alone (alongside all your other undertakings, no less) is not ideal. There should be a team for this.

Static images should be updated as data is updated. So when the dependency service is implemented and goes live, graphs and graphoid should use it to regenerate graphs. Any new implementation should loosely coordinate with the folks working on the dependency service to make sure this is possible.

Agreed.

akosiaris updated the task description. Jan 14 2019, 6:40 PM

@akosiaris also, please add usage before Varnish -- to see how often graphoid objects are actually requested by the user, rather than how often there is a cache miss. Plus similar stats for getting map snapshots.

Done. I stuck to the graphoid service public API for now; I am not sure how to get map snapshot stats, to be honest.

The reason Graphoid was initially developed was to provide a static image so we wouldn't have to bundle Vega and d3 resource loader modules with every page response.

Also, it is generally a requirement for page content to be readable without Javascript, something even our own tooling sometimes relies on; e.g. the PDF rendering service loads pages with Javascript disabled.

Indeed. One more reason for this tool to be adopted!

CDanis added a subscriber: CDanis. Jan 14 2019, 6:46 PM

@akosiaris, the logic in <An unorthodox architecture of the API of the service> is fundamentally flawed. The client only knows about the graph's hash because the MediaWiki parser knew the exact graph data, calculated a hash, stored that data under that hash in a key-value store (page_props), and included that hash in the HTML. Also, this structure is identical to the way maps function -- map data is calculated by the parser and stored in the key-value page_props with the hash as the key, and the kartotherian service does exactly the same thing as graphoid: it pulls that data out of page_props to render a static image (so that Leaflet libs are not downloaded until the user interacts with the map). In short -- the ONLY component that knows what the user wants to draw is the MediaWiki parser.

Thanks for adding that information. In hindsight I probably was unclear, sorry about that. My point was that the end-user client is being asked to fetch a PNG via an identifier that means nothing to it. It is also very difficult to calculate (I tried!). Such an internal technicality of the graph extension should not be exposed to users of the API.

By the way, the page_props.pp_value field is a blob and hence bounded to 65KB. That does not sound like enough space to store the representation. In fact, from what I see at T184128, this has already happened?

In short -- every service that wants to draw user data must be somehow aware of what the MediaWiki parser does. If the data is small enough, you can decouple it by passing it in the URL (e.g. math). Graph and map data is much bigger -- hence the only possible solution for the parser is to store the data and pass the key/hash in the URL.

There are different architectures that can accomplish the same thing without this approach, e.g. composing the end page at a later stage by including the PNG (one that is pre-generated asynchronously via the jobqueue and stored in Swift).

Per what @Milimetric has said -- static image generation is a must for speedy page rendering. I do think that wiki pages should be more interactive from the start, but unless you want to significantly increase download size, we need a static image until the user actually scrolls to some graph half-way down the page. So your proposal to remove it is clearly incorrect, unless you want to force every client to support recent javascript and good bandwidth.

My proposal was to gauge whether the service is still valuable or not and then decide on a course of action, taking into consideration that the service currently does not have a maintainer. It was also a not very well informed proposal; thanks for adding the required information, that's what this task was for as far as I am concerned. I am hoping that with the numbers provided we will be able to estimate the value of the service to our projects.

In any case, and at the risk of repeating myself, the service is under a code stewardship request because it currently does not have a maintainer. If we want to keep it around, we need first and foremost a maintainer. The technical information added to this task is probably of paramount importance to any potential maintainer (and parts of it should make it to https://www.mediawiki.org/wiki/Extension:Graph#Graphoid_service).

Tgr added a comment. Jan 14 2019, 8:29 PM

Are those numbers reliable? Arabic Wikipedia gets about 5M pageviews a day, and it sounds like almost every article has a graph (or maybe it's used on non-article pages?) - compared to that the ~1000 graph views daily seem surreally low.

Are those numbers reliable? Arabic Wikipedia gets about 5M pageviews a day, and it sounds like almost every article has a graph (or maybe it's used on non-article pages?) - compared to that the ~1000 graph views daily seem surreally low.

Good question. I guess the best way to answer is to share the code that generated those numbers and have someone review it. So here goes: P7986 for the number of graphs per project and P7987 for the requests to graphoid. Note that the latter seems to be a time-based table containing only the last week, so the exact numbers should not be possible to replicate, but I'd expect the magnitudes not to change.

You did raise a valid point about arwiki and I've put some numbers in P7988. This is, however, only tangentially related to graphoid, which this task is about. I would direct any further research into it to a different task.

@akosiaris, the logic in <An unorthodox architecture of the API of the service> is fundamentally flawed. The client only knows about the graph's hash because the MediaWiki parser knew the exact graph data, calculated a hash, stored that data under that hash in a key-value store (page_props), and included that hash in the HTML. Also, this structure is identical to the way maps function -- map data is calculated by the parser and stored in the key-value page_props with the hash as the key, and the kartotherian service does exactly the same thing as graphoid: it pulls that data out of page_props to render a static image (so that Leaflet libs are not downloaded until the user interacts with the map). In short -- the ONLY component that knows what the user wants to draw is the MediaWiki parser.

[...]
By the way, the page_props.pp_value field is a blob and hence bounded to 65KB. That does not sound like enough space to store the representation. In fact, from what I see at T184128, this has already happened?

Considering that, what do you think about giving a little push to the following discussion? T119043: Graph/Graphoid/Kartographer - data storage architecture

I don't know if that can help with the stewardship review, but IMHO it is something that should be on the roadmap of the Graph/Graphoid/Kartographer projects.

Which major design goal would that be? /me genuinely interested

The next paragraph in T211881#4820828, the thing you agreed to below:

Static images should be updated as data is updated. So when the dependency service is implemented and goes live, graphs and graphoid should use it to regenerate graphs. Any new implementation should loosely coordinate with the folks working on the dependency service to make sure this is possible.

Agreed.

(I had a colon there, hiding after the parens, sorry for my convoluted grammar)

@akosiaris, the logic in <An unorthodox architecture of the API of the service> is fundamentally flawed. The client only knows about the graph's hash because the MediaWiki parser knew the exact graph data, calculated a hash, stored that data under that hash in a key-value store (page_props), and included that hash in the HTML. Also, this structure is identical to the way maps function -- map data is calculated by the parser and stored in the key-value page_props with the hash as the key, and the kartotherian service does exactly the same thing as graphoid: it pulls that data out of page_props to render a static image (so that Leaflet libs are not downloaded until the user interacts with the map). In short -- the ONLY component that knows what the user wants to draw is the MediaWiki parser.

[...]
By the way, the page_props.pp_value field is a blob and hence bounded to 65KB. That does not sound like enough space to store the representation. In fact, from what I see at T184128, this has already happened?

Considering that, what do you think about giving a little push to the following discussion? T119043: Graph/Graphoid/Kartographer - data storage architecture

Reading that task and realizing how much time has passed since the last comment (it's been 3 years!) made me sad. I did indeed leave a comment, while also setting it to stalled (after 3 years, no less!), but it's becoming apparent to me that the entirety of the graph functionality is in dire need of some love.

In any case, and at the risk of repeating myself, the service is under a code stewardship request because it currently does not have a maintainer. If we want to keep it around, we need first and foremost a maintainer.

@akosiaris From what I gathered in this discussion, the course of action on this should be to find a Code Steward for graphoid. Funding/resources aside, is there a logical home for graphoid?

The technical information added to this task is probably of paramount importance to any potential maintainer (and parts of it should make it to https://www.mediawiki.org/wiki/Extension:Graph#Graphoid_service).

Agreed. @Milimetric @Yurik Is this something that we can do before finding the Code Steward?

Given the SCB deprecation and the incoming workload of re-architecting this to work on Kubernetes for whoever takes on Graphoid maintenance, maybe it would be useful to outline what that work would be?

The graphoid repo itself doesn't really seem that complicated, given the only thing it seems to do is render vega data models. There is not much code in the repo either.

Please let me know if I'm misunderstanding something, but the trickier issue seems to be that vega will do data fetching for you to fully expand certain data models, depending on how they are configured (1)? If this is the case then that seems to be a core feature of the service.

  • What is the solution for a service that needs to make external requests when deployed to kubernetes?
  • Can this be solved at the infrastructure level and configuration of the service, or does it need to be resolved by changing the core feature of the service itself?

I personally fail to see how this would be refactored; the only thing that comes to mind is parsing the vega model in PHP before rendering the fallback image and expanding any data URLs to have the actual data before printing a URL in the HTML, but that is a terrible use of PHP, and then the service itself would only render some vega models.

Sorry if I'm missing something, but understanding what the future maintainers are going to be asked to do with the project seems quite relevant to considering taking on its maintenance.

Hi all - I was aware of this task but hadn't been following it. But it was brought to my attention as having some momentum, so here I am! I have some information I can dredge up that I think may help shed some light on some paths forward. I also want to check in with some product and design people about any sense on forthcoming product interventions in the area of interactive or, for that matter, materialized graphs.

In any case, and at the risk of repeating myself, the service is under a code stewardship request because it currently does not have a maintainer. If we want to keep it around, we need first and foremost a maintainer.

@akosiaris From what I gathered in this discussion, the course of action on this should be to find a Code Steward for graphoid.

Yes, agreed.

Funding/resources aside, is there a logical home for graphoid?

I don't think I can answer that :(

Given the SCB deprecation and the incoming workload of re-architecting this to work on Kubernetes for whoever takes on Graphoid maintenance, maybe it would be useful to outline what that work would be?

Definitely.

The graphoid repo itself doesn't really seem that complicated, given the only thing it seems to do is render vega data models. There is not much code in the repo either.

Please let me know if I'm misunderstanding something, but the trickier issue seems to be that vega will do data fetching for you to fully expand certain data models, depending on how they are configured (1)? If this is the case then that seems to be a core feature of the service.

I am not sure I understand the question. Are you asking if being able to fetch graph/data model specs from a URL is a problem or a feature (or even both :) )?

  • What is the solution for a service that needs to make external requests when deployed to kubernetes?

Define external. External as in outside the service, or outside the WMF infrastructure? Note that kubernetes doesn't really change the way requests are made much, aside from a few hardening details. In kubernetes we block all outgoing requests and only allow specific destinations. If said destinations are in the WMF infrastructure then we add them to the whitelist. Specifically for mediawiki, we want the service to use our internal discovery URLs (https://api-rw.discovery.wmnet) with the corresponding HTTP Host: header for the project being accessed, for easier operational actions (being able to depool an entire DC for a service) and to avoid polluting the edge caches by mistake. If the requests go outside of the WMF infrastructure they need to go via a proxy (also to be added to the whitelist). This is to decrease the impact of a compromised service.

By the way, any kind of changes/work would probably be unrelated to kubernetes. The execution patterns (the extra hardening described above aside) are more or less the same as far as the software is concerned.

  • Can this be solved at the infrastructure level and configuration of the service, or does it need to be resolved by changing the core feature of the service itself?

Again, I am not sure I understand (probably because I am lacking some part of the picture). What kind of configuration changes on the service do you see that would solve the problem?

I personally fail to see how this would be refactored; the only thing that comes to mind is parsing the vega model in PHP before rendering the fallback image and expanding any data URLs to have the actual data before printing a URL in the HTML, but that is a terrible use of PHP, and then the service itself would only render some vega models.

I don't have a great answer either (again probably because I am missing parts of the picture). Assuming time, resources and the capacity to change it, I would probably try to slowly refactor graphoid to become more of a lambda function that just consumes data models via POST and returns the images, probably storing those images in Swift, with mediawiki (or any other interested client in the future) just POSTing via the jobqueue (and that's already an optimization to avoid timeouts).

But that's the greater, not fully fleshed out architectural vision I have in mind, and there are probably a lot of details; the devil is in them. I expect there is functionality that would either need to be dropped or refactored in this vision.
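To make that vision a bit more tangible, here is a minimal sketch of the lambda-style shape described above. It assumes a modern vega (3+, whose View API supports headless canvas rendering via node-canvas) and an Express service; the /render route, the port, and the Swift storage step are placeholder assumptions, not a worked-out design:

```js
// Minimal sketch of a POST-based rendering service: consume a graph spec,
// return the PNG. Route, port, and storage are placeholders.
const express = require('express');
const vega = require('vega');

const app = express();
app.use(express.json({ limit: '1mb' }));

app.post('/render', async (req, res) => {
    try {
        const runtime = vega.parse(req.body);               // compile the spec
        const view = new vega.View(runtime, { renderer: 'none' });
        const canvas = await view.toCanvas();               // headless render
        const png = canvas.toBuffer('image/png');
        // A real service would likely also store `png` in Swift here.
        res.type('image/png').send(png);
    } catch (err) {
        res.status(400).send(`invalid graph spec: ${err.message}`);
    }
});

app.listen(8080);
```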

That being said, there seems to be a lot of other work, even without changing the architecture, that the maintaining team would need to do: namely upgrading the vega library, sorting out and upgrading dependencies, using well-pinned versions in package.json rather than pulls from GitHub master branches, and fixing the public API endpoints problem.

Sorry if I'm missing something, but understanding what the future maintainers are going to be asked to do with the project seems quite relevant to considering taking on its maintenance.

Fully agreed. I'd like to help as much as possible to figure that out, but again, I don't have the full picture either.