Page MenuHomePhabricator

Create {{#get_program_data:}}
Open, Needs TriagePublicFeature

Description

Create {{#get_program_data:}} parser function to transclude output of certain programs run server-side.

The programs that can be run and their parameters should be defined in the extension settings. All user input in {{#get_program_data:}} should be sanitised, and the rules for this sanitisation should be configurable in LocalSettings.php.

Programs' output is to be cached in edg_url_cache or similar table according to the extension settings.

Possible use cases: emulating the defunct GraphViz extension, as well as Score or Timeline; importing data from man, etc.

Event Timeline

alex-mashin renamed this task from Create {{#get_exe_data:}} to Create {{#get_program_data:}}.Aug 25 2021, 2:20 AM
alex-mashin updated the task description. (Show Details)

Change 714665 had a related patch set uploaded (by Alex Mashin; author: mashin):

[mediawiki/extensions/ExternalData@master] Add {{#get_program_data:}} and mw.ext.externalData.getProgramData

https://gerrit.wikimedia.org/r/714665

This patch is very interesting, and it will be great to get this functionality available to MediaWiki users in one way or another. I'm wondering, though, specifically about the functionality to display graphs and other images. It doesn't really fit in with the concept of "External Data" - it's not retrieving data from elsewhere; rather, it's just using external code to display data stored locally on the wiki.

Do you know about the Flex Diagrams extension?

https://www.mediawiki.org/wiki/Extension:Flex_Diagrams

It's another of my extensions (from about a year ago), and it may be a better fit for this functionality like GraphViz, Ploticus, etc. I think all this graphing makes more sense in FD than in ED. It's a different approach, though. What do you think?

Flex Diargams is an interesting extension, too, though its concept is very different from {{#get_program_data:}}. I will consider adding software used by that extension as new examples to {{#get_program_data:}}. I am not sure that the opposite can be easily done, though. Interactive editing is a magnificent feature, although not so relevant, when diagrams are produced from SMW queries or other external data, which is the use case that I am accustomed to (in my next commit, there is an example of automatically drawn class inheritance diagram for ExternalData).

Some of the examples in the commit message for {{#get_program_data:}} (GraphViz, ploticus and Lilypond) were mainly chosen because those programs are or were used by WikiMedia projects via other extensions (I would add MathJax if I had thought of it earlier). This choice does not imply that displaying graphical data (also as a fallback solution when a dedicated extension goes defunct, as happened to GraphViz) is the main task of {{#get_program_data:}}. This is only a bonus.

The main task of this feature is to provide access to what I would call "reference applications", which extract data already present in the operating system (and I would call it external regarding the wiki): man, whatis, whois, nslookup, geoiplookup, locale -k, apt -show, apt-cache showpkg, apt-cache depends, composer show, etc.

So, in my opinion, {{#get_program_data:}} and FlexData are two distinct things, only potentially intersecting.

Sorry, I should have been clearer - I understand that #get_program_data can be used to get true external data from the server, such as the results of an "ls" call. My concern was just about the part of the patch that handles visualizations - it really feels different from the rest of External Data, both in its aims and its syntax.

You're right, though, that Flex Diagrams wouldn't work for displays of dynamic data, such as the result of an SMW query.

Or I should say, Flex Diagrams as it's currently structured wouldn't work for that. But maybe the best approach is to add this functionality to Flex Diagrams - so you could have a call like:

{{#display_diagram:
format=graphviz
|layout=dot
|data=digraph Example1{
    bgcolor = "transparent";
    label = "Human Life";
    splines = ortho;
    ...
}}

Basically, any diagram format that Flex Diagrams supported could either be stored in a standalone wiki page, or directly within the display parser function. This would prevent having to have two copies of every JS library or other display-related code, one in External Data and one in Flex Diagrams. And I think it would be easier to understand for admins as well: Flex Diagrams is the extension for displaying diagrams, External Data is the extension for displaying... well, external data.

What do you think?

My concern was just about the part of the patch that handles visualizations

It's only the short EDParserFunctions::doExternalValueRaw that implements the <external> tag and $parser->setHook( 'external', [ 'EDParserFunctions', 'doExternalValueRaw' ] );. You don't even need to document it; or you can make it off by default (e.g., $edgExeRawTag = false;).

it really feels different from the rest of External Data, both in its aims and its syntax.

Speaking of aims: the distinction between "data" and "service" is not as sharp as it may seem. Take {{#get_web_data:}}: how do you know that it is used to fetch data and not to call some online converter?

Or I should say, Flex Diagrams as it's currently structured wouldn't work for that. But maybe the best approach is to add this functionality to Flex Diagrams - so you could have a call like:

To add this functionality to Flex Diagrams will take many more lines of patch than would be removed from ExternalData if the "visualisation" part is removed; and I suspect, little will remain of Flex Diagrams as it is after that.

{{#display_diagram:

Should be a tag, or wiki parser will mess it up.

This would prevent having to have two copies of every JS library or other display-related code, one in External Data and one in Flex Diagrams.

ExternalData is not meant to contain its own copy of the software that it accesses. Even if it is a node.js module brought by Flex Diagrams, I think it can invoke it from where it is.

And I think it would be easier to understand for admins as well: Flex Diagrams is the extension for displaying diagrams, External Data is the extension for displaying... well, external data.

Visualisation features of ExternalData may remain esoteric, available only to wiki admins whose spirit is ready.

So, what you are talking about is more lightweight than I thought it was. Still, the whole visualization thing is not quite just a side-effect of #get_program_data, because you have added to the code at least two utility functions that help to process GraphViz and SVG data. Maybe it's not a big deal, but... it's something.

Why do you think #display_diagram in Flex Diagrams would need to be a tag function instead? And why do you say "little will remain of Flex Diagrams as it is after that"? Just calling, say, GraphViz on the server doesn't sound like a big deal. Maybe I'm missing something?

Still, the whole visualization thing is not quite just a side-effect of #get_program_data, because you have added to the code at least two utility functions that help to process GraphViz and SVG data.

Those functions (EDConnectorExe::edfWikilinks4dot and EDConnectorExe::edfInnerXML) are not critical to the functionality; they are there only to make setting $edgExePreprocess and $edgExePostprocess in LocalSettings.php easier for the wiki admin.

Why do you think #display_diagram in Flex Diagrams would need to be a tag function instead?

To stop wiki parser from further processing it, which may lead to < being replaced with &lt; in XML tags. As far as I know, using a tag with 'markerType' => 'nowiki' in the return statement is the only way; I tried to implement is as a parser function, but with no success.

And why do you say "little will remain of Flex Diagrams as it is after that"?

I suppose that deep integration of a universal framework for calling programs will require refactoring of the Flex Diagrams making much of its code redundant; and there will be code applicable only to the old functionality of Flex Diagram, like visual editing.

Just calling, say, GraphViz on the server doesn't sound like a big deal. Maybe I'm missing something?

Flex Diagrams integrates the JavaScript libraries used to render some kinds of diagrams. {{#get_program_data:}} is not shipped with any software and makes no assumptions about what software wiki admin will install and bind to {{#get_program_data:}}. The approaches are too different for integration, I am afraid.

Okay, thank you for these explanations. I guess I thought functions like edfWikilinks4dot() were just the "tip of the iceberg" for what you wanted to add to External Data, but now I see that this is pretty much it, as far as data visualization.

Looking more through the patch, I do have two other questions/comments:

  • I think <externalvalue> makes more sense as a tag name than <external> - I think it's clearer and more consistent. What do you think?
  • Why did you remove "Error:" from the error messages? A lot of the core MediaWiki error messages contain it too.

Okay, thank you for these explanations. I guess I thought functions like edfWikilinks4dot() were just the "tip of the iceberg" for what you wanted to add to External Data, but now I see that this is pretty much it, as far as data visualization.

Yes, that's it. Although, I think, I should remove the edf prefix, since the functions are not global.

Looking more through the patch, I do have two other questions/comments:

  • I think <externalvalue> makes more sense as a tag name than <external> - I think it's clearer and more consistent. What do you think?

OK.

  • Why did you remove "Error:" from the error messages? A lot of the core MediaWiki error messages contain it too.

I presume, the word "Error" had been there since the time when the error messages were not wrapped with <span class="error">. Now they are, which makes them red and obvious. I thought there was no point in keeping the redundant word "Error" in some of the messages any more.

Okay, cool. Right, renaming those methods probably makes sense. As for the error messages - my understanding is that core MediaWiki's error displays involve both the word "Error" and the "error" CSS class. If that's the case, then I think External Data should do that too - it's always better to match MW's approach, when possible.

As for the error messages - my understanding is that core MediaWiki's error displays involve both the word "Error" and the "error" CSS class.

Added "Error:"'s.

Well, this seems like a better solution... I can't remember why some messages start with "Error" and some don't. I should note that there's no reason to put this change into this patch (it's usually better to have a different patch for each change), but that part is okay. Also, the current wording for "externaldata-xml-error" ("Error: XML $1 at line $2.") seems strange.

Also, the current wording for "externaldata-xml-error" ("Error: XML $1 at line $2.") seems strange.

throw new EDParserException( 'externaldata-xml-error',
	xml_error_string( xml_get_error_code( $xml_parser ) ),
	xml_get_current_line_number( $xml_parser )
);

$1 is the complete error message from XML parser, which can be anything, $2 is the line number. There is nothing to add to it.

I know, but the "XML" in there will make the sentence ungrammatical, I think. I think the previous wording ("XML error: ...") will make more sense.

Change 714665 merged by jenkins-bot:

[mediawiki/extensions/ExternalData@master] Add {{#get_program_data:}} and mw.ext.externalData.getProgramData

https://gerrit.wikimedia.org/r/714665

Change 718364 had a related patch set uploaded (by Alex Mashin; author: mashin):

[mediawiki/extensions/ExternalData@master] Move version reports for external software to EDConnectorExe and make program calls cached

https://gerrit.wikimedia.org/r/718364

Change 718364 merged by jenkins-bot:

[mediawiki/extensions/ExternalData@master] Move version reports for external software to EDConnectorExe and make program calls cached

https://gerrit.wikimedia.org/r/718364

Change 721250 had a related patch set uploaded (by Alex Mashin; author: mashin):

[mediawiki/extensions/ExternalData@master] Extract program errors from stdout

https://gerrit.wikimedia.org/r/721250

Change 721257 had a related patch set uploaded (by Alex Mashin; author: mashin):

[mediawiki/extensions/ExternalData@master] Allow to override resource limits per program in {{#get_program_data:}}

https://gerrit.wikimedia.org/r/721257

Change 721250 merged by jenkins-bot:

[mediawiki/extensions/ExternalData@master] Extract program errors from stdout

https://gerrit.wikimedia.org/r/721250

Change 721257 merged by jenkins-bot:

[mediawiki/extensions/ExternalData@master] Allow to override resource limits per program in {{#get_program_data:}}

https://gerrit.wikimedia.org/r/721257

BTW, are we going to document the new function?

Change 721653 had a related patch set uploaded (by Alex Mashin; author: mashin):

[mediawiki/extensions/ExternalData@master] Fix typo in EDConnectorExe::limits()

https://gerrit.wikimedia.org/r/721653

Change 721653 merged by jenkins-bot:

[mediawiki/extensions/ExternalData@master] Fix typo in EDConnectorExe::limits()

https://gerrit.wikimedia.org/r/721653

@alex-mashin - yes, it will definitely be good to document #get_program_data. :) Once the next version of External Data is released, it will need to be documented, but documentation about it now would be fine as well. Feel free to add documentation yourself, if you are thinking about it...

Change 722086 had a related patch set uploaded (by Alex Mashin; author: mashin):

[mediawiki/extensions/ExternalData@master] Bind program calls to \"dynamic tags\"

https://gerrit.wikimedia.org/r/722086

Change 722086 merged by jenkins-bot:

[mediawiki/extensions/ExternalData@master] Bind program calls to \"dynamic tags\"

https://gerrit.wikimedia.org/r/722086

Change 722676 had a related patch set uploaded (by Alex Mashin; author: mashin):

[mediawiki/extensions/ExternalData@master] Allow setting $edgExeCommand as an array

https://gerrit.wikimedia.org/r/722676

Change 722676 merged by jenkins-bot:

[mediawiki/extensions/ExternalData@master] Allow setting $edgExeCommand as an array

https://gerrit.wikimedia.org/r/722676

Change 722909 had a related patch set uploaded (by Alex Mashin; author: mashin):

[mediawiki/extensions/ExternalData@master] Substiture user-supplied parameters in environment variables

https://gerrit.wikimedia.org/r/722909

Change 722909 merged by jenkins-bot:

[mediawiki/extensions/ExternalData@master] Substitute user-supplied parameters in environment variables

https://gerrit.wikimedia.org/r/722909

Change 723372 had a related patch set uploaded (by Alex Mashin; author: mashin):

[mediawiki/extensions/ExternalData@master] Minor fixes to EDConnectorExe::addSoftware()

https://gerrit.wikimedia.org/r/723372

Change 723372 merged by jenkins-bot:

[mediawiki/extensions/ExternalData@master] Minor fixes to EDConnectorExe::addSoftware()

https://gerrit.wikimedia.org/r/723372