[SF] Special:RunQuery enhancement to be able to populate red-links automatically
Closed, Declined (Public)


Mentioned first here:


The ephemeral nature of the templatized pages produced by Special:RunQuery forms has turned out to have many emergent uses that were not originally intended. Special:RunQuery makes conditionally branching forms possible:


And, if auto-populating of red links on Special:RunQuery pages is enabled, as described here:


it will allow Semantic Forms to create or edit more than one page with a single form:

(For files)

(In general)

It might enable this enhancement too:


One compelling use-case for creating more than 1 page would be when many pages contain identical information, but differ in only one property, such as a serial number. Here's an example of an actual set-up for doing this kind of thing (login with Demo/test):


That will search to see if a page with those serial numbers has already been created. Right now, it's possible to choose in the form to show auto-create links, so users can create the pages by clicking on them 1 at a time. For creating a small number of pages, that works well. But when the number of pages to create runs into the thousands or more, the ability to auto-create the pages could be helpful.

Since such an ability to autocreate enormous numbers of pages could rapidly pollute a wiki, I have been giving some thought to how to handle that.

One idea I've had was to create a new extension that allows for the creation of cryptographic hashes in templates. Then, a password can be stored in hash-form in a template, and the originating string "key" to the hash can be given out as a password to unlock the mass-data-entry feature. The Special:RunQuery ephemeral display template would check to make sure the password produces the required hash to execute the mass-create function.
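As a rough illustration of the hashed-passphrase check described above, here is a minimal sketch in Python (not wikitext or PHP). The function names, the salt, and the example passphrase are all hypothetical; no such extension exists in the thread, and SHA-256 is only one possible choice of hash.

```python
import hashlib
import hmac


def hash_passphrase(passphrase: str, salt: str) -> str:
    """Return the hex SHA-256 digest of the salt followed by the passphrase."""
    return hashlib.sha256((salt + passphrase).encode("utf-8")).hexdigest()


def may_mass_create(supplied: str, stored_hash: str, salt: str) -> bool:
    """True only if the supplied passphrase hashes to the stored value."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(hash_passphrase(supplied, salt), stored_hash)


# Hypothetical values: only the hash (and salt) would be stored, never the passphrase.
STORED_SALT = "example-salt"
STORED_HASH = hash_passphrase("open sesame", STORED_SALT)
```

Only someone who knows the original passphrase can produce the stored hash, which is what would gate the mass-create function in this idea.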

Another idea involved "tagging" each auto-created page with a property that would indicate some distinguishing facts like who created the pages, and on what date. Then, mass deleting the pages would be easier to do.
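The tagging idea can be sketched the same way. This Python sketch assumes each auto-created page carries two hypothetical properties (creator and creation date) and shows how they would make it easy to pick out one batch for deletion. The record type and function name are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class PageRecord:
    title: str
    auto_created_by: str    # hypothetical property: who triggered the auto-create
    auto_created_on: date   # hypothetical property: when it happened


def select_for_cleanup(records: List[PageRecord], user: str, day: date) -> List[str]:
    """Return titles of pages auto-created by the given user on the given day."""
    return [
        r.title
        for r in records
        if r.auto_created_by == user and r.auto_created_on == day
    ]
```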

I'm sure there are other, possibly better ways to restrict, control, and mitigate the effects of the usage of such a powerful mass-create feature from more fundamental code in Semantic Forms (as opposed to within a template).

Version: unspecified
Severity: enhancement
See Also:


Badon created this task. Feb 2 2012, 7:34 AM
Badon added a comment. Feb 2 2012, 8:54 PM

I think like a template programmer.

A better way to handle security is to set permissions for users and groups in LocalSettings.php. That's probably the best place to configure whether mass-auto-created pages have a particular property associated with them, or not. Doing it in LocalSettings.php will prevent intentional circumvention by a malicious user trying to implement a Denial of Service attack by spamming the wiki to death with mass-auto-creates.

The hashed-password idea should probably be done in PHP too, configured in LocalSettings.php, or in some administrator-only Special page, with data stored in the database. It would be trivial for a malicious user to hack out password checking code from a template.

Indeed, such a change could lead to an enormous amount of unwanted pages created, either accidentally or on purpose. If the goal is to create a lot of pages automatically, that can already be done with the Data Transfer extension or a similar tool. I'm setting this to "wontfix".

Badon added a comment. Feb 16 2012, 7:59 PM

With Special:RunQuery, I can control how the data is entered. With the Data Transfer extension, it appears that the process is already entirely automated, with no way to control any automated decision-making, nor any mechanism for human semi-automated decision-making. It appears to me that the fears of "an enormous amount of unwanted pages created" are actually vastly worsened to the point of uncontrolled demolition by substituting the Data Transfer extension in place of this enhancement.

I have not yet used the Data Transfer extension because it is unstable right now, so my impressions of it are based on the documentation. Is what I've written above correct?

Data Transfer, by default, closes off importing to just administrators, and you know in advance exactly which pages are getting created.

I see. Clearly, then, Data Transfer is meant to be used rarely or in limited circumstances, and that use does not overlap at all with the use of Special:RunQuery. Special:RunQuery can be used when it is not known in advance what pages need to be created. It can determine if the page already exists, and any other facts, so that a human decision can be made before proceeding with auto-creation.

Will you reconsider your decision to close this as WONTFIX in favor of Data Transfer?

If so, then LocalSettings.php parameters can be used to allow auto-creation of red links, and set group permissions for that capability. If it is disabled by default, then the wiki administrators can enable it once they are aware of the feature, and the measures needed to avoid its pitfalls.

What's a case where (a) you don't know in advance what pages are needed, and (b) the standard "Creates pages with form" approach won't work?

Badon added a comment. Feb 17 2012, 1:31 AM

It would be used in cases for large quantities of data entry where most or all of the data is identical. Here's an example (login with Demo/test):


That is a list of serial numbers that are supposed to have their own unique "CC1234567890" page. We have discovered instances where there can be as many as several thousand different serial numbers that need to be entered into their own pages, with all other information identical except for the serial number.

Right now, we can use auto-edit links to manually click on them one at a time, and with enough people doing that, it seems to be getting done slowly. However, I have been told that some of the expert hobbyists have put together a few large lists of serial numbers that have already been verified, and are ready for data entry.

Those could be automatically entered very quickly if auto creation of red links worked on Special:RunQuery. I can design that search results template to produce a red link just for that purpose.

If you need any other information, let me know.

If you already have the list of serial numbers, can't you just use Data Transfer?

Badon added a comment. Feb 17 2012, 6:02 AM

No, because the list of serial numbers is only a list of serial numbers. It doesn't tell us anything about whether they have already been entered, nor if the entered data is correct.

With the Special:RunQuery form I showed you, not only can you determine if the serial number exists, you can also verify or refute if other related semantic data matches some additional criteria. If it does not match, it can be brought to the attention of the person doing the data entry (colored red), and they can make a decision about whether the existing data is correct and their data is faulty, or if the existing data is faulty, and their data is correct.

In that case, semi-automatic data entry is required, with a real person deciding what should be done, if anything.
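To make the check-before-entry step concrete, here is a minimal Python sketch of the decision the form is described as presenting. It assumes the existing wiki data can be looked up by serial number. The function name, the dictionary layout, and the three outcome labels are hypothetical, not part of Semantic Forms.

```python
from typing import Dict, List, Tuple


def classify_entries(
    existing: Dict[str, dict],
    incoming: List[Tuple[str, dict]],
) -> Dict[str, List[str]]:
    """Sort incoming serial numbers into create / match / conflict buckets."""
    result: Dict[str, List[str]] = {"create": [], "match": [], "conflict": []}
    for serial, data in incoming:
        if serial not in existing:
            result["create"].append(serial)      # no page yet: candidate for creation
        elif existing[serial] == data:
            result["match"].append(serial)       # page exists and agrees: nothing to do
        else:
            result["conflict"].append(serial)    # page exists but differs: needs a human
    return result
```

Only the "conflict" bucket needs a person to decide which data is faulty. The "create" bucket is the part that could be entered automatically.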

Special:RunQuery is extremely flexible in the goals it can achieve. I'm sure there are innumerable other cases that I cannot think of that can benefit from sophisticated usage of Special:RunQuery, including the ability to automatically or semi-automatically (to any degree) verify, create, and change wiki pages.

One very powerful thing Special:RunQuery can do is grant the ability to share the results. For large quantities of data that need human scrutiny, a single administrator (or even hundreds of them) cannot possibly be expected to handle it with Data Transfer. With Special:RunQuery, links to "to do" lists can be placed at prominent locations in the wiki so the users can chip away at it one small task at a time.

And the best part? Nobody needs to know anything technical at all about how MediaWiki, wiki code, SMW, and SF work. I have done some informal experiments with people who have only rudimentary computer literacy, and minimal English comprehension. All of them are able to make decisions about whether to click an auto-edit link, or not, based on a simple presentation of info from Special:RunQuery.

For example, it takes decades for an expert to gain the skills and knowledge necessary to identify previously unknown types of coins. With SMW and Special:RunQuery, I can simply present images for comparison to users, and the user can essentially click "yes" or "no", depending on whether they match or not.

In the first casual test I did with this, 2 previously unknown coin types were discovered almost immediately based on an expert-curated collection of serial numbers that were donated for testing purposes. All of the world's experts did not find these during the last 20 years, not even the expert who assembled and individually hand-selected each serial number in the data. Lupo's ImageAnnotator combined with SMW and Semantic Forms' Special:RunQuery are what made that discovery so effortless.

Imagine what something like this could do for biology, astronomy, and on, and on! For the first time ever, uneducated amateurs could discover and identify new species faster than they're going extinct, simply using something as mundane as random people's vacation photographs. The collection of images, video, and audio at Wikimedia Commons might get more interesting if it were set up to be explored in an uncountable variety of ways by the many curious amateur researchers out there. I wonder what the Library of Congress has that nobody has taken a close look at before?

Yaron, Special:RunQuery is as much of a panacea as I have ever personally experienced in any field or domain, in my lifetime. I'm thrilled to find it in semantic ontology, where it might make the notion of an "information age" seem passé, at a whole new level. Maybe this is hyperbole, but what if it isn't? It would be a shame if the full capabilities of Special:RunQuery were not brought to maturity.

I hope you agree.

Hi, that's nice to hear, though I still don't understand what you're talking about. As you note, you can also use Special:RunQuery to find pages, and to create them via #autoedit. So what's missing?

Badon added a comment. Feb 17 2012, 8:32 PM

"#autoedit" is a bit of a misnomer. It is actually for semi-automated editing, since it requires the user to click a link for each action performed. So far that has worked adequately in some cases where not much more than 100 clicks are required per person. Even in that case, the time spent clicking around could be better spent moving on to the next task, if it were possible to do fully automatic page creation and page edits.

Right now, fully automated page creation can be done by populating red links automatically - but that capability is missing from Special:RunQuery. Also, there is no capability for fully automated editing of existing pages in Special:RunQuery. So, in short, these are the missing features in Special:RunQuery:

  • Fully automated page creation.
  • Fully automated page editing.

Semi-automated page creation and editing can both be done in Special:RunQuery with #autoedit. Fully automated creation and editing are not possible, because:

  • Fully automated page creation cannot be done by populating red links using the existing functionality of "Creates pages with form".
  • Fully automated page editing cannot be done with #autoedit.

There are a few different ways to add the missing functionality. Here is the 1st way, by building on the existing "Creates pages with form" and the #autoedit feature:

  • Enable Special:RunQuery to be able to fully automatically populate red links using the existing functionality of "Creates pages with form". This, by nature, will not occur more than once even if the page is reloaded.
  • Add a fully automated page editing feature to #autoedit, perhaps with some internal ID or timestamp to ensure a page reload will not repeat the page edit.

Here is the 2nd way, by building on the existing #autoedit feature:

  • Add a fully automated page creation feature to #autoedit, perhaps with an #ifexist check to stop page creation if a page refresh might cause it to repeat the page creation.
  • Add a fully automated page editing feature to #autoedit, perhaps with some internal ID or timestamp to ensure a page refresh will not repeat the page edit.

Here is the 3rd way, by creating a new #fullautoedit feature:

  • #fullautoedit will create pages if they do not exist.
  • #fullautoedit will edit pages if they exist.

Is that a better description of what is missing?
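The reload-safety idea that runs through the three options above (an internal ID so a page refresh does not repeat an action) can be sketched in Python. This is an illustration of the concept only: `FakeWiki`, `full_auto_edit`, and the action IDs are hypothetical, and nothing here reflects how #autoedit is actually implemented.

```python
from typing import Dict, Set


class FakeWiki:
    """In-memory stand-in for a wiki, used only to illustrate the idea."""

    def __init__(self) -> None:
        self.pages: Dict[str, str] = {}
        self.applied_actions: Set[str] = set()

    def full_auto_edit(self, action_id: str, title: str, text: str) -> str:
        """Create the page if missing, edit it if present, at most once per action ID."""
        if action_id in self.applied_actions:
            return "skipped"  # a reload re-sent the same action: do nothing
        self.applied_actions.add(action_id)
        outcome = "edited" if title in self.pages else "created"
        self.pages[title] = text
        return outcome
```

Submitting the same action ID twice leaves the wiki unchanged the second time, while a new action ID against an existing title is treated as an edit.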

I still don't understand - if you want "fully-automated" page creation or editing, as you put it, you can just use Data Transfer.

Badon added a comment. Feb 19 2012, 8:01 PM

No, that isn't right. Special:RunQuery and Data Transfer are not equivalent. Here is a simpler breakdown:

  • Special:RunQuery can be safely used by anyone............Data Transfer can't.
  • Special:RunQuery can dynamically generate data...........Data Transfer can't.
  • Special:RunQuery can dynamically manipulate data.........Data Transfer can't.
  • Special:RunQuery can dynamically verify data.............Data Transfer can't.
  • Special:RunQuery can do anything that....................Data Transfer can't.

I know Special:RunQuery pretty well, but I'm not 100% sure about Data Transfer, since I haven't used it. Maybe you're planning additional features for Data Transfer to cover these shortcomings, but haven't mentioned that yet? If you could be more specific about what you don't understand, I can tinker around and maybe set up a demo or something to illustrate.

What do you think about the possibility of adding the fully automatic page creation and editing feature to Special:RunQuery red links, versus adding the feature to #autoedit (or a new #fullautoedit)?

Maybe it would help if you gave a useful example. The one example I can see, of people assembling a list of serial numbers that should have their own page, can just be solved with a batch process - there's nothing "dynamic" about it.

If any decisions need to be made about the data to be entered, then it needs to be done dynamically, in response to user input. When that process is complete, then the data may be entered automatically by Special:RunQuery.

The one example you can see of assembling a list of serial numbers that should have their own page, cannot be solved with a batch process if any of the following are true:

  • The user is not an administrator.
  • Dynamically generated or manipulated data will be stored in addition to, or in place of, the raw data entered, such as timestamps, hashes, counts, related data states (the weather on a particular day), calculations, etc., at the time of entry.
  • Dynamically manipulated data is entered, such as when the user is asked to compare their entered data with pre-existing data, so Special:RunQuery can trim duplicate data, and add new data, from any unsorted source.
  • The data must be checked automatically or semi-automatically for accuracy before it is entered, by comparison with something else (microformats, for example).
  • The page that the serial number is to go on has an automatically generated name based on data already in the wiki.
  • Additional information needs to be entered, that is automatically determined and prompted-for from the user during the data entry process.

That list isn't exhaustive; I hope that is obvious. Special:RunQuery is very different from Data Transfer. If this information isn't enough to lead to envisioning an unlimited number of other examples, I'm sure we could ask for further ideas on the mailing list.

Special:RunQuery can do anything Data Transfer can do, except automatically enter the data (for now), and it can do an unlimited amount of other things that Data Transfer cannot do.

Here's another example:


MIT is collecting prices. If a data entry operator first enters into a Special:RunQuery form a list of regional prices for a commodity at various points in time, and also enters in another list of regional commodity supply quantities available, then Special:RunQuery can use those 2 correlated sources of information to automatically calculate the total value of available commodities at the time of entry.

That information is added to the submitted data, and also used to track not only prices, but also to calculate a data model representative of supply and demand in relation to price, that can be stored with each data point when the Special:RunQuery form is submitted.

Each data point would have its own dynamically generated, and automatically assigned wiki page name, which is then available for querying to produce more sophisticated representations of the data.
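The arithmetic in this example is simple enough to sketch. The Python below assumes prices are entered per region in cents and supply in whole units. The function names and the page-naming scheme are hypothetical illustrations, not anything Special:RunQuery provides.

```python
from typing import Dict


def total_value_by_region(
    prices_cents: Dict[str, int],
    supply_units: Dict[str, int],
) -> Dict[str, int]:
    """Multiply price by available supply for every region present in both lists."""
    return {
        region: prices_cents[region] * supply_units[region]
        for region in prices_cents
        if region in supply_units
    }


def data_point_page_name(commodity: str, region: str, timestamp: str) -> str:
    """Hypothetical naming scheme for the wiki page that stores one data point."""
    return f"Price/{commodity}/{region}/{timestamp}"


# Example input as a data entry operator might supply it.
prices = {"North": 250, "South": 300}
supply = {"North": 40, "South": 10, "East": 5}
values = total_value_by_region(prices, supply)
grand_total = sum(values.values())
```

Regions that appear in only one of the two lists (here "East") are skipped, which is one reasonable choice. A real form might instead flag them for the operator.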

Data Transfer can't do that. Nothing but Special:RunQuery can do that. Data Transfer could be used to move pre-existing data from some other system into SMW, but it's not a realistic way to actually use SMW for anything requiring data entry (which is everything that works well with semantic data systems).

Once this enhancement is added, it will become easier to set up entry-points for Special:RunQuery forms that are preloaded with data based on the entry point (start in category "fruits" to add another fruit):


Well, the fact that anyone, not just an administrator, could create large sets of pages using Special:RunQuery is one of the reasons why I think it's a bad idea. I still think it would be helpful to hear an actual use case - your mention of ideas like storing the weather, while logically valid, seem far-fetched - I've never heard of someone wanting to store temporal information, like the current weather (other than the time itself) in a wiki.

Badon added a comment. Feb 25 2012, 2:22 AM

My examples are pretty varied; there's no need to fixate on one detail about the weather. But, if you want an example of why someone would want to store weather data, just look at amateur radio operators collecting data for meteorologists:



And here's an example of some raw data collected by one of the amateur radio hobbyists (NOAA):


Putting all that data in one place to be accessible by anyone would be made much easier with Special:RunQuery if it had the ability to autocreate pages.

About restricting usage of the proposed auto-create feature:

Such things have never been an intractable problem for MediaWiki before, and there's no reason why they should be for Semantic Forms and Special:RunQuery. Configuration control can be given to administrators so they can ensure Special:RunQuery auto-create isn't abused, and if it is (even by accident), easily delete the pages produced with it. I'm sure you are more able to formulate a strategy for solving that issue than I am, but here are a few ideas to get started with:

  • Obviously, the auto-create feature should be disabled by default.
  • $wgGroupPermissions['SF_SRQautocreate']['autocreate'] = true;
  • $sfgSRQAutoCreate['Form:Some_Form']['Template:Some_template'] = true;
  • $sfgSRQAutoCreateConfirmPassphrase['SomeUser']['Form:Some_Form'] = 'SomePass';

More info: http://www.mediawiki.org/wiki/Manual:User_rights_management

As a form and template author, I would probably display a random word and ask the user to type it in to confirm that they really understand what they're doing, after showing a summary of actions that will be executed.

For a specific non-speculative example of my own use case where this feature might be helpful, see comment 1 of this report. In that case, data is checked against existing data, and the template makes decisions about what can be done before offering the option to make changes or additions to existing data, versus creating new data.

Hopefully this is enough, but if you need anything else, let me know.

Well, thanks for clarifying that some people might want to store the current weather - my point was that you wouldn't want to do it in a wiki. Anyway, if you created a small example with wireframes and the like, that might be helpful to your case.

Badon added a comment. Mar 13 2012, 1:45 AM

(In reply to comment #16)

> Well, the fact that anyone, not just an administrator, could create large sets
> of pages using Special:RunQuery is one of the reasons why I think it's a bad
> idea. I still think it would be helpful to hear an actual use case - your
> mention of ideas like storing the weather, while logically valid, seem
> far-fetched - I've never heard of someone wanting to store temporal
> information, like the current weather (other than the time itself) in a wiki.

(In reply to comment #18)

> Well, thanks for clarifying that some people might want to store the current
> weather - my point was that you wouldn't want to do it in a wiki. Anyway, if
> you created a small example with wireframes and the like, that might be helpful
> to your case.

About "temporal" information like the weather - I think you meant to say that nobody would want to store "temporary" or "ephemeral" information in a wiki. That's incorrect. Here are a few examples of ephemeral information already being stored in a wiki:

For reference, here's the definition of "ephemeral":


As far as examples go, I've already given you one that's actively in use, as well as many hypothetical examples. I'm not sure what more you're looking for in examples. I have given a lot of additional thought to this, and I've realized that being able to create multiple pages automatically resolves all the requests for the ability to create more than one page with a single form. Here's one of them, for batch file uploads:


You're already aware of that one, and others. That could be handled easily and safely by restricting the maximum number of pages that can be auto-created with one form submission to some configurable small number. So, the feature would need another configuration parameter, "$sfgSRQAutoCreateMaxPages". Here are the others, too, all together:

  • Auto-create disabled by default.
  • $wgGroupPermissions['SF_SRQautocreate']['autocreate'] = true;
  • $sfgSRQAutoCreate['Form:Some_Form']['Template:Some_template'] = true;
  • $sfgSRQAutoCreateConfirmPassphrase['SomeUser']['Form:Some_Form'] = 'SomePass';
  • $sfgSRQAutoCreateMaxPages = 5;
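How these proposed settings might combine into a single gate can be sketched in Python. This models the proposal only: none of these parameters exist in Semantic Forms, the class and function names are hypothetical, the passphrase step is left out, and rejecting an oversized submission outright (rather than truncating it) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class AutoCreateConfig:
    # Empty sets mean the feature is effectively disabled by default.
    allowed_groups: Set[str] = field(default_factory=set)
    enabled_forms: Set[str] = field(default_factory=set)
    max_pages: int = 5


def check_auto_create(
    config: AutoCreateConfig,
    user_groups: Set[str],
    form: str,
    titles: List[str],
) -> Tuple[bool, str]:
    """Decide whether one form submission may auto-create the requested pages."""
    if not (config.allowed_groups & user_groups):
        return False, "user is not in a group with the auto-create right"
    if form not in config.enabled_forms:
        return False, "auto-create is not enabled for this form"
    if len(titles) > config.max_pages:
        return False, "submission exceeds the per-form page limit"
    return True, "ok"
```

With a default `AutoCreateConfig()`, every request is refused, matching the "disabled by default" item at the top of the list.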

In my own application, there are 3 types of pages that need to be created for each kind of collector coin entered into the site: the type, the specimen, and a sighting for the coin (where it was seen out in the world - usually eBay). Being able to create more than one page at a time allows a much more complex guided "wizard" style data entry process that finishes with creating whatever pages need to be created, if they do not already exist or do not have the same data.

I'm already using Special:RunQuery for a very simple guided "wizard" style data entry process, but it's limited to creating only one page. After that, the users must figure out on their own what the next step is.

I have no doubt this enhancement request would be helpful to have available. But, if in your judgment it doesn't fit into the scope of your plan for Special:RunQuery, I'll just have to trust your leadership on that. At the very least, we have fleshed out some specifics of how such a thing should work, even if it doesn't end up being a part of Special:RunQuery. That's an important first step, so for that much alone, I am grateful that we had this discussion.

I of course was talking about information that's only accurate at the moment of saving.

Badon added a comment. Mar 13 2012, 2:16 AM

I understood that. That type of information is probably more the rule, rather than the exception. The only variance is in the length of time the information remains accurate. Sometimes it's seconds, sometimes it's centuries. It is straightforward to handle them all in similar ways.

Another example is how MediaWiki handles edit conflicts - the fact of being in conflict or not depends on ephemeral information.

Of course, but that information isn't stored.

Badon added a comment. Mar 13 2012, 3:09 AM

But, you understand that KIND of information could be stored, as in the examples I've mentioned, right?

Badon added a comment. Mar 13 2012, 3:10 AM

I suppose it's irrelevant whether it gets stored or not. It determines whether other data is stored or not.
