CVE-2026-0668: VisualData extension: Regular Expression Denial of Service (ReDoS) via crafted user input
Closed, Resolved · Public · Security

Description

This regex is used to process user-provided data in SubmitForm's replaceFormula: /<\s*([^<>]+)\s*>/. It is vulnerable to a Regular Expression Denial-of-Service (ReDoS) attack because its worst-case matching time grows as a third-degree polynomial of the input length.
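For illustration only, here is a minimal sketch of how `<...>` spans could be extracted in a single pass without any regex. It is written in Python rather than the extension's PHP, the function name `extract_placeholders` is hypothetical, and it is not the fix that was actually committed. Unlike the regex capture, it trims the result and skips spans that contain only whitespace.

```python
def extract_placeholders(text):
    """Return the trimmed contents of every <...> span, scanning the input once."""
    found = []
    start = None  # index of the most recent unmatched '<'
    for i, ch in enumerate(text):
        if ch == "<":
            # A new '<' restarts the span, mirroring [^<>] which forbids nesting.
            start = i
        elif ch == ">" and start is not None:
            inner = text[start + 1:i].strip()
            if inner:
                found.append(inner)
            start = None
    return found


print(extract_placeholders("total = <price> * < qty >"))  # ['price', 'qty']
```

Each character is visited once and the extracted spans do not overlap, so the running time stays linear even for an input such as a single `<` followed by thousands of spaces.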

In addition, /\s*,\s*/, which has second-degree polynomial complexity, is used in these files:

The query processor also uses many inefficient regexes:

More inefficient regexes:

  • /^(\d+)(.+)?$/ (2nd degree), used in the Carousel result printer. Is this actually an issue, @Thomas-topway-it?
  • /^\*\s*([^\*].*)$/ (2nd degree), used by the schema processor
  • /^preload-data(\?(.+?))?=(.+)$/ (2nd degree), used by the parser function
  • /\s*\|\+\s*/ (2nd degree), used by the parser function
  • Also this 'separator' regex

Affiliation: Miraheze/WikiTide Foundation security reviewer
(Branched off from T385935 per @sbassett's instruction)

Details

Risk Rating
Low
Author Affiliation
Other (Please specify in description)

Event Timeline

Restricted Application added a subscriber: Aklapper.
Redmin renamed this task from Evil regex used to process user-provided data in VisualData to Evil regexes are used to process user-provided data in VisualData. Feb 21 2025, 12:24 PM
Redmin updated the task description.

@R4356th in the meantime I've fixed the remaining regex https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/VisualData/+/2b7094da7bba630fb142690379fa7049b8f1edb5%5E%21/

Regarding the other problematic regexes, could you also post proposals or suggestions for each of them?
Thanks a lot.

Redmin renamed this task from Evil regexes are used to process user-provided data in VisualData to Evil regexes are used to process user-provided input in VisualData. Feb 24 2025, 2:52 PM
Redmin updated the task description.

What if we replace /\s*,\s*/ with its literal text equivalent so a regex isn't being used at all?

In T387008#10579704, @R4356th wrote:

What if we replace /\s*,\s*/ with its literal text equivalent so a regex isn't being used at all?

If that's possible (i.e. the whitespace characters are both known and discrete) then that would absolutely be preferred over wildcard operators.
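A minimal sketch of that idea, in Python for illustration (the extension itself is PHP, and the helper name `split_on_commas` is hypothetical): split on the literal comma, then trim each piece. One difference to be aware of is that splitting with /\s*,\s*/ leaves leading whitespace on the first item and trailing whitespace on the last, while this version trims every item.

```python
def split_on_commas(value):
    """Split on literal commas and trim each part; no regex involved."""
    # str.split and str.strip each make a single pass over their input,
    # so the total work is linear in the length of the value.
    return [part.strip() for part in value.split(",")]


print(split_on_commas("a , b,c ,  d"))  # ['a', 'b', 'c', 'd']
```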

mmartorana changed the task status from Open to In Progress. Feb 26 2025, 4:38 PM
mmartorana triaged this task as Low priority.
mmartorana changed Risk Rating from N/A to Low.
sbassett assigned this task to Thomas-topway-it.
sbassett edited projects, added SecTeam-Processed; removed Security-Team.

Resolving for now. Any follow-up/improvement patches should have their own bug.

Apologies for the late response but I think you closed the wrong task. T385935 has been addressed, not this one.

Ah, ok, I guess I was confused by x-posting of commits, etc.

/^\s*(.+?)\s*=\s*(.+?)\s*$/ could be replaced by /^\s*+([^=]++)\s*+=\s*+(.++)\s*+$/ perhaps, and this could be repeated for all the other regexes that follow the same pattern.
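As a point of comparison, the same `key = value` parsing can be sketched without a regex at all. This is an illustrative Python version with a hypothetical function name, not code from the extension. It also sidesteps one detail of the possessive pattern: as far as I can tell, `[^=]++` and `.++` keep any trailing whitespace inside the captured groups, so the captures would still need trimming.

```python
def parse_assignment(line):
    """Parse 'key = value' in linear time; return (key, value) or None."""
    # partition splits at the first '=' only, so later '=' stay in the value.
    key, sep, value = line.partition("=")
    key, value = key.strip(), value.strip()
    # Assumption: an empty key or an empty value is treated as "no match".
    if not sep or not key or not value:
        return None
    return key, value


print(parse_assignment("  title = Hello = World  "))  # ('title', 'Hello = World')
```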

In T387008#10579704, @R4356th wrote:

What if we replace /\s*,\s*/ with its literal text equivalent so a regex isn't being used at all?

If that's possible (i.e. the whitespace characters are both known and discrete) then that would absolutely be preferred over wildcard operators.

Let's do this then, @Thomas-topway-it.

/^\s*(.+?)\s*(ASC|DESC)?\s*$/i could be replaced by /^(?:\s*+(.++)\s++(ASC|DESC)\s*+|\s*+(.++)\s*+)$/i

@R4356th thank you for your suggestions. I was working on some major patches listed here, but I plan to get back to this soon. (I need to understand the proposals and test them.)

Hi @R4356th, I think I can replace this
/^\s*(.+?)\s*=\s*(.+?)\s*$/ with this /^\s*+([^=]++)\s*+=\s*+(.++)\s*+$/ and this /^\s*(.+?)\s*(ASC|DESC)?\s*$/i with this /^(?:\s*+(.++)\s++(ASC|DESC)\s*+|\s*+(.++)\s*+)$/i. However, what about the other regexes mentioned in the task description?
I'm also available on matrix-element if you want to go through all of that interactively.

Another solution for this type of problem is to just limit the max size of the field, for fields that are expected to be short.
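A minimal sketch of that approach, assuming a cap of 255 characters (the constant and the helper name `safe_match` are hypothetical, and the example is in Python rather than the extension's PHP): reject oversized input before the regex engine ever sees it, so even a quadratic pattern only runs on short strings.

```python
import re

MAX_FIELD_LENGTH = 255  # assumed limit for fields that are expected to be short


def safe_match(pattern, subject, max_length=MAX_FIELD_LENGTH):
    """Run re.match only when the subject is within the length cap."""
    if len(subject) > max_length:
        return None  # treat oversized input as a non-match
    return re.match(pattern, subject)


match = safe_match(r"^(\d+)(.+)?$", "12px")
print(match.groups())  # ('12', 'px')
```

Whether an oversized value should be rejected silently, truncated, or reported as an error is a separate design decision that this sketch does not settle.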

Very sorry for the late response. @Thomas-topway-it, I think the ones you suggest (and some others I suggested above) would work and should be used, because efficient regexes are just better. For the rest, though, I think going with Bawolff's suggestion would be the best idea, considering the amount of work needed to fix them all, unless you expect massive inputs?

The issue with Brian's suggestion is that the json-schema specification https://json-schema.org/draft/2020-12/json-schema-validation does not impose a maximum length on property names. On the other hand, we are using

`path` TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,

in the props table, which sets a max limit of 65,535 bytes for the complete path (a json path can be composed of more than one property).

I think a good practice would be to set a property limit of VARCHAR(255), and to truncate or throw an error when the user imports a schema that exceeds that limit.
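The "throw an error on import" variant could look roughly like the sketch below. It is illustrative Python, not the extension's PHP; the function name, the constant, and the assumption that only the `properties` keyword needs to be walked are all hypothetical.

```python
MAX_PROPERTY_NAME_LENGTH = 255  # assumed to mirror a VARCHAR(255) column


def check_property_names(schema, path=""):
    """Raise ValueError if any property name in the schema exceeds the limit."""
    for name, subschema in schema.get("properties", {}).items():
        if len(name) > MAX_PROPERTY_NAME_LENGTH:
            raise ValueError(
                "property name under '%s' is %d characters long (max %d)"
                % (path or "/", len(name), MAX_PROPERTY_NAME_LENGTH)
            )
        # Nested objects carry their own "properties", so recurse into them.
        if isinstance(subschema, dict):
            check_property_names(subschema, path + "/" + name)


check_property_names({"properties": {"title": {"type": "string"}}})  # passes silently
```

Note that `len()` here counts characters, not bytes, whereas the 65,535 limit mentioned above for the TEXT column is expressed in bytes.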

If you can post the complete set of regexes taking into account such constraint I could work on the remaining aspects.

Another problem is that in several affected regexes the subject (between angle brackets) is a json path (to be precise a "printout"; I need to update the names) rather than a single property.

I think a good practice would be to set a property limit of VARCHAR(255), and to truncate or throw an error when the user imports a schema that exceeds that limit.

I am surprised that many characters could be needed but yeah, that should be more than safe.

If you can post the complete set of regexes taking into account such constraint I could work on the remaining aspects.

Assuming the constraint is a limit of VARCHAR(255), this set of regexes needs to be fixed because they are 5th degree polynomials. The 4th degree polynomial /^\s*(.+?)\s*(ASC|DESC)?\s*$/i should not cause trouble given the constraint.

@Redmin I'm sorry for the delay, I have been working on some other features like the calendar format and the upgrade to MW 1.44 (now merged). However I think I've found a good solution to fix this problem
https://tour.json-schema.org/content/03-Objects/04-Applying-Schema-to-Property-Names

I was not convinced about setting arbitrary rules for property names, but with a propertyNames pattern this makes sense.
So I've identified the way to fix this and plan to implement it soon, hopefully this week.

Okay, thanks for letting me know but please do keep in mind the presence of 5th degree polynomial complexity regexes.

@Redmin I've removed all preg_split calls and replaced them with a linear function

https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/VisualData/+/c370cb1c536dddab43f9f873996032696bfb937d%5E%21/

I've also fixed

/^(\d+)(.+)?$/ (Carousel format)
/^\*\s*([^\*].*)$/ (Schema processor)

https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/VisualData/+/5fc916b840572f45290c1e5e62da8529a6d866a8%5E%21/

I could also replace

/^preload-data(\?(.+?))?=(.+)$/

with

/^preload-data(?:\?([^=]*))?=(.+)$/
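A quick way to compare the two patterns is sketched below using Python's `re` module, which accepts both as written (PCRE behaviour should be the same for these constructs, but that is an assumption worth confirming in PHP). The sample inputs are made up. The main thing to notice is that the capture group numbering changes, so any code reading the match array would need updating.

```python
import re

OLD = re.compile(r"^preload-data(\?(.+?))?=(.+)$")
NEW = re.compile(r"^preload-data(?:\?([^=]*))?=(.+)$")

sample = "preload-data?Schema=Page"  # hypothetical parser-function argument

# Old pattern: the schema is group 2 and the value is group 3.
print(OLD.match(sample).group(2, 3))  # ('Schema', 'Page')

# New pattern: the outer group is non-capturing, so they become groups 1 and 2.
print(NEW.match(sample).groups())  # ('Schema', 'Page')

# Behavioural difference: [^=]* may match nothing, so an empty name after
# '?' is accepted by the new pattern but rejected by the old one.
print(OLD.match("preload-data?=Page"))  # None
print(NEW.match("preload-data?=Page").groups())  # ('', 'Page')
```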

regarding these https://github.com/wikimedia/mediawiki-extensions-VisualData/blob/e969643ad44391e08403f37c959c34fc8e8930f7/includes/classes/QueryProcessor.php#L558C3-L565C5
as far as I understand they are linear. Do you have any reference showing the contrary?

regarding

/^\s*(.+?)\s*(ASC|DESC)?\s*$/i

could I use

/^\s*+(.++)\s*+(ASC|DESC)?\s*+$/i

instead of

/^(?:\s*+(.++)\s++(ASC|DESC)\s*+|\s*+(.++)\s*+)$/i
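One caveat worth testing with either pattern: a possessive `.++` consumes the rest of the line and never gives characters back, so on single-line input the `(ASC|DESC)` group that follows it may never capture, and the direction keyword would end up inside the field capture. A regex-free alternative can be sketched as follows. This is illustrative Python with a hypothetical function name, and unlike the original pattern it assumes the keyword is separated from the field by whitespace.

```python
def parse_sort_clause(clause):
    """Split 'field [ASC|DESC]' into (field, direction) in linear time."""
    text = clause.strip()
    if not text:
        return None
    # rsplit with maxsplit=1 inspects only the last whitespace-separated token.
    parts = text.rsplit(None, 1)
    if len(parts) == 2 and parts[1].upper() in ("ASC", "DESC"):
        return parts[0].rstrip(), parts[1].upper()
    return text, None


print(parse_sort_clause("  first name  desc "))  # ('first name', 'DESC')
print(parse_sort_clause("title"))  # ('title', None)
```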

Also, my previous comment is not relevant since /<\s*([^<>]+)\s*>/ (to parse properties) was already fixed by https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/VisualData/+/2b7094da7bba630fb142690379fa7049b8f1edb5%5E%21/

We're now tracking these issues for the next supplemental security release: T404620.

This should now be fixed, so perhaps this issue can be closed?

If you feel the issues related to this task are completely resolved, then we should resolve the task.

mmartorana renamed this task from Evil regexes are used to process user-provided input in VisualData to CVE-2026-0668: VisualData extension: Regular Expression Denial of Service (ReDoS) via crafted user input. Jan 7 2026, 5:37 PM
mmartorana changed the visibility from "Custom Policy" to "Public (No Login Required)". Jan 9 2026, 2:48 PM
mmartorana changed the edit policy from "Custom Policy" to "All Users".

Change #1224136 abandoned by Mmartorana:

[mediawiki/extensions/VisualData@REL1_45] fix T387008

https://gerrit.wikimedia.org/r/1224136