
Extra tab is prepended to quoted fields in TSV output format
Closed, Resolved, Public

Description

For example, when I download the results of this query: https://quarry.wmflabs.org/query/36378

I get:

"	rev_id"	"	rev_timestamp"	"	page_id"	"	page_title"	"	user_id"	"	actor_name"	"	user_registration"	"	ug_group"	"	archived"
3051580	"	20160917041145"	152080	"	$"	36077	"	Koavf"	"	20121111063553"		0
3151740	"	20170217015742"	154584	"	'''Swiss_German'''"	74811	"	Andrewssi2"

Note that a tab appears at the beginning of every quoted value. This whitespace should not be there.

I would expect something that looks like this:

"rev_id"	"rev_timestamp"	"page_id"	"page_title"	"user_id"	"actor_name"	"user_registration"	"ug_group"	"archived"
3051580	"20160917041145"	152080	"$"	36077	"Koavf"	"20121111063553"		0
3151740	"20170217015742"	154584	"'''Swiss_German'''"	74811	"Andrewssi2"
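The expected rows above quote string values and leave numbers bare, which is what Python's `csv` module produces with `quoting=csv.QUOTE_NONNUMERIC` and a tab delimiter. The snippet below is a minimal sketch under that assumption, not Quarry's actual writer; `to_tsv` is a hypothetical helper, and the sample row is the first row from the report with the `ug_group` and `archived` columns left out.

```python
import csv
import io


def to_tsv(header, rows):
    """Serialize rows as TSV, quoting every non-numeric field (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter="\t",                  # tab-separated output
        quoting=csv.QUOTE_NONNUMERIC,    # quote strings, leave ints/floats bare
        lineterminator="\n",
    )
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


header = ["rev_id", "rev_timestamp", "page_id", "page_title",
          "user_id", "actor_name", "user_registration"]
rows = [[3051580, "20160917041145", 152080, "$", 36077, "Koavf", "20121111063553"]]
print(to_tsv(header, rows), end="")
```

No escaping step runs here, so no tab is added inside the quotes.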

Event Timeline

Halfak renamed this task from Extra whitespace is added to quoted fields in TSV output format to Extra tab is prepended to quoted fields in TSV output format. May 24 2019, 3:57 PM
Halfak updated the task description.

Looks like this affects the CSV writer too.

This comes from the escaping added for T209226: Quarry can be affected by CSV Injection. The escaping is not supposed to affect every line. I'm looking into it.
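A CSV-injection escape is meant to prefix only values that a spreadsheet could interpret as a formula; judging from the buggy output, Quarry's prefix is a tab. The sketch below shows that intended behaviour under stated assumptions: the trigger set `("=", "+", "-", "@")` is taken from common CSV-injection guidance and may not match Quarry's list, and `escape_cell` and `to_tsv` are hypothetical names, not the code in `_inner_csv_injection_escape`.

```python
import csv
import io

# Assumed formula-trigger characters; Quarry's actual set may differ.
FORMULA_TRIGGERS = ("=", "+", "-", "@")


def escape_cell(value):
    """Prefix a tab only to strings that start with a formula trigger (hypothetical)."""
    if isinstance(value, str) and value.startswith(FORMULA_TRIGGERS):
        return "\t" + value
    # Numbers, None and ordinary strings pass through unchanged.
    return value


def to_tsv(rows):
    """Write escaped rows as TSV, quoting non-numeric fields (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t",
                        quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for row in rows:
        writer.writerow([escape_cell(v) for v in row])
    return buf.getvalue()


print(to_tsv([[3051580, "Koavf", "=1+1"]]), end="")
```

Only the `=1+1` cell gains a leading tab inside its quotes; `"Koavf"` is written unchanged, which is the behaviour the report expects for ordinary values.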

Change 512420 had a related patch set uploaded (by Zhuyifei1999; owner: Zhuyifei1999):
[analytics/quarry/web@master] output.py: Fix logic error in _inner_csv_injection_escape

https://gerrit.wikimedia.org/r/512420

Change 512420 merged by jenkins-bot:
[analytics/quarry/web@master] output.py: Fix logic error in _inner_csv_injection_escape

https://gerrit.wikimedia.org/r/512420

Mentioned in SAL (#wikimedia-cloud) [2019-05-25T12:22:00Z] <wm-bot> framawiki: Deployed cc0c0a7 on -web-01 T224300

Framawiki assigned this task to zhuyifei1999.
Framawiki added a subscriber: Framawiki.

Thanks for the report.