
Extra tab is prepended to quoted fields in TSV output format
Closed, Resolved · Public

Description

For example, when I download the results of this query: https://quarry.wmflabs.org/query/36378

I get:

"	rev_id"	"	rev_timestamp"	"	page_id"	"	page_title"	"	user_id"	"	actor_name"	"	user_registration"	"	ug_group"	"	archived"
3051580	"	20160917041145"	152080	"	$"	36077	"	Koavf"	"	20121111063553"		0
3151740	"	20170217015742"	154584	"	'''Swiss_German'''"	74811	"	Andrewssi2"

Note that a tab appears at the beginning of every quoted value. This whitespace should not be there.

I would expect something that looks like this:

"rev_id"	"rev_timestamp"	"page_id"	"page_title"	"user_id"	"actor_name"	"user_registration"	"ug_group"	"archived"
3051580	"20160917041145"	152080	"$"	36077	"Koavf"	"20121111063553"		0
3151740	"20170217015742"	154584	"'''Swiss_German'''"	74811	"Andrewssi2"
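To spot this symptom in a downloaded file, one could parse the TSV and list every field whose value starts with a tab. The sketch below is a minimal check under that assumption. The helper name `tab_prefixed_fields` is hypothetical and is not part of Quarry. It uses only the standard-library `csv` module.

```python
import csv
import io


def tab_prefixed_fields(tsv_text):
    """Return (row, column, value) for every parsed field that starts with a tab.

    Hypothetical helper for illustration, not part of Quarry.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for r, row in enumerate(reader):
        for c, field in enumerate(row):
            # A quoted field like "\trev_id" is parsed back as '\trev_id'.
            if field.startswith("\t"):
                hits.append((r, c, field))
    return hits


broken = '"\trev_id"\t"\tpage_title"\n3051580\t"\t$"\n'
expected = '"rev_id"\t"page_title"\n3051580\t"$"\n'
print(tab_prefixed_fields(broken))    # three hits
print(tab_prefixed_fields(expected))  # []
```

On output like the first sample above, every quoted header and string value is reported. On the expected output, nothing is reported.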

Event Timeline

Halfak created this task. · Fri, May 24, 3:54 PM
Restricted Application added a subscriber: Aklapper. · Fri, May 24, 3:54 PM
Halfak renamed this task from Extra whitespace is added to quoted fields in TSV output format to Extra tab is prepended to quoted fields in TSV output format. · Fri, May 24, 3:57 PM
Halfak updated the task description.

It looks like this line may be to blame: https://github.com/wikimedia/analytics-quarry-web/blob/4b3583c4cf7f45b7bac56b8df9dfd0799a12111a/quarry/web/output.py#L80

I'm honestly not sure why prepending a tab ever makes sense.

Looks like this affects the CSV writer too.

This comes from the escaping added for T209226: Quarry can be affected by CSV Injection. The escape is not supposed to apply to every value. I'm looking into it.
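CSV-injection mitigations usually prepend a harmless character, such as a tab, only to cells that a spreadsheet might interpret as a formula. They do not touch every value. The following is a minimal sketch of that conditional escape. It is not Quarry's actual `_inner_csv_injection_escape`. Two things are assumptions: the trigger set (`=`, `+`, `-`, `@`), which is a commonly cited list, and the helper names `csv_injection_escape` and `write_tsv`.

```python
import csv
import io

# Assumption: leading characters that spreadsheet apps may treat as a formula.
FORMULA_TRIGGERS = ("=", "+", "-", "@")


def csv_injection_escape(value):
    """Prefix a tab only to string values that start with a formula trigger.

    Hypothetical helper; non-strings and ordinary strings pass through unchanged.
    """
    if isinstance(value, str) and value.startswith(FORMULA_TRIGGERS):
        return "\t" + value
    return value


def write_tsv(header, rows):
    """Write header and rows as TSV, quoting non-numeric fields (hypothetical)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t",
                        quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow([csv_injection_escape(h) for h in header])
    for row in rows:
        writer.writerow([csv_injection_escape(v) for v in row])
    return buf.getvalue()


print(write_tsv(["rev_id", "page_title"], [[3051580, "$"], [1, "=SUM(A1)"]]))
```

Here `"$"` and the header names are written without a leading tab. Only `"=SUM(A1)"` is escaped, which matches the expected output in the description.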

Change 512420 had a related patch set uploaded (by Zhuyifei1999; owner: Zhuyifei1999):
[analytics/quarry/web@master] output.py: Fix logic error in _inner_csv_injection_escape

https://gerrit.wikimedia.org/r/512420

Change 512420 merged by jenkins-bot:
[analytics/quarry/web@master] output.py: Fix logic error in _inner_csv_injection_escape

https://gerrit.wikimedia.org/r/512420

Mentioned in SAL (#wikimedia-cloud) [2019-05-25T12:22:00Z] <wm-bot> framawiki: Deployed cc0c0a7 on -web-01 T224300

Framawiki closed this task as Resolved. · Sat, May 25, 12:23 PM
Framawiki assigned this task to zhuyifei1999.
Framawiki added a subscriber: Framawiki.

Thanks for the report.