Lost connection to MySQL server while trying to import 39M rows of csv data (toolsdb)
Closed, Resolved (Public)

Description

I am trying to import a large CSV file into a toolforge MySQL table:

jsub -mem 4g -cwd mysqlimport --defaults-file=replica.my.cnf --local -h tools.db.svc.eqiad.wmflabs --ignore-lines 1 --fields-terminated-by=, s51434__uprn_p /shared/external-data/OpenUPRN/osopenuprn_202006.csv

This repeatedly fails:

/usr/bin/mysqlimport: Error: 2013, Lost connection to MySQL server during query, when using table: osopenuprn_202006

It does work with a small test subset (100 rows).

How can I import the full file?

Event Timeline

bd808 subscribed.

This file is 1.9GiB in size and contains 39,207,496 lines of data. I would suggest splitting your attempt to load the data into much smaller chunks. Maybe start with trying ~1M lines at a time?
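A minimal sketch of the splitting step suggested above, in Python. It assumes one record per line (no quoted fields containing newlines) and drops the header row once, so the chunks can be imported without `--ignore-lines 1`. The function name `split_csv`, its parameters, and the `.partNNNNN` suffix are hypothetical choices for illustration. The chunk names keep the table name as the base name because mysqlimport derives the target table from the file name with its extension stripped.

```python
import itertools
from pathlib import Path


def split_csv(src, out_dir, table, rows_per_chunk=1_000_000, skip_header=True):
    """Split src into numbered chunk files and return their paths.

    Chunks are named "<table>.partNNNNN" (a hypothetical naming scheme) so
    that mysqlimport, which strips the extension to get the table name,
    loads every chunk into the same table.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open(src, "r", newline="", encoding="utf-8") as fh:
        if skip_header:
            next(fh, None)  # drop the header row once, up front
        for index in itertools.count(1):
            # Read at most rows_per_chunk lines without loading the whole file.
            lines = list(itertools.islice(fh, rows_per_chunk))
            if not lines:
                break
            path = out_dir / f"{table}.part{index:05d}"
            with open(path, "w", newline="", encoding="utf-8") as out:
                out.writelines(lines)
            chunks.append(path)
    return chunks
```

For the file in this task, a call might look like `split_csv("/shared/external-data/OpenUPRN/osopenuprn_202006.csv", "chunks", "osopenuprn_202006")`, where the output directory `chunks` is an assumption.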

bd808 renamed this task from Lost connection to MySQL server (toolforge) to Lost connection to MySQL server while trying to import 39M rows of csv data (toolsdb). Jul 12 2020, 4:15 PM
bd808 moved this task from Backlog to ToolsDB on the Data-Services board.

Trying 1M chunks, first two worked, but then:

mysqlimport: Error: 2013, Lost connection to MySQL server during query, when using table: osopenuprn_202006

I'll try even smaller ones, but it's getting a bit ridiculous. Why does the server disconnect in the middle of a query?

Magnus claimed this task.

Worked with 10K batches. I still think this rapid disconnect in the middle of a query is a bug.
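For anyone repeating this, here is a rough sketch of how the batched import could be driven from Python, one mysqlimport call per chunk. It is not the exact procedure used here. It assumes the chunk files already exist, contain no header row, and are named so that mysqlimport maps them to the right table (it uses the file name minus its extension). The helper names `build_import_command` and `import_chunks` are hypothetical. The mysqlimport options are the ones from the original command, minus `--ignore-lines 1`.

```python
import subprocess


def build_import_command(chunk_path, database, host, defaults_file):
    """Return the mysqlimport argv for one header-less chunk file."""
    return [
        "mysqlimport",
        f"--defaults-file={defaults_file}",  # must be the first option
        "--local",
        "-h", host,
        "--fields-terminated-by=,",
        database,
        str(chunk_path),
    ]


def import_chunks(chunk_paths, database, host, defaults_file, run=subprocess.run):
    """Import chunks in order and return the chunks that were loaded.

    Stops at the first non-zero exit status, so a lost connection leaves a
    clear point to resume from instead of silently skipping data.
    """
    done = []
    for chunk in chunk_paths:
        result = run(build_import_command(chunk, database, host, defaults_file))
        if result.returncode != 0:
            break
        done.append(chunk)
    return done
```

With the values from this task, the arguments would be `database="s51434__uprn_p"`, `host="tools.db.svc.eqiad.wmflabs"` and `defaults_file="replica.my.cnf"`. Note that re-running a chunk that was partially loaded before a disconnect may insert duplicate rows unless the table has a unique key.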