init-db.js expects a gzipped file at FEED_PATH and reads it as a stream. Every line is transformed and then written to the database.
- As far as I know, streaming is the recommended way to read this file: uncompressed it is about 4 GB and cannot be held in memory in its entirety.
- Processing is asynchronous, which runs into a known problem with line-reader: the stream closes when it has finished pushing lines through the pipe, not when all downstream work has resolved. This is currently worked around via the closeConnectionWhenDone function.
- Regardless, at around 100k lines the process dies without any additional warning. My assumption is that it is running out of memory (quick research suggests a backpressure problem is likely).
Please investigate:
- Whether there is a better way to batch-import this data. Millions of lines are expected.
- If streaming is the right approach, how can the implementation be improved so it doesn't fail? Is there a better structure for the stream itself?
- How should the import handle errors and retries?
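On the batching and retry questions, one common pattern is to buffer rows into fixed-size batches and retry each failed batch with exponential backoff, which cuts per-row round trips and contains transient DB errors. A sketch under those assumptions; `Batcher`, `insertBatch`, and the batch/retry sizes are illustrative, not taken from init-db.js:

```javascript
// Insert one batch, retrying transient failures with exponential backoff.
// maxRetries and baseDelayMs are illustrative defaults.
async function insertWithRetry(insertBatch, batch, maxRetries = 3, baseDelayMs = 100) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await insertBatch(batch);
    } catch (err) {
      if (attempt >= maxRetries) throw err; // give up; caller decides what to do
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}

// Accumulate rows and flush them in fixed-size batches. insertBatch is an
// assumed bulk-insert call (e.g. a multi-row INSERT or COPY equivalent).
class Batcher {
  constructor(insertBatch, batchSize = 1000) {
    this.insertBatch = insertBatch;
    this.batchSize = batchSize;
    this.pending = [];
  }
  async add(row) {
    this.pending.push(row);
    if (this.pending.length >= this.batchSize) await this.flush();
  }
  async flush() {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    await insertWithRetry(this.insertBatch, batch);
  }
}
```

A failed batch that exhausts its retries throws, which (if the batcher is driven from the stream sink) fails the whole pipeline; whether to abort, skip, or dead-letter the batch is a policy decision worth settling as part of this investigation.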
And as a stretch goal:
- How long will the entire import take? This matters because it seems odd to be importing data we are deprecating if the import itself takes a very long time, and the estimated completion time may affect how often we schedule the job.
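For the time estimate, one cheap approach is to time a sample of the import and extrapolate linearly. A sketch; the sample and total figures are placeholders, since the real line count is only known to be "millions":

```javascript
// Extrapolate total import time from a timed sample.
// totalLines is an assumption; the feed size is not known exactly.
function estimateTotalSeconds(sampleLines, sampleSeconds, totalLines) {
  const linesPerSecond = sampleLines / sampleSeconds;
  return totalLines / linesPerSecond;
}
```

For example, if the first 100k lines take 50 s (2,000 lines/s), 5 million lines would take about 2,500 s, roughly 42 minutes. Note the extrapolation assumes throughput stays constant; index maintenance can make inserts slow down as the table grows, so the sample is a lower bound.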