
Investigate parallelisation of harvesting job
Open, High, Public

Description

The last successful harvesting job (update_database.py) took more than 11 hours, about 10.5 of them in the full source database update, which is mental.

2018-09-04_06:33:08 Starting full monument update.
2018-09-04_06:33:08 Load changes to monuments_config...
2018-09-04_06:33:23 Recreating the source tables...
2018-09-04_06:33:37 Full source database update...
2018-09-04_17:03:21 Update monuments_all table...
2018-09-04_17:58:27 Make statistics...
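The time spent in each step can be read off the log by subtracting consecutive timestamps. A minimal sketch, assuming only the `YYYY-MM-DD_HH:MM:SS message` layout seen in the log above; `step_durations` is a hypothetical helper, not part of update_database.py:

```python
from datetime import datetime

# Timestamp layout assumed from the log excerpt above.
LOG_TIME_FORMAT = "%Y-%m-%d_%H:%M:%S"


def step_durations(log_lines):
    """Return (message, duration) pairs for each step that has a successor.

    The last log line has no following timestamp, so its duration is unknown
    and it is left out.
    """
    parsed = []
    for line in log_lines:
        stamp, message = line.split(" ", 1)
        parsed.append((datetime.strptime(stamp, LOG_TIME_FORMAT), message))
    return [
        (message, next_time - time)
        for (time, message), (next_time, _) in zip(parsed, parsed[1:])
    ]


log = [
    "2018-09-04_06:33:08 Starting full monument update.",
    "2018-09-04_06:33:08 Load changes to monuments_config...",
    "2018-09-04_06:33:23 Recreating the source tables...",
    "2018-09-04_06:33:37 Full source database update...",
    "2018-09-04_17:03:21 Update monuments_all table...",
    "2018-09-04_17:58:27 Make statistics...",
]

for message, duration in step_durations(log):
    print(duration, message)
```

The line for "Full source database update..." shows where almost all of the time goes.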

Each dataset harvest being independent, we should consider parallelizing the processing.
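As a starting point for the investigation, here is a minimal sketch of running independent harvests concurrently with the standard library. It assumes the per-dataset work can be wrapped in a single callable and is mostly waiting on the network or the database, so threads are enough; `harvest_all` and `fake_harvest` are hypothetical names, not functions from update_database.py, and each worker would need its own database connection.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def harvest_all(datasets, harvest, max_workers=4):
    """Run harvest(dataset) for every dataset, a few at a time.

    Returns a dict mapping each dataset to its result, or to the exception
    it raised, so that one failing dataset does not abort the others.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(harvest, dataset): dataset for dataset in datasets}
        for future in as_completed(futures):
            dataset = futures[future]
            try:
                results[dataset] = future.result()
            except Exception as error:  # keep going, report the failure later
                results[dataset] = error
    return results


def fake_harvest(dataset):
    """Hypothetical stand-in for the real per-dataset harvest."""
    time.sleep(0.1)  # pretend to wait on the network or the database
    return "harvested " + dataset


if __name__ == "__main__":
    # Placeholder dataset names, for illustration only.
    for dataset, outcome in harvest_all(["a", "b", "c"], fake_harvest).items():
        print(dataset, outcome)
```

If the harvest turns out to be CPU-bound rather than I/O-bound, `ProcessPoolExecutor` has the same interface and could be swapped in, at the cost of the arguments and results having to be picklable.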

Event Timeline

One more data point from today’s harvesting:

2018-09-05_03:01:13 Full source database update...
2018-09-05_13:20:31 Update monuments_all table...

So the harvesting again took more than 10 hours (about 10 h 19 min).

The harvesting still takes well over 10 hours (about 10 h 44 min in this run).

2019-08-26_03:01:05 Full source database update...
2019-08-26_13:45:03 Update monuments_all table...