Here is a quick description of the current status:
- Because of https://phabricator.wikimedia.org/T296021 we manually fixed the borders, and we decided to re-import everything to fix potential inconsistencies across the whole planet
- We disabled OSM sync and kartotherian to free up some resources
- We triggered the osm-initial-import (a rough sketch of the equivalent imposm3 command is shown after this list)
- It failed with some PG errors (we still need to file a ticket with the findings)
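For reference, the initial import boils down to an imposm3 run against the planet dump into the gis database. The sketch below is only an assumed equivalent: the config location, planet dump path and exact flag combination used by the actual osm-initial-import script are guesses, not its real contents.

  # Hypothetical equivalent of the osm-initial-import step; the config path and
  # planet dump path below are assumptions.
  imposm3 import \
      -config /etc/imposm/imposm_config.json \
      -read /srv/osm/planet-latest.osm.pbf \
      -write -optimize -diff \
      -deployproduction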
Related logs from postgres master on maps1009:
2022-01-13 18:34:52 GMT [28131]: [3-1] user=kartotherian,db=gis,app=[unknown],client=127.0.0.1 WARNING: terminating connection because of crash of another server process
2022-01-13 18:34:52 GMT [28131]: [4-1] user=kartotherian,db=gis,app=[unknown],client=127.0.0.1 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-01-13 18:34:52 GMT [28131]: [5-1] user=kartotherian,db=gis,app=[unknown],client=127.0.0.1 HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-01-13 18:34:52 GMT [28155]: [3-1] user=kartotherian,db=gis,app=[unknown],client=127.0.0.1 WARNING: terminating connection because of crash of another server process
2022-01-13 18:34:52 GMT [28155]: [4-1] user=kartotherian,db=gis,app=[unknown],client=127.0.0.1 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-01-13 18:34:52 GMT [28155]: [5-1] user=kartotherian,db=gis,app=[unknown],client=127.0.0.1 HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-01-13 18:34:52 GMT [28314]: [3-1] user=kartotherian,db=gis,app=[unknown],client=127.0.0.1 WARNING: terminating connection because of crash of another server process
2022-01-13 18:34:52 GMT [28314]: [4-1] user=kartotherian,db=gis,app=[unknown],client=127.0.0.1 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly
Imposm failure:
Jan 13 18:34:53 [2022-01-13T18:34:53Z] 20:34:49 pq: the database system is in recovery mode
Jan 13 18:34:54 imposm3 failed to complete initial import
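The "terminating connection because of crash of another server process" warnings mean a single backend died (often the OOM killer during a heavy import) and the postmaster then restarted the whole cluster, which is also why imposm3 saw "the database system is in recovery mode". A hedged first set of checks on maps1009; the systemd unit pattern and time window below are assumptions:

  # Look for an externally killed backend around the time of the crash.
  sudo dmesg -T | grep -iE 'out of memory|oom|killed process'
  sudo journalctl -u 'postgresql@*' --since '2022-01-13 18:30' | grep -iE 'terminated by signal|recovery'
  # Confirm the cluster finished crash recovery and accepts writes again
  # (expect 'f' on the master).
  sudo -u postgres psql -d gis -tAc 'SELECT pg_is_in_recovery();'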
Also some failures from kartotherian (geoshapes hitting PG):
error: remaining connection slots are reserved for non-replication superuser connections
at Connection.parseE (/srv/deployment/kartotherian/deploy-cache/revs/65895c017dbd85ceddbb950b89a25a159b551212/node_modules/pg/lib/connection.js:539:11)
at Connection.parseMessage (/srv/deployment/kartotherian/deploy-cache/revs/65895c017dbd85ceddbb950b89a25a159b551212/node_modules/pg/lib/connection.js:366:17)
at Socket.<anonymous> (/srv/deployment/kartotherian/deploy-cache/revs/65895c017dbd85ceddbb950b89a25a159b551212/node_modules/pg/lib/connection.js:105:22)
at Socket.emit (events.js:198:13)
at Socket.EventEmitter.emit (domain.js:448:20)
at addChunk (_stream_readable.js:288:12)
at readableAddChunk (_stream_readable.js:269:11)
at Socket.Readable.push (_stream_readable.js:224:10)
at TCP.onStreamRead [as onread] (internal/stream_base_commons.js:94:17)
From Grafana, related to PG connections:
https://grafana.wikimedia.org/goto/J77_mTJ7k
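To see who is actually holding the slots when geoshapes hits "remaining connection slots are reserved for non-replication superuser connections", a look at pg_stat_activity on the master should tell whether it is imposm3, kartotherian/geoshapes, or something else. The queries below are a sketch of that check, not something we have run so far:

  # Per-user / per-application connection counts (run on the PG master).
  sudo -u postgres psql -d gis -c "
    SELECT usename, application_name, state, count(*)
      FROM pg_stat_activity
     GROUP BY 1, 2, 3
     ORDER BY count(*) DESC;"
  # The limits the error message is bumping into.
  sudo -u postgres psql -c 'SHOW max_connections;'
  sudo -u postgres psql -c 'SHOW superuser_reserved_connections;'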
Things to investigate:
- How can we deal with the failing OSM import? This is the main issue, and the import needs to happen soon.
- How can we avoid PG connection starvation on OSM masters?
- Is this issue somehow related to PG replication? (standard checks are sketched below)
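For the replication question, the standard views should answer it quickly. This is a sketch assuming streaming replication from maps1009 to the maps replicas:

  # On the master: state of each standby and how far behind it is.
  sudo -u postgres psql -c "SELECT client_addr, state, sent_lsn, replay_lsn FROM pg_stat_replication;"
  # On a replica: apparent replication lag.
  sudo -u postgres psql -c "SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;"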