What is the problem?
I am finding circumstances where data for IPs is being updated incorrectly. Based on very limited investigation, it seems to happen when yesterday's feed has duplicate IPs.
For example, I cannot reproduce the issue when 20240103.json.gz has only one line.
Steps to reproduce problem
Assuming docker:
- Setup the docker environment (docker compose up -d; docker compose exec web mkdir /tmp/ipoid; docker compose exec web node -e "require('./create-users.js')();")
- Copy the two json snippets in Reproduction data and save them to tmp/20240103.json.gz and tmp/20240104.json.gz
- Copy them to the ipoid docker container (docker compose exec web cp tmp/20240103.json.gz /tmp/ipoid/20240103.json.gz; docker compose exec web cp tmp/20240104.json.gz /tmp/ipoid/20240104.json.gz)
- docker compose exec web ./main.sh --init true --today 20240103 --debug true
- docker compose exec web ./main.sh --yesterday 20240103 --today 20240104 --debug true
- docker compose exec db mysql -u ipoid_ro -ppassword3 -e "SELECT * FROM actor_data; SELECT * FROM actor_data_tunnels" ipoid
Expected behaviour: Output of final step is:
+------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+ | pkid | ip | org | client_count | types | conc_city | conc_state | conc_country | countries | location_country | risks | last_updated | +------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+ | 1 | 2.1.1.1 | NULL | 0 | 1 | | | | 0 | aN | 1 | 1709640944 | +------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+
Observed behaviour: Output of final step is:
+------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+ | pkid | ip | org | client_count | types | conc_city | conc_state | conc_country | countries | location_country | risks | last_updated | +------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+ | 8 | 2.1.1.1 | T | 0 | 1 | | | | 0 | AA | 4 | 1709640763 | +------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+ +---------------+-----------+ | actor_data_id | tunnel_id | +---------------+-----------+ | 8 | 1 | +---------------+-----------+
Further observations:
- query_split_aaaaa.sql
$ docker compose exec web cat /tmp/ipoid/sub/query_split_aaaaa.sql DELETE a FROM actor_data_behaviors a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE a FROM actor_data_proxies a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE a FROM actor_data_tunnels a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE FROM actor_data WHERE ip = '2.1.1.1';@@@@@INSERT INTO actor_data (ip,org,client_count,types,conc_city,conc_state,conc_country,countries,location_country,risks) VALUES ('2.1.1.1',NULL,0,1,'','','',0,'aN',1);@@@@@ DELETE a FROM actor_data_behaviors a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE a FROM actor_data_proxies a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE a FROM actor_data_tunnels a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE FROM actor_data WHERE ip = '2.1.1.1';@@@@@INSERT INTO actor_data (ip,org,client_count,types,conc_city,conc_state,conc_country,countries,location_country,risks) VALUES ('2.1.1.1',NULL,0,1,'','','',0,'AA',1);@@@@@ DELETE a FROM actor_data_behaviors a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE a FROM actor_data_proxies a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE a FROM actor_data_tunnels a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE FROM actor_data WHERE ip = '2.1.1.1';@@@@@INSERT INTO actor_data (ip,org,client_count,types,conc_city,conc_state,conc_country,countries,location_country,risks) VALUES ('2.1.1.1','T',0,1,'','','',0,'AA',4);@@@@@INSERT INTO actor_data_tunnels (actor_data_id, tunnel_id) VALUES( (SELECT pkid FROM actor_data WHERE ip = '2.1.1.1'), (SELECT pkid FROM tunnels WHERE operator = 'gum_vpn' ) );@@@@@
- After step 4, the state of the database is:
$ docker compose exec db mysql -u ipoid_ro -ppassword3 -e "SELECT * FROM actor_data; SELECT * FROM actor_data_tunnels" ipoid +------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+ | pkid | ip | org | client_count | types | conc_city | conc_state | conc_country | countries | location_country | risks | last_updated | +------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+ | 5 | 2.1.1.1 | NULL | 0 | 1 | | | | 0 | AA | 1 | 1709649258 | +------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+
Environment
ipoid commit 3fc04e78cb82ac031188446aed0aa1210d1200f0
Reproduction data
20240103.json.gz
{"client": {}, "ip": "2.1.1.1", "location": {"country": "AA"}, "organization": "T"}
{"client": {}, "ip": "2.1.1.1", "location": {"country": "AA"}, "organization": "T", "tunnels": [{"type": "PROXY"}, {"type": "VPN", "operator": "gum_vpn", "entries": [], "exits": ["ObMTI", "SslqcWO", "WKsnAnBCCb", "H", "uEfXOJMxwD", "jHsvLwo", "bZbYfLfl", "lznFa", "Nn", "MFwiCmD", "K", "NfKPVVSAhF", "J", "CLMaFs"]}], "risks": ["GEO_MISMATCH"], "infrastructure": ["MOBILE"], "as": {"number": 566553}, "services": ["\u00112", "\u0098\u00df\ud936\udd2b", "\ud8d9\udd03\u009b", "\u00ab\u0012\u00be", "", "\u0010\u00c0\u0091\u00ec\u001b\ud94b\ude2a\u00f4\u001d +", "p\u0097\u00bf", "\u0082T_\ud805\udd41MM\u0093", "\u00e1", "\udad3\udd40\u00a8,7", "\u009cm\u00f7a\u00f3\u00a2v\ud9c7\ude5c", "T", ""]}
{"client": {}, "ip": "2.1.1.1", "location": {"country": "AA", "state": "T"}}20240104.json.gz
{"client": {}, "ip": "2.1.1.1", "location": {"city": "ABCTgQ", "country": "aN", "state": "PgBXjNQZfVFB"}, "infrastructure": ["SATELLITE", "DATACENTER", "DATACENTER", "MOBILE"]}