Page MenuHomePhabricator

Data updated incorrectly if yesterday's feed has duplicate IPs
Closed, DeclinedPublicBUG REPORT

Description

What is the problem?

I am finding circumstances where data for IPs is being updated incorrectly. Based on very limited investigation, it seems to happen when yesterday's feed has duplicate IPs.

For example, I cannot reproduce the issue when 20240103.json.gz has only one line.

Steps to reproduce problem

Assuming docker:

  1. Setup the docker environment (docker compose up -d; docker compose exec web mkdir /tmp/ipoid; docker compose exec web node -e "require('./create-users.js')();")
  2. Copy the two json snippets in Reproduction data and save them to tmp/20240103.json.gz and tmp/20240104.json.gz
  3. Copy them to the ipoid docker container (docker compose exec web cp tmp/20240103.json.gz /tmp/ipoid/20240103.json.gz; docker compose exec web cp tmp/20240104.json.gz /tmp/ipoid/20240104.json.gz)
  4. docker compose exec web ./main.sh --init true --today 20240103 --debug true
  5. docker compose exec web ./main.sh --yesterday 20240103 --today 20240104 --debug true
  6. docker compose exec db mysql -u ipoid_ro -ppassword3 -e "SELECT * FROM actor_data; SELECT * FROM actor_data_tunnels" ipoid

Expected behaviour: Output of final step is:

+------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+
| pkid | ip      | org  | client_count | types | conc_city | conc_state | conc_country | countries | location_country | risks | last_updated |
+------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+
|    1 | 2.1.1.1 | NULL |            0 |     1 |           |            |              |         0 | aN               |     1 |   1709640944 |
+------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+

Observed behaviour: Output of final step is:

+------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+
| pkid | ip      | org  | client_count | types | conc_city | conc_state | conc_country | countries | location_country | risks | last_updated |
+------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+
|    8 | 2.1.1.1 | T    |            0 |     1 |           |            |              |         0 | AA               |     4 |   1709640763 |
+------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+
+---------------+-----------+
| actor_data_id | tunnel_id |
+---------------+-----------+
|             8 |         1 |
+---------------+-----------+

Further observations:

  • query_split_aaaaa.sql
$ docker compose exec web cat /tmp/ipoid/sub/query_split_aaaaa.sql
DELETE a FROM actor_data_behaviors a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE a FROM actor_data_proxies a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE a FROM actor_data_tunnels a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE FROM actor_data WHERE ip = '2.1.1.1';@@@@@INSERT INTO actor_data (ip,org,client_count,types,conc_city,conc_state,conc_country,countries,location_country,risks) VALUES ('2.1.1.1',NULL,0,1,'','','',0,'aN',1);@@@@@
DELETE a FROM actor_data_behaviors a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE a FROM actor_data_proxies a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE a FROM actor_data_tunnels a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE FROM actor_data WHERE ip = '2.1.1.1';@@@@@INSERT INTO actor_data (ip,org,client_count,types,conc_city,conc_state,conc_country,countries,location_country,risks) VALUES ('2.1.1.1',NULL,0,1,'','','',0,'AA',1);@@@@@
DELETE a FROM actor_data_behaviors a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE a FROM actor_data_proxies a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE a FROM actor_data_tunnels a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '2.1.1.1';@@@@@DELETE FROM actor_data WHERE ip = '2.1.1.1';@@@@@INSERT INTO actor_data (ip,org,client_count,types,conc_city,conc_state,conc_country,countries,location_country,risks) VALUES ('2.1.1.1','T',0,1,'','','',0,'AA',4);@@@@@INSERT INTO actor_data_tunnels (actor_data_id, tunnel_id) VALUES( (SELECT pkid FROM actor_data WHERE ip = '2.1.1.1'), (SELECT pkid FROM tunnels WHERE operator = 'gum_vpn' ) );@@@@@
  • After step 4, the state of the database is:
$ docker compose exec db mysql -u ipoid_ro -ppassword3 -e "SELECT * FROM actor_data; SELECT * FROM actor_data_tunnels" ipoid
+------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+
| pkid | ip      | org  | client_count | types | conc_city | conc_state | conc_country | countries | location_country | risks | last_updated |
+------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+
|    5 | 2.1.1.1 | NULL |            0 |     1 |           |            |              |         0 | AA               |     1 |   1709649258 |
+------+---------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+
Environment

ipoid commit 3fc04e78cb82ac031188446aed0aa1210d1200f0

Reproduction data

20240103.json.gz

{"client": {}, "ip": "2.1.1.1", "location": {"country": "AA"}, "organization": "T"}
{"client": {}, "ip": "2.1.1.1", "location": {"country": "AA"}, "organization": "T", "tunnels": [{"type": "PROXY"}, {"type": "VPN", "operator": "gum_vpn", "entries": [], "exits": ["ObMTI", "SslqcWO", "WKsnAnBCCb", "H", "uEfXOJMxwD", "jHsvLwo", "bZbYfLfl", "lznFa", "Nn", "MFwiCmD", "K", "NfKPVVSAhF", "J", "CLMaFs"]}], "risks": ["GEO_MISMATCH"], "infrastructure": ["MOBILE"], "as": {"number": 566553}, "services": ["\u00112", "\u0098\u00df\ud936\udd2b", "\ud8d9\udd03\u009b", "\u00ab\u0012\u00be", "", "\u0010\u00c0\u0091\u00ec\u001b\ud94b\ude2a\u00f4\u001d +", "p\u0097\u00bf", "\u0082T_\ud805\udd41MM\u0093", "\u00e1", "\udad3\udd40\u00a8,7", "\u009cm\u00f7a\u00f3\u00a2v\ud9c7\ude5c", "T", ""]}
{"client": {}, "ip": "2.1.1.1", "location": {"country": "AA", "state": "T"}}

20240104.json.gz

{"client": {}, "ip": "2.1.1.1", "location": {"city": "ABCTgQ", "country": "aN", "state": "PgBXjNQZfVFB"}, "infrastructure": ["SATELLITE", "DATACENTER", "DATACENTER", "MOBILE"]}