Missing IPs on update when they are duplicated
Open, Needs Triage · Public · BUG REPORT

Description

What is the problem?

I am finding circumstances where IPs are missing entirely from the ipoid database on update.

I haven't been able to investigate this very thoroughly yet, but so far it seems to happen when the IP is duplicated in today's feed. If you remove the duplicate line from the 20240102.json.gz reproduction data, the IP is imported correctly.

I have only seen it happen on update so far. If you take the 20240102.json.gz reproduction data and do a fresh import, the IP is imported correctly.
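
To check whether a given feed file contains the kind of duplicate described above, something like the following sketch could be used. It is not part of ipoid: the function name duplicated_ips is made up for illustration, and it assumes the feed is gzip-compressed with one JSON object per line, each carrying an "ip" key as in the reproduction data.

```python
import gzip
import json
from collections import Counter


def duplicated_ips(feed_path):
    """Return the sorted IPs that appear on more than one line of a feed.

    Assumption: the feed is gzip-compressed, newline-delimited JSON and
    every record has an "ip" key (as in the reproduction data).
    """
    counts = Counter()
    with gzip.open(feed_path, "rt", encoding="utf-8") as feed:
        for line in feed:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            ip = json.loads(line).get("ip")
            if ip is not None:
                counts[ip] += 1
    return sorted(ip for ip, seen in counts.items() if seen > 1)
```

Run against the 20240102.json.gz reproduction data, this should report 127.3.2.1 as the only duplicated IP.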

Steps to reproduce problem

Assuming docker:

  1. Set up the docker environment (docker compose up -d; docker compose exec web mkdir /tmp/ipoid; docker compose exec web node -e "require('./create-users.js')();")
  2. Copy the two JSON snippets under Reproduction data and save them to tmp/20240101.json.gz and tmp/20240102.json.gz
  3. Copy them to the ipoid docker container (docker compose exec web cp tmp/20240101.json.gz /tmp/ipoid/20240101.json.gz; docker compose exec web cp tmp/20240102.json.gz /tmp/ipoid/20240102.json.gz)
  4. docker compose exec web ./main.sh --init true --today 20240101 --debug true
  5. docker compose exec web ./main.sh --yesterday 20240101 --today 20240102 --debug true
  6. docker compose exec db mysql -u ipoid_ro -ppassword3 -e "SELECT * FROM actor_data" ipoid

Expected behaviour: The final step returns:

+------+-----------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+
| pkid | ip        | org  | client_count | types | conc_city | conc_state | conc_country | countries | location_country | risks | last_updated |
+------+-----------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+
|    3 | 127.3.2.1 | NULL |            3 |     1 |           |            |              |         0 |                  |     1 |   1709639334 |
+------+-----------+------+--------------+-------+-----------+------------+--------------+-----------+------------------+-------+--------------+

Observed behaviour: The final step returns nothing.

Further observations:

  • The query file contains only one line, which deletes the old IP; there is no insert for 127.3.2.1:
$ docker compose exec web cat /tmp/ipoid/sub/query_split_aaaaa.sql
DELETE a FROM actor_data_behaviors a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '0:0:0:0:0:0:0:0';@@@@@DELETE a FROM actor_data_proxies a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '0:0:0:0:0:0:0:0';@@@@@DELETE a FROM actor_data_tunnels a INNER JOIN actor_data b on a.actor_data_id = b.pkid WHERE b.ip = '0:0:0:0:0:0:0:0';@@@@@DELETE FROM actor_data WHERE ip = '0:0:0:0:0:0:0:0';@@@@@
  • yesterday_today.unique.sorted contains a line with just {}:
$ docker compose exec web cat /tmp/ipoid/yesterday_today.unique.sorted
{"client": {"count": 3, "types": []}, "ip": "::", "location": {"state": "B", "city": "BDCB", "country": "n"}}@@@@@<
{"client": {"count": 3}, "ip": "127.3.2.1", "location": {"city": "AuC"}, "infrastructure": ["SATELLITE", "MOBILE"], "as": {"number": 393472, "organization": "BHysDNLpKBqt"}}@@@@@>
{"client": {"count": 3}, "ip": "127.3.2.1", "location": {"city": "uuC"}, "infrastructure": ["SATELLITE", "MOBILE"], "as": {"number": 393472, "organization": "BHysDNLpKBqt"}}@@@@@>
{}

Environment

ipoid commit 3fc04e78cb82ac031188446aed0aa1210d1200f0

Reproduction data

20240101.json.gz

{"client": {"count": 3, "types": []}, "ip": "::", "location": {"state": "B", "city": "BDCB", "country": "n"}}

20240102.json.gz

{"client": {"count": 3}, "ip": "127.3.2.1", "location": {"city": "AuC"}, "infrastructure": ["SATELLITE", "MOBILE"], "as": {"number": 393472, "organization": "BHysDNLpKBqt"}}
{"client": {"count": 3}, "ip": "127.3.2.1", "location": {"city": "uuC"}, "infrastructure": ["SATELLITE", "MOBILE"], "as": {"number": 393472, "organization": "BHysDNLpKBqt"}}
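
For convenience, the two files above could be generated with a short script such as the sketch below. It assumes the import expects gzip-compressed, newline-delimited JSON, which the .json.gz extension suggests but which this report does not state explicitly. The names FEEDS and write_feeds are hypothetical helpers, not part of ipoid.

```python
import gzip
import json
import os

# Reproduction data from this report, one JSON record per line.
FEEDS = {
    "20240101.json.gz": [
        '{"client": {"count": 3, "types": []}, "ip": "::", "location": {"state": "B", "city": "BDCB", "country": "n"}}',
    ],
    "20240102.json.gz": [
        '{"client": {"count": 3}, "ip": "127.3.2.1", "location": {"city": "AuC"}, "infrastructure": ["SATELLITE", "MOBILE"], "as": {"number": 393472, "organization": "BHysDNLpKBqt"}}',
        '{"client": {"count": 3}, "ip": "127.3.2.1", "location": {"city": "uuC"}, "infrastructure": ["SATELLITE", "MOBILE"], "as": {"number": 393472, "organization": "BHysDNLpKBqt"}}',
    ],
}


def write_feeds(directory="tmp"):
    """Write each feed as a gzip-compressed file and return the paths.

    Assumption: gzip-compressed, newline-delimited JSON is the expected
    on-disk format for the feed files.
    """
    os.makedirs(directory, exist_ok=True)
    paths = []
    for name, lines in FEEDS.items():
        for line in lines:
            json.loads(line)  # sanity check: every line must be valid JSON
        path = os.path.join(directory, name)
        with gzip.open(path, "wt", encoding="utf-8") as feed:
            feed.write("\n".join(lines) + "\n")
        paths.append(path)
    return paths
```

Calling write_feeds("tmp") from the repository root would produce the tmp/20240101.json.gz and tmp/20240102.json.gz files that step 3 copies into the container.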

Event Timeline

I have found an example of a missing IP where there is a duplicate IP in yesterday's feed.

20240101.json.gz

{"client": {}, "ip": "192.0.0.0", "location": {}}
{"client": {}, "ip": "192.0.0.0", "location": {"city": "Ju", "state": "hrgwhBhwO", "country": "rwSHMD"}}

20240102.json.gz

{"client": {}, "ip": "192.0.0.0", "location": {}}