What is the problem?
There are circumstances where duplicate rows are written to the database tables.
So far, I have seen it with actor_data_behaviors, actor_data_proxies, behaviors and proxies.
I am not sure if this will lead to any problems. Perhaps incorrect counting of numbers of proxies?
Steps to reproduce problem
- Download and put it into the ipoid/tmp directory
- If necessary, start up docker in the ipoid directory (e.g. docker compose up -d)
- Initialise the database: docker compose exec web node init-db.js
- Run this command: docker compose exec web node import-data.js ./tmp/duplicate_rows_reprod.json.gz
- Access the database (e.g. docker compose exec db mysql test -u root -p)
- Run the queries:
SELECT * FROM (SELECT *, COUNT(*) AS duplicates FROM actor_data_behaviors GROUP BY actor_data_id,behavior_id) AS foo WHERE duplicates > 1; SELECT * FROM (SELECT *, COUNT(*) AS duplicates FROM actor_data_proxies GROUP BY actor_data_id,proxy_id) AS foo WHERE duplicates > 1; SELECT * FROM (SELECT *, COUNT(*) AS duplicates FROM behaviors GROUP BY behavior) AS foo WHERE duplicates > 1; SELECT * FROM (SELECT *, COUNT(*) AS duplicates FROM proxies GROUP BY proxy) AS foo WHERE duplicates > 1;
Expected behavior: No results should be returned
Observed behavior: Output like:
+---------------+-------------+------------+ | actor_data_id | behavior_id | duplicates | +---------------+-------------+------------+ | 52 | 2 | 2 | | 53 | 2 | 2 | | 106 | 14 | 2 | | 107 | 10 | 2 | | 107 | 14 | 2 | +---------------+-------------+------------+ 5 rows in set (0.001 sec) +---------------+----------+------------+ | actor_data_id | proxy_id | duplicates | +---------------+----------+------------+ | 54 | 3 | 2 | | 54 | 7 | 5 | | 108 | 3 | 2 | | 108 | 14 | 5 | +---------------+----------+------------+ 4 rows in set (0.000 sec) +------+----------+------------+ | pkid | behavior | duplicates | +------+----------+------------+ | 2 | NULL | 2 | +------+----------+------------+ 1 row in set (0.000 sec) +------+-------+------------+ | pkid | proxy | duplicates | +------+-------+------------+ | 1 | NULL | 6 | +------+-------+------------+ 1 row in set (0.000 sec)
Environment
ipoid commit 8cb472e49f7b8297940a808a004a66a81ebbfaf4