What is the problem?
I have seen a few examples of data being imported incorrectly, or differently from one run to the next, but the behaviour is not consistently reproducible.
I wonder if this is a symptom of a larger issue which we should investigate.
The example below shows a row not being written to the actor_data_tunnels table when it should be. It occurs in roughly half of my attempts.
Here are other examples I have found so far:
- T339204#8968114 data not being written to actor_data_tunnels (possibly the same bug as in the reproduction steps below)
- {T341180}
- T339837: "Data too long for column 'anonymous' at row 1"
- T339324: "Data too long for column 'type' at row 1"
Steps to reproduce problem
- Save the JSON from "Reproduction data" below as a .gz file (e.g. reprod.json.gz) into the ipoid/tmp directory
- If necessary, start up docker in the ipoid directory (e.g. docker compose up -d)
- Initialise the database: docker compose exec web node init-db.js
- Run this command: docker compose exec web node import-data.js ./tmp/reprod.json.gz
- Access the database (e.g. docker compose exec db mysql test -u root -p)
- Run the query: SELECT * FROM actor_data_tunnels;
- You may need to repeat steps 3 and 4 (init-db.js and import-data.js) several times, re-running the query after each import, to reproduce
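Because the failure is intermittent, it may help to automate the steps above and count how often the row is missing. The following is a minimal sketch, not part of ipoid: the function names, the `runner` parameter and the empty root password are assumptions for illustration, and it assumes the compose services are named `web` and `db` as in the commands above.

```python
import subprocess

# "-T" disables pseudo-TTY allocation so the commands work non-interactively.
COMPOSE = ["docker", "compose", "exec", "-T"]


def run(cmd):
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def count_tunnel_rows(runner=run, password=""):
    """Return the number of rows in actor_data_tunnels.

    The root password is an assumption; pass whatever your setup uses.
    """
    out = runner(COMPOSE + [
        "db", "mysql", "test", "-u", "root", "--password=" + password,
        "-N", "-B", "-e", "SELECT COUNT(*) FROM actor_data_tunnels;",
    ])
    return int(out.strip())


def reproduce(attempts, feed="./tmp/reprod.json.gz", runner=run, password=""):
    """Repeat init-db + import and return how many attempts left the table empty."""
    missing = 0
    for _ in range(attempts):
        runner(COMPOSE + ["web", "node", "init-db.js"])
        runner(COMPOSE + ["web", "node", "import-data.js", feed])
        if count_tunnel_rows(runner, password) == 0:
            missing += 1
    return missing
```

Run from the ipoid directory, `reproduce(20)` would report how many of 20 attempts ended with an empty actor_data_tunnels table. The `runner` argument exists only so the loop can be exercised without Docker.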
Expected behavior: Output something like:
MariaDB [test]> SELECT * FROM actor_data_tunnels;
+---------------+-----------+
| actor_data_id | tunnel_id |
+---------------+-----------+
|             1 |         1 |
+---------------+-----------+
Observed behavior: Output something like:
MariaDB [test]> Empty set (0.001 sec)
Environment
ipoid commit 8cb472e49f7b8297940a808a004a66a81ebbfaf4
Reproduction data
{"as": {"number": 12345, "Organization": "Foobar"}, "client": {"behaviors": ["FILE_SHARING"], "concentration": {"geohash": "srwcr5ugt", "city": "Foobar", "state": "Foobar", "country": "NL", "skew": 0.5, "density": 0.5, "countries": 10}, "count": 10, "countries": 10, "proxies": ["OXYLABS_PROXY"], "spread": 12345, "types": ["MOBILE"]}, "infrastructure": "MOBILE", "location": {"city": "Baku", "state": "Baku City", "country": "AZ"}, "organization": "Foobar", "risks": ["CALLBACK_PROXY"], "services": ["IPSEC"], "tunnels": [{"operator": "TUNNELBEAR_VPN"}], "ip": "6bcaac10-88d8-4e2c-8e98-ae27d8840e26"}
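The first reproduction step (saving the JSON above as a .gz file) can be scripted. This is a sketch under the assumption that the feed is gzip-compressed text with one JSON object per line; `save_reproduction` and `load_feed` are hypothetical helper names, not ipoid functions.

```python
import gzip
import json
from pathlib import Path


def save_reproduction(json_text, path):
    """Validate json_text and write it as a one-line gzip-compressed feed."""
    record = json.loads(json_text)  # raises ValueError on malformed JSON
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path


def load_feed(path):
    """Read the feed back as a list of parsed records, skipping blank lines."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

For example, after pasting the reproduction data into a string `data`, `save_reproduction(data, "tmp/reprod.json.gz")` run from the ipoid directory would produce the file used in the steps to reproduce.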
Useful scripts
- Script to try the same test JSON feed multiple times with the entries in a different order: https://gitlab.wikimedia.org/dwalden/ipmasking-testing/-/blob/main/comparison.sh
- Uses shuf to reorder the data
- Script to compare what is stored in the database with the JSON feed: https://gitlab.wikimedia.org/dwalden/ipmasking-testing/-/blob/main/ipoid_data_validation_2.py
- Script to generate random test data: https://gitlab.wikimedia.org/dwalden/ipmasking-testing/-/blob/main/spur_random_data.py
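To illustrate the reordering idea behind the first script, here is a rough Python equivalent of shuffling a feed. It is not the implementation of comparison.sh; `shuffle_feed` is a hypothetical name, and the sketch assumes a gzip-compressed feed with one JSON entry per line.

```python
import gzip
import random


def shuffle_feed(src, dst, seed=None):
    """Copy a gzip feed from src to dst with its lines in random order.

    Passing a seed makes a given ordering repeatable, which helps when
    one particular order triggers the bug. Returns the number of entries.
    """
    with gzip.open(src, "rt", encoding="utf-8") as fh:
        lines = [line.rstrip("\n") for line in fh if line.strip()]
    random.Random(seed).shuffle(lines)
    with gzip.open(dst, "wt", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")
    return len(lines)
```

Importing several shuffled copies of the same feed and comparing the resulting database contents would show whether entry order affects what gets written.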
QA Results - ipoid
AC | Status | Details |
---|---|---|
1 | ❌ | https://phabricator.wikimedia.org/T341660 |