Page MenuHomePhabricator

Inconsistent use of "UNKNOWN" for types and risks
Closed, ResolvedPublicBUG REPORT

Description

What is the problem?

When an entry in the JSON feed has no types or risks or they are empty lists, it records them as UNKNOWN.

If they are values that the import script does not recognise, it records them as an empty string.

For consistency and to save future confusion, we should probably always record invalid values or values it does not recognise as UNKNOWN.

Steps to reproduce problem
  1. Save the JSON from "Reproduction data" below as a .gz file (e.g. reprod.json.gz) into the ipoid/tmp directory
  2. If necessary, start up docker in the ipoid directory (e.g. docker compose up -d)
  3. Initialise the database: docker compose exec web node init-db.js
  4. Run this command: docker compose exec web node import-data.js ./tmp/reprod.json.gz
  5. Access the database (e.g. docker compose exec db mysql test -u root -p)
  6. Run the query: SELECT ip,risks,types FROM actor_data;

Expected behavior: Each row and column has UNKNOWN
Observed behavior: Output is:

SELECT ip,risks,types FROM actor_data;
+--------------------------------------+---------+---------+
| ip                                   | risks   | types   |
+--------------------------------------+---------+---------+
| 9ec540f8-c191-4c11-9d88-47a86c312324 |         |         |
| 9ec540f8-c191-4c11-9d88-47a86c312325 | UNKNOWN | UNKNOWN |
| 9ec540f8-c191-4c11-9d88-47a86c312326 | UNKNOWN | UNKNOWN |
| 9ec540f8-c191-4c11-9d88-47a86c312327 |         |         |
| 9ec540f8-c191-4c11-9d88-47a86c312328 |         |         |
+--------------------------------------+---------+---------+
Environment

ipoid commit 8cb472e49f7b8297940a808a004a66a81ebbfaf4

Reproduction data
{"as": {"Organization": ""}, "client": {"behaviors": [], "concentration": {}, "count": false, "countries": false, "proxies": [], "spread": false, "types": [false]}, "infrastructure": false, "location": {}, "organization": false, "risks": [false], "services": [], "tunnels": [], "ip": "9ec540f8-c191-4c11-9d88-47a86c312324"}
{"as": {"Organization": ""}, "client": {"behaviors": [], "concentration": {}, "count": false, "countries": false, "proxies": [], "spread": false, "types": []}, "infrastructure": false, "location": {}, "organization": false, "risks": [], "services": [], "tunnels": [], "ip": "9ec540f8-c191-4c11-9d88-47a86c312325"}
{"as": {"Organization": ""}, "client": {"behaviors": [], "concentration": {}, "count": false, "countries": false, "proxies": [], "spread": false}, "infrastructure": false, "location": {}, "organization": false, "services": [], "tunnels": [], "ip": "9ec540f8-c191-4c11-9d88-47a86c312326"}
{"as": {"Organization": ""}, "client": {"behaviors": [], "concentration": {}, "count": false, "countries": false, "proxies": [], "spread": false, "types": [""]}, "infrastructure": false, "location": {}, "organization": false, "risks": [""], "services": [], "tunnels": [], "ip": "9ec540f8-c191-4c11-9d88-47a86c312327"}
{"as": {"Organization": ""}, "client": {"behaviors": [], "concentration": {}, "count": false, "countries": false, "proxies": [], "spread": false, "types": ["foo"]}, "infrastructure": false, "location": {}, "organization": false, "risks": ["foo"], "services": [], "tunnels": [], "ip": "9ec540f8-c191-4c11-9d88-47a86c312328"}

QA Results - Ipoid SQL

ACStatusDetails
1For consistency and to save future confusion, we should probably always record invalid values or values it does not recognise as UNKNOWN

Details

Related Changes in GitLab:
TitleReferenceAuthorSource BranchDest Branch
Store types and risks as 'UNKNOWN' if unknown value providedrepos/mediawiki/services/ipoid!37tchandersunknown-types-risksmain
Customize query in GitLab

Event Timeline

QA notes
Because of this change https://gitlab.wikimedia.org/repos/mediawiki/services/ipoid/-/merge_requests/28,

When testing this one will need to run docker-compose when one this commit 8cb472e49f7b8297940a808a004a66a81ebbfaf4 to check "before behaviour" and run docker-compose again on main to test this change
cc: @GMikesell-WMF

@Tchanders With the new patch tested, unrecognized or invalid values, now display them as UNKNOWN as seen from the screenshots below. Thanks for your work and thanks @TThoabala for the info update. I will move this to Done.

Status: ✅ PASS
Environment: Ipoid SQL Visual Studio Code
OS: macOS Ventura
Browser: N/A
Device: MBP
Emulated Device:N/A
Test Links: N/A
Old Ipoid commit: 8cb472e49f7b8297940a808a004a66a81ebbfaf4
New Ipoid commit: 4bc031990460a3c84a6fc3b3bd02589b3db312f9

✅AC1: For consistency and to save future confusion, we should probably always record invalid values or values it does not recognise as UNKNOWN

Old PatchNew Patch
2023-08-28_06-38-20.png (920×1 px, 211 KB)
2023-08-28_07-37-03.png (1×2 px, 451 KB)

@Tchanders With the new patch tested, unrecognized or invalid values, now display them as UNKNOWN as seen from the screenshots below. Thanks for your work and thanks @TThoabala for the info update. I will move this to Done.

Status: ✅ PASS
Environment: Ipoid SQL Visual Studio Code
OS: macOS Ventura
Browser: N/A
Device: MBP
Emulated Device:N/A
Test Links: N/A
Old Ipoid commit: 8cb472e49f7b8297940a808a004a66a81ebbfaf4
New Ipoid commit: 4bc031990460a3c84a6fc3b3bd02589b3db312f9

✅AC1: For consistency and to save future confusion, we should probably always record invalid values or values it does not recognise as UNKNOWN

Old PatchNew Patch
2023-08-28_06-38-20.png (920×1 px, 211 KB)
2023-08-28_07-37-03.png (1×2 px, 451 KB)