
Decide how to represent missing data vs unexpected data
Closed, Declined (Public)

Description

In output-sql, generateInsertActorQueries builds a representation of the data for each IP address, to be stored in the actor_data table:

const actorObj = {
	actor_data: {
		ip: actor.ip,
		org: actor.organization,
		client_count: actor.client.count || 0,
		types: actor.client.types ?
			getActorTypes(actor.client.types) : actorTypes.UNKNOWN,
		conc_city:
			actor.client.concentration && actor.client.concentration.city ? actor.client.concentration.city : '',
		conc_state:
			actor.client.concentration && actor.client.concentration.state ? actor.client.concentration.state : '',
		conc_country:
			actor.client.concentration && actor.client.concentration.country ? actor.client.concentration.country : '',
		countries: actor.client.countries || 0,
		location_country: actor.location.country || '',
		risks: actor.risks ? getActorRisks(actor.risks) : riskTypes.UNKNOWN
	},
	behaviors: actor.client.behaviors || [],
	proxies: actor.client.proxies || [],
	tunnels: actor.tunnels && actor.tunnels.length ?
		getTunnels(actor.tunnels) : false
};

How we handle empty fields is inconsistent. For example, for missing behaviors we store nothing (an empty list), but for missing types we store actorTypes.UNKNOWN. Other fields fall back to 0, an empty string, or false.

  • What should we store for missing fields (which might be legitimately empty)?
  • What should we store for unexpected data? (E.g. an unrecognized behavior)
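
One possible convention, sketched below purely for discussion: store null when a field is missing, keep an empty list when the field is present but legitimately empty, and map unrecognized values to an explicit UNKNOWN marker instead of dropping them. The names KNOWN_BEHAVIORS, UNKNOWN_BEHAVIOR and normalizeBehaviors are hypothetical, and the behavior names are placeholders; none of them come from output-sql.

```javascript
// Hypothetical set of recognized behaviors; the real list is not shown in this task.
const KNOWN_BEHAVIORS = new Set( [ 'EXAMPLE_BEHAVIOR_A', 'EXAMPLE_BEHAVIOR_B' ] );
// Hypothetical marker for a value we received but do not recognize.
const UNKNOWN_BEHAVIOR = 'UNKNOWN';

/**
 * Missing input (undefined, null, or not an array) -> null: "no data reported".
 * Empty array -> empty array: "reported, and legitimately empty".
 * Unrecognized entries -> UNKNOWN_BEHAVIOR, so they stay visible downstream.
 */
function normalizeBehaviors( behaviors ) {
	if ( !Array.isArray( behaviors ) ) {
		return null;
	}
	const normalized = behaviors.map(
		( behavior ) => ( KNOWN_BEHAVIORS.has( behavior ) ? behavior : UNKNOWN_BEHAVIOR )
	);
	// Collapse duplicates so several unrecognized values yield a single UNKNOWN.
	return [ ...new Set( normalized ) ];
}
```

With this shape, a caller can tell "no data" (null) apart from "no behaviors" (an empty list), which the current `|| []` fallback cannot express.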

Event Timeline

Tunnels that are missing operators are ignored completely. Should we revisit this, based on the below?

Q: Quite a few IPs that are associated with tunnel activity (VPN, proxy) don’t have an operator. What does this mean for the IP?
A: This means we were unable to attribute the service operator for an IP. Oftentimes these are privately hosted VPNs stood up by individual users. It still stands, however, that the IP is being used as a VPN/proxy.
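
If we did revisit this, one option would be to keep such tunnels and record the operator as explicitly unattributed. The sketch below is illustrative only: it assumes each tunnel is an object with type and operator fields, which this task does not confirm, and keepUnattributedTunnels and UNATTRIBUTED_OPERATOR are hypothetical names, not part of the existing getTunnels.

```javascript
// Hypothetical placeholder for a tunnel whose operator could not be attributed.
const UNATTRIBUTED_OPERATOR = null;

/**
 * Keep every tunnel, including those without an operator.
 * Assumes tunnels look like { type, operator }; these field names are
 * illustrative and not confirmed by the code in this task.
 */
function keepUnattributedTunnels( tunnels ) {
	if ( !Array.isArray( tunnels ) ) {
		return [];
	}
	return tunnels.map( ( tunnel ) => ( {
		type: tunnel.type,
		// A missing or empty operator is stored as an explicit placeholder
		// instead of causing the whole tunnel to be dropped.
		operator: tunnel.operator || UNATTRIBUTED_OPERATOR
	} ) );
}
```

This would preserve the fact that the IP is being used as a VPN or proxy even when the service operator is unknown.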