[Spike] investigate prevalence of spiders within dashboard data
Closed, Resolved · Public · 3 Story Points

Description

A research spike to see which data collection scripts tend to pick up spider traffic and would benefit most from automata removal.

Ironholds created this task. Dec 4 2015, 7:30 PM
Ironholds updated the task description.
Ironholds raised the priority of this task to Needs Triage.
Ironholds claimed this task.
Ironholds added a subscriber: Ironholds.
Restricted Application added subscribers: StudiesWorld, Aklapper. Dec 4 2015, 7:30 PM
Ironholds set Security to None.
Ironholds edited a custom field.
Ironholds moved this task from Backlog to In progress on the Discovery-Analysis (Current work) board.

For @Tfinc and others:

This is now done (results attached). I looked at a variety of EL schemas across our four areas of Analysis focus and never found an automata rate above 1%. This is likely because automata tend not to execute JavaScript.

We will still have to do this detection for some sources, the Hive-based data for example, but for the vast majority of our data it does not look necessary.
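
For anyone wanting to repeat the check on another dataset, the per-schema automata rate can be sketched roughly as below. This is a minimal illustration, not the script used for the attached results: the function names, the `(schema, user_agent)` input shape, and the regular expression are all assumptions, and a real check would rely on a maintained user-agent parser rather than a hand-written pattern.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately crude pattern for illustration only.
SPIDER_PATTERN = re.compile(r"bot|spider|crawler|slurp", re.IGNORECASE)


def automata_rate_by_schema(events):
    """Return {schema: fraction of events whose user agent looks automated}.

    `events` is an iterable of (schema, user_agent) pairs (assumed shape).
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for schema, user_agent in events:
        totals[schema] += 1
        # Treat a missing user agent as an empty string (not flagged).
        if SPIDER_PATTERN.search(user_agent or ""):
            flagged[schema] += 1
    return {schema: flagged[schema] / totals[schema] for schema in totals}


def schemas_needing_removal(rates, threshold=0.01):
    """List schemas whose automata rate exceeds the threshold (default 1%)."""
    return sorted(schema for schema, rate in rates.items() if rate > threshold)
```

Schemas returned by `schemas_needing_removal` would be the candidates for automata filtering; everything at or below the 1% threshold could be left as-is.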

Deskana triaged this task as Normal priority. Jan 21 2016, 1:34 AM
Deskana added a subscriber: Deskana.
Deskana closed this task as Resolved. Feb 4 2016, 6:20 AM