We have kept logstash data going back to 2016-07-01 in Elasticsearch, but none of the current indices is yet clean enough to import into Elasticsearch 2.x. We believe that once 1.28.0-wmf.10 is deployed to all wikis we will have clean indices (expected starting with 2016-07-15).
We should be able to clean the older indices by filtering the data and reloading it into Elasticsearch with our new default mapping. The process would be roughly:
- Export records from an index using P3309 or something similar
- Reformat keys with embedded `.` via P3357 or something similar
- Discard mml records via some filtering script
- Reimport into Elasticsearch via `split -l 4000 --filter 'curl -s http://elastic:9200/{indexName}/_bulk --data-binary @- > /dev/null'` or a similar loading script
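The two middle steps (reformatting dotted keys, discarding mml records) could be sketched as a small filter script along these lines. This is only an illustration, not the actual contents of P3357: the `_` replacement character and the assumption that mml records carry `type == "mml"` are both guesses and would need to match whatever the real paste and data do.

```python
import json

def fix_keys(obj):
    # Recursively rewrite field names containing '.', which
    # Elasticsearch 2.x rejects. Replacing '.' with '_' is an
    # assumption; P3357 may use a different transform.
    if isinstance(obj, dict):
        return {k.replace('.', '_'): fix_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [fix_keys(v) for v in obj]
    return obj

def keep(record):
    # Assumes mml records are identified by their 'type' field.
    return record.get('type') != 'mml'

def clean(lines):
    # Takes an iterable of JSON source documents (one per line),
    # drops mml records, and yields re-serialized documents with
    # cleaned keys, ready to be wrapped for the _bulk API.
    for line in lines:
        record = json.loads(line)
        if keep(record):
            yield json.dumps(fix_keys(record))
```

Note that the `_bulk` endpoint expects an action line (e.g. `{"index": {}}`) before each source document, so the export or a wrapper around `clean` would need to supply those pairs before piping into the `split`/`curl` loader above.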