Basic filter finds some duplicate ids
Closed, Invalid · Public · BUG REPORT

Description

Steps to Reproduce:

Using

  • BasicFilterIntoBatch with:
    • filter1QHeadInto(toBatch),
    • wordCountView(4, 25),
    • commaCountView(0, 3),
    • nDigitCountView(0),
    • lowestWordFreqCountView(2),
    • excludeChunkLike("%;%"),
    • tailNotInBatch(excludeBatch),
  • target 500000

$ go run cmd/selector_new/main.go -chunksize 50 -chunkdecrease 1 -gb -cb -cp -size 500 dbs/AA_BA_20201016.db >| AA_BA_out.txt

Actual Results:

Filtered 477701 sents into test_batch_1 (asked for 500000)

Expected Results:

Filtered 477701 sents into test_batch_1 (asked for 500000)
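
Duplicate check (sketch):

One way to confirm whether a batch contains repeated ids is sketched below. This is only an illustration and not the selector's own code: the function name findDuplicateIDs and the sample ids are hypothetical, and it assumes the ids of the filtered sentences have already been collected as strings.

```go
package main

import "fmt"

// findDuplicateIDs returns every id that occurs more than once in ids.
// Each duplicated id is reported once, in the order in which its second
// occurrence is seen. (Hypothetical helper, not part of the selector.)
func findDuplicateIDs(ids []string) []string {
	seen := make(map[string]int, len(ids))
	var dups []string
	for _, id := range ids {
		seen[id]++
		if seen[id] == 2 {
			dups = append(dups, id)
		}
	}
	return dups
}

func main() {
	// Made-up ids for illustration; real ids would come from the batch.
	ids := []string{"s1", "s2", "s1", "s3", "s2", "s1"}
	fmt.Println(findDuplicateIDs(ids)) // prints [s1 s2]
}
```

An empty result would mean that no id appears twice in the list that was checked.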