As a wdqs user I want triples shared by multiple entities to be treated as shared statements by the streaming updater, so that they are not deleted when one entity stops referencing them.
Some shared statements are still present in the rdf stream uncategorized; they are currently identified only at consumption time, but they should be handled and categorized when the stream is produced (an illustrative classification sketch follows the stack trace below).
java.lang.IllegalArgumentException: Cannot add/delete the same triple for a different entity (should probably be considered as a shared statement)
    at org.wikidata.query.rdf.updater.consumer.PatchAccumulator.lambda$findInvalidStatements$6(PatchAccumulator.java:74)
    at java.util.HashMap.forEach(HashMap.java:1289)
    at org.wikidata.query.rdf.updater.consumer.PatchAccumulator.findInvalidStatements(PatchAccumulator.java:71)
    at org.wikidata.query.rdf.updater.consumer.PatchAccumulator.accumulate(PatchAccumulator.java:54)
    at org.wikidata.query.rdf.updater.consumer.PatchAccumulator.accumulate(PatchAccumulator.java:108)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at org.wikidata.query.rdf.updater.consumer.KafkaStreamConsumer.poll(KafkaStreamConsumer.java:131)
    at org.wikidata.query.rdf.updater.consumer.StreamingUpdaterConsumer.lambda$run$0(StreamingUpdaterConsumer.java:46)
    at org.wikidata.query.rdf.common.TimerCounter.time(TimerCounter.java:51)
    at org.wikidata.query.rdf.updater.consumer.StreamingUpdaterConsumer.run(StreamingUpdaterConsumer.java:46)
    at org.wikidata.query.rdf.updater.consumer.StreamingUpdate.main(StreamingUpdate.java:49)
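For illustration, a minimal sketch of what produce-time classification could look like, assuming (as in Wikidata's RDF model) that shared statements are the ones rooted in the content-hashed reference (wdref:) and value (wdv:) namespaces; the class name, method, and hard-coded namespaces are assumptions, not the actual producer API:

```java
// Illustrative sketch only: classify a triple as "shared" at produce time
// based on its subject namespace. Reference (wdref:) and value (wdv:) nodes
// are content-hashed and reused across entities in Wikidata's RDF model.
final class SharedTripleClassifier {
    private static final String REFERENCE_NS = "http://www.wikidata.org/reference/";
    private static final String VALUE_NS = "http://www.wikidata.org/value/";

    // A triple whose subject is a reference or value node may be referenced
    // by several entities and should be categorized as shared, not entity-owned.
    boolean isShared(String subjectUri) {
        return subjectUri.startsWith(REFERENCE_NS) || subjectUri.startsWith(VALUE_NS);
    }
}
```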
AC:
- the producer should properly identify and categorize all shared triples
- the consumer should continue to fail when such triples are detected, but the log message should be clearer and include the offending triple and the entities it belongs to (see the first sketch after this list)
- bonus: the consumer should have a way to "fixup" these triples by "re-categorizing" them as shared on the fly, so that the rdf stream does not have to be re-generated (see the second sketch after this list)
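A minimal sketch of the clearer failure asked for in the second AC item, including the offending triple and both entities in the message; the helper name and parameters are hypothetical, not the actual PatchAccumulator code:

```java
// Hypothetical helper illustrating the clearer error message; not the
// actual PatchAccumulator signature.
final class InvalidSharedTriple {
    static IllegalArgumentException of(String triple, String firstEntity, String secondEntity) {
        return new IllegalArgumentException(String.format(
            "Cannot add/delete the same triple for different entities "
            + "(should probably be categorized as a shared statement): "
            + "triple [%s] seen for entities [%s] and [%s]",
            triple, firstEntity, secondEntity));
    }
}
```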
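And a minimal sketch of the bonus "fixup", assuming the accumulator tracks which entity first contributed each triple; all names are hypothetical and the real PatchAccumulator internals may differ. When a triple shows up for a second entity, it is re-categorized as shared on the fly instead of failing:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of on-the-fly re-categorization: instead of throwing,
// move a triple seen for a second entity out of the per-entity diff and into
// the shared section of the accumulated patch.
final class FixupAccumulator {
    private final Map<String, String> tripleOwner = new HashMap<>();
    private final Map<String, Set<String>> perEntityTriples = new HashMap<>();
    private final Set<String> sharedTriples = new HashSet<>();

    void accumulate(String triple, String entityId) {
        if (sharedTriples.contains(triple)) {
            return; // already re-categorized as shared, nothing to do
        }
        String previousOwner = tripleOwner.putIfAbsent(triple, entityId);
        if (previousOwner != null && !previousOwner.equals(entityId)) {
            // Same triple for a different entity: re-categorize as shared
            // so the rdf stream does not have to be re-generated.
            Set<String> owned = perEntityTriples.get(previousOwner);
            if (owned != null) {
                owned.remove(triple);
            }
            sharedTriples.add(triple);
        } else {
            perEntityTriples.computeIfAbsent(entityId, k -> new HashSet<>())
                .add(triple);
        }
    }
}
```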