As a **WME Engineer**, I want to synchronize the ondemand state with the snapshots state. At the moment they are out of sync.
**Root Cause**
A schema change has affected Avro deserialization in the ondemand service. Our complex system of submodules, and lack of detection of out of sync…. ondemand for the base product has been losing events since then. This means the ondemand API for base product and structured contents is now out of sync with the snapshots.
The issue doesn't impact structured contents, so the focus should be only on articles.
**Solution**
As discussed with the team, the decision is to synchronize the ondemand state only, by resetting the ondemand consumer offsets and replaying traffic.
**TODO**
[] Record "Current offset" for consumer `ondemand-articles-compacted-v2` for all topic-partitions.
[] Scale ondemand services to 0.
[] Wait for the consumer to go from STABLE to EMPTY state.
[] Reset the consumer offsets for these consumers:
* ondemand-articles-compacted-v2
* ondemand-categories-compacted-v2
* ondemand-files-compacted-v2
* ondemand-templates-compacted-v2

Tip: You can make use of this [[ https://gitlab.enterprise.wikimedia.com/wikimedia-enterprise/experiments/kafka-consumer | repo ]]. Update it to give you the current offsets. You can use the [[ https://pkg.go.dev/github.com/confluentinc/confluent-kafka-go/v2/kafka#Consumer.Position | Position ]] API.
[] Create new instances of the ondemand service with consumer `auto.offset.reset` set to "earliest" and a consumer group name such as `ondemand-articles-compacted-fix`.
[] Scale ondemand back to its original scaling size.
Tip: You can use ondemand-<x>-deploy as a template and create 4 new repos in service-deploys. Update the `ondemand` service to set the consumer `auto.offset.reset`. Take a look at the `KAFKA_AUTO_OFFSET_RESET` env variable in the snapshots service as an example.
[] After deploying the new instances, monitor the current offset. This is the same process as step 1, but for the new consumer group name. The current offset should grow steadily from 0 up to the offsets you recorded in step 1.
[] Monitor logs for serialization errors.
[] Keep monitoring occasionally until consumer lag is down to 0. Once the offsets for the new consumer group exceed the offsets recorded in step 1 for all topic-partitions, we can destroy the new instances.
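
The completion check in the last step can be sketched in Go as below. This is only an illustration and not part of the ondemand codebase: `TopicPartition` and `replayCaughtUp` are hypothetical names, and the offsets are assumed to have been collected separately into plain maps (the ones recorded in step 1 and the ones currently reported for the new consumer group).

```go
package main

// TopicPartition identifies one partition of a topic.
// Hypothetical helper type, used here only for illustration.
type TopicPartition struct {
	Topic     string
	Partition int32
}

// replayCaughtUp reports whether the new consumer group's offset exceeds the
// offset recorded in step 1 on every topic-partition. It also returns the
// partitions that are still behind, so they can be logged while monitoring.
func replayCaughtUp(recorded, current map[TopicPartition]int64) (bool, []TopicPartition) {
	var behind []TopicPartition
	for tp, want := range recorded {
		got, ok := current[tp]
		// A partition with no reported offset yet counts as behind.
		// The comparison is strict because the step says "exceed".
		if !ok || got <= want {
			behind = append(behind, tp)
		}
	}
	return len(behind) == 0, behind
}
```

Calling `replayCaughtUp(recorded, current)` periodically would give a yes/no answer plus the list of lagging partitions. A partition that receives no new events would never pass the strict comparison, so the check may need to be relaxed to "reached or exceeded" for idle partitions.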
**Note**
You can execute the procedure one service at a time, perhaps starting with files.

Here is the visual representation of current offsets for `ondemand-articles-compacted-v2` for these topic-partitions.
{F60380679}
**Acceptance Criteria**
* Consumer lag down to 0 or close
* Old events processed by new ondemand instances (current_offset of new instances > current_offset of old instances)
* No errors during re-ingestion/re-processing.
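
For the first criterion, consumer lag on a partition is the difference between the partition's high-watermark offset and the consumer group's committed offset. The Go sketch below shows one way to evaluate "0 or close" over all partitions. The `PartitionOffsets` type, the function names, and the `maxLag` threshold are assumptions made for illustration; the task does not define how close is close enough.

```go
package main

// PartitionOffsets holds the two offsets needed to compute lag for one
// topic-partition. Hypothetical type, used here only for illustration.
type PartitionOffsets struct {
	Topic         string
	Partition     int32
	HighWatermark int64 // offset the next message written to the partition will get
	Committed     int64 // next offset the consumer group will read
}

// totalLag sums the per-partition lag (high watermark minus committed offset).
// Negative differences are clamped to zero so they do not hide real lag.
func totalLag(parts []PartitionOffsets) int64 {
	var total int64
	for _, p := range parts {
		if lag := p.HighWatermark - p.Committed; lag > 0 {
			total += lag
		}
	}
	return total
}

// lagWithinTolerance reports whether the total lag is at most maxLag.
// maxLag is an assumed threshold standing in for "0 or close".
func lagWithinTolerance(parts []PartitionOffsets, maxLag int64) bool {
	return totalLag(parts) <= maxLag
}
```

With `maxLag` set to 0 the check requires every partition to be fully consumed; a small positive value tolerates events that arrive while the check runs.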