
One week after SDC edits the data still shows up in WCQS queries
Closed, Resolved · Public

Description

I have a query that finds identical duplicate SDC entries for some statements. I removed the duplicates on September 1 (see here), but today, on September 6, the query still shows them. WCQS should show the data as it is now, not as it was a week ago.

Event Timeline

Restricted Application added a subscriber: Aklapper.
Jarekt renamed this task from "One week after SDC queries the data still shows up in WCQS queries" to "One week after SDC edits the data still shows up in WCQS queries". Sep 7 2020, 2:37 AM

What time on September 1st was this? According to https://lists.wikimedia.org/pipermail/commons-l/2020-August/008161.html it appears the data is reloaded every Tuesday around 9am UTC; perhaps after two days your changes will then manifest in the query result.

Our current process reloads the data from SDC dumps once a week. Dumps are made each Sunday, which means that all changes made between Aug 30th and Sep 6th will only show up in the dump released on Sep 6th. I moved the reload from Tuesday to Monday so that changes become visible sooner, but it still depends on the dump itself. Since the current process takes around 4h, we decided not to reload more often. The reload should happen today, so after today the change should be visible.

Please note that this is a limitation of the beta service - real-time updates will be implemented at some point.
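
To make the weekly schedule described above concrete, here is a rough Python sketch that estimates when an edit should become visible in WCQS. It is only an illustration, not the actual reload tooling: the function names are hypothetical, the dump cutoff (Sunday 00:00 UTC) is an assumption, and the reload hour (09:00 UTC) is assumed to carry over from the earlier Tuesday schedule.

```python
from datetime import datetime, timedelta, timezone

DUMP_WEEKDAY = 6      # Sunday (Monday == 0), as described in the comment above
RELOAD_WEEKDAY = 0    # Monday, as described in the comment above
RELOAD_HOUR_UTC = 9   # assumption: carried over from the old Tuesday ~09:00 UTC schedule


def next_weekday(start, weekday, hour=0):
    """Return the first datetime at hour:00 on `weekday` strictly after `start`."""
    candidate = start.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - start.weekday()) % 7)
    if candidate <= start:
        candidate += timedelta(days=7)
    return candidate


def visible_after(edit_time):
    """Estimate when an edit made at `edit_time` (UTC) shows up in WCQS."""
    # Assumption: the dump captures everything edited before Sunday 00:00 UTC.
    dump = next_weekday(edit_time, DUMP_WEEKDAY)
    # The reload that picks up this dump runs on the following Monday.
    return next_weekday(dump, RELOAD_WEEKDAY, RELOAD_HOUR_UTC)


edit = datetime(2020, 9, 1, 12, 0, tzinfo=timezone.utc)
print(visible_after(edit).isoformat())
```

Under these assumptions, an edit made on September 1 lands in the September 6 dump and becomes visible with the September 7 reload, which matches the timeline in this task.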

This comment was removed by Zbyszko.

Today's reload has happened, and if this query (https://tinyurl.com/y5vd95rm) is correct, there are no duplicates. @Jarekt, can you confirm?

I can confirm that my query works as expected today: all the duplicates it found were real. I did not realize that there can be a lag of up to one week between an edit and the WCQS update. The length of that lag was my concern, but it sounds like it will not be addressed in the beta version.

Gehel claimed this task.