
Fix unrecognized subjects error on redirects
Closed, Resolved, Public

Description

I hit this playing with the sync tool:
15:11:37.617 [update 0] WARN org.wikidata.query.rdf.tool.Update - Contained error syncing. Giving up on Q19324639
org.wikidata.query.rdf.tool.rdf.Munger$BadSubjectException: Unrecognized subjects: [http://www.wikidata.org/entity/statement/q2067709-034091F7-4167-4CE3-AC08-5FC432530023, http://www.wikidata.org/entity/statement/Q2067709-F626DD85-7330-45AB-B6CE-A984F9F96B48, http://www.wikidata.org/entity/statement/Q2067709-A5D7C27D-E9B4-41F8-82A2-3D75B00D5A25, http://www.wikidata.org/entity/statement/Q2067709-95543F57-8E64-47AA-89EF-71F80467268A, http://www.wikidata.org/entity/statement/Q2067709-E8B71900-6A80-4562-82CC-A8AA3671136B, http://www.wikidata.org/entity/value/4017cd5b2efe4f54f5a1bcbf305f4dfa, http://www.wikidata.org/entity/statement/Q2067709-74436C53-7C3F-4667-AAE1-FF488C67757F, http://www.wikidata.org/entity/statement/Q2067709-3816B198-A457-47BB-BD2A-DAACFF1369C0, http://www.wikidata.org/entity/statement/Q2067709-805BE5D1-AC23-48DE-8F60-0164FFCF4FAB, http://www.wikidata.org/entity/statement/Q2067709-88B4C06D-5052-430B-A579-E4F4DF3D0C2C, http://www.wikidata.org/entity/statement/Q2067709-CCB4C139-EDC9-4C58-8380-B9CC9FA60F8E, http://www.wikidata.org/entity/statement/Q2067709-D8520DF5-C1BF-45D2-8F80-2D623C90F2B4]. Expected only sitelinks and subjects starting with http://www.wikidata.org/wiki/Special:EntityData/ and http://www.wikidata.org/entity/
at org.wikidata.query.rdf.tool.rdf.Munger$MungeOperation.finishCommon(Munger.java:477) ~[classes/:na]
at org.wikidata.query.rdf.tool.rdf.Munger$MungeOperation.munge(Munger.java:201) ~[classes/:na]
at org.wikidata.query.rdf.tool.rdf.Munger.munge(Munger.java:122) ~[classes/:na]
at org.wikidata.query.rdf.tool.Update.handleChange(Update.java:255) ~[classes/:na]
at org.wikidata.query.rdf.tool.Update.access$0(Update.java:248) ~[classes/:na]
at org.wikidata.query.rdf.tool.Update$1.run(Update.java:195) ~[classes/:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_75]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_75]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]

It should be fixed.

Event Timeline

Manybubbles raised the priority of this task to Needs Triage.
Manybubbles updated the task description.
Manybubbles moved this task to Needs triage on the Discovery-ARCHIVED board.
Manybubbles subscribed.
Manybubbles set Security to None.

This seems to be caused by redirects - if you go to https://www.wikidata.org/wiki/Q9821329 you are redirected to https://www.wikidata.org/wiki/Q9821329. We need to figure out how to handle redirects. See also T69033.

Smalyshev renamed this task from Fix unrecognized subjects error to Fix unrecognized subjects error on redirects. Apr 16 2015, 5:25 PM

The current solution, until T69033 is fixed, is to ignore the redirect and treat the source entity as not existing. Once we find a solution for the RDF representation of redirects, we will add that representation back.

OK, the problem here seems to be that wikidata.org does not do an actual redirect - it just returns data for a different entity. So I guess it should be fixed on the RDF dump end. The good news is that we can probably implement different flavor handling for this.
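
To make the "data for a different entity" symptom concrete, here is a minimal sketch of how it could be detected from the statement subjects alone. It is not the actual Munger code: the class `RedirectCheck` and both method names are hypothetical, and it assumes that statement URIs have the form seen in the log above (the statement prefix, then the owning entity ID, then a hyphen and a GUID). The comparison is case-insensitive because the log contains both `q2067709` and `Q2067709`.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of the real updater tool. */
final class RedirectCheck {
    // Prefix copied from the statement URIs in the stack trace above.
    static final String STATEMENT_PREFIX = "http://www.wikidata.org/entity/statement/";

    private RedirectCheck() {}

    /** Collects the (upper-cased) entity IDs that own the given statement subjects. */
    static Set<String> statementOwners(List<String> subjects) {
        Set<String> owners = new TreeSet<String>();
        for (String subject : subjects) {
            if (!subject.startsWith(STATEMENT_PREFIX)) {
                continue; // value nodes and other subjects carry no entity ID
            }
            String local = subject.substring(STATEMENT_PREFIX.length());
            int dash = local.indexOf('-');
            if (dash > 0) {
                // Assumption: everything before the first hyphen is the entity ID.
                owners.add(local.substring(0, dash).toUpperCase(Locale.ROOT));
            }
        }
        return owners;
    }

    /** True when at least one statement belongs to an entity other than the requested one. */
    static boolean looksLikeRedirect(String requestedId, List<String> subjects) {
        String wanted = requestedId.toUpperCase(Locale.ROOT);
        for (String owner : statementOwners(subjects)) {
            if (!owner.equals(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

With the subjects from the log, `looksLikeRedirect("Q19324639", subjects)` is true because every statement is owned by Q2067709. A caller could then apply whichever policy is agreed on, for example treating the requested entity as not existing.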

No, we're waiting for the Wikidata patch for redirects, which is still WIP. For now, it's OK to throw on redirects until we figure out what to do with them (see blocking tasks).

Since we've merged the workaround, do you want to punt this back to the backlog and mark it blocked, waiting on Wikidata?

I think this is done - it doesn't error out anymore. I'll do another test when all relevant changes are deployed but I think this one is done.