Wikidata Special:EntityData not being filled with new data
Closed, Resolved (Public)

Description

https://grafana.wikimedia.org/dashboard/db/wikidata-special-entitydata

The user running the script that adds the data (analytics-wmde) does not have access to Hive / Hadoop!

addshore@stat1002:/a/analytics-wmde/src/scripts/src/wikidata$ sudo -u analytics-wmde php specialEntityData.php
2016-07-28 14:25:42 START /a/analytics-wmde/src/scripts/src/wikidata/specialEntityData.php
addshore@stat1002:/a/analytics-wmde/src/scripts/src/wikidata$ sudo -u analytics-wmde cat entitydata_errors_1.txt
log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.DailyRollingFileAppender.
FAILED: SemanticException Unable to determine if hdfs://analytics-hadoop/wmf/data/wmf/webrequest is encrypted: org.apache.hadoop.security.AccessControlException: Permission denied: user=analytics-wmde, access=READ, inode="/wmf/data/wmf/webrequest":hdfs:analytics-privatedata-users:drwxr-x---
        at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
        at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
        at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:151)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6605)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6587)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6512)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getEZForPath(FSNamesystem.java:9122)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEZForPath(NameNodeRpcServer.java:1608)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEZForPath(AuthorizationProviderProxyClientProtocol.java:926)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEZForPath(ClientNamenodeProtocolServerSideTranslatorPB.java:1343)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
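
The denial in the log above can be read as an ordinary owner/group/other permission check. Below is a minimal Python sketch, not part of the original script, that parses the message and evaluates the check. The function names are hypothetical, the group memberships passed in are assumptions, and HDFS superuser rules and ACLs are deliberately ignored.

```python
import re

# Message text copied from entitydata_errors_1.txt above.
MESSAGE = ('Permission denied: user=analytics-wmde, access=READ, '
           'inode="/wmf/data/wmf/webrequest":hdfs:analytics-privatedata-users:drwxr-x---')

# Captures the requesting user, the requested access, and the inode's
# path, owner, group and mode string (e.g. "drwxr-x---").
PATTERN = re.compile(
    r'user=(?P<user>[^,]+), access=(?P<access>\w+), '
    r'inode="(?P<path>[^"]+)":(?P<owner>[^:]+):(?P<group>[^:]+):'
    r'(?P<mode>[d-][rwxt-]{9})'
)


def parse_denial(message):
    """Extract the fields of a permission-denied message as a dict."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("not a recognised permission-denied message")
    return match.groupdict()


def permission_class(user, user_groups, owner, group):
    """Pick which permission triad applies: owner, group, or other."""
    if user == owner:
        return "owner"
    if group in user_groups:
        return "group"
    return "other"


def is_allowed(denial, user_groups):
    """Simplified check: is the requested access bit set in the triad
    that applies to this user? Only plain READ/WRITE/EXECUTE requests
    are handled in this sketch."""
    cls = permission_class(denial["user"], user_groups,
                           denial["owner"], denial["group"])
    offset = {"owner": 1, "group": 4, "other": 7}[cls]
    triad = denial["mode"][offset:offset + 3]
    flag = {"READ": "r", "WRITE": "w", "EXECUTE": "x"}[denial["access"]]
    return flag in triad


denial = parse_denial(MESSAGE)
# Assumption: analytics-wmde is not in analytics-privatedata-users,
# so the "other" triad ("---") applies and READ is refused.
print(is_allowed(denial, user_groups=set()))
```

Under that assumption the "other" triad `---` applies, which is consistent with the READ being refused; a user in the analytics-privatedata-users group would instead fall under the `r-x` triad.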

Event Timeline

Addshore moved this task from Incoming to Doing on the WMDE-Analytics-Engineering board.
Addshore moved this task from Unsorted 💣 to Back Burner 🏛️ on the User-Addshore board.

Change 301610 had a related patch set uploaded (by Addshore):
Add analytics-wmde user to role::analytics_cluster::users

https://gerrit.wikimedia.org/r/301610

Change 301610 merged by Ottomata:
Add analytics-wmde user to role::analytics_cluster::users

https://gerrit.wikimedia.org/r/301610

After discussion with @Ottomata: access to the webrequest data has never been given to a system user before, and analytics-wmde is a system user.

Thus the best way forward is to convert the script to an Oozie job, run in the same way as the ArticlePlaceholder Oozie job.
(This would have been in the long-term plan anyway.)

Change 301637 had a related patch set uploaded (by Addshore):
Stop running wikidata/specialEntityData.php in cron

https://gerrit.wikimedia.org/r/301637

Change 301638 had a related patch set uploaded (by Addshore):
Stop running wikidata/specialEntityData.php in cron

https://gerrit.wikimedia.org/r/301638

Change 301637 merged by jenkins-bot:
Stop running wikidata/specialEntityData.php in cron

https://gerrit.wikimedia.org/r/301637

Change 301638 merged by jenkins-bot:
Stop running wikidata/specialEntityData.php in cron

https://gerrit.wikimedia.org/r/301638

Change 301639 had a related patch set uploaded (by Ottomata):
Remove analytics-wmde from analytics cluster, improve docs around analytics cluster users

https://gerrit.wikimedia.org/r/301639

Change 301639 merged by Ottomata:
Remove analytics-wmde from analytics cluster, improve docs around analytics cluster users

https://gerrit.wikimedia.org/r/301639

Change 301657 had a related patch set uploaded (by Addshore):
WIP Create WikidataSpecialEntityDataMetrics

https://gerrit.wikimedia.org/r/301657

Change 301661 had a related patch set uploaded (by Addshore):
WIP Create wikidata/specialentitydata_metrics coordinator

https://gerrit.wikimedia.org/r/301661

Change 302120 had a related patch set uploaded (by Addshore):
Remove specialEntityData script

https://gerrit.wikimedia.org/r/302120

Change 302121 had a related patch set uploaded (by Addshore):
Remove specialEntityData script

https://gerrit.wikimedia.org/r/302121

Change 302120 merged by jenkins-bot:
Remove specialEntityData script

https://gerrit.wikimedia.org/r/302120

Change 302121 merged by jenkins-bot:
Remove specialEntityData script

https://gerrit.wikimedia.org/r/302121

Change 301657 merged by Nuria:
Create WikidataSpecialEntityDataMetrics

https://gerrit.wikimedia.org/r/301657

Change 301661 merged by Nuria:
Create wikidata/specialentitydata_metrics coordinator

https://gerrit.wikimedia.org/r/301661

Addshore moved this task from Active 🚁 to Closing ✔️ on the User-Addshore board.
Addshore moved this task from Doing to Done on the WMDE-Analytics-Engineering board.