
Transfer Toledo project calculations to my SWAP account
Closed, Resolved (Public)

Description

To-do:

On stat1004:

  • Set up archive directory hdfs://analytics-hadoop/user/neilpquinn-wmf/toledo_pageviews/daily: hdfs dfs -mkdir -p /user/neilpquinn-wmf/toledo_pageviews/daily
  • Copy data file from hdfs://analytics-hadoop/user/chelsyx/toledo_pageviews/daily to hdfs://analytics-hadoop/user/neilpquinn-wmf/toledo_pageviews/daily: hdfs dfs -cp /user/chelsyx/toledo_pageviews/daily/data.gz /user/neilpquinn-wmf/toledo_pageviews/daily
  • Create the table neilpquinn.toledo_pageviews (script P8767), then verify it (see the verification sketch after this list)
  • Copy oozie files to hdfs://analytics-hadoop/user/neilpquinn-wmf/
    • cp -r /home/chelsyx/oozie_for_neil/ /home/neilpquinn-wmf/oozie/
    • hdfs dfs -put /home/neilpquinn-wmf/oozie/ /user/neilpquinn-wmf/
  • Deploy the oozie job, verify that it works correctly: oozie job -run -Duser=neilpquinn-wmf -config ./oozie/toledo_pageviews/daily/coordinator.properties -Dstart_time=2019-07-17T00:00Z
  • Shut down the old Oozie job (ID 0135672-181112144035577-oozie-oozi-C)
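
A quick verification sketch for the steps above; the paths, table name, and coordinator ID come from this list, but the specific checks are only suggestions:

# Did the data file land in the new archive directory?
hdfs dfs -ls /user/neilpquinn-wmf/toledo_pageviews/daily
# Does the new table exist and return rows?
hive -e "SELECT COUNT(*) FROM neilpquinn.toledo_pageviews"
# Is the new coordinator running, before the old one is killed?
oozie jobs -jobtype coordinator -filter user=neilpquinn-wmf
# Shut down the old coordinator once the new one has a successful run:
oozie job -kill 0135672-181112144035577-oozie-oozi-C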

On notebook1004:

  • Replace all instances of chelsyx with neilpquinn-wmf/neilpquinn in the code and notebooks of the GitHub repo
    • Done. Please run git clone https://github.com/wikimedia-research/Audiences-External_automatic_translation.git external-automatic-translation
  • Copy the file external_machine_translation_edits_revert.tsv to neilpquinn-wmf: cp /home/chelsyx/external-automatic-translation/external_machine_translation_edits_revert.tsv /home/neilpquinn-wmf/external-automatic-translation/
  • Test the notebook to see if everything works correctly
  • Change permission for the publishing directory:
    • chmod g+rwx /srv/published-datasets/external-automatic-translation
    • chmod g+rw /srv/published-datasets/external-automatic-translation/impact\ of\ external\ automatic\ translation\ services.html
  • Create a .bash_profile in the home directory and add:
[[ -r ~/.bashrc ]] && . ~/.bashrc
export PATH=${PATH}:~/venv/bin
export http_proxy=http://webproxy.eqiad.wmnet:8080
export https_proxy=http://webproxy.eqiad.wmnet:8080
  • Install mwapi and mwreverts, then try to run external-automatic-translation/fetch_edit_check_revert.py and see if it works correctly
  • Run update_publish_notebook.sh and check notebook_update.log to see if there is any error
  • Deploy the cron job: crontab -e, then paste this line: 0 2 * * * /home/neilpquinn-wmf/external-automatic-translation/update_publish_notebook.sh
  • Verify that the cron job is running daily (see the sketch after this list)
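
A sketch for checking the last two steps; the notebook_update.log path is an assumption based on where the script lives, so adjust it if the log is written elsewhere:

# Is the crontab entry installed?
crontab -l | grep update_publish_notebook
# After the next 02:00 UTC run, look for errors in the log:
tail -n 50 /home/neilpquinn-wmf/external-automatic-translation/notebook_update.log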

Event Timeline

Neil_P._Quinn_WMF moved this task from Triage to Next Up on the Product-Analytics board.
chelsyx triaged this task as High priority. Jul 17 2019, 7:23 PM
chelsyx moved this task from Next Up to Doing on the Product-Analytics board.
chelsyx updated the task description. Jul 17 2019, 7:59 PM
chelsyx updated the task description. Jul 17 2019, 9:06 PM
chelsyx updated the task description. Jul 17 2019, 10:52 PM
chelsyx updated the task description. Jul 18 2019, 9:38 PM
chelsyx updated the task description. Jul 18 2019, 9:40 PM
chelsyx added a subscriber: Ottomata. Edited Jul 18 2019, 10:25 PM

Hi @Ottomata, we are trying to transfer ownership of the Toledo notebook to @Neil_P._Quinn_WMF (see the task description for more details). One of the steps is to create an Oozie job to update the neilpquinn.toledo_pageviews table daily (code for the oozie job). We tried to deploy the job, but encountered an error:

oozie job -log 0003462-190715143115257-oozie-oozi-W
-------------------------------------
......
2019-07-18 20:04:44,090  WARN HiveActionExecutor:523 - SERVER[an-coord1001.eqiad.wmnet] USER[neilpquinn-wmf] GROUP[-] TOKEN[] APP[toledo_pageviews-daily-wf-2019-7-17] JOB[0003462-190715143115257-oozie-oozi-W] ACTION[0003462-190715143115257-oozie-oozi-W@get_toledo_pageviews_daily] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [1]
2019-07-18 20:04:44,108  INFO ActionEndXCommand:520 - SERVER[an-coord1001.eqiad.wmnet] USER[neilpquinn-wmf] GROUP[-] TOKEN[] APP[toledo_pageviews-daily-wf-2019-7-17] JOB[0003462-190715143115257-oozie-oozi-W] ACTION[0003462-190715143115257-oozie-oozi-W@get_toledo_pageviews_daily] ERROR is considered as FAILED for SLA
......

I don't know how to interpret this error message, and I didn't find any useful information in the Hadoop job log either: https://yarn.wikimedia.org/jobhistory/job/job_1561367702623_92584

Interestingly, the same Oozie job, which has been running under my account for ~4 months, ran into the same error today: https://hue.wikimedia.org/oozie/list_oozie_workflow/0003655-190715143115257-oozie-oozi-W/

Any suggestions for troubleshooting?

chelsyx updated the task description. Jul 19 2019, 3:11 AM
chelsyx updated the task description. Jul 19 2019, 5:02 PM

@chelsyx I reran update_publish_notebook.sh and it looks like all the problems are solved, except that the script still can't copy the HTML notebook to the published-datasets folder even though you changed the permissions.

Here's the most recent output of notebook_update.log:

Fri Jul 19 16:29:25 UTC 2019
You can find the source for `wmfdata` at https://github.com/neilpquinn/wmfdata
Querying edits from external machinetranslation...
Checking revert...
viwiki start!
viwiki completed in 4 s
bnwiki start!
bnwiki completed in 0 s
mlwiki start!
mlwiki completed in 0 s
urwiki start!
urwiki completed in 0 s
cebwiki start!
cebwiki completed in 0 s
idwiki start!
idwiki completed in 0 s
eswiki start!
eswiki completed in 0 s
enwiki start!
enwiki completed in 0 s
hiwiki start!
hiwiki completed in 0 s
[NbConvertApp] Converting notebook external-automatic-translation/impact of external automatic translation services.ipynb to html
[NbConvertApp] Executing notebook with kernel: python3
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/07/19 16:32:24 WARN Utils: Service 'sparkDriver' could not bind on port 12000. Attempting port 12001.
19/07/19 16:32:24 WARN Utils: Service 'sparkDriver' could not bind on port 12001. Attempting port 12002.
19/07/19 16:32:24 WARN Utils: Service 'sparkDriver' could not bind on port 12002. Attempting port 12003.
19/07/19 16:32:24 WARN Utils: Service 'sparkDriver' could not bind on port 12003. Attempting port 12004.
19/07/19 16:32:24 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/07/19 16:32:24 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/07/19 16:32:24 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/07/19 16:32:24 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19/07/19 16:32:31 WARN Utils: Service 'org.apache.spark.network.netty.NettyBlockTransferService' could not bind on port 13000. Attempting port 13001.
19/07/19 16:32:31 WARN Utils: Service 'org.apache.spark.network.netty.NettyBlockTransferService' could not bind on port 13001. Attempting port 13002.
19/07/19 16:32:31 WARN Utils: Service 'org.apache.spark.network.netty.NettyBlockTransferService' could not bind on port 13002. Attempting port 13003.
19/07/19 16:32:31 WARN Utils: Service 'org.apache.spark.network.netty.NettyBlockTransferService' could not bind on port 13003. Attempting port 13004.
19/07/19 16:33:09 WARN NioEventLoop: Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@220f77d5.
19/07/19 16:34:10 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
19/07/19 16:35:59 WARN SharedInMemoryCache: Evicting cached table partition metadata from memory due to size constraints (spark.sql.hive.filesourcePartitionFileCacheSize = 262144000 bytes). This may impact query planning performance.
[NbConvertApp] Writing 753214 bytes to external-automatic-translation/impact of external automatic translation services.html
cp: cannot create regular file '/srv/published-datasets/external-automatic-translation/impact of external automatic translation services.html': Read-only file system
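
That final cp error points at the file system being mounted read-only rather than at a permissions problem. A minimal check sketch, using the path from the error above (the findmnt call is my suggestion, not something we've run yet):

# Show the mount backing the publish path; "ro" in the OPTIONS column means read-only:
findmnt --target /srv/published-datasets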
Neil_P._Quinn_WMF updated the task description.

Okay, we've taken care of all the issues with the notebook and the publication script.

Now, I just need to figure out how to get the Oozie job running under my account and verify that the cron is running properly and updating the notebook daily.

Ok! Let us know either here or on IRC if you need help.

@Ottomata, I should've said "now, I just need someone like @Ottomata to help me figure out the Oozie job" 😂

Can you take a look at Chelsy's question (T228195#5348331)?

oozie job -info 0003462-190715143115257-oozie-oozi-W@get_toledo_pageviews_daily
...
Console URL       : http://an-master1001.eqiad.wmnet:8088/proxy/application_1561367702623_92584/
# This must be run as the user that launched the job. In this case, neilpquinn-wmf
yarn logs -applicationId application_1561367702623_92584
...
2019-07-18 20:04:42,349 [main] ERROR hive.ql.exec.DDLTask  - org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=neilpquinn-wmf, access=WRITE, inode="/tmp/toledo_pageviews":chelsyx:hdfs:drwxr-xr-x

It looks like this job creates a directory in /tmp but doesn't delete it afterwards. The previously created directory is owned by chelsyx and not writable by neilpquinn-wmf.
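
A possible fix sketch, using the /tmp/toledo_pageviews path from the error above; either command should unblock the job, run as chelsyx or an HDFS admin:

# Remove the stale directory so the next run recreates it under the new user:
hdfs dfs -rm -r /tmp/toledo_pageviews
# ...or hand ownership over instead of deleting:
hdfs dfs -chown -R neilpquinn-wmf /tmp/toledo_pageviews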

Neil_P._Quinn_WMF lowered the priority of this task from High to Normal. Jul 30 2019, 1:01 PM
Neil_P._Quinn_WMF closed this task as Resolved. Aug 2 2019, 7:37 PM
Neil_P._Quinn_WMF updated the task description.
Neil_P._Quinn_WMF added a subscriber: mforns.

All working now! Thanks to @Ottomata for the pointer and to @mforns for proactively checking in with me after he noticed all the failing jobs! And of course, thanks to @chelsyx for a nice, detailed set of instructions and a good opportunity to learn how Oozie works :)