Create cron job in puppet sqooping prod and labs DBs
Closed, Resolved; Public; 3 Estimated Story Points

Description

Deployment needs to be synchronized with the updates to the sqoop script and the grouped wikis files (https://gerrit.wikimedia.org/r/#/c/341586/, https://gerrit.wikimedia.org/r/#/c/341586/)

These are the commands we use when sqooping manually for a given month YYYY-MM:

export PYTHONPATH=$PYTHONPATH:/srv/deployment/analytics/refinery/python

# For PROD:

# This command does NOT check if there's an already running one
sudo -u hdfs /srv/deployment/analytics/refinery/bin/download-project-namespace-map -x /wmf/data/raw/mediawiki/project_namespace_map -i prod -V YYYY-MM

# This command normally checks if there's an already running one
sudo -u hdfs python3 /srv/deployment/analytics/refinery/bin/sqoop-mediawiki-tables \
     --jdbc-host analytics-store.eqiad.wmnet \
     --output-dir /wmf/data/raw/mediawiki/tables \
     --wiki-file "/mnt/hdfs/wmf/refinery/current/static_data/mediawiki/grouped_wikis/prod_grouped_wikis.csv" \
     --timestamp YYYYMM01000000 \
     --user research \
     --infra prod \
     --version YYYY-MM \
     --password-file /user/hdfs/mysql-analytics-research-client-pw.txt

# For LABS:

# This command does NOT check if there's an already running one
sudo -u hdfs /srv/deployment/analytics/refinery/bin/download-project-namespace-map -x /wmf/data/raw/mediawiki/project_namespace_map -i labs -V YYYY-MM

# This command normally checks if there's an already running one
sudo -u hdfs python3 /srv/deployment/analytics/refinery/bin/sqoop-mediawiki-tables \
     --jdbc-host labsdb-analytics.eqiad.wmnet \
     --output-dir /wmf/data/raw/mediawiki/tables \
     --wiki-file "/mnt/hdfs/wmf/refinery/current/static_data/mediawiki/grouped_wikis/labs_grouped_wikis.csv" \
     --timestamp YYYYMM01000000 \
     --user TBC \
     --infra labs \
     --version YYYY-MM \
     --password-file TBC
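
Since download-project-namespace-map does not check for an already running instance, a scheduled run could overlap with a manual one. The following is a minimal sketch of one possible guard, assuming `flock` from util-linux is available on the host. The lock file path and the `run_exclusively` helper are hypothetical names used for illustration; they are not part of refinery or of the puppet change.

```shell
#!/bin/bash
# Hypothetical lock file location (an assumption, not taken from the task).
LOCK_FILE="${LOCK_FILE:-/tmp/download-project-namespace-map.lock}"

# run_exclusively CMD [ARGS...]
# Runs the given command only if no other process currently holds the lock.
# With -n, flock does not wait: it fails right away (exit status 1 by default)
# when the lock is already taken.
run_exclusively() {
    flock -n "$LOCK_FILE" "$@"
}

# Example for PROD, mirroring the manual command above (left commented out):
# run_exclusively sudo -u hdfs \
#     /srv/deployment/analytics/refinery/bin/download-project-namespace-map \
#     -x /wmf/data/raw/mediawiki/project_namespace_map -i prod -V YYYY-MM
```

A cron entry could then call the wrapped command, so that an overlapping invocation exits immediately instead of running the download twice.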

Event Timeline

Ottomata changed the point value for this task from 0 to 3. Mar 10 2017, 6:21 PM
Ottomata moved this task from Next Up to In Progress on the Analytics-Kanban board.

Change 344165 had a related patch set uploaded (by Ottomata):
[operations/puppet] Add analytics labsdb pw in hdfs, add logrotate for refinery logs

https://gerrit.wikimedia.org/r/344165

Change 344165 merged by Ottomata:
[operations/puppet@production] Add analytics labsdb pw in hdfs, add logrotate for refinery logs

https://gerrit.wikimedia.org/r/344165