Basic access / overview:
[] [[https://wikitech.wikimedia.org/wiki/SRE/Production_access#Setting_up_your_access|ssh config setup]] (the gerrit-related config can be removed) and confirm access to stat1008: `ssh appledora@stat1008.eqiad.wmnet`
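A minimal sketch of what the resulting `~/.ssh/config` stanza can look like (the bastion hostname and key filename below are illustrative -- copy the exact config from the wiki page above):
```
# Illustrative only; use the bastion host and key from the Production_access page.
Host *.wmnet
    User appledora
    ProxyJump bast1003.wikimedia.org   # hypothetical bastion; pick yours from the wiki list
    IdentityFile ~/.ssh/id_ed25519.wmf_prod
```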
[] configure an [[https://wikitech.wikimedia.org/wiki/HTTP_proxy|HTTP proxy]] by adding the following to `~/.profile` on stat1008:
```
export http_proxy=http://webproxy:8080
export https_proxy=http://webproxy:8080
```
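To sanity-check that the variables are picked up, a self-contained sketch (the commented-out curl line is what you would run interactively on stat1008 to confirm outbound access actually works):
```shell
# Write the exports to a scratch file, source it, and confirm the
# variables are visible. In practice you would source ~/.profile itself.
profile=$(mktemp)
cat > "$profile" <<'EOF'
export http_proxy=http://webproxy:8080
export https_proxy=http://webproxy:8080
EOF
. "$profile"
echo "http_proxy=$http_proxy"
echo "https_proxy=$https_proxy"
rm -f "$profile"
# On stat1008 this should then succeed through the proxy:
# curl -sI https://wikitech.wikimedia.org | head -n 1
```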
[] HDFS + kinit-ing
* verify you can access Hive (on stat1008, run `kinit` to get a Kerberos ticket, then `hive`, and at the hive prompt run `show databases;`)
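The same check can be scripted non-interactively -- a sketch assuming `klist` and the `hive` CLI are on the path (as on the stat hosts); `klist -s` exits non-zero when there is no valid ticket:
```shell
# Fail fast if there is no Kerberos ticket before invoking hive.
if klist -s 2>/dev/null; then
    hive -e 'show databases;'
else
    echo "No valid Kerberos ticket - run kinit first"
fi
```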
[] PySpark + Jupyter notebooks
* Verify you can access the [[https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter#Access|Jupyter cluster]] from your local computer: run `ssh -N stat1008.eqiad.wmnet -L 8880:127.0.0.1:8880`, then navigate to http://localhost:8880/ in your web browser and verify you can log in (shell username + Wikitech password). NOTE: requires LDAP access (T322222)
* Create a bash function for the above ssh command so you don't have to remember it -- e.g., add the following to your `.bash_profile` or `.bash_aliases` file so that typing `JUPYTER` in your terminal connects you:
```
function JUPYTER() {
    ssh -N "stat1008.eqiad.wmnet" -L 8880:127.0.0.1:8880;
}
```
Working with PySpark:
[] Walk through example notebook together -- [[https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter#PySpark_and_wmfdata|wmfdata]], SparkSQL queries, etc.
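As a starting point for that walkthrough, something like the following -- a hedged sketch, not the canonical notebook: it assumes the `wmfdata` package available on the stat hosts / Jupyter images and the `wmf.pageview_hourly` table described on the Data Lake page, and the fields/partitions chosen here are illustrative:
```python
# Illustrative SparkSQL query against the Data Lake; adjust the
# partition values (year/month/day) to a date that exists.
query = """
SELECT project, SUM(view_count) AS views
FROM wmf.pageview_hourly
WHERE year = 2023 AND month = 1 AND day = 1
GROUP BY project
ORDER BY views DESC
LIMIT 10
"""

try:
    import wmfdata
    # wmfdata.spark.run executes the query and returns a pandas DataFrame.
    top_projects = wmfdata.spark.run(query)
    print(top_projects.head())
except ImportError:
    print("wmfdata is not installed here - run this from Jupyter on a stat host")
```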
[] Data on the cluster: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake
Backlog:
[] Superset: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Superset
* example of top-viewed articles on a given day/wiki: https://superset.wikimedia.org/superset/sqllab?savedQueryId=355
[] [[https://wikitech.wikimedia.org/wiki/Analytics/Systems/Clients|stat1008]] access to the dumps / MariaDB replicas