
Publishing html files generated on notebook hosts
Closed, Resolved | Public | 3 Estimated Story Points

Description

I want to schedule a cron job to update and publish a Jupyter notebook on notebook1003/1004 daily, but the current publishing solutions cannot publish automatically. As a workaround, we could set up something on the notebook hosts like /srv/published-datasets on the stat* boxes, so that HTML files generated from Jupyter notebooks can be copied to this directory and then shared with the public. This would make automatic publishing from the same notebook host possible, because no password is required.

Event Timeline

Ottomata triaged this task as Medium priority.
Ottomata added a project: Analytics-Kanban.
Ottomata set the point value for this task to 3.

Change 494501 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Sync /srv/published-datasets from SWAP hosts

https://gerrit.wikimedia.org/r/494501

Change 494501 merged by Ottomata:
[operations/puppet@production] Sync /srv/published-datasets from SWAP hosts

https://gerrit.wikimedia.org/r/494501

@chelsyx you should now be able to put files in /srv/published-datasets on notebook hosts. Files there will eventually show up at https://analytics.wikimedia.org/datasets. You should be able to manually run the sync by running published-datasets-sync on a notebook (or stat) host.
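For illustration, the copy step described above could be sketched in Python roughly as follows. This is only a sketch under assumptions: it expects the notebook to have been rendered to HTML already, and the helper name `publish_html` and the subdirectory `notebook-reports` are hypothetical, not anything defined by the puppet change.

```python
import shutil
from pathlib import Path

# Directory named in this task; files placed here are picked up by the sync
# and eventually appear under https://analytics.wikimedia.org/datasets.
PUBLISHED_DIR = Path("/srv/published-datasets")


def publish_html(html_file, published_dir=PUBLISHED_DIR, subdir="notebook-reports"):
    """Copy an already-rendered HTML file into the published directory.

    `subdir` is a hypothetical folder name used for illustration only.
    Returns the path of the copied file.
    """
    source = Path(html_file)
    if not source.is_file():
        raise FileNotFoundError(f"no such file: {source}")

    target_dir = Path(published_dir) / subdir
    target_dir.mkdir(parents=True, exist_ok=True)

    # copy2 also preserves the file's modification time.
    return Path(shutil.copy2(source, target_dir / source.name))
```

A daily job would call something like `publish_html("report.html")` after regenerating the report; the file name here is a placeholder.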

Thank you so much @Ottomata ! Works like a charm!

Hi @Ottomata, published-datasets-sync works well when I (user chelsyx) execute it manually. But when I execute it via a cron job, I get published-datasets-sync: command not found. Do you have any idea why this is happening?

It is in /usr/local/bin, perhaps your PATH doesn't contain it!

But! You shouldn't need to execute it in a cron, it already runs every 15 minutes in another cron.
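For anyone hitting the same "command not found" error with other commands, here is a hedged sketch of the PATH issue. Cron typically runs jobs with a much shorter PATH than an interactive shell (often just /usr/bin:/bin), so a script started from cron may need to look in /usr/local/bin explicitly. The helper name `resolve_command` is hypothetical; `shutil.which` is the standard-library lookup.

```python
import os
import shutil


def resolve_command(name, extra_dirs=("/usr/local/bin",), base_path=None):
    """Look up `name` on the PATH, also searching `extra_dirs`.

    Returns the full path to the executable, or None if it is not found.
    `base_path` defaults to the PATH of the current process, which under
    cron is usually shorter than in a login shell.
    """
    if base_path is None:
        base_path = os.environ.get("PATH", "")

    search_dirs = [d for d in base_path.split(os.pathsep) if d]
    for extra in extra_dirs:
        if extra not in search_dirs:
            search_dirs.append(extra)

    return shutil.which(name, path=os.pathsep.join(search_dirs))
```

If `resolve_command("published-datasets-sync")` returns a path, that full path can be passed to `subprocess.run`. As noted above, though, the sync already runs every 15 minutes, so a scheduled job only needs to write its files into /srv/published-datasets.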