Add Cron Job Functionality to PAWS (Outreachy internship)
Closed, Declined, Public

Description

NOTE: This project has received fewer contributions and is actively seeking new contributors.

JupyterHub lacks an extension to interact with Kubernetes Job or CronJob objects. This task is for an Outreachy intern to develop that extension for PAWS, building on T124972.

Requirements:

  • Allow users to run notebooks on a schedule, like Cron
  • Allow users to schedule arbitrary commands of the kind they can run from PAWS terminals
  • Allow users to edit or delete their scheduled CronJobs
  • Only allow users access to their own CronJobs
  • Do so securely (don't leak k8s credentials, don't give full access to the k8s cluster, etc.)
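To illustrate the scheduling, ownership, and security requirements above, here is a minimal Python sketch that builds a per-user Kubernetes CronJob manifest as a plain dict and checks ownership before an edit or delete. It is a hypothetical design, not PAWS code: the owner label key `paws.example/owner`, the placeholder image, and the helper names are assumptions, and volume mounts (such as the user's NFS home), resource limits, and security context are omitted. Notebooks are executed with `jupyter nbconvert --to notebook --execute --inplace`. The manifest uses `batch/v1`, the CronJob API version on Kubernetes 1.21 and later; clusters from 2020 served CronJobs as `batch/v1beta1`.

```python
from __future__ import annotations

# Hypothetical label recording which PAWS user owns a CronJob.
OWNER_LABEL = "paws.example/owner"
# Placeholder image; a real deployment would presumably reuse the PAWS single-user image.
IMAGE = "example.org/paws-singleuser:latest"


def notebook_command(path: str) -> list[str]:
    """Command that executes a notebook in place with nbconvert."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", path]


def build_cronjob(owner: str, name: str, schedule: str, command: list[str]) -> dict:
    """Return a CronJob manifest (as a dict) that runs `command` on `schedule`.

    `owner` is assumed to already be a valid Kubernetes label value.
    """
    if len(schedule.split()) != 5:
        raise ValueError("schedule must have five cron fields, e.g. '0 3 * * *'")
    labels = {OWNER_LABEL: owner}
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "schedule": schedule,
            # Skip a run if the previous one is still going.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {
                "spec": {
                    "template": {
                        "metadata": {"labels": dict(labels)},
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {"name": "task", "image": IMAGE, "command": command}
                            ],
                        },
                    }
                }
            },
        },
    }


def can_modify(cronjob: dict, user: str) -> bool:
    """Users may only edit or delete CronJobs labelled with their own name."""
    return cronjob.get("metadata", {}).get("labels", {}).get(OWNER_LABEL) == user
```

In this sketch, a server-side component holding the cluster credentials would submit the manifest, so users never see k8s credentials. Trusting the owner label only works if users cannot create or relabel CronJobs directly.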

Skills required:

  • Python and ideally Jupyter notebooks
  • Some Javascript/HTML
  • Kubernetes concepts and usage
  • nginx configuration
  • Docker familiarity

Mentors: @Bstorm, @Chicocvenancio, @aborrero

Microtasks:

Event Timeline

I'm interested! Thanks for thinking of me :-)

Give me a couple of days to 100% confirm my participation; I need to coordinate with my team and see if this is something we can fit into our schedule.

Also, to take into account: there is a strong chance that I will be on vacation during the next northern hemisphere summer (Jun, Jul, Aug, Sept?) for a yet-to-be-specified period of between 1 and 3 weeks.
We should make sure the relevant students aren't left "unattended" during that period.

That should not be an issue. I'm planning to be off for a two-week period in May (around Wikimedia-Hackathon-2020), so we need to have plans ready before then; if I'm understanding the timeline correctly, interns should not have started yet.

@Bstorm, could we make this a three mentor task?

Regarding the task idea:

T124972: PAWS cron functionality is my favorite. It is heavy on frontend development, but at its core it is a task to develop an extension to manage CronJobs in k8s from the JupyterLab or classic Jupyter Notebook interface, focused on the PAWS environment in Cloud-Services. So it also involves k8s concepts, Cloud Services infrastructure, and several monitoring and metrics concepts familiar to operations engineers (T188684 would likely be part of this project, for example).

T139036: Add possibility to run other users notebooks by copying to own folder, together with improving the existing PAWS infrastructure around public links and storage access, is another idea I like. It is much more operations-centered, as it would involve NFS in Cloud-Services, volumes in k8s, and the way PAWS currently sets up storage for each user pod. It would also likely include deploying the nbviewer service to replace the current paws-public conversion.

T218737: Analyse offering a limited use, customized PAWS for events is the third one I can think of right now. We could wrap it into creating the tooling to manage The Littlest JupyterHub (TLJH) deployments for conferences and events. I favor this one the least.

A crazy, way-out-there idea I have is to create a BYOC (Build Your Own Container) system based on repo2docker and BinderHub.

@aborrero, @Bstorm, any other ideas for projects? I would love to chat and brainstorm with you about this.

Should this be a subtask of T241019?

@aborrero, @Bstorm Any movement on this? Should we drop the plan to submit the project to GSOC and focus on Outreachy?

Sorry, All Hands isn't helping me focus on this. I'll check on how it relates to T241019 today (and talk to people on that who might help me know what to do next here). I am finding my limited involvement in PAWS is not helping me scope these tasks out.

When is the deadline for GSOC submission again? If it is right after this week, we might want to focus on Outreachy. Travel is getting in the way of things quite a bit for me.

For options:
In a way, I like the second option most because it isn't quite as dependent on building workarounds to the Kubernetes user security model. I think WMCS could deliver a service account appropriately limited to the task in a rebuilt PAWS cluster in coming quarters that would open that option nicely...but will it be in time?
It is harder to do damage to a cluster with volumes than it is with spinning up pods directly via crons.
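The concern about spinning up pods via crons can be made concrete with Kubernetes RBAC. Below is a hypothetical Role, written as a Python dict, that a dedicated service account could be bound to (via a RoleBinding, not shown): it can manage CronJobs and read Jobs in one namespace and nothing else. The names `paws-cron-manager` and `paws-cron` are assumptions, and `allows()` is a deliberately simplified checker (no wildcards, `resourceNames`, or subresources), not how the API server actually evaluates RBAC.

```python
# Hypothetical Role for a service account that manages CronJobs on users' behalf.
ROLE = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": "paws-cron-manager", "namespace": "paws-cron"},
    "rules": [
        {
            "apiGroups": ["batch"],
            "resources": ["cronjobs"],
            "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"],
        },
        {
            # Read-only access to Jobs so a UI could show run status.
            "apiGroups": ["batch"],
            "resources": ["jobs"],
            "verbs": ["get", "list", "watch"],
        },
    ],
}


def allows(role: dict, api_group: str, resource: str, verb: str) -> bool:
    """Simplified RBAC check: ignores wildcards, resourceNames and subresources."""
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in role["rules"]
    )
```

Note that creating a CronJob still creates pods with whatever spec the CronJob contains, so a Role like this narrows the API surface but would need to be paired with admission controls (such as Pod Security admission) to limit what those pods can do.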

Overall, I think with all of this I need to clarify my own understanding a bit to really say anything intelligent, and I'm not sure I have the time to dig in deeply this week. I will try a bit and talk to folks here.

No problem, @aborrero alerted me to All Hands on IRC.

The GSOC submission deadline is February 5, 19:00 UTC.

Unless you feel otherwise, I think we should hold this until y'all are back from All Hands. That likely means we go with Outreachy, but that seems fine to me.

After being properly informed of the differences between GSOC and Outreachy, I now very much think it is better that we focus on Outreachy. The deadline alone means we'd have very little time to plan for GSOC, and it also seems to me that the Outreachy stipend for the intern makes it a program I can support more easily.

Hi all! Super excited to see that you are considering to mentor a PAWS project via Outreachy. Sharing some helpful tips and next steps:

  • If you haven't mentored a project via Outreachy before, I would encourage you to read the Mentors' guide here https://www.mediawiki.org/wiki/Outreachy/Mentors.
  • Little bit on the timeline – prospective interns will start coming to your project looking for small tasks to contribute to as part of the application process which runs from March 5th to April 7th. You will need to be available during this time to support the interns as and when they contribute to the project and work on their application. May 19th to August 18th will be the coding period, where you need to be available slightly more. 4-5 hrs per week of commitment is recommended, but if there are three mentors then you can divide the workload :)
  • On deciding the scope of the project – an ideal project is one that can be completed by an experienced developer in 2 weeks as that might take a newcomer 3 months to finish.
  • Once you decide to promote this project, let me know. I will then have to make this task private and restricted to the acl*outreachy-mentors group (as per the program guidelines; you all will still be able to see the task) until the contribution period opens on March 5th. I will also share with you some next steps for uploading the project proposal on the Outreachy site.

Feel free to ask more questions here or by email :) Thank you for your willingness.

@aborrero and I had a meeting today to discuss some of the concerns and details about this. A couple things came out of it:

  • We need to ensure there is a development environment ready for an intern no matter what to proceed with these. I'm going to check what limitations there are on deploying PAWS to a local environment with something like microk8s or minikube. I have doubts about this option, but it's worth checking that (unless @Chicocvenancio has the answer to that already).
  • There is a requirement already to rebuild the PAWS cluster in a repeatable fashion, which could serve as the dev environment...unless building that repeatable setup is the actual Outreachy project (and this isn't the worst idea except that Kubernetes is ridiculously complex).
  • We need to review exactly how PAWS uses NFS/storage. Current members of WMCS haven't interacted with that much yet. Working around it when setting up a sandbox might not be hard.
  • We want to know if we can set expectations around timezone overlap (@srishakatux can we do that)?
  • T218737: Analyse offering a limited use, customized PAWS for events sounds awesome, but it might be too complex for the program, unless a local (complete) dev environment is actually doable.
  • I'm going to double check some questions about how we fit performance time expectations into mentoring, etc. today.
  • I think we can remove a lot of the concerns and blockers by prioritizing a rebuild of the current cluster with this in mind (which would likely fit into the timetable ok at least scoped to a dev environment at worst), which will ensure that all those involved have a greater degree of comfort with the general setup and maintenance.
    • Embedded in this last point is that we need to work fast to decide which kind of dev environment is most achievable in a short time frame to lay out the specifics of one of the project options (since the specifics would be somewhat different if it is local or in VPS). Note this is about design of the tasks, not about building said dev environment right away.

Just satisfied some of my curiosity around NFS. It seems PAWS uses NFS almost as if it were a Toolforge tool, which makes a lot of things make more sense to me now.

Yes, you can totally set expectations around timezone overlap, and that is recommended as well. You can decide on a communication medium that works best for everyone (the one we recommend is Zulip) for asynchronous chats, and then a time each week that works for everyone for a face-to-face meeting.

At this point, I am feeling like the best options have technical blockers. My thought is that we would really like to do this, but that we should probably dedicate time in the next quarter to making PAWS more contributor-friendly instead. @Chicocvenancio and @aborrero, does that seem wrong? Every time I have sat down to discuss and design an intern project, I see things we really need to do first.

After discussing in IRC, we believe we *can* provide sufficient dev environment to make one of these work. Sooo, belay my last remark.

srishakatux changed the visibility from "Public (No Login Required)" to "acl*outreachy-mentors (Project)". Feb 10 2020, 7:12 PM

@Bstorm @aborrero @Chicocvenancio You have until February 25th to refine the project proposal, modify this Phabricator task description, and upload the proposal and sign up as mentors on the Outreachy site. Ideally, one of you uploads the proposal, and all three of you sign up as mentors on the site. The Phab task should have a project title, description/requirements, skills required, mentors, and microtasks. The Outreachy contribution period runs from March 5th, 2020 to April 7th, 2020. This is the period when prospective applicants are supposed to contribute small tasks related to your project. It is a great opportunity for them to get familiar with your project and for you to point them to minor issues that need help. If you don't have time during this period to support/mentor prospective interns as they contribute small tasks related to your project, you can point them to https://www.mediawiki.org/wiki/Good_first_bugs, which they can work on independently. The coding period won't start until May 19th. Let me know if you have questions :)

FYI, as @srishakatux suggested last year, discussed in IRC, and discussed with @bd808, @yuvipanda has moved the main repo to the toolforge GitHub organization at my request. https://github.com/toolforge/paws is where the production PAWS code lives now.

Bstorm triaged this task as Medium priority. Feb 11 2020, 9:57 PM

I'm gonna call it and say let's work on the cronjob task for this if nobody objects. We need to get working on things. This also makes me want to add some "good first bug" tags to WMCS things that are not really related for the interim period.

So if we do that:
Title: Add Cron Job Functionality to PAWS
Description: JupyterHub lacks an extension to interact with Kubernetes CronJob objects. We'd like to add that to the PAWS cluster (help me out here @Chicocvenancio ?)
Requirements: <stolen from Chico's comment in the other task>

  • Allow users to run notebooks on a schedule, like Cron
  • Allow users to run arbitrary terminal commands that can be run from PAWS' terminals on a schedule
  • Allow access to delete or edit scheduled CronJobs
  • Only allow users access to their own CronJobs
  • Do so securely (don't leak k8s credentials, don't give full access to the k8s cluster, etc.)

Microtasks: @srishakatux Is this where we break down the work into chunks or is this about other tasks that people can work on during the "contribution period"?

@Bstorm The ideal scenario is that you provide small tasks related to PAWS that can help prospective candidates get familiar with the project. But as I said before, if you don't have time during the contribution period to support/mentor prospective interns as they contribute small tasks related to PAWS, you can point them to https://www.mediawiki.org/wiki/Good_first_bugs, which they can work on independently :)

Yeah, I'm definitely going to need a bit of help submitting the project (looking at the Outreachy website) @Chicocvenancio . It's the descriptions that I feel a bit weak on still, not really having used PAWS enough. I get the basic notebook idea, but I'm a little fuzzy on the exact use within our contexts.

Alternatively, I can just fudge it as best I can so that we don't miss the deadline.
Where I'm interested in help there is:
Long description: (Most help here)
Minimum system requirements: (I'm thinking the requirements for minikube and jupyterhub roughly, which I can probably come up with on my own unless you have something in mind)
How can applicants make a contribution to your project? (Do we have anything easy or low-hanging on PAWS? If not, I can try to find things that Bryan hasn't snatched on Toolforge python or something related to Kubernetes documentation etc.)
(Optional) Description of possible internship tasks. What smaller tasks will they start on? What is the main task or tasks for the internship? Do you have any optional stretch goals? (It might not be bad to at least go back and forth on this before submitting, though this part is optional.)

+1 to everything above.

Thinking on the timeline (or tasks) for the student, this could be an early/rough proposal:

  1. get familiar with PAWS. Register, use it, see what users can do with it. Learn roughly how we deploy it. Basic understanding of CloudVPS and related WMCS stuff.
  2. get familiar with Kubernetes. Install it locally using minikube or any other similar setup. Learn most important API objects and how to work with them.
  3. learn about the JupyterHub project and its architecture. Does it allow plugins or other extensions?
  4. try deploying jupyterhub directly into your minikube cluster. The idea is to have a similar setup to PAWS but locally.
  5. start thinking on the cronjob functionalities (T243459#5878917). Investigate and test different options and discuss them with mentors.
  6. iterate on the previous point until more formal proposals/patches are available.
  7. work with mentors on trying to get the proposal/patches merged or implemented.
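One practical detail for steps 2 and 5 is naming. Per the Kubernetes documentation, a CronJob name should follow DNS label rules (lowercase alphanumerics and `-`) and be no longer than 52 characters, because the CronJob controller appends 11 characters when naming each Job and Job names are capped at 63. Wiki usernames, by contrast, can contain spaces and non-ASCII characters. Below is a minimal Python sketch of one way to derive a safe name; the slug-plus-hash scheme and the `cronjob_name` helper are assumptions for illustration, not something PAWS does.

```python
import hashlib
import re

# Kubernetes caps CronJob names at 52 characters so generated Job names fit in 63.
MAX_CRONJOB_NAME = 52


def cronjob_name(username: str, job: str) -> str:
    """Derive a DNS-label-style CronJob name from a username and a job label.

    Lossy cleanup (lowercasing, dropping non-ASCII) could make two different
    inputs collide, so a short hash of the original input is appended.
    """
    original = f"{username}-{job}"
    slug = re.sub(r"[^a-z0-9-]+", "-", original.lower()).strip("-")
    digest = hashlib.sha256(original.encode("utf-8")).hexdigest()[:8]
    # Leave room for "-" plus the 8-character digest.
    prefix = slug[: MAX_CRONJOB_NAME - len(digest) - 1].rstrip("-")
    if not prefix:
        prefix = "job"  # e.g. usernames written entirely in non-Latin scripts
    return f"{prefix}-{digest}"
```

For example, `cronjob_name("Some User", "Daily Report")` yields a name starting with `some-user-daily-report-` followed by eight hex characters.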

This could go into the long description of the Outreachy project.

Bstorm renamed this task from Plan for GSOC or Outreachy 2020 for PAWS to Add Cron Job Functionality to PAWS (Outreachy internship). Feb 21 2020, 7:37 PM
Bstorm updated the task description. (Show Details)

So I tried submitting a thing: https://www.outreachy.org/outreachy-may-2020-internship-round/communities/wikimedia/add-cron-job-functionality-to-paws-a-wikimedia-jup/cfp/

That was WAY harder than I expected. @srishakatux please let me know if that looks reasonably close to right. I believe now @aborrero and @Chicocvenancio need to sign on as co-mentors, as long as we are all still on board. Help me figure out what looks wrong.

I think we'd want a new stream in Zulip for the project, etc, which is why details are sparse there. I guess the three skills encompass the ones I added above (couldn't add more than three). I'm not sure about the skill levels I put in, but they seemed good. I tried to be reasonable on the computer requirements.

@Bstorm The submission looks good to me! :) We do have a Zulip Stream for Outreachy 20, and we can have a separate topic in it for this project, what do you think?

@Chicocvenancio This is a friendly reminder to sign up as a co-mentor for this project on the Outreachy website. :)

Thanks everyone, your proposal has been approved and you will be able to see it here https://www.outreachy.org/communities/cfp/wikimedia/.

@Chicocvenancio In addition to what @Pavithraes said, signing up as a co-mentor on the site will help you review applicants' proposals later in the program, so it is recommended that you sign up there. It is a two-step process and shouldn't take you more than 2 minutes:

@Pavithraes a topic in that stream would probably be great. I'm still figuring out Zulip.

Thanks for the reminders, I have signed up as a co-mentor.

srishakatux changed the visibility from "acl*outreachy-mentors (Project)" to "Public (No Login Required)". Mar 5 2020, 6:16 PM

Hello everyone! I'm Karma Dolkar, an Electronics and Communication engineering sophomore at the Indian Institute of Technology, Roorkee (India). I have been contributing to Wikimedia since December 2019. I am an Outreachy 2020 applicant. I looked up the project description and found it interesting! Could you point me to further resources which I can go through?

(this project didn't happen)