
Outreachy Application Task: Tutorial for Wikipedia Page Protection Data
Closed, Resolved · Public

Authored by Isaac on Sep 25 2020, 6:25 PM

Description

Overview

Create your own PAWS notebook tutorial (see set-up below) that completes the TODOs provided in this notebook. The full Outreachy project will involve more comprehensive coding than what is being asked for here (and some opportunities for additional explorations as desired), but this task will introduce some of the APIs and concepts that will be used for the full project and give us a sense of your Python skills, how well you work with new data, and how well you document your code. We are not expecting perfection -- give it your best shot! See this example of working with Wikidata data for what a completed notebook tutorial might look like.

Set-up

  • Make sure that you can log in to the PAWS service with your wiki account: https://paws.wmflabs.org/paws/hub
  • Using this notebook as a starting point, create your own notebook (see these instructions for forking the notebook to start with) and complete the functions / analyses. All PAWS notebooks have the option of generating a public link, which can be shared back so that we can evaluate what you did. Use a mixture of code cells and markdown to document what you find and your thoughts.
  • As you have questions, feel free to add comments to this task (and please don't hesitate to answer other applicants' questions if you can help).
  • If you feel you have completed your notebook, you may request feedback and we will provide high-level feedback on what is good and what is missing. To do so, send an email to your mentor with the link to your public PAWS notebook. We will try to make time to give this feedback at least once to anyone who would like it.
  • When you feel you are happy with your notebook, you should include the public link in your final Outreachy project application as a recorded contribution. You may record contributions as you go as well to track progress.

Event Timeline


In general, please see https://man7.org/linux/man-pages/man1/head.1.html for an explanation of Linux commands and parameters (like head) - thanks!

Hi,
I am unable to understand "head -46" in the line of code below.

!zcat "{DUMP_DIR}{DUMP_FN}" | head -46 | cut -c1-1000

Secondly, I am getting this error:

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
--NotebookApp.iopub_data_rate_limit.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)

How to solve this error?

The error is coming from your code. You are probably iterating endlessly and it exhausted the notebook's memory.

head displays the beginning of a file; -46 shows the first 46 lines. In that cell, zcat decompresses the gzipped dump to standard output, head -46 keeps only the first 46 lines, and cut -c1-1000 truncates each of those lines to its first 1000 characters.

Answering the question from T263646#6545673

Has anyone here been able to work with the page table dump without running out of memory? Would appreciate some tips :)

@Liz_Kariuki thanks for the question. In general you want to take two strategies when dealing with memory challenges:

  • Process the file incrementally so you don't have to store it in memory all at once
  • Only retain the data you need and store it efficiently (a minimal sketch of this pattern follows below)
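
To make that concrete, here is a rough sketch of the streaming pattern (assuming the gzipped SQL page dump and the DUMP_DIR / DUMP_FN variables already defined in the starter notebook; the actual parsing of each line is left as a TODO):

import gzip

kept = []  # keep only small tuples of the fields you actually need
with gzip.open(DUMP_DIR + DUMP_FN, 'rt', errors='replace') as fin:
    for line in fin:  # one line at a time, so the whole file never sits in memory
        if not line.startswith('INSERT INTO'):
            continue  # skip the schema / comment lines at the top of the dump
        # TODO: parse the INSERT statement here and append e.g.
        # (page_id, namespace, title) tuples to `kept`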

Hi,
I am extremely sorry for this stupid question, but can anyone please guide me a bit about what is happening in the "example of working with Wikidata data" in cell 5? Basically, I am having difficulty understanding how the data is extracted from the JSON dump. What checks are the if and for statements performing?

Thanks

Hi @SIBGHAsheikh,
Could you please try to narrow down what you do not understand in that cell?

This would help us to explain things better to you.

From the code in the example based on my understanding, the following checks were made:

  • Ensured that a limit of 12,000 lines was processed, probably to conserve memory
  • The number of sitelinks was calculated by accessing the JSON object's sitelinks and checking whether each sitelink ended with "wiki" but was not commonswiki or specieswiki
  • The number of statements was calculated by counting the claims.
  • Next was determining whether the item was about a human or not. This was done by trying to extract certain IDs and values from the JSON data; if found, the item is about a human.

Then finally the data needed is appended to a list to form a list of tuples.
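
Roughly, the checks might look something like this (a sketch only -- the field names follow the Wikidata JSON dump format, and the exact limits and labels in the real notebook may differ):

import json

item = json.loads(line.rstrip(',\n'))  # each dump line is one JSON item plus a trailing comma

# sitelinks to Wikipedias only: keys ending in 'wiki', excluding commons/species
num_sitelinks = len([s for s in item.get('sitelinks', {})
                     if s.endswith('wiki') and s not in ('commonswiki', 'specieswiki')])

# number of statements = total number of claims on the item
num_statements = sum(len(v) for v in item.get('claims', {}).values())

# is the item a human? -> does it have an instance-of (P31) claim whose value is Q5
is_human = any(c.get('mainsnak', {}).get('datavalue', {}).get('value', {}).get('id') == 'Q5'
               for c in item.get('claims', {}).get('P31', []))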

Hope this helps!

@Isaac and everyone.

After inspecting the data in the API response, I can see two types of protection in the "restrictiontypes" key in the image below. However, there is just a single protection type present in the "protection" key.

Is it safe to assume that no "edit" protection exists in this case, or does it inherit the same protection as "move"?

protection.JPG (229×597 px, 35 KB)

Hi @Amamgbu
I think that this issue was addressed here, I hope that this helps you.

I want to confirm: is the restrictiontype shown in the API response (please see attached screenshot) the same as the user-specific restrictions field in the MediaWiki dumps (please see attached screenshot)?

Restrictiontypes just lists the potential protections a page could have. Generally, it's going to be edit and move but, for example, there are articles that haven't been created yet but have a protected title (i.e. you can't create a page with that title) and they would have a create edit restriction.

user-specific restrictions

You can ignore this field -- it is not in use.

Thanks @0xkaywong for helping to answer!

Also, I want to know what this "source" field represents in the API response (please see attached screenshot).

@Sidrah_M_Siddiqui good question: I'm not sure which page that is, but I assume it's a cascading protection where the article you queried was included on the Main Page and therefore received the same protections as the Main Page.

Hi @Amamgbu
I think that this issue was addressed here, I hope that this helps you.

Not sure if it is the same in this case. In this API response, the article exists. Here, it has a move protection but no edit protection, yet has both move and edit restriction types in place.

Does that mean the title of the article cannot be changed except by an admin user (sysop), yet anyone can edit the body of the article since no edit protection exists in the “protection” key?

@Amamgbu that is correct and @Vanevela pointed to the appropriate prior discussion about this. More details: the restrictiontypes field is just what restrictions could be applied to the page, not which ones are applied -- a fuller description of what you could find in that field can be found here. For most pages, you'll see edit and move and can verify this by choosing a random page without restrictions and querying the API. I'd suggest ignoring the field as it won't tell you much.
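
If you want to check this yourself, a query along these lines should work (a sketch; 'Albert Einstein' is just a placeholder title):

import requests

params = {
    'action': 'query',
    'prop': 'info',
    'inprop': 'protection',
    'titles': 'Albert Einstein',  # any page title you want to inspect
    'format': 'json',
    'formatversion': 2,
}
r = requests.get('https://en.wikipedia.org/w/api.php', params=params).json()
page = r['query']['pages'][0]
print(page.get('protection'), page.get('restrictiontypes'))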

I am extremely sorry for this stupid question

@SIBGHAsheikh nothing to apologize for. @Amamgbu gave a good overview in T263874#6556803 but if you have further questions, just try to be as specific as you can about what you don't understand. If you're having trouble understanding specific aspects of how I processed the Wikidata dumps in that notebook though, know that you don't have to fully understand that to work on the page protections notebook. They are two very different datasets -- the Wikidata notebook was just provided to give an example of the types of things you might want to do.

My notebook is frozen, did anyone face the same thing? How did you solve it?

Screenshot 2020-10-17 183048.jpg (768×1 px, 89 KB)

Thanks @Isaac and @Vanevela for the help. I understand better now

Screenshot 2020-10-17 211850.jpg (733×1 px, 147 KB)

Screenshot 2020-10-17 183048.jpg (768×1 px, 89 KB)

Please I don't know what I did wrong.

Is anyone having difficulty running the notebook?

@Isaac and everyone,
Is there any way to get the reason for protection for a given page? If yes, where do we get that data?

Perfect day! I need to ask: my API response doesn't show the protection type, unlike the dump data. Is it something I need to worry about?

@Toluwani7 reload the page and run the server afresh; it worked for me.

Hi @Thulieblack
Could you send a screenshot of the response? If you queried properly, you should be able to see protection type. It is located in the “protection” key of the API response for each page ID

@Amamgbu take a look {F32397114}

I'm not sure you passed in any parameter for inprop when making your query. That is probably why you got only page info and no protection.

@Amamgbu inprop gave me an error. I passed prop*inf; that's how I got all the queries.

Please go through the API documentation for the query you want to make. Since you want to get the protection, you should be able to pass in protection as an argument for inprop.

Okay let me investigate and see what I can come up with

Screenshot 2020-10-17 211850.jpg (733×1 px, 147 KB)

Screenshot 2020-10-17 212552.jpg (683×1 px, 165 KB)

I just realised that I uploaded the wrong pictures. I don't get what I did wrong here

@Toluwani7 reload the page and run the server afresh; it worked for me.

It didn't work. My laptop froze and I had to hard press its power button. I opened a new notebook, thanks

@Isaac and everyone,
Can we access the 'page_counter' variable from the page table, as it was removed completely in MediaWiki 1.25? Is there any other method to get the views of each page?

You can query for page views. You can reference the MediaWiki query API documentation to get this info, though I think it returns a maximum of 60 days.
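
I believe the query looks roughly like this (a sketch -- prop=pageviews comes from the PageViewInfo extension on Wikimedia wikis, and the title here is just a placeholder):

import requests

params = {'action': 'query', 'prop': 'pageviews', 'titles': 'Albert Einstein',
          'format': 'json', 'formatversion': 2}
r = requests.get('https://en.wikipedia.org/w/api.php', params=params).json()
views = r['query']['pages'][0].get('pageviews', {})  # {date: count or None}
total = sum(v for v in views.values() if v is not None)  # some days can be null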

@Amamgbu thanks I have sorted it out 🤗🤗

Awesome! You’re welcome

@Isaac

Hello! I am a bit confused. Like you mentioned earlier, it appears that most pages have edit and move permissions and most of those permissions have autoconfirmed or sysop as the level, so I am not sure what we should be trying to predict. The whole idea is to study page protections, but I have a sample of more than 80,000 pages (from the latest page protections data dump) and the protections applied to them seem very similar. Am I missing something? Thanks in advance!

Thanks @Amamgbu. But a lot of pages seem to have a 'null' value for the pageviews variable.
And another problem I encountered is the limit up to which we can query page information using title/pageid. Getting only 50 instances at a time is not enough to work with, right? Does anyone know how to increase the limit to 500? (It says "500 for clients allowed higher limits".)
@Isaac

I think you can use pvipcontinue to extract more data, but I'm not sure if it gives the 500 that you're looking for.

@Isaac would be in the best position to help you out.

Perfect evening! Sorry for asking a lot, I just need some clarity.
For the exploratory analysis, which data are we using: dump or API?

I think you can use pvipcontinue to extract more data

@Amamgbu @SafiaKhaleel indeed -- depending on your exact query, you can use a continue parameter to get more results or just pass a new set of pageIDs to the API to get more data.
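
The general pattern is something like this (a sketch using list=allpages as an example module; whichever module you use, the API returns a 'continue' block whose parameters you resend with the next request):

import requests

params = {'action': 'query', 'list': 'allpages', 'aplimit': 'max',
          'format': 'json', 'formatversion': 2}
results = []
session = requests.Session()
while True:
    r = session.get('https://en.wikipedia.org/w/api.php', params=params).json()
    results.extend(r['query']['allpages'])
    if 'continue' not in r:
        break  # no more results to fetch
    params.update(r['continue'])  # carry the continuation token(s) into the next request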

Like you mentioned earlier, it appears that most pages have edit and move permissions and most of those permissions have autoconfirmed or sysop as the level, so I am not sure what we should be trying to predict. The whole idea is to study page protections, but I have a sample of more than 80,000 pages (from the latest page protections data dump) and the protections applied to them seem very similar. Am I missing something? Thanks in advance!

@YemiKifouly good question. If you don't see a good predictive problem with just the page protection data, you can also pull in data on pages without protections via the Random API (or any other page generator API).
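
For example, something like this could give you a sample of pages to compare against (a sketch; rnnamespace=0 restricts the sample to articles):

import requests

params = {'action': 'query', 'list': 'random', 'rnnamespace': 0, 'rnlimit': 50,
          'format': 'json', 'formatversion': 2}
r = requests.get('https://en.wikipedia.org/w/api.php', params=params).json()
random_page_ids = [p['id'] for p in r['query']['random']]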

For the exploratory analysis, which data are we using: dump or API?

@Thulieblack You'll likely have more data from the dump so that would be my suggestion, but in theory they should have the same data.

@Isaac thank you so much for the clarity much appreciated 👏🏿👏🏿

Greetings everyone. My name is Tambe Tabitha Achere and I am a Data scientist. I am Cameroonian and I code in Python. I look forward to working with you.

Hi @Tambe great to see you here, I hope you remember me :)

Welcome to all the new applicants since I last posted a welcome! One request for everyone working on this task:

  • To get a good sense of how many people are intending to apply to each project (T263646 and/or T263860), I'd ask that you make an initial contribution on the Outreachy site with a link to your current progress in the next two days (so by end-of-day October 23rd).

Thanks! Keep the questions / collaboration coming!

Noted🤗

Noted

Perfect day!
@Isaac, which email address can I send my notebook to for feedback?
Can I use the one I see in the notebook by cell 13?

Hi everyone,
In the "ns" column of the data, I've not been able to find the "Name" for ID 118 on the "Extension Default Namespaces" page.
Have I not checked it well, or is the name for that ID not on the page?

@Thulieblack, I think @Precillieo is referring to this page, which matches ns numerical IDs to the name of the namespace.

@Precillieo you're right that ID 118 is not listed under the page. I found this other page which matches ns ID 118 to the namespace "Draft".

Hi @Tambe great to see you here, I hope you remember me :)

Yes I do!

Noted.

@Thulieblack, I think @Precillieo is referring to this page, which matches ns numerical IDs to the name of the namespace.

@Precillieo you're right that ID 118 is not listed under the page. I found this other page which matches ns ID 118 to the namespace "Draft".

I see what you mean, thanks.

which email address can I send my notebook to for feedback? Can I use the one I see in the notebook by cell 13?

@Thulieblack yes: isaac@wikimedia.org. Make sure to also record an initial contribution on Outreachy.

Hello @Isaac, I have added a contribution to the Outreachy website to be considered for this project. I apologise for not having made much progress beyond the first few To-do tasks in it! I had some trouble with my laptop keyboard and had to get it fixed, which is also why I was not very active in this discussion thread. I hope it's alright; I'll try to finish it soon over the weekend so that I don't lag behind. Everyone's discussion comments here have been very helpful too, thank you all :))

Hello,

I felt it would be useful to say that this contains the instructions on how to fork, and that the instructions link leads to PAWS's general page instead.

The instructions here state that ?format=raw should be added to the end of the URL in order to download a raw .ipynb file, which is then uploaded to our respective directories.
A few more steps that will make these instructions more explicit:

  • Add ?format=raw to the end of the URL and reload the page.
  • Press Ctrl+S and choose a destination on your computer where the .txt file will be saved.
  • Log in to your PAWS account and upload the .txt file that you downloaded earlier.
  • Change the uploaded file into an IPython notebook by renaming its extension from .txt to .ipynb.

These steps were not obvious to me at first, so I'm posting them here for anyone who might not have figured it out yet.

@Tambe Alternatively, you could:

  • Right click -> Save as
  • file_name.ipynb (here, change the file extension to .ipynb)
  • Log in to your PAWS account and upload your saved notebook

Ohh, this makes sense. Thank you @tanny411!

Hi @Isaac and everyone

When I try to access deletedrevisions using the MediaWiki API, I keep getting a permission denied error. Is there a fix for this? I couldn't find one in the documentation.

Thanks

Hi @Isaac and everyone,
Can anyone give me an idea to find if a page is about a human or not?

You could reference the tutorial given to us by @Isaac. There’s a segment on that in it.

Yes, but it's for checking if an individual "item" is human or not. I was wondering if we could apply that to pages too.

Hi @Isaac and everyone,
Can anyone give me an idea to find if a page is about a human or not?

It's the last code paragraph in cell 5 of this notebook https://public.paws.wmcloud.org/User:Isaac_(WMF)/Outreachy%20Dec%202020/Wikidata_Data_Example.ipynb

Hi @Isaac

I noticed that some of the protections in the dump look like this:

{'type': 'edit', 'level': 'autoconfirmed', 'expiry': '20210120093624'}

Which datetime format is this (20210120093624)?

Regards.

Yes, but it's for checking if an individual "item" is human or not. I was wondering if we could apply that to pages too.

You should be able to extract the item ids from the page. You can check the mediawiki documentation.

I believe the ids used were entity ids

Also, following up on my question above: some page protections from the API have duplicate entries.

E.g. for pageid 3017803:

'protection': [{'expiry': 'infinity',
                'level': 'sysop',
                'type': 'edit'},
               {'expiry': 'infinity',
                'level': 'sysop',
                'type': 'move'},
               {'expiry': 'infinity',
                'level': 'sysop',
                'type': 'move'},
               {'expiry': 'infinity',
                'level': 'sysop',
                'type': 'edit'}],

@Ashmita1 some pages do have two entries; inspect your data and you'll see.

Hi, @Ashmita1 look at the prior discussion about duplicate entries. Hope this helps you.

Has anybody else noticed the duplicate entries in the API response? They look like duplicates to me, or is there a pattern that I am overlooking?

Hey @MelanieImf, interesting find -- I don't actually know what is causing that but I suspect one of two things:

  • it's a very old page and at some point, the way that protections were tracked by the Mediawiki software that runs Wikipedia was changed and caused this duplicate
  • one set of protections was applied to the page directly and another was cascaded down and the API treats them as two separate things even though they have the same result.

Hopefully it shouldn't cause any issues (in this case, the protections seemed to agree) but if you encounter instances where e.g., edit protection expires both in a month and never, you can probably ignore the one-month expiration.

Additionally, a question was raised at one point about memory usage by the Notebooks. They are capped at I believe 3GB -- depending on how you process the data, you might reach this limit. If you do, you should either: a) see if you can change how you store the data as there may be a more efficient way, or, b) just take a sample of the data and state explicitly why you did that and how you selected a representative sample. While it can be a pain, memory constraints are useful for making sure you have code that can work in a variety of environments.

Everyone's discussion comments here have been very helpful too, thank you all :))

@Chiral-carbon thanks and glad to hear!

When I try to access deletedrevisions using the MediaWiki API, I keep getting a permission denied error. Is there a fix for this? I couldn't find one in the documentation.

@Amamgbu yeah, the documentation isn't great but you need special user privileges on Wikipedia to see this data as it's sensitive.

Can anyone give me an idea to find if a page is about a human or not?

@SafiaKhaleel adding to what @Amamgbu and @Tambe pointed out, the wbgetentities API can take as input a language + title or you can get a page's QID from the pageprops API.
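
For example, something like this should return the page's QID (a sketch; the title is a placeholder), and you can then check on Wikidata whether that item has an instance-of (P31) value of Q5 (human):

import requests

params = {'action': 'query', 'prop': 'pageprops', 'ppprop': 'wikibase_item',
          'titles': 'Albert Einstein', 'format': 'json', 'formatversion': 2}
r = requests.get('https://en.wikipedia.org/w/api.php', params=params).json()
qid = r['query']['pages'][0].get('pageprops', {}).get('wikibase_item')  # e.g. 'Q937'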

Which datetime format is this (20210120093624)?

@Ashmita1 that date in a more readable format is 2021-01-20 09:36:24, aka 20 January 2021 at 9h36m24s.
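
That is the standard MediaWiki timestamp format (YYYYMMDDHHMMSS), so in Python it can be parsed with something like:

from datetime import datetime

expiry = datetime.strptime('20210120093624', '%Y%m%d%H%M%S')
# -> datetime.datetime(2021, 1, 20, 9, 36, 24)
# note: 'infinity' is also a possible expiry value and needs to be special-cased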

Thank you so much @Isaac, @Amamgbu and @Tambe. You guys have been really helpful.
However, when I use the value wbgetentities for the action parameter, I'm getting the following error:

APIError: badvalue: Unrecognized value for parameter "action": wbgetentities. -- See https://en.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes.

You’re using the wrong site name. It should be wikidata

Hello Everyone,
I'd like to raise a question concerning the predictive model we are to build in the later part of the notebook.
Are we predicting the protection types, or are we predicting whether a page is protected or not?

Either one you are comfortable with, I believe.

Perfect day everyone. @Isaac, I have been trying to access my PAWS notebook server since morning but it keeps reloading. Is there anything I can do? I have tried relaunching but to no avail.

This has happened to me a few times before. Reload the chrome tab, reload your browser, or restart your computer, in that order.

@Tambe I did all those; I'm still waiting.

Hello Everyone,
I'd like to raise a question concerning the predictive model we are to build in the later part of the notebook.
Are we predicting the protection types, or are we predicting whether a page is protected or not?

I believe predicting protection types might require more analysis and data collection, so the second option should be a more viable approach, but it's up to you to decide.

@Tambe I did all those; I'm still waiting.

What browser are you using? (it'll help me filter my search results)

Also, did you add any new extensions recently? What were they?

What browser are you using? (it'll help me filter my search results)

Microsoft Edge

Also, did you add any new extensions recently? What were they?

Thanks, I managed to reset my browser and it worked.

Please guys,
I need the link to the Revisions API documentation.

@Precillieo https://m.mediawiki.org/wiki/API:Revisions
Is this what you're looking for?
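
A basic query against it would look something like this (a sketch; the title and rvprop values are just examples -- see API:Revisions for the full parameter list):

import requests

params = {'action': 'query', 'prop': 'revisions', 'rvprop': 'ids|timestamp|user|comment',
          'rvlimit': 5, 'titles': 'Albert Einstein', 'format': 'json', 'formatversion': 2}
r = requests.get('https://en.wikipedia.org/w/api.php', params=params).json()
revisions = r['query']['pages'][0]['revisions']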

Hello @Isaac and everyone.

The API endpoint that gives the list of most viewed pages doesn't seem to be giving results anymore. I had used it to analyze data before, but now it's returning an empty list. My entire notebook was based on this. Please help me.

image.png (768×1 px, 150 KB)

The API endpoint that gives the list of most viewed pages doesn't seem to be giving results anymore.

@SafiaKhaleel perhaps a temporary issue. This is working for me though: https://en.wikipedia.org/w/api.php?action=query&list=mostviewed

Everyone: I wanted to thank you for making the initial contributions. It gave us a sense of how many applicants we had. We've decided to leave both projects (T263646 and T263860) open until the normal Outreachy deadline as I know a number of you are trying to balance a lot right now.

Make sure to submit your final contribution ahead of the Outreachy deadline (my understanding is that it is currently Oct. 31, 2020 4pm UTC) and include the public PAWS notebook link that you would like us to evaluate in that contribution. We will likely not be able to give feedback after Wednesday 28 October, so if you haven't requested feedback yet and would like some on your current notebook, email your mentors with that request and notebook link.

If you have any questions about the final application, don't hesitate to ask. The perennial question is what to do for the timeline part of the application -- just do your best and highlight what interests you most about the project and time where you think you may need to learn any necessary skills. And make sure to indicate what about the project (and/or Wikimedia) is most interesting to you. Thanks!

Yes, now it seems to be okay. Thank you so much @Isaac, I'm really relieved now.

Hey everyone! A few days left to get in those final contributions on the Outreachy site. Make sure you complete your final application there (you can do this today and still edit it up until the deadline). Diego also posted some good general feedback about notebooks at T263860#6589759 that I wanted everyone to see:

I have a general recommendation to all of you: Keep the notebook easy to read. That means:

    Explain each piece of code that you are running. The idea is to make the notebook easy to understand. Don't make the reader have to guess what you were trying to do.
    Describe your motivation and conclusions for every statistic you show. For example, why are you plotting variable X, or Y? And what is your takeaway/conclusion?
    Avoid long/repetitive code outputs that don't provide relevant information. For example, if you are applying a model that runs 1000 epochs, avoid printing 1000 lines, one for each epoch, because it makes the notebook difficult to read. If you think there is relevant information in those outputs, think about how to show it in a way that is compact and easy to understand (for example, a plot).

Also, I know the timeline part of the application can be confusing. Some general points about it:

  • This is an opportunity for you to indicate whether there are any components of the project that are more interesting to you (spend more time on them) or where you feel you would need to learn some skills in advance. We don't expect anyone to know everything they need to do these projects, so don't hesitate to explain where you'd want to do some additional learning etc.
  • Note if you have any previous commitments that would prevent you from working a given week.
  • We know you won't have a perfect plan for the project as you only know as much as we've said on the tasks about them. Do your best but we'll be more interested in the other questions in the application and Jupyter notebook submission.

Hi @Isaac and everyone,
I didn't exactly understand where we should fill in the timeline part of the application. And where do we submit the final application? Isn't it on the record contributions page of the Outreachy site itself?

@SafiaKhaleel Yes. After recording a contribution you should submit a final application. That's where you will be asked to write your prospective timeline for the project.

Ok. So I should record a contribution on this page.

image.png (768×1 px, 124 KB)

And then submit a final application on this page.

image.png (768×1 px, 140 KB)

Am I right?