
Oct 30 2020

SafiaKhaleel added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

Ok. So I should record a contribution on this page.

[Image attachment: image.png]

Oct 30 2020, 5:46 AM · Outreachy (Round 21)
SafiaKhaleel added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

Hey everyone! A few days left to get in those final contributions on the Outreachy site. Make sure you complete your final application there (you can do this today and still edit it up until the deadline). Diego also posted some good general feedback about notebooks at T263860#6589759 that I wanted everyone to see:

I have a general recommendation to all of you: Keep the notebook easy to read. That means:

    Explain each piece of code that you are running. The idea is to make the notebook easy to understand. Don't make the reader have to guess what you were trying to do.
    Describe your motivation and conclusions for every statistic you show. For example, why are you plotting variable X or Y, and what is your takeaway?
    Avoid long or repetitive code outputs that don't provide relevant information. For example, if you are applying a model that runs for 1000 epochs, avoid printing 1000 lines, one per epoch, because that makes the notebook difficult to read. If you think there is relevant information in those outputs, think about how to show it in a way that is compact and easy to understand (for example, a plot).

Also, I know the timeline part of the application can be confusing. Some general points about it:

  • This is an opportunity for you to indicate whether there are any components of the project that are more interesting to you (spend more time on them) or where you feel you would need to learn some skills in advance. We don't expect anyone to know everything they need to do these projects, so don't hesitate to explain where you'd want to do some additional learning etc.
  • Note if you have any previous commitments that would prevent you from working a given week.
  • We know you won't have a perfect plan for the project as you only know as much as we've said on the tasks about them. Do your best but we'll be more interested in the other questions in the application and Jupyter notebook submission.
Oct 30 2020, 3:43 AM · Outreachy (Round 21)

Oct 27 2020

SafiaKhaleel added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

Yes, now it seems to be okay. Thank you so much, @Isaac. I'm really relieved now.

Oct 27 2020, 3:14 PM · Outreachy (Round 21)
SafiaKhaleel added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

Hello @Isaac and everyone.

Oct 27 2020, 5:38 AM · Outreachy (Round 21)

Oct 26 2020

SafiaKhaleel added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

Thank you so much, @Isaac, @Amamgbu and @Tambe. You guys have been really helpful.
However, when I use the value wbgetentities for the action parameter, I get the following error:

Oct 26 2020, 7:14 AM · Outreachy (Round 21)

Oct 25 2020

SafiaKhaleel added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

Hi @Isaac and everyone,
Can anyone suggest how to find out whether a page is about a human or not?

You could reference the tutorial given to us by @Isaac. There’s a segment on that in it.
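One common way to do this check, sketched here under assumptions rather than taken from the tutorial: fetch the page's Wikidata item with `wbgetentities` and look for an "instance of" (P31) claim whose value is "human" (Q5). The helper names `wikidata_params` and `is_human` are hypothetical, and the sample entity is a heavily trimmed illustration of the JSON shape.

```python
def wikidata_params(title: str, site: str = "enwiki") -> dict:
    """Build wbgetentities parameters that look up an item by Wikipedia title.

    Intended for https://www.wikidata.org/w/api.php (hypothetical helper).
    """
    return {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "props": "claims",
        "format": "json",
    }


def is_human(entity: dict) -> bool:
    """Return True if the entity has an 'instance of' (P31) claim equal to human (Q5)."""
    for claim in entity.get("claims", {}).get("P31", []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # 'novalue' / 'somevalue' snaks carry no datavalue
        if snak.get("datavalue", {}).get("value", {}).get("id") == "Q5":
            return True
    return False


# Trimmed illustration of one entity as it appears under response["entities"][<id>].
sample_entity = {
    "id": "Q42",
    "claims": {
        "P31": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P31",
                    "datavalue": {
                        "value": {"entity-type": "item", "id": "Q5"},
                        "type": "wikibase-entityid",
                    },
                }
            }
        ]
    },
}

human = is_human(sample_entity)
not_human = is_human({"id": "Q0", "claims": {}})
```

A page with no Wikidata item, or an item with no P31 claim, simply yields `False` here, so missing data and "not a human" are not distinguished in this sketch.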

Oct 25 2020, 5:11 AM · Outreachy (Round 21)
SafiaKhaleel added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

Hi @Isaac and everyone,
Can anyone suggest how to find out whether a page is about a human or not?

Oct 25 2020, 5:02 AM · Outreachy (Round 21)

Oct 19 2020

SafiaKhaleel added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

@Isaac and everyone,
Can we still access the 'page_counter' field of the page table, given that it was removed completely in MediaWiki 1.25? Is there any other way to get the views of each page?

You can query for page views. See the MediaWiki query API documentation for details, though I think it returns a maximum of 60 days.
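As a rough sketch of that suggestion, assuming the wiki exposes `prop=pageviews` with a `pvipdays` parameter (as English Wikipedia does through the PageViewInfo extension): the helper names below are hypothetical, and the sample response uses made-up numbers purely to show the shape.

```python
def pageviews_params(title: str, days: int = 60) -> dict:
    """Build query parameters for daily page views (hypothetical helper)."""
    return {
        "action": "query",
        "prop": "pageviews",
        "titles": title,
        "pvipdays": days,  # assumed to be capped at 60 by the API
        "format": "json",
    }


def total_views(response: dict) -> dict:
    """Sum daily views per page title; days with no data (None) are skipped."""
    totals = {}
    for page in response.get("query", {}).get("pages", {}).values():
        daily = page.get("pageviews") or {}
        totals[page.get("title")] = sum(v for v in daily.values() if v is not None)
    return totals


# Made-up numbers, for illustration only.
sample_response = {
    "query": {
        "pages": {
            "3664672": {
                "pageid": 3664672,
                "title": "Template:Cyclopaedia 1728",
                "pageviews": {"2020-10-08": 10, "2020-10-09": None, "2020-10-10": 20},
            }
        }
    }
}

totals = total_views(sample_response)

# Against the live API this would look roughly like:
#   requests.get("https://en.wikipedia.org/w/api.php",
#                params=pageviews_params("Some title")).json()
```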

Oct 19 2020, 2:45 PM · Outreachy (Round 21)

Oct 18 2020

SafiaKhaleel added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

@Isaac and everyone,
Can we still access the 'page_counter' field of the page table, given that it was removed completely in MediaWiki 1.25? Is there any other way to get the views of each page?

Oct 18 2020, 3:26 PM · Outreachy (Round 21)
SafiaKhaleel added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

@Isaac and everyone,
Is there any way to get the reason for protection for a given page? If yes, where do we get that data from?

Oct 18 2020, 3:49 AM · Outreachy (Round 21)

Oct 11 2020

SafiaKhaleel added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

@Isaac
The protection data that I got from the MediaWiki dump and from the API results seem to be in different formats.
For a particular page, we get data in the following formats:
In MediaWiki dump : (3664672,'edit','autoconfirmed',0,NULL,'infinity',717409)

In API result : {'pageid': 3664672, 'ns': 10, 'title': 'Template:Cyclopaedia 1728', 'contentmodel': 'wikitext', 'pagelanguage': 'en', 'pagelanguagehtmlcode': 'en', 'pagelanguagedir': 'ltr', 'touched': '2020-10-10T18:45:26Z', 'lastrevid': 952065611, 'length': 3572, 'protection': [{'type': 'edit', 'level': 'autoconfirmed', 'expiry': 'infinity'}, {'type': 'move', 'level': 'autoconfirmed', 'expiry': 'infinity'}], 'restrictiontypes': ['edit', 'move']}

Isn't that a problem? How are we supposed to check for discrepancies between the two if they are in different formats? Moreover, the protection data in the API results doesn't show user-specific restrictions or sysop permissions. Is that OK?

Hi @SafiaKhaleel, it is possible to compare those tuples with the JSON objects through indexing. You can check the tutorial notebook @Isaac shared with us for more details.

@Isaac mentioned that the user-specific restrictions field is obsolete and we should disregard it. The sysop permissions are stored in the JSON data as “level”.

Thanks @Amamgbu. I understood that. But the protection type also seems to differ between the two cases. In the API result, all the pages have both edit and move protection, as you can see here:
'protection': [{'type': 'edit', 'level': 'autoconfirmed', 'expiry': 'infinity'}, {'type': 'move', 'level': 'autoconfirmed', 'expiry': 'infinity'}]
whereas in the MediaWiki dump, there is only one type out of the two: (39620487,'edit','autoconfirmed',0,NULL,'infinity',692808)

There are actually two if you inspect the data well. I had that same issue until I inspected the IDs.
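The comparison discussed above could be sketched as follows. This assumes the dump tuple follows the column order (page id, type, level, cascade, user, expiry, restriction id), which matches the sample row but is an assumption about the table layout; the helper names are hypothetical. Both sources are reduced to the same (page id, type, level, expiry) key so they can be compared as sets.

```python
def dump_row_to_key(row: tuple) -> tuple:
    """Reduce one dump row to (page_id, type, level, expiry).

    Assumed column order: page id, type, level, cascade, user, expiry, restriction id.
    The obsolete user field and the row id are ignored.
    """
    page_id, ptype, level, _cascade, _user, expiry, _row_id = row
    return (page_id, ptype, level, expiry)


def api_page_to_keys(page: dict) -> set:
    """Reduce one API page object to a set of (page_id, type, level, expiry) keys."""
    return {
        (page["pageid"], p["type"], p["level"], p["expiry"])
        for p in page.get("protection", [])
    }


# Only the single dump row quoted in this thread; SQL NULL becomes None in Python.
dump_rows = [(3664672, "edit", "autoconfirmed", 0, None, "infinity", 717409)]

api_page = {
    "pageid": 3664672,
    "title": "Template:Cyclopaedia 1728",
    "protection": [
        {"type": "edit", "level": "autoconfirmed", "expiry": "infinity"},
        {"type": "move", "level": "autoconfirmed", "expiry": "infinity"},
    ],
}

dump_keys = {dump_row_to_key(r) for r in dump_rows}
api_keys = api_page_to_keys(api_page)

only_in_api = api_keys - dump_keys    # restrictions the dump rows above do not cover
only_in_dump = dump_keys - api_keys   # restrictions the API did not report
```

With just the one quoted row, the move restriction shows up in `only_in_api`. As noted above, the dump stores each restriction type as its own row, so the difference should disappear once every row for the page id is collected.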

Oct 11 2020, 3:58 PM · Outreachy (Round 21)
SafiaKhaleel added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

@Isaac
The protection data that I got from the MediaWiki dump and from the API results seem to be in different formats.
For a particular page, we get data in the following formats:
In MediaWiki dump : (3664672,'edit','autoconfirmed',0,NULL,'infinity',717409)

In API result : {'pageid': 3664672, 'ns': 10, 'title': 'Template:Cyclopaedia 1728', 'contentmodel': 'wikitext', 'pagelanguage': 'en', 'pagelanguagehtmlcode': 'en', 'pagelanguagedir': 'ltr', 'touched': '2020-10-10T18:45:26Z', 'lastrevid': 952065611, 'length': 3572, 'protection': [{'type': 'edit', 'level': 'autoconfirmed', 'expiry': 'infinity'}, {'type': 'move', 'level': 'autoconfirmed', 'expiry': 'infinity'}], 'restrictiontypes': ['edit', 'move']}

Isn't that a problem? How are we supposed to check for discrepancies between the two if they are in different formats? Moreover, the protection data in the API results doesn't show user-specific restrictions or sysop permissions. Is that OK?

Hi @SafiaKhaleel, it is possible to compare those tuples with the JSON objects through indexing. You can check the tutorial notebook @Isaac shared with us for more details.

@Isaac mentioned that the user-specific restrictions field is obsolete and we should disregard it. The sysop permissions are stored in the JSON data as “level”.

Oct 11 2020, 3:15 PM · Outreachy (Round 21)
SafiaKhaleel added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

@Isaac
The protection data that I got from the MediaWiki dump and from the API results seem to be in different formats.
For a particular page, we get data in the following formats:
In MediaWiki dump : (3664672,'edit','autoconfirmed',0,NULL,'infinity',717409)

Oct 11 2020, 7:29 AM · Outreachy (Round 21)

Oct 10 2020

SafiaKhaleel added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

I'm trying to read the uncompressed data using the gzip library, but every time I run the cell, I get this error. Does anyone know what's causing it?
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)

Update: I found a workaround by iterating and displaying selective parts of data.
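A minimal sketch of that kind of workaround, assuming the dump is a gzip-compressed text file: read it lazily and show only the first few lines, each truncated, instead of printing everything at once. The function name `preview_gzip` and its parameters are hypothetical.

```python
import gzip
from itertools import islice


def preview_gzip(path: str, n_lines: int = 5, max_chars: int = 200) -> list:
    """Return the first n_lines of a gzip text file, each cut to max_chars.

    Lines in SQL dumps can be very long, so truncating each line keeps the
    notebook output well below the IOPub data rate limit.
    """
    previews = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        # islice stops reading after n_lines, so the rest of the file is never loaded
        for line in islice(f, n_lines):
            previews.append(line.rstrip("\n")[:max_chars])
    return previews


# Usage in a notebook cell (the path is an assumption):
#   for line in preview_gzip("page_restrictions.sql.gz"):
#       print(line)
```

Raising `NotebookApp.iopub_data_rate_limit` would also silence the error, but printing a small sample keeps the notebook readable.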

Oct 10 2020, 11:41 AM · Outreachy (Round 21)
SafiaKhaleel added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

Good evening,
I'm really confused and need someone to clarify cell 5 for me.
How do I write an example that loops through all pages and extracts data? The Python docs don't really give examples.

Oct 10 2020, 11:38 AM · Outreachy (Round 21)

Oct 8 2020

SafiaKhaleel added a comment to T263646: Develop an approach to infer which countries are associated with a given Wikipedia article.

Hi everyone! I'm Safia, another Outreachy applicant. I'm kind of new to open source but really interested in this project. Let's all do our best!

Oct 8 2020, 3:11 PM · Outreachy (Round 21), Outreach-Programs-Projects