
rachita_saha (Rachita)
User

Projects

User does not belong to any projects.


User Details

User Since
Mar 29 2021, 8:46 PM (168 w, 6 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Rachita Saha [ Global Accounts ]

Recent Activity

May 3 2021

rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

I have had an amazing experience working on this task, and I truly appreciate how the community came forward to help in every possible way whenever I or anyone else faced an issue. I have learnt a lot under the guidance of the mentors, and I hope to apply the knowledge I have gained through this experience to all the future projects I take up. I am really looking forward to continuing to contribute to open source.

May 3 2021, 9:30 AM · Outreachy (Round 22)

Apr 29 2021

rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

What if it doesn't?

Apr 29 2021, 3:01 PM · Outreachy (Round 22)
rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Hello everyone, I have an article named Chris Ferguson, which I got from the English clickstream. When I query it using the mwapi library, I can see that this article also exists in Spanish. But when I try to find it in the Spanish clickstream ("eswiki"), I do not find it, even if I search for its name in Spanish. Yet when I look the article up with Langviews (https://pageviews.toolforge.org/langviews/), I can see that it does exist. What might be the problem?

Apr 29 2021, 2:50 PM · Outreachy (Round 22)
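
One thing that may be worth checking (an assumption, not a diagnosis given in this thread): the API returns titles with spaces ("Chris Ferguson"), while the clickstream TSV stores them with underscores ("Chris_Ferguson"), and an article can appear in either the source (prev) or the target (curr) column. The sketch below normalizes the title and searches both columns; the function names and the sample rows are hypothetical.

```python
def to_clickstream_title(title):
    """Convert an API-style title ("Chris Ferguson") to the underscore form
    used in the clickstream TSV ("Chris_Ferguson")."""
    return title.strip().replace(" ", "_")


def find_article_rows(rows, title):
    """Return (as_source, as_target) lists of rows that mention `title`.

    `rows` is any iterable of (prev, curr, type, n) tuples, for example the
    output of csv.reader on a clickstream file.
    """
    key = to_clickstream_title(title)
    as_source, as_target = [], []
    for prev, curr, link_type, n in rows:
        if prev == key:
            as_source.append((prev, curr, link_type, n))
        if curr == key:
            as_target.append((prev, curr, link_type, n))
    return as_source, as_target


# Hypothetical rows standing in for an eswiki clickstream file.
sample = [
    ("other-search", "Chris_Ferguson", "external", "42"),
    ("Chris_Ferguson", "Póquer", "link", "15"),
    ("other-empty", "Madrid", "external", "900"),
]

src, dst = find_article_rows(sample, "Chris Ferguson")
print(len(src), len(dst))  # 1 1
```

Searching for the raw title with a space would match nothing in the sample above, which is the kind of silent miss this check is meant to rule out.
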

Apr 26 2021

rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

"Visualize the data to show what the common pathways to and from the article are." This TODO isn't clear to me. Can someone please explain it?

Apr 26 2021, 2:34 PM · Outreachy (Round 22)
rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Any advice on how I can visualize the data for a TODO that I do not quite understand?

Apr 26 2021, 2:04 PM · Outreachy (Round 22)
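
One possible reading of "common pathways to and from the article" (an interpretation, not the mentors' definition): the sources that send the most clicks into the article (rows where curr is the article) and the targets that receive the most clicks out of it (rows where prev is the article). The sketch below ranks both with collections.Counter and prints a plain-text bar chart; the helper names and the sample rows are hypothetical, and the same two lists could be fed to any plotting library instead.

```python
from collections import Counter


def top_pathways(rows, article, k=5):
    """Rank the top-k sources into and targets out of `article` by click count."""
    incoming, outgoing = Counter(), Counter()
    for prev, curr, link_type, n in rows:
        count = int(n)
        if curr == article:
            incoming[prev] += count
        if prev == article:
            outgoing[curr] += count
    return incoming.most_common(k), outgoing.most_common(k)


def text_bars(pairs, width=20):
    """Render (name, count) pairs as bars scaled to the largest count."""
    if not pairs:
        return []
    peak = max(count for _, count in pairs)
    return [f"{name:<35} {'#' * round(width * count / peak)} {count}"
            for name, count in pairs]


# Hypothetical sample rows in the (prev, curr, type, n) layout.
rows = [
    ("other-search", "Lionel_Messi", "external", "500"),
    ("FC_Barcelona", "Lionel_Messi", "link", "300"),
    ("Lionel_Messi", "Antonela_Roccuzzo", "link", "120"),
    ("Lionel_Messi", "Argentina_national_football_team", "link", "80"),
    ("other-empty", "Lionel_Messi", "external", "200"),
]

inc, out = top_pathways(rows, "Lionel_Messi")
print("Into the article:")
print("\n".join(text_bars(inc)))
print("Out of the article:")
print("\n".join(text_bars(out)))
```

If an article has no outgoing rows, the second list is simply empty, so the chart shows only the incoming side.
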

Apr 24 2021

rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Hi everyone, I wasn't quite sure whether we have to remove the initial comments (and Markdown text) in the notebook, completely or partially, before adding our own justifications and conclusions for the given tasks. Could you all please share your thoughts on that?

Apr 24 2021, 12:11 AM · Outreachy (Round 22)

Apr 17 2021

rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Hello @rachita_saha, it's true that we would face a ValueError in just one of the lines, but is it actually fine to ignore it? On examining that particular row, we would see that it contains the data of many (really many) different rows combined into one, probably because of an error in our parsing when a certain special character sequence occurs.

Apr 17 2021, 4:30 PM · Outreachy (Round 22)
rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@rachita_saha Okay, thank you. I'm using csv.reader to parse the data.

Apr 17 2021, 12:29 PM · Outreachy (Round 22)
rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Hello, I'd like to ask whether some rows in the dataset have no pageview count, or have the count at a different column index.
When I run the code, after running for a while it raises an error saying it cannot convert str to int.

Apr 17 2021, 10:31 AM · Outreachy (Round 22)
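
To find out which rows trigger "cannot convert str to int", one option is to parse the file line by line and record the offending line numbers instead of letting the first bad row stop the loop. The sketch below assumes the four-column (prev, curr, type, n) layout and splits on tabs with str.split, which does not treat quote characters specially; the function name and sample lines are hypothetical.

```python
def split_clickstream_lines(lines):
    """Split raw clickstream lines; keep rows that parse, report the ones that don't."""
    good, bad = [], []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            bad.append((line_no, f"expected 4 fields, got {len(fields)}"))
            continue
        prev, curr, link_type, n = fields
        try:
            good.append((prev, curr, link_type, int(n)))
        except ValueError:
            # This is the row that would raise "cannot convert str to int".
            bad.append((line_no, f"count is not an integer: {n!r}"))
    return good, bad


# Hypothetical sample lines: one valid row, one bad count, one missing column.
sample_lines = [
    "Source_A\tTarget_A\tlink\t12\n",
    "Source_B\tTarget_B\tlink\tnot_a_number\n",
    "Source_C\tTarget_C\tlink\n",
]

good, bad = split_clickstream_lines(sample_lines)
for line_no, reason in bad:
    print(line_no, reason)
```

Printing the bad line numbers makes it easy to open the file at those positions and see whether the count is really missing or whether the columns have shifted.
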

Apr 3 2021

rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@rachita_saha Okay... thank you.

Apr 3 2021, 5:34 PM · Outreachy (Round 22)
rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Hello everyone, when I try to use pandas to create a dataframe, I get "The kernel appears to have died" every time.

Apr 3 2021, 2:57 PM · Outreachy (Round 22)
rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Hi everyone. I have been trying to get started with the code but have not been able to crack it. I have decent experience with Python through courses. I need help getting started. Thanks.

Apr 3 2021, 9:26 AM · Outreachy (Round 22)

Apr 2 2021

rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@Isaac Okay, understood. Thank you.

Apr 2 2021, 3:56 PM · Outreachy (Round 22)
rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@Isaac @MGerlach I had a doubt regarding recording contributions. Since we are not making formal pull requests in this project, at what points and in which form are we required to record our contributions? Do we need to submit the public link of our notebook after completing a few to-dos and that will count as a contribution?

Apr 2 2021, 2:02 PM · Outreachy (Round 22)

Apr 1 2021

rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Thank you @Isaac, I'll take these points into consideration.

Apr 1 2021, 7:52 PM · Outreachy (Round 22)
rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@MGerlach @Isaac Is it necessary to use the mwapi library to access the Wikimedia API? I am getting connection failure errors with mwapi, but using the requests library works.

The mwapi library uses the requests library too (see here). If you can work with the requests library directly, that is perfectly fine.

Apr 1 2021, 12:59 PM · Outreachy (Round 22)
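
As a sketch of calling the MediaWiki Action API without mwapi, the example below asks for an article's interlanguage link (action=query, prop=langlinks, lllang) and reads the target-language title. It uses the standard-library urllib so it runs without extra packages; the same parameter dict can be passed to requests.get(API_URL, params=params). The response shape assumed by parse_langlinks is that of formatversion=2, and the function names, User-Agent string, and canned response are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def langlinks_params(title, lang):
    """Query parameters asking for `title`'s interlanguage link to `lang`."""
    return {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllang": lang,
        "format": "json",
        "formatversion": "2",
    }


def parse_langlinks(response):
    """Return the target-language title from a formatversion=2 response, or None."""
    for page in response.get("query", {}).get("pages", []):
        for link in page.get("langlinks", []):
            return link["title"]
    return None


def fetch_langlink(title, lang, api_url=API_URL):
    """Network call; Wikimedia APIs expect a descriptive User-Agent (hypothetical here)."""
    url = api_url + "?" + urllib.parse.urlencode(langlinks_params(title, lang))
    req = urllib.request.Request(
        url, headers={"User-Agent": "clickstream-tutorial-sketch/0.1 (example)"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_langlinks(json.load(resp))


# Canned, made-up response so the parsing can be checked offline.
canned = {"query": {"pages": [{"ns": 0, "title": "Chris Ferguson",
                               "langlinks": [{"lang": "es", "title": "Chris Ferguson"}]}]}}
print(parse_langlinks(canned))
```

Keeping the request and the parsing in separate functions lets the parsing be tested without a network connection, which also helps when connection errors are the problem.
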
rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@MGerlach @Isaac Is it necessary to use the mwapi library to access the Wikimedia API? I am getting connection failure errors with mwapi, but using the requests library works.

Apr 1 2021, 11:45 AM · Outreachy (Round 22)

Mar 31 2021

rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Is anybody working with chunks of the whole dataset? If so, how are you merging them to apply aggregate functions, while also staying within the memory limit? @MGerlach @Tru2198

Mar 31 2021, 5:04 PM · Outreachy (Round 22)
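
For sums and counts, chunks do not need to be merged into one big table: each chunk can be reduced to a small partial result, and the partial results are then added together. The sketch below does this with the standard library (csv.reader plus collections.Counter, whose update() adds counts), so memory grows with the number of distinct articles rather than the number of rows. The function names, chunk size, and sample data are hypothetical; with pandas the analogous pattern is read_csv(..., chunksize=...) with a per-chunk groupby sum that is accumulated across chunks.

```python
import csv
import io
from collections import Counter
from itertools import islice


def iter_chunks(rows, size):
    """Yield lists of at most `size` rows from any iterator."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def total_clicks_by_target(lines, chunk_size=100_000):
    """Sum the n column per target article (curr), one chunk at a time."""
    totals = Counter()
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for chunk in iter_chunks(reader, chunk_size):
        partial = Counter()
        for row in chunk:
            if len(row) != 4 or not row[3].isdigit():
                continue  # skip malformed rows instead of crashing
            prev, curr, link_type, n = row
            partial[curr] += int(n)
        totals.update(partial)  # Counter.update adds the partial sums
    return totals


# Hypothetical data; in practice `lines` would be an open (gzip) file handle.
data = io.StringIO("A\tX\tlink\t5\nB\tX\tlink\t7\nC\tY\tlink\t3\n")
totals = total_clicks_by_target(data, chunk_size=2)
print(totals.most_common())
```

The chunk size of 2 in the usage line only forces the sample to span two chunks; any size gives the same totals.
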
rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@MGerlach @Isaac Several of the popular destination articles do not link to any further articles. So there is no data on common pathways from this article. Should we find an article that links to more articles or should we only provide the visualizations for the common pathways to the chosen article?

In this case, what do common pathways refer to?

Mar 31 2021, 4:59 PM · Outreachy (Round 22)
rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@MGerlach @Isaac Several of the popular destination articles do not link to any further articles. So there is no data on common pathways from this article. Should we find an article which links to more articles or should we only provide the visualizations for the common pathways to the chosen article?

Mar 31 2021, 2:39 PM · Outreachy (Round 22)
rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Hello, for the microtask I am trying to convert the file to CSV with pandas and then into data frames, but I am getting errors where several rows have a conflicting number of columns. The parameter error_bad_lines=False could be used to skip the problematic lines. Is that option viable?

@Tru2198 @rachita_saha Good catch regarding these errors when importing with pandas. In this case, if the number of affected lines is small, it is OK to remove them using "error_bad_lines=False".
However, more generally, it is worth checking what is happening with these lines. When pandas imports the rows line by line, it splits each line at "\t" to get the entries for the different columns. The error suggests that sometimes the number of entries from splitting is not 4 (as in most rows) but 5 (or more). This happens because of some characters contained in the page titles of the source and/or target page (such as quotes). Pandas provides different options to deal with this. I have found that calling read_csv() with the option "quoting=3" can solve the problem (the default is quoting=0). See here for more documentation if you are interested in digging deeper.

Mar 31 2021, 8:53 AM · Outreachy (Round 22)
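
The quoting issue can be reproduced with Python's csv module, whose constants pandas reuses: quoting=0 is csv.QUOTE_MINIMAL and quoting=3 is csv.QUOTE_NONE. In the sketch below, a hypothetical title that starts with a stray double quote makes the default reader treat everything up to the end of the data as one quoted field, which is the "many rows combined into one" symptom; with QUOTE_NONE the quote is kept as an ordinary character and every line splits into 4 fields. The sample text and helper name are made up for illustration.

```python
import csv
import io

# Hypothetical two-line sample: the first source title starts with a stray
# double quote, which the default quoting rules read as an opening quote.
SAMPLE = '"Weird_title\tTarget_A\tlink\t20\nSource_B\tTarget_B\tlink\t5\n'


def read_rows(text, quoting):
    """Parse tab-separated text with the given csv quoting mode."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", quoting=quoting))


default_rows = read_rows(SAMPLE, csv.QUOTE_MINIMAL)  # pandas quoting=0
plain_rows = read_rows(SAMPLE, csv.QUOTE_NONE)       # pandas quoting=3

print(len(default_rows), [len(r) for r in default_rows])
print(len(plain_rows), [len(r) for r in plain_rows])
```

The pandas equivalent would be read_csv(path, sep="\t", header=None, quoting=csv.QUOTE_NONE). Note that in pandas 1.3 and later error_bad_lines is deprecated in favor of on_bad_lines="skip", and it was removed in pandas 2.0.
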
rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@Isaac @MGerlach In the to-do where we are required to visualize the data for a chosen destination article, it says "Pull all the data in the clickstream dataset for that article (both as a source and destination)". Does this mean we need to pull the data for that article in all available languages for the month of January or only the English language?

@rachita_saha Thanks for the question, this is indeed not clear in the description. To clarify: It is totally fine for this todo to work with one language (e.g. English) for a single month (e.g. January 2021). Matching articles across languages is possible (for example using Wikidata) but this would require a few additional steps which are not super easy.

Mar 31 2021, 8:07 AM · Outreachy (Round 22)
rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.
Mar 31 2021, 7:50 AM · Outreachy (Round 22)
rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Hello @Tru2198, I faced the same problem while converting the file to CSV. About 5-7 lines in the whole dataframe had a different number of columns, probably due to an error in data entry. Since the number of bad lines is negligible compared with the total number of lines, I believe the parameter error_bad_lines=False can be used. However, @Isaac clarified that we can limit the data to something more feasible (20-30k rows) for this particular task. I did not find any erroneous lines in the first 20k entries, so you could try working with a limited number of rows first.

Mar 31 2021, 6:33 AM · Outreachy (Round 22)

Mar 30 2021

rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@Isaac Understood. Thank you for the clarification.

Mar 30 2021, 8:36 PM · Outreachy (Round 22)
rachita_saha added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

For the first TODO in the notebook, is it alright if we loop through only the top 20-30k entries in the given TSV file and answer the questions based on them? I tried looping through all of them, but the memory gets exhausted. @MGerlach @Isaac

Mar 30 2021, 4:03 PM · Outreachy (Round 22)
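
Limiting the analysis to the first 20-30k entries can be done without ever loading the whole file: itertools.islice stops reading after the requested number of lines. The sketch below opens a plain or gzip-compressed TSV in text mode and keeps only the first `limit` rows; the function names and the default limit are hypothetical choices matching the numbers discussed here. In pandas, read_csv(..., nrows=20_000) has the same effect.

```python
import gzip
from itertools import islice


def head_rows(lines, limit=20_000):
    """Split at most `limit` lines into tab-separated fields."""
    return [line.rstrip("\n").split("\t") for line in islice(lines, limit)]


def read_head(path, limit=20_000):
    """Read only the first `limit` rows of a .tsv or .tsv.gz clickstream file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return head_rows(fh, limit)


# Offline usage with an in-memory list standing in for a file handle.
fake_file = ["Source\tTarget\tlink\t1\n"] * 5
print(len(head_rows(fake_file, limit=3)))  # 3
```

Because the file handle is read lazily, memory use depends on `limit`, not on the size of the full dump.
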