I have had an amazing experience working on this task, and I truly appreciate how the community came forward to help in any way possible every time I or anyone else faced an issue. I have learnt a lot under the guidance of the mentors, and I hope to apply all the knowledge I have gained through this experience to the future projects I take up. I am really looking forward to continuing to contribute to open source.
May 3 2021
Apr 29 2021
In T276315#7045924, @PatsonJay wrote:What if it doesn't
In T276315#7045872, @PatsonJay wrote: Hello everyone, I have an article, say with the name Chris Ferguson, which I found in the English clickstream. When I run a query using the mwapi library, I can see that this article also exists in Spanish. But when I try to find the article in the Spanish clickstream ("eswiki"), I cannot find it, even if I search for its name in Spanish. Yet when I look the article up with Langviews (https://pageviews.toolforge.org/langviews/), I can see that it does exist. What might be the problem I'm facing?
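The thread does not resolve this question, but one plausible cause (my assumption, not confirmed by the mentors) is that each wiki's clickstream file keys its rows by that wiki's own page titles, with underscores instead of spaces, so the eswiki rows would appear under the Spanish title rather than "Chris Ferguson". A minimal sketch of mapping an enwiki title to its eswiki equivalent via the Action API's `prop=langlinks` query (the endpoint and parameters are standard Action API usage; the helper names are my own):

```python
import requests

API = "https://en.wikipedia.org/w/api.php"  # enwiki Action API endpoint

def to_clickstream_title(title):
    """Clickstream dumps store page titles with underscores instead of spaces."""
    return title.replace(" ", "_")

def spanish_title(en_title):
    """Map an enwiki title to its eswiki title via a prop=langlinks query."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": en_title,
        "lllang": "es",
        "format": "json",
        "formatversion": 2,
    }
    page = requests.get(API, params=params).json()["query"]["pages"][0]
    links = page.get("langlinks", [])
    return to_clickstream_title(links[0]["title"]) if links else None
```

Searching the eswiki clickstream for the title returned by `spanish_title(...)` (underscored) rather than the English name may explain the discrepancy with Langviews, which does this cross-language resolution itself.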
Apr 26 2021
In T276315#7034038, @PatsonJay wrote: "Visualize the data to show what the common pathways to and from the article are." This to-do isn't clear to me; could someone please explain?
In T276315#7034030, @PatsonJay wrote: Any advice on how I can visualize the data for a to-do that I do not quite understand?
Apr 24 2021
In T276315#7030673, @Zealink wrote: Hi everyone, I wasn't quite sure whether we had to remove the initial comments (and Markdown text) completely or partially before adding our own justifications and conclusions to the given tasks in the notebook. Could you all please share your thoughts on that?
Apr 17 2021
Hello @rachita_saha, it's true that we would face a ValueError in just one of the lines, but is it actually fine to ignore it? On examining that particular row, we would see that it contains the data of a great many different rows all combined into one, probably due to an error on our part in parsing (on the occurrence of some special character sequence).
In T276315#7011409, @Audrey_Nessa wrote: @rachita_saha Okay, thank you. I'm using csv.reader to parse the data.
In T276315#7011306, @Audrey_Nessa wrote: Hello, I'd like to ask whether there are rows in the dataset that have no pageview count, or where the pageview count is at a different index?
When I run my code, after some time it produces an error saying it cannot convert str to int.
Apr 3 2021
In T276315#6970459, @Priyanka0325 wrote: @rachita_saha Okay... thank you.
In T276315#6970423, @Priyanka0325 wrote: Hello everyone, when I try to use pandas to create a dataframe, I get "the kernel appears to have died" every time.
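The thread does not diagnose this, but a kernel dying while building a DataFrame from the full clickstream dump usually points to running out of memory (my assumption). One way to keep memory bounded is to read the file in fixed-size chunks with pandas' `chunksize` parameter and aggregate per chunk; a minimal sketch (column names and the tiny in-memory sample are mine):

```python
import io
import pandas as pd

COLS = ["prev", "curr", "type", "n"]

def read_in_chunks(fileobj, chunksize=100_000):
    """Stream a clickstream TSV in fixed-size chunks so the whole file
    never has to fit in memory at once; here we only keep a row count,
    but any per-chunk filtering or aggregation would go in the loop."""
    total = 0
    for chunk in pd.read_csv(fileobj, sep="\t", names=COLS,
                             quoting=3, chunksize=chunksize):
        total += len(chunk)
    return total

sample = "a\tb\tlink\t10\nc\td\tlink\t20\n"
n_rows = read_in_chunks(io.StringIO(sample), chunksize=1)
```

Alternatively, limiting the read with `nrows=` (as suggested later in the thread for the 20-30k-row variant of the task) avoids the problem entirely.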
In T276315#6970135, @Srishti0gupta wrote: Hi everyone. I have been trying to start with the code, but I am not able to break it down. I have decent experience with Python through courses. Help needed to get started. Thanks.
Apr 2 2021
@Isaac Okay, understood. Thank you.
@Isaac @MGerlach I had a doubt regarding recording contributions. Since we are not making formal pull requests in this project, at what points and in which form are we required to record our contributions? Do we need to submit the public link of our notebook after completing a few to-dos and that will count as a contribution?
Apr 1 2021
Thank you @Isaac, I'll take these points into consideration.
In T276315#6964806, @MGerlach wrote: In T276315#6964708, @rachita_saha wrote: The mwapi library uses the requests library too (see here). If you can work with the requests library directly, that works perfectly fine.
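To illustrate the point about mwapi being a thin wrapper: the same query can be issued directly against the MediaWiki Action API with requests, passing the identical key/value pairs that `mwapi.Session.get(...)` would send. A minimal sketch (the endpoint and parameters are standard Action API usage; the function names and User-Agent string are mine):

```python
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def build_query(title):
    """Parameters for a basic prop=info query; mwapi's session.get(...)
    transmits the same key/value pairs under the hood via requests."""
    return {
        "action": "query",
        "prop": "info",
        "titles": title,
        "format": "json",
        "formatversion": 2,
    }

def fetch(title):
    resp = requests.get(API_URL, params=build_query(title),
                        headers={"User-Agent": "clickstream-demo"})
    resp.raise_for_status()
    return resp.json()
```

Either route returns the same JSON; mwapi mainly adds session handling and continuation support on top.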
Mar 31 2021
In T276315#6961246, @naznin47 wrote:
In T276315#6961080, @Tru2198 wrote:In T276315#6960916, @rachita_saha wrote:@MGerlach @Isaac Several of the popular destination articles do not link to any further articles. So there is no data on common pathways from this article. Should we find an article that links to more articles or should we only provide the visualizations for the common pathways to the chosen article?
In this case, what do common pathways refer to?
@MGerlach @Isaac Several of the popular destination articles do not link to any further articles. So there is no data on common pathways from this article. Should we find an article which links to more articles or should we only provide the visualizations for the common pathways to the chosen article?
In T276315#6959707, @MGerlach wrote: In T276315#6959538, @Tru2198 wrote: Hello, for the microtask, I am trying to convert the file to CSV with pandas and subsequently to dataframes, but I am getting an error where several rows have conflicting numbers of columns. So the parameter error_bad_lines=False could be used to ignore the troubling lines. Is that option viable?
@Tru2198 @rachita_saha Good catch regarding these errors when importing with pandas. In this case, if the number of affected lines is small, it is OK to remove them using "error_bad_lines=False".
However, more generally, it is worth checking what is happening with these lines. When pandas imports the rows line by line, it splits each line at the "\t" to get the entries for the different columns. The error suggests that sometimes the number of entries from splitting is not 4 (like most of the rows) but 5 (or more). This happens because of certain characters contained in the page titles of the source and/or target page (such as quotes). Pandas provides different options to deal with this. I have found that passing the option "quoting=3" to read_csv() solves the problem (the default is quoting=0). See here for more documentation if you are interested in digging deeper.
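The `quoting=3` value is `csv.QUOTE_NONE`: quote characters in page titles are treated as ordinary text instead of opening a quoted field that can swallow the tab separators. A small self-contained demonstration (the sample rows are mine, for illustration):

```python
import io
import pandas as pd

COLS = ["prev", "curr", "type", "n"]

# A page title containing double quotes can confuse the default quote
# handling (quoting=0), because the quote opens a quoted field and the
# tab separators inside it are no longer seen as column boundaries.
sample = ('other-search\t"Heroes"_(TV_series)\texternal\t50\n'
          "Main_Page\tChris_Ferguson\tlink\t7\n")

# quoting=3 is csv.QUOTE_NONE: treat quotes as ordinary characters,
# so every tab is a separator and each row splits into exactly 4 fields.
df = pd.read_csv(io.StringIO(sample), sep="\t", names=COLS, quoting=3)
```

With `quoting=3` both rows parse into the expected four columns, quotes and all.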
In T276315#6959666, @MGerlach wrote: In T276315#6959624, @rachita_saha wrote: @Isaac @MGerlach In the to-do where we are required to visualize the data for a chosen destination article, it says "Pull all the data in the clickstream dataset for that article (both as a source and destination)". Does this mean we need to pull the data for that article in all available languages for the month of January, or only the English language?
@rachita_saha Thanks for the question, this is indeed not clear in the description. To clarify: It is totally fine for this todo to work with one language (e.g. English) for a single month (e.g. January 2021). Matching articles across languages is possible (for example using Wikidata) but this would require a few additional steps which are not super easy.
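For a single language and month, "both as a source and destination" amounts to two filters on the clickstream table: rows where the article is the `curr` column (incoming pathways) and rows where it is the `prev` column (outgoing pathways). A minimal sketch (column names follow the clickstream dump format; the function name and sample rows are mine):

```python
import io
import pandas as pd

COLS = ["prev", "curr", "type", "n"]

def pathways(df, title):
    """Split the clickstream rows for one article into incoming traffic
    (article as destination) and outgoing traffic (article as source),
    most-clicked pairs first."""
    incoming = df[df["curr"] == title].sort_values("n", ascending=False)
    outgoing = df[df["prev"] == title].sort_values("n", ascending=False)
    return incoming, outgoing

sample = ("other-search\tChris_Ferguson\texternal\t100\n"
          "Poker\tChris_Ferguson\tlink\t40\n"
          "Chris_Ferguson\tWorld_Series_of_Poker\tlink\t25\n")
df = pd.read_csv(io.StringIO(sample), sep="\t", names=COLS, quoting=3)
inc, out = pathways(df, "Chris_Ferguson")
```

The two frames can then be plotted directly, e.g. as bar charts of the top sources and top destinations by click count.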
Hello @Tru2198, I faced the same problem while converting the file to CSV. There were about 5-7 lines in the whole dataframe with a different number of columns, probably due to an error in data entry. I believe that since the number of bad lines is negligible compared to the total number of lines, the parameter error_bad_lines=False can be used. However, @Isaac clarified that we can limit the data to something more feasible (20-30k rows) for this particular task. I did not find any erroneous lines in the first 20k entries, so you could probably try working with a limited number first too.
Mar 30 2021
@Isaac Understood. Thank you for the clarification.