Hi,
I'm Ana, a computer science graduate from Mozambique, eager to contribute to open source projects. I recently joined Phabricator as an Outreachy applicant, and I'm excited to dive in, learn new things, and contribute to projects.
In T358412#9635724, @GonzaGertrude wrote:
Hello @Ederporto or any of my colleagues,
Help me with this issue.
Please work with your mentor to provide a timeline of the work you plan to accomplish on the project and what tasks you will finish at each step. Make sure to take into account any time commitments you have during the Outreachy internship round. If you are still working on your contributions and need more time, you can leave this blank and edit your application later.
In T358412#9614248, @Abishek_Das wrote:
Hello everyone, I am also an Outreachy applicant for 2024.
Today, I discovered the message on Phabricator; however, I initially assumed that all discussions were taking place in the Zulip channel. Consequently, I didn't check the Phabricator comments section.
To keep it concise, I'd like to address how to make FFmpeg work on the PAWS Jupyter Notebook for the task "Create a tool for informative infographics from structured information from Wikimedia projects - Task A."
To get FFmpeg working on the PAWS Jupyter Notebook, we need to download the FFmpeg static builds from https://johnvansickle.com/ffmpeg/ and add them to the same folder where we have the code.
Here's a step-by-step guide (so you don't have to go through the trouble of downloading from https://johnvansickle.com/ffmpeg/ yourself):
Note: Before you do the step mentioned in point (a), make sure all the other steps, i.e., b, c, d, and e, are done first.
a) I've attached code that you can add to a cell in your Jupyter Notebook (the same notebook where you have your code to generate the bar chart race). Run this code to resolve the FFmpeg issue.
# Download a static FFmpeg build and add it to PATH.
%run 'util/load-ffmpeg.ipynb'
print('Done!')
b) Prior to running the code mentioned in point (a), add/upload the "util" folder to your PAWS. I've included the folder below.
util.zip (864 B)
(You have to unzip it after downloading.)
c) The purpose of the "util" folder is to automatically add the FFmpeg Static Build File (which is a folder) to your PAWS when you run the provided code mentioned in point (a).
d) Ensure that the filename in bcr.bar_chart_race() has a ".mp4" extension.
e) After completing these steps, you can run your respective code, which is the code for generating the bar chart race.
Note:
a) You might encounter a warning or error (which doesn't appear when I run my code locally and only sometimes appears on my PAWS Jupyter Notebook), as shown in the attached screenshot. However, this is not an issue, as the video file will still be generated in the PAWS folder after running your code. You can then download the bar chart race video and watch it (as shown in the screenshot below).
b) If your code works correctly locally, it should generally (90%-99% of the time) work on the PAWS online Jupyter Notebook.
c) The warning or error in the screenshot I provided may or may not appear (it happens to me only on PAWS), so be mindful of that.
d) Ensure that the filename in bcr.bar_chart_race() has a ".mp4" extension, as a ".html" file won't appear on PAWS. Again, though, .html works locally.
e) I have no idea why the .html file doesn't appear on PAWS, and I haven't looked for a solution yet, since the .mp4 file is generated on PAWS without any issue.
f) Everything I have mentioned about solving the FFmpeg issue was taken from various documentation, such as the Matplotlib 3.8.3 documentation, and, of course, my favorite, Stack Overflow (so, thanks to the devs on Stack Overflow).
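For anyone curious what the "add it to PATH" step amounts to, here is a minimal sketch. I don't know what util/load-ffmpeg.ipynb does internally, so this is only an assumption about the general idea: the function name add_ffmpeg_dir_to_path and the folder name in the usage note are hypothetical, and it assumes you already have an unpacked static build containing an executable named ffmpeg.

```python
import os
import shutil


def add_ffmpeg_dir_to_path(ffmpeg_dir):
    """Prepend ffmpeg_dir to PATH for this process and return the ffmpeg found.

    ffmpeg_dir is a hypothetical folder holding an unpacked static build.
    Returns the full path to the ffmpeg executable, or None if none is found.
    """
    ffmpeg_dir = os.path.abspath(ffmpeg_dir)
    current = os.environ.get("PATH", "")
    # Only prepend once, so re-running the notebook cell does not grow PATH.
    if ffmpeg_dir not in current.split(os.pathsep):
        os.environ["PATH"] = ffmpeg_dir + os.pathsep + current
    # shutil.which searches the updated PATH for an executable called "ffmpeg".
    return shutil.which("ffmpeg")
```

In a notebook you would call something like add_ffmpeg_dir_to_path("ffmpeg-static") before calling bcr.bar_chart_race(); if it returns None, the folder does not contain an executable ffmpeg. As far as I know, Matplotlib also has an animation.ffmpeg_path rcParam that can point at the binary directly, which may be an alternative to editing PATH.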
In T358412#9613259, @Andreas_Sune wrote:
In T358412#9613211, @Anachimuco wrote:
In T358412#9612988, @Udonels wrote:
In T358412#9612803, @MahimaSinghal wrote:
In T358412#9611792, @Udonels wrote:
In T358412#9611573, @Andreas_Sune wrote:
Hello, @Ederporto
I have a question about the visualisation step. In the notebook task we have to build our graphics from the previous result dataframe (top_view_dataframe). According to the bar_chart_race documentation, the dataframe should have dates on the rows and the different articles on the columns, but top_view_dataframe is the opposite. My question is: can we use a different approach instead of top_view_dataframe, for example preparing our dataset with the built-in method that bar_chart_race provides for that purpose?
Thank you
Yes, same issue. I find it hard to visualize it with the current state of our data frame. In the documentation, it's specified that every row must represent a single period, which is the exact opposite of ours.
@Ederporto would drop more insights.
I think that, to meet the requirement of the bar_chart_race library, we can reshape the DataFrame so that dates are on the rows and articles are on the columns. This way, it aligns with the format the library expects.
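The reshaping suggested above could look roughly like the sketch below. It assumes top_view_dataframe has one row per article and one column per date, which is what the comments above imply; the sample values and the helper name to_race_format are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for top_view_dataframe:
# one row per article, one column per date, cells hold daily views.
top_view_dataframe = pd.DataFrame(
    {"2024-01-01": [120, 80], "2024-01-02": [95, 130]},
    index=["Article_A", "Article_B"],
)


def to_race_format(df):
    """Return a copy with dates on the rows and articles on the columns."""
    wide = df.T.copy()                        # swap rows and columns
    wide.index = pd.to_datetime(wide.index)   # parse the date labels
    return wide.sort_index()                  # one row per period, in order


race_df = to_race_format(top_view_dataframe)
```

race_df then has one row per day and one column per article, so it could be passed to bcr.bar_chart_race() instead of the original frame.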
Hello @Ederporto,
I have tried using the "bar_chart_race" library, but to no avail. I have tried the examples in the documentation using its built-in dataset, and it says ffmpeg needs to be installed. Is there a way around this on the PAWS notebook? If any other intern has gotten past this, please help me. I've spent hours debugging without results.
You'll need to install ffmpeg on your machine. I've already done that, but unfortunately, I'm still unable to visualize the bar chart. I tested it on my machine using VS Code, and initially encountered an error. However, after installing ffmpeg, I managed to generate the video successfully. I suspect the issue might be with the Jupyter environment. If anyone has successfully addressed the problem, could you please help us?
If you have successfully installed ffmpeg on your system, it should also work with your Jupyter notebook. I did it and I've successfully generated the video.
I think the issue comes from the dataset.
Yes, I think that the term "visualization" refers to the number of times an article has been viewed or accessed by users within a given period. The description "the cells values store the visualization of the article A on the date D" means that each cell in the DataFrame will contain the number of times the corresponding article was viewed on a specific day.
In T358412#9609811, @GonzaGertrude wrote:
In T358412#9608666, @Anachimuco wrote:
In T358412#9607774, @DevJames1 wrote:
@Ederporto
I am having issues implementing the function most_viewed_ptwiki_jan_feb_per_day() as in the task description.
If I understood properly, we are asked to get daily data for each article within January and February in the Portuguese Wikipedia, and then append it to a DataFrame.
My limitations:
- No endpoint to fetch the most viewed articles daily or monthly in a project for a date range. The closest is this endpoint:
https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/{project}/all-access/all-agents/daily/{start}/{end}
but it only returns views for a project cumulatively and not for individual articles.
- Another endpoint I tried is this:
https://wikimedia.org/api/rest_v1/metrics/pageviews/top/{project}/all-access/year/month/day
This returns pageviews for articles on a project for a particular day or month (not a date range).
So to use this to solve the task, I will have to make almost 60 requests, one for each day of the two months, which is not efficient enough.
So any help will be appreciated. My fellow interns, you can help if you find a way to go about it or if you feel I misunderstood the task description.
@Ederporto, can you please clarify this? I have a sense that I also misunderstood the task.
@Anachimuco
Perhaps you need to get the most viewed articles in Portuguese Wikipedia and then get their daily visits in the given period.
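To make the two options above concrete, here is a sketch that only builds the request URLs and does not call the API. Some things here are my assumptions and not from the task: the year 2024, the project string "pt.wikipedia", and the per-article endpoint layout, which is my understanding of the Pageviews API and should be checked against its documentation. The function names are hypothetical.

```python
from datetime import date, timedelta
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"


def daily_top_urls(project, start, end, access="all-access"):
    """One 'top' URL per day from start to end (inclusive)."""
    urls = []
    day = start
    while day <= end:
        urls.append(f"{BASE}/top/{project}/{access}/{day:%Y/%m/%d}")
        day += timedelta(days=1)
    return urls


def per_article_url(project, article, start, end,
                    access="all-access", agent="all-agents"):
    """Daily views of a single article over a date range (assumed layout)."""
    # Titles use underscores for spaces and must be URL-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return (f"{BASE}/per-article/{project}/{access}/{agent}/{title}"
            f"/daily/{start:%Y%m%d}/{end:%Y%m%d}")


# Assumed period: January and February 2024.
urls = daily_top_urls("pt.wikipedia", date(2024, 1, 1), date(2024, 2, 29))
```

The first function matches the "one request per day" approach (60 URLs for this period); the second matches the idea of picking the top articles first and then asking for each article's daily views in a single request per article.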
In T358412#9607480, @DevJames1 wrote:
In T358412#9605389, @DevJames1 wrote:
In T358412#9604805, @Anachimuco wrote:
Hi @Ederporto,
I'm facing some trouble gathering data from the Wikimedia API. Every time I make a request in the Jupyter notebook, I'm getting a response status code of 403, indicating forbidden access. Is there a workaround for this? It's worth noting that I can gather data normally from the Wikimedia API website and even when running the link outside of the notebook.
Could you please take a look at the error and see if there's a solution? Thank you
I am having the same issue with this endpoint: https://wikimedia.org/api/rest_v1/metrics/pageviews/top/{project}/{access}/{year}/{month}/{day}
It returns 403. The text property of the response contained these messages:
- Our servers are currently under maintenance or experiencing a technical problem.
- Error: 403, Scripted requests from your IP have been blocked, please see https://meta.wikimedia.org/wiki/User-Agent_policy.
Assistance is needed to continue with the tasks.
OS: Windows
Python module: requests
@Anachimuco I read on the Pageviews API page that
headers = { "User-Agent": user_agent }
is required. Check whether you added this, and replace user_agent with an actual user agent string.
I was able to get the data after adding it to my code.
I assume you are using Python.
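Here is a minimal sketch of the User-Agent fix described above. It uses only the standard library so it runs anywhere; with the requests module, the same dict would be passed as headers= to requests.get(). The tool name and contact address are placeholders, and the function names are hypothetical.

```python
import json
import urllib.request


def make_headers(tool_name, contact):
    """Build a descriptive User-Agent header (placeholder values expected)."""
    return {"User-Agent": f"{tool_name} ({contact})"}


def build_request(url, headers):
    """Attach the headers to a request object without sending it."""
    return urllib.request.Request(url, headers=headers)


def fetch_json(url, headers):
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(build_request(url, headers), timeout=30) as resp:
        return json.load(resp)


# Placeholder identity: replace with your own tool name and contact details.
headers = make_headers("outreachy-notebook/0.1", "your-email@example.org")
```

Calling fetch_json(url, headers) with one of the pageviews URLs should then send the identifying header; per the error message quoted above, a request without one is what gets blocked with 403.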
In T358412#9603946, @GonzaGertrude wrote:
In T358412#9603054, @Anachimuco wrote:
Hi @Ederporto, I hope you're doing well!
I was going through the project tasks and had a question I wanted to clarify:
When it mentions "the most viewed articles in the Portuguese Wikipedia," is it specifically referring to Brazil, a Portuguese-speaking country, or does it encompass all Portuguese-speaking countries? Just a bit confused about the wording there.
Hello @Anachimuco
I think the Portuguese Wikipedia encompasses all Portuguese-speaking countries, just as the English Wikipedia is used in many English-speaking countries.