
Zealink (Zia Ashraf)
Zia Ashraf

Projects

User does not belong to any projects.

User Details

User Since
Apr 4 2021, 6:52 AM (159 w, 2 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Zealink [ Global Accounts ]

Recent Activity

May 3 2021

Zealink added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Thanks a lot, everyone. It was my first time being part of such an active and collaborative forum, and let me tell you all, it was great :)
May you all have a great journey ahead.

May 3 2021, 2:25 PM · Outreachy (Round 22)

Apr 29 2021

Zealink added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@puja_jaji thanks for the reminder. Also, it was okay for us to record just one contribution for the task, right?

Yes @Zealink, I think it is appropriate to present all of our work in a single notebook as one contribution.

Apr 29 2021, 5:07 PM · Outreachy (Round 22)
Zealink added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@MGerlach thanks a lot to the Outreachy organisers and the team for this consideration. This really means a lot.

Apr 29 2021, 5:05 PM · Outreachy (Round 22)
Zealink added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@puja_jaji thanks for the reminder. Also, it was okay for us to record just one contribution for the task, right?

Apr 29 2021, 2:37 PM · Outreachy (Round 22)

Apr 24 2021

Zealink added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@rachita_saha @Ahn-nath @Pikaa97 thanks a lot!

Apr 24 2021, 9:07 AM · Outreachy (Round 22)

Apr 23 2021

Zealink added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Hi everyone, I wasn't quite sure whether we had to remove the initial comments (and Markdown text) completely or partially before adding our own justifications and conclusions to the given tasks in the notebook. Could you all please tell me your thoughts on that?

Apr 23 2021, 11:55 PM · Outreachy (Round 22)

Apr 19 2021

Zealink added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Hello everyone, I would like to ask what kind of visualization (bar graphs, pie charts, etc.) you all preferred over the others for this part of the notebook:

  1. TODO: Choose a destination article from the dataset that is:
       • relatively popular (at least 1000 pageviews and 20 unique sources in the dataset)
       • shows up in at least one other language with at least 1000 pageviews in January 2021
         (you can check quickly via this tool: https://pageviews.toolforge.org/langviews/)
  2. Pull all the data in the clickstream dataset for that article (both as a source and destination)
  3. Visualize the data to show what the common pathways to and from the article are

The answer lies in the message you're trying to convey: it depends on the hidden patterns and insights you consider important to highlight. A Sankey diagram like the one @Vanevela uses is what I would choose for the 'common pathways' part, but it is not the only choice you have. To compare values (e.g., pageviews among a subset of articles), my preference is pie charts, scatter plots, or line charts.

When I feel a bit confused, I use this to check whether I am making the right choice for my objective: https://blog.hubspot.com/marketing/types-of-graphs-for-data-visualization

Apr 19 2021, 5:34 PM · Outreachy (Round 22)
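
For the "pull all the data" step quoted above, a minimal sketch could look like the following. It assumes each clickstream row has already been parsed into four fields (prev, curr, type, n); the names `Row`, `pathways`, and `is_popular` are hypothetical, and reading "pageviews" as the sum of incoming counts is my own interpretation of the task text, not something the notebook confirms.

```python
from collections import namedtuple

# Hypothetical row type: one (source, destination, link type, count) record.
Row = namedtuple("Row", ["prev", "curr", "type", "n"])


def pathways(rows, article):
    """Return (incoming, outgoing) rows for one article, largest counts first."""
    incoming = sorted((r for r in rows if r.curr == article),
                      key=lambda r: r.n, reverse=True)
    outgoing = sorted((r for r in rows if r.prev == article),
                      key=lambda r: r.n, reverse=True)
    return incoming, outgoing


def is_popular(incoming, min_views=1000, min_sources=20):
    """Assumed reading of the task: total incoming count and unique sources."""
    total_views = sum(r.n for r in incoming)
    unique_sources = {r.prev for r in incoming}
    return total_views >= min_views and len(unique_sources) >= min_sources


# Toy data, made up purely for illustration.
sample = [
    Row("Dog", "Cat", "link", 40),
    Row("other-search", "Cat", "external", 900),
    Row("Cat", "Kitten", "link", 120),
    Row("Dog", "Wolf", "link", 15),
]
incoming, outgoing = pathways(sample, "Cat")
```

The two lists can then feed whichever chart fits the message: the sorted counts work directly for a bar chart, or as link weights for a pathway diagram.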
Zealink added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Hello everyone, I would like to ask what kind of visualization (bar graphs, pie charts, etc.) you all preferred over the others for this part of the notebook:

  1. TODO: Choose a destination article from the dataset that is:
       • relatively popular (at least 1000 pageviews and 20 unique sources in the dataset)
       • shows up in at least one other language with at least 1000 pageviews in January 2021
         (you can check quickly via this tool: https://pageviews.toolforge.org/langviews/)
  2. Pull all the data in the clickstream dataset for that article (both as a source and destination)
  3. Visualize the data to show what the common pathways to and from the article are

Hi @Zealink, I did a Sankey diagram, like the one available on the Research: Wikipedia clickstream page. I made the Sankey diagram using Plotly.

Apr 19 2021, 5:31 PM · Outreachy (Round 22)
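
As a rough illustration of the Plotly approach mentioned above, the sketch below turns lists of (article, count) pairs into the node and link arrays that Plotly's `Sankey` trace expects. The helper names `build_sankey_inputs` and `make_figure` are hypothetical, the sample counts are invented, and this is not necessarily how @Vanevela built the diagram. Plotly is a third-party package, so it is imported only inside `make_figure`.

```python
def build_sankey_inputs(incoming, outgoing, article):
    """Build labels plus source/target/value lists for a Sankey diagram.

    incoming: list of (source_title, count) pairs leading to `article`
    outgoing: list of (destination_title, count) pairs leaving `article`
    """
    labels = [title for title, _ in incoming] + [article] + \
             [title for title, _ in outgoing]
    centre = len(incoming)  # index of the chosen article in `labels`
    source = list(range(centre)) + [centre] * len(outgoing)
    target = [centre] * len(incoming) + \
             list(range(centre + 1, centre + 1 + len(outgoing)))
    value = [n for _, n in incoming] + [n for _, n in outgoing]
    return labels, source, target, value


def make_figure(labels, source, target, value):
    """Create the figure; requires the third-party plotly package."""
    import plotly.graph_objects as go
    return go.Figure(go.Sankey(
        node=dict(label=labels),
        link=dict(source=source, target=target, value=value),
    ))


# Invented numbers, for illustration only.
labels, source, target, value = build_sankey_inputs(
    [("Dog", 5), ("Pet", 3)], [("Kitten", 4)], "Cat")
```

Calling `make_figure(labels, source, target, value).show()` should render the diagram in a notebook, assuming Plotly is installed.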
Zealink added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Hello everyone, I would like to ask what kind of visualization (bar graphs, pie charts, etc.) you all preferred over the others for this part of the notebook:

Apr 19 2021, 10:41 AM · Outreachy (Round 22)

Apr 18 2021

Zealink updated Zealink.
Apr 18 2021, 9:26 PM

Apr 17 2021

Zealink added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Hello @rachita_saha, it's true that we would face a ValueError in just one of the lines, but is it actually fine to ignore it? On examining that particular row, we would see that it holds the data of a lot (really a lot) of different rows combined into one, probably due to an error on our side in parsing (on the occurrence of some special character sequence).

Hi @Zealink
Yes, on inspecting the line, it does combine the data of several rows. However, there was very little difference in the results of the analysis, because the number of affected lines is still negligible compared to the whole data set. But I agree that it is better to solve the error. :D

Apr 17 2021, 4:33 PM · Outreachy (Round 22)
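
One plausible cause of the merged row discussed above (an assumption, since the thread does not confirm it) is a title that starts with a quotation mark: with its default settings, `csv.reader` treats that as the start of a quoted field and keeps reading across tabs and newlines. The sketch below, using a made-up two-line sample, compares the default behaviour with `quoting=csv.QUOTE_NONE`, which tells the reader to treat quote characters as ordinary text.

```python
import csv
import io

# Made-up sample: the second field of the first line starts with an
# unbalanced quotation mark.
SAMPLE = 'other-search\t"Hello\texternal\t10\nA\tB\tlink\t20\n'


def read_rows(text, **reader_options):
    """Parse tab-separated text with csv.reader and return all rows."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", **reader_options))


# Default quoting: the stray quote swallows the following line.
default_rows = read_rows(SAMPLE)

# QUOTE_NONE: quotes are plain characters, so each line stays one row.
plain_rows = read_rows(SAMPLE, quoting=csv.QUOTE_NONE)

print(len(default_rows), "row(s) with default quoting")
print(len(plain_rows), "row(s) with QUOTE_NONE")
```

If this is indeed what happens in the real file, parsing with `QUOTE_NONE` would recover the merged rows instead of discarding them.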
Zealink updated Zealink.
Apr 17 2021, 4:30 PM
Zealink added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@rachita_saha Okay, thank you. I'm using csv.reader to parse the data.

@Audrey_Nessa Okay, so make sure that you are stripping any newline or quotation characters from the values. After that, you might still get a ValueError exception in one of the lines, so use a try/except block to catch that exception. The data from that line will not be used, but it won't make a difference to the analysis since there is only one bad line in the entire data set. Additionally, you can check what the problem was by printing the contents of the bad line from the file. If you still face any problems, feel free to ask here. I'm happy to help. :)

Apr 17 2021, 3:52 PM · Outreachy (Round 22)
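
The advice above could be sketched roughly as follows. This assumes each line holds four tab-separated fields (prev, curr, type, n); the function name `parse_lines` and the sample lines are hypothetical. Bad lines are collected rather than silently dropped, so they can be printed and inspected afterwards.

```python
def parse_lines(lines):
    """Parse tab-separated lines into (prev, curr, type, n) tuples.

    Returns (rows, bad), where `bad` holds (line_number, raw_line) pairs
    for every line that raised ValueError.
    """
    rows, bad = [], []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        try:
            # Unpacking raises ValueError when there are not exactly 4 fields;
            # int() raises ValueError when the count is not a number.
            prev, curr, link_type, n = fields
            rows.append((prev.strip('"'), curr.strip('"'), link_type, int(n)))
        except ValueError:
            bad.append((lineno, line))
    return rows, bad


# Made-up lines for illustration.
rows, bad = parse_lines([
    "A\tB\tlink\t10\n",
    "broken line\n",
    '"C"\tD\tlink\t7\n',
])
for lineno, raw in bad:
    print(f"skipped line {lineno}: {raw!r}")
```

Keeping the line numbers makes it easy to go back to the file and see what went wrong with each skipped line.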