
Tru2198 (TRUSHA PATEL)
User

Projects

User does not belong to any projects.

User Details

User Since
Mar 29 2021, 5:20 PM (160 w, 2 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Tru2198 [ Global Accounts ]

Recent Activity

Apr 28 2021

Tru2198 added a comment to T279290: Synchronising Wikidata and Wikipedias using pywikibot - Task 6.

Hey @Mike_Peel @MSGJ, for the Category:The Interviews name ID not in Wikidata, what do we add as an ID for pages where 2 or more such IDs are possible and they've already been added to the individual pages? For instance, for Glen and Les Charles, the individual Wikidata pages already have their correct IDs added, and the Category:The Interviews name ID not in Wikidata page does not mention the individual pages.

So what is supposed to be the correct ID in this case? Is it going to be the relevant ID of Glen Charles (glen-charles) and Les Charles (les-charles)?

You tell me! :-) You may want to include both on Wikidata, or get your code to skip that article

Well, I think that the Wikidata page Glen and Les Charles is supposed to have the IDs of both Glen Charles and Les Charles. Anyhow, there is no ID like glen-and-les-charles (as per the naming convention of this ID type). So I guess the Glen and Les Charles Wikidata page will have 2 values of this ID: one for Les Charles and one for Glen Charles.

Also, the individual IDs are already present on their respective Wikidata pages. So, I believe that since the Glen and Les Charles Wikidata page comprises both of them (Glen and Les Charles), adding the individual IDs to this page should work.

Hope this is okay!
Thanks

Apr 28 2021, 12:06 PM · Outreachy (Round 22)
Tru2198 added a comment to T279290: Synchronising Wikidata and Wikipedias using pywikibot - Task 6.

You changed your subcategory. So I assume you got the idea, right?

@Tru2198, I guess yes. It would be better if you could answer my earlier question:
So, the final task is to include P5773 and https://interviews.televisionacademy.com/interviews/glen-charles in the identifiers list of the https://en.wikipedia.org/wiki/Glen_and_Les_Charles Wikidata page.

Apr 28 2021, 7:19 AM · Outreachy (Round 22)
Tru2198 added a comment to T279290: Synchronising Wikidata and Wikipedias using pywikibot - Task 6.
Apr 28 2021, 6:48 AM · Outreachy (Round 22)
Tru2198 added a comment to T279290: Synchronising Wikidata and Wikipedias using pywikibot - Task 6.

hey everyone,
What do we mean by ID value here?

Thanks,
Pushpanjali Kumari

Apr 28 2021, 4:24 AM · Outreachy (Round 22)

Apr 16 2021

Tru2198 added a comment to T278863: Synchronising Wikidata and Wikipedias using pywikibot - Task 2.

That's weird - you shouldn't need to access test.wikipedia.org... Perhaps open that as a separate ticket, and I can tag the relevant people to see if they can find the problem?

Apr 16 2021, 8:33 AM · Outreachy (Round 22)
Tru2198 added a comment to T278863: Synchronising Wikidata and Wikipedias using pywikibot - Task 2.

No. It does not. I restarted my entire terminal. Now I am planning to reinstall user_config.py altogether.

Thank you

Oh this would probably work. I reinstalled everything when I had a similar error, at the start. I was just considering that there should be a less crude way to do it.
In any case, keep us updated.

Thank you very much for the time and patience. Yes, I will definitely keep you in the loop

Apr 16 2021, 8:28 AM · Outreachy (Round 22)

Apr 15 2021

Tru2198 added a comment to T278863: Synchronising Wikidata and Wikipedias using pywikibot - Task 2.

No. It does not. I restarted my entire terminal. Now I am planning to reinstall user_config.py altogether.

Thank you

Oh this would probably work. I reinstalled everything when I had a similar error, at the start. I was just considering that there should be a less crude way to do it.
In any case, keep us updated.

Apr 15 2021, 5:28 PM · Outreachy (Round 22)
Tru2198 added a comment to T278863: Synchronising Wikidata and Wikipedias using pywikibot - Task 2.

usernames['wikipedia']['en'] = 'Tru2198'

Try changing that to u'Tru2198' - but in general it looks OK otherwise... Make sure you're entering your password correctly?

Apr 15 2021, 4:20 PM · Outreachy (Round 22)
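For reference, the suggestion above concerns the `usernames` entries in Pywikibot's user-config.py. A minimal sketch of the relevant fragment, using this thread's example username (the `u''` prefix is optional on Python 3, which is why both spellings appear in the thread):

```python
# Hypothetical minimal user-config.py fragment for Pywikibot.
# Replace 'Tru2198' with your own wiki username.
family = 'wikipedia'   # default site family
mylang = 'en'          # default language code

usernames = {}
usernames['wikipedia'] = {'en': 'Tru2198'}
usernames['wikidata'] = {'wikidata': u'Tru2198'}  # u'' is a no-op on Python 3
```

Pywikibot reads this file at startup, so changes only take effect on the next run.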
Tru2198 added a comment to T278863: Synchronising Wikidata and Wikipedias using pywikibot - Task 2.

@Mike_Peel, @MSGJ,

I am getting the error :
pywikibot.exceptions.NoUsername: Username "Tru2198" does not have read permissions on wikipedia:en.
The supplied credentials could not be authenticated.

CRITICAL: Exiting due to uncaught exception <class 'pywikibot.exceptions.NoUsername'>

Here is the image:

[Screenshot attachment: pywikibot error.PNG (712×1 px, 84 KB)]

I have

usernames['wikipedia']['en'] = 'Tru2198'
usernames['wikidata']['wikidata'] = u'Tru2198'

set in user_config.py.

Also, I get:
Failed OAuth authentication for wikipedia:test: The authorization headers in your request are for a user that does not exist here

when I try to add a date property on Wikidata.

Kindly help!

If your browser has been inactive for a while, try logging out of Wikipedia, refreshing the page, and logging in again. You could also restart your terminal.

Does it work?
Does the error change?

Apr 15 2021, 4:16 PM · Outreachy (Round 22)
Tru2198 added a comment to T278863: Synchronising Wikidata and Wikipedias using pywikibot - Task 2.

I am getting the error :
pywikibot.exceptions.NoUsername: Username "Tru2198" does not have read permissions on wikipedia:en.
The supplied credentials could not be authenticated.

CRITICAL: Exiting due to uncaught exception <class 'pywikibot.exceptions.NoUsername'>

Apr 15 2021, 2:16 PM · Outreachy (Round 22)

Apr 11 2021

Tru2198 added a comment to T279290: Synchronising Wikidata and Wikipedias using pywikibot - Task 6.

@Mike_Peel, I am picking the topic Category: JudoInside_template_with_ID_not_in_Wikidata. I hope that won't be an issue!

Apr 11 2021, 5:37 AM · Outreachy (Round 22)

Apr 10 2021

Tru2198 added a comment to T279289: Synchronising Wikidata and Wikipedias using pywikibot - Task 5.

@Mike_Peel, For Bonus: Explore how to identify the correct item when multiple terms are returned, can we approach the problem with the idea that for every QID returned through the searching title, we refine the correct one by parsing each item for a specific property? For instance, if my title is "Harry Potter". Now that returns Harry Potter movie, book, and character. But only the book will have the property "Language of the work", or "author", or "publication date"?

Apr 10 2021, 12:41 PM · Outreachy (Round 22)
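The property-based filtering idea in the comment above can be sketched as follows. This is not the task's actual solution; the function name is invented and the QIDs are placeholders, with each candidate reduced to the set of property IDs present on it:

```python
# Hypothetical sketch: given candidate items returned by a title search
# (QID -> set of property IDs present on the item), keep only those that
# carry a property typical of the wanted type, e.g. P50 (author) or
# P577 (publication date) for a written work.
def filter_candidates(candidates, required_props):
    return [qid for qid, props in candidates.items()
            if any(p in props for p in required_props)]

# Placeholder QIDs standing in for the "Harry Potter" search results.
candidates = {
    "Q_book": {"P31", "P50", "P577"},  # has author and publication date
    "Q_film": {"P31", "P57"},          # has director instead
    "Q_char": {"P31"},                 # fictional character: neither
}
matches = filter_candidates(candidates, {"P50", "P577"})  # -> ["Q_book"]
```

In practice the property sets would come from each item's claims, so one extra lookup per candidate is needed before filtering.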

Apr 9 2021

Tru2198 added a comment to T278997: Synchronising Wikidata and Wikipedias using pywikibot - Task 3.

“Print out the information alongside the property name (e.g., "P31 = human"). “
Isn't this only possible in Wikidata, since Wikidata stores the information in the form of properties and QIDs?

So, how should the statements be printed on parsing Wikipedia? I am confused here.

Also, do review my task_2:
https://www.wikidata.org/w/index.php?title=User:Tru2198/Outreachy_2

Apr 9 2021, 5:03 PM · Outreachy (Round 22)
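For illustration, the "P31 = human" output format quoted above can be produced once property and item labels are resolved. The dictionaries below are toy data standing in for what pywikibot's claims and label lookups would return:

```python
# Toy stand-ins: claims maps property ID -> target QID, labels maps any
# ID -> its English label (both would come from Wikidata in practice).
claims = {"P31": "Q5", "P106": "Q82955"}
labels = {"P31": "instance of", "Q5": "human",
          "P106": "occupation", "Q82955": "politician"}

# Print each statement alongside the property name, e.g. "P31 = human".
lines = [f"{pid} = {labels[qid]}" for pid, qid in claims.items()]
for line in lines:
    print(line)
```

The same format works whether the values were parsed from a Wikipedia infobox or read from Wikidata; only the lookup step differs.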

Apr 8 2021

Tru2198 added a comment to T278997: Synchronising Wikidata and Wikipedias using pywikibot - Task 3.

“Print out the information alongside the property name (e.g., "P31 = human"). “
Isn't this only in wikidata, as wikidata stores the information in the form of properties and QIDs?

Apr 8 2021, 5:38 PM · Outreachy (Round 22)
Tru2198 added a comment to T278997: Synchronising Wikidata and Wikipedias using pywikibot - Task 3.

@Mike_Peel So we have to print from both Wikipedia and wikidata? Maybe I mistakenly did only the bonus part! Thank you

Apr 8 2021, 4:01 PM · Outreachy (Round 22)
Tru2198 added a comment to T278997: Synchronising Wikidata and Wikipedias using pywikibot - Task 3.

Hi @MSGJ, I have completed task_3, which awaits your feedback!
Also, I think this tutorial is ideal for task_4, which requires adding information to Wikidata.

Apr 8 2021, 10:28 AM · Outreachy (Round 22)
Tru2198 added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

In case you might need some inspiration for the last microtask, I found these to be useful:

Apr 8 2021, 3:12 AM · Outreachy (Round 22)

Apr 7 2021

Tru2198 added a comment to T278997: Synchronising Wikidata and Wikipedias using pywikibot - Task 3.

Hi @Tru2198, I didn't implement it for such cases yet, except Image.

I believe finding the href HTML tag and then asking for the title of the page at that link will resolve this issue.

And when the program can't find an href, it must be plain text, which should be extracted from the relevant <div> tags.

If there is any better approach, please share. Because

Apr 7 2021, 8:44 AM · Outreachy (Round 22)
Tru2198 updated subscribers of T278997: Synchronising Wikidata and Wikipedias using pywikibot - Task 3.

Hello @Mike_Peel, @MSGJ
I have completed Task_3. Though, I haven't used the parsing through the regex.

Apr 7 2021, 8:42 AM · Outreachy (Round 22)
Tru2198 added a comment to T278997: Synchronising Wikidata and Wikipedias using pywikibot - Task 3.

@Shristi0gupta, wonderful program! I wanted to know how you dealt with the parameter values that didn't have a label? And also with some values without Q-numbers, like dates?

Apr 7 2021, 3:39 AM · Outreachy (Round 22)

Apr 6 2021

Tru2198 added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Now, as per the first clickstream data dump, it's in English. But in langlink API, it's not in English.

@Tru2198 make sure that you're viewing all the results. This can be done via the continuation parameter in mwapi (documentation) or just increasing the lllimit parameter via your API call to return more results at once. Hope that helps.

Apr 6 2021, 2:59 PM · Outreachy (Round 22)
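The continuation advice above can be sketched as follows. `fetch()` here is a hypothetical stand-in for a real HTTP call to the MediaWiki API (via mwapi or requests); the point is the loop that carries the `continue` parameters until all langlinks have been returned:

```python
# Sketch of MediaWiki API continuation handling for a langlinks query.
# fetch(params) -> parsed JSON response; hypothetical injection point so
# the loop itself can be shown (and tested) without network access.
def get_all_langlinks(fetch, title):
    params = {"action": "query", "prop": "langlinks",
              "titles": title, "lllimit": "max", "format": "json"}
    links = []
    while True:
        resp = fetch(dict(params))
        for page in resp["query"]["pages"].values():
            links.extend(page.get("langlinks", []))
        cont = resp.get("continue")
        if not cont:
            return links
        params.update(cont)  # carry llcontinue etc. into the next request
```

With mwapi, `session.get(..., continuation=True)` does this loop for you; raising `lllimit` just reduces how many round trips are needed.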
Tru2198 added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

I have a few points of confusion about the final task, Compare Reader Behavior across Languages. Finding an article that matches in both datasets and then comparing their respective source and destination values across languages is not ideal and is time-consuming, as I have to resort to a trial-and-error approach. I would like to know how others have moved forward with the task?
Thanks!

Hi @Tru2198, taking into account the memory restriction in PAWS, I iterated through the files without downloading them into memory and only extracted the necessary information into less memory-consuming data structures (as @Isaac suggested before); in my case I used dictionaries and lists. That is, my initial exploration doesn't use Pandas; only later, when I've selected a smaller dataset, do I start using this library to take advantage of its methods.

Apr 6 2021, 2:26 PM · Outreachy (Round 22)
Tru2198 added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

I have a few points of confusion about the final task, Compare Reader Behavior across Languages. Finding an article that matches in both datasets and then comparing their respective source and destination values across languages is not ideal and is time-consuming, as I have to resort to a trial-and-error approach. I would like to know how others have moved forward with the task?
Thanks!

Can you provide more details? @Tru2198?

  1. Does your memory run out after loading the datasets (while analyzing them) or during?
  2. How much memory are you consuming prior to this session?
  3. Are you operating on the whole datasets or a subsection of them?
Apr 6 2021, 2:24 PM · Outreachy (Round 22)
Tru2198 added a comment to T278860: Synchronising Wikidata and Wikipedias using pywikibot - Task 1.

Hello, for task 1, when we are asked to add a statement, am I adding the statement directly on the article itself?

Apr 6 2021, 1:06 PM · Outreachy (Round 22)
Tru2198 added a comment to T278997: Synchronising Wikidata and Wikipedias using pywikibot - Task 3.
Apr 6 2021, 12:33 PM · Outreachy (Round 22)
Tru2198 added a comment to T278860: Synchronising Wikidata and Wikipedias using pywikibot - Task 1.

Although my task_1 got approved yesterday, I have added a few extra properties and an article that might deviate a bit.

Apr 6 2021, 11:19 AM · Outreachy (Round 22)
Tru2198 updated subscribers of T278863: Synchronising Wikidata and Wikipedias using pywikibot - Task 2.

@Mike_Peel, @MSGJ, I have completed task 2 and have mailed you.
Here is the link as well:
https://www.wikidata.org/wiki/User:Tru2198/Outreachy_2
Await your feedback,
Thank you!

Apr 6 2021, 9:04 AM · Outreachy (Round 22)

Apr 5 2021

Tru2198 added a comment to T276329: Synchronising Wikidata and Wikipedias using pywikibot.

@Mike_Peel Thank you for the heads up! Whatever happens with the internship, I am really enjoying the learning in this contribution!

Apr 5 2021, 3:30 PM · Outreachy (Round 22), Outreach-Programs-Projects
Tru2198 added a comment to T278860: Synchronising Wikidata and Wikipedias using pywikibot - Task 1.

Hello @Mike_Peel!
I have tried my best with Task_1, and look forward to your feedback. Kindly review it at your time.
Here is the link:
https://www.wikidata.org/wiki/User:Tru2198/Outreachy_1

Apr 5 2021, 11:34 AM · Outreachy (Round 22)
Tru2198 added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

I have a few points of confusion about the final task, Compare Reader Behavior across Languages. Finding an article that matches in both datasets and then comparing their respective source and destination values across languages is not ideal and is time-consuming, as I have to resort to a trial-and-error approach. I would like to know how others have moved forward with the task?
Thanks!

Apr 5 2021, 4:54 AM · Outreachy (Round 22)
Tru2198 added a comment to T276329: Synchronising Wikidata and Wikipedias using pywikibot.
Apr 5 2021, 4:39 AM · Outreachy (Round 22), Outreach-Programs-Projects

Apr 2 2021

Tru2198 added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

I have completed my first notebook's microtask with a basic implementation of the required functions and analysis. However, there are many updates and enhancements, mentioned in the notebook, that I will act on in the coming days. Should I record my first contribution and await the review on the Outreachy site?

Apr 2 2021, 4:14 AM · Outreachy (Round 22)

Apr 1 2021

Tru2198 added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.
Apr 1 2021, 11:50 AM · Outreachy (Round 22)
Tru2198 added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

For the task Compare Reader Behavior across Languages, will a comparison between two languages suffice? Other than English, most articles are available in only one or two languages that intersect with both the available clickstream data and the langlinks API results.

Apr 1 2021, 8:04 AM · Outreachy (Round 22)

Mar 31 2021

Tru2198 added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Hello, for the microtask, I am trying to convert the file to CSV with pandas and subsequently into data frames, but I am getting an error where several rows have conflicting column counts. So the parameter error_bad_lines=False could be used to ignore the troubling lines. Is that option viable?

@Tru2198 @rachita_saha, good catch regarding these errors when importing with pandas. In this case, if the number of affected lines is small, it is OK to remove them using "error_bad_lines=False".
However, more generally, it is worth checking what is happening with these lines. When pandas imports the rows line by line, it splits each line at the "\t" to get the entries for the different columns. Now, the error suggests that sometimes the number of entries from splitting is not 4 (like most of the rows) but 5 (or more). This happens because of some characters contained in the page titles of the source and/or the target page (such as quotes). Pandas provides different options to deal with this. I have found that using the option "quoting=3" when doing read_csv() can solve the problem (the default is quoting=0). See the pandas documentation if you are interested in digging deeper.

@MGerlach That's very insightful. I'll look up this option. Thank you.

Mar 31 2021, 3:42 PM · Outreachy (Round 22)
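The quoting=3 suggestion above can be tried on a toy sample. quoting=3 is csv.QUOTE_NONE, which makes pandas treat quote characters inside page titles literally instead of as field delimiters (the rows below are invented, in the clickstream source/target/type/count shape):

```python
import csv
import io
import pandas as pd

# Toy clickstream rows; the quote characters in the first title are exactly
# the kind of input that confuses the default quoting behaviour.
tsv = ('other-search\t"Weird_Al"_Yankovic\texternal\t100\n'
       'Main_Page\tPython_(programming_language)\tlink\t50\n')

df = pd.read_csv(io.StringIO(tsv), sep='\t',
                 quoting=csv.QUOTE_NONE,  # same as quoting=3
                 names=['source', 'target', 'type', 'n'])
```

Unlike error_bad_lines=False, this keeps every row, so it is the better fix when the "bad" lines are really just titles containing quotes.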
Tru2198 added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@MGerlach @Isaac Several of the popular destination articles do not link to any further articles, so there is no data on common pathways from those articles. Should we find an article that links to more articles, or should we only provide the visualizations for the common pathways to the chosen article?

Mar 31 2021, 3:25 PM · Outreachy (Round 22)
Tru2198 added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Can anybody clarify for me what exactly pageview means? Isn't it the sum of the number of occurrences (the 4th column)? @MGerlach @Tru2198

Mar 31 2021, 2:26 PM · Outreachy (Round 22)
Tru2198 added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

@MGerlach. Thanks for the suggestion. I'll work on it.

Mar 31 2021, 11:35 AM · Outreachy (Round 22)
Tru2198 added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

data_frame.groupby('Destination').nunique()>=20 returns mostly False values in the Source, Destination and Link columns. Also, I am stuck combining the queries for the TODO task: Choose a destination article from the dataset that is:
"relatively popular (at least 1000 pageviews (where I am using: data_frame.groupby('Destination').sum()>=1000) and 20 unique sources in the dataset"

Mar 31 2021, 10:32 AM · Outreachy (Round 22)
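One way to combine the two conditions above in a single groupby is to aggregate both statistics first and then filter with a boolean mask (toy data below; the Source/Destination/n column names are assumed from the clickstream schema):

```python
import pandas as pd

# Toy frame: "popular" gets 25 sources x 50 views, "rare" gets 2 x 10.
df = pd.DataFrame({
    "Source": [f"s{i}" for i in range(25)] + ["a", "b"],
    "Destination": ["popular"] * 25 + ["rare", "rare"],
    "n": [50] * 25 + [10, 10],
})

# Aggregate total views and unique-source count per destination, then
# apply both thresholds at once ("&" combines the two boolean masks).
stats = df.groupby("Destination").agg(
    views=("n", "sum"), sources=("Source", "nunique"))
chosen = stats[(stats["views"] >= 1000) & (stats["sources"] >= 20)]
```

This avoids the `>= 20` comparison on nunique() of every column; only the two aggregates that the task actually asks about are computed and compared.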
Tru2198 added a comment to T276315: Outreachy Application Task: Tutorial for Wikipedia Clickstream data.

Hello, for the microtask, I am trying to convert the file to CSV with pandas and subsequently into data frames, but I am getting an error where several rows have conflicting column counts. So the parameter error_bad_lines=False could be used to ignore the troubling lines. Is that option viable?

Mar 31 2021, 6:11 AM · Outreachy (Round 22)