Radhika_Saini
User

Projects

User does not belong to any projects.

User Details

User Since
Mar 26 2022, 7:43 PM (107 w, 6 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Radhika-outreachy [ Global Accounts ]

Recent Activity

Apr 14 2022

Radhika_Saini added a comment to T302242: Outreachy Application Task (Round 24): Build Python library to work with html-dumps.

@Appledora Can you please share some links on how I can make my own dataset from this HTML dump?

Apr 14 2022, 7:06 PM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)

Apr 11 2022

Radhika_Saini added a comment to T302242: Outreachy Application Task (Round 24): Build Python library to work with html-dumps.

@Appledora Did you find any dataset that contains the content of 1000 articles to use for analysis?

Apr 11 2022, 8:14 AM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)

Apr 10 2022

Radhika_Saini added a comment to T302242: Outreachy Application Task (Round 24): Build Python library to work with html-dumps.

@Appledora Did you find the 1000 articles in one place, such as an existing database,
or did you look them up individually?

Apr 10 2022, 7:52 PM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)
Radhika_Saini added a comment to T302242: Outreachy Application Task (Round 24): Build Python library to work with html-dumps.

@Appledora Actually, I did understand you. I am only saying that article_body holds the HTML code of a single web page and the TODO is to extract the text from it, so there is no need to go inside the tags to get more information or text that is not even asked for.
Please correct me if I am wrong, @Isaac @MGerlach.
Do we need to go inside the tags and links for more information?

Apr 10 2022, 8:18 AM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)
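
The text-extraction step discussed in the comment above could be sketched roughly as follows. This is only an illustration using Python's standard-library `html.parser`, not the approach the project or its mentors prescribe. The names `TextExtractor` and `extract_text` are hypothetical, and `article_body` is assumed to be a string holding the HTML of a single article.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the text nodes as-is, then collapse runs of whitespace.
        return " ".join("".join(self._chunks).split())


def extract_text(article_body: str) -> str:
    """Return the plain text found in an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(article_body)
    parser.close()
    return parser.text()


sample = '<p>Hello <a href="./World">world</a>.</p><style>p {color: red}</style>'
print(extract_text(sample))
```

Text nodes are joined without a separator, so text from adjacent block elements can run together. Whether to descend into specific tags (links, references, templates) for extra information is a separate design decision that this sketch does not address.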

Apr 9 2022

Radhika_Saini added a comment to T302242: Outreachy Application Task (Round 24): Build Python library to work with html-dumps.

Thanks, @Appledora. But why do we need to go inside the tags when the content to extract is already in the HTML (in article_body above)?

Apr 9 2022, 8:03 PM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)
Radhika_Saini added a comment to T302242: Outreachy Application Task (Round 24): Build Python library to work with html-dumps.

"# TODO: write a function for extracting the article text"

Apr 9 2022, 5:59 PM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)

Mar 30 2022

Radhika_Saini added a comment to T302242: Outreachy Application Task (Round 24): Build Python library to work with html-dumps.

Thanks, @Appledora and @Isaac, for clearing things up.

Mar 30 2022, 9:00 AM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)

Mar 29 2022

Radhika_Saini added a comment to T302242: Outreachy Application Task (Round 24): Build Python library to work with html-dumps.

In #TODO1,
do we have to fetch this URL, url = "https://en.wikipedia.org/wiki/Chang_Gum-chol"? If I am not wrong,
we then find the categories the page at this URL contains, convert the page to wikitext, find the total categories again, and compare the counts?

Mar 29 2022, 5:41 PM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)

Mar 28 2022

Radhika_Saini added a comment to T302242: Outreachy Application Task (Round 24): Build Python library to work with html-dumps.

Hi @Isaac and @MGerlach, I am new to open source. I know Python but have no experience with data analysis. I am stuck on the code and need guidance on how I can learn and start contributing to this project. Can we have a Google Meet?

Mar 28 2022, 6:16 PM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)

Mar 26 2022

Radhika_Saini added a comment to T302237: Outreachy Project (Round 24): Build Python library to work with html-dumps.

Hey @Isaac and @MGerlach!

Mar 26 2022, 7:51 PM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)