FatimaArshad-DS
User

Projects

User does not belong to any projects.

User Details

User Since
Mar 29 2022, 7:15 PM (108 w, 3 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
FatimaArshad-DS

Recent Activity

Apr 17 2022

FatimaArshad-DS added a comment to T302242: Outreachy Application Task (Round 24): Build Python library to work with html-dumps.

Text coming from Wikitext is in pretty format. Was anyone able to pretty print HTML text?

Apr 17 2022, 3:46 PM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)

Apr 14 2022

FatimaArshad-DS added a comment to T302242: Outreachy Application Task (Round 24): Build Python library to work with html-dumps.

I still don't understand the concept of templates. What are they?

Apr 14 2022, 4:40 PM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)

Apr 12 2022

FatimaArshad-DS added a comment to T302242: Outreachy Application Task (Round 24): Build Python library to work with html-dumps.

Does it happen to anyone else... PAWS stops saving notebook after a while?

Apr 12 2022, 9:07 PM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)

Apr 10 2022

FatimaArshad-DS added a comment to T302242: Outreachy Application Task (Round 24): Build Python library to work with html-dumps.

@Appledora There is no need to go inside tags manually. You can extract all the visible text very easily using BS4 :)

Apr 10 2022, 7:33 PM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)
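
The suggestion in the Apr 10 comment, pulling out all visible text instead of walking tags by hand, can be sketched as follows. With BS4 this is roughly `BeautifulSoup(html, "html.parser").get_text()`. The version below swaps in Python's standard-library `html.parser` so that it runs without extra installs. The names `VisibleTextParser` and `visible_text` are hypothetical and chosen only for illustration; they are not part of the task notebook.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    HIDDEN_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0  # > 0 while inside a hidden element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside hidden elements.
        if self._hidden_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        # Chunks are joined with single spaces, so spacing around inline
        # tags is approximate.
        return " ".join(self._chunks)


def visible_text(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = "<p>Hello <b>world</b></p><style>p { color: red; }</style>"
print(visible_text(sample))
```

A generic extractor like this keeps everything a browser would render, including infobox cells and reference markers, so for Wikipedia articles it is probably only a starting point.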

Apr 9 2022

FatimaArshad-DS added a comment to T302242: Outreachy Application Task (Round 24): Build Python library to work with html-dumps.

"# TODO: write a function for extracting the article text
# It doesn't have to look the same as the output of wt.strip_code() above (in fact, it likely won't)
# but it should be very similar in that you should aim for something
# that captures the text of the article without a lot of markup etc.
# NOTE: straightforward HTML -> text functions likely won't perform well here and you'll probably
# want to write something more custom to handle the specifics of Wikipedia articles"

Apr 9 2022, 3:56 PM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)
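
One possible reading of the quoted TODO is a paragraph-level extractor that drops markup-heavy regions before collecting text. The sketch below is an assumption, not the task's reference solution. The skip rules (`table`, `style`, `script`, and elements whose class is `reference` or `mw-editsection`) are guesses about what counts as "a lot of markup", and the names `ArticleTextParser` and `extract_article_text` are hypothetical. It uses only the standard-library `html.parser`.

```python
from html.parser import HTMLParser

# Assumed skip rules; adjust them to the markup actually found in the dump.
SKIP_TAGS = {"style", "script", "table"}
SKIP_CLASSES = {"reference", "mw-editsection"}
# Void elements have no end tag, so they must never open a skipped region.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ArticleTextParser(HTMLParser):
    """Collect the text of <p> elements, ignoring skipped regions."""

    def __init__(self):
        super().__init__()
        self._skip_tag = None    # tag name that opened the skipped region
        self._skip_depth = 0     # nesting count of that same tag name
        self._in_paragraph = False
        self._current = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._skip_tag is not None:
            if tag == self._skip_tag:
                self._skip_depth += 1
            return
        classes = set((dict(attrs).get("class") or "").split())
        if tag in SKIP_TAGS or classes & SKIP_CLASSES:
            self._skip_tag, self._skip_depth = tag, 1
        elif tag == "p":
            self._in_paragraph = True
            self._current = []

    def handle_endtag(self, tag):
        if self._skip_tag is not None:
            if tag == self._skip_tag:
                self._skip_depth -= 1
                if self._skip_depth == 0:
                    self._skip_tag = None
            return
        if tag == "p" and self._in_paragraph:
            # Collapse runs of whitespace inside the paragraph.
            text = " ".join("".join(self._current).split())
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        if self._skip_tag is None and self._in_paragraph:
            self._current.append(data)


def extract_article_text(html):
    """Return one line of plain text per article paragraph."""
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


sample = (
    '<p>Example is a <b>sample</b> article.'
    '<sup class="reference"><a href="#cite-1">[1]</a></sup></p>'
    '<table><tr><td><p>Infobox cell</p></td></tr></table>'
    '<p>Second   paragraph.</p>'
)
print(extract_article_text(sample))
```

Counting only start and end tags with the same name as the element that opened the skipped region keeps the sketch tolerant of unclosed inner tags, at the cost of not handling every malformed input.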

Apr 8 2022

FatimaArshad-DS added a comment to T302242: Outreachy Application Task (Round 24): Build Python library to work with html-dumps.

My question is related to this page: https://en.wikipedia.org/wiki/Chang_Gum-chol

Apr 8 2022, 4:41 PM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)

FatimaArshad-DS added a comment to T302237: Outreachy Project (Round 24): Build Python library to work with html-dumps.

Hi Everyone,

Apr 8 2022, 8:41 AM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)