User Details
- User Since: Mar 26 2022, 7:43 PM (107 w, 6 d)
- Availability: Available
- LDAP User: Unknown
- MediaWiki User: Radhika-outreachy
Apr 14 2022
@Appledora Can you please share some links on how I can make my own dataset from this HTML dump?
Apr 11 2022
@Appledora Did you find any dataset that contains the content of 1000 articles to use for the analysis?
Apr 10 2022
@Appledora Did you find the 1000 articles in one place, such as an existing database, or did you collect them individually?
@Appledora Actually, I did understand you. I am only saying that article_body holds the HTML of a single web page, and the TODO is to extract the text from it. So there is no need to go inside the tags to get more information or text that is not even asked for.
Please correct me if I am wrong. @Isaac @MGerlach
Do we need to go inside the tags and links for more information?
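For reference, a minimal sketch of what extracting the text from article_body could look like, using only Python's standard html.parser module. The names TextExtractor and extract_article_text are hypothetical and not taken from the task notebook, and the sketch assumes article_body is a string holding the HTML of one page.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_article_text(article_body):
    """Return the text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(article_body)
    parser.close()
    return parser.text()
```

The parser visits every text node on its own, so the text comes out without any explicit handling of individual tags or links. Joining chunks with a single space is a simplification and can leave a stray space before punctuation that follows a tag.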
Apr 9 2022
Thanks, @Appledora. But why do we need to go inside the tags when we already have the content to extract in the HTML (in article_body above)?
"# TODO: write a function for extracting the article text
Mar 30 2022
Thanks, @Appledora and @Isaac, for clearing things up.
Mar 29 2022
In #TODO1, if I am not wrong, do we have to fetch this URL: url= "https://en.wikipedia.org/wiki/Chang_Gum-chol"?
Then find the categories that page contains, get the wikitext of the same page, find its total categories again, and compare the two counts?
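A rough sketch of the comparison described above, assuming the HTML and the wikitext of the page have already been obtained as strings (no network call is made here). The function names are hypothetical, and treating any `a` or `link` element whose href contains `Category:` as a category is a heuristic assumption, not the method from the task notebook.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote


class CategoryLinkCollector(HTMLParser):
    """Collects category names from <a> and <link> hrefs containing 'Category:'."""

    def __init__(self):
        super().__init__()
        self.categories = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        href = dict(attrs).get("href") or ""
        match = re.search(r"Category:([^#?]+)", href)
        if match:
            # Normalise URL encoding and underscores so names are comparable.
            self.categories.add(unquote(match.group(1)).replace("_", " "))


def categories_in_html(html):
    """Heuristic: also counts ordinary links that point at category pages."""
    collector = CategoryLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.categories


# Matches [[Category:Name]] and [[Category:Name|sort key]], not [[:Category:Name]].
WIKITEXT_CATEGORY = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+)", re.IGNORECASE)


def categories_in_wikitext(wikitext):
    """Return category names written directly in the wikitext."""
    return {m.strip().replace("_", " ") for m in WIKITEXT_CATEGORY.findall(wikitext)}
```

Comparing `len(categories_in_html(html))` with `len(categories_in_wikitext(wikitext))` gives the two counts. They may differ, since categories added by templates appear in the rendered HTML but are not written out in the wikitext.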