Apr 14 2022
@Appledora Can you please share some links on how I can make my own dataset from this HTML dump?
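A minimal sketch of one way to build a small working dataset from the dump. It assumes each dump file is newline-delimited JSON (one article per line) with a "name" field and an "article_body" object holding the page HTML under "html". That layout, and the helper names iter_articles and build_dataset, are assumptions for illustration, not a confirmed description of the dump.

```python
import io
import json
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def iter_articles(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield {"title", "html"} for each non-empty NDJSON line.

    Assumption (hypothetical layout): every line is one JSON record with a
    "name" field and an "article_body" object holding the HTML under "html".
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield {
            "title": record.get("name", ""),
            "html": record.get("article_body", {}).get("html", ""),
        }


def build_dataset(lines: Iterable[str], n: int = 1000) -> List[Dict[str, str]]:
    """Keep only the first n articles; islice stops reading early."""
    return list(islice(iter_articles(lines), n))


if __name__ == "__main__":
    # Tiny in-memory stand-in for one .ndjson file from the dump.
    sample = io.StringIO(
        '{"name": "Page A", "article_body": {"html": "<p>A</p>"}}\n'
        "\n"
        '{"name": "Page B", "article_body": {"html": "<p>B</p>"}}\n'
    )
    for article in build_dataset(sample, n=1000):
        print(article["title"], len(article["html"]))
```

With a real file, passing an open text file works the same way, e.g. `with open(path, encoding="utf-8") as f: dataset = build_dataset(f)`. If the files come packed in a .tar.gz archive, Python's tarfile module can open the members, wrapped in io.TextIOWrapper to read them as text.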
Apr 11 2022
@Appledora Did you find any dataset that contains the content of 1000 articles to work with for the analysis?
Apr 10 2022
@Appledora Did you find the 1000 articles in one place or in an existing database, or did you collect them individually?
@Appledora Actually, I did get you. But my point is that article_body contains the HTML of a single web page, and the TODO is to extract the text from it. So there is no need to go inside the tags to get more information or text that is not even asked for.
Please correct me if I am wrong. @Isaac @MGerlach
Do we need to go inside the tags and links for more information?
Apr 9 2022
Thanks, @Appledora. But why do we need to go inside the tags when we already have the HTML content (above, in article_body) to extract from?
"# TODO: write a function for extracting the article text"
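One possible sketch for that TODO, using only the standard library's html.parser: it keeps the text inside <p> elements (an approximation of the article prose) and skips <style>, <script>, and <sup> content, assuming <sup> mostly holds citation markers like [1]. The class and function names are hypothetical, and dropping all <sup> text will also drop legitimate superscripts.

```python
from html.parser import HTMLParser


class ParagraphTextExtractor(HTMLParser):
    """Collect visible text from <p> elements, skipping <style>/<script>/<sup>."""

    SKIP_TAGS = {"style", "script", "sup"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; etc. arrive decoded
        self.paragraphs = []
        self._in_p = 0      # nesting depth of <p>
        self._skip = 0      # nesting depth of skipped tags
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p += 1
        elif tag in self.SKIP_TAGS:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p -= 1
            if self._in_p == 0:
                # Collapse runs of whitespace into single spaces.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.paragraphs.append(text)
                self._buffer = []
        elif tag in self.SKIP_TAGS and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_p and not self._skip:
            self._buffer.append(data)


def extract_article_text(html: str) -> str:
    """Return the paragraph text of an article, one paragraph per line."""
    parser = ParagraphTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)
```

Headings, infoboxes, and tables are left out by design here. Whether they count as "article text" is a choice for the task, and adding tag names to the checks in handle_starttag would include them.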
Mar 30 2022
Mar 29 2022
Do we have to fetch this URL, url = "https://en.wikipedia.org/wiki/Chang_Gum-chol", find the categories the page contains, then get the page's wikitext, count the categories again, and compare the two counts? Correct me if I am wrong.
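If that reading of the task is right, the comparison might be sketched as below. This assumes Parsoid-style HTML, where categories appear as <link rel="mw:PageProp/Category" href="./Category:Name"> tags. The rendered /wiki/ page lists categories differently (as ordinary links in a category box), so this parser would need adapting for that. The helper names are hypothetical. One likely reason the counts differ is that templates can add categories that never appear literally in the wikitext.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote

# Wikitext category links look like [[Category:Name]] or [[Category:Name|sort key]].
# A leading colon ([[:Category:Name]]) is a plain link, not a category, and won't match.
WIKITEXT_CATEGORY = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+)", re.IGNORECASE)


def _normalize(name: str) -> str:
    """Treat underscores as spaces and collapse whitespace so both sources compare equal."""
    return " ".join(name.replace("_", " ").split())


def categories_from_wikitext(wikitext: str) -> set:
    return {_normalize(m) for m in WIKITEXT_CATEGORY.findall(wikitext)}


class _CategoryLinkParser(HTMLParser):
    """Collect categories from Parsoid-style <link rel="mw:PageProp/Category"> tags (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.categories = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = attrs.get("rel") or ""
        href = attrs.get("href") or ""
        if tag == "link" and "mw:PageProp/Category" in rel.split() and "Category:" in href:
            # Drop the "./Category:" prefix and any "#sortkey" suffix, then percent-decode.
            name = unquote(href.split("Category:", 1)[1].split("#", 1)[0])
            self.categories.add(_normalize(name))


def categories_from_html(html: str) -> set:
    parser = _CategoryLinkParser()
    parser.feed(html)
    parser.close()
    return parser.categories
```

Comparing sets rather than bare counts also shows *which* categories exist only in the HTML, e.g. `categories_from_html(html) - categories_from_wikitext(wikitext)`.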
Mar 28 2022
Hi @Isaac and @MGerlach, I am new to open source. I know Python but have no experience with data analysis. I am stuck on the code and need guidance on how to learn and start contributing to this project. Could we have a Google Meet call?