User Details
- User Since
- Oct 22 2019, 11:40 AM (235 w, 3 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- OlehOnyshchcak
Feb 7 2020
Dec 4 2019
I just want to give my perspective on an issue in order for it to be better triaged:
- pywikibot claims to support both Windows and Linux, so its functionality must stick to the common denominator of what is acceptable on all supported systems. If pywikibot no longer supports Windows, we should probably remove the Windows installation instructions to avoid confusion.
- Even if we no longer support Windows, it is still beneficial to restrict filenames to the POSIX portable filename character set, which guarantees that a filename is portable across all POSIX-like systems. For example, Kaggle, arguably one of the most popular platforms for data storage and processing, relies on these restrictions: just a few days ago it was explained to me that Kaggle does not support filenames outside the POSIX portable filename character set. There are probably many other use cases where such filenames are not portable.
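To make the restriction concrete, here is a minimal sketch of how a filename could be mapped onto the POSIX portable filename character set (A-Z, a-z, 0-9, period, underscore, hyphen). The helper names `to_portable_filename` and `is_portable_filename` are hypothetical and are not part of pywikibot; the choice of "_" as the replacement character is also just an assumption.

```python
import re
import unicodedata

# Anything outside the POSIX portable filename character set: A-Z a-z 0-9 . _ -
_NON_PORTABLE = re.compile(r"[^A-Za-z0-9._-]")


def to_portable_filename(name: str, replacement: str = "_") -> str:
    """Hypothetical helper: map a filename onto the portable character set."""
    # Decompose accented letters (e.g. "é" becomes "e" plus a combining mark),
    # then drop every character that has no ASCII form.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace the remaining non-portable characters (spaces, "?", ":", ...).
    portable = _NON_PORTABLE.sub(replacement, ascii_name)
    # POSIX also says a portable filename should not start with a hyphen.
    if portable.startswith("-"):
        portable = replacement + portable[1:]
    # Never return an empty name.
    return portable or replacement


def is_portable_filename(name: str) -> bool:
    """Hypothetical helper: check a name against the portable character set."""
    return (
        bool(name)
        and not name.startswith("-")
        and _NON_PORTABLE.search(name) is None
    )
```

Note that this mapping is lossy: characters with no ASCII decomposition (for example Cyrillic letters) are dropped, so two different titles can end up with the same filename. A real implementation would probably need a collision-handling step, such as appending a short hash of the original name.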
Dec 3 2019
- Kaggle Notebook for processing dataset - https://www.kaggle.com/jacksoncrow/dataset-preprocessing
- Kaggle Notebook for training W2VV model - https://www.kaggle.com/jacksoncrow/w2vvtraining
- GitHub repo - https://github.com/OlehOnyshchak/WikiImageRecommendation
Full dataset - https://www.kaggle.com/jacksoncrow/extended-wikipedia-multimodal-dataset
Full dataset with raw images - https://drive.google.com/file/d/1l0Oyv2Y6LmPGN3lP9MB6i8WWCinqkYPk/view?usp=sharing
Nov 24 2019
Filed a bug, T238992, about functionality missing from pywikibot that I ran into while collecting additional data. For now, I have applied an inefficient workaround.
Nov 23 2019
Oct 29 2019
Oct 27 2019
Full dataset - https://drive.google.com/open?id=18i0D-N1J18UC1ebT9qbHZegKJQiKba5z
Will post a link to the full dataset on Kaggle.
Filed another defect, T236614, found while fine-tuning image download.
Oct 25 2019
@Xqt, thank you for the resolution! Since the PyPI version still has this bug, I will switch directly to the master branch.
Oct 24 2019
Created a truncated dataset of 500 articles (1.3 GB) - https://www.kaggle.com/jacksoncrow/wiki-articles-multimodal
The full dataset of 5638 articles (14.2 GB) is still uploading; I will follow up with a link.
I see, thanks. To simplify things, I reproduced the bug in a clean online environment with the latest pywikibot - https://jupyter.org/try
This means you can easily reproduce it as well. The installation steps and a demonstration of the bug are in the attached file:
@Dvorapa, thank you for such a fast response!
@Aklapper, thank you for the clarification! Will report it properly
Sent an email describing a pywikibot bug I found: for some articles it crashes and cannot download some images.