Ask the community for a list of sources that:
- Are often referenced on sv.wp
- Produce bad references, or none at all, with the current tools
We can use https://github.com/mediawiki-utilities/python-mwrefs to extract all references from sv.wp, then extract the URLs to get a list of the most frequently used websites. A manual inspection should find good candidates pretty quickly.
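A minimal sketch of the URL-counting step, assuming the references have already been extracted to plain text with one `<ref>` per line. It does not call python-mwrefs itself. The regex `URL_RE`, the `www.` stripping and the names `domain_of` and `count_domains` are hypothetical choices for illustration, not part of any existing tool.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit

# Rough URL pattern (an assumption, not exhaustive): stop at whitespace and at
# characters that typically end a URL in wikitext (|, brackets, braces, <, >, quotes).
URL_RE = re.compile(r"https?://[^\s|\]\[}{<>\"']+", re.IGNORECASE)


def domain_of(url: str) -> str:
    """Return the lower-cased host name of a URL, without a leading 'www.'."""
    try:
        host = urlsplit(url).hostname or ""  # .hostname is already lower-cased
    except ValueError:  # malformed netloc; skip it
        return ""
    host = host.rstrip(".")  # drop a trailing dot, e.g. from "scb.se."
    return host[4:] if host.startswith("www.") else host


def count_domains(ref_lines: Iterable[str], min_hits: int = 1) -> List[Tuple[str, int]]:
    """Count how many references link to each domain, most common first.

    A domain is counted at most once per reference, so a ref with two links
    to the same site adds 1, not 2. Domains below min_hits are dropped.
    """
    counts: Counter = Counter()
    for line in ref_lines:
        domains = {domain_of(u) for u in URL_RE.findall(line)}
        domains.discard("")
        counts.update(domains)
    return [(d, n) for d, n in counts.most_common() if n >= min_hits]
```

Counting each domain once per reference measures how many citations rely on a site rather than how many links it has; passing e.g. `min_hits=10` keeps only domains with at least 10 hits.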
@Alicia_Fagerving_WMSE, if you extract the references for WMSE-Library-Data-2018, then maybe keep a copy of the unprocessed output for use here before you start filtering by citation template.
@Lokal_Profil I made T202541 to work with python-mwrefs, so I'll add your note there :)
The refs are here: https://drive.google.com/open?id=1WP8rP6gwDk74knkKQ2vjOLnAILEU5ZUT (3.3 GB file)
The file contains just <ref> tags with content. I also have the full output (with article titles, timestamps etc.) from python-mwrefs, if it's useful for anything, but I haven't uploaded it because it's 4.6 GB and would take hours. I have a horrible uplink here :/
@Sebastian_Berlin-WMSE In case you're still interested in the most commonly cited websites, I've done a quick analysis of the above-mentioned dump (T203450), and here you can find the domain names with at least 10 hits.
Thanks. I'll take a look.
I took the liberty of zipping the file, which reduced its size to 176 MB (~5% of the original), and uploaded that. Unless you have any plans for the original, it should be safe to delete it.
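Since the dump is now shared as a zip archive, it can be processed without unpacking the roughly 3.3 GB text file to disk. A minimal sketch using Python's standard `zipfile` module; the function name `iter_zipped_lines` is hypothetical, and it assumes the archive holds a UTF-8 text file. By default it reads the first member, since the name of the file inside the archive isn't given here.

```python
import io
import zipfile
from typing import BinaryIO, Iterator, Optional, Union


def iter_zipped_lines(
    source: Union[str, BinaryIO], member: Optional[str] = None
) -> Iterator[str]:
    """Yield decoded text lines from one member of a zip archive.

    `source` is a path or a binary file object. The member is streamed and
    decoded on the fly, so it is never extracted to disk. If `member` is
    None, the first file in the archive is read (an assumption).
    """
    with zipfile.ZipFile(source) as archive:
        name = member if member is not None else archive.namelist()[0]
        # TextIOWrapper decodes the compressed byte stream lazily; closing it
        # also closes the underlying member stream.
        with io.TextIOWrapper(archive.open(name), encoding="utf-8") as text:
            for line in text:
                yield line.rstrip("\n")
```

Because it is a generator, only a small buffer is held in memory at a time, so the same loop works for the full dump as for a small sample.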
Some info has also been published on wmse:Projekt:Strategisk_inkludering_av_biblioteksdata_på_Wikidata_2018/Källor_på_Wikipedia#Alla_fotnoter_–_webbplatser. When we post something about this on the Village (T201308#4582995), we should also follow up with the second part of this task (asking which sources are currently handled badly by Citoid).
@Sebastian_Berlin-WMSE can you coordinate that with @Alicia_Fagerving_WMSE ?
Yes, we got some ideas from the community earlier in this thread (T175348#3591041) and also published all the results of our investigation as T201308.