User Details
- User Since: Apr 19 2021, 12:18 AM (159 w, 4 d)
- Availability: Available
- LDAP User: Unknown
- MediaWiki User: Menemenetekelufarsim [ Global Accounts ]
Apr 22 2021
Menemenetekelufarsim added a comment to T273221: Measure and indicate Lexeme language completeness, and prompt editors with what more might need doing.
Hello, I read that you were interested in corpora other than Wikipedia. I think that Swedish Wikipedia is a skewed source, since so many articles are started by robots and the frequency of odd formulations remains high even after they are manually cleaned up. The Swedish Gigaword Corpus contains one billion words from 1950-2015, analyzed with NLP and stored in XML format: https://spraakbanken.gu.se/en/resources/gigaword
A presentation: http://www.ep.liu.se/ecp/126/002/ecp16126002.pdf