User Details
- User Since
- Aug 3 2019, 6:58 AM
- Availability
- Available
- IRC Nick
- kevinbazira
- LDAP User
- Kevin Bazira
- MediaWiki User
- Kevin Bazira
Wed, Mar 29
@kostajh, we published datasets for all 12 of the 16 models that passed the evaluation in this round.
Mon, Mar 27
The conclusion from the backtesting results is that most of the languages look fine, except:
- wuuwiki, zh_classicalwiki, and zh_yuewiki, which have extremely low precision and recall compared to the recommended thresholds of 0.75 and 0.2, respectively.
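The pass/fail criterion used here (precision of at least 0.75 and recall of at least 0.2, both measured at the 0.5 score threshold) can be sketched as a simple filter over a backtesting table. The function name and dictionary layout below are illustrative, not part of the actual pipeline; the sample rows are taken from the Mar 24 table.

```python
# Recommended thresholds referenced in these updates
MIN_PRECISION = 0.75
MIN_RECALL = 0.20

def failing_wikis(results):
    """Return wikis whose backtesting precision or recall falls below
    the recommended thresholds."""
    return [
        wiki for wiki, (precision, recall) in results.items()
        if precision < MIN_PRECISION or recall < MIN_RECALL
    ]

# A few rows from the Mar 24 backtesting table:
results = {
    "warwiki": (0.95, 0.77),
    "wuuwiki": (0.00, 0.00),
    "zh_yuewiki": (0.48, 0.00),
}
print(failing_wikis(results))  # → ['wuuwiki', 'zh_yuewiki']
```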
Fri, Mar 24
Model evaluation has been completed and below are the backtesting results:
Wiki | Precision@0.5 | Recall@0.5
wawiki | 0.81 | 0.40
warwiki | 0.95 | 0.77
wowiki | 0.83 | 0.54
wuuwiki | 0.00 | 0.00
xalwiki | 0.99 | 0.60
xhwiki | 0.83 | 0.32
xmfwiki | 0.76 | 0.27
yiwiki | 0.76 | 0.44
yowiki | 0.96 | 0.83
zawiki | 0.91 | 0.61
zeawiki | 0.97 | 0.78
zh_classicalwiki | 0.00 | 0.00
zh_min_nanwiki | 0.97 | 0.84
zh_yuewiki | 0.48 | 0.00
zuwiki | 0.97 | 0.80
Thu, Mar 23
15/16 models were trained successfully in the 14th round of wikis.
Wed, Mar 22
The Chinese Wikipedia (zhwiki) training pipeline is also throwing this error:
File "wikipedia2vec/dictionary.pyx", line 231, in wikipedia2vec.dictionary.Dictionary.build
File "wikipedia2vec/dump_db.pyx", line 124, in wikipedia2vec.dump_db.DumpDB.is_disambiguation
File "wikipedia2vec/dump_db.pyx", line 125, in wikipedia2vec.dump_db.DumpDB.is_disambiguation
File "wikipedia2vec/dump_db.pyx", line 126, in wikipedia2vec.dump_db.DumpDB.is_disambiguation
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 2-3: surrogates not allowed
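The "surrogates not allowed" error means the text contains lone UTF-16 surrogate code points (often introduced when dump bytes are decoded with errors="surrogateescape"), which the UTF-8 codec refuses to encode. A minimal sketch of one possible cleanup, stripping the surrogates before encoding; the helper name is ours, not part of wikipedia2vec:

```python
def strip_surrogates(text: str) -> str:
    """Remove lone UTF-16 surrogate code points (U+D800..U+DFFF),
    which cannot be encoded as UTF-8."""
    return "".join(ch for ch in text if not 0xD800 <= ord(ch) <= 0xDFFF)

# A string decoded with errors="surrogateescape" can carry surrogates:
bad = "abc\ud800\udc00def"       # bad.encode("utf-8") would raise UnicodeEncodeError
clean = strip_surrogates(bad)
clean.encode("utf-8")            # encodes without error
```

An alternative is to encode with errors="replace" and lose the offending characters the same way, but filtering up front keeps the rest of the pipeline working on valid str data.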
Tue, Mar 21
@kostajh, we published datasets for all 23 of the 24 models that passed the evaluation in this round.
Mon, Mar 20
The conclusion from the backtesting results is that most of the languages look fine, except:
- piwiki's precision and recall are both zero.
- orwiki (0.71) and pawiki (0.74) have precision slightly below the recommended threshold (0.75).
- nqowiki has slightly low precision (0.73) and low recall (0.11).
Fri, Mar 17
Model evaluation has been completed and below are the backtesting results:
Wiki | Precision@0.5 | Recall@0.5
novwiki | 0.88 | 0.61
nqowiki | 0.73 | 0.11
nrmwiki | 0.87 | 0.56
nsowiki | 0.96 | 0.40
nvwiki | 0.99 | 0.80
nywiki | 0.91 | 0.67
ocwiki | 0.89 | 0.66
olowiki | 0.92 | 0.51
omwiki | 0.84 | 0.53
orwiki | 0.71 | 0.22
oswiki | 0.79 | 0.28
pawiki | 0.74 | 0.29
pagwiki | 0.92 | 0.69
pamwiki | 0.94 | 0.76
papwiki | 0.88 | 0.60
pcdwiki | 0.92 | 0.75
pdcwiki | 0.88 | 0.73
pflwiki | 0.98 | 0.79
piwiki | 0.00 | 0.00
pihwiki | 0.91 | 0.77
pmswiki | 0.94 | 0.69
pnbwiki | 0.80 | 0.53
pntwiki | 0.93 | 0.81
pswiki | 0.76 | 0.47
Thu, Mar 16
24/24 models were trained successfully in the 13th round of wikis.
Wed, Mar 15
@kostajh, we published datasets for all 21 of the 23 models that passed the evaluation in this round.
Tue, Mar 14
The conclusion from the backtesting results is that most of the languages look fine, except:
- mnwwiki's precision and recall are both zero.
- mnwiki (0.72) and newiki (0.74) have precision slightly below the recommended threshold (0.75).
- mlwiki's precision (0.69) and recall (0.14) are below the recommended thresholds (0.75 and 0.2, respectively).
- mywiki has low precision (0.63) and very low recall (0.06).
Thu, Mar 9
@Sgs, yes, the number of links in a wiki affects how the model performs.
Wed, Mar 8
Model evaluation has been completed and below are the backtesting results:
23/23 models were trained successfully in the 12th round of wikis.
Fri, Mar 3
@kostajh, we published datasets for all 21 of the 22 models that passed the evaluation in this round.
Mar 2 2023
The conclusion from the backtesting results is that most of the languages look fine, except:
Mar 1 2023
Model evaluation has been completed and below are the backtesting results:
Feb 28 2023
The lrcwiki pipeline was failing during the Spark job with the message ValueError: RDD is empty.
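Spark raises ValueError: RDD is empty when an action such as reduce() or takeSample() runs on an RDD holding no records, presumably because no lrcwiki data survived the earlier pipeline steps. In PySpark the usual guard is rdd.isEmpty() before the action; a plain-Python sketch of the idea (the helper below mimics Spark's behavior and is not the actual pipeline code):

```python
def safe_reduce(records, fn):
    # Spark's reduce() raises ValueError("RDD is empty") on an empty
    # RDD; checking first (rdd.isEmpty() in PySpark) lets the pipeline
    # skip a wiki with no data instead of crashing.
    if not records:
        return None  # nothing to aggregate for this wiki
    out = records[0]
    for record in records[1:]:
        out = fn(out, record)
    return out

safe_reduce([], lambda a, b: a + b)        # → None, pipeline continues
safe_reduce([1, 2, 3], lambda a, b: a + b) # → 6
```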
Feb 27 2023
21/22 models were trained successfully in the 11th round of wikis.
Feb 23 2023
@kostajh, we published datasets for all 17 of the 19 models that passed the evaluation in this round.
Feb 22 2023
kywiki has been added to the wikis that will be deployed in the 11th round (T308136).
The Armenian sentence tokenization bug has been fixed in T327371#8631149.
Feb 21 2023
The conclusion from the backtesting results is that most of the languages look fine, except:
- klwiki (0.74) and kmwiki (0.70) have precision slightly below the recommended threshold (0.75).
- krcwiki has low precision (0.65).
- koiwiki has low recall (0.13).
Feb 20 2023
Model evaluation has been completed and below are the backtesting results:
18/19 models were trained successfully in the 10th round of wikis.
@MGerlach, thank you for the recommendations. I have tested the fix locally and the hywiki training pipeline completed successfully.
Feb 7 2023
A unique URL has been added for each WikiGPT search result.
Thank you for working on this, @isarantopoulos!
Feb 6 2023
@isarantopoulos, I could not find Chris' username on Toolforge.
Feb 2 2023
An explainability section has been added to WikiGPT.
Feb 1 2023
A login page has been added to WikiGPT.
Jan 31 2023
WikiGPT is now up and running. You can see it here: https://wiki-gpt.toolforge.org/
Jan 24 2023
@kostajh, we published datasets for all 19 of the 21 models that passed the evaluation in this round.
Thank you for filing this @kostajh. I will work on periodically updating the link recommendation models.
Jan 23 2023
The conclusion from the backtesting results is that most of the languages look fine, except:
Model evaluation has been completed and below are the backtesting results: