Hi.
We are students and a teacher of computing. As this is our first programming contribution to Wikimedia, we need help integrating an improvement into the Basque Wikipedia editor.
Adding a reference when editing Basque Wikipedia is very easy when it can be extracted from a URL. The editor automatically extracts all the information once the URL is entered.
But it often separates the author's name(s) and surname(s) incorrectly when there is a second surname (this is not the custom in English, but in Basque two surnames are sometimes used; in addition, a surname may consist of more than one word, e.g., López de Agirre). For example, if the author of a book is "Olatz Arbelaitz Gallego", it recognizes "Olatz Arbelaitz" as the name and "Gallego" as the surname ("Olatz Arbelaitz ; Gallego") instead of "Olatz ; Arbelaitz Gallego", which is the correct distribution.
It is difficult to distinguish names from surnames. The program currently used in Wikipedia always takes the last word as the surname and all the preceding words as the name. Of course, the result is not always correct for Basque names, and it makes many mistakes. We have collected many names and surnames and trained a Python program to distribute an author's names and surnames correctly (https://github.com/EkhiAzur/WikipediaNameProblem). We evaluated it and it performs much better.
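To make the current behaviour concrete, here is a minimal Python sketch of the last-word rule as we understand it. The function name `split_last_word_as_surname` is our own and hypothetical; this is only an illustration of the rule, not the editor's actual code and not our trained program.

```python
def split_last_word_as_surname(full_name: str) -> tuple[str, str]:
    """Return (name, surname) using the last-word rule described above.

    Hypothetical helper: it only illustrates the rule, it is not
    the code that the Wikipedia editor really runs.
    """
    words = full_name.split()
    if not words:
        return "", ""
    # Everything except the last word is treated as the name,
    # and the last word alone is treated as the surname.
    return " ".join(words[:-1]), words[-1]


for author in [
    "Olatz Arbelaitz Gallego",
    "Arantza Diaz de Ilarraza",
    "Jose Luis Alvarez",
]:
    name, surname = split_last_word_as_surname(author)
    print(f"{name} ; {surname}")
```

With this rule, only authors with a single one-word surname (such as "Jose Luis Alvarez") are split correctly; a second surname or a multi-word surname is cut in the wrong place.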
Question: How can we integrate this program into the Wikipedia editor?
Could anyone please help us?
Some examples:
Name | Correct distribution | Distribution given by the current version in Wikipedia | Output of our program |
Olatz Arbelaitz Gallego | Olatz ; Arbelaitz Gallego | Olatz Arbelaitz ; Gallego | Olatz ; Arbelaitz Gallego |
Ekhi Azurmendi Arrue | Ekhi ; Azurmendi Arrue | Ekhi Azurmendi ; Arrue | Ekhi ; Azurmendi Arrue |
Arantza Diaz de Ilarraza | Arantza ; Diaz de Ilarraza | Arantza Diaz de ; Ilarraza | Arantza ; Diaz de iLarraza |
Patxi Angulo Perez | Patxi ; Angulo Perez | Patxi Angulo ; Perez | Patxi ; Angulo Perez |
Arantza Diaz de Ilarraza | Arantza ; Diaz de Ilarraza | Arantza Diaz de ; Ilarraza | Arantza ; Diaz de Ilarraza |
Arantza Diaz de Ilarraza Sanchez | Arantza ; Diaz de Ilarraza Sanchez | Arantza Diaz de Ilarraza ; Sanchez | Arantza ; Diaz de iLarraza Sanchez |
Francisco Xabier Albizuri Irigoyen | Francisco Xabier ; Albizuri Irigoyen | Francisco Xabier Albizuri ; Irigoyen | Francisco Xabier ; Albizuri Irigoyen |
Francisco Xabier Albizuri | Francisco Xabier ; Albizuri | Francisco Xabier ; Albizuri | Francisco Xabier ; Albizuri |
Jose Luis Alvarez | Jose Luis ; Alvarez | Jose Luis ; Alvarez | Jose Luis ; Alvarez |
Jose Luis Alvarez Enparantza | Jose Luis ; Alvarez Enparantza | Jose Luis Alvarez ; Enparantza | Jose Luis ; Alvarez Enparantza |
Arantza Irastorza Goñi | Arantza ; Irastorza Goñi | Arantza Irastorza ; Goñi | Arantza ; Irastorza Goñi |
María Arantza Irastorza Goñi | María Arantza ; Irastorza Goñi | María Arantza Irastorza ; Goñi | María Arantza ; Irastorza Goñi |
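
A comparison like the one in the table can be scored with a few lines of Python. The sketch below is only an illustration: the helper names `parse_row` and `exact_match_rate` are hypothetical, and it uses just four rows copied from the table above, so the resulting numbers say nothing about the full evaluation.

```python
# Four rows copied from the table above, in the same column order:
# Name | Correct distribution | Current version in Wikipedia | Our program |
TABLE = """
Olatz Arbelaitz Gallego | Olatz ; Arbelaitz Gallego | Olatz Arbelaitz ; Gallego | Olatz ; Arbelaitz Gallego |
Patxi Angulo Perez | Patxi ; Angulo Perez | Patxi Angulo ; Perez | Patxi ; Angulo Perez |
Francisco Xabier Albizuri | Francisco Xabier ; Albizuri | Francisco Xabier ; Albizuri | Francisco Xabier ; Albizuri |
Jose Luis Alvarez Enparantza | Jose Luis ; Alvarez Enparantza | Jose Luis Alvarez ; Enparantza | Jose Luis ; Alvarez Enparantza |
"""


def parse_row(line: str) -> list[str]:
    """Split a pipe-separated row into stripped cells, dropping empty ones."""
    cells = [cell.strip() for cell in line.split("|")]
    return [cell for cell in cells if cell]


def exact_match_rate(rows: list[list[str]], column: int) -> float:
    """Fraction of rows whose cell in `column` equals the correct distribution."""
    hits = sum(1 for row in rows if row[column] == row[1])
    return hits / len(rows)


ROWS = [parse_row(line) for line in TABLE.strip().splitlines()]

print("current version:", exact_match_rate(ROWS, 2))
print("our program:    ", exact_match_rate(ROWS, 3))
```

The comparison is an exact string match, so even a difference in capitalisation counts as an error.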