Currently, images use the file name as alt text both on file pages and in the media viewer. It would be beneficial for users of screen readers if the caption were used in its place; this would also enable the alt text to be localized.
Please no. Captions are not, generally, suitable for use as alt attribute text. Please consult with screen reader users and/or accessibility professionals.
And neither are file names.
As I mentioned during the office hours, it would be sensible to make use of structured data captions in the projects. But I obviously didn't make it clear enough that there are two very different and distinct pieces of text associated with images in Wikipedias.
One piece of text is the alt text, which is read by a screen reader or displayed if images are turned off in a browser. This is an essential part of accessibility, and its purpose is to convey what the image looks like to those who cannot see it. If you have an image of a bridge, then the alt text will say that it is an image of a bridge.
The other piece of text is the caption, which is read by everybody, including by screen readers (so it shouldn't duplicate the alt text, because we don't want screen readers to be repeating the same information). A caption should not describe the image - either you can see it (and don't need to be told what you can see) or you can see/hear the alt text (with the same result). The caption brings the observer's attention to a salient point in the image, or links it into the text of the article. If you have an image of a bridge, then the caption might say that it's a hybrid cable-stayed/suspension bridge in New York City, or that the image shows how the bridge looked in 2005 - things that you don't get just by looking at the image.
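For illustration, the bridge example could be written in wikitext roughly as follows. This is only a sketch: the file name is a hypothetical placeholder, and the alt wording is an assumption about what a reasonable visual description might be. The `alt=` parameter carries the description of what the image looks like, while the last parameter is the caption that adds what you can't get just by looking.

```wikitext
[[File:Example bridge.jpg|thumb|alt=A bridge with tall towers and cables spanning a river|A hybrid cable-stayed/suspension bridge in New York City, as it looked in 2005]]
```

Note that the two texts do not overlap: a screen reader would announce the alt text and then the caption without repeating the same information.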
Alt text is consistent across all uses of the image and it would be useful to have that stored with the image on Commons as structured data.
Captions will vary according to the article and use within the article, although there will often be common elements that could be stored with the image on Commons as structured data.
What would be ideal is two distinct pieces of structured data on Commons: one for alt text; and one for the common elements of captions. We only have one piece of data at present: the structured data captions. If these are going to be used for captions, then you can't use them as alt text, and vice-versa. But you can't have a mix of the two: that would lead to confusion and almost certain rejection of the data by potential users.
I agree, alt text and captions have such different purposes that they can't be interchanged.
I am not familiar with screen readers, nor am I a professional, but I can imagine how it works (read it in your car's satellite navigation voice ;D):
[[File:GPMonaco2018-17.jpg|Monaco GP 2018|alt=Monaco GP 2018]]
Follows an image depicting ... Monaco GP 2018 ... with a title ... Monaco GP 2018.
[[File:GPMonaco2018-13.jpg|Monaco GP 2018|alt=Daniel Ricciardo crossing the finish line in his Aston Martin formula car followed by two other cars driven by Sebastian Vettel and Lewis Hamilton]]
Follows an image depicting ... Aston Martin formula car driven by Daniel Ricciardo is crossing the finish line followed by two other cars driven by Sebastian Vettel and Lewis Hamilton ... with a title ... Monaco GP 2018.
(Example translated from my home wiki's Formula One article and slightly modified to serve as an illustration of good versus bad practice.)