It has been a month and there are still no updates to the Vision release notes.
Mar 25 2024
Mar 19 2024
Feb 26 2024
In T355763#9575821, @PerfektesChaos wrote:A key in a JSON object is always a string, therefore a number is impossible. However, a Lua table needs to become a JSON object if anything other than a one-based array is to be mapped.
Only 1-based arrays can be mapped directly as JSON array ↔ Lua sequence table.
A Lua table (mapping object) is permitted to use any data type as a key, even booleans and floating-point numbers. Even worse, even another table. And all data types may be mixed as keys within one table.
If you have a Lua object (table) with the integer keys 0, 1, 2, you need to convert these keys to the JSON strings "0", "1", "2". On the backward conversion you can choose in Lua whether you want string keys or number keys.
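The round trip can be illustrated in Python, whose json module behaves the same way (a sketch for illustration only; the thread itself is about Scribunto's Lua tables):

```python
import json

# A mapping with the integer keys 0, 1, 2. JSON object keys are always
# strings, so encoding must stringify them.
table = {0: "a", 1: "b", 2: "c"}
encoded = json.dumps(table, sort_keys=True)
# json.dumps coerces the integer keys to the strings "0", "1", "2".

# On decode the keys come back as strings; converting them back to
# integers is an explicit choice, just as it would be in Lua.
decoded = json.loads(encoded)
restored = {int(k): v for k, v in decoded.items()}
```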
The documentation states:
Feb 23 2024
Please reproduce the bug with the following code in the debug console on Commons:
Jan 29 2024
Any response?
Jan 24 2024
} elseif ( $isEncoding && ctype_digit( $k ) ) {
	// json_decode currently doesn't return integer keys for {}
	$isSequence = $next++ === (int)$k;
} else {
Jan 19 2024
Great! I hope it will solve the problem.
Does anyone know how to get Google to change this?
Jan 8 2024
In T354500#9440179, @Jmabel wrote:What this means is that where the generated HTML for the category link is currently <a href="/wiki/Category:BAR" title="FOO">BAR</a>, @wmr would like to be able to have <a href="/w/index.php?title=Category:BAR&filefrom=FOO" title="FOO">BAR</a>. It would be pretty easy to have a client-side gadget to do this for those who want it.
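The requested rewrite is essentially a URL transformation. A minimal sketch in Python of the link-building step (the actual gadget would be client-side JavaScript; the function name is illustrative):

```python
from urllib.parse import urlencode

def category_link(category: str, sort_key: str) -> str:
    """Build the index.php-style category URL @Jmabel describes,
    jumping to the file's position in the listing via filefrom."""
    query = urlencode({"title": f"Category:{category}", "filefrom": sort_key})
    return f"/w/index.php?{query}"

# The colon is percent-encoded by urlencode.
link = category_link("BAR", "FOO")
```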
Dec 1 2023
In T332125#9369617, @Samwilson wrote:PR for re-sending the full image data: https://github.com/wikimedia/wikimedia-ocr/pull/120
@wmr the other issues you raise here are not related to the current task, could you please create new tasks for these if you think they need addressing? Thanks!
Aug 30 2023
In T332125#9129586, @Soda wrote:In T332125#9129394, @wmr wrote:The solution would be easy. Just write a bot: download a PDF from Commons, convert the file to JPGs locally, upload every JPG to Google, get the OCRed text, and use the bot to put the text on Wikisource.
It would only require two parameters from the user: the filename of the PDF (or DjVu) and the target Wikisource domain name (like zh.wikisource.org). The user should be an autoconfirmed user on the target Wikisource and should confirm that they think the quality would be OK (avoiding handwritten manuscripts, which would have poor OCR quality).
Mass OCR is explicitly forbidden on quite a lot of language Wikisources.
The solution would be easy. Just write a bot: download a PDF from Commons, convert the file to JPGs locally, upload every JPG to Google, get the OCRed text, and use the bot to put the text on Wikisource.
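The title-generation half of such a bot can be sketched as follows (the download, OCR, and upload calls are omitted; this only shows the Page: namespace naming convention, e.g. Page:Foo.pdf/1, that the bot would target):

```python
def page_titles(filename: str, num_pages: int) -> list[str]:
    """Generate Wikisource Page-namespace titles for each page of a
    scanned file: 'Page:<filename>/1' through 'Page:<filename>/<n>'."""
    return [f"Page:{filename}/{n}" for n in range(1, num_pages + 1)]

titles = page_titles("Foo.pdf", 3)
```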
Jul 3 2023
In T332125#8966223, @Samwilson wrote:In T332125#8964021, @wmr wrote:A limitation of Google OCR has been found: it cannot recognize punctuation marks placed to the side of the column in vertical text. This was a common typesetting practice during the Chinese Republican era. For example, for this image, no punctuation marks were recognized. Are there any options available in Google to recognize them?
This is off topic, but where should I report it? Does it count as a bug?
It is (sort of) a bug, but there's nothing we can do about it, as it exists wholly within Google's service. The API docs are here: https://cloud.google.com/vision/docs/reference/rest/v1/Feature — there's not much in the way of configurability for text detection, beyond languageHints[].
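For reference, a request that passes languageHints can be built like this (a Python sketch of the Vision v1 REST payload; the image URI is a placeholder):

```python
def vision_request(image_uri: str, hints: list[str]) -> dict:
    """Build a Cloud Vision v1 images:annotate request body that
    passes languageHints via imageContext."""
    return {
        "requests": [{
            "image": {"source": {"imageUri": image_uri}},
            "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
            "imageContext": {"languageHints": hints},
        }]
    }

body = vision_request("https://example.org/scan.jpg", ["zh-Hant"])
```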
Jun 26 2023
A limitation of Google OCR has been found: it cannot recognize punctuation marks placed to the side of the column in vertical text. This was a common typesetting practice during the Chinese Republican era. For example, for this image, no punctuation marks were recognized. Are there any options available in Google to recognize them?
I think the problem lies with Wikimedia Commons being slow to respond. If someone opens a rarely accessed book in a browser and picks a page at random, the server might not display it immediately; it can take some time. The server presumably extracts pages from the PDF file, caches them as image files, and then displays them. For OCR, there should be a dedicated tool that downloads the entire PDF file, converts it to images locally, and then sends them to Google for OCR processing.
Jun 5 2023
I want to run many things at the same time. Thanks.
May 31 2023
Yes. We need a lot more technical help on zhWS.
In T337707#8888579, @Xover wrote:Ah. I found it. zhWS used to import a very old version of the pagenumbers script from enWS, but it was removed last November by @WikiBayer, apparently in response to a steward request. Probably because that script on enWS is very old and has not worked for a very long time. enWS has migrated to an actual Gadget (MediaWiki:Gadget-PageNumbers.js, MediaWiki:Gadget-PageNumbers-core.js, and MediaWiki:Gadget-PageNumbers-core.css).
In T337707#8887945, @Aklapper wrote:@wmr: Are all other Wikisource sites also affected by this?
May 30 2023
Google actually OCRs every image PDF it indexes. See the cache pages for
Google OCR cannot recognize punctuation set outside the column in Chinese vertical text.
Dec 6 2022
@fnegri Done.
Now tw/pdf is under 100 GB. I shall close this issue.
May 7 2022
Sep 20 2020
Can anyone add it please?
Sep 3 2020
Aug 31 2020
Aug 22 2020
Jun 5 2020
A function to remove the spaces inserted at line breaks is still badly needed on Chinese Wikisource. Line breaks are kept to help proofreading.
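The desired behaviour can be sketched with a regular expression that joins lines without inserting a space when the characters on both sides are CJK (an illustration only, not the ProofreadPage implementation):

```python
import re

# Common CJK ideographs plus CJK punctuation.
CJK = r"[\u4e00-\u9fff\u3000-\u303f]"

def join_lines(text: str) -> str:
    """Join proofread lines: drop the newline between two CJK characters,
    but replace it with a space otherwise (as Latin-script text expects)."""
    text = re.sub(rf"(?<={CJK})\n(?={CJK})", "", text)
    return text.replace("\n", " ")

chinese = join_lines("中文\n文本")   # joined with no space
latin = join_lines("foo\nbar")       # joined with a space
```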
Dec 28 2019
In T238476#5717151, @Zoranzoki21 wrote:Deployed!
Dec 2 2019
Nov 17 2019
Nov 16 2019
Sep 24 2019
Aug 9 2019
In T229715#5402325, @Urbanecm wrote:Deployed.
Jul 24 2019
Dec 5 2016
In T60729#2845924, @Samwilson wrote:On English Wikisource, it's policy to remove mid-paragraph line breaks.
In T60729#2845938, @Samwilson wrote:If it's part of ProofreadPage, the message should probably be something more specific e.g. proofreadpage-page-separator. Sound okay?
The addition of a space between lines happens on all pages regardless of namespace, so fixing this problem will involve changing something other than the ProofreadPage extension.
Thank you.
In T60729#2845375, @Aklapper wrote:@wmr: No progress yet because nobody has written a patch yet. You are very welcome to use developer access to submit a proposed code change as a Git branch directly into Gerrit which makes it easier to review it quickly and provide feedback. Thanks!
Dec 4 2016
In T60729#1099802, @Billinghurst wrote:Seems that there needs to be a configurable option for a wiki to have a space or not to have a space between transcluded pages. Presumably set in the MW: namespace.
Why no progress so far?
Nov 26 2014
Why no progress so far?
In T75967#787322, @GOIII wrote:Duplicate of T60729 ?