Epic: Support custom Han characters on Chinese Wikisource
Open, Medium, Public

Assigned To

None

Authored By

	kaldari
	Dec 23 2016, 7:54 PM

Description

Also proposed in Community-Wishlist-Survey-2016. Received 53 support votes, ranked #20 out of 265 proposals. View full proposal with discussion and votes here

Han characters are widely used in East Asia (China, Taiwan, Singapore, Malaysia, Hong Kong, Japan, Korea and Vietnam). An enduring, unsolved problem in digital archiving is missing characters. The problem affects not only ancient books but also modern publications (e.g. some authors have created 300-400 unique new characters in certain books), which makes such works difficult to archive on Wikisource. Unicode gradually adds new characters to its charts, but each new Unihan extension takes time to go live. In the past, Wikisource and even Wikipedia worked around the problem by using image files to present those characters. But images cannot be indexed or searched, and cannot even be exchanged between computer systems.

Unicode's Ideographic Description Sequences (IDS) define how to compose a Han character from components. We have implemented a function, as a Wikisource extension, that dynamically renders Han characters from IDS, used like this: <ids>⿺辶⿴宀⿱珤⿰隹⿰貝招</ids> It generates a Han character image file (currently rendered on a temporary server on wmflabs) with the IDS in its metadata. This addresses the missing Han character problem for all C/J/K/V books. The underlying point is that Han characters are not on the same level as the letters of European alphabets; they are closer to words. Han characters form an open set. They are composed in two dimensions from more basic components that serve as basic elements, like affixes in English (English words are composed in one dimension). In academia, component-based Han character composition technology has been developed and adopted for handling ancient Han books. The best-known examples are Academia Sinica's work and the CBETA sutra project. In recent years, open-source IDS renderers have become stable, so Wikisource can use the same technology to handle ancient Han books just as those institutions do.
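
To illustrate how an IDS string encodes a two-dimensional composition, here is a minimal Python sketch that parses the example sequence above into a tree. It is not the code used by the IDS extension or its renderer. The names `Node`, `parse_ids` and `depth` are hypothetical, and the sketch assumes only the twelve original Ideographic Description Characters (U+2FF0 to U+2FFB), of which ⿲ and ⿳ take three operands and the rest take two.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Ideographic Description Characters U+2FF0..U+2FFB mapped to their arity.
# Assumption: only these twelve operators are supported in this sketch.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2ff2"] = 3  # ⿲ left / middle / right
IDC_ARITY["\u2ff3"] = 3  # ⿳ top / middle / bottom


@dataclass
class Node:
    """One composition step: a layout operator and its operands."""
    operator: str
    operands: List[Union["Node", str]]


def _parse(ids: str, pos: int) -> Tuple[Union[Node, str], int]:
    """Parse one component starting at pos; return it and the next position."""
    if pos >= len(ids):
        raise ValueError("sequence ended before all operands were supplied")
    ch = ids[pos]
    pos += 1
    arity = IDC_ARITY.get(ch)
    if arity is None:
        # Not an operator, so it is a plain component character.
        return ch, pos
    operands = []
    for _ in range(arity):
        operand, pos = _parse(ids, pos)
        operands.append(operand)
    return Node(ch, operands), pos


def parse_ids(ids: str) -> Union[Node, str]:
    """Parse a whole IDS string (prefix notation) into a tree."""
    tree, pos = _parse(ids, 0)
    if pos != len(ids):
        raise ValueError(f"trailing characters after position {pos}")
    return tree


def depth(tree: Union[Node, str]) -> int:
    """Number of nested operators along the deepest path."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(op) for op in tree.operands)


tree = parse_ids("⿺辶⿴宀⿱珤⿰隹⿰貝招")
print(tree.operator, tree.operands[0], depth(tree))
```

For the example sequence, the outermost operator is ⿺ with 辶 as its first operand, and the nesting is five operators deep. An incomplete sequence such as `⿰貝` raises a `ValueError`.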

Related Objects

Status	Assigned	Task
Open	None	T154044 Epic: Support custom Han characters on Chinese Wikisource
Resolved	Samwilson	T153796 Investigation: Create new Han characters with IDS extension for Wikisource
Declined	None	T153989 Get mirror of IDS Extension repository set up in Gerrit/Diffusion
Declined	None	T137786 Deploy IDS extension to zh.wikisource
Declined	None	T148693 Deploy IDS rendering engine to production
Resolved	divadsn	T154043 Add ability to configure the web service endpoint in the IDS extension

Event Timeline

kaldari created this task. Dec 23 2016, 7:54 PM

Restricted Application added a subscriber: Aklapper. Dec 23 2016, 7:54 PM

kaldari renamed this task from Support custom Han characters on Chinese Wikisource to Epic: Support custom Han characters on Chinese Wikisource. Dec 23 2016, 7:57 PM

kaldari added subtasks: T153796: Investigation: Create new Han characters with IDS extension for Wikisource, T153989: Get mirror of IDS Extension repository set up in Gerrit/Diffusion, T137786: Deploy IDS extension to zh.wikisource, T148693: Deploy IDS rendering engine to production, T154043: Add ability to configure the web service endpoint in the IDS extension.

kaldari moved this task from New & TBD Tickets to Older: Tracking Work by Others on the Community-Tech board. Dec 23 2016, 10:18 PM

kaldari triaged this task as Medium priority. Dec 23 2016, 11:05 PM

kaldari added a project: All-and-every-Wikisource.

kaldari added a subscriber: Shoichi.

kaldari created subtask T154064: Investigation: What would be the best way to support loginwiki from LoginNotify. Dec 24 2016, 12:11 AM

kaldari removed a subtask: T154064: Investigation: What would be the best way to support loginwiki from LoginNotify. Dec 24 2016, 12:13 AM

kaldari moved this task from Older: Tracking Work by Others to In Sprint 🏃‍♀️🏃‍♂️ on the Community-Tech board.

kaldari moved this task from In Sprint 🏃‍♀️🏃‍♂️ to Older: Tracking Work by Others on the Community-Tech board.

Liuxinyu970226 added a project: Chinese-Sites. Dec 24 2016, 12:23 AM

Liuxinyu970226 subscribed.

zhuyifei1999 updated the task description. Dec 24 2016, 5:12 AM

Liuxinyu970226 added a project: Epic. Dec 25 2016, 2:55 AM

Shizhao moved this task from Backlog to Site configuration on the Chinese-Sites board. Dec 27 2016, 2:43 AM

Shizhao added a project: Wikisource-Community-User-Group. Dec 27 2016, 7:49 AM

@Shizhao: Why did you add the Wikisource-Community-User-Group tag? If you think this is a non-coding task, please elaborate... Reverting.

Liuxinyu970226 added a project: Community-Wishlist-Survey-2016. Jan 1 2017, 3:13 AM

Liuxinyu970226 removed a project: Community-Wishlist-Survey-2016. Jan 2 2017, 12:54 AM

Shizhao added a project: Community-Wishlist-Survey-2016. Jan 3 2017, 3:24 AM

@Shizhao: Did you talk to the Community-Tech team before adding that tag?

Filip closed subtask T154043: Add ability to configure the web service endpoint in the IDS extension as Resolved. Jan 3 2017, 5:13 PM

In T154044#2912438, @Aklapper wrote:

@Shizhao: Did you talk to the Community-Tech team before adding that tag?

I'm afraid she didn't...

Nemo_bis subscribed. Jan 31 2017, 7:48 AM

Nemo_bis added a project: I18n. Jan 31 2017, 7:57 AM

srishakatux moved this task from Backlog to Wishlist 11-30 (needs owner) on the Community-Wishlist-Survey-2016 board. Feb 6 2017, 7:59 AM

srishakatux updated the task description. Feb 11 2017, 1:33 AM

This task was proposed in the Community-Wishlist-Survey-2016 and in its current state needs an owner. Wikimedia is participating in Google Summer of Code 2017 and Outreachy Round 14. To the subscribers: would this task or a portion of it be a good fit for either of these programs? If so, would you be willing to help mentor this project? Remember, each outreach project requires a minimum of one primary mentor and a co-mentor.

I'm not sure how likely it is that the rendering engine will be security-reviewed any time soon, so is it an option to move ahead with deploying the IDS extension and for it to continue to use the existing Tool Labs rendering service?

This would require a caching layer to be added to the extension, so that not every request results in a request to Labs. Is this something that we should work on? If the rendering engine is moved onto a production server (T148693) some time in the future, having an in-wiki caching system would still be worthwhile.
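
As a rough illustration of the caching idea, here is a minimal Python sketch of a cache keyed by IDS string, so that repeated occurrences of the same character trigger only one request to the rendering service. This is an assumption-laden sketch, not the extension's code: `RenderCache` and `fake_render` are hypothetical names, the store is a plain in-memory dictionary, and the real rendering service's API is not modelled.

```python
from typing import Callable, Dict


class RenderCache:
    """Memoise rendered images by IDS string (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        # fetch is whatever actually contacts the rendering service.
        self._fetch = fetch
        self._store: Dict[str, bytes] = {}
        self.misses = 0  # how many times the service had to be called

    def get(self, ids: str) -> bytes:
        # Surrounding whitespace does not change the character,
        # so strip it before using the string as the cache key.
        key = ids.strip()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._fetch(key)
        return self._store[key]


def fake_render(ids: str) -> bytes:
    """Stand-in for the remote renderer; returns placeholder bytes."""
    return ("image-for:" + ids).encode("utf-8")


cache = RenderCache(fake_render)
cache.get("⿰貝招")
cache.get(" ⿰貝招 ")   # same character, served from the cache
print(cache.misses)
```

Only the first lookup reaches the renderer; the second is answered from the cache. The same keying would apply whether the cache lives in each wiki or in front of the rendering server.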

I don't think we want to have a production extension dependent on a Tool Labs service. It would probably make sense to set the service up on the scaling cluster (similar to graphoid), i.e. sca1XXX in eqiad. It would need to be security reviewed first.

Makes sense.

In that case, it sounds like things might be waiting on the security review. @Shoichi has added some translations (comments only, I think) to the Java code, but perhaps there's more to do. The plan is not currently to translate the whole codebase, but just to add English comments throughout. Is this going to be sufficient for reviewing?

Hello everyone, regarding the renderer code review, I posted https://phabricator.wikimedia.org/T154044

In T154044#3060261, @Samwilson wrote:

I'm not sure how likely it is that the rendering engine will be security-reviewed any time soon, so is it an option to move ahead with deploying the IDS extension and for it to continue to use the existing Tool Labs rendering service?

This would require a caching layer to be added to the extension, so that not every request results in a request to Labs. Is this something that we should work on? If the rendering engine is moved onto a production server (T148693) some time in the future, having an in-wiki caching system would still be worthwhile.

After researching this (and discussing it with the upstream author), a good caching solution is to put Squid in front of the IDS rendering server and use it as the cache server. Caching on the server side makes sense because requests from multiple wiki sites may repeat heavily: a given missing character may be used frequently on different sites. Caching on the server side should therefore work better than each wiki site caching by itself.

After researching this (and discussing it with the upstream author), a good caching solution is to put Squid in front of the IDS rendering server and use it as the cache server. Caching on the server side makes sense because requests from multiple wiki sites may repeat heavily: a given missing character may be used frequently on different sites. Caching on the server side should therefore work better than each wiki site caching by itself.

This probably isn't a realistic option as there aren't any caching servers available for Tool Labs.

In T154044#3090170, @kaldari wrote:

This probably isn't a realistic option as there aren't any caching servers available for Tool Labs.

I think the idea would be to have Squid in front of a production IDS rendering server. Which would work, I think? If so, then really the big next step in getting this resolved is to finish translating the code in han3_ji7_tsoo1_kian3 and get it ready for security review. (Same for the extension, but it's so simple, especially if it doesn't need to incorporate caching, and it can be done after; the Java bit is the hard bit.)

In T154044#3090176, @Samwilson wrote:

In T154044#3090170, @kaldari wrote:

This probably isn't a realistic option as there aren't any caching servers available for Tool Labs.

I think the idea would be to have Squid in front of a production IDS rendering server. Which would work, I think? If so, then really the big next step in getting this resolved is to finish translating the code in han3_ji7_tsoo1_kian3 and get it ready for security review. (Same for the extension, but it's so simple, especially if it doesn't need to incorporate caching, and it can be done after; the Java bit is the hard bit.)

I have a question about T148693. My team has translated almost all of the web/net source code of han3_ji7_tsoo1_kian3. What is left is the graphics rendering code, so can someone do security review first? Where should I apply?

In T154044#3090257, @Shoichi wrote:

so can someone do security review first? Where should I apply?

See https://www.mediawiki.org/wiki/Review_queue#Preparing_for_deployment

In T154044#3090912, @Aklapper wrote:

In T154044#3090257, @Shoichi wrote:

so can someone do security review first? Where should I apply?

See https://www.mediawiki.org/wiki/Review_queue#Preparing_for_deployment

Thank you. I am going to study the procedure and then go on to the next step.

Shizhao added a project: Wikimedia Taiwan. Jul 31 2017, 3:11 AM

C933103 subscribed. Nov 11 2017, 8:49 PM

Liuxinyu970226 awarded a token. Dec 17 2017, 11:06 AM

TBolliger removed a project: Community-Tech. Feb 7 2018, 12:16 AM

Aklapper closed subtask T137786: Deploy IDS extension to zh.wikisource as Declined. Jun 17 2021, 2:59 PM

Aklapper closed subtask T148693: Deploy IDS rendering engine to production as Declined.

Kizule closed subtask T153989: Get mirror of IDS Extension repository set up in Gerrit/Diffusion as Declined. Nov 16 2022, 1:55 AM

Restricted Application added subscribers: Ericliu1912, Stang. Nov 16 2022, 1:55 AM

Winston_Sung moved this task from Untriaged to Design research required on the I18n board. Aug 9 2023, 2:32 PM

Winston_Sung moved this task from Backlog to Tech on the Wikimedia Taiwan board. Thu, Jul 25, 5:35 AM

	F6353091: IDS_scheme.png
	Mar 9 2017, 2:51 AM
