Tuesday, 5 November 2013

South Asian Scripts I - Devanagari

Hi all. As I am currently going through Chapter 9 of "The Unicode Standard, Version 6.1 - Core Specification", the upcoming posts should be truly informative about the South Asian scripts, namely Devanagari, Gujarati, Kannada and Malayalam. They may come as a series, with the present post covering Devanagari specifically.

  • Background Information : 
As most of us know, the Unicode Standard provides programmers with a single universal character encoding and a vast amount of data about how characters function. For the more complex Indic scripts there is another standard to keep in mind: the Indian Script Code for Information Interchange (ISCII). Most scripts of South Asia are derived from the ancient Brahmi script and therefore share many structural characteristics. Implementations should ensure that adequate attention is given to the actual behaviour of those scripts.

  • About Devanagari :

  1. Standards :
The Devanagari block of the Unicode Standard is based on ISCII-1988.

  2. Encoding Principles :

These writing systems are a cross between syllabic and alphabetic writing systems. Their effective unit is the orthographic syllable, consisting of a consonant and vowel (CV) core, optionally preceded by one or more consonants, with a canonical structure of (((C)C)C)V.
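To make the (((C)C)C)V structure concrete, here is a minimal sketch that splits a Devanagari string into orthographic syllables with a regular expression. The character ranges and the name `split_syllables` are my own simplifying assumptions (basic consonants, vowel signs and three modifiers only); a real shaping engine handles many more cases.

```python
import re

# Simplified character classes for the Devanagari block (assumption:
# only the basic letters and signs are covered).
CONSONANT = "[\u0915-\u0939]\u093C?"       # KA..HA, optionally followed by nukta
VIRAMA = "\u094D"
VOWEL_SIGN = "[\u093E-\u094C]"             # dependent vowel signs AA..AU
MODIFIER = "[\u0901-\u0903]"               # candrabindu, anusvara, visarga
INDEPENDENT_VOWEL = "[\u0904-\u0914]"

# Zero or more dead consonants, one core consonant, then either a virama
# or a vowel sign; or a lone independent vowel. Modifiers close the syllable.
SYLLABLE = re.compile(
    f"(?:(?:{CONSONANT}{VIRAMA})*{CONSONANT}(?:{VIRAMA}|{VOWEL_SIGN})?"
    f"|{INDEPENDENT_VOWEL}){MODIFIER}*"
)

def split_syllables(text):
    """Return the orthographic syllables found in text, in logical order."""
    return [m.group(0) for m in SYLLABLE.finditer(text)]
```

For instance, `split_syllables("\u0939\u093F\u0928\u094D\u0926\u0940")` (the word "Hindi") gives two syllables, the second of which is a CCV cluster.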


  3. Rendering Devanagari :


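>>Rules For Rendering :

In the examples, the suffix after a character name gives its form: n = nominal, d = dead, l = live, h = half form, sup = superscript, sub = subscript, vs = vowel sign.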

R1
When a nominal consonant precedes a VIRAMA, it is considered to be a dead consonant. A consonant that does not precede a VIRAMA is considered to be a live consonant.
                       TAn + VIRAMAn -> TAd
                       त      + ्              -> त्
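As a rough illustration of R1, the sketch below walks a string and labels each consonant as dead or live. The function names are hypothetical, and the consonant range is a simplification (KA to HA only).

```python
VIRAMA = "\u094D"
NUKTA = "\u093C"

def is_consonant(ch):
    # Simplification: only the basic consonants KA (U+0915) .. HA (U+0939).
    return "\u0915" <= ch <= "\u0939"

def classify_consonants(text):
    """Return (consonant, 'dead' | 'live') pairs in logical order."""
    result = []
    i = 0
    while i < len(text):
        if is_consonant(text[i]):
            j = i + 1
            # A nukta belongs to the consonant it follows.
            if j < len(text) and text[j] == NUKTA:
                j += 1
            # R1: a consonant directly followed by VIRAMA is dead.
            dead = j < len(text) and text[j] == VIRAMA
            result.append((text[i:j], "dead" if dead else "live"))
            i = j
        else:
            i += 1
    return result
```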

R2
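If the dead consonant RA (RA + VIRAMA) precedes a consonant, then it is replaced by the superscript nonspacing mark "repha" (RAsup), which is positioned so that it applies to the logically following consonant.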
                       RAd + KAl -> KAl + RAsup
                       र्      +  क   ->  क    + र्          -> र्क
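A shaping engine first has to notice that a cluster begins with a repha candidate. The following is only a sketch of that detection step, under the assumption that the input is a single cluster in logical order; `extract_repha` is a made-up helper name, and a ZWJ after the virama is treated as blocking the repha (see R5).

```python
RA = "\u0930"
VIRAMA = "\u094D"

def is_consonant(ch):
    # Simplification: only the basic consonants KA (U+0915) .. HA (U+0939).
    return "\u0915" <= ch <= "\u0939"

def extract_repha(cluster):
    """Return (rest_of_cluster, True) if the cluster starts with
    RA + VIRAMA + consonant, otherwise (cluster, False)."""
    if (len(cluster) >= 3 and cluster[0] == RA
            and cluster[1] == VIRAMA and is_consonant(cluster[2])):
        return cluster[2:], True
    return cluster, False
```

The caller would then draw the repha glyph over whatever the remaining cluster turns out to be, which is what rules R3 and R4 below refine.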


R3
If the "repha" is to be applied to a dead consonant and that dead consonant is combined with another consonant to form a conjunct, then the mark is applied to the conjunct ligature form as a whole.

                       RAd + JAd+ NYAn -> J.NYAn +RAsup
                       र्      + ज्    + ञ        ->  ज्ञ         +र्        -> र्ज्ञ
R4
If the "repha" is to be applied to a dead consonant that is subsequently replaced by its half-consonant form, then the mark is positioned so that it applies to the form that serves as the base of the consonant cluster.
                       RAd + GAd + GHAl -> GAh + GHAl + RAsup
                       र्      +  ग्      +  घ       ->  ग्     +  घ         +   र्        -> र्ग्घ

R5
In conformance with the ISCII standard, the half-consonant form RRAh is represented as eyelash-RA. This form of RA is commonly used in writing Marathi.
                      RRAn + VIRAMAn + YAn -> RRAh
                      ऱ         + ्              +  य      ->  ऱ्य

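For compatibility with Unicode 2.0, a dead RA followed by ZERO WIDTH JOINER also takes the eyelash form instead of RAsup:
                      RAd + ZWJ + YAn -> RAh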
                      र्      +  ‍       +  य    ->  ऱ्य
                      
R6
Except for the dead consonant RA, when a dead consonant precedes the live consonant RA, the dead consonant is replaced with its nominal form, and RA is replaced by the subscript mark RAsub, which applies to that nominal form.
                       TTHAd + RAl -> TTHAn + RAsub
                        ठ्          + र     ->  ठ          + ्र          -> ठ्र

R7
For certain consonants, the mark RAsub may graphically combine with the consonant to form a conjunct ligature.
                       PHAd + RAl -> PHAn + RAsub
                       फ्        + र     ->  फ          + ्र          -> फ्र
                     

R8
If a dead consonant (other than RAd) precedes RAd, then RA is replaced by RAsub as in R6; however, the VIRAMA that formed RAd remains, so the result is a dead consonant conjunct form.
                       TAd + RAd -> TAn + RAsub + VIRAMAn -> T.RAd
                        त्     + र्      ->  त     + ्र         + ्              -> त्र्


R9
The nukta sign, which modifies a consonant form, is attached to that consonant in rendering. If the consonant represents a dead consonant, then NUKTA should precede VIRAMA in the memory representation.
                       KAn + NUKTAn + VIRAMAn -> QAd
                       क     + ़             + ्              ->  क़्         
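The NUKTA-before-VIRAMA order of R9 is also what Unicode canonical ordering produces, because nukta has canonical combining class 7 and virama has class 9. A small sketch using Python's standard `unicodedata` module shows this; the helper name `canonical_order` is just for illustration.

```python
import unicodedata

KA, NUKTA, VIRAMA = "\u0915", "\u093C", "\u094D"

def canonical_order(text):
    """Normalize to NFD, which sorts adjacent combining marks
    by their canonical combining class."""
    return unicodedata.normalize("NFD", text)

# Combining classes that drive the reordering.
nukta_class = unicodedata.combining(NUKTA)    # 7
virama_class = unicodedata.combining(VIRAMA)  # 9

# Input typed in the "wrong" order: KA + VIRAMA + NUKTA.
fixed = canonical_order(KA + VIRAMA + NUKTA)
```

After normalization, `fixed` is KA + NUKTA + VIRAMA, so text that goes through normalization ends up in the order R9 asks for.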


R10
Other modifying marks, in particular bindus, apply to the orthographic syllable as a whole. The bindus should follow any vowel signs. The relative placement of these marks is horizontal rather than vertical; the horizontal rendering order may vary according to typographic concerns.
                        KAn + AAvs + CANDRABINDUn
                        क     + ा      + ँ               ->  काँ      
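A quick way to check the ordering in R10 is to scan a syllable and make sure no vowel sign appears after a bindu. This is a minimal sketch; the set contents and the function name are my own assumptions, and only candrabindu and anusvara are treated as bindus.

```python
# Dependent vowel signs U+093E..U+094C (simplified range).
VOWEL_SIGNS = {chr(c) for c in range(0x093E, 0x094D)}
# Candrabindu and anusvara.
BINDUS = {"\u0901", "\u0902"}

def bindus_follow_vowel_signs(syllable):
    """Return False if any vowel sign comes after a bindu."""
    seen_bindu = False
    for ch in syllable:
        if ch in BINDUS:
            seen_bindu = True
        elif ch in VOWEL_SIGNS and seen_bindu:
            return False
    return True
```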

R11
If a dead consonant immediately precedes another dead consonant or a live consonant, then the first dead consonant may join the subsequent element to form a two-part conjunct.
                        JAd + NYAl ->  J.NYAn
                        ज्    +  ञ       ->  ज्ञ          

                        TTAd + TTHAl -> TT.TTHAn
                         ट्       + ठ          ->  ट्ठ        

R12
A conjunct ligature form can itself behave as a dead consonant and enter into further, more complex ligatures. A conjunct ligature form can also produce a half-form.
                       SAd + TAd + RAn -> SAd + T.RAn -> S.T.RAn
                        स्    +  त्     + र      ->  स्     +  त्र         -> स्त्र

R13
If a nominal consonant or conjunct ligature form precedes RAsub as a result of the application of rule R6, then the consonant or ligature form may join with RAsub to form a multi-part conjunct ligature.
                        KAn + RAsub -> K.RAn
                        क     + ्र          -> क्र        
 

R14
In some cases, other combining marks will combine with a base consonant, either attaching at a nonstandard location or changing shape. In minimal rendering there are only two such cases: live RA with the vowel sign U or with the vowel sign UU.
                       RAl + Uvs -> RUn
                        र    +  ु    -> रु         

R15
When the dependent vowel Ivs is used to override the inherent vowel of a syllable, it is always written to the extreme left of the orthographic syllable.
                       TAd + RAl + Ivs -> T.RAn + Ivs -> Ivs + T.RAn
                        त्     + र     + ि  ->  त्र        +  ि  -> त्रि

R16
The presence of an explicit virama blocks the reordering described in R15, and the dependent vowel is rendered after the rightmost such explicit virama.
                       TAd + ZWNJ + RAl + Ivs -> TAd + Ivs + RAl
                       त्     + ‌           + र     + ि  ->  त् रि
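Rules R15 and R16 together can be sketched as a small reordering function. It takes one orthographic syllable in logical (memory) order and returns the characters in the order they would be drawn. This is only an illustration under my own assumptions: `visual_order` is a hypothetical name, and an explicit virama is recognised only when it is followed by ZWNJ, not when a font simply lacks the conjunct.

```python
VOWEL_SIGN_I = "\u093F"
VIRAMA = "\u094D"
ZWNJ = "\u200C"

def visual_order(syllable):
    """Move the vowel sign I to the left of the syllable (R15), but not
    past the rightmost explicit virama, i.e. VIRAMA + ZWNJ (R16)."""
    pos = syllable.find(VOWEL_SIGN_I)
    if pos == -1:
        return syllable  # nothing to reorder
    before, after = syllable[:pos], syllable[pos + 1:]
    marker = VIRAMA + ZWNJ
    cut = before.rfind(marker)
    # Without an explicit virama the sign goes to the extreme left.
    split = 0 if cut == -1 else cut + len(marker)
    return before[:split] + VOWEL_SIGN_I + before[split:] + after
```

With TA + VIRAMA + RA + I the vowel sign moves to the front; with a ZWNJ after the virama it stops right after the ZWNJ, in front of RA. The stored text itself always stays in logical order; only the drawing order changes.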

Together, these sixteen rules describe how Devanagari should be rendered. I would also like to mention that the Lohit Devanagari font, currently being developed on GitHub [1], supports all of these rules and rendering principles.


1. https://github.com/pravins/lohit/tree/master/devanagari
