ISO/IEC JTC1/SC2/WG2 N3294
L2/07-234
2007-07-30
Universal Multiple-Octet Coded Character Set
International Organization for Standardization
Organisation Internationale de Normalisation
Международная организация по стандартизации
Doc Type: Working Group Document
Title: Preliminary proposal to encode the Book Pahlavi script in the BMP of the UCS
Source: Michael Everson, Roozbeh Pournader, and Desmond Durkin-Meisterernst
Status: Individual Contribution
Action: For consideration by JTC1/SC2/WG2 and UTC
Replaces: N2556
Date: 2007-07-30
1. Introduction. “Pahlavi” is a term used in two senses. As a term for a language, it means Zoroastrian or
Sasanian Middle Persian. As a term for writing systems, the word is used to describe the scripts used to
write religious and secular Sasanian Middle Persian and closely similar material, such as epigraphic
Parthian and Middle Persian. Three “Pahlavi” scripts are distinguished: Inscriptional Pahlavi, Psalter
Pahlavi, and Book Pahlavi. All of these derive from Imperial Aramaic.
Book Pahlavi is an alphabetic script with an incomplete representation of vowels. Its development from the
Imperial Aramaic script is marked by a progressive loss of differentiation between many letters. Thus the
Imperial Aramaic letters WAW, NUN, AYIN, and RESH coalesced into one simple vertical stroke, Book
Pahlavi WAW-NUN-AYIN-RESH. (In Inscriptional Pahlavi only WAW, NUN, and RESH coalesce into one letter.)
The Imperial Aramaic letters ALEPH and HET coalesced into Book Pahlavi ALEPH-HET; the Imperial Aramaic
letters GIMEL, DALETH, and YODH coalesced into Book Pahlavi GIMEL-DALETH-YODH; and (less importantly)
the Imperial Aramaic letters MEM and QOPH coalesced into Book Pahlavi MEM-QOPH.
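The merges just described amount to a many-to-one mapping from Imperial Aramaic letters to Book Pahlavi letters. The Python sketch below is illustrative only: it encodes that mapping with the Book Pahlavi letter names used in section 6, and it assumes that Imperial Aramaic letters not mentioned above simply keep their own name (a simplification that ignores letters such as TETH, which has no counterpart in the proposed repertoire). The function name is hypothetical.

```python
# Many-to-one merges stated in the text: Imperial Aramaic -> Book Pahlavi.
MERGED = {
    "ALEPH": "ALEPH-HET", "HET": "ALEPH-HET",
    "GIMEL": "GIMEL-DALETH-YODH", "DALETH": "GIMEL-DALETH-YODH",
    "YODH": "GIMEL-DALETH-YODH",
    "WAW": "WAW-NUN-AYIN-RESH", "NUN": "WAW-NUN-AYIN-RESH",
    "AYIN": "WAW-NUN-AYIN-RESH", "RESH": "WAW-NUN-AYIN-RESH",
    "MEM": "MEM-QOPH", "QOPH": "MEM-QOPH",
}


def to_book_pahlavi(aramaic_letters):
    """Map Imperial Aramaic letter names to Book Pahlavi letter names.

    Assumption for illustration: letters absent from MERGED keep their name.
    """
    return [MERGED.get(letter, letter) for letter in aramaic_letters]


if __name__ == "__main__":
    a = ["GIMEL", "WAW", "LAMEDH"]
    b = ["DALETH", "RESH", "LAMEDH"]
    print(to_book_pahlavi(a))
    # Two different Aramaic spellings yield one written form.
    print(to_book_pahlavi(a) == to_book_pahlavi(b))  # True
```

Here GIMEL WAW LAMEDH and DALETH RESH LAMEDH produce the same Book Pahlavi letter sequence, which is the loss of differentiation the text describes.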
In addition to the confusion caused by this lack of differentiation, Book Pahlavi made use of many
ligatures, whose readings are often so ambiguous that several readings may be possible for a given
word-form. This deterioration and the resulting ambiguity have had important consequences both for the
scribes who copied the texts and for modern editors trying to edit them. The scribes, unable to read with
certainty the manuscripts they were copying, introduced further confusion into already difficult texts.
Modern editors often cannot determine what the text originally contained. Each editor can only offer his
interpretation of what is written, and though in theory only one interpretation should be correct, no
editor can prove his interpretation to the exclusion of others on the basis of the written forms alone. Some
parts of Pahlavi literature are simply irrecoverable despite being transmitted in manuscripts.
The response to this ambiguity was the development of the Avestan script. Avestan used modified
letterforms to distinguish between signs which in Pahlavi had fallen together. Indeed, while Pahlavi
words are commonly used within Avestan texts, it was often the case that the scribes had no idea what the
values of the words were, because so many of the letters’ shapes were identical. The notion of an
archigrapheme has proved useful in proposing several of the Pahlavi extension characters: it is
often quite clear that the scribes themselves were writing “a bowl and an ear” even though no such letter
exists in Pahlavi. These archigraphemes are proposed to be encoded in order to enable scholars to
represent the same kinds of ambiguities which the scribes themselves were writing. This is not the same
as encoding palaeographical variants.
2. Orthography. Book Pahlavi is a mixed orthographic system, making extensive use of Aramaic
spellings to represent the equivalent Middle Persian words. This heterographic principle is similar to the
use of Chinese characters together with Japanese syllabic signs to write Japanese or the use of Sumerian
words in Akkadian texts. For instance, MLK’ writes Middle Persian šāh ‘king’ and YMLLWN writes Middle
Persian gōw ‘say!’ (heterograms are transliterated with capital letters). As Japanese does with its Han
characters, Book Pahlavi adds phonetic complements to the heterograms to indicate a specific ending:
MLK’’n for šāhān ‘kings’ and YMLLWNyt for gōwēd ‘he says’.
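The transliteration convention just described (capitals for heterograms, lower case for phonetic complements) can be split mechanically. The Python sketch below uses a heuristic assumed for this illustration, not a rule stated in the proposal: the heterogram is taken to be the leading run of capital letters plus at most one following aleph sign ’, and everything after it is the complement. The function name is hypothetical.

```python
import re

# Heuristic (assumption): capitals, optionally one aleph sign ’, then the rest.
_HETEROGRAM = re.compile(r"^([A-Z]+’?)(.*)$")


def split_heterogram(translit: str) -> tuple[str, str]:
    """Split a transliterated word into (heterogram, phonetic complement)."""
    m = _HETEROGRAM.match(translit)
    if m is None:
        # No leading capitals: treat the whole word as a phonetic spelling.
        return ("", translit)
    return (m.group(1), m.group(2))


if __name__ == "__main__":
    for word in ("MLK’", "MLK’’n", "YMLLWNyt", "l’d"):
        print(word, split_heterogram(word))
```

With this heuristic, MLK’’n splits into the heterogram MLK’ and the complement ’n, and YMLLWNyt into YMLLWN and yt; a historical spelling such as l’d has no heterogram at all.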
Book Pahlavi orthography is also archaizing in that it uses historical spellings. For example, l’d
spells rāy ‘for’, ’ywk spells ēk ‘one’, and y’wl spells jār ‘time’; this means that some letters seem
to shift values (d > y; y > j). In fact, the sound changes are simply not expressed by the letters, which write
the underlying and, in most cases, historically earlier forms.
3. Processing. Book Pahlavi is written right-to-left. Words are usually separated by spaces; sometimes
the Avestan word separator is used instead. The Book Pahlavi script has fully developed joining
behaviour. The table below shows the joining forms.
[To be supplied. This table may prove to be extremely complex. There is a real question as to
whether there is joining behaviour in Book Pahlavi as in other RTL scripts, or if it is all a
massive set of ligatures.]
4. Book Pahlavi punctuation. Avestan punctuation is used. There may also be some additional triple
dots; more research is needed. Two logograms are used: the abbreviation TAA ‘thus’ and the inverted
name of Ahreman, ‘the Evil Spirit’.
5. Book Pahlavi numbers. Book Pahlavi has its own numbers, which have right-to-left directionality.
Numbers are built up out of 1, 2, 3, 4, 10, 20, 40, 60, 80, and 100. The following is a list of numbers
attested in Book Pahlavi. The third column is displayed in visual order.
[To be supplied.]
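Pending the list of attested numbers, the additive principle stated above can be sketched in Python. This sketch assumes, without support from the text, that a number is written as a plain sum of the listed units, stored largest unit first in logical order, and that a greedy decomposition reflects scribal practice; all of this is hypothetical. NUMBER ONE THOUSAND, which appears in the property list in section 8, is left out because the text does not say how it combines with the other units.

```python
# Units named in section 5, largest first (1000 omitted; see lead-in).
UNITS = (100, 80, 60, 40, 20, 10, 4, 3, 2, 1)


def decompose(n: int) -> list[int]:
    """Hypothetical greedy additive decomposition of n into the units above.

    Returned in logical (storage) order, largest unit first; with right-to-left
    directionality the first element would be displayed rightmost.
    """
    if n < 1:
        raise ValueError("expected a positive number")
    parts = []
    for unit in UNITS:
        while n >= unit:
            parts.append(unit)
            n -= unit
    return parts


def value(parts: list[int]) -> int:
    """Read a sequence of number signs additively."""
    return sum(parts)


if __name__ == "__main__":
    print(decompose(99))  # [80, 10, 4, 4, 1]
```

Because every unit is summed, value(decompose(n)) always returns n; whether Book Pahlavi actually writes 99 this way, or uses multiplicative hundreds for larger values, must be checked against the attested forms once they are supplied.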
6. Names and ordering. The names used for the Book Pahlavi characters are based on their Imperial
Aramaic analogues. The order of the characters in the code charts is their alphabetical order. The
historical characters GIMEL, DALETH, and YODH fell together into a single character, named here
GIMEL-DALETH-YODH. In the same way, ALEPH and HET fell together as ALEPH-HET; WAW, NUN, AYIN, and
RESH as WAW-NUN-AYIN-RESH; and MEM and QOPH as MEM-QOPH.
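Because the code charts follow alphabetical order, plain code point comparison already gives alphabetical sorting. The Python sketch below illustrates this with the code points proposed in section 8; these are proposal values only, and no assumption is made that any released version of Unicode assigns them. The helper names are hypothetical, and the sketch assumes that no tailored collation is needed.

```python
# Letters in code-chart (= alphabetical) order, as listed in section 8.
LETTERS = [
    "ALEPH-HET", "BETH", "GIMEL-DALETH-YODH", "HE", "WAW-NUN-AYIN-RESH",
    "ZAYIN", "KAPH", "GHAPH", "LAMEDH", "LHAMEDH", "MEM-QOPH", "SAMEKH",
    "PE", "SADHE", "SHIN", "TAW",
]
BASE = 0x10BA0  # proposed code point of BOOK PAHLAVI LETTER ALEPH-HET
CP = {name: chr(BASE + i) for i, name in enumerate(LETTERS)}


def encode(word):
    """Turn a list of letter names into a string of proposed code points."""
    return "".join(CP[name] for name in word)


def sort_words(words):
    """Sort encoded words; Python compares strings by code point, which here
    coincides with alphabetical order, so no custom key is required."""
    return sorted(words)


if __name__ == "__main__":
    words = [encode(["BETH", "HE"]), encode(["ALEPH-HET", "TAW"]), encode(["BETH"])]
    print([ord(w[0]) for w in sort_words(words)])
```

A shorter word sorts before a longer word with the same prefix, so BETH precedes BETH HE, and both follow any word beginning with ALEPH-HET.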
7. Consequences for encoding. The consequences for encoding Pahlavi arise in particular from the
progressive loss of differentiation between letters, which means that much Pahlavi writing is
ambiguous and some of it simply unintelligible. Though the Zoroastrian tradition, the editor’s experience,
and reference to Manichaean Middle Persian can help resolve many difficulties, there are many cases where
the same original text is read differently by as many editors as have worked on it, and all the
readings are and remain plausible interpretations of what is written in the manuscript. There are also
cases where an editor despairs of reading a passage and has to reproduce the original shapes
without being able to venture a reading. Later editors may find the solution, but there will always be
unresolved and irresolvable cases.
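The scale of the problem can be made concrete by running the merges in reverse. The Python sketch below is a simplified illustration: it lists, for each merged Book Pahlavi letter, the Imperial Aramaic letters it may stand for (from section 1) and enumerates every letter-level reading of a written form. It deliberately ignores ligatures, heterograms, and linguistic plausibility, which in practice rule out most candidates; the function names are hypothetical.

```python
from itertools import product

# Imperial Aramaic letters that fell together in each Book Pahlavi letter.
SOURCES = {
    "ALEPH-HET": ("ALEPH", "HET"),
    "GIMEL-DALETH-YODH": ("GIMEL", "DALETH", "YODH"),
    "WAW-NUN-AYIN-RESH": ("WAW", "NUN", "AYIN", "RESH"),
    "MEM-QOPH": ("MEM", "QOPH"),
}


def _choices(letter):
    # Unmerged letters have exactly one possible source.
    return SOURCES.get(letter, (letter,))


def readings(book_pahlavi_word):
    """Iterate over every Aramaic letter sequence the written form could represent."""
    return product(*(_choices(letter) for letter in book_pahlavi_word))


def count_readings(book_pahlavi_word):
    """Number of letter-level readings: the product of the per-letter choices."""
    n = 1
    for letter in book_pahlavi_word:
        n *= len(_choices(letter))
    return n


if __name__ == "__main__":
    word = ["GIMEL-DALETH-YODH", "WAW-NUN-AYIN-RESH", "LAMEDH"]
    print(count_readings(word))  # 12
```

The count grows multiplicatively: a word written with four WAW-NUN-AYIN-RESH strokes already has 4 × 4 × 4 × 4 = 256 letter-level readings.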
8. Unicode Character Properties. Character properties are proposed here. With regard to normalization,
the five combining letters are assigned canonical combining class 230 (above) or 220 (below):
10BA0;BOOK PAHLAVI LETTER ALEPH-HET;Lo;0;R;;;;;N;;;;;
10BA1;BOOK PAHLAVI LETTER BETH;Lo;0;R;;;;;N;;;;;
10BA2;BOOK PAHLAVI LETTER GIMEL-DALETH-YODH;Lo;0;R;;;;;N;;;;;
10BA3;BOOK PAHLAVI LETTER HE;Lo;0;R;;;;;N;;;;;
10BA4;BOOK PAHLAVI LETTER WAW-NUN-AYIN-RESH;Lo;0;R;;;;;N;;;;;
10BA5;BOOK PAHLAVI LETTER ZAYIN;Lo;0;R;;;;;N;;;;;
10BA6;BOOK PAHLAVI LETTER KAPH;Lo;0;R;;;;;N;;;;;
10BA7;BOOK PAHLAVI LETTER GHAPH;Lo;0;R;;;;;N;;;;;
10BA8;BOOK PAHLAVI LETTER LAMEDH;Lo;0;R;;;;;N;;;;;
10BA9;BOOK PAHLAVI LETTER LHAMEDH;Lo;0;R;;;;;N;;;;;
10BAA;BOOK PAHLAVI LETTER MEM-QOPH;Lo;0;R;;;;;N;;;;;
10BAB;BOOK PAHLAVI LETTER SAMEKH;Lo;0;R;;;;;N;;;;;
10BAC;BOOK PAHLAVI LETTER PE;Lo;0;R;;;;;N;;;;;
10BAD;BOOK PAHLAVI LETTER SADHE;Lo;0;R;;;;;N;;;;;
10BAE;BOOK PAHLAVI LETTER SHIN;Lo;0;R;;;;;N;;;;;
10BAF;BOOK PAHLAVI LETTER TAW;Lo;0;R;;;;;N;;;;;
10BB0;BOOK PAHLAVI ARCHIGRAPHEME EAR;Lo;0;R;;;;;N;;;;;
10BB1;BOOK PAHLAVI ARCHIGRAPHEME ELBOW;Lo;0;R;;;;;N;;;;;
10BB2;BOOK PAHLAVI ARCHIGRAPHEME BELLY;Lo;0;R;;;;;N;;;;;
10BBA;BOOK PAHLAVI COMBINING GIMEL;Mn;230;NSM;;;;;N;;;;;
10BBB;BOOK PAHLAVI COMBINING DALETH;Mn;230;NSM;;;;;N;;;;;
10BBC;BOOK PAHLAVI COMBINING YODH;Mn;220;NSM;;;;;N;;;;;
10BBD;BOOK PAHLAVI COMBINING SAMEKH;Mn;220;NSM;;;;;N;;;;;
10BBE;BOOK PAHLAVI COMBINING SHIN;Mn;230;NSM;;;;;N;;;;;
10BBF;BOOK PAHLAVI KASHIDA;Lm;0;AL;;;;;N;;;;;
10BC0;BOOK PAHLAVI ABBREVIATION TAA;Lo;0;R;;;;;N;;;;;
10BC1;BOOK PAHLAVI LOGOGRAM TURNED AHREMAN;Lo;0;R;;;;;N;;;;;
10BC5;BOOK PAHLAVI NUMBER ONE;No;0;R;;;;1;N;;;;;
10BC6;BOOK PAHLAVI NUMBER TWO;No;0;R;;;;2;N;;;;;
10BC7;BOOK PAHLAVI NUMBER THREE;No;0;R;;;;3;N;;;;;
10BC8;BOOK PAHLAVI NUMBER FOUR;No;0;R;;;;4;N;;;;;
10BC9;BOOK PAHLAVI NUMBER TEN;No;0;R;;;;10;N;;;;;
10BCA;BOOK PAHLAVI NUMBER TWENTY;No;0;R;;;;20;N;;;;;
10BCB;BOOK PAHLAVI NUMBER FORTY;No;0;R;;;;40;N;;;;;
10BCC;BOOK PAHLAVI NUMBER SIXTY;No;0;R;;;;60;N;;;;;
10BCD;BOOK PAHLAVI NUMBER EIGHTY;No;0;R;;;;80;N;;;;;
10BCE;BOOK PAHLAVI NUMBER ONE HUNDRED;No;0;R;;;;100;N;;;;;
10BCF;BOOK PAHLAVI NUMBER ONE THOUSAND;No;0;R;;;;1000;N;;;;;
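The lines above follow the semicolon-separated layout of UnicodeData.txt: code point, name, general category, canonical combining class, bidirectional class, and, in the ninth field, the numeric value. Since these are proposed code points, a library such as Python's unicodedata cannot be relied on to know them, so the sketch below parses a few of the proposed lines itself and applies the canonical ordering step of normalization: within a run of combining marks, marks are stably sorted by combining class, so COMBINING YODH (220) is placed before COMBINING GIMEL (230). The helper names are illustrative, not part of the proposal.

```python
# A few entries copied from the proposed property list above.
PROPOSED = """\
10BA1;BOOK PAHLAVI LETTER BETH;Lo;0;R;;;;;N;;;;;
10BBA;BOOK PAHLAVI COMBINING GIMEL;Mn;230;NSM;;;;;N;;;;;
10BBC;BOOK PAHLAVI COMBINING YODH;Mn;220;NSM;;;;;N;;;;;
10BCA;BOOK PAHLAVI NUMBER TWENTY;No;0;R;;;;20;N;;;;;
"""


def parse(text):
    """Map code point -> (name, general category, ccc, bidi class, numeric value)."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        numeric = int(fields[8]) if fields[8] else None
        table[int(fields[0], 16)] = (
            fields[1], fields[2], int(fields[3]), fields[4], numeric)
    return table


def combining_class(ch, table):
    """Canonical combining class of ch; characters not in the table count as 0."""
    entry = table.get(ord(ch))
    return entry[2] if entry else 0


def canonical_order(s, table):
    """Stably sort each run of non-zero-class characters by combining class."""
    def key(c):
        return combining_class(c, table)

    out, run = [], []
    for ch in s:
        if key(ch) == 0:
            out.extend(sorted(run, key=key))  # flush pending marks before a base
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=key))  # flush marks at the end of the string
    return "".join(out)


if __name__ == "__main__":
    table = parse(PROPOSED)
    beth, gimel, yodh = chr(0x10BA1), chr(0x10BBA), chr(0x10BBC)
    print(canonical_order(beth + gimel + yodh, table) == beth + yodh + gimel)  # True
```

Because sorted() is stable, two marks with the same class (for example COMBINING GIMEL and COMBINING SHIN, both 230) keep their original relative order, as canonical ordering requires.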
9. Bibliography.
Akbarzādeh, Dāriyūš. 2002 (1381 AP). Katibe-hā-ye Pahlavi-ye aškāni (Pārti) = Parthian inscriptions.
Vol. II. Tehran: Pazineh Press. ISBN 964-5722-74-8.
Ballhorn, Friedrich. 1864. Alphabete orientalischer und occidentalischer Sprachen. Neunte vermehrte
Auflage. Leipzig: F. A. Brockhaus.
de Harlez, C. 1880. Manuel du pehlevi des livres religieux et historiques de la perse. Grammaire,
anthologie lexique avec des notes, un fac-simile de manuscrit les alphabets et un spécimen des légendes
des sceaux et monnaies. Paris: Maisonneuve et Cie.
Faulmann, Carl. 1990 (1880). Das Buch der Schrift. Frankfurt am Main: Eichborn. ISBN 3-8218-1720-8.
Geldner, Karl F. 1880. Avesta: the sacred books of the Parsis. Stuttgart: W. Kohlhammer. Reprinted in
2003 with an introduction in Persian by Dr Jaleh Amouzgar.
Kōno Rokurō, Chino Eiichi, & Nishida Tatsuo. 2001. The Sanseido Encyclopaedia of Linguistics. Volume
7: Scripts and Writing Systems of the World [Gengogaku dai ziten (bekkan) sekai mozi ziten]. Tokyo:
Sanseido Press. ISBN 4-385-15177-6.
MacKenzie, D. N. 1971. A concise Pahlavi dictionary. London: Oxford University Press.
Nyberg, Henrik Samuel. 1964. A manual of Pahlavi. Wiesbaden: Otto Harrassowitz. Reprinted 2003,
Tehran: Asatir. ISBN 964-331-131-7, 964-331-132-5.
Reichsdruckerei. 1951. Alphabete und Schriftzeichen des Morgen- und Abendlandes, zum allgemeinen
Gebrauch mit besonderer Berücksichtigung des Buchgewerbes. Unter Mitwirkung von Fachgelehrten
zusammengestellt in der Reichsdruckerei. Berlin: Staatsdruckerei.
Skjærvø, P. Oktor. 1996. “Aramaic scripts for Iranian languages” in The World’s Writing Systems, ed.
Peter T. Daniels & William Bright. New York; Oxford: Oxford University Press. ISBN 0-19-507993-0.
10. Acknowledgements
This project was made possible in part by a grant from the U.S. National Endowment for the Humanities,
which funded the Universal Scripts Project (part of the Script Encoding Initiative at UC Berkeley). Any
views, findings, conclusions or recommendations expressed in this publication do not necessarily reflect
those of the National Endowment for the Humanities.

Figures
Figure 1. Table of Iranian alphabets, from Taylor 1883. The Arsacidan columns (III, IV) show Parthian
script, and the Sassanian columns (V, VI) show Inscriptional Pahlavi script. The Parsi column (VII)
shows Book Pahlavi. Column I shows Imperial Aramaic; column II shows Palmyrene; column VIII
shows Brahmi; column IX shows Armenian; and column X shows Georgian.
This table is given for historical interest. It stems from a period when the inscriptions were known but
could not yet be properly read, which did not happen until the 1920s. Taylor’s table is therefore
inaccurate: under Parthian DALETH he has the letter TETH, the form of LAMEDH is wrong, and he gives no
SADHE. Under Middle Persian, TETH is missing and TAW has the wrong shape.

Figure 2. Table of Iranian alphabets, from Nyberg 1964.
Book Pahlavi is given toward the right of the table. The Parthian inscriptions column refers to Parthian,
the Persian inscriptions column refers to Inscriptional Pahlavi. Psalter Pahlavi is also shown. Nyberg’s
table is slightly idiosyncratic and the shape of Parthian ALEPH is wrong.

Figure 3. Table of Iranian alphabets, from Skjærvø 1996.

Figure 4. Table showing Imperial Aramaic, Inscriptional Parthian, Inscriptional Pahlavi, and Book
Pahlavi, from MacKenzie 1971.

Figure 5. Table of Iranian alphabets, showing Psalter Pahlavi on the left,
Book Pahlavi in the centre, and Inscriptional Pahlavi on the right, from Akbarzadeh 2002.

Figure 6. Book Pahlavi alphabet from the Reichsdruckerei 1951.

Figure 7a. Description of Book Pahlavi alphabet from a German source.

Figure 7b. Description of Book Pahlavi alphabet from a German source.

Figure 7c. Description of Book Pahlavi alphabet from a German source.

Figure 8a. Description of Book Pahlavi from de Harlez 1880.

Figure 8b. Description of Book Pahlavi from de Harlez 1880.

Figure 8c. Description of Book Pahlavi from de Harlez 1880.

Figure 8d. Description of Book Pahlavi from de Harlez 1880.

Figure 8e. Description of Book Pahlavi from de Harlez 1880.

Figure 8f. Description of Book Pahlavi from de Harlez 1880.

Figure 8g. Description of Book Pahlavi from de Harlez 1880.

Figure 8h. Description of Book Pahlavi from de Harlez 1880.

Proposal for the Universal Character Set
Michael Everson 2007-07-30
Row 10B: BOOK PAHLAVI
hex  Name
40  BOOK PAHLAVI LETTER ALEPH-HET
41  BOOK PAHLAVI LETTER BETH
42  BOOK PAHLAVI LETTER GIMEL-DALETH-YODH
43  BOOK PAHLAVI LETTER HE
44  BOOK PAHLAVI LETTER WAW-NUN-AYIN-RESH
45  BOOK PAHLAVI LETTER ZAYIN
46  BOOK PAHLAVI LETTER KAPH
47  BOOK PAHLAVI LETTER GHAPH
48  BOOK PAHLAVI LETTER LAMEDH
49  BOOK PAHLAVI LETTER LHAMEDH
4A  BOOK PAHLAVI LETTER MEM-QOPH
4B  BOOK PAHLAVI LETTER SAMEKH
4C  BOOK PAHLAVI LETTER PE
4D  BOOK PAHLAVI LETTER SADHE
4E  BOOK PAHLAVI LETTER SHIN
4F  BOOK PAHLAVI LETTER TAW
50  BOOK PAHLAVI ARCHIGRAPHEME EAR
51  BOOK PAHLAVI ARCHIGRAPHEME ELBOW
52  BOOK PAHLAVI ARCHIGRAPHEME BELLY
53–59  (These positions shall not be used)
5A  BOOK PAHLAVI COMBINING GIMEL
5B  BOOK PAHLAVI COMBINING DALETH
5C  BOOK PAHLAVI COMBINING YODH
5D  BOOK PAHLAVI COMBINING SAMEKH
5E  BOOK PAHLAVI COMBINING SHIN
5F  BOOK PAHLAVI KASHIDA
60  BOOK PAHLAVI ABBREVIATION TAA
61  BOOK PAHLAVI LOGOGRAM TURNED AHREMAN
62–64  (These positions shall not be used)
65  BOOK PAHLAVI NUMBER ONE
66  BOOK PAHLAVI NUMBER TWO
67  BOOK PAHLAVI NUMBER THREE
68  BOOK PAHLAVI NUMBER FOUR
69  BOOK PAHLAVI NUMBER TEN
6A  BOOK PAHLAVI NUMBER TWENTY
6B  BOOK PAHLAVI NUMBER FORTY
6C  BOOK PAHLAVI NUMBER SIXTY
6D  BOOK PAHLAVI NUMBER EIGHTY
6E  BOOK PAHLAVI NUMBER ONE HUNDRED
6F  BOOK PAHLAVI NUMBER ONE THOUSAND

A. Administrative
1. Title
Proposal to encode the Avestan script in the BMP of the UCS
2. Requester’s name
Michael Everson and Roozbeh Pournader
3. Requester type (Member body/Liaison/Individual contribution)
Individual contribution.
4. Submission date
2007-07-30
5. Requester’s reference (if applicable)
6. Choose one of the following:
6a. This is a complete proposal
Yes.
6b. More information will be provided later
No.
B. Technical – General
1. Choose one of the following:
1a. This proposal is for a new script (set of characters)
Yes.
1b. Proposed name of script
Avestan.
1c. The proposal is for addition of character(s) to an existing block
No.
1d. Name of the existing block
2. Number of characters in proposal
64.
3. Proposed category (A-Contemporary; B.1-Specialized (small collection); B.2-Specialized (large collection); C-Major extinct; D-Attested
extinct; E-Minor extinct; F-Archaic Hieroglyphic or Ideographic; G-Obscure or questionable usage symbols)
Category C.
4a. Is a repertoire including character names provided?
Yes.
4b. If YES, are the names in accordance with the “character naming guidelines” in Annex L of P&P document?
Yes.
4c. Are the character shapes attached in a legible form suitable for review?
Yes.
5a. Who will provide the appropriate computerized font (ordered preference: True Type, or PostScript format) for publishing the standard?
Michael Everson.
5b. If available now, identify source(s) for the font (include address, e-mail, ftp-site, etc.) and indicate the tools used:
Michael Everson, Fontographer.
6a. Are references (to other character sets, dictionaries, descriptive texts etc.) provided?
Yes.
6b. Are published examples of use (such as samples from newspapers, magazines, or other sources) of proposed characters attached?
Yes.
7. Does the proposal address other aspects of character data processing (if applicable) such as input, presentation, sorting, searching,
indexing, transliteration etc. (if yes please enclose information)?
Yes.
8. Submitters are invited to provide any additional information about Properties of the proposed Character(s) or Script that will assist in
correct understanding of and correct linguistic processing of the proposed character(s) or script. Examples of such properties are: Casing
information, Numeric information, Currency information, Display behaviour information such as line breaks, widths etc., Combining
behaviour, Spacing behaviour, Directional behaviour, Default Collation behaviour, relevance in Mark Up contexts, Compatibility
equivalence and other Unicode normalization related information. See the Unicode standard at http://www.unicode.org for such information
on other scripts. Also see Unicode Character Database http://www.unicode.org/Public/UNIDATA/UnicodeCharacterDatabase.html and
associated Unicode Technical Reports for information needed for consideration by the Unicode Technical Committee for inclusion in the
Unicode Standard.
See above.
C. Technical – Justification
1. Has this proposal for addition of character(s) been submitted before? If YES, explain.
Yes. See N2556, N1684.
2a. Has contact been made to members of the user community (for example: National Body, user groups of the script or characters, other
experts, etc.)?
Yes.
2b. If YES, with whom?
Hassan Rezai Baghbidi, Hossein Masoumi Hamedani (Iranian Academy of Sciences), Jost Gippert, Desmond Durkin-Meisterernst,
Günter Schweiger
2c. If YES, available relevant documents
http://titus.fkidg1.uni-frankfurt.de/unicode/iranian/3tagung.htm
3. Information on the user community for the proposed characters (for example: size, demographics, information technology use, or
publishing use) is included?
Zoroastrians, Iranianists and other scholars.
4a. The context of use for the proposed characters (type of use; common or rare)
Used liturgically and by scholars.
4b. Reference
5a. Are the proposed characters in current use by the user community?
Yes.
5b. If YES, where?
Religious and scholarly publications.
6a. After giving due considerations to the principles in the P&P document must the proposed characters be entirely in the BMP?
Yes.
6b. If YES, is a rationale provided?
Yes.
6c. If YES, reference
Accordance with the Roadmap. Avestan is used in modern Zoroastrian religion.
7. Should the proposed characters be kept together in a contiguous range (rather than being scattered)?
No.
8a. Can any of the proposed characters be considered a presentation form of an existing character or character sequence?
No.
8b. If YES, is a rationale for its inclusion provided?
8c. If YES, reference
9a. Can any of the proposed characters be encoded using a composed character sequence of either existing characters or other proposed
characters?
No.
9b. If YES, is a rationale for its inclusion provided?
9c. If YES, reference
10a. Can any of the proposed character(s) be considered to be similar (in appearance or function) to an existing character?
No.
10b. If YES, is a rationale for its inclusion provided?
10c. If YES, reference
11a. Does the proposal include use of combining characters and/or use of composite sequences (see clauses 4.12 and 4.14 in ISO/IEC
10646-1: 2000)?
No.
11b. If YES, is a rationale for such use provided?
11c. If YES, reference
11d. Is a list of composite sequences and their corresponding glyph images (graphic symbols) provided?
No.
11e. If YES, reference
12a. Does the proposal contain characters with any special properties such as control function or similar semantics?
No.
12b. If YES, describe in detail (include attachment if necessary)
13a. Does the proposal contain any Ideographic compatibility character(s)?
No.
13b. If YES, is the equivalent corresponding unified ideographic character(s) identified?