CCL Home Page
Up Directory CCL koi7-8
# Jan Labanowski, jkl@ccl.net, Dec. 30, 1992
# File koi7_8.rus

# This is a transliteration data file for converting from KOI7 to
# KOI8 (RELCOM-GOST 19768-74). The KOI7 character codes for Russian letters
# overlap with Latin letters. To mark what is in Russian and what in English,
# The SHIFT-OUT and SHIFT-IN characters are used: The SHIFT-OUT and SHIFT-IN 
# switch between Latin and Russian character set. The SHIFT-OUT switches
# to Russian letters. If the sequence of Russian characters does not start
# with the SHIFT-OUT character, it will be treated as English text!
# The SHIFT-OUT character is CTRL-N (\14 = \0x0E = \0o16).
# The SHIFT-IN character is CTRL-O (\15 = \0x0F = \0o17). It switches back to
# Latin characters. 
# If the SHIFT-OUT character is not present, whole file is assumed to be
# written in Latin alphabet.
# On the practical side: The KOI7 characters are frequently obtained
# from KOI8 character set as a result of transmission through the network.
# In most cases, the electronic-mail strips the 8th bit of KOI8 character
# set changing it to KOI7. The problem is that in this case, there is no
# SHIFT-IN and SHIFT-OUT codes to signal which characters are Latin and
# which are Russian. In this case, buy using the editor, YOU HAVE TO ENCLOSE
# RUSSIAN TEXT inside the appropriate SHIFT-OUT and SHIFT-IN sequence.
# To obtain a true KOI7 file, these should be CTRL-N and CTRL-O, respectively.
# However, It is sometimes difficult to obtain this control characters within
# an editor. In this case, you may use your own character, but they should
# not apear elsewhere in the text. Unfortunately, most "good" characters
# are taken. I think that it is better to use a two-character sequence in
# this situation. You are free to use your own, but I would suggest that you
# stick with {{ (as SHIFT-OUT) and }} as SHIFT-IN (they correspoind to:
# sh-sh and shch-shch in KOI7). Of course, if they appear in your KOI7
# text you need to use something else. To use these characters, you need to
# make following changes in the section of this file where input SHIFT-OUT/IN
# sequences are read in:
# 1) uncomment (i.e., delete # character in the first column):
#  #   "{{"    ""    ""    ""    "}}"      ""         # Russian letters {{...}}
# 2) comment (i.e., put # in the first column)
#  "\0x0E"  ""    ""    ""    "\0x0F"   ""         # Russian letters ^N...^O


# To be used with translit.c program by Jan Labanowski

   1            file version number

   "    "      # string delimiters
   [    ]      # list delimites
   {    }      # regular expression delimiters


#starting sequence
""


#ending sequence
""

   2     # number of input SHIFT sequences, only one set of input characters
   ""      ""    ""    ""    ""        ""         # Latin characters
   "\0x0E"  ""    ""    ""    "\0x0F"   ""         # Russian letters ^N...^O
#   "{{"     ""    ""    ""    "}}"      ""         # Russian letters {{...}}

   0     # number of output SHIFT sequences, only one set of output characters
  
# conversion table
#   ASCII characters (set 1) no-change
#   Russian letters have to be entered explicitly (we could use a simple
#   range like:
#   2  [#$"@-~]  0   [\0xA3\0xB3\0xFF\0xC0-\0xFE]
#  but then we would miss the nice table for the humans).

# inp_set_numb  inp_seq        out_set_numb  out_seq
      1    [\0x21-\0x7F]          0  [\0x21-\0x7F]     #Pass  ASCII, no change

# Leave the " as a quote if it is at the beginning or end of the word
# Change it to hard sign only if it is inside the word
2 {([][\0x7d{A-Za-z\|~@'/])\0x22([][\0x7d{A-Za-z\|~@'/])}
                                 -1        {\1\0xFF\2} # hard sign
      2           "#"             0          "\0xA3"   #small yo
      2           "$"             0          "\0xB3"   #capital YO
      2           "a"             0          "\0xE1"   #capital  A
      2           "b"             0          "\0xE2"   #capital  Be
      2           "w"             0          "\0xF7"   #capital  Ve
      2           "g"             0          "\0xE7"   #capital  Ghe
      2           "d"             0          "\0xE4"   #capital  De
      2           "e"             0          "\0xE5"   #capital  Ie
      2           "v"             0          "\0xF6"   #capital  Zhe
      2           "z"             0          "\0xFA"   #capital  Ze
      2           "i"             0          "\0xE9"   #capital  I
      2           "j"             0          "\0xEA"   #capital  short I
      2           "k"             0          "\0xEB"   #capital  Ka
      2           "l"             0          "\0xEC"   #capital  El
      2           "m"             0          "\0xED"   #capital  Em
      2           "n"             0          "\0xEE"   #capital  En
      2           "o"             0          "\0xEF"   #capital  O
      2           "p"             0          "\0xF0"   #capital  Pe
      2           "r"             0          "\0xF2"   #capital  Er
      2           "s"             0          "\0xF3"   #capital  Es
      2           "t"             0          "\0xF4"   #capital  Te
      2           "u"             0          "\0xF5"   #capital  U
      2           "f"             0          "\0xE6"   #capital  Ef
      2           "h"             0          "\0xE8"   #capital  Kha
      2           "c"             0          "\0xE3"   #capital  Tse
      2           "~"             0          "\0xFE"   #capital  Che
      2           "{"             0          "\0xFB"   #capital  Sha
      2           "}"             0          "\0xFD"   #capital  Shcha
      2           "y"             0          "\0xF9"   #capital  Y (Iery)
      2           "x"             0          "\0xF8"   #capit soft sign(Ierik)
      2           "|"             0          "\0xFC"   #capit reverse rounded E
      2           "`"             0          "\0xE0"   #capital  Yu
      2           "q"             0          "\0xF1"   #capital  Ya
      2           "A"             0          "\0xC1"   #small  a
      2           "B"             0          "\0xC2"   #small  be
      2           "W"             0          "\0xD7"   #small  ve
      2           "G"             0          "\0xC7"   #small  ghe
      2           "D"             0          "\0xC4"   #small  de
      2           "E"             0          "\0xC5"   #small  ie
      2           "V"             0          "\0xD6"   #small  zhe
      2           "Z"             0          "\0xDA"   #small  z
      2           "I"             0          "\0xC9"   #small  i
      2           "J"             0          "\0xCA"   #small  short i
      2           "K"             0          "\0xCB"   #small  ka
      2           "L"             0          "\0xCC"   #small  el
      2           "M"             0          "\0xCD"   #small  em
      2           "N"             0          "\0xCE"   #small  en
      2           "O"             0          "\0xCF"   #small  o
      2           "P"             0          "\0xD0"   #small  pe
      2           "R"             0          "\0xD2"   #small  er
      2           "S"             0          "\0xD3"   #small  es
      2           "T"             0          "\0xD4"   #small  te
      2           "U"             0          "\0xD5"   #small  u
      2           "F"             0          "\0xC6"   #small  ef
      2           "H"             0          "\0xC8"   #small  kha
      2           "C"             0          "\0xC3"   #small  tse
      2           "^"             0          "\0xDE"   #small  che
      2           "["             0          "\0xDB"   #small  sha
      2           "]"             0          "\0xDD"   #small  shcha
      2           "_"             0          "\0xDF"   #small hard sign (ier)
      2           "Y"             0          "\0xD9"   #small  y (iery)
      2           "X"             0          "\0xD8"   #small soft sign (ierik)
      2           "\"             0          "\0xDC"   #small  rev rounded e
      2           "@"             0          "\0xC0"   #small  yu
      2           "Q"             0          "\0xD1"   #small  ya
Modified: Wed Apr 13 16:00:00 1994 GMT
Page accessed 4593 times since Sat Apr 17 22:00:55 1999 GMT