* HeloWrld
* A1
X MRCM      - instruction not yet implemented
* BootStra
X TAPEBOOT  - MapID2 is still wrong
X MCS       - bugs in emulation of opcode
* CHN1      - skip sizes need setting (FCB? With chain map?)
X S4        - Needs ZA(opcode=?) Maybe lozenge should be replaced by ")"?
X SR        - Error in SBR(H)
TAPETEST    - Error in movechars? (not using Baddress)
A2          - Invalid D-char "Z" on br instr.
SWITCHES    - Not working properly, switches seem good
ZS          - Instr not implemented
ZA          - Instr not implemented?
INDEX       - Not working (2 oprnd use of "H")
B           - Not fully working
S2          - Not fully working
TAPET2      - Not working
CC          - Printer skip sizes
MCE         - Instr not implemented
MA          - Not working
TIME        - loops? (add storing #0)
IOCHECK     - Invalid D-char "^" on br instr.
C           - works or does nothing?
SAMPLE      - needs MCS instr.
DEBUG       - same prog as SAMPLE?
TAPET1      - Not working
MLZS        - Not fully working
TAPET4      - Not working
A3          - Not fully working
BWZ         - Looks OK (lozenge vs ")")
S3          - Not working
CDLOAD      - Weird needs # " % _ \ in mapping4 then invop $20
CDDUMP      - as above
SSW0003B
TBB1C11B    - has tab char in file
SPS1
SPS2
TestSPS     - As before mapping4 - added # @ %


Character codes are a total screw-up, both originally, when IBM had several
character sets mapping onto the same 64 codes, and since then, when they have
been mapped into ASCII in various ways. All known variations are shown below.
"$XX=NNN=g" gives the hex (XX) and decimal (NNN) values of a graphics character g.
These just make a bad situation worse, because they show up differently
depending on the font in use. In an effort to do my bit to eliminate them from
the face of the earth for all time, they are not accepted. They must be edited
out of the card file before they can be read by the emulator.
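
A rough sketch of how a card file could be checked for such characters before
it is given to the emulator. This is not the emulator's own code: the function
name is hypothetical, and treating every byte value of $7F or above as a
"graphics character" is an assumption based on the values listed in the table.

```python
def find_graphics_bytes(data: bytes):
    """Return (line, column, value) for every byte >= 0x7F (1-based positions).

    Assumption: anything from $7F upwards counts as a graphics character.
    """
    hits = []
    for line_no, line in enumerate(data.splitlines(), start=1):
        for col_no, value in enumerate(line, start=1):
            if value >= 0x7F:
                hits.append((line_no, col_no, value))
    return hits


# Made-up two-card deck containing $D8 and $7F.
deck = b",008015\xd8\nB001\x7f\n"
for line_no, col_no, value in find_graphics_bytes(deck):
    print(f"line {line_no}, column {col_no}: ${value:02X}={value}")
```

Each reported position is a character that would have to be edited out by hand.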

Hex Oct Char  Alt1 Alt2 Alt3       Notes
 00 00  Blank                      (Blank)
 01 01  1
 02 02  2
 03 03  3
 04 04  4
 05 05  5
 06 06  6
 07 07  7
 08 10  8
 09 11  9
 0A 12  0
 0B 13  #     =
 0C 14  @     '         $D8=216=
 0D 15  :
 0E 16  >
 0F 17  (     t         $FB=251=  (Tape Mark)
 10 20  ^     c    b    ?          (Cents or b with a stroke)
 11 21  /
 12 22  S
 13 23  T
 14 24  U
 15 25  V
 16 26  W
 17 27  X
 18 30  Y
 19 31  Z
 1A 32  '     r    |    $D8=216=  (Record Mark)
 1B 33  ,
 1C 34  %     (
 1D 35  =     ^    ~
 1E 36  \     '
 1F 37  +
 20 40  -
 21 41  J
 22 42  K
 23 43  L
 24 44  M
 25 45  N
 26 46  O
 27 47  P
 28 50  Q
 29 51  R
 2A 52  !     -
 2B 53  $
 2C 54  *
 2D 55  ]     )         $D8=216=
 2E 56  ;
 2F 57  _     d         $7F=127=  (Delta)
 30 60  &     +    ?    $D7=215=
 31 61  A
 32 62  B
 33 63  C
 34 64  D
 35 65  E
 36 66  F
 37 67  G
 38 70  H
 39 71  I
 3A 72  ?     &
 3B 73  .
 3C 74  )     o                    (Lozenge)
 3D 75  [     (
 3E 76  <
 3F 77  "     g    }    $CE=206=  (Group Mark)

Most codes have a definite, unambiguous ASCII mapping. Those that don't are
shown again here:-

Hex Oct Char  Alt1 Alt2 Alt3       Notes
 0B 13  #     =
 0C 14  @     '         $D8=216=
 0F 17  (     t         $FB=251=  (Tape Mark)
 10 20  ^     c    b    ?          (Cents or b with a stroke)
 1A 32  '     r    |    $D8=216=  (Record Mark)
 1C 34  %     (
 1D 35  =     ^    ~
 1E 36  \     '
 1F 37  +
 2A 52  !     -
 2D 55  ]     )         $D8=216=
 2F 57  _     d         $7F=127=  (Delta)
 30 60  &     +    ?    $D7=215=
 3A 72  ?     &
 3C 74  )     o                    (Lozenge)
 3D 75  [     (
 3E 76  <
 3F 77  "     g    }    $CE=206=  (Group Mark)

As can be seen, a code may be represented by more than one character, and a
character may represent more than one code. All of the above fall into the
first case. Problems arise from the second case, because a character is then
ambiguous in the code that it represents.

The unambiguous characters (that appear only once above) are:-

Hex Oct Char Alt1 Alt2
 0B 13  #
 0C 14  @
 0F 17  t
 10 20  b    c
 1A 32  r    |
 1C 34  %
 1D 35  ~
 1E 36  \
 2A 52  !    -
 2D 55  ]
 2F 57  d    _
 3C 74  o
 3D 75  [
 3E 76  <
 3F 77  g    "    }

Any of the characters listed will be accepted as representing the code,
except "}", which is rejected (two alternatives are enough).

The ambiguous characters (that appear more than once above) are:-

 Char Alt1 Alt2 Alt3 (Hex)
 =    0B   1D
 '    0C   1A   1E
 ^    10   1D
 (    0F   1C   3D
 )    2D   3C
 ?    10   30   3A
 &    30   3A
 +    1F   30

These ambiguities exist either because of ambiguity in the original IBM code
mappings, or because no ASCII character corresponds to the original IBM
character. Rather than force these to be edited into yet another standard,
they are mapped onto (yet another) standard as they are loaded from disk.
This is accomplished via a mapping file, which is selected by a control file.
The mapping file specifies which character in the standard set is to be used in
place of the character read from the file. The card file itself is not altered,
just the copy of it that is loaded into the emulator. The control file is used
to specify which mapping file is to be used for a particular card file.
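
A minimal sketch of the load-time substitution. The on-disk formats of the
mapping and control files are not described here, so the mapping is shown as a
plain dictionary. The entries in EXAMPLE_MAPPING and the name load_card are
assumptions made for this illustration, not taken from any real mapping file.

```python
# Hypothetical mapping for one particular card file:
# character found in the file -> character in the standard set.
EXAMPLE_MAPPING = {
    "=": "#",   # this file is assumed to use "=" for code 0B
    "'": "@",   # ... "'" for code 0C
    "(": "%",   # ... "(" for code 1C
    ")": "o",   # ... ")" for code 3C (lozenge)
}


def load_card(line: str, mapping: dict) -> str:
    """Return the copy of a card as the emulator would see it.

    The text read from disk is left untouched; only the returned copy
    has the substitutions applied.
    """
    return line.translate(str.maketrans(mapping))


card = "A=B'(X)"
print(load_card(card, EXAMPLE_MAPPING))   # substituted copy
print(card)                               # original, unchanged
```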

The standard character set is shown below. It contains all the standard,
unambiguous IBM characters, plus one alternative chosen from the unambiguous
ASCII extensions for each code that has them. It does not contain any of the
ambiguous characters; the codes they stand for are all represented by a lower
case letter. This is to force any card file that contains an ambiguous
character to be clarified by a mapping file. The alternates of the unambiguous
characters (c | - _ ") must also be mapped onto a standard character by the
mapping file.

Hopefully this mechanism will stop any card file from being interpreted
incorrectly without any indication that this possibility exists.

//     0 1 2 3 4 5 6 7 8 9 A B C D E F
//  0    1 2 3 4 5 6 7 8 9 0 # @ : > t
//  1  b / S T U V W X Y Z r , % ~ \ p
//  2  - J K L M N O P Q R ! $ * ] ; d
//  3  a A B C D E F G H I q . o [ < g

Const MapChrs: String=
  ' 1234567890#@:>tb/STUVWXYZr,%~\p-JKLMNOPQR!$*];daABCDEFGHIq.o[<g';
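
A sketch of how the constant can be used to convert between standard
characters and 6-bit codes. It is written in Python rather than Pascal, so the
string is 0-based and a character's index is its code directly (a 1-based
Pascal string would need the index adjusted by one). The function names are
made up for this illustration.

```python
# Same 64 characters as MapChrs, in code order 00..3F.
MAP_CHRS = (" 1234567890#@:>tb/STUVWXYZr,%~\\p"
            "-JKLMNOPQR!$*];daABCDEFGHIq.o[<g")


def char_to_code(ch: str) -> int:
    """Return the 6-bit code for a standard character, or raise ValueError."""
    if len(ch) != 1 or ch not in MAP_CHRS:
        raise ValueError(f"{ch!r} is not in the standard set; map it first")
    return MAP_CHRS.index(ch)


def code_to_char(code: int) -> str:
    """Return the standard character for a 6-bit code (0..63)."""
    if not 0 <= code < len(MAP_CHRS):
        raise ValueError(f"code {code} is outside 0..63")
    return MAP_CHRS[code]


print(f"{char_to_code('A'):02X}")   # hex code of A
print(code_to_char(0x3F))           # standard character for the group mark
```

An ambiguous character such as "(" is not in the string, so char_to_code
raises an error for it instead of guessing a code.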


  From alt.folklore.computers Tue Oct 20 16:22:25 1992
From: jn11+@andrew.cmu.edu (Joseph M. Newcomer)
Newsgroups: alt.folklore.computers
Subject: Re: What was the intended advantage of using 36bits CPU?
Message-ID: <8errpkW00WB81QCe82@andrew.cmu.edu>
Date: 17 Oct 92 02:41:52 GMT
Organization: Carnegie Mellon, Pittsburgh, PA
Lines: 79
In-Reply-To: <14934@auspex-gw.auspex.com>

Excerpts from netnews.alt.folklore.computers: 12-Oct-92 Re: What was the
intended a.. by Guy Harris@Auspex.COM 
> No, EBCDIC was 8 bits, but I don't think it emerged until the 8-bit-byte
> 360 did.  I think IBM's BCD used 6 bits; IBM *punched cards* used 12
> bits, but which IBM machines, if any, stored character strings in
> punched-card code (which doesn't include reading raw punched card code
> into memory and then converting it to BCD for processing)?

To the best of my knowledge, no IBM computer stored characters as 12-bit
characters.  The 7090/94 series read cards in column-binary, storing
three 12-bit columns in one 36-bit word, and reading a card into
24 words (columns 73-80 were not read, which is why they were used to hold
sequence numbers, and this is also why FORTRAN only used columns 1-72)
Conversion from row binary to 6-bit packed characters (6 per word)
was done in the "operating system" (IOCS? I never programmed a 709x)

This also may explain to those who never knew it why FORTRAN identifiers
were 6 characters long or the significance of column 6 in FORTRAN
programs...

The 1130/1800 also read column binary into 16-bit words, and I have a
vague recollection that you could pack 3 characters into a single 16-bit
word using a packing convention akin to PDP-10 RAD50.

The DEC10/20 used three character conventions: 7-bit ASCII, 5 7-bit
bytes left aligned in a 36-bit word with the low-order bit unused (it
flagged line numbers in LINED/SOS files...), which gave full 7-bit
ASCII; SIXBIT, which encoded the BCD subset (uppercase only, so no
left quote, {|}~), and RAD50 (40 decimal which is 50 octal) which
encoded 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ.$% and one other character
which I forget.  This was used to encode six-character symbols into
32 bits, allowing 4 bits in each word for flags (such as "global symbol",
etc.).  These RAD50 names were how the DDT symbol table was stored, and
how names were stored in .REL (==.o) files.

In a collection of papers "Faith, Hope and Parity" from the days where
Datamation was allowed to have a sense of humor, one sequence was on
"How to design a klu(d)ge" (I forget which spelling they used).  One
criterion was that it had to have a collating sequence different than
anyone else's and a character assignment that was different as well.

The 1401/40/60 collating sequence was NOT the binary character code.
The collating sequence, from appendix I of the 1440 manual, was
as shown below.  Read from top to bottom left to right.

space    000000   ;        101110   #        001011   J        100001
.        111011   delta    101111   @        001100   ...
(        111100   -        100000   :        001101   R        101001
[        111101   /        010001   >        001110   RM       011010
<        111110   ,        011011   SQ       001111   S        010010
GM       111111   % )      011100   ?        111010   ...
& +      110000   WS       011101   A        110001   Z        011001
$        101011   \        011110   ...               0        001010
*        101100   SM       011111   I        111001   ...
]        101101   b        010000   !        101010   9        001001


GM was the "group mark", graphic three horizontal lines with a vertical
bar thru the center.  "delta" was a triangle.  'b' was the "substitute
blank" (used to keep tapes with long stretches of blank characters
and which were encoded NRZ from having the long blanks being treated
as EOR gaps), SM was the segment mark (three vertical lines with a
single horizontal line), SQ was a square root sign, RM a record mark
(a not-equal sign but with a vertical rather than diagonal stroke).
WS was the word separator character, very important as it was used in
object decks to indicate which character got a word mark, and as far
as I can tell it was hand-drawn in every manual.  It looked like the
representation of a distant cartoon flying bird:
                    ___  ___
                   /   \/   \

Other vendors used different graphics; for example, Honeywell had a cent
sign and a single-character "CR" symbol for some of these characters.
This was common.  The non-contiguous alphabet was also quite common,
i.e., the sequence ?ABCDEFGHI!JKLMNOPQR=STUVWXYZ (where = means record mark).
In EBCDIC, the sequence is ABCDEFGHI...!JKLMNOPQR.../STUVWXYZ
where the ... represent the gaps in the sequence in which there are
bits but no associated graphics.
                                joe


