                                   RMS-11 Workshop Session
                                   ______ ________ _______
                                 Spring 1981 DECUS Symposium 
                                       Miami, Florida 
         
                          Speaker: Tim Day - RMS Development Group 
         
         
        Prepared Questions Answered: 
        ________ _________ _________
         
        Q:    An initial sequential GET using the KRF field was done on an indexed
              file. The user then did a GET by RFA, and RMS ignored the KRF field:
              the retrieval returned the next record on the primary key. Why?
        R:    This is a user error: it is well documented that a GET by RFA has
              meaning only in the context of the primary key. Once you have done a
              GET by RFA, your context is set to the primary key. Any value placed
              in the KRF field is meaningless and is ignored.
         
        Q:    Why not place the index of a file on a separate  device? This could be 
              done by keeping the name of the index file in the prologue of the data 
              file and opening it on a separate channel. 
        R:    This will probably not come to life because of the large number of
              problems that can arise in maintaining, tracking, backing up and
              recovering that form of file. If someone wanted to be really tricky
              and modify the RMS code on their own, it MIGHT be doable. Doing this
              with a "shared file" would cause problems! The point of doing this
              is really a performance increase, and DEC's overall commitment to
              performance is by other means, such as buckets, areas and placement.
         
        Q:    What is the best way to load an indexed file: last record first or
              first record first?
        R:    If a file is loaded in descending order, you will end up with a file
              that is longer than necessary, with records packed less densely than
              they could be. The smallest, densest file results from loading the
              file in ascending order.
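
              The effect of load order on packing can be illustrated with a toy
              model. This is a sketch under stated assumptions, not RMS-11 code:
              it assumes that a key added past the end of a full last bucket
              starts a fresh bucket, while an insertion anywhere else into a full
              bucket splits that bucket in half. The function name load and the
              capacity of 4 records per bucket are hypothetical.

```python
import bisect


def load(keys, capacity=4):
    """Insert keys one at a time into key-ordered buckets; return the buckets."""
    buckets = [[]]
    for key in keys:
        # Target bucket: the first one whose highest key is >= key,
        # otherwise the last bucket in the file.
        idx = len(buckets) - 1
        for i, b in enumerate(buckets):
            if b and key <= b[-1]:
                idx = i
                break
        bucket = buckets[idx]

        if len(bucket) < capacity:
            bisect.insort(bucket, key)          # room left: just insert
        elif idx == len(buckets) - 1 and key > bucket[-1]:
            buckets.append([key])               # append at end: old bucket stays full
        else:
            bisect.insort(bucket, key)          # overflow in the middle: split in half
            mid = len(bucket) // 2
            buckets[idx:idx + 1] = [bucket[:mid], bucket[mid:]]
    return buckets


ascending = load(range(1, 101))
descending = load(range(100, 0, -1))
print(len(ascending), "buckets when loaded in ascending order")
print(len(descending), "buckets when loaded in descending order")
```

              In this model, 100 records at 4 per bucket occupy 25 full buckets
              when loaded in ascending order and 33 partly filled buckets when
              loaded in descending order. The real ratio depends on RMS bucket
              and record sizes.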
         
        Q:    Someone wanted a utility to compress a file on-line and in place.
        R:    DEC has made no commitment to this, although a show of hands showed
              an interest in it.
         
        Q:    Another question was raised about "hashed indices" for a file and
              their possible implementation in RMS.
        R:    Hashing is another way to retrieve data from a file, and a much
              faster one than the indices RMS uses. Properly used, hashed indices
              should get ANY record at random in two (2) I/Os. With the current
              indices, it COULD take many more I/Os to a device.
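
              A rough way to see where the extra I/Os come from is to count the
              index levels a random GET must descend. The sketch below is a
              simplified model, not RMS-11 code: it assumes every index bucket
              holds the same number of entries (the hypothetical parameter
              fanout), that nothing is cached in memory, and that each index
              level costs one I/O, plus one more for the data bucket.

```python
def index_levels(data_buckets, fanout):
    """Index levels needed for one root bucket to cover all data buckets."""
    if data_buckets < 1 or fanout < 2:
        raise ValueError("need at least one data bucket and a fanout of 2 or more")
    levels = 1
    nodes = -(-data_buckets // fanout)      # ceiling division: lowest index level
    while nodes > 1:
        nodes = -(-nodes // fanout)         # each higher level indexes the one below
        levels += 1
    return levels


def ios_per_random_get(data_buckets, fanout):
    """One I/O per index level plus one for the data bucket (no caching assumed)."""
    return index_levels(data_buckets, fanout) + 1


for n in (50, 10_000, 1_000_000):
    print(n, "data buckets:", ios_per_random_get(n, fanout=50), "I/Os")
```

              Under these assumptions a small file already costs two I/Os per
              random GET, and the count grows by one each time the file outgrows
              another index level, whereas the hashed scheme described above
              would stay at two.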
         
        Questions From Attendees: 
        _________ ____ __________
         
        Q:    What about placing the indices into a temporary file when an OPEN is
              executed, possibly on a separate disk, to increase performance?
        R:    This might work in a single-user-per-file environment. In a file
              opened for sharing, the indices would be extremely hard to maintain
              without corruption. If RMS were to become part of an operating
              system and have knowledge of everything going on, this might be
              feasible. Right now, though, there are too many drawbacks to this
              idea.
         
        Q:    What  is  the  true  story about reclaiming space  left  from  deleted 
              records in an index file? 
        R:    Many different types of delete take place. The worst case is that of
              a shared file with duplicate keys. The
              problem  arises  because  RMS  has to leave enough of  the  record  to 
              describe the primary key, which oddly enough is the  only  way RMS can 
              tell that a record has been deleted. This is  needed  because  another 
              person  may  have your current record as his  next  record.  Generally 
              speaking,  the  remains  (or  fossils)  of  any  deleted  records  are 
              compressed in the buckets to leave a contiguous space  which  is  then 
              available to hold a record IF the record length  PLUS any RMS overhead 
              is smaller than the space available. The remains  of  deleted  records 
              will  stay  in  the  file  for the life of the file  or  until  it  is 
              reloaded, using IFL for example. 
         
        Q:    What methods are available for optimizing the structure of a file such 
              as that found on the VAX? 
        R:    DEC  is exploring the possibility of developing a utility  to  aid  in 
              defining a file with performance kept in mind. 
         
        POINTS OF INTEREST: 
        ______ __ _________
         
        Concerning ODL Files and Resident Libraries. 
        __________ ___ _____ ___ ________ __________
         
                  An ODL (Overlay Description Language) file is a way to do disk
              overlaying. Under the RMS implementation, if you use a non-overlaid
              version, you are going to get between 7 and 44KB of RMS code/buffers
              added to your task space. In an overlaid environment, RMS will get
              down to about 10KB of your task space in some situations. This may
              not apply when using RMS from a high-level language, such as Fortran
              IV+. The above numbers were obtained using MACRO-11 assembly
              language. The disadvantages of disk overlays are many. First,
              program execution speed can be affected, depending on the sequence
              of operations and the overlay structure itself, because overlaying
              implies I/O. Second, optimizing an ODL file is plain hard work.
              Third, the task image on disk will grow depending on the tree
              structure defined. This may cause problems on systems cramped for
              disk room. Fourth, the time needed to task build an overlaid
              program increases depending on the tree structure.
                  The cure for many of these problems is the use of resident
              libraries. In terms of RMS, this means that a single, shared copy of
              the RMS code is placed into physical memory, and all tasks are
              linked to it at task build time. The first advantage is that there
              are no overlay I/O operations, since the whole library is resident.
              Second, you don't need complex overlay files for RMS code. The ODL
              file that you would use is 3 lines long and accounts for approx.
              1.5KB of RMS code in your task image. Third, the task images on disk
              become smaller, and the program should execute faster because of the
              lack of overlay I/O operations. The disadvantage of using libraries
              is that you give up 2 APRs (or 8KW of your task space), plus the
              little extra RMS code in your 24KW (remember you "lost" 2 APRs) task
              space.
                  The break-even point for changing from separate copies of RMS
              code in everyone's task space to an installed resident RMS library,
              which will consume approx. 46KB of physical memory, is four (4)
              simultaneous memory-resident tasks using RMS. This would decrease
              your OVERALL memory requirements, with a possible increase in
              execution speed.
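
              The break-even arithmetic can be sketched as follows. The 46KB
              library size and the approx. 1.5KB of RMS code left in each task
              are the figures quoted in this section. The 13KB size of a private
              RMS copy is an assumption, chosen only because it reproduces the
              stated break-even of four tasks; it was not given in the session.
              The function names are hypothetical.

```python
import math

LIBRARY_KB = 46.0   # resident RMS library, as quoted in the session
STUB_KB = 1.5       # approx. RMS code remaining in each task that uses the library


def break_even_tasks(private_kb, library_kb=LIBRARY_KB, stub_kb=STUB_KB):
    """Smallest task count at which the shared library needs no more memory
    than giving every task its own private copy of RMS."""
    saving_per_task = private_kb - stub_kb
    if saving_per_task <= 0:
        raise ValueError("a private copy must be larger than the per-task stub")
    return math.ceil(library_kb / saving_per_task)


def total_kb(tasks, private_kb, shared):
    """Total RMS memory for a number of simultaneously resident tasks."""
    if shared:
        return LIBRARY_KB + tasks * STUB_KB
    return tasks * private_kb


ASSUMED_PRIVATE_KB = 13.0   # assumption, not a figure from the session
n = break_even_tasks(ASSUMED_PRIVATE_KB)
print("break-even at", n, "tasks")
print("private copies:", total_kb(n, ASSUMED_PRIVATE_KB, shared=False), "KB")
print("resident library:", total_kb(n, ASSUMED_PRIVATE_KB, shared=True), "KB")
```

              Larger private copies move the break-even point lower and smaller
              ones move it higher, so the figure of four tasks should be checked
              against the actual size of RMS in your own task images.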
         
        Goals of RMS. 
        _____ __ ____
         
                  The greatest goal of RMS is reliable tracking of a user's data.
              Considerable code was put into RMS to ensure that data will not be
              lost or corrupted. This was done at the expense of execution time.
              Second is a content-addressable capability, which gave rise to the
              indexed file organization. Third is the ability to have multiple
              indices. Fourth is good sequential access performance on the
              primary key; the overall structure of RMS was designed toward this
              goal. Fifth is fair to good access on alternate keys, which can be
              achieved by using bucket size, areas and placement. Sixth is access
              by Record's File Address (RFA), which is guaranteed to stay the same
              for the life of the file. Seventh is good space utilization within
              the file structure. An area for improvement is reclamation of space
              left by deleted records.
         
        Areas. 
        ______
         
                  Even if you specify no areas, you are given an area zero (0), but
              you as a user don't know it. It won't even show when you "DSP /FU".
              Conditions are best when your file, and hence your areas, are
              contiguous. For the most part, the use of areas is a matter of trial
              and error in search of the best performance. Proper use of areas
              will boost sequential access with little increase in random
              retrieval. The use of areas does not impact the user program, which
              frees you to try many schemes.
         
        DEF Utility. 
        ___ ________
         
                  Is there any way to back up to a previous entry in the case of
              errors?
                  There is a bigger problem area: DEF does not attach the
              terminal being used for input. On a heavily loaded system, if DEF
              is checkpointed, any user input typed while DEF is checkpointed
              will be sent to MCR, NOT DEF. Naturally, MCR does NOT know what a
              bucket is; let your imagination take over from here. There is a
              patch in the works for this. As for re-entering previous
              parameters, a future release of DEF will address this issue.
              CTRL-Z is the universal input to terminate an RMS utility.
         
        Odds and Ends. 
        ____ ___ _____
         
                  If a user has a contiguous file with a bucket size that spans
              more than one physical disk block, RMS will issue a multi-block
              read/write to the disk ACP. This increases the performance of your
              program.
         
                  DEC is looking at a means to zero out a file in preparation for
              repopulating it. This would allow a user to maintain the physical
              location of a file on a volume. This would be more controlled than
              deleting and re-creating the file on a multi-user system.