WHAT CAN BE DONE WITH THE SHUFFLER FOR RSX11M
=============================================

     While studying a heavily overloaded RSX11M V3.2 system, I found
that swapping and shuffling produce overhead which, under some
circumstances, results in a real system lockup.  The first step to
break such "shuffling to death" was a priority scheme different from
the standard DEC one, which reduced the probability of swapping
several tasks at the same priority.  One example for a program-
development system might be:

     PRI 80 - editors, directory manipulation, etc.
         70 - compilers, LBR, file manipulation
         60 - TKB
         50 - program debugging
         40 - routine program runs - computations

     But this trick does not change the nature of the problem: the
overhead produced by SHUFFLER activity.  Jim Downward published (in
THE MULTITASKER) measurements for various values of the swapping
interval (S$$WPC), but these demonstrate only one part of the problem.
The following diagram illustrates the overhead due to swapping and
shuffling which occurs when not all tasks fit into available memory,
as a function of the number of active tasks, for various memory sizes
(numbers of resident tasks).  The task measured was BIGTKB linking a
small F4P program using F4PRES on an 11/34 with an RK05 disk,
S$$WPC*S$$WPR = 150.

     [Diagram: time per task (%) versus total number of active tasks,
     one curve per number of resident tasks N (1 to 5).  Each curve
     sits at 100% while all tasks are resident and climbs steeply once
     tasks must be swapped, reaching roughly 200% for N=3, 300% for
     N=4 and 400% for N=5.  Swapped tasks = total number of tasks - N.]

     The GEN size was changed so that only the required number of 16KW
tasks could fit into memory, and this number labels each curve of the
simplified diagram.  As you can see, SHUFFLER overhead grows
drastically with the number of resident tasks.  This is due to the
fact that every resident task becomes eligible for swapping out after
T = S$$WPC*S$$WPR clock ticks, so if we have N tasks resident, the
system will do N swap/shuffle/load cycles every T ticks.

     At this point I decided to modify not only S$$WPC, in accordance
with Jim Downward's recommendation, but also the SHUFFLER logic.  The
whole complicated job resulted in roughly 30% of the SHUFFLER code
being changed, with the following effect (previous example, only three
tasks resident):

     total tasks active:   1    2    3    4    5    6    12
     orig. SHF time/task: 26"  23"  22"  48"  53"  57"  never fin.
     mod.  SHF time/task: 26"  23"  22"  29"  30"  29"  33"
     performance incr.  :  0    0    0   65%  76%  96%  ??%

     The first "feature" changed was the swapping scheme.  The final
goal of swapping is to achieve a timesharing effect, i.e. to give the
same residency (and compute) time to all tasks at the same priority.
The DEC approach limits memory residency for any task without regard
to the overall system state.  In my opinion, just the opposite
approach is adequate: the more tasks in memory, the slower the
swapping.  When only two tasks fit into memory, each of them may get
up to 50% of the CPU time.  When ten tasks fit into memory, each of
them may get only 10% of the CPU time, so we may leave them in memory
five times longer.  My modified swapping algorithm slows the swapping
rate down in proportion to the number of resident tasks.  To keep
track of partition "age" I use the P.PRI byte.  This may cause
problems, because I do not know the real purpose of this byte in
RSX11M, but it works on a system with most of the available options.
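     As a minimal sketch of that scaling idea (written in C for
clarity, not the actual MACRO-11 patch to the Executive; the names
SWPC, SWPR, standard_quantum, scaled_quantum and num_resident are mine,
chosen for illustration, and the two example values only reproduce the
product S$$WPC*S$$WPR = 150 quoted above), the arithmetic amounts to
giving each task roughly N times the normal residency quantum when N
tasks are resident:

     #include <stdio.h>

     /* Example values only; the article states just their product,
        S$$WPC * S$$WPR = 150.                                        */
     #define SWPC 30                 /* stands in for S$$WPC          */
     #define SWPR  5                 /* stands in for S$$WPR          */

     /* Standard RSX11M behaviour: a fixed residency quantum of
        T = S$$WPC * S$$WPR clock ticks, regardless of load.          */
     long standard_quantum(void)
     {
         return (long)SWPC * SWPR;
     }

     /* Modified behaviour: the more tasks resident, the longer each
        one may stay before it becomes eligible for swap-out.         */
     long scaled_quantum(int num_resident)
     {
         if (num_resident < 1)
             num_resident = 1;
         return standard_quantum() * num_resident;
     }

     int main(void)
     {
         int n;
         for (n = 1; n <= 5; n++)
             printf("%d resident: eligible for swap-out after %ld ticks\n",
                    n, scaled_quantum(n));
         return 0;
     }

     With two resident tasks a task becomes eligible for swap-out
after 2*T ticks, with ten resident tasks after 10*T ticks - matching
the observation that a task which can get only 1/N of the CPU may be
left in memory N times longer.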
     The next thing to improve is the shuffling logic.  Many thousands
of MOV instructions can be saved if we fill the holes at the bottom of
the partition with tasks taken from the top, instead of shuffling all
resident tasks.  We may also move tasks across non-shufflable
partitions, etc.  Shuffling a task by an offset smaller than 1/4 of
its size is not very effective; it is often better to leave some holes
unused.  I tried to apply some general planning to shuffling, but I
found it too complicated, especially due to the parallelism of EXEC,
SHF and task activity.  Simplicity seems to be better.

     Another significant item is the timing of SHF activity.  Jim
Downward recommended marking the time of each SHF request and
disabling the next one within a specific time interval.  The matter is
not that simple.  SHF stops itself to wait for a checkpoint or task
I/O completion, but when a memory allocation failure exists, the EXEC
unstops SHF earlier - at every significant event (I/O completion).
Disabling this feature significantly slows down system throughput.
The trick I used is a "SHF time-out" for a specified time (1/4 second?)
at every SHF exit: simply remove the SHF pointer $SHFPT in SYSCM, and
restore it after the specified time.  For that whole time SHF is
inaccessible, and the EXEC will not request SHF to work.  Naturally,
this may be unacceptable for some applications with critical short
response times.

     The last useful modification is system-controlled partition
compaction provided on operator request (MCR>RUN $SHF), even if NO
task is out of memory.  This greatly simplifies the use of the SET
/TOP command and /TOP driver loads.

                              Ing. Martin Brunecky
                              Drobneho 28a
                              60000 BRNO
                              Czechoslovakia