CAC 2015-08-25

fault rework studies

I am concerned that the implementation of RCU has too much entropy due to our poor understanding of the hardware; specifically, our understanding of design assertions of the form "if the operation is x and the state is y, then z is true (or false)".

CPU flow (ignoring EIS and RPx)

FETCH:
    Fetch instruction into CU
    Decode instruction
EXEC:
    If not MIIF
        If (PREPARE_CA || READ_OP)
            Do CAF
        If (READ_OP)
            Do readOperand
    Execute instruction
    If (WRITE_OP)
        If not (PREPARE_CA || READ_OP)
            Do CAF
        Do writeOperand
    Go to FETCH
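
The EXEC: portion of the flow above can be sketched in C as a function that records which steps run. This is only an illustration, not emulator code: the names insn_flags, trace_step and exec_cycle are hypothetical, and the sketch assumes that 'Execute instruction' runs regardless of MIIF (which is what the restart notes in this page imply).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-instruction attributes, named after the pseudocode. */
typedef struct {
    bool prepare_ca;
    bool read_op;
    bool write_op;
} insn_flags;

/* Append a step name to a comma-separated trace buffer. */
static void trace_step(char *trace, size_t cap, const char *step)
{
    if (trace[0] != '\0')
        strncat(trace, ",", cap - strlen(trace) - 1);
    strncat(trace, step, cap - strlen(trace) - 1);
}

/* One pass through EXEC:, recording the steps in the order they run. */
static void exec_cycle(insn_flags f, bool miif, char *trace, size_t cap)
{
    assert(cap > 0);
    trace[0] = '\0';
    if (!miif) {
        /* Read-side work is skipped when restarting with MIIF on. */
        if (f.prepare_ca || f.read_op)
            trace_step(trace, cap, "CAF");
        if (f.read_op)
            trace_step(trace, cap, "readOperand");
    }
    trace_step(trace, cap, "execute");
    if (f.write_op) {
        /* CAF is only needed here if it was not already done for the read. */
        if (!(f.prepare_ca || f.read_op))
            trace_step(trace, cap, "CAF");
        trace_step(trace, cap, "writeOperand");
    }
}
```

For a read/write instruction, calling exec_cycle with MIIF off records CAF, readOperand, execute, writeOperand; with MIIF on it records only execute and writeOperand.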

Fault handling:

Page faults during 'Fetch instruction into CU' set the FIF bit, which signals a restart at 'FETCH:'.

Page faults during the read-operand CAF and readOperand restart at 'EXEC:' with MIIF set to off.

Faults during and after instruction execution restart at 'EXEC:' with MIIF set to on. These faults can be page faults from the CAF/write cycle, and non-page faults during instruction execution.

(Although the MIIF bit seems fairly trivial here, it plays a pivotal role in RPx.)

The instruction restart model requires each instruction to be "invariant"; that is, restarting it should not change the output.

A hypothetical problem instruction would be 'Add 1 to rA and store the result'; if the 'store the result' step faults, the instruction will be restarted, and rA will be incremented a second time, producing an incorrect result.
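
The hypothetical instruction can be simulated with a few lines of C. Everything here is made up for illustration (the machine struct, add1_and_store, run_with_restart); the fault is simulated by a flag rather than by any real paging logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical machine state: one register and one memory word. */
typedef struct {
    int rA;
    int mem;
} machine;

/* 'Add 1 to rA and store the result'.
 * Returns false if the store step faults (simulated by fault_on_store). */
static bool add1_and_store(machine *m, bool fault_on_store)
{
    m->rA += 1;               /* execute: modifies its own source register */
    if (fault_on_store)
        return false;         /* store faults; rA has already changed */
    m->mem = m->rA;
    return true;
}

/* Restart model: on a fault, resolve it and re-run the whole instruction.
 * Returns the number of attempts needed. */
static int run_with_restart(machine *m, int faults_before_success)
{
    int attempts = 0;
    assert(faults_before_success >= 0);
    for (;;) {
        attempts++;
        if (add1_and_store(m, faults_before_success > 0))
            return attempts;
        faults_before_success--;
    }
}
```

Starting from rA = 5, a run with no fault stores 6; a run with one store fault takes two attempts and stores 7, which is the incorrect result described above.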

I am reasonably confident that the non-EIS instructions are invariant with respect to operands, but I am not sure about the indicator registers.

My concern lies with the fact that page faults in the CAF/write cycle cause instruction re-execution, which is inefficient and unnecessarily stresses the invariant condition.

I am currently considering a change:

FETCH:
    Fetch instruction into CU
    Decode instruction
EXEC:
    If not MIIF
        If (PREPARE_CA || READ_OP)
            Do CAF
        If (READ_OP)
            Do readOperand
    Execute instruction
rewrite:
    If (WRITE_OP)
        If not (PREPARE_CA || READ_OP)
            Do CAF
        Do writeOperand
    Go to FETCH

and a change to RCU

Define WriteFault as any of the recoverable faults that can occur during writeOperand (page fault and ?).
If WriteFault and MIIF, go to rewrite.

This relies on the following assertion which I think is true, but I have not yet done documentation and code review to prove it:

The "execute instruction" step cannot raise a WriteFault.

There is a case in the conditional transfers that is problematic, but I think they are wrong anyway, and am prepared to fix them.

The idea is that if RCU can reliably decide whether the fault occurred during or after 'execute instruction', it can be smarter about the restart.
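
A minimal sketch of the restart decision, under the assumption that RCU has FIF, MIIF and a WriteFault classification available as simple flags. The type and function names (fault_state, rcu_restart_point, executions_until_done) are hypothetical, and the second function only counts how often 'Execute instruction' would run; it does not model the emulator.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { AT_FETCH, AT_EXEC, AT_REWRITE } restart_point;

typedef struct {
    bool fif;          /* fault during 'Fetch instruction into CU' */
    bool miif;         /* fault during or after instruction execution */
    bool write_fault;  /* hypothetical: fault classified as a WriteFault */
} fault_state;

/* Pick the restart label from the fault state. */
static restart_point rcu_restart_point(fault_state f)
{
    if (f.fif)
        return AT_FETCH;
    if (f.write_fault && f.miif)
        return AT_REWRITE;   /* proposed: skip re-execution */
    return AT_EXEC;
}

/* Count how often 'Execute instruction' runs for one instruction whose
 * writeOperand faults `faults` times before succeeding. With use_rewrite
 * false the fault is never classified as a WriteFault, so the restart
 * always goes to EXEC:, as in the current flow. */
static int executions_until_done(int faults, bool use_rewrite)
{
    int executions = 0;
    restart_point at = AT_EXEC;
    assert(faults >= 0);
    for (;;) {
        if (at != AT_REWRITE)
            executions++;              /* Execute instruction */
        if (faults == 0)
            return executions;         /* writeOperand succeeded */
        faults--;
        fault_state f = { false, true, use_rewrite };
        at = rcu_restart_point(f);
    }
}
```

With two write faults, the current flow executes the instruction three times, while the proposed flow executes it once. That holds only if the assertion above is true, since otherwise a fault raised by 'execute instruction' itself could be misrouted to rewrite.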

This list was generated by inspection and may contain errors and omissions.

// decimal   octal
// fault     fault  mnemonic   name             priority group  handler                     raised in
// number   address
//   0         0      sdf      Shutdown               27 7
//   1         2      str      Store                  10 4                                  getBARaddress, instruction execution
//   2         4      mme      Master mode entry 1    11 5      JMP_SYNC_FAULT_RETURN       instruction execution
//   3         6      f1       Fault tag 1            17 5      (JMP_REFETCH/JMP_RESTART)   doComputedAddressFormation
//   4        10      tro      Timer runout           26 7      JMP_REFETCH                 FETCH_cycle
//   5        12      cmd      Command                 9 4      JMP_REFETCH/JMP_RESTART     instruction execution
//   6        14      drl      Derail                 15 5      JMP_REFETCH/JMP_RESTART     instruction execution
//   7        16      luf      Lockup                  5 4      JMP_REFETCH                 doComputedAddressFormation, FETCH_cycle
//   8        20      con      Connect                25 7      JMP_REFETCH                 FETCH_cycle
//   9        22      par      Parity                  8 4
//  10        24      ipr      Illegal procedure      16 5                                  doITSITP, doComputedAddressFormation, instruction execution
//  11        26      onc      Operation not complete  4 2                                  nem_check, instruction execution
//  12        30      suf      Startup                 1 1
//  13        32      ofl      Overflow                7 3      JMP_REFETCH/JMP_RESTART     instruction execution
//  14        34      div      Divide check            6 3                                  instruction execution
//  15        36      exf      Execute                 2 1      JMP_REFETCH/JMP_RESTART     FETCH_cycle
//  16        40      df0      Directed fault 0       20 6      JMP_REFETCH/JMP_RESTART     getSDW, doAppendCycle
//  17        42      df1      Directed fault 1       21 6      JMP_REFETCH/JMP_RESTART     getSDW, doAppendCycle
//  18        44      df2      Directed fault 2       22 6      (JMP_REFETCH/JMP_RESTART)   getSDW, doAppendCycle
//  19        46      df3      Directed fault 3       23 6      JMP_REFETCH/JMP_RESTART     getSDW, doAppendCycle
//  20        50      acv      Access violation       24 6      JMP_REFETCH/JMP_RESTART     fetchDSPTW, modifyDSPTW, fetchNSDW, doAppendCycle, EXEC_cycle (ring alarm)
//  21        52      mme2     Master mode entry 2    12 5      JMP_SYNC_FAULT_RETURN       instruction execution
//  22        54      mme3     Master mode entry 3    13 5      (JMP_SYNC_FAULT_RETURN)     instruction execution
//  23        56      mme4     Master mode entry 4    14 5      (JMP_SYNC_FAULT_RETURN)     instruction execution
//  24        60      f2       Fault tag 2            18 5      JMP_REFETCH/JMP_RESTART     doComputedAddressFormation
//  25        62      f3       Fault tag 3            19 5      JMP_REFETCH/JMP_RESTART     doComputedAddressFormation
//  26        64               Unassigned
//  27        66               Unassigned
//  28        70               Unassigned
//  29        72               Unassigned
//  30        74               Unassigned
//  31        76      trb      Trouble                 3 2                                  FETCH_cycle, doRCU

So let's sort by usage:

Not relevant: only fires at the begin state of the CPU.
   CON TRO

Not relevant: not restartable
    EXF TRB

Only during instruction execution
   MME MME2 MME3 MME4 OFL DIV CMD DRL

Instruction execution or FETCH cycle
   LUF -- I suspect that it is not restartable and so not relevant.

Unused
   SDF PAR SUF

CAF,getSDW,doAppendCycle
   F1 F2 F3 DF0 DF1 DF2 DF3

doAppendCycle or ring alarm in EXEC_cycle
   ACV -- make sure that the ring alarm is compatible with the new logic

Special cases:

   ABSA 

      ABSA can generate {ACV,ACV15} boundary violation faults and DF faults;
      the new logic would have difficulty distinguishing the DF faults from 
      writeOperands DF faults.

Need review
   IPR -- doAppendCycle, CAF, instruction execution; I suspect that it is not restartable and so not relevant.
   ONC -- I suspect that it is not restartable and so not relevant.
   STR -- getBARaddress, instruction execution; I suspect that it is not restartable and so not relevant.
   doABSA; this routine reaches into the SDW/PTW logic; make sure that its fault logic is correct.
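
The grouping above could be captured as a lookup so that RCU code can ask which group a fault belongs to. This is a sketch only: the fault numbers come from the table, but the enum names, the fault_usage categories and classify_fault are hypothetical, and the grouping inherits any errors in the list.

```c
#include <assert.h>

/* Fault numbers (decimal) as given in the table above. */
enum {
    FAULT_SDF = 0,  FAULT_STR = 1,   FAULT_MME = 2,   FAULT_F1 = 3,
    FAULT_TRO = 4,  FAULT_CMD = 5,   FAULT_DRL = 6,   FAULT_LUF = 7,
    FAULT_CON = 8,  FAULT_PAR = 9,   FAULT_IPR = 10,  FAULT_ONC = 11,
    FAULT_SUF = 12, FAULT_OFL = 13,  FAULT_DIV = 14,  FAULT_EXF = 15,
    FAULT_DF0 = 16, FAULT_DF1 = 17,  FAULT_DF2 = 18,  FAULT_DF3 = 19,
    FAULT_ACV = 20, FAULT_MME2 = 21, FAULT_MME3 = 22, FAULT_MME4 = 23,
    FAULT_F2 = 24,  FAULT_F3 = 25,   FAULT_TRB = 31
};

/* Hypothetical categories, one per heading in the list. */
typedef enum {
    USE_BEGIN_STATE,         /* CON TRO */
    USE_NOT_RESTARTABLE,     /* EXF TRB */
    USE_EXEC_ONLY,           /* MME MME2 MME3 MME4 OFL DIV CMD DRL */
    USE_FETCH_CYCLE_OR_EXEC, /* LUF */
    USE_UNUSED,              /* SDF PAR SUF */
    USE_ADDRESSING,          /* F1 F2 F3 DF0 DF1 DF2 DF3 */
    USE_ACV,                 /* ACV */
    USE_NEED_REVIEW,         /* IPR ONC STR */
    USE_UNASSIGNED           /* fault numbers 26..30 */
} fault_usage;

static fault_usage classify_fault(int n)
{
    assert(n >= 0 && n <= 31);
    switch (n) {
    case FAULT_CON: case FAULT_TRO:
        return USE_BEGIN_STATE;
    case FAULT_EXF: case FAULT_TRB:
        return USE_NOT_RESTARTABLE;
    case FAULT_MME: case FAULT_MME2: case FAULT_MME3: case FAULT_MME4:
    case FAULT_OFL: case FAULT_DIV: case FAULT_CMD: case FAULT_DRL:
        return USE_EXEC_ONLY;
    case FAULT_LUF:
        return USE_FETCH_CYCLE_OR_EXEC;
    case FAULT_SDF: case FAULT_PAR: case FAULT_SUF:
        return USE_UNUSED;
    case FAULT_F1: case FAULT_F2: case FAULT_F3:
    case FAULT_DF0: case FAULT_DF1: case FAULT_DF2: case FAULT_DF3:
        return USE_ADDRESSING;
    case FAULT_ACV:
        return USE_ACV;
    case FAULT_IPR: case FAULT_ONC: case FAULT_STR:
        return USE_NEED_REVIEW;
    default:
        return USE_UNASSIGNED;
    }
}
```

Only the USE_ADDRESSING and USE_ACV groups look like candidates for the WriteFault definition, but that still needs the review described above.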

ABSA generates these faults because its logic is to do all of the readOperand steps except actually reading the operand.

The CAF and APPEND unit code has greatly improved since ABSA was written; it may be possible to greatly simplify ABSA by making it READ_OPERAND and letting it use iefpFinalAddress. T&D tests ABSA extensively.
