CAC 2017-10-17

chasing down pi crash

I think that Multics is struggling here. Looking at the instruction trace prior to the TRO that creates the MC that is incorrectly RCUed, I see that it was executing:

volmap_page:

post_read_in_mem:
          ana       -1,dl                         " Page number
          lprpbb    bp|pvte.volmap_stock_ptr      " bb -> record stock
          epbpab    bb|0                          " ab -> base of stock seg
          eppab     ab|stock_seg.meters           " ab -> meters
          tsx7      volmap$withdraw_from_volmap   " Replenish stock

          lda       bp|pvte.volmap_async_page
          ana       -1,dl                         " Page number
          tsx7      start_async_write             " Write it back
          tra       post_meter_exit
          tra       page_error$volmap_page_async_error " Not in memory

and has taken the 'tra page_error$volmap_page_async_error " Not in memory' path.

Looking where the code was going at the time of the TRO:

volmap_page_async_error:
savexx none
spribp arg+6
tsx0 setup_pvtname_from_pvtep
call_syserr 1,(volmap_page: Async error for ^a_^a.),devname,driveno

syserr_real: 1 => write message with alarm, log, and CRASH system - allow bce to return.

Looks like like a place to go when the boot is in big trouble. The issue is prior to the 'fault in idle' bug.

From the top:

append faults.

00130:015365 bound_file_system:append+04023
      870           unspec (dirp -> dir) = "0"b;                      /* Zero the header */
DBG(88329177)> CPU TRACE: 00130:015365 0 000100100400 (MLR 000100)
DBG(88329177)> CPU FAULT: Fault 17(021), sub 0(00), 'PTW0.F == 0'
DBG(88329179)> CPU TRACE: 015365 0 000542657220 (SCU 000542,N*)
DBG(88329179)> CPU FINAL: Write 77124030 000130050201

fault handler is entered:

00042:001062 bound_page_control:page_fault+01062
    001062                        1144  fault:
    001062                        1145            inhibit   on
    001062  0a   000010 2542 20   1146            spri      pf_prs,*
DBG(88329182)> CPU TRACE: 00042:001062 0 000010254220 (SPRI 000010,N*)

Handler determines fault type:

00042:001137 bound_page_control:page_fault+01137
    001137  0a   001143 6000 00   1210            tze       regular_page_fault  not on, must be normal one (so use TSR)
DBG(88329360)> CPU TRACE: 00042:001137 0 001143600000 (TZE 001143)

00042:001143 bound_page_control:page_fault+01143
    001143                        1215  regular_page_fault:
    001143  aa  0 00030 7211 00   1216            lxl1      ap|mc.scu.apu_stat_word see what type of fault
DBG(88329362)> CPU TRACE: 00042:001143 0 000030721100 (LXL1 PR0|30)

00042:001215 bound_page_control:page_fault+01215
    001215                        1266  found_faulted_page:
    001215  aa  2 77764 6231 00   1267            eax       .aste,ptw|-aste_size get real astep into x3
DBG(88329438)> CPU TRACE: 00042:001215 0 277764623100 (EAX3 PR2|77764)

00042:001231 bound_page_control:page_fault+01231
    001231  0a   002061 6010 00   1281            tnz       create_segment_fault
DBG(88329462)> CPU TRACE: 00042:001231 0 002061601000 (TNZ 002061)

                                  1284  "
                                  1285  "         Here the commitment has been made to actually
                                  1286  "         read in one page of virtual memory.
                                  1287  "
                                  1288
    001232  4a  4 00220 7071 20   1289            tsx       .ret,pc_trace$page_fault
                                  1290
                                  1291                                tsx       .ret,read_page
    001233  0a   001610 7070 00   1292            tsx       .ret,read_page
    001234  0a   001237 7100 00   1293                tra   readin.goon         must wait for page
    001235  0a   001237 7100 00   1294                tra   readin.goon
    001236  0a   001347 7100 00   1295                tra   wait_any_event_apte wait for volmap event (in APTE by now)

    001233  0a   001610 7070 00   1292 00042:001232 bound_page_control:page_fault+01232
    001232  4a  4 00220 7071 20   1289            tsx       .ret,pc_trace$page_fault

00042:001232 bound_page_control:page_fault+01232
    001232  4a  4 00220 7071 20   1289            tsx       .ret,pc_trace$page_fault
DBG(88329464)> CPU TRACE: 00042:001232 0 033640707000 (TSX7 033640)
00042:001233 bound_page_control:page_fault+01233
    001233  0a   001610 7070 00   1292            tsx       .ret,read_page
DBG(88329472)> CPU TRACE: 00042:001233 0 001610707000 (TSX7 001610)

...

00042:001236 bound_page_control:page_fault+01236
    001236  0a   001347 7100 00   1295                tra   wait_any_event_apte wait for volmap event (in APTE by now)
DBG(88354994)> CPU TRACE: 00042:001236 0 001347710000 (TRA 001347)

00042:001357 bound_page_control:page_fault+01357
    001357  4a  4 00234 7101 20   1409            tra       pxss$page_wait
DBG(88355114)> CPU TRACE: 00042:001357 0 400230710120 (TRA PR4|230,N*)

    002111                        2118  page_wait:
    002111  4a  4 00164 5541 20   2119            stc1      pds$pxss_args_invalid Signal overwriting of temporaries.
DBG(88355136)> CPU TRACE: 00044:002111 0 400130554120 (STC1 PR4|130,N*)

                                  2111  "         This entry is a combination of pxss$addevent and pxss$wait.
                                  2112  "         The argument (wait event) is passed in pds$arg_1.
                                  2113  "         Called by page_fault, returns to page_fault$wait_return to
                                  2114  "         restart a page fault.
                                  2115  "
                                  2116  " " " " " " " " " " " " " " " " " " " " " "" " " " " " " " " "
                                  2117
    002111                        2118  page_wait:
    002111  4a  4 00164 5541 20   2119            stc1      pds$pxss_args_invalid Signal overwriting of temporaries.
    002112  4a  4 00222 4501 20   2120            stz       pds$pc_call         set flag for page_wait, vs. waitp.
    002113  4a  4 00234 2341 20   2121            szn       tc_data$wait_enable see if during initialization
    002114  0a   001710 6000 00   2122            tze       pw1                 yes, special code

00044:002113 bound_tc_priv:pxss+02113
    002113  4a  4 00234 2341 20   2121            szn       tc_data$wait_enable see if during initialization
DBG(88355140)> CPU TRACE: 00044:002113 0 400202234120 (SZN PR4|202,N*)
DBG(88355140)> CPU FINAL: Read 00400326 000000000001
00044:002114 bound_tc_priv:pxss+02114
    002114  0a   001710 6000 00   2122            tze       pw1                 yes, special code
DBG(88355142)> CPU TRACE: 00044:002114 0 001710600000 (TZE 001710)
DBG(88355143)> CPU FINAL: Read 00230115 001463706000
00044:002115 bound_tc_priv:pxss+02115
    002115  0a   001463 7060 00   2123            tsx6      init_

Interesting; it didn't take the pw1 path. Looking at tc_init, wait_enable means the idle process has been setup; so should be okay here.

00044:002132 bound_tc_priv:pxss+02132
    002132  0a   003241 7070 00   2138            tsx7      getwork
DBG(88355234)> CPU TRACE: 00044:002132 0 003241707000 (TSX7 003241)

00044:003317 bound_tc_priv:pxss+03317
    003317  0a   003321 7100 26   3191            tra       gw_tv,ql*           dispatch on state in QL
DBG(88355316)> CPU TRACE: 00044:003317 0 003321710026 (TRA 003321,QL*)

00044:002752 bound_tc_priv:pxss+02752
    002752                        2848  find_next_eligible:
    002752  0a   001324 7060 00   2849            tsx6      UNLOCK              fne must unlock read before write
DBG(88355318)> CPU TRACE: 00044:002752 0 001324706000 (TSX6 001324)
DBG(88355318)> CPU FINAL: Read 00227324 400106034120

00044:003240 bound_tc_priv:pxss+03240
    003240  0a   004025 7100 00   3109            tra       find_idle           Failure return from fne
DBG(88355452)> CPU TRACE: 00044:003240 0 004025710000 (TRA 004025)

0044:004025 bound_tc_priv:pxss+04025
    004025                        3621  find_idle:
    004025  4a  4 00334 2221 20   3622            ldx2      prds$idle_ptr+1
DBG(88355454)> CPU TRACE: 00044:004025 0 400304222120 (LDX2 PR4|304,N*)

00044:004030 bound_tc_priv:pxss+04030
    004030  0a   003366 7100 00   3625            tra       gw_can_run
DBG(88355464)> CPU TRACE: 00044:004030 0 003366710000 (TRA 003366)
DBG(88355464)> CPU FINAL: Read 00231366 300000352112
00044:003366 bound_tc_priv:pxss+03366
    003366                        3243  gw_can_run:
    003366  aa  3 00000 3521 12   3252            eppbp     bb|0,2              processor addevent!
DBG(88355466)> CPU TRACE: 00044:003366 0 300000352112 (EPP2 PR3|0,2)
DBG(88355467)> CPU FINAL: Read 00231367 000001220003

00044:003373 bound_tc_priv:pxss+03373
    003373                        3259  gw_can_really_run:
    003373  0a   004266 7070 00   3260            tsx7      compute_virtual_clocks
DBG(88355494)> CPU TRACE: 00044:003373 0 004266707000 (TSX7 004266)

    004342                        3926  age:
    004342  aa  2 00001 2351 00   3928            lda       bp|apte.flags       check for idle process
D00044:004343 bound_tc_priv:pxss+04343
    004343  aa   002000 3150 03   3929            cana      apte.idle,du
DBG(88355558)> CPU TRACE: 00044:004343 0 002000315003 (CANA 002000,DU)
DBG(88355556)> CPU TRACE: 00044:004342 0 200001235100 (LDA PR2|1)

00044:004370 bound_tc_priv:pxss+04370
    004370  aa   040000 3150 03   3954            cana      wcte.governed,du
DBG(88355598)> CPU TRACE: 00044:004370 0 040000315003 (CANA 040000,DU)
00044:004371 bound_tc_priv:pxss+04371
    004371  0a   004407 6000 00   3955            tze       m_unlock    
DBG(88355600)> CPU TRACE: 00044:004371 0 004407600000 (TZE 004407)

00044:004407 bound_tc_priv:pxss+04407
    004407  aa  3 00142 5541 00   3973  m_unlock: stc1      bb|metering_lock    UNLOCK UNLOCK UNLOCK UNLOCK UNLOCK
DBG(88355602)> CPU TRACE: 00044:004407 0 300142554100 (STC1 PR3|142)

00044:004422 bound_tc_priv:pxss+04422
    004422  aa  3 00324 1151 00   3984            cmpa      bb|n_eligible
DBG(88355624)> CPU TRACE: 00044:004422 0 300324115100 (CMPA PR3|324)

00044:004423 bound_tc_priv:pxss+04423
    004423  0a   004440 6000 00   3985            tze       cvst                Will be NMP idle
DBG(88355626)> CPU TRACE: 00044:004423 0 004440600000 (TZE 004440)

00044:004440 bound_tc_priv:pxss+04440
    004440  aa  3 00014 7401 12   3998  cvst:     stx0      bb|apte.term_processid,2 Remember idle type.
DBG(88355628)> CPU TRACE: 00044:004440 0 300014740112 (STX0 PR3|14,2)

00044:004441 bound_tc_priv:pxss+04441
    004441  aa   000000 7100 17   3999            tra       0,7
DBG(88355630)> CPU TRACE: 00044:004441 0 000000710017 (TRA 000000,7)

00044:003374 bound_tc_priv:pxss+03374
    003374  0a   004442 7070 00   3261            tsx7      set_newt            calculate the time to run
DBG(88355632)> CPU TRACE: 00044:003374 0 004442707000 (TSX7 004442)

00044:004443 bound_tc_priv:pxss+04443
    004443  aa   002000 3150 03   4010            cana      apte.idle,du
DBG(88355636)> CPU TRACE: 00044:004443 0 002000315003 (CANA 002000,DU)

DBG(88355653)> CPU FINAL: Read 00232455 000000635000
00044:004454 bound_tc_priv:pxss+04454
    004454  0a   004467 7100 00   4020            tra       sn2                 then set it
DBG(88355654)> CPU TRACE: 00044:004454 0 004467710000 (TRA 004467)

00044:004476 bound_tc_priv:pxss+04476
    004476  aa   000000 7100 17   4043            tra       0,7                 success return
DBG(88355668)> CPU TRACE: 00044:004476 0 000000710017 (TRA 000000,7)

Back to gw_can_really_run

00044:003375 bound_tc_priv:pxss+03375
    003375  aa  6 00205 7201 00   3262            lxl0      depth               set depth meter
DBG(88355670)> CPU TRACE: 00044:003375 0 600205720100 (LXL0 PR6|205)

                                  3271
                                  3272  " " " " " " " " " " " " " " " " " " " " "
                                  3273  "                                       "
                                  3274  "   P R O C E S S   S W I T C H I N G   "
                                  3275  "                                       "
                                  3276  "        D O N E    H E R E             "
                                  3277  "                                       "
                                  3278  " " " " " " " " " " " " " " " " " " " " "

00044:003440 bound_tc_priv:pxss+03440
    003440  0a   003453 6010 00   3309            tnz       dont_unload         yes - skip unload
DBG(88355734)> CPU TRACE: 00044:003440 0 003453601000 (TNZ 003453)

00044:003453 bound_tc_priv:pxss+03453
    003453                        3322  dont_unload:
    003453  aa   000021 3200 03   3339            lcx0      apte.dbr_loaded+1,du turn off dbr_loaded bit for old user
DBG(88355736)> CPU TRACE: 00044:003453 0 000021320003 (LCX0 000021,DU)

00044:003457 bound_tc_priv:pxss+03457
    003457  0a   001354 7060 00   3344            tsx6      UNLOCK_bp           gw unlocks new user
DBG(88355744)> CPU TRACE: 00044:003457 0 001354706000 (TSX6 001354)

00044:001354 bound_tc_priv:pxss+01354
    001354                        1484  UNLOCK_bp:
    001354  aa  2 00036 2341 00   1485            szn       bp|apte.lock        DEBUG
DBG(88355746)> CPU TRACE: 00044:001354 0 200036234100 (SZN PR2|36)

0044:001357 bound_tc_priv:pxss+01357
    001357  aa   000000 7100 16   1488            tra       0,6
DBG(88355752)> CPU TRACE: 00044:001357 0 000000710016 (TRA 000000,6)

00044:003460 bound_tc_priv:pxss+03460
    003460  0a   001324 7060 00   3345            tsx6      UNLOCK
DBG(88355754)> CPU TRACE: 00044:003460 0 001324706000 (TSX6 001324)

00044:001324 bound_tc_priv:pxss+01324
    001324  4a  4 00142 0341 20   1440  UNLOCK:   ldac      tc_data$apt_lock
DBG(88355756)> CPU TRACE: 00044:001324 0 400106034120 (LDAC PR4|106,N*)

00044:001334 bound_tc_priv:pxss+01334
    001334  aa   000000 7100 16   1448            tra       0,6
DBG(88355766)> CPU TRACE: 00044:001334 0 000000710016 (TRA 000000,6)

00044:003461 bound_tc_priv:pxss+03461
    003461                        3347  stop_check:
    003461  aa  6 00150 3521 20   3348            eppbp     apt_ptr,*           OLD USER STOPPED?
DBG(88355768)> CPU TRACE: 00044:003461 0 600150352120 (EPP2 PR6|150,N*)

    003470  aa  2 00001 2351 00   3361            lda       bp|apte.flags       see if idle
DBG(88355782)> CPU TRACE: 00044:003470 0 200001235100 (LDA PR2|1)
    003471  aa   002000 3150 03   3362            cana      apte.idle,du        yes, skip timer check
00044:003472 bound_tc_priv:pxss+03472
    003472  0a   003527 6010 00   3363            tnz       end_cput_check      ..
DBG(88355786)> CPU TRACE: 00044:003472 0 003527601000 (TNZ 003527)

00044:003527 bound_tc_priv:pxss+03527
    003527                        3393  end_cput_check:
    003527  aa  2 00001 2351 00   3395            lda       bp|apte.flags       check stop or ips pending
DBG(88355788)> CPU TRACE: 00044:003527 0 200001235100 (LDA PR2|1)

00044:003532 bound_tc_priv:pxss+03532
    003532  0a   003535 6000 00   3398            tze       no_more_pending     no, skip over SMIC
DBG(88355794)> CPU TRACE: 00044:003532 0 003535600000 (TZE 003535)

    003535                        3401  no_more_pending:
    003535  4a  4 00274 2301 20   3403            lbar      pds$base_addr_reg   get base address reg for new process
DBG(88355796)> CPU TRACE: 00044:003535 0 400242230120 (LBAR PR4|242,N*)

00044:003551 bound_tc_priv:pxss+03551
    003551  0a   003571 6010 00   3421            tnz       already_got_stack_0 Doin' fine, no problem.
DBG(88355820)> CPU TRACE: 00044:003551 0 003571601000 (TNZ 003571)

00044:003571 bound_tc_priv:pxss+03571
    003571                        3439  already_got_stack_0:
    003571  aa  2 00001 2351 00   3441            lda       bp|apte.flags       Load flags once again
DBG(88355822)> CPU TRACE: 00044:003571 0 200001235100 (LDA PR2|1)

00044:003603 bound_tc_priv:pxss+03603
    003603  0a   003611 6000 00   3452            tze       no_idle_no_del      was none, ok to idle
DBG(88355842)> CPU TRACE: 00044:003603 0 003611600000 (TZE 003611)

00044:003611 bound_tc_priv:pxss+03611
    003611                        3458  no_idle_no_del:
    003611  aa  2 00013 7271 00   3460            lxl7      bp|apte.savex7      restore x7 for return
DBG(88355844)> CPU TRACE: 00044:003611 0 200013727100 (LXL7 PR2|13)

0044:003612 bound_tc_priv:pxss+03612
    003612  aa   000000 7100 17   3461            tra       0,7
DBG(88355846)> CPU TRACE: 00044:003612 0 000000710017 (TRA 000000,7)

00044:000365 bound_tc_priv:pxss+0365
    000365                         622  pre_empt.return:
    000365  4a  4 00110 2341 20    623            szn       pds$pre_empt_poll_return      zero => this process entered pre-empt through connect fault
DBG(88355848)> CPU TRACE: 00044:000365 0 400050234120 (SZN PR4|50,N*)

00044:000366 bound_tc_priv:pxss+0366
    000366  4a  4 00112 6001 20    624            tze       wired_fim$pre_empt_return     Return from connect
DBG(88355850)> CPU TRACE: 00044:000366 0 400052600120 (TZE PR4|52,N*)

00034:002452 bound_interceptors:wired_fim+0242
    000242                         450  pre_empt_return:
    000242  0a   000016 3700 20    451            epplp     my_linkage_ptr,*
DBG(88355852)> CPU TRACE: 00034:002452 0 002226370020 (EPP4 002226,N*)

    000244  4a  4 00110 7001 20    454            tsx0      fim_util$restore_mask  restore old controller mask
DBG(88355856)> CPU TRACE: 00034:002454 0 003114700000 (TSX0 003114)

00034:003114 bound_interceptors:fim_util+0124
    000124                         236  restore_mask:
    000124  4a  4 00016 7213 20    237            lxl1      prds$processor_tag  CPU tag in X1
DBG(88355858)> CPU TRACE: 00034:003114 0 400044721320 (LXL1 PR4|44,N*)

00034:003122 bound_interceptors:fim_util+0132
    000132  aa   000000 7102 10    244            tra       0,0                 return to caller
DBG(88355872)> CPU TRACE: 00034:003122 0 000000710210 (TRA 000000,0)

00034:002455 bound_interceptors:wired_fim+0245
    000245                         455            inhibit   on        <+><+><+><+><+><+><+><+><+><+><+><+>
    000245  aa  7 00022 2373 00    457            ldaq      sb|stack_header.stack_begin_ptr  restore prds to virgin state
DBG(88355874)> CPU TRACE: 00034:002455 0 700022237300 (LDAQ PR7|22)

00034:002457 bound_interceptors:wired_fim+0247
    000247  4a  4 00042 7003 20    461            tsx0      fim_util$v_time_calc  compute virtual time
DBG(88355878)> CPU TRACE: 00034:002457 0 003420700200 (TSX0 003420)

00034:003420 bound_interceptors:fim_util+0430
    000430                         552  v_time_calc:
    000430  aa   000001 3352 07    553            lca       1,dl                decrement virtual time count
DBG(88355880)> CPU TRACE: 00034:003420 0 000001335207 (LCA 000001,DL)

00034:003435 bound_interceptors:fim_util+0445
    000445  aa   000000 7102 10    571            tra       0,0
DBG(88355906)> CPU TRACE: 00034:003435 0 000000710210 (TRA 000000,0)

00034:002460 bound_interceptors:wired_fim+0250
    000250  aa  2 00050 4677 00    463            lpl       bp|mc.eis_info      restore EIS ptrs and lths
DBG(88355908)> CPU TRACE: 00034:002460 0 200050467700 (LPL PR2|50)

00034:002463 bound_interceptors:wired_fim+0253
    000253  0a   000004 6132 20    466            rcu       int_scuinfo,*       bye-bye
DBG(88355914)> CPU TRACE: 00034:002463 0 002214613220 (RCU 002214,N*)

Back in idle loop

                                   348  " The following code checks to see if a connect has been delayed.
                                   349  " This is done by seeing if the ring alarm register is nonzero.
                                   350  " If it is, the connect must be reissued now.               

00061:000310 init_processor:+0310
    000310                         346            inhibit   on        <+><+><+><+><+><+><+><+><+><+><+><+>
    000310  4a  4 00064 2143 20    352            sznc      pds$alarm_ring  is ring alarm set?
DBG(88355916)> CPU TRACE: 00061:000310 0 400064214320 (SZNC PR4|64,N*)

00061:000312 init_processor:+0312
    000312  4a  4 00056 0153 20    354            cioc      scs$cow_ptrs,1*
DBG(88355920)> CPU TRACE: 00061:000312 0 400056015320 (CIOC PR4|56,N*)

00061:000327 init_processor:+0327
    000327  0a   000307 7102 00    372            tra       idle_dis
DBG(88355946)> CPU TRACE: 00061:000327 0 000307710200 (TRA 000307)

DBG(88355949)> CPU FAULT: Fault 8(010), sub 0(00), 'Connect'
DBG(88355951)> CPU TRACE: 000310 0 000520657220 (SCU 000520,N*)

Thats wrong; the Connect Fault should be in the DIS...
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License