Tim's IT-Blog

Just a blog about IT and IT-Problems…

A analysis of irregular, random SonicWALL NSA reboots by using TSR/StackTraces

by admin on 19/06/2013

On one your SonicWALL firewall we had irregular reboots and failovers on a HA-Configuration. We got more information on downloading the “technical support report (TSR)” and found some stack traces on this report. (If you found such stack traces on this report normaly all you can do is to open a support case at SonicWALL or update the SonicWALL to the lasted (stable) version)

In your case we analysed the problem with the Level3-Support of sonicwall.

Informations inside the TSR:

#version 1.0

Status
Serial number 0017-xxxx-xxxx
Registration code: xxxxxxxx
Current Product Code: 61xx
Base Product Code:    61xx
Expanded Product Code:N/A
Board Type: 0x10
Board ID: 0x4
06/13/2013 14:58:27.160
SonicWALL has been up: 0 Days, 1 Hour, 41 Minutes, 25 Seconds
Firmware version: SonicOS Enhanced 5.8.1.9-58o
Hardware IRV: cannot detect
No debug symbols in firmware
restartRequired: False
Revision:  5.8.1.9-58o
ROM version 5.0.3.3
Previous firmware version: none
min firmware for this hardware: 4.0.0.0
max firmware for this hardware: 0.0.0.0
vers check err: 0
Crypto level: domestic
Processor= “8 x 550 MHz Mips64 Octeon Processor”
Model= NSA 4500
Resource language: eng
512 MB RAM, 512 MB Flash

06/09/2013 17:26:02.640 System Startup F:5.8.1.9-58o T:base S:5.0.2.10 R: 5.0.3.3 0017xxxxxxxx p: 6105; b: 6100
06/09/2013 19:21:29.848 System Startup F:5.8.1.9-58o T:base S:5.0.2.10 R: 5.0.3.3 0017xxxxxxxx p: 6105; b: 6100
06/10/2013 13:09:30.640Reboot due to DP Core 3, Exception06/10/2013 13:09:30.640Core Trace 3:
cause: tlb store
r0:0

cause: tlb store
CAUSE:4000800c
STAT:50100ce3
BADVA:4
EPC:10073340
Stack trace:
0x1007312c

Up: 0 d, 17:48:24
06/10/2013 13:10:32.800 System Startup F:5.8.1.9-58o T:base S:5.0.2.10 R: 5.0.3.3 0017xxxxxxxx p: 6105; b: 6100
06/10/2013 18:34:50.912Reboot due to DP Core 2, Exception06/10/2013 18:34:50.912Core Trace 2:
cause: tlb store
r0:0

cause: tlb store
CAUSE:4000800c
STAT:50100ce3
BADVA:4
EPC:10073340
Stack trace:
0x1007312c

Up: 0 d, 5:24:41
06/10/2013 18:35:52.720 System Startup F:5.8.1.9-58o T:base S:5.0.2.10 R: 5.0.3.3 0017xxxxxxxx p: 6105; b: 6100
06/11/2013 14:16:00.608Reboot due to DP Core 3, Exception06/11/2013 14:16:00.608Core Trace 3:
cause: tlb store
r0:0

cause: tlb store
CAUSE:4000800c
STAT:50100ce3
BADVA:4
EPC:10073340
Stack trace:
0x1007312c

Up: 0 d, 19:40:30
06/11/2013 14:17:04.112 System Startup F:5.8.1.9-58o T:base S:5.0.2.10 R: 5.0.3.3 0017xxxxxxxx p: 6105; b: 6100
06/12/2013 11:35:16.816Reboot due to DP Core 2, Exception06/12/2013 11:35:16.816Core Trace 2:
cause: tlb store

cause: tlb store
CAUSE:4000800c
STAT:50100ce3
BADVA:4
EPC:10073340
Stack trace:
0x1007312c

Up: 0 d, 21:18:37
06/12/2013 11:36:20.944 System Startup F:5.8.1.9-58o T:base S:5.0.2.10 R: 5.0.3.3 0017xxxxxxxx p: 6105; b: 6100
06/12/2013 12:19:19.560Reboot due to task suspension06/12/2013 12:19:19.560Task Trace tTmrTask:
tTmrTask    827337e0     8e060a90  15 STOP       82b54884 8e0606d0  ad0003     0

epc:0x82b54884
Stack trace of tTmrTask:
0x82b6dcfc
0x82733810
0x82732fc8
0x82733068
0x8272bf2c
0x8272cadc
0x8272c094
0x827770e0
0x827757d0
0x824a8ae4
0x824c0bec
0x824c0708
0x82b533cc
0x82b54e2c
0x82b54884
Up: 0 d, 0:43:23
06/12/2013 12:20:21.896 System Startup F:5.8.1.9-58o T:base S:5.0.2.10 R: 5.0.3.3 0017xxxxxxxx p: 6105; b: 6100
06/12/2013 14:56:40.048Reboot due to DP Core 4, Exception06/12/2013 14:56:40.048Core Trace 4:
cause: tlb store

cause: tlb store
CAUSE:4000800c
STAT:50100ce3
BADVA:4
EPC:10073340
Stack trace:
0x1007312c

Up: 0 d, 2:36:42
06/12/2013 14:57:42.848 System Startup F:5.8.1.9-58o T:base S:5.0.2.10 R: 5.0.3.3 0017xxxxxxxx p: 6105; b: 6100
06/12/2013 16:27:20.864 System Startup F:5.8.1.9-58o T:base S:5.0.2.10 R: 5.0.3.3 0017xxxxxxxx p: 6105; b: 6100
06/12/2013 16:47:48.848 System Startup F:5.8.1.9-58o T:base S:5.0.2.10 R: 5.0.3.3 0017xxxxxxxx p: 6105; b: 6100
06/13/2013 10:02:21.464Reboot due to DP Core 4, Exception06/13/2013 10:02:21.464Core Trace 4:
cause: tlb store

cause: tlb store
CAUSE:4000800c
STAT:50100ce3
BADVA:4
EPC:10073340
Stack trace:
0x1007312c

Up: 0 d, 17:14:56
06/13/2013 10:03:24.864 System Startup F:5.8.1.9-58o T:base S:5.0.2.10 R: 5.0.3.3 0017xxxxxxxx p: 6105; b: 6100
06/13/2013 10:48:04.480Reboot due to DP Core 2, Exception06/13/2013 10:48:04.480Core Trace 2:
cause: tlb store

cause: tlb store
CAUSE:4000800c
STAT:50100ce3
BADVA:4
EPC:10073340
Stack trace:
0x1007312c

Up: 0 d, 0:45:4
06/13/2013 10:49:06.848 System Startup F:5.8.1.9-58o T:base S:5.0.2.10 R: 5.0.3.3 0017xxxxxxxx p: 6105; b: 6100
06/13/2013 11:39:16.304Reboot due to DP Core 4, Exception06/13/2013 11:39:16.304Core Trace 4:
cause: tlb store

cause: tlb store
CAUSE:4000800c
STAT:50100ce3
BADVA:4
EPC:10073340
Stack trace:
0x1007312c

Up: 0 d, 0:50:33
06/13/2013 11:40:18.864 System Startup F:5.8.1.9-58o T:base S:5.0.2.10 R: 5.0.3.3 0017xxxxxxxx p: 6105; b: 6100
06/13/2013 13:16:23.496Reboot due to DP Core 2, Exception06/13/2013 13:16:23.496Core Trace 2:
cause: tlb store

cause: tlb store
CAUSE:4000800c
STAT:50100ce3
BADVA:4
EPC:10073340
Stack trace:
0x1007312c

Up: 0 d, 1:36:29
06/13/2013 13:17:25.912 System Startup F:5.8.1.9-58o T:base S:5.0.2.10 R: 5.0.3.3 0017xxxxxxxx p: 6105; b: 6100

On the Core-Informations (here 1 to x and not 0) you can see that the problem is not on the “core-funtions” (like taskmanager etc.) because they run on core 0.

The SonicWALL support analysed that the function mz_allow failed (by decoding the firmware and search for EPC:10073340,Stack trace: 0x1007312c)

On TraceLog_xxxx.wri we found some more informations:

06/13 15:35:56.224: CRITICAL – Informational: lock_timeout_warn:59:
06/13 15:35:56.224: CRITICAL – spinlock 0x83502ff0 is held for more than 1100000000 ticks. Currently held by DP core 2
06/13 15:35:58.224: swAssert – swAssert failed: 0, file ../../../../firmware/fw/lib/sw_spinlock.c, line 156
06/13 15:35:58.224: stacktrace –
06/13 15:36:29.224: CRITICAL – Informational: dpCoresCheck:764:
06/13 15:36:29.224: CRITICAL – Exception occurred in DP Core 2; unit will be rebooted …
06/13 15:36:31.368: reboot – *** software watchdog reboot ***
showed 48 log events
log core 01: num exceptions 0, state events logged 0, missed 0 – critical events logged 4, missed 0
——-critial (non-wrapping) log:
06/13 15:35:50.240: CRITICAL – Informational: lock_timeout_warn:59:
06/13 15:35:50.240: CRITICAL – spinlock 0x83502ff0 is held for more than 1100000000 ticks. Currently held by DP:2
06/13 15:35:52.240: swAssert – swAssert failed: 0, file ../../../../firmware/fw/lib/sw_spinlock.c, line 156
06/13 15:35:52.240: stacktrace –
showed 4 log events
log core 02: num exceptions 1, time 08/01 07:00:00.896: state events logged 0, missed 0 – critical events logged 5, missed 0
******************************************************************
Stored DP side: Core 2: Unhandled Exception.
Firmware Version: SonicOS Enhanced 5.8.1.9-58o
Directory: /depot-155518-18/Octeon/5.8.1/m2/target/oct_mips64/sw_octeon4500-cp-base
cause: tlb store
******************************************************************
r0 ($00): 0x0000000000000000       s0 ($16): 0xffffffff83502fa0
at ($01): 0x0000000000000000       s1 ($17): 0xffffffff88d551c0
v0 ($02): 0x0000000000000000       s2 ($18): 0x00000000000000c0
v1 ($03): 0xffffffff88d5d680       s3 ($19): 0x0000000000000000
a0 ($04): 0x000000000000155e       s4 ($20): 0xffffffff83502ff0
a1 ($05): 0x00000000101c6328       s5 ($21): 0x00000000101c0000
a2 ($06): 0x000000000000082e       s6 ($22): 0xffffffff83fe2740
a3 ($07): 0x000000000c34eff8       s7 ($23): 0x0000000001c82468
t0 ($08): 0x0000000000000020       t8 ($24): 0x00000000000000a8
t1 ($09): 0x0000000000000008       t9 ($25): 0x0000000000000033
t2 ($10): 0x00000000000f4242       k0 ($26): 0x0000000010127da0
t3 ($11): 0x0000000000000001       k1 ($27): 0xffffffff80001190
t4 ($12): 0xffffffff83030000       gp ($28): 0x00000000120148b0
t5 ($13): 0x0000000000000033       sp ($29): 0x000000001601f2f8
t6 ($14): 0x000000000000000c       s8 ($30): 0x0000000000000001
t7 ($15): 0x00000000000000c0       ra ($31): 0x000000001007312c
COP0_CAUSE: 0x000000004000800c
COP0_STATUS: 0x0000000050100ce3
COP0_BADVADDR: 0x0000000000000004
COP0_EPC: 0x0000000010073340
******************************************************************
Stack trace:
0x1007312c

——-critial (non-wrapping) log:
06/13 15:35:48.256: exception – DP side: Core 2: Unhandled Exception.
06/13 15:35:48.256: regset –
r0 ($00): 0x0000000000000000       s0 ($16): 0xffffffff83502fa0
at ($01): 0x0000000000000000       s1 ($17): 0xffffffff88d551c0
v0 ($02): 0x0000000000000000       s2 ($18): 0x00000000000000c0
v1 ($03): 0xffffffff88d5d680       s3 ($19): 0x0000000000000000
a0 ($04): 0x000000000000155e       s4 ($20): 0xffffffff83502ff0
a1 ($05): 0x00000000101c6328       s5 ($21): 0x00000000101c0000
a2 ($06): 0x000000000000082e       s6 ($22): 0xffffffff83fe2740
a3 ($07): 0x000000000c34eff8       s7 ($23): 0x0000000001c82468
t0 ($08): 0x0000000000000020       t8 ($24): 0x00000000000000a8
t1 ($09): 0x0000000000000008       t9 ($25): 0x0000000000000033
t2 ($10): 0x00000000000f4242       k0 ($26): 0x0000000010127da0
t3 ($11): 0x0000000000000001       k1 ($27): 0xffffffff80001190
t4 ($12): 0xffffffff83030000       gp ($28): 0x00000000120148b0
t5 ($13): 0x0000000000000033       sp ($29): 0x000000001601f2f8
t6 ($14): 0x000000000000000c       s8 ($30): 0x0000000000000001
t7 ($15): 0x00000000000000c0       ra ($31): 0x000000001007312c
06/13 15:35:48.256: exc-regset –
COP0_CAUSE: 0x000000004000800c
COP0_STATUS: 0x0000000050100ce3
COP0_BADVADDR: 0x0000000000000004
COP0_EPC: 0x0000000010073340
06/13 15:35:48.256: stacktrace –
06/13 15:35:48.256: exception – *** exception reboot ***
showed 5 log events
log core 03: num exceptions 0, state events logged 0, missed 0 – critical events logged 4, missed 0
——-critial (non-wrapping) log:
06/13 15:35:50.224: CRITICAL – Informational: lock_timeout_warn:59:
06/13 15:35:50.224: CRITICAL – spinlock 0x83502ff0 is held for more than 1100000000 ticks. Currently held by DP:2
06/13 15:35:52.224: swAssert – swAssert failed: 0, file ../../../../firmware/fw/lib/sw_spinlock.c, line 156
06/13 15:35:52.224: stacktrace –
showed 4 log events
log core 04: num exceptions 0, state events logged 0, missed 0 – critical events logged 4, missed 0
——-critial (non-wrapping) log:
06/13 15:35:50.240: CRITICAL – Informational: lock_timeout_warn:59:
06/13 15:35:50.240: CRITICAL – spinlock 0x83502ff0 is held for more than 1100000000 ticks. Currently held by DP:2
06/13 15:35:52.240: swAssert – swAssert failed: 0, file ../../../../firmware/fw/lib/sw_spinlock.c, line 156
06/13 15:35:52.240: stacktrace –
showed 4 log events
log core 05: num exceptions 0, state events logged 0, missed 0 – critical events logged 4, missed 0
——-critial (non-wrapping) log:
06/13 15:35:50.224: CRITICAL – Informational: lock_timeout_warn:59:
06/13 15:35:50.224: CRITICAL – spinlock 0x83502ff0 is held for more than 1100000000 ticks. Currently held by DP:2
06/13 15:35:52.224: swAssert – swAssert failed: 0, file ../../../../firmware/fw/lib/sw_spinlock.c, line 156
06/13 15:35:52.224: stacktrace –
showed 4 log events
log core 06: num exceptions 0, state events logged 0, missed 0 – critical events logged 4, missed 0
——-critial (non-wrapping) log:
06/13 15:35:50.224: CRITICAL – Informational: lock_timeout_warn:59:
06/13 15:35:50.224: CRITICAL – spinlock 0x83502ff0 is held for more than 1100000000 ticks. Currently held by DP:2
06/13 15:35:52.224: swAssert – swAssert failed: 0, file ../../../../firmware/fw/lib/sw_spinlock.c, line 156
06/13 15:35:52.224: stacktrace –
showed 4 log events
log core 07: num exceptions 0, state events logged 0, missed 0 – critical events logged 4, missed 0
——-critial (non-wrapping) log:
06/13 15:35:50.224: CRITICAL – Informational: lock_timeout_warn:59:
06/13 15:35:50.224: CRITICAL – spinlock 0x83502ff0 is held for more than 1100000000 ticks. Currently held by DP:2
06/13 15:35:52.224: swAssert – swAssert failed: 0, file ../../../../firmware/fw/lib/sw_spinlock.c, line 156
06/13 15:35:52.224: stacktrace –

Spinlock are use by VPNs and/or SSL-VPNs. In our case the bug was found on the VPN-Impemtaion and was fixed on the new availabe early firmware release.