Sun Solaris XSCF Fault Diagnosis
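1. showhardconf
The showhardconf command displays information about each FRU. The following information can be displayed:
■ Current configuration and status
■ Number of installed FRUs
■ Domain information
■ IOBOX information
■ Name properties of PCI cards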
XSCF> showhardconf
SPARC Enterprise M4000;
+ Serial:BDF1115196; Operator_Panel_Switch:Locked;
+ Power_Supply_System:Single; SCF-ID:XSCF#0;
+ System_Power:On; System_Phase:Cabinet Power On;
Domain#0 Domain_Status:Running;
MBU_A Status:Normal; Ver:4301h; Serial:BD1114008E ;
+ FRU-Part-Number:CF00541-4359 01 /541-4359-01 ;
+ Memory_Size:64 GB;
+ Type:2;
CPUM#0-CHIP#0 Status:Normal; Ver:0601h; Serial:PP105300QG ;
+ FRU-Part-Number:CA06761-D205 C1 /371-4932-03 ;
+ Freq:2.660 GHz; Type:48;
+ Core:4; Strand:2;
CPUM#0-CHIP#1 Status:Normal; Ver:0601h; Serial:PP105300QG ;
+ FRU-Part-Number:CA06761-D205 C1 /371-4932-03 ;
+ Freq:2.660 GHz; Type:48;
+ Core:4; Strand:2;
CPUM#1-CHIP#0 Status:Normal; Ver:0601h; Serial:PP104903Y5 ;
+ FRU-Part-Number:CA06761-D205 C1 /371-4932-03 ;
+ Freq:2.660 GHz; Type:48;
+ Core:4; Strand:2;
CPUM#1-CHIP#1 Status:Normal; Ver:0601h; Serial:PP104903Y5 ;
+ FRU-Part-Number:CA06761-D205 C1 /371-4932-03 ;
+ Freq:2.660 GHz; Type:48;
+ Core:4; Strand:2;
MEMB#0 Status:Normal; Ver:0101h; Serial:BF1109220C ;
+ FRU-Part-Number:CF00541-0545 09 /541-0545-09 ;
MEM#0A Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-3f244b4c;
+ Type:2A; Size:2 GB;
MEM#0B Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-3f83e611;
+ Type:2A; Size:2 GB;
MEM#1A Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-3f53e611;
+ Type:2A; Size:2 GB;
MEM#1B Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-3f444b4b;
+ Type:2A; Size:2 GB;
* MEM#2A Status:Degraded;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-3f63e609;
+ Type:2A; Size:2 GB;
MEM#2B Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-3f83e5fa;
+ Type:2A; Size:2 GB;
MEM#3A Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-3f444b4c;
+ Type:2A; Size:2 GB;
MEM#3B Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-3f344b4c;
+ Type:2A; Size:2 GB;
MEMB#1 Status:Normal; Ver:0101h; Serial:BF1036E3DX ;
+ FRU-Part-Number:CF00541-0545 09 /541-0545-09 ;
MEM#0A Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-5274b16d;
+ Type:2A; Size:2 GB;
MEM#0B Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-5214c262;
+ Type:2A; Size:2 GB;
MEM#1A Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-5234c261;
+ Type:2A; Size:2 GB;
MEM#1B Status:Normal;
+ Code:ce0000000000000001M3 93T5660QZA-CE6 4151-481382de;
+ Type:2A; Size:2 GB;
MEM#2A Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-5e649f87;
+ Type:2A; Size:2 GB;
MEM#2B Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-5264b175;
+ Type:2A; Size:2 GB;
MEM#3A Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-5274b170;
+ Type:2A; Size:2 GB;
MEM#3B Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-5234c268;
+ Type:2A; Size:2 GB;
MEMB#2 Status:Normal; Ver:0101h; Serial:BF1051HK5T ;
+ FRU-Part-Number:CF00541-0545 09 /541-0545-09 ;
MEM#0A Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4833ce5e;
+ Type:2A; Size:2 GB;
MEM#0B Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4813ce45;
+ Type:2A; Size:2 GB;
MEM#1A Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4843ce5f;
+ Type:2A; Size:2 GB;
MEM#1B Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4833ce5c;
+ Type:2A; Size:2 GB;
MEM#2A Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4813ce5e;
+ Type:2A; Size:2 GB;
MEM#2B Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4883341c;
+ Type:2A; Size:2 GB;
MEM#3A Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-48833439;
+ Type:2A; Size:2 GB;
MEM#3B Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-48733428;
+ Type:2A; Size:2 GB;
MEMB#3 Status:Normal; Ver:0101h; Serial:BF1040EUC8 ;
+ FRU-Part-Number:CF00541-0545 09 /541-0545-09 ;
MEM#0A Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4823a1a3;
+ Type:2A; Size:2 GB;
MEM#0B Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-48731182;
+ Type:2A; Size:2 GB;
MEM#1A Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4823a19c;
+ Type:2A; Size:2 GB;
MEM#1B Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-48631182;
+ Type:2A; Size:2 GB;
MEM#2A Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4823a19a;
+ Type:2A; Size:2 GB;
MEM#2B Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4833a19a;
+ Type:2A; Size:2 GB;
MEM#3A Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-48831186;
+ Type:2A; Size:2 GB;
MEM#3B Status:Normal;
+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4813a1a2;
+ Type:2A; Size:2 GB;
DDC_A#0 Status:Normal;
DDC_A#1 Status:Normal;
DDC_B#0 Status:Normal;
IOU#0 Status:Normal; Ver:0101h; Serial:BF110617KB ;
+ FRU-Part-Number:CF00541-2240 05 /541-2240-05 ;
+ Type:1;
DDC_A#0 Status:Normal;
DDCR Status:Normal;
DDC_B#0 Status:Normal;
PCI#2 Name_Property:network; Card_Type:Other;
PCI#3 Name_Property:SUNW,qlc; Card_Type:Other;
PCI#4 Name_Property:SUNW,qlc; Card_Type:Other;
XSCFU Status:Normal,Active; Ver:0101h; Serial:BF11071FKN ;
+ FRU-Part-Number:CF00541-0481 05 /541-0481-05 ;
OPNL Status:Normal; Ver:0101h; Serial:NN11052TLU ;
+ FRU-Part-Number:CF00541-0850 06 /541-0850-06 ;
PSU#0 Status:Normal; Serial:0017527-1108023275;
+ FRU-Part-Number:CF00300-2311 0150 /300-2311-01-50;
+ Power_Status:On; AC:200 V;
PSU#1 Status:Normal; Serial:0017527-1012024046;
+ FRU-Part-Number:CF00300-2011 0250 /300-2011-02-50;
+ Power_Status:On; AC:200 V;
FAN_A#0 Status:Normal;
FAN_A#1 Status:Normal;
FANBP_B Status:Normal; Ver:0401h; Serial:NN110736WD ;
+ FRU-Part-Number:CF00541-3098 01 /541-3098-01 ;
FAN_B#0 Status:Normal;
FAN_B#1 Status:Normal;
XSCF>
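When the listing is this long, a degraded unit such as MEM#2A is easy to miss. The following Python sketch scans saved showhardconf text for components whose Status is not Normal. The function name `find_abnormal_frus` and the `sample` text are assumptions made for illustration, and the pattern relies only on the line format visible in the output above.

```python
import re

# Matches lines such as "* MEM#2A Status:Degraded;" or
# "XSCFU Status:Normal,Active; Ver:0101h; ...".
# Lines like "Domain#0 Domain_Status:Running;" are intentionally not matched.
STATUS_RE = re.compile(r"^\*?\s*(?P<fru>\S+)\s+Status:(?P<status>[^;]+);")

def find_abnormal_frus(text):
    """Return (fru, status) pairs whose status does not start with 'Normal'."""
    abnormal = []
    for line in text.splitlines():
        m = STATUS_RE.match(line.strip())
        if m and not m.group("status").startswith("Normal"):
            abnormal.append((m.group("fru"), m.group("status")))
    return abnormal

# Hypothetical excerpt, copied from the output format shown above.
sample = """\
MEMB#0 Status:Normal; Ver:0101h; Serial:BF1109220C ;
MEM#1B Status:Normal;
* MEM#2A Status:Degraded;
XSCFU Status:Normal,Active; Ver:0101h; Serial:BF11071FKN ;
"""
print(find_abnormal_frus(sample))  # [('MEM#2A', 'Degraded')]
```

Because DIMM names such as MEM#2A repeat under every memory board, the result still has to be matched against the enclosing MEMB# entry by reading the surrounding lines.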
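2. showlogs
The showlogs command displays the contents of a specified log in timestamp order, starting with the oldest entry. It can display the following logs:
■ Error log
■ Power log
■ Event log
■ Temperature and humidity history
■ Monitoring message log
■ Console message log
■ Panic message log
■ IPL message log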
XSCF> showlogs error
Date: May 05 15:03:27 CST 2014 Code: 80002000-c6ff0000-0104340700000000
Status: Alarm Occurred: May 05 15:03:26.996 CST 2014
FRU: /FAN_A#0
Msg: Unit disappeared unexpectedly
Date: May 05 15:04:23 CST 2014 Code: 80002000-c6ff0000-0104080100000000
Status: Alarm Occurred: May 05 15:04:23.572 CST 2014
FRU: /FAN_A#0
Msg: Unit detected unexpectedly
Date: May 05 15:06:53 CST 2014 Code: 80002000-c6ff0000-0104340700000000
Status: Alarm Occurred: May 05 15:06:53.420 CST 2014
FRU: /FAN_A#0
Msg: Unit disappeared unexpectedly
Date: May 05 15:07:34 CST 2014 Code: 80002000-c6ff0000-0104080100000000
Status: Alarm Occurred: May 05 15:07:34.836 CST 2014
FRU: /FAN_A#0
Msg: Unit detected unexpectedly
Date: Feb 07 13:20:46 CST 2016 Code: 80002000-c3ff0000-0104320100000000
Status: Alarm Occurred: Feb 07 13:20:44.966 CST 2016
FRU: /PSU#1
Msg: PSU failed
Date: Jan 23 02:36:06 CST 2018 Code: 60000000-8a2a0000-10cc000000000000
Status: Warning Occurred: Jan 23 02:36:05.765 CST 2018
FRU: /MBU_A/MEMB#1/MEM#1B
Msg: DIMM permanent correctable error
Date: Sep 06 13:11:15 CST 2018 Code: 60000000-8a2a0000-10cc000000000000
Status: Warning Occurred: Sep 06 13:11:15.396 CST 2018
FRU: /MBU_A/MEMB#0/MEM#2A
Msg: DIMM permanent correctable error
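To see which FRUs recur in the error log, the entries can be split into records and counted. The sketch below is a minimal Python example that assumes each entry follows the four-line Date/Code, Status/Occurred, FRU, Msg layout shown above; `parse_error_log` and `sample` are hypothetical names used only here.

```python
import re
from collections import Counter

# One log entry: "Date: ... Code: ..." / "Status: ... Occurred: ..." /
# "FRU: ..." / "Msg: ...", separated by whitespace or newlines.
ENTRY_RE = re.compile(
    r"Date:\s*(?P<date>.+?)\s+Code:\s*(?P<code>\S+)\s+"
    r"Status:\s*(?P<status>\S+)\s+Occurred:\s*(?P<occurred>.+?)\s+"
    r"FRU:\s*(?P<fru>\S+)\s+"
    r"Msg:\s*(?P<msg>[^\n]+)"
)

def parse_error_log(text):
    """Return one dict per log entry found in the text."""
    return [m.groupdict() for m in ENTRY_RE.finditer(text)]

# Three entries copied from the log above.
sample = """\
Date: May 05 15:03:27 CST 2014 Code: 80002000-c6ff0000-0104340700000000
Status: Alarm Occurred: May 05 15:03:26.996 CST 2014
FRU: /FAN_A#0
Msg: Unit disappeared unexpectedly
Date: May 05 15:04:23 CST 2014 Code: 80002000-c6ff0000-0104080100000000
Status: Alarm Occurred: May 05 15:04:23.572 CST 2014
FRU: /FAN_A#0
Msg: Unit detected unexpectedly
Date: Feb 07 13:20:46 CST 2016 Code: 80002000-c3ff0000-0104320100000000
Status: Alarm Occurred: Feb 07 13:20:44.966 CST 2016
FRU: /PSU#1
Msg: PSU failed
"""
entries = parse_error_log(sample)
per_fru = Counter(e["fru"] for e in entries)
print(per_fru.most_common())  # [('/FAN_A#0', 2), ('/PSU#1', 1)]
```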
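3. showstatus
The showstatus command displays information about degraded FRUs in the server. A degraded unit is marked with an asterisk (*) and shown with one of the following states:
■ Normal
■ Faulted
■ Degraded
■ Deconfigured
■ Maintenance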
XSCF> showstatus
MBU_A Status:Normal;
MEMB#0 Status:Normal;
* MEM#2A Status:Degraded;
4. fmdump
bash-3.2# fmdump
TIME UUID SUNW-MSG-ID
Sep 06 13:04:37.2512 168620e1-a275-e9ed-bbff-d8f9da784bc8 SUN4U-8000-2S
bash-3.2# fmdump -V -u 168620e1-a275-e9ed-bbff-d8f9da784bc8
TIME UUID SUNW-MSG-ID
Sep 06 2018 13:04:37.251251000 168620e1-a275-e9ed-bbff-d8f9da784bc8 SUN4U-8000-2S
nvlist version: 0
version = 0x0
class = list.suspect
uuid = 168620e1-a275-e9ed-bbff-d8f9da784bc8
code = SUN4U-8000-2S
diag-time = 1536210277 204244
de = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = fmd
authority = (embedded nvlist)
nvlist version: 0
version = 0x0
product-id = SUNW,SPARC-Enterprise
chassis-id = BDF1115196
server-id = sunm4k_1
(end authority)
mod-name = cpumem-diagnosis
mod-version = 1.7
(end de)
fault-list-sz = 0x1
topo-uuid = 4ede8959-9768-eb1c-b6f5-f9f9af63c97c
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = fault.memory.dimm
certainty = 0x5f
asru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = mem
unum = /MBU_A/MEMB0/MEM2A
serial = 3F63E609:HYMP125P72CP4-Y5
authority = (embedded nvlist)
nvlist version: 0
product-id = SUNW,SPARC-Enterprise
server-id = sunm4k_1
(end authority)
(end asru)
fru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = mem
unum = /MBU_A/MEMB0/MEM2A
serial = 3F63E609:HYMP125P72CP4-Y5
authority = (embedded nvlist)
nvlist version: 0
product-id = SUNW,SPARC-Enterprise
server-id = sunm4k_1
(end authority)
(end fru)
(end fault-list[0])
fault-status = 0x1
severity = Major
__ttl = 0x1
__tod = 0x5b90b565 0xef9c938
bash-3.2#
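With the -V option, the user sees at least three additional lines of output:
■ The first line is a summary of information previously shown in the console message, but it now includes the timestamp, UUID, and message ID.
■ … A diagnosis may involve more than one component, in which case multiple lines are displayed; for example, two lines are shown here, each describing one component.
■ The line beginning with "rsrc" identifies the component that was taken out of service as a result of this fault.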
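Several fields in the fmdump -V listing are raw numbers. As a small worked example, the Python sketch below decodes the values printed above: `certainty = 0x5f`, `diag-time = 1536210277 204244`, and the first word of `__tod`. Treating CST as UTC+8 is an assumption; it is consistent with the TIME column reading 13:04:37. The function name `decode_fault_fields` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in this system's output means UTC+8.
CST = timezone(timedelta(hours=8))

def decode_fault_fields(certainty_hex, diag_seconds, tod_hex):
    """Convert raw fmdump -V values into human-readable form."""
    return {
        "certainty_percent": int(certainty_hex, 16),
        "diag_time": datetime.fromtimestamp(diag_seconds, tz=CST).isoformat(),
        "tod_seconds": int(tod_hex, 16),
    }

decoded = decode_fault_fields("0x5f", 1536210277, "0x5b90b565")
print(decoded["certainty_percent"])  # 95
print(decoded["diag_time"])          # 2018-09-06T13:04:37+08:00
print(decoded["tod_seconds"])        # 1536210277
```

The decoded certainty of 95 matches the 95% that fmadm faulty reports for the same event, and the first `__tod` word is the same epoch second as `diag-time`.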
bash-3.2# fmdump -e
TIME CLASS
Jan 23 2018 02:01:14 ereport.asic.mac.mi-ce
Jan 23 2018 02:01:14 ereport.asic.mac.ptrl-ce
Jan 23 2018 02:01:24 ereport.asic.mac.mi-ce
Jan 23 2018 02:01:35 ereport.asic.mac.mi-ce
Jan 23 2018 02:01:35 ereport.asic.mac.ptrl-ce
Jan 23 2018 02:01:46 ereport.asic.mac.mi-ce
Jan 23 2018 02:01:57 ereport.asic.mac.ptrl-ce
………
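The error report list can be long, so a per-class tally is often easier to read. The following Python sketch counts ereport classes in saved fmdump -e text; it assumes the class is the last whitespace-separated field on each line, as in the output above, and `count_ereport_classes` is a hypothetical helper name.

```python
from collections import Counter

def count_ereport_classes(text):
    """Count how many times each ereport class appears."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and any line that does not end in an ereport class.
        if fields and fields[-1].startswith("ereport."):
            counts[fields[-1]] += 1
    return counts

# The seven entries visible in the listing above.
sample = """\
TIME CLASS
Jan 23 2018 02:01:14 ereport.asic.mac.mi-ce
Jan 23 2018 02:01:14 ereport.asic.mac.ptrl-ce
Jan 23 2018 02:01:24 ereport.asic.mac.mi-ce
Jan 23 2018 02:01:35 ereport.asic.mac.mi-ce
Jan 23 2018 02:01:35 ereport.asic.mac.ptrl-ce
Jan 23 2018 02:01:46 ereport.asic.mac.mi-ce
Jan 23 2018 02:01:57 ereport.asic.mac.ptrl-ce
"""
counts = count_ereport_classes(sample)
print(counts.most_common())
# [('ereport.asic.mac.mi-ce', 4), ('ereport.asic.mac.ptrl-ce', 3)]
```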
5. fmadm faulty/config
bash-3.2# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Sep 06 13:04:37 168620e1-a275-e9ed-bbff-d8f9da784bc8 SUN4U-8000-2S Major
Host : sunm4k_1
Platform : SUNW,SPARC-Enterprise Chassis_id : BDF1115196
Product_sn :
Fault class : fault.memory.dimm 95%
Affects : mem:///unum=/MBU_A/MEMB0/MEM2A
faulted but still in service
FRU : mem:///unum=/MBU_A/MEMB0/MEM2A 95%
faulty
Serial ID. : 3F63E609:HYMP125P72CP4-Y5
Description : The number of correctable errors associated with this memory
module has exceeded acceptable levels.
Response : Pages of memory associated with this memory module have been
removed from service, up to a limit which has now been reached.
Impact : Total system memory capacity has been reduced.
Action : Use 'fmadm faulty' to provide a more detailed view of this event.
Please refer to the associated reference document at
for the latest service
procedures and policies regarding this diagnosis.
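The DIMM named by Solaris can be cross-checked against the XSCF view. In this document's output, the unum `/MBU_A/MEMB0/MEM2A` corresponds to the XSCF path `/MBU_A/MEMB#0/MEM#2A` with the `#` characters removed, and the Serial ID `3F63E609:HYMP125P72CP4-Y5` reappears inside the showhardconf Code string for that DIMM. The Python sketch below encodes those two observations. Both rules are inferred from this one sample and are assumptions, not a documented format, and the function names are hypothetical.

```python
def same_fru(xscf_path, fma_unum):
    """Compare an XSCF FRU path with an FMA unum, ignoring '#' characters."""
    return xscf_path.replace("#", "") == fma_unum

def serial_matches_code(fma_serial, hardconf_code):
    """Check an FMA 'SERIAL:PART' string against a showhardconf Code string.

    Assumption: the Code string contains the part number and ends with
    '-<serial>' in lower case, as in the sample output.
    """
    serial, _, part = fma_serial.partition(":")
    code = hardconf_code.lower()
    return code.endswith("-" + serial.lower()) and part.lower() in code

fma_serial = "3F63E609:HYMP125P72CP4-Y5"
mem2a_code = "ad0000000000000001HYMP125P72CP4-Y5 4141-3f63e609"  # MEMB#0 MEM#2A
mem0a_code = "ad0000000000000001HYMP125P72CP4-Y5 4141-3f244b4c"  # MEMB#0 MEM#0A

print(same_fru("/MBU_A/MEMB#0/MEM#2A", "/MBU_A/MEMB0/MEM2A"))  # True
print(serial_matches_code(fma_serial, mem2a_code))               # True
print(serial_matches_code(fma_serial, mem0a_code))               # False
```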
bash-3.2# fmadm config
MODULE VERSION STATUS DESCRIPTION
cpumem-diagnosis 1.7 active CPU/Memory Diagnosis
cpumem-retire 1.1 active CPU/Memory Retire Agent
disk-transport 1.0 active Disk Transport Agent
eft 1.16 active eft diagnosis engine
event-transport 2.0 active Event Transport Module
ext-event-transport 0.1 active External FM event transport
fabric-xlate 1.0 active Fabric Ereport Translater
fmd-self-diagnosis 1.0 active Fault Manager Self-Diagnosis
fps-transport 1.0 active Solaris FP-Scrubber
io-retire 1.0 active I/O Retire Agent
snmp-trapgen 1.0 active SNMP Trap Generation Agent
sysevent-transport 1.0 active SysEvent Transport Agent
syslog-msgs 1.0 active Syslog Messaging Agent
zfs-diagnosis 1.0 active ZFS Diagnosis Engine
zfs-retire 1.0 active ZFS Retire Agent
6. fmstat
XSCF> fmstat
module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz
eft 0 0 0.0 0.0 0 0 0 0 3.3M 0
event-transport 0 0 0.0 0.0 0 0 0 0 6.4K 0
faultevent-post 2 0 0.0 8.9 0 0 0 0 0 0
fmd-self-diagnosis 24 24 0.0 352.1 0 0 1 0 24b 0
iox_agent 0 0 0.0 0.0 0 0 0 0 0 0
reagent 0 0 0.0 0.0 0 0 0 0 0 0
sysevent-transport 0 0 0.0 8700.4 0 0 0 0 0 0
syslog-msgs 0 0 0.0 0.0 0 0 0 0 97b 0
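Most modules in this table are idle, so it can help to list only those that have received events. The Python sketch below parses whitespace-separated fmstat text using the header row as column names; `modules_with_events` is a hypothetical helper, and the sample rows are copied from the output above.

```python
def modules_with_events(text):
    """Return {module: ev_recv} for modules that received at least one event."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    result = {}
    for row in rows:
        record = dict(zip(header, row))  # column name -> value
        received = int(record["ev_recv"])
        if received > 0:
            result[record["module"]] = received
    return result

sample = """\
module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz
eft 0 0 0.0 0.0 0 0 0 0 3.3M 0
faultevent-post 2 0 0.0 8.9 0 0 0 0 0 0
fmd-self-diagnosis 24 24 0.0 352.1 0 0 1 0 24b 0
syslog-msgs 0 0 0.0 0.0 0 0 0 0 97b 0
"""
print(modules_with_events(sample))
# {'faultevent-post': 2, 'fmd-self-diagnosis': 24}
```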