[VMware] ESXi "SCSI_DeviceClusteringClearState" vMotion PSOD - Hayden's Server Room

Encountering the dreaded “SCSI_DeviceClusteringClearState” PSOD (Purple Screen of Death) during vMotion operations in your VMware ESXi environment? This is a common yet challenging issue that administrators face when running Microsoft Cluster Service (MSCS) or Oracle RAC virtual machines.

The “SCSI_DeviceClusteringClearState” PSOD primarily occurs on ESXi 6.0 and 6.5 hosts when an MSCS or Oracle RAC virtual machine is migrated with vMotion. The crash is triggered when the VM has shared non-RDM disks (VMDK or VVOL) attached to a SCSI controller in physical bus sharing mode.

1. ESXi “SCSI_DeviceClusteringClearState” Error: Symptoms and Environment Analysis

Key Symptoms

When this PSOD occurs, you’ll typically see a backtrace similar to this:

Backtrace for current CPU #xx, worldID=xyxyxy, fp=0x2005
0xyyyzyyyxyzzy:[0xxxxxxyxxxxxx]SCSI_DeviceClusteringClearState@vmkernel#nover+0x8 
0xyyyzyyyxyyyy:[0xxxzxxxxxxxxx]VSCSI_DestroyDevice@vmkernel#nover+0x2b8

Affected Environments

Component	Details
ESXi Versions	ESXi 6.0, 6.5
VM Types	MSCS VM, Oracle RAC VM (with VMDK- or VVOL-backed shared disks)
SCSI Bus Sharing	Physical Mode
Cluster Configuration	CAB (Cluster Across Box)
Disk Types	Shared non-RDM disk (VMDK, VVOL)

The PSOD is most likely in the following scenarios:

vMotion operations in physical bus sharing mode
Clustering node VMs containing shared non-RDM disks
CAB configurations with SCSI bus sharing set to Physical
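To find VMs that match these scenarios, one starting point is the VM's .vmx file, where bus sharing is recorded per controller in a line such as scsi1.sharedBus = "physical". The following is a minimal POSIX shell sketch under that assumption; the function name is illustrative and not part of any VMware tooling.

```shell
#!/bin/sh
# Print the SCSI controllers in a .vmx file whose bus sharing is "physical".
# Assumes the usual .vmx syntax: scsiN.sharedBus = "physical"
list_physical_bus_controllers() {
    vmx_file="$1"
    # Keep only the controller name (scsi0, scsi1, ...) from matching lines.
    sed -n 's/^[[:space:]]*\(scsi[0-9][0-9]*\)\.sharedBus[[:space:]]*=[[:space:]]*"physical".*$/\1/p' "$vmx_file"
}
```

It could be called as list_physical_bus_controllers /vmfs/volumes/datastore1/node1/node1.vmx, where the path is a placeholder for the VM being checked. An empty result means no controller in that file is set to physical bus sharing.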

2. ESXi “SCSI_DeviceClusteringClearState”: Root Cause Analysis

The core cause of this PSOD is a non-RDM disk (regular VMDK or VVOL) attached to a SCSI controller in physical bus sharing mode when the VM is migrated with vMotion.

Breaking it down further:

SCSI-3 Persistent Reservations Conflict: MSCS uses SCSI-3 Persistent Reservations to control access to shared disks, but this reservation information isn’t properly transferred during vMotion
Non-RDM Disk Handling Error: Regular VMDKs or VVOLs are handled improperly when attached to a controller in physical bus sharing mode
VMkernel Device State Cleanup Failure: An exception occurs while the virtual SCSI device is torn down after vMotion completes, which is where VSCSI_DestroyDevice calls SCSI_DeviceClusteringClearState in the backtrace

3. Official Patch Resolution

ESXi Patches

VMware has provided official patches to address this issue:

ESXi Version	Patch Name	Reference
ESXi 6.5	ESXi650-201811002	VMware Official Documentation
ESXi 6.0	ESXi600-201909001	VMware Official Documentation

Patch Application Process

Connect to vSphere Client
Navigate to Host > Update Manager
Download and install the respective patch
Reboot the host
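Where Update Manager is not available, an offline patch bundle can usually be applied from the ESXi shell with esxcli instead. The sketch below only prints the command sequence so it can be reviewed before anything is run; the function name and the bundle path are placeholders, and the exact procedure for a given patch should be checked against its release notes.

```shell
#!/bin/sh
# Print (do not execute) the esxcli steps for applying an offline patch bundle.
# $1 is the full datastore path to the patch .zip; the caller supplies it.
print_patch_commands() {
    bundle="$1"
    printf '%s\n' "esxcli system maintenanceMode set --enable true"   # evacuate VMs first
    printf '%s\n' "esxcli software vib update -d $bundle"              # update installed VIBs from the bundle
    printf '%s\n' "reboot"                                             # the patch takes effect after reboot
    printf '%s\n' "esxcli system version get"                          # confirm the new build afterwards
}
```

For example, print_patch_commands /vmfs/volumes/datastore1/ESXi650-201811002.zip prints the four commands for the 6.5 patch, assuming the bundle was uploaded to that (hypothetical) datastore path.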

Important: Always back up your entire environment before applying patches, and perform updates during scheduled maintenance windows.

4. Workaround Solutions

For environments where immediate patch application isn’t feasible, consider these alternative approaches.

Method 1: Shared Storage Configuration Change

Reconfigure to use supported shared storage configurations for MSCS:

Recommended Configurations:

Single Host Cluster: Use one or more shared eagerzeroedthick virtual disks
Physical RDM: Use RDMs in physical compatibility mode
Virtual RDM: Use RDMs in virtual compatibility mode
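Before reconfiguring, it helps to know which of these categories an existing shared disk falls into. On VMFS, the disk's descriptor file normally carries a createType field, which is vmfsPassthroughRawDeviceMap for a physical-mode RDM and vmfsRawDeviceMap for a virtual-mode RDM. The helper below is a rough sketch built on that field; the function name and output labels are made up for illustration, and VVOL-backed disks are not covered.

```shell
#!/bin/sh
# Classify a VMDK descriptor file as physical RDM, virtual RDM, or non-RDM
# based on its createType line, e.g. createType="vmfsPassthroughRawDeviceMap".
classify_vmdk() {
    descriptor="$1"
    create_type=$(sed -n 's/^createType="\([^"]*\)".*$/\1/p' "$descriptor" | head -n 1)
    case "$create_type" in
        vmfsPassthroughRawDeviceMap) echo "physical-rdm" ;;
        vmfsRawDeviceMap)            echo "virtual-rdm" ;;
        "")                          echo "unknown" ;;          # no createType line found
        *)                           echo "non-rdm ($create_type)" ;;
    esac
}
```

A result of non-rdm for a disk that sits on a physically shared controller is the combination this article is about.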

Method 2: SCSI Controller Separation

Boot Disk SCSI Controller:
- Bus Sharing: None
- Purpose: System disk (C:)

Cluster Shared Disk SCSI Controller:
- Bus Sharing: Physical
- Purpose: Cluster shared disks only
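To verify that the shared controller really carries cluster disks only, the disks attached to it can be listed from the .vmx file. This sketch assumes the common .vmx form scsi1:0.fileName = "disk.vmdk"; the function name is hypothetical.

```shell
#!/bin/sh
# List the disk files attached to one SCSI controller in a .vmx file.
# $1 = path to the .vmx file, $2 = controller name such as scsi1
list_disks_on_controller() {
    vmx_file="$1"
    controller="$2"
    # Match lines like: scsi1:0.fileName = "quorum.vmdk" and print the file name.
    sed -n "s/^[[:space:]]*${controller}:[0-9][0-9]*\.fileName[[:space:]]*=[[:space:]]*\"\([^\"]*\)\".*\$/\1/p" "$vmx_file"
}
```

Running it once for scsi0 and once for scsi1 shows at a glance whether the boot disk and the shared disks are on separate controllers.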

Method 3: vMotion Restriction

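As a temporary measure, keep the MSCS VMs from being migrated until the hosts are patched:

In vSphere Client, select the cluster and go to Configure > VM Overrides
Add an override for each MSCS VM and set its DRS automation level to Disabled
Do not start manual vMotion for these VMs; the override only stops DRS-initiated migrations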

5. Optimized MSCS Configuration

RDM Configuration Best Practices

Component	Recommended Setting
RDM Mode	Physical Compatibility Mode
Storage Protocol	FC, FCoE, Native iSCSI
Path Policy	Round Robin (preferred), Fixed, MRU
Virtual Hardware Version	11 or higher
vMotion Network	10GbE minimum (1GbE not supported)

Detailed Configuration Steps

Step 1: SCSI Controller Separation

SCSI0: Boot disk (Bus Sharing: None)
SCSI1: Cluster shared disk (Bus Sharing: Physical)

Step 2: RDM Setup

Select Physical Compatibility mode
Set the Perennially Reserved flag on each RDM LUN, on every host that can see it
Assign consistent SCSI IDs across all ESXi hosts
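The Perennially Reserved flag can be checked from the ESXi shell, since esxcli storage core device list reports it per device as "Is Perennially Reserved: true" or "false". The filter below is a small sketch that reads that output on standard input and prints the devices where the flag is still false; the function name is illustrative.

```shell
#!/bin/sh
# Read `esxcli storage core device list` output on stdin and print the
# identifiers of devices whose "Is Perennially Reserved" value is false.
list_not_perennially_reserved() {
    awk '
        /^[^ \t]/ { device = $1 }                          # unindented line starts a new device
        /Is Perennially Reserved: false/ { print device }  # attribute lines are indented
    '
}
```

On a host it would be used as esxcli storage core device list | list_not_perennially_reserved. For an RDM LUN that shows up in the result, the flag can then be set with esxcli storage core device setconfig -d <naa id> --perennially-reserved=true, substituting the real device identifier.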

Step 3: Network Configuration

Configure dedicated heartbeat network
Ensure 10GbE network for vMotion
Set up DRS Anti-affinity rules

6. Monitoring and Prevention

Log Monitoring

Monitor the following logs regularly to catch early warning signs before PSOD occurrence:

# Monitor VMkernel logs
tail -f /var/run/log/vmkernel.log | grep -i "scsi\|cluster"

# Check vMotion-related logs
tail -f /var/run/log/vmkernel.log | grep -i "migrate"
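For a periodic check rather than a live tail, the same keywords can be counted in one pass over the log file. This is only a sketch: the function name is made up, and the keyword list simply reuses the search terms from the commands above.

```shell
#!/bin/sh
# Count case-insensitive matches for each keyword in a log file.
# $1 = log file, for example /var/run/log/vmkernel.log
summarize_vmkernel_keywords() {
    log_file="$1"
    for keyword in scsi cluster migrate; do
        # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
        count=$(grep -ic "$keyword" "$log_file" || true)
        echo "$keyword: $count"
    done
}
```

A sudden jump in any of the counts between two runs is a cue to read the matching lines in full.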

Regular Check Items

Check Item	Frequency	Method
Patch Level	Monthly	vSphere Update Manager
SCSI Configuration	Quarterly	VM Settings Review
Storage Health	Weekly	Array Log Check
vMotion Performance	Real-time	vCenter Monitoring

Backup Strategy

MSCS environments have special backup considerations:

Use agent-based (in-guest) backup solutions, because snapshot-based VM backups are not available for VMs that use physical bus sharing
Implement cluster-aware backup solutions
Maintain application-level backups

Compatibility Matrix

Check the latest compatibility information in the VMware Compatibility Guide.

The “SCSI_DeviceClusteringClearState” PSOD looks intimidating, but it is manageable once the trigger is understood. Applying the official patch is the most reliable fix; the workarounds above cover environments that cannot be patched immediately. To avoid the problem in the first place, follow VMware’s recommended configuration guidelines when building MSCS or Oracle RAC environments, and monitor regularly to catch issues early.

For additional technical support, contact VMware official support or visit the Broadcom Support Portal for expert assistance.
