Encountering the dreaded “SCSI_DeviceClusteringClearState” PSOD (Purple Screen of Death) during vMotion operations in your VMware ESXi environment? This is a common yet challenging issue that administrators face when running Microsoft Cluster Service (MSCS) or Oracle RAC virtual machines.

The “SCSI_DeviceClusteringClearState” PSOD occurs mainly on ESXi 6.0 and 6.5 hosts when an MSCS or Oracle RAC virtual machine is migrated with vMotion. It is triggered by a misconfiguration: shared disks that are non-RDM disks (VMDKs or VVols) attached to a SCSI controller set to physical bus sharing.

 

 

1. ESXi “SCSI_DeviceClusteringClearState” Error: Symptoms and Environment Analysis

Key Symptoms

When this PSOD occurs, you’ll typically see a backtrace similar to this:

Backtrace for current CPU #xx, worldID=xyxyxy, fp=0x2005
0xyyyzyyyxyzzy:[0xxxxxxyxxxxxx]SCSI_DeviceClusteringClearState@vmkernel#nover+0x8 
0xyyyzyyyxyyyy:[0xxxzxxxxxxxxx]VSCSI_DestroyDevice@vmkernel#nover+0x2b8 
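If you have saved the console backtrace to a text file (for example, by transcribing the PSOD screen), a quick check for the two frames above helps confirm that you are looking at this issue rather than a different PSOD. This is a minimal sketch: the function name is hypothetical and the file path is yours to supply.

```shell
# Hypothetical helper: succeed (exit status 0) if a saved backtrace contains
# both signature frames of this PSOD, fail (1) otherwise.
# Assumption: the backtrace has been saved as plain text in the file $1.
is_clustering_clearstate_psod() {
    trace_file=$1
    grep -q 'SCSI_DeviceClusteringClearState' "$trace_file" &&
        grep -q 'VSCSI_DestroyDevice' "$trace_file"
}
```

Example: `is_clustering_clearstate_psod /tmp/psod-backtrace.txt && echo "matches SCSI_DeviceClusteringClearState PSOD"`.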

Affected Environments

| Component | Details |
|---|---|
| ESXi versions | ESXi 6.0, 6.5 |
| VM types | MSCS VMs, Oracle RAC VMs (including VMs on VVol datastores) |
| SCSI bus sharing | Physical mode |
| Cluster configuration | CAB (Cluster Across Boxes) |
| Disk types | Shared non-RDM disks (VMDK, VVol) |

The issue is most likely to occur in the following scenarios:

  • vMotion operations in physical bus sharing mode
  • Clustering node VMs containing shared non-RDM disks
  • CAB configurations with SCSI bus sharing set to Physical
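To see whether a VM matches these conditions, you can inspect its .vmx file, where controllers and disks appear as keys such as scsi1.sharedBus and scsi1:0.fileName. The sketch below (hypothetical function name, assuming standard `key = "value"` lines) prints the disk files attached to controllers set to physical bus sharing.

```shell
# Hypothetical helper: print the disk files attached to SCSI controllers
# whose sharedBus is "physical", sorted by name.
# Assumption: .vmx lines have the form  key = "value"
physical_bus_disks() {
    awk -F'"' '
        {
            # $1 is the key plus " = ", $2 is the quoted value.
            key = tolower($1)
            gsub(/[ \t=]/, "", key)
        }
        key ~ /^scsi[0-9]+\.sharedbus$/ && tolower($2) == "physical" {
            ctrl = key
            sub(/\.sharedbus$/, "", ctrl)
            shared[ctrl] = 1
        }
        key ~ /^scsi[0-9]+:[0-9]+\.filename$/ {
            ctrl = key
            sub(/:.*/, "", ctrl)
            disk[key] = $2      # file name, original case preserved
            owner[key] = ctrl
        }
        END {
            for (k in disk)
                if (owner[k] in shared) print disk[k]
        }
    ' "$1" | sort
}
```

Run it against each cluster node's .vmx file; any disk it prints that is not an RDM mapping file matches the affected configuration.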

 

 

2. ESXi “SCSI_DeviceClusteringClearState”: Root Cause Analysis

The PSOD is caused by a misconfiguration: non-RDM shared disks attached to a controller in physical bus sharing mode, which then fail when the VM is migrated with vMotion.

Breaking it down further:

  1. SCSI-3 persistent reservation conflict: MSCS uses SCSI-3 persistent reservations to control access to shared disks, but this reservation information isn’t properly transferred during vMotion.
  2. Non-RDM disk handling error: Regular VMDKs or VVols are handled improperly in physical bus sharing environments.
  3. VMkernel device state cleanup failure: The crash occurs in SCSI_DeviceClusteringClearState, which VSCSI_DestroyDevice calls while tearing down the VM’s virtual SCSI device after vMotion completes, as the backtrace above shows.
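Whether a disk is an RDM can be read from its descriptor file: physical-mode RDM mapping files carry `createType="vmfsPassthroughRawDeviceMap"` and virtual-mode ones `createType="vmfsRawDeviceMap"`. A minimal sketch that classifies a descriptor (the small .vmdk text file, not the -flat or -rdmp data file); the function name and output labels are illustrative.

```shell
# Hypothetical helper: classify a VMDK descriptor by its createType line.
# Prints physical-rdm, virtual-rdm, or non-rdm.
disk_kind() {
    ctype=$(sed -n 's/^createType *= *"\([^"]*\)".*/\1/p' "$1" | head -n 1)
    case $ctype in
        vmfsPassthroughRawDeviceMap) echo physical-rdm ;;
        vmfsRawDeviceMap)            echo virtual-rdm ;;
        *)                           echo non-rdm ;;
    esac
}
```

A shared disk on a physical bus sharing controller that reports `non-rdm` is the configuration described in point 2.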

 

 

3. Official Patch Resolution

ESXi Patches

VMware has provided official patches to address this issue:

| ESXi version | Patch name | Reference |
|---|---|---|
| ESXi 6.5 | ESXi650-201811002 | VMware official documentation |
| ESXi 6.0 | ESXi600-201909001 | VMware official documentation |
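To pick the right row on a given host, you can match the release string printed by `vmware -v` (for example `VMware ESXi 6.5.0 build-<number>`). This sketch uses a hypothetical helper name and only tells you which patch applies; to confirm whether it is already installed, compare the host's build number with the one in the patch's release notes.

```shell
# Hypothetical helper: map a `vmware -v` release string to the patch
# listed in the table above. Returns 1 if no patch is listed.
required_patch() {
    case $1 in
        *"ESXi 6.5."*) echo ESXi650-201811002 ;;
        *"ESXi 6.0."*) echo ESXi600-201909001 ;;
        *) echo "no patch listed for: $1"; return 1 ;;
    esac
}
```

On the host itself: `required_patch "$(vmware -v)"`.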

Patch Application Process

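  1. Connect to the vSphere Client.
  2. Place the host in maintenance mode (migrate or shut down its VMs first).
  3. Select the host and open the Update Manager tab.
  4. Download and install the patch for your ESXi version.
  5. Reboot the host, then exit maintenance mode.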
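If Update Manager is not available, a downloaded offline bundle can be installed from the ESXi Shell with `esxcli`. The sketch below assumes the bundle zip has already been copied to a datastore (the path is a placeholder and the helper names are hypothetical). It defaults to a dry run that only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Dry-run wrapper: print each command; run it unless DRY_RUN is 1 (default).
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# Hypothetical helper: install an offline bundle already on a datastore.
# $1 must be an absolute path, e.g. /vmfs/volumes/<datastore>/<bundle>.zip
patch_host() {
    bundle=$1
    run esxcli system maintenanceMode set --enable true || return 1
    run esxcli software vib update -d "$bundle" || return 1
    run reboot
}
```

After the host comes back up, exit maintenance mode with `esxcli system maintenanceMode set --enable false`.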

Important: Always back up your entire environment before applying patches, and perform updates during scheduled maintenance windows.

 

 

4. Workaround Solutions

For environments where immediate patch application isn’t feasible, consider these alternative approaches.

Method 1: Shared Storage Configuration Change

Reconfigure to use supported shared storage configurations for MSCS:

Recommended Configurations:

  • Single Host Cluster (Cluster in a Box): Use one or more shared eagerzeroedthick virtual disks on a controller with virtual bus sharing
  • Physical RDM: Use RDMs in physical compatibility mode
  • Virtual RDM: Use RDMs in virtual compatibility mode

Method 2: SCSI Controller Separation

Boot Disk SCSI Controller:
- Bus Sharing: None
- Purpose: System disk (C:)

Cluster Shared Disk SCSI Controller:
- Bus Sharing: Physical
- Purpose: Cluster shared disks only
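This layout can be checked from the VM's .vmx file. The sketch assumes the boot disk sits on scsi0 (the usual default) and uses a hypothetical function name; it fails if scsi0 has any bus sharing or if no other controller is set to physical.

```shell
# Hypothetical helper: verify the boot controller (assumed scsi0) is not
# shared and that another SCSI controller uses physical bus sharing.
check_controller_separation() {
    vmx=$1
    # A missing scsi0.sharedBus key means no sharing (treated as "none").
    boot=$(grep -i '^scsi0\.sharedBus' "$vmx" | sed 's/.*"\(.*\)".*/\1/' | tr 'A-Z' 'a-z')
    if [ -n "$boot" ] && [ "$boot" != none ]; then
        echo "scsi0 (boot) has bus sharing: $boot"
        return 1
    fi
    if ! grep -iqE '^scsi[1-9][0-9]*\.sharedBus *= *"physical"' "$vmx"; then
        echo "no controller with physical bus sharing found"
        return 1
    fi
    echo ok
}
```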

Method 3: vMotion Restriction

As a temporary measure, disable vMotion for MSCS VMs:

  1. Select the VM in vSphere Client
  2. Go to Configure > VM Options > vMotion
  3. Select Disabled

 

 

5. Optimized MSCS Configuration

RDM Configuration Best Practices

| Component | Recommended setting |
|---|---|
| RDM mode | Physical compatibility mode |
| Storage protocol | FC, FCoE, native iSCSI |
| Path policy | Round Robin (preferred), Fixed, MRU |
| Virtual hardware version | 11 or higher |
| vMotion network | 10GbE minimum (1GbE not supported) |
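The virtual hardware requirement can be verified from the virtualHW.version key in the VM's .vmx file. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: succeed if the .vmx declares virtual hardware 11+.
hw_version_ok() {
    ver=$(sed -n 's/^virtualHW\.version *= *"\([0-9]*\)".*/\1/p' "$1" | head -n 1)
    [ -n "$ver" ] && [ "$ver" -ge 11 ]
}
```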

Detailed Configuration Steps

Step 1: SCSI Controller Separation

SCSI0: Boot disk (Bus Sharing: None)
SCSI1: Cluster shared disk (Bus Sharing: Physical)

Step 2: RDM Setup

  • Select Physical Compatibility mode
  • Mark the shared RDM LUNs as perennially reserved on every host (this prevents slow host boots and storage rescans)
  • Assign consistent SCSI IDs across all ESXi hosts
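On each ESXi host, the perennially reserved flag is set per device with `esxcli storage core device setconfig`. The naa IDs you pass are placeholders for your shared RDM LUNs, the helper names are hypothetical, and the script defaults to a dry run (DRY_RUN=0 executes).

```shell
# Dry-run wrapper: print each command; run it unless DRY_RUN is 1 (default).
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# Hypothetical helper: mark every device ID given as an argument as
# perennially reserved. Run it on each host that can run a cluster node.
mark_perennially_reserved() {
    for dev in "$@"; do
        run esxcli storage core device setconfig -d "$dev" \
            --perennially-reserved=true || return 1
    done
}
```

Confirm the result with `esxcli storage core device list -d <naa.id>`, which should report `Is Perennially Reserved: true`.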

Step 3: Network Configuration

  • Configure dedicated heartbeat network
  • Ensure 10GbE network for vMotion
  • Create DRS anti-affinity rules to keep the cluster nodes on separate hosts

 

 

6. Monitoring and Prevention

Log Monitoring

Monitor the following logs regularly to catch warning signs that may precede a PSOD:

# Follow VMkernel log entries that mention SCSI or clustering
tail -f /var/run/log/vmkernel.log | grep -iE "scsi|cluster"

# Follow vMotion-related entries in the VMkernel log
tail -f /var/run/log/vmkernel.log | grep -i "migrate"
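Reservation conflicts on the shared LUNs are one signal worth tracking. Failed SCSI commands in vmkernel.log carry a device status field, and D:0x18 is the SCSI RESERVATION CONFLICT status. The sketch below counts those entries per device. It assumes the common NMP message layout with a quoted `dev "naa..."` field, which can vary between releases, and the function name is hypothetical.

```shell
# Hypothetical helper: count RESERVATION CONFLICT (D:0x18) failures per
# device in a vmkernel log. Output lines: "<count> <device id>".
# Assumption: failure lines contain: dev "<naa id>" ... D:0x18
reservation_conflicts() {
    grep 'D:0x18' "$1" |
        sed -n 's/.*dev "\([^"]*\)".*/\1/p' |
        sort | uniq -c
}
```

Some conflicts are normal on cluster disks, because the passive node is refused access by design; a sudden change around vMotion events is more telling than the raw count.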

Regular Check Items

| Check item | Frequency | Method |
|---|---|---|
| Patch level | Monthly | vSphere Update Manager |
| SCSI configuration | Quarterly | VM settings review |
| Storage health | Weekly | Array log check |
| vMotion performance | Real-time | vCenter monitoring |
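For the quarterly SCSI configuration review, a loop over a datastore's VM folders can list every VM that has a controller set to physical bus sharing. This is a sketch that assumes the usual `<datastore>/<vm>/<vm>.vmx` layout; the function name is hypothetical.

```shell
# Hypothetical helper: list .vmx files under the given datastore paths
# that contain a SCSI controller with physical bus sharing.
find_physical_bus_vms() {
    for ds in "$@"; do
        for vmx in "$ds"/*/*.vmx; do
            [ -f "$vmx" ] || continue   # glob matched nothing
            if grep -iqE '^scsi[0-9]+\.sharedBus *= *"physical"' "$vmx"; then
                echo "$vmx"
            fi
        done
    done
}
```

Pass datastore paths such as `/vmfs/volumes/datastore1` (an example name). Datastore names under /vmfs/volumes are symlinks to UUID directories, so passing both forms would list the same VMs twice.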

Backup Strategy

MSCS environments have special backup considerations:

  • Use agent-based (in-guest) backup solutions: VM snapshots, which image-level backup tools rely on, are not supported on VMs with SCSI bus sharing enabled
  • Implement cluster-aware backup solutions
  • Maintain application-level backups

Compatibility Matrix

Check the latest compatibility information in the VMware Compatibility Guide.

 

 

While the “SCSI_DeviceClusteringClearState” PSOD may seem complex, it’s entirely manageable with the right understanding and systematic approach. The most reliable solution is applying the official patches, but the workarounds presented here provide viable alternatives for environments where immediate patching isn’t possible. Prevention is key. When setting up MSCS or Oracle RAC environments, follow VMware’s recommended configuration guidelines from the start and implement regular monitoring to catch issues early.

For additional technical support, contact VMware official support or visit the Broadcom Support Portal for expert assistance.

 
