Microsoft Azure Stack HCI (2024)

Table of Contents
5.1. 17.0, 17.1 Only: XID error 120 causes multiple issues with NVIDIA vGPU on GPUs with a GSP
5.2. 17.0, 17.1 Only: XID error 119 causes the hypervisor host to hang or crash when multiple vGPU VMs are shut down
5.3. NVIDIA Control Panel is not available in multiuser environments
5.4. CUDA profilers cannot gather hardware metrics on NVIDIA vGPU
5.5. NVIDIA vGPU software graphics driver for Windows sends a remote call to ngx.download.nvidia.com
5.6. Multiple RDP session reconnections on Windows Server 2022 can consume all frame buffer
5.7. NLS client fails to acquire a license with the error The allowed time to process response has expired
5.8. With multiple active sessions, NVIDIA Control Panel incorrectly shows that the system is unlicensed
5.9. VP9 and AV1 decoding with web browsers are not supported on Microsoft Windows Server 2019
5.10. After an upgrade of the Linux graphics driver from a Debian package, the driver is not loaded into the VM
5.11. The reported NVENC frame rate is double the actual frame rate
5.12. NVENC does not work with Teradici Cloud Access Software on Windows
5.13. A licensed client might fail to acquire a license if a proxy is set
5.14. Session connection fails with four 4K displays and NVENC enabled on a 2Q, 3Q, or 4Q vGPU
5.15. Disconnected sessions cannot be reconnected or might be reconnected very slowly with NVWMI installed
5.16. Idle Teradici Cloud Access Software session disconnects from Linux VM
5.17. Idle NVIDIA A100, NVIDIA A40, and NVIDIA A10 GPUs show 100% GPU utilization
5.18. Guest VM frame buffer listed by nvidia-smi for vGPUs on GPUs that support SR-IOV is incorrect
5.19. Driver upgrade in a Linux guest VM with multiple vGPUs might fail
5.20. NVIDIA Control Panel fails to start if launched too soon from a VM without licensing information
5.21. On Linux, the frame rate might drop to 1 after several minutes
5.22. Microsoft DDA fails with some GPUs
5.23. DWM crashes randomly occur in Windows VMs
5.24. Citrix Virtual Apps and Desktops session freezes when the desktop is unlocked
5.25. NVIDIA vGPU software graphics driver fails after Linux kernel upgrade with DKMS enabled
5.26. Blue screen crash occurs or no devices are found after VM reset
5.27. ECC memory settings for a vGPU cannot be changed by using NVIDIA X Server Settings
5.28. Changes to ECC memory settings for a Linux vGPU VM by nvidia-smi might be ignored
5.29. Host core CPU utilization is higher than expected for moderate workloads
5.30. Frame capture while the interactive logon message is displayed returns blank screen
5.31. RDS sessions do not use the GPU with some Microsoft Windows Server releases
5.32. GPU engine utilization is reported as higher than expected for a fixed share vGPU
5.33. nvidia-smi reports that vGPU migration is supported on all hypervisors
5.34. A segmentation fault in DBus code causes nvidia-gridd to exit on Red Hat Enterprise Linux and CentOS
5.35. No Manage License option available in NVIDIA X Server Settings by default
5.36. Licenses remain checked out when VMs are forcibly powered off
5.37. VM bug checks after the guest VM driver for Windows 10 RS2 is installed
5.38. GNOME Display Manager (GDM) fails to start on Red Hat Enterprise Linux 7.2 and CentOS 7.0

5.1. 17.0, 17.1 Only: XID error 120 causes multiple issues with NVIDIA vGPU on GPUs with a GSP

Description

XID error 120 causes multiple issues with VMs configured with NVIDIA vGPU on a physical GPU that includes a GPU System Processor (GSP), such as GPUs based on the NVIDIA Ada Lovelace GPU architecture. Examples of these issues include:

  • VMs hang or crash.
  • VMs fail to power on after hanging or crashing.
  • The hypervisor host crashes.

Status

Resolved in NVIDIA vGPU software 17.2

Ref. #

4600308

5.2. 17.0, 17.1 Only: XID error 119 causes the hypervisor host to hang or crash when multiple vGPU VMs are shut down

Description

When multiple VMs configured with NVIDIA vGPU are shut down simultaneously, XID error 119 causes the hypervisor host to hang or crash. This issue affects VMs configured with NVIDIA vGPU on a physical GPU that includes a GPU System Processor (GSP), such as GPUs based on the NVIDIA Ada Lovelace GPU architecture.

Status

Resolved in NVIDIA vGPU software 17.2

Ref. #

4644559

5.3. NVIDIA Control Panel is not available in multiuser environments

Description

After the NVIDIA vGPU software graphics driver for Windows is installed, the NVIDIA Control Panel app might be missing from the system. This issue typically occurs in the following situations:

  • Multiple users connect to virtual machines by using remote desktop applications such as Microsoft RDP, VMware Horizon, and Citrix Virtual Apps and Desktops.
  • VM instances are created by using Citrix Machine Creation Services (MCS) or VMware Instant Clone technology.
  • Roaming user desktop profiles are deployed.

This issue occurs because the NVIDIA Control Panel app is now distributed through the Microsoft Store. The NVIDIA Control Panel app might fail to be installed when the NVIDIA vGPU software graphics driver for Windows is installed if the Microsoft Store app is disabled, the system is not connected to the Internet, or installation of apps from the Microsoft Store is blocked by your system settings. To determine whether the NVIDIA Control Panel app is installed on your system, use the Windows Settings app or the Get-AppxPackage Windows PowerShell command.

  • To use the Windows Settings app:

    1. From the Windows Start menu, choose Settings > Apps > Apps & features.
    2. In the Apps & features window, type nvidia control panel in the search box and confirm that the NVIDIA Control Panel app is found.
  • To use the Get-AppxPackage Windows PowerShell command:

    1. Run Windows PowerShell as Administrator.
    2. Determine whether the NVIDIA Control Panel app is installed for the current user.

      PS C:\> Get-AppxPackage -Name NVIDIACorp.NVIDIAControlPanel

    3. Determine whether the NVIDIA Control Panel app is installed for all users.

      PS C:\> Get-AppxPackage -AllUsers -Name NVIDIACorp.NVIDIAControlPanel

      This example shows that the NVIDIA Control Panel app is installed for the users Administrator, pliny, and trajan.

      PS C:\> Get-AppxPackage -AllUsers -Name NVIDIACorp.NVIDIAControlPanel

      Name                   : NVIDIACorp.NVIDIAControlPanel
      Publisher              : CN=D6816951-877F-493B-B4EE-41AB9419C326
      Architecture           : X64
      ResourceId             :
      Version                : 8.1.964.0
      PackageFullName        : NVIDIACorp.NVIDIAControlPanel_8.1.964.0_x64__56jybvy8sckqj
      InstallLocation        : C:\Program Files\WindowsApps\NVIDIACorp.NVIDIAControlPanel_8.1.964.0_x64__56jybvy8sckqj
      IsFramework            : False
      PackageFamilyName      : NVIDIACorp.NVIDIAControlPanel_56jybvy8sckqj
      PublisherId            : 56jybvy8sckqj
      PackageUserInformation : {S-1-12-1-530092550-1307989247-1105462437-500 [Administrator]: Installed,
                               S-1-12-1-530092550-1307989247-1105462437-1002 [pliny]: Installed,
                               S-1-12-1-530092550-1307989247-1105462437-1003 [trajan]: Installed}
      IsResourcePackage      : False
      IsBundle               : False
      IsDevelopmentMode      : False
      NonRemovable           : False
      IsPartiallyStaged      : False
      SignatureKind          : Store
      Status                 : Ok

Preventing this Issue

If your system does not allow the installation of apps from the Microsoft Store, download and run the standalone NVIDIA Control Panel installer that is available from the NVIDIA Licensing Portal. For instructions, refer to Virtual GPU Software User Guide. If your system allows the installation of apps from the Microsoft Store, ensure that:

  • The Microsoft Store app is enabled.
  • Installation of Microsoft Store apps is not blocked by your system settings.
  • No local or group policies are set to block Microsoft Store apps.

Workaround

If the NVIDIA Control Panel app is missing, install it separately from the graphics driver by downloading and running the standalone NVIDIA Control Panel installer that is available from the NVIDIA Licensing Portal. For instructions, refer to Virtual GPU Software User Guide.

If the issue persists, contact NVIDIA Enterprise Support for further assistance.

Status

Open

Ref. #

3999308

5.4. CUDA profilers cannot gather hardware metrics on NVIDIA vGPU

Description

NVIDIA CUDA Toolkit profilers cannot gather hardware metrics on NVIDIA vGPU. This issue affects only traces that gather hardware metrics. Other traces are not affected by this issue and work normally.

Version

This issue affects NVIDIA vGPU software releases starting with 15.2.

Status

Open

Ref. #

4041169

5.5. NVIDIA vGPU software graphics driver for Windows sends a remote call to ngx.download.nvidia.com

Description

After the NVIDIA vGPU software graphics driver for Windows has been installed in the guest VM, the driver sends a remote call to ngx.download.nvidia.com to download and install additional components. Such a remote call might be a security issue.

Workaround

Before running the NVIDIA vGPU software graphics driver installer, disable the remote call to ngx.download.nvidia.com by setting the following Windows registry key:

[HKEY_LOCAL_MACHINE\SOFTWARE\NVIDIA Corporation\Global\NGXCore]
"EnableOTA"=dword:00000000

Note:

If this Windows registry key is set to 1 or deleted, the remote call to ngx.download.nvidia.com is enabled again.
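
The same value can also be set from a Command Prompt or PowerShell window that is run as Administrator, for example with the reg.exe tool (a minimal equivalent of the registry snippet above):

reg add "HKLM\SOFTWARE\NVIDIA Corporation\Global\NGXCore" /v EnableOTA /t REG_DWORD /d 0 /f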

Status

Open

Ref. #

4031840

5.6. Multiple RDP session reconnections on Windows Server 2022 can consume all frame buffer

Description

Multiple RDP session reconnections in a Windows Server 2022 guest VM can consume all the frame buffer of a vGPU or physical GPU. When this issue occurs, users’ screens become black, their sessions are disconnected but left intact, and they cannot log on again. The following error message is written to the event log on the hypervisor host:

The Desktop Window Manager process has exited. (Process exit code: 0xe0464645, Restart count: 1, Primary display device ID: )

Version

This issue affects only the Windows Server 2022 guest OS.

Workaround

Periodically restart the Windows Server 2022 guest VM to prevent all frame buffer from being consumed.

Status

Open

Ref. #

3583766

5.7. NLS client fails to acquire a license with the error The allowed time to process response has expired

Description

A licensed client of NVIDIA License System (NLS) fails to acquire a license with the error The allowed time to process response has expired. This error can affect clients of a Cloud License Service (CLS) instance or a Delegated License Service (DLS) instance.

This error occurs when the time difference between the system clocks on the client and the server that hosts the CLS or DLS instance is greater than 10 minutes. A common cause of this error is the failure of either the client or the server to adjust its system clock when daylight saving time begins or ends. The failure to acquire a license is expected to prevent clock windback from causing licensing errors.

Workaround

Ensure that the system clock time of the client and any server that hosts a DLS instance match the current time in the time zone where they are located. To prevent this error from occurring when daylight saving time begins or ends, enable the option to automatically adjust the system clock for daylight saving time:

  • Windows: Set the Adjust for daylight saving time automatically option.
  • Linux: Use the hwclock command.
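
For example, on a Linux client or on a host for a DLS instance, you might verify the clock and write the NTP-disciplined system time to the hardware clock as follows (a sketch; exact commands depend on the distribution):

# Show the system time, time zone, and NTP synchronization status
timedatectl status
# Write the current system time to the hardware clock
sudo hwclock --systohc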

Status

Not a bug

Ref. #

3859889

5.8. With multiple active sessions, NVIDIA Control Panel incorrectly shows that the system is unlicensed

Description

In an environment with multiple active desktop sessions, the Manage License page of NVIDIA Control Panel shows that a licensed system is unlicensed. However, the nvidia-smi command and the management interface of the NVIDIA vGPU software license server correctly show that the system is licensed. When an active session is disconnected and reconnected, the NVIDIA Display Container service crashes.

The Manage License page incorrectly shows that the system is unlicensed because of stale data in NVIDIA Control Panel in an environment with multiple sessions. The data is stale because NVIDIA Control Panel fails to get and update the settings for remote sessions when multiple sessions or no sessions are active in the VM. The NVIDIA Display Container service crashes when a session is reconnected because the session is not active at the moment of reconnection.

Status

Open

Ref. #

3761243

5.9. VP9 and AV1 decoding with web browsers are not supported on Microsoft Windows Server 2019

Description

VP9 and AV1 decoding with web browsers are not supported on Microsoft Windows Server 2019 and later supported releases. This issue occurs because starting with Windows Server 2019, the required codecs are not included with the OS and are not available through the Microsoft Store app. As a result, hardware decoding is not available for viewing YouTube videos or using collaboration tools such as Google Meet in a web browser.

Version

This issue affects Microsoft Windows Server releases starting with Windows Server 2019.

Status

Not an NVIDIA bug

Ref. #

200756564

5.10. After an upgrade of the Linux graphics driver from a Debian package, the driver is not loaded into the VM

Description

After the NVIDIA vGPU software graphics driver for Linux is upgraded from a Debian package, the driver is not loaded into the VM.

Workaround

To load the driver into the VM, reboot the VM.

Status

Not a bug

Ref. #

200748806

5.11. The reported NVENC frame rate is double the actual frame rate

Description

The frame rate in frames per second (FPS) for the NVIDIA hardware-based H.264/HEVC video encoder (NVENC) reported by the nvidia-smi encodersessions command and NVWMI is double the actual frame rate. Only the reported frame rate is incorrect. The actual encoding of frames is not affected.

This issue affects only Windows VMs that are configured with NVIDIA vGPU.

Status

Open

Ref. #

2997564

5.12. NVENC does not work with Teradici Cloud Access Software on Windows

Description

The NVIDIA hardware-based H.264/HEVC video encoder (NVENC) does not work with Teradici Cloud Access Software on Windows. This issue affects NVIDIA vGPU and GPU pass-through deployments.

This issue occurs because the check that Teradici Cloud Access Software performs on the DLL signer name is case sensitive and NVIDIA recently changed the case of the company name in the signature certificate.

Status

Not an NVIDIA bug

This issue is resolved in the latest 21.07 and 21.03 Teradici Cloud Access Software releases.

Ref. #

200749065

5.13. A licensed client might fail to acquire a license if a proxy is set

Description

If a proxy is set with a system environment variable such as HTTP_PROXY or HTTPS_PROXY, a licensed client might fail to acquire a license.

Workaround

Perform this workaround on each affected licensed client.

  1. Add the address of the NVIDIA vGPU software license server to the system environment variable NO_PROXY.

    The address must be specified exactly as it is specified in the client’s license server settings either as a fully-qualified domain name or an IP address. If the NO_PROXY environment variable contains multiple entries, separate the entries with a comma (,).

    If high availability is configured for the license server, add the addresses of the primary license server and the secondary license server to the system environment variable NO_PROXY.

  2. Restart the NVIDIA driver service that runs the core NVIDIA vGPU software logic.

    • On Windows, restart the NVIDIA Display Container service.
    • On Linux, restart the nvidia-gridd service.
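
A sketch of this workaround on a Linux client, assuming a hypothetical license server license.example.com with a secondary server at 192.0.2.10 (substitute the addresses from your client's license server settings; set the variable system-wide, for example in /etc/environment, to make it persistent):

# Exempt the primary and secondary license servers from the proxy
export NO_PROXY=license.example.com,192.0.2.10
# Restart the service that runs the core NVIDIA vGPU software logic
sudo systemctl restart nvidia-gridd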

Status

Closed

Ref. #

200704733

5.14. Session connection fails with four 4K displays and NVENC enabled on a 2Q, 3Q, or 4Q vGPU

Description

Desktop session connections fail for a 2Q, 3Q, or 4Q vGPU that is configured with four 4K displays and for which the NVIDIA hardware-based H.264/HEVC video encoder (NVENC) is enabled. This issue affects only Teradici Cloud Access Software sessions on Linux guest VMs.

This issue is accompanied by the following error message:

This Desktop has no resources available or it has timed out

This issue is caused by insufficient frame buffer.

Workaround

Ensure that sufficient frame buffer is available for all the virtual displays that are connected to a vGPU by changing the configuration in one of the following ways:

  • Reducing the number of virtual displays. The number of 4K displays supported with NVENC enabled depends on the vGPU:

    vGPU    4K Displays Supported with NVENC Enabled
    2Q      1
    3Q      2
    4Q      3

  • Disabling NVENC. The number of 4K displays supported with NVENC disabled depends on the vGPU:

    vGPU    4K Displays Supported with NVENC Disabled
    2Q      2
    3Q      2
    4Q      4

  • Using a vGPU type with more frame buffer. Any Q-series vGPU with at least 6144 MB of frame buffer supports four 4K displays with NVENC enabled.

Status

Not an NVIDIA bug

Ref. #

200701959

5.15. Disconnected sessions cannot be reconnected or might be reconnected very slowly with NVWMI installed

Description

Disconnected sessions cannot be reconnected or might be reconnected very slowly when the NVIDIA Enterprise Management Toolkit (NVWMI) is installed. This issue affects Citrix Virtual Apps and Desktops and VMware Horizon sessions on Windows guest VMs.

Workaround

Uninstall NVWMI.

Status

Open

Ref. #

3262923

5.16. Idle Teradici Cloud Access Software session disconnects from Linux VM

Description

After a Teradici Cloud Access Software session has been idle for a short period of time, the session disconnects from the VM. When this issue occurs, the error messages NVOS status 0x19 and vGPU Message 21 failed are written to the log files on the hypervisor host. This issue affects only Linux guest VMs.

Status

Open

Ref. #

200689126

5.17. Idle NVIDIA A100, NVIDIA A40, and NVIDIA A10 GPUs show 100% GPU utilization

Description

The nvidia-smi command shows 100% GPU utilization for NVIDIA A100, NVIDIA A40, and NVIDIA A10 GPUs even if no vGPUs have been configured or no VMs are running.

[root@host ~]# nvidia-smi
Fri Jun 14 11:45:28 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.05    Driver Version: 550.90.05    CUDA Version: 12.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-PCIE-40GB      On   | 00000000:5E:00.0 Off |                    0 |
| N/A   50C    P0    97W / 250W |      0MiB / 40537MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Workaround

After this workaround has been completed, the nvidia-smi command shows 0% GPU utilization for affected GPUs when they are idle.

[root@host ~]# nvidia-smi
Fri Jun 14 11:47:38 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.05    Driver Version: 550.90.05    CUDA Version: 12.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-PCIE-40GB      On   | 00000000:5E:00.0 Off |                    0 |
| N/A   50C    P0    97W / 250W |      0MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Status

Open

Ref. #

200605527

5.18. Guest VM frame buffer listed by nvidia-smi for vGPUs on GPUs that support SR-IOV is incorrect

Description

The amount of frame buffer listed in a guest VM by the nvidia-smi command for vGPUs on GPUs that support Single Root I/O Virtualization (SR-IOV) is incorrect. Specifically, the amount of frame buffer listed is the amount of frame buffer allocated for the vGPU type minus the size of the VMMU segment (vmmu_page_size). Examples of GPUs that support SR-IOV are GPUs based on the NVIDIA Ampere architecture, such as the NVIDIA A100 PCIe 40GB or NVIDIA A100 HGX 40GB.

For example, frame buffer for -4C and -20C vGPU types is listed as follows:

  • For -4C vGPU types, frame buffer is listed as 3963 MB instead of 4096 MB.
  • For -20C vGPU types, frame buffer is listed as 20347 MB instead of 20480 MB.

Status

Open

Ref. #

200524749

5.19. Driver upgrade in a Linux guest VM with multiple vGPUs might fail

Description

Upgrading the NVIDIA vGPU software graphics driver in a Linux guest VM with multiple vGPUs might fail. This issue occurs if the driver is upgraded by overinstalling the new release of the driver on the current release of the driver while the nvidia-gridd service is running in the VM.

Workaround

  1. Stop the nvidia-gridd service.
  2. Try again to upgrade the driver.
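
A minimal sketch of these steps, assuming the new driver release is supplied as a .run installer (the file name is a placeholder):

# Stop the licensing service before overinstalling the driver
sudo systemctl stop nvidia-gridd
# Overinstall the new release of the driver on the current release
sudo sh ./NVIDIA-Linux-x86_64-<version>-grid.run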

Status

Open

Ref. #

200633548

5.20. NVIDIA Control Panel fails to start if launched too soon from a VM without licensing information

Description

If NVIDIA licensing information is not configured on the system, any attempt to start NVIDIA Control Panel by right-clicking on the desktop within 30 seconds of the VM being started fails.

Workaround

Restart the VM and wait at least 30 seconds before trying to launch NVIDIA Control Panel.

Status

Open

Ref. #

200623179

5.21. On Linux, the frame rate might drop to 1 after several minutes

Description

On Linux, the frame rate might drop to 1 frame per second (FPS) after NVIDIA vGPU software has been running for several minutes. Only some applications are affected, for example, glxgears. Other applications, such as Unigine Heaven, are not affected. This behavior occurs because Display Power Management Signaling (DPMS) for the Xorg server is enabled by default and the display is detected to be inactive even when the application is running. When DPMS is enabled, it enables power saving behavior of the display after several minutes of inactivity by setting the frame rate to 1 FPS.

Workaround

  1. If necessary, stop the Xorg server.

     

    # /etc/init.d/xorg stop

  2. In a plain text editor, edit the /etc/X11/xorg.conf file to set the options to disable DPMS and disable the screen saver.

    1. In the Monitor section, set the DPMS option to false.

      Option "DPMS" "false"

    2. At the end of the file, add a ServerFlags section that contains an option to disable the screen saver.

      Section "ServerFlags" Option "BlankTime" "0" EndSection

    3. Save your changes to the /etc/X11/xorg.conf file and quit the editor.
  3. Start the Xorg server.

    # /etc/init.d/xorg start

Status

Open

Ref. #

200605900

5.22. Microsoft DDA fails with some GPUs

Description

Microsoft Discrete Device Assignment (DDA) fails with GPUs that have more than 16 GB of GPU memory. After the NVIDIA vGPU software graphics driver is installed in the guest VM, a second display device appears on the GPU and the driver prompts for a reboot. After the reboot, the device disappears and the Microsoft Hyper-V Video device appears.

This issue occurs because less memory-mapped input/output (MMIO) space is configured for the operating system than the device requires.

Workaround

Perform this workaround in a Windows PowerShell window on the hypervisor host.

Set the upper MMIO space to the amount that the device requires to allow all of the MMIO to be mapped. Upper MMIO space starts at approximately 64 GB in address space.

Set-VM -HighMemoryMappedIoSpace mmio-space -VMName vm-name

mmio-space
The amount of MMIO space that the device requires, appended with the appropriate unit of measurement, for example, 64GB for 64 GB of MMIO space.

The required amount of MMIO space depends on the amount of BAR1 memory on the installed GPUs and the number of GPUs assigned to the VM as follows:

mmio-space = 2 × gpu-bar1-memory × assigned-gpus

gpu-bar1-memory
The amount of BAR1 memory on one of the installed GPUs. For example, in a server in which eight GPUs are installed and each GPU has 32 GB of BAR1 memory, gpu-bar1-memory is 32 GB.
assigned-gpus
The number of GPUs assigned to the VM.
vm-name
The name of the VM to which the GPU is assigned.

The following example sets the upper MMIO space to 64 GB for the VM named mygpuvm, to which one GPU with 32 GB of BAR1 memory is assigned.

Set-VM -HighMemoryMappedIoSpace 64GB -VMName mygpuvm

For more information, see Deploy graphics devices using Discrete Device Assignment on the Microsoft technical documentation site.
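
Applying the same formula to the eight-GPU server mentioned above, a VM that is assigned all eight GPUs, each with 32 GB of BAR1 memory, would need 2 × 32 GB × 8 = 512 GB of upper MMIO space (the VM name below is hypothetical):

Set-VM -HighMemoryMappedIoSpace 512GB -VMName myeightgpuvm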

Status

Not an NVIDIA bug

Ref. #

2812853

5.23. DWM crashes randomly occur in Windows VMs

Description

Desktop Window Manager (DWM) crashes randomly occur in Windows VMs, causing a blue-screen crash and the bug check CRITICAL_PROCESS_DIED. Computer Management shows problems with the primary display device.

Version

This issue affects Windows 10 1809, 1903, and 1909 VMs.

Status

Not an NVIDIA bug

Ref. #

2730037

5.24. Citrix Virtual Apps and Desktops session freezes when the desktop is unlocked

Description

When a Citrix Virtual Apps and Desktops session that is locked is unlocked by pressing Ctrl+Alt+Del, the session freezes. This issue affects only VMs that are running Microsoft Windows 10 1809 as a guest OS.

Version

Microsoft Windows 10 1809 guest OS

Workaround

Restart the VM.

Status

Not an NVIDIA bug

Ref. #

2767012

5.25. NVIDIA vGPU software graphics driver fails after Linux kernel upgrade with DKMS enabled

Description

After the Linux kernel is upgraded (for example, by running sudo apt full-upgrade) with Dynamic Kernel Module Support (DKMS) enabled, the nvidia-smi command fails to run. If DKMS is enabled, an upgrade to the Linux kernel triggers a rebuild of the NVIDIA vGPU software graphics driver. The rebuild of the driver fails because the compiler version is incorrect. Any attempt to reinstall the driver fails because the kernel module fails to build.

When the failure occurs, the following messages are displayed:

-> Installing DKMS kernel module:
ERROR: Failed to run `/usr/sbin/dkms build -m nvidia -v 550.54.14 -k 5.3.0-28-generic`:
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area...
'make' -j8 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.3.0-28-generic IGNORE_CC_MISMATCH='' modules...(bad exit status: 2)
ERROR (dkms apport): binary package for nvidia: 550.54.14 not found
Error! Bad return status for module build on kernel: 5.3.0-28-generic (x86_64)
Consult /var/lib/dkms/nvidia/550.54.14/build/make.log for more information.
-> error.
ERROR: Failed to install the kernel module through DKMS. No kernel module was installed;
please try installing again without DKMS, or check the DKMS logs for more information.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details.
You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

Workaround

When installing the NVIDIA vGPU software graphics driver with DKMS enabled, use one of the following workarounds:

  • Before running the driver installer, install the dkms package, then run the driver installer with the -dkms option.
  • Run the driver installer with the --no-cc-version-check option.
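
For example, on a Debian-based guest, the first workaround might look as follows (the installer file name is a placeholder):

# Install the dkms package before running the driver installer
sudo apt install dkms
# Run the driver installer with DKMS registration enabled
sudo sh ./NVIDIA-Linux-x86_64-<version>-grid.run -dkms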

Status

Not a bug

Ref. #

2836271

5.26. Blue screen crash occurs or no devices are found after VM reset

Description

If a VM on Microsoft Windows Server with the Hyper-V role is reset from the hypervisor host, a blue screen crash (BSOD) occurs on Windows VMs and the nvidia-smi command reports No devices were found on Linux VMs. This issue occurs only on Windows Server 2019 with Tesla T4 GPUs with SR-IOV enabled, Quadro RTX 8000 passive GPUs, and Quadro RTX 6000 passive GPUs.

Workaround

Contact NVIDIA Enterprise Support for a workaround for this issue, referencing the knowledge base article Workaround for Blue Screen Crashes On Hyper-V DDA With SRIOV-Enabled GPUs. This article is available only to NVIDIA Enterprise Support personnel.

Status

Not an NVIDIA bug

Ref. #

200567935

5.27. ECC memory settings for a vGPU cannot be changed by using NVIDIA X Server Settings

Description

The ECC memory settings for a vGPU cannot be changed from a Linux guest VM by using NVIDIA X Server Settings. After the ECC memory state has been changed on the ECC Settings page and the VM has been rebooted, the ECC memory state remains unchanged.

Workaround

Use the nvidia-smi command in the guest VM to enable or disable ECC memory for the vGPU as explained in Virtual GPU Software User Guide.

If the ECC memory state remains unchanged even after you use the nvidia-smi command to change it, use the workaround in Changes to ECC memory settings for a Linux vGPU VM by nvidia-smi might be ignored.
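
For reference, ECC memory for the vGPU is toggled in the guest VM with the nvidia-smi -e option; the change takes effect only after the VM is rebooted:

# Enable ECC memory for the vGPU (run as root in the guest VM)
nvidia-smi -e 1
# Or disable it
nvidia-smi -e 0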

Status

Open

Ref. #

200523086

5.28. Changes to ECC memory settings for a Linux vGPU VM by nvidia-smi might be ignored

Description

After the ECC memory state for a Linux vGPU VM has been changed by using the nvidia-smi command and the VM has been rebooted, the ECC memory state might remain unchanged.

This issue occurs when multiple NVIDIA configuration files in the system cause the kernel module option for setting the ECC memory state RMGuestECCState in /etc/modprobe.d/nvidia.conf to be ignored.

When the nvidia-smi command is used to enable ECC memory, the file /etc/modprobe.d/nvidia.conf is created or updated to set the kernel module option RMGuestECCState. Another configuration file in /etc/modprobe.d/ that contains the keyword NVreg_RegistryDwordsPerDevice might cause the kernel module option RMGuestECCState to be ignored.

Workaround

This workaround requires administrator privileges.

  1. Move the entry containing the keyword NVreg_RegistryDwordsPerDevice from the other configuration file to /etc/modprobe.d/nvidia.conf.
  2. Reboot the VM.
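
A sketch of what the consolidated /etc/modprobe.d/nvidia.conf might contain after step 1; the PCI address and option string are illustrative only, so keep the exact values that already exist in your configuration files:

# Entry moved here from the other configuration file so that it no longer
# overrides the RMGuestECCState option that nvidia-smi writes to this file
options nvidia NVreg_RegistryDwordsPerDevice="pci=0000:00:08.0;RMGuestECCState=1"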

Status

Open

Ref. #

200505777

5.29. Host core CPU utilization is higher than expected for moderate workloads

Description

When GPU performance is being monitored, host core CPU utilization is higher than expected for moderate workloads. For example, host CPU utilization when only a small number of VMs are running is as high as when several times as many VMs are running.

Workaround

Disable monitoring of the following GPU performance statistics:

  • vGPU engine usage by applications across multiple vGPUs
  • Encoder session statistics
  • Frame buffer capture (FBC) session statistics
  • Statistics gathered by performance counters in guest VMs

Status

Open

Ref. #

2414897

5.30. Frame capture while the interactive logon message is displayed returns blank screen

Description

Because of a known limitation with NvFBC, a frame capture while the interactive logon message is displayed returns a blank screen.

An NvFBC session can capture only screen updates that occur after the session is created. Once the interactive logon message has been drawn, no further screen updates occur until the message is dismissed, so an NvFBC session created after that update has no frame to capture and returns a blank screen instead.

Workaround

Status

Not a bug

Ref. #

2115733

5.31. RDS sessions do not use the GPU with some Microsoft Windows Server releases

Description

When some releases of Windows Server are used as a guest OS, Remote Desktop Services (RDS) sessions do not use the GPU. With these releases, the RDS sessions by default use the Microsoft Basic Render Driver instead of the GPU. This default setting enables 2D DirectX applications such as Microsoft Office to use software rendering, which can be more efficient than using the GPU for rendering. However, as a result, 3D applications that use DirectX are prevented from using the GPU.

Version

  • Windows Server 2019
  • Windows Server 2016
  • Windows Server 2012

Solution

Change the local computer policy to use the hardware graphics adapter for all RDS sessions.

  1. Choose Local Computer Policy > Computer Configuration > Administrative Templates > Windows Components > Remote Desktop Services > Remote Desktop Session Host > Remote Session Environment.

  2. Set the Use the hardware default graphics adapter for all Remote Desktop Services sessions option.

5.32. GPU engine utilization is reported as higher than expected for a fixed share vGPU

Description

When the scheduling policy is fixed share, GPU engine utilization can be reported as higher than expected for a vGPU.

For example, GPU engine usage for six P40-4Q vGPUs on a Tesla P40 GPU might be reported as follows:

[root@localhost:~] nvidia-smi vgpu
Mon Aug 20 10:33:18 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.42                 Driver Version: 390.42                    |
|-------------------------------+--------------------------------+------------+
| GPU  Name                     | Bus-Id                         | GPU-Util   |
| vGPU ID  Name                 | VM ID  VM Name                 | vGPU-Util  |
|===============================+================================+============|
|   0  Tesla P40                | 00000000:81:00.0               |    99%     |
|  85109  GRID P40-4Q           |  85110  win7-xmpl-146048-1     |    32%     |
|  87195  GRID P40-4Q           |  87196  win7-xmpl-146048-2     |    39%     |
|  88095  GRID P40-4Q           |  88096  win7-xmpl-146048-3     |    26%     |
|  89170  GRID P40-4Q           |  89171  win7-xmpl-146048-4     |     0%     |
|  90475  GRID P40-4Q           |  90476  win7-xmpl-146048-5     |     0%     |
|  93363  GRID P40-4Q           |  93364  win7-xmpl-146048-6     |     0%     |
+-------------------------------+--------------------------------+------------+
|   1  Tesla P40                | 00000000:85:00.0               |     0%     |
+-------------------------------+--------------------------------+------------+

The vGPU utilization of vGPU 85109 is reported as 32%. For vGPU 87195, vGPU utilization is reported as 39%. And for 88095, it is reported as 26%. However, the expected vGPU utilization of any vGPU should not exceed approximately 16.7%, because with the fixed share scheduler each of the six vGPUs on the GPU is allotted a fixed 1/6 share (100% ÷ 6) of the GPU's processing cycles.

This behavior is a result of the mechanism that is used to measure GPU engine utilization.

Status

Open

Ref. #

2227591

5.33. nvidia-smi reports that vGPU migration is supported on all hypervisors

Description

The command nvidia-smi vgpu -m shows that vGPU migration is supported on all hypervisors, even hypervisors or hypervisor versions that do not support vGPU migration.

Status

Closed

Ref. #

200407230

5.34. A segmentation fault in DBus code causes nvidia-gridd to exit on Red Hat Enterprise Linux and CentOS

Description

On Red Hat Enterprise Linux 6.8 and 6.9, and CentOS 6.8 and 6.9, a segmentation fault in DBus code causes the nvidia-gridd service to exit.

The nvidia-gridd service uses DBus for communication with NVIDIA X Server Settings to display licensing information through the Manage License page. Disabling the GUI for licensing resolves this issue.

To prevent this issue, the GUI for licensing is disabled by default. You might encounter this issue if you have enabled the GUI for licensing and are using Red Hat Enterprise Linux 6.8 or 6.9, or CentOS 6.8 or 6.9.

Version

Red Hat Enterprise Linux 6.8 and 6.9

CentOS 6.8 and 6.9

Status

Open

Ref. #

  • 200358191
  • 200319854
  • 1895945

5.35. No Manage License option available in NVIDIA X Server Settings by default

Description

By default, the Manage License option is not available in NVIDIA X Server Settings. This option is missing because the GUI for licensing on Linux is disabled by default to work around the issue that is described in A segmentation fault in DBus code causes nvidia-gridd to exit on Red Hat Enterprise Linux and CentOS.

Workaround

This workaround requires sudo privileges.

Note:

Do not use this workaround with Red Hat Enterprise Linux 6.8 and 6.9 or CentOS 6.8 and 6.9. To prevent a segmentation fault in DBus code from causing the nvidia-gridd service to exit, the GUI for licensing must be disabled with these OS versions.

If you are licensing a physical GPU for vCS, you must use the configuration file /etc/nvidia/gridd.conf.

  1. If NVIDIA X Server Settings is running, shut it down.
  2. If the /etc/nvidia/gridd.conf file does not already exist, create it by copying the supplied template file /etc/nvidia/gridd.conf.template.

  3. As root, edit the /etc/nvidia/gridd.conf file to set the EnableUI option to TRUE.

  4. Start the nvidia-gridd service.

    # sudo service nvidia-gridd start

When NVIDIA X Server Settings is restarted, the Manage License option is now available.
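
For reference, after step 3 the /etc/nvidia/gridd.conf file contains the following line, which enables the licensing GUI:

EnableUI=TRUE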

Status

Open

5.36. Licenses remain checked out when VMs are forcibly powered off

Description

NVIDIA vGPU software licenses remain checked out on the license server when non-persistent VMs are forcibly powered off.

The NVIDIA service running in a VM returns checked out licenses when the VM is shut down. In environments where non-persistent licensed VMs are not cleanly shut down, licenses on the license server can become exhausted. For example, this issue can occur in automated test environments where VMs are frequently changing and are not guaranteed to be cleanly shut down. The licenses from such VMs remain checked out against their MAC address for seven days before they time out and become available to other VMs.

Resolution

If VMs are routinely being powered off without clean shutdown in your environment, you can avoid this issue by shortening the license borrow period. To shorten the license borrow period, set the LicenseInterval configuration setting in your VM image. For details, refer to Virtual GPU Client Licensing User Guide.
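
For example, on a Linux guest, a 2-hour borrow period could be set in the /etc/nvidia/gridd.conf file as follows (the value is illustrative; valid values and the Windows equivalent are described in Virtual GPU Client Licensing User Guide):

# Check out licenses for 120 minutes instead of the default borrow period
LicenseInterval=120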

Status

Closed

Ref. #

1694975

5.37. VM bug checks after the guest VM driver for Windows 10 RS2 is installed

Description

When the VM is rebooted after the guest VM driver for Windows 10 RS2 is installed, the VM bug checks. When Windows boots, it selects one of the standard supported video modes. If Windows is booted directly with a display that is driven by an NVIDIA driver, for example a vGPU on Citrix Hypervisor, a blue screen crash occurs.

This issue occurs when the screen resolution is switched from VGA mode to a resolution that is higher than 1920×1200.

Fix

Download and install Microsoft Windows Update KB4020102 from the Microsoft Update Catalog.

Workaround

If you have applied the fix, ignore this workaround.

Otherwise, you can work around this issue until you are able to apply the fix by not using resolutions higher than 1920×1200.

  1. Choose a GPU profile in Citrix XenCenter that does not allow resolutions higher than 1920×1200.
  2. Before rebooting the VM, set the display resolution to 1920×1200 or lower.

Status

Not an NVIDIA bug

Ref. #

200310861

5.38. GNOME Display Manager (GDM) fails to start on Red Hat Enterprise Linux 7.2 and CentOS 7.0

Description

GDM fails to start on Red Hat Enterprise Linux 7.2 and CentOS 7.0 with the following error:

Oh no! Something has gone wrong!

Workaround

Permanently enable permissive mode for Security Enhanced Linux (SELinux).

  1. As root, edit the /etc/selinux/config file to set SELINUX to permissive.

    SELINUX=permissive

  2. Reboot the system.

    ~]# reboot

For more information, see Permissive Mode in Red Hat Enterprise Linux 7 SELinux User’s and Administrator’s Guide.

Status

Not an NVIDIA bug

Ref. #

200167868
