Game Development Community

Enabling NVIDIA GPU on Optimus systems - Help needed!

by Nils Eikelenboom · in Torque 3D Professional · 02/02/2014 (3:13 am) · 11 replies

I'm on a trip and therefore bound to my laptop for a change. It's a Win64 machine with Intel onboard graphics and an NVIDIA 315M high-performance GPU, with the switching between the two handled by NVIDIA "Optimus". It's pretty common these days, and a lot of laptops are configured this way.

Normally, when extra GPU power is needed, Optimus switches to the high-performance GPU automatically. Some games need a profile in the NVIDIA drivers for this. Torque 3D doesn't seem to activate Optimus at all, so only the onboard (low-end) GPU is used.

NVIDIA gives the following solution:

Quote:
Global Variable NvOptimusEnablement (new in Driver Release 302)

Starting with the Release 302 drivers, application developers can direct the Optimus driver at runtime to use the High Performance Graphics to render any application, even those applications for which there is no existing application profile. They can do this by exporting a global variable named NvOptimusEnablement. The Optimus driver looks for the existence and value of the export. Only the LSB of the DWORD matters at this time. A value of 0x00000001 indicates that rendering should be performed using High Performance Graphics. A value of 0x00000000 indicates that this method should be ignored.

and an example of the code:

extern "C" {
    __declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
}

But it's unclear to me how to implement this in the Torque project.

Does anyone have an idea where to put this and what the exact code should be?

#1
02/02/2014 (6:50 am)
No need to set anything in code. You can just use the NVIDIA Control Center (NCC) to recognize the Torque 3D software, or the game (the name of the .exe file) made with Torque 3D. It will then go into high-performance mode.

If that is not good enough, one can also set performance to high all the time in the NCC; then more or less all software will run with the high-performance profile.

Hope that helps. At least this is how I and many others get around this issue.
#2
02/02/2014 (10:22 am)
That does not work on all systems, and forcing the NVIDIA GPU through Optimus has never worked properly on all systems for T3D.

I made a thread about this years ago... still not fixed.
#3
02/02/2014 (12:29 pm)
Spent more than a few months on that myself for our head artist (don't have one of those laptops myself to test on directly, unfortunately). The only thing I can say with confidence is: the OpenGL port works around it.

That being said, for Nils' query specifically (and thanks to Tron for pointing that one out):

main.cpp for the exe
#include <windows.h>
#include <stdio.h>

extern "C"
{
   int (*torque_winmain)( HINSTANCE hInstance, HINSTANCE h, LPSTR lpszCmdLine, int nShow) = NULL;
   __declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
}

int PASCAL WinMain( HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpszCmdLine, int nCommandShow)
{
   ...

with the DLLs set to generate as _core.dll, which works for exposing that.
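For reference, here is a minimal, self-contained sketch of just the export itself. The portable shims and the helper function are mine for illustration (they are not part of T3D or the NVIDIA docs); the one real requirement is that the variable ends up in the .exe's export table, not in a DLL's.

```cpp
#include <cstdint>

// Portable shims: DWORD and __declspec only exist natively on Windows,
// so define no-op stand-ins elsewhere just to let the sketch compile.
#ifndef _WIN32
typedef std::uint32_t DWORD;
#define __declspec(x)
#endif

extern "C"
{
    // Must be exported from the executable itself; the Optimus driver
    // inspects the .exe's export table for this symbol.
    __declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
}

// Illustrative helper: per the NVIDIA docs, only the LSB of the DWORD
// matters (1 = render on the High Performance Graphics processor).
bool optimusRequestsHighPerformance()
{
    return ( NvOptimusEnablement & 0x1 ) != 0;
}
```

Since the driver reads the value at load time, it should be a plain initialized global like this, not something assigned later at runtime.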

Beyond that, NVIDIA swears up and down that the detection pipeline is somehow falling back to the WMI subsystem and getting mangled in the process. I haven't been able to get a proper trace there yet to figure out why it would.
#4
02/02/2014 (7:56 pm)
@Azaezel: Thank you very much for pointing me in the right direction. I tried to use this in main.cpp before, but I didn't study the fatal error coming out of it. Now I understand one needs to choose a different name for the DLL, like you described.

It now results in two warnings, but nothing to worry about and probably not that difficult to fix:

1>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V110\Microsoft.CppBuild.targets(1137,5): warning MSB8012: TargetPath(C:\Torque\Torque3D-3-0\Torque3D\My Projects\DeadlyMatter\buildFiles\VisualStudio 2012\projects\../../../game/DeadlyMatter_core.dll) does not match the Linker's OutputFile property value (C:\Torque\Torque3D-3-0\Torque3D\My Projects\DeadlyMatter\game\DeadlyMatter.dll). This may cause your project to build incorrectly. To correct this, please make sure that $(OutDir), $(TargetName) and $(TargetExt) property values match the value specified in %(Link.OutputFile).
1>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V110\Microsoft.CppBuild.targets(1139,5): warning MSB8012: TargetName(DeadlyMatter_core) does not match the Linker's OutputFile property value (DeadlyMatter). This may cause your project to build incorrectly. To correct this, please make sure that $(OutDir), $(TargetName) and $(TargetExt) property values match the value specified in %(Link.OutputFile).

It compiles just fine, and runs without further errors/warnings linked to this.

@Dwarf King: Yes, I had already set the NVIDIA driver to "High Performance Processor" as default, and for the .exe as well. But the performance of T3D was so terrible that I couldn't believe it was using the GPU at all.

I'm also trying to find a fix to have this in the builds by default, so fewer problems occur when I finally release something.
#5
02/02/2014 (9:14 pm)
With "Optimus" forced to be enabled at all times, T3D still doesn't see the high-performance GPU. In the T3D options panel, only one GPU is shown in the list. If I start a mission, there is activity on the high-performance GPU (increase of clock speed, GPU and memory load). So something is working, but judging from T3D's rendering speed, it isn't much.

I read in NVIDIA's docs that the high-performance GPU is used in combination with the onboard GPU, so I wonder which GPU T3D is using, and for what.

Quote:
Methods That Expose NVIDIA Graphics Processor Information on Optimus Systems

Often, applications will check the system's graphics configuration at initialization to determine which hardware and corresponding optimal graphics settings to use during application execution. Optimus systems power off the High Performance Graphics processor when not in use.

However, if the application or the Optimus driver are configured to use the NVIDIA High Performance Graphics hardware through one of the methods listed in Methods That Enable NVIDIA High Performance Graphics Rendering on Optimus Systems, then information about the NVIDIA High Performance Graphics hardware and its corresponding capabilities is made available to the application.

The NVIDIA driver monitors certain runtime APIs and returns the appropriate NVIDIA hardware information through calls made by the application to those APIs.

The following are the APIs that applications can use:
- DirectX 3 through DirectX 9 - IDirect3D9::GetAdapterIdentifier
- DirectX 9 and above - IDXGIAdapter::GetDesc
- OpenGL - glGetString(GL_VENDOR) and glGetString(GL_RENDERER)

The first method (DX3 through DX9) can be found in gfxPCD3D9Device.cpp:
//-----------------------------------------------------------------------------
// Enumerate D3D adapters
//-----------------------------------------------------------------------------
...
      // Get the device description string.
      D3DADAPTER_IDENTIFIER9 temp;
      d3d9->GetAdapterIdentifier( adapterIndex, NULL, &temp ); // The NULL is the flags which deal with WHQL
...

Forgive me if the following question may not sound very intelligent (I'm still a designer by profession, not a developer)... Could a solution be to use the second method (DX9 and beyond) and use IDXGIAdapter::GetDesc to get all the GPUs listed?
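For what it's worth, a minimal sketch of that second method might look like the following. This is hypothetical Win32/DXGI code, not taken from the engine, and how it would hook into T3D's adapter enumeration is not shown:

```cpp
#ifdef _WIN32
#include <dxgi.h>
#include <cstdio>
#pragma comment( lib, "dxgi.lib" )

// Hypothetical: enumerate every adapter DXGI exposes and print its
// description and dedicated VRAM via IDXGIAdapter::GetDesc.
void listAdapters()
{
   IDXGIFactory *factory = NULL;
   if( FAILED( CreateDXGIFactory( __uuidof(IDXGIFactory), (void**)&factory ) ) )
      return;

   IDXGIAdapter *adapter = NULL;
   for( UINT i = 0; factory->EnumAdapters( i, &adapter ) != DXGI_ERROR_NOT_FOUND; i++ )
   {
      DXGI_ADAPTER_DESC desc;
      if( SUCCEEDED( adapter->GetDesc( &desc ) ) )
      {
         // DedicatedVideoMemory is in bytes; shift by 20 to get MB.
         wprintf( L"Adapter %u: %ls (%u MB dedicated)\n", i, desc.Description,
                  (unsigned)( desc.DedicatedVideoMemory >> 20 ) );
      }
      adapter->Release();
   }
   factory->Release();
}
#endif
```

One caveat, going by the NVIDIA quote above: the driver only exposes the high-performance hardware through these APIs once the application is already directed at it, so whether the 315M even shows up in this enumeration may itself depend on the NvOptimusEnablement export being in place.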


________________________________________________________________________

Side note: I noticed how poor the performance of the NVIDIA 315M is. Though it has 1 GB of memory, the specs aren't very impressive:

Memory Type: DDR3
Bus Width: 64-bit
Memory Size: 1024 MB
Bandwidth: 12.6 GB/s
GPU Clock: 606 MHz
Memory Clock: 790 MHz
Shader Clock: 1468 MHz

Yet even with the settings on "Lowest" (options panel), it seems to have a pretty hard time rendering a few 3D models with a few draw calls. Too bad I don't have a great internet connection right now to download some games for testing.
#6
02/02/2014 (9:37 pm)
Well, these are from my personal notes from the last time I attempted a manual trace of the issues, so apologies beforehand if they're a bit messy, but:

void GFXCardProfiler::init()
{
   ...
   Con::printf("   o VRAM    : %d MB", getVideoMemoryInMB());
}

class GFXCardProfiler
{
...
   U32 getVideoMemoryInMB() const { return mVideoMemory; }
};

void GFXD3D9CardProfiler::init()
{
   ...
   mD3DDevice = dynamic_cast<GFXD3D9Device *>(GFX)->getDevice();

   AssertISV( mD3DDevice, "GFXD3D9CardProfiler::init() - No D3D9 Device found!");

   // Grab the caps so we can get our adapter ordinal and look up our name.

   D3DCAPS9 caps;

   D3D9Assert(mD3DDevice->GetDeviceCaps(&caps), "GFXD3D9CardProfiler::init - failed to get device caps!");
...
   WMIVideoInfo wmiVidInfo;

   if( wmiVidInfo.profileAdapters() )
   {
      const PlatformVideoInfo::PVIAdapter &adapter = wmiVidInfo.getAdapterInformation( caps.AdapterOrdinal );

      mVideoMemory = adapter.vram;
   }
...
}


const PlatformVideoInfo::PVIAdapter &PlatformVideoInfo::getAdapterInformation( const U32 adapterIndex ) const
{
   AssertFatal( adapterIndex < mAdapters.size(), "Not that many adapters" );
   return mAdapters[adapterIndex];
}


class PlatformVideoInfo
{
public:
   struct PVIAdapter
   {
      ...
      U32 vram;
   };

private:
   Vector<PVIAdapter> mAdapters; ///< Vector of adapters
};

bool PlatformVideoInfo::profileAdapters()
{
   // Initialize the child class
   if( !_initialize() )
      return false;

   mAdapters.clear();

   // Query the number of adapters
   String tempString;

   if( !_queryProperty( PVI_NumAdapters, 0, &tempString ) )
   {
      // Not all platforms may support PVI_NumAdapters.  We will assume that there
      // is one adapter.  This was the behavior before PVI_NumAdapters was implemented.
      mAdapters.increment( 1 );
   }
   else
   {
      mAdapters.increment( dAtoi( tempString ) );
   }
...
      // Fill in adapter information
#define _QUERY_MASK_HELPER( querytype, outstringaddr ) \
      querySuccessFlags |= ( _queryProperty( querytype, adapterNum, outstringaddr ) ? 1 << querytype : 0 )

      _QUERY_MASK_HELPER( PVI_NumDevices, &tempString );
      adapter.numDevices = dAtoi( tempString );

      _QUERY_MASK_HELPER( PVI_VRAM, &tempString );
      adapter.vram = dAtoi( tempString );

      _QUERY_MASK_HELPER( PVI_Description, &adapter.description );
      _QUERY_MASK_HELPER( PVI_Name, &adapter.name );
      _QUERY_MASK_HELPER( PVI_ChipSet, &adapter.chipSet );
      _QUERY_MASK_HELPER( PVI_DriverVersion, &adapter.driverVersion );

#undef _QUERY_MASK_HELPER
}
#7
02/02/2014 (9:40 pm)
class PlatformVideoInfo
{
...
   virtual bool _queryProperty( const PVIQueryType queryType, const U32 adapterId, String *outValue ) = 0;
...
};


bool WMIVideoInfo::_queryProperty( const PVIQueryType queryType, const U32 adapterId, String *outValue )
{
   if( _queryPropertyDXGI( queryType, adapterId, outValue ) )
      return true;
   else if( _queryPropertyDxDiag( queryType, adapterId, outValue ) )
      return true;
   else
      return _queryPropertyWMI( queryType, adapterId, outValue );
}

bool WMIVideoInfo::_queryPropertyDXGI( const PVIQueryType queryType, const U32 adapterId, String *outValue )
{
#if 0
   ...
#endif
   return false;
}
#8
02/02/2014 (9:41 pm)
bool WMIVideoInfo::_queryPropertyDxDiag( const PVIQueryType queryType, const U32 adapterId, String *outValue )
{
   if( mDxDiagProvider != 0 )
   {
      IDxDiagContainer* rootContainer = 0;
      IDxDiagContainer* displayDevicesContainer = 0;
      IDxDiagContainer* deviceContainer = 0;

      // Special case to deal with PVI_NumAdapters
      if(queryType == PVI_NumAdapters)
      {
         DWORD count = 0;
         String value;

         if( mDxDiagProvider->GetRootContainer( &rootContainer ) == S_OK
            && rootContainer->GetChildContainer( L"DxDiag_DisplayDevices", &displayDevicesContainer ) == S_OK
            && displayDevicesContainer->GetNumberOfChildContainers( &count ) == S_OK )
         {
            value = String::ToString("%d", count);
         }

         if( rootContainer )
            SAFE_RELEASE( rootContainer );
         if( displayDevicesContainer )
            SAFE_RELEASE( displayDevicesContainer );

         *outValue = value;
         return true;
      }

      WCHAR adapterIdString[ 2 ];
      adapterIdString[ 0 ] = L'0' + adapterId;
      adapterIdString[ 1 ] = L'\0';

      String value;
      if( mDxDiagProvider->GetRootContainer( &rootContainer ) == S_OK
         && rootContainer->GetChildContainer( L"DxDiag_DisplayDevices", &displayDevicesContainer ) == S_OK
         && displayDevicesContainer->GetChildContainer( adapterIdString, &deviceContainer ) == S_OK )
      {
         const WCHAR* propertyName = 0;

         switch( queryType )
         {
         case PVI_Description:
            propertyName = L"szDescription";
            break;

         case PVI_Name:
            propertyName = L"szDeviceName";
            break;

         case PVI_ChipSet:
            propertyName = L"szChipType";
			//Con::errorf("card chipset info: %s", value.c_str());
            break;

         case PVI_DriverVersion:
            propertyName = L"szDriverVersion";
            break;

         // Don't get VRAM via DxDiag as that won't tell us about the actual amount of dedicated
         // video memory but rather some dedicated+shared RAM value.
         }

         if( propertyName )
         {
            VARIANT val;
            if( deviceContainer->GetProp( propertyName, &val ) == S_OK )
               switch( val.vt )
               {
               case VT_BSTR:
                  value = String( val.bstrVal );
                  break;

               default:
                  AssertWarn( false, avar( "WMIVideoInfo: property type '%i' not implemented", val.vt ) );
               }
         }
      }

      if( rootContainer )
         SAFE_RELEASE( rootContainer );
      if( displayDevicesContainer )
         SAFE_RELEASE( displayDevicesContainer );
      if( deviceContainer )
         SAFE_RELEASE( deviceContainer );

      if( value.isNotEmpty() )
      {
         // Try to get the DxDiag data into some canonical form.  Otherwise, we
         // won't be giving the card profiler much opportunity for matching up
         // its data with profile scripts.

         switch( queryType )
         {
         case PVI_ChipSet:
            if( value.compare( "ATI", 3, String::NoCase ) == 0 )
               value = "ATI Technologies Inc.";
            else if( value.compare( "NVIDIA", 6, String::NoCase ) == 0 )
               value = "NVIDIA";
            else if( value.compare( "INTEL", 5, String::NoCase ) == 0 )
               value = "INTEL";
            else if( value.compare( "MATROX", 6, String::NoCase ) == 0 )
               value = "MATROX";
            break;

         case PVI_Description:
            if( value.compare( "ATI ", 4, String::NoCase ) == 0 )
            {
               value = value.substr( 4, value.length() - 4 );
               if( value.compare( " Series", 7, String::NoCase | String::Right ) == 0 )
                  value = value.substr( 0, value.length() - 7 );
            }
            else if( value.compare( "NVIDIA ", 7, String::NoCase ) == 0 )
               value = value.substr( 7, value.length() - 7 );
            else if( value.compare( "INTEL ", 6, String::NoCase ) == 0 )
               value = value.substr( 6, value.length() - 6 );
            else if( value.compare( "MATROX ", 7, String::NoCase ) == 0 )
               value = value.substr( 7, value.length() - 7 );
            break;
         }

         *outValue = value;
         return true;
      }
   }
   return false;
}
#9
02/02/2014 (9:42 pm)
struct IDxDiagContainer : public IUnknown
{
   virtual HRESULT   STDMETHODCALLTYPE GetNumberOfChildContainers( DWORD* pdwCount ) = 0;
   virtual HRESULT   STDMETHODCALLTYPE EnumChildContainerNames( DWORD dwIndex, LPWSTR pwszContainer, DWORD cchContainer ) = 0;
   virtual HRESULT   STDMETHODCALLTYPE GetChildContainer( LPCWSTR pwszContainer, IDxDiagContainer** ppInstance ) = 0;
   virtual HRESULT   STDMETHODCALLTYPE GetNumberOfProps( DWORD* pdwCount ) = 0;
   virtual HRESULT   STDMETHODCALLTYPE EnumPropNames( DWORD dwIndex, LPWSTR pwszPropName, DWORD cchPropName ) = 0;
   virtual HRESULT   STDMETHODCALLTYPE GetProp( LPCWSTR pwszPropName, VARIANT* pvarProp ) = 0;
};

struct IDxDiagProvider : public IUnknown
{
   virtual HRESULT   STDMETHODCALLTYPE Initialize( DXDIAG_INIT_PARAMS* pParams ) = 0;
   virtual HRESULT   STDMETHODCALLTYPE GetRootContainer( IDxDiagContainer** ppInstance ) = 0;
};

#define STDMETHODCALLTYPE       __stdcall
#10
02/02/2014 (9:43 pm)
bool WMIVideoInfo::_queryPropertyWMI( const PVIQueryType queryType, const U32 adapterId, String *outValue )
{
   if( mServices == NULL )
      return false;

   BSTR bstrWQL  = SysAllocString(L"WQL");
   BSTR bstrPath = SysAllocString(L"select * from Win32_VideoController");
   IEnumWbemClassObject* enumerator;
   
   // Use the IWbemServices pointer to make requests of WMI
   HRESULT hr = mServices->ExecQuery(bstrWQL, bstrPath, WBEM_FLAG_FORWARD_ONLY, NULL, &enumerator);

   if( FAILED( hr ) )
      return false;

   IWbemClassObject *adapter = NULL;   
   ULONG uReturned;

   // Get the appropriate adapter.
   for ( S32 i = 0; i <= adapterId; i++ )
   {
      hr = enumerator->Next(WBEM_INFINITE, 1, &adapter, &uReturned );

      if ( FAILED( hr ) || uReturned == 0 )
      {
         enumerator->Release();
         return false;         
      }
   }

   // Now get the property
   VARIANT v;
   hr = adapter->Get( smPVIQueryTypeToWMIString[queryType], 0, &v, NULL, NULL );

   bool result = SUCCEEDED( hr );

   if ( result )
   {
      switch( v.vt )
      {
      case VT_I4:
         {
            LONG longVal = v.lVal;

            if( queryType == PVI_VRAM )
            {
               longVal = longVal >> 20; // Convert to megabytes
               // While this value is reported as a signed integer, it is possible
               // for video cards to have 2GB or more.  In those cases the sign
               // bit is set and will give us a negative number.  Treating this
               // as unsigned allows us to handle video cards with up to
               // 4GB of memory.  After that we'll need a new solution from Microsoft.
               *outValue = String::ToString( (U32)longVal );
            }
            else
            {
               *outValue = String::ToString( (S32)longVal );
            }
            break;
         }

      case VT_UI4:
         {
            *outValue = String::ToString( (U32)v.ulVal );
            break;
         }

      case VT_BSTR:
         {
            *outValue = String( v.bstrVal );
            break;
         }            
      case VT_LPSTR:
      case VT_LPWSTR:
         break;
      }


   }                  

   // Cleanup      
   adapter->Release();   
   enumerator->Release();

   return result;
}
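The PVI_VRAM branch above is worth pulling out on its own. One subtlety (my reading, not stated in the thread): for the "treat it as unsigned" comment to hold, the value has to be reinterpreted as unsigned before the byte-to-MB shift; shifting the negative signed value first sign-extends it, and the cast afterwards gives a nonsense result for 2 GB+ cards. A portable sketch of the corrected conversion:

```cpp
#include <cstdint>

// WMI reports the VRAM of a Win32_VideoController as a signed 32-bit
// value, so cards with 2 GB or more come back negative.  Reinterpreting
// as unsigned *before* converting bytes to megabytes handles cards with
// up to 4 GB of memory.
std::uint32_t vramBytesToMB( std::int32_t reported )
{
   return static_cast<std::uint32_t>( reported ) >> 20;
}
```

For example, a 2 GB card reported as the most negative 32-bit value still comes out as 2048 MB with the cast done first.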

struct IWbemClassObject : public IUnknown
{
...
        virtual HRESULT STDMETHODCALLTYPE Get( 
            /* [string][in] */ LPCWSTR wszName,
            /* [in] */ long lFlags,
            /* [unique][in][out] */ VARIANT *pVal,
            /* [unique][in][out] */ CIMTYPE *pType,
            /* [unique][in][out] */ long *plFlavor) = 0;
};

Hopefully that covers most points. PS: sorry that got spammy; it seemed smaller when it was color-coded and the like.
#11
02/04/2014 (4:56 pm)
Thanks @Azaezel for your notes. I'm afraid I'll need to pass this on to a developer.

The trip is taking me further into the undeveloped Chinese countryside, and I needed to leave the laptop somewhere to pick up later.