README

cu

Package cu is a package that interfaces with the CUDA Driver API. This package was directly inspired by Arne Vansteenkiste's cu package.

Why Write This Package?

The main reason this package was written (as opposed to just using the already-excellent cu package) is error handling. Specifically, the main difference between this package and Arne's package is that this package returns errors instead of panicking.

Additionally, another goal for this package is to have an idiomatic interface for CUDA. For example, instead of exposing cuCtxCreate as CtxCreate, the more idiomatic name MakeContext is used. The primary goal is to make calling the CUDA API as comfortable as calling Go functions or methods. Additional convenience functions and methods are also provided in this package in pursuit of that goal.

Lastly, this package uses the latest CUDA toolkit whereas the original package cu uses a number of deprecated APIs.

Installation

This package is go-gettable: go get -u gorgonia.org/cu

This package mostly depends on built-in packages. There are two external dependencies:

  • errors, which is licenced under an MIT-like licence. This package is used for wrapping errors and providing a debug trail.
  • assert, which is licenced under an MIT-like licence. This package is used for quick and easy testing.

However, package cu DOES depend on one major external dependency: CUDA. Specifically, it requires the CUDA driver. Thankfully, NVIDIA has made this rather simple: everything that is required can be installed in one click with the CUDA Toolkit.

To verify that this library works, install and run the cudatest program, which accompanies this package:

go install gorgonia.org/cu/cmd/cudatest
cudatest

You should see something like this if successful:

CUDA version: 10020
CUDA devices: 1

Device 0
========
Name      :	"TITAN RTX"
Clock Rate:	1770000 kHz
Memory    :	25393561600 bytes
Compute   : 	7.5
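The version line in the output above is a single integer. As a minimal sketch, assuming the usual CUDA encoding of 1000*major + 10*minor, the value 10020 can be read as CUDA 10.2. The helper name decodeVersion is hypothetical and is not part of package cu:

```go
package main

import "fmt"

// decodeVersion splits a CUDA version integer into its major and minor parts.
// It assumes the encoding 1000*major + 10*minor.
// The function is a hypothetical helper; package cu does not provide it.
func decodeVersion(v int) (major, minor int) {
	return v / 1000, (v % 1000) / 10
}

func main() {
	major, minor := decodeVersion(10020)
	fmt.Printf("CUDA %d.%d\n", major, minor) // prints "CUDA 10.2"
}
```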

Windows

To set up CUDA on Windows:

  1. Install CUDA Toolkit
  2. Add %CUDA_PATH%/bin to your %PATH% environment variable (running nvcc from console should work)
  3. Make a symlink mklink /D C:\cuda "c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA" (alternatively, install CUDA toolkit to C:\cuda\)

To set up the compiler:

  1. Install MSYS2 (see https://www.msys2.org/)
  2. In c:\msys64\msys2_shell.cmd uncomment the line with set MSYS2_PATH_TYPE=inherit (this makes Windows PATH variable visible)
  3. Install go in MSYS2 (64 bit) with pacman -S go

FAQ

Here is a list of common problems that you may encounter.

ld: cannot find -lcuda (Linux)

Checklist:

  • Installed CUDA and applied the relevant post-installation steps?
  • Checked that the sample programs in the CUDA install all work?
  • Checked the output of ld -lcuda --verbose?
  • Checked that there is a libcuda.so in the given search paths?
  • Checked that the permissions on libcuda.so are correct?

Note that, depending on how you install CUDA on Linux, the .so file is sometimes not properly linked. For example, in CUDA 10.2 on Ubuntu, the default .deb installation installs the shared object file to /usr/lib/x86_64-linux-gnu/libcuda.so.1. However, ld searches only for libcuda.so. The solution is to symlink libcuda.so.1 to libcuda.so, like so:

sudo ln -s /PATH/TO/libcuda.so.1 /PATH/TO/libcuda.so

Be careful when using ln. This author spent several hours being tripped up by permissions issues.
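Before creating the symlink, it can help to see which of the two file names actually exist in the linker's search paths. The following is a minimal sketch of such a check. The directories listed in main are examples only (take the real ones from the output of ld -lcuda --verbose), and findLibs and onDisk are hypothetical helpers, not part of this package:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findLibs returns every dir/name combination for which exists reports true.
// The exists callback is injected so that the search logic can be exercised
// without touching the real filesystem.
func findLibs(dirs, names []string, exists func(path string) bool) []string {
	var found []string
	for _, dir := range dirs {
		for _, name := range names {
			p := filepath.Join(dir, name)
			if exists(p) {
				found = append(found, p)
			}
		}
	}
	return found
}

// onDisk reports whether path resolves to an existing file.
// os.Stat follows symlinks, so a dangling symlink counts as missing.
func onDisk(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	// Example directories only; substitute the paths that ld reports.
	dirs := []string{"/usr/lib/x86_64-linux-gnu", "/usr/lib64", "/usr/local/cuda/lib64/stubs"}
	names := []string{"libcuda.so", "libcuda.so.1"}

	found := findLibs(dirs, names, onDisk)
	if len(found) == 0 {
		fmt.Println("no libcuda found in the listed directories")
	}
	for _, p := range found {
		fmt.Println("found", p)
	}
}
```

If only libcuda.so.1 shows up in a directory, that directory is a likely candidate for the symlink described above.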

Progress

The work to fully represent the CUDA Driver API is in progress and not yet complete. However, most of the APIs required for GPGPU purposes are complete. None of the texture, surface and graphics related APIs are handled yet. Please feel free to send a pull request.

Roadmap

  • Remaining API to be ported over
  • All texture, surface and graphics related APIs have an equivalent Go prototype.
  • Batching of common operations (see for example Device.Attributes(...))
  • Generic queueing/batching of API calls (by some definition of generic)

Contributing

This author loves pull requests from everyone. Here's how to contribute to this package:

  1. Fork then clone this repo: git clone git@github.com:YOUR_USERNAME/cu.git
  2. Work on your edits.
  3. Commit with a good commit message.
  4. Push to your fork then submit a pull request.

We understand that this package interfaces with a third-party API. As such, tests may not always be viable. However, please do try to include as many tests as possible.

Licence

The package is licenced under an MIT-like licence. There is one file (cgoflags.go) where code is directly copied and two files (execution.go and memory.go) where code was partially copied from Arne Vansteenkiste's package, which is unlicenced (but to be safe, just assume a GPL-like licence, as mumax/3 is licenced under GPL).


Documentation

Overview

    Package cu provides an idiomatic interface to the CUDA Driver API.

    Index

    Constants

    const (
    	Success                     cuResult = C.CUDA_SUCCESS                              // API call returned with no errors
    	InvalidValue                cuResult = C.CUDA_ERROR_INVALID_VALUE                  // This indicates that one or more of the parameters passed to the API call is not within an acceptable range of values.
    	OutOfMemory                 cuResult = C.CUDA_ERROR_OUT_OF_MEMORY                  // The API call failed because it was unable to allocate enough memory to perform the requested operation.
    	NotInitialized              cuResult = C.CUDA_ERROR_NOT_INITIALIZED                // This indicates that the CUDA driver has not been initialized with cuInit() or that initialization has failed.
    	Deinitialized               cuResult = C.CUDA_ERROR_DEINITIALIZED                  // This indicates that the CUDA driver is in the process of shutting down.
    	ProfilerDisabled            cuResult = C.CUDA_ERROR_PROFILER_DISABLED              // This indicates profiler is not initialized for this run. This can happen when the application is running with external profiling tools like visual profiler.
    	ProfilerNotInitialized      cuResult = C.CUDA_ERROR_PROFILER_NOT_INITIALIZED       // Deprecated: This error return is deprecated as of CUDA 5.0. It is no longer an error to attempt to enable/disable the profiling via cuProfilerStart or cuProfilerStop without initialization.
    	ProfilerAlreadyStarted      cuResult = C.CUDA_ERROR_PROFILER_ALREADY_STARTED       // Deprecated: This error return is deprecated as of CUDA 5.0. It is no longer an error to call cuProfilerStart() when profiling is already enabled.
    	ProfilerAlreadyStopped      cuResult = C.CUDA_ERROR_PROFILER_ALREADY_STOPPED       // Deprecated: This error return is deprecated as of CUDA 5.0. It is no longer an error to call cuProfilerStop() when profiling is already disabled.
    	NoDevice                    cuResult = C.CUDA_ERROR_NO_DEVICE                      // This indicates that no CUDA-capable devices were detected by the installed CUDA driver.
    	InvalidDevice               cuResult = C.CUDA_ERROR_INVALID_DEVICE                 // This indicates that the device ordinal supplied by the user does not correspond to a valid CUDA device.
    	InvalidImage                cuResult = C.CUDA_ERROR_INVALID_IMAGE                  // This indicates that the device kernel image is invalid. This can also indicate an invalid CUDA module.
    	InvalidContext              cuResult = C.CUDA_ERROR_INVALID_CONTEXT                // This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See cuCtxGetApiVersion() for more details.
    	ContextAlreadyCurrent       cuResult = C.CUDA_ERROR_CONTEXT_ALREADY_CURRENT        // Deprecated: This error return is deprecated as of CUDA 3.2. It is no longer an error to attempt to push the active context via cuCtxPushCurrent(). This indicated that the context being supplied as a parameter to the API call was already the active context.
    	MapFailed                   cuResult = C.CUDA_ERROR_MAP_FAILED                     // This indicates that a map or register operation has failed.
    	UnmapFailed                 cuResult = C.CUDA_ERROR_UNMAP_FAILED                   // This indicates that an unmap or unregister operation has failed.
    	ArrayIsMapped               cuResult = C.CUDA_ERROR_ARRAY_IS_MAPPED                // This indicates that the specified array is currently mapped and thus cannot be destroyed.
    	AlreadyMapped               cuResult = C.CUDA_ERROR_ALREADY_MAPPED                 // This indicates that the resource is already mapped.
    	NoBinaryForGpu              cuResult = C.CUDA_ERROR_NO_BINARY_FOR_GPU              // This indicates that there is no kernel image available that is suitable for the device. This can occur when a user specifies code generation options for a particular CUDA source file that do not include the corresponding device configuration.
    	AlreadyAcquired             cuResult = C.CUDA_ERROR_ALREADY_ACQUIRED               // This indicates that a resource has already been acquired.
    	NotMapped                   cuResult = C.CUDA_ERROR_NOT_MAPPED                     // This indicates that a resource is not mapped.
    	NotMappedAsArray            cuResult = C.CUDA_ERROR_NOT_MAPPED_AS_ARRAY            // This indicates that a mapped resource is not available for access as an array.
    	NotMappedAsPointer          cuResult = C.CUDA_ERROR_NOT_MAPPED_AS_POINTER          // This indicates that a mapped resource is not available for access as a pointer.
    	EccUncorrectable            cuResult = C.CUDA_ERROR_ECC_UNCORRECTABLE              // This indicates that an uncorrectable ECC error was detected during execution.
    	UnsupportedLimit            cuResult = C.CUDA_ERROR_UNSUPPORTED_LIMIT              // This indicates that the CUlimit passed to the API call is not supported by the active device.
    	ContextAlreadyInUse         cuResult = C.CUDA_ERROR_CONTEXT_ALREADY_IN_USE         // This indicates that the CUcontext passed to the API call can only be bound to a single CPU thread at a time but is already bound to a CPU thread.
    	PeerAccessUnsupported       cuResult = C.CUDA_ERROR_PEER_ACCESS_UNSUPPORTED        // This indicates that peer access is not supported across the given devices.
    	InvalidPtx                  cuResult = C.CUDA_ERROR_INVALID_PTX                    // This indicates that a PTX JIT compilation failed.
    	InvalidGraphicsContext      cuResult = C.CUDA_ERROR_INVALID_GRAPHICS_CONTEXT       // This indicates an error with OpenGL or DirectX context.
    	NvlinkUncorrectable         cuResult = C.CUDA_ERROR_NVLINK_UNCORRECTABLE           // This indicates that an uncorrectable NVLink error was detected during the execution.
    	InvalidSource               cuResult = C.CUDA_ERROR_INVALID_SOURCE                 // This indicates that the device kernel source is invalid.
    	FileNotFound                cuResult = C.CUDA_ERROR_FILE_NOT_FOUND                 // This indicates that the file specified was not found.
    	SharedObjectSymbolNotFound  cuResult = C.CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND // This indicates that a link to a shared object failed to resolve.
    	SharedObjectInitFailed      cuResult = C.CUDA_ERROR_SHARED_OBJECT_INIT_FAILED      // This indicates that initialization of a shared object failed.
    	OperatingSystem             cuResult = C.CUDA_ERROR_OPERATING_SYSTEM               // This indicates that an OS call failed.
    	InvalidHandle               cuResult = C.CUDA_ERROR_INVALID_HANDLE                 // This indicates that a resource handle passed to the API call was not valid. Resource handles are opaque types like CUstream and CUevent.
    	NotFound                    cuResult = C.CUDA_ERROR_NOT_FOUND                      // This indicates that a named symbol was not found. Examples of symbols are global/constant variable names, texture names, and surface names.
    	NotReady                    cuResult = C.CUDA_ERROR_NOT_READY                      // This indicates that asynchronous operations issued previously have not completed yet. This result is not actually an error, but must be indicated differently than CUDA_SUCCESS (which indicates completion). Calls that may return this value include cuEventQuery() and cuStreamQuery().
    	IllegalAddress              cuResult = C.CUDA_ERROR_ILLEGAL_ADDRESS                // While executing a kernel, the device encountered a load or store instruction on an invalid memory address. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
    	LaunchOutOfResources        cuResult = C.CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES        // This indicates that a launch did not occur because it did not have appropriate resources. This error usually indicates that the user has attempted to pass too many arguments to the device kernel, or the kernel launch specifies too many threads for the kernel's register count. Passing arguments of the wrong size (i.e. a 64-bit pointer when a 32-bit int is expected) is equivalent to passing too many arguments and can also result in this error.
    	LaunchTimeout               cuResult = C.CUDA_ERROR_LAUNCH_TIMEOUT                 // This indicates that the device kernel took too long to execute. This can only occur if timeouts are enabled - see the device attribute CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT for more information. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
    	LaunchIncompatibleTexturing cuResult = C.CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING  // This error indicates a kernel launch that uses an incompatible texturing mode.
    	PeerAccessAlreadyEnabled    cuResult = C.CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED    // This error indicates that a call to cuCtxEnablePeerAccess() is trying to re-enable peer access to a context which has already had peer access to it enabled.
    	PeerAccessNotEnabled        cuResult = C.CUDA_ERROR_PEER_ACCESS_NOT_ENABLED        // This error indicates that cuCtxDisablePeerAccess() is trying to disable peer access which has not been enabled yet via cuCtxEnablePeerAccess().
    	PrimaryContextActive        cuResult = C.CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE         // This error indicates that the primary context for the specified device has already been initialized.
    	ContextIsDestroyed          cuResult = C.CUDA_ERROR_CONTEXT_IS_DESTROYED           // This error indicates that the context current to the calling thread has been destroyed using cuCtxDestroy, or is a primary context which has not yet been initialized.
    	Assert                      cuResult = C.CUDA_ERROR_ASSERT                         // A device-side assert triggered during kernel execution. The context cannot be used anymore, and must be destroyed. All existing device memory allocations from this context are invalid and must be reconstructed if the program is to continue using CUDA.
    	TooManyPeers                cuResult = C.CUDA_ERROR_TOO_MANY_PEERS                 // This error indicates that the hardware resources required to enable peer access have been exhausted for one or more of the devices passed to cuCtxEnablePeerAccess().
    	HostMemoryAlreadyRegistered cuResult = C.CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED // This error indicates that the memory range passed to cuMemHostRegister() has already been registered.
    	HostMemoryNotRegistered     cuResult = C.CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED     // This error indicates that the pointer passed to cuMemHostUnregister() does not correspond to any currently registered memory region.
    	HardwareStackError          cuResult = C.CUDA_ERROR_HARDWARE_STACK_ERROR           // While executing a kernel, the device encountered a stack error. This can be due to stack corruption or exceeding the stack size limit. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
    	IllegalInstruction          cuResult = C.CUDA_ERROR_ILLEGAL_INSTRUCTION            // While executing a kernel, the device encountered an illegal instruction. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
    	MisalignedAddress           cuResult = C.CUDA_ERROR_MISALIGNED_ADDRESS             // While executing a kernel, the device encountered a load or store instruction on a memory address which is not aligned. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
    	InvalidAddressSpace         cuResult = C.CUDA_ERROR_INVALID_ADDRESS_SPACE          // While executing a kernel, the device encountered an instruction which can only operate on memory locations in certain address spaces (global, shared, or local), but was supplied a memory address not belonging to an allowed address space. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
    	InvalidPc                   cuResult = C.CUDA_ERROR_INVALID_PC                     // While executing a kernel, the device program counter wrapped its address space. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
    	LaunchFailed                cuResult = C.CUDA_ERROR_LAUNCH_FAILED                  // An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
    	NotPermitted                cuResult = C.CUDA_ERROR_NOT_PERMITTED                  // This error indicates that the attempted operation is not permitted.
    	NotSupported                cuResult = C.CUDA_ERROR_NOT_SUPPORTED                  // This error indicates that the attempted operation is not supported on the current system or device.
    	Unknown                     cuResult = C.CUDA_ERROR_UNKNOWN                        // This indicates that an unknown internal error has occurred.
    )
    const DEBUG = false
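The result codes listed above are what the error-returning functions of this package ultimately report. The sketch below shows one way an integer result code can be turned into a Go error, and why a code such as NotReady deserves separate treatment (its comment above says it is not actually an error). The type status, its constants, and the helpers asError and isPending are hypothetical stand-ins written for this example. They do not show how package cu implements cuResult:

```go
package main

import (
	"errors"
	"fmt"
)

// status is a hypothetical stand-in for a driver result code.
type status int

// Illustrative values only; a real binding would take them from the C headers.
const (
	statusSuccess      status = 0
	statusInvalidValue status = 1
	statusOutOfMemory  status = 2
	statusNotReady     status = 600
)

// Error makes status usable as a Go error.
func (s status) Error() string {
	switch s {
	case statusSuccess:
		return "success"
	case statusInvalidValue:
		return "invalid value"
	case statusOutOfMemory:
		return "out of memory"
	case statusNotReady:
		return "not ready"
	default:
		return fmt.Sprintf("unknown status %d", int(s))
	}
}

// asError maps success to nil so callers can use the usual err != nil check.
func asError(s status) error {
	if s == statusSuccess {
		return nil
	}
	return s
}

// isPending reports whether err only signals unfinished asynchronous work.
func isPending(err error) bool {
	return errors.Is(err, statusNotReady)
}

func main() {
	for _, s := range []status{statusSuccess, statusNotReady, statusOutOfMemory} {
		err := asError(s)
		switch {
		case err == nil:
			fmt.Println("ok")
		case isPending(err):
			fmt.Println("still running, poll again")
		default:
			fmt.Println("failed:", err)
		}
	}
}
```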

    Variables

    var NoStream = Stream{}

    Functions

    func AverageQueueLength

    func AverageQueueLength() int

      AverageQueueLength returns the average queue length recorded. This allows for optimizations.

      func BlockingCallers

      func BlockingCallers() map[string]int

      func DestroyEvent

      func DestroyEvent(event *Event) (err error)

      func Limits

      func Limits(limit Limit) (pvalue int64, err error)

      func MemFree

      func MemFree(dptr DevicePtr) (err error)

      func MemFreeHost

      func MemFreeHost(p unsafe.Pointer) (err error)

      func MemInfo

      func MemInfo() (free int64, total int64, err error)

      func Memcpy

      func Memcpy(dst DevicePtr, src DevicePtr, ByteCount int64) (err error)

      func Memcpy2D

      func Memcpy2D(pCopy Memcpy2dParam) (err error)

      func Memcpy2DAsync

      func Memcpy2DAsync(pCopy Memcpy2dParam, hStream Stream) (err error)

      func Memcpy2DUnaligned

      func Memcpy2DUnaligned(pCopy Memcpy2dParam) (err error)

      func Memcpy3D

      func Memcpy3D(pCopy Memcpy3dParam) (err error)

      func Memcpy3DAsync

      func Memcpy3DAsync(pCopy Memcpy3dParam, hStream Stream) (err error)

      func Memcpy3DPeer

      func Memcpy3DPeer(pCopy Memcpy3dPeerParam) (err error)

      func Memcpy3DPeerAsync

      func Memcpy3DPeerAsync(pCopy Memcpy3dPeerParam, hStream Stream) (err error)

      func MemcpyAsync

      func MemcpyAsync(dst DevicePtr, src DevicePtr, ByteCount int64, hStream Stream) (err error)

      func MemcpyAtoA

      func MemcpyAtoA(dstArray Array, dstOffset int64, srcArray Array, srcOffset int64, ByteCount int64) (err error)

      func MemcpyAtoD

      func MemcpyAtoD(dstDevice DevicePtr, srcArray Array, srcOffset int64, ByteCount int64) (err error)

      func MemcpyAtoH

      func MemcpyAtoH(dstHost unsafe.Pointer, srcArray Array, srcOffset int64, ByteCount int64) (err error)

      func MemcpyAtoHAsync

      func MemcpyAtoHAsync(dstHost unsafe.Pointer, srcArray Array, srcOffset int64, ByteCount int64, hStream Stream) (err error)

      func MemcpyDtoA

      func MemcpyDtoA(dstArray Array, dstOffset int64, srcDevice DevicePtr, ByteCount int64) (err error)

      func MemcpyDtoD

      func MemcpyDtoD(dstDevice DevicePtr, srcDevice DevicePtr, ByteCount int64) (err error)

      func MemcpyDtoDAsync

      func MemcpyDtoDAsync(dstDevice DevicePtr, srcDevice DevicePtr, ByteCount int64, hStream Stream) (err error)

      func MemcpyDtoH

      func MemcpyDtoH(dstHost unsafe.Pointer, srcDevice DevicePtr, ByteCount int64) (err error)

      func MemcpyDtoHAsync

      func MemcpyDtoHAsync(dstHost unsafe.Pointer, srcDevice DevicePtr, ByteCount int64, hStream Stream) (err error)

      func MemcpyHtoA

      func MemcpyHtoA(dstArray Array, dstOffset int64, srcHost unsafe.Pointer, ByteCount int64) (err error)

      func MemcpyHtoAAsync

      func MemcpyHtoAAsync(dstArray Array, dstOffset int64, srcHost unsafe.Pointer, ByteCount int64, hStream Stream) (err error)

      func MemcpyHtoD

      func MemcpyHtoD(dstDevice DevicePtr, srcHost unsafe.Pointer, ByteCount int64) (err error)

      func MemcpyHtoDAsync

      func MemcpyHtoDAsync(dstDevice DevicePtr, srcHost unsafe.Pointer, ByteCount int64, hStream Stream) (err error)

      func MemcpyPeer

      func MemcpyPeer(dstDevice DevicePtr, dstContext CUContext, srcDevice DevicePtr, srcContext CUContext, ByteCount int64) (err error)

      func MemcpyPeerAsync

      func MemcpyPeerAsync(dstDevice DevicePtr, dstContext CUContext, srcDevice DevicePtr, srcContext CUContext, ByteCount int64, hStream Stream) (err error)

      func MemsetD16

      func MemsetD16(dstDevice DevicePtr, us uint16, N int64) (err error)

      func MemsetD16Async

      func MemsetD16Async(dstDevice DevicePtr, us uint16, N int64, hStream Stream) (err error)

      func MemsetD2D16

      func MemsetD2D16(dstDevice DevicePtr, dstPitch int64, us uint16, Width int64, Height int64) (err error)

      func MemsetD2D16Async

      func MemsetD2D16Async(dstDevice DevicePtr, dstPitch int64, us uint16, Width int64, Height int64, hStream Stream) (err error)

      func MemsetD2D32

      func MemsetD2D32(dstDevice DevicePtr, dstPitch int64, ui uint, Width int64, Height int64) (err error)

      func MemsetD2D32Async

      func MemsetD2D32Async(dstDevice DevicePtr, dstPitch int64, ui uint, Width int64, Height int64, hStream Stream) (err error)

      func MemsetD2D8

      func MemsetD2D8(dstDevice DevicePtr, dstPitch int64, uc byte, Width int64, Height int64) (err error)

      func MemsetD2D8Async

      func MemsetD2D8Async(dstDevice DevicePtr, dstPitch int64, uc byte, Width int64, Height int64, hStream Stream) (err error)

      func MemsetD32

      func MemsetD32(dstDevice DevicePtr, ui uint32, N int64) (err error)

      func MemsetD32Async

      func MemsetD32Async(dstDevice DevicePtr, ui uint, N int64, hStream Stream) (err error)

      func MemsetD8

      func MemsetD8(dstDevice DevicePtr, uc byte, N int64) (err error)

      func MemsetD8Async

      func MemsetD8Async(dstDevice DevicePtr, uc byte, N int64, hStream Stream) (err error)

      func NumDevices

      func NumDevices() (count int, err error)

      func PushCurrentCtx

      func PushCurrentCtx(ctx CUContext) (err error)

      func QueueLengths

      func QueueLengths() []int

        QueueLengths returns the queue lengths recorded

        func SetCurrentCacheConfig

        func SetCurrentCacheConfig(config FuncCacheConfig) (err error)

        func SetCurrentContext

        func SetCurrentContext(ctx CUContext) (err error)

        func SetLimit

        func SetLimit(limit Limit, value int64) (err error)

        func SetSharedMemConfig

        func SetSharedMemConfig(config SharedConfig) (err error)

        func StreamPriorityRange

        func StreamPriorityRange() (leastPriority int, greatestPriority int, err error)

        func Synchronize

        func Synchronize() (err error)

        func Version

        func Version() int

          Version returns the version of the CUDA driver

          Types

          type AddressMode

          type AddressMode byte

            AddressMode are texture reference addressing modes

            const (
            	WrapMode   AddressMode = C.CU_TR_ADDRESS_MODE_WRAP   // Wrapping address mode
            	ClampMode  AddressMode = C.CU_TR_ADDRESS_MODE_CLAMP  // Clamp to edge address mode
            	MirrorMode AddressMode = C.CU_TR_ADDRESS_MODE_MIRROR // Mirror address mode
            	BorderMode AddressMode = C.CU_TR_ADDRESS_MODE_BORDER // Border address mode
            )

            type Array

            type Array struct {
            	// contains filtered or unexported fields
            }

              Array is the pointer to a CUDA array. The name is a bit of a misnomer, as it would lead one to believe that it's rangeable. It's not.

              func Make3DArray

              func Make3DArray(pAllocateArray Array3Desc) (pHandle Array, err error)

              func MakeArray

              func MakeArray(pAllocateArray ArrayDesc) (pHandle Array, err error)

              func (Array) Descriptor

              func (hArray Array) Descriptor() (pArrayDescriptor ArrayDesc, err error)

              func (Array) Descriptor3

              func (hArray Array) Descriptor3() (pArrayDescriptor Array3Desc, err error)

              func (Array) Destroy

              func (hArray Array) Destroy() (err error)

              type Array3Desc

              type Array3Desc struct {
              	Width, Height, Depth uint
              	Format               Format
              	NumChannels          uint
              	Flags                uint
              }

                Array3Desc is the descriptor for CUDA 3D arrays, which is used to determine what to allocate.

                From the docs:

                 Width, Height, and Depth are the width, height, and depth of the CUDA array (in elements); the following types of CUDA arrays can be allocated:
                	- A 1D array is allocated if Height and Depth extents are both zero.
                	- A 2D array is allocated if only Depth extent is zero.
                	- A 3D array is allocated if all three extents are non-zero.
                	- A 1D layered CUDA array is allocated if only Height is zero and the CUDA_ARRAY3D_LAYERED flag is set. Each layer is a 1D array. The number of layers is determined by the depth extent.
                	- A 2D layered CUDA array is allocated if all three extents are non-zero and the CUDA_ARRAY3D_LAYERED flag is set. Each layer is a 2D array. The number of layers is determined by the depth extent.
                	- A cubemap CUDA array is allocated if all three extents are non-zero and the CUDA_ARRAY3D_CUBEMAP flag is set. Width must be equal to Height, and Depth must be six. A cubemap is a special type of 2D layered CUDA array, where the six layers represent the six faces of a cube. The order of the six layers in memory is the same as that listed in CUarray_cubemap_face.
                	- A cubemap layered CUDA array is allocated if all three extents are non-zero, and both, CUDA_ARRAY3D_CUBEMAP and CUDA_ARRAY3D_LAYERED flags are set. Width must be equal to Height, and Depth must be a multiple of six. A cubemap layered CUDA array is a special type of 2D layered CUDA array that consists of a collection of cubemaps. The first six layers represent the first cubemap, the next six layers form the second cubemap, and so on.
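The rules quoted above can be read as a small decision table over the extents and flags. The following sketch encodes them for illustration. The struct desc is a local stand-in for the relevant fields of Array3Desc, the function kind is hypothetical, and the two flag values are assumptions about the constants in cuda.h rather than values taken from this package. Combinations that the quoted rules do not cover are reported as "unspecified":

```go
package main

import "fmt"

// Flag values assumed to match cuda.h; treat them as assumptions.
const (
	flagLayered uint = 0x01 // stands in for CUDA_ARRAY3D_LAYERED
	flagCubemap uint = 0x04 // stands in for CUDA_ARRAY3D_CUBEMAP
)

// desc is a local stand-in for the extent and flag fields of Array3Desc.
type desc struct {
	Width, Height, Depth uint
	Flags                uint
}

// kind names the array type that the quoted rules select for d.
// Requiring a non-zero Width is an assumption; the quoted text does not state it.
func kind(d desc) string {
	layered := d.Flags&flagLayered != 0
	cubemap := d.Flags&flagCubemap != 0
	allNonZero := d.Width != 0 && d.Height != 0 && d.Depth != 0
	switch {
	case cubemap && layered && allNonZero && d.Width == d.Height && d.Depth%6 == 0:
		return "cubemap layered"
	case cubemap && !layered && allNonZero && d.Width == d.Height && d.Depth == 6:
		return "cubemap"
	case layered && !cubemap && allNonZero:
		return "2D layered"
	case layered && !cubemap && d.Width != 0 && d.Height == 0 && d.Depth != 0:
		return "1D layered"
	case !layered && !cubemap && allNonZero:
		return "3D"
	case !layered && !cubemap && d.Width != 0 && d.Height != 0 && d.Depth == 0:
		return "2D"
	case !layered && !cubemap && d.Width != 0 && d.Height == 0 && d.Depth == 0:
		return "1D"
	}
	return "unspecified"
}

func main() {
	fmt.Println(kind(desc{Width: 64}))
	fmt.Println(kind(desc{Width: 64, Height: 32, Depth: 8, Flags: flagLayered}))
	fmt.Println(kind(desc{Width: 32, Height: 32, Depth: 6, Flags: flagCubemap}))
}
```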
                

                type ArrayDesc

                type ArrayDesc struct {
                	Width, Height uint
                	Format        Format
                	NumChannels   uint
                }

                  ArrayDesc is the descriptor for CUDA arrays, which is used to determine what to allocate.

                  From the docs:

                  Width, and Height are the width, and height of the CUDA array (in elements); the CUDA array is one-dimensional if height is 0, two-dimensional otherwise;
                  

                  type BatchedContext

                  type BatchedContext struct {
                  	Context
                  	Device
                  	// contains filtered or unexported fields
                  }

                    BatchedContext is a CUDA context where the CUDA calls are batched up.

                    Typically a locked OS thread is made to execute the CUDA calls like so:

                    func main() {
                    	ctx := NewBatchedContext(...)

                    	// CUDA calls are executed from a single locked OS thread.
                    	runtime.LockOSThread()
                    	defer runtime.UnlockOSThread()

                    	workAvailable := ctx.WorkAvailable()
                    	go doWhatever(ctx)
                    	for {
                    		select {
                    		case <-workAvailable:
                    			ctx.DoWork()
                    			err := ctx.Errors()
                    			handleErrors(err)
                    		case ...:
                    		}
                    	}
                    }

                    // doWhatever queues calls on the batched context from another goroutine.
                    func doWhatever(ctx *BatchedContext) {
                    	ctx.Memcpy(...)
                    	// et cetera
                    }
                    

                    For the moment, BatchedContext only supports a limited number of CUDA Runtime APIs. Feel free to send a pull request with more APIs.
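The pattern described above (one goroutine pinned to an OS thread executes the calls, other goroutines only queue them) can be tried without a GPU. The sketch below uses plain closures in place of CUDA calls. The names call, serve and produce are hypothetical and exist only in this example. It is not how BatchedContext is implemented, only an illustration of the threading arrangement:

```go
package main

import (
	"fmt"
	"runtime"
)

// call is a hypothetical unit of work standing in for one queued CUDA call.
type call func() error

// serve pins the calling goroutine to its OS thread and runs queued calls
// on it until the work channel is closed. It returns the errors it collected.
func serve(work <-chan call) []error {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	var errs []error
	for c := range work {
		if err := c(); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}

// produce queues n calls and then closes the channel. It is meant to run in
// its own goroutine; the closures themselves execute inside serve.
func produce(work chan<- call, n int, out []int) {
	defer close(work)
	for i := 0; i < n; i++ {
		i := i
		work <- func() error {
			if i == 2 {
				return fmt.Errorf("call %d failed", i)
			}
			out[i] = i * i
			return nil
		}
	}
}

func main() {
	work := make(chan call)
	out := make([]int, 4)

	go produce(work, len(out), out)
	errs := serve(work)

	fmt.Println("results:", out)
	fmt.Println("errors:", errs)
}
```

Because every closure runs inside serve, all writes to out happen on the locked thread, and main reads out only after serve has returned.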

                    func NewBatchedContext

                    func NewBatchedContext(c Context, d Device) *BatchedContext

                      NewBatchedContext creates a batched CUDA context.

                      func (*BatchedContext) AllocAndCopy

                      func (ctx *BatchedContext) AllocAndCopy(p unsafe.Pointer, bytesize int64) (retVal DevicePtr, err error)

                      func (*BatchedContext) Cleanup

                      func (ctx *BatchedContext) Cleanup()

                        Cleanup is the cleanup function. It cleans up all the ancillary allocations that have happened for all the batched calls. This method should be called when the context is done with - otherwise there'd be a lot of leaked memory.

                        The main reason why this method exists is because there is no way to reliably free memory without causing weird issues in the CUDA calls.

                        func (*BatchedContext) Close

                        func (ctx *BatchedContext) Close() error

                          Close closes the batched context

                          func (*BatchedContext) DoWork

                          func (ctx *BatchedContext) DoWork()

                            DoWork waits for work to come in from the queue. If it's blocking, the entire queue will be processed immediately. Otherwise it will be added to the batch queue.
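One possible reading of the description above is that non-blocking calls accumulate in a batch, and a blocking call forces everything queued so far to run. The sketch below models only that reading. The types op and batcher are hypothetical, the operation names are labels rather than real calls, and the actual DoWork implementation may differ:

```go
package main

import "fmt"

// op is a hypothetical queued operation. blocking marks calls whose result
// the caller is waiting on.
type op struct {
	name     string
	blocking bool
}

// batcher accumulates ops and records what each flush executed.
type batcher struct {
	pending []op
	flushes [][]string // names run per flush, kept for inspection
}

// submit queues o. A blocking op forces the whole queue, including o, to run now.
func (b *batcher) submit(o op) {
	b.pending = append(b.pending, o)
	if o.blocking {
		b.flush()
	}
}

// flush "executes" the pending ops in order and empties the queue.
func (b *batcher) flush() {
	if len(b.pending) == 0 {
		return
	}
	names := make([]string, 0, len(b.pending))
	for _, o := range b.pending {
		names = append(names, o.name)
	}
	b.flushes = append(b.flushes, names)
	b.pending = b.pending[:0]
}

func main() {
	var b batcher
	b.submit(op{name: "memcpyHtoD"})
	b.submit(op{name: "launchKernel"})
	b.submit(op{name: "memAlloc", blocking: true}) // forces a flush of all three
	b.submit(op{name: "memcpyDtoH"})               // stays queued

	fmt.Println("flushed batches:", b.flushes)
	fmt.Println("still pending:", len(b.pending))
}
```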

                            func (*BatchedContext) Errors

                            func (ctx *BatchedContext) Errors() error

                              Errors returns any errors that may have occurred during batch processing.

                              func (*BatchedContext) FirstError

                              func (ctx *BatchedContext) FirstError() error

                                FirstError returns the first error, if there was any.
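The difference between Errors and FirstError is easiest to see with a small collector. This is a sketch under assumptions: `errList` and `sample` are hypothetical names invented for illustration, and the package's internal representation may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errList is a hypothetical collector, not the package's internal type.
type errList []error

// Error joins all messages so the list itself satisfies the error interface.
func (l errList) Error() string {
	msgs := make([]string, len(l))
	for i, err := range l {
		msgs[i] = err.Error()
	}
	return strings.Join(msgs, "; ")
}

// all returns nil when nothing failed, otherwise the whole list as one error.
// Returning an explicit nil avoids handing back a non-nil interface that
// wraps an empty list.
func (l errList) all() error {
	if len(l) == 0 {
		return nil
	}
	return l
}

// first returns the earliest recorded error, or nil if there was none.
func (l errList) first() error {
	if len(l) == 0 {
		return nil
	}
	return l[0]
}

// sample builds a list with two failures, in the order they happened.
func sample() errList {
	return errList{errors.New("memcpy failed"), errors.New("launch failed")}
}

func main() {
	errs := sample()
	fmt.Println(errs.all())   // memcpy failed; launch failed
	fmt.Println(errs.first()) // memcpy failed
}
```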

                                func (*BatchedContext) IsInitialized

                                func (ctx *BatchedContext) IsInitialized() bool

                                func (*BatchedContext) LaunchAndSync

                                func (ctx *BatchedContext) LaunchAndSync(function Function, gridDimX, gridDimY, gridDimZ int, blockDimX, blockDimY, blockDimZ int, sharedMemBytes int, stream Stream, kernelParams []unsafe.Pointer)

                                func (*BatchedContext) LaunchKernel

                                func (ctx *BatchedContext) LaunchKernel(function Function, gridDimX, gridDimY, gridDimZ int, blockDimX, blockDimY, blockDimZ int, sharedMemBytes int, stream Stream, kernelParams []unsafe.Pointer)

                                func (*BatchedContext) MemAlloc

                                func (ctx *BatchedContext) MemAlloc(bytesize int64) (retVal DevicePtr, err error)

                                  MemAlloc allocates memory. It is a blocking call.

                                  func (*BatchedContext) MemAllocManaged

                                  func (ctx *BatchedContext) MemAllocManaged(bytesize int64, flags MemAttachFlags) (retVal DevicePtr, err error)

                                  func (*BatchedContext) MemFree

                                  func (ctx *BatchedContext) MemFree(mem DevicePtr)

                                  func (*BatchedContext) MemFreeHost

                                  func (ctx *BatchedContext) MemFreeHost(p unsafe.Pointer)

                                  func (*BatchedContext) Memcpy

                                  func (ctx *BatchedContext) Memcpy(dst, src DevicePtr, byteCount int64)

                                  func (*BatchedContext) MemcpyDtoH

                                  func (ctx *BatchedContext) MemcpyDtoH(dst unsafe.Pointer, src DevicePtr, byteCount int64)

                                  func (*BatchedContext) MemcpyHtoD

                                  func (ctx *BatchedContext) MemcpyHtoD(dst DevicePtr, src unsafe.Pointer, byteCount int64)

                                  func (*BatchedContext) Run

                                  func (ctx *BatchedContext) Run(errChan chan error) error

                                    Run manages the running of the BatchedContext. Because it is expected to run in a goroutine, an error channel must be passed in.

                                    func (*BatchedContext) SetCurrent

                                    func (ctx *BatchedContext) SetCurrent()

                                      SetCurrent sets the current context. This is usually unnecessary because SetCurrent will be called before batch processing the calls.

                                      func (*BatchedContext) Synchronize

                                      func (ctx *BatchedContext) Synchronize()

                                      func (*BatchedContext) WorkAvailable

                                      func (ctx *BatchedContext) WorkAvailable() <-chan struct{}

                                        WorkAvailable returns the channel on which work availability is broadcast.

                                        type CUContext

                                        type CUContext struct {
                                        	// contains filtered or unexported fields
                                        }

                                          CUContext is a CUDA context

                                          func CurrentContext

                                          func CurrentContext() (pctx CUContext, err error)

                                          func PopCurrentCtx

                                          func PopCurrentCtx() (pctx CUContext, err error)

                                          func (CUContext) APIVersion

                                          func (ctx CUContext) APIVersion() (version uint, err error)

                                          func (*CUContext) Destroy

                                          func (ctx *CUContext) Destroy() error

                                            Destroy destroys the context. It returns an error if the context wasn't properly destroyed.

                                            Wrapper over cuCtxDestroy: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g27a365aebb0eb548166309f58a1e8b8e

                                            func (CUContext) DisablePeerAccess

                                            func (peerContext CUContext) DisablePeerAccess() (err error)

                                            func (CUContext) EnablePeerAccess

                                            func (peerContext CUContext) EnablePeerAccess(Flags uint) (err error)

                                            func (CUContext) Lock

                                            func (ctx CUContext) Lock() error

                                              Lock ties the calling goroutine to an OS thread, then ties the CUDA context to the thread. Do not call in a goroutine.

                                              Good:

                                              func main() {
                                              	dev, _ := GetDevice(0)
                                              	ctx, _ := dev.MakeContext()
                                              	if err := ctx.Lock(); err != nil {
                                              		// handle error
                                              	}
                                              
                                              	mem, _ := MemAlloc(1024)
                                              }
                                              

                                              Bad:

                                              func main() {
                                              	dev, _ := GetDevice(0)
                                              	ctx, _ := dev.MakeContext()
                                              	go ctx.Lock() // this will tie the goroutine that calls ctx.Lock to the OS thread, while the main thread does not get the lock
                                              	mem, _ := MemAlloc(1024)
                                              }
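The reason the "Bad" version fails is that thread locking applies to the goroutine that makes the call, not to its parent. If work really has to run off the main goroutine, the lock and the work need to happen inside the same goroutine. The sketch below shows that shape with the standard library only; `onLockedThread` is a hypothetical helper, and the comment marks where the context lock and CUDA calls would go.

```go
package main

import (
	"fmt"
	"runtime"
)

// onLockedThread runs fn on a fresh goroutine that stays pinned to one OS
// thread for the whole duration of fn, then reports fn's error.
func onLockedThread(fn func() error) error {
	done := make(chan error, 1)
	go func() {
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		// In real code the context would be locked here, and every CUDA
		// call would be made from inside fn on this same goroutine.
		done <- fn()
	}()
	return <-done
}

func main() {
	err := onLockedThread(func() error {
		fmt.Println("lock and work happen in the same goroutine")
		return nil
	})
	fmt.Println("err:", err)
}
```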
                                              

                                              func (CUContext) String

                                              func (ctx CUContext) String() string

                                              func (CUContext) Unlock

                                              func (ctx CUContext) Unlock() error

                                                Unlock unbinds the goroutine from the OS thread.

                                                type Context

                                                type Context interface {
                                                	// Operational stuff
                                                	CUDAContext() CUContext
                                                	Error() error
                                                	Run(chan error) error
                                                	Do(fn func() error) error
                                                	Work() <-chan func() error
                                                	ErrChan() chan error
                                                	Close() error // Close closes all resources associated with the context
                                                
                                                	// actual methods
                                                	Address(hTexRef TexRef) (pdptr DevicePtr, err error)
                                                	AddressMode(hTexRef TexRef, dim int) (pam AddressMode, err error)
                                                	Array(hTexRef TexRef) (phArray Array, err error)
                                                	AttachMemAsync(hStream Stream, dptr DevicePtr, length int64, flags uint)
                                                	BorderColor(hTexRef TexRef) (pBorderColor [3]float32, err error)
                                                	CurrentCacheConfig() (pconfig FuncCacheConfig, err error)
                                                	CurrentDevice() (device Device, err error)
                                                	CurrentFlags() (flags ContextFlags, err error)
                                                	Descriptor(hArray Array) (pArrayDescriptor ArrayDesc, err error)
                                                	Descriptor3(hArray Array) (pArrayDescriptor Array3Desc, err error)
                                                	DestroyArray(hArray Array)
                                                	DestroyEvent(event *Event)
                                                	DestroyStream(hStream *Stream)
                                                	DisablePeerAccess(peerContext CUContext)
                                                	Elapsed(hStart Event, hEnd Event) (pMilliseconds float64, err error)
                                                	EnablePeerAccess(peerContext CUContext, Flags uint)
                                                	FilterMode(hTexRef TexRef) (pfm FilterMode, err error)
                                                	Format(hTexRef TexRef) (pFormat Format, pNumChannels int, err error)
                                                	FunctionAttribute(fn Function, attrib FunctionAttribute) (pi int, err error)
                                                	GetArray(hSurfRef SurfRef) (phArray Array, err error)
                                                	LaunchKernel(fn Function, gridDimX, gridDimY, gridDimZ int, blockDimX, blockDimY, blockDimZ int, sharedMemBytes int, stream Stream, kernelParams []unsafe.Pointer)
                                                	Limits(limit Limit) (pvalue int64, err error)
                                                	Load(name string) (m Module, err error)
                                                	MakeEvent(flags EventFlags) (event Event, err error)
                                                	MakeStream(flags StreamFlags) (stream Stream, err error)
                                                	MakeStreamWithPriority(priority int, flags StreamFlags) (stream Stream, err error)
                                                	MaxAnisotropy(hTexRef TexRef) (pmaxAniso int, err error)
                                                	MemAlloc(bytesize int64) (dptr DevicePtr, err error)
                                                	MemAllocManaged(bytesize int64, flags MemAttachFlags) (dptr DevicePtr, err error)
                                                	MemAllocPitch(WidthInBytes int64, Height int64, ElementSizeBytes uint) (dptr DevicePtr, pPitch int64, err error)
                                                	MemFree(dptr DevicePtr)
                                                	MemFreeHost(p unsafe.Pointer)
                                                	MemInfo() (free int64, total int64, err error)
                                                	Memcpy(dst DevicePtr, src DevicePtr, ByteCount int64)
                                                	Memcpy2D(pCopy Memcpy2dParam)
                                                	Memcpy2DAsync(pCopy Memcpy2dParam, hStream Stream)
                                                	Memcpy2DUnaligned(pCopy Memcpy2dParam)
                                                	Memcpy3D(pCopy Memcpy3dParam)
                                                	Memcpy3DAsync(pCopy Memcpy3dParam, hStream Stream)
                                                	Memcpy3DPeer(pCopy Memcpy3dPeerParam)
                                                	Memcpy3DPeerAsync(pCopy Memcpy3dPeerParam, hStream Stream)
                                                	MemcpyAsync(dst DevicePtr, src DevicePtr, ByteCount int64, hStream Stream)
                                                	MemcpyAtoA(dstArray Array, dstOffset int64, srcArray Array, srcOffset int64, ByteCount int64)
                                                	MemcpyAtoD(dstDevice DevicePtr, srcArray Array, srcOffset int64, ByteCount int64)
                                                	MemcpyAtoH(dstHost unsafe.Pointer, srcArray Array, srcOffset int64, ByteCount int64)
                                                	MemcpyAtoHAsync(dstHost unsafe.Pointer, srcArray Array, srcOffset int64, ByteCount int64, hStream Stream)
                                                	MemcpyDtoA(dstArray Array, dstOffset int64, srcDevice DevicePtr, ByteCount int64)
                                                	MemcpyDtoD(dstDevice DevicePtr, srcDevice DevicePtr, ByteCount int64)
                                                	MemcpyDtoDAsync(dstDevice DevicePtr, srcDevice DevicePtr, ByteCount int64, hStream Stream)
                                                	MemcpyDtoH(dstHost unsafe.Pointer, srcDevice DevicePtr, ByteCount int64)
                                                	MemcpyDtoHAsync(dstHost unsafe.Pointer, srcDevice DevicePtr, ByteCount int64, hStream Stream)
                                                	MemcpyHtoA(dstArray Array, dstOffset int64, srcHost unsafe.Pointer, ByteCount int64)
                                                	MemcpyHtoAAsync(dstArray Array, dstOffset int64, srcHost unsafe.Pointer, ByteCount int64, hStream Stream)
                                                	MemcpyHtoD(dstDevice DevicePtr, srcHost unsafe.Pointer, ByteCount int64)
                                                	MemcpyHtoDAsync(dstDevice DevicePtr, srcHost unsafe.Pointer, ByteCount int64, hStream Stream)
                                                	MemcpyPeer(dstDevice DevicePtr, dstContext CUContext, srcDevice DevicePtr, srcContext CUContext, ByteCount int64)
                                                	MemcpyPeerAsync(dstDevice DevicePtr, dstContext CUContext, srcDevice DevicePtr, srcContext CUContext, ByteCount int64, hStream Stream)
                                                	MemsetD16(dstDevice DevicePtr, us uint16, N int64)
                                                	MemsetD16Async(dstDevice DevicePtr, us uint16, N int64, hStream Stream)
                                                	MemsetD2D16(dstDevice DevicePtr, dstPitch int64, us uint16, Width int64, Height int64)
                                                	MemsetD2D16Async(dstDevice DevicePtr, dstPitch int64, us uint16, Width int64, Height int64, hStream Stream)
                                                	MemsetD2D32(dstDevice DevicePtr, dstPitch int64, ui uint, Width int64, Height int64)
                                                	MemsetD2D32Async(dstDevice DevicePtr, dstPitch int64, ui uint, Width int64, Height int64, hStream Stream)
                                                	MemsetD2D8(dstDevice DevicePtr, dstPitch int64, uc byte, Width int64, Height int64)
                                                	MemsetD2D8Async(dstDevice DevicePtr, dstPitch int64, uc byte, Width int64, Height int64, hStream Stream)
                                                	MemsetD32(dstDevice DevicePtr, ui uint, N int64)
                                                	MemsetD32Async(dstDevice DevicePtr, ui uint, N int64, hStream Stream)
                                                	MemsetD8(dstDevice DevicePtr, uc byte, N int64)
                                                	MemsetD8Async(dstDevice DevicePtr, uc byte, N int64, hStream Stream)
                                                	ModuleFunction(m Module, name string) (function Function, err error)
                                                	ModuleGlobal(m Module, name string) (dptr DevicePtr, size int64, err error)
                                                	Priority(hStream Stream) (priority int, err error)
                                                	QueryEvent(hEvent Event)
                                                	QueryStream(hStream Stream)
                                                	Record(hEvent Event, hStream Stream)
                                                	SetAddress(hTexRef TexRef, dptr DevicePtr, bytes int64) (ByteOffset int64, err error)
                                                	SetAddress2D(hTexRef TexRef, desc ArrayDesc, dptr DevicePtr, Pitch int64)
                                                	SetAddressMode(hTexRef TexRef, dim int, am AddressMode)
                                                	SetBorderColor(hTexRef TexRef, pBorderColor [3]float32)
                                                	SetCacheConfig(fn Function, config FuncCacheConfig)
                                                	SetCurrentCacheConfig(config FuncCacheConfig)
                                                	SetFilterMode(hTexRef TexRef, fm FilterMode)
                                                	SetFormat(hTexRef TexRef, fmt Format, NumPackedComponents int)
                                                	SetFunctionSharedMemConfig(fn Function, config SharedConfig)
                                                	SetLimit(limit Limit, value int64)
                                                	SetMaxAnisotropy(hTexRef TexRef, maxAniso uint)
                                                	SetMipmapFilterMode(hTexRef TexRef, fm FilterMode)
                                                	SetMipmapLevelBias(hTexRef TexRef, bias float64)
                                                	SetMipmapLevelClamp(hTexRef TexRef, minMipmapLevelClamp float64, maxMipmapLevelClamp float64)
                                                	SetSharedMemConfig(config SharedConfig)
                                                	SetTexRefFlags(hTexRef TexRef, Flags TexRefFlags)
                                                	SharedMemConfig() (pConfig SharedConfig, err error)
                                                	StreamFlags(hStream Stream) (flags uint, err error)
                                                	StreamPriorityRange() (leastPriority int, greatestPriority int, err error)
                                                	SurfRefSetArray(hSurfRef SurfRef, hArray Array, Flags uint)
                                                	Synchronize()
                                                	SynchronizeEvent(hEvent Event)
                                                	SynchronizeStream(hStream Stream)
                                                	TexRefFlags(hTexRef TexRef) (pFlags uint, err error)
                                                	TexRefSetArray(hTexRef TexRef, hArray Array, Flags uint)
                                                	Unload(hmod Module)
                                                	Wait(hStream Stream, hEvent Event, Flags uint)
                                                	WaitOnValue32(stream Stream, addr DevicePtr, value uint32, flags uint)
                                                	WriteValue32(stream Stream, addr DevicePtr, value uint32, flags uint)
                                                }

                                                  Context is the interface for a CUDA context. Typically you'd just embed *Ctx; you rarely need to use CUContext directly.

                                                  type ContextFlags

                                                  type ContextFlags byte

                                                    ContextFlags are flags that are used to create a context

                                                    const (
                                                    	SchedAuto         ContextFlags = C.CU_CTX_SCHED_AUTO          // Automatic scheduling
                                                    	SchedSpin         ContextFlags = C.CU_CTX_SCHED_SPIN          // Set spin as default scheduling
                                                    	SchedYield        ContextFlags = C.CU_CTX_SCHED_YIELD         // Set yield as default scheduling
                                                    	SchedBlockingSync ContextFlags = C.CU_CTX_SCHED_BLOCKING_SYNC // Set blocking synchronization as default scheduling
                                                    	SchedMask         ContextFlags = C.CU_CTX_SCHED_MASK          // Mask for setting scheduling options for the flag
                                                    	MapHost           ContextFlags = C.CU_CTX_MAP_HOST            // Support mapped pinned allocations
                                                    	LMemResizeToMax   ContextFlags = C.CU_CTX_LMEM_RESIZE_TO_MAX  // Keep local memory allocation after launch
                                                    	FlagsMas          ContextFlags = C.CU_CTX_FLAGS_MASK          // Mask for setting other options to flags
                                                    )
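The two mask constants exist so that one byte can carry both a scheduling policy and extra options. Below is a minimal sketch of how such masks are typically applied. It uses a local `flags` type with numeric values assumed from the CUDA driver header rather than the package's cgo constants, so check the values against your toolkit before relying on them.

```go
package main

import "fmt"

// flags is a local stand-in for a byte-sized flag type.
type flags byte

// Values assumed from the CUDA driver header; verify against your toolkit.
const (
	schedAuto         flags = 0x00
	schedSpin         flags = 0x01
	schedYield        flags = 0x02
	schedBlockingSync flags = 0x04
	schedMask         flags = 0x07
	mapHost           flags = 0x08
)

// scheduling extracts only the scheduling bits from a combined flag value.
func scheduling(f flags) flags { return f & schedMask }

// has reports whether every bit in want is set in f.
func has(f, want flags) bool { return f&want == want }

func main() {
	f := schedYield | mapHost
	fmt.Println(scheduling(f) == schedYield) // true
	fmt.Println(has(f, mapHost))             // true
	fmt.Println(has(f, schedSpin))           // false
}
```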

                                                    func CurrentFlags

                                                    func CurrentFlags() (flags ContextFlags, err error)

                                                    type Ctx

                                                    type Ctx struct {
                                                    	CUContext
                                                    	// contains filtered or unexported fields
                                                    }

                                                      Ctx is a standalone CUDA Context that is threadlocked.

                                                      func CtxFromCUContext

                                                      func CtxFromCUContext(d Device, cuctx CUContext, flags ContextFlags) *Ctx

                                                        CtxFromCUContext is another way of building a *Ctx.

                                                        Typical example:

                                                        cuctx, err := dev.MakeContext(SchedAuto)
                                                        if err != nil {
                                                        	// handle error
                                                        }
                                                        ctx := CtxFromCUContext(dev, cuctx, SchedAuto)
                                                        

                                                        func NewContext

                                                        func NewContext(d Device, flags ContextFlags) *Ctx

                                                          NewContext creates a new context and runs a listener locked to an OS thread. All work is piped through that goroutine.
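"All work is piped through that goroutine" describes a common Go pattern: one goroutine pinned to an OS thread receives closures over a channel and runs them one at a time. The sketch below is a standard-library miniature of that idea. The `worker` type and its methods are hypothetical and only loosely mirror the Do, Work and ErrChan methods of the Context interface.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// worker is a hypothetical miniature of a thread-locked context.
type worker struct {
	work chan func() error
	errs chan error
}

// newWorker starts the listener goroutine and returns a handle to it.
func newWorker() *worker {
	w := &worker{work: make(chan func() error), errs: make(chan error)}
	go w.run()
	return w
}

// run pins the goroutine to one OS thread and executes work serially
// until the work channel is closed.
func (w *worker) run() {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	for fn := range w.work {
		w.errs <- fn()
	}
}

// do sends fn to the listener and waits for its result. In this sketch it
// is meant to be used by one caller at a time.
func (w *worker) do(fn func() error) error {
	w.work <- fn
	return <-w.errs
}

// close stops the listener.
func (w *worker) close() { close(w.work) }

func main() {
	w := newWorker()
	defer w.close()
	fmt.Println(w.do(func() error { return nil }))                // <nil>
	fmt.Println(w.do(func() error { return errors.New("boom") })) // boom
}
```

Because every closure runs on the same pinned thread, any thread-bound state set up by the listener stays valid for all of them.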

                                                          func NewManuallyManagedContext

                                                          func NewManuallyManagedContext(d Device, flags ContextFlags) *Ctx

                                                            NewManuallyManagedContext creates a new context, but the Run() method, which locks a goroutine to an OS thread, has to be run manually.

                                                            func (*Ctx) Address

                                                            func (ctx *Ctx) Address(hTexRef TexRef) (pdptr DevicePtr, err error)

                                                            func (*Ctx) AddressMode

                                                            func (ctx *Ctx) AddressMode(hTexRef TexRef, dim int) (pam AddressMode, err error)

                                                            func (*Ctx) Array

                                                            func (ctx *Ctx) Array(hTexRef TexRef) (phArray Array, err error)

                                                            func (*Ctx) AttachMemAsync

                                                            func (ctx *Ctx) AttachMemAsync(hStream Stream, dptr DevicePtr, length int64, flags uint)

                                                            func (*Ctx) BorderColor

                                                            func (ctx *Ctx) BorderColor(hTexRef TexRef) (pBorderColor [3]float32, err error)

                                                            func (*Ctx) CUDAContext

                                                            func (ctx *Ctx) CUDAContext() CUContext

                                                              CUDAContext returns the CUDA Context

                                                              func (*Ctx) CanAccessPeer

                                                              func (ctx *Ctx) CanAccessPeer(dev Device, peerDev Device) (canAccessPeer int, err error)

                                                              func (*Ctx) Close

                                                              func (ctx *Ctx) Close() error

                                                                Close destroys the CUDA context and the associated resources that have been created. Additionally, all communication channels will be closed.

                                                                func (*Ctx) CurrentCacheConfig

                                                                func (ctx *Ctx) CurrentCacheConfig() (pconfig FuncCacheConfig, err error)

                                                                func (*Ctx) CurrentDevice

                                                                func (ctx *Ctx) CurrentDevice() (device Device, err error)

                                                                func (*Ctx) CurrentFlags

                                                                func (ctx *Ctx) CurrentFlags() (flags ContextFlags, err error)

                                                                func (*Ctx) Descriptor

                                                                func (ctx *Ctx) Descriptor(hArray Array) (pArrayDescriptor ArrayDesc, err error)

                                                                func (*Ctx) Descriptor3

                                                                func (ctx *Ctx) Descriptor3(hArray Array) (pArrayDescriptor Array3Desc, err error)

                                                                func (*Ctx) DestroyArray

                                                                func (ctx *Ctx) DestroyArray(hArray Array)

                                                                func (*Ctx) DestroyEvent

                                                                func (ctx *Ctx) DestroyEvent(event *Event)

                                                                func (*Ctx) DestroyStream

                                                                func (ctx *Ctx) DestroyStream(hStream *Stream)

                                                                func (*Ctx) DisablePeerAccess

                                                                func (ctx *Ctx) DisablePeerAccess(peerContext CUContext)

                                                                func (*Ctx) Do

                                                                func (ctx *Ctx) Do(fn func() error) error

                                                                  Do does one function at a time.

                                                                  func (*Ctx) Elapsed

                                                                  func (ctx *Ctx) Elapsed(hStart Event, hEnd Event) (pMilliseconds float64, err error)

                                                                  func (*Ctx) EnablePeerAccess

                                                                  func (ctx *Ctx) EnablePeerAccess(peerContext CUContext, Flags uint)

                                                                  func (*Ctx) ErrChan

                                                                  func (ctx *Ctx) ErrChan() chan error

                                                                    ErrChan returns the internal error channel used

                                                                    func (*Ctx) Error

                                                                    func (ctx *Ctx) Error() error

                                                                      Error returns the errors that may have occurred during the calls.

                                                                      func (*Ctx) FilterMode

                                                                      func (ctx *Ctx) FilterMode(hTexRef TexRef) (pfm FilterMode, err error)

                                                                      func (*Ctx) Format

                                                                      func (ctx *Ctx) Format(hTexRef TexRef) (pFormat Format, pNumChannels int, err error)

                                                                      func (*Ctx) FunctionAttribute

                                                                      func (ctx *Ctx) FunctionAttribute(fn Function, attrib FunctionAttribute) (pi int, err error)

                                                                      func (*Ctx) GetArray

                                                                      func (ctx *Ctx) GetArray(hSurfRef SurfRef) (phArray Array, err error)

                                                                      func (*Ctx) LaunchKernel

                                                                      func (ctx *Ctx) LaunchKernel(fn Function, gridDimX, gridDimY, gridDimZ int, blockDimX, blockDimY, blockDimZ int, sharedMemBytes int, stream Stream, kernelParams []unsafe.Pointer)

                                                                      func (*Ctx) Limits

                                                                      func (ctx *Ctx) Limits(limit Limit) (pvalue int64, err error)

                                                                      func (*Ctx) Load

                                                                      func (ctx *Ctx) Load(name string) (m Module, err error)

                                                                      func (*Ctx) MakeEvent

                                                                      func (ctx *Ctx) MakeEvent(flags EventFlags) (event Event, err error)

                                                                      func (*Ctx) MakeStream

                                                                      func (ctx *Ctx) MakeStream(flags StreamFlags) (stream Stream, err error)

                                                                      func (*Ctx) MakeStreamWithPriority

                                                                      func (ctx *Ctx) MakeStreamWithPriority(priority int, flags StreamFlags) (Stream, error)

                                                                      func (*Ctx) MaxAnisotropy

                                                                      func (ctx *Ctx) MaxAnisotropy(hTexRef TexRef) (pmaxAniso int, err error)

                                                                      func (*Ctx) MemAlloc

                                                                      func (ctx *Ctx) MemAlloc(bytesize int64) (dptr DevicePtr, err error)

                                                                      func (*Ctx) MemAllocManaged

                                                                      func (ctx *Ctx) MemAllocManaged(bytesize int64, flags MemAttachFlags) (dptr DevicePtr, err error)

                                                                      func (*Ctx) MemAllocPitch

                                                                      func (ctx *Ctx) MemAllocPitch(WidthInBytes int64, Height int64, ElementSizeBytes uint) (dptr DevicePtr, pPitch int64, err error)

                                                                      func (*Ctx) MemFree

                                                                      func (ctx *Ctx) MemFree(dptr DevicePtr)

                                                                      func (*Ctx) MemFreeHost

                                                                      func (ctx *Ctx) MemFreeHost(p unsafe.Pointer)

                                                                      func (*Ctx) MemInfo

                                                                      func (ctx *Ctx) MemInfo() (free int64, total int64, err error)

                                                                      func (*Ctx) Memcpy

                                                                      func (ctx *Ctx) Memcpy(dst DevicePtr, src DevicePtr, ByteCount int64)

                                                                      func (*Ctx) Memcpy2D

                                                                      func (ctx *Ctx) Memcpy2D(pCopy Memcpy2dParam)

                                                                      func (*Ctx) Memcpy2DAsync

                                                                      func (ctx *Ctx) Memcpy2DAsync(pCopy Memcpy2dParam, hStream Stream)

                                                                      func (*Ctx) Memcpy2DUnaligned

                                                                      func (ctx *Ctx) Memcpy2DUnaligned(pCopy Memcpy2dParam)

                                                                      func (*Ctx) Memcpy3D

                                                                      func (ctx *Ctx) Memcpy3D(pCopy Memcpy3dParam)

                                                                      func (*Ctx) Memcpy3DAsync

                                                                      func (ctx *Ctx) Memcpy3DAsync(pCopy Memcpy3dParam, hStream Stream)

                                                                      func (*Ctx) Memcpy3DPeer

                                                                      func (ctx *Ctx) Memcpy3DPeer(pCopy Memcpy3dPeerParam)

                                                                      func (*Ctx) Memcpy3DPeerAsync

                                                                      func (ctx *Ctx) Memcpy3DPeerAsync(pCopy Memcpy3dPeerParam, hStream Stream)

                                                                      func (*Ctx) MemcpyAsync

                                                                      func (ctx *Ctx) MemcpyAsync(dst DevicePtr, src DevicePtr, ByteCount int64, hStream Stream)

                                                                      func (*Ctx) MemcpyAtoA

                                                                      func (ctx *Ctx) MemcpyAtoA(dstArray Array, dstOffset int64, srcArray Array, srcOffset int64, ByteCount int64)

                                                                      func (*Ctx) MemcpyAtoD

                                                                      func (ctx *Ctx) MemcpyAtoD(dstDevice DevicePtr, srcArray Array, srcOffset int64, ByteCount int64)

                                                                      func (*Ctx) MemcpyAtoH

                                                                      func (ctx *Ctx) MemcpyAtoH(dstHost unsafe.Pointer, srcArray Array, srcOffset int64, ByteCount int64)

                                                                      func (*Ctx) MemcpyAtoHAsync

                                                                      func (ctx *Ctx) MemcpyAtoHAsync(dstHost unsafe.Pointer, srcArray Array, srcOffset int64, ByteCount int64, hStream Stream)

                                                                      func (*Ctx) MemcpyDtoA

                                                                      func (ctx *Ctx) MemcpyDtoA(dstArray Array, dstOffset int64, srcDevice DevicePtr, ByteCount int64)

                                                                      func (*Ctx) MemcpyDtoD

                                                                      func (ctx *Ctx) MemcpyDtoD(dstDevice DevicePtr, srcDevice DevicePtr, ByteCount int64)

                                                                      func (*Ctx) MemcpyDtoDAsync

                                                                      func (ctx *Ctx) MemcpyDtoDAsync(dstDevice DevicePtr, srcDevice DevicePtr, ByteCount int64, hStream Stream)

                                                                      func (*Ctx) MemcpyDtoH

                                                                      func (ctx *Ctx) MemcpyDtoH(dstHost unsafe.Pointer, srcDevice DevicePtr, ByteCount int64)

                                                                      func (*Ctx) MemcpyDtoHAsync

                                                                      func (ctx *Ctx) MemcpyDtoHAsync(dstHost unsafe.Pointer, srcDevice DevicePtr, ByteCount int64, hStream Stream)

                                                                      func (*Ctx) MemcpyHtoA

                                                                      func (ctx *Ctx) MemcpyHtoA(dstArray Array, dstOffset int64, srcHost unsafe.Pointer, ByteCount int64)

                                                                      func (*Ctx) MemcpyHtoAAsync

                                                                      func (ctx *Ctx) MemcpyHtoAAsync(dstArray Array, dstOffset int64, srcHost unsafe.Pointer, ByteCount int64, hStream Stream)

                                                                      func (*Ctx) MemcpyHtoD

                                                                      func (ctx *Ctx) MemcpyHtoD(dstDevice DevicePtr, srcHost unsafe.Pointer, ByteCount int64)

                                                                      func (*Ctx) MemcpyHtoDAsync

                                                                      func (ctx *Ctx) MemcpyHtoDAsync(dstDevice DevicePtr, srcHost unsafe.Pointer, ByteCount int64, hStream Stream)

                                                                      func (*Ctx) MemcpyPeer

                                                                      func (ctx *Ctx) MemcpyPeer(dstDevice DevicePtr, dstContext CUContext, srcDevice DevicePtr, srcContext CUContext, ByteCount int64)

                                                                      func (*Ctx) MemcpyPeerAsync

                                                                      func (ctx *Ctx) MemcpyPeerAsync(dstDevice DevicePtr, dstContext CUContext, srcDevice DevicePtr, srcContext CUContext, ByteCount int64, hStream Stream)

                                                                      func (*Ctx) MemsetD16

                                                                      func (ctx *Ctx) MemsetD16(dstDevice DevicePtr, us uint16, N int64)

                                                                      func (*Ctx) MemsetD16Async

                                                                      func (ctx *Ctx) MemsetD16Async(dstDevice DevicePtr, us uint16, N int64, hStream Stream)

                                                                      func (*Ctx) MemsetD2D16

                                                                      func (ctx *Ctx) MemsetD2D16(dstDevice DevicePtr, dstPitch int64, us uint16, Width int64, Height int64)

                                                                      func (*Ctx) MemsetD2D16Async

                                                                      func (ctx *Ctx) MemsetD2D16Async(dstDevice DevicePtr, dstPitch int64, us uint16, Width int64, Height int64, hStream Stream)

                                                                      func (*Ctx) MemsetD2D32

                                                                      func (ctx *Ctx) MemsetD2D32(dstDevice DevicePtr, dstPitch int64, ui uint, Width int64, Height int64)

                                                                      func (*Ctx) MemsetD2D32Async

                                                                      func (ctx *Ctx) MemsetD2D32Async(dstDevice DevicePtr, dstPitch int64, ui uint, Width int64, Height int64, hStream Stream)

                                                                      func (*Ctx) MemsetD2D8

                                                                      func (ctx *Ctx) MemsetD2D8(dstDevice DevicePtr, dstPitch int64, uc byte, Width int64, Height int64)

                                                                      func (*Ctx) MemsetD2D8Async

                                                                      func (ctx *Ctx) MemsetD2D8Async(dstDevice DevicePtr, dstPitch int64, uc byte, Width int64, Height int64, hStream Stream)

                                                                      func (*Ctx) MemsetD32

                                                                      func (ctx *Ctx) MemsetD32(dstDevice DevicePtr, ui uint, N int64)

                                                                      func (*Ctx) MemsetD32Async

                                                                      func (ctx *Ctx) MemsetD32Async(dstDevice DevicePtr, ui uint, N int64, hStream Stream)

                                                                      func (*Ctx) MemsetD8

                                                                      func (ctx *Ctx) MemsetD8(dstDevice DevicePtr, uc byte, N int64)

                                                                      func (*Ctx) MemsetD8Async

                                                                      func (ctx *Ctx) MemsetD8Async(dstDevice DevicePtr, uc byte, N int64, hStream Stream)

                                                                      func (*Ctx) ModuleFunction

                                                                      func (ctx *Ctx) ModuleFunction(m Module, name string) (function Function, err error)

                                                                      func (*Ctx) ModuleGlobal

                                                                      func (ctx *Ctx) ModuleGlobal(m Module, name string) (dptr DevicePtr, size int64, err error)

                                                                      func (*Ctx) ModuleSurfRef

                                                                      func (ctx *Ctx) ModuleSurfRef(mod Module, name string) (SurfRef, error)

                                                                      func (*Ctx) ModuleTexRef

                                                                      func (ctx *Ctx) ModuleTexRef(mod Module, name string) (TexRef, error)

                                                                      func (*Ctx) Priority

                                                                      func (ctx *Ctx) Priority(hStream Stream) (priority int, err error)

                                                                      func (*Ctx) QueryEvent

                                                                      func (ctx *Ctx) QueryEvent(hEvent Event)

                                                                      func (*Ctx) QueryStream

                                                                      func (ctx *Ctx) QueryStream(hStream Stream)

                                                                      func (*Ctx) Record

                                                                      func (ctx *Ctx) Record(hEvent Event, hStream Stream)

                                                                      func (*Ctx) Run

                                                                      func (ctx *Ctx) Run(errChan chan error) error

                                                                        Run locks the goroutine to the OS thread and ties the CUDA context to the OS thread. For most cases, this suffices.

                                                                        Note: the errChan that is passed in should NOT be the same error channel as the one used internally for signalling. An error channel is passed in to support two different run modes.

                                                                        The typical use runs the context in its own goroutine:

                                                                        func A() {
                                                                        	ctx := NewContext(d, SchedAuto)
                                                                        	errChan := make(chan error)
                                                                        	go ctx.Run(errChan)
                                                                        	if err := <-errChan; err != nil {
                                                                        		// handle error
                                                                        	}
                                                                        	doSomethingWithCtx(ctx)
                                                                        }
                                                                        

                                                                        The other supported run mode runs the context in the main thread:

                                                                        func main() {
                                                                        	ctx := NewContext(d, SchedAuto)
                                                                        	go doSomethingWithCtx(ctx)
                                                                        	if err := ctx.Run(nil); err != nil {
                                                                        		// handle error
                                                                        	}
                                                                        }
                                                                        

                                                                        func (*Ctx) SetAddress

                                                                        func (ctx *Ctx) SetAddress(hTexRef TexRef, dptr DevicePtr, bytes int64) (ByteOffset int64, err error)

                                                                        func (*Ctx) SetAddress2D

                                                                        func (ctx *Ctx) SetAddress2D(hTexRef TexRef, desc ArrayDesc, dptr DevicePtr, Pitch int64)

                                                                        func (*Ctx) SetAddressMode

                                                                        func (ctx *Ctx) SetAddressMode(hTexRef TexRef, dim int, am AddressMode)

                                                                        func (*Ctx) SetBorderColor

                                                                        func (ctx *Ctx) SetBorderColor(hTexRef TexRef, pBorderColor [3]float32)

                                                                        func (*Ctx) SetCacheConfig

                                                                        func (ctx *Ctx) SetCacheConfig(fn Function, config FuncCacheConfig)

                                                                        func (*Ctx) SetCurrentCacheConfig

                                                                        func (ctx *Ctx) SetCurrentCacheConfig(config FuncCacheConfig)

                                                                        func (*Ctx) SetFilterMode

                                                                        func (ctx *Ctx) SetFilterMode(hTexRef TexRef, fm FilterMode)

                                                                        func (*Ctx) SetFormat

                                                                        func (ctx *Ctx) SetFormat(hTexRef TexRef, fmt Format, NumPackedComponents int)

                                                                        func (*Ctx) SetFunctionSharedMemConfig

                                                                        func (ctx *Ctx) SetFunctionSharedMemConfig(fn Function, config SharedConfig)

                                                                        func (*Ctx) SetLimit

                                                                        func (ctx *Ctx) SetLimit(limit Limit, value int64)

                                                                        func (*Ctx) SetMaxAnisotropy

                                                                        func (ctx *Ctx) SetMaxAnisotropy(hTexRef TexRef, maxAniso uint)

                                                                        func (*Ctx) SetMipmapFilterMode

                                                                        func (ctx *Ctx) SetMipmapFilterMode(hTexRef TexRef, fm FilterMode)

                                                                        func (*Ctx) SetMipmapLevelBias

                                                                        func (ctx *Ctx) SetMipmapLevelBias(hTexRef TexRef, bias float64)

                                                                        func (*Ctx) SetMipmapLevelClamp

                                                                        func (ctx *Ctx) SetMipmapLevelClamp(hTexRef TexRef, minMipmapLevelClamp float64, maxMipmapLevelClamp float64)

                                                                        func (*Ctx) SetSharedMemConfig

                                                                        func (ctx *Ctx) SetSharedMemConfig(config SharedConfig)

                                                                        func (*Ctx) SetTexRefFlags

                                                                        func (ctx *Ctx) SetTexRefFlags(hTexRef TexRef, Flags TexRefFlags)

                                                                        func (*Ctx) SharedMemConfig

                                                                        func (ctx *Ctx) SharedMemConfig() (pConfig SharedConfig, err error)

                                                                        func (*Ctx) StreamFlags

                                                                        func (ctx *Ctx) StreamFlags(hStream Stream) (flags uint, err error)

                                                                        func (*Ctx) StreamPriorityRange

                                                                        func (ctx *Ctx) StreamPriorityRange() (leastPriority int, greatestPriority int, err error)

                                                                        func (*Ctx) SurfRefSetArray

                                                                        func (ctx *Ctx) SurfRefSetArray(hSurfRef SurfRef, hArray Array, Flags uint)

                                                                        func (*Ctx) Synchronize

                                                                        func (ctx *Ctx) Synchronize()

                                                                        func (*Ctx) SynchronizeEvent

                                                                        func (ctx *Ctx) SynchronizeEvent(hEvent Event)

                                                                        func (*Ctx) SynchronizeStream

                                                                        func (ctx *Ctx) SynchronizeStream(hStream Stream)

                                                                        func (*Ctx) TexRefFlags

                                                                        func (ctx *Ctx) TexRefFlags(hTexRef TexRef) (pFlags uint, err error)

                                                                        func (*Ctx) TexRefSetArray

                                                                        func (ctx *Ctx) TexRefSetArray(hTexRef TexRef, hArray Array, Flags uint)

                                                                        func (*Ctx) Unload

                                                                        func (ctx *Ctx) Unload(hmod Module)

                                                                        func (*Ctx) Wait

                                                                        func (ctx *Ctx) Wait(hStream Stream, hEvent Event, Flags uint)

                                                                        func (*Ctx) WaitOnValue32

                                                                        func (ctx *Ctx) WaitOnValue32(stream Stream, addr DevicePtr, value uint32, flags uint)

                                                                        func (*Ctx) Work

                                                                        func (ctx *Ctx) Work() <-chan func() error

                                                                          Work returns the channel where work will be passed in. In most cases you don't need this. Use Run instead.

                                                                          func (*Ctx) WriteValue32

                                                                          func (ctx *Ctx) WriteValue32(stream Stream, addr DevicePtr, value uint32, flags uint)

                                                                          type Device

                                                                          type Device int

                                                                            Device is the representation of a CUDA device.

                                                                            const (
                                                                            	CPU       Device = -1
                                                                            	BadDevice Device = -2
                                                                            )

                                                                            func CurrentDevice

                                                                            func CurrentDevice() (device Device, err error)

                                                                            func GetDevice

                                                                            func GetDevice(ordinal int) (device Device, err error)

                                                                            func (Device) Attribute

                                                                            func (dev Device) Attribute(attrib DeviceAttribute) (pi int, err error)

                                                                            func (Device) Attributes

                                                                            func (dev Device) Attributes(attrs ...DeviceAttribute) ([]int, error)

                                                                              Attributes gets the values of all the attributes provided.

                                                                              func (Device) CanAccessPeer

                                                                              func (dev Device) CanAccessPeer(peerDev Device) (canAccessPeer int, err error)

                                                                              func (Device) ComputeCapability

                                                                              func (d Device) ComputeCapability() (major, minor int, err error)

                                                                                ComputeCapability returns the compute capability of the device. This method is a convenience method for the deprecated API call cuDeviceComputeCapability.

                                                                                func (Device) IsGPU

                                                                                func (d Device) IsGPU() bool

                                                                                  IsGPU returns true if the device is a GPU.

                                                                                  func (Device) MakeContext

                                                                                  func (d Device) MakeContext(flags ContextFlags) (CUContext, error)

                                                                                  func (Device) Name

                                                                                  func (d Device) Name() (string, error)

                                                                                    Name returns the name of the device.

                                                                                    Wrapper over cuDeviceGetName: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1gef75aa30df95446a845f2a7b9fffbb7f

                                                                                    func (Device) P2PAttribute

                                                                                    func (srcDevice Device) P2PAttribute(attrib P2PAttribute, dstDevice Device) (value int, err error)

                                                                                    func (Device) PrimaryCtxState

                                                                                    func (dev Device) PrimaryCtxState() (flags ContextFlags, active int, err error)

                                                                                    func (Device) ReleasePrimaryCtx

                                                                                    func (dev Device) ReleasePrimaryCtx() (err error)

                                                                                    func (Device) ResetPrimaryCtx

                                                                                    func (dev Device) ResetPrimaryCtx() (err error)

                                                                                    func (Device) RetainPrimaryCtx

                                                                                    func (d Device) RetainPrimaryCtx() (primaryContext CUContext, err error)

                                                                                      RetainPrimaryCtx retains the primary context on the GPU, creating it if necessary, increasing its usage count.

                                                                                      The caller must call d.ReleasePrimaryCtx() when done using the context. Unlike MakeContext(), the newly created context is not pushed onto the stack.

                                                                                      Context creation will fail with error `UnknownError` if the compute mode of the device is CU_COMPUTEMODE_PROHIBITED. The function cuDeviceGetAttribute() can be used with CU_DEVICE_ATTRIBUTE_COMPUTE_MODE to determine the compute mode of the device. The nvidia-smi tool can be used to set the compute mode for devices. Documentation for nvidia-smi can be obtained by passing a -h option to it. Please note that the primary context always supports pinned allocations. Other flags can be specified by cuDevicePrimaryCtxSetFlags().

                                                                                      func (Device) SetPrimaryCtxFlags

                                                                                      func (dev Device) SetPrimaryCtxFlags(flags ContextFlags) (err error)

                                                                                      func (Device) String

                                                                                      func (d Device) String() string

                                                                                        String implements fmt.Stringer (and runtime.stringer).

                                                                                        func (Device) TotalMem

                                                                                        func (dev Device) TotalMem() (bytes int64, err error)

                                                                                        type DeviceAttribute

                                                                                        type DeviceAttribute int

                                                                                          DeviceAttribute represents the device attributes that the user can query CUDA for.

                                                                                          const (
                                                                                          	MaxThreadsPerBlock                 DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK                   // Maximum number of threads per block
                                                                                          	MaxBlockDimX                       DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X                         // Maximum block dimension X
                                                                                          	MaxBlockDimY                       DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Y                         // Maximum block dimension Y
                                                                                          	MaxBlockDimZ                       DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Z                         // Maximum block dimension Z
                                                                                          	MaxGridDimX                        DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X                          // Maximum grid dimension X
                                                                                          	MaxGridDimY                        DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Y                          // Maximum grid dimension Y
                                                                                          	MaxGridDimZ                        DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Z                          // Maximum grid dimension Z
                                                                                          	MaxSharedMemoryPerBlock            DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK             // Maximum shared memory available per block in bytes
                                                                                          	SharedMemoryPerBlock               DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_SHARED_MEMORY_PER_BLOCK                 // Deprecated, use CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK
                                                                                          	TotalConstantMemory                DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_TOTAL_CONSTANT_MEMORY                   // Memory available on device for __constant__ variables in a CUDA C kernel in bytes
                                                                                          	WarpSize                           DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_WARP_SIZE                               // Warp size in threads
                                                                                          	MaxPitch                           DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAX_PITCH                               // Maximum pitch in bytes allowed by memory copies
                                                                                          	MaxRegistersPerBlock               DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_BLOCK                 // Maximum number of 32-bit registers available per block
                                                                                          	RegistersPerBlock                  DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_REGISTERS_PER_BLOCK                     // Deprecated, use CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_BLOCK
                                                                                          	ClockRate                          DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_CLOCK_RATE                              // Typical clock frequency in kilohertz
                                                                                          	TextureAlignment                   DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT                       // Alignment requirement for textures
                                                                                          	GpuOverlap                         DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_GPU_OVERLAP                             // Device can possibly copy memory and execute a kernel concurrently. Deprecated. Use instead CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT.
                                                                                          	MultiprocessorCount                DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT                    // Number of multiprocessors on device
                                                                                          	KernelExecTimeout                  DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT                     // Specifies whether there is a run time limit on kernels
                                                                                          	Integrated                         DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_INTEGRATED                              // Device is integrated with host memory
                                                                                          	CanMapHostMemory                   DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY                     // Device can map host memory into CUDA address space
                                                                                          	ComputeMode                        DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_COMPUTE_MODE                            // Compute mode (See CUcomputemode for details)
                                                                                          	MaximumTexture1dWidth              DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_WIDTH                 // Maximum 1D texture width
                                                                                          	MaximumTexture2dWidth              DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_WIDTH                 // Maximum 2D texture width
                                                                                          	MaximumTexture2dHeight             DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_HEIGHT                // Maximum 2D texture height
                                                                                          	MaximumTexture3dWidth              DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_WIDTH                 // Maximum 3D texture width
                                                                                          	MaximumTexture3dHeight             DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_HEIGHT                // Maximum 3D texture height
                                                                                          	MaximumTexture3dDepth              DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_DEPTH                 // Maximum 3D texture depth
                                                                                          	MaximumTexture2dLayeredWidth       DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LAYERED_WIDTH         // Maximum 2D layered texture width
                                                                                          	MaximumTexture2dLayeredHeight      DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LAYERED_HEIGHT        // Maximum 2D layered texture height
                                                                                          	MaximumTexture2dLayeredLayers      DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LAYERED_LAYERS        // Maximum layers in a 2D layered texture
                                                                                          	MaximumTexture2dArrayWidth         DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_WIDTH           // Deprecated, use CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LAYERED_WIDTH
                                                                                          	MaximumTexture2dArrayHeight        DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_HEIGHT          // Deprecated, use CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LAYERED_HEIGHT
                                                                                          	MaximumTexture2dArrayNumslices     DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_NUMSLICES       // Deprecated, use CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LAYERED_LAYERS
                                                                                          	SurfaceAlignment                   DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_SURFACE_ALIGNMENT                       // Alignment requirement for surfaces
                                                                                          	ConcurrentKernels                  DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS                      // Device can possibly execute multiple kernels concurrently
                                                                                          	EccEnabled                         DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_ECC_ENABLED                             // Device has ECC support enabled
                                                                                          	PciBusID                           DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_PCI_BUS_ID                              // PCI bus ID of the device
                                                                                          	PciDeviceID                        DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_PCI_DEVICE_ID                           // PCI device ID of the device
                                                                                          	TccDriver                          DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_TCC_DRIVER                              // Device is using TCC driver model
                                                                                          	MemoryClockRate                    DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MEMORY_CLOCK_RATE                       // Peak memory clock frequency in kilohertz
                                                                                          	GlobalMemoryBusWidth               DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_GLOBAL_MEMORY_BUS_WIDTH                 // Global memory bus width in bits
                                                                                          	L2CacheSize                        DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_L2_CACHE_SIZE                           // Size of L2 cache in bytes
                                                                                          	MaxThreadsPerMultiprocessor        DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR          // Maximum resident threads per multiprocessor
                                                                                          	AsyncEngineCount                   DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT                      // Number of asynchronous engines
                                                                                          	UnifiedAddressing                  DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING                      // Device shares a unified address space with the host
                                                                                          	MaximumTexture1dLayeredWidth       DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_LAYERED_WIDTH         // Maximum 1D layered texture width
                                                                                          	MaximumTexture1dLayeredLayers      DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_LAYERED_LAYERS        // Maximum layers in a 1D layered texture
                                                                                          	CanTex2dGather                     DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_CAN_TEX2D_GATHER                        // Deprecated, do not use.
                                                                                          	MaximumTexture2dGatherWidth        DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_GATHER_WIDTH          // Maximum 2D texture width if CUDA_ARRAY3D_TEXTURE_GATHER is set
                                                                                          	MaximumTexture2dGatherHeight       DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_GATHER_HEIGHT         // Maximum 2D texture height if CUDA_ARRAY3D_TEXTURE_GATHER is set
                                                                                          	MaximumTexture3dWidthAlternate     DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_WIDTH_ALTERNATE       // Alternate maximum 3D texture width
                                                                                          	MaximumTexture3dHeightAlternate    DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_HEIGHT_ALTERNATE      // Alternate maximum 3D texture height
                                                                                          	MaximumTexture3dDepthAlternate     DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_DEPTH_ALTERNATE       // Alternate maximum 3D texture depth
                                                                                          	PciDomainID                        DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_PCI_DOMAIN_ID                           // PCI domain ID of the device
                                                                                          	TexturePitchAlignment              DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_TEXTURE_PITCH_ALIGNMENT                 // Pitch alignment requirement for textures
                                                                                          	MaximumTexturecubemapWidth         DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURECUBEMAP_WIDTH            // Maximum cubemap texture width/height
                                                                                          	MaximumTexturecubemapLayeredWidth  DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURECUBEMAP_LAYERED_WIDTH    // Maximum cubemap layered texture width/height
                                                                                          	MaximumTexturecubemapLayeredLayers DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURECUBEMAP_LAYERED_LAYERS   // Maximum layers in a cubemap layered texture
                                                                                          	MaximumSurface1dWidth              DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE1D_WIDTH                 // Maximum 1D surface width
                                                                                          	MaximumSurface2dWidth              DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE2D_WIDTH                 // Maximum 2D surface width
                                                                                          	MaximumSurface2dHeight             DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE2D_HEIGHT                // Maximum 2D surface height
                                                                                          	MaximumSurface3dWidth              DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE3D_WIDTH                 // Maximum 3D surface width
                                                                                          	MaximumSurface3dHeight             DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE3D_HEIGHT                // Maximum 3D surface height
                                                                                          	MaximumSurface3dDepth              DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE3D_DEPTH                 // Maximum 3D surface depth
                                                                                          	MaximumSurface1dLayeredWidth       DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE1D_LAYERED_WIDTH         // Maximum 1D layered surface width
                                                                                          	MaximumSurface1dLayeredLayers      DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE1D_LAYERED_LAYERS        // Maximum layers in a 1D layered surface
                                                                                          	MaximumSurface2dLayeredWidth       DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE2D_LAYERED_WIDTH         // Maximum 2D layered surface width
                                                                                          	MaximumSurface2dLayeredHeight      DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE2D_LAYERED_HEIGHT        // Maximum 2D layered surface height
                                                                                          	MaximumSurface2dLayeredLayers      DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE2D_LAYERED_LAYERS        // Maximum layers in a 2D layered surface
                                                                                          	MaximumSurfacecubemapWidth         DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACECUBEMAP_WIDTH            // Maximum cubemap surface width
                                                                                          	MaximumSurfacecubemapLayeredWidth  DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACECUBEMAP_LAYERED_WIDTH    // Maximum cubemap layered surface width
                                                                                          	MaximumSurfacecubemapLayeredLayers DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACECUBEMAP_LAYERED_LAYERS   // Maximum layers in a cubemap layered surface
                                                                                          	MaximumTexture1dLinearWidth        DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_LINEAR_WIDTH          // Maximum 1D linear texture width
                                                                                          	MaximumTexture2dLinearWidth        DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LINEAR_WIDTH          // Maximum 2D linear texture width
                                                                                          	MaximumTexture2dLinearHeight       DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LINEAR_HEIGHT         // Maximum 2D linear texture height
                                                                                          	MaximumTexture2dLinearPitch        DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LINEAR_PITCH          // Maximum 2D linear texture pitch in bytes
                                                                                          	MaximumTexture2dMipmappedWidth     DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_MIPMAPPED_WIDTH       // Maximum mipmapped 2D texture width
                                                                                          	MaximumTexture2dMipmappedHeight    DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_MIPMAPPED_HEIGHT      // Maximum mipmapped 2D texture height
                                                                                          	ComputeCapabilityMajor             DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR                // Major compute capability version number
                                                                                          	ComputeCapabilityMinor             DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR                // Minor compute capability version number
                                                                                          	MaximumTexture1dMipmappedWidth     DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_MIPMAPPED_WIDTH       // Maximum mipmapped 1D texture width
                                                                                          	StreamPrioritiesSupported          DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_STREAM_PRIORITIES_SUPPORTED             // Device supports stream priorities
                                                                                          	GlobalL1CacheSupported             DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_GLOBAL_L1_CACHE_SUPPORTED               // Device supports caching globals in L1
                                                                                          	LocalL1CacheSupported              DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_LOCAL_L1_CACHE_SUPPORTED                // Device supports caching locals in L1
                                                                                          	MaxSharedMemoryPerMultiprocessor   DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_MULTIPROCESSOR    // Maximum shared memory available per multiprocessor in bytes
                                                                                          	MaxRegistersPerMultiprocessor      DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR        // Maximum number of 32-bit registers available per multiprocessor
                                                                                          	ManagedMemory                      DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MANAGED_MEMORY                          // Device can allocate managed memory on this system
                                                                                          	MultiGpuBoard                      DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MULTI_GPU_BOARD                         // Device is on a multi-GPU board
                                                                                          	MultiGpuBoardGroupID               DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_MULTI_GPU_BOARD_GROUP_ID                // Unique id for a group of devices on the same multi-GPU board
                                                                                          	HostNativeAtomicSupported          DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_HOST_NATIVE_ATOMIC_SUPPORTED            // Link between the device and the host supports native atomic operations (this is a placeholder attribute, and is not supported on any current hardware)
                                                                                          	SingleToDoublePrecisionPerfRatio   DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_SINGLE_TO_DOUBLE_PRECISION_PERF_RATIO   // Ratio of single precision performance (in floating-point operations per second) to double precision performance
                                                                                          	PageableMemoryAccess               DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS                  // Device supports coherently accessing pageable memory without calling cudaHostRegister on it
                                                                                          	ConcurrentManagedAccess            DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS               // Device can coherently access managed memory concurrently with the CPU
                                                                                          	ComputePreemptionSupported         DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_COMPUTE_PREEMPTION_SUPPORTED            // Device supports compute preemption.
                                                                                          	CanUseHostPointerForRegisteredMem  DeviceAttribute = C.CU_DEVICE_ATTRIBUTE_CAN_USE_HOST_POINTER_FOR_REGISTERED_MEM // Device can access host registered memory at the same virtual address as the CPU
                                                                                          
                                                                                          )
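As a rough illustration of how a few of these attributes might be used together, the sketch below checks a proposed thread-block shape against the limits described by MaxThreadsPerBlock, MaxBlockDimX, MaxBlockDimY and MaxBlockDimZ. The blockLimits type, the checkBlock helper and the numeric limits are all hypothetical; on real hardware the values would come from querying the device rather than being hard-coded.

```go
package main

import "fmt"

// blockLimits is a hypothetical container for values that would be read from
// the MaxThreadsPerBlock and MaxBlockDimX/Y/Z device attributes.
type blockLimits struct {
	MaxThreads       int
	MaxX, MaxY, MaxZ int
}

// checkBlock returns an error if a block of x*y*z threads exceeds any limit.
func checkBlock(l blockLimits, x, y, z int) error {
	if x < 1 || y < 1 || z < 1 {
		return fmt.Errorf("block dimensions must be positive, got (%d, %d, %d)", x, y, z)
	}
	if x > l.MaxX || y > l.MaxY || z > l.MaxZ {
		return fmt.Errorf("block (%d, %d, %d) exceeds per-dimension limits (%d, %d, %d)",
			x, y, z, l.MaxX, l.MaxY, l.MaxZ)
	}
	if total := x * y * z; total > l.MaxThreads {
		return fmt.Errorf("block has %d threads, limit is %d", total, l.MaxThreads)
	}
	return nil
}

func main() {
	// Illustrative numbers only (an assumption, not read from a device).
	limits := blockLimits{MaxThreads: 1024, MaxX: 1024, MaxY: 1024, MaxZ: 64}

	fmt.Println(checkBlock(limits, 32, 32, 1)) // fits within every limit
	fmt.Println(checkBlock(limits, 32, 32, 2)) // too many threads in total
}
```

Checking the per-dimension limits before the total keeps the product small enough that it cannot overflow for any realistic limits.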

                                                                                          type DevicePtr

                                                                                          type DevicePtr uintptr

                                                                                            DevicePtr is a pointer to device memory. It is equivalent to CUDA's CUdeviceptr.

                                                                                            func AllocAndCopy

                                                                                            func AllocAndCopy(p unsafe.Pointer, bytesize int64) (DevicePtr, error)

                                                                                              AllocAndCopy abstracts away the common pattern of allocating device memory and then copying a Go slice to the GPU.
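Since AllocAndCopy takes an unsafe.Pointer and a size in bytes rather than a slice, the caller has to derive both from the slice. A minimal sketch of that step, assuming a []float32 input, is shown below; sliceArgs is a hypothetical helper and is not part of this package. The actual call is left as a comment because it needs a CUDA device and a current context.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sliceArgs is a hypothetical helper: it returns a pointer to the first
// element of s and the total size of its elements in bytes.
func sliceArgs(s []float32) (unsafe.Pointer, int64) {
	if len(s) == 0 {
		// Taking &s[0] on an empty slice would panic.
		return nil, 0
	}
	return unsafe.Pointer(&s[0]), int64(len(s)) * int64(unsafe.Sizeof(s[0]))
}

func main() {
	data := []float32{1, 2, 3, 4}
	p, n := sliceArgs(data)
	fmt.Println(p != nil, n)

	// With a CUDA context current, the pair could then be passed on:
	//   dptr, err := cu.AllocAndCopy(p, n)
}
```

The slice must stay alive and unmodified until the copy has completed, since only the raw pointer is handed over.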

                                                                                              func MemAlloc

                                                                                              func MemAlloc(bytesize int64) (dptr DevicePtr, err error)

                                                                                              func MemAllocManaged

                                                                                              func MemAllocManaged(bytesize int64, flags MemAttachFlags) (dptr DevicePtr, err error)

                                                                                              func MemAllocPitch

                                                                                              func MemAllocPitch(WidthInBytes int64, Height int64, ElementSizeBytes uint) (dptr DevicePtr, pPitch int64, err error)

                                                                                              func (DevicePtr) AddressRange

                                                                                              func (d DevicePtr) AddressRange() (size int64, base DevicePtr, err error)

                                                                                              func (DevicePtr) IsCUDAMemory

                                                                                              func (d DevicePtr) IsCUDAMemory() bool

                                                                                                IsCUDAMemory returns true.

                                                                                                func (DevicePtr) MemAdvise

                                                                                                func (d DevicePtr) MemAdvise(count int64, advice MemAdvice, dev Device) error

                                                                                                  MemAdvise advises the Unified Memory subsystem about the usage pattern for the memory range starting at d with a size of count bytes. The start address and end address of the memory range will be rounded down and rounded up respectively to be aligned to CPU page size before the advice is applied. The memory range must refer to managed memory allocated via `MemAllocManaged` or declared via __managed__ variables.
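The rounding described above can be sketched with plain integer arithmetic. The example assumes a 4096-byte CPU page (the real page size depends on the platform) and uses a hypothetical helper, alignRange, to show which range the advice would effectively cover.

```go
package main

import "fmt"

// alignRange is a hypothetical helper that widens the byte range
// [start, start+count) to page boundaries: the start is rounded down and the
// end is rounded up. pageSize must be a power of two.
func alignRange(start, count, pageSize uint64) (alignedStart, alignedEnd uint64) {
	mask := pageSize - 1
	alignedStart = start &^ mask               // clear the low bits: round down
	alignedEnd = (start + count + mask) &^ mask // add pageSize-1, then round down
	return alignedStart, alignedEnd
}

func main() {
	const pageSize = 4096 // assumption: typical CPU page size

	// A 100-byte range starting 16 bytes into a page.
	start, end := alignRange(0x10010, 100, pageSize)
	fmt.Printf("advice covers [%#x, %#x)\n", start, end)
}
```

One practical consequence is that advice given for a small range also affects any other data sharing the same pages.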

                                                                                                  The advice parameter can take one of the following values:

                                                                                                  - SetReadMostly:
                                                                                                  	This implies that the data is mostly going to be read from and only occasionally written to.
                                                                                                  	Any read accesses from any processor to this region will create a read-only copy of at least the accessed pages in that processor's memory.
                                                                                                  	Additionally, if cuMemPrefetchAsync is called on this region, it will create a read-only copy of the data on the destination processor.
                                                                                                  	If any processor writes to this region, all copies of the corresponding page will be invalidated except for the one where the write occurred.
                                                                                                  	The device argument is ignored for this advice.
                                                                                                  	Note that for a page to be read-duplicated, the accessing processor must either be the CPU or a GPU that has a non-zero value for the device attribute CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS.
                                                                                                  	Also, if a context is created on a device that does not have the device attribute CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS set, then read-duplication will not occur until all such contexts are destroyed.
                                                                                                  - UnsetReadMostly:
                                                                                                  	Undoes the effect of SetReadMostly and also prevents the Unified Memory driver from attempting heuristic read-duplication on the memory range.
                                                                                                  	Any read-duplicated copies of the data will be collapsed into a single copy.
                                                                                                  	The location for the collapsed copy will be the preferred location if the page has a preferred location and one of the read-duplicated copies was resident at that location.
                                                                                                  	Otherwise, the location chosen is arbitrary.
                                                                                                  - SetPreferredLocation:
                                                                                                  	This advice sets the preferred location for the data to be the memory belonging to device.
                                                                                                  	Passing in CU_DEVICE_CPU for device sets the preferred location as host memory.
                                                                                                  	If device is a GPU, then it must have a non-zero value for the device attribute CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS.
                                                                                                  	Setting the preferred location does not cause data to migrate to that location immediately.
                                                                                                  	Instead, it guides the migration policy when a fault occurs on that memory region.
                                                                                                  	If the data is already in its preferred location and the faulting processor can establish a mapping without requiring the data to be migrated, then data migration will be avoided.
                                                                                                  	On the other hand, if the data is not in its preferred location or if a direct mapping cannot be established, then it will be migrated to the processor accessing it.
                                                                                                  	It is important to note that setting the preferred location does not prevent data prefetching done using cuMemPrefetchAsync.
                                                                                                  	Having a preferred location can override the page thrash detection and resolution logic in the Unified Memory driver.
                                                                                                  	Normally, if a page is detected to be constantly thrashing between, for example, host and device memory, the page may eventually be pinned to host memory by the Unified Memory driver.
                                                                                                  	But if the preferred location is set as device memory, then the page will continue to thrash indefinitely.
                                                                                                  	If CU_MEM_ADVISE_SET_READ_MOSTLY is also set on this memory region or any subset of it, then the policies associated with that advice will override the policies of this advice.
                                                                                                  - UnsetPreferredLocation:
                                                                                                  	Undoes the effect of SetPreferredLocation and changes the preferred location to none.
                                                                                                  - SetAccessedBy:
                                                                                                  	This advice implies that the data will be accessed by device.
                                                                                                  	Passing in CU_DEVICE_CPU for device will set the advice for the CPU.
                                                                                                  	If device is a GPU, then the device attribute CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS must be non-zero.
                                                                                                  	This advice does not cause data migration and has no impact on the location of the data per se.
                                                                                                  	Instead, it causes the data to always be mapped in the specified processor's page tables, as long as the location of the data permits a mapping to be established.
                                                                                                  	If the data gets migrated for any reason, the mappings are updated accordingly.
                                                                                                  	This advice is recommended in scenarios where data locality is not important, but avoiding faults is.
                                                                                                  	Consider for example a system containing multiple GPUs with peer-to-peer access enabled, where the data located on one GPU is occasionally accessed by peer GPUs.
                                                                                                  	In such scenarios, migrating data over to the other GPUs is not as important because the accesses are infrequent and the overhead of migration may be too high.
                                                                                                  	But preventing faults can still help improve performance, and so having a mapping set up in advance is useful.
                                                                                                  	Note that on CPU access of this data, the data may be migrated to host memory because the CPU typically cannot access device memory directly.
                                                                                                  	Any GPU that had the CU_MEM_ADVISE_SET_ACCESSED_BY flag set for this data will now have its mapping updated to point to the page in host memory.
                                                                                                  	If CU_MEM_ADVISE_SET_READ_MOSTLY is also set on this memory region or any subset of it, then the policies associated with that advice will override the policies of this advice.
                                                                                                  	Additionally, if the preferred location of this memory region or any subset of it is also device, then the policies associated with CU_MEM_ADVISE_SET_PREFERRED_LOCATION will override the policies of this advice.
                                                                                                  - UnsetAccessedBy:
                                                                                                  	Undoes the effect of SetAccessedBy.
                                                                                                  	Any mappings to the data from device may be removed at any time causing accesses to result in non-fatal page faults.
                                                                                                  

                                                                                                  func (DevicePtr) MemPrefetchAsync

                                                                                                  func (d DevicePtr) MemPrefetchAsync(count int64, dst Device, hStream Stream) error

                                                                                                    MemPrefetchAsync prefetches memory to the specified destination device. In the underlying CUDA call, devPtr (here the receiver d) is the base device pointer of the memory to be prefetched and dstDevice (here the dst parameter) is the destination device. count specifies the number of bytes to copy. hStream is the stream in which the operation is enqueued. The memory range must refer to managed memory allocated via cuMemAllocManaged or declared via __managed__ variables. Passing in CU_DEVICE_CPU for dstDevice will prefetch the data to host memory. If dstDevice is a GPU, then the device attribute CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS must be non-zero. Additionally, hStream must be associated with a device that has a non-zero value for the device attribute CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS.

                                                                                                    The start address and end address of the memory range will be rounded down and rounded up respectively to be aligned to CPU page size before the prefetch operation is enqueued in the stream.
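
The rounding described above can be illustrated with a small, self-contained sketch. It does not call the CUDA driver; `alignRange` is a hypothetical helper, and the 4096-byte page size is an assumption (the real CPU page size depends on the platform).

```go
package main

import "fmt"

// alignRange is a hypothetical helper (not part of package cu). It rounds the
// start of a byte range down, and the end of the range up, to multiples of
// pageSize. pageSize is assumed to be a power of two.
func alignRange(start, count, pageSize uint64) (alignedStart, alignedEnd uint64) {
	mask := pageSize - 1
	alignedStart = start &^ mask                // round down
	alignedEnd = (start + count + mask) &^ mask // round up
	return alignedStart, alignedEnd
}

func main() {
	// 4096 is an assumed page size, used only for illustration.
	s, e := alignRange(0x1234, 100, 4096)
	fmt.Printf("%#x %#x\n", s, e) // prints "0x1000 0x2000"
}
```

A request for 100 bytes starting at 0x1234 therefore touches the whole page from 0x1000 to 0x2000 under these assumptions.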

                                                                                                    If no physical memory has been allocated for this region, then this memory region will be populated and mapped on the destination device. If there's insufficient memory to prefetch the desired region, the Unified Memory driver may evict pages from other cuMemAllocManaged allocations to host memory in order to make room. Device memory allocated using cuMemAlloc or cuArrayCreate will not be evicted.

                                                                                                    By default, any mappings to the previous location of the migrated pages are removed and mappings for the new location are only setup on dstDevice. The exact behavior however also depends on the settings applied to this memory range via cuMemAdvise as described below:

                                                                                                    If CU_MEM_ADVISE_SET_READ_MOSTLY was set on any subset of this memory range, then that subset will create a read-only copy of the pages on dstDevice.

                                                                                                    If CU_MEM_ADVISE_SET_PREFERRED_LOCATION was called on any subset of this memory range, then the pages will be migrated to dstDevice even if dstDevice is not the preferred location of any pages in the memory range.

                                                                                                    If CU_MEM_ADVISE_SET_ACCESSED_BY was called on any subset of this memory range, then mappings to those pages from all the appropriate processors are updated to refer to the new location if establishing such a mapping is possible. Otherwise, those mappings are cleared.

                                                                                                    Note that this API is not required for functionality and only serves to improve performance by allowing the application to migrate data to a suitable location before it is accessed. Memory accesses to this range are always coherent and are allowed even when the data is actively being migrated.

                                                                                                    Note that this function is asynchronous with respect to the host and all work on other devices.

                                                                                                    func (DevicePtr) MemSize

                                                                                                    func (mem DevicePtr) MemSize() uintptr

                                                                                                      MemSize returns the size of the memory slab in bytes. It returns 0 if an error occurred.

                                                                                                      func (DevicePtr) MemoryType

                                                                                                      func (mem DevicePtr) MemoryType() (typ MemoryType, err error)

                                                                                                        MemoryType returns the MemoryType of the memory

                                                                                                        func (DevicePtr) Pointer

                                                                                                        func (mem DevicePtr) Pointer() unsafe.Pointer

                                                                                                          Pointer returns the pointer as an unsafe.Pointer. You should not dereference it, though, as the pointer typically refers to device memory.

                                                                                                          func (DevicePtr) PtrAttribute

                                                                                                          func (d DevicePtr) PtrAttribute(attr PointerAttribute) (unsafe.Pointer, error)

                                                                                                            PtrAttribute returns information about a pointer.

                                                                                                            func (DevicePtr) SetPtrAttribute

                                                                                                            func (d DevicePtr) SetPtrAttribute(value unsafe.Pointer, attr PointerAttribute) error

                                                                                                              SetPtrAttribute sets attributes on a previously allocated memory region. The supported attributes are:

                                                                                                              SynncMemOpsAttr:
                                                                                                              	A boolean attribute that can either be set (1) or unset (0).
                                                                                                              	When set, the region of memory that ptr points to is guaranteed to always synchronize memory operations that are synchronous.
                                                                                                              	If there are some previously initiated synchronous memory operations that are pending when this attribute is set, the function does not return until those memory operations are complete.
                                                                                                              	See further documentation in the section titled "API synchronization behavior" to learn more about cases when synchronous memory operations can exhibit asynchronous behavior.
                                                                                                              	`value` will be considered as a pointer to an unsigned integer to which this attribute is to be set.
                                                                                                              

                                                                                                              func (DevicePtr) String

                                                                                                              func (d DevicePtr) String() string

                                                                                                              func (DevicePtr) Uintptr

                                                                                                              func (d DevicePtr) Uintptr() uintptr

                                                                                                                Uintptr returns the pointer as a uintptr.

                                                                                                                type ErrorLister

                                                                                                                type ErrorLister interface {
                                                                                                                	ListErrors() []error
                                                                                                                }

                                                                                                                  ErrorLister is the interface implemented by types that hold a list of errors.

                                                                                                                  type Event

                                                                                                                  type Event struct {
                                                                                                                  	// contains filtered or unexported fields
                                                                                                                  }

                                                                                                                    Event represents a CUDA event

                                                                                                                    func MakeEvent

                                                                                                                    func MakeEvent(flags EventFlags) (event Event, err error)

                                                                                                                    func (Event) Elapsed

                                                                                                                    func (hStart Event) Elapsed(hEnd Event) (pMilliseconds float64, err error)

                                                                                                                    func (Event) Query

                                                                                                                    func (hEvent Event) Query() (err error)

                                                                                                                    func (Event) Record

                                                                                                                    func (hEvent Event) Record(hStream Stream) (err error)

                                                                                                                    func (Event) Synchronize

                                                                                                                    func (hEvent Event) Synchronize() (err error)

                                                                                                                    type EventFlags

                                                                                                                    type EventFlags byte

                                                                                                                      EventFlags are flags to be used with event creation

                                                                                                                      const (
                                                                                                                      	DefaultEvent      EventFlags = C.CU_EVENT_DEFAULT        // Default event flag
                                                                                                                      	BlockingSyncEvent EventFlags = C.CU_EVENT_BLOCKING_SYNC  // Event uses blocking synchronization
                                                                                                                      	DisableTiming     EventFlags = C.CU_EVENT_DISABLE_TIMING // Event will not record timing data
                                                                                                                      	InterprocessEvent EventFlags = C.CU_EVENT_INTERPROCESS   // Event is suitable for interprocess use. DisableTiming must be set
                                                                                                                      )
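
The constraint noted on InterprocessEvent (DisableTiming must also be set) can be checked before creating an event. The sketch below uses local stand-in constants so it runs without cgo; the numeric values are assumed to mirror the CU_EVENT_* values in cuda.h, and `validEventFlags` is a hypothetical helper, not part of package cu.

```go
package main

import "fmt"

// eventFlags is a local stand-in for the package's EventFlags type.
type eventFlags byte

// Stand-in values, assumed to match CU_EVENT_* in cuda.h.
const (
	defaultEvent      eventFlags = 0x0
	blockingSyncEvent eventFlags = 0x1
	disableTiming     eventFlags = 0x2
	interprocessEvent eventFlags = 0x4
)

// validEventFlags reports whether the combination satisfies the documented
// rule that an interprocess event must also disable timing.
func validEventFlags(f eventFlags) bool {
	if f&interprocessEvent != 0 && f&disableTiming == 0 {
		return false
	}
	return true
}

func main() {
	fmt.Println(validEventFlags(interprocessEvent | disableTiming)) // true
	fmt.Println(validEventFlags(interprocessEvent))                 // false
	fmt.Println(validEventFlags(defaultEvent | blockingSyncEvent))  // true
}
```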

                                                                                                                      type FilterMode

                                                                                                                      type FilterMode byte

                                                                                                                        FilterMode are texture reference filtering modes

                                                                                                                        const (
                                                                                                                        	PointFilterMode  FilterMode = C.CU_TR_FILTER_MODE_POINT  // Point filter mode
                                                                                                                        	LinearFilterMode FilterMode = C.CU_TR_FILTER_MODE_LINEAR // Linear filter mode
                                                                                                                        )

                                                                                                                        type Format

                                                                                                                        type Format byte

                                                                                                                          Format is the element format of a CUDA array (that is, the type of each array element)

                                                                                                                          const (
                                                                                                                          	Uint8   Format = C.CU_AD_FORMAT_UNSIGNED_INT8  // Unsigned 8-bit integers
                                                                                                                          	Uint16  Format = C.CU_AD_FORMAT_UNSIGNED_INT16 // Unsigned 16-bit integers
                                                                                                                          	Uin32   Format = C.CU_AD_FORMAT_UNSIGNED_INT32 // Unsigned 32-bit integers
                                                                                                                          	Int8    Format = C.CU_AD_FORMAT_SIGNED_INT8    // Signed 8-bit integers
                                                                                                                          	Int16   Format = C.CU_AD_FORMAT_SIGNED_INT16   // Signed 16-bit integers
                                                                                                                          	Int32   Format = C.CU_AD_FORMAT_SIGNED_INT32   // Signed 32-bit integers
                                                                                                                          	Float16 Format = C.CU_AD_FORMAT_HALF           // 16-bit floating point
                                                                                                                          	Float32 Format = C.CU_AD_FORMAT_FLOAT          // 32-bit floating point
                                                                                                                          )

                                                                                                                          type FuncCacheConfig

                                                                                                                          type FuncCacheConfig byte

                                                                                                                            FuncCacheConfig represents the CUfunc_cache enum type, which are enumerations for cache configurations

                                                                                                                            const (
                                                                                                                            	PreferNone   FuncCacheConfig = C.CU_FUNC_CACHE_PREFER_NONE   // no preference for shared memory or L1 (default)
                                                                                                                            	PreferShared FuncCacheConfig = C.CU_FUNC_CACHE_PREFER_SHARED // prefer larger shared memory and smaller L1 cache
                                                                                                                            	PreferL1     FuncCacheConfig = C.CU_FUNC_CACHE_PREFER_L1     // prefer larger L1 cache and smaller shared memory
                                                                                                                            	PreferEqual  FuncCacheConfig = C.CU_FUNC_CACHE_PREFER_EQUAL  // prefer equal sized L1 cache and shared memory
                                                                                                                            )

                                                                                                                            func CurrentCacheConfig

                                                                                                                            func CurrentCacheConfig() (pconfig FuncCacheConfig, err error)

                                                                                                                            type Function

                                                                                                                            type Function struct {
                                                                                                                            	// contains filtered or unexported fields
                                                                                                                            }

                                                                                                                              Function represents a CUDA function

                                                                                                                              func (Function) Attribute

                                                                                                                              func (fn Function) Attribute(attrib FunctionAttribute) (pi int, err error)

                                                                                                                              func (Function) Launch

                                                                                                                              func (fn Function) Launch(gridDimX, gridDimY, gridDimZ int, blockDimX, blockDimY, blockDimZ int, sharedMemBytes int, stream Stream, kernelParams []unsafe.Pointer) error

                                                                                                                                Launch launches a CUDA function
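
Choosing the grid dimensions passed to Launch is left to the caller. A common approach for a one-dimensional problem, sketched below, is ceiling division of the element count by the block size. `gridSize` is a hypothetical helper, and the sketch assumes the kernel processes one element per thread and bounds-checks its index.

```go
package main

import "fmt"

// gridSize is a hypothetical helper: it returns the number of blocks needed
// to cover n elements when each block has blockSize threads and each thread
// handles one element.
func gridSize(n, blockSize int) int {
	return (n + blockSize - 1) / blockSize
}

func main() {
	n, blockDimX := 1000, 256
	gridDimX := gridSize(n, blockDimX)
	// These values would then be passed as gridDimX and blockDimX,
	// with the remaining grid and block dimensions set to 1.
	fmt.Println(gridDimX, blockDimX) // prints "4 256"
}
```

With 1000 elements and 256 threads per block, the last block has 24 idle threads, which is why the kernel itself needs the bounds check.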

                                                                                                                                func (Function) LaunchAndSync

                                                                                                                                func (fn Function) LaunchAndSync(gridDimX, gridDimY, gridDimZ, blockDimX, blockDimY, blockDimZ, sharedMemBytes int, stream Stream, kernelParams []unsafe.Pointer) error

                                                                                                                                  LaunchAndSync launches the kernel and synchronizes the context

                                                                                                                                  func (Function) MaxActiveBlocksPerMultiProcessor

                                                                                                                                  func (fn Function) MaxActiveBlocksPerMultiProcessor(blockSize int, dynamicSmemSize int64) (int, error)

                                                                                                                                    MaxActiveBlocksPerMultiProcessor returns the maximum number of active blocks per streaming multiprocessor.

                                                                                                                                    func (Function) MaxActiveBlocksPerMultiProcessorWithFlags

                                                                                                                                    func (fn Function) MaxActiveBlocksPerMultiProcessorWithFlags(blockSize int, dynamicSmemSize int64, flags OccupancyFlags) (int, error)

                                                                                                                                      MaxActiveBlocksPerMultiProcessorWithFlags returns the maximum number of active blocks per streaming multiprocessor. The flags control how special cases are handled.

                                                                                                                                      func (Function) SetCacheConfig

                                                                                                                                      func (fn Function) SetCacheConfig(config FuncCacheConfig) (err error)

                                                                                                                                      func (Function) SetSharedMemConfig

                                                                                                                                      func (fn Function) SetSharedMemConfig(config SharedConfig) (err error)

                                                                                                                                      type FunctionAttribute

                                                                                                                                      type FunctionAttribute int

                                                                                                                                        FunctionAttribute is a representation of the properties of a function

                                                                                                                                        const (
                                                                                                                                        	FnMaxThreadsPerBlock FunctionAttribute = C.CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK // The maximum number of threads per block, beyond which a launch of the function would fail. This number depends on both the function and the device on which the function is currently loaded.
                                                                                                                                        	SharedSizeBytes      FunctionAttribute = C.CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES     // The size in bytes of statically-allocated shared memory required by this function. This does not include dynamically-allocated shared memory requested by the user at runtime.
                                                                                                                                        	ConstSizeBytes       FunctionAttribute = C.CU_FUNC_ATTRIBUTE_CONST_SIZE_BYTES      // The size in bytes of user-allocated constant memory required by this function.
                                                                                                                                        	LocalSizeBytes       FunctionAttribute = C.CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES      // The size in bytes of local memory used by each thread of this function.
                                                                                                                                        	NumRegs              FunctionAttribute = C.CU_FUNC_ATTRIBUTE_NUM_REGS              // The number of registers used by each thread of this function.
                                                                                                                                        	PtxVersion           FunctionAttribute = C.CU_FUNC_ATTRIBUTE_PTX_VERSION           // The PTX virtual architecture version for which the function was compiled. This value is the major PTX version * 10 + the minor PTX version, so a PTX version 1.3 function would return the value 13. Note that this may return the undefined value of 0 for cubins compiled prior to CUDA 3.0.
                                                                                                                                        	BinaryVersion        FunctionAttribute = C.CU_FUNC_ATTRIBUTE_BINARY_VERSION        // The binary architecture version for which the function was compiled. This value is the major binary version * 10 + the minor binary version, so a binary version 1.3 function would return the value 13. Note that this will return a value of 10 for legacy cubins that do not have a properly-encoded binary architecture version.
                                                                                                                                        	CacheModeCa          FunctionAttribute = C.CU_FUNC_ATTRIBUTE_CACHE_MODE_CA         // The attribute to indicate whether the function has been compiled with the user-specified option "-Xptxas --dlcm=ca" set.
                                                                                                                                        )
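
The PtxVersion and BinaryVersion attributes encode a version as major * 10 + minor. A minimal sketch of decoding such a value follows; `splitVersion` is a hypothetical helper and is not part of package cu.

```go
package main

import "fmt"

// splitVersion is a hypothetical helper that decodes a value of the form
// major*10 + minor, as described for PtxVersion and BinaryVersion.
func splitVersion(v int) (major, minor int) {
	return v / 10, v % 10
}

func main() {
	major, minor := splitVersion(13)
	fmt.Printf("%d.%d\n", major, minor) // prints "1.3"
}
```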

                                                                                                                                        type JITCacheMode

                                                                                                                                        type JITCacheMode struct{ Value JITCacheModeOption }

                                                                                                                                          Specifies whether to enable caching explicitly (-dlcm)

                                                                                                                                          type JITCacheModeOption

                                                                                                                                          type JITCacheModeOption uint64

                                                                                                                                            Caching modes for dlcm

                                                                                                                                            const (
                                                                                                                                            	// Compile with no -dlcm flag specified
                                                                                                                                            	JITCacheNone JITCacheModeOption = C.CU_JIT_CACHE_OPTION_NONE
                                                                                                                                            	// Compile with L1 cache disabled
                                                                                                                                            	JITCacheCG JITCacheModeOption = C.CU_JIT_CACHE_OPTION_CG
                                                                                                                                            	// Compile with L1 cache enabled
                                                                                                                                            	JITCacheCA JITCacheModeOption = C.CU_JIT_CACHE_OPTION_CA
                                                                                                                                            )

                                                                                                                                            type JITErrorLogBuffer

                                                                                                                                            type JITErrorLogBuffer struct{ Buffer []byte }

                                                                                                                                              Buffer in which to print any log messages that reflect errors

                                                                                                                                              type JITFallbackOption

                                                                                                                                              type JITFallbackOption uint64

                                                                                                                                                Cubin matching fallback strategies

                                                                                                                                                const (
                                                                                                                                                	// Prefer to compile ptx if exact binary match not found
                                                                                                                                                	JITPreferPTX JITFallbackOption = C.CU_PREFER_PTX
                                                                                                                                                	// Prefer to fall back to compatible binary code if exact match not found
                                                                                                                                                	JITPreferBinary JITFallbackOption = C.CU_PREFER_BINARY
                                                                                                                                                )

                                                                                                                                                type JITFallbackStrategy

                                                                                                                                                type JITFallbackStrategy struct{ Value JITFallbackOption }

                                                                                                                                                  Specifies choice of fallback strategy if matching cubin is not found.

                                                                                                                                                  type JITGenerateDebugInfo

                                                                                                                                                  type JITGenerateDebugInfo struct{ Enabled bool }

                                                                                                                                                    Specifies whether to create debug information in output (-g)

                                                                                                                                                    type JITGenerateLineInfo

                                                                                                                                                    type JITGenerateLineInfo struct{ Enabled bool }

                                                                                                                                                      Generate line number information (-lineinfo)

                                                                                                                                                      type JITInfoLogBuffer

                                                                                                                                                      type JITInfoLogBuffer struct{ Buffer []byte }

                                                                                                                                                        Buffer in which to print any log messages that are informational in nature.

                                                                                                                                                        type JITInputType

                                                                                                                                                        type JITInputType uint64
                                                                                                                                                        const (
                                                                                                                                                        	// Compiled device-class-specific device code
                                                                                                                                                        	JITInputCUBIN JITInputType = C.CU_JIT_INPUT_CUBIN
                                                                                                                                                        	// PTX source code
                                                                                                                                                        	JITInputPTX JITInputType = C.CU_JIT_INPUT_PTX
                                                                                                                                                        	// Bundle of multiple cubins and/or PTX of some device code
                                                                                                                                                        	JITInputFatBinary JITInputType = C.CU_JIT_INPUT_FATBINARY
                                                                                                                                                        	// Host object with embedded device code
                                                                                                                                                        	JITInputObject JITInputType = C.CU_JIT_INPUT_OBJECT
                                                                                                                                                        	// Archive of host objects with embedded device code
                                                                                                                                                        	JITInputLibrary JITInputType = C.CU_JIT_INPUT_LIBRARY
                                                                                                                                                        )

                                                                                                                                                        type JITLogVerbose

                                                                                                                                                        type JITLogVerbose struct{ Enabled bool }

                                                                                                                                                          Generate verbose log messages (-v)

                                                                                                                                                          type JITMaxRegisters

                                                                                                                                                          type JITMaxRegisters struct{ Value uint }

                                                                                                                                                            Max number of registers that a thread may use.

                                                                                                                                                            type JITOptimizationLevel

                                                                                                                                                            type JITOptimizationLevel struct{ Value uint }

                                                                                                                                                              Level of optimizations to apply to generated code (0 - 4)

                                                                                                                                                              type JITOption

                                                                                                                                                              type JITOption interface {
                                                                                                                                                              	// contains filtered or unexported methods
                                                                                                                                                              }
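The "filtered or unexported methods" note means JITOption can only be satisfied by types declared inside the package, which is why options are passed as the struct types documented in this section. Below is a minimal, self-contained sketch of that sealed-interface pattern. The names `option`, `maxRegisters`, `logVerbose`, the `apply` method and the string keys are hypothetical stand-ins for illustration, not the package's internals.

```go
package main

import "fmt"

// option is a hypothetical stand-in for a sealed option interface:
// the unexported method keeps other packages from implementing it.
type option interface {
	apply(settings map[string]uint)
}

// maxRegisters and logVerbose mimic the shape of a value-carrying
// option struct and a boolean option struct.
type maxRegisters struct{ Value uint }
type logVerbose struct{ Enabled bool }

func (o maxRegisters) apply(s map[string]uint) { s["max_registers"] = o.Value }

func (o logVerbose) apply(s map[string]uint) {
	if o.Enabled {
		s["log_verbose"] = 1
	} else {
		s["log_verbose"] = 0
	}
}

// collect folds a variadic option list into one settings map,
// with later options overriding earlier ones.
func collect(opts ...option) map[string]uint {
	settings := make(map[string]uint)
	for _, o := range opts {
		o.apply(settings)
	}
	return settings
}

func main() {
	s := collect(maxRegisters{Value: 32}, logVerbose{Enabled: true})
	fmt.Println(s["max_registers"], s["log_verbose"])
}
```

With this shape, a call site reads as a list of typed values, and the compiler rejects anything that is not one of the known option types.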

                                                                                                                                                              type JITTarget

                                                                                                                                                              type JITTarget struct{ Value JITTargetOption }

                                                                                                                                                                Target is chosen based on supplied Value

                                                                                                                                                                type JITTargetFromContext

                                                                                                                                                                type JITTargetFromContext struct{}

                                                                                                                                                                  Determines the target based on the currently attached context (default)

                                                                                                                                                                  type JITTargetOption

                                                                                                                                                                  type JITTargetOption uint64
                                                                                                                                                                  const (
                                                                                                                                                                  	// JITTarget10 JITTargetOption = C.CU_TARGET_COMPUTE_10
                                                                                                                                                                  	// JITTarget11 JITTargetOption = C.CU_TARGET_COMPUTE_11
                                                                                                                                                                  	// JITTarget12 JITTargetOption = C.CU_TARGET_COMPUTE_12
                                                                                                                                                                  	// JITTarget13 JITTargetOption = C.CU_TARGET_COMPUTE_13
                                                                                                                                                                  	JITTarget20 JITTargetOption = C.CU_TARGET_COMPUTE_20
                                                                                                                                                                  	JITTarget21 JITTargetOption = C.CU_TARGET_COMPUTE_21
                                                                                                                                                                  	JITTarget30 JITTargetOption = C.CU_TARGET_COMPUTE_30
                                                                                                                                                                  	JITTarget32 JITTargetOption = C.CU_TARGET_COMPUTE_32
                                                                                                                                                                  	JITTarget35 JITTargetOption = C.CU_TARGET_COMPUTE_35
                                                                                                                                                                  	JITTarget37 JITTargetOption = C.CU_TARGET_COMPUTE_37
                                                                                                                                                                  	JITTarget50 JITTargetOption = C.CU_TARGET_COMPUTE_50
                                                                                                                                                                  	JITTarget52 JITTargetOption = C.CU_TARGET_COMPUTE_52
                                                                                                                                                                  	JITTarget53 JITTargetOption = C.CU_TARGET_COMPUTE_53
                                                                                                                                                                  	JITTarget60 JITTargetOption = C.CU_TARGET_COMPUTE_60
                                                                                                                                                                  	JITTarget61 JITTargetOption = C.CU_TARGET_COMPUTE_61
                                                                                                                                                                  	JITTarget62 JITTargetOption = C.CU_TARGET_COMPUTE_62
                                                                                                                                                                  )
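The constant names encode a compute capability: JITTarget61 corresponds to compute capability 6.1, the same figure cudatest prints as "Compute". The sketch below shows one way to go from a major/minor pair to the two-digit form and check whether a constant is listed for it. The `listedTargets` table and `targetFor` helper are hypothetical and use plain integers; the sketch assumes the constants carry the two-digit value their names suggest, which should be checked against the installed CUDA headers.

```go
package main

import "fmt"

// listedTargets mirrors the compute capabilities named by the
// JITTargetXY constants above (hypothetical table for illustration).
var listedTargets = map[int]bool{
	20: true, 21: true, 30: true, 32: true, 35: true, 37: true,
	50: true, 52: true, 53: true, 60: true, 61: true, 62: true,
}

// targetFor converts a compute capability such as 6.1 into the
// two-digit form 61 and reports whether a constant is listed for it.
func targetFor(major, minor int) (int, bool) {
	if major < 0 || minor < 0 || minor > 9 {
		return 0, false
	}
	code := major*10 + minor
	return code, listedTargets[code]
}

func main() {
	for _, cc := range [][2]int{{6, 1}, {7, 5}} {
		code, ok := targetFor(cc[0], cc[1])
		fmt.Printf("compute %d.%d -> %d listed=%v\n", cc[0], cc[1], code, ok)
	}
}
```

A device whose capability has no listed constant (7.5 in this sketch) is a case where leaving the choice to JITTargetFromContext may be the simpler route.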

                                                                                                                                                                  type JITThreadsPerBlock

                                                                                                                                                                  type JITThreadsPerBlock struct{ Value uint }

                                                                                                                                                                    Specifies the minimum number of threads per block to target during compilation

                                                                                                                                                                    type JITWallTime

                                                                                                                                                                    type JITWallTime struct{ Result float32 }

                                                                                                                                                                      Overwrites the option value with the total wall clock time, in milliseconds, spent in the compiler and linker.

                                                                                                                                                                      type Limit

                                                                                                                                                                      type Limit byte

                                                                                                                                                                        Limit identifies a resource limit that can be queried and set on a context

                                                                                                                                                                        const (
                                                                                                                                                                        	StackSize                    Limit = C.CU_LIMIT_STACK_SIZE                       // GPU thread stack size
                                                                                                                                                                        	PrintfFIFOSize               Limit = C.CU_LIMIT_PRINTF_FIFO_SIZE                 // GPU printf FIFO size
                                                                                                                                                                        	MallocHeapSize               Limit = C.CU_LIMIT_MALLOC_HEAP_SIZE                 // GPU malloc heap size
                                                                                                                                                                        	DevRuntimeSyncDepth          Limit = C.CU_LIMIT_DEV_RUNTIME_SYNC_DEPTH           // GPU device runtime launch synchronize depth
                                                                                                                                                                        	DevRuntimePendingLaunchCount Limit = C.CU_LIMIT_DEV_RUNTIME_PENDING_LAUNCH_COUNT // GPU device runtime pending launch count
                                                                                                                                                                        )

                                                                                                                                                                        type LinkState

                                                                                                                                                                        type LinkState struct {
                                                                                                                                                                        	// contains filtered or unexported fields
                                                                                                                                                                        }
                                                                                                                                                                        func NewLink(options ...JITOption) (*LinkState, error)

                                                                                                                                                                          Creates a pending JIT linker invocation.

                                                                                                                                                                          func (*LinkState) AddData

                                                                                                                                                                          func (link *LinkState) AddData(input JITInputType, data string, name string, options ...JITOption) error

                                                                                                                                                                            Add an input to a pending linker invocation

                                                                                                                                                                            func (*LinkState) AddFile

                                                                                                                                                                            func (link *LinkState) AddFile(input JITInputType, path string, options ...JITOption) error

                                                                                                                                                                              Add a file input to a pending linker invocation

                                                                                                                                                                              func (*LinkState) Complete

                                                                                                                                                                              func (link *LinkState) Complete() (string, error)

                                                                                                                                                                                Complete a pending linker invocation

                                                                                                                                                                                func (*LinkState) Destroy

                                                                                                                                                                                func (link *LinkState) Destroy() error

                                                                                                                                                                                  Destroys state for a JIT linker invocation.

                                                                                                                                                                                  type MemAdvice

                                                                                                                                                                                  type MemAdvice byte

                                                                                                                                                                                    MemAdvice is a flag that advises the device on memory usage

                                                                                                                                                                                    const (
                                                                                                                                                                                    	SetReadMostly          MemAdvice = C.CU_MEM_ADVISE_SET_READ_MOSTLY          // Data will mostly be read and only occassionally be written to
                                                                                                                                                                                    	UnsetReadMostly        MemAdvice = C.CU_MEM_ADVISE_UNSET_READ_MOSTLY        // Undo the effect of CU_MEM_ADVISE_SET_READ_MOSTLY
                                                                                                                                                                                    	SetPreferredLocation   MemAdvice = C.CU_MEM_ADVISE_SET_PREFERRED_LOCATION   // Set the preferred location for the data as the specified device
                                                                                                                                                                                    	UnsetPreferredLocation MemAdvice = C.CU_MEM_ADVISE_UNSET_PREFERRED_LOCATION // Clear the preferred location for the data
                                                                                                                                                                                    	SetAccessedBy          MemAdvice = C.CU_MEM_ADVISE_SET_ACCESSED_BY          // Data will be accessed by the specified device, so prevent page faults as much as possible
                                                                                                                                                                                    	UnsetAccessedBy        MemAdvice = C.CU_MEM_ADVISE_UNSET_ACCESSED_BY        //Let the Unified Memory subsystem decide on the page faulting policy for the specified device
                                                                                                                                                                                    )

                                                                                                                                                                                    type MemAttachFlags

                                                                                                                                                                                    type MemAttachFlags byte

                                                                                                                                                                                      MemAttachFlags are flags for memory attachment (used in allocating memory)

                                                                                                                                                                                      const (
                                                                                                                                                                                      	AttachGlobal MemAttachFlags = C.CU_MEM_ATTACH_GLOBAL // Memory can be accessed by any stream on any device
                                                                                                                                                                                      	AttachHost   MemAttachFlags = C.CU_MEM_ATTACH_HOST   // Memory cannot be accessed by any stream on any device
                                                                                                                                                                                      	AttachSingle MemAttachFlags = C.CU_MEM_ATTACH_SINGLE // Memory can only be accessed by a single stream on the associated device
                                                                                                                                                                                      )

                                                                                                                                                                                      type Memcpy2dParam

                                                                                                                                                                                      type Memcpy2dParam struct {
                                                                                                                                                                                      	Height        int64
                                                                                                                                                                                      	WidthInBytes  int64
                                                                                                                                                                                      	DstArray      Array
                                                                                                                                                                                      	DstDevice     DevicePtr
                                                                                                                                                                                      	DstHost       unsafe.Pointer
                                                                                                                                                                                      	DstMemoryType MemoryType
                                                                                                                                                                                      	DstPitch      int64
                                                                                                                                                                                      	DstXInBytes   int64
                                                                                                                                                                                      	DstY          int64
                                                                                                                                                                                      	SrcArray      Array
                                                                                                                                                                                      	SrcDevice     DevicePtr
                                                                                                                                                                                      	SrcHost       unsafe.Pointer
                                                                                                                                                                                      	SrcMemoryType MemoryType
                                                                                                                                                                                      	SrcPitch      int64
                                                                                                                                                                                      	SrcXInBytes   int64
                                                                                                                                                                                      	SrcY          int64
                                                                                                                                                                                      }

                                                                                                                                                                                        Memcpy2dParam is a struct representing the params of a 2D memory copy instruction. To aid usability, the fields are ordered as per the documentation (the actual struct is laid out differently).
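The pitch and offset fields are easiest to read as byte arithmetic. For host memory, the CUDA driver documentation gives the start of the copied region as the base address plus Y*Pitch + XInBytes, and requires each pitch to be at least WidthInBytes + XInBytes. The sketch below applies that arithmetic to plain byte slices. The `copy2D` type and its `run` method are a hypothetical host-only model that borrows the field names above with `int` types; it does not call the package or the GPU.

```go
package main

import (
	"errors"
	"fmt"
)

// copy2D is a hypothetical host-only model of a pitched 2D copy.
type copy2D struct {
	WidthInBytes int
	Height       int
	SrcXInBytes  int
	SrcY         int
	SrcPitch     int
	DstXInBytes  int
	DstY         int
	DstPitch     int
}

// run copies Height rows of WidthInBytes bytes from src to dst.
// Row r starts at (Y+r)*Pitch + XInBytes in each buffer.
func (p copy2D) run(dst, src []byte) error {
	if p.SrcPitch < p.WidthInBytes+p.SrcXInBytes || p.DstPitch < p.WidthInBytes+p.DstXInBytes {
		return errors.New("pitch smaller than WidthInBytes + XInBytes")
	}
	for r := 0; r < p.Height; r++ {
		s := (p.SrcY+r)*p.SrcPitch + p.SrcXInBytes
		d := (p.DstY+r)*p.DstPitch + p.DstXInBytes
		if s+p.WidthInBytes > len(src) || d+p.WidthInBytes > len(dst) {
			return errors.New("copy region exceeds buffer")
		}
		copy(dst[d:d+p.WidthInBytes], src[s:s+p.WidthInBytes])
	}
	return nil
}

func main() {
	// A 4x4 source image, one byte per element, rows packed at pitch 4.
	src := []byte{
		0, 1, 2, 3,
		4, 5, 6, 7,
		8, 9, 10, 11,
		12, 13, 14, 15,
	}
	dst := make([]byte, 4) // 2x2 destination, pitch 2
	p := copy2D{
		WidthInBytes: 2, Height: 2,
		SrcXInBytes: 1, SrcY: 1, SrcPitch: 4,
		DstPitch: 2,
	}
	if err := p.run(dst, src); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(dst) // [5 6 9 10]
}
```

The example extracts the central 2x2 block of the 4x4 image: the source offset (1, 1) with pitch 4 selects bytes 5, 6 and 9, 10, which land tightly packed in the destination.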

                                                                                                                                                                                        type Memcpy3dParam

                                                                                                                                                                                        type Memcpy3dParam struct {
                                                                                                                                                                                        	Depth         int64
                                                                                                                                                                                        	Height        int64
                                                                                                                                                                                        	WidthInBytes  int64
                                                                                                                                                                                        	DstArray      Array
                                                                                                                                                                                        	DstDevice     DevicePtr
                                                                                                                                                                                        	DstHeight     int64
                                                                                                                                                                                        	DstHost       unsafe.Pointer
                                                                                                                                                                                        	DstLOD        int64
                                                                                                                                                                                        	DstMemoryType MemoryType
                                                                                                                                                                                        	DstPitch      int64
                                                                                                                                                                                        	DstXInBytes   int64
                                                                                                                                                                                        	DstY          int64
                                                                                                                                                                                        	DstZ          int64
                                                                                                                                                                                        	SrcArray      Array
                                                                                                                                                                                        	SrcDevice     DevicePtr
                                                                                                                                                                                        	SrcHeight     int64
                                                                                                                                                                                        	SrcHost       unsafe.Pointer
                                                                                                                                                                                        	SrcLOD        int64
                                                                                                                                                                                        	SrcMemoryType MemoryType
                                                                                                                                                                                        	SrcPitch      int64
                                                                                                                                                                                        	SrcXInBytes   int64
                                                                                                                                                                                        	SrcY          int64
                                                                                                                                                                                        	SrcZ          int64
                                                                                                                                                                                        }

                                                                                                                                                                                          Memcpy3dParam is a struct representing the params of a 3D memory copy instruction. To aid usability, the fields are ordered as per the documentation (the actual struct is laid out differently).
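For the 3D case, the Z and Height fields add one more level to the address arithmetic. The helper below is a hypothetical illustration (not part of package cu) of the byte offset the CUDA driver documentation gives for a host or device source operand: `(SrcZ*SrcHeight + SrcY)*SrcPitch + SrcXInBytes`.

```go
package main

import "fmt"

// hostOffset3D is a hypothetical helper, not part of package cu. It returns
// the byte offset of the first copied element from the base pointer, given
// the slice index z, the number of rows per slice (height), the row index y,
// the row pitch in bytes, and the x offset in bytes.
func hostOffset3D(z, height, y, pitch, xInBytes int64) int64 {
	return (z*height+y)*pitch + xInBytes
}

func main() {
	// Slices of 4 rows, 16 bytes per row; start at slice 2, row 1, byte 8.
	fmt.Println(hostOffset3D(2, 4, 1, 16, 8)) // prints 152
}
```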

                                                                                                                                                                                          type Memcpy3dPeerParam

                                                                                                                                                                                          type Memcpy3dPeerParam struct {
                                                                                                                                                                                          	Depth         int64
                                                                                                                                                                                          	Height        int64
                                                                                                                                                                                          	WidthInBytes  int64
                                                                                                                                                                                          	DstArray      Array
                                                                                                                                                                                          	DstContext    CUContext
                                                                                                                                                                                          	DstDevice     DevicePtr
                                                                                                                                                                                          	DstHeight     int64
                                                                                                                                                                                          	DstHost       unsafe.Pointer
                                                                                                                                                                                          	DstLOD        int64
                                                                                                                                                                                          	DstMemoryType MemoryType
                                                                                                                                                                                          	DstPitch      int64
                                                                                                                                                                                          	DstXInBytes   int64
                                                                                                                                                                                          	DstY          int64
                                                                                                                                                                                          	DstZ          int64
                                                                                                                                                                                          	SrcArray      Array
                                                                                                                                                                                          	SrcContext    CUContext
                                                                                                                                                                                          	SrcDevice     DevicePtr
                                                                                                                                                                                          	SrcHeight     int64
                                                                                                                                                                                          	SrcHost       unsafe.Pointer
                                                                                                                                                                                          	SrcLOD        int64
                                                                                                                                                                                          	SrcMemoryType MemoryType
                                                                                                                                                                                          	SrcPitch      int64
                                                                                                                                                                                          	SrcXInBytes   int64
                                                                                                                                                                                          	SrcY          int64
                                                                                                                                                                                          	SrcZ          int64
                                                                                                                                                                                          }

                                                                                                                                                                                            Memcpy3dParam is a struct representing the params of a 3D memory copy instruction across contexts. To aid usability, the fields are ordered as per the documentation (the actual struct is laid out differently).

                                                                                                                                                                                            type MemoryType

                                                                                                                                                                                            type MemoryType byte

                                                                                                                                                                                              MemoryType is a representation of the memory types of the device pointer

                                                                                                                                                                                              const (
                                                                                                                                                                                              	HostMemory    MemoryType = C.CU_MEMORYTYPE_HOST    // Host memory
                                                                                                                                                                                              	DeviceMemory  MemoryType = C.CU_MEMORYTYPE_DEVICE  // Device memory
                                                                                                                                                                                              	ArrayMemory   MemoryType = C.CU_MEMORYTYPE_ARRAY   // Array memory
                                                                                                                                                                                              	UnifiedMemory MemoryType = C.CU_MEMORYTYPE_UNIFIED // Unified device or host memory
                                                                                                                                                                                              )
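In the copy parameter structs above, the memory type decides which of the Src* (or Dst*) operand fields the driver consults. The sketch below summarizes that rule as described in the CUDA driver documentation. The `memoryType` type and `sourceOperand` function are hypothetical stand-ins, not part of package cu; the numeric values follow the CUDA header's `CU_MEMORYTYPE_*` constants.

```go
package main

import "fmt"

// memoryType is a hypothetical stand-in for cu.MemoryType.
type memoryType byte

const (
	hostMemory    memoryType = iota + 1 // CU_MEMORYTYPE_HOST
	deviceMemory                        // CU_MEMORYTYPE_DEVICE
	arrayMemory                         // CU_MEMORYTYPE_ARRAY
	unifiedMemory                       // CU_MEMORYTYPE_UNIFIED
)

// sourceOperand names the field that supplies the source address for a
// given memory type. Array operands ignore the pitch; the others use it.
func sourceOperand(t memoryType) string {
	switch t {
	case hostMemory:
		return "SrcHost"
	case deviceMemory, unifiedMemory:
		return "SrcDevice"
	case arrayMemory:
		return "SrcArray"
	default:
		return "unknown"
	}
}

func main() {
	for _, t := range []memoryType{hostMemory, deviceMemory, arrayMemory, unifiedMemory} {
		fmt.Printf("%d -> %s\n", t, sourceOperand(t))
	}
}
```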

                                                                                                                                                                                              type Module

                                                                                                                                                                                              type Module struct {
                                                                                                                                                                                              	// contains filtered or unexported fields
                                                                                                                                                                                              }

                                                                                                                                                                                                Module represents a CUDA Module

                                                                                                                                                                                                func Load

                                                                                                                                                                                                func Load(name string) (Module, error)

                                                                                                                                                                                                  Load loads a module into the current context. The CUDA driver API does not attempt to lazily allocate the resources needed by a module; if the memory for functions and data (constant and global) needed by the module cannot be allocated, `Load()` fails.

                                                                                                                                                                                                  The file should be a cubin file as output by nvcc, or a PTX file either as output by nvcc or handwritten, or a fatbin file as output by nvcc from toolchain 4.0 or late

                                                                                                                                                                                                  func LoadData

                                                                                                                                                                                                  func LoadData(image string) (Module, error)

                                                                                                                                                                                                    LoadData loads a module from an input string.

                                                                                                                                                                                                    func LoadDataEx

                                                                                                                                                                                                    func LoadDataEx(image string, options ...JITOption) (Module, error)

                                                                                                                                                                                                      LoadDataEx loads a module from an input string, applying the given JIT options.

                                                                                                                                                                                                      func LoadFatBinary

                                                                                                                                                                                                      func LoadFatBinary(image string) (Module, error)

                                                                                                                                                                                                        LoadFatBinary loads a module from an input string.

                                                                                                                                                                                                        func (Module) Function

                                                                                                                                                                                                        func (m Module) Function(name string) (Function, error)

                                                                                                                                                                                                          Function returns a pointer to the function in the module by the name. If it's not found, the error NotFound is returned

                                                                                                                                                                                                          func (Module) Global

                                                                                                                                                                                                          func (m Module) Global(name string) (DevicePtr, int64, error)

                                                                                                                                                                                                            Global returns a global pointer as defined in a module. It returns a pointer to the memory in the device.

                                                                                                                                                                                                            func (Module) SurfRef

                                                                                                                                                                                                            func (mod Module) SurfRef(name string) (SurfRef, error)

                                                                                                                                                                                                            func (Module) TexRef

                                                                                                                                                                                                            func (mod Module) TexRef(name string) (TexRef, error)

                                                                                                                                                                                                            func (Module) Unload

                                                                                                                                                                                                            func (hmod Module) Unload() (err error)

                                                                                                                                                                                                            type OccupancyFlags

                                                                                                                                                                                                            type OccupancyFlags byte

                                                                                                                                                                                                              OccupancyFlags represents the flags to the occupancy calculator.

                                                                                                                                                                                                              const (
                                                                                                                                                                                                              	DefaultOccupancy       OccupancyFlags = C.CU_OCCUPANCY_DEFAULT                  // Default behavior
                                                                                                                                                                                                              	DisableCachingOverride OccupancyFlags = C.CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE // Assume global caching is enabled and cannot be automatically turned off
                                                                                                                                                                                                              )

                                                                                                                                                                                                              type P2PAttribute

                                                                                                                                                                                                              type P2PAttribute byte

                                                                                                                                                                                                                P2PAttribute is a representation of P2P attributes

                                                                                                                                                                                                                const (
                                                                                                                                                                                                                	PerformanceRank         P2PAttribute = C.CU_DEVICE_P2P_ATTRIBUTE_PERFORMANCE_RANK        // A relative value indicating the performance of the link between two devices
                                                                                                                                                                                                                	P2PAccessSupported      P2PAttribute = C.CU_DEVICE_P2P_ATTRIBUTE_ACCESS_SUPPORTED        // P2P Access is enabled
                                                                                                                                                                                                                	P2PNativeAomicSupported P2PAttribute = C.CU_DEVICE_P2P_ATTRIBUTE_NATIVE_ATOMIC_SUPPORTED // Atomic operation over the link is supported
                                                                                                                                                                                                                )

                                                                                                                                                                                                                type PointerAttribute

                                                                                                                                                                                                                type PointerAttribute int

                                                                                                                                                                                                                  PointerAttribute is a representation of the metadata of pointers

                                                                                                                                                                                                                  const (
                                                                                                                                                                                                                  	ContextAttr       PointerAttribute = C.CU_POINTER_ATTRIBUTE_CONTEXT        // The CUcontext on which a pointer was allocated or registered
                                                                                                                                                                                                                  	MemoryTypeAttr    PointerAttribute = C.CU_POINTER_ATTRIBUTE_MEMORY_TYPE    // The CUmemorytype describing the physical location of a pointer
                                                                                                                                                                                                                  	DevicePointerAttr PointerAttribute = C.CU_POINTER_ATTRIBUTE_DEVICE_POINTER // The address at which a pointer's memory may be accessed on the device
                                                                                                                                                                                                                  	HostPointerAttr   PointerAttribute = C.CU_POINTER_ATTRIBUTE_HOST_POINTER   // The address at which a pointer's memory may be accessed on the host
                                                                                                                                                                                                                  	P2PTokenAttr      PointerAttribute = C.CU_POINTER_ATTRIBUTE_P2P_TOKENS     // A pair of tokens for use with the nv-p2p.h Linux kernel interface
                                                                                                                                                                                                                  	SymcMemopsAttr    PointerAttribute = C.CU_POINTER_ATTRIBUTE_SYNC_MEMOPS    // Synchronize every synchronous memory operation initiated on this region
                                                                                                                                                                                                                  	BufferIDAttr      PointerAttribute = C.CU_POINTER_ATTRIBUTE_BUFFER_ID      // A process-wide unique ID for an allocated memory region
                                                                                                                                                                                                                  	IsManagedAttr     PointerAttribute = C.CU_POINTER_ATTRIBUTE_IS_MANAGED     // Indicates if the pointer points to managed memory
                                                                                                                                                                                                                  )

                                                                                                                                                                                                                  type SharedConfig

                                                                                                                                                                                                                  type SharedConfig byte

                                                                                                                                                                                                                    SharedConfig values are flags for shared memory configurations.

                                                                                                                                                                                                                    const (
                                                                                                                                                                                                                    	DefaultBankSize   SharedConfig = C.CU_SHARED_MEM_CONFIG_DEFAULT_BANK_SIZE    // set default shared memory bank size
                                                                                                                                                                                                                    	FourByteBankSize  SharedConfig = C.CU_SHARED_MEM_CONFIG_FOUR_BYTE_BANK_SIZE  // set shared memory bank width to four bytes
                                                                                                                                                                                                                    	EightByteBankSize SharedConfig = C.CU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE // set shared memory bank width to eight bytes
                                                                                                                                                                                                                    )

                                                                                                                                                                                                                    func SharedMemConfig

                                                                                                                                                                                                                    func SharedMemConfig() (pConfig SharedConfig, err error)

                                                                                                                                                                                                                    type Stream

                                                                                                                                                                                                                    type Stream struct {
                                                                                                                                                                                                                    	// contains filtered or unexported fields
                                                                                                                                                                                                                    }

                                                                                                                                                                                                                      Stream represents a CUDA stream.

                                                                                                                                                                                                                      func MakeStream

                                                                                                                                                                                                                      func MakeStream(flags StreamFlags) (Stream, error)

MakeStream creates a stream. The flags determine the behavior of the stream.

                                                                                                                                                                                                                        func MakeStreamWithPriority

                                                                                                                                                                                                                        func MakeStreamWithPriority(priority int, flags StreamFlags) (Stream, error)

MakeStreamWithPriority creates a stream with the given priority. The flags determine the behavior of the stream. This API alters the scheduler priority of work in the stream. Work in a higher-priority stream may preempt work already executing in a lower-priority stream.

                                                                                                                                                                                                                          `priority` follows a convention where lower numbers represent higher priorities. '0' represents default priority.

                                                                                                                                                                                                                          The range of meaningful numerical priorities can be queried using `StreamPriorityRange`. If the specified priority is outside the numerical range returned by `StreamPriorityRange`, it will automatically be clamped to the lowest or the highest number in the range.
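The clamping rule described above can be illustrated without a GPU. The following is a minimal sketch in plain Go, not the package's implementation: `clampPriority` is a hypothetical helper, and the range values in `main` are illustrative stand-ins for whatever `StreamPriorityRange` reports on a given device.

```go
package main

import "fmt"

// clampPriority is a hypothetical helper that mirrors the documented rule:
// lower numbers mean higher priority, so the "greatest" priority is the
// numerically smallest bound and the "least" priority is the numerically
// largest bound. Requests outside [greatest, least] are clamped to the
// nearest bound.
func clampPriority(priority, least, greatest int) int {
	if priority < greatest {
		return greatest
	}
	if priority > least {
		return least
	}
	return priority
}

func main() {
	// Assumed range for illustration only; query the real one on your device.
	least, greatest := 0, -1

	for _, requested := range []int{-5, -1, 0, 3} {
		effective := clampPriority(requested, least, greatest)
		fmt.Printf("requested %d -> effective %d\n", requested, effective)
	}
}
```

With the assumed range, a request of -5 is clamped to -1 (the highest available priority) and a request of 3 is clamped to 0 (the default, lowest priority), while in-range values pass through unchanged.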

                                                                                                                                                                                                                          func (Stream) AttachMemAsync

                                                                                                                                                                                                                          func (hStream Stream) AttachMemAsync(dptr DevicePtr, length int64, flags uint) (err error)

                                                                                                                                                                                                                          func (Stream) C

                                                                                                                                                                                                                          func (s Stream) C() C.CUstream

                                                                                                                                                                                                                            C is the exported version of the c method

                                                                                                                                                                                                                            func (*Stream) Destroy

                                                                                                                                                                                                                            func (hStream *Stream) Destroy() error

Destroy destroys the stream specified by hStream.

If the device is still doing work in the stream hStream when Destroy() is called, the function returns immediately and the resources associated with hStream are released automatically once the device has completed all work in hStream.

                                                                                                                                                                                                                              func (Stream) Flags

                                                                                                                                                                                                                              func (hStream Stream) Flags() (flags StreamFlags, err error)

                                                                                                                                                                                                                              func (Stream) Priority

                                                                                                                                                                                                                              func (hStream Stream) Priority() (priority int, err error)

                                                                                                                                                                                                                              func (Stream) Query

                                                                                                                                                                                                                              func (hStream Stream) Query() (err error)

                                                                                                                                                                                                                              func (Stream) Synchronize

                                                                                                                                                                                                                              func (hStream Stream) Synchronize() (err error)

                                                                                                                                                                                                                              func (Stream) Wait

                                                                                                                                                                                                                              func (hStream Stream) Wait(hEvent Event, Flags uint) (err error)

                                                                                                                                                                                                                              func (Stream) WaitOnValue32

                                                                                                                                                                                                                              func (stream Stream) WaitOnValue32(addr DevicePtr, value uint32, flags uint) (err error)

                                                                                                                                                                                                                              func (Stream) WriteValue32

                                                                                                                                                                                                                              func (stream Stream) WriteValue32(addr DevicePtr, value uint32, flags uint) (err error)

                                                                                                                                                                                                                              type StreamFlags

                                                                                                                                                                                                                              type StreamFlags byte

                                                                                                                                                                                                                                StreamFlags are flags for stream behaviours

                                                                                                                                                                                                                                const (
                                                                                                                                                                                                                                	DefaultStream StreamFlags = C.CU_STREAM_DEFAULT      // Default stream flag
                                                                                                                                                                                                                                	NonBlocking   StreamFlags = C.CU_STREAM_NON_BLOCKING // Stream does not synchronize with stream 0 (the NULL stream)
                                                                                                                                                                                                                                )

                                                                                                                                                                                                                                type SurfRef

                                                                                                                                                                                                                                type SurfRef struct {
                                                                                                                                                                                                                                	// contains filtered or unexported fields
                                                                                                                                                                                                                                }

                                                                                                                                                                                                                                func (SurfRef) GetArray

                                                                                                                                                                                                                                func (hSurfRef SurfRef) GetArray() (phArray Array, err error)

                                                                                                                                                                                                                                func (SurfRef) SetArray

                                                                                                                                                                                                                                func (hSurfRef SurfRef) SetArray(hArray Array, Flags uint) (err error)

                                                                                                                                                                                                                                type TexRef

                                                                                                                                                                                                                                type TexRef struct {
                                                                                                                                                                                                                                	// contains filtered or unexported fields
                                                                                                                                                                                                                                }

                                                                                                                                                                                                                                func (TexRef) Address

                                                                                                                                                                                                                                func (hTexRef TexRef) Address() (pdptr DevicePtr, err error)

                                                                                                                                                                                                                                func (TexRef) AddressMode

                                                                                                                                                                                                                                func (hTexRef TexRef) AddressMode(dim int) (pam AddressMode, err error)

                                                                                                                                                                                                                                func (TexRef) Array

                                                                                                                                                                                                                                func (hTexRef TexRef) Array() (phArray Array, err error)

                                                                                                                                                                                                                                func (TexRef) BorderColor

                                                                                                                                                                                                                                func (hTexRef TexRef) BorderColor() (pBorderColor [3]float32, err error)

                                                                                                                                                                                                                                func (TexRef) FilterMode

                                                                                                                                                                                                                                func (hTexRef TexRef) FilterMode() (pfm FilterMode, err error)

                                                                                                                                                                                                                                func (TexRef) Flags

                                                                                                                                                                                                                                func (hTexRef TexRef) Flags() (pFlags TexRefFlags, err error)

                                                                                                                                                                                                                                func (TexRef) Format

                                                                                                                                                                                                                                func (hTexRef TexRef) Format() (pFormat Format, pNumChannels int, err error)

                                                                                                                                                                                                                                func (TexRef) MaxAnisotropy

                                                                                                                                                                                                                                func (hTexRef TexRef) MaxAnisotropy() (pmaxAniso int, err error)

                                                                                                                                                                                                                                func (TexRef) SetAddress

                                                                                                                                                                                                                                func (hTexRef TexRef) SetAddress(dptr DevicePtr, bytes int64) (ByteOffset int64, err error)

                                                                                                                                                                                                                                func (TexRef) SetAddress2D

                                                                                                                                                                                                                                func (hTexRef TexRef) SetAddress2D(desc ArrayDesc, dptr DevicePtr, Pitch int64) (err error)

                                                                                                                                                                                                                                func (TexRef) SetAddressMode

                                                                                                                                                                                                                                func (hTexRef TexRef) SetAddressMode(dim int, am AddressMode) (err error)

                                                                                                                                                                                                                                func (TexRef) SetArray

                                                                                                                                                                                                                                func (hTexRef TexRef) SetArray(hArray Array, Flags uint) (err error)

                                                                                                                                                                                                                                func (TexRef) SetBorderColor

                                                                                                                                                                                                                                func (hTexRef TexRef) SetBorderColor(pBorderColor [3]float32) (err error)

                                                                                                                                                                                                                                func (TexRef) SetFilterMode

                                                                                                                                                                                                                                func (hTexRef TexRef) SetFilterMode(fm FilterMode) (err error)

                                                                                                                                                                                                                                func (TexRef) SetFlags

                                                                                                                                                                                                                                func (hTexRef TexRef) SetFlags(Flags TexRefFlags) (err error)

                                                                                                                                                                                                                                func (TexRef) SetFormat

                                                                                                                                                                                                                                func (hTexRef TexRef) SetFormat(fmt Format, NumPackedComponents int) (err error)

                                                                                                                                                                                                                                func (TexRef) SetMaxAnisotropy

                                                                                                                                                                                                                                func (hTexRef TexRef) SetMaxAnisotropy(maxAniso uint) (err error)

                                                                                                                                                                                                                                func (TexRef) SetMipmapFilterMode

                                                                                                                                                                                                                                func (hTexRef TexRef) SetMipmapFilterMode(fm FilterMode) (err error)

                                                                                                                                                                                                                                func (TexRef) SetMipmapLevelBias

                                                                                                                                                                                                                                func (hTexRef TexRef) SetMipmapLevelBias(bias float64) (err error)

                                                                                                                                                                                                                                func (TexRef) SetMipmapLevelClamp

                                                                                                                                                                                                                                func (hTexRef TexRef) SetMipmapLevelClamp(minMipmapLevelClamp float64, maxMipmapLevelClamp float64) (err error)

                                                                                                                                                                                                                                type TexRefFlags

                                                                                                                                                                                                                                type TexRefFlags byte
                                                                                                                                                                                                                                const (
                                                                                                                                                                                                                                	ReadAsInteger        TexRefFlags = C.CU_TRSF_READ_AS_INTEGER        // Override the texref format with a format inferred from the array.
                                                                                                                                                                                                                                	NormalizeCoordinates TexRefFlags = C.CU_TRSF_NORMALIZED_COORDINATES // Use normalized texture coordinates in the range [0,1) instead of [0,dim).
                                                                                                                                                                                                                                	SRGB                 TexRefFlags = C.CU_TRSF_READ_AS_INTEGER        // Perform sRGB->linear conversion during texture read.
                                                                                                                                                                                                                                )
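Texture reference flags are bit flags, so several can be combined with a bitwise OR before being passed to a setter such as `SetFlags`. The sketch below shows that pattern in plain Go. It does not use the package itself: `texFlags` and its constants are local, hypothetical stand-ins, and their numeric values are assumed to match the `CU_TRSF_*` definitions in the CUDA driver header (`0x01`, `0x02`, `0x10`).

```go
package main

import "fmt"

// texFlags is a local stand-in for a byte-sized flag type such as TexRefFlags.
type texFlags byte

// Values assumed from the CUDA driver header; they are not read from the
// package. Each flag occupies a distinct bit.
const (
	readAsInteger        texFlags = 0x01 // CU_TRSF_READ_AS_INTEGER
	normalizeCoordinates texFlags = 0x02 // CU_TRSF_NORMALIZED_COORDINATES
	srgb                 texFlags = 0x10 // CU_TRSF_SRGB
)

// has reports whether every bit in want is set in f.
func (f texFlags) has(want texFlags) bool {
	return f&want == want
}

func main() {
	// Combine two independent behaviors into one flag value.
	flags := normalizeCoordinates | srgb

	fmt.Printf("combined value: %#02x\n", byte(flags))
	fmt.Println("normalized coordinates set:", flags.has(normalizeCoordinates))
	fmt.Println("read-as-integer set:", flags.has(readAsInteger))
}
```

Because each flag is a separate bit, testing for one flag does not depend on whether the others are set.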

                                                                                                                                                                                                                                Directories

                                                                                                                                                                                                                                Path Synopsis
                                                                                                                                                                                                                                cmd
                                                                                                                                                                                                                                cudatest
                                                                                                                                                                                                                                cudatest tests the existence of CUDA by running a simple Go program that uses CUDA.
                                                                                                                                                                                                                                gencublas
                                                                                                                                                                                                                                generate_blas creates a blas.go file from the provided C header file with optionally added documentation from the documentation package.
                                                                                                                                                                                                                                dnn