java.lang.Object
  jcuda.driver.JCudaDriver
public class JCudaDriver
Java bindings for the NVIDIA CUDA driver API.
Most comments are extracted from the CUDA online documentation.
Field Summary | |
---|---|
static int |
CU_MEMHOSTALLOC_DEVICEMAP
If set, host memory is mapped into CUDA address space and cuMemHostGetDevicePointer(CUdeviceptr, Pointer, int) may be called on the host pointer. |
static int |
CU_MEMHOSTALLOC_PORTABLE
If set, host memory is portable between CUDA contexts. |
static int |
CU_MEMHOSTALLOC_WRITECOMBINED
If set, host memory is allocated as write-combined - fast to write, faster to DMA, slow to read except via SSE4 streaming load instruction (MOVNTDQA). |
static int |
CU_PARAM_TR_DEFAULT
For texture references loaded into the module, use default texunit from texture reference |
static int |
CU_TRSA_OVERRIDE_FORMAT
Override the texref format with a format inferred from the array |
static int |
CU_TRSF_NORMALIZED_COORDINATES
Use normalized texture coordinates in the range [0,1) instead of [0,dim) |
static int |
CU_TRSF_READ_AS_INTEGER
Read the texture as integers rather than promoting the values to floats in the range [0,1] |
static int |
CUDA_ARRAY3D_2DARRAY
If set, the CUDA array contains an array of 2D slices and the Depth member of CUDA_ARRAY3D_DESCRIPTOR specifies the number of slices, not the depth of a 3D array. |
static int |
CUDA_ARRAY3D_SURFACE_LDST
This flag must be set in order to bind a surface reference to the CUDA array |
static int |
CUDA_VERSION
The CUDA version |
Method Summary | |
---|---|
static int |
align(int value,
int alignment)
Returns the given (address) value, adjusted to have the given alignment. |
static int |
cuArray3DCreate(CUarray pHandle,
CUDA_ARRAY3D_DESCRIPTOR pAllocateArray)
Creates a 3D CUDA array. |
static int |
cuArray3DGetDescriptor(CUDA_ARRAY3D_DESCRIPTOR pArrayDescriptor,
CUarray hArray)
Get a 3D CUDA array descriptor. |
static int |
cuArrayCreate(CUarray pHandle,
CUDA_ARRAY_DESCRIPTOR pAllocateArray)
Creates a 1D or 2D CUDA array. |
static int |
cuArrayDestroy(CUarray hArray)
Destroys a CUDA array. |
static int |
cuArrayGetDescriptor(CUDA_ARRAY_DESCRIPTOR pArrayDescriptor,
CUarray hArray)
Get a 1D or 2D CUDA array descriptor. |
static int |
cuCtxAttach(CUcontext pctx,
int flags)
Increment a context's usage-count. |
static int |
cuCtxCreate(CUcontext pctx,
int flags,
CUdevice dev)
Create a CUDA context. |
static int |
cuCtxDestroy(CUcontext ctx)
Destroy the current context or a floating CUDA context. |
static int |
cuCtxDetach(CUcontext ctx)
Decrement a context's usage-count. |
static int |
cuCtxGetDevice(CUdevice device)
Returns the device ID for the current context. |
static int |
cuCtxGetLimit(long[] pvalue,
int limit)
Returns resource limits. |
static int |
cuCtxPopCurrent(CUcontext pctx)
Pops the current CUDA context from the current CPU thread. |
static int |
cuCtxPushCurrent(CUcontext ctx)
Pushes a floating context on the current CPU thread. |
static int |
cuCtxSetLimit(int limit,
long value)
Set resource limits. |
static int |
cuCtxSynchronize()
Block for a context's tasks to complete. |
static int |
cuDeviceComputeCapability(int[] major,
int[] minor,
CUdevice dev)
Returns the compute capability of the device. |
static int |
cuDeviceGet(CUdevice device,
int ordinal)
Returns a handle to a compute device. |
static int |
cuDeviceGetAttribute(int[] pi,
int attrib,
CUdevice dev)
Returns information about the device. |
static int |
cuDeviceGetCount(int[] count)
Returns the number of compute-capable devices. |
static int |
cuDeviceGetName(byte[] name,
int len,
CUdevice dev)
Returns an identifier string for the device. |
static int |
cuDeviceGetProperties(CUdevprop prop,
CUdevice dev)
Returns properties for a selected device. |
static int |
cuDeviceTotalMem(int[] bytes,
CUdevice dev)
Returns the total amount of memory on the device. |
static int |
cuDriverGetVersion(int[] driverVersion)
Returns the CUDA driver version. |
static int |
cuEventCreate(CUevent phEvent,
int Flags)
Creates an event. |
static int |
cuEventDestroy(CUevent hEvent)
Destroys an event. |
static int |
cuEventElapsedTime(float[] pMilliseconds,
CUevent hStart,
CUevent hEnd)
Computes the elapsed time between two events. |
static int |
cuEventQuery(CUevent hEvent)
Queries an event's status. |
static int |
cuEventRecord(CUevent hEvent,
CUstream hStream)
Records an event. |
static int |
cuEventSynchronize(CUevent hEvent)
Waits for an event to complete. |
static int |
cuFuncGetAttribute(int[] pi,
int attrib,
CUfunction func)
Returns information about a function. |
static int |
cuFuncSetBlockShape(CUfunction hfunc,
int x,
int y,
int z)
Sets the block-dimensions for the function. |
static int |
cuFuncSetCacheConfig(CUfunction hfunc,
int config)
Sets the preferred cache configuration for a device function. |
static int |
cuFuncSetSharedSize(CUfunction hfunc,
int bytes)
Sets the dynamic shared-memory size for the function. |
static int |
cuGLCtxCreate(CUcontext pCtx,
int Flags,
CUdevice device)
Create a CUDA context for interoperability with OpenGL. |
static int |
cuGLInit()
Initializes OpenGL interoperability. |
static int |
cuGLMapBufferObject(CUdeviceptr dptr,
int[] size,
int bufferobj)
Maps an OpenGL buffer object. |
static int |
cuGLMapBufferObjectAsync(CUdeviceptr dptr,
int[] size,
int buffer,
CUstream hStream)
Maps an OpenGL buffer object. |
static int |
cuGLRegisterBufferObject(int bufferobj)
Registers an OpenGL buffer object. |
static int |
cuGLSetBufferObjectMapFlags(int buffer,
int Flags)
Set the map flags for an OpenGL buffer object. |
static int |
cuGLUnmapBufferObject(int bufferobj)
Unmaps an OpenGL buffer object. |
static int |
cuGLUnmapBufferObjectAsync(int buffer,
CUstream hStream)
Unmaps an OpenGL buffer object. |
static int |
cuGLUnregisterBufferObject(int bufferobj)
Unregister an OpenGL buffer object. |
static int |
cuGraphicsGLRegisterBuffer(CUgraphicsResource pCudaResource,
int buffer,
int Flags)
Registers an OpenGL buffer object. |
static int |
cuGraphicsGLRegisterImage(CUgraphicsResource pCudaResource,
int image,
int target,
int Flags)
Register an OpenGL texture or renderbuffer object. |
static int |
cuGraphicsMapResources(int count,
CUgraphicsResource[] resources,
CUstream hStream)
Map graphics resources for access by CUDA. |
static int |
cuGraphicsResourceGetMappedPointer(CUdeviceptr pDevPtr,
int[] pSize,
CUgraphicsResource resource)
Get a device pointer through which to access a mapped graphics resource. |
static int |
cuGraphicsResourceSetMapFlags(CUgraphicsResource resource,
int flags)
Set usage flags for mapping a graphics resource. |
static int |
cuGraphicsSubResourceGetMappedArray(CUarray pArray,
CUgraphicsResource resource,
int arrayIndex,
int mipLevel)
Get an array through which to access a subresource of a mapped graphics resource. |
static int |
cuGraphicsUnmapResources(int count,
CUgraphicsResource[] resources,
CUstream hStream)
Unmap graphics resources. |
static int |
cuGraphicsUnregisterResource(CUgraphicsResource resource)
Unregisters a graphics resource for access by CUDA. |
static int |
cuInit(int Flags)
Initialize the CUDA driver API. |
static int |
cuLaunch(CUfunction f)
Launches a CUDA function. |
static int |
cuLaunchGrid(CUfunction f,
int grid_width,
int grid_height)
Launches a CUDA function. |
static int |
cuLaunchGridAsync(CUfunction f,
int grid_width,
int grid_height,
CUstream hStream)
Launches a CUDA function. |
static int |
cuMemAlloc(CUdeviceptr dptr,
int bytesize)
Allocates device memory. |
static int |
cuMemAllocHost(Pointer pointer,
int bytesize)
Allocates page-locked host memory. |
static int |
cuMemAllocPitch(CUdeviceptr dptr,
int[] pPitch,
int WidthInBytes,
int Height,
int ElementSizeBytes)
Allocates pitched device memory. |
static int |
cuMemcpy2D(CUDA_MEMCPY2D pCopy)
Copies memory for 2D arrays. |
static int |
cuMemcpy2DAsync(CUDA_MEMCPY2D pCopy,
CUstream hStream)
Copies memory for 2D arrays. |
static int |
cuMemcpy2DUnaligned(CUDA_MEMCPY2D pCopy)
Copies memory for 2D arrays. |
static int |
cuMemcpy3D(CUDA_MEMCPY3D pCopy)
Copies memory for 3D arrays. |
static int |
cuMemcpy3DAsync(CUDA_MEMCPY3D pCopy,
CUstream hStream)
Copies memory for 3D arrays. |
static int |
cuMemcpyAtoA(CUarray dstArray,
int dstIndex,
CUarray srcArray,
int srcIndex,
int ByteCount)
Copies memory from Array to Array. |
static int |
cuMemcpyAtoD(CUdeviceptr dstDevice,
CUarray hSrc,
int SrcIndex,
int ByteCount)
Copies memory from Array to Device. |
static int |
cuMemcpyAtoH(Pointer dstHost,
CUarray srcArray,
int srcIndex,
int ByteCount)
Copies memory from Array to Host. |
static int |
cuMemcpyAtoHAsync(Pointer dstHost,
CUarray srcArray,
int srcIndex,
int ByteCount,
CUstream hStream)
Copies memory from Array to Host. |
static int |
cuMemcpyDtoA(CUarray dstArray,
int dstIndex,
CUdeviceptr srcDevice,
int ByteCount)
Copies memory from Device to Array. |
static int |
cuMemcpyDtoD(CUdeviceptr dstDevice,
CUdeviceptr srcDevice,
int ByteCount)
Copies memory from Device to Device. |
static int |
cuMemcpyDtoDAsync(CUdeviceptr dstDevice,
CUdeviceptr srcDevice,
int ByteCount,
CUstream hStream)
Copies memory from Device to Device. |
static int |
cuMemcpyDtoH(Pointer dstHost,
CUdeviceptr srcDevice,
int ByteCount)
Copies memory from Device to Host. |
static int |
cuMemcpyDtoHAsync(Pointer dstHost,
CUdeviceptr srcDevice,
int ByteCount,
CUstream hStream)
Copies memory from Device to Host. |
static int |
cuMemcpyHtoA(CUarray dstArray,
int dstIndex,
Pointer pSrc,
int ByteCount)
Copies memory from Host to Array. |
static int |
cuMemcpyHtoAAsync(CUarray dstArray,
int dstIndex,
Pointer pSrc,
int ByteCount,
CUstream hStream)
Copies memory from Host to Array. |
static int |
cuMemcpyHtoD(CUdeviceptr dstDevice,
Pointer srcHost,
int ByteCount)
Copies memory from Host to Device. |
static int |
cuMemcpyHtoDAsync(CUdeviceptr dstDevice,
Pointer srcHost,
int ByteCount,
CUstream hStream)
Copies memory from Host to Device. |
static int |
cuMemFree(CUdeviceptr dptr)
Frees device memory. |
static int |
cuMemFreeHost(Pointer p)
Frees page-locked host memory. |
static int |
cuMemGetAddressRange(CUdeviceptr pbase,
int[] psize,
CUdeviceptr dptr)
Get information on memory allocations. |
static int |
cuMemGetInfo(int[] free,
int[] total)
Gets free and total memory. |
static int |
cuMemHostAlloc(Pointer pp,
long bytes,
int Flags)
Allocates page-locked host memory. |
static int |
cuMemHostGetDevicePointer(CUdeviceptr ret,
Pointer p,
int Flags)
Passes back device pointer of mapped pinned memory. |
static int |
cuMemHostGetFlags(int[] pFlags,
Pointer p)
Passes back flags that were used for a pinned allocation. |
static int |
cuMemsetD16(CUdeviceptr dstDevice,
short us,
int N)
Initializes device memory. |
static int |
cuMemsetD2D16(CUdeviceptr dstDevice,
int dstPitch,
short us,
int Width,
int Height)
Initializes device memory. |
static int |
cuMemsetD2D32(CUdeviceptr dstDevice,
int dstPitch,
int ui,
int Width,
int Height)
Initializes device memory. |
static int |
cuMemsetD2D8(CUdeviceptr dstDevice,
int dstPitch,
char uc,
int Width,
int Height)
Initializes device memory. |
static int |
cuMemsetD32(CUdeviceptr dstDevice,
int ui,
int N)
Initializes device memory. |
static int |
cuMemsetD8(CUdeviceptr dstDevice,
char uc,
int N)
Initializes device memory. |
static int |
cuModuleGetFunction(CUfunction hfunc,
CUmodule hmod,
java.lang.String name)
Returns a function handle. |
static int |
cuModuleGetGlobal(CUdeviceptr dptr,
int[] bytes,
CUmodule hmod,
java.lang.String name)
Returns a global pointer from a module. |
static int |
cuModuleGetSurfRef(CUsurfref pSurfRef,
CUmodule hmod,
java.lang.String name)
Returns a handle to a surface reference. |
static int |
cuModuleGetTexRef(CUtexref pTexRef,
CUmodule hmod,
java.lang.String name)
Returns a handle to a texture reference. |
static int |
cuModuleLoad(CUmodule module,
java.lang.String fname)
Loads a compute module. |
static int |
cuModuleLoadData(CUmodule module,
byte[] image)
Load a module's data. |
static int |
cuModuleLoadDataEx(CUmodule phMod,
Pointer p,
int numOptions,
int[] options,
Pointer optionValues)
Load a module's data with options. |
static int |
cuModuleLoadFatBinary(CUmodule module,
byte[] fatCubin)
Load a module's data. |
static int |
cuModuleUnload(CUmodule hmod)
Unloads a module. |
static int |
cuParamSetf(CUfunction hfunc,
int offset,
float value)
Adds a floating-point parameter to the function's argument list. |
static int |
cuParamSeti(CUfunction hfunc,
int offset,
int value)
Adds an integer parameter to the function's argument list. |
static int |
cuParamSetSize(CUfunction hfunc,
int numbytes)
Sets the parameter size for the function. |
static int |
cuParamSetTexRef(CUfunction hfunc,
int texunit,
CUtexref hTexRef)
Adds a texture-reference to the function's argument list. |
static int |
cuParamSetv(CUfunction hfunc,
int offset,
Pointer ptr,
int numbytes)
Adds arbitrary data to the function's argument list. |
static int |
cuStreamCreate(CUstream phStream,
int Flags)
Create a stream. |
static int |
cuStreamDestroy(CUstream hStream)
Destroys a stream. |
static int |
cuStreamQuery(CUstream hStream)
Determine status of a compute stream. |
static int |
cuStreamSynchronize(CUstream hStream)
Wait until a stream's tasks are completed. |
static int |
cuSurfRefGetArray(CUarray phArray,
CUsurfref hSurfRef)
Passes back the CUDA array bound to a surface reference. |
static int |
cuSurfRefSetArray(CUsurfref hSurfRef,
CUarray hArray,
int Flags)
Sets the CUDA array for a surface reference. |
static int |
cuTexRefCreate(CUtexref pTexRef)
Creates a texture reference. |
static int |
cuTexRefDestroy(CUtexref hTexRef)
Destroys a texture reference. |
static int |
cuTexRefGetAddress(CUdeviceptr pdptr,
CUtexref hTexRef)
Gets the address associated with a texture reference. |
static int |
cuTexRefGetAddressMode(int[] pam,
CUtexref hTexRef,
int dim)
Gets the addressing mode used by a texture reference. |
static int |
cuTexRefGetArray(CUarray phArray,
CUtexref hTexRef)
Gets the array bound to a texture reference. |
static int |
cuTexRefGetFilterMode(int[] pfm,
CUtexref hTexRef)
Gets the filter-mode used by a texture reference. |
static int |
cuTexRefGetFlags(int[] pFlags,
CUtexref hTexRef)
Gets the flags used by a texture reference. |
static int |
cuTexRefGetFormat(int[] pFormat,
int[] pNumChannels,
CUtexref hTexRef)
Gets the format used by a texture reference. |
static int |
cuTexRefSetAddress(int[] ByteOffset,
CUtexref hTexRef,
CUdeviceptr dptr,
int bytes)
Binds an address as a texture reference. |
static int |
cuTexRefSetAddress2D(CUtexref hTexRef,
CUDA_ARRAY_DESCRIPTOR desc,
CUdeviceptr dptr,
int PitchInBytes)
Binds an address as a 2D texture reference. |
static int |
cuTexRefSetAddressMode(CUtexref hTexRef,
int dim,
int am)
Sets the addressing mode for a texture reference. |
static int |
cuTexRefSetArray(CUtexref hTexRef,
CUarray hArray,
int Flags)
Binds an array as a texture reference. |
static int |
cuTexRefSetFilterMode(CUtexref hTexRef,
int fm)
Sets the filtering mode for a texture reference. |
static int |
cuTexRefSetFlags(CUtexref hTexRef,
int Flags)
Sets the flags for a texture reference. |
static int |
cuTexRefSetFormat(CUtexref hTexRef,
int fmt,
int NumPackedComponents)
Sets the format for a texture reference. |
static void |
setExceptionsEnabled(boolean enabled)
Enables or disables exceptions. |
static void |
setLogLevel(LogLevel logLevel)
Set the specified log level for the JCuda driver library. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int CUDA_VERSION
public static final int CU_MEMHOSTALLOC_PORTABLE
See Also: cuMemHostAlloc(jcuda.Pointer, long, int)
public static final int CU_MEMHOSTALLOC_DEVICEMAP
See Also: cuMemHostAlloc(jcuda.Pointer, long, int)
public static final int CU_MEMHOSTALLOC_WRITECOMBINED
See Also: cuMemHostAlloc(jcuda.Pointer, long, int)
public static final int CUDA_ARRAY3D_2DARRAY
public static final int CUDA_ARRAY3D_SURFACE_LDST
public static final int CU_PARAM_TR_DEFAULT
public static final int CU_TRSA_OVERRIDE_FORMAT
public static final int CU_TRSF_READ_AS_INTEGER
public static final int CU_TRSF_NORMALIZED_COORDINATES
Method Detail |
---|
public static void setLogLevel(LogLevel logLevel)
Parameters:
logLevel - The log level to use.
public static void setExceptionsEnabled(boolean enabled)
Parameters:
enabled - Whether exceptions are enabled
public static int align(int value, int alignment)
Parameters:
value - The address value
alignment - The desired alignment
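
The entry for align does not spell out how the adjustment is computed. The following is a minimal sketch, assuming that the value is rounded up to the next multiple of the alignment and that the alignment is a power of two. The class name AlignSketch and the method alignUp are hypothetical and not part of JCuda.

```java
// Hypothetical illustration; not the JCuda implementation.
final class AlignSketch {
    /**
     * Rounds value up to the next multiple of alignment.
     * Assumes alignment is a positive power of two.
     */
    static int alignUp(int value, int alignment) {
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0) {
            throw new IllegalArgumentException(
                "alignment must be a positive power of two");
        }
        // Add (alignment - 1), then clear the low bits.
        return (value + alignment - 1) & ~(alignment - 1);
    }

    public static void main(String[] args) {
        // Offsets as they might occur when laying out kernel parameters.
        System.out.println(alignUp(13, 8));
        System.out.println(alignUp(16, 8));
        System.out.println(alignUp(0, 4));
    }
}
```

A helper of this kind is typically applied to the running parameter offset before each cuParamSetv, cuParamSeti, or cuParamSetf call, so that every argument starts at an offset suitable for its type.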
public static int cuInit(int Flags)
cuInit(unsigned int Flags)
Initializes the driver API and must be called before any other function from the driver API. Currently, the Flags parameter must be 0. If cuInit() has not been called, any function from the driver API will return CUDA_ERROR_NOT_INITIALIZED.
public static int cuDeviceGet(CUdevice device, int ordinal)
cuDeviceGet(CUdevice *device, int ordinal)
Returns in *device a device handle given an ordinal in the range [0, cuDeviceGetCount()-1].
See Also: cuDeviceComputeCapability(int[], int[], jcuda.driver.CUdevice), cuDeviceGetAttribute(int[], int, jcuda.driver.CUdevice), cuDeviceGetCount(int[]), cuDeviceGetName(byte[], int, jcuda.driver.CUdevice), cuDeviceGetProperties(jcuda.driver.CUdevprop, jcuda.driver.CUdevice), cuDeviceTotalMem(int[], jcuda.driver.CUdevice)
public static int cuDeviceGetCount(int[] count)
cuDeviceGetCount(int *count)
Returns in *count the number of devices with compute capability greater than or equal to 1.0 that are available for execution. If there is no such device, cuDeviceGetCount() returns 0.
See Also: cuDeviceComputeCapability(int[], int[], jcuda.driver.CUdevice), cuDeviceGetAttribute(int[], int, jcuda.driver.CUdevice), cuDeviceGetName(byte[], int, jcuda.driver.CUdevice), cuDeviceGet(jcuda.driver.CUdevice, int), cuDeviceGetProperties(jcuda.driver.CUdevprop, jcuda.driver.CUdevice), cuDeviceTotalMem(int[], jcuda.driver.CUdevice)
public static int cuDeviceGetName(byte[] name, int len, CUdevice dev)
cuDeviceGetName(char *name, int len, CUdevice dev)
Returns an ASCII string identifying the device dev in the NULL-terminated string pointed to by name. len specifies the maximum length of the string that may be returned.
See Also: cuDeviceComputeCapability(int[], int[], jcuda.driver.CUdevice), cuDeviceGetAttribute(int[], int, jcuda.driver.CUdevice), cuDeviceGetCount(int[]), cuDeviceGet(jcuda.driver.CUdevice, int), cuDeviceGetProperties(jcuda.driver.CUdevprop, jcuda.driver.CUdevice), cuDeviceTotalMem(int[], jcuda.driver.CUdevice)
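
Because the Java binding of cuDeviceGetName receives the name in a byte[] rather than a String, the caller has to cut the buffer at the terminator. The following is a minimal sketch of that conversion in plain Java. The class DeviceNameSketch and the method toJavaString are hypothetical helpers, and the buffer contents are made up so that the example runs without a GPU.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of JCuda.
final class DeviceNameSketch {
    /** Returns the ASCII text before the first zero byte, or the whole buffer if there is none. */
    static String toJavaString(byte[] name) {
        int end = 0;
        while (end < name.length && name[end] != 0) {
            end++;
        }
        return new String(name, 0, end, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Stand-in for the buffer that cuDeviceGetName(name, name.length, dev) would fill.
        byte[] name = new byte[32];
        byte[] fake = "Example Device".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(fake, 0, name, 0, fake.length);
        System.out.println(toJavaString(name));
    }
}
```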
public static int cuDeviceComputeCapability(int[] major, int[] minor, CUdevice dev)
cuDeviceComputeCapability(int *major, int *minor, CUdevice dev)
Returns in *major and *minor the major and minor revision numbers that define the compute capability of the device dev.
See Also: cuDeviceGetAttribute(int[], int, jcuda.driver.CUdevice), cuDeviceGetCount(int[]), cuDeviceGetName(byte[], int, jcuda.driver.CUdevice), cuDeviceGet(jcuda.driver.CUdevice, int), cuDeviceGetProperties(jcuda.driver.CUdevprop, jcuda.driver.CUdevice), cuDeviceTotalMem(int[], jcuda.driver.CUdevice)
public static int cuDeviceTotalMem(int[] bytes, CUdevice dev)
cuDeviceTotalMem(unsigned int *bytes, CUdevice dev)
Returns in *bytes the total amount of memory available on the device dev in bytes.
See Also: cuDeviceComputeCapability(int[], int[], jcuda.driver.CUdevice), cuDeviceGetAttribute(int[], int, jcuda.driver.CUdevice), cuDeviceGetCount(int[]), cuDeviceGetName(byte[], int, jcuda.driver.CUdevice), cuDeviceGet(jcuda.driver.CUdevice, int), cuDeviceGetProperties(jcuda.driver.CUdevprop, jcuda.driver.CUdevice)
public static int cuDeviceGetProperties(CUdevprop prop, CUdevice dev)
cuDeviceGetProperties(CUdevprop *prop, CUdevice dev)
Returns in *prop the properties of device dev.
The CUdevprop structure is defined as:
    typedef struct CUdevprop_st {
        int maxThreadsPerBlock;
        int maxThreadsDim[3];
        int maxGridSize[3];
        int sharedMemPerBlock;
        int totalConstantMemory;
        int SIMDWidth;
        int memPitch;
        int regsPerBlock;
        int clockRate;
        int textureAlign;
    } CUdevprop;
See Also: cuDeviceComputeCapability(int[], int[], jcuda.driver.CUdevice), cuDeviceGetAttribute(int[], int, jcuda.driver.CUdevice), cuDeviceGetCount(int[]), cuDeviceGetName(byte[], int, jcuda.driver.CUdevice), cuDeviceGet(jcuda.driver.CUdevice, int), cuDeviceTotalMem(int[], jcuda.driver.CUdevice)
public static int cuDeviceGetAttribute(int[] pi, int attrib, CUdevice dev)
cuDeviceGetAttribute(int *pi, CUdevice_attribute attrib, CUdevice dev)
Returns in *pi the integer value of the attribute attrib on device dev. The supported attributes are:
See Also: cuDeviceComputeCapability(int[], int[], jcuda.driver.CUdevice), cuDeviceGetCount(int[]), cuDeviceGetName(byte[], int, jcuda.driver.CUdevice), cuDeviceGet(jcuda.driver.CUdevice, int), cuDeviceGetProperties(jcuda.driver.CUdevprop, jcuda.driver.CUdevice), cuDeviceTotalMem(int[], jcuda.driver.CUdevice)
public static int cuDriverGetVersion(int[] driverVersion)
cuDriverGetVersion(int *driverVersion)
Returns in *driverVersion the version number of the installed CUDA driver. This function automatically returns CUDA_ERROR_INVALID_VALUE if the driverVersion argument is NULL.
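
The entry for cuDriverGetVersion does not say how the version number is encoded. The following sketch assumes the usual CUDA convention of 1000 * major + 10 * minor, under which 3020 denotes version 3.2. The class and method names are hypothetical, and the input value is made up so that the example runs without a driver.

```java
// Hypothetical helper; assumes version = 1000 * major + 10 * minor.
final class DriverVersionSketch {
    static int major(int version) {
        return version / 1000;
    }

    static int minor(int version) {
        return (version % 1000) / 10;
    }

    static String format(int version) {
        return major(version) + "." + minor(version);
    }

    public static void main(String[] args) {
        // Stand-in for the array filled by cuDriverGetVersion(driverVersion).
        int[] driverVersion = { 3020 };
        System.out.println(format(driverVersion[0]));
    }
}
```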
public static int cuCtxCreate(CUcontext pctx, int flags, CUdevice dev)
cuCtxCreate(CUcontext *pctx, unsigned int flags, CUdevice dev)
Creates a new CUDA context and associates it with the calling thread. The flags parameter is described below. The context is created with a usage count of 1 and the caller of cuCtxCreate() must call cuCtxDestroy() or cuCtxDetach() when done using the context. If a context is already current to the thread, it is supplanted by the newly created context and may be restored by a subsequent call to cuCtxPopCurrent().
The two LSBs of the flags parameter can be used to control how the OS thread, which owns the CUDA context at the time of an API call, interacts with the OS scheduler when waiting for results from the GPU.
If the flags parameter is zero, CUDA uses a heuristic based on the number of active CUDA contexts in the process C and the number of logical processors in the system P. If C > P, then CUDA will yield to other OS threads when waiting for the GPU, otherwise CUDA will not yield while waiting for results and actively spin on the processor.
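
The default heuristic described above can be written as a small decision function. The sketch below illustrates the documented rule only and is not the driver's code. The enum WaitPolicy and the method defaultPolicy are hypothetical names.

```java
// Hypothetical model of the documented default scheduling rule.
final class SchedHeuristicSketch {
    enum WaitPolicy { YIELD, SPIN }

    /**
     * activeContexts corresponds to C and logicalProcessors to P in the text:
     * yield when C > P, otherwise spin.
     */
    static WaitPolicy defaultPolicy(int activeContexts, int logicalProcessors) {
        return activeContexts > logicalProcessors ? WaitPolicy.YIELD : WaitPolicy.SPIN;
    }

    public static void main(String[] args) {
        int p = Runtime.getRuntime().availableProcessors();
        // One context on this machine, then more contexts than processors.
        System.out.println(defaultPolicy(1, p));
        System.out.println(defaultPolicy(p + 1, p));
    }
}
```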
Note to Linux users:
Context creation will fail with CUDA_ERROR_UNKNOWN if the compute mode of the device is CU_COMPUTEMODE_PROHIBITED. Similarly, context creation will also fail with CUDA_ERROR_UNKNOWN if the compute mode for the device is set to CU_COMPUTEMODE_EXCLUSIVE and there is already an active context on the device. The function cuDeviceGetAttribute() can be used with CU_DEVICE_ATTRIBUTE_COMPUTE_MODE to determine the compute mode of the device. The nvidia-smi tool can be used to set the compute mode for devices. Documentation for nvidia-smi can be obtained by passing a -h option to it.
See Also: cuCtxAttach(jcuda.driver.CUcontext, int), cuCtxDestroy(jcuda.driver.CUcontext), cuCtxDetach(jcuda.driver.CUcontext), cuCtxGetDevice(jcuda.driver.CUdevice), cuCtxPopCurrent(jcuda.driver.CUcontext), cuCtxPushCurrent(jcuda.driver.CUcontext), cuCtxSynchronize()
public static int cuCtxDestroy(CUcontext ctx)
cuCtxDestroy(CUcontext ctx)
Destroys the CUDA context specified by ctx. If the context usage count is not equal to 1, or the context is current to any CPU thread other than the current one, this function fails. Floating contexts (detached from a CPU thread via cuCtxPopCurrent()) may be destroyed by this function.
See Also: cuCtxAttach(jcuda.driver.CUcontext, int), cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuCtxDetach(jcuda.driver.CUcontext), cuCtxGetDevice(jcuda.driver.CUdevice), cuCtxPopCurrent(jcuda.driver.CUcontext), cuCtxPushCurrent(jcuda.driver.CUcontext), cuCtxSynchronize()
public static int cuCtxAttach(CUcontext pctx, int flags)
cuCtxAttach(CUcontext *pctx, unsigned int flags)
Increments the usage count of the context and passes back a context handle in *pctx that must be passed to cuCtxDetach() when the application is done with the context. cuCtxAttach() fails if there is no context current to the thread.
Currently, the flags parameter must be 0.
See Also: cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuCtxDestroy(jcuda.driver.CUcontext), cuCtxDetach(jcuda.driver.CUcontext), cuCtxGetDevice(jcuda.driver.CUdevice), cuCtxPopCurrent(jcuda.driver.CUcontext), cuCtxPushCurrent(jcuda.driver.CUcontext), cuCtxSynchronize()
public static int cuCtxDetach(CUcontext ctx)
cuCtxDetach(CUcontext ctx)
Decrements the usage count of the context ctx, and destroys the context if the usage count goes to 0. The context must be a handle that was passed back by cuCtxCreate() or cuCtxAttach(), and must be current to the calling thread.
See Also: cuCtxAttach(jcuda.driver.CUcontext, int), cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuCtxDestroy(jcuda.driver.CUcontext), cuCtxGetDevice(jcuda.driver.CUdevice), cuCtxPopCurrent(jcuda.driver.CUcontext), cuCtxPushCurrent(jcuda.driver.CUcontext), cuCtxSynchronize()
public static int cuCtxPushCurrent(CUcontext ctx)
cuCtxPushCurrent(CUcontext ctx)
Pushes the given context ctx onto the CPU thread's stack of current contexts. The specified context becomes the CPU thread's current context, so all CUDA functions that operate on the current context are affected.
The previous current context may be made current again by calling cuCtxDestroy() or cuCtxPopCurrent().
The context must be "floating," i.e. not attached to any thread. Contexts are made to float by calling cuCtxPopCurrent().
See Also: cuCtxAttach(jcuda.driver.CUcontext, int), cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuCtxDestroy(jcuda.driver.CUcontext), cuCtxDetach(jcuda.driver.CUcontext), cuCtxGetDevice(jcuda.driver.CUdevice), cuCtxPopCurrent(jcuda.driver.CUcontext), cuCtxSynchronize()
public static int cuCtxPopCurrent(CUcontext pctx)
cuCtxPopCurrent(CUcontext *pctx)
Pops the current CUDA context from the CPU thread. The CUDA context must have a usage count of 1. CUDA contexts have a usage count of 1 upon creation; the usage count may be incremented with cuCtxAttach() and decremented with cuCtxDetach().
If successful, cuCtxPopCurrent() passes back the old context handle in *pctx. That context may then be made current to a different CPU thread by calling cuCtxPushCurrent().
Floating contexts may be destroyed by calling cuCtxDestroy().
If a context was current to the CPU thread before cuCtxCreate() or cuCtxPushCurrent() was called, this function makes that context current to the CPU thread again.
See Also: cuCtxAttach(jcuda.driver.CUcontext, int), cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuCtxDestroy(jcuda.driver.CUcontext), cuCtxDetach(jcuda.driver.CUcontext), cuCtxGetDevice(jcuda.driver.CUdevice), cuCtxPushCurrent(jcuda.driver.CUcontext), cuCtxSynchronize()
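
The push and pop rules in the cuCtxPushCurrent and cuCtxPopCurrent entries behave like a per-thread stack. The toy model below, in plain Java, mirrors only the stack behaviour described in the text. It ignores usage counts and does not call the driver. ContextStackSketch and its methods are hypothetical, and contexts are represented by simple strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of one CPU thread's stack of current contexts.
final class ContextStackSketch {
    private final Deque<String> stack = new ArrayDeque<>();

    /** Models cuCtxPushCurrent: ctx becomes the current context. */
    void push(String ctx) {
        stack.push(ctx);
    }

    /** Models cuCtxPopCurrent: returns the old current context, which is now floating. */
    String pop() {
        if (stack.isEmpty()) {
            throw new IllegalStateException("no current context");
        }
        return stack.pop();
    }

    /** The current context, or null if the thread has none. */
    String current() {
        return stack.peek();
    }

    public static void main(String[] args) {
        ContextStackSketch thread = new ContextStackSketch();
        thread.push("A");
        thread.push("B");               // B supplants A as the current context
        String floating = thread.pop(); // B floats, A is current again
        System.out.println(floating + " " + thread.current());
    }
}
```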
public static int cuCtxGetDevice(CUdevice device)
cuCtxGetDevice(CUdevice *device)
Returns in *device the ordinal of the current context's device.
See Also: cuCtxAttach(jcuda.driver.CUcontext, int), cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuCtxDestroy(jcuda.driver.CUcontext), cuCtxDetach(jcuda.driver.CUcontext), cuCtxPopCurrent(jcuda.driver.CUcontext), cuCtxPushCurrent(jcuda.driver.CUcontext), cuCtxSynchronize()
public static int cuCtxSynchronize()
cuCtxSynchronize(void)
Blocks until the device has completed all preceding requested tasks. cuCtxSynchronize() returns an error if one of the preceding tasks failed. If the context was created with the CU_CTX_BLOCKING_SYNC flag, the CPU thread will block until the GPU context has finished its work.
See Also: cuCtxAttach(jcuda.driver.CUcontext, int), cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuCtxDestroy(jcuda.driver.CUcontext), cuCtxDetach(jcuda.driver.CUcontext), cuCtxGetDevice(jcuda.driver.CUdevice), cuCtxPopCurrent(jcuda.driver.CUcontext), cuCtxSynchronize()
public static int cuModuleLoad(CUmodule module, java.lang.String fname)
cuModuleLoad(CUmodule *module, const char *fname)
Takes a filename fname and loads the corresponding module module into the current context. The CUDA driver API does not attempt to lazily allocate the resources needed by a module; if the memory for functions and data (constant and global) needed by the module cannot be allocated, cuModuleLoad() fails. The file should be a cubin file as output by nvcc or a PTX file, either as output by nvcc or handwritten.
See Also: cuModuleGetFunction(jcuda.driver.CUfunction, jcuda.driver.CUmodule, java.lang.String), cuModuleGetGlobal(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUmodule, java.lang.String), cuModuleGetTexRef(jcuda.driver.CUtexref, jcuda.driver.CUmodule, java.lang.String), cuModuleLoadData(jcuda.driver.CUmodule, byte[]), cuModuleLoadDataEx(jcuda.driver.CUmodule, jcuda.Pointer, int, int[], jcuda.Pointer), cuModuleLoadFatBinary(jcuda.driver.CUmodule, byte[]), cuModuleUnload(jcuda.driver.CUmodule)
public static int cuModuleLoadData(CUmodule module, byte[] image)
cuModuleLoadData(CUmodule *module, const void *image)
Takes a pointer image and loads the corresponding module module into the current context. The pointer may be obtained by mapping a cubin or PTX file, passing a cubin or PTX file as a NULL-terminated text string, or incorporating a cubin object into the executable resources and using operating system calls such as Windows FindResource() to obtain the pointer.
See Also: cuModuleGetFunction(jcuda.driver.CUfunction, jcuda.driver.CUmodule, java.lang.String), cuModuleGetGlobal(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUmodule, java.lang.String), cuModuleGetTexRef(jcuda.driver.CUtexref, jcuda.driver.CUmodule, java.lang.String), cuModuleLoad(jcuda.driver.CUmodule, java.lang.String), cuModuleLoadDataEx(jcuda.driver.CUmodule, jcuda.Pointer, int, int[], jcuda.Pointer), cuModuleLoadFatBinary(jcuda.driver.CUmodule, byte[]), cuModuleUnload(jcuda.driver.CUmodule)
public static int cuModuleLoadDataEx(CUmodule phMod, Pointer p, int numOptions, int[] options, Pointer optionValues)
cuModuleLoadDataEx(CUmodule *module, const void *image, unsigned int numOptions, CUjit_option *options, void **optionValues)
Takes a pointer image and loads the corresponding module module into the current context. The pointer may be obtained by mapping a cubin or PTX file, passing a cubin or PTX file as a NULL-terminated text string, or incorporating a cubin object into the executable resources and using operating system calls such as Windows FindResource() to obtain the pointer. Options are passed as an array via options and any corresponding parameters are passed in optionValues. The number of total options is supplied via numOptions. Any outputs will be returned via optionValues. Supported options are (types for the option values are specified in parentheses after the option name):
See Also: cuModuleGetFunction(jcuda.driver.CUfunction, jcuda.driver.CUmodule, java.lang.String), cuModuleGetGlobal(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUmodule, java.lang.String), cuModuleGetTexRef(jcuda.driver.CUtexref, jcuda.driver.CUmodule, java.lang.String), cuModuleLoad(jcuda.driver.CUmodule, java.lang.String), cuModuleLoadData(jcuda.driver.CUmodule, byte[]), cuModuleLoadFatBinary(jcuda.driver.CUmodule, byte[]), cuModuleUnload(jcuda.driver.CUmodule)
public static int cuModuleLoadFatBinary(CUmodule module, byte[] fatCubin)
cuModuleLoadFatBinary(CUmodule *module, const void *fatCubin)
Takes a pointer fatCubin and loads the corresponding module module into the current context. The pointer represents a fat binary object, which is a collection of different cubin files, all representing the same device code, but compiled and optimized for different architectures. There is currently no documented API for constructing and using fat binary objects by programmers, and therefore this function is an internal function in this version of CUDA. More information can be found in the nvcc document.
See Also: cuModuleGetFunction(jcuda.driver.CUfunction, jcuda.driver.CUmodule, java.lang.String), cuModuleGetGlobal(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUmodule, java.lang.String), cuModuleGetTexRef(jcuda.driver.CUtexref, jcuda.driver.CUmodule, java.lang.String), cuModuleLoad(jcuda.driver.CUmodule, java.lang.String), cuModuleLoadData(jcuda.driver.CUmodule, byte[]), cuModuleLoadDataEx(jcuda.driver.CUmodule, jcuda.Pointer, int, int[], jcuda.Pointer), cuModuleUnload(jcuda.driver.CUmodule)
public static int cuModuleUnload(CUmodule hmod)
cuModuleUnload(CUmodule hmod)
Unloads a module hmod from the current context.
cuModuleGetFunction(jcuda.driver.CUfunction, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetGlobal(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetTexRef(jcuda.driver.CUtexref, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoad(jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoadData(jcuda.driver.CUmodule, byte[])
,
cuModuleLoadDataEx(jcuda.driver.CUmodule, jcuda.Pointer, int, int[], jcuda.Pointer)
,
cuModuleLoadFatBinary(jcuda.driver.CUmodule, byte[])
public static int cuModuleGetFunction(CUfunction hfunc, CUmodule hmod, java.lang.String name)
cuModuleGetFunction(CUfunction *hfunc, CUmodule hmod, const char *name)
Returns in *hfunc the handle of the function of name name located in module hmod. If no function of that name exists, cuModuleGetFunction() returns CUDA_ERROR_NOT_FOUND.
cuModuleGetGlobal(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetTexRef(jcuda.driver.CUtexref, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoad(jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoadData(jcuda.driver.CUmodule, byte[])
,
cuModuleLoadDataEx(jcuda.driver.CUmodule, jcuda.Pointer, int, int[], jcuda.Pointer)
,
cuModuleLoadFatBinary(jcuda.driver.CUmodule, byte[])
,
cuModuleUnload(jcuda.driver.CUmodule)
public static int cuModuleGetGlobal(CUdeviceptr dptr, int[] bytes, CUmodule hmod, java.lang.String name)
cuModuleGetGlobal(CUdeviceptr *dptr, unsigned int *bytes, CUmodule hmod, const char *name)
Returns in *dptr and *bytes the base pointer and size of the global of name name located in module hmod. If no variable of that name exists, cuModuleGetGlobal() returns CUDA_ERROR_NOT_FOUND. Both parameters dptr and bytes are optional. If one of them is NULL, it is ignored.
cuModuleGetFunction(jcuda.driver.CUfunction, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetTexRef(jcuda.driver.CUtexref, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoad(jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoadData(jcuda.driver.CUmodule, byte[])
,
cuModuleLoadDataEx(jcuda.driver.CUmodule, jcuda.Pointer, int, int[], jcuda.Pointer)
,
cuModuleLoadFatBinary(jcuda.driver.CUmodule, byte[])
,
cuModuleUnload(jcuda.driver.CUmodule)
public static int cuModuleGetTexRef(CUtexref pTexRef, CUmodule hmod, java.lang.String name)
cuModuleGetTexRef(CUtexref *pTexRef, CUmodule hmod, const char *name)
Returns in *pTexRef the handle of the texture reference of name name in the module hmod. If no texture reference of that name exists, cuModuleGetTexRef() returns CUDA_ERROR_NOT_FOUND. This texture reference handle should not be destroyed, since it will be destroyed when the module is unloaded.
cuModuleGetFunction(jcuda.driver.CUfunction, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetGlobal(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetSurfRef(jcuda.driver.CUsurfref, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoad(jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoadData(jcuda.driver.CUmodule, byte[])
,
cuModuleLoadDataEx(jcuda.driver.CUmodule, jcuda.Pointer, int, int[], jcuda.Pointer)
,
cuModuleLoadFatBinary(jcuda.driver.CUmodule, byte[])
,
cuModuleUnload(jcuda.driver.CUmodule)
public static int cuModuleGetSurfRef(CUsurfref pSurfRef, CUmodule hmod, java.lang.String name)
cuModuleGetSurfRef(CUsurfref *pSurfRef, CUmodule hmod, const char *name)
Returns in *pSurfRef the handle of the surface reference of name name in the module hmod. If no surface reference of that name exists, cuModuleGetSurfRef() returns CUDA_ERROR_NOT_FOUND.
cuModuleGetFunction(jcuda.driver.CUfunction, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetGlobal(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUmodule, java.lang.String)
,
cuModuleGetTexRef(jcuda.driver.CUtexref, jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoad(jcuda.driver.CUmodule, java.lang.String)
,
cuModuleLoadData(jcuda.driver.CUmodule, byte[])
,
cuModuleLoadDataEx(jcuda.driver.CUmodule, jcuda.Pointer, int, int[], jcuda.Pointer)
,
cuModuleLoadFatBinary(jcuda.driver.CUmodule, byte[])
,
cuModuleUnload(jcuda.driver.CUmodule)
public static int cuMemGetInfo(int[] free, int[] total)
cuMemGetInfo(unsigned int *free, unsigned int *total)
Returns in *free and *total, respectively, the free and total amounts of memory available for allocation by the CUDA context, in bytes.
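The C prototype reports both values as unsigned int, while the Java signature uses int[]. The following is a minimal sketch, assuming the binding stores the raw 32 bits unchanged, of reading such a value as an unsigned byte count. UnsignedByteCount and toUnsigned are hypothetical names, and the arrays are filled with simulated values rather than by a real cuMemGetInfo call.

```java
// Hypothetical helper, not part of JCuda.
final class UnsignedByteCount {
    /** Reinterprets the 32 bits of a Java int as an unsigned value. */
    static long toUnsigned(int raw) {
        return raw & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        // Simulated output arrays; a real program would pass these to cuMemGetInfo.
        int[] free = { (int) 3_000_000_000L };  // wraps to a negative int
        int[] total = { (int) 4_000_000_000L };
        System.out.println("free  = " + toUnsigned(free[0]) + " bytes");
        System.out.println("total = " + toUnsigned(total[0]) + " bytes");
    }
}
```

Without such a conversion, any amount of 2 GB or more would appear as a negative number in Java.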
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemHostAlloc(Pointer pp, long bytes, int Flags)
cuMemHostAlloc(void **pp, size_t bytesize, unsigned int Flags)
Allocates bytesize bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cuMemcpyHtoD(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of pinned memory may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.
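The Flags parameter enables different options to be specified that affect the allocation, as follows.
CU_MEMHOSTALLOC_PORTABLE: The host memory is portable between CUDA contexts.
CU_MEMHOSTALLOC_DEVICEMAP: The host memory is mapped into the CUDA address space, and cuMemHostGetDevicePointer() may be called on the host pointer.
CU_MEMHOSTALLOC_WRITECOMBINED: The host memory is allocated as write-combined: fast to write, faster to DMA, slow to read except via the SSE4 streaming load instruction (MOVNTDQA).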
All of these flags are orthogonal to one another: a developer may allocate memory that is portable, mapped and/or write-combined with no restrictions.
The CUDA context must have been created with the CU_CTX_MAP_HOST flag in order for the CU_MEMHOSTALLOC_DEVICEMAP flag to have any effect.
The CU_MEMHOSTALLOC_DEVICEMAP flag may be specified on CUDA contexts for devices that do not support mapped pinned memory. The failure is deferred to cuMemHostGetDevicePointer() because the memory may be mapped into other CUDA contexts via the CU_MEMHOSTALLOC_PORTABLE flag.
The memory allocated by this function must be freed with cuMemFreeHost().
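Because the allocation flags are orthogonal, they can be combined with a bitwise OR before being passed as the Flags argument. The sketch below uses stand-in constants so that it runs without JCuda. The values 0x01, 0x02 and 0x04 mirror the CUDA driver header and are assumed, not confirmed, to match the JCudaDriver fields of this version. HostAllocFlags and isSet are hypothetical names.

```java
// Hypothetical illustration, not part of JCuda.
final class HostAllocFlags {
    // Stand-in constants; real code would use the JCudaDriver.CU_MEMHOSTALLOC_* fields.
    static final int PORTABLE = 0x01;
    static final int DEVICEMAP = 0x02;
    static final int WRITECOMBINED = 0x04;

    /** Returns true if the given single-bit flag is present in the combined value. */
    static boolean isSet(int flags, int flag) {
        return (flags & flag) != 0;
    }

    public static void main(String[] args) {
        // Request memory that is both portable and mapped into the device address space.
        int flags = PORTABLE | DEVICEMAP;
        System.out.println("portable:       " + isSet(flags, PORTABLE));
        System.out.println("device-mapped:  " + isSet(flags, DEVICEMAP));
        System.out.println("write-combined: " + isSet(flags, WRITECOMBINED));
    }
}
```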
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemHostGetDevicePointer(CUdeviceptr ret, Pointer p, int Flags)
cuMemHostGetDevicePointer(CUdeviceptr *pdptr, void *p, unsigned int Flags)
Passes back the device pointer pdptr corresponding to the mapped, pinned host buffer p allocated by cuMemHostAlloc().
cuMemHostGetDevicePointer() will fail if the CU_MEMHOSTALLOC_DEVICEMAP flag was not specified at the time the memory was allocated, or if the function is called on a GPU that does not support mapped pinned memory.
Flags is reserved for future releases. For now, it must be set to 0.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemHostGetFlags(int[] pFlags, Pointer p)
cuMemHostGetFlags(unsigned int *pFlags, void *p)
Passes back in pFlags the flags that were specified when the pinned host buffer p was allocated by cuMemHostAlloc().
cuMemHostGetFlags() will fail if the pointer does not reside in an allocation performed by cuMemAllocHost() or cuMemHostAlloc().
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemHostAlloc(jcuda.Pointer, long, int)
public static int cuMemAlloc(CUdeviceptr dptr, int bytesize)
cuMemAlloc(CUdeviceptr *dptr, unsigned int bytesize)
Allocates bytesize bytes of linear memory on the device and returns in *dptr a pointer to the allocated memory. The allocated memory is suitably aligned for any kind of variable. The memory is not cleared. If bytesize is 0, cuMemAlloc() returns CUDA_ERROR_INVALID_VALUE.
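Since the Java signature takes bytesize as an int, a byte count computed from an element count can silently overflow, and a count of 0 is rejected by the driver. A minimal sketch of a guard that could be applied before calling cuMemAlloc follows. AllocationSize and byteSize are hypothetical names, not part of JCuda.

```java
// Hypothetical helper, not part of JCuda.
final class AllocationSize {
    /**
     * Computes elementCount * bytesPerElement as an int, rejecting
     * non-positive inputs and throwing ArithmeticException on int overflow.
     */
    static int byteSize(int elementCount, int bytesPerElement) {
        if (elementCount <= 0 || bytesPerElement <= 0) {
            throw new IllegalArgumentException("element count and element size must be positive");
        }
        return Math.multiplyExact(elementCount, bytesPerElement);
    }

    public static void main(String[] args) {
        // Byte count for 1024 float values.
        System.out.println(byteSize(1024, Float.BYTES));
    }
}
```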
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemAllocPitch(CUdeviceptr dptr, int[] pPitch, int WidthInBytes, int Height, int ElementSizeBytes)
cuMemAllocPitch(CUdeviceptr *dptr, unsigned int *pPitch, unsigned int WidthInBytes, unsigned int Height, unsigned int ElementSizeBytes)
Allocates at least WidthInBytes * Height bytes of linear memory on the device and returns in *dptr a pointer to the allocated memory. The function may pad the allocation to ensure that corresponding pointers in any given row will continue to meet the alignment requirements for coalescing as the address is updated from row to row. ElementSizeBytes specifies the size of the largest reads and writes that will be performed on the memory range. ElementSizeBytes may be 4, 8 or 16 (since coalesced memory transactions are not possible on other data sizes). If ElementSizeBytes is smaller than the actual read/write size of a kernel, the kernel will run correctly, but possibly at reduced speed. The pitch returned in *pPitch by cuMemAllocPitch() is the width in bytes of the allocation. The intended usage of pitch is as a separate parameter of the allocation, used to compute addresses within the 2D array. Given the row and column of an array element of type T, the address is computed as:
T* pElement = (T*)((char*)BaseAddress + Row * Pitch) + Column;
The pitch returned by cuMemAllocPitch() is guaranteed to work with cuMemcpy2D() under all circumstances. For allocations of 2D arrays, it is recommended that programmers consider performing pitch allocations using cuMemAllocPitch(). Due to alignment restrictions in the hardware, this is especially true if the application will be performing 2D memory copies between different regions of device memory (whether linear memory or CUDA arrays).
The byte alignment of the pitch returned by cuMemAllocPitch() is guaranteed to match or exceed the alignment requirement for texture binding with cuTexRefSetAddress2D().
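The address formula for pitched allocations can be modeled in plain Java as a byte offset from the base address. This is only a sketch of the arithmetic: PitchAddressing and elementOffset are hypothetical names, and the pitch of 512 bytes is an assumed value, since the real pitch is whatever cuMemAllocPitch() returns in pPitch.

```java
// Hypothetical helper, not part of JCuda; performs no CUDA calls.
final class PitchAddressing {
    /**
     * Byte offset of element (row, column) from the base address of a pitched
     * allocation. elementSizeInBytes plays the role of sizeof(T) in the C formula,
     * where Column is added after the cast to T*.
     */
    static long elementOffset(long pitchInBytes, int row, int column, int elementSizeInBytes) {
        return (long) row * pitchInBytes + (long) column * elementSizeInBytes;
    }

    public static void main(String[] args) {
        long pitch = 512; // assumed pitch; the real value comes from cuMemAllocPitch
        // Row 2, column 3 of 4-byte elements: 2 * 512 + 3 * 4 = 1036
        System.out.println(elementOffset(pitch, 2, 3, 4));
    }
}
```

Note that the row stride is the pitch, not WidthInBytes, so the offset of the first element in row 1 is the pitch even when the requested width is smaller.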
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemFree(CUdeviceptr dptr)
cuMemFree(CUdeviceptr dptr)
Frees the memory space pointed to by dptr, which must have been returned by a previous call to cuMemAlloc() or cuMemAllocPitch().
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemGetAddressRange(CUdeviceptr pbase, int[] psize, CUdeviceptr dptr)
cuMemGetAddressRange(CUdeviceptr *pbase, unsigned int *psize, CUdeviceptr dptr)
Returns the base address in *pbase and size in *psize of the allocation by cuMemAlloc() or cuMemAllocPitch() that contains the input pointer dptr. Both parameters pbase and psize are optional. If one of them is NULL, it is ignored.
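The lookup semantics described for cuMemGetAddressRange, including the optional output parameters, can be modeled in plain Java. This sketch is not the driver's implementation. AddressRangeModel, its methods and the NOT_IN_ANY_ALLOCATION code are hypothetical, and addresses are represented as plain long values.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, not part of JCuda: tracks allocations as base -> size.
final class AddressRangeModel {
    static final int SUCCESS = 0;
    static final int NOT_IN_ANY_ALLOCATION = 1; // made-up code for this sketch

    private final TreeMap<Long, Long> allocations = new TreeMap<>();

    void record(long base, long size) {
        allocations.put(base, size);
    }

    /** Writes base and size into the single-element arrays; a null array is ignored. */
    int getAddressRange(long[] pbase, long[] psize, long dptr) {
        // The candidate is the allocation with the greatest base <= dptr.
        Map.Entry<Long, Long> entry = allocations.floorEntry(dptr);
        if (entry == null || dptr >= entry.getKey() + entry.getValue()) {
            return NOT_IN_ANY_ALLOCATION;
        }
        if (pbase != null) {
            pbase[0] = entry.getKey();
        }
        if (psize != null) {
            psize[0] = entry.getValue();
        }
        return SUCCESS;
    }
}
```

Single-element arrays stand in for the C output pointers, which is the same convention the int[] parameters in the Java signatures above follow.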
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemAllocHost(Pointer pointer, int bytesize)
cuMemAllocHost(void **pp, unsigned int bytesize)
Allocates bytesize bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cuMemcpy(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of memory with cuMemAllocHost() may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemFreeHost(Pointer p)
cuMemFreeHost(void *p)
Frees the memory space pointed to by p, which must have been returned by a previous call to cuMemAllocHost() or cuMemHostAlloc().
See Also:
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
cuArrayDestroy(jcuda.driver.CUarray)
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
cuMemAllocHost(jcuda.Pointer, int)
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemFree(jcuda.driver.CUdeviceptr)
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
cuMemGetInfo(int[], int[])
cuMemHostAlloc(jcuda.Pointer, long, int)
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpyHtoD(CUdeviceptr dstDevice, Pointer srcHost, int ByteCount)
cuMemcpyHtoD(CUdeviceptr dstDevice, const void *srcHost, unsigned int ByteCount)
Copies from host memory to device memory. dstDevice and srcHost are the base addresses of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is synchronous.
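Because ByteCount is a size in bytes rather than an element count, it usually has to be derived from the length of the host array. The following is a minimal sketch of that calculation for a float array. The class CopySize and its method floatBytes are hypothetical helpers introduced only for illustration and are not part of JCuda; the JCuda call itself appears only in a comment, since it needs a CUDA context and a device allocation.

```java
/** Hypothetical helper for illustration; not part of JCuda. */
final class CopySize {
    private CopySize() {}

    /**
     * Returns the number of bytes occupied by elementCount floats.
     * Throws ArithmeticException if the result does not fit in an int,
     * which is the type of the ByteCount parameter in the Java binding.
     */
    static int floatBytes(int elementCount) {
        if (elementCount < 0) {
            throw new IllegalArgumentException("elementCount must be non-negative");
        }
        return Math.multiplyExact(elementCount, Float.BYTES);
    }

    public static void main(String[] args) {
        float[] hostData = new float[1024];
        int byteCount = floatBytes(hostData.length);
        System.out.println("ByteCount = " + byteCount);
        // With JCuda on the classpath and a device allocation of at least
        // byteCount bytes, the copy itself would then be:
        //   cuMemcpyHtoD(devicePtr, Pointer.to(hostData), byteCount);
    }
}
```

The overflow check matters because the Java signature takes an int, so a very large array would otherwise wrap around silently.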
See Also:
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
cuArrayDestroy(jcuda.driver.CUarray)
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
cuMemAllocHost(jcuda.Pointer, int)
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemFree(jcuda.driver.CUdeviceptr)
cuMemFreeHost(jcuda.Pointer)
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
cuMemGetInfo(int[], int[])
cuMemHostAlloc(jcuda.Pointer, long, int)
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpyDtoH(Pointer dstHost, CUdeviceptr srcDevice, int ByteCount)
cuMemcpyDtoH(void *dstHost, CUdeviceptr srcDevice, unsigned int ByteCount)
Copies from device to host memory. dstHost and srcDevice specify the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is synchronous.
See Also:
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
cuArrayDestroy(jcuda.driver.CUarray)
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
cuMemAllocHost(jcuda.Pointer, int)
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemFree(jcuda.driver.CUdeviceptr)
cuMemFreeHost(jcuda.Pointer)
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
cuMemGetInfo(int[], int[])
cuMemHostAlloc(jcuda.Pointer, long, int)
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpyDtoD(CUdeviceptr dstDevice, CUdeviceptr srcDevice, int ByteCount)
cuMemcpyDtoD(CUdeviceptr dstDevice, CUdeviceptr srcDevice, unsigned int ByteCount)
Copies from device memory to device memory. dstDevice and srcDevice are the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is asynchronous.
See Also:
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
cuArrayDestroy(jcuda.driver.CUarray)
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
cuMemAllocHost(jcuda.Pointer, int)
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemFree(jcuda.driver.CUdeviceptr)
cuMemFreeHost(jcuda.Pointer)
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
cuMemGetInfo(int[], int[])
cuMemHostAlloc(jcuda.Pointer, long, int)
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpyDtoA(CUarray dstArray, int dstIndex, CUdeviceptr srcDevice, int ByteCount)
cuMemcpyDtoA(CUarray dstArray, unsigned int dstOffset, CUdeviceptr srcDevice, unsigned int ByteCount)
Copies from device memory to a 1D CUDA array. dstArray and dstOffset specify the CUDA array handle and starting index of the destination data. srcDevice specifies the base pointer of the source. ByteCount specifies the number of bytes to copy.
See Also:
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
cuArrayDestroy(jcuda.driver.CUarray)
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
cuMemAllocHost(jcuda.Pointer, int)
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemFree(jcuda.driver.CUdeviceptr)
cuMemFreeHost(jcuda.Pointer)
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
cuMemGetInfo(int[], int[])
cuMemHostAlloc(jcuda.Pointer, long, int)
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpyAtoD(CUdeviceptr dstDevice, CUarray hSrc, int SrcIndex, int ByteCount)
cuMemcpyAtoD(CUdeviceptr dstDevice, CUarray srcArray, unsigned int srcOffset, unsigned int ByteCount)
Copies from one 1D CUDA array to device memory. dstDevice specifies the base pointer of the destination and must be naturally aligned with the CUDA array elements. srcArray and srcOffset specify the CUDA array handle and the offset in bytes into the array where the copy is to begin. ByteCount specifies the number of bytes to copy and must be evenly divisible by the array element size.
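The two constraints in this description (destination alignment and a byte count that covers whole elements) can be checked on the host before issuing the copy. The sketch below is only an illustration: ArrayCopyCheck and isValidCopy are hypothetical names that do not exist in JCuda, and "naturally aligned" is interpreted here as "the destination address is a multiple of the element size", which is an assumption.

```java
/** Hypothetical pre-flight check for illustration; not part of JCuda. */
final class ArrayCopyCheck {
    private ArrayCopyCheck() {}

    /**
     * Returns true if a copy of byteCount bytes to dstAddress satisfies both
     * constraints for an array whose elements are elementSize bytes each.
     * dstAddress is treated as an unsigned 64-bit value.
     */
    static boolean isValidCopy(long dstAddress, int byteCount, int elementSize) {
        if (elementSize <= 0 || byteCount < 0) {
            return false;
        }
        // Assumed meaning of "naturally aligned": address is a multiple of the element size.
        boolean aligned = Long.remainderUnsigned(dstAddress, elementSize) == 0;
        // ByteCount must be evenly divisible by the array element size.
        boolean wholeElements = byteCount % elementSize == 0;
        return aligned && wholeElements;
    }

    public static void main(String[] args) {
        // 16-byte elements, for example four 32-bit channels per element.
        System.out.println(isValidCopy(256L, 64, 16));
        System.out.println(isValidCopy(260L, 64, 16));
        System.out.println(isValidCopy(256L, 60, 16));
    }
}
```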
See Also:
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
cuArrayDestroy(jcuda.driver.CUarray)
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
cuMemAllocHost(jcuda.Pointer, int)
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemFree(jcuda.driver.CUdeviceptr)
cuMemFreeHost(jcuda.Pointer)
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
cuMemGetInfo(int[], int[])
cuMemHostAlloc(jcuda.Pointer, long, int)
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpyHtoA(CUarray dstArray, int dstIndex, Pointer pSrc, int ByteCount)
cuMemcpyHtoA(CUarray dstArray, unsigned int dstOffset, const void *srcHost, unsigned int ByteCount)
Copies from host memory to a 1D CUDA array. dstArray and dstOffset specify the CUDA array handle and starting offset in bytes of the destination data. pSrc specifies the base address of the source. ByteCount specifies the number of bytes to copy.
See Also:
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
cuArrayDestroy(jcuda.driver.CUarray)
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
cuMemAllocHost(jcuda.Pointer, int)
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemFree(jcuda.driver.CUdeviceptr)
cuMemFreeHost(jcuda.Pointer)
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
cuMemGetInfo(int[], int[])
cuMemHostAlloc(jcuda.Pointer, long, int)
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpyAtoH(Pointer dstHost, CUarray srcArray, int srcIndex, int ByteCount)
cuMemcpyAtoH(void *dstHost, CUarray srcArray, unsigned int srcOffset, unsigned int ByteCount)
Copies from one 1D CUDA array to host memory. dstHost specifies the base pointer of the destination. srcArray and srcOffset specify the CUDA array handle and starting offset in bytes of the source data. ByteCount specifies the number of bytes to copy.
See Also:
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
cuArrayDestroy(jcuda.driver.CUarray)
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
cuMemAllocHost(jcuda.Pointer, int)
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemFree(jcuda.driver.CUdeviceptr)
cuMemFreeHost(jcuda.Pointer)
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
cuMemGetInfo(int[], int[])
cuMemHostAlloc(jcuda.Pointer, long, int)
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpyAtoA(CUarray dstArray, int dstIndex, CUarray srcArray, int srcIndex, int ByteCount)
cuMemcpyAtoA(CUarray dstArray, unsigned int dstOffset, CUarray srcArray, unsigned int srcOffset, unsigned int ByteCount)
Copies from one 1D CUDA array to another. dstArray and srcArray specify the handles of the destination and source CUDA arrays for the copy, respectively. dstOffset and srcOffset specify the destination and source offsets in bytes into the CUDA arrays. ByteCount is the number of bytes to be copied. The elements of the two CUDA arrays need not have the same format, but they must be the same size, and ByteCount must be evenly divisible by that size.
See Also:
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
cuArrayDestroy(jcuda.driver.CUarray)
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
cuMemAllocHost(jcuda.Pointer, int)
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemFree(jcuda.driver.CUdeviceptr)
cuMemFreeHost(jcuda.Pointer)
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
cuMemGetInfo(int[], int[])
cuMemHostAlloc(jcuda.Pointer, long, int)
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpy2D(CUDA_MEMCPY2D pCopy)
cuMemcpy2D(const CUDA_MEMCPY2D *pCopy)
Performs a 2D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY2D structure is defined as:

    typedef struct CUDA_MEMCPY2D_st {
        unsigned int srcXInBytes, srcY;
        CUmemorytype srcMemoryType;
        const void *srcHost;
        CUdeviceptr srcDevice;
        CUarray srcArray;
        unsigned int srcPitch;
        unsigned int dstXInBytes, dstY;
        CUmemorytype dstMemoryType;
        void *dstHost;
        CUdeviceptr dstDevice;
        CUarray dstArray;
        unsigned int dstPitch;
        unsigned int WidthInBytes;
        unsigned int Height;
    } CUDA_MEMCPY2D;

The memory types of the source and destination are given by:

    typedef enum CUmemorytype_enum {
        CU_MEMORYTYPE_HOST = 0x01,
        CU_MEMORYTYPE_DEVICE = 0x02,
        CU_MEMORYTYPE_ARRAY = 0x03
    } CUmemorytype;

Source start address for host memory:
    void* Start = (void*)((char*)srcHost+srcY*srcPitch + srcXInBytes);
Source start address for device memory:
    CUdeviceptr Start = srcDevice+srcY*srcPitch+srcXInBytes;
Destination start address for host memory:
    void* dstStart = (void*)((char*)dstHost+dstY*dstPitch + dstXInBytes);
Destination start address for device memory:
    CUdeviceptr dstStart = dstDevice+dstY*dstPitch+dstXInBytes;
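The start-address expressions above all share the form base + y*pitch + xInBytes. As a rough sketch, the offset part can be computed in plain Java as follows. Pitch2D and startOffset are hypothetical names used only for this illustration and are not part of JCuda; the arithmetic is done in long to avoid int overflow for large pitched allocations.

```java
/** Hypothetical helper for illustration; not part of JCuda. */
final class Pitch2D {
    private Pitch2D() {}

    /**
     * Byte offset from the base address to the first copied byte of a
     * pitched 2D region: y * pitch + xInBytes.
     */
    static long startOffset(long y, long pitch, long xInBytes) {
        return y * pitch + xInBytes;
    }

    public static void main(String[] args) {
        // Row 3 of a buffer with a 512-byte pitch, starting 16 bytes into the row.
        long offset = startOffset(3, 512, 16);
        System.out.println("start offset = " + offset + " bytes");
    }
}
```

The same helper applies to both the source fields (srcY, srcPitch, srcXInBytes) and the destination fields (dstY, dstPitch, dstXInBytes).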
See Also:
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
cuArrayDestroy(jcuda.driver.CUarray)
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
cuMemAllocHost(jcuda.Pointer, int)
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemFree(jcuda.driver.CUdeviceptr)
cuMemFreeHost(jcuda.Pointer)
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
cuMemGetInfo(int[], int[])
cuMemHostAlloc(jcuda.Pointer, long, int)
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpy2DUnaligned(CUDA_MEMCPY2D pCopy)
cuMemcpy2DUnaligned(const CUDA_MEMCPY2D *pCopy)
Performs a 2D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY2D structure is defined as:

    typedef struct CUDA_MEMCPY2D_st {
        unsigned int srcXInBytes, srcY;
        CUmemorytype srcMemoryType;
        const void *srcHost;
        CUdeviceptr srcDevice;
        CUarray srcArray;
        unsigned int srcPitch;
        unsigned int dstXInBytes, dstY;
        CUmemorytype dstMemoryType;
        void *dstHost;
        CUdeviceptr dstDevice;
        CUarray dstArray;
        unsigned int dstPitch;
        unsigned int WidthInBytes;
        unsigned int Height;
    } CUDA_MEMCPY2D;

The memory types of the source and destination are given by:

    typedef enum CUmemorytype_enum {
        CU_MEMORYTYPE_HOST = 0x01,
        CU_MEMORYTYPE_DEVICE = 0x02,
        CU_MEMORYTYPE_ARRAY = 0x03
    } CUmemorytype;

Source start address for host memory:
    void* Start = (void*)((char*)srcHost+srcY*srcPitch + srcXInBytes);
Source start address for device memory:
    CUdeviceptr Start = srcDevice+srcY*srcPitch+srcXInBytes;
Destination start address for host memory:
    void* dstStart = (void*)((char*)dstHost+dstY*dstPitch + dstXInBytes);
Destination start address for device memory:
    CUdeviceptr dstStart = dstDevice+dstY*dstPitch+dstXInBytes;
See Also:
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
cuArrayDestroy(jcuda.driver.CUarray)
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
cuMemAllocHost(jcuda.Pointer, int)
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
cuMemFree(jcuda.driver.CUdeviceptr)
cuMemFreeHost(jcuda.Pointer)
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
cuMemGetInfo(int[], int[])
cuMemHostAlloc(jcuda.Pointer, long, int)
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpy3D(CUDA_MEMCPY3D pCopy)
cuMemcpy3D(const CUDA_MEMCPY3D *pCopy)
Performs a 3D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY3D structure is defined as:
    typedef struct CUDA_MEMCPY3D_st {
        unsigned int srcXInBytes, srcY, srcZ;
        unsigned int srcLOD;
        CUmemorytype srcMemoryType;
        const void *srcHost;
        CUdeviceptr srcDevice;
        CUarray srcArray;
        unsigned int srcPitch;  // ignored when src is array
        unsigned int srcHeight; // ignored when src is array; may be 0 if Depth==1
        unsigned int dstXInBytes, dstY, dstZ;
        unsigned int dstLOD;
        CUmemorytype dstMemoryType;
        void *dstHost;
        CUdeviceptr dstDevice;
        CUarray dstArray;
        unsigned int dstPitch;  // ignored when dst is array
        unsigned int dstHeight; // ignored when dst is array; may be 0 if Depth==1
        unsigned int WidthInBytes;
        unsigned int Height;
        unsigned int Depth;
    } CUDA_MEMCPY3D;
srcMemoryType and dstMemoryType specify the type of memory of the source and destination, respectively. CUmemorytype is defined as:
    typedef enum CUmemorytype_enum {
        CU_MEMORYTYPE_HOST = 0x01,
        CU_MEMORYTYPE_DEVICE = 0x02,
        CU_MEMORYTYPE_ARRAY = 0x03
    } CUmemorytype;
For host source pointers, the starting address is:
    void* Start = (void*)((char*)srcHost+(srcZ*srcHeight+srcY)*srcPitch + srcXInBytes);
For device source pointers, the starting address is:
    CUdeviceptr Start = srcDevice+(srcZ*srcHeight+srcY)*srcPitch+srcXInBytes;
For host destination pointers, the starting address is:
    void* dstStart = (void*)((char*)dstHost+(dstZ*dstHeight+dstY)*dstPitch + dstXInBytes);
For device destination pointers, the starting address is:
    CUdeviceptr dstStart = dstDevice+(dstZ*dstHeight+dstY)*dstPitch+dstXInBytes;
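The start-address expressions above reduce to a single byte offset from the base pointer. The following is a minimal host-side sketch of that arithmetic in plain Java. It does not call JCuda, and the class name Memcpy3DOffsets and the method startOffset are hypothetical names chosen for illustration.

```java
// Host-side sketch only: mirrors the start-address arithmetic quoted above.
// Memcpy3DOffsets and startOffset are hypothetical names, not part of JCuda.
final class Memcpy3DOffsets {

    /**
     * Byte offset of the first copied byte relative to the base address,
     * i.e. (z * height + y) * pitch + xInBytes.
     */
    static long startOffset(long xInBytes, long y, long z, long pitch, long height) {
        return (z * height + y) * pitch + xInBytes;
    }

    public static void main(String[] args) {
        // A volume with 256-byte rows and 64 rows per slice;
        // the copy starts 16 bytes into row 2 of slice 3.
        long offset = startOffset(16, 2, 3, 256, 64);
        System.out.println(offset); // prints 49680
    }
}
```

The same method covers source and destination: pass the src* members for the source start and the dst* members for the destination start.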
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpyHtoDAsync(CUdeviceptr dstDevice, Pointer srcHost, int ByteCount, CUstream hStream)
cuMemcpyHtoDAsync(CUdeviceptr dstDevice, const void *srcHost, unsigned int ByteCount, CUstream hStream)
Copies from host memory to device memory. dstDevice and srcHost are the base addresses of the destination and source, respectively. ByteCount specifies the number of bytes to copy.
cuMemcpyHtoDAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked memory and returns an error if a pointer to pageable memory is passed as input.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpyDtoHAsync(Pointer dstHost, CUdeviceptr srcDevice, int ByteCount, CUstream hStream)
cuMemcpyDtoHAsync(void *dstHost, CUdeviceptr srcDevice, unsigned int ByteCount, CUstream hStream)
Copies from device to host memory. dstHost and srcDevice specify the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy.
cuMemcpyDtoHAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked memory and returns an error if a pointer to pageable memory is passed as input.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpyDtoDAsync(CUdeviceptr dstDevice, CUdeviceptr srcDevice, int ByteCount, CUstream hStream)
cuMemcpyDtoDAsync(CUdeviceptr dstDevice, CUdeviceptr srcDevice, unsigned int ByteCount, CUstream hStream)
Copies from device memory to device memory. dstDevice and srcDevice are the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpyHtoAAsync(CUarray dstArray, int dstIndex, Pointer pSrc, int ByteCount, CUstream hStream)
cuMemcpyHtoAAsync(CUarray dstArray, unsigned int dstOffset, const void *srcHost, unsigned int ByteCount, CUstream hStream)
Copies from host memory to a 1D CUDA array. dstArray and dstOffset specify the CUDA array handle and starting offset in bytes of the destination data. srcHost specifies the base address of the source. ByteCount specifies the number of bytes to copy.
cuMemcpyHtoAAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked memory and returns an error if a pointer to pageable memory is passed as input.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpyAtoHAsync(Pointer dstHost, CUarray srcArray, int srcIndex, int ByteCount, CUstream hStream)
cuMemcpyAtoHAsync(void *dstHost, CUarray srcArray, unsigned int srcOffset, unsigned int ByteCount, CUstream hStream)
Copies from one 1D CUDA array to host memory. dstHost specifies the base pointer of the destination. srcArray and srcOffset specify the CUDA array handle and starting offset in bytes of the source data. ByteCount specifies the number of bytes to copy.
cuMemcpyAtoHAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpy2DAsync(CUDA_MEMCPY2D pCopy, CUstream hStream)
cuMemcpy2DAsync(const CUDA_MEMCPY2D *pCopy, CUstream hStream)
Performs a 2D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY2D structure is defined as:
    typedef struct CUDA_MEMCPY2D_st {
        unsigned int srcXInBytes, srcY;
        CUmemorytype srcMemoryType;
        const void *srcHost;
        CUdeviceptr srcDevice;
        CUarray srcArray;
        unsigned int srcPitch;
        unsigned int dstXInBytes, dstY;
        CUmemorytype dstMemoryType;
        void *dstHost;
        CUdeviceptr dstDevice;
        CUarray dstArray;
        unsigned int dstPitch;
        unsigned int WidthInBytes;
        unsigned int Height;
    } CUDA_MEMCPY2D;
srcMemoryType and dstMemoryType specify the type of memory of the source and destination, respectively. CUmemorytype is defined as:
    typedef enum CUmemorytype_enum {
        CU_MEMORYTYPE_HOST = 0x01,
        CU_MEMORYTYPE_DEVICE = 0x02,
        CU_MEMORYTYPE_ARRAY = 0x03
    } CUmemorytype;
For host source pointers, the starting address is:
    void* Start = (void*)((char*)srcHost+srcY*srcPitch + srcXInBytes);
For device source pointers, the starting address is:
    CUdeviceptr Start = srcDevice+srcY*srcPitch+srcXInBytes;
For host destination pointers, the starting address is:
    void* dstStart = (void*)((char*)dstHost+dstY*dstPitch + dstXInBytes);
For device destination pointers, the starting address is:
    CUdeviceptr dstStart = dstDevice+dstY*dstPitch+dstXInBytes;
cuMemcpy2DAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.
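To make the roles of the pitch, offset, WidthInBytes and Height members concrete, here is a minimal sketch that simulates a pitched 2D copy between two plain Java byte arrays. It runs entirely on the host and does not use JCuda. The class PitchedCopy2D and its copy2D method are hypothetical, and the row-by-row behaviour is an assumption based on the start-address expressions above, not a description of the driver's implementation.

```java
// Host-side simulation of a pitched 2D copy; names are hypothetical, not JCuda API.
final class PitchedCopy2D {

    /** Copies a widthInBytes x height region between two pitched byte buffers. */
    static void copy2D(byte[] src, int srcXInBytes, int srcY, int srcPitch,
                       byte[] dst, int dstXInBytes, int dstY, int dstPitch,
                       int widthInBytes, int height) {
        // Same arithmetic as the Start / dstStart expressions in the text.
        int srcStart = srcY * srcPitch + srcXInBytes;
        int dstStart = dstY * dstPitch + dstXInBytes;
        for (int row = 0; row < height; row++) {
            // Each row begins one pitch after the previous one.
            System.arraycopy(src, srcStart + row * srcPitch,
                             dst, dstStart + row * dstPitch, widthInBytes);
        }
    }

    public static void main(String[] args) {
        byte[] src = new byte[4 * 8];            // 4 rows, pitch 8 bytes
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        byte[] dst = new byte[3 * 4];            // 3 rows, pitch 4 bytes
        // Copy a 2-byte wide, 2-row block from (x=3, y=1) to (x=1, y=1).
        copy2D(src, 3, 1, 8, dst, 1, 1, 4, 2, 2);
        System.out.println(java.util.Arrays.toString(dst));
    }
}
```

Source and destination may use different pitches, which is why the two start offsets and row strides are computed independently.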
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemcpy3DAsync(CUDA_MEMCPY3D pCopy, CUstream hStream)
cuMemcpy3DAsync(const CUDA_MEMCPY3D *pCopy, CUstream hStream)
Performs a 3D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY3D structure is defined as:
    typedef struct CUDA_MEMCPY3D_st {
        unsigned int srcXInBytes, srcY, srcZ;
        unsigned int srcLOD;
        CUmemorytype srcMemoryType;
        const void *srcHost;
        CUdeviceptr srcDevice;
        CUarray srcArray;
        unsigned int srcPitch;  // ignored when src is array
        unsigned int srcHeight; // ignored when src is array; may be 0 if Depth==1
        unsigned int dstXInBytes, dstY, dstZ;
        unsigned int dstLOD;
        CUmemorytype dstMemoryType;
        void *dstHost;
        CUdeviceptr dstDevice;
        CUarray dstArray;
        unsigned int dstPitch;  // ignored when dst is array
        unsigned int dstHeight; // ignored when dst is array; may be 0 if Depth==1
        unsigned int WidthInBytes;
        unsigned int Height;
        unsigned int Depth;
    } CUDA_MEMCPY3D;
srcMemoryType and dstMemoryType specify the type of memory of the source and destination, respectively. CUmemorytype is defined as:
    typedef enum CUmemorytype_enum {
        CU_MEMORYTYPE_HOST = 0x01,
        CU_MEMORYTYPE_DEVICE = 0x02,
        CU_MEMORYTYPE_ARRAY = 0x03
    } CUmemorytype;
For host source pointers, the starting address is:
    void* Start = (void*)((char*)srcHost+(srcZ*srcHeight+srcY)*srcPitch + srcXInBytes);
For device source pointers, the starting address is:
    CUdeviceptr Start = srcDevice+(srcZ*srcHeight+srcY)*srcPitch+srcXInBytes;
For host destination pointers, the starting address is:
    void* dstStart = (void*)((char*)dstHost+(dstZ*dstHeight+dstY)*dstPitch + dstXInBytes);
For device destination pointers, the starting address is:
    CUdeviceptr dstStart = dstDevice+(dstZ*dstHeight+dstY)*dstPitch+dstXInBytes;
cuMemcpy3DAsync() is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.
The srcLOD and dstLOD members of the CUDA_MEMCPY3D structure must be set to 0.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemsetD8(CUdeviceptr dstDevice, char uc, int N)
cuMemsetD8(CUdeviceptr dstDevice, unsigned char uc, unsigned int N)
Sets the memory range of N 8-bit values to the specified value uc.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemsetD16(CUdeviceptr dstDevice, short us, int N)
cuMemsetD16(CUdeviceptr dstDevice, unsigned short us, unsigned int N)
Sets the memory range of N 16-bit values to the specified value us.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemsetD32(CUdeviceptr dstDevice, int ui, int N)
cuMemsetD32(CUdeviceptr dstDevice, unsigned int ui, unsigned int N)
Sets the memory range of N 32-bit values to the specified value ui.
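For cuMemsetD8, cuMemsetD16 and cuMemsetD32, N counts elements rather than bytes, so the affected range is N, 2*N and 4*N bytes respectively. The sketch below simulates this on a plain Java byte array and does not call JCuda. The class MemsetSketch and its memset method are hypothetical, and the least-significant-byte-first layout is an assumption made for illustration.

```java
// Host-side simulation of the element-wise memset variants; not JCuda API.
final class MemsetSketch {

    /**
     * Writes n values of elemBytes bytes each (1, 2 or 4) starting at index 0.
     * Assumption: bytes are stored least significant byte first.
     */
    static void memset(byte[] mem, int elemBytes, int value, int n) {
        for (int i = 0; i < n; i++) {
            for (int b = 0; b < elemBytes; b++) {
                mem[i * elemBytes + b] = (byte) (value >>> (8 * b));
            }
        }
    }

    public static void main(String[] args) {
        byte[] mem = new byte[10];
        // Two 32-bit values touch the first 8 bytes; the last 2 stay zero.
        memset(mem, 4, 0x11223344, 2);
        System.out.println(java.util.Arrays.toString(mem));
    }
}
```

When sizing a device allocation for one of these calls, the buffer therefore needs at least N times the element size in bytes.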
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
public static int cuMemsetD2D8(CUdeviceptr dstDevice, int dstPitch, char uc, int Width, int Height)
cuMemsetD2D8(CUdeviceptr dstDevice, unsigned int dstPitch, unsigned char uc, unsigned int Width, unsigned int Height)
Sets the 2D memory range of Width 8-bit values to the specified value uc. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().
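To make the roles of Width, Height and dstPitch concrete, here is a host-side sketch that emulates a pitched 2D fill on a Java byte[]. It is an illustration under the assumption that bytes between Width and dstPitch in each row are padding and stay untouched; it does not call JCuda, and the class and method names are hypothetical.

```java
/** Host-side emulation of a pitched 2D byte fill; hypothetical helper, not part of JCuda. */
public class PitchedFillSketch {

    /**
     * Sets width bytes in each of height rows to uc. Consecutive rows start
     * dstPitch bytes apart, so bytes past width in each row are not modified.
     */
    public static void memsetD2D8(byte[] dst, int dstPitch, byte uc, int width, int height) {
        if (width > dstPitch) {
            throw new IllegalArgumentException("width exceeds pitch");
        }
        for (int row = 0; row < height; row++) {
            int rowStart = row * dstPitch;
            for (int col = 0; col < width; col++) {
                dst[rowStart + col] = uc;
            }
        }
    }

    public static void main(String[] args) {
        int pitch = 8;
        byte[] memory = new byte[3 * pitch];        // 3 rows, 8 bytes apart
        memsetD2D8(memory, pitch, (byte) 1, 5, 3);  // fill 5 bytes per row
        for (int row = 0; row < 3; row++) {
            StringBuilder line = new StringBuilder();
            for (int col = 0; col < pitch; col++) {
                line.append(memory[row * pitch + col]);
            }
            System.out.println(line);
        }
    }
}
```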
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemsetD2D16(CUdeviceptr dstDevice, int dstPitch, short us, int Width, int Height)
cuMemsetD2D16(CUdeviceptr dstDevice, unsigned int dstPitch, unsigned short us, unsigned int Width, unsigned int Height)
Sets the 2D memory range of Width 16-bit values to the specified value us. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuMemsetD2D32(CUdeviceptr dstDevice, int dstPitch, int ui, int Width, int Height)
cuMemsetD2D32(CUdeviceptr dstDevice, unsigned int dstPitch, unsigned int ui, unsigned int Width, unsigned int Height)
Sets the 2D memory range of Width 32-bit values to the specified value ui. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().
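Because Width counts 32-bit values while dstPitch counts bytes, it is easy to mix the two units. The sketch below shows the address arithmetic this implies. It is a plain-Java illustration with hypothetical names; the check that a row fits inside the pitch is an assumption that follows from the layout rather than something this documentation states.

```java
/** Address arithmetic for a pitched allocation of 32-bit values; hypothetical helper, not part of JCuda. */
public class Pitch32Sketch {

    /** Size of one 32-bit element in bytes. */
    public static final int ELEMENT_BYTES = 4;

    /** Byte offset of the element at (row, col), with rows dstPitch bytes apart. */
    public static long byteOffset(int dstPitch, int row, int col) {
        return (long) row * dstPitch + (long) col * ELEMENT_BYTES;
    }

    /** True if a row of width 32-bit elements fits within dstPitch bytes. */
    public static boolean rowFits(int dstPitch, int width) {
        return (long) width * ELEMENT_BYTES <= dstPitch;
    }

    public static void main(String[] args) {
        int pitch = 512; // example pitch in bytes, as might be passed back by cuMemAllocPitch
        System.out.println("offset of (2,3): " + byteOffset(pitch, 2, 3));
        System.out.println("128 elements fit: " + rowFits(pitch, 128));
        System.out.println("129 elements fit: " + rowFits(pitch, 129));
    }
}
```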
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuFuncGetAttribute(int[] pi, int attrib, CUfunction func)
cuFuncGetAttribute(int *pi, CUfunction_attribute attrib, CUfunction hfunc)
Returns in *pi the integer value of the attribute attrib on the kernel given by hfunc. The supported attributes are:
cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int)
,
cuFuncSetSharedSize(jcuda.driver.CUfunction, int)
,
cuFuncSetCacheConfig(jcuda.driver.CUfunction, int)
,
cuParamSetSize(jcuda.driver.CUfunction, int)
,
cuParamSeti(jcuda.driver.CUfunction, int, int)
,
cuParamSetf(jcuda.driver.CUfunction, int, float)
,
cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int)
,
cuParamSetTexRef(jcuda.driver.CUfunction, int, jcuda.driver.CUtexref)
,
cuLaunch(jcuda.driver.CUfunction)
,
cuLaunchGrid(jcuda.driver.CUfunction, int, int)
,
cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream)
public static int cuFuncSetBlockShape(CUfunction hfunc, int x, int y, int z)
cuFuncSetBlockShape(CUfunction hfunc, int x, int y, int z)
Specifies the x, y, and z dimensions of the thread blocks that are created when the kernel given by hfunc is launched.
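A block shape of x, y, z yields x * y * z threads per block. The sketch below computes that product and, as a common convention that this documentation does not itself prescribe, derives how many blocks are needed to cover a given number of elements by ceiling division. The class and method names are hypothetical, and the code does not call JCuda.

```java
/** Launch-shape arithmetic; hypothetical helper, not part of JCuda. */
public class LaunchShapeSketch {

    /** Number of threads in one block with dimensions x, y, z. */
    public static int threadsPerBlock(int x, int y, int z) {
        return x * y * z;
    }

    /** Smallest number of blocks whose threads cover all elements (ceiling division). */
    public static int blocksNeeded(int elements, int threadsPerBlock) {
        return (elements + threadsPerBlock - 1) / threadsPerBlock;
    }

    public static void main(String[] args) {
        int threads = threadsPerBlock(16, 16, 1);
        System.out.println("threads per block: " + threads);
        System.out.println("blocks for 1000 elements: " + blocksNeeded(1000, threads));
    }
}
```

The resulting block count is the kind of value one would pass as a grid dimension to cuLaunchGrid after setting the block shape.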
cuFuncSetSharedSize(jcuda.driver.CUfunction, int)
,
cuFuncSetCacheConfig(jcuda.driver.CUfunction, int)
,
cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction)
,
cuParamSetSize(jcuda.driver.CUfunction, int)
,
cuParamSeti(jcuda.driver.CUfunction, int, int)
,
cuParamSetf(jcuda.driver.CUfunction, int, float)
,
cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int)
,
cuParamSetTexRef(jcuda.driver.CUfunction, int, jcuda.driver.CUtexref)
,
cuLaunch(jcuda.driver.CUfunction)
,
cuLaunchGrid(jcuda.driver.CUfunction, int, int)
,
cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream)
public static int cuFuncSetSharedSize(CUfunction hfunc, int bytes)
cuFuncSetSharedSize(CUfunction hfunc, unsigned int bytes)
Sets through bytes the amount of dynamic shared memory that will be available to each thread block when the kernel given by hfunc is launched.
cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int)
,
cuFuncSetCacheConfig(jcuda.driver.CUfunction, int)
,
cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction)
,
cuParamSetSize(jcuda.driver.CUfunction, int)
,
cuParamSeti(jcuda.driver.CUfunction, int, int)
,
cuParamSetf(jcuda.driver.CUfunction, int, float)
,
cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int)
,
cuParamSetTexRef(jcuda.driver.CUfunction, int, jcuda.driver.CUtexref)
,
cuLaunch(jcuda.driver.CUfunction)
,
cuLaunchGrid(jcuda.driver.CUfunction, int, int)
,
cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream)
public static int cuFuncSetCacheConfig(CUfunction hfunc, int config)
cuFuncSetCacheConfig(CUfunction hfunc, CUfunc_cache config)
On devices where the L1 cache and shared memory use the same hardware resources, this sets through config the preferred cache configuration for the device function hfunc. This is only a preference. The driver will use the requested configuration if possible, but it is free to choose a different configuration if required to execute hfunc.
This setting does nothing on devices where the sizes of the L1 cache and shared memory are fixed.
Switching between configuration modes may insert a device-side synchronization point for streamed kernel launches.
The supported cache modes are:
cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int)
,
cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction)
,
cuParamSetSize(jcuda.driver.CUfunction, int)
,
cuParamSeti(jcuda.driver.CUfunction, int, int)
,
cuParamSetf(jcuda.driver.CUfunction, int, float)
,
cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int)
,
cuParamSetTexRef(jcuda.driver.CUfunction, int, jcuda.driver.CUtexref)
,
cuLaunch(jcuda.driver.CUfunction)
,
cuLaunchGrid(jcuda.driver.CUfunction, int, int)
,
cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream)
public static int cuArrayCreate(CUarray pHandle, CUDA_ARRAY_DESCRIPTOR pAllocateArray)
cuArrayCreate(CUarray *pHandle, const CUDA_ARRAY_DESCRIPTOR *pAllocateArray)
Creates a CUDA array according to the CUDA_ARRAY_DESCRIPTOR structure pAllocateArray and returns a handle to the new CUDA array in *pHandle. The CUDA_ARRAY_DESCRIPTOR is defined as:
typedef struct {
    unsigned int Width;
    unsigned int Height;
    CUarray_format Format;
    unsigned int NumChannels;
} CUDA_ARRAY_DESCRIPTOR;
Width and Height are the width and height of the CUDA array (in elements); the CUDA array is one-dimensional if height is 0, two-dimensional otherwise;
Format specifies the format of the elements; CUarray_format is defined as:
typedef enum CUarray_format_enum {
    CU_AD_FORMAT_UNSIGNED_INT8 = 0x01,
    CU_AD_FORMAT_UNSIGNED_INT16 = 0x02,
    CU_AD_FORMAT_UNSIGNED_INT32 = 0x03,
    CU_AD_FORMAT_SIGNED_INT8 = 0x08,
    CU_AD_FORMAT_SIGNED_INT16 = 0x09,
    CU_AD_FORMAT_SIGNED_INT32 = 0x0a,
    CU_AD_FORMAT_HALF = 0x10,
    CU_AD_FORMAT_FLOAT = 0x20
} CUarray_format;
NumChannels specifies the number of packed components per CUDA array element; it may be 1, 2, or 4;
Here are examples of CUDA array descriptions:
Description for a CUDA array of 2048 floats:
CUDA_ARRAY_DESCRIPTOR desc;
desc.Format = CU_AD_FORMAT_FLOAT;
desc.NumChannels = 1;
desc.Width = 2048;
desc.Height = 1;
Description for a 64 x 64 CUDA array of floats:
CUDA_ARRAY_DESCRIPTOR desc;
desc.Format = CU_AD_FORMAT_FLOAT;
desc.NumChannels = 1;
desc.Width = 64;
desc.Height = 64;
Description for a width x height CUDA array of 64-bit, 4x16-bit float16's:
CUDA_ARRAY_DESCRIPTOR desc;
desc.Format = CU_AD_FORMAT_HALF;
desc.NumChannels = 4;
desc.Width = width;
desc.Height = height;
Description for a width x height CUDA array of 16-bit elements, each of which is two 8-bit unsigned chars:
CUDA_ARRAY_DESCRIPTOR desc;
desc.Format = CU_AD_FORMAT_UNSIGNED_INT8;
desc.NumChannels = 2;
desc.Width = width;
desc.Height = height;
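The element sizes quoted in the examples (64 bits for four half-precision channels, 16 bits for two unsigned 8-bit channels) follow from multiplying the component size by NumChannels. The sketch below spells out that arithmetic. It is self-contained plain Java with hypothetical class and method names; the constants are copied from the CUarray_format enum and redeclared locally rather than taken from JCuda.

```java
/** Element-size arithmetic for CUDA array descriptors; hypothetical helper, not part of JCuda. */
public class ArrayElementSizeSketch {

    // Values copied from the CUarray_format enum; redeclared here so the sketch is self-contained.
    public static final int CU_AD_FORMAT_UNSIGNED_INT8 = 0x01;
    public static final int CU_AD_FORMAT_UNSIGNED_INT16 = 0x02;
    public static final int CU_AD_FORMAT_UNSIGNED_INT32 = 0x03;
    public static final int CU_AD_FORMAT_SIGNED_INT8 = 0x08;
    public static final int CU_AD_FORMAT_SIGNED_INT16 = 0x09;
    public static final int CU_AD_FORMAT_SIGNED_INT32 = 0x0a;
    public static final int CU_AD_FORMAT_HALF = 0x10;
    public static final int CU_AD_FORMAT_FLOAT = 0x20;

    /** Size in bytes of a single component of the given format. */
    public static int bytesPerComponent(int format) {
        switch (format) {
            case CU_AD_FORMAT_UNSIGNED_INT8:
            case CU_AD_FORMAT_SIGNED_INT8:
                return 1;
            case CU_AD_FORMAT_UNSIGNED_INT16:
            case CU_AD_FORMAT_SIGNED_INT16:
            case CU_AD_FORMAT_HALF:
                return 2;
            case CU_AD_FORMAT_UNSIGNED_INT32:
            case CU_AD_FORMAT_SIGNED_INT32:
            case CU_AD_FORMAT_FLOAT:
                return 4;
            default:
                throw new IllegalArgumentException("unknown format: " + format);
        }
    }

    /** Size in bytes of one array element with numChannels packed components. */
    public static int bytesPerElement(int format, int numChannels) {
        if (numChannels != 1 && numChannels != 2 && numChannels != 4) {
            throw new IllegalArgumentException("NumChannels must be 1, 2, or 4");
        }
        return bytesPerComponent(format) * numChannels;
    }

    public static void main(String[] args) {
        System.out.println("half x 4: " + bytesPerElement(CU_AD_FORMAT_HALF, 4) + " bytes");
        System.out.println("uint8 x 2: " + bytesPerElement(CU_AD_FORMAT_UNSIGNED_INT8, 2) + " bytes");
    }
}
```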
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuArrayGetDescriptor(CUDA_ARRAY_DESCRIPTOR pArrayDescriptor, CUarray hArray)
cuArrayGetDescriptor(CUDA_ARRAY_DESCRIPTOR *pArrayDescriptor, CUarray hArray)
Returns in *pArrayDescriptor a descriptor containing information on the format and dimensions of the CUDA array hArray. It is useful for subroutines that have been passed a CUDA array, but need to know the CUDA array parameters for validation or other purposes.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuArrayDestroy(CUarray hArray)
cuArrayDestroy(CUarray hArray)
Destroys the CUDA array hArray.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuArray3DCreate(CUarray pHandle, CUDA_ARRAY3D_DESCRIPTOR pAllocateArray)
cuArray3DCreate(CUarray *pHandle, const CUDA_ARRAY3D_DESCRIPTOR *pAllocateArray)
Creates a CUDA array according to the CUDA_ARRAY3D_DESCRIPTOR structure pAllocateArray and returns a handle to the new CUDA array in *pHandle. The CUDA_ARRAY3D_DESCRIPTOR is defined as:
typedef struct {
    unsigned int Width;
    unsigned int Height;
    unsigned int Depth;
    CUarray_format Format;
    unsigned int NumChannels;
    unsigned int Flags;
} CUDA_ARRAY3D_DESCRIPTOR;
Width, Height, and Depth are the width, height, and depth of the CUDA array (in elements); the CUDA array is one-dimensional if height and depth are 0, two-dimensional if depth is 0, and three-dimensional otherwise;
Format specifies the format of the elements; CUarray_format is defined as:
typedef enum CUarray_format_enum {
    CU_AD_FORMAT_UNSIGNED_INT8 = 0x01,
    CU_AD_FORMAT_UNSIGNED_INT16 = 0x02,
    CU_AD_FORMAT_UNSIGNED_INT32 = 0x03,
    CU_AD_FORMAT_SIGNED_INT8 = 0x08,
    CU_AD_FORMAT_SIGNED_INT16 = 0x09,
    CU_AD_FORMAT_SIGNED_INT32 = 0x0a,
    CU_AD_FORMAT_HALF = 0x10,
    CU_AD_FORMAT_FLOAT = 0x20
} CUarray_format;
NumChannels specifies the number of packed components per CUDA array element; it may be 1, 2, or 4;
Here are examples of CUDA array descriptions:
Description for a CUDA array of 2048 floats:
CUDA_ARRAY3D_DESCRIPTOR desc;
desc.Format = CU_AD_FORMAT_FLOAT;
desc.NumChannels = 1;
desc.Width = 2048;
desc.Height = 0;
desc.Depth = 0;
Description for a 64 x 64 CUDA array of floats:
CUDA_ARRAY3D_DESCRIPTOR desc;
desc.Format = CU_AD_FORMAT_FLOAT;
desc.NumChannels = 1;
desc.Width = 64;
desc.Height = 64;
desc.Depth = 0;
Description for a width x height x depth CUDA array of 64-bit, 4x16-bit float16's:
CUDA_ARRAY3D_DESCRIPTOR desc;
desc.Format = CU_AD_FORMAT_HALF;
desc.NumChannels = 4;
desc.Width = width;
desc.Height = height;
desc.Depth = depth;
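The rule for how Height and Depth determine the dimensionality of the array can be written as a small function. This is a plain-Java sketch with hypothetical names that only restates the rule given above; it does not call JCuda or inspect a real descriptor.

```java
/** Dimensionality rule for 3D array descriptors; hypothetical helper, not part of JCuda. */
public class ArrayDimensionalitySketch {

    /**
     * Returns 1 if height and depth are both 0, 2 if only depth is 0,
     * and 3 otherwise, mirroring the rule in the description.
     */
    public static int dimensions(int width, int height, int depth) {
        if (height == 0 && depth == 0) {
            return 1;
        }
        if (depth == 0) {
            return 2;
        }
        return 3;
    }

    public static void main(String[] args) {
        System.out.println("2048 x 0 x 0   -> " + dimensions(2048, 0, 0) + "D");
        System.out.println("64 x 64 x 0    -> " + dimensions(64, 64, 0) + "D");
        System.out.println("32 x 32 x 32   -> " + dimensions(32, 32, 32) + "D");
    }
}
```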
cuArray3DGetDescriptor(jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR, jcuda.driver.CUarray)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuArray3DGetDescriptor(CUDA_ARRAY3D_DESCRIPTOR pArrayDescriptor, CUarray hArray)
cuArray3DGetDescriptor(CUDA_ARRAY3D_DESCRIPTOR *pArrayDescriptor, CUarray hArray)
Returns in *pArrayDescriptor a descriptor containing information on the format and dimensions of the CUDA array hArray. It is useful for subroutines that have been passed a CUDA array, but need to know the CUDA array parameters for validation or other purposes.
This function may be called on 1D and 2D arrays, in which case the Height and/or Depth members of the descriptor struct will be set to 0.
cuArray3DCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY3D_DESCRIPTOR)
,
cuArrayCreate(jcuda.driver.CUarray, jcuda.driver.CUDA_ARRAY_DESCRIPTOR)
,
cuArrayDestroy(jcuda.driver.CUarray)
,
cuArrayGetDescriptor(jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUarray)
,
cuMemAlloc(jcuda.driver.CUdeviceptr, int)
,
cuMemAllocHost(jcuda.Pointer, int)
,
cuMemAllocPitch(jcuda.driver.CUdeviceptr, int[], int, int, int)
,
cuMemcpy2D(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy2DAsync(jcuda.driver.CUDA_MEMCPY2D, jcuda.driver.CUstream)
,
cuMemcpy2DUnaligned(jcuda.driver.CUDA_MEMCPY2D)
,
cuMemcpy3D(jcuda.driver.CUDA_MEMCPY3D)
,
cuMemcpy3DAsync(jcuda.driver.CUDA_MEMCPY3D, jcuda.driver.CUstream)
,
cuMemcpyAtoA(jcuda.driver.CUarray, int, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoH(jcuda.Pointer, jcuda.driver.CUarray, int, int)
,
cuMemcpyAtoHAsync(jcuda.Pointer, jcuda.driver.CUarray, int, int, jcuda.driver.CUstream)
,
cuMemcpyDtoA(jcuda.driver.CUarray, int, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoD(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoDAsync(jcuda.driver.CUdeviceptr, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyDtoH(jcuda.Pointer, jcuda.driver.CUdeviceptr, int)
,
cuMemcpyDtoHAsync(jcuda.Pointer, jcuda.driver.CUdeviceptr, int, jcuda.driver.CUstream)
,
cuMemcpyHtoA(jcuda.driver.CUarray, int, jcuda.Pointer, int)
,
cuMemcpyHtoAAsync(jcuda.driver.CUarray, int, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemcpyHtoD(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemcpyHtoDAsync(jcuda.driver.CUdeviceptr, jcuda.Pointer, int, jcuda.driver.CUstream)
,
cuMemFree(jcuda.driver.CUdeviceptr)
,
cuMemFreeHost(jcuda.Pointer)
,
cuMemGetAddressRange(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUdeviceptr)
,
cuMemGetInfo(int[], int[])
,
cuMemHostAlloc(jcuda.Pointer, long, int)
,
cuMemHostGetDevicePointer(jcuda.driver.CUdeviceptr, jcuda.Pointer, int)
,
cuMemsetD2D8(jcuda.driver.CUdeviceptr, int, char, int, int)
,
cuMemsetD2D16(jcuda.driver.CUdeviceptr, int, short, int, int)
,
cuMemsetD2D32(jcuda.driver.CUdeviceptr, int, int, int, int)
,
cuMemsetD8(jcuda.driver.CUdeviceptr, char, int)
,
cuMemsetD16(jcuda.driver.CUdeviceptr, short, int)
,
cuMemsetD32(jcuda.driver.CUdeviceptr, int, int)
public static int cuTexRefCreate(CUtexref pTexRef)
cuTexRefCreate(CUtexref *pTexRef)
Creates a texture reference and returns its handle in *pTexRef. Once created, the application must call cuTexRefSetArray() or cuTexRefSetAddress() to associate the reference with allocated memory. Other texture reference functions are used to specify the format and interpretation (addressing, filtering, etc.) to be used when the memory is read through this texture reference. To associate the texture reference with a texture ordinal for a given function, the application should call cuParamSetTexRef().
cuTexRefDestroy(jcuda.driver.CUtexref)
,
cuTexRefSetAddress(int[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefDestroy(CUtexref hTexRef)
cuTexRefDestroy(CUtexref hTexRef)
Destroys the texture reference specified by hTexRef.
cuTexRefCreate(jcuda.driver.CUtexref)
,
cuTexRefSetAddress(int[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefSetArray(CUtexref hTexRef, CUarray hArray, int Flags)
cuTexRefSetArray(CUtexref hTexRef, CUarray hArray, unsigned int Flags)
Binds the CUDA array hArray to the texture reference hTexRef. Any
previous address or CUDA array state associated with the texture
reference is superseded by this function. Flags must be set to
CU_TRSA_OVERRIDE_FORMAT. Any CUDA array previously bound to hTexRef is
unbound.
cuTexRefCreate(jcuda.driver.CUtexref)
,
cuTexRefDestroy(jcuda.driver.CUtexref)
,
cuTexRefSetAddress(int[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefSetAddress(int[] ByteOffset, CUtexref hTexRef, CUdeviceptr dptr, int bytes)
cuTexRefSetAddress(unsigned int *ByteOffset, CUtexref hTexRef, CUdeviceptr dptr, unsigned int bytes)
Binds a linear address range to the texture reference hTexRef. Any
previous address or CUDA array state associated with the texture
reference is superseded by this function. Any memory previously bound
to hTexRef is unbound.
Since the hardware enforces an alignment requirement on texture base
addresses, cuTexRefSetAddress() passes back a byte offset in
*ByteOffset that must be applied to texture fetches in order to read
from the desired memory. This offset must be divided by the texel size
and passed to kernels that read from the texture, so that it can be
applied to the tex1Dfetch() function.
If the device memory pointer was returned from cuMemAlloc(), the offset
is guaranteed to be 0 and NULL may be passed as the ByteOffset
parameter.
cuTexRefCreate(jcuda.driver.CUtexref)
,
cuTexRefDestroy(jcuda.driver.CUtexref)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
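The offset handling described above for cuTexRefSetAddress() comes down to one division. The following is a minimal sketch in plain Java that does not call JCuda; the class and method names are hypothetical, and rejecting offsets that do not cover whole texels is an assumption made for illustration, not documented behavior.

```java
/** Standalone sketch; a hypothetical helper, not part of JCudaDriver. */
final class TexelOffsetSketch {

    /** Converts the byte offset reported in ByteOffset[0] into a texel offset. */
    static int toTexelOffset(int byteOffset, int texelSizeBytes) {
        if (texelSizeBytes <= 0) {
            throw new IllegalArgumentException("texel size must be positive");
        }
        // Assumption for this sketch: a valid offset covers whole texels.
        if (byteOffset % texelSizeBytes != 0) {
            throw new IllegalArgumentException("offset is not a whole number of texels");
        }
        return byteOffset / texelSizeBytes;
    }

    public static void main(String[] args) {
        // 64 bytes into a texture of 16-byte texels (for example four floats).
        System.out.println(toTexelOffset(64, 16)); // prints 4
    }
}
```

The resulting texel offset is the value a kernel would add to its index before calling tex1Dfetch().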
public static int cuTexRefSetFormat(CUtexref hTexRef, int fmt, int NumPackedComponents)
cuTexRefSetFormat(CUtexref hTexRef, CUarray_format fmt, int NumPackedComponents)
Specifies the format of the data to be read by the texture reference
hTexRef. fmt and NumPackedComponents are exactly analogous to the
Format and NumChannels members of the CUDA_ARRAY_DESCRIPTOR structure:
they specify the format of each component and the number of components
per array element.
cuTexRefCreate(jcuda.driver.CUtexref)
,
cuTexRefDestroy(jcuda.driver.CUtexref)
,
cuTexRefSetAddress(int[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
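Because cuTexRefSetFormat() describes a texel as a component format plus a component count, the texel size in bytes is their product. The sketch below shows that arithmetic in plain Java. The enum is a hypothetical stand-in with a few example component sizes; it is not JCuda's CUarray_format, and real code would pass the integer constants from the binding.

```java
/** Standalone sketch; the enum is a hypothetical stand-in for CUarray_format. */
final class TexelSizeSketch {

    /** Example component formats with their size in bytes (illustrative only). */
    enum ComponentFormat {
        UNSIGNED_INT8(1), UNSIGNED_INT16(2), UNSIGNED_INT32(4), FLOAT32(4);

        final int bytes;

        ComponentFormat(int bytes) {
            this.bytes = bytes;
        }
    }

    /** Bytes per texel: component size times the number of packed components. */
    static int bytesPerTexel(ComponentFormat format, int numPackedComponents) {
        if (numPackedComponents <= 0) {
            throw new IllegalArgumentException("component count must be positive");
        }
        return format.bytes * numPackedComponents;
    }

    public static void main(String[] args) {
        // Four 32-bit float components per texel.
        System.out.println(bytesPerTexel(ComponentFormat.FLOAT32, 4)); // prints 16
    }
}
```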
public static int cuTexRefSetAddress2D(CUtexref hTexRef, CUDA_ARRAY_DESCRIPTOR desc, CUdeviceptr dptr, int PitchInBytes)
cuTexRefSetAddress2D(CUtexref hTexRef, const CUDA_ARRAY_DESCRIPTOR *desc, CUdeviceptr dptr, unsigned int Pitch)
Binds a linear address range to the texture reference hTexRef. Any
previous address or CUDA array state associated with the texture
reference is superseded by this function. Any memory previously bound
to hTexRef is unbound.
Using a tex2D() function inside a kernel requires a call to either cuTexRefSetArray() to bind the corresponding texture reference to an array, or cuTexRefSetAddress2D() to bind the texture reference to linear memory.
Function calls to cuTexRefSetFormat() cannot follow calls to cuTexRefSetAddress2D() for the same texture reference.
It is required that dptr be aligned to the appropriate
hardware-specific texture alignment. You can query this value using
the device attribute CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT. If an
unaligned dptr is supplied, CUDA_ERROR_INVALID_VALUE is returned.
cuTexRefCreate(jcuda.driver.CUtexref)
,
cuTexRefDestroy(jcuda.driver.CUtexref)
,
cuTexRefSetAddress(int[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
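The alignment requirement on dptr for cuTexRefSetAddress2D() can be checked on the host before binding. The sketch below is plain Java on raw address values and does not call JCuda. The helper names are hypothetical, and the alignment of 256 used in main is only an example value; the real figure has to be queried through CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT.

```java
/** Standalone sketch; hypothetical helpers operating on raw address values. */
final class TextureAlignmentSketch {

    /** Returns true if the address is a multiple of the given alignment. */
    static boolean isAligned(long address, int alignment) {
        requirePositive(alignment);
        // Treat the address as unsigned so high addresses are handled correctly.
        return Long.remainderUnsigned(address, alignment) == 0;
    }

    /** Rounds the address up to the next multiple of the alignment. */
    static long alignUp(long address, int alignment) {
        requirePositive(alignment);
        long remainder = Long.remainderUnsigned(address, alignment);
        return remainder == 0 ? address : address + (alignment - remainder);
    }

    private static void requirePositive(int alignment) {
        if (alignment <= 0) {
            throw new IllegalArgumentException("alignment must be positive");
        }
    }

    public static void main(String[] args) {
        int exampleAlignment = 256; // example value only, not a queried attribute
        System.out.println(isAligned(516L, exampleAlignment)); // prints false
        System.out.println(alignUp(516L, exampleAlignment));   // prints 768
    }
}
```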
public static int cuTexRefSetAddressMode(CUtexref hTexRef, int dim, int am)
cuTexRefSetAddressMode(CUtexref hTexRef, int dim, CUaddress_mode am)
Specifies the addressing mode am for the given dimension dim of the
texture reference hTexRef. If dim is zero, the addressing mode is
applied to the first parameter of the functions used to fetch from the
texture; if dim is 1, the second, and so on. CUaddress_mode is defined
as:
typedef enum CUaddress_mode_enum {
    CU_TR_ADDRESS_MODE_WRAP = 0,
    CU_TR_ADDRESS_MODE_CLAMP = 1,
    CU_TR_ADDRESS_MODE_MIRROR = 2,
} CUaddress_mode;
Note that this call has no effect if hTexRef is bound to linear memory.
cuTexRefCreate(jcuda.driver.CUtexref)
,
cuTexRefDestroy(jcuda.driver.CUtexref)
,
cuTexRefSetAddress(int[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
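The three CUaddress_mode values listed for cuTexRefSetAddressMode() decide what happens to coordinates that fall outside the texture. The sketch below illustrates the conventional meaning of wrap, clamp and mirror on integer indices in plain Java. It is an analogy only: the hardware works on floating-point coordinates and its exact rules are not given in this page. The constant values are copied from the enum above; the class and method names are hypothetical.

```java
/** Standalone sketch of wrap, clamp and mirror on integer indices. */
final class AddressModeSketch {
    static final int CU_TR_ADDRESS_MODE_WRAP = 0;
    static final int CU_TR_ADDRESS_MODE_CLAMP = 1;
    static final int CU_TR_ADDRESS_MODE_MIRROR = 2;

    /** Maps an arbitrary index into [0, size) according to the mode. */
    static int resolve(int index, int size, int mode) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        switch (mode) {
            case CU_TR_ADDRESS_MODE_WRAP:
                // Repeat the texture: ..., 0..size-1, 0..size-1, ...
                return Math.floorMod(index, size);
            case CU_TR_ADDRESS_MODE_CLAMP:
                // Stick to the nearest edge.
                return Math.max(0, Math.min(size - 1, index));
            case CU_TR_ADDRESS_MODE_MIRROR: {
                // Reflect at each edge: period is 2 * size.
                int m = Math.floorMod(index, 2 * size);
                return m < size ? m : 2 * size - 1 - m;
            }
            default:
                throw new IllegalArgumentException("unknown mode " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(5, 4, CU_TR_ADDRESS_MODE_WRAP));   // prints 1
        System.out.println(resolve(5, 4, CU_TR_ADDRESS_MODE_CLAMP));  // prints 3
        System.out.println(resolve(5, 4, CU_TR_ADDRESS_MODE_MIRROR)); // prints 2
    }
}
```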
public static int cuTexRefSetFilterMode(CUtexref hTexRef, int fm)
cuTexRefSetFilterMode(CUtexref hTexRef, CUfilter_mode fm)
Specifies the filtering mode fm to be used when reading memory through
the texture reference hTexRef. CUfilter_mode_enum is defined as:
typedef enum CUfilter_mode_enum {
    CU_TR_FILTER_MODE_POINT = 0,
    CU_TR_FILTER_MODE_LINEAR = 1
} CUfilter_mode;
Note that this call has no effect if hTexRef is bound to linear memory.
cuTexRefCreate(jcuda.driver.CUtexref)
,
cuTexRefDestroy(jcuda.driver.CUtexref)
,
cuTexRefSetAddress(int[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefSetFlags(CUtexref hTexRef, int Flags)
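cuTexRefSetFlags(CUtexref hTexRef, unsigned int Flags)
Specifies optional flags via Flags to control the behavior of data
returned through the texture reference hTexRef.
The valid flags are:
CU_TRSF_READ_AS_INTEGER: read the texture as integers rather than promoting the values to floats in the range [0,1].
CU_TRSF_NORMALIZED_COORDINATES: use normalized texture coordinates in the range [0,1) instead of [0,dim).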
cuTexRefCreate(jcuda.driver.CUtexref)
,
cuTexRefDestroy(jcuda.driver.CUtexref)
,
cuTexRefSetAddress(int[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
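The Flags argument of cuTexRefSetFlags() is a bit mask, so several flags are combined with bitwise OR and tested with bitwise AND. The sketch below shows that pattern in plain Java. The two constants are local stand-ins whose numeric values are assumptions of this sketch; real code should use CU_TRSF_READ_AS_INTEGER and CU_TRSF_NORMALIZED_COORDINATES as declared on JCudaDriver.

```java
/** Standalone sketch of combining and testing bit flags. */
final class TexFlagsSketch {
    // Local stand-ins; the numeric values are assumptions of this sketch.
    static final int READ_AS_INTEGER = 0x01;
    static final int NORMALIZED_COORDINATES = 0x02;

    /** Returns true if every bit of flag is set in flags. */
    static boolean isSet(int flags, int flag) {
        return (flags & flag) == flag;
    }

    public static void main(String[] args) {
        int flags = READ_AS_INTEGER | NORMALIZED_COORDINATES;
        System.out.println(isSet(flags, NORMALIZED_COORDINATES)); // prints true
        System.out.println(isSet(READ_AS_INTEGER, NORMALIZED_COORDINATES)); // prints false
    }
}
```

The same test can be applied to the value that cuTexRefGetFlags() writes into pFlags[0].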
public static int cuTexRefGetAddress(CUdeviceptr pdptr, CUtexref hTexRef)
cuTexRefGetAddress(CUdeviceptr *pdptr, CUtexref hTexRef)
Returns in *pdptr the base address bound to the texture reference
hTexRef, or returns CUDA_ERROR_INVALID_VALUE if the texture reference
is not bound to any device memory range.
cuTexRefCreate(jcuda.driver.CUtexref)
,
cuTexRefDestroy(jcuda.driver.CUtexref)
,
cuTexRefSetAddress(int[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefGetArray(CUarray phArray, CUtexref hTexRef)
cuTexRefGetArray(CUarray *phArray, CUtexref hTexRef)
Returns in *phArray the CUDA array bound to the texture reference
hTexRef, or returns CUDA_ERROR_INVALID_VALUE if the texture reference
is not bound to any CUDA array.
cuTexRefCreate(jcuda.driver.CUtexref)
,
cuTexRefDestroy(jcuda.driver.CUtexref)
,
cuTexRefSetAddress(int[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefGetAddressMode(int[] pam, CUtexref hTexRef, int dim)
cuTexRefGetAddressMode(CUaddress_mode *pam, CUtexref hTexRef, int dim)
Returns in *pam the addressing mode corresponding to the dimension dim
of the texture reference hTexRef. Currently, the only valid values for
dim are 0 and 1.
cuTexRefCreate(jcuda.driver.CUtexref)
,
cuTexRefDestroy(jcuda.driver.CUtexref)
,
cuTexRefSetAddress(int[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefGetFilterMode(int[] pfm, CUtexref hTexRef)
cuTexRefGetFilterMode(CUfilter_mode *pfm, CUtexref hTexRef)
Returns in *pfm the filtering mode of the texture reference hTexRef.
cuTexRefCreate(jcuda.driver.CUtexref)
,
cuTexRefDestroy(jcuda.driver.CUtexref)
,
cuTexRefSetAddress(int[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuTexRefGetFormat(int[] pFormat, int[] pNumChannels, CUtexref hTexRef)
cuTexRefGetFormat(CUarray_format *pFormat, int *pNumChannels, CUtexref hTexRef)
Returns in *pFormat and *pNumChannels the format and number of
components of the CUDA array bound to the texture reference hTexRef.
If pFormat or pNumChannels is NULL, it will be ignored.
cuTexRefCreate(jcuda.driver.CUtexref)
,
cuTexRefDestroy(jcuda.driver.CUtexref)
,
cuTexRefSetAddress(int[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFlags(int[], jcuda.driver.CUtexref)
public static int cuTexRefGetFlags(int[] pFlags, CUtexref hTexRef)
cuTexRefGetFlags(unsigned int *pFlags, CUtexref hTexRef)
Returns in *pFlags the flags of the texture reference hTexRef.
cuTexRefCreate(jcuda.driver.CUtexref)
,
cuTexRefDestroy(jcuda.driver.CUtexref)
,
cuTexRefSetAddress(int[], jcuda.driver.CUtexref, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddress2D(jcuda.driver.CUtexref, jcuda.driver.CUDA_ARRAY_DESCRIPTOR, jcuda.driver.CUdeviceptr, int)
,
cuTexRefSetAddressMode(jcuda.driver.CUtexref, int, int)
,
cuTexRefSetArray(jcuda.driver.CUtexref, jcuda.driver.CUarray, int)
,
cuTexRefSetFilterMode(jcuda.driver.CUtexref, int)
,
cuTexRefSetFlags(jcuda.driver.CUtexref, int)
,
cuTexRefSetFormat(jcuda.driver.CUtexref, int, int)
,
cuTexRefGetAddress(jcuda.driver.CUdeviceptr, jcuda.driver.CUtexref)
,
cuTexRefGetAddressMode(int[], jcuda.driver.CUtexref, int)
,
cuTexRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUtexref)
,
cuTexRefGetFilterMode(int[], jcuda.driver.CUtexref)
,
cuTexRefGetFormat(int[], int[], jcuda.driver.CUtexref)
public static int cuSurfRefSetArray(CUsurfref hSurfRef, CUarray hArray, int Flags)
cuSurfRefSetArray(CUsurfref hSurfRef, CUarray hArray, unsigned int Flags)
Sets the CUDA array hArray to be read and written by the surface
reference hSurfRef. Any previous CUDA array state associated with the
surface reference is superseded by this function. Flags must be set to
0. The CUDA_ARRAY3D_SURFACE_LDST flag must have been set for the CUDA
array. Any CUDA array previously bound to hSurfRef is unbound.
cuModuleGetSurfRef(jcuda.driver.CUsurfref, jcuda.driver.CUmodule, java.lang.String)
,
cuSurfRefGetArray(jcuda.driver.CUarray, jcuda.driver.CUsurfref)
public static int cuSurfRefGetArray(CUarray phArray, CUsurfref hSurfRef)
cuSurfRefGetArray(CUarray *phArray, CUsurfref hSurfRef)
Returns in *phArray the CUDA array bound to the surface reference
hSurfRef, or returns CUDA_ERROR_INVALID_VALUE if the surface reference
is not bound to any CUDA array.
cuModuleGetSurfRef(jcuda.driver.CUsurfref, jcuda.driver.CUmodule, java.lang.String)
,
cuSurfRefSetArray(jcuda.driver.CUsurfref, jcuda.driver.CUarray, int)
public static int cuParamSetSize(CUfunction hfunc, int numbytes)
cuParamSetSize(CUfunction hfunc, unsigned int numbytes)
Sets through numbytes the total size in bytes needed by the function
parameters of the kernel corresponding to hfunc.
cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int)
,
cuFuncSetSharedSize(jcuda.driver.CUfunction, int)
,
cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction)
,
cuParamSetf(jcuda.driver.CUfunction, int, float)
,
cuParamSeti(jcuda.driver.CUfunction, int, int)
,
cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int)
,
cuParamSetTexRef(jcuda.driver.CUfunction, int, jcuda.driver.CUtexref)
,
cuLaunch(jcuda.driver.CUfunction)
,
cuLaunchGrid(jcuda.driver.CUfunction, int, int)
,
cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream)
public static int cuParamSeti(CUfunction hfunc, int offset, int value)
cuParamSeti(CUfunction hfunc, int offset, unsigned int value)
Sets an integer parameter that will be passed the next time the kernel
corresponding to hfunc is invoked. offset is a byte offset.
cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int)
,
cuFuncSetSharedSize(jcuda.driver.CUfunction, int)
,
cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction)
,
cuParamSetSize(jcuda.driver.CUfunction, int)
,
cuParamSetf(jcuda.driver.CUfunction, int, float)
,
cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int)
,
cuParamSetTexRef(jcuda.driver.CUfunction, int, jcuda.driver.CUtexref)
,
cuLaunch(jcuda.driver.CUfunction)
,
cuLaunchGrid(jcuda.driver.CUfunction, int, int)
,
cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream)
public static int cuParamSetf(CUfunction hfunc, int offset, float value)
cuParamSetf(CUfunction hfunc, int offset, float value)
Sets a floating-point parameter that will be passed the next time the
kernel corresponding to hfunc is invoked. offset is a byte offset.
cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int)
,
cuFuncSetSharedSize(jcuda.driver.CUfunction, int)
,
cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction)
,
cuParamSetSize(jcuda.driver.CUfunction, int)
,
cuParamSeti(jcuda.driver.CUfunction, int, int)
,
cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int)
,
cuParamSetTexRef(jcuda.driver.CUfunction, int, jcuda.driver.CUtexref)
,
cuLaunch(jcuda.driver.CUfunction)
,
cuLaunchGrid(jcuda.driver.CUfunction, int, int)
,
cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream)
public static int cuParamSetv(CUfunction hfunc, int offset, Pointer ptr, int numbytes)
cuParamSetv(CUfunction hfunc, int offset, void *ptr, unsigned int numbytes)
Copies an arbitrary amount of data (specified in numbytes) from ptr
into the parameter space of the kernel corresponding to hfunc. offset
is a byte offset.
cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int)
,
cuFuncSetSharedSize(jcuda.driver.CUfunction, int)
,
cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction)
,
cuParamSetSize(jcuda.driver.CUfunction, int)
,
cuParamSetf(jcuda.driver.CUfunction, int, float)
,
cuParamSeti(jcuda.driver.CUfunction, int, int)
,
cuParamSetTexRef(jcuda.driver.CUfunction, int, jcuda.driver.CUtexref)
,
cuLaunch(jcuda.driver.CUfunction)
,
cuLaunchGrid(jcuda.driver.CUfunction, int, int)
,
cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream)
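cuParamSeti(), cuParamSetf() and cuParamSetv() each take a byte offset, and cuParamSetSize() takes the total size, so the caller has to lay out the parameter block. The sketch below shows one way to track those offsets in plain Java without calling JCuda. The class is hypothetical, its alignUp helper is a local stand-in similar in intent to the align method listed in the method summary, and the sizes and alignments in main (4-byte int and float, 8-byte pointer) are assumptions about one particular kernel signature.

```java
/** Standalone sketch; a hypothetical helper for laying out kernel parameters. */
final class ParamLayoutSketch {
    private int nextOffset = 0;

    /** Rounds value up to a multiple of alignment (a power of two). */
    static int alignUp(int value, int alignment) {
        if (alignment <= 0 || Integer.bitCount(alignment) != 1) {
            throw new IllegalArgumentException("alignment must be a power of two");
        }
        return (value + alignment - 1) & ~(alignment - 1);
    }

    /** Reserves space for one parameter and returns its byte offset. */
    int add(int sizeBytes, int alignmentBytes) {
        int offset = alignUp(nextOffset, alignmentBytes);
        nextOffset = offset + sizeBytes;
        return offset;
    }

    /** Total size of the parameter block so far. */
    int totalSize() {
        return nextOffset;
    }

    public static void main(String[] args) {
        ParamLayoutSketch layout = new ParamLayoutSketch();
        int intOffset = layout.add(4, 4);     // offset for an int parameter
        int pointerOffset = layout.add(8, 8); // offset for an 8-byte pointer
        int floatOffset = layout.add(4, 4);   // offset for a float parameter
        System.out.println(intOffset + " " + pointerOffset + " "
                + floatOffset + " " + layout.totalSize()); // prints 0 8 16 20
    }
}
```

Each returned offset would be passed as the offset argument of the matching cuParamSet call, and totalSize() as numbytes to cuParamSetSize().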
public static int cuParamSetTexRef(CUfunction hfunc, int texunit, CUtexref hTexRef)
cuParamSetTexRef(CUfunction hfunc, int texunit, CUtexref hTexRef)
Makes the CUDA array or linear memory bound to the texture reference
hTexRef available to a device program as a texture. In this version of
CUDA, the texture reference must be obtained via cuModuleGetTexRef()
and the texunit parameter must be set to CU_PARAM_TR_DEFAULT.
cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int)
,
cuFuncSetSharedSize(jcuda.driver.CUfunction, int)
,
cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction)
,
cuParamSetSize(jcuda.driver.CUfunction, int)
,
cuParamSetf(jcuda.driver.CUfunction, int, float)
,
cuParamSeti(jcuda.driver.CUfunction, int, int)
,
cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int)
,
cuLaunch(jcuda.driver.CUfunction)
,
cuLaunchGrid(jcuda.driver.CUfunction, int, int)
,
cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream)
public static int cuLaunch(CUfunction f)
cuLaunch(CUfunction f)
Invokes the kernel f on a 1 x 1 x 1 grid of blocks. The block contains
the number of threads specified by a previous call to
cuFuncSetBlockShape().
cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int)
,
cuFuncSetSharedSize(jcuda.driver.CUfunction, int)
,
cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction)
,
cuParamSetSize(jcuda.driver.CUfunction, int)
,
cuParamSetf(jcuda.driver.CUfunction, int, float)
,
cuParamSeti(jcuda.driver.CUfunction, int, int)
,
cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int)
,
cuParamSetTexRef(jcuda.driver.CUfunction, int, jcuda.driver.CUtexref)
,
cuLaunchGrid(jcuda.driver.CUfunction, int, int)
,
cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream)
public static int cuLaunchGrid(CUfunction f, int grid_width, int grid_height)
cuLaunchGrid(CUfunction f, int grid_width, int grid_height)
Invokes the kernel f on a grid_width x grid_height grid of blocks.
Each block contains the number of threads specified by a previous call
to cuFuncSetBlockShape().
cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int)
,
cuFuncSetSharedSize(jcuda.driver.CUfunction, int)
,
cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction)
,
cuParamSetSize(jcuda.driver.CUfunction, int)
,
cuParamSetf(jcuda.driver.CUfunction, int, float)
,
cuParamSeti(jcuda.driver.CUfunction, int, int)
,
cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int)
,
cuParamSetTexRef(jcuda.driver.CUfunction, int, jcuda.driver.CUtexref)
,
cuLaunch(jcuda.driver.CUfunction)
,
cuLaunchGridAsync(jcuda.driver.CUfunction, int, int, jcuda.driver.CUstream)
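Since cuLaunchGrid() takes the grid size in blocks and the block shape is fixed beforehand, the caller usually derives grid_width from the problem size with a rounded-up division. The sketch below shows that calculation in plain Java; the class and method names are hypothetical, and it assumes a one-dimensional problem where each thread handles one element.

```java
/** Standalone sketch; a hypothetical helper for sizing a 1D grid. */
final class GridSizeSketch {

    /** Number of blocks needed so that blocks * threadsPerBlock >= elements. */
    static int blocksFor(int elements, int threadsPerBlock) {
        if (elements < 0 || threadsPerBlock <= 0) {
            throw new IllegalArgumentException("invalid problem or block size");
        }
        // Use long arithmetic so the rounding term cannot overflow an int.
        return (int) (((long) elements + threadsPerBlock - 1) / threadsPerBlock);
    }

    public static void main(String[] args) {
        System.out.println(blocksFor(1000, 256)); // prints 4
        System.out.println(blocksFor(1025, 256)); // prints 5
    }
}
```

With this sizing the last block may contain threads past the end of the data, so the kernel needs its own bounds check.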
public static int cuLaunchGridAsync(CUfunction f, int grid_width, int grid_height, CUstream hStream)
cuLaunchGridAsync(CUfunction f, int grid_width, int grid_height, CUstream hStream)
Invokes the kernel f on a grid_width x grid_height grid of blocks.
Each block contains the number of threads specified by a previous call
to cuFuncSetBlockShape().
cuLaunchGridAsync() can optionally be associated with a stream by
passing a non-zero hStream argument.
cuFuncSetBlockShape(jcuda.driver.CUfunction, int, int, int)
,
cuFuncSetSharedSize(jcuda.driver.CUfunction, int)
,
cuFuncGetAttribute(int[], int, jcuda.driver.CUfunction)
,
cuParamSetSize(jcuda.driver.CUfunction, int)
,
cuParamSetf(jcuda.driver.CUfunction, int, float)
,
cuParamSeti(jcuda.driver.CUfunction, int, int)
,
cuParamSetv(jcuda.driver.CUfunction, int, jcuda.Pointer, int)
,
cuParamSetTexRef(jcuda.driver.CUfunction, int, jcuda.driver.CUtexref)
,
cuLaunch(jcuda.driver.CUfunction)
,
cuLaunchGrid(jcuda.driver.CUfunction, int, int)
public static int cuEventCreate(CUevent phEvent, int Flags)
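cuEventCreate(CUevent *phEvent, unsigned int Flags)
Creates an event *phEvent with the flags specified via Flags.
Valid flags include CU_EVENT_BLOCKING_SYNC, which causes a CPU thread
that waits on the event with cuEventSynchronize() to block until the
event has actually been recorded.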
cuEventRecord(jcuda.driver.CUevent, jcuda.driver.CUstream)
,
cuEventQuery(jcuda.driver.CUevent)
,
cuEventSynchronize(jcuda.driver.CUevent)
,
cuEventDestroy(jcuda.driver.CUevent)
,
cuEventElapsedTime(float[], jcuda.driver.CUevent, jcuda.driver.CUevent)
public static int cuEventRecord(CUevent hEvent, CUstream hStream)
cuEventRecord(CUevent hEvent, CUstream hStream)
Records an event. If hStream is non-zero, the event is recorded after
all preceding operations in the stream have been completed; otherwise,
it is recorded after all preceding operations in the CUDA context have
been completed. Since this operation is asynchronous, cuEventQuery()
and/or cuEventSynchronize() must be used to determine when the event
has actually been recorded.
If cuEventRecord() has previously been called and the event has not been recorded yet, this function returns CUDA_ERROR_INVALID_VALUE.
cuEventCreate(jcuda.driver.CUevent, int)
,
cuEventQuery(jcuda.driver.CUevent)
,
cuEventSynchronize(jcuda.driver.CUevent)
,
cuEventDestroy(jcuda.driver.CUevent)
,
cuEventElapsedTime(float[], jcuda.driver.CUevent, jcuda.driver.CUevent)
public static int cuEventQuery(CUevent hEvent)
cuEventQuery(CUevent hEvent)
Returns CUDA_SUCCESS if the event has actually been recorded, or CUDA_ERROR_NOT_READY if not. If cuEventRecord() has not been called on this event, the function returns CUDA_ERROR_INVALID_VALUE.
cuEventCreate(jcuda.driver.CUevent, int)
,
cuEventRecord(jcuda.driver.CUevent, jcuda.driver.CUstream)
,
cuEventSynchronize(jcuda.driver.CUevent)
,
cuEventDestroy(jcuda.driver.CUevent)
,
cuEventElapsedTime(float[], jcuda.driver.CUevent, jcuda.driver.CUevent)
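Because cuEventQuery() distinguishes "not ready" from success and from errors, a caller that polls should stop on anything other than "not ready". The sketch below shows that loop shape in plain Java against a stubbed query, without calling JCuda. The two status constants are local stand-ins with made-up values, not the real CUDA_SUCCESS and CUDA_ERROR_NOT_READY codes, and the attempt limit is an assumption added so the loop cannot spin forever.

```java
/** Standalone sketch of a bounded polling loop over a status query. */
final class EventPollSketch {
    // Local stand-ins with made-up values; not the real CUresult codes.
    static final int SUCCESS = 0;
    static final int NOT_READY = 1;

    /**
     * Calls query until it returns something other than NOT_READY or the
     * attempt limit is reached, and returns the last status seen.
     */
    static int pollUntilDone(java.util.function.IntSupplier query, int maxAttempts) {
        int status = NOT_READY;
        for (int i = 0; i < maxAttempts && status == NOT_READY; i++) {
            status = query.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stub that reports NOT_READY twice and then SUCCESS.
        int status = pollUntilDone(() -> ++calls[0] < 3 ? NOT_READY : SUCCESS, 10);
        System.out.println(status + " after " + calls[0] + " calls"); // prints 0 after 3 calls
    }
}
```

In real code the supplier would wrap a call to cuEventQuery(), and any status other than the two above would be treated as an error.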
public static int cuEventSynchronize(CUevent hEvent)
cuEventSynchronize(CUevent hEvent)
Waits until the event has actually been recorded. If cuEventRecord() has not been called on this event, the function returns CUDA_ERROR_INVALID_VALUE. Waiting for an event that was created with the CU_EVENT_BLOCKING_SYNC flag will cause the calling CPU thread to block until the event has actually been recorded.
cuEventCreate(jcuda.driver.CUevent, int)
,
cuEventRecord(jcuda.driver.CUevent, jcuda.driver.CUstream)
,
cuEventQuery(jcuda.driver.CUevent)
,
cuEventDestroy(jcuda.driver.CUevent)
,
cuEventElapsedTime(float[], jcuda.driver.CUevent, jcuda.driver.CUevent)
public static int cuEventDestroy(CUevent hEvent)
cuEventDestroy(CUevent hEvent)
Destroys the event specified by hEvent.
cuEventCreate(jcuda.driver.CUevent, int)
,
cuEventRecord(jcuda.driver.CUevent, jcuda.driver.CUstream)
,
cuEventQuery(jcuda.driver.CUevent)
,
cuEventSynchronize(jcuda.driver.CUevent)
,
cuEventElapsedTime(float[], jcuda.driver.CUevent, jcuda.driver.CUevent)
public static int cuEventElapsedTime(float[] pMilliseconds, CUevent hStart, CUevent hEnd)
cuEventElapsedTime(float *pMilliseconds, CUevent hStart, CUevent hEnd)
Computes the elapsed time between two events (in milliseconds with a resolution of around 0.5 microseconds). If either event has not been recorded yet, this function returns CUDA_ERROR_NOT_READY. If either event has been recorded with a non-zero stream, the result is undefined.
cuEventCreate(jcuda.driver.CUevent, int)
,
cuEventRecord(jcuda.driver.CUevent, jcuda.driver.CUstream)
,
cuEventQuery(jcuda.driver.CUevent)
,
cuEventSynchronize(jcuda.driver.CUevent)
,
cuEventDestroy(jcuda.driver.CUevent)
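cuEventElapsedTime() reports its result in milliseconds as a float, so deriving a rate from it needs a unit conversion. The sketch below shows that arithmetic in plain Java; the class and method names are hypothetical, and using decimal gigabytes (10^9 bytes) is a choice made for this example.

```java
/** Standalone sketch; converts an elapsed time in milliseconds to a rate. */
final class ElapsedTimeSketch {

    /** Throughput in decimal gigabytes per second for bytes moved in elapsedMs. */
    static double gigabytesPerSecond(long bytes, float elapsedMs) {
        if (bytes < 0 || !(elapsedMs > 0f)) {
            throw new IllegalArgumentException("need non-negative bytes and positive time");
        }
        double seconds = elapsedMs / 1000.0; // milliseconds to seconds
        return (bytes / 1.0e9) / seconds;
    }

    public static void main(String[] args) {
        // One decimal gigabyte moved in 500 ms.
        System.out.println(gigabytesPerSecond(1_000_000_000L, 500f)); // prints 2.0
    }
}
```

The elapsedMs argument corresponds to pMilliseconds[0] after a successful call.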
public static int cuStreamCreate(CUstream phStream, int Flags)
cuStreamCreate(CUstream *phStream, unsigned int Flags)
Creates a stream and returns a handle in phStream. Flags is required
to be 0.
cuStreamDestroy(jcuda.driver.CUstream)
,
cuStreamQuery(jcuda.driver.CUstream)
,
cuStreamSynchronize(jcuda.driver.CUstream)
public static int cuStreamQuery(CUstream hStream)
cuStreamQuery(CUstream hStream)
Returns CUDA_SUCCESS if all operations in the stream specified by
hStream have completed, or CUDA_ERROR_NOT_READY if not.
cuStreamCreate(jcuda.driver.CUstream, int)
,
cuStreamDestroy(jcuda.driver.CUstream)
,
cuStreamSynchronize(jcuda.driver.CUstream)
public static int cuStreamSynchronize(CUstream hStream)
cuStreamSynchronize(CUstream hStream)
Waits until the device has completed all operations in the stream
specified by hStream. If the context was created with the
CU_CTX_BLOCKING_SYNC flag, the CPU thread will block until the stream
is finished with all of its tasks.
cuStreamCreate(jcuda.driver.CUstream, int)
,
cuStreamDestroy(jcuda.driver.CUstream)
,
cuStreamQuery(jcuda.driver.CUstream)
public static int cuStreamDestroy(CUstream hStream)
cuStreamDestroy(CUstream hStream)
Destroys the stream specified by hStream.
cuStreamCreate(jcuda.driver.CUstream, int)
,
cuStreamQuery(jcuda.driver.CUstream)
,
cuStreamSynchronize(jcuda.driver.CUstream)
public static int cuGLInit()
cuGLInit(void)
cuGLCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice)
,
cuGLMapBufferObject(jcuda.driver.CUdeviceptr, int[], int)
,
cuGLRegisterBufferObject(int)
,
cuGLUnmapBufferObject(int)
,
cuGLUnregisterBufferObject(int)
,
cuGLMapBufferObjectAsync(jcuda.driver.CUdeviceptr, int[], int, jcuda.driver.CUstream)
,
cuGLUnmapBufferObjectAsync(int, jcuda.driver.CUstream)
,
cuGLSetBufferObjectMapFlags(int, int)
public static int cuGLCtxCreate(CUcontext pCtx, int Flags, CUdevice device)
cuGLCtxCreate(CUcontext *pCtx, unsigned int Flags, CUdevice device)
Creates a new CUDA context, initializes OpenGL interoperability, and
associates the CUDA context with the calling thread. It must be called
before performing any other OpenGL interoperability operations. It may
fail if the needed OpenGL driver facilities are not available. For
usage of the Flags parameter, see cuCtxCreate().
cuCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice)
,
cuGLInit()
,
cuGLMapBufferObject(jcuda.driver.CUdeviceptr, int[], int)
,
cuGLRegisterBufferObject(int)
,
cuGLUnmapBufferObject(int)
,
cuGLUnregisterBufferObject(int)
,
cuGLMapBufferObjectAsync(jcuda.driver.CUdeviceptr, int[], int, jcuda.driver.CUstream)
,
cuGLUnmapBufferObjectAsync(int, jcuda.driver.CUstream)
,
cuGLSetBufferObjectMapFlags(int, int)
public static int cuGraphicsGLRegisterBuffer(CUgraphicsResource pCudaResource, int buffer, int Flags)
cuGraphicsGLRegisterBuffer(CUgraphicsResource *pCudaResource, GLuint buffer, unsigned int Flags)
Registers the buffer object specified by buffer for access by CUDA. A handle to the registered object is returned as pCudaResource. The map flags Flags specify the intended usage, as follows:
See Also: cuGLCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuGraphicsUnregisterResource(jcuda.driver.CUgraphicsResource), cuGraphicsMapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream), cuGraphicsResourceGetMappedPointer(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUgraphicsResource)
public static int cuGraphicsGLRegisterImage(CUgraphicsResource pCudaResource, int image, int target, int Flags)
cuGraphicsGLRegisterImage(CUgraphicsResource *pCudaResource, GLuint image, GLenum target, unsigned int Flags)
Registers the texture or renderbuffer object specified by image for access by CUDA. target must match the type of the object. A handle to the registered object is returned as pCudaResource. The map flags Flags specify the intended usage, as follows:
The following image classes are currently disallowed:
See Also: cuGLCtxCreate(jcuda.driver.CUcontext, int, jcuda.driver.CUdevice), cuGraphicsUnregisterResource(jcuda.driver.CUgraphicsResource), cuGraphicsMapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream), cuGraphicsSubResourceGetMappedArray(jcuda.driver.CUarray, jcuda.driver.CUgraphicsResource, int, int)
public static int cuGLRegisterBufferObject(int bufferobj)
cuGLRegisterBufferObject(GLuint buffer)
Registers the buffer object specified by buffer for access by CUDA. This function must be called before CUDA can map the buffer object. There must be a valid OpenGL context bound to the current thread when this function is called, and the buffer name is resolved by that context.
See Also: cuGraphicsGLRegisterBuffer(jcuda.driver.CUgraphicsResource, int, int)
public static int cuGLMapBufferObject(CUdeviceptr dptr, int[] size, int bufferobj)
cuGLMapBufferObject(CUdeviceptr *dptr, unsigned int *size, GLuint buffer)
Maps the buffer object specified by buffer into the address space of the current CUDA context and returns in *dptr and *size the base pointer and size of the resulting mapping.
There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.
All streams in the current CUDA context are synchronized with the current GL context.
See Also: cuGraphicsMapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream)
public static int cuGLUnmapBufferObject(int bufferobj)
cuGLUnmapBufferObject(GLuint buffer)
Unmaps the buffer object specified by buffer for access by CUDA.
There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.
All streams in the current CUDA context are synchronized with the current GL context.
See Also: cuGraphicsUnmapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream)
public static int cuGLUnregisterBufferObject(int bufferobj)
cuGLUnregisterBufferObject(GLuint buffer)
Unregisters the buffer object specified by buffer. This releases any resources associated with the registered buffer. After this call, the buffer may no longer be mapped for access by CUDA.
There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.
See Also: cuGraphicsUnregisterResource(jcuda.driver.CUgraphicsResource)
public static int cuGLSetBufferObjectMapFlags(int buffer, int Flags)
cuGLSetBufferObjectMapFlags(GLuint buffer, unsigned int Flags)
Sets the map flags for the buffer object specified by buffer.
Changes to Flags will take effect the next time buffer is mapped. The Flags argument may be any of the following:
If buffer has not been registered for use with CUDA, then CUDA_ERROR_INVALID_HANDLE is returned. If buffer is presently mapped for access by CUDA, then CUDA_ERROR_ALREADY_MAPPED is returned.
There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.
See Also: cuGraphicsResourceSetMapFlags(jcuda.driver.CUgraphicsResource, int)
public static int cuGLMapBufferObjectAsync(CUdeviceptr dptr, int[] size, int buffer, CUstream hStream)
cuGLMapBufferObjectAsync(CUdeviceptr *dptr, unsigned int *size, GLuint buffer, CUstream hStream)
Maps the buffer object specified by buffer into the address space of the current CUDA context and returns in *dptr and *size the base pointer and size of the resulting mapping.
There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.
Stream hStream in the current CUDA context is synchronized with the current GL context.
See Also: cuGraphicsMapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream)
public static int cuGLUnmapBufferObjectAsync(int buffer, CUstream hStream)
cuGLUnmapBufferObjectAsync(GLuint buffer, CUstream hStream)
Unmaps the buffer object specified by buffer for access by CUDA.
There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.
Stream hStream in the current CUDA context is synchronized with the current GL context.
See Also: cuGraphicsUnmapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream)
public static int cuGraphicsUnregisterResource(CUgraphicsResource resource)
cuGraphicsUnregisterResource(CUgraphicsResource resource)
Unregisters the graphics resource resource so it is not accessible by CUDA unless registered again.
If resource is invalid then CUDA_ERROR_INVALID_HANDLE is returned.
See Also: cuGraphicsGLRegisterBuffer(jcuda.driver.CUgraphicsResource, int, int), cuGraphicsGLRegisterImage(jcuda.driver.CUgraphicsResource, int, int, int)
public static int cuGraphicsSubResourceGetMappedArray(CUarray pArray, CUgraphicsResource resource, int arrayIndex, int mipLevel)
cuGraphicsSubResourceGetMappedArray(CUarray *pArray, CUgraphicsResource resource, unsigned int arrayIndex, unsigned int mipLevel)
Returns in *pArray an array through which the subresource of the mapped graphics resource resource which corresponds to array index arrayIndex and mipmap level mipLevel may be accessed. The value set in *pArray may change every time that resource is mapped.
If resource is not a texture then it cannot be accessed via an array and CUDA_ERROR_NOT_MAPPED_AS_ARRAY is returned. If arrayIndex is not a valid array index for resource then CUDA_ERROR_INVALID_VALUE is returned. If mipLevel is not a valid mipmap level for resource then CUDA_ERROR_INVALID_VALUE is returned. If resource is not mapped then CUDA_ERROR_NOT_MAPPED is returned.
See Also: cuGraphicsResourceGetMappedPointer(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUgraphicsResource)
public static int cuGraphicsResourceGetMappedPointer(CUdeviceptr pDevPtr, int[] pSize, CUgraphicsResource resource)
cuGraphicsResourceGetMappedPointer(CUdeviceptr *pDevPtr, unsigned int *pSize, CUgraphicsResource resource)
Returns in *pDevPtr a pointer through which the mapped graphics resource resource may be accessed. Returns in *pSize the size in bytes of the memory which may be accessed from that pointer. The value set in *pDevPtr may change every time that resource is mapped.
If resource is not a buffer then it cannot be accessed via a pointer and CUDA_ERROR_NOT_MAPPED_AS_POINTER is returned. If resource is not mapped then CUDA_ERROR_NOT_MAPPED is returned.
See Also: cuGraphicsMapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream), cuGraphicsSubResourceGetMappedArray(jcuda.driver.CUarray, jcuda.driver.CUgraphicsResource, int, int)
public static int cuGraphicsResourceSetMapFlags(CUgraphicsResource resource, int flags)
cuGraphicsResourceSetMapFlags(CUgraphicsResource resource, unsigned int flags)
Sets flags for mapping the graphics resource resource.
Changes to flags will take effect the next time resource is mapped. The flags argument may be any of the following:
If resource is presently mapped for access by CUDA then CUDA_ERROR_ALREADY_MAPPED is returned. If flags is not one of the above values then CUDA_ERROR_INVALID_VALUE is returned.
See Also: cuGraphicsMapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream)
public static int cuGraphicsMapResources(int count, CUgraphicsResource[] resources, CUstream hStream)
cuGraphicsMapResources(unsigned int count, CUgraphicsResource *resources, CUstream hStream)
Maps the count graphics resources in resources for access by CUDA.
The resources in resources may be accessed by CUDA until they are unmapped. The graphics API from which resources were registered should not access any resources while they are mapped by CUDA. If an application does so, the results are undefined.
This function provides the synchronization guarantee that any graphics calls issued before cuGraphicsMapResources() will complete before any subsequent CUDA work issued in hStream begins.
If resources includes any duplicate entries then CUDA_ERROR_INVALID_HANDLE is returned. If any of resources are presently mapped for access by CUDA then CUDA_ERROR_ALREADY_MAPPED is returned.
See Also: cuGraphicsResourceGetMappedPointer(jcuda.driver.CUdeviceptr, int[], jcuda.driver.CUgraphicsResource)
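Because duplicate entries in resources are rejected with CUDA_ERROR_INVALID_HANDLE, a caller may want to check the array on the Java side first. The sketch below is one hedged way to do that and is not part of JCuda. The class DuplicateCheckSketch and the method hasDuplicates are hypothetical. It compares Java object references only, on the assumption that each registered resource is represented by a single Java object. Two distinct objects wrapping the same native handle would not be detected.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class DuplicateCheckSketch {
    /**
     * Returns true if the same object reference occurs more than once
     * in the array. Comparison is by identity (==), not equals().
     */
    static boolean hasDuplicates(Object[] resources) {
        Set<Object> seen =
                Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        for (Object resource : resources) {
            if (!seen.add(resource)) {
                return true; // this reference was already in the set
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Plain Objects stand in for CUgraphicsResource instances.
        Object a = new Object();
        Object b = new Object();
        System.out.println(hasDuplicates(new Object[] {a, b}));
        System.out.println(hasDuplicates(new Object[] {a, b, a}));
    }
}
```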
public static int cuGraphicsUnmapResources(int count, CUgraphicsResource[] resources, CUstream hStream)
cuGraphicsUnmapResources(unsigned int count, CUgraphicsResource *resources, CUstream hStream)
Unmaps the count graphics resources in resources.
Once unmapped, the resources in resources may not be accessed by CUDA until they are mapped again.
This function provides the synchronization guarantee that any CUDA work issued in hStream before cuGraphicsUnmapResources() will complete before any subsequently issued graphics work begins.
If resources includes any duplicate entries then CUDA_ERROR_INVALID_HANDLE is returned. If any of resources are not presently mapped for access by CUDA then CUDA_ERROR_NOT_MAPPED is returned.
See Also: cuGraphicsMapResources(int, jcuda.driver.CUgraphicsResource[], jcuda.driver.CUstream)
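Since mapped resources must not be touched by the graphics API until they are unmapped, it can help to pair the map and unmap calls so that the unmap always runs. The following is a minimal sketch of that pairing under stated assumptions and is not part of JCuda. The class MapUnmapSketch, the Interop interface, and the method withMapped are hypothetical. In real code, map() and unmap() would wrap cuGraphicsMapResources and cuGraphicsUnmapResources with the same count, resources and hStream arguments.

```java
import java.util.ArrayList;
import java.util.List;

final class MapUnmapSketch {
    static final int SUCCESS = 0; // CUDA_SUCCESS

    /** Hypothetical stand-in for the two driver calls. */
    interface Interop {
        int map();   // would wrap cuGraphicsMapResources(...)
        int unmap(); // would wrap cuGraphicsUnmapResources(...)
    }

    /**
     * Maps, runs the CUDA work, and always unmaps afterwards, even if
     * the work throws. Returns true only if both map and unmap succeed.
     */
    static boolean withMapped(Interop interop, Runnable cudaWork) {
        if (interop.map() != SUCCESS) {
            return false; // nothing was mapped, so nothing to unmap
        }
        boolean unmapped = false;
        try {
            cudaWork.run();
        } finally {
            unmapped = interop.unmap() == SUCCESS;
        }
        return unmapped;
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        Interop fake = new Interop() {
            @Override public int map() { events.add("map"); return SUCCESS; }
            @Override public int unmap() { events.add("unmap"); return SUCCESS; }
        };
        boolean ok = withMapped(fake, () -> events.add("work"));
        System.out.println(events + " ok=" + ok);
    }
}
```

The try/finally form is a design choice for the sketch: it keeps the resources from staying mapped when the CUDA work fails on the host side.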
public static int cuCtxSetLimit(int limit, long value)
cuCtxSetLimit(CUlimit limit, size_t value)
Setting limit to value is a request by the application to update the current limit maintained by the context. The driver is free to modify the requested value to meet hardware requirements (for example by clamping to minimum or maximum values, or rounding up to the nearest element size). The application can use cuCtxGetLimit() to find out exactly what the limit has been set to.
Setting each CUlimit has its own specific restrictions, so each is discussed here.
See Also: cuCtxGetLimit(long[], int)
public static int cuCtxGetLimit(long[] pvalue, int limit)
cuCtxGetLimit(size_t *pvalue, CUlimit limit)
Returns in *pvalue the current size of limit.
The supported CUlimit values are:
See Also: cuCtxSetLimit(int, long)
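Because the driver may adjust a requested limit, a set call is usually followed by a get call to learn the value actually in effect. The sketch below illustrates that pattern under stated assumptions and is not part of JCuda. The class LimitReadbackSketch, the LimitApi interface, and the RoundingFake class are hypothetical. The two interface methods mirror the parameter order of cuCtxSetLimit(int, long) and cuCtxGetLimit(long[], int). The rounding to a multiple of 1024 is invented for the demonstration, and the limit id 0 is a placeholder.

```java
final class LimitReadbackSketch {
    static final int SUCCESS = 0; // CUDA_SUCCESS

    /** Hypothetical stand-in mirroring cuCtxSetLimit and cuCtxGetLimit. */
    interface LimitApi {
        int setLimit(int limit, long value);
        int getLimit(long[] pvalue, int limit);
    }

    /** Requests a limit and returns the value the driver actually applied. */
    static long requestLimit(LimitApi api, int limit, long requested) {
        if (api.setLimit(limit, requested) != SUCCESS) {
            throw new IllegalStateException("setLimit failed");
        }
        long[] actual = new long[1];
        if (api.getLimit(actual, limit) != SUCCESS) {
            throw new IllegalStateException("getLimit failed");
        }
        return actual[0];
    }

    /** Toy driver that rounds every request up to a multiple of 1024. */
    static final class RoundingFake implements LimitApi {
        private long stored;

        @Override public int setLimit(int limit, long value) {
            stored = (value + 1023) / 1024 * 1024;
            return SUCCESS;
        }

        @Override public int getLimit(long[] pvalue, int limit) {
            pvalue[0] = stored;
            return SUCCESS;
        }
    }

    public static void main(String[] args) {
        long granted = requestLimit(new RoundingFake(), 0, 3000L);
        System.out.println("requested 3000, granted " + granted);
    }
}
```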