GpuAccelerationConfig.GpuInferencePriority


public static final enum GpuAccelerationConfig.GpuInferencePriority extends Enum<GpuAccelerationConfig.GpuInferencePriority>

Relative priorities given by the GPU delegate to different client needs. Ordered priorities provide better control over the desired semantics, where priority(n) is more important than priority(n+1). Each time the inference engine needs to make a decision, it uses the ordered priorities to do so.

For example: GPU_PRIORITY_MAX_PRECISION at priority(1) would not allow precision to be decreased, but moving it to priority(2) or priority(3) would allow F16 calculation.

GPU_PRIORITY_AUTO can only be used when all higher (earlier) priorities are fully specified.
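As an illustration of the ordering semantics described above, the following sketch models one decision the engine might make: whether reduced (F16) precision is acceptable. The enum mirrors the values documented on this page; the decision rule itself (`allowsF16`, checking only the priority(1) slot) is a simplified assumption for illustration, not the delegate's actual implementation.

```java
import java.util.List;

public class PrioritySemanticsDemo {
    // Mirrors the enum values documented on this page.
    enum GpuInferencePriority {
        GPU_PRIORITY_AUTO,
        GPU_PRIORITY_MAX_PRECISION,
        GPU_PRIORITY_MIN_LATENCY,
        GPU_PRIORITY_MIN_MEMORY_USAGE
    }

    // Illustrative rule only: reduced (F16) precision is acceptable unless
    // GPU_PRIORITY_MAX_PRECISION occupies the top slot, priority(1).
    static boolean allowsF16(List<GpuInferencePriority> ordered) {
        return ordered.isEmpty()
                || ordered.get(0) != GpuInferencePriority.GPU_PRIORITY_MAX_PRECISION;
    }

    public static void main(String[] args) {
        // MAX_PRECISION at priority(1): precision may not be decreased.
        System.out.println(allowsF16(List.of(
                GpuInferencePriority.GPU_PRIORITY_MAX_PRECISION,
                GpuInferencePriority.GPU_PRIORITY_MIN_LATENCY,
                GpuInferencePriority.GPU_PRIORITY_AUTO)));  // false

        // MAX_PRECISION moved to priority(2): F16 calculation is allowed.
        System.out.println(allowsF16(List.of(
                GpuInferencePriority.GPU_PRIORITY_MIN_LATENCY,
                GpuInferencePriority.GPU_PRIORITY_MAX_PRECISION,
                GpuInferencePriority.GPU_PRIORITY_AUTO)));  // true
    }
}
```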


Enum Values

public static final GpuAccelerationConfig.GpuInferencePriority GPU_PRIORITY_AUTO

Auto GPU priority.

public static final GpuAccelerationConfig.GpuInferencePriority GPU_PRIORITY_MAX_PRECISION

Maximum precision GPU priority.

public static final GpuAccelerationConfig.GpuInferencePriority GPU_PRIORITY_MIN_LATENCY

Minimum latency GPU priority.

public static final GpuAccelerationConfig.GpuInferencePriority GPU_PRIORITY_MIN_MEMORY_USAGE

Minimum memory usage GPU priority.
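The constraint that GPU_PRIORITY_AUTO can only be used when all higher priorities are fully specified can be read as: no concrete priority may follow an AUTO entry in the ordered list. The validator below sketches that reading; the enum mirrors the documented values, while `isValidOrdering` is an illustrative helper, not part of the actual API.

```java
import java.util.List;

public class AutoPriorityCheck {
    // Mirrors the enum values documented on this page.
    enum GpuInferencePriority {
        GPU_PRIORITY_AUTO,
        GPU_PRIORITY_MAX_PRECISION,
        GPU_PRIORITY_MIN_LATENCY,
        GPU_PRIORITY_MIN_MEMORY_USAGE
    }

    // Documented constraint, as interpreted here: once GPU_PRIORITY_AUTO
    // appears, every later (lower) priority must also be AUTO.
    static boolean isValidOrdering(List<GpuInferencePriority> ordered) {
        boolean autoSeen = false;
        for (GpuInferencePriority p : ordered) {
            if (p == GpuInferencePriority.GPU_PRIORITY_AUTO) {
                autoSeen = true;
            } else if (autoSeen) {
                return false; // concrete priority after AUTO: invalid
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Valid: the single most important need is specified, the rest are AUTO.
        System.out.println(isValidOrdering(List.of(
                GpuInferencePriority.GPU_PRIORITY_MIN_LATENCY,
                GpuInferencePriority.GPU_PRIORITY_AUTO,
                GpuInferencePriority.GPU_PRIORITY_AUTO)));  // true

        // Invalid: a concrete priority follows an AUTO entry.
        System.out.println(isValidOrdering(List.of(
                GpuInferencePriority.GPU_PRIORITY_AUTO,
                GpuInferencePriority.GPU_PRIORITY_MIN_LATENCY,
                GpuInferencePriority.GPU_PRIORITY_AUTO)));  // false
    }
}
```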