rknnlite

package module
Published: Apr 3, 2025 License: Apache-2.0 Imports: 10 Imported by: 0

README

go-rknnlite


go-rknnlite provides Go language bindings for the RKNN Toolkit2 C API interface. It aims to provide lite bindings in the spirit of the closed source Python lite bindings used for running AI Inference models on the Rockchip NPU via the RKNN software stack.

These bindings are tested on the RK3588 (specifically the Radxa Rock Pi 5B) and work with other RK3588 based SBCs.

Other users have reported running the bindings and examples on other models in the RK35xx series supported by the RKNN Toolkit2.

Usage

To use in your Go project, get the library:

go get github.com/swdee/go-rknnlite

Or, to try the examples, clone the code and data repositories:

git clone https://github.com/swdee/go-rknnlite.git
cd go-rknnlite/example
git clone https://github.com/swdee/go-rknnlite-data.git data

Then refer to the README files for each example to run them on the command line.

Dependencies

The rknn-toolkit2 must be installed on your system with the C header files available in the system include path, e.g. /usr/include/rknn_api.h.

Refer to the official documentation on how to install this on your system as it will vary based on OS and SBC vendor.

Rock Pi 5B

My usage was on the Radxa Rock Pi 5B running the official Debian 11 OS image which has the rknpu2 driver already installed.

To my knowledge Armbian and Joshua's Ubuntu OS images also have the driver installed for the supported SBCs.

You can test if your OS has the driver installed with:

dmesg | grep -i rknpu

The output should list the driver and indicate the NPU is initialized.

[    5.130935] [drm] Initialized rknpu 0.8.2 20220829 for fdab0000.npu on minor 1
GoCV

The examples make use of GoCV for image processing. Make sure you have a working installation of GoCV first, see the instructions in the link for installation on your system.

Examples

See the example directory.

Pooled Runtimes

Running multiple Runtimes in a Pool allows you to take advantage of all three NPU cores. For our usage of an EfficientNet-Lite0 model, a single runtime has an inference speed of 7.9ms per image; however, running a Pool of 9 runtimes brings the average inference speed down to 1.65ms per image.

See the Pool example.

Other Rockchip Models

For other Rockchip models such as the RK3566 which features a single NPU core, initialise the Runtime with the rknnlite.NPUSkipSetCore flag as follows.

rt, err := rknnlite.NewRuntime(*modelFile, rknnlite.NPUSkipSetCore)

CPU Affinity

The performance of the NPU is affected by which CPU cores your program runs on, so to achieve maximum performance we need to set the CPU Affinity.

The RK3588 for example has 4 fast Cortex-A76 cores at 2.4GHz and 4 efficient Cortex-A55 cores at 1.8GHz. By default your Go program will run across all cores, which affects performance; instead, set the CPU Affinity to run on the fast Cortex-A76 cores only.

// set CPU affinity
err = rknnlite.SetCPUAffinity(rknnlite.RK3588FastCores)
if err != nil {
	log.Printf("Failed to set CPU Affinity: %v\n", err)
}

Constants are provided for the RK3588 and RK3582 processors; for other CPUs you can define your own core mask.

Core Mask

To create the core mask value we will use the RK3588 as an example which has CPU cores 0-3 as the slow A55 cores and cores 4-7 being the fast A76 cores.

You can use the provided convenience function to calculate the mask for cores 4-7.

mask := rknnlite.CPUCoreMask([]int{4,5,6,7})
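The mask is simply one bit set per core number. The bit arithmetic can be illustrated in plain Go; this is a sketch of the documented behaviour, not the library's actual source:

```go
package main

import "fmt"

// coreMask mirrors the documented contract of rknnlite.CPUCoreMask:
// set bit N for each CPU core number N in the slice.
func coreMask(cores []int) uintptr {
	var mask uintptr
	for _, c := range cores {
		mask |= 1 << uint(c)
	}
	return mask
}

func main() {
	// cores 4-7 produce 0b11110000, the same value as the
	// RK3588FastCores constant
	fmt.Printf("0b%08b\n", coreMask([]int{4, 5, 6, 7})) // prints 0b11110000
}
```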

PreProcessing

Convenience functions exist for handling preprocessing of images to run inference on.

The preprocess.Resizer provides functions for handling resizing and scaling of input images to the target size needed for inference input tensors. It will maintain aspect ratio by scaling and applying any needed letterbox padding to the source image.

// load source image file
img := gocv.IMRead(*imgFile, gocv.IMReadColor)

if img.Empty() {
	log.Fatal("Error reading image from: ", *imgFile)
}

// convert colorspace from GoCV's BGR to RGB as most models have been trained
// using RGB data 
rgbImg := gocv.NewMat()
gocv.CvtColor(img, &rgbImg, gocv.ColorBGRToRGB)

// create new resizer setting the source image size and input tensor sizes
resizer := preprocess.NewResizer(img.Cols(), img.Rows(),
  int(inputAttrs[0].Dims[1]), int(inputAttrs[0].Dims[2]))

// resize image
resizedImg := gocv.NewMat()
resizer.LetterBoxResize(rgbImg, &resizedImg, render.Black)

For Object Detection and Instance Segmentation the Resizer is required so image mask sizes can be correctly calculated and scaled back for applying as an overlay on the source image.

Renderer

The render package provides convenience functions for drawing the bounding box around objects or segmentation mask/outline.

Post Processing

If a model (i.e. a specific YOLO version) is not yet supported, a post processor can be written to handle the outputs from the RKNN engine, in the same manner as the existing YOLOv5 code.

Notice

This code is being used in production for Image Classification. Over time it will be expanded on to support more features such as Object Detection using YOLO. The addition of new features may cause changes or breakages in the API between commits due to the early nature of how this library evolves.

Ensure you use Go Modules so your code is not affected, but be aware any updates may require minor changes to your code to support the latest version.

Versioning of the library will be added at a later date once the feature set stabilises.

See the CHANGES file for a list of breaking changes.

Reference Material

Documentation

Overview

go-rknnlite provides Go language bindings for the RKNN Toolkit2 C API interface. It aims to provide lite bindings in the spirit of the closed source Python lite bindings used for running AI Inference models on the Rockchip NPU via the RKNN software stack.

These bindings have been tested on the RK3588 (specifically the Radxa Rock Pi 5B) and work with other RK3588 based SBCs.

Other users have reported running the bindings and examples on other models in the RK35xx series supported by the RKNN Toolkit2.

See example code and usage in the examples subdirectory.

Index

Constants

const (
	// RK3588FastCores is the cpu affinity mask of the fast cortex A76 cores 4-7
	RK3588FastCores = uintptr(0b11110000)
	// RK3588SlowCores is the cpu affinity mask of the efficient cortex A55 cores 0-3
	RK3588SlowCores = uintptr(0b00001111)
	// RK3588AllCores is the cpu affinity mask for all cortex A76 and A55 cores 0-7
	RK3588AllCores = uintptr(0b11111111)

	// RK3582FastCores is the cpu affinity mask of the fast cortex A76 cores 4-5
	RK3582FastCores = uintptr(0b00110000)
	// RK3582SlowCores is the cpu affinity mask of the efficient cortex A55 cores 0-3
	RK3582SlowCores = uintptr(0b00001111)
	// RK3582AllCores is the cpu affinity mask for all cortex A76 and A55 cores 0-5
	RK3582AllCores = uintptr(0b00111111)
)
const MAX_TOP_NUM = 20

Variables

var (
	// A list of Rockchip models and the NPU core masks used for each.
	// These are provided for passing to NewPool() to define which NPU
	// cores to pin the Model and Runtime to.
	RK3588 = []CoreMask{NPUCore0, NPUCore1, NPUCore2}
	RK3582 = []CoreMask{NPUCore0, NPUCore1, NPUCore2}
	RK3576 = []CoreMask{NPUCore0, NPUCore1}
	RK3568 = []CoreMask{NPUCore0}
	RK3566 = []CoreMask{NPUCore0}
	RK3562 = []CoreMask{NPUCore0}
)

Functions

func CPUCoreMask

func CPUCoreMask(cores []int) uintptr

CPUCoreMask calculates the core mask by passing in the CPU core numbers as a slice, eg: []int{4,5,6,7}

func GetCPUAffinity

func GetCPUAffinity() (uintptr, error)

GetCPUAffinity gets the current CPU Affinity mask the program is running on

func GetTop

func GetTop(pfProb []float32, pfMaxProb []float32, pMaxClass []int32,
	outputCount int32, topNum int32) int

GetTop takes outputs and produces a top list of matches by probability

func LoadLabels

func LoadLabels(file string) ([]string, error)

LoadLabels reads the labels used to train the Model from the given text file. It should contain one label per line.

func SetCPUAffinity

func SetCPUAffinity(mask uintptr) error

SetCPUAffinity sets the CPU Affinity mask of the program to run on the specified cores

Types

type AttrMaxDimensions

type AttrMaxDimensions int

AttrMaxDimensions are the maximum dimensions for an attribute in a tensor

maximum field lengths of attributes in a tensor

type CoreMask

type CoreMask int

CoreMask wraps C.rknn_core_mask

const (
	NPUCoreAuto    CoreMask = C.RKNN_NPU_CORE_AUTO
	NPUCore0       CoreMask = C.RKNN_NPU_CORE_0
	NPUCore1       CoreMask = C.RKNN_NPU_CORE_1
	NPUCore2       CoreMask = C.RKNN_NPU_CORE_2
	NPUCore01      CoreMask = C.RKNN_NPU_CORE_0_1
	NPUCore012     CoreMask = C.RKNN_NPU_CORE_0_1_2
	NPUSkipSetCore CoreMask = 9999
)

rknn_core_mask values used to target which cores on the NPU the model runs on. The RK3588 has three cores; auto will pick an idle core to run the model on, whilst the others specify a specific core or combination of cores to run on. For multi-core modes the following ops have better acceleration: Conv, DepthwiseConvolution, Add, Concat, Relu, Clip, Relu6, ThresholdedRelu, Prelu, and LeakyRelu. Other types of ops will fall back to Core0 to continue running.

type ErrorCodes

type ErrorCodes int

ErrorCodes are the error code values returned by the C API

func (ErrorCodes) String

func (e ErrorCodes) String() string

String returns a readable description of the error code

type IONumber

type IONumber struct {
	NumberInput  uint32
	NumberOutput uint32
}

IONumber represents the C.rknn_input_output_num struct

type Input

type Input struct {
	// Index is the input index
	Index uint32
	// Buf is the gocv Mat input
	Buf unsafe.Pointer
	// Size is the number of bytes of Buf
	Size uint32
	// Passthrough defines the mode, if True the buf data is passed directly to
	// the input node of the rknn model without any conversion.  If False the
	// buf data is converted into an input consistent with the model according
	// to the following type and fmt
	PassThrough bool
	// Type is the data type of Buf. This is a required parameter if Passthrough
	// is False
	Type TensorType
	// Fmt is the data format of Buf.  This is a required parameter if Passthrough
	// is False
	Fmt TensorFormat
}

Input represents the C.rknn_input struct and defines the Input used for inference

type InputAttribute

type InputAttribute struct {
	Width   uint32
	Height  uint32
	Channel uint32
}

InputAttribute of trained model input tensor

type Output

type Output struct {
	WantFloat  uint8  // want transfer output data to float
	IsPrealloc uint8  // whether buf is pre-allocated
	Index      uint32 // the output index
	// the output buf cast to float32, when WantFloat = 1
	// this is a slice header that points to C memory
	BufFloat []float32
	// the output buf cast to int8, when WantFloat = 0
	// this is a slice header that points to C memory
	BufInt []int8
	Size   uint32 // the size of output buf
}

Output wraps C.rknn_output

type OutputAttribute

type OutputAttribute struct {
	DimForDFL  uint32
	Scales     []float32
	ZPs        []int32
	DimHeights []uint32
	DimWidths  []uint32
	IONumber   uint32
}

OutputAttribute of trained model output tensor

type Outputs

type Outputs struct {
	Output []Output

	// mutex to lock access to freed variable
	sync.Mutex
	// contains filtered or unexported fields
}

Outputs is a struct containing Go and C output data

func (*Outputs) Free

func (o *Outputs) Free() error

Free C memory buffer holding RKNN inference outputs

func (*Outputs) InputAttributes

func (o *Outputs) InputAttributes() InputAttribute

InputAttributes queries the Model and returns Input image dimensions

func (*Outputs) OutputAttributes

func (o *Outputs) OutputAttributes() OutputAttribute

OutputAttributes returns the Model output attribute scales and zero points

type Pool

type Pool struct {
	// contains filtered or unexported fields
}

Pool is a simple runtime pool to open multiple of the same Model across all NPU cores

func NewPool

func NewPool(size int, modelFile string, cores []CoreMask) (*Pool, error)

NewPool creates a new runtime pool that pins the runtimes to the specified NPU cores. You can use the variables RK3588, RK3582, RK3576, RK3568, RK3566, RK3562 for the CoreMask array, or create your own, eg: []CoreMask{NPUCore0, NPUCore1}

func (*Pool) Close

func (p *Pool) Close()

Close the pool and all runtimes in it

func (*Pool) Get

func (p *Pool) Get() *Runtime

Get returns a runtime from the pool

func (*Pool) Return

func (p *Pool) Return(runtime *Runtime)

Return a runtime to the pool

func (*Pool) SetWantFloat

func (p *Pool) SetWantFloat(val bool)

SetWantFloat defines if the Model load requires Output tensors to be converted to float32 for post processing, or left as quantized int8

type Probability

type Probability struct {
	LabelIndex  int32
	Probability float32
}

func GetTop5

func GetTop5(outputs []Output) []Probability

GetTop5 outputs the Top5 matches in the model, with left column as label index and right column the match probability. The results are returned in the Probability slice in descending order from top match.
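The contract here is a descending sort by probability. The post-processing step can be sketched in plain Go (an illustrative reimplementation using the Probability struct above, not the library's actual code):

```go
package main

import (
	"fmt"
	"sort"
)

// Probability pairs a label index with its match probability,
// matching the struct documented above.
type Probability struct {
	LabelIndex  int32
	Probability float32
}

// top5 sorts probabilities in descending order and keeps the best five.
func top5(probs []Probability) []Probability {
	sort.Slice(probs, func(i, j int) bool {
		return probs[i].Probability > probs[j].Probability
	})
	if len(probs) > 5 {
		probs = probs[:5]
	}
	return probs
}

func main() {
	probs := []Probability{
		{LabelIndex: 7, Probability: 0.12},
		{LabelIndex: 2, Probability: 0.81},
		{LabelIndex: 9, Probability: 0.64},
	}
	// prints label 2 first as it has the highest probability
	for _, p := range top5(probs) {
		fmt.Printf("label %d: %.2f\n", p.LabelIndex, p.Probability)
	}
}
```

Pairing each LabelIndex with the slice returned by LoadLabels gives the human-readable class names.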

type Runtime

type Runtime struct {
	// contains filtered or unexported fields
}

Runtime defines the RKNN run time instance

func NewRuntime

func NewRuntime(modelFile string, core CoreMask) (*Runtime, error)

NewRuntime returns a RKNN run time instance. Provide the full path and filename of the RKNN compiled model file to run.

func (*Runtime) Close

func (r *Runtime) Close() error

Close wraps C.rknn_destroy which unloads the RKNN model from the runtime and destroys the context releasing all C resources

func (*Runtime) GetOutputs

func (r *Runtime) GetOutputs(nOutputs uint32, wantFloat bool) (*Outputs, error)

GetOutputs returns the Output results

func (*Runtime) Inference

func (r *Runtime) Inference(mats []gocv.Mat) (*Outputs, error)

Inference runs the model inference on the given inputs

func (*Runtime) InputAttrs

func (r *Runtime) InputAttrs() []TensorAttr

InputAttrs returns the loaded model's input tensor attributes

func (*Runtime) OutputAttrs

func (r *Runtime) OutputAttrs() []TensorAttr

OutputAttrs returns the loaded model's output tensor attributes

func (*Runtime) Query

func (r *Runtime) Query(w io.Writer) error

Query the runtime and loaded model to get input and output tensor information as well as SDK version in text/human readable format

func (*Runtime) QueryInputTensors

func (r *Runtime) QueryInputTensors() ([]TensorAttr, error)

QueryInputTensors gets the model Input Tensor attributes

func (*Runtime) QueryModelIONumber

func (r *Runtime) QueryModelIONumber() (ioNum IONumber, err error)

QueryModelIONumber queries the number of Input and Output tensors of the model

func (*Runtime) QueryOutputTensors

func (r *Runtime) QueryOutputTensors() ([]TensorAttr, error)

QueryOutputTensors gets the model Output Tensor attributes

func (*Runtime) RunModel

func (r *Runtime) RunModel() error

RunModel wraps C.rknn_run

func (*Runtime) SDKVersion

func (r *Runtime) SDKVersion() (SDKVersion, error)

SDKVersion returns the RKNN API and Driver versions

func (*Runtime) SetInputTypeFloat32

func (r *Runtime) SetInputTypeFloat32(val bool)

SetInputTypeFloat32 defines if the Model requires the Inference() function to pass the gocv.Mat's as float32 data to RKNN backend. Setting this overrides the default behaviour to pass gocv.Mat data as Uint8

func (*Runtime) SetInputs

func (r *Runtime) SetInputs(inputs []Input) error

SetInputs wraps C.rknn_inputs_set

func (*Runtime) SetWantFloat

func (r *Runtime) SetWantFloat(val bool)

SetWantFloat defines if the Model load requires Output tensors to be converted to float32 for post processing, or left as quantized int8

type SDKVersion

type SDKVersion struct {
	DriverVersion string
	APIVersion    string
}

SDKVersion represents the C.rknn_sdk_version struct

type TensorAttr

type TensorAttr struct {
	Index          uint32
	NDims          uint32
	Dims           [AttrMaxDimension]uint32
	Name           string
	NElems         uint32
	Size           uint32
	Fmt            TensorFormat
	Type           TensorType
	QntType        TensorQntType
	FL             int8
	ZP             int32
	Scale          float32
	WStride        uint32
	SizeWithStride uint32
	PassThrough    bool
	HStride        uint32
}

TensorAttr represents the C.rknn_tensor_attr structure

func (TensorAttr) String

func (a TensorAttr) String() string

String returns the TensorAttr's attributes formatted as a string

type TensorFormat

type TensorFormat int

TensorFormat wraps C.rknn_tensor_format

const (
	TensorNCHW      TensorFormat = C.RKNN_TENSOR_NCHW
	TensorNHWC      TensorFormat = C.RKNN_TENSOR_NHWC
	TensorNC1HWC2   TensorFormat = C.RKNN_TENSOR_NC1HWC2
	TensorUndefined TensorFormat = C.RKNN_TENSOR_UNDEFINED
)

func (TensorFormat) String

func (t TensorFormat) String() string

String returns a readable description of the TensorFormat

type TensorQntType

type TensorQntType int

TensorQntType wraps C.rknn_tensor_qnt_type

func (TensorQntType) String

func (t TensorQntType) String() string

String returns a readable description of the TensorQntType

type TensorType

type TensorType int

TensorType wraps C.rknn_tensor_type

func (TensorType) String

func (t TensorType) String() string

String returns a readable description of the TensorType

Directories

Path Synopsis
example
alpr command
Example code showing how to perform Automatic License Plate Recognition (ALPR) using a License Plate Detection YOLOv8n and LPRNet model
lprnet command
Example code showing how to perform inferencing using a LPRNet model
mobilenet command
Example code showing how to perform inferencing using a MobileNetv1 model.
pool command
Running multiple Runtimes in a Pool allows you to take advantage of all three NPU cores to significantly reduce average inferencing time.
ppocr command
Example code showing how to perform OCR on an image using PaddleOCR recognition
retinaface command
Example code showing how to perform inferencing using a Retina Face model.
stream command
yolov10 command
Example code showing how to perform object detection using a YOLOv10 model.
yolov11 command
Example code showing how to perform object detection using a YOLOv11 model.
yolov5 command
Example code showing how to perform object detection using a YOLOv5 model.
yolov5-seg command
yolov8 command
Example code showing how to perform object detection using a YOLOv8 model.
yolov8-obb command
Example code showing how to perform oriented bounding box object detection using a YOLOv8 model.
yolov8-pose command
Example code showing how to perform pose estimation using a YOLOv8 model.
yolov8-seg command
yolox command
