Analysis of HAL Method Call Flow in OpenCV

OpenCV contains a number of so-called HAL (Hardware Acceleration Layer) implementations. The name suggests they are tied to specific hardware, but they are not; a HAL implementation can simply be understood as a faster replacement for the default OpenCV (OCV) implementation of a function. This article finds out where such an implementation hooks in and traces the whole call path from the public API down. It analyzes the resize and GaussianBlur functions.

resize

First, open imgproc.hpp in the imgproc module and find the declaration CV_EXPORTS_W void resize(InputArray src, OutputArray dst, Size dsize, double fx = 0, double fy = 0, int interpolation = INTER_LINEAR);. External code uses this header, so the function declared there is our entry point. OCV implementations have many branches that are hard to sort out at a glance, so it is easiest to start from the entry point. Then jump to the implementation of the function. If your IDE cannot do that, search resize.cpp for the function with the same signature; that is the implementation, shown below:

void cv::resize( InputArray _src, OutputArray _dst, Size dsize,
                 double inv_scale_x, double inv_scale_y, int interpolation )
{
    CV_INSTRUMENT_REGION();

    Size ssize = _src.size();

    CV_Assert( !ssize.empty() );
    if( dsize.empty() )
    {
        CV_Assert(inv_scale_x > 0); CV_Assert(inv_scale_y > 0);
        dsize = Size(saturate_cast<int>(ssize.width*inv_scale_x),
                     saturate_cast<int>(ssize.height*inv_scale_y));
        CV_Assert( !dsize.empty() );
    }
    else
    {
        inv_scale_x = (double)dsize.width/ssize.width;
        inv_scale_y = (double)dsize.height/ssize.height;
        CV_Assert(inv_scale_x > 0); CV_Assert(inv_scale_y > 0);
    }

    if (interpolation == INTER_LINEAR_EXACT && (_src.depth() == CV_32F || _src.depth() == CV_64F))
        interpolation = INTER_LINEAR; // If depth isn't supported fallback to generic resize

    CV_OCL_RUN(_src.dims() <= 2 && _dst.isUMat() && _src.cols() > 10 && _src.rows() > 10,
               ocl_resize(_src, _dst, dsize, inv_scale_x, inv_scale_y, interpolation))

    Mat src = _src.getMat();
    _dst.create(dsize, src.type());
    Mat dst = _dst.getMat();

    if (dsize == ssize)
    {
        // Source and destination are of same size. Use simple copy.
        src.copyTo(dst);
        return;
    }

    hal::resize(src.type(), src.data, src.step, src.cols, src.rows, dst.data, dst.step, dst.cols, dst.rows, inv_scale_x, inv_scale_y, interpolation);
}

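This function does three things:

  1. Checks the parameters, deriving dsize from the scale factors or the scale factors from dsize
  2. Tries the OpenCL path (CV_OCL_RUN) when OpenCL is available, the destination is a UMat and the source is larger than 10x10
  3. Otherwise delegates to the resize function in the hal namespace (after a plain copy shortcut when source and destination sizes are equal)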
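
The parameter check in step 1 can be sketched on its own, without OpenCV. This is only an illustration of the logic in the listing above: SizeSketch, roundToInt and resolveResizeParams are hypothetical names, assert stands in for CV_Assert (which throws rather than aborts), and the rounding helper only approximates cv::saturate_cast<int>.

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// Hypothetical stand-in for cv::Size, so the sketch compiles without OpenCV.
struct SizeSketch {
    int width;
    int height;
    bool empty() const { return width <= 0 || height <= 0; }
};

// Rounds to the nearest integer and clamps to the int range. This only
// approximates cv::saturate_cast<int>(double); tie handling may differ.
inline int roundToInt(double v) {
    if (v >= static_cast<double>(INT_MAX)) return INT_MAX;
    if (v <= static_cast<double>(INT_MIN)) return INT_MIN;
    return static_cast<int>(std::lround(v));
}

// Mirrors the first part of cv::resize: either dsize is derived from the
// scale factors, or the scale factors are derived from dsize.
inline void resolveResizeParams(SizeSketch ssize, SizeSketch& dsize,
                                double& fx, double& fy) {
    assert(!ssize.empty());
    if (dsize.empty()) {
        // No explicit destination size: both scale factors must be given.
        assert(fx > 0 && fy > 0);
        dsize.width = roundToInt(ssize.width * fx);
        dsize.height = roundToInt(ssize.height * fy);
        assert(!dsize.empty());
    } else {
        // Explicit destination size: any scale factors passed in are overwritten.
        fx = static_cast<double>(dsize.width) / ssize.width;
        fy = static_cast<double>(dsize.height) / ssize.height;
        assert(fx > 0 && fy > 0);
    }
}
```

Note the consequence of the else branch: when dsize is given, the fx and fy arguments passed by the caller are ignored.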

Next, jump to the hal implementation, which is also in resize.cpp. Part of the code:

namespace hal {

void resize(int src_type,
            const uchar * src_data, size_t src_step, int src_width, int src_height,
            uchar * dst_data, size_t dst_step, int dst_width, int dst_height,
            double inv_scale_x, double inv_scale_y, int interpolation)
{
    CV_INSTRUMENT_REGION();

    CV_Assert((dst_width > 0 && dst_height > 0) || (inv_scale_x > 0 && inv_scale_y > 0));
    if (inv_scale_x < DBL_EPSILON || inv_scale_y < DBL_EPSILON)
    {
        inv_scale_x = static_cast<double>(dst_width) / src_width;
        inv_scale_y = static_cast<double>(dst_height) / src_height;
    }

    CALL_HAL(resize, cv_hal_resize, src_type, src_data, src_step, src_width, src_height, dst_data, dst_step, dst_width, dst_height, inv_scale_x, inv_scale_y, interpolation);
    // ... the rest of the function is the generic (non-HAL) implementation

Here we see a macro, CALL_HAL. Its definition is in hal_replacement.hpp:

#define CALL_HAL(name, fun, ...) \
    int res = __CV_EXPAND(fun(__VA_ARGS__)); \
    if (res == CV_HAL_ERROR_OK) \
        return; \
    else if (res != CV_HAL_ERROR_NOT_IMPLEMENTED) \
        CV_Error_(cv::Error::StsInternal, \
            ("HAL implementation " CVAUX_STR(name) " ==> " CVAUX_STR(fun) " returned %d (0x%08x)", res, res));

We can see that the macro calls the function fun and then distinguishes three outcomes. If fun returns CV_HAL_ERROR_OK, the macro executes return, and because the macro is expanded inside hal::resize, it is hal::resize that returns. If fun returns CV_HAL_ERROR_NOT_IMPLEMENTED, neither branch is taken: the macro does not return, and hal::resize continues into the generic implementation that follows it. Any other value is treated as a failure: CV_Error_ raises a cv::Exception, which leaves hal::resize the way any exception would.
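
The three outcomes can be reproduced in a small standalone sketch. Everything here is hypothetical and only modelled on the macro above: the SKETCH_ names, the three backends and doubleAll are invented for illustration, and std::runtime_error stands in for the exception raised through CV_Error_.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical status codes modelled on CV_HAL_ERROR_OK and
// CV_HAL_ERROR_NOT_IMPLEMENTED, plus one unexpected value.
enum SketchStatus { SKETCH_OK = 0, SKETCH_NOT_IMPLEMENTED = 1, SKETCH_UNKNOWN = -1 };

// Same control flow as CALL_HAL; std::runtime_error replaces CV_Error_.
#define SKETCH_CALL_HAL(name, fun, ...) \
    { \
        int res = fun(__VA_ARGS__); \
        if (res == SKETCH_OK) \
            return; \
        else if (res != SKETCH_NOT_IMPLEMENTED) \
            throw std::runtime_error(std::string("HAL implementation ") + #name + " failed"); \
    }

// Three hypothetical backends, one per outcome.
inline int fastDouble(const int* src, int* dst, int n) {
    for (int i = 0; i < n; ++i) dst[i] = src[i] * 2;
    return SKETCH_OK;
}
inline int stubDouble(const int*, int*, int) { return SKETCH_NOT_IMPLEMENTED; }
inline int brokenDouble(const int*, int*, int) { return SKETCH_UNKNOWN; }

// Records which code path produced the result.
enum Path { PATH_NONE, PATH_HAL, PATH_GENERIC };

typedef int (*Backend)(const int*, int*, int);

// Plays the role of hal::resize: try the backend first, then fall through
// to the generic implementation if the backend is not implemented.
template <Backend backend>
void doubleAll(const int* src, int* dst, int n, Path& path) {
    assert(n >= 0);
    path = PATH_HAL;
    SKETCH_CALL_HAL(doubleAll, backend, src, dst, n);
    // Reached only when the backend returned SKETCH_NOT_IMPLEMENTED.
    path = PATH_GENERIC;
    for (int i = 0; i < n; ++i) dst[i] = src[i] + src[i];
}
```

Calling doubleAll<fastDouble> returns from inside the macro, doubleAll<stubDouble> falls through to the generic loop, and doubleAll<brokenDouble> throws.
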
Next, in hal_replacement.hpp we find that cv_hal_resize is defined as

#define cv_hal_resize hal_ni_resize

Looking up hal_ni_resize in turn, we find its implementation:

inline int hal_ni_resize(int src_type, const uchar *src_data, size_t src_step, int src_width, int src_height, uchar *dst_data, size_t dst_step, int dst_width, int dst_height, double inv_scale_x, double inv_scale_y, int interpolation) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }

This function returns CV_HAL_ERROR_NOT_IMPLEMENTED directly, so hal::resize proceeds to its generic implementation, as analyzed above. So how does a HAL implementation get hooked in?
Next to the CALL_HAL macro in hal_replacement.hpp there is a line #include "custom_hal.hpp". This is curious: aren't includes normally placed at the top of a file? Looking at custom_hal.hpp, we find that it contains a single line, #include "carotene/tegra_hal.hpp", so we keep tracing. Since the function analyzed so far is hal_ni_resize, we first search tegra_hal.hpp for hal_ni_resize, with no results. Searching for cv_hal_resize instead, we find:

#undef cv_hal_resize
#define cv_hal_resize TEGRA_RESIZE

Recall that hal_replacement.hpp contains #define cv_hal_resize hal_ni_resize. Because custom_hal.hpp is included after that line, the #undef removes this definition, and cv_hal_resize is then redefined as TEGRA_RESIZE. Searching for TEGRA_RESIZE, we find its definition:

#define TEGRA_RESIZE(src_type, src_data, src_step, src_width, src_height, dst_data, dst_step, dst_width, dst_height, inv_scale_x, inv_scale_y, interpolation) \
( \
    interpolation == CV_HAL_INTER_LINEAR ? \
        CV_MAT_DEPTH(src_type) == CV_8U && CAROTENE_NS::isResizeLinearOpenCVSupported(CAROTENE_NS::Size2D(src_width, src_height), CAROTENE_NS::Size2D(dst_width, dst_height), ((src_type >> CV_CN_SHIFT) + 1)) && \
        inv_scale_x > 0 && inv_scale_y > 0 && \
        (dst_width - 0.5)/inv_scale_x - 0.5 < src_width && (dst_height - 0.5)/inv_scale_y - 0.5 < src_height && \
        (dst_width + 0.5)/inv_scale_x + 0.5 >= src_width && (dst_height + 0.5)/inv_scale_y + 0.5 >= src_height && \
        std::abs(dst_width / inv_scale_x - src_width) < 0.1 && std::abs(dst_height / inv_scale_y - src_height) < 0.1 ? \
            CAROTENE_NS::resizeLinearOpenCV(CAROTENE_NS::Size2D(src_width, src_height), CAROTENE_NS::Size2D(dst_width, dst_height), \
                                            src_data, src_step, dst_data, dst_step, 1.0/inv_scale_x, 1.0/inv_scale_y, ((src_type >> CV_CN_SHIFT) + 1)), \
            CV_HAL_ERROR_OK : CV_HAL_ERROR_NOT_IMPLEMENTED : \
    interpolation == CV_HAL_INTER_AREA ? \
        CV_MAT_DEPTH(src_type) == CV_8U && CAROTENE_NS::isResizeAreaSupported(1.0/inv_scale_x, 1.0/inv_scale_y, ((src_type >> CV_CN_SHIFT) + 1)) && \
        std::abs(dst_width / inv_scale_x - src_width) < 0.1 && std::abs(dst_height / inv_scale_y - src_height) < 0.1 ? \
            CAROTENE_NS::resizeAreaOpenCV(CAROTENE_NS::Size2D(src_width, src_height), CAROTENE_NS::Size2D(dst_width, dst_height), \
                                          src_data, src_step, dst_data, dst_step, 1.0/inv_scale_x, 1.0/inv_scale_y, ((src_type >> CV_CN_SHIFT) + 1)), \
            CV_HAL_ERROR_OK : CV_HAL_ERROR_NOT_IMPLEMENTED : \
    /*nearest neighbour interpolation disabled due to rounding accuracy issues*/ \
    /*interpolation == CV_HAL_INTER_NEAREST ? \
        (src_type == CV_8UC1 || src_type == CV_8SC1) && CAROTENE_NS::isResizeNearestNeighborSupported(CAROTENE_NS::Size2D(src_width, src_height), 1) ? \
            CAROTENE_NS::resizeNearestNeighbor(CAROTENE_NS::Size2D(src_width, src_height), CAROTENE_NS::Size2D(dst_width, dst_height), \
                                               src_data, src_step, dst_data, dst_step, 1.0/inv_scale_x, 1.0/inv_scale_y, 1), \
            CV_HAL_ERROR_OK : \
        (src_type == CV_8UC3 || src_type == CV_8SC3) && CAROTENE_NS::isResizeNearestNeighborSupported(CAROTENE_NS::Size2D(src_width, src_height), 3) ? \
            CAROTENE_NS::resizeNearestNeighbor(CAROTENE_NS::Size2D(src_width, src_height), CAROTENE_NS::Size2D(dst_width, dst_height), \
                                               src_data, src_step, dst_data, dst_step, 1.0/inv_scale_x, 1.0/inv_scale_y, 3), \
            CV_HAL_ERROR_OK : \
        (src_type == CV_8UC4 || src_type == CV_8SC4 || src_type == CV_16UC2 || src_type == CV_16SC2 || src_type == CV_32SC1) && \
        CAROTENE_NS::isResizeNearestNeighborSupported(CAROTENE_NS::Size2D(src_width, src_height), 4) ? \
            CAROTENE_NS::resizeNearestNeighbor(CAROTENE_NS::Size2D(src_width, src_height), CAROTENE_NS::Size2D(dst_width, dst_height), \
                                               src_data, src_step, dst_data, dst_step, 1.0/inv_scale_x, 1.0/inv_scale_y, 4), \
            CV_HAL_ERROR_OK : CV_HAL_ERROR_NOT_IMPLEMENTED :*/ \
    CV_HAL_ERROR_NOT_IMPLEMENTED \
)

This macro does roughly the following:

  1. For bilinear interpolation (CV_HAL_INTER_LINEAR), if the data depth is CV_8U and the sizes, channel count and scale factors meet certain requirements, resizeLinearOpenCV performs the actual resize and the macro evaluates to CV_HAL_ERROR_OK. If the conditions are not met, the macro evaluates to CV_HAL_ERROR_NOT_IMPLEMENTED, and hal::resize falls through to its generic implementation.
  2. Area interpolation (CV_HAL_INTER_AREA) is handled in the same way, via resizeAreaOpenCV.
  3. No other interpolation method is supported. Code for nearest-neighbour interpolation exists but is commented out; according to the comment, it was disabled because of rounding accuracy issues.

The macro relies on the rarely noticed comma operator to produce CV_HAL_ERROR_OK: the worker function is called first, and the expression then takes the value of its rightmost operand.

From here we can jump into resizeLinearOpenCV and resizeAreaOpenCV to follow the actual fast implementations.
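
The comma-operator idiom is easier to see in a reduced form. The sketch below is not carotene code: negate8u, DEMO_NEGATE and tryNegate are hypothetical names, and the worker merely plays the role that resizeLinearOpenCV plays in TEGRA_RESIZE. In a conditional expression, the middle operand may be a full comma expression, so the worker runs first and the status code becomes the value.

```cpp
#include <cassert>

// Hypothetical status codes modelled on the CV_HAL_ERROR_* values.
enum { DEMO_OK = 0, DEMO_NOT_IMPLEMENTED = 1 };

// Hypothetical worker, called only for its side effect on dst.
inline void negate8u(const unsigned char* src, unsigned char* dst, int n) {
    for (int i = 0; i < n; ++i) dst[i] = static_cast<unsigned char>(255 - src[i]);
}

// Expression macro in the style of TEGRA_RESIZE: when the condition holds,
// run the worker, then yield DEMO_OK through the comma operator; otherwise
// yield DEMO_NOT_IMPLEMENTED without touching dst.
#define DEMO_NEGATE(depthIs8u, src, dst, n) \
( \
    (depthIs8u) && (n) > 0 ? \
        negate8u(src, dst, n), \
        DEMO_OK : DEMO_NOT_IMPLEMENTED \
)

// Thin wrapper so the macro can be called like a function.
inline int tryNegate(bool depthIs8u, const unsigned char* src,
                     unsigned char* dst, int n) {
    assert(src != 0 && dst != 0);
    return DEMO_NEGATE(depthIs8u, src, dst, n);
}
```

Because the whole macro is a single expression with an integer value, it can be dropped in wherever a function returning a status code is expected, which is exactly what CALL_HAL needs.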

The key to hooking in a HAL implementation is therefore this #undef and #define pair.
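
As a minimal sketch of this mechanism, the three parts below would live in separate headers in a real build; here they are placed in one file in the same order. All names (hook_hal_sum, sketch_ni_sum, fast_sum, sumInts) are hypothetical and chosen only to mirror the roles of cv_hal_resize, hal_ni_resize, TEGRA_RESIZE and hal::resize.

```cpp
#include <cassert>

enum { HOOK_OK = 0, HOOK_NOT_IMPLEMENTED = 1 };

// Part 1, the role of hal_replacement.hpp: a default stub that reports
// "not implemented", and a hook name that points at it.
inline int sketch_ni_sum(const int*, int, long long*) { return HOOK_NOT_IMPLEMENTED; }
#define hook_hal_sum sketch_ni_sum

// Part 2, the role of custom_hal.hpp, included after the default definition:
// provide a replacement and re-point the hook name at it.
inline int fast_sum(const int* data, int n, long long* out) {
    long long acc = 0;
    for (int i = 0; i < n; ++i) acc += data[i];
    *out = acc;
    return HOOK_OK;
}
#undef hook_hal_sum
#define hook_hal_sum fast_sum

// Part 3, the caller: written once against the hook name. The preprocessor
// substitutes whatever hook_hal_sum is defined as at this point.
inline long long sumInts(const int* data, int n, bool* usedHook) {
    assert(n >= 0);
    long long result = 0;
    if (hook_hal_sum(data, n, &result) == HOOK_OK) {
        *usedHook = true;
        return result;
    }
    // Generic fallback, used only when the hook is not implemented.
    *usedHook = false;
    for (int i = 0; i < n; ++i) result += data[i];
    return result;
}
```

Removing Part 2 leaves the caller unchanged and makes it take the generic fallback, which is the situation for any function whose hook is never redefined.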

GaussianBlur

Similarly, we find the implementation of GaussianBlur in smooth.cpp and see that its HAL hook is cv_hal_gaussianBlur. Searching tegra_hal.hpp for cv_hal_gaussianBlur gives no results, which means that no fast HAL version of Gaussian blur is hooked in. Yet the carotene library does contain Gaussian-blur-related code that looks like an implementation. By writing a demo and adding logging to the source code, we confirmed that these functions are indeed never called: the hook always returns CV_HAL_ERROR_NOT_IMPLEMENTED to the CALL_HAL macro. Presumably these implementations are not yet good enough, so they have not been hooked in for now.

Posted on Wed, 18 Mar 2020 11:00:02 -0700 by readourlines