
Accessing global memory in DWORD4 units can fully utilize memory bandwidth on Intel platforms. This patch lets more image formats (e.g. 8UC4) be processed as one DWORD4 per work-item.

After applying this patch, three subcases of ./opencv_perf_core --gtest_filter=OCL_RepeatFixture_Repeat.Repeat/* run faster on an HD4000 graphics card with Beignet:

- OCL_RepeatFixture_Repeat.Repeat/2: 64% improvement
- OCL_RepeatFixture_Repeat.Repeat/6: 50% improvement
- OCL_RepeatFixture_Repeat.Repeat/8: 56% improvement

Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
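
The idea of handling one DWORD4 per work-item can be illustrated with a small CPU-side sketch. This is not the kernel from the patch. It assumes a DWORD4 means four 32-bit words (16 bytes), and the names `kDword4Bytes`, `pixelsPerWorkItem`, and `repeatRow` are hypothetical helpers introduced only for illustration. Each loop iteration stands in for one work-item that moves a single 16-byte chunk, which for 8UC4 (4 bytes per pixel) covers 4 pixels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumption: one "DWORD4" is four 32-bit words, i.e. 16 bytes.
constexpr std::size_t kDword4Bytes = 16;

// Hypothetical helper: number of whole pixels covered by one 16-byte access.
// Returns 0 when the pixel size does not divide 16 evenly (e.g. 3-byte 8UC3).
constexpr std::size_t pixelsPerWorkItem(std::size_t pixelBytes) {
    return (pixelBytes != 0 && kDword4Bytes % pixelBytes == 0)
               ? kDword4Bytes / pixelBytes
               : 0;
}

static_assert(pixelsPerWorkItem(4) == 4, "8UC4: four pixels per DWORD4");
static_assert(pixelsPerWorkItem(1) == 16, "8UC1: sixteen pixels per DWORD4");
static_assert(pixelsPerWorkItem(3) == 0, "8UC3 does not pack evenly");

// CPU stand-in for a repeat kernel on a single row: tile `src` horizontally
// `nx` times. Each iteration of the outer loop plays the role of one
// work-item that reads one 16-byte chunk and writes it into every tile.
std::vector<std::uint8_t> repeatRow(const std::vector<std::uint8_t>& src,
                                    std::size_t nx) {
    // The sketch only handles rows that are a whole number of DWORD4 chunks.
    assert(src.size() % kDword4Bytes == 0);

    std::vector<std::uint8_t> dst(src.size() * nx);
    const std::size_t chunks = src.size() / kDword4Bytes;

    for (std::size_t item = 0; item < chunks; ++item) {
        const std::size_t offset = item * kDword4Bytes;
        for (std::size_t tile = 0; tile < nx; ++tile) {
            std::memcpy(dst.data() + tile * src.size() + offset,
                        src.data() + offset, kDword4Bytes);
        }
    }
    return dst;
}
```

A real kernel would also need to handle rows whose byte width is not a multiple of 16, which this sketch deliberately leaves out.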
OpenCV: Open Source Computer Vision Library
Resources
- Homepage: http://opencv.org
- Docs: http://docs.opencv.org
- Q&A forum: http://answers.opencv.org
- Issue tracking: http://code.opencv.org
Contributing
Please read before starting work on a pull request: http://code.opencv.org/projects/opencv/wiki/How_to_contribute
Summary of guidelines:
- One pull request per issue;
- Choose the right base branch;
- Include tests and documentation;
- Clean up "oops" commits before submitting;
- Follow the coding style guide.