Some of functions were enabled on Qualcomm S800 by changing grid size;
OpenCL kernel grid size unification for different platfroms;
Test pass rate improvements by inclreasing threshold;
Some tests were disabled for Android;
run.py was adopted for devices with brackets in in name.
Most codes are ported from AMD's Bolt library.
Four methods are implemented:
SORT_BITONIC, // only support power-of-2 buffer size
SORT_SELECTION, // cannot sort duplicate keys
SORT_MERGE,
SORT_RADIX // only support signed int/float keys