Some notes
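My recent goal has been to improve the compute performance of Android apps. I tried RenderScript, OpenCL, OpenBLAS, and a few other approaches, so I am writing down what I learned.
Artificial intelligence and machine learning keep getting more popular, and their mobile applications look promising too. But the computations I have seen, such as convolutions over image data, are demanding, while phones have rather limited compute performance.
From the very beginning we gave up on doing the computation in the JVM and called native code directly. This is not hard: Android Studio 2.2 added CMake support, so you can link the C libraries you need into the app's native library and then write the few JNI interfaces required.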
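With the algorithm already fixed, we planned to improve compute performance in two ways:
- Use BLAS to speed up the linear algebra.
- Use heterogeneous computing to raise overall compute performance.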
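To make the cost concrete, here is a minimal sketch of direct 2D convolution in plain C++. It is only an illustration, not the code we actually ran; the names `convolve2d` and `multiplyAddCount` are my own and do not come from any library. I assume a single-channel, row-major image and the "valid" cross-correlation form that machine learning code usually calls convolution.

```cpp
#include <cstddef>
#include <vector>

// Direct "valid" 2D convolution (cross-correlation form) of a row-major
// height x width image with a k x k kernel.
// The result is (height - k + 1) x (width - k + 1), row-major.
// Returns an empty vector if the sizes are inconsistent.
std::vector<float> convolve2d(const std::vector<float>& image,
                              std::size_t height, std::size_t width,
                              const std::vector<float>& kernel, std::size_t k) {
    if (k == 0 || k > height || k > width) return {};
    if (image.size() != height * width || kernel.size() != k * k) return {};

    const std::size_t outH = height - k + 1;
    const std::size_t outW = width - k + 1;
    std::vector<float> out(outH * outW, 0.0f);

    for (std::size_t y = 0; y < outH; ++y) {
        for (std::size_t x = 0; x < outW; ++x) {
            // Every output pixel depends only on its own input window,
            // so all (y, x) iterations are independent of each other.
            float acc = 0.0f;
            for (std::size_t ky = 0; ky < k; ++ky) {
                for (std::size_t kx = 0; kx < k; ++kx) {
                    acc += image[(y + ky) * width + (x + kx)] * kernel[ky * k + kx];
                }
            }
            out[y * outW + x] = acc;
        }
    }
    return out;
}

// Number of multiply-adds the loops above perform for one pass.
std::size_t multiplyAddCount(std::size_t height, std::size_t width, std::size_t k) {
    if (k == 0 || k > height || k > width) return 0;
    return (height - k + 1) * (width - k + 1) * k * k;
}
```

For one 1920x1080 channel and a single 3x3 kernel, `multiplyAddCount(1080, 1920, 3)` is 18,608,436, and a real network repeats this over many channels and layers. Because the output pixels are independent, this kind of loop is a natural candidate for a GPU.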
OpenBLAS
OpenBLAS is an open-source implementation of BLAS (Basic Linear Algebra Subprograms).
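The operation we care about most is the general matrix multiply, which CBLAS exposes as `cblas_sgemm` for single precision. As a reference for what that routine computes, here is a plain C++ sketch of C = alpha * A * B + beta * C. The name `referenceSgemm` is hypothetical, the matrices are assumed to be row-major and densely packed, and this is the textbook triple loop, not how OpenBLAS implements it.

```cpp
#include <cstddef>
#include <vector>

// Reference single-precision GEMM: C = alpha * A * B + beta * C.
// A is m x k, B is k x n, C is m x n; all row-major and densely packed.
// Returns false (and leaves C untouched) if the vector sizes do not match.
bool referenceSgemm(std::size_t m, std::size_t n, std::size_t k,
                    float alpha,
                    const std::vector<float>& a,
                    const std::vector<float>& b,
                    float beta,
                    std::vector<float>& c) {
    if (a.size() != m * k || b.size() != k * n || c.size() != m * n) {
        return false;
    }
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            // As in BLAS, C is not read when beta is zero.
            c[i * n + j] = (beta == 0.0f) ? alpha * acc
                                          : alpha * acc + beta * c[i * n + j];
        }
    }
    return true;
}
```

A tuned BLAS computes the same result but blocks the loops for the cache and uses SIMD instructions, which is where the speedup over a loop like this one comes from.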
OpenCL
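OpenCL is a computing platform that we use to drive the phone's GPU, provided the phone supports it. Its shared library can usually be pulled straight off the phone, and I took the header files from GitHub. Once the library is linked into the native library, you can start writing OpenCL code.
Initializing OpenCL: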
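/**
* Initialize OpenCL.
* @param env JNI environment pointer
* @param openCLProgramText source text of the OpenCL kernel program
* Note: openCLObjects, the struct that holds the OpenCL handles, is not a
* parameter; the function uses it as a variable from an enclosing scope.
*/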
void initOpenCL(JNIEnv * env, jstring openCLProgramText){
cl_device_type device_type = CL_DEVICE_TYPE_GPU;
cl_int err = CL_SUCCESS;
/**----------------------------------------------------------------------
* 1. Query the available platforms
*/
cl_uint num_of_platforms = 0;
err = clGetPlatformIDs(0, 0, &num_of_platforms);
SAMPLE_CHECK_ERRORS(err);
if (num_of_platforms == 0){
LOGE("No suitable OpenCL platform found");
return;
}
LOGD("Number of available platforms: %u", num_of_platforms);
vector<cl_platform_id> platforms(num_of_platforms);
err = clGetPlatformIDs(num_of_platforms, &platforms[0], 0);
SAMPLE_CHECK_ERRORS(err);
LOGD("OpenCL platform names:");
cl_uint selected_platform_index = 0;
openCLObjects.platform = platforms[selected_platform_index];
size_t platform_name_length = 0;
err = clGetPlatformInfo(
platforms[selected_platform_index],
CL_PLATFORM_NAME,
0,
0,
&platform_name_length
);
SAMPLE_CHECK_ERRORS(err);
vector<char> platform_name_buffer(platform_name_length);
err = clGetPlatformInfo(
platforms[selected_platform_index],
CL_PLATFORM_NAME,
platform_name_length,
&platform_name_buffer[0],
0
);
SAMPLE_CHECK_ERRORS(err);
string platform_name = &platform_name_buffer[0];
LOGD("platform name:%s", platform_name.c_str());
/**----------------------------------------------------------------------
* 2. Create the context
*/
cl_context_properties context_props[] = {
CL_CONTEXT_PLATFORM,
cl_context_properties(openCLObjects.platform),
0
};
openCLObjects.context = clCreateContextFromType(
context_props,
device_type,
0,
0,
&err);
SAMPLE_CHECK_ERRORS(err);
/**----------------------------------------------------------------------
* 3. Query the device
*/
err = clGetContextInfo(
openCLObjects.context,
CL_CONTEXT_DEVICES,
sizeof(openCLObjects.device),
&openCLObjects.device,
0);
SAMPLE_CHECK_ERRORS(err);
/**----------------------------------------------------------------------
* 4. Create the OpenCL program
*/
const char* openCLProgramTextNative = env->GetStringUTFChars(openCLProgramText, 0);
LOGD("OpenCL program text:\n%s", openCLProgramTextNative);
openCLObjects.program =
clCreateProgramWithSource
(
openCLObjects.context,
1,
&openCLProgramTextNative,
0,
&err
);
SAMPLE_CHECK_ERRORS(err);
/**----------------------------------------------------------------------
* 5. Build the OpenCL program
*/
err = clBuildProgram(openCLObjects.program, 0, 0, 0, 0, 0);
if(err == CL_BUILD_PROGRAM_FAILURE)
{
size_t log_length = 0;
err = clGetProgramBuildInfo(
openCLObjects.program,
openCLObjects.device,
CL_PROGRAM_BUILD_LOG,
0,
0,
&log_length
);
SAMPLE_CHECK_ERRORS(err);
vector<char> log(log_length);
err = clGetProgramBuildInfo(
openCLObjects.program,
openCLObjects.device,
CL_PROGRAM_BUILD_LOG,
log_length,
&log[0],
0
);
SAMPLE_CHECK_ERRORS(err);
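LOGE("Error happened during the build of OpenCL program.\nBuild log:%s", &log[0]);
// Release the JNI string before the early return so it is not leaked.
env->ReleaseStringUTFChars(openCLProgramText, openCLProgramTextNative);
return;
}
// Any build error other than CL_BUILD_PROGRAM_FAILURE is checked like every other call.
SAMPLE_CHECK_ERRORS(err);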
/**-----------------------------------------------------------------------
* 6. Extract the kernel and create the command queue
*/
openCLObjects.kernel = clCreateKernel(openCLObjects.program, "stepKernel", &err);
SAMPLE_CHECK_ERRORS(err);
openCLObjects.queue =
clCreateCommandQueue
(
openCLObjects.context,
openCLObjects.device,
0,
&err
);
SAMPLE_CHECK_ERRORS(err);
LOGD("initOpenCL finished successfully");
env->ReleaseStringUTFChars(openCLProgramText, openCLProgramTextNative);
}
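The listing uses the same two-call idiom twice: call the query once with a zero size to learn how many bytes are needed, allocate a buffer, then call it again to fetch the data. Below is a minimal sketch of that idiom factored into a helper. `queryString` and `fakeNameQuery` are hypothetical names of mine, not part of OpenCL; the fake query only stands in for a real call so the sketch runs without an OpenCL device. The helper assumes the query returns 0 on success, which matches `CL_SUCCESS`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: runs the "ask for the size, then fetch" idiom.
// query(size, buffer, sizeOut) must return 0 on success, like an OpenCL
// info call with the object and parameter name already bound.
template <typename Query>
bool queryString(Query query, std::string& result) {
    std::size_t length = 0;
    // First call: no buffer, only ask how many bytes are required.
    if (query(0, nullptr, &length) != 0 || length == 0) return false;
    // One extra zeroed byte guarantees the buffer is null-terminated.
    std::vector<char> buffer(length + 1, '\0');
    // Second call: fetch the data into the buffer.
    if (query(length, buffer.data(), nullptr) != 0) return false;
    result = std::string(buffer.data());
    return true;
}

// Stand-in for a real OpenCL query so the sketch can run anywhere.
int fakeNameQuery(std::size_t size, void* out, std::size_t* sizeOut) {
    const char name[] = "FakePlatform";
    if (sizeOut != nullptr) *sizeOut = sizeof(name);
    if (out != nullptr) {
        if (size < sizeof(name)) return -1;  // buffer too small
        for (std::size_t i = 0; i < sizeof(name); ++i) {
            static_cast<char*>(out)[i] = name[i];
        }
    }
    return 0;
}

// With OpenCL the call might look like this (untested sketch):
//   std::string name;
//   queryString([&](std::size_t n, void* p, std::size_t* r) {
//       return clGetPlatformInfo(platform, CL_PLATFORM_NAME, n, p, r);
//   }, name);
```

The same helper would also fit the build-log query, with `clGetProgramBuildInfo` in place of `clGetPlatformInfo`.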