Building TensorFlow from Source as a C++ API Shared Library

Introduction

When installing TensorFlow by following https://tensorflow.google.cn/install, pay close attention to the versions of its dependencies, such as Python, Bazel, and Protocol Buffers. A mismatch in any one of them can cause problems.

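Installing the Python package with pip is relatively easy. Building from source is more troublesome, especially when the goal is a C/C++ API shared library, and even after a successful build you still have to avoid ABI conflicts with other libraries. The guide at https://github.com/rangsimanketkaew/tensorflow-cpp-api can save a lot of time.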

Installing with pip (TensorFlow 2)

Python 3.6 through 3.9 is currently supported:

$ sudo apt update
$ sudo apt install python3-dev python3-pip python3-venv

$ python3 --version
$ pip3 --version

# Requires the latest pip
$ pip install --upgrade pip

# Current stable release for CPU and GPU
$ pip install tensorflow

# Or try the preview build (unstable)
$ pip install tf-nightly

Verify that the installation succeeded:

$ ipython

In [1]: import tensorflow as tf
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

In [2]:  tf.version.VERSION
Out[2]: '2.11.0'

In [3]: print(tf.reduce_sum(tf.random.normal([1000, 1000])))
tf.Tensor(229.19952, shape=(), dtype=float32)

Installing with Docker

$ docker pull tensorflow/tensorflow:latest  # Download latest stable image
$ docker run -it -p 8888:8888 tensorflow/tensorflow:latest-jupyter  # Start Jupyter server 

Building from source

Before building from source, keep a few things in mind:

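  • The officially built packages listed at https://tensorflow.google.cn/install/pip?hl=zh-cn give a rough idea of which TensorFlow versions support which Python versions; for example, Python 3.7 can use the r2.6 branch
  • The locally installed Bazel version: once the source branch is fixed, the build checks automatically whether the Bazel version matches
  • Whether dependencies such as protobuf come from the copy bundled with the source tree or from a separately installed system copy; a version difference between the two can cause problems when the compiled shared libraries are used later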

Build from the r2.6 branch:

$ git clone git@github.com:tensorflow/tensorflow.git
$ cd tensorflow/

$ git checkout r2.6

Check which protobuf version this branch depends on:

$ grep -i protobuf tensorflow/workspace2.bzl
protobuf-3.9.2

The locally installed version does not match, so downgrade to 3.9.2:

$ protoc --version
libprotoc 3.14.0

$ git clone https://github.com/protocolbuffers/protobuf.git
$ cd protobuf/
$ git checkout v3.9.2
$ ./autogen.sh
$ ./configure
$ make 
$ make install

$ protoc --version
libprotoc 3.9.2
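
The manual comparison above (the version reported by `protoc --version` against the version the branch expects) could also be scripted. The following is a minimal sketch, assuming version strings shaped like `libprotoc 3.14.0` or `v3.9.2`; the helper names `parse_version` and `same_version` are hypothetical and not part of protobuf or TensorFlow.

```cpp
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: extracts the numeric components of the last
// whitespace-separated token, e.g. "libprotoc 3.14.0" -> {3, 14, 0}.
// A leading 'v' (as in the git tag "v3.9.2") is ignored.
// Returns an empty vector when the token is not a dotted number.
std::vector<int> parse_version(const std::string& text) {
    std::istringstream words(text);
    std::string token, last;
    while (words >> token) last = token;
    if (!last.empty() && last[0] == 'v') last.erase(0, 1);

    std::vector<int> parts;
    std::istringstream fields(last);
    std::string field;
    while (std::getline(fields, field, '.')) {
        try {
            parts.push_back(std::stoi(field));
        } catch (const std::exception&) {
            return {};  // non-numeric component
        }
    }
    return parts;
}

// Hypothetical helper: true when both strings carry the same version numbers.
bool same_version(const std::string& installed, const std::string& required) {
    const std::vector<int> have = parse_version(installed);
    return !have.empty() && have == parse_version(required);
}
```

For example, `same_version("libprotoc 3.14.0", "v3.9.2")` is false, which corresponds to the situation before the downgrade, while `same_version("libprotoc 3.9.2", "v3.9.2")` is true.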

Now the build can start:

$ bazel --version
bazel 3.7.2

$ ./configure  # answer N to every prompt this time
  • Build the C shared library:
$ bazel build --config=opt //tensorflow:libtensorflow.so    # C
$ bazel build --config=opt //tensorflow:libtensorflow_framework.so # framework base
$ ls -1 bazel-bin/tensorflow/libtensorflow*.so*
  • Build the C++ shared library:
$ bazel build --config=opt //tensorflow:libtensorflow_cc.so # C++
$ bazel build --config=cuda //tensorflow:libtensorflow_cc.so # with CUDA
$ bazel build --config=opt //tensorflow:libtensorflow_framework.so # framework base
$ ls -1 bazel-bin/tensorflow/libtensorflow*.so*
  • Collect the header files automatically:
$ bazel build --config=opt //tensorflow:install_headers     # headers
  • Build a tarball:
$ bazel build --config=opt //tensorflow/tools/lib_package:libtensorflow
$ ls -1 bazel-bin/tensorflow/tools/lib_package/libtensorflow.tar.gz
  • Build the pip package:
    $ bazel build //tensorflow/tools/pip_package:build_pip_package # py
    
    $ ls -1 bazel-bin/tensorflow/tools/pip_package/build_pip_package
    
    $ mkdir tensorflow_pkg
    $ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package ./tensorflow_pkg
    
    $ ls -1 ./tensorflow_pkg/tensorflow-2.9.3-cp37-cp37m-linux_x86_64.whl
    $ pip3 install --user ./tensorflow_pkg/tensorflow-2.9.3-cp37-cp37m-linux_x86_64.whl
    

Version mismatches in Python, Bazel, and other tools can all break the build; see https://github.com/rangsimanketkaew/tensorflow-cpp-api for reference.

Setting up API paths and variables

First, put the header files and libraries in place:

$ mkdir -p /usr/local/tensorflow/lib/

$ rsync -av bazel-bin/tensorflow/include  /usr/local/tensorflow/
$ rsync -av bazel-bin/tensorflow/libtensorflow*.so*  /usr/local/tensorflow/lib/
$ export LD_LIBRARY_PATH=/usr/local/tensorflow/lib 

# for Mac
$ rsync -av bazel-bin/tensorflow/libtensorflow*.dylib*  /usr/local/tensorflow/lib/
$ export DYLD_LIBRARY_PATH=/usr/local/tensorflow/lib
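
Whether the library directory actually ended up in the loader's search path can be checked from a program as well. This is a minimal sketch, assuming the variable holds a colon-separated list of directories; `path_list_contains` and `env_or_empty` are hypothetical helper names introduced only for illustration.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: true when `dir` is one of the colon-separated entries
// in `path_list`. Trailing slashes are ignored on both sides.
bool path_list_contains(const std::string& path_list, const std::string& dir) {
    auto strip = [](std::string s) {
        while (s.size() > 1 && s.back() == '/') s.pop_back();
        return s;
    };
    const std::string wanted = strip(dir);
    std::istringstream entries(path_list);
    std::string entry;
    while (std::getline(entries, entry, ':')) {
        if (!entry.empty() && strip(entry) == wanted) return true;
    }
    return false;
}

// Hypothetical helper: reads an environment variable, returning "" when unset.
std::string env_or_empty(const char* name) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string();
}
```

A call such as `path_list_contains(env_or_empty("LD_LIBRARY_PATH"), "/usr/local/tensorflow/lib")` would cover the Linux case; on a Mac the variable name would be `DYLD_LIBRARY_PATH`.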

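One point deserves special mention: on a Mac, if you copy only the .so shared libraries to the target location and leave the .dylib files behind, you will run into puzzling errors: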

$ otool -L test
test:
    @rpath/libtensorflow_cc.so.2 (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libtensorflow_framework.so.2 (compatibility version 0.0.0, current version 0.0.0)
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1300.23.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.100.3)

$ ./test
dyld[54114]: Library not loaded: @rpath/libtensorflow_framework.2.dylib
  Referenced from: /usr/local/tensorflow/lib/libtensorflow_cc.so.2.6.5
  Reason: tried:
    '/usr/local/tensorflow/lib//libtensorflow_framework.2.dylib' (no such file),
    '/usr/local/tensorflow/lib/../_solib_darwin_x86_64/_U_S_Stensorflow_Clibtensorflow_Ucc.so.2.6.5___Utensorflow/libtensorflow_framework.2.dylib' (no such file),
    '/usr/local/tensorflow/lib/libtensorflow_framework.2.dylib' (no such file),
    '/usr/local/lib/libtensorflow_framework.2.dylib' (no such file),
    '/usr/lib/libtensorflow_framework.2.dylib' (no such file)

The program links against .so files, yet the loader complains that it cannot find a .dylib. The root cause shows up here:

$ otool -L /usr/local/tensorflow/lib/libtensorflow_cc.so.2.6.5
/usr/local/tensorflow/lib/libtensorflow_cc.so.2.6.5:
    @rpath/libtensorflow_cc.so.2 (compatibility version 0.0.0, current version 0.0.0)
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1300.23.0)
    @rpath/libtensorflow_framework.2.dylib (compatibility version 0.0.0, current version 0.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.100.3)
    /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1858.112.0)
    /System/Library/Frameworks/Security.framework/Versions/A/Security (compatibility version 1.0.0, current version 60158.100.133)
    /System/Library/Frameworks/IOKit.framework/Versions/A/IOKit (compatibility version 1.0.0, current version 275.0.0)
    /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (compatibility version 300.0.0, current version 1858.112.0)
    /usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)

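Although this is a .so file, it depends on a .dylib underneath (libtensorflow_framework.2.dylib). If you follow other tutorials step by step and copy only the .so files, the program compiles cleanly but then fails to run, which is needlessly frustrating.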
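
To see which files must be copied along with a library, the `@rpath/...` entries can be pulled out of the `otool -L` output. The following is a minimal sketch that works on the captured text only; it assumes the output layout shown above, and `rpath_dependencies` is a hypothetical helper name, not an existing tool.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: given the text printed by `otool -L <library>`,
// returns the file name of every "@rpath/..." entry, in order.
// The first such entry is usually the library's own install name.
std::vector<std::string> rpath_dependencies(const std::string& otool_output) {
    const std::string prefix = "@rpath/";
    std::vector<std::string> names;
    std::istringstream lines(otool_output);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream words(line);
        std::string first;
        if (!(words >> first)) continue;  // blank line
        if (first.compare(0, prefix.size(), prefix) != 0) continue;
        names.push_back(first.substr(prefix.size()));
    }
    return names;
}
```

Fed the output for libtensorflow_cc.so.2.6.5 shown above, this yields `libtensorflow_cc.so.2` and `libtensorflow_framework.2.dylib`; the second name is the file that was missing from /usr/local/tensorflow/lib.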

Calling the C API

  • test_c.c:
#include <stdio.h>
#include <tensorflow/c/c_api.h>

int main() {
  printf("Hello from TensorFlow C library version %s\n", TF_Version());
  return 0;
}

Compile and run:

$ gcc test_c.c -I/usr/local/tensorflow/include/ -L/usr/local/tensorflow/lib/ -ltensorflow -ltensorflow_framework -o test_c
$ ./test_c
Hello from TensorFlow C library version 2.6.5

Calling the C++ API

  • test_c++.cpp:
#include <tensorflow/core/platform/env.h>
#include <tensorflow/core/public/session.h>

#include <iostream>

using namespace std;
using namespace tensorflow;

int main()
{
    Session* session;
    Status status = NewSession(SessionOptions(), &session);
    if (!status.ok()) {
        cout << status.ToString() << "\n";
        return 1;
    }
    cout << "Session successfully created.\n";
}

Compile and run:

$ g++ -std=c++17 test_c++.cpp -I/usr/local/tensorflow/include/ -L/usr/local/tensorflow/lib/ -ltensorflow_cc -ltensorflow_framework -o test_c++
$ ./test_c++
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Session successfully created.

Building together with the TF source tree

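Another approach is to skip building the shared libraries in advance and compile your code inside the TF source tree instead; the result can be either an executable or a shared library. The advantage is that the output depends on few shared libraries, which makes deployment easy. The drawbacks are that it bundles a lot of code you do not need, so the file is large, and that the build is slow.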

For example, with a directory layout like this:

tensorflow-src/tensorflow/test-tf/
tensorflow-src/tensorflow/test-tf/BUILD
tensorflow-src/tensorflow/test-tf/test-tf.cpp
  • test-tf/BUILD:
cc_binary(
    name = "test-tf",
    srcs = ["test-tf.cpp"],
    deps = [
        "//tensorflow/core:tensorflow",
    ],
)
  • test-tf/test-tf.cpp:
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/public/session.h"

#include <iostream>

using namespace std;
using namespace tensorflow;

int main()
{
    Session* session;
    Status status = NewSession(SessionOptions(), &session);
    if (!status.ok()) {
        cout << status.ToString() << "\n";
        return 1;
    }
    cout << "Session successfully created.\n";
}
  • Build and run:
$ cd ~/tensorflow-src/
$ ./configure
$ cd tensorflow/test-tf/
$ bazel build --config=opt :test-tf
$ cd ~/tensorflow-src/bazel-bin/tensorflow/test-tf/
$ ./test-tf

Summary

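Python is the language TensorFlow supports best, and there is no official prebuilt C++ download. Still, training a model offline in Python and then using it online from a system written in C++ is a real need. The process is tortuous, but there is no way around it.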

References