Advanced Usage

Multiple Models

It supports serving multiple models and multiple versions of the same model. You can run the server with a model config file like the one below.

{
  "model_config_list": [
    {
      "name": "tensorflow_template_application_model",
      "base_path": "./models/tensorflow_template_application_model/",
      "platform": "tensorflow"
    }, {
      "name": "deep_image_model",
      "base_path": "./models/deep_image_model/",
      "platform": "tensorflow"
    }, {
       "name": "mxnet_mlp_model",
       "base_path": "./models/mxnet_mlp/mx_mlp",
       "platform": "mxnet"
    }
  ]
}
simple_tensorflow_serving --model_config_file="./examples/model_config_file.json"

[Image: dashboard showing multiple models]
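
With multiple models online, clients can pick a model per request. The sketch below assumes the server selects the model via a model_name field in the JSON payload (analogous to the model_version field used later in this guide); adjust the input data to match each model's signature.

import requests

endpoint = "http://127.0.0.1:8500"

# Assumed payload layout: "model_name" selects among the served models.
input_data = {
    "model_name": "tensorflow_template_application_model",
    "data": {
        "keys": [[1.0], [2.0]],
        "features": [[1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1]]
    }
}

result = requests.post(endpoint, json=input_data)
print(result.json())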

GPU Acceleration

If you want to use a GPU, try the Docker image with the GPU tag and put the CUDA files in /usr/cuda_files/.

export CUDA_SO="-v /usr/cuda_files/:/usr/cuda_files/"
export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
export LIBRARY_ENV="-e LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/cuda_files"

docker run -it -p 8500:8500 $CUDA_SO $DEVICES $LIBRARY_ENV tobegit3hub/simple_tensorflow_serving:latest-gpu

You can set the session config and GPU options with a command-line parameter or in the model config file.

simple_tensorflow_serving --model_base_path="./models/tensorflow_template_application_model" --session_config='{"log_device_placement": true, "allow_soft_placement": true, "allow_growth": true, "per_process_gpu_memory_fraction": 0.5}'

Or in the model config file:

{
  "model_config_list": [
    {
      "name": "default",
      "base_path": "./models/tensorflow_template_application_model/",
      "platform": "tensorflow",
      "session_config": {
        "log_device_placement": true,
        "allow_soft_placement": true,
        "allow_growth": true,
        "per_process_gpu_memory_fraction": 0.5
      }
    }
  ]
}
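
For reference, these options map onto TensorFlow's session settings. The following is only a rough sketch of the equivalent tf.ConfigProto in TensorFlow 1.x, not the server's actual internals.

import tensorflow as tf

# Rough TensorFlow 1.x equivalent of the session_config above (a sketch only).
config = tf.ConfigProto(
    log_device_placement=True,  # log which device each op is placed on
    allow_soft_placement=True)  # fall back to another device if needed
config.gpu_options.allow_growth = True                    # allocate GPU memory on demand
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # cap usage at 50% of GPU memory

session = tf.Session(config=config)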

Here is a benchmark of CPU and GPU inference; the y-axis shows latency (lower is better).

[Image: CPU vs. GPU inference latency benchmark]

Generated Client

You can generate test JSON data for the online models.

curl http://localhost:8500/v1/models/default/gen_json
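
A quick round-trip test in Python, assuming the generated JSON is directly usable as the inference request body:

import requests

endpoint = "http://127.0.0.1:8500"

# Fetch generated test data for the model, then post it back for inference.
test_data = requests.get(endpoint + "/v1/models/default/gen_json").json()
result = requests.post(endpoint, json=test_data)
print(result.json())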

Or generate clients in different languages (Bash, Python, Golang, JavaScript etc.) for your model without writing any code.

curl "http://localhost:8500/v1/models/default/gen_client?language=python" > client.py
curl "http://localhost:8500/v1/models/default/gen_client?language=bash" > client.sh
curl "http://localhost:8500/v1/models/default/gen_client?language=golang" > client.go
curl "http://localhost:8500/v1/models/default/gen_client?language=javascript" > client.js

The generated code should look like this and can be tested immediately.

#!/usr/bin/env python

import requests

def main():
  endpoint = "http://127.0.0.1:8500"

  input_data = {"keys": [[1.0], [1.0]], "features": [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]}
  result = requests.post(endpoint, json=input_data)
  print(result.json())

if __name__ == "__main__":
  main()

Image Model

For image models, you can send the raw image file in the request instead of constructing array data.

Start serving an image model such as deep_image_model.

simple_tensorflow_serving --model_base_path="./models/deep_image_model/"

Then send a request with a raw image file that matches the input shape of your model.

curl -X POST -F 'image=@./images/mew.jpg' -F "model_version=1" 127.0.0.1:8500
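
The same request in Python, mirroring the curl command above with a multipart upload:

import requests

endpoint = "http://127.0.0.1:8500"

# Send the raw image file and the model_version form field as multipart data.
with open("./images/mew.jpg", "rb") as f:
    result = requests.post(endpoint,
                           files={"image": f},
                           data={"model_version": "1"})

print(result.text)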

Custom Op

If your models rely on new TensorFlow custom ops, you can run the server and load the .so files at startup.

simple_tensorflow_serving --model_base_path="./model/" --custom_op_paths="./foo_op/"

Please check out the complete example in ./examples/custom_op/.
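
For reference, TensorFlow loads custom ops from shared objects with tf.load_op_library. A minimal sketch with a hypothetical foo_op.so (substitute your own .so file and op name):

import tensorflow as tf

# Load the custom op from its shared object (hypothetical path and op name).
foo_module = tf.load_op_library("./foo_op/foo_op.so")

# The loaded module exposes the custom op as a Python function, e.g.:
# output = foo_module.foo(input_tensor)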

Authentication

For enterprise deployments, you can enable basic authentication for all the APIs; any anonymous request is denied.

Now start the server with the configured username and password.

simple_tensorflow_serving --model_base_path="./models/tensorflow_template_application_model/" --enable_auth=True --auth_username="admin" --auth_password="admin"

If you are using the web dashboard, just type in your credentials. If you are using a client, send the username and password with the request.

curl -u admin:admin -H "Content-Type: application/json" -X POST -d '{"data": {"keys": [[11.0], [2.0]], "features": [[1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1]]}}' http://127.0.0.1:8500
endpoint = "http://127.0.0.1:8500"
input_data = {
  "data": {
      "keys": [[11.0], [2.0]],
      "features": [[1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1]]
  }
}
auth = requests.auth.HTTPBasicAuth("admin", "admin")
result = requests.post(endpoint, json=input_data, auth=auth)

TLS/SSL

It supports TLS/SSL, and you can generate self-signed certificate files for testing.

openssl req -x509 -newkey rsa:4096 -nodes -out /tmp/secret.pem -keyout /tmp/secret.key -days 365

Then run the server with the certificate files.

simple_tensorflow_serving --enable_ssl=True --secret_pem=/tmp/secret.pem --secret_key=/tmp/secret.key --model_base_path="./models/tensorflow_template_application_model"
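
A test client then connects over HTTPS. Since the certificate above is self-signed, the sketch below disables certificate verification with verify=False, which is acceptable only for testing:

import requests

endpoint = "https://127.0.0.1:8500"

input_data = {
  "data": {
      "keys": [[11.0], [2.0]],
      "features": [[1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1]]
  }
}

# verify=False skips validation of the self-signed certificate; testing only.
result = requests.post(endpoint, json=input_data, verify=False)
print(result.json())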