TensorFlow Serving

Intro

TensorFlow Serving is a high-performance system designed for serving TensorFlow models.

It can load saved models (SavedModel protobuf format) and expose an endpoint for inference.
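
For example, once a model server is running, its REST Predict endpoint can be called directly over HTTP. The snippet below is illustrative only: it assumes TF Serving is listening on the default REST port 8501 and that the model is named model.

import requests

# TF Serving's REST Predict API: POST /v1/models/<model_name>:predict
# (assumed defaults: host 127.0.0.1, port 8501, model name 'model')
url = 'http://127.0.0.1:8501/v1/models/model:predict'

# One row of example tensor input, using TF Serving's "row" (instances) format
payload = {'instances': [[1.0, 2.0, 5.0]]}

response = requests.post(url, json=payload)
print(response.json())  # e.g. {'predictions': [...]}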

Sometimes TensorFlow Serving alone is not enough, since we need to perform some processing before/after the inference.

Read more about TensorFlow Serving: https://www.tensorflow.org/tfx/guide/serving

Integration

mlserving allows easy integration with a TensorFlow Serving model server.

The idea is to have a Python layer that performs some processing before and after invoking the tf-serving endpoint.

TFServingPrediction implements predict and can be used as a mixin that handles the tf-serving request.

The requests package is required for TFServingPrediction to work properly:

$ pip install requests

from mlserving.predictors import RESTPredictor
from mlserving.predictors.tensorflow import TFServingPrediction

class MyPredictor(TFServingPrediction, RESTPredictor):
    def __init__(self):
        # Configure TFServingPrediction with its default values:
        # host='127.0.0.1', port=8501, model_name='model'
        super().__init__()

    def pre_process(self, features: dict, req):
        # Build the payload for TF Serving's REST Predict API ("row" format)
        return {
            "instances": [
                # TODO: fill your tensor inputs here
            ]
        }

    def post_process(self, prediction, req):
        # prediction holds the parsed tf-serving response;
        # the REST Predict API returns its results under 'predictions'
        predictions = prediction['predictions']
        return {
            'probabilities': predictions,
        }

Since predict is already implemented, we only need to implement the processing layer that runs before/after the inference.
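
To actually serve this predictor, register it on an mlserving application. The snippet below is a minimal sketch, assuming mlserving's ServingApp with add_inference_handler and run as shown in the project README; the route and port here are arbitrary.

from mlserving import ServingApp

app = ServingApp()

# Route incoming REST requests to MyPredictor
# (pre_process -> tf-serving inference -> post_process)
app.add_inference_handler('/api/v1/predict', MyPredictor())

app.run(port=5000)

A client would then POST its raw features to /api/v1/predict, and the Python layer takes care of building the tf-serving payload and shaping the response.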