Predictors

PredictorBase presents a simple interface:

Request Flow
  • before_request - Everything that should happen before the ML business logic starts (parsing, validation, etc.)
  • pre_process - Used for feature processing (string embedding, normalization, etc.)
  • predict - Runs inference with your loaded model
  • post_process - Formats the response for your client; this method's return value is sent back in the response

This interface is intuitive and makes it easy to integrate with other serving apps, such as TensorFlow Serving.
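For example, a predictor can delegate inference to an external TensorFlow Serving instance over its REST API. A minimal sketch; the host, model name, and 'features' key are assumptions, not part of mlserving:

import requests

from mlserving.predictors import RESTPredictor

class TFServingPredictor(RESTPredictor):
    def predict(self, features, req):
        # Forward the request to TensorFlow Serving's REST predict endpoint
        resp = requests.post(
            'http://localhost:8501/v1/models/my_model:predict',
            json={'instances': [features['features']]}
        )
        return resp.json()['predictions']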

These methods are called one after the other; each method's output becomes the input of the next one in line.
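Conceptually, the dispatch looks like this sketch (payload stands for the parsed request body and req for the incoming request object; the actual internals may differ):

features = predictor.before_request(payload, req)
processed = predictor.pre_process(features, req)
prediction = predictor.predict(processed, req)
response = predictor.post_process(prediction, req)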

Since JSON is the most common format for transmitting data over HTTP/1.1, mlserving accepts and returns JSON only.

RESTPredictor

This class implements PredictorBase and is the go-to class to inherit from.

When you inherit from RESTPredictor, you usually want to load relevant resources in __init__:


from mlserving.predictors import RESTPredictor

import joblib

class MyPredictor(RESTPredictor):
    def __init__(self):
        # Loading resources
        self.model = joblib.load('./models/my_trained_model.pkl')
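With the model loaded, the other request-flow methods can use it. A minimal predict sketch, assuming a scikit-learn-style model and a 'features' key in the request payload:

    def predict(self, features, req):
        # 'features' is the request payload (or the output of pre_process)
        return self.model.predict([features['features']])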

RESTPredictor also adds validation of the input request in before_request.

The validation is done with validr, a fast and easy-to-use Python library.

In order to define your request schema, you’ll need to add a decorator above your predictor class:

from mlserving.api import request_schema
from mlserving.predictors import RESTPredictor

SCHEMA = {
    # a list of floats only
    'features': [
        'list',
        'float'
    ]
}

@request_schema(SCHEMA)
class MyPredictor(RESTPredictor):
    # TODO: Implement predict and override other methods (if needed)
    pass

The full validr schema syntax can be found here: https://github.com/guyskk/validr/wiki/Schema-Syntax
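With the schema above, a request body like the following (hypothetical values) passes validation; non-conforming payloads are rejected in before_request:

{
    "features": [0.2, 1.5, 3.0]
}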

PipelinePredictor

Whenever your prediction is based on the results of several models, consider using PipelinePredictor to chain them one after the other.

A good example would be a text classification model.

Request with input text -> text processing -> embedding -> classification

from mlserving import ServingApp
from mlserving.api import request_schema
from mlserving.predictors import RESTPredictor, PipelinePredictor

SCHEMA = {
    'text': 'str'
}

@request_schema(SCHEMA)
class EmbeddingPredictor(RESTPredictor):
    def __init__(self):
        # Load relevant resources
        pass

    def pre_process(self, features: dict, req):
        text = features['text']
        # Clean the text and apply any other processing needed
        return text

    def predict(self, processed_text, req):
        # Use the processed_text and get its embedding
        pass

class TextClassifierPredictor(RESTPredictor):
    def __init__(self):
        # Load relevant resources
        pass

    def predict(self, features: dict, req):
        # Make the prediction based on the text embedding
        pass

app = ServingApp()
p = PipelinePredictor([
    EmbeddingPredictor(),
    TextClassifierPredictor()
])

app.add_inference_handler(p, '/classify_text')
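
To actually serve requests, the app still needs to be started. A sketch, assuming ServingApp exposes a Flask-style run method (host and port are placeholders):

# Start the HTTP server; the exact run signature is an assumption
app.run('0.0.0.0', 5000)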