SageMaker

Deploying the Takeoff server on SageMaker

Amazon SageMaker is a fully managed service that lets developers and data scientists build, train, and deploy machine learning models at scale, without having to manage the underlying infrastructure or the machine learning pipeline.

Takeoff makes it easy to deploy large language models. SageMaker and Takeoff are perfect partners: with Takeoff it is trivial to launch a language model on a single device, and with SageMaker it is easy to scale that deployment to many instances dynamically.

Info: When running on SageMaker, Takeoff does not support streaming; only the /generate endpoint will work. The SageMaker instance also cannot communicate with the Titan frontends. If you need either of these behaviours, use a service like EC2 instead.

Prerequisites

To follow along with this guide we are going to assume you have a few things set up:

SageMaker should be set up so you can create Endpoints. Check out the Getting Started guide for help setting this up. For this guide we have been using SageMaker with the IAM AmazonSageMakerFullAccess policy. If something in the guide doesn't work, it may be that your policy does not grant sufficient access, for example to write logs.
Elastic Container Registry (ECR) should be set up so that you can push and pull containers. This guide will tell you how to set up ECR on the command line.

Step 1: Downloading the Takeoff Server

To get started, we are going to download the Takeoff server. Assuming you have already arranged access to the Takeoff Docker image, you can pull it with:

docker pull tytn/takeoff-pro:0.20.0

Step 2: Upload Takeoff to ECR

Now we need to tag the image so that it can be uploaded to ECR. The command below will do that. Make sure you fill in the command with your own account id and region!

docker tag tytn/takeoff-pro:0.20.0 <aws-account-id>.dkr.ecr.<region>.amazonaws.com/takeoff-pro:0.20.0

Now we can push the container to ECR:

docker push <aws-account-id>.dkr.ecr.<region>.amazonaws.com/takeoff-pro:0.20.0
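
The tag and push commands above share the same registry address, and a push only succeeds once Docker is logged in to ECR and the target repository exists. As a minimal sketch, the helper below prints the full command sequence for a given account and region. The function name `ecr_push_commands` and the example account ID and region are hypothetical placeholders, not part of Takeoff or the AWS tooling.

```python
def ecr_push_commands(account_id, region, repository="takeoff-pro",
                      tag="0.20.0", local_image="tytn/takeoff-pro:0.20.0"):
    """Return the shell commands needed to push the local image to ECR."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    remote_image = f"{registry}/{repository}:{tag}"
    return [
        # Create the repository (only needed the first time).
        f"aws ecr create-repository --repository-name {repository} --region {region}",
        # Log Docker in to the ECR registry.
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        # Tag and push, as in the commands above.
        f"docker tag {local_image} {remote_image}",
        f"docker push {remote_image}",
    ]


# Placeholder values for illustration only; substitute your own.
for command in ecr_push_commands("123456789012", "eu-west-2"):
    print(command)
```

The first two commands assume the AWS CLI is installed and configured with credentials that can manage ECR.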

Step 3: Deploy the Endpoint

Create a Python file called deploy.py.

We are going to use the SageMaker Python SDK, so make sure it is installed:

pip install sagemaker

We need to import some classes from the SageMaker SDK:

from sagemaker.model import Model
from sagemaker.predictor import Predictor

Next, define the image that we just uploaded and the SageMaker execution role that the endpoint will run under.

image_uri = "<aws-account-id>.dkr.ecr.<region>.amazonaws.com/takeoff-pro:0.20.0"

role = "arn:aws:iam::<user-id>:role/service-role/AmazonSageMaker-ExecutionRole-<ARN>"

Now we need to make a Model object that describes the deployment. This is where we set the environment variables for the Takeoff Docker image, such as which model to serve and which device to run it on. Here we choose a small model and run it on a GPU.

We also need to specify that the model needs access to ECR, using image_config = {'RepositoryAccessMode':'Platform'}

sagemaker_model = Model(
    image_uri = image_uri,
    role = role,
    predictor_cls = Predictor,
    env = {
        'TAKEOFF_MODEL_NAME': 'facebook/opt-125m',
        'TAKEOFF_DEVICE': 'cuda',
    },
    image_config = {
        'RepositoryAccessMode':'Platform'
    }
)

Finally, we launch the prediction endpoint, specifying the initial instance count, the instance type, and the endpoint name.

predictor = sagemaker_model.deploy(
    initial_instance_count = 1,
    instance_type = 'ml.p3.2xlarge',
    endpoint_name = 'takeoff-sagemaker-endpoint'
)

See the AWS documentation for the list of available SageMaker instance types.

Here is the Python file in full:

from sagemaker.model import Model
from sagemaker.predictor import Predictor

# make sure to fill in the correct account id and region!
image_uri = "<aws-account-id>.dkr.ecr.<region>.amazonaws.com/takeoff-pro:0.20.0"

role = "arn:aws:iam::<user-id>:role/service-role/AmazonSageMaker-ExecutionRole-<ARN>"

sagemaker_model = Model(
    image_uri = image_uri,
    role = role,
    predictor_cls = Predictor,
    env = {
        'TAKEOFF_MODEL_NAME': 'facebook/opt-125m', # model choice
        'TAKEOFF_DEVICE': 'cuda', # hardware choice
    },
    image_config = {
        'RepositoryAccessMode':'Platform'
    }
)

predictor = sagemaker_model.deploy(
    initial_instance_count = 1,
    instance_type = 'ml.p3.2xlarge', # a small gpu endpoint
    endpoint_name = 'takeoff-sagemaker-endpoint'
)

Step 4: Monitoring the Deployment

If you navigate to the SageMaker console and look for the Inference/Endpoints tab on the left you can view the endpoint and more details about the configuration.

From here you can also view the logs and check that the Takeoff server has started as expected.

It can take a while for the endpoint to be available as you have to download a large container and model.
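
If you would rather check readiness from a script than from the console, the endpoint status can be polled through the SageMaker API. The sketch below is one hedged way to do that: `wait_for_endpoint` is a hypothetical helper, and it assumes it is given a client that behaves like `boto3.client("sagemaker")`, whose `describe_endpoint` call returns an `EndpointStatus` field.

```python
import time


def wait_for_endpoint(client, endpoint_name, poll_seconds=30, timeout_seconds=1800):
    """Poll the endpoint status until it is InService, fails, or times out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Endpoint {endpoint_name} is still {status} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)


# Usage (requires boto3 and AWS credentials):
# import boto3
# wait_for_endpoint(boto3.client("sagemaker"), "takeoff-sagemaker-endpoint")
```

Passing the client in as an argument keeps the helper independent of how credentials and regions are configured.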

Step 5: Making Requests

Once the container is ready to go, we can start to make requests and get back model responses. We are going to make another Python file, inference.py.

We need the following imports from the SageMaker SDK:

from sagemaker import Session
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

We need to build a Predictor object:

session = Session()

predictor = Predictor(
    endpoint_name='takeoff-sagemaker-endpoint',
    sagemaker_session=session,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer()
)

Now we are ready to make inferences:

response = predictor.predict({'text': 'Hello SageMaker!'})

print(response)

# >> Hello Takeoff
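
Applications that do not want to depend on the SageMaker SDK can send the same request through the lower-level runtime API. This is a minimal sketch rather than a Takeoff-specific client: `generate` is a hypothetical helper, it assumes a client that behaves like `boto3.client("sagemaker-runtime")`, and it assumes the endpoint replies with JSON, as the JSONDeserializer above implies.

```python
import json


def generate(runtime_client, endpoint_name, text):
    """Send a JSON payload to the endpoint and return the decoded JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"text": text}),
    )
    # The response body is a stream, so read it fully before decoding.
    return json.loads(response["Body"].read())


# Usage (requires boto3 and AWS credentials):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# print(generate(runtime, "takeoff-sagemaker-endpoint", "Hello SageMaker!"))
```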

That's it! You are now ready to use Takeoff alongside the whole host of tools offered in SageMaker.

Advanced: Using a Local Model

If you have a transformer model saved locally, you can serve it instead, but this involves building a new Docker image that contains the model and uploading it to ECR.

We are going to demo this with a small model, facebook/opt-125m.

We are going to assume that we have this model saved in a directory $HOME/models/myopt.

To start, make a new Dockerfile in the $HOME directory. We will call it Dockerfile.opt because it wraps the opt model.

The Dockerfile will look like this:

FROM tytn/takeoff-pro:0.20.0

COPY models/myopt /code/models/hf/myopt

This will copy the model into the directory /code/models/hf/myopt. It is important that this is the exact directory as this is where Takeoff will search for the model.
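
Because COPY resolves its source path relative to the Docker build context, a missing or misplaced model directory only surfaces when the build fails. As a hedged sketch, the check below can be run before building. `check_model_dir` is a hypothetical helper, and the test for a `config.json` file is an assumption based on how Hugging Face models are usually saved, not a documented Takeoff requirement.

```python
from pathlib import Path


def check_model_dir(build_context, relative_model_dir="models/myopt"):
    """Return the model directory if it exists in the build context and looks like a saved model."""
    model_dir = Path(build_context) / relative_model_dir
    if not model_dir.is_dir():
        raise FileNotFoundError(f"{model_dir} not found; the COPY step would fail")
    # Assumption: a saved Hugging Face model directory contains config.json.
    if not (model_dir / "config.json").is_file():
        raise FileNotFoundError(f"{model_dir} has no config.json")
    return model_dir


# Usage, with $HOME as the build context as in this guide:
# check_model_dir(Path.home())
```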

Then we are going to build this image:

docker build -f Dockerfile.opt -t tytn/takeoff-pro:0.20.0 .

Now proceed from Step 2, tagging the new image and uploading it to ECR. Because the build reuses the tytn/takeoff-pro:0.20.0 tag, the commands in Step 2 work unchanged.
