Introduction#

GitHub shows basic of SageMaker endpoint

  • Hosting Option: Single Model, Multiple Models
  • Product Variants
  • Auto Scaling Models
  • Update Models: A/B, Blue/Green, Canary
  • Model Monitoring

Multi-Model endpoint#

sagemaker multi model endpoint

It is possible to host multiple models within container behind a endpoint to save cost. Follow docs to setup a MM endpoint. Then we can specify TargetModel when invoking a endpoint.

Retrieve image url

import sagemaker
region = sagemaker_session.boto_region_name
image = sagemaker.image_uris.retrieve("knn",region=region)
container = {
'Image': image,
'ModelDataUrl': 's3://<BUCKET_NAME>/<PATH_TO_ARTIFACTS>',
'Mode': 'MultiModel'
}

Create a model

import boto3
sagemaker_client = boto3.client('sagemaker')
response = sagemaker_client.create_model(
ModelName = '<MODEL_NAME>',
ExecutionRoleArn = role,
Containers = [container])

Create an endpoint configuration

response = sagemaker_client.create_endpoint_config(
EndpointConfigName = '<ENDPOINT_CONFIG_NAME>',
ProductionVariants=[
{
'InstanceType': 'ml.m4.xlarge',
'InitialInstanceCount': 2,
'InitialVariantWeight': 1,
'ModelName': '<MODEL_NAME>',
'VariantName': 'AllTraffic'
}
]
)

Create an endpoint

response = sagemaker_client.create_endpoint(
EndpointName = '<ENDPOINT_NAME>',
EndpointConfigName = '<ENDPOINT_CONFIG_NAME>')

Invoke an endpont and specifying the TargetModel

response = runtime_sagemaker_client.invoke_endpoint(
EndpointName = "<ENDPOINT_NAME>",
ContentType = "text/csv",
TargetModel = "<MODEL_FILENAME>.tar.gz",
Body = body)

A/B Testing Endpoint#

sagemaker ab testing endpoint

Following this example to create a A/B testing endpoint with two models. There are two options to distribute traffics or requests to two models

  • Specifying weights
  • Specifying invoking variants

First, create two models

from sagemaker.amazon.amazon_estimator import get_image_uri
model_name = f"DEMO-xgb-churn-pred-{datetime.now():%Y-%m-%d-%H-%M-%S}"
model_name2 = f"DEMO-xgb-churn-pred2-{datetime.now():%Y-%m-%d-%H-%M-%S}"
image_uri = get_image_uri(boto3.Session().region_name, 'xgboost', '0.90-1')
image_uri2 = get_image_uri(boto3.Session().region_name, 'xgboost', '0.90-2')
sm_session.create_model(
name=model_name,
role=role,
container_defs={
'Image': image_uri,
'ModelDataUrl': model_url
}
)
sm_session.create_model(
name=model_name2,
role=role,
container_defs={
'Image': image_uri2,
'ModelDataUrl': model_url2
}
)

Create two model variants

from sagemaker.session import production_variant
variant1 = production_variant(
model_name=model_name,
instance_type="ml.m5.xlarge",
initial_instance_count=1,
variant_name='Variant1',
initial_weight=1,
)
variant2 = production_variant(
model_name=model_name2,
instance_type="ml.m5.xlarge",
initial_instance_count=1,
variant_name='Variant2',
initial_weight=1,
)

Create an endpoint with two variants

endpoint_name = f"DEMO-xgb-churn-pred-{datetime.now():%Y-%m-%d-%H-%M-%S}"
print(f"EndpointName={endpoint_name}")
sm_session.endpoint_from_production_variants(
name=endpoint_name,
production_variants=[variant1, variant2]
)

Invoke the deploy model

# get a subset of test data for a quick test
!tail -120 test_data/test-dataset-input-cols.csv > test_data/test_sample_tail_input_cols.csv
print(f"Sending test traffic to the endpoint {endpoint_name}. \nPlease wait...")
with open('test_data/test_sample_tail_input_cols.csv', 'r') as f:
for row in f:
print(".", end="", flush=True)
payload = row.rstrip('\n')
sm_runtime.invoke_endpoint(
EndpointName=endpoint_name,
ContentType="text/csv",
Body=payload
)
time.sleep(0.5)
print("Done!")

Blue/Green Endpoint#

Canary Endpoint#

Reference#