Introduction#
GitHub shows basic of SageMaker endpoint
- Hosting Option: Single Model, Multiple Models
- Product Variants
- Auto Scaling Models
- Update Models: A/B, Blue/Green, Canary
- Model Monitoring
Multi-Model endpoint#
It is possible to host multiple models within container behind a endpoint to save cost. Follow docs to setup a MM endpoint. Then we can specify TargetModel when invoking a endpoint.
Retrieve image url
import sagemakerregion = sagemaker_session.boto_region_nameimage = sagemaker.image_uris.retrieve("knn",region=region)container = {'Image': image,'ModelDataUrl': 's3://<BUCKET_NAME>/<PATH_TO_ARTIFACTS>','Mode': 'MultiModel'}
Create a model
import boto3sagemaker_client = boto3.client('sagemaker')response = sagemaker_client.create_model(ModelName = '<MODEL_NAME>',ExecutionRoleArn = role,Containers = [container])
Create an endpoint configuration
response = sagemaker_client.create_endpoint_config(EndpointConfigName = '<ENDPOINT_CONFIG_NAME>',ProductionVariants=[{'InstanceType': 'ml.m4.xlarge','InitialInstanceCount': 2,'InitialVariantWeight': 1,'ModelName': '<MODEL_NAME>','VariantName': 'AllTraffic'}])
Create an endpoint
response = sagemaker_client.create_endpoint(EndpointName = '<ENDPOINT_NAME>',EndpointConfigName = '<ENDPOINT_CONFIG_NAME>')
Invoke an endpont and specifying the TargetModel
response = runtime_sagemaker_client.invoke_endpoint(EndpointName = "<ENDPOINT_NAME>",ContentType = "text/csv",TargetModel = "<MODEL_FILENAME>.tar.gz",Body = body)
A/B Testing Endpoint#
Following this example to create a A/B testing endpoint with two models. There are two options to distribute traffics or requests to two models
- Specifying weights
- Specifying invoking variants
First, create two models
from sagemaker.amazon.amazon_estimator import get_image_urimodel_name = f"DEMO-xgb-churn-pred-{datetime.now():%Y-%m-%d-%H-%M-%S}"model_name2 = f"DEMO-xgb-churn-pred2-{datetime.now():%Y-%m-%d-%H-%M-%S}"image_uri = get_image_uri(boto3.Session().region_name, 'xgboost', '0.90-1')image_uri2 = get_image_uri(boto3.Session().region_name, 'xgboost', '0.90-2')sm_session.create_model(name=model_name,role=role,container_defs={'Image': image_uri,'ModelDataUrl': model_url})sm_session.create_model(name=model_name2,role=role,container_defs={'Image': image_uri2,'ModelDataUrl': model_url2})
Create two model variants
from sagemaker.session import production_variantvariant1 = production_variant(model_name=model_name,instance_type="ml.m5.xlarge",initial_instance_count=1,variant_name='Variant1',initial_weight=1,)variant2 = production_variant(model_name=model_name2,instance_type="ml.m5.xlarge",initial_instance_count=1,variant_name='Variant2',initial_weight=1,)
Create an endpoint with two variants
endpoint_name = f"DEMO-xgb-churn-pred-{datetime.now():%Y-%m-%d-%H-%M-%S}"print(f"EndpointName={endpoint_name}")sm_session.endpoint_from_production_variants(name=endpoint_name,production_variants=[variant1, variant2])
Invoke the deploy model
# get a subset of test data for a quick test!tail -120 test_data/test-dataset-input-cols.csv > test_data/test_sample_tail_input_cols.csvprint(f"Sending test traffic to the endpoint {endpoint_name}. \nPlease wait...")with open('test_data/test_sample_tail_input_cols.csv', 'r') as f:for row in f:print(".", end="", flush=True)payload = row.rstrip('\n')sm_runtime.invoke_endpoint(EndpointName=endpoint_name,ContentType="text/csv",Body=payload)time.sleep(0.5)print("Done!")