Quick notes on AWS MLOps, Week 4
Availability, Scalability, and Resilience
Monitoring and Logging
- Viewpoint: data science applied to software systems
- AWS CloudWatch
  - Dashboards
  - Search
  - Alerts
  - Automated insights
- Use CloudWatch to pull in metrics from the servers that host the source code and run monitoring agents (e.g. CPU, memory, and disk I/O metrics)
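As a rough illustration of the "Alerts" feature, here is a minimal sketch that builds the keyword arguments for boto3's CloudWatch `put_metric_alarm` call on EC2 CPU utilization. The instance ID, SNS topic ARN, threshold, and period values are hypothetical placeholders, and `would_alarm` is a simplified local model of when such an alarm fires, not CloudWatch's exact evaluation logic.

```python
# Sketch: a CloudWatch CPU alarm expressed as put_metric_alarm kwargs,
# plus a simplified (hypothetical) local check of when it would fire.


def build_cpu_alarm(instance_id: str, sns_topic_arn: str, threshold: float = 80.0) -> dict:
    """Return put_metric_alarm kwargs for average EC2 CPU above `threshold` percent."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",            # built-in EC2 metrics namespace
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                     # seconds per datapoint
        "EvaluationPeriods": 3,            # consecutive periods that must breach
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],   # e.g. notify an SNS topic
    }


def would_alarm(datapoints: list[float], threshold: float, periods: int) -> bool:
    """Simplified model: alarm if the last `periods` datapoints all exceed the threshold."""
    if len(datapoints) < periods:
        return False
    return all(value > threshold for value in datapoints[-periods:])


if __name__ == "__main__":
    params = build_cpu_alarm("i-0123456789abcdef0",
                             "arn:aws:sns:us-east-1:123456789012:ops-alerts")
    print(params["AlarmName"])
    print(would_alarm([40.0, 85.0, 90.0, 95.0], threshold=80.0, periods=3))
```

With credentials configured, the dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`. Memory and disk metrics are not in the `AWS/EC2` namespace by default; they typically come from the CloudWatch agent running on the server.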
Multiple Regions
- Resources are distributed across isolated geographic regions, each of which has multiple availability zones
- Create as much redundant infrastructure as possible
- Increase resilience
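The redundancy idea can be sketched in a few lines: spread replicas across availability zones, and fail over to the next healthy region when the primary is down. The region and zone names are real AWS identifiers, but the health map and the simple active-passive failover policy are hypothetical simplifications, not an AWS service API.

```python
# Sketch: round-robin replica placement across AZs and active-passive
# region failover. Health data and failover policy are hypothetical.
from itertools import cycle


def place_replicas(n_replicas: int, zones: list[str]) -> dict[str, int]:
    """Spread replicas round-robin across availability zones."""
    counts = {zone: 0 for zone in zones}
    for _, zone in zip(range(n_replicas), cycle(zones)):
        counts[zone] += 1
    return counts


def pick_active_region(preferred: list[str], healthy: dict[str, bool]) -> str:
    """Return the first healthy region in priority order."""
    for region in preferred:
        if healthy.get(region, False):
            return region
    raise RuntimeError("no healthy region available")


if __name__ == "__main__":
    print(place_replicas(5, ["us-east-1a", "us-east-1b", "us-east-1c"]))
    print(pick_active_region(["us-east-1", "us-west-2"],
                             {"us-east-1": False, "us-west-2": True}))
```

Losing one zone then removes only part of the capacity, and losing a whole region shifts traffic elsewhere instead of taking the system down.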
Reproducible Workflows
- Infrastructure as code (IaC) workflow: the idea behind infrastructure as code is that no human pushes buttons to make something happen. Instead, events trigger the code to build the infrastructure.
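A minimal sketch of the IaC idea, assuming CloudFormation as the IaC tool: the infrastructure is declared as data (a template), and an incoming event, not a person, decides whether to deploy. The logical ID `DataBucket`, the stack name, and the event shape `{"branch": "main"}` are hypothetical.

```python
# Sketch: infrastructure declared as a CloudFormation template, deployed in
# response to an event. Stack name, logical ID, and event shape are hypothetical.
import json
from typing import Optional


def build_template(env: str) -> str:
    """Return a CloudFormation template (JSON) declaring one versioned S3 bucket."""
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"ML data bucket for {env}",
        "Resources": {
            "DataBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {"VersioningConfiguration": {"Status": "Enabled"}},
            }
        },
    }
    # sort_keys makes the output identical on every run (reproducible)
    return json.dumps(template, sort_keys=True)


def handle_event(event: dict) -> Optional[dict]:
    """Decide what to deploy for an event; None means do nothing."""
    if event.get("branch") != "main":
        return None
    # In a real pipeline this dict could be passed to
    # boto3.client("cloudformation").create_stack(**request)
    return {"StackName": "ml-data-prod", "TemplateBody": build_template("prod")}
```

Because the template is generated deterministically from code, rebuilding the same environment twice yields the same infrastructure definition.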
Implement Appropriate ML Services
- Trade-off between more control over infrastructure and faster application development and deployment
Provisioning EC2
- EC2 instances can be launched from the console, an SDK, or the CLI.
- Sub-components:
  - User data: special instructions (e.g. a startup script) to run when the instance launches
  - Storage: EBS vs. instance store
  - Security group: firewall rules for the instance
  - SSH key pair
  - Amazon Machine Image (AMI) to launch from
  - Instance type: CPU vs. GPU
  - Cost: On-Demand vs. Spot
  - Virtual Private Cloud (VPC)
  - IAM role
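The launch sub-components above map fairly directly onto the keyword arguments of boto3's `ec2.run_instances`. This is a minimal sketch: the AMI ID, key pair name, security group, subnet, instance profile, user data script, and root device name `/dev/xvda` (which depends on the AMI) are all hypothetical placeholders.

```python
# Sketch: EC2 launch options as run_instances kwargs.
# Every ID and name below is a hypothetical placeholder.

USER_DATA = """#!/bin/bash
# special instructions run at first boot (hypothetical example)
pip3 install --upgrade boto3
"""


def build_launch_request(use_gpu: bool, use_spot: bool, volume_gb: int = 100) -> dict:
    request = {
        "ImageId": "ami-0123456789abcdef0",                        # Amazon Machine Image
        "InstanceType": "g4dn.xlarge" if use_gpu else "m5.large",  # GPU vs. CPU
        "MinCount": 1,
        "MaxCount": 1,
        "KeyName": "my-ssh-key",                                   # SSH key pair
        "SecurityGroupIds": ["sg-0123456789abcdef0"],              # firewall rules
        "SubnetId": "subnet-0123456789abcdef0",                    # subnet inside the VPC
        "IamInstanceProfile": {"Name": "ml-training-profile"},     # IAM role
        "UserData": USER_DATA,                                     # boot-time instructions
        "BlockDeviceMappings": [{                                  # EBS root volume
            "DeviceName": "/dev/xvda",
            "Ebs": {"VolumeSize": volume_gb, "VolumeType": "gp3"},
        }],
    }
    if use_spot:
        # Spot capacity is cheaper than On-Demand but can be interrupted
        request["InstanceMarketOptions"] = {"MarketType": "spot"}
    return request
```

With credentials configured, the request could be submitted as `boto3.client("ec2").run_instances(**build_launch_request(use_gpu=True, use_spot=True))`.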
Provisioning EBS, Elastic Beanstalk, and Other Ways of Building on the AWS Platform
- Key idea: Elastic Beanstalk can scale resources up and down automatically based on health metrics from the load balancer. The provisioning model is elastic.
- Block storage can be provisioned to have high bandwidth
- The user decides which parts should be pre-provisioned and which parts should be elastic.
- Example: machine learning training on a cluster of machines that all talk to the same mount point requires extremely high-bandwidth storage.
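To make the "pre-provisioned vs. elastic" split concrete, here is a minimal sketch. `OPTION_SETTINGS` uses Elastic Beanstalk's `aws:autoscaling:asg` and `aws:autoscaling:trigger` option namespaces, with hypothetical values: `MinSize` is the always-on floor and `MaxSize` the elastic ceiling. `desired_instances` is a simplified, hypothetical model of threshold-based scaling, not Beanstalk's actual algorithm.

```python
# Sketch: Beanstalk auto scaling option settings (values hypothetical) and a
# simplified local model of threshold-based scale-out / scale-in.

OPTION_SETTINGS = [
    {"Namespace": "aws:autoscaling:asg", "OptionName": "MinSize", "Value": "2"},   # pre-provisioned floor
    {"Namespace": "aws:autoscaling:asg", "OptionName": "MaxSize", "Value": "10"},  # elastic ceiling
    {"Namespace": "aws:autoscaling:trigger", "OptionName": "MeasureName", "Value": "CPUUtilization"},
    {"Namespace": "aws:autoscaling:trigger", "OptionName": "Unit", "Value": "Percent"},
    {"Namespace": "aws:autoscaling:trigger", "OptionName": "UpperThreshold", "Value": "70"},
    {"Namespace": "aws:autoscaling:trigger", "OptionName": "LowerThreshold", "Value": "30"},
]


def desired_instances(current: int, metric: float, lower: float = 30.0,
                      upper: float = 70.0, min_size: int = 2, max_size: int = 10) -> int:
    """Add one instance above `upper`, remove one below `lower`, clamp to [min_size, max_size]."""
    if metric > upper:
        current += 1
    elif metric < lower:
        current -= 1
    return max(min_size, min(max_size, current))
```

The settings list has the shape accepted by the `OptionSettings` parameter of `boto3.client("elasticbeanstalk").update_environment`. Anything that must always be available (for example the high-bandwidth storage above) stays outside this elastic range and is provisioned up front.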
AWS ML Services
- Many high-level ML services are provided
- Provides both GUI (Console) access and programmatic access through boto3
- Examples and tutorials
- The boto3 documentation has all the API call details
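As an example of programmatic access, here is a minimal sketch that calls one high-level ML service, Amazon Comprehend, through boto3's `detect_sentiment`. Comprehend is chosen only for illustration. The client is passed in so the function can run without AWS credentials. `FakeComprehend` is a hypothetical stand-in that mimics the response shape.

```python
# Sketch: calling a high-level ML service (Amazon Comprehend) through boto3.
# The client is injected; FakeComprehend is a hypothetical offline stand-in.


def top_sentiment(client, text: str) -> tuple[str, float]:
    """Return Comprehend's sentiment label and the confidence score for that label."""
    response = client.detect_sentiment(Text=text, LanguageCode="en")
    label = response["Sentiment"]  # "POSITIVE", "NEGATIVE", "NEUTRAL", or "MIXED"
    # SentimentScore keys are capitalized: Positive, Negative, Neutral, Mixed
    score = response["SentimentScore"][label.capitalize()]
    return label, score


class FakeComprehend:
    """Hypothetical stub returning a fixed response with the same shape."""

    def detect_sentiment(self, Text: str, LanguageCode: str) -> dict:
        return {
            "Sentiment": "POSITIVE",
            "SentimentScore": {"Positive": 0.97, "Negative": 0.01,
                               "Neutral": 0.01, "Mixed": 0.01},
        }
```

Against the real service, the client would come from `boto3.client("comprehend", region_name="us-east-1")`. The same pattern applies to other services; the boto3 documentation lists each call's request and response fields.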
Deploying and Securing ML Solutions
Principle of Least Privilege: AWS Lambda
- Configure the Lambda microservice with only the minimal privileges it needs to access upstream (e.g. AWS S3) and downstream (e.g. DynamoDB) resources.
- This can be achieved through IAM role-based policies
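A minimal sketch of what such a role-based policy might look like for the Lambda's execution role: read-only access to one S3 prefix upstream and write-only access to one DynamoDB table downstream. The bucket, prefix, region, account ID, and table name are hypothetical. `has_wildcard_action` is a small illustrative check, not an AWS tool.

```python
# Sketch: a least-privilege IAM policy document for a Lambda execution role.
# Bucket, prefix, and table ARN are hypothetical.
import json


def lambda_policy(bucket: str, prefix: str, table_arn: str) -> dict:
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # upstream: only read objects under one prefix
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {   # downstream: only write items to one table
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem"],
                "Resource": table_arn,
            },
        ],
    }


def has_wildcard_action(policy: dict) -> bool:
    """Flag any "*" or "service:*" action, which would break least privilege."""
    return any(
        action == "*" or action.endswith(":*")
        for stmt in policy["Statement"]
        for action in stmt["Action"]
    )


if __name__ == "__main__":
    policy = lambda_policy("ml-input", "raw",
                           "arn:aws:dynamodb:us-east-1:123456789012:table/Predictions")
    print(json.dumps(policy, indent=2))
```

In practice the role usually also needs CloudWatch Logs permissions (for example via the AWS managed policy `AWSLambdaBasicExecutionRole`) so the function can write its logs.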
Integrated Security
- AWS security firewall: incoming ports are blocked, and access is controlled through role-based privileges
- Within the firewall, data transfer between the source and AWS S3 object storage is encrypted
- Everything is inside a virtual private cloud (AWS VPC)
- Use automated deployment or infrastructure as code so there is no need to worry about manual mistakes
- Audit: AWS CloudTrail monitors API calls and all actions occurring in the AWS account
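One way to enforce the "encrypted between source and S3" point is a bucket policy that denies any request not made over TLS, using the `aws:SecureTransport` condition key. This is a minimal sketch with a hypothetical bucket name. It is one common approach, not necessarily the setup described in the course.

```python
# Sketch: an S3 bucket policy that rejects non-TLS (plain HTTP) requests.
# The bucket name is hypothetical.
import json


def tls_only_bucket_policy(bucket: str) -> str:
    bucket_arn = f"arn:aws:s3:::{bucket}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [bucket_arn, f"{bucket_arn}/*"],  # the bucket and its objects
            # aws:SecureTransport evaluates to "false" for requests not sent over TLS
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }
    return json.dumps(policy)
```

The resulting string could be attached with `boto3.client("s3").put_bucket_policy(Bucket=bucket, Policy=policy_json)`. Applying it through infrastructure as code rather than by hand avoids the manual mistakes mentioned above, and the call itself would show up in CloudTrail.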
Use AWS SageMaker Studio to prepare data, build, train, and tune models, and deploy them. The platform provides launchers with many models and templates for jump-starting projects.
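The "train" step can also be driven programmatically. Here is a minimal sketch of a request for boto3's `sagemaker.create_training_job`, assuming a built-in or custom training container. The image URI, role ARN, S3 paths, job name, instance type, and limits are hypothetical placeholders.

```python
# Sketch: a SageMaker training job request (create_training_job kwargs).
# All names, ARNs, URIs, and sizes are hypothetical placeholders.


def training_job_request(job_name: str, image_uri: str, role_arn: str,
                         train_s3: str, output_s3: str) -> dict:
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        "RoleArn": role_arn,  # role SageMaker assumes to read data and write artifacts
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3},  # model artifacts land here
        "ResourceConfig": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1,
                           "VolumeSizeInGB": 30},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},  # cap training time and cost
    }
```

With credentials configured, this could be submitted via `boto3.client("sagemaker").create_training_job(**request)`. Studio performs the equivalent steps through its UI.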
AWS SageMaker Canvas: Canvas is a great way to understand, at a high level, how machine learning problems are solved. You can also solve them by building your own machine learning system with SageMaker, or by building your own system outside SageMaker using AWS Cloud9.
Data Drift and Model Monitoring:
- Use data to train the first model
- New data comes in and triggers a data drift alert indicating that a new model is needed
- New data is combined with the old one and a new model is trained, registered, and deployed
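The drift-then-retrain loop above can be sketched in plain Python. Drift is measured here with the Population Stability Index (PSI), a swapped-in technique that these notes do not name. The 0.2 alert threshold is a common rule of thumb, not an AWS default. `train_fn` is a hypothetical stand-in for whatever trains, registers, and deploys the new model.

```python
# Sketch: PSI-based drift detection triggering retraining on old + new data.
# PSI, the 0.2 threshold, and train_fn are assumptions for illustration.
import math


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and a new sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        eps = 1e-6  # avoid log(0) for empty bins
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def maybe_retrain(old_data: list[float], new_data: list[float], train_fn,
                  threshold: float = 0.2):
    """Retrain on combined data if the new data drifts; otherwise return None."""
    if psi(old_data, new_data) <= threshold:
        return None  # no drift alert, keep the current model
    return train_fn(old_data + new_data)  # old + new data -> new model
```

In a managed setup, the drift check would typically run as a scheduled monitoring job, and the retrained model would be registered and deployed as in the last step above.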