Tuesday, January 1, 2019

AWS CloudFormation: Infrastructure as Code

When you need to replicate an existing cloud environment in another region or account, capture it as a template (JSON or YAML) and deploy that template in the target region or account. CloudFormation lets you express your cloud infrastructure as JSON/YAML code. Below are some basic points to remember about CloudFormation:

1. Infrastructure as Code

2. Create replicas of your existing cloud environment (infrastructure resources) across multiple accounts and regions.

3. Components:

  • Template (JSON or YAML) (Code of your cloud environment or infrastructure resources) 
  • Stack (Logical collection/grouping of infrastructure resources based on the template code)
  • Changeset (Preview summary of proposed changes to your infrastructure)

4. Use cases: 

  • To copy the current cloud environment to another account or region 
  • To copy Production environment for developers to debug any issue 

5. Cost: CloudFormation itself has no additional cost, but you are charged for the underlying resources it builds.
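
To make the idea of a template concrete, here is a minimal sketch that builds a tiny template as a Python dictionary and prints it as JSON. The logical name "DemoBucket" and the description text are made up for illustration; a real environment would contain many more resources.

```python
import json

# Minimal CloudFormation template expressed as a Python dict.
# "DemoBucket" is a hypothetical logical name chosen for this sketch.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal example: a single S3 bucket",
    "Resources": {
        "DemoBucket": {
            "Type": "AWS::S3::Bucket",
        }
    },
}


def render_template(tpl: dict) -> str:
    """Serialize the template dict to JSON text."""
    return json.dumps(tpl, indent=2)


if __name__ == "__main__":
    print(render_template(template))
```

Creating a stack from this JSON in two different regions would give two separate buckets built from the same code, which is the point of Infrastructure as Code.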

AWS ELB (Elastic Load Balancer)

ELB (Elastic Load Balancer) balances and distributes traffic among various EC2 instances. Below are some basic points regarding ELB:

1. Elastic Load Balancer can distribute traffic across multiple Availability Zones, but not across multiple Regions.

2. Routes traffic to targets within Amazon Virtual Private Cloud (Amazon VPC) based on the content of the request.

3. Ensures only healthy targets receive traffic. If all of your targets in a single Availability Zone are unhealthy, Elastic Load Balancing will route traffic to healthy targets in other Availability Zones. Once targets have returned to a healthy state, load balancing will automatically resume to the original targets.

4. Hybrid Elastic Load Balancing: Offers ability to load balance across AWS and on-premises resources using the same load balancer. 

5. Application Load Balancer: Best suited for load balancing of HTTP and HTTPS traffic

6. Network Load Balancer: Best suited for load balancing of TCP traffic 

7. Classic Load Balancer: Classic Load Balancer provides basic load balancing across multiple Amazon EC2 instances and operates at both the request level (HTTP/S) and connection level (TCP).

8. A load balancer can be internal (private to the VPC) or internet-facing (exposed to the internet via an Internet Gateway).
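
The health-check behaviour in point 3 can be sketched with a toy model. This is only an illustration of "route to healthy targets only", not how ELB is implemented internally; the Target class, the route function and the zone names are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import List


@dataclass
class Target:
    name: str
    zone: str
    healthy: bool = True


def route(targets: List[Target], requests: int) -> List[str]:
    """Round-robin over healthy targets only; returns the chosen target names.

    Health is evaluated once, when the function is called (a simplification).
    """
    healthy = [t for t in targets if t.healthy]
    if not healthy:
        raise RuntimeError("no healthy targets available")
    rotation = cycle(healthy)
    return [next(rotation).name for _ in range(requests)]


if __name__ == "__main__":
    fleet = [
        Target("web-1", "us-east-1a", healthy=False),
        Target("web-2", "us-east-1a", healthy=False),
        Target("web-3", "us-east-1b"),
        Target("web-4", "us-east-1b"),
    ]
    # Every target in zone 1a is unhealthy, so all requests go to zone 1b.
    print(route(fleet, 4))
```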

Monday, December 31, 2018

Route53: Domain Name System (DNS) from AWS

Route53 is the Domain Name System (DNS) service provided by AWS. Below are some basic points regarding Route53:

1. Domain Name System (DNS): Translates names like www.example.com into numeric IP addresses like 192.0.2.1.

2. Why "53" in the name? The service is named Route53 because DNS queries are served on port 53 (TCP/UDP).

3. Routes traffic based on multiple criteria, such as endpoint health, geographic location, and latency. Ensures end users are routed to the closest healthy endpoint for your application.

4. Routing Policies: Simple, Weighted (example: 75% to one server, 25% to other), Latency-based, Failover, Geo-location based.

5. Configure DNS health checks to route traffic to healthy endpoints or to independently monitor the health of your application and its endpoints. Route53 re-routes your users to an alternate location if your primary application endpoint becomes unavailable.

6. Also offers Domain Name Registration.

7. Record Sets: NS, SOA, A, AAAA, CNAME
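
The Weighted routing policy from point 4 (75% to one server, 25% to the other) can be simulated in a few lines. This is a rough sketch of the idea only; the endpoint names and the pick_endpoint and simulate helpers are hypothetical and are not part of any Route53 API.

```python
import random
from collections import Counter
from typing import Dict


def pick_endpoint(weights: Dict[str, int], rng: random.Random) -> str:
    """Pick one endpoint with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(weights: Dict[str, int], queries: int, seed: int = 0) -> Counter:
    """Answer `queries` lookups and count how often each endpoint is returned."""
    rng = random.Random(seed)
    return Counter(pick_endpoint(weights, rng) for _ in range(queries))


if __name__ == "__main__":
    counts = simulate({"server-a": 75, "server-b": 25}, queries=10_000)
    for name, count in counts.most_common():
        print(name, count)
```

Over many queries the share of answers for each server approaches its weight divided by the sum of all weights.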

CloudFront: Content Delivery Network (CDN) from AWS

CloudFront is the Content Delivery Network service provided by AWS. Below are some basic points regarding CloudFront:

1. Distribution service / Content Delivery Network (CDN) from AWS.

2. Edge Location: The CloudFront network has 160 points of presence (PoPs) as of now.

3. Edge server caches the data to improve latency and lower the load on your origin servers. 

4. Highly Programmable and Customizable content delivery with LAMBDA@EDGE: Lambda@Edge functions, triggered by CloudFront events, extend your custom code across AWS locations worldwide, allowing you to move even complex application logic closer to your end users to improve responsiveness.

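5. CDN Origins: S3, EC2, ELB, or any other HTTP server. (Route53 is a DNS service and is used to point your domain at the distribution, not as an origin.)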

6. TTL (Time to Live): How long your content will be cached/stored at Edge location? Defined in seconds. Default TTL: 24 hours (86400 seconds), Maximum TTL: 365 days (31536000 seconds), Minimum TTL: 0

7. Price Class: "Use All Edge Locations (Best Performance)"; "Use Only US, Canada and Europe"; "Use US, Canada, Europe, Asia and Africa". Price is charged accordingly.

8. Default CloudFront URL: *.cloudfront.net

9. Protocols supported: HTTP and HTTPS (not FTP).

10. You can blacklist/whitelist users based on Geo-location.

11. Clearing the cache at Edge locations (invalidation) is chargeable beyond a monthly free allowance of invalidation paths.
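
To see how TTL and cache clearing interact (points 3, 6 and 11), here is a toy edge cache. It is a simplified sketch, not CloudFront's actual behaviour: the EdgeCache class and the fetch_origin callback are invented for the example, and the current time is passed in explicitly so the behaviour is easy to follow.

```python
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy edge cache: serves a cached copy until its TTL expires."""

    def __init__(self, fetch_origin: Callable[[str], str], ttl_seconds: int = 86400):
        self.fetch_origin = fetch_origin
        self.ttl_seconds = ttl_seconds
        self._store: Dict[str, Tuple[str, float]] = {}  # path -> (body, expiry time)
        self.origin_hits = 0

    def get(self, path: str, now: float) -> str:
        cached = self._store.get(path)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh: no load on the origin
        body = self.fetch_origin(path)  # missing or expired: go back to origin
        self.origin_hits += 1
        self._store[path] = (body, now + self.ttl_seconds)
        return body

    def invalidate(self, path: str) -> None:
        """Drop the cached copy so the next request goes to the origin."""
        self._store.pop(path, None)


if __name__ == "__main__":
    cache = EdgeCache(lambda path: "content of " + path)
    cache.get("/index.html", now=0)       # first request hits the origin
    cache.get("/index.html", now=3600)    # served from the edge
    cache.get("/index.html", now=86400)   # TTL expired, origin again
    print("origin hits:", cache.origin_hits)
```

With a TTL of 0 every request goes back to the origin, which is why a minimum TTL of 0 effectively disables caching for that content.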

Sunday, December 30, 2018

AWS Compute Services: EC2, Elastic Beanstalk, Lambda and ECS

EC2 (Elastic Compute Cloud), Elastic Beanstalk, Lambda and ECS (Elastic Container Service) are the compute service offerings from AWS. Below are some basic points regarding these AWS compute services:

EC2

1. Elastic Compute Cloud (EC2) is the most commonly used AWS service.

2. It provides virtual servers in AWS.

3. Categories of EC2:

  • On-Demand Instances (charged by the hour or by the second)
  • Spot Instances (bid-based; choose these when start and end times are not a concern)
  • Reserved Instances (1-year or 3-year contract, cheaper than On-Demand)
  • Scheduled Reserved Instances (Scheduled Instances)
  • Dedicated Hosts and Dedicated Instances

4. EC2 Types:

  • General Purpose (T2, M5)
  • Compute Optimized (C5)
  • Memory Optimized (X1, R4)
  • Storage Optimized (I3, D2)
  • Accelerated Computing / GPU Optimized (P3, G3, F1)

Please note that EC2 is one of the most important topics in AWS, so for details, please go through the official documentation.

Elastic Beanstalk

1. A simple way to deploy your application on AWS without the headache of managing the infrastructure.

2. Just upload your application code and the service automatically handles all the details such as resource provisioning, load balancing, auto-scaling, and monitoring.

3. Supports PHP, Java, Python, Ruby, Node.js, .NET, Go and Docker.

4. Elastic Beanstalk uses core AWS services such as Amazon EC2, Amazon Elastic Container Service (Amazon ECS), Auto Scaling, and Elastic Load Balancing to support your applications.

5. Monitor and manage the health of your applications.

Lambda

1. Lambda lets you run code without managing any server (Go Serverless). Just upload your code to Lambda or write it in the Lambda code editor, and it takes care of everything required to run it.

2. Any code uploaded to Lambda becomes a Lambda function. Code should be written in a stateless style. If you need to keep any state between invocations, save it to S3, DynamoDB etc. and retrieve it from there.

3. Lambda can be directly triggered by AWS services such as S3, DynamoDB, Kinesis, SNS, CloudWatch, API Gateway and Web Applications. Use cases: https://aws.amazon.com/lambda/

4. Languages supported: C#, Java, Python, Ruby, Go, PowerShell, Node.js

5. You pay only for the compute time you consume. You are charged for every 100 ms your code executes and for the number of times your code is triggered; there is no charge when your code is not running.
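
As an illustration of points 2 and 3, below is a minimal sketch of a stateless Python handler for an S3 trigger. The handler(event, context) signature is the standard one for Python Lambda functions and the event layout follows the S3 notification format, but the bucket name, object key and the shape of the returned dictionary are assumptions made for this example.

```python
def handler(event, context):
    """Stateless handler: everything it needs arrives in the event itself."""
    objects = [
        {
            "bucket": record["s3"]["bucket"]["name"],
            "key": record["s3"]["object"]["key"],
        }
        for record in event.get("Records", [])
    ]
    # Nothing is kept in memory between invocations; anything that must
    # survive should be written to S3, DynamoDB or a similar store.
    return {"processed": len(objects), "objects": objects}


if __name__ == "__main__":
    # Hypothetical sample event for a local test run (no AWS account needed).
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "demo-bucket"}, "object": {"key": "photo.jpg"}}}
        ]
    }
    print(handler(sample_event, None))
```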

ECS

1. Elastic Container Service (ECS) is a container orchestration service that supports Docker containers and allows you to easily run and scale containerized applications on AWS. 

2. Amazon ECS eliminates the need for you to install and operate your own container orchestration software, manage and scale a cluster of virtual machines, or schedule containers on those virtual machines.

3. Containers without Servers: With Fargate, you no longer have to select Amazon EC2 instance types to run your containers.

4. Amazon ECS launches your containers in your own Amazon VPC, allowing you to use your VPC security groups and network ACLs. 

5. With simple API calls, you can launch and stop Docker-enabled applications, query the complete state of your application, and access many familiar features such as IAM roles, security groups, load balancers, Amazon CloudWatch Events, AWS CloudFormation templates, and AWS CloudTrail logs.

Thursday, December 27, 2018

AWS Data Transport Solution: Snowball, Snowball Edge and Snowmobile (Data Truck)

It can cost thousands of dollars to transfer 100 terabytes of data using high-speed Internet. The same 100 terabytes of data can be transferred using two Snowball devices for as little as one-fifth the cost of using the Internet. For example, 100 terabytes of data will take more than 100 days to transfer over a dedicated 100 Mbps connection. That same transfer can be accomplished in less than one week, plus shipping time, using two Snowball devices.
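
Network transfer times like these can be estimated with simple arithmetic: data size in bits divided by usable bandwidth. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a utilization factor for how much of the link is actually available; the function name and the 80% figure are illustrative assumptions.

```python
def transfer_days(terabytes: float, link_mbps: float, utilization: float = 1.0) -> float:
    """Estimated days to move `terabytes` over a link of `link_mbps` megabits/s."""
    bits = terabytes * 1e12 * 8                       # decimal terabytes -> bits
    seconds = bits / (link_mbps * 1e6 * utilization)  # usable bits per second
    return seconds / 86400                            # seconds -> days


if __name__ == "__main__":
    for mbps in (100, 1000):
        ideal = transfer_days(100, mbps)
        realistic = transfer_days(100, mbps, utilization=0.8)
        print(f"100 TB at {mbps} Mbps: {ideal:.1f} days ideal, "
              f"{realistic:.1f} days at 80% utilization")
```

Real transfers are usually slower than the ideal figure because the link is shared with other traffic and protocols add overhead.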

Below are some basic points to remember about Snowball: 

1. Snowball is a petabyte-scale data transport solution to transfer large amounts of data into and out of the AWS Cloud. Even with high-speed Internet connections, it can take months to transfer large amounts of data. 

2. One Snowball device can hold approx. 50 TB of data (an 80 TB model is also available).

3. With Snowball, you don’t need to write any code or purchase any hardware to transfer your data. Create a job in the AWS Management Console ("Console") and a Snowball device will be automatically shipped to you. Once it arrives, attach the device to your local network, download and run the Snowball Client ("Client") to establish a connection, and then use the Client to select the file directories that you want to transfer to the device. The Client will then encrypt and transfer the files to the device at high speed. Once the transfer is complete and the device is ready to be returned, the E Ink shipping label will automatically update and you can track the job status via Amazon Simple Notification Service (SNS), text messages, or directly in the Console.

4. Snowball Edge: 100 TB (storage as well as compute functionality). Local compute equivalent to EC2 (m4.large) instance.

5. Snowmobile: Data-truck with storage up to 100 PB.

Tuesday, December 25, 2018

Kinesis, Firehose and MapReduce: AWS Data Analytics Service

Kinesis, Firehose and Elastic MapReduce are very useful data analytics offerings from AWS. 

You can capture real-time data and analyze it in parallel using Kinesis and Firehose, without first loading the data into a warehouse and then running analytics. Below are some basic and important points about Kinesis and Firehose to remember:

1. Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. 

2. With Kinesis, you can ingest real-time data such as video, audio, application logs, website clickstreams, and IoT telemetry data for machine learning, analytics, and other applications. 

3. Kinesis enables you to process and analyze data as it arrives and respond instantly instead of having to wait until all your data is collected before the processing can begin.

4. With Kinesis, you can perform real-time analytics on data that has been traditionally analyzed using batch processing in data warehouses. The most common use cases include data lakes, data science and machine learning. 

5. No need to first save data into a warehouse and then run analytics, and no need for batch processes. Everything is done in real time.

6. Types: Kinesis Data Streams and Kinesis Video Streams, Firehose (which, unlike Kinesis Streams, also has processing capability), Kinesis Analytics (takes data from Kinesis and Firehose and runs SQL queries on it; pay only for the queries you run)

“Kinesis Video/Data Streams” vs “Firehose”

1. Firehose is fully managed, whereas with Kinesis Streams you provision and scale the stream capacity yourself.

2. Firehose PREPARES and LOADS data streams into S3, Redshift, Elasticsearch, Kinesis Data Analytics and Splunk, whereas Kinesis Streams just STORES the data streams (for 1-7 days) and you need to write an application using Lambda, EC2, Kinesis Data Analytics or Spark to PROCESS them.

For more details, please see the official documentation.
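
The "stores for 1-7 days" behaviour of Kinesis Streams can be pictured with a toy model: records sit in the stream for a retention window, and a consumer has to read and process them before they age out. This is only a rough sketch; the ToyStream class is invented for the example and is not the Kinesis API.

```python
from collections import deque
from typing import Deque, List, Tuple


class ToyStream:
    """Keeps records for a fixed retention window, then drops them."""

    def __init__(self, retention_hours: float = 24):
        self.retention_seconds = retention_hours * 3600
        self._records: Deque[Tuple[float, str]] = deque()  # (arrival time, payload)

    def put(self, payload: str, now: float) -> None:
        """Producer side: append a record with its arrival time."""
        self._records.append((now, payload))

    def read(self, now: float) -> List[str]:
        """Consumer side: return records still inside the retention window."""
        while self._records and now - self._records[0][0] >= self.retention_seconds:
            self._records.popleft()  # too old: no longer available
        return [payload for _, payload in self._records]


if __name__ == "__main__":
    stream = ToyStream(retention_hours=24)
    stream.put("click:/home", now=0)
    stream.put("click:/cart", now=20 * 3600)
    print(stream.read(now=23 * 3600))  # both records are still readable
    print(stream.read(now=25 * 3600))  # the first record has aged out
```

Firehose, by contrast, takes care of delivering the records to the destination for you, so there is no consumer application to write.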

EMR (Elastic MapReduce)

1. Big data analysis service

2. Used by data scientists for log analysis, web indexing, data transformations (ETL), machine learning, financial analysis, scientific simulation, and bioinformatics.

3. EMR provides a managed Hadoop framework with which you can process vast amounts of data across dynamically scalable Amazon EC2 instances.

4. You can also run other popular distributed frameworks such as Apache Spark, HBase, Presto, and Flink in EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.
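
For readers new to the MapReduce model that Hadoop on EMR is built around, here is a minimal single-machine sketch of the classic word count. It only illustrates the map, shuffle and reduce phases in plain Python; it does not use Hadoop or any EMR API, and the function names are chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    log_lines = ["GET /index GET", "POST /index"]
    print(word_count(log_lines))
```

On a real cluster the map and reduce steps run in parallel on many EC2 instances, and the framework handles the shuffle between them.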