Thursday, December 27, 2018

AWS Data Transport Solution: Snowball, Snowball Edge and Snowmobile (Data Truck)

Transferring 100 terabytes of data over high-speed Internet can cost thousands of dollars. The same 100 terabytes can be transferred using two Snowball devices for as little as one-fifth of that cost. Time matters too: 100 terabytes takes more than 100 days to transfer over a dedicated 100 Mbps connection, and still more than a week at 1 Gbps. That same transfer can be completed in less than one week, plus shipping time, using two Snowball devices.
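As a rough check on transfer-time figures like these, the arithmetic can be sketched in a few lines of Java. This is only a back-of-the-envelope estimate: it assumes decimal terabytes, a link that is fully used around the clock, and no protocol overhead, so real transfers will be slower. The class and method names are made up for this illustration.

```java
final class TransferTime {
    // Days needed to move `terabytes` (decimal TB) over a link of
    // `megabitsPerSecond`, assuming the link is saturated with no overhead.
    static double daysToTransfer(double terabytes, double megabitsPerSecond) {
        double bits = terabytes * 1e12 * 8;                 // TB -> bits
        double seconds = bits / (megabitsPerSecond * 1e6);  // bits / (bits per second)
        return seconds / 86_400;                            // seconds -> days
    }

    public static void main(String[] args) {
        System.out.printf("100 TB at 100 Mbps: %.1f days%n", daysToTransfer(100, 100));
        System.out.printf("100 TB at 1 Gbps:   %.1f days%n", daysToTransfer(100, 1000));
    }
}
```

Even under these ideal assumptions the Internet transfer is measured in days or months, which is the gap Snowball is meant to close.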

Below are some basic points to remember about Snowball: 

1. Snowball is a petabyte-scale data transport solution to transfer large amounts of data into and out of the AWS Cloud. Even with high-speed Internet connections, it can take months to transfer large amounts of data. 

2. One Snowball device holds approx. 50 TB of data (an 80 TB model is also available).

3. With Snowball, you don’t need to write any code or purchase any hardware to transfer your data. Create a job in the AWS Management Console ("Console") and a Snowball device will be automatically shipped to you. Once it arrives, attach the device to your local network, download and run the Snowball Client ("Client") to establish a connection, and then use the Client to select the file directories that you want to transfer to the device. The Client will then encrypt and transfer the files to the device at high speed. Once the transfer is complete and the device is ready to be returned, the E Ink shipping label will automatically update and you can track the job status via Amazon Simple Notification Service (SNS), text messages, or directly in the Console.

4. Snowball Edge: 100 TB device with storage as well as compute functionality. Its local compute is equivalent to an EC2 m4.4xlarge instance.

5. Snowmobile: Data-truck with storage up to 100 PB.

Tuesday, December 25, 2018

Kinesis, Firehose and MapReduce: AWS Data Analytics Service

Kinesis, Firehose and Elastic MapReduce are very useful data analytics offerings from AWS. 

You can capture real-time data and analyze it as it arrives using Kinesis and Firehose, with no need to load the data into a warehouse first and run analytics afterwards. Below are some basic and important points about Kinesis and Firehose to remember:

1. Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. 

2. With Kinesis, you can ingest real-time data such as video, audio, application logs, website clickstreams, and IoT telemetry data for machine learning, analytics, and other applications. 

3. Kinesis enables you to process and analyze data as it arrives and respond instantly instead of having to wait until all your data is collected before the processing can begin.

4. With Kinesis, you can perform real-time analytics on data that has been traditionally analyzed using batch processing in data warehouses. The most common use cases include data lakes, data science and machine learning. 

5. There is no need to first save data into a warehouse and then run analytics, and no need for batch processes. Everything is done in real time.

6. Types: Kinesis Data Streams and Kinesis Video Streams, Firehose (which, unlike Kinesis Streams, can also transform data before delivering it), and Kinesis Analytics (takes data from Kinesis Streams and Firehose and runs SQL queries on it; you pay only for the queries you run).

“Kinesis Video/Data Streams” vs “Firehose”

1. Firehose is fully managed and scales automatically, whereas with Kinesis Streams you provision and scale the shards yourself.

2. Firehose PREPARES and LOADS data streams into S3, Redshift, Elasticsearch, Kinesis Data Analytics and Splunk, whereas Kinesis Streams just STORES the data streams (for 24 hours by default, extendable up to 7 days) and you need to write an application using Lambda, EC2, Kinesis Data Analytics or Spark to PROCESS them.
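The "store for a retention window, let your own application read it" idea behind Kinesis Streams can be illustrated with a small in-memory sketch. This is not the Kinesis API: RetainedStream, put and read are hypothetical names, and the 24-hour retention used in main is just an assumed setting.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// In-memory stand-in for a stream with a retention window (NOT the Kinesis API).
final class RetainedStream {
    private static final class Entry {
        final Instant arrivedAt;
        final String data;
        Entry(Instant arrivedAt, String data) { this.arrivedAt = arrivedAt; this.data = data; }
    }

    private final Duration retention;
    private final Deque<Entry> entries = new ArrayDeque<>();

    RetainedStream(Duration retention) { this.retention = retention; }

    // Producer side: append a record together with its arrival time.
    void put(Instant now, String data) { entries.addLast(new Entry(now, data)); }

    // Consumer side: drop records older than the retention window,
    // then return everything that is still retained, oldest first.
    List<String> read(Instant now) {
        Instant cutoff = now.minus(retention);
        while (!entries.isEmpty() && entries.peekFirst().arrivedAt.isBefore(cutoff)) {
            entries.removeFirst();
        }
        List<String> out = new ArrayList<>();
        for (Entry e : entries) out.add(e.data);
        return out;
    }

    public static void main(String[] args) {
        RetainedStream stream = new RetainedStream(Duration.ofHours(24));
        Instant start = Instant.parse("2018-12-25T00:00:00Z");
        stream.put(start, "click-1");
        stream.put(start.plus(Duration.ofHours(20)), "click-2");
        // 30 hours after start, click-1 is older than 24 hours and is gone.
        System.out.println(stream.read(start.plus(Duration.ofHours(30)))); // prints [click-2]
    }
}
```

The stream itself does no processing; whatever calls read() plays the role of the consumer application you would write with Lambda, EC2 or Spark.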

For more details, please see the documentation.

EMR (Elastic MapReduce)

1. Big data analysis service

2. Used by data scientists for log analysis, web indexing, data transformations (ETL), machine learning, financial analysis, scientific simulation, and bioinformatics.

3. EMR provides a managed Hadoop framework with which you can process vast amounts of data across dynamically scalable Amazon EC2 instances. 

4. You can also run other popular distributed frameworks such as Apache Spark, HBase, Presto, and Flink in EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.

AWS Application Integration Services: SQS, SNS and SWF

We will look at some basic points of AWS Application Integration services like SQS, SNS and SWF.

SQS (Simple Queue Service)

1. Fully managed message queues for microservices, distributed systems, and serverless applications.

2. Enables application components and microservices to communicate with each other.

3. Pull based system

4. Queue Types: Standard and FIFO

5. Can be used with Redshift, DynamoDB, RDS, EC2, ECS, Lambda, S3 and SNS.

6. Multiple copies of every message are stored redundantly across multiple availability zones so that they are available whenever needed.

For more details, you can read the documentation.

SNS (Simple Notification Service)

1. Fully managed pub/sub messaging service for microservices, distributed systems, and serverless applications.

2. Push-based, many-to-many messaging.

3. Publishers to Topic: EC2, S3, RDS, CloudWatch

4. Subscribers to Topic: Serverless functions (Lambda), Queues (SQS), HTTP/S endpoints and distributed systems. Additionally, SNS fans out notifications to end users via mobile push messages, SMS, and email.

5. SNS uses cross availability zone message storage to provide high message durability. 

6. SNS Topic owners can keep sensitive data secure by setting topic policies that restrict who can publish and subscribe to a topic.
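The difference between the pull model of SQS and the push model of SNS, and the way a topic fans out to several queues, can be sketched with plain in-memory classes. None of these types are AWS SDK classes; Topic, MessageQueue and their methods are hypothetical names chosen for the illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// In-memory illustration only; these are not AWS SDK types.
final class FanOutDemo {
    static final class Topic {
        private final List<Consumer<String>> subscribers = new ArrayList<>();

        void subscribe(Consumer<String> subscriber) { subscribers.add(subscriber); }

        // Push model: the topic delivers each message to every subscriber right away.
        void publish(String message) {
            for (Consumer<String> s : subscribers) s.accept(message);
        }
    }

    static final class MessageQueue {
        private final Queue<String> messages = new ArrayDeque<>();

        void send(String message) { messages.add(message); }

        // Pull model: a message waits until a consumer asks for it; null when empty.
        String receive() { return messages.poll(); }
    }

    public static void main(String[] args) {
        Topic orders = new Topic();
        MessageQueue billing = new MessageQueue();
        MessageQueue shipping = new MessageQueue();
        orders.subscribe(billing::send);   // queue subscribed to the topic
        orders.subscribe(shipping::send);
        orders.publish("order-1001");      // one publish, two deliveries
        System.out.println(billing.receive());
        System.out.println(shipping.receive());
    }
}
```

Each queue receives its own copy of the message and hands it over only when its consumer polls, which is why the two services are often combined.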

SWF (Simple Workflow Service)

1. SWF lets you write your application components and coordination logic in any programming language and run them in the cloud or on-premises.

2. SWF creates a logical separation between tasks and components and acts as a task coordinator. 

Monday, December 24, 2018

AWS Storage Services: S3, Glacier, EBS, EFS, FSx and Storage Gateway

AWS provides the following services under the storage section:

1. S3
2. Glacier
3. EFS
4. FSx
5. Storage Gateway

The following are some basic and important points about AWS Storage services:

S3

1. Cloud storage service like Dropbox and Google Drive.

2. Object-based storage, not block-level storage (like EBS) or file-level storage (like EFS). Data is treated as objects. A single object can be up to 5 TB; a single PUT uploads at most 5 GB, so larger objects must use multipart upload. You cannot install an OS or software on it.

3. Buckets: Data is stored in buckets, which are similar to top-level folders in Windows. A bucket name must be globally unique, 3 to 63 characters long, and made up of lowercase letters, numbers, hyphens and periods. By default a bucket is private.

4. Versioning: Versioning takes more space because every version of an object is kept in full in the same bucket. Versioning must be enabled on both the source and destination buckets for cross-region replication. Once versioning is enabled, it can’t be disabled, only suspended.

5. Storage Class
  • Standard (Frequently accessed data, no minimum storage duration, 99.999999999% durability (eleven 9s))
  • Intelligent-Tiering (Long-lived data with changing or unknown access patterns)
  • Standard-IA (Long-lived, infrequently accessed data, minimum storage duration: 30 days, 99.999999999% durability (eleven 9s))
  • One Zone-IA (Long-lived, infrequently accessed, non-critical data)
  • Glacier (Data archiving with retrieval times ranging from minutes to hours, minimum storage duration: 90 days)
  • Reduced Redundancy (Not recommended; for frequently accessed, non-critical data whose loss would not hurt you)

6. Encryption: 
  • SSE-S3 (S3-managed keys, AES-256 encryption)
  • SSE-KMS (keys managed in AWS Key Management Service)
  • SSE-C (server-side encryption with customer-provided keys)

7. Bucket URL syntax: https://s3.regionname.amazonaws.com/bucketname/objectname

8. Eventual Consistency: When we upload a new file to S3, it becomes available immediately (read-after-write consistency), but overwrite and delete operations can take some time to propagate, which is known as eventual consistency. When a file upload to S3 is successful, it returns an HTTP 200 status.

9. Security: Data is secured using ACL (Access Control List) and Bucket Policies at Bucket or Object level. You can write custom bucket policies using JSON.

10. Transfer Acceleration: Enables fast upload of data to an S3 bucket over long distances by routing it through CloudFront edge locations.

11. Lifecycle Management: You can manage the transition of a file from one storage class to another using Lifecycle rules. For example, you can move a file from the Standard storage class to Standard-IA (Infrequent Access) after some days (min 30 days) if it is no longer frequently used. Similarly, if you want to archive this file after some more days (min 30 days), you can further move it to Glacier.

12. Static Website Hosting: You can host a static website and give it a custom domain name using Route 53.
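The lifecycle transitions described in point 11 can be sketched as a small decision function. This is only an illustration of the rule logic, not how S3 evaluates rules internally; LifecycleSketch, classFor and the day thresholds passed in are hypothetical, and the two 30-day minimums are taken from the text above.

```java
final class LifecycleSketch {
    enum StorageClass { STANDARD, STANDARD_IA, GLACIER }

    // Hypothetical rule: move an object to Standard-IA once it is iaAfterDays old,
    // then on to Glacier once it is glacierAfterDays old.
    static StorageClass classFor(int ageDays, int iaAfterDays, int glacierAfterDays) {
        if (iaAfterDays < 30) {
            throw new IllegalArgumentException("Standard-IA transition needs at least 30 days");
        }
        if (glacierAfterDays < iaAfterDays + 30) {
            throw new IllegalArgumentException("Glacier transition needs 30 more days in Standard-IA");
        }
        if (ageDays >= glacierAfterDays) return StorageClass.GLACIER;
        if (ageDays >= iaAfterDays) return StorageClass.STANDARD_IA;
        return StorageClass.STANDARD;
    }

    public static void main(String[] args) {
        // Example rule: Standard-IA after 30 days, Glacier after 60 days.
        for (int age : new int[] {10, 45, 120}) {
            System.out.println(age + " days old -> " + classFor(age, 30, 60));
        }
    }
}
```

In a real bucket the same idea is expressed declaratively as a lifecycle rule rather than in code.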

Glacier 

1. Data Backup and Archive

2. Types of data: Hot Data (which we need on daily basis), Cold Data (which we don’t need on daily basis, archive this data to Glacier).

3. Retrieval is not immediate; a standard retrieval typically takes 3-5 hours.

4. Minimum storage duration in Glacier is 90 days. Archives deleted before 90 days incur a pro-rated charge equal to the storage charge for the remaining days.
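The pro-rated early deletion charge in point 4 is simple arithmetic, sketched below. The price per GB-month used in main is a made-up number for illustration, not a quoted AWS price, and a month is approximated as 30 days; check current pricing before relying on any figure.

```java
final class GlacierEarlyDelete {
    static final int MIN_STORAGE_DAYS = 90;

    // Charge for deleting an archive before the 90-day minimum:
    // the storage cost of the days that are still outstanding.
    // pricePerGbMonth is a hypothetical input; a month is taken as 30 days.
    static double earlyDeletionCharge(double gigabytes, int daysStored, double pricePerGbMonth) {
        int remainingDays = Math.max(0, MIN_STORAGE_DAYS - daysStored);
        return gigabytes * pricePerGbMonth * remainingDays / 30.0;
    }

    public static void main(String[] args) {
        // 1000 GB deleted after 30 days at an assumed 0.004 per GB-month.
        System.out.printf("Early deletion charge: %.2f%n", earlyDeletionCharge(1000, 30, 0.004));
    }
}
```

Archives kept for 90 days or longer fall through the Math.max guard and incur no early deletion charge.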

EBS

1. Elastic Block Store (works like the hard disk of your laptop and, unlike S3, can only be used by attaching it to an EC2 instance).

2. A backup of an EBS volume is called a snapshot. Snapshots are point-in-time, incremental, and stored in S3.

3. To take a consistent backup of a root EBS volume (where the OS is running), you should stop the instance first.

4. The root EBS volume is not encrypted by default (an encrypted root volume can be obtained by copying the AMI with encryption enabled), and “Delete on Termination” is checked by default.

5. Unencrypted snapshots can be shared with other AWS accounts freely. Encrypted snapshots can be shared only if they use a custom KMS key (not the default key) that is also shared with the target account.

6. An EBS volume lives in a single Availability Zone and can only be attached to instances in that zone.

7. One EBS volume cannot be attached to multiple EC2 instances. Use EFS for this.

8. RAID0, RAID1 and RAID10 (combination of both) are preferred. RAID5 is discouraged.

9. EBS Volume Types
  • General Purpose (SSD) (gp2) volumes can burst to 3000 IOPS, and deliver a consistent baseline of 3 IOPS/GiB. 
  • Provisioned IOPs (SSD) (io1) volumes can deliver up to 64000 IOPS, and are best for EBS-optimized instances. 
  • Throughput Optimized HDD (st1) – For frequently accessed, throughput-intensive data
  • Cold HDD (sc1) – For infrequently accessed data
  • Magnetic volumes, previously called standard volumes, deliver 100 IOPS on average, and can burst to hundreds of IOPS. Lowest cost.
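
The gp2 figures above (3 IOPS/GiB baseline, burst to 3000 IOPS) translate into a quick sizing check. This is a sketch only: the 100 IOPS floor is an assumption taken from AWS documentation rather than from the notes above, the per-volume maximum is ignored, and the class and method names are made up.

```java
final class Gp2Iops {
    static final int IOPS_PER_GIB = 3;
    static final int BURST_IOPS = 3000;
    static final int MIN_BASELINE_IOPS = 100; // assumption: floor for small volumes

    // Baseline IOPS grow linearly with volume size, with a floor for small volumes.
    static int baselineIops(int sizeGiB) {
        return Math.max(MIN_BASELINE_IOPS, IOPS_PER_GIB * sizeGiB);
    }

    // Bursting only helps while the baseline is below the burst ceiling.
    static boolean benefitsFromBurst(int sizeGiB) {
        return baselineIops(sizeGiB) < BURST_IOPS;
    }

    public static void main(String[] args) {
        for (int size : new int[] {20, 100, 500, 1000}) {
            System.out.println(size + " GiB -> baseline " + baselineIops(size)
                    + " IOPS, burst useful: " + benefitsFromBurst(size));
        }
    }
}
```

Under these assumptions a volume of 1000 GiB or more already has a baseline at the burst level, so bursting matters mainly for smaller volumes.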

EFS

1. Elastic File System: managed file storage that, like EBS, is mounted on EC2 instances.

2. Unlike EBS, EFS can be mounted on several EC2 instances and on-premises servers at the same time.

3. EFS currently only works with Linux, not with Windows.

4. EBS has fixed amount of storage while EFS can be scaled whenever required.

5. Coming soon, the Amazon EFS Infrequent Access storage class.

6. EBS and EFS cannot be used as an origin for CDN unlike S3.

7. EBS and EFS offer lower latency than S3 because they are mounted directly on EC2 instances.

Storage Gateway

1. Integrates on-premises datacenter storage with cloud storage.

2. It connects to AWS storage services, such as Amazon S3, Amazon Glacier, and Amazon EBS, providing storage for files, volumes, and virtual tapes in AWS.

3. Storage Gateway is downloaded and installed on premises.

4. Caching and monitoring of data using Storage Gateway.

5. File Gateway: Simple file storage using NFS (Network File System) protocol.

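6. Volume Gateway: Hard disk / block storage, in cached mode (frequently accessed data is kept in the Volume Gateway cache and the entire data set is in the cloud) or stored mode (the entire data set is in the data center and is asynchronously backed up to the cloud).

7. Tape Gateway: Virtual tape storage for backup software, with the virtual tapes kept in AWS.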

Sunday, December 23, 2018

AWS Database Services: RDS, DynamoDB, ElastiCache, Neptune and Redshift

AWS provides the following services under the database section:

1. RDS
2. DynamoDB
3. ElastiCache
4. Neptune
5. Redshift

The following are some basic and important points about AWS Database services:

RDS

1. Relational Database Service. Supports Aurora, MySQL, MariaDB, PostgreSQL, Oracle, MS SQL Server.

2. Backup and Restore methods: Automated (done by AWS automatically, backs up data with transaction logs) and Snapshots (manual process, usually done by system admins).

3. To improve DB performance, you can use ElastiCache and Read Replicas (DAX plays the same caching role for DynamoDB).

Aurora 

1. MySQL- and PostgreSQL-compatible relational database (RDBMS).

2. Up to 5 times faster than standard MySQL databases and 3 times faster than standard PostgreSQL databases.

3. Automatically scales up to 64TB on SSD per database instance.

4. Replicates 6 copies of database across 3 Availability Zones.

5. Each DB cluster can have up to 15 READ replicas.

6. Failover takes less than 30 seconds.

7. Backs up database to S3.

8. You can monitor database performance using Amazon CloudWatch.

DynamoDB

1. DynamoDB (Dynamo Database or DDB) is Amazon's NoSQL database.

2. DynamoDB Security is provided by Fine-Grained Access Control (FGAC) mechanism. FGAC is based on the AWS IAM.

3. DynamoDB Triggers integrate with AWS Lambda.

4. DynamoDB Streams provides a 24-hour chronological sequence of updates to items in a table. AWS Lambda can read updates to a table from a stream.

5. DynamoDB Accelerator (DAX) is an in-memory cache for DynamoDB.

Neptune

1. Graph based database

Redshift 

1. Data Warehouse and Reporting System in the Amazon Cloud.

2. Use OLAP (Online Analytical Processing), SQL and BI tools to analyze the data.

3. Redshift Spectrum extends the power of Redshift to query unstructured data in S3 (without loading your data into Redshift).

ElastiCache

1. In-memory database cache in the Amazon Cloud for fast performance.

2. ElastiCache Engines: Redis, Memcached

DMS

1. Database Migration Service with zero/negligible downtime.

2. Supports homogeneous (example: Oracle to Oracle) and heterogeneous (example: Oracle to Aurora or MySQL) database migration.

Thursday, October 5, 2017

Use of Private Constructor in OOPS

If we set access specifier of a constructor to private, then that constructor can only be accessed inside the class. A private constructor is mainly used when you want to prevent the class instance from being created outside the class. 

This is mainly the case for a singleton class. Singleton classes are used extensively in areas like networking and database connectivity. Using a private constructor, we can ensure that no more than one object of the class is ever created. 

Example of Private Constructor in a Singleton Class

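public class SingletonClass
{
    // The single shared instance; private so outside code cannot reassign it.
    private static SingletonClass singletonClass;

    // Private constructor: the class cannot be instantiated from outside.
    private SingletonClass() 
    {
    }

    // Creates the instance on first use; synchronized so that two threads
    // calling at the same time cannot create two objects.
    public static synchronized SingletonClass getInstance() 
    {
        if (singletonClass == null) 
        {
            singletonClass = new SingletonClass();
        }
        return singletonClass;
    }
}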
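
A common alternative to guarding the null check with synchronization is the initialization-on-demand holder idiom, sketched below. LazySingleton is a hypothetical name used only for this example; the idiom relies on the JVM initializing a class lazily and in a thread-safe way.

```java
final class LazySingleton {
    // Private constructor: no instance can be created from outside the class.
    private LazySingleton() { }

    // The JVM initializes Holder only when getInstance() first touches it,
    // and class initialization is thread-safe, so no explicit locking is needed.
    private static final class Holder {
        static final LazySingleton INSTANCE = new LazySingleton();
    }

    static LazySingleton getInstance() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        // Both calls return the same object.
        System.out.println(getInstance() == getInstance()); // prints true
    }
}
```

The trade-off is mostly stylistic: the holder version avoids taking a lock on every call, while the explicit null check makes the lazy creation more visible.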

A class with private constructor cannot be inherited.

If we don’t want a class to be inherited, we can make the class constructor private. If we then try to derive another class from this class, the compiler will report an error. Why?

Recall the order of constructor execution in inheritance: when we create an object of a derived class, the constructor of the base class is called first and then the constructor of the derived class. Since the base class constructor is private, the derived class cannot access it.

We can also use a sealed class (C#) or a final class (Java) to stop a class from being inherited. This is a more flexible and readable way to stop inheritance.

Diamond Problem in Multiple Inheritance in OOPS

The diamond problem occurs when two super classes of a class have a common base class. 

Suppose there are four classes A, B, C and D. Classes B and C inherit from class A, so B and C each contain one copy of all the functions and data members of class A. Class D is derived from both B and C, so class D contains two copies of all the functions and data members of class A: one copy comes from class B and the other from class C.

Let’s say class A has a function named display(). Class D then has two display() functions, as explained above. If we call display() on a class D object, the call is ambiguous because the compiler cannot tell whether it should call the display() that came from class B or the one from class C, and a program that makes this call fails to compile.

This kind of problem is called the diamond problem because the inheritance graph forms a diamond, with A at the top, B and C in the middle and D at the bottom (see the image).

That is why major programming languages like C#, Java and Delphi don't support multiple inheritance of classes: it can lead to the diamond problem, and rather than providing some complex way to solve it, these languages offer better ways to achieve the same result. We can use interfaces to resolve this problem.

C++ supports multiple inheritance.

Notice that the same problem with multiple class inheritance can also arise with only three classes, when all of them have at least one method in common.

Because of this problem, we cannot extend two classes to implement multiple inheritance. To get the functionality of multiple inheritance in object-oriented programming, we use interfaces instead. 

As we know, in an interface we only declare a function; we do not define it. So with interfaces we can extend one class and implement one or more interfaces, or implement more than one interface at a time, and thereby get the functionality of multiple inheritance while escaping the diamond problem.
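
A minimal Java sketch of the interface approach follows. The names A, B, C and D mirror the classes in the discussion above, but here A, B and C are interfaces; DiamondDemo and the returned string are made up for the example.

```java
final class DiamondDemo {
    // A only declares display(); it carries no implementation to inherit twice.
    interface A { String display(); }
    interface B extends A { }
    interface C extends A { }

    // D implements both B and C, yet there is exactly one display() body,
    // so a call through D, B or C is never ambiguous.
    static final class D implements B, C {
        @Override
        public String display() { return "display() from D"; }
    }

    public static void main(String[] args) {
        B viaB = new D();
        C viaC = new D();
        System.out.println(viaB.display());
        System.out.println(viaC.display());
    }
}
```

Both calls resolve to the single implementation in D, which is the result multiple class inheritance was after, without the duplicated base-class copy.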