Thursday, January 9, 2014

28 MongoDB NoSQL Database Interview Questions and Answers

28 MongoDB NoSQL Database Interview Questions and Answers

MongoDB is the best free open source NoSQL document oriented database. If you are preparing for the technical interview on MongoDB NoSQL database, must prepare the following MongoDB NoSQL database interview questions. These MongoDB NoSQL database interview questions cover basic concepts of NoSQL databases, Replication, Sharding, Transactions and Locking, Profiler, Nuances, Journalling features etc. Lets have a look on following MongoDB NoSQL database interview questions:

1. What do you mean by NoSQL databases? What is the difference between NoSQL and RDBMS? Why and why not to use NoSQL databases? What are the various advantages of NoSQL databases?

I have written a complete blog post to answer above questions here.

2. What are the various types of NoSQL databases?

Types of NoSQL Databases

Examples: MongoDB, Cassandra, CouchDB, Hypertable, Redis, Riak, Neo4j, HBASE, Couchbase, MemcacheDB, RevenDB and Voldemort are the examples of NoSQL databases. Read in detail.

3. What is the basic difference between MySQL and MongoDB?

MySQL and MongoDB are both free and open source databases. MySQL and MongoDB have a lot of basic differences in terms of data representation, querying, relationships, transactions, schema design and definition, normalization, speed and performance. By comparing MySQL with MongoDB, we are comparing Relational and non-relational databases. Read in detail.

4. How do you compare MongoDB with CouchDB and CouchBase?

MongoDB and CouchDB are both document-oriented databases. MongoDB and CouchDB are the best examples of open source NoSQL database. Aside from both storing documents though, it turns out that they don't share much in common. There are a lot of difference between MongoDB and CouchDB in terms of implementation of their data-model, interface, object storage, replication methods etc.

For details, go through following links:

5. What makes MongoDB best?

Following features of MongoDB make it best NoSQL database:

Document-oriented
High performance
High availability
Easy scalability
Rich query language

6. What is 32 bit nuances?

There is extra memory mapped file activity with journaling. This will further constrain the limited db size of 32 bit builds. Thus, for now journaling by default is disabled on 32 bit systems.

7. Will the journal replay have problems if entries are incomplete (like the failure happened in the middle of one)?

Each journal (group) write is consistent and won't be replayed during recovery unless it is complete.

8. What is role of Profiler in MongoDB?

MongoDB includes a database profiler which shows performance characteristics of each operation against the database. Using the profiler you can find queries (and write operations) which are slower than they should be; use this information, for example, to determine when an index is needed.

9. What's a "namespace"?

MongoDB stores BSON objects in collections. The concatenation of the database name and the collection name (with a period in between) is called a namespace.

10. If you remove an object attribute is it deleted from the store?

Yes, you remove the attribute and then re-save() the object.

11. Can I use the journaling feature to perform safe hot backups?

Yes

12. Are null values allowed?

For members of an object, yes. You cannot add null to a database collection though as null isn't an object. You can add {}, though.

13. Does an update fsync to disk immediately?

No, writes to disk are lazy by default. A write may hit disk a couple of seconds later. For example, if the database receives a thousand increments to an object within one second, it will only be flushed to disk once. (Note fsync options are available though both at the command line and via getLastError_old.)

14. How do I do transactions/locking?

MongoDB does not use traditional locking or complex transactions with rollback, as it is designed to be lightweight and fast and predictable in its performance. It can be thought of as analogous to the MySQL MyISAM autocommit model. By keeping transaction support extremely simple, performance is enhanced, especially in a system that may run across many servers.

15. Why are my data files so large?

MongoDB does aggressive preallocation of reserved space to avoid file system fragmentation.

16. How long does replica set failover take?

It may take 10-30 seconds for the primary to be declared down by the other members and a new primary elected. During this window of time, the cluster is down for "primary" operations – that is, writes and strong consistent reads. However, you may execute eventually consistent queries to secondaries at any time (in slaveOk mode), including during this window.

17. What's a master or primary?

This is a node/member which is currently the primary and processes all writes for the replica set. In a replica set, on a failover event, a different member can become primary.

18. What's a secondary or slave?

A secondary is a node/member which applies operations from the current primary. This is done by tailing the replication oplog (local.oplog.rs).
Replication from primary to secondary is asynchronous, however the secondary will try to stay as close to current as possible (often this is just a few milliseconds on a LAN).

19. Do I have to call getLastError to make a write durable?

No. If you don't call getLastError (aka "Safe Mode") the server does exactly the same behavior as if you had. The getLastError call simply lets one get confirmation that the write operation was successfully committed. Of course, often you will want that confirmation, but the safety of the write and its durability is independent.

20. Should I start out with sharded or with a non-sharded MongoDB environment?

We suggest starting unsharded for simplicity and quick startup unless your initial data set will not fit on single servers. Upgrading to sharding from unsharded is easy and seamless, so there is not a lot of advantage to setting up sharding before your data set is large.

21. How does sharding work with replication?

Each shard is a logical collection of partitioned data. The shard could consist of a single server or a cluster of replicas. We recommmend using a replica set for each shard.

22. When will data be on more than one shard?

MongoDB sharding is range based. So all the objects in a collection get put into a chunk. Only when there is more than 1 chunk is there an option for multiple shards to get data. Right now, the default chunk size is 64mb, so you need at least 64mb for a migration to occur.

23. What happens if I try to update a document on a chunk that is being migrated?

The update will go through immediately on the old shard, and then the change will be replicated to the new shard before ownership transfers.

24. What if a shard is down or slow and I do a query?

If a shard is down, the query will return an error unless the "Partial" query options is set. If a shard is responding slowly, mongos will wait for it.

25. Can I remove old files in the moveChunk directory?

Yes, these files are made as backups during normal shard balancing operations. Once the operations are done then they can be deleted. The cleanup process is currently manual so please do take care of this to free up space.

26. How can I see the connections used by mongos?

db._adminCommand("connPoolStats");

27. If a moveChunk fails do I need to cleanup the partially moved docs?

No, chunk moves are consistent and deterministic; the move will retry and when completed the data will only be on the new shard.

28. If I am using replication, can some members use journaling and others not?

Yes

Wednesday, January 8, 2014

CouchDB vs CouchBase: Differences and Similarities between CouchDB and CouchBase

CouchDB vs CouchBase: Differences and Similarities between CouchDB and CouchBase

Apache CouchDB and CouchBase, both are Free, Open Source, NoSQL document databases using JSON as their document format. Apache CouchDB and CouchBase have a lot of similarities and differences. Basically CouchBase is made up by combining the features of CouchDB and MemBase databases. CouchDB's document-oriented data model, indexing and querying capabilities are combined with MemBase's distributed key-value data model, high performance, easily scalable, always-on capabilities are combined to form CouchBase. 

In short, we can say:

CouchBase = CouchDB + MemBase

But, CouchBase is not a new version of CouchDB but it is in fact a new version of MemBase. CouchBase Server is actually a new name for the MemBase Server. CouchBase is not a drop-in replacement for CouchDB but is a drop-in replacement for Memcached.  CouchBase still uses the Memcached protocol and not the RESTful API of CouchDB. Meanwhile CouchDB is still CouchDB, actively maintained and enhanced as an Apache project. Also, CouchDB is not obsoleted by the CouchBase. CouchDB is an actively maintained open-source project and CouchBase is a completely separate project. 

Similarities between CouchDB and CouchBase

1. CouchDB and CouchBase, both are NoSQL document databases using JSON as their document format. 

2. Both CouchDB and CouchBase use the same approach to indexing and querying.

3. Both CouchDB and CouchBase use the same replication system except peer to peer replication.

Although CouchBase has been developed by combining the exciting features of CouchDB and MemBase, both CouchDB and CouchBase have a lot of differences with respect to their clustering, caching and licensing systems.

Differences between CouchDB and CouchBase

1. Clustering System

CouchBase has a built-in clustering system that allows data to be automatically spread across multiple nodes. Apache CouchDB, on the other hand, is a single node solution with peer-to-peer replication technology and is better suited for decentralized systems and for holding data amounts that do not need to be spread out across multiple nodes.

2. Caching System

CouchBase - like MemBase - has a built-in, Memcached-based caching technology that consistently provides sub millisecond read/write performance and is capable of hundreds of thousands of ops per second per node. Apache CouchDB is a disk-based database that usually is better suited for use cases where super low latency or high throughput is not a requirement. 

3. Licensing System

The CouchBase is not entirely open-source/free software. There are two versions: Community Edition (free but no latest bug fixes) and Enterprise Edition (there are restrictions on usage, confidentiality provisions, audits by CouchBase Inc. and other terms typical to proprietary software that many people may find unacceptable).

CouchDB is an open-source/free software (no strings attached) project of The Apache Software Foundation and is released under the Apache License, Version 2.0.

4. Other Differences

CouchBase does not support following features of CouchDB:

A) RESTful API (only for views, not for CRUD operations)

B) Peer-to-peer replication

C) CouchApps

D) Futon (there is a different administration interface available)

E) Document IDs

F) Notion of databases (there are only buckets)

G) Replication between a CouchDB database and CouchBase Server

H) Explicit attachments (you have to store additional files as new key/value pairs)

I) HTTP API for everything is not available in CouchBase unlike CouchDB (you need to use the Couchbase Server SDKs or one of the Experimental Client Libraries at Couchbase Develop so no experiments with curl and wget)

J) CouchDB API (it uses the Memcached API instead)

K) You can't do everything from the browser in CouchBase unlike CouchDB (you have to write a server-side application)

L) No two-tier architecture for Web apps is possible in CouchBase unlike CouchDB(you have to write a server-side application to sit between the browser and the database, like with relational databases)

M) Leventual consistency

Monday, January 6, 2014

What is TLogFont in Delphi? How to use TLogFont to rotate the texts on Canvas?

What is TLogFont in Delphi? How to use TLogFont to rotate the texts on Canvas?

TLogFont structure / record in Delphi is used to play with the fonts to be displayed on the Canvas. You can change the look and feel of the text. You can change the orientation of the text and rotate it to any degree you want. TLogFont is declared in Winapi.Windows unit in Delphi. Following is the sample TLogFont structure in Delphi.

TLogFont = record;
  lfHeight: Integer;
  lfWidth: Integer;
  lfEscapement: Integer;
  lfOrientation: Integer;
  lfWeight: Integer;
  lfItalic: Byte;
  lfUnderline: Byte;
  lfStrikeOut: Byte;
  lfCharSet: Byte;
  lfOutPrecision: Byte;
  lfClipPrecision: Byte;
  lfQuality: Byte;
  lfPitchAndFamily: Byte;
  lfFaceName: array [0 .. lf_FaceSize - 1] of Byte;
end;

Following is the brief explanation of TLogFont members:
  
lfEscapement: The angle between the font's baseline and escapement vectors, in units of 1/10 degrees. 

lfOrientation: The angle between the font's baseline and the device's x-axis, in units of 1/10 degrees.

lfWeight: One of the following flags specifying the boldness (weight) of the font:

FW_DONTCARE = 0 Default weight.
FW_THIN = 100 Thin weight.
FW_EXTRALIGHT = 200 Extra-light weight.
FW_LIGHT = 300 Light weight.
FW_NORMAL = 400 Normal weight.
FW_MEDIUM = 500 Medium weight.
FW_SEMIBOLD = 600 Semi-bold weight.
FW_BOLD = 700 bold weight.
FW_EXTRABOLD = 800 Extra-bold weight.
FW_HEAVY = 900 Heavy weight.

lfWidth: The average width of the font's characters. If 0, the font mapper tries to determine the best value

lfItalic: A non-zero value if the font is italicized, 0 if not.

lfUnderline: A non-zero value if the font is underlined, 0 if not.

lfStrikeOut: A non-zero value if the font is striked out, 0 if not.

lfQuality: field contains the one of quality flags: Default_Quality, Draft_Quality or Proof_Quality.

lfCharSet: It may be set as ANSI_CharSet, OEM_CharSet or Symbol_CharSet.

lfFaceName: It contains font name like Times New Roman, Verdana etc.

Following is the code to demonstrate how to rotate the text on canvas using TLogFont structure in Delphi:

procedure TForm1.FormPaint(Sender: TObject);
var
  lf: TLogFont;
  tf: TFont;
begin
  with Form1.Canvas do
  begin
    Font.Name := 'Arial';
    Font.Size := 24;
    tf := TFont.Create;
    try
      tf.Assign(Font);
      GetObject(tf.Handle, SizeOf(lf), @lf);
      lf.lfEscapement  := 320;
      lf.lfOrientation := 320;
      SetBkMode(Handle, TRANSPARENT);
      tf.Handle := CreateFontIndirect(lf);
      Font.Assign(tf);
    finally
      tf.Free;
    end;
    TextOut(10, Height div 2, 'Your Rotated Text!');
  end;
end;

Saturday, January 4, 2014

12 Best Free and Open Source NoSQL Databases

12 Best Free and Open Source NoSQL Databases

NoSQL databases are becoming popular day by day. I have come up with the list of best, free and open source NoSQL databases. MongoDB tops the list of Open Source NoSQL databases. This list of free and open source databases comprises of MongoDB, Cassandra, CouchDB, Hypertable, Redis, Riak, Neo4j, HBASE, Couchbase, MemcacheDB, RevenDB and Voldemort. These free and open source NoSQL databases are really highly scale-able, flexible and good for big data storage and processing. These open source NoSQL databases are far ahead in terms of performance as compared to traditional relational databases. However, these may not be always the best choice for you. Most of common applications can still be developed using traditional relational databases. NoSQL databases are still not the best option for a mission critical transaction needs. I have listed down a small description of all these best free and oper source NoSQL databases. Lets have a look..

1. MongoDB

MongoDB is a document-oriented database that uses a JSON-style data format. It's ideal for website data storage, content management and caching applications, and can be configured for replication and high availability.

This highly scale-able and agile NoSQL database is a amazing performing system. This open source database written in C++ comes with a storage that is document oriented. Also, you will be provided with benefits like full index support, high availability across WANs and LANs along with easy replication, horizontal scaling, rich queries that are document based, flexibility in data processing and aggregation along with proper training, support and consultation.

2. Cassandra

An Apache Software Foundation project, Cassandra is a distributed database that allows for decentralized data storage that is fault tolerant and has no single point of failure. In other words, "Cassandra is suitable for applications that can't afford to lose data."

3. CouchDB

A product of the Apache Software Foundation, CouchDB is another document-oriented database that stores data in JSON format. It's ACID compliant, and like MongoDB, can be used to store data and content for websites, and to provide caching. You can use JavaScript to run MapReduce Queries on CouchDB. It also provides a very convenient web based administration console. This database could be really handy for web applications.

4. Hypertable

Modeled after Google's BigTable database system, Hypertable's creators aim for it to be the "open source standard for highly available, petabyte scale, database systems." In other words, Hypertable is designed for storing massive amounts of data reliably across many cheap servers.

5. Redis

This is an open source, key value store of an advanced level. Owing to the presence of hashes, sets, strings, sorted sets and lists in a key; Redis is also called as a data structure server. This system will help you in running atomic operations like incrementing value present in a hash, set intersection computation, string appending, difference and union. Redis makes use of in-memory dataset to achieve high performance. Also, this system is compatible with most of the programming languages.

6. Riak

This is one of the most powerful, distributed databases ever to be introduced. It provides for easy and predictable scaling and equips users with the ability for quick testing, prototyping and application deployment so as to simplify development.

7. Neo4j

This is a NoSQL graph database which exhibits a high level of performance. It comes well equipped with all the features of a robust and mature system. It provides the programmers with a flexible and object oriented network structure and allows them to enjoy all the benefits of a database that is fully transactional. Compared to RDBMS, Neo4j will also provide you with performance improvements on some of the applications.

8. Hadoop HBASE

HBase can be easily considered as a scalable, distributed and a big data store. This database can be used when you are looking for real time and random access to your data. It comes with modular and linear scalability along with reads and writes that are strictly consistent. Other features include Java API that has an easy client access, table sharding that is configurable and automatic, Bloom filters and block caches and much more.

9. Couchbase

While Couchbase was a fork of CouchDB, it has become more of a full-fledged data product and less of a ball of framework than CouchDB. Its transition to a document database will give MongoDB a run for its money. It is multithreaded per node, which can be a major scalability benefit -- especially when hosted on custom or bare-metal hardware. With some nice integration features, including with Hadoop, Couchbase is a great choice for an operational data store.

10. MemcacheDB

This is a distributed storage system of key value. It should not be confused with a cache solution; rather, it is a persistent storage engine which is meant for data storage and retrieval in a fast and reliable manner. Confirmation to memcache protocol is provided for. The storing backend that is used is the Berkeley DB which supports features like replication and transaction.

11. REVENDB

RAVENDB is a second generation open source DB. This DB is document oriented and schema free such as you simply have to dump in your objects into it. It provides extremely flexible and fast queries. This application makes scaling extremely easy by providing out-of-the-box support for replication, multi tenancy and sharding. There is full support for ACID transactions along with safety of your data. Easy extensibility via bundles is provided along with high performance.

12. Voldemort

This is an automatically replicating distributed storage system. It provides for automatic partitioning of data, transparent handling of server failure, pluggable serialization, independence of nodes and versioning of data items along with support for data distribution across various centers.

Friday, January 3, 2014

NoSQL vs RDBMS: Why and why not to use NoSQL over RDBMS?

NoSQL vs RDBMS: Why and why not to use NoSQL over RDBMS? 

NoSQL (not only SQL) is not a relational database management system (RDBMS). We will discuss what is the difference between NoSQL databases and Relational Databases and then why and why not to use NoSQL database model over traditional and relational database model (RDBMS) in detail. As NoSQL is the new technology, it is also facing many challenges, so will also have a look upon them.

Today, the internet world has billions of users. Big Data, Big Users, and Cloud Computing are the big technologies which every major internet application is using or preparing to use because internet application users are growing day by day and data is becoming more and more complex and unstructured which is very hard to manage using traditional relational database management system (RDBMS). NoSQL technology has the answer to all these problems. NoSQL is meant for Unstructured Big Data and Cloud Computing. A NoSQL database is exactly the type of database that can handle the all sort of unstructured, messy and unpredictable data that our system of engagement requires. NoSQL is a whole new way of thinking about a database. 

Difference between NoSQL and Relational Data Models (RDBMS)

Relational and NoSQL data models are very different. 

The relational model takes data and separates it into many interrelated tables that contain rows and columns. Tables reference each other through foreign keys that are stored in columns as well.  When looking up data, the desired information needs to be collected from many tables (often hundreds in today’s enterprise applications) and combined before it can be provided to the application. Similarly, when writing data, the write needs to be coordinated and performed on many tables.

NoSQL databases have a very different model. For example, a document-oriented NoSQL database takes the data you want to store and aggregates it into documents using the JSON format. Each JSON document can be thought of as an object to be used by your application. A JSON document might, for example, take all the data stored in a row that spans 20 tables of a relational database and aggregate it into a single document/object. Aggregating this information may lead to duplication of  information, but since storage is no longer cost prohibitive, the resulting data model flexibility, ease of efficiently distributing the resulting documents and read and write performance improvements make it an easy trade-off for web-based applications.

Another major difference is that relational technologies have rigid schemas while NoSQL models are schemaless. Relational technology requires strict definition of a schema prior to storing any data into a database. Changing the schema once data is inserted is a big deal, extremely disruptive and frequently avoided – the exact opposite of the behavior desired in the Big Data era, where application developers need to constantly – and rapidly – incorporate new types of data to enrich their apps.

It also may not provide full ACID (atomicity, consistency, isolation, durability) guarantees, but still has a distributed and fault tolerant architecture.

The NoSQL taxonomy supports key-value stores, document store, BigTable, and graph databases.

In comparison, document databases are schemaless, allowing you to freely add fields to JSON documents without having to first define changes. The format of the data being inserted can be changed at any time, without application disruption.

Examples: MongoDB, Cassandra, CouchDB, HBase are the examples of NoSQL.

NoSQL Database Types

Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents.

Graph stores are used to store information about networks, such as social connections. Graph stores include Neo4J and HyperGraphDB.

Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or "key"), together with its value. Examples of key-value stores are Riak and Voldemort. Some key-value stores, such as Redis, allow each value to have a type, such as "integer", which adds functionality.

Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.

Why to use NoSQL Databases?

1. NoSQL has Flexible Data Model to Capture Unstructured / Semi-structured Big Data

Data is becoming easier to capture and access through third parties such as Facebook, D&B, and others. Personal user information, geo location data, social graphs, user-generated content, machine logging data, and sensor-generated data are just a few examples of the ever-expanding array of data being captured. It’s not surprising that developers want to enrich existing applications and create new ones made possible by it. And the use of the data is rapidly changing the nature of communication, shopping, advertising, entertainment, and relationship management. Apps that don’t leverage it quickly will quickly fall behind.

Developers want a very flexible database that easily accommodates new data types and isn’t disrupted by content structure changes from third-party data providers. Much of the new data is unstructured and semi-structured, so developers also need a database that is capable of efficiently storing it. Unfortunately, the rigidly defined, schema-based approach used by relational databases makes it impossible to quickly incorporate new types of data, and is a poor fit for unstructured and semi-structured data. NoSQL provides a data model that maps better to these needs.

A lot of applications might gain from this unstructured data model: tools like CRM, ERP, BPM, etc, could use this flexibility to store their data without performing changes on tables or creating generic columns in a database. These databases are also good to create prototypes or fast applications, because this flexibility provides a tool to develop new features very easily.

2. NoSQL is highly and easily scalable (Scale up vs Scale out)

If millions of users are using your app frequently and concurrently, you need to think about the scalable database technology instead of traditional RDBMS. With relational technologies, many application developers find it difficult, or even impossible, to get the dynamic scalability and level of scale they need while also maintaining the performance users demand. You need to switch to NoSQL databases.

For the cloud applications, relational databases were originally the popular choice. Their use was increasingly problematic however, because they are a centralized, share-everything technology that scales up rather than out. This made them a poor fit for applications that require easy and dynamic scalability. NoSQL databases have been built from the ground up to be distributed, scale-out technologies and therefore fit better with the highly distributed nature of the three-tier Internet architecture.

Scale up vs Scale out

To deal with the increase in concurrent users (Big Users) and the amount of data (Big Data), applications and their underlying databases need to scale using one of two choices: scale up or scale out. Scaling up implies a centralized approach that relies on bigger and bigger servers. Scaling out implies a distributed approach that leverages many standard, commodity physical or virtual servers.

Scale up with relational technology: limitations at the database tier

At the web/application tier of the three-tier Internet architecture, a scale out approach has been the default for many years and worked extremely well. As more people use an application, more commodity servers are added to the web/application tier, performance is maintained by distributing load across an increased number of servers, and the cost scales linearly with the number of users.

Prior to NoSQL databases, the default scaling approach at the database tier was to scale up. This was dictated by the fundamentally centralized, shared-everything architecture of relational database technology. To support more concurrent users and/or store more data, you need a bigger and bigger server with more CPUs, more memory, and more disk storage to keep all the tables. Big servers tend to be highly complex, proprietary, and disproportionately expensive, unlike the low-cost, commodity hardware typically used so effectively at the web/application server tier.

Scale out with NoSQL technology at the database tier

NoSQL databases were developed from the ground up to be distributed, scale out databases. They use a cluster of standard, physical or virtual servers to store data and support database operations. To scale, additional servers are joined to the cluster and the data and database operations are spread across the larger cluster. Since commodity servers are expected to fail from time-to-time, NoSQL databases are built to tolerate and recover from such failure making them highly resilient.

NoSQL databases provide a much easier, linear approach to database scaling. If 10,000 new users start using your application, simply add another database server to your cluster. Add ten thousand more users and add another server. There’s no need to modify the application as you scale since the application always sees a single (distributed) database.

At scale, a distributed scale out approach also usually ends up being cheaper than the scale up alternative. This is a consequence of large, complex, fault tolerant servers being expensive to design, build and support. Licensing costs of commercial relational databases can also be prohibitive because they are priced with a single server in mind. NoSQL databases on the other hand are generally open source, priced to operate on a cluster of servers, and relatively inexpensive.

While implementations differ, NoSQL databases share some characteristics with respect to scaling and performance:

DYNAMIC SCHEMAS

Relational databases require that schemas be defined before you can add data. For example, you might want to store data about your customers such as phone numbers, first and last name, address, city and state – a SQL database needs to know what you are storing in advance.

This fits poorly with agile development approaches, because each time you complete new features, the schema of your database often needs to change. So if you decide, a few iterations into development, that you'd like to store customers' favorite items in addition to their addresses and phone numbers, you'll need to add that column to the database, and then migrate the entire database to the new schema.

If the database is large, this is a very slow process that involves significant downtime. If you are frequently changing the data your application stores – because you are iterating rapidly – this downtime may also be frequent. There's also no way, using a relational database, to effectively address data that's completely unstructured or unknown in advance.

NoSQL databases are built to allow the insertion of data without a predefined schema. That makes it easy to make significant application changes in real-time, without worrying about service interruptions – which means development is faster, code integration is more reliable, and less database administrator time is needed.

AUTO-SHARDING

Because of the way they are structured, relational databases usually scale vertically – a single server has to host the entire database to ensure reliability and continuous availability of data. This gets expensive quickly, places limits on scale, and creates a relatively small number of failure points for database infrastructure. The solution is to scale horizontally, by adding servers instead of concentrating more capacity in a single server.

"Sharding" a database across many server instances can be achieved with SQL databases, but usually is accomplished through SANs and other complex arrangements for making hardware act as a single server. Because the database does not provide this ability natively, development teams take on the work of deploying multiple relational databases across a number of machines. Data is stored in each database instance autonomously. Application code is developed to distribute the data, distribute queries, and aggregate the results of data across all of the database instances. Additional code must be developed to handle resource failures, to perform joins across the different databases, for data rebalancing, replication, and other requirements. Furthermore, many benefits of the relational database, such as transactional integrity, are compromised or eliminated when employing manual sharding.

NoSQL databases, on the other hand, usually support auto-sharding, meaning that they natively and automatically spread data across an arbitrary number of servers, without requiring the application to even be aware of the composition of the server pool. Data and query load are automatically balanced across servers, and when a server goes down, it can be quickly and transparently replaced with no application disruption.

Cloud computing makes this significantly easier, with providers such as Amazon Web Services providing virtually unlimited capacity on demand, and taking care of all the necessary database administration tasks. Developers no longer need to construct complex, expensive platforms to support their applications, and can concentrate on writing application code. Commodity servers can provide the same processing and storage capabilities as a single high-end server for a fraction of the price.

“Sharding” a relational database can reduce, or eliminate in certain cases, the ability to perform complex data queries. NoSQL database systems retain their full query expressive power even when distributed across hundreds of servers.

INTEGRATED CACHING

A number of products provide a caching tier for SQL database systems. These systems can improve read performance substantially, but they do not improve write performance, and they add complexity to system deployments. If your application is dominated by reads then a distributed cache should probably be considered, but if your application is dominated by writes or if you have a relatively even mix of reads and writes, then a distributed cache may not improve the overall experience of your end users.

Many NoSQL database technologies have excellent integrated caching capabilities, keeping frequently-used data in system memory as much as possible and removing the need for a separate caching layer that must be maintained.

REPLICATION

Most NoSQL databases also support automatic replication, meaning that you get high availability and disaster recovery without involving separate applications to manage these tasks. The storage environment is essentially virtualized from the developer's perspective.

Challenges of NoSQL

The promise of the NoSQL database has generated a lot of enthusiasm, but there are many obstacles to overcome before they can appeal to mainstream enterprises. Here are a few of the top challenges.

1. Maturity

RDBMS systems have been around for a long time. NoSQL advocates will argue that their advancing age is a sign of their obsolescence, but for most CIOs, the maturity of the RDBMS is reassuring. For the most part, RDBMS systems are stable and richly functional. In comparison, most NoSQL alternatives are in pre-production versions with many key features yet to be implemented.

Living on the technological leading edge is an exciting prospect for many developers, but enterprises should approach it with extreme caution.

2. Support

Enterprises want the reassurance that if a key system fails, they will be able to get timely and competent support. All RDBMS vendors go to great lengths to provide a high level of enterprise support.

In contrast, most NoSQL systems are open source projects, and although there are usually one or more firms offering support for each NoSQL database, these companies often are small start-ups without the global reach, support resources, or credibility of an Oracle, Microsoft, or IBM.

3. Analytics and business intelligence

NoSQL databases have evolved to meet the scaling demands of modern Web 2.0 applications. Consequently, most of their feature set is oriented toward the demands of these applications. However, data in an application has value to the business that goes beyond the insert-read-update-delete cycle of a typical Web application. Businesses mine information in corporate databases to improve their efficiency and competitiveness, and business intelligence (BI) is a key IT issue for all medium to large companies.

NoSQL databases offer few facilities for ad-hoc query and analysis. Even a simple query requires significant programming expertise, and commonly used BI tools do not provide connectivity to NoSQL.

Some relief is provided by the emergence of solutions such as HIVE or PIG, which can provide easier access to data held in Hadoop clusters and perhaps eventually, other NoSQL databases. Quest Software has developed a product -- Toad for Cloud Databases -- that can provide ad-hoc query capabilities to a variety of NoSQL databases.

4. Administration

The design goals for NoSQL may be to provide a zero-admin solution, but the current reality falls well short of that goal. NoSQL today requires a lot of skill to install and a lot of effort to maintain.

5. Expertise

There are literally millions of developers throughout the world, and in every business segment, who are familiar with RDBMS concepts and programming. In contrast, almost every NoSQL developer is in a learning mode. This situation will address naturally over time, but for now, it's far easier to find experienced RDBMS programmers or administrators than a NoSQL expert.

Conclusion

NoSQL databases are becoming an increasingly important part of the database landscape, and when used appropriately, can offer real benefits. However, enterprises should proceed with caution with full awareness of the legitimate limitations and issues that are associated with these databases.

Wednesday, January 1, 2014

MariaDB: An Open Source Alternative and Replacement of Oracle's MySQL

MariaDB: An Open Source Alternative and Replacement of Oracle's MySQL

MySQL was the best open source database in the market before Sun Microsystems purchased it. After this, Oracle purchased Sun Microsystems and MySQL came in the hands of Oracle. After this purchase a fair number of MySQL users started looking for an alternative. So, here MariaDB comes into play. MariaDB can be considered an open source alternative and replacement of MySQL. MariaDB is a fork of the MySQL source code. MariaDB is also considered as "drop-in replacement” for MySQL. MariaDB has been invented by Monty Widenius, the same author who invented MySQL. MariaDB also includes some new features that make it better than MySQL and better performance than MySQL.

A brief history of MySQL and origin of MariaDB

A) Invention of MySQL

MySQL was invented by a Swedish company, MySQL AB, which was founded by David Axmark, Allan Larsson and Michael "Monty" Widenius. It is named for Widenius' daughter, My. MySQL was born about the same time as the World Wide Web, in 1994. It was popular from its inception and quickly became a standard part of the LAMP stack (Linux, Apache, MySQL, Perl/PHP/Python) that powers most Web sites. MySQL has been dual-licensed from the beginning: free for personal use, but you had to purchase a commercial license to use it on commercial sites or on Windows servers. In 2000 they changed the free license to the GPL, which meant that commercial users no longer had to purchase licenses. MySQL AB took a huge financial hit from this, losing as much as 80 percent of revenues by some accounts.

But, to their credit, they stuck with the GPL. Mickos became CEO in 2001, and the company's market reach and revenue steadily increased. But even though they were becoming dominant when measured by installations (about a third of the total market share), their revenues were still smallish-- for example, they grossed about $50 million in 2007, but to industry giants like Microsoft and Oracle it's a rounding error. Oracle looked upon MySQL and decided they had to have it, and started making purchase overtures around 2006. MySQL AB said no.

B) Sun Microsystems purchased MySQL

In 2007 MySQL AB raked in a cool $75 million. In 2008 Sun Microsystems bought them for approximately $1 billion. Though purchasing the whole company meant buying the talent that built MySQL. But Widenius and Axmark didn't last long at Sun and left in 2008, with many public complaints. Widenius was a vocal critic of Sun's MySQL 5.1 release and called it buggy and crashy. Then Mickos left in 2009, and the original MySQL visionaries were gone.

C) Oracle purchased Sun Microsystems

In 2009 Oracle purchases Sun Microsystems for $7.4 billion. Monty Widenius, in one of the most amazing cases of seller's remorse ever, arose in a mighty dudgeon and tried to derail the deal with his "Save MySQL" campaign. He urged people to write to the European Union and ask them to either shoot down the whole deal, or require Oracle to obey certain conditions. MySQL was firmly entrenched as an extremely important piece of technology, the foundation of the Internet and of many private businesses, and he feared that Oracle would not be a good steward of MySQL, but would kill it, or close the source.

The deal went through anyway and Oracle, at last, owned MySQL.

D) Birth of MariaDB

Monty did not retire to count his giant pile of Sun money, but instead forked MySQL and created MariaDB in 2009. (Fortunately for his naming convention he has another daughter, Maria.) MariaDB is a non-commercial community venture, always free software and always free of cost, though donations are welcome. MariaDB is now managed by the MariaDB Foundation, which is led by Michael Widenius, David Axmark, and Allan Larsson. So they created it, sold it, and got it back for free, which makes this the all-time having-your-cake-and-eating-it winner.

Migration from MySQL to MariaDB

MariaDB is a binary drop in replacement for MySQL. MariaDB has a lot of new options, extension, storage engines and bug fixes that are not in MySQL. Migration from MySQL to MariaDB is very simple because:

1. Data and table definition files (.frm) files are binary compatible.
2. All client APIs, protocols and structs are identical.
3. All filenames, binaries, paths, ports, sockets, and etc... are the same.
4. All MySQL connectors (PHP, Perl, Python, Java, .NET, MyODBC, Ruby, MySQL C connector etc) work unchanged with MariaDB.
5. The mysql-client package also works with MariaDB server.
6. The shared client library is binary compatible with MySQL's client library.

This means that for most cases, you can just uninstall MySQL and install MariaDB and you are good to go. 

MySQL is declining, becoming obsolete and MariaDB is rising

Despite being the most popular open-source database management system (DBMS), Oracle's MySQL has been sinking into trouble. Major Linux distributions like Red Hat and SUSE, are switching it out for its fork, MariaDB. Major Websites, such as Wikipedia, have also replaced MySQL with MariaDB. Now, adding insult to injury, Google is moving to MariaDB from MySQL.

Monday, December 30, 2013

MongoDB vs CouchDB: Open Source NoSQL and Document Databases Comparison

MongoDB vs CouchDB: Open Source NoSQL and Document Databases Comparison

MongoDB and CouchDB are both document-oriented databases. MongoDB and CouchDB are the best examples of open source NoSQL database. Aside from both storing documents though, it turns out that they don't share much in common. There are a lot of difference between MongoDB and CouchDB in terms of implementation of their data-model, interface, object storage, replication methods etc. 

As we say, MongoDB and CouchDB are both document-oriented free and open source databases, the word "document" does not mean a word processing file or a PDF. Rather, a document is a data structure defined as a collection of named fields. JSON (JavaScript Object Notation) is currently the most widely used notation for defining documents within document-oriented databases. JSON's advantage as an object notation is that, once you comprehend its syntax -- and JSON is remarkably easy to grasp -- then you have all you need to define what amounts to the schema of a document database. That's because, in a document database, each document carries its own schema -- unlike an RDBMS, in which every row in a given table must have the same columns.

In this article "MongoDB vs CouchDB", I have tried to make a comparison between MongoDB and CouchDB and figured out following difference between these two Free, Open Source, NoSQL document oriented databases: 

Data Model: MongoDB and CouchDB are both document oriented databases. MongoDB is JSON based while CouchDB is BSON based.

Interface: MongoDB uses custom protocol over TCP/IP while CouchDB uses HTTP/REST protocol.

Object Storage: MongoDB database contains collections and those collections contain documents while CouchDB directly contains all the documents.

Implementation Language: MongoDB is written in C++ while CouchDB is written in Erlang.

Server operating systems: MongoDB can operate on Linux, OS X, Solaris and Windows platforms while CouchDB can operate on Android, BSD, Linux, OS X, Solaris and Windows.

Supported programming languages: MongoDB supports a lot of programming languages like Actionscript, C, C#, C++, Clojure, ColdFusion, D, Dart, Delphi, Erlang, Go, Groovy, Haskell, Java, JavaScript, Lisp, Lua, MatLab, Perl, PHP, PowerShell, Prolog, Python, R, Ruby, Scala and Smalltalk. CouchDB supports lesser programming languages as compared to MongoDB. Programming languages supported by CouchDB are C, C#, ColdFusion, Erlang, Haskell, Java, JavaScript, Lisp, Lua, Objective-C, OCaml, Perl, PHP, PL/SQL, Python, Ruby and Smalltalk. 

Replication: MongoDB supports only Master-Slave replication. On the other hand, CouchDB supports Master-Master Replication as well as Master-Slave Replication.

Triggers: MongoDB does not support triggers while CouchDB does.

Developer: MongoDB is developed by MongoDB, Inc. MongoDB was initially released in 2009. CouchDB is developed by Apache Software Foundation. CouchDB was initially release in 2005.