NoSQL Databases have four distinct types. Key-value stores, document stores, graph databases, and column-oriented databases. In this article, we’ll explore column-oriented databases.
Columnar databases have experienced a renaissance in recent years, owing to increased interest in analytic queries that run scans and aggregates across significant areas of a table’s a few columns.
What is a Column-Oriented Database?
As time passes and more data becomes available, businesses increasingly recognise that a one-size-fits-all approach to databases does not work. When analysing terabytes of data with analytical queries spanning 1000s of rows, column-oriented databases can deliver a boost. Let take a deeper look into how the internal data structure is.
Column stores are relational databases that store data on a column-by-column basis, rather than on a row-by-row basis. Whereas standard row-based store stores all of the characteristics for a single row together, followed by the attributes for the next row, and so on, a column-based store uses a single logical file for each attribute.
A column-oriented database stores each column continuously. i.e. on disk or in-memory, each column on the left will be stored in sequential blocks.
Aggregate operations over a limited number of columns are incredibly quick, as is retrieving data in this format. This is especially critical for hard disc drives, as their performance characteristics are optimised for sequential access.
Columnar storage, on the other hand, are often preferable for OLAP (online analytical processing) applications. Column stores are frequently used in analytical applications, where queries scan a large number of individual tables and produce aggregates or other information.
Row vs Column Oriented Database
Choosing between a database that stores data as rows or columns, for example, has a big impact on the database performance. For this reason, certain access patterns are better suited for row oriented databases (row-stores), and others for column oriented databases (columnar-stores)
The critical distinction is in the manner in which the data in the database schema is persistently stored (magnetic/SSD storage). In a row-oriented database, tables are kept in a single location with all of their columns. By contrast, columnar databases keep each column in its own location.
This way if you want to do a SELECT query on the database and you need only one specific field, instead of retrieving all the tables into memory and filtering a column, you can fetch only the relevant column, which is much faster.
Moreover, row stores are excellent for transaction processing at their most fundamental level. On the other hand, column stores are ideal for query models that require a high level of analysis.
While row stores are extremely fast at writing data, column stores excel at aggregating enormous volumes of data for a subset of columns.
Which one is the best fit for you?
As is the case with a great deal else, it depends. While data analysis can still be performed on a row-based database, queries will run slower than on a column store. While transactions can be entered using a column-based structure, writing may take longer. In an ideal world, you would have access to both options, as many firms do.
Let us now jump to some of the benefits and drawbacks of a column types database.
Benefits of Column Oriented Database
Firstly, let’s discover some key benefits to column-oriented databases:
- Excellent performance when it comes to aggregating queries (like COUNT, SUM). By storing data by column rather than a row, you may speed up many of your queries by skipping many fields and focusing on the most important data.
- The compression of data is extremely efficient. This enables you to conserve disc space while storing huge amounts of data in a single column. Run-length encoding (RLE) is a lossless data compression technique in which runs of data are compressed.
- If a row-oriented storage engine is used to process massive data sets, such as the InnoDB engine in MySQL or MariaDB, the vast data must be indexed to optimise query performance. Additionally, indexes improve the performance of queries against a subset of data. However, indexes use more disc space. In comparison, a columnar storage engine data set does not require indexing because it replies faster to queries on a subset of data by default.
- Block-oriented and vectorised processing. Column-stores achieve significantly higher cache utilisation and CPU efficiency by sending cache-line-sized blocks of tuples between operators and working on several values concurrently, rather than utilising a standard tuple-at-a-time iterator. Vectorised CPU instructions for choices, expressions, and other sorts of arithmetic on these blocks of numbers can enhance performance even further.
- Columnar storage is better suited for analytical queries—for example, online analytical processing or OLAP—than for transactional queries, for example, online transaction processing or OLTP. With columnar storage, analytical queries produce a response in real-time. MariaDB’s ColumnStore would be an excellent solution for sophisticated joins and aggregations in big data analytic queries.
Drawbacks of Column Oriented Database
The fundamental downside of a column-oriented technique is the inefficiency with which an entire row can be altered (lookup, update, or deleted). This circumstance, however, should occur infrequently in analytics databases, as the majority of activities are read-only and writes are limited to appends. Consider other disadvantages.
- If most of the queries are in the format of transactional application, the columnar database might not be the suitable one.
- User queries against only a few rows cannot give you any benefits in columnar databases.
- When you need to see numerous fields from each row, though, columnar data isn’t appropriate. Traditional row databases are suitable for queries that just require user-specific values.
- Each column must be created one at a time, columnar databases can take longer to write new data.
Use Cases of Column Oriented Database
Due to the ability to handle and aggregating a high amount of data, this type of database can offer advantages to specific systems and use cases including:
- Data Warehouses and Business Intelligence
- Customer Relationship Management (CRM)
- Library Card Catalogs
- Ad hoc query systems
Columnar databases are being hailed as Business Intelligence (BI) future due to their capacity to conduct sophisticated queries in seconds rather than hours.
Columnar databases are particularly useful in data warehouses, where businesses transfer enormous volumes of data from various sources for business intelligence analysis. Column-oriented databases improve query performance due to the column architecture, which compresses data and reduces seek time.
Examples of Column-Oriented Databases
Amazon Redshift
Amazon Redshift is Amazon’s column-oriented database service. It’s a smart choice if your deployment is already hosted on AWS and you use other AWS services. One of the most critical criteria for Redshift is its capacity to manage analytic workloads on large data sets stored using the column-oriented DBMS approach.
PostgreSQL
PostgreSQL, the well-known row-based database, also supports column stores. This is especially advantageous if your staff is already familiar with and utilises PostgreSQL in production contexts.
ClickHouse
Yandex’s ClickHouse is an open-source database with a DBaaS component. It exists solely for the purpose of column-oriented storage. It is column-oriented and enables the real-time generation of analytical reports via SQL queries.
MariaDB
MariaDB is another widely used open-source database management system that supports both row and column storage. Its advantage over PostgreSQL is that it is compatible with the extremely popular MySQL. This enables you to take advantage of a robust tools ecosystem.
Snowflake
Snowflake provides the Data Cloud – a worldwide network through which thousands of companies can mobilise data at near-limitless scale, parallelism, and performance. Within the Data Cloud, enterprises unify their siloed data, make controlled data easily discoverable and securely shareable, and run different analytic workloads.
HBase
Hbase is a scalable, distributed database that enables the storage of structured data in big tables. HBase is utilised when random, real-time read/writes access to large amounts of data is required.
Let us Recap
Column stores are relational databases that store data on a column-by-column basis, rather than on a row-by-row basis. Whereas a standard row-based store stores all of the characteristics for a single row together, followed by the attributes for the next row, and so on, a column-based store uses a single logical file for each attribute.
Before we conclude, it’s worth noting that column-store databases are not always NoSQL-only. Frequently, the claim is made that column-store databases are so unlike to relational database models that they belong solidly in the NoSQL realm.
While column-store databases are enticing, they do have their own set of complications. For instance, the way it writes data results in a certain lack of consistency, as columns require repeated disc pushes. This is in contrast to relational databases, which store row data sequentially.