Data Cube
Are you familiar with the terms Data Mart, Data Warehouse, Data Lake? If not, we got you covered. Just hover over each of the terms and it will hardly take 2–3mins to get familiar with the terms.
Ever wondered how e-commerce websites store information related to its customers, the stores associated with the e-commerce platform, their location, the purchase details, etc.? Suppose if you want to retrieve the data related to a customer who purchases an item from the store located at someplace. Using traditional methods such as relational tables will make such data retrieval procedure a very vast and time-consuming process! There will be multiple numbers of joins among each table (customer table, store table, location table). To overcome this and to meet the present scenario, data scientists came with the term ‘data cube’. You all must have seen dice. It's a 3D representation of a 2D square, right?. Now in a similar manner, one may think a data cube as a 3D representation of a 2D table. Although called a “cube”, it can be 2-dimensional, 3-dimensional, or higher-dimensional.
Technically speaking, a data cube is a data abstraction to evaluate aggregated data from a variety of viewpoints. Well, I know that the previous line does sound like a rocket-science, but stay tuned, it will make sense to you in a while. Okay. Let’s talk in general terms. In tables, you store data in terms of rows and columns. When you combine or group data in multidimensional matrices, then it is called a data cube. If you ever heard of the terms “Multidimensional databases,” “Materialized views,” and “OLAP (On-Line Analytical Processing), then you are already familiar with data cube because these are alternative names of the data cube. Let’s understand the data cube with the help of a figure.
In this data cube, there are three schemas p as in price, c as in customer, and s as in sales. Taking these three schemas, it can be grouped into 8 views as shown. Here, when p, c, and s taken together, indicates pcs which is a view consisting of the aggregate function value (such as total-sales) computed by grouping three attributes price, supplier, and customer. Similarly, p indicates a view composed of the corresponding aggregate function values calculated by grouping price alone, etc.
A data cube is created from a subset of attributes in the database. Specific attributes are chosen to be measure attributes, i.e., the attributes whose values are of interest. Other attributes are selected as dimensions or functional attributes. The measure attributes are aggregated according to the dimensions. For example, a QuickCart store wants to create a sales data warehouse to keep records of the store’s sales for the dimensions: customer, time, product, branch, and location. Now, these dimensions enable the QuickCare store to keep track of things like monthly sales of items, and the branches and locations at which the items were sold. Each dimension may have a table identify with it, known as a dimensional table, which describes the dimensions. For example, a dimension table for products may contain the attributes product_name, product_subcategory, product_category, brand, and price.
Data cubes come handy when we talk about OLAP. In other words, OLAP tools are based on the multidimensional data model which is usually modeled using data cubes.