“You can have data without information, but you cannot have information without data.” - Daniel Keys Moran
A famous line from Sherlock Holmes makes the same point: one cannot make bricks without clay. The term data refers to facts or information collected through observation. At the highest level, data can be classified as qualitative (descriptive, such as a product's color) or quantitative (measurable, such as its price). More broadly, data can be divided into many categories depending on how it is used and where it comes from.
Enterprise data is centralized data shared by the users of an organization, generally across departments and/or geographic regions. Based on how it is organized, enterprise data can be classified into three categories:
- Structured data
- Semi-structured data
- Unstructured data
Structured data conforms to a data model, such as the relational model used by relational databases. Similar entities are grouped together, and every entity in a group has the same attributes. Structured data is stored in tabular form, that is, in rows and columns, and each value resides in a fixed field within a record.
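To make the idea of fixed fields concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `employee` table, its columns, and the sample rows are hypothetical. The schema is declared before any data arrives, every row fills the same columns, and a row that omits required fields is rejected.

```python
import sqlite3

# An in-memory SQLite database stands in for a relational store.
conn = sqlite3.connect(":memory:")

# The schema is fixed up front: every row has exactly these fields.
conn.execute(
    """
    CREATE TABLE employee (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        department TEXT NOT NULL,
        salary     REAL NOT NULL
    )
    """
)

# Hypothetical sample rows; each supplies a value for every column.
rows = [
    (1, "Asha", "Finance", 72000.0),
    (2, "Ben", "Sales", 58000.0),
    (3, "Chen", "Finance", 81000.0),
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", rows)
conn.commit()

# A row that leaves out required fields violates the schema and is rejected.
rejected = False
try:
    conn.execute("INSERT INTO employee (id, name) VALUES (4, 'Dana')")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

# Because every row has the same columns, querying by field is direct.
finance = conn.execute(
    "SELECT name FROM employee WHERE department = ? ORDER BY id",
    ("Finance",),
).fetchall()
print(finance)  # [('Asha',), ('Chen',)]
```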
Semi-structured data does not conform to a rigid data model such as the relational model. Because its structure is not fixed, it cannot be stored directly in rows and columns. With some processing it can be loaded into a relational database, but for some semi-structured data this is very difficult. Its structure consists of tags and elements, where the tags separate semantic elements; XML and JSON are common examples. Beyond these tags, semi-structured data usually lacks the formal schema metadata that structured data has.
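As a rough illustration, the sketch below parses a small, hypothetical XML document with Python's standard `xml.etree.ElementTree` module. Each tag names the value it wraps, yet the three `customer` records carry different sets of elements, including a repeated `phone` tag and a nested `address`. Flattening them into fixed columns, as a relational table would require, forces ad hoc choices; the "keep the first value" rule used here is one arbitrary option for this sketch, not a standard technique.

```python
import xml.etree.ElementTree as ET

# Hypothetical customer records. Tags label each value, but the records
# do not share one fixed set of fields.
doc = """
<customers>
  <customer id="1"><name>Asha</name><email>asha@example.com</email></customer>
  <customer id="2"><name>Ben</name><phone>555-0100</phone><phone>555-0101</phone></customer>
  <customer id="3"><name>Chen</name><address><city>Pune</city></address></customer>
</customers>
"""

root = ET.fromstring(doc.strip())

# Each record describes itself through its tags, whatever shape it has.
for customer in root.findall("customer"):
    print(customer.get("id"), [child.tag for child in customer])

# Forcing the records into fixed columns: take the union of all tags...
all_tags = sorted({child.tag for customer in root for child in customer})

# ...then build one row per record, leaving missing fields as None.
rows = []
for customer in root.findall("customer"):
    row = {"id": customer.get("id"), **{tag: None for tag in all_tags}}
    for child in customer:
        # Arbitrary rule for this sketch: keep the first value per tag.
        # Nested elements such as <address> have no text of their own,
        # so their contents are silently dropped.
        if row[child.tag] is None:
            row[child.tag] = child.text
    rows.append(row)

print(all_tags)  # ['address', 'email', 'name', 'phone']
```

Note that Chen's city is lost and Ben's second phone number is discarded: both are consequences of squeezing flexible, tagged records into one fixed set of columns.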
Unstructured data is not in any particular format or sequence; its structure may differ from record to record. It is neither organized in a predefined manner nor described by a predefined data model. Because it follows no fixed rules or schema, unstructured data cannot be meaningfully broken into the rows and columns of a database table. Examples include free-form text, images, audio, and video.
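Getting anything field-like out of unstructured data usually means processing it first. The sketch below, assuming a few hypothetical free-text notes, uses a regular expression from Python's `re` module to pull out ISO-style dates (`YYYY-MM-DD`). It is a heuristic, not a general solution: dates written as "Friday" or "next week" are missed, and the pattern does not check that a match is a valid calendar date.

```python
import re

# Hypothetical free-text notes: no fields, no fixed order, no shared layout.
notes = [
    "Call with Asha on 2024-03-05; she wants the Q1 report by Friday.",
    "Ben emailed: invoice 4471 is overdue. Follow up next week.",
    "Site visit photos uploaded. Nothing scheduled yet.",
]

# Heuristic: treat anything shaped like YYYY-MM-DD as a date.
# This does not validate months or days.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Scan each note for date-shaped text, wherever it happens to appear.
extracted = [DATE_PATTERN.findall(note) for note in notes]

for note, dates in zip(notes, extracted):
    print(dates or "no date found", "|", note)

print(extracted)  # [['2024-03-05'], [], []]
```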
Further, enterprise data can be categorized by how it is processed into two types: OLAP (Online Analytical Processing) and OLTP (Online Transaction Processing).