Modern Enterprise Data Architecture – Data Lake or Data Warehouse?
Author: Joseph Bojilov, Data Architect
Published: 5th April 2022
In today’s world, data is the catalyst for all decision making. In previous years, companies needed Data Analysts to read and interpret the data they collected. Technology has since moved beyond that: Business Intelligence solutions now transform data and present it in a format that most employees can understand.
When we talk about Big Data, we usually refer to a Hadoop ecosystem (a set of related technology systems). Hadoop uses programming models such as MapReduce to process large data sets across clusters of computers, and it can be scaled from a single server to hundreds of machines. One limitation of Hadoop is that it does not fit neatly with cloud services; it is better suited to physical hardware than to the cloud.
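To make the idea of a programming model for cluster processing more concrete, here is a minimal sketch of the MapReduce pattern that Hadoop is known for. It runs in memory on a single machine and does not use Hadoop's own API; the function names and sample documents are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a real cluster each document (or block of data) could be mapped on a
    # different machine; here everything runs sequentially in one process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input.
documents = ["big data big clusters", "data lakes store raw data"]
counts = word_count(documents)
print(counts)
```

Because the map and reduce steps work on independent pieces of data, the same logic can in principle be spread across many machines, which is what allows this style of processing to scale out.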
For data to be transformed into understandable information it first needs to be stored somewhere accessible. The two data solutions that are commonly used for data management are Data Warehouses and Data Lakes, each solution having its own benefits.
Example of Contemporary Enterprise Data Architecture
Big Data is stored in Corporate Data Lakes, a logical construct in which data can be stored and further manipulated. A Data Lake is a vast pool of raw data that has not been collected for a predefined purpose.
An Enterprise Data Warehouse (EDW) is a centralised repository in which organisations store data from business systems and other sources in a structured format. It holds the subset of organisational data that is used for reporting, stores structured data only, and is not as flexible as a Corporate Data Lake. EDW data sources can include Online Transaction Processing (OLTP) databases, Customer Relationship Management (CRM) systems and Enterprise Resource Planning (ERP) databases. Because of their structured format, data warehouses are not suited to storing Big Data.
Although Enterprise Data Warehouses are less flexible than Data Lakes, and the two are sometimes seen as separate data repositories, EDW data is a subset of the data held in a Data Lake. A well-designed and well-defined Data Lake is able to support the EDW functionality.
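The relationship between the two repositories can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference design: a list of raw JSON strings stands in for the Data Lake, an in-memory SQLite database stands in for the EDW, and the record types, table name and field names are all hypothetical.

```python
import json
import sqlite3

# Hypothetical raw events as they might land in a data lake:
# mixed shapes, no schema enforced at write time.
raw_lake = [
    '{"type": "order", "order_id": 1, "customer": "acme", "amount": 120.5}',
    '{"type": "clickstream", "page": "/home", "session": "abc"}',
    '{"type": "order", "order_id": 2, "customer": "globex", "amount": 80.0}',
    '{"type": "sensor", "reading": [0.1, 0.4, 0.9]}',
]


def load_orders_into_warehouse(lake_records, connection):
    """Copy only the order records into a fixed-schema reporting table."""
    connection.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, "
        "customer TEXT NOT NULL, "
        "amount REAL NOT NULL)"
    )
    for line in lake_records:
        record = json.loads(line)  # structure is only interpreted when read
        if record.get("type") == "order":
            connection.execute(
                "INSERT INTO orders (order_id, customer, amount) VALUES (?, ?, ?)",
                (record["order_id"], record["customer"], record["amount"]),
            )
    connection.commit()


# SQLite in memory stands in for the warehouse in this sketch.
warehouse = sqlite3.connect(":memory:")
load_orders_into_warehouse(raw_lake, warehouse)

order_count = warehouse.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
total_revenue = warehouse.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(order_count, total_revenue)
```

The lake keeps every record regardless of shape, while the warehouse table holds only the structured subset needed for reporting, which is the sense in which EDW data can be treated as a subset of the lake.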
Future of Big Data
The growing use of technologies such as AI, together with the emergence of the metaverse, means that data solutions will have to be reimagined in order to store and process new types of data.
One new concept is Data Mesh, a decentralised approach to data ownership that allows users to access data without it first having to go into a Data Lake or Data Warehouse. However, for it to be usable as a data solution, it requires a lot of standardisation throughout the enterprise and beyond. While it is a good approach from a practical point of view, it will take some time before the Data Mesh concept can be implemented, particularly on an enterprise scale.
I believe that in the near future the Data Lake and the Data Warehouse will gradually merge into one concept: a Corporate Lake-Warehouse. The Corporate Lake-Warehouse will combine a data lake's flexibility to store unstructured, semi-structured and structured data with a data warehouse's structured collection of reporting data.
While the future of Data Architecture will rely on the requirements of new technology, one thing that is guaranteed is that we are heading towards an even more data driven future – and companies need to be prepared.