Various sources estimate that the world's data will reach around 150 zettabytes by the end of 2025. What does that mean? It means that we need substantial resources (computers, network capacity, storage) to keep it. It is also quite likely that there is even more data than estimated.
When we talk about data at this scale, we have to think about the problems that come with generating, managing and storing it, and handle each of them appropriately. Data comes from different sources: the internet, social media, internal servers. It also comes in various formats, structured or unstructured.
The data warehouse was meant to store structured data, with key security features, integrated with the ERP system. But what do we do with larger volumes that the warehouse cannot handle or deliver quickly enough? This is where the more “agile” approach of “data lakes” comes in. A data lake is a system where you can store data in its raw format. The idea is to then produce data from it for reporting, analysis and machine learning. All kinds of data can be stored: CSV rows and columns, e-mails, PDFs and documents, images, etc.
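As a rough illustration of what "storing data in its raw format" can look like, here is a minimal sketch in Python. It assumes the lake is just a directory tree; the `raw/<source>/<date>/` layout, the `land_raw_file` name and the `.meta.json` sidecar file are all assumptions made for this example, not a standard. The point is that the file is copied unchanged, whatever its format, and only a small description is recorded next to it.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def land_raw_file(lake_root: Path, source: str, src_path: Path) -> Path:
    """Copy a file into the raw zone unchanged and record basic metadata."""
    ingested_at = datetime.now(timezone.utc)
    # Hypothetical layout: <lake_root>/raw/<source>/<YYYY-MM-DD>/<file name>
    target_dir = lake_root / "raw" / source / ingested_at.strftime("%Y-%m-%d")
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / src_path.name
    shutil.copy2(src_path, target)  # byte-for-byte copy, no parsing or conversion

    # Sidecar file describing the raw object, so it can be found and verified later.
    metadata = {
        "source": source,
        "original_name": src_path.name,
        "size_bytes": target.stat().st_size,
        "sha256": hashlib.sha256(target.read_bytes()).hexdigest(),
        "ingested_at": ingested_at.isoformat(),
    }
    sidecar = target_dir / (target.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return target
```

The same function would accept a CSV export, a PDF or an image, because it never looks inside the file. Turning the raw files into something usable for reporting or machine learning is a separate, later step.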
Data lakes were introduced with a few key priorities in mind. They store data in any format and from any source. The data can be readily analyzed by data analysts, and costs go down because more data can be stored on a single platform.
You should also consider the disadvantages that data lakes come with. If every team in a company creates its own data lake, the company ends up with multiple data lakes. Every department has its own sources and specifications, but this proliferation ends up slowing down the ERP. You need to know how to use data lakes properly. Do not put all your data into a data lake before you have developed a strategy for using it.
Why does the agile approach matter for data lakes? First of all, it helps you design them. A good way is to build them step by step, according to business needs, and to adapt them to regulations and company rules. The agile approach can surface problems at an early stage, whereas a data lake that is not planned well will perform badly. Do not work from unclear requirements, and do not overestimate the quality of the source data. Done properly, the agile approach helps you find issues and fix them easily.
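One small, concrete way to avoid overestimating the quality of a source is to profile a sample of it before committing to the next increment. The sketch below is only an illustration under simple assumptions: the source is CSV text with a header row, "quality" is reduced to counting empty values per column, and the `profile_csv` name and the sample columns are made up for the example.

```python
import csv
import io


def profile_csv(text: str) -> dict:
    """Count rows and empty values per column in CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = row.get(name)
            # Short rows yield None; blank cells yield an empty string.
            if value is None or value.strip() == "":
                missing[name] += 1
    return {"rows": rows, "missing": missing}


# Hypothetical sample from a customer export.
sample = "customer_id,email,country\n1,a@example.com,BG\n2,,DE\n3,c@example.com,\n"
print(profile_csv(sample))
# {'rows': 3, 'missing': {'customer_id': 0, 'email': 1, 'country': 1}}
```

A check like this takes minutes to run on a sample, and it turns "the source is probably fine" into a number that can be discussed before the next step is built.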