Various sources have estimated that the world's data would amount to around 40 trillion gigabytes
(40 zettabytes) by the end of 2020. What does that mean? It means that we need substantial resources
(computers, networks, storage) to keep it. It is also quite likely that we have even more data than that.
When we speak of such loads of data, we have to think of the problems that generating, managing and
storing it will create, and handle them appropriately. Data comes from different sources: the internet,
social media, internal servers. It also comes in various formats, structured or unstructured.
The data warehouse was meant to store structured data, with key security features, integrated into
the ERP system. But what do we do when it comes to larger volumes that the warehouse cannot
handle or deliver with speed? Then we use the “agile” method of “data lakes”. A data lake is a system
where you can store data in its raw format. The idea is to then produce data for reporting, analysis and
machine learning. All kinds of data can be stored: CSV rows and columns, e-mails, PDFs, documents and so on.
Data lakes were introduced with a few key priorities in mind: they store data from any source and in
any format, the data can be easily analyzed by data analysts, and they reduce costs because they increase
the volume of data that can be stored on one platform.
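To make "store data in its raw format" concrete, here is a minimal sketch in Python. It assumes the
lake is simply a folder on disk; the function name ingest_raw, the raw/<source>/ layout and the
catalog.jsonl file are illustrative assumptions, not a standard or a specific product's API.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def ingest_raw(source_file: Path, lake_root: Path, source_name: str) -> dict:
    """Copy a file into the lake unchanged and record basic metadata about it."""
    # Hypothetical layout: <lake_root>/raw/<source_name>/<file name>
    target_dir = lake_root / "raw" / source_name
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source_file.name

    # Copy the file as-is: no parsing, no schema, any format is accepted.
    shutil.copy2(source_file, target)

    entry = {
        "path": target.relative_to(lake_root).as_posix(),
        "source": source_name,
        "format": source_file.suffix.lstrip(".").lower() or "unknown",
        "size_bytes": target.stat().st_size,
        "sha256": hashlib.sha256(target.read_bytes()).hexdigest(),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

    # Append one JSON line per file so the contents of the lake can be found later.
    with open(lake_root / "catalog.jsonl", "a", encoding="utf-8") as catalog:
        catalog.write(json.dumps(entry) + "\n")
    return entry
```

The same call works for a CSV export, a saved e-mail or a PDF, because nothing is interpreted at
ingestion time. Structure is applied later, when the data is read for reporting or analysis.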
You should also think of the disadvantages that data lakes come with. If everyone in a company creates
their own data lake, the result is multiple data lakes. We know that every department has its own sources
and specifications, but it turns out that this would result in slowing down the ERP. You need to know how
to use data lakes properly. Do not put all your data in data lakes before you develop a strategy to use them.
What is the importance of the agile approach to data lakes? First of all, it will help you design the data
lakes. A good way is to build them step by step, according to the business needs, and adapt them
to the regulations and company rules. The agile approach can identify problems at an early stage, but if it
is not planned well it will result in bad performance. Do not work from unclear requirements, and do not
overestimate the quality of the source. If done properly, the agile approach helps you find any issues and
fix them easily.
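One way to avoid overestimating the quality of a source is to profile it in each iteration before it
goes into the lake. The sketch below is only an illustration for CSV sources; the function name
profile_csv and the idea of a required-columns list are assumptions, not part of any particular tool.

```python
import csv
import io


def profile_csv(text: str, required_columns: list[str]) -> dict:
    """Return simple quality findings for a CSV source given as text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []

    # Columns the business requirement asks for but the source does not provide.
    missing_columns = [c for c in required_columns if c not in header]

    rows = 0
    empty_cells = {column: 0 for column in header}
    for row in reader:
        rows += 1
        for column in header:
            value = row.get(column)
            # Short rows yield None for the missing fields; count them as empty too.
            if value is None or value.strip() == "":
                empty_cells[column] += 1

    return {
        "rows": rows,
        "missing_columns": missing_columns,
        # Report only the columns that actually have empty cells.
        "empty_cells": {c: n for c, n in empty_cells.items() if n},
    }


sample = "id,amount,customer\n1,9.50,acme\n2,,\n3,4.00\n"
report = profile_csv(sample, required_columns=["id", "amount", "region"])
```

A report like this, produced early and for every new source, turns "unclear requirements" and
"bad quality" into concrete findings that can be fixed before the next step is built.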