
10 best practices for building data lakes | Evolution of data lakes

BigData Thoughts
Subscribers: 10K
Views: 1.9K

Published: 22 Aug 2024

Comments: 17
@arunalex1988 2 years ago
Learnt a few new dimensions related to data lakes. Thank you so much for sharing your wealth of knowledge, dear Shreya.
@BigDataThoughts 2 years ago
Thanks, Arun
@amriteshsingh2952 1 year ago
Thanks for the great content, appreciate your work.
@BigDataThoughts 1 year ago
Thanks Amritesh
@GauravSharmagvs 2 years ago
Thanks for the video. It was very informative.
@BigDataThoughts 2 years ago
Thanks, Gaurav
@Praveen_Kumar_R_CBE 2 years ago
Very useful content Shreya…
@BigDataThoughts 2 years ago
Thanks Praveen
@Learn2Share786 2 years ago
Great content, covered at the right pace. Thank you!
@BigDataThoughts 2 years ago
Thanks Farhad
@TotuBabyBird 2 years ago
@nikhilgupta110 2 years ago
Very well described from a concepts standpoint. One example bridging the concepts would have been a value addition. Please also suggest a book or articles for a deep dive. I have a few queries, and it would be really helpful if you could share some thoughts on them:
1. Let's say we have a common batch pipeline for multiple customers with different data sources (S3, MySQL, etc.), with one identifying timestamp column in YYYYMMDD format because the pipeline runs every 24 hours. If I want to convert it to real-time ELT, what should I keep in mind?
2. Extending the above question, if we are running parallel ETL batches for, say, 30 customers, what is the best optimization strategy: running the ELT jobs in parallel for each customer (best performance), or increasing the nodes that run the jobs? Which is the best way to scale with cost as a constraint?
@BigDataThoughts 2 years ago
1. Moving from batch to real time requires a complete change of architecture and technology choices. You also need to check whether it is only real-time ingestion and processing, or real-time consumption as well. This determines how you design the system.
2. Whether to orchestrate jobs in parallel, in sequence, or as a mix depends on many factors:
a. SLA
b. Whether each flow is independent or has dependencies
c. Cost
d. Consumption pattern
You may not need to run all flows in parallel on more nodes (which adds cost) if the data isn't interdependent or doesn't have the same SLAs.
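The parallel-versus-sequential trade-off in point 2 can be illustrated with a minimal Python sketch. It is not from the video: `run_elt_for_customer` is a hypothetical placeholder for one customer's job, and `max_parallel` is an assumed knob standing in for the cost constraint (how much work is allowed to run at once). Independent flows share a bounded worker pool; dependent steps run in order.

```python
from concurrent.futures import ThreadPoolExecutor


def run_elt_for_customer(customer_id: str) -> str:
    """Hypothetical placeholder for one customer's ELT job."""
    return f"{customer_id}: done"


def run_independent_flows(customer_ids, job, max_parallel):
    """Run independent per-customer jobs, at most `max_parallel` at a time.

    `max_parallel` is the assumed cost cap: raising it shortens the total
    run time but needs more concurrent capacity.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # Executor.map returns results in the same order as the inputs.
        results = list(pool.map(job, customer_ids))
    return dict(zip(customer_ids, results))


def run_dependent_flows(steps, job):
    """Run steps that depend on each other strictly in sequence."""
    return [job(step) for step in steps]


if __name__ == "__main__":
    customers = [f"customer_{i}" for i in range(30)]
    # 30 independent customers, but only 4 jobs in flight at any moment.
    print(run_independent_flows(customers, run_elt_for_customer, max_parallel=4))
```

Flows with a tight SLA could be given their own pool with a higher `max_parallel`, while the rest share a smaller one; that split is a design choice for the sketch, not something stated in the reply.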
@ranjithpals 2 years ago
Lots of great points highlighted and touched upon, as always very useful!! Thanks, Shreya
@BigDataThoughts 2 years ago
Thanks Ranjith
@kiranmudradi26 2 years ago
Awesome video. Madam, can you please make a video on how to tackle system design and which tools to use for building pipelines, both in-house and in the cloud? It would be helpful.
@BigDataThoughts 2 years ago
Thanks, Kiran, sure