
53. Databricks | PySpark | Delta Lake: Solution Architecture

Raja's Data Engineering
26K subscribers
18K views

Azure Databricks Learning: Delta Lake Solution Architecture
==================================================
What is the recommended solution architecture for Delta Lake?
This video covers the Delta Lake solution architecture recommended by Databricks.
#DeltaLakeSolutionArchitecture, #DeltaLakeBronze, #DeltaLakeSilver, #DeltaLakeGold, #DeltaSolutionDesign, #DeltaLakeDataWareHouse, #DeltalakeIntro, #IntroductionToDeltaLake, #Deltalake, #DeltaTable, #DatabricksDelta, #DeltaTableCreate, #DatawarehouseVsDataLakevsDeltaLake, #PysparkDeltaLake, #DeltalakevsDatalake, #SQLDeltaTable, #DataframeDeltaTable, #DeltaFormat, #DatabricksRealtime, #SparkRealTime, #DatabricksInterviewQuestion, #DatabricksInterview, #SparkInterviewQuestion, #SparkInterview, #PysparkInterviewQuestion, #PysparkInterview, #BigdataInterviewQuestion, #BigDataInterview, #PysparkPerformanceTuning, #PysparkPerformanceOptimization, #PysparkPerformance, #PysparkOptimization, #PysparkTuning, #DatabricksTutorial, #AzureDatabricks, #Databricks, #Pyspark, #Spark, #AzureADF, #LearnPyspark, #LearnDataBRicks, #notebook, #Databricksforbeginners
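The recommended layering (Bronze → Silver → Gold, also known as the medallion architecture) can be sketched in plain Python to show what each layer is responsible for. This is only an illustration under stated assumptions: the order records and field names are hypothetical, and in Databricks each step would typically be a PySpark job, with Silver and Gold persisted as Delta tables.

```python
import json
from collections import defaultdict

# Hypothetical raw events as they might land in the Bronze layer:
# unparsed strings kept in their native format, duplicates and bad rows included.
BRONZE_RAW = [
    '{"order_id": 1, "country": "IN", "amount": "120.50"}',
    '{"order_id": 2, "country": "US", "amount": "80.00"}',
    '{"order_id": 2, "country": "US", "amount": "80.00"}',  # duplicate delivery
    '{"order_id": 3, "country": "IN", "amount": "not-a-number"}',  # bad value
    "this line is not valid JSON",
]


def to_silver(raw_lines):
    """Bronze -> Silver: parse, enforce types, drop bad rows, deduplicate on order_id."""
    silver = {}
    for line in raw_lines:
        try:
            rec = json.loads(line)
            row = {
                "order_id": int(rec["order_id"]),
                "country": str(rec["country"]),
                "amount": float(rec["amount"]),
            }
        except (ValueError, KeyError, TypeError):
            continue  # a real pipeline would usually quarantine these rows instead
        silver[row["order_id"]] = row  # last record wins per key
    return list(silver.values())


def to_gold(silver_rows):
    """Silver -> Gold: business-level aggregate, here revenue per country."""
    revenue = defaultdict(float)
    for row in silver_rows:
        revenue[row["country"]] += row["amount"]
    return dict(revenue)


silver = to_silver(BRONZE_RAW)
gold = to_gold(silver)
print(len(silver), gold)  # 2 {'IN': 120.5, 'US': 80.0}
```

The point of the split is that Bronze never loses source data, Silver is the cleaned and conformed copy, and Gold holds only what reports and dashboards consume.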

Published: 1 Oct 2024

Comments: 41
@karanbhanushali3673 · 1 year ago
Hi Raja, this playlist is one of the best structured resources for Azure available on the internet. Thanks for such amazing content, and keep up the good work.
@rajasdataengineering7585 · 1 year ago
Thank you Karan👍🏻
@kanstantsinhulevich4313 · 10 months ago
This is the "medallion architecture". I was asked in an interview: "What is medallion architecture?" Hope this helps someone.
@rajasdataengineering7585 · 10 months ago
Yes, this is the medallion architecture. Delta Lake solutions are based on the medallion architecture. Thanks for sharing your interview experience.
@giovanaclaro3827 · 8 months ago
Hi Raja, first I'd like to say this is the best content for learning Delta Lake architecture with PySpark! Thank you so much for that! I have one question. You said the Silver and Gold layers are built on Delta Lake. Does that mean that if I'm using, for example, S3 as the source for my Bronze-layer data, I wouldn't store Silver and Gold on top of S3?
@rajasdataengineering7585 · 8 months ago
Thanks for your comment! No, the Silver and Gold layers also sit on top of the S3 bucket. The Bronze layer isn't in Delta format; it is stored in the raw native format.
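As a rough illustration of the layout described in this reply, the sketch below maps each layer to a prefix in one bucket and picks the file format per layer. The bucket name, the `bronze/silver/gold` prefix convention and the helper `layer_location` are hypothetical; they only encode the rule stated above.

```python
from dataclasses import dataclass

# Hypothetical bucket name used for illustration only.
BUCKET = "s3://my-lakehouse-bucket"
LAYERS = ("bronze", "silver", "gold")


@dataclass(frozen=True)
class LayerLocation:
    layer: str
    path: str
    file_format: str


def layer_location(layer: str, dataset: str, source_format: str = "json") -> LayerLocation:
    """Map a medallion layer and dataset name to a storage path and file format.

    Encodes the reply above: all layers live in the same S3 bucket, Bronze keeps
    the source's native format, and Silver/Gold are stored as Delta.
    """
    layer = layer.lower()
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    file_format = source_format if layer == "bronze" else "delta"
    return LayerLocation(layer, f"{BUCKET}/{layer}/{dataset}", file_format)


for name in LAYERS:
    print(layer_location(name, "orders", source_format="csv"))
```

On Databricks, the Silver and Gold paths would typically be written with PySpark's `df.write.format("delta").save(path)`, while Bronze files stay as the source delivered them.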
@Amarjeet-fb3lk · 1 year ago
Hi, how can we implement this solution? Could you make a video on how to build this architecture on Azure and how data moves through these layers?
@rajasdataengineering7585 · 1 year ago
Hi Amarjeet, sure, I will create a video on this.
@gk4u444 · 1 year ago
Thanks for all your videos. I have a question: what is the Database Tables tab under Data used for? Is it only for storing data when the data size is small? Please elaborate.
@rajasdataengineering7585 · 1 year ago
It lists the Delta tables created in a particular database. The actual data is stored in a file storage system such as DBFS, ADLS, HDFS, etc.
@gravenguan · 1 year ago
Where will the data model tables (fact and dimension tables) reside? Silver or Gold?
@rajasdataengineering7585 · 1 year ago
Definitely in the Gold layer. Depending on the use case, they might be in both Silver and Gold.
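To illustrate the fact/dimension split that this reply places in the Gold layer, here is a toy star schema in plain Python. The sales fact table, product dimension and column names are hypothetical; in Databricks these would typically be Delta tables and the lookup a SQL or PySpark join.

```python
from collections import defaultdict

# Hypothetical dimension table, keyed by its surrogate key.
dim_product = {
    10: {"product_name": "Laptop", "category": "Electronics"},
    20: {"product_name": "Desk", "category": "Furniture"},
    30: {"product_name": "Monitor", "category": "Electronics"},
}

# Hypothetical fact table: one row per sale, holding measures plus a foreign key.
fact_sales = [
    {"sale_id": 1, "product_id": 10, "quantity": 2, "amount": 2000.0},
    {"sale_id": 2, "product_id": 20, "quantity": 1, "amount": 300.0},
    {"sale_id": 3, "product_id": 30, "quantity": 3, "amount": 600.0},
]


def sales_by_category(facts, dim):
    """Join each fact row to its dimension row and aggregate by a dimension attribute."""
    totals = defaultdict(float)
    for row in facts:
        category = dim[row["product_id"]]["category"]  # foreign-key lookup into the dimension
        totals[category] += row["amount"]
    return dict(totals)


print(sales_by_category(fact_sales, dim_product))  # {'Electronics': 2600.0, 'Furniture': 300.0}
```

Keeping descriptive attributes in the dimension and only keys and measures in the fact table is what makes such aggregates cheap for BI tools reading the Gold layer.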
@pankajagrawal1163 · 9 months ago
Well-explained playlist, highly recommended. Thanks Raja
@rajasdataengineering7585 · 9 months ago
Thanks Pankaj! Glad you liked it
@Umerkhange · 1 year ago
Is Delta Lake good for large datasets, e.g. 2 TB? Let's say we receive 500 GB each day and the underlying table keeps being appended to. Should we partition it and process it day by day?
@SidharthanPV · 2 years ago
Hi Raja, another great video, thank you! I have 2 questions; please respond when you get time.
1. You mentioned that Delta Lake performs better than a data lake. But shouldn't reading a Delta table take more time because of the number of metadata files?
2. How can we use Delta Lake as a backend for dashboards?
@rajasdataengineering7585 · 2 years ago
Hi Sidharthan, thank you.
1. Delta Lake performs better than a data lake precisely because of its metadata files. Using the metadata, it can avoid scanning every data file; this is called data skipping. Delta Lake records statistics such as the min and max values of each column for every data file, and with these statistics it can skip files and scan only the ones that can contain matching rows. A plain data lake has no such metadata, so the processing engine has to scan all the files to perform a task.
2. BI connectors are available for connecting to the Lakehouse (Delta Lake). For example, the following documentation covers the Power BI connector for Databricks Delta Lake: powerbi.microsoft.com/en-us/blog/announcing-power-bi-integration-with-databricks-partner-connect/
Hope it helps.
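The data-skipping idea from point 1 can be illustrated with a toy model in plain Python: each data file carries the min and max of a column, and a range filter only needs to open files whose range overlaps the predicate. The file names and statistics are hypothetical; the real Delta Lake engine stores per-file statistics in its transaction log and applies this pruning automatically.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Per-file min/max for one column, similar in spirit to Delta's per-file stats."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(stats, lower, upper):
    """Keep only files whose [min, max] range can overlap lower <= col <= upper."""
    return [
        s.path
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


# Hypothetical statistics for an `order_id` column across four data files.
table_stats = [
    FileStats("part-0000.parquet", 1, 1000),
    FileStats("part-0001.parquet", 1001, 2000),
    FileStats("part-0002.parquet", 2001, 3000),
    FileStats("part-0003.parquet", 3001, 4000),
]

# Equivalent of: WHERE order_id BETWEEN 1500 AND 2100
selected = files_to_scan(table_stats, 1500, 2100)
print(selected)  # ['part-0001.parquet', 'part-0002.parquet']
```

Without the statistics, an engine would have to open all four files; with them, two are skipped before any data is read. This is also why the extra metadata files rarely slow queries down: reading a small log is far cheaper than scanning data files that cannot match.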
@SidharthanPV · 2 years ago
@rajasdataengineering7585 Thank you!
@sailajab8197 · 1 day ago
Thanks Raja!!
@samridhisamridhi6246 · 2 years ago
Thanks for such an amazing playlist. It helped me a lot with my interviews. 😊
@rajasdataengineering7585 · 2 years ago
Thanks!
@eswarrao4319 · 2 years ago
Nice. Sqoop does not work on Databricks, so what is the best way to load incremental data daily from an RDBMS into Databricks?
@manjunathbn9513 · 1 year ago
Please do a project on Delta Lake. Thanks
@rajasdataengineering7585 · 1 year ago
Sure, I will create a project on this.
@kartikeshsaurkar4353 · 1 year ago
This channel is going to become the most famous YouTube channel on Databricks.
@rajasdataengineering7585 · 1 year ago
Thank you Kartikesh 👍🏻
@chiragshah9106 · 9 months ago
Excellent explanation. The No. 1 channel to learn Databricks.
@rajasdataengineering7585 · 9 months ago
Glad you think so! Thank you
@Umerkhange · 1 year ago
It would be really great if you showed a practical demo of this.
@rajasdataengineering7585 · 1 year ago
Sure, will do.
@joyo2122 · 2 years ago
Instant like 👍👍
@rajasdataengineering7585 · 2 years ago
Thank you Joy
@anjalishar1829 · 2 years ago
Can you recommend a course or any resource to learn PySpark or Azure?
@rajasdataengineering7585 · 2 years ago
Spark: The Definitive Guide can be very useful.
@anjalishar1829 · 2 years ago
Is there a way to connect with you?
@rajasdataengineering7585 · 2 years ago
Please contact me at audaciousazure@gmail.com
@sravankumar1767 · 2 years ago
Nice explanation Raja 👌 👍
@rajasdataengineering7585 · 2 years ago
Thank you Sravan!
@tanushreenagar3116 · 1 year ago
Superb content
@rajasdataengineering7585 · 1 year ago
Glad you like it!