
20 Data Caching in Spark 

Ease With Data
4.6K subscribers
1.7K views

Published: 22 Aug 2024

Comments: 14
@nishantsoni9330 2 months ago
One of the best in-depth explanations, thanks :) Could you please make a video on an end-to-end data engineering project, from requirement gathering to deployment?
@easewithdata 2 months ago
Thanks ❤️ Please make sure to share with your network on LinkedIn 🛜
@ComedyXRoad 1 month ago
Thanks for your efforts, it helps a lot.
@easewithdata 1 month ago
Thanks ❤️ Please make sure to share with your network over LinkedIn 🛜
@reslleygabriel 8 months ago
Excellent content in this playlist! Thanks for sharing and keep up the good work 🚀
@sureshraina321 8 months ago
Nice job! Can you please provide more details on serialized vs. deserialized data when dealing with cache/persist in upcoming lectures?
@at-cv9ky 6 months ago
As already mentioned in another comment, please make a video on serialization/deserialization of objects.
@easewithdata 6 months ago
Will definitely try.
@mohammedshoaib1769 8 months ago
Thanks, your explanation is really good. Keep making such videos. Also, if possible, make some videos on scenario-based interview questions.
@sayantabarik4252 6 months ago
I have one query: cache() is equal to persist(pyspark.StorageLevel.MEMORY_AND_DISK). The only difference in this scenario is that cache() keeps the data deserialized while persist() keeps it serialized. So, if persist() is better in terms of data serialization and functionality, what is the use case for cache() over persist()?
@easewithdata 6 months ago
You already have the answer in your question: with cache() the data is already deserialized, so there is no extra work before processing, whereas with persist() the data is serialized and has to be deserialized first.
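
A minimal PySpark sketch of the difference (the example data and app name here are illustrative, and the exact serialized/deserialized behavior of each storage level depends on the Spark version and API):

from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.appName("cache-vs-persist").getOrCreate()

df1 = spark.range(10_000_000)  # illustrative example data
df2 = spark.range(10_000_000)

# cache() is shorthand for persist() with the default storage level
df1.cache()

# persist() lets you choose a storage level explicitly
df2.persist(StorageLevel.MEMORY_AND_DISK)

# Both are lazy: the data is only materialized by an action
df1.count()
df2.count()

df1.unpersist()
df2.unpersist()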
@sayantabarik4252 6 months ago
@easewithdata Got it, thank you for the explanation! I went through all the videos in this playlist and really loved it!
@user-dv1ry5cs7e 4 months ago
Consider that you have an orders DataFrame with 25 million records. You apply a projection and a filter and cache the resulting DataFrame as shown below:

orders_df.select("order_id","order_status").filter("order_status == 'CLOSED'").cache()

Now you execute the statements below:
1) orders_df.select("order_id","order_status").filter("order_status == 'CLOSED'").count()
2) orders_df.filter("order_status == 'CLOSED'").select("order_id","order_status").count()
3) orders_df.select("order_id").filter("order_status == 'CLOSED'").count()
4) orders_df.select("order_id","order_status").filter("order_status == 'OPEN'").count()

Please answer the queries below:
Question 1) At what point in time is the data cached (partially/completely)?
Question 2) Which of these queries are served from the cache, and which have to go to disk? Please explain.
@easewithdata 3 months ago
Since you have already written the complete queries, why not just try them out and share the result with us?
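
One way to try it out (a sketch assuming the orders_df DataFrame from the question above): materialize the cache with an action, then inspect each query's physical plan with explain(). An InMemoryTableScan node in the plan means the query is served from the cache; its absence means Spark scans the source again.

# Cache the projected and filtered DataFrame, as in the question
cached_df = (orders_df
             .select("order_id", "order_status")
             .filter("order_status == 'CLOSED'")
             .cache())

# Caching is lazy: this action materializes the cached data
cached_df.count()

# Inspect the physical plan of each variant; look for InMemoryTableScan
(orders_df
 .select("order_id", "order_status")
 .filter("order_status == 'CLOSED'")
 .explain())

(orders_df
 .select("order_id", "order_status")
 .filter("order_status == 'OPEN'")
 .explain())

The Spark UI's Storage tab also shows what fraction of the partitions was actually cached.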