
Spark + Parquet In Depth: Spark Summit East talk by: Emily Curtin and Robbie Strickland 

Spark Summit
69K views

Published: 21 Aug 2024

Comments: 24
@stephaniedatabricksrivera · 3 years ago
Emily's Parkay butter pics made me laugh. Really enjoyed this. Great job Emily!!
@flwi · 6 years ago
Wow, great presentation!
@HasanAmmori · 2 years ago
Fantastic talk! I wish there was a little more info on the format spec itself.
@manjunath15 · 5 years ago
Very informative and nicely articulated.
@gmetrofun · 5 years ago
AWS S3 supports random-access reads (via the HTTP Range header), so predicate pushdown is also supported on AWS S3.
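(Illustrative aside, not from the talk: Parquet readers rely on exactly this kind of random access. A Parquet file ends with a footer, followed by a 4-byte little-endian footer length and the 4-byte magic `PAR1`, so a reader can fetch the metadata with two small ranged reads instead of scanning the whole object. A minimal stdlib-only Python sketch of that footer lookup, using a synthetic byte layout that mimics a Parquet file rather than a real one:)

```python
import io
import struct

# Parquet layout: "PAR1" [column chunks] [footer] [4-byte LE footer length] "PAR1"
# This is a synthetic stand-in with that layout, NOT a real Parquet file.
footer = b'{"schema": "fake thrift metadata"}'
fake_file = (b"PAR1" + b"row group bytes..."
             + footer + struct.pack("<I", len(footer)) + b"PAR1")

def read_footer(buf: io.BytesIO) -> bytes:
    """Read only the tail of the file, the way an S3 Range request would."""
    buf.seek(-8, io.SEEK_END)              # ranged read #1: last 8 bytes
    length, magic = struct.unpack("<I4s", buf.read(8))
    assert magic == b"PAR1", "not a Parquet-style file"
    buf.seek(-(8 + length), io.SEEK_END)   # ranged read #2: the footer itself
    return buf.read(length)

print(read_footer(io.BytesIO(fake_file)))
```

Against real S3 the two `seek`/`read` pairs would instead be `Range: bytes=-8` and `Range: bytes=<offset>-` GET requests; the access pattern is identical.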
@bnsagar90 · 3 years ago
Can you please share some text or a link where I can read more about this? Thanks.
@Tomracc · 2 years ago
This is wonderful, enjoyed it start to end :)
@maa1dz1333q2eqER · 6 years ago
Great presentation, touched a lot of important areas, thanks
@tianzhang3120 · 3 years ago
Awesome presentation!
@amitbhattacharyya5925 · 2 years ago
Good explanations. It would be great if they could also point to a Git repo with the code.
@TheAjit1111 · 5 years ago
Great talk, thank you!
@clray123 · 6 years ago
Eh, so basically any sort of growing data can only be partitioned one way (along the dimension of growth, which for many use cases is some meaningless "autoincrement" id). That defeats push-down filtering for every other dimension. Not to mention that if your data keeps growing in small increments and you need access to the latest of it, you'll have to jump through hoops to integrate all those small increments into bigger files, because scanning 20,000 tiny files isn't going to be efficient. And that means lots of constant rewriting, which is why write speed DOES matter and it's not "write-once" but write-many...
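(The small-files pain described above is usually handled with periodic compaction: a job groups many tiny increment files into batches near a target size and rewrites each batch as one larger file, which in Spark would typically be a `repartition`/`coalesce` plus rewrite. A hedged stdlib-only Python sketch of just the batching step; the names, the greedy strategy, and the target size are my own illustration, not from the talk:)

```python
TARGET_BYTES = 128 * 1024  # stand-in for a ~128 MB Parquet row-group target

def plan_compaction(sizes: dict, target: int = TARGET_BYTES) -> list:
    """Greedily pack file names into batches whose total size stays <= target."""
    batches, current, total = [], [], 0
    for name, size in sorted(sizes.items()):
        if current and total + size > target:
            batches.append(current)      # flush the full batch
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        batches.append(current)
    return batches

# 20,000 tiny "increment" files collapse into a handful of rewrite jobs.
tiny = {f"part-{i:05d}": 40 for i in range(20000)}   # 40 bytes each
plan = plan_compaction(tiny)
print(len(plan))  # far fewer output files than inputs
```

Each batch then becomes one read-and-rewrite task, so readers scan a few large files instead of thousands of tiny ones.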
@betterwithrum · 5 years ago
Where are the slides?
@bogdandubas3978 · 4 years ago
Amazing speaker!
@HughMcBrideDonegalFlyer · 7 years ago
Great talk on a very important (and too often overlooked) topic
@djibb.7876 · 7 years ago
Great talk!!! I set up a Spark cluster with 2 workers. I save a DataFrame using partitionBy("column x") in Parquet format to some path on each worker. I'm able to save it, but when I want to read it back I get these errors: "Could not read footer for file FileStatus…", "unable to specify Schema…". Any suggestions?
@pradeep422 · 6 years ago
The only thing I liked is the way Emily executed it.
@ardenjar7942 · 7 years ago
Awesome thanks!
@deenadayalmuli2756 · 6 years ago
In my experience, ORC supports nesting...
@thomasgong5538 · 4 years ago
This has real value as a learning guide.
@mikecmw8492 · 6 years ago
Why is everyone a "spark expert"?? Get real and just show us how to do it...
@betterwithrum · 5 years ago
There are Spark experts, just few and far between. I've hired a few, but they were unicorns.