Azure Synapse Analytics - Polaris Whitepaper Deep-Dive

The team behind the brand new Polaris engine have submitted a white paper to the VLDB conference, an academic conference focusing on data management technology. It's a great insight into how the new Serverless SQL Pools service within Azure Synapse works and what happens after you submit your query.
In this video, Simon walks through the paper, pulling out some of the key concepts and comparing it to what we see regularly in the world of Spark & other big data querying engines.
The paper itself is available here: www.vldb.org/pvldb/vol13/p320...
As always, don't forget to like & subscribe, and get in touch if you need the Advancing Analytics touch in achieving your data ambitions.

Published: 6 Aug 2024

Comments: 16

@vt1454 3 years ago

This is a great behind-the-scenes view of the Synapse on-demand/serverless flavor.

@briangao5081 3 years ago

Thanks for this video, it was super informative!

@dmitryanoshin8004 3 years ago

Cool overview, thank you! I am doing a migration from Databricks to Synapse :)

@rodeldagumampan8858 3 years ago

This is quite an interesting development, coming at the time Databricks releases SQL Analytics, which IMO provides an easy-to-digest SQL query layer on top of a large collection of file-based data. Isn't this a direct alternative to Databricks SQL Analytics? Synapse Polaris uses a SQL pool, and Databricks SQL Analytics uses a Spark pool (via a SQL endpoint)?

@AndreasBergstedt 3 years ago

LoL, watch out for the confirmation bias (I know I am :)). Good analysis, Simon, and I would love to have a conversation about your conclusions at some point :)

@AdvancingAnalytics 3 years ago

Oh I'm very aware - it's super hard to read a very high-level overview without immediately leaping into "oh, that's from x" and "well I've seen that before". I'm VERY upfront that my views are heavily Spark/Microsoft biased, and there's plenty out in the world that I've not seen so can't comment on! Just waiting for the Polaris team to hunt me down for saying "it's basically spark but using SQL Server..." Simon

@AndreasBergstedt 3 years ago

@AdvancingAnalytics LoL, I'll make sure that I speak to PG so they have your details. I know the difficulties, and we all have our experiences that guide us. I guess only time will tell what happens in cloud data land.

@Felix-kp4we 3 years ago

@AndreasBergstedt and I'm sure it will be amazing!

@Mim_BI 3 years ago

No cache for SQL Serverless yet; for me it is still just a nice demo.

@vis7681 3 years ago

Just on your conclusion: you are comparing with the bare-metal Spark-based big data tools, but I think it is worthwhile to compare with similar competitors like Amazon Redshift, BigQuery, Snowflake...

@AdvancingAnalytics 3 years ago

I completely agree - it's worth making those comparisons, but I'm not the person to do it. I work with clients in Azure every day and know the products to a deep level, I don't have anywhere near that exposure to AWS, GCP, Snowflake etc. I could generalise & compare the basic approach, but my expertise is definitely in Azure & Spark :) Simon

@Mim_BI 3 years ago

Actually, what they are trying to do, as you suggested, is not new or revolutionary for that matter. In a simple sentence, they are trying to recreate BigQuery on Azure :)

@AdvancingAnalytics 3 years ago

In a way, yes - and that's no small thing. I'm no GCP expert, but everything I've seen indicates it's tricky to build a coherent, well-integrated architecture with BigQuery & other "enterprise" style data stores. Having a BigQuery-esque tool that's fully baked into the Azure stack and integrates well with the Azure Data Platform story is an awesome advancement. The thing that gets me is the big push on using the SQL engine, I'd want to dig into some more advanced query patterns where you'll see the benefits of the SQL query optimiser then see how it performs against Photon etc.

@Mim_BI 3 years ago

@AdvancingAnalytics thanks for your reply. I have a question about the cache in Databricks SQL Analytics: let's say you connect Power BI to Databricks. How does the service know whether the result of a query has already been computed? Does the cluster need to be running, or is there a service that decides whether to start the cluster for new queries or use the existing result cache? That's literally the key functionality of SnowflakeDB.