Amazon Redshift: Understanding the Sort Keys

Coffing Data Warehousing

6K views

Published: Oct 5, 2024

Comments: 33

@prannoyroy5312 2 years ago

Great explanation! The set and background also seem to be a parody of newsrooms.

@sampaar 4 years ago

Finally, one video which has cleared my concept on interleaved sort keys. Thank you.

@gaukaugaukau 3 years ago

Awesome, sir. You made it very easy for everyone to understand. Keep up the good work.

@Coffingdw 3 years ago

Gaurav, Thank you so much!!!

@terpinumcp 6 years ago

Amazing explanation!!! I was losing my mind trying to visualize how the various sort keys in Redshift work, and you made it soooo simple. Appreciate your help!

@R9992009 4 years ago

More than anything, I love your style. Great work! Very easy to understand. Thank you.

@Coffingdw 4 years ago

RAHIL, Thank you so much!!!

@kasiviswanathan5431 5 years ago

While single and compound keys are no-brainers, I had to go to multiple websites to learn about the interleaved sort key, and none of them offered a clear explanation. You ended my quest here! Very nicely done.

@Coffingdw 5 years ago

Kasi, Thank you. I, unfortunately, had to hear about 20 explanations myself before I went, "ooooh, that's how it works". I love helping intelligent people who already have great experience take the next step towards "Total Guru". You are almost there!!! Great job.

@chrisguevara 4 years ago

Simple, to the point, and very clear. A quarter of the video is just a review of the topic discussed (good practice, but it's a video, so it can be replayed). GREAT VIDEO!!!

@ericliu1114 3 months ago

Thanks a lot. I finally understand interleaved sort keys.

@hamidmahmood345 7 years ago

Brilliant. This is a simple way to teach complex things :-)

@Coffingdw 7 years ago

Hamid, Thank you soooooo much for the wonderful comment.

@Coffingdw 7 years ago

Hamid, Thank you very much for your wonderful comment. It is for people like you that I do these videos. Thanks again!

@aliaksandrsiamenau9922 1 year ago

Thanks! Good visualization and explanation!

@venkateshanganesan2606 4 years ago

Nicely explained, and easy to understand for a newbie like me. Also, if possible, can you share any system table where we can see the exact blocks (block1..blockN) you showed in your video, so that we can get more familiar with this while practicing? Thanks again for this video.

@byong-wuchong7851 2 years ago

Didn't know Will Ferrell was so well versed in Amazon Redshift.

@v8m3tal 6 years ago

Good video. Very simple and clear.

@freudba1578 6 years ago

Neat explanation, better than the million garbage videos out there.

@Coffingdw 6 years ago

Freud Ba, Thanks for your nice reply!!!

@CodeVeda 1 year ago

For a moment, I thought it was a news channel.

@spreetham1989 6 years ago

Good one. I have a question here. Of the two options below, which is more efficient: accessing one block by filtering on the distkey, or having even distribution and accessing multiple blocks? Some people claim that accessing many blocks allows parallel processing, which means better utilization of all the nodes.

@Coffingdw 6 years ago

Preetham, There are two philosophies for querying big data well. You want to use the Distkey when going after a single record (row). This only accesses one block and a single slice. When you need to analyze many rows (thousands to millions), you want the parallel processes to each do an equal amount of the work. You want the Distkey for one or two rows, and you want parallel processing for large queries.
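
Editor's note: the trade-off in this reply can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Redshift's actual behavior: it assumes a cluster of four slices and uses `cust_id % NUM_SLICES` as a stand-in for the real hash on the distribution key. The names `NUM_SLICES`, `slice_for`, `lookup`, and `total_amount` are hypothetical and exist only for this illustration.

```python
# Toy model of KEY distribution. Everything here is illustrative:
# Redshift uses its own hash function, not a plain modulo.
NUM_SLICES = 4  # assumed number of slices for this example


def slice_for(cust_id: int) -> int:
    """Stand-in for the hash that maps a distribution-key value to a slice."""
    return cust_id % NUM_SLICES


# 10,000 rows with distinct customer ids as the distribution key.
rows = [{"cust_id": i, "amount": i % 100} for i in range(10_000)]

# Place every row on the slice chosen by its distribution key.
slices = [[] for _ in range(NUM_SLICES)]
for row in rows:
    slices[slice_for(row["cust_id"])].append(row)


def lookup(cust_id: int):
    """Single-row query: only the slice that owns the key is read."""
    target = slice_for(cust_id)
    return target, [r for r in slices[target] if r["cust_id"] == cust_id]


def total_amount():
    """Large query: each slice computes a partial sum (sequentially here,
    in parallel on a real cluster), then the partials are combined."""
    partials = [sum(r["amount"] for r in s) for s in slices]
    return partials, sum(partials)


touched_slice, found = lookup(4242)
partials, total = total_amount()
rows_per_slice = [len(s) for s in slices]
```

In this model the lookup reads one slice, while the aggregate splits the work into four equal shares of 2,500 rows each, which is the "equal amount of the work" the reply describes.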

@spreetham1989 6 years ago

Thanks for the response. Can you please make some videos on understanding EMR and identifying the problems that are good candidates for EMR?

@spreetham1989 6 years ago

Thanks. Can you make a video on whether the sort key column should be compressed or not? If no, why? If yes, what are the benefits and the recommended compression technique?

@Hinkakan 4 years ago

I feel that you are kind of skipping over slices here. As far as I know, with a distkey on Cust_ID in this case, each slice will contain only one cust_id, and since each slice has one compute node, you would only have to read one block per slice. So you would be reading 4 blocks, yes, but from 4 different slices, using 4 different nodes, and thus it would perform as fast as a single block read, right? Or am I wrong here? I would be very interested in knowing how interleaved sort keys work with an even/key distribution style.

@surabimanideep9203 5 years ago

All the explanation on the sort keys is simply superb. I am stuck on understanding the vacuum reindex that we do to sort the rows. Can you help me out with this?

@Coffingdw 5 years ago

Surabi, Thanks for your nice comments. Superb is good!!! What has been a difficult concept for me is that you can sort or index a table, and it sorts and indexes. But if you load more data tomorrow and the next day and so on the data is no longer sorted (Interleaved sort especially). A reindex refers to an Interleaved sort key. Here is what Amazon says... "When you initially load an empty interleaved table using COPY or CREATE TABLE AS, Amazon Redshift automatically builds the interleaved index. If you initially load an interleaved table using INSERT, you need to run VACUUM REINDEX afterwards to initialize the interleaved index."

@surabimanideep9203 5 years ago

@@Coffingdw Thanks for your timely response. What I don't understand is the value of interleaved_skew that we use to decide whether or not to reindex a table. After multiple tests, I have no clue what factors interleaved_skew depends on. Please help me with that.

@Coffingdw 5 years ago

Surabi, Imagine you have 9 months of data and the month is September. You use an Interleaved sort key on your table (9 months of data) and your data has two sort keys with equal power. The queries work great for the table because, whichever key they query on, they get great result speeds. Now, imagine it has been one year. The last three months have simply been appended to the end of the table. They are not sorted with the previous 9 months. When you do a vacuum reindex, the full year of data is sorted properly again.
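
Editor's note: the effect of appended, unsorted rows can be illustrated with a simplified Python sketch. It assumes a single sort column and per-block min/max metadata (zone maps), and it does not model the interleaved ordering itself. `BLOCK_SIZE`, `to_zone_map`, `blocks_scanned`, and the sample data are hypothetical, chosen only to mirror the "9 months sorted, 3 months appended" example above.

```python
# Simplified model: one sort column, fixed-size blocks, min/max per block.
# Real Redshift blocks are 1 MB and interleaved keys use a different ordering.
BLOCK_SIZE = 100  # rows per block, illustrative only


def to_zone_map(values):
    """Split a column into blocks and record each block's (min, max)."""
    blocks = [values[i:i + BLOCK_SIZE] for i in range(0, len(values), BLOCK_SIZE)]
    return [(min(b), max(b)) for b in blocks]


def blocks_scanned(zone_map, key):
    """Count the blocks whose min/max range could contain the key."""
    return sum(1 for lo, hi in zone_map if lo <= key <= hi)


# Initial load: 900 rows, already in sort order (the "9 months").
initial = list(range(0, 90_000, 100))

# Later loads: 300 rows appended in arrival order (the "last 3 months").
# The multiplier just scatters the values so they arrive unsorted.
appended = [(i * 7919) % 100_000 for i in range(300)]

before_resort = to_zone_map(initial + appended)        # appended tail unsorted
after_resort = to_zone_map(sorted(initial + appended))  # everything re-sorted

key = 50_000
scans_before = blocks_scanned(before_resort, key)
scans_after = blocks_scanned(after_resort, key)
```

With these inputs, each of the three appended blocks spans nearly the whole value range, so a lookup for `key` has to read all of them in addition to the one sorted block that holds it. Once the whole table is re-sorted, only a single block qualifies.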

@surabimanideep9203 5 years ago

@@Coffingdw I get that completely. I fully agree that we need to do a vacuum reindex to reorder the data once in a while. But my question is about the term interleaved_skew and what factors it depends on (the reasons it changes). You can find interleaved_skew in svv_interleaved_columns.

@Coffingdw 5 years ago

@@surabimanideep9203 You will find the interleaved_skew when one slice has a lot more data on it than other slices.

@RohanKSharma 4 years ago

The comment about slices (01:51) and distribution keys almost tripped me up. I feel that it is a little unnecessary. Apart from that, it is a nice explanation. IMO, in a discussion of SORT keys, slices and distribution keys should not play a significant role.