0:00 Introduction
0:50 What we will cover in DynamoDB
1:20 Who I am, and books I wrote
3:05 What is DynamoDB
5:10 What DynamoDB was designed for
6:57 Key concepts of DynamoDB
8:41 Primary Keys in DynamoDB
9:41 Composite Primary Key
9:46 API Actions - Item-based
10:51 API Actions - Query Actions
11:37 API Actions - Scan Actions. Such Very Expensive! Wow.
12:05 Secondary Indexes
13:43 Data modeling example
15:12 Forget concepts from Relational Databases
16:30 The example
16:50 The example - ERD
17:53 The example - Identify your access patterns
19:29 The example - Design your primary keys & secondary indexes
22:27 The example - One-to-many relationships
23:00 De-normalize + document types (Maps, Lists)
28:58 One-to-many relationship patterns recap
30:00 Filtering: "Build filter in primary key"
31:42 The example - Orders in different partitions
33:21 The example - Filtering access patterns
33:34 The example - Composite sort key
37:34 The example - Sparse index
39:00 Filtering patterns recap
Hi Vikram, I was wondering if you know how Amazon can use DynamoDB for their shopping cart. When someone buys something, don't they need ACID transactions?
@@tango12341234 Amazon gets by without transactions because: 1) With the way inventory (a network of distribution centers throughout the country) and the supply chain are managed, they can meet demand almost all of the time. In the rare case an item is not available, they either offer a refund or provide a later delivery date. 2) Airline tickets, concert tickets, etc. require transactions because every seat is unique w.r.t. location/time; contrast this with a particular book title, which is the same irrespective of the quantity. As a result, the cost of not meeting an order is minimal in Amazon's (or a brick-and-mortar store's) case. P.S.: I've had a race condition only once with Amazon, and they offered a refund in that case.
For a newbie like me, Alex is a lifesaver. I've watched Rick's talks before, but many things went over my head. Alex did a wonderful job of filling in the gaps for me here. So I'd suggest that anyone who is just starting with DynamoDB watch this video first. Rick is the wizard, so to understand him properly you'll need some normal-guy talk first (the basics), which Alex provides in this video at a much more comfortable pace. Thanks Alex.
Thanks! This really helped me understand things. I started with Rick's lecture, and it was a bit hard for a beginner to follow. This lecture was really well made and clear; now, after I practice a bit, I'm going to rewatch Rick's lecture :)
This is such a great talk. I've been doing lots of theory on DynamoDB for the last couple of weeks with an idea in my head of how GSIs might work, and this explained it really well. It would be great to hear you do an extended talk about projections and the like. Thanks for imparting your knowledge!
Glad to learn that I have been using DynamoDB wrong for over a year. Seriously though, these access patterns make a ton of sense when handling multiple entities. I had been maintaining multiple tables, one for each entity, as if I were still using an RDBMS. I had created "ghetto" joins by invoking Lambda functions that would make two independent queries and then join them computationally inside the Lambda, merging the JSON results of the two queries and returning the joined result. This works, but it is slower (80-100ms instead of ~10ms), on top of the cost of a Lambda invocation for every join query. I will definitely consider this new organizational pattern for a single table. The Lambda-joining method could still work for occasional complicated queries, like running a report, but it's not a good technique for everyday queries.
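For anyone curious, the join-in-Lambda approach described above can be sketched roughly like this. The entity names and shapes here are made up for illustration; real code would issue two separate `table.query()` calls against DynamoDB before merging.

```python
# Results of two independent queries (stubbed here with plain dicts;
# field names are illustrative, not from the talk).
users = {"u1": {"UserId": "u1", "Name": "alexdebrie"}}
orders = [
    {"OrderId": "o1", "UserId": "u1", "Amount": "142.23"},
    {"OrderId": "o2", "UserId": "u1", "Amount": "18.50"},
]

# The "join": merge the matching user record into each order before returning.
joined = [{**order, "User": users[order["UserId"]]} for order in orders]
print(joined[0]["User"]["Name"])  # alexdebrie
```

With a single-table design, the same data would come back in one Query against a shared partition key, with no merge step and no extra round trip.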
Edit: this was just before watching Rick Houlihan, the wizard. Go check out his videos if you haven't. Original: Amazon really needs to select their developer advocates/speakers better. This guy is the one speaker, out of the ten I've recently listened to while trying to understand DynamoDB basics and data modeling, who speaks plain English. The other guys just get lost in their own abstractions.
I'd really be glad to see a comparison of WCU/RCU counts (cost) for each method, because I suppose adding more and more GSIs gets pretty expensive, in every sense.
Very helpful. I’d love to see a follow-up that shows the code and setup for these, including creation of data, deletion, and updates. The latter two are important as they’re obviously messier when you have denormalised data.
At 27:46, couldn't I put OrderID in the PK and ItemID in the SK, and then create a GSI the same way Alex did, i.e., with OrderID as the PK and ItemID as the SK? To me that should solve the problem. Please correct me if I'm missing something here.
Great presentation!! However, for strongly consistent requirements: if the Order Items are updated and we use an inverted index (GSI1) to fetch them, there is a chance that we may get an inconsistent status for the Order Items. How does one ensure strong consistency in this case?
I don't understand the logic behind modeling the OrderItem entity as PK: ITEM#, SK: ORDER#. Couldn't we use PK: USER#, SK: ITEM#? Is it to spread the order items more evenly across different partitions? Even PK: ORDER#, SK: ITEM# would make more sense to me, because it would follow the previous pattern of one (PK) to many (SK).
I am new to DynamoDB and am probably missing something here. Why does order item have a different PK? I thought you could only have one primary key for the entire table.
Thanks for this really great video. I'm surprised at how much you can do with DynamoDB in terms of querying; you've shown some really great strategies. It's like you can do almost anything you can do with SQL. Can you, therefore, give some guidance on when *not* to do things in DynamoDB? E.g., when does the extra effort become too expensive, and when should you stick to SQL/MySQL?
Thanks, Alex. Your SKs are prefixed by the entity type, but the USER SK had "#PROFILE#" which was prefixed with a hash (at 27:24). Why is this one done differently?
Probably just because it's a sort key, to distinguish its formatting from the partition key. Anyway, how you choose to construct your composite keys is totally up to you as a developer. I have a similar setup to this.
Good question, Chris. It's one that a few people have asked, so I probably should have explained it better :). The short answer is: it doesn't really matter. I mostly did it to make the table look nice. I wanted the User Profile item to appear before the Order items in the table, but the sort key is sorted lexicographically. This means "PROFILE" would sort *after* "ORDER". As such, I prefixed the "PROFILE" record with a "#" so that it would appear before the Order items. If I had an access pattern of "Fetch a User Profile and the User's Orders", I might want this sorting so that I'd know my User Profile item would always be the first one when converting the results into objects in my application. There are a few other situations where you might need something like a "#" to help with ordering as well; they're a little hard to explain in this comment section though.
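The ordering effect is easy to see with a plain lexicographic sort. A small sketch (the key values mirror the style used in the talk, but the exact IDs and dates are made up):

```python
# DynamoDB sorts items within a partition lexicographically by sort key
# (UTF-8 byte order). "#" (0x23) sorts before all ASCII letters, so
# prefixing PROFILE with "#" pulls the profile item above the ORDER items.
without_hash = sorted(["PROFILE#alexdebrie", "ORDER#2020-01-04", "ORDER#2020-03-01"])
with_hash = sorted(["#PROFILE#alexdebrie", "ORDER#2020-01-04", "ORDER#2020-03-01"])

print(without_hash)  # profile lands *after* the orders
print(with_hash)     # profile lands *before* the orders
```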
@LaggyOnline Just to make sure I understand -- is your question why do the filtering in DynamoDB? Instead, just fetch all items related to a user (both Orders and the Profile), and then filter client-side to get just the Profile? If I'm understanding correctly, my response would be that it could be a significant amount of extra data you're reading without any benefit. In this example with 11 items that are pretty small, it doesn't seem like a big deal. But what if a User has 200 Orders? Or what if each Order has a lot of data on it and is 50KB per item? Now you're using extra read capacity units to fetch data that you're just going to throw away. Also, if your item collection is more than 1MB, you'll have to page through multiple requests just to find that one item you want. Hope that helps. Let me know if I misunderstood :)
@LaggyOnline Ahh gotcha. Yea, if you wanted to fetch two 1-N relationships in a single request, you're probably going to need to make multiple requests *unless* you have an idea of how many related items there are. One note is that you can basically model two 1-N relationships within a single item collection. For your sort key, have your parent item right in the middle, with one relationship going ascending and one relationship going descending. I've got a few examples detailing this further in the book --> dynamodbbook.com
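A minimal sketch of the "parent in the middle" idea described above. The `#ADDR`, `#PROFILE`, and `ORDER` prefixes are hypothetical, chosen only so the lexicographic sort places the parent item between the two relationships:

```python
# Sort keys chosen so one 1-N relation (#ADDR) sorts before the parent
# (#PROFILE) and the other (ORDER) sorts after it. A single Query can then
# read ascending or descending from the parent to walk either relation.
sks = [
    "ORDER#2020-03-01",
    "#ADDR#home",
    "#PROFILE#alexdebrie",
    "ORDER#2020-01-04",
    "#ADDR#work",
]
print(sorted(sks))  # parent item ends up in the middle
```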
I was wondering if putting "USER#" at the beginning of a partition key (as you can see at 21:31) would ever result in a hot partition? I guess I'm imagining that the partitions would be created based on alphabetical order of partition keys so in the above scenarios there would be a lot of entries in the partition for partition keys starting with U, but probably something more sophisticated is going on. I just wanted to check.
LaggyOnline is correct -- partition keys are hashed before being placed on storage nodes to prevent this issue. DynamoDB even announced some interesting rebalancing features recently where they will move frequently-accessed items to a different partition to help alleviate pressure on other items that happened to be on the same node. aws.amazon.com/about-aws/whats-new/2019/11/amazon-dynamodb-adaptive-capacity-now-handles-imbalanced-workloads-better-by-isolating-frequently-accessed-items-automatically/
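To illustrate why a shared "USER#" prefix doesn't cluster keys: DynamoDB's internal hash function isn't public, so md5 here is only a stand-in for the idea that the *full* key is hashed before placement.

```python
import hashlib

# DynamoDB hashes the full partition key value to pick a storage node, so
# keys sharing a "USER#" prefix are spread out rather than clustered.
# (md5 is just an illustration; the real hash function is internal.)
keys = ["USER#alexdebrie", "USER#nathanj", "USER#the-paul"]
for pk in keys:
    print(pk, "->", hashlib.md5(pk.encode()).hexdigest()[:8])
```

The digests share no common prefix even though the keys do, which is why alphabetical clustering of partition keys isn't a hot-partition risk by itself.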
Not a dig on DynamoDB in particular, but he glosses over the severe modeling issues you'll encounter in this kind of database. It seems to work fine in the scaled-down data model he demonstrates, but in a more real-world scenario, NoSQL will be unable to reasonably deal with queries such as "find the average monthly sales volume of customers who spend 80% of their purchases on bananas that come from Guatemala". The model would have to be designed and pre-joined in advance to answer this specific query. You'll find yourself creating multitudes of models, each designed for a different query. I suppose that's good for people who sell compute and disk space, though.
@Alex DeBrie Really good talk; every time I try to watch the Rick talk, I end up watching yours to clear my mind hahaha. But my big question is not about access patterns but about how to store the data in real life, thinking front end -> back end. From your table, it's very clear to me to use USER#userId as the partition key to store users and orders; it makes sense, since every time a user creates an order you store it one at a time, so a user cannot create many orders at the same time. But in the case of items: when you create an order, the same order can have many items, and from a UI point of view you don't create items at the same time you create an order; you pick them from a list. So when you store an item, you shouldn't have any OrderId yet, right? And if you do have a list of items, every time you create an order, do you also store each of its items? I don't get it. Thanks for this video :)
This is exactly the question running through my mind right now. The user creates an order, which gets an OrderID, and then with that OrderID each order item from the list is stored in the table? Wouldn't that require multiple writes?!
From a GUI perspective, the user selects the products, and once the order is placed those products are stored on the same Order item as a Map/List, similar to how Alex showed the one-to-many relation between the user and user_address. Or did I misunderstand your question?
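In code, that "products as a Map/List on the order" shape might look something like this (attribute names are illustrative, not from the talk):

```python
# An Order item carrying its products as a List of Maps, analogous to the
# Addresses map on the User item in the talk. Written once, atomically,
# when the order is placed.
order = {
    "PK": "USER#alexdebrie",
    "SK": "ORDER#2020-03-01",
    "Status": "PLACED",
    "Items": [
        {"ItemId": "456", "Description": "Bananas", "Qty": 2},
        {"ItemId": "789", "Description": "Coffee", "Qty": 1},
    ],
}
print(sum(i["Qty"] for i in order["Items"]))  # 3
```

The trade-off versus separate OrderItem items is that the embedded list can't be indexed or queried on its own; it works best when items are only ever read together with their order.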
If Amazon has switched over to using DynamoDB for their shopping cart, how can they guarantee ACID transactions when someone buys stuff from the cart? Don't you have to use a relational database for that?
Great talk, Alex. Thank you very much! I have a question though. Let's say we have another order status called 'DELIVERED'. How do you filter orders with status 'SHIPPED' or 'DELIVERED' efficiently? In SQL, it would look like "select * from orders where status = 'SHIPPED' OR status = 'DELIVERED'".
Thanks, Christian! Glad you liked it. Good question. One follow-up: do you know you'll be filtering for just those two statuses (SHIPPED and DELIVERED), or do you want the ability to flexibly specify multiple statuses (SHIPPED and DELIVERED here, but maybe another access pattern has SHIPPED and CANCELED)? If the former, you could use the Composite Sort Key pattern or the Sparse Index pattern discussed in the talk. If using the composite sort key, make the sort key something like 'SHIPPED_OR_DELIVERED#', which would allow direct queries on that. If using the sparse index, you would have an attribute that only exists on items in the SHIPPED or DELIVERED status; create an index on that attribute and then query the index. If you want the latter -- the flexibility to fetch arbitrary combinations of statuses -- it's a little tricky. DynamoDB wants you to be very specific about the access patterns you have. In this case, I would just make two parallel queries to DynamoDB on the index with the composite sort key: one that queries for Status = 'SHIPPED' and one that queries for Status = 'DELIVERED'. Does that make sense?
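The "two parallel queries" suggestion could be sketched like this. The index is stubbed with a dict so the sketch is runnable; in real code each lookup would be a `table.query()` with a `KeyConditionExpression` on the status attribute of the GSI:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the GSI keyed on order status; OrderIds are made up.
FAKE_INDEX = {
    "SHIPPED": [{"OrderId": "1"}, {"OrderId": "4"}],
    "DELIVERED": [{"OrderId": "2"}],
    "CANCELED": [{"OrderId": "3"}],
}

def query_by_status(status):
    # Real code: table.query(IndexName=..., KeyConditionExpression=...)
    return FAKE_INDEX.get(status, [])

# Issue both status queries in parallel, then merge the result pages.
with ThreadPoolExecutor(max_workers=2) as pool:
    pages = pool.map(query_by_status, ["SHIPPED", "DELIVERED"])
orders = [item for page in pages for item in page]
print(len(orders))  # 3
```

Because the two queries are independent, they run concurrently, so the overall latency is roughly one round trip rather than two.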
Thanks @@alexbdebrie, and yes, it makes sense. My use case is the latter. The closest example is the "Refine by" section on amazon.com, where you can select multiple brands and/or sellers. It looks to me like DynamoDB is not suited for this use case.
@@ianpogi5 Making two parallel requests isn't the end of the world either. Where you really get into trouble with DynamoDB is where you're waiting to make multiple, dependent requests that create a waterfall. That's when your requests get slow.
That's what the "inverted index" pattern is for, which he explains at 28:10. In other words, the original composite keys are PK=USER#nathanj, SK=ORDER#123 and PK=ITEM#456, SK=ORDER#123. The inverted keys (in a global secondary index) would give you PK=ORDER#123, SK=USER#nathanj and PK=ORDER#123, SK=ITEM#456. Thus, if all you have is orderId=123, you can query the index with PK="ORDER#123", and that will return the items whose sort keys start with USER# and ITEM#.
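That key flip can be sketched in a few lines, simulating the GSI with plain dicts (the key values are the same ones used in the comment above):

```python
# Base table items keyed on (PK, SK); the inverted index (GSI1) simply
# swaps the two, so ORDER#123 becomes the partition key.
base_items = [
    {"PK": "USER#nathanj", "SK": "ORDER#123", "Type": "Order"},
    {"PK": "ITEM#456", "SK": "ORDER#123", "Type": "OrderItem"},
]
gsi1 = [{"GSI1PK": item["SK"], "GSI1SK": item["PK"]} for item in base_items]

# "Query" GSI1 with PK = ORDER#123: the order's user and items come back together.
hits = [item["GSI1SK"] for item in gsi1 if item["GSI1PK"] == "ORDER#123"]
print(hits)  # ['USER#nathanj', 'ITEM#456']
```

In real DynamoDB you'd populate GSI1PK/GSI1SK as attributes on each item and declare them as the index's key schema; the service maintains the index copy for you.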
This is a little dated, but I just found it in the context of your book being released. I'm wondering about the STATUS sparse index example... I'd imagine status needs to be queried as part of many business processes. Once the status moves from 'picked' to 'ready', doesn't the shipping process need to query all the items in 'ready' status? Burn another GSI? Then it goes from 'shipped' to 'delivered'... again, burn another GSI? Or is there some implied microservice architecture where, once a status is reached, the order lives in another app, and at some point the order status in this process is updated just by order_id? Or maybe an inverted index on USER# STATUS#ORD_DATE? The last seems problematic: can't there be multiple orders for a customer with the same STATUS#ORD_DATE? And since you'd need the full STATUS#ORD_DATE to query the inverted index as the PK, how do you handle a distant backorder that's now filled?
I have the same feeling. Quick access at the expense of too much accidental complexity. Not really concerned about the cost of duplicating everything for each access pattern we need (i.e: a secondary index is essentially creating another table with some or all attributes projected/duplicated), but the modeling, maintenance and complexity seems just too much compared to a MongoDB or similar document DB. Add pagination to the mix and it's just crazy.
So we will persist the same dataset repeatedly based on access patterns, and hence have different secondary indexes. We are adding a lot of storage cost here by storing the same records with different PK and SK attributes and tags. The solutions in the NoSQL world need to be simplified! Miles to go...
This is a problem with DynamoDB and key-value stores, not with NoSQL in general. In MongoDB you can query, index, and filter documents without having to worry about duplicating information for every single access pattern.
I was with you until you added the model for OrderItem here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-DIQVJqiSUkE.html. It seems like a bad idea to put the OrderItem ID in the primary/partition key, as the order items of a particular order are supposed to stay together in a single partition for better performance. By making it the partition key, you're going to have multiple order items in different partitions, making the query perform badly. And just to overcome that, we're adding an extra index and more, making it less efficient overall.