Good question! DynamoDB excels at OLTP-like queries where you're operating on a specific, limited number of rows -- create an Order for this Customer, or view the 10 most recent Orders for this Customer. These queries partition well and are narrowly focused based on fixed parameters. DynamoDB is not as good at 'table-wide queries', such as 'Give me all the Customers in my application'. There are some ways to do that, but it's tricky. It's usually best done in an OLAP database (if used for internal analytics) or something else like Elasticsearch.
This video is amazing! However, I've been trying to implement this code using Java Spring Boot, but I don't know how I could implement this transaction to display the messages 'Customer with this username already exists.' and 'Customer with this email already exists.' In my implementation, if I send the same username and email, nothing happens.
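For reference, the uniqueness check asked about above is typically done with DynamoDB's TransactWriteItems, where each Put carries its own condition. A minimal Python sketch of the request shape (boto3-style low-level parameters; the CUSTOMER#/CUSTOMEREMAIL# key scheme and table name are illustrative assumptions, not taken from the video):

```python
def build_create_customer_transaction(table, username, email):
    """Build a TransactWriteItems request that fails if either the
    username or the email is already taken."""
    return {
        "TransactItems": [
            {
                "Put": {
                    "TableName": table,
                    "Item": {
                        "PK": {"S": f"CUSTOMER#{username}"},
                        "SK": {"S": f"CUSTOMER#{username}"},
                        "Email": {"S": email},
                    },
                    # Fails if a customer with this username already exists.
                    "ConditionExpression": "attribute_not_exists(PK)",
                }
            },
            {
                "Put": {
                    "TableName": table,
                    "Item": {
                        "PK": {"S": f"CUSTOMEREMAIL#{email}"},
                        "SK": {"S": f"CUSTOMEREMAIL#{email}"},
                    },
                    # Fails if a customer with this email already exists.
                    "ConditionExpression": "attribute_not_exists(PK)",
                }
            },
        ]
    }
```

If the transaction fails, the TransactionCanceledException includes CancellationReasons in the same order as the TransactItems, so you can check whether index 0 (username) or index 1 (email) failed and return the matching error message.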
Hey, thanks for the video; it's very helpful! I have a question regarding the order and order items and the way you modeled them in DynamoDB. I understand the reason for moving OrderItem to a different partition key, as you don't need to fetch all order items when fetching a customer with orders. But why did you use the following partition key (PK) for the order item: ORDER#<order_id>#ITEM#<item_id>, and not just create a PK with ORDER#<order_id> and two sort keys (SK) with ITEM#<item_id>? For example, instead of this:

PK: ORDER#1#ITEM#1, SK: ORDER#1#ITEM#1, <skip>, GSI1PK: ORDER#1, GSI1SK: ITEM#1
PK: ORDER#1#ITEM#2, SK: ORDER#1#ITEM#2, <skip>, GSI1PK: ORDER#1, GSI1SK: ITEM#2

do this:

PK: ORDER#1, SK: ITEM#1, <skip>, GSI1PK: ORDER#1, GSI1SK: ITEM#1
PK: ORDER#1, SK: ITEM#2, <skip>, GSI1PK: ORDER#1, GSI1SK: ITEM#2
Couldn't the main problem (essentially 'employer' users having restricted access to a user's data) have been solved with field-level @auth in the GraphQL schema, instead of using the custom Lambda function? I haven't played with field-level @auth yet, and I'm kind of new to this, but it seems redundant to me. Thanks for the vid, learnt a lot about resolvers!
Wouldn't adding the country as the partition key create hotspots, since we should be choosing high-cardinality attributes as the partition key? (in the composite sort key example)
Are you talking about for a global secondary index (GSI)? The write to the GSI won't happen during the write to your table -- it will happen asynchronously after the write completes. This ensures the initial write is fast and not delayed by propagating to indexes.
@@alexbdebrie How can we model fetching records based on createdDate or endDate? I searched the internet, and everybody suggests creating a GSI on dates. Will that overload the partition? Any suggestions would be great.
@@b-or-m-ln6hk Either on your main table or in a GSI you would need to set up a structure where the partition key is what you're grouping by (e.g. CustomerId if customer orders) and the sort key is what you're filtering by (createdDate or endDate). Then, do a Query operation that does an exact match on the partition key and filters on the sort key accordingly.
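A minimal sketch of the Query described above (boto3-style low-level parameters; the CUSTOMER# key scheme and attribute names are illustrative assumptions):

```python
def build_orders_since_query(table, customer_id, since_date):
    """Query params: exact match on the partition key, range condition
    on a date-valued sort key."""
    return {
        "TableName": table,
        "KeyConditionExpression": "PK = :pk AND SK >= :since",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"CUSTOMER#{customer_id}"},
            # ISO-8601 date strings sort lexicographically, so a plain
            # string comparison gives correct chronological filtering.
            ":since": {"S": since_date},
        },
    }
```

You would pass these params to `client.query(**params)`; the same shape works against the main table or a GSI (add `IndexName` for the latter).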
Hey Alex, I've gone through quite a bit of the book, but here's where I'm confused: if I have a query using the PK only, and it returns different entities in the result, say a post's details and a list of comments, I guess you can't leverage paging when the entities are mixed? Do you know what I mean? Is it OK to run separate requests, one to get the post details and one to get the comments, in order to page through the comments? Thanks Alex, the book is amazing and these videos are great too!
Hey Gary! Great question, and glad you're liking the videos. If you do want to implement paging on the related item (e.g., comments), then you probably will want to make separate requests. In that case, you don't need to put them in the same item collection / use the same PK -- you can separate them. In doing so, I'd still try to make it where you can identify the relevant comments w/o needing to fetch the post page first so that you can fetch the two pieces of data in parallel rather than needing to fetch the post first, then follow up for the comments. Let me know if that helps!
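The parallel fetch described above might look like this sketch (Python; the POST#/POSTCOMMENTS# key scheme is an illustrative assumption, not from the video):

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_post_and_comments(client, table, post_id):
    """Fetch the post item and its comments in parallel. Because the
    comment partition key is derivable from the post id alone, neither
    request has to wait on the other."""
    def get_post():
        return client.get_item(
            TableName=table,
            Key={"PK": {"S": f"POST#{post_id}"}, "SK": {"S": f"POST#{post_id}"}},
        )

    def get_comments():
        return client.query(
            TableName=table,
            KeyConditionExpression="PK = :pk",
            ExpressionAttributeValues={":pk": {"S": f"POSTCOMMENTS#{post_id}"}},
        )

    with ThreadPoolExecutor(max_workers=2) as pool:
        post_future = pool.submit(get_post)
        comments_future = pool.submit(get_comments)
        return post_future.result(), comments_future.result()
```

The comments Query returns its own LastEvaluatedKey, so you can page through comments independently of the post fetch.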
@@alexbdebrie it absolutely does and thank you Alex for your generous response! Merry Christmas to you and your family!! I wish the best for you for 2023. You have helped me out immensely with your book, videos etc this year 😊
There have been updates to the partition limits that make this less of an issue, but overall, great content 👏👏👏 What I realized after two decades of using SQL and then MongoDB is that once you hit any kind of scale, you tend to be forced into a database with limits very similar to DynamoDB's, or you end up with huge spikes in lag, or bills, or both. It's easier with SQL/Mongo at first, but people usually build apps expecting high usage, and if that happens, those databases may hit you hard at the worst time: during the traffic spike, when people actually try to use your app.
In DynamoDB, how do you handle an access pattern in which you need to get a list of partition values? For instance, say the partition key is userid, and the attributes are firstname and lastname for the sake of simplicity (in reality there are many more attributes). How do I then get a list of all the value pairs [(userid_1-firstname, userid_1-lastname), (userid_2-firstname, userid_2-lastname), ...]? Obviously, doing a scan, or a query for each partition, would consume too many read requests. So far, I think the best option would be to create a composite key: the PK is a random number prefixed with, say, 'list-users-', and the SK is the userid. Then all the items in these partitions duplicate the value of the userid partition. (I hope this makes sense; if not, I can create a visualization to better convey the idea.) Done this way, you only need to perform one query on that random-numbered partition key, which is much more efficient. Is this the correct DynamoDB way of handling this access pattern?
That's a pretty tricky pattern. Just to be clear -- you want a list of *all* PK + attribute combinations? To do this, wouldn't you need to see all the data anyway? Table scans aren't efficient, but that's usually b/c you don't want to read all the data. In this case, it seems like you do?
Hey Rohit, glad you liked it. Interactions with DynamoDB are over an HTTP API, so your connection will be like other HTTP clients. The big thing here is to ensure you're using HTTP Keep-Alive to reuse the connection across requests.
Very useful video Alex. Thank you. Can there be more than one sort key in the composite primary key? For example if partition key is customerId#, could there be 4 other sort keys such as customerAddress# [along with attributes], customerDemographicInfo#[along with attributes], customerAdditionalInfo#[attributes], etc? I am trying to create a one to many relationship for unique customers in my table to their many addresses, contacts, etc.
Thanks, Bahar! For your question, will these each be different items in your table? If so, the answer is yes -- you can have different items with different sort key patterns. Let me know if I'm misunderstanding -- feel free to email me or DM me on twitter.
For this example, you could use the DynamoDB Scan operation to handle that. In general, Dynamo doesn't want you to fetch data from across your entire table. It's assuming you'll want to break it down by something (e.g. CustomerId, City, etc.), and order the results (e.g. by OrderDate, ZipCode, etc.). Hit me up if you have any examples / questions -- can try to answer those :)
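A Scan like the one mentioned above is paginated (each response returns at most 1 MB), so you follow LastEvaluatedKey until it's absent. A minimal sketch, assuming a boto3-style client:

```python
def scan_all(client, table):
    """Paginate a full-table Scan: follow LastEvaluatedKey until exhausted."""
    params = {"TableName": table}
    items = []
    while True:
        resp = client.scan(**params)
        items.extend(resp.get("Items", []))
        last_key = resp.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Each Scan page is capped at 1 MB; pass the cursor to continue.
        params["ExclusiveStartKey"] = last_key
```

Keep in mind this reads (and bills for) every item in the table, which is exactly why it's discouraged for user-facing access patterns.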
@@alexbdebrie Maybe a video in which you go through how you would structure your DynamoDB tables given some non-trivial (yet common) situation (e.g. a dating website), where each user has a list of images, chat messages, comments, etc.
@@asdfasdfuhf Good call! I have some like this already. :) You can see an e-commerce example here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-nsAJmyg2QU8.html&ab_channel=AlexDeBrie Also, that example is from my book, which has a bunch of data modeling techniques plus 5 full walkthrough examples: www.dynamodbbook.com/
Hi Alex, nice preview. I will definitely buy this. I have some massive data in one RDBMS table that I would like to move to DynamoDB. The table has 600 columns. I am struggling to find a good read or link explaining the migration. Any recommendations?
Thanks, Rohit! For the migration, make sure you plan out your data model with DynamoDB and how your RDBMS data will get to that state. The number of columns shouldn't be a big deal as long as you're not exceeding the item size limits.
Thanks, Ori! This is from The DynamoDB Book. You can get the source code with the Plus or Premium package, and this video (plus videos for all chapters) are available in the Premium Package. www.dynamodbbook.com/
Hi, I really need your help! I have a question about the partition key of a GSI. Does its hashing work the same way as the table's partition key, i.e., writing to the same SSDs? And should a GSI partition key be unique like a regular partition key, for throughput reasons? Or, if the whole table uses one constant partition key for the GSI, will that hurt throughput the same way a single constant partition key would on a standard table?
Hey, not sure I quite understand your question but yes, the partition key of a GSI generally acts similarly to that of the main table. Thus, you want to avoid using the same partition key for all items.
Hi, Alex. I'm starting to learn AWS, and your videos and writing are helping me better understand databases on AWS. Do you recommend using single-table design for apps made with AWS Amplify?
Great to hear! I generally recommend a similar philosophy with GraphQL. It can add complexity to your resolvers but will be more performant. That said, I've seen many folks that have opted to use more isolated resolvers and a waterfall strategy for resolving a big graph in a GraphQL query, and that can work as well.
Hi Alex, this is great content, and I have gone through almost all of your videos. We have a use case where we have to retrieve all customers or all orders. But in everything I have seen, customer or order is the PK or SK. To serve this use case, we would have to scan the whole table, which is not a good idea. Is there any way to do that?
Hey there, it's a good question and one I get a lot. I need to write up the different options on these. It's tricky because you can give them the same partition key in a secondary index, but that can often result in a hot partition that hits partition throughput limits. What is the use case for this? Is this a production, user-facing use case, or more for internal processing?
@@alexbdebrie We are designing it for production from scratch. We had separate tables earlier, but now we're moving to a single-table design. We are able to cover almost all use cases, but this 'get all customers' / 'get all orders' pattern and its sub-tasks are difficult to approach. It would be really great if you could shed some light on this. Thank you so much, Alex, for your prompt reply, btw.
@@shailendraacharya I probably won't have a blog post in the next week or so. If you send me an email (alexdebrie1@gmail.com) and tell me the basic overview, I can point you in the right direction.
@@alexbdebrie Thank you so much, Alex; I really appreciate your prompt response in helping me design the DB. I will write you an email today with all the details.
Thanks, Karim! There's no out-of-the-box way to get all of the primary keys in the table. You could scan over the whole dataset (which could be enormous), or you could maintain a list of the primary keys yourself.
Great video! How do you handle the same order for multiple entities in DynamoDB? For example, PK=STORE#id, SK=ORDER#id, and, as in your example, PK=CUSTOMER#id, SK=ORDER#id. Since the store and the customer share the same order, how do you handle it to avoid duplicated data and unneeded WCU consumption? Thanks a lot.
If I read from a GSI, is it considered 2 reads (1 from the GSI, 1 from the original table)? Also, is a write to the original table considered 2 writes (one to the original table, then one to the secondary index)?
Hey there! The read is only considered one read -- directly from the GSI itself. For the writes, you will pay both for the write to the main table as well as to any secondary index. Thus, if you have two GSIs that your item would be written to, you would pay for 3 writes (main table + 2 secondary indexes).
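As a back-of-the-envelope check, the write amplification described above can be sketched like this (a simplification: it assumes standard, non-transactional writes at 1 WCU per 1 KB, and that each GSI stores the full item rather than a smaller projection):

```python
import math


def write_units_consumed(item_size_kb, num_gsis_with_item):
    """Estimate WCUs for one write: the main-table write, plus one
    equivalent write for every GSI the item is replicated into."""
    per_write = math.ceil(item_size_kb)  # WCUs are billed per 1 KB, rounded up
    return per_write * (1 + num_gsis_with_item)
```

So a 1 KB item landing in two GSIs consumes 3 WCUs in this model, matching the "main table + 2 secondary indexes" example above.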
Hi Alex, a quick question for you. I've been trying to wrap my head around DynamoDB data modeling design for a few days now. I have a scenario with 3 related entities: video stream, product group, and product item. We have a one-to-one relationship between a stream and a product group, and a one-to-many relationship between a product group and a product item. As you can tell, it's still hard for me to think outside the relational box, and this is the main reason I want to use a NoSQL DB for my project. My question is: can these be achieved with single-table design, and if so, what would be the best way to approach it? All of the examples I found online only consider a one-to-many scenario. Thank you, and all the best. David
Hey David, yep you should be able to model that scenario in DynamoDB. I often don't model 1 to 1 relationships as a separate entity. Can you combine stream + product group into a single entity?
@@alexbdebrie Hi Alex, I believe so, since there can only be one product group inside one stream. Here's what I actually have so far. My PK is a globally unique streamID, but for my SK I decided to use my groupID (which I think is incorrect, since we can have more than one SK with the same ID given the nature of the composite key, and having already set a unique streamID), and the product itself ended up as a list (denormalization via a complex attribute) where each product item is uniquely identified by its own ID. I ended up with one massive item that contains it all.

Now the problem is that if I want to retrieve, say, only my groupId and my groupName (those are the only two attributes this entity originally had), I will most likely have to use a GSI. If there are more than a few scenarios where I need to query this table looking for different items, I may end up creating a bunch of indexes, and that can drive up the cost of using Dynamo, if I understand billing correctly. What do you think? Any suggestions on how you would go about it? Thank you for your time.
@@davidhodowany8029 Hey David, I think you're on the right track for some parts. There's a lot here so tough to say without really getting into the specifics :) Have you read my book? I think it can help you grok the mental model of DynamoDB and work through some examples. If it doesn't help you, let me know and I'll refund you entirely. dynamodbbook.com
Hey Jishnu, you can't use a complex attribute like a map or a list in your primary key, whether for the main table or GSI. It needs to be a scalar value -- string, number, or binary only.
Hey Alex, Very good series, watched all 3 videos, amazing, and easily understandable to people who want to get started on DynamoDB. Would love to see content like this on access patterns and data modeling also. Thanks!
9:50 Is there any scenario where writing a CloudFormation script is actually better than deploying code using the Serverless Framework, since SLS is way easier and more time-efficient than CFN?
Good question. Under the hood, the Serverless Framework is generating CloudFormation so I consider that the same. I'm not *that* worried about which CloudFormation tool you use (Serverless Framework, SAM, arc.codes, CDK) as long as you're using some infrastructure-as-code tool to deploy.
Good catch! They don't really tell us (we don't need to know!), but I believe DynamoDB doesn't actually remove partitions after they're created. That said, because we don't need to know, they could change that in the future if needed.
Nope, this is all hidden from you. You can sneakily find out the number of partitions in your DynamoDB table by examining DynamoDB streams, but it shouldn't really help you much. Project to find number of partitions here: github.com/zaccharles/dynamodb-table-partitions
On the page size limit: since DynamoDB throttles on provisioned IOPS, the 1 MB limit (or any other practical limit) assures customers that once a request is accepted, it will be served and won't be throttled partway through. To achieve this, they need some practical limit on the total data returned.