I am convinced that single-table design is a good pattern for working with DynamoDB. But doesn't it defy the database-per-service style of the microservices pattern? Thoughts?
The Amplify framework uses multiple DynamoDB tables (each GraphQL model creates a new table), which is a bit confusing since it goes against the single-table principle.
One question: we calculate a hash of the partition key to identify the partition, so if new partition(s) are added, don't we have to reshuffle all the data?
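DynamoDB manages partition placement internally, but the general answer to the reshuffling worry is consistent hashing: when a node is added, only the keys falling on the new node's arc of the hash ring move, instead of nearly everything as with naive `hash % n` placement. A minimal sketch (node names and key format are made up for illustration):

```python
import hashlib
from bisect import bisect_right


def ring_hash(key: str) -> int:
    # Stable 64-bit hash of a key (MD5 used only for illustration).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Minimal consistent-hash ring: adding a node moves only the keys
    that fall between the new node and its predecessor on the ring."""

    def __init__(self, nodes):
        self._points = sorted((ring_hash(n), n) for n in nodes)

    def add(self, node):
        self._points.append((ring_hash(node), node))
        self._points.sort()

    def owner(self, key: str) -> str:
        # A key belongs to the first node clockwise from its hash.
        hashes = [h for h, _ in self._points]
        idx = bisect_right(hashes, ring_hash(key)) % len(self._points)
        return self._points[idx][1]
```

Mapping 1000 keys over 4 nodes and then adding a 5th, the ring relocates roughly 1/5 of the keys, while `hash % n` placement relocates roughly 4/5 of them.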
This is generally solvable with GSI overloading. The number of access patterns that 20 GSIs enable when you're using generic GSI PKs is insanely high.
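A rough sketch of what GSI overloading means: items of different entity types all write to the same generic index attributes, so one index serves several access patterns. The attribute and key names (`GSI1PK`, `ORG#acme`, etc.) are hypothetical; the query is simulated in plain Python:

```python
# Two entity types (users and tickets) share one overloaded index by
# writing into the same generic GSI1PK/GSI1SK attributes.
items = [
    {"PK": "USER#123", "SK": "PROFILE#123",
     "GSI1PK": "ORG#acme", "GSI1SK": "USER#123"},           # users by org
    {"PK": "TICKET#9", "SK": "TICKET#9",
     "GSI1PK": "ORG#acme", "GSI1SK": "TICKET#2024-01-05"},  # tickets by org
    {"PK": "USER#456", "SK": "PROFILE#456",
     "GSI1PK": "ORG#globex", "GSI1SK": "USER#456"},
]


def query_gsi1(pk: str, sk_prefix: str = ""):
    """Simulate a Query on the overloaded index: partition-key equality
    plus an optional begins_with condition on the sort key."""
    return sorted(
        (i for i in items
         if i["GSI1PK"] == pk and i["GSI1SK"].startswith(sk_prefix)),
        key=lambda i: i["GSI1SK"],
    )
```

`query_gsi1("ORG#acme")` returns the org's users and tickets together, while `query_gsi1("ORG#acme", "USER#")` narrows to one entity type, all from the same index.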
What happens when you have a mutable value (such as a user name) encoded into your partition key and sort keys, and then the user name changes? Do you have to go update the user name in a bunch of locations (in the partition key and then in all the sort keys)?
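The usual advice is to avoid this problem entirely by building keys only from immutable identifiers and keeping mutable values (like a display name) as plain attributes. A tiny sketch with hypothetical key formats:

```python
def user_keys(user_id: str) -> dict:
    # Keys are derived from the immutable user ID, never from the
    # display name, so a rename never requires rewriting any keys.
    return {"PK": f"USER#{user_id}", "SK": f"PROFILE#{user_id}"}


profile = {**user_keys("123"), "username": "old_name"}

# Renaming the user is a single attribute update on one item;
# the PK/SK (and any GSI keys built from the ID) are untouched.
profile["username"] = "new_name"
```

If the name must also appear denormalized on other items (e.g. on comments), that is a separate duplication problem, but it stays out of the key schema.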
The reason to limit the projection in a GSI is not primarily performance; it is storage cost. Do you want to pay real money to store data in a GSI that you may never use?
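For reference, the projection is chosen when the index is defined. A sketch of a GSI specification in the shape boto3's `create_table` expects (index and attribute names are hypothetical), using `KEYS_ONLY` so the index stores just the key attributes rather than full items:

```python
# KEYS_ONLY projection: the index stores only the projected keys,
# so you pay storage for keys, not for every copied item attribute.
# (INCLUDE with NonKeyAttributes, or ALL, are the other options.)
gsi = {
    "IndexName": "GSI1",
    "KeySchema": [
        {"AttributeName": "GSI1PK", "KeyType": "HASH"},
        {"AttributeName": "GSI1SK", "KeyType": "RANGE"},
    ],
    "Projection": {"ProjectionType": "KEYS_ONLY"},
}
```

The trade-off: a slimmer projection is cheaper to store and write, but queries that need unprojected attributes must fetch the full item from the base table afterwards.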
In his "why no JOINs" section, he explains why one table is better than joining across two tables. But I have a use case where, e.g., I will only ever want either the org's info or all of the org's users' info. In that scenario, I could use two separate tables and never need a JOIN. Is this a case where two tables are okay, and one table won't offer better performance?
His example of keeping track of LeBron James's Twitter followers is very similar to my use case. In that scenario, I either want who is being followed (metadata) or their full list of followers, no JOINs needed.
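For what it's worth, a single table still covers both of those reads (and adds a third, "everything at once") with one Query each, by putting the metadata item and the member items in the same item collection. A sketch with hypothetical keys, simulating the Query locally:

```python
# One item collection: the org's metadata item plus its user items
# share a partition key and are distinguished by sort-key prefix.
table = [
    {"PK": "ORG#acme", "SK": "METADATA", "name": "Acme Inc."},
    {"PK": "ORG#acme", "SK": "USER#alice"},
    {"PK": "ORG#acme", "SK": "USER#bob"},
]


def query(pk: str, sk_prefix: str = ""):
    """Simulate Query: PK equality plus begins_with on the sort key."""
    return [i for i in table
            if i["PK"] == pk and i["SK"].startswith(sk_prefix)]


org_only = query("ORG#acme", "METADATA")  # just the metadata item
users_only = query("ORG#acme", "USER#")   # just the members
everything = query("ORG#acme")            # both, in one request
```

So two tables work fine for the two separate reads, but the single-table layout costs nothing extra and keeps the combined read available as a single request.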
Also have to think about things like monitoring... You now have to monitor two tables. Two might not be too much, but as the number of tables grows, it can become consequential. Like everything, it's a question of trade-offs: understand what you're gaining and losing with the choice you make.
"Know your access patterns in advance"... so much harder to get right than it seems; it makes DynamoDB development slower and more frustrating to work with. I found much simpler access patterns easier to manage.
Hey Alex, I have a question about the secondary index you mentioned in the ticket-handling example: couldn't you just create USER#USER_ID as the partition key and TICKET#TICKET_ID as the sort key, so we would have an additional record with the user as the partition key and their ticket items as sort keys?
I am a little bit confused. I have a table named my-application-table with these items:
1st item: pk = USER#12345, sk = PROFILE#12345, image = {some_url}, name = Taman
2nd item: pk = USER#123456, sk = PROFILE#123456, image = {some_url}, name = Alex
3rd item: pk = USER#12345, sk = POST#1, postbody = "This is some tutorial"
4th item: pk = POST#1, sk = COMMENT#1, comment = "my own comment", user = 123456
I want to show each comment on a post along with the commenter's image and name, but the comment item doesn't hold the user's image and name. Which approach should I take?
1. Complex data type: I can't use this because the user might change their image or name.
2. Duplication: I can't use this either, for the same reason.
What would be the correct approach for this kind of structure, or should I restructure the table?
You have to allow some data duplication when switching from a SQL to a NoSQL database. So, to handle a user profile change, write a cloud function that propagates the change to all the duplicated places.
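One common shape for that cloud function is a DynamoDB Streams consumer: when a user's profile item is modified, it finds that user's comment items and rewrites the duplicated name/image. A sketch of the planning logic, testable without AWS; the key formats are from the comment above, the stream images are simplified to untyped dicts (real stream records use typed attribute values like `{"S": "..."}`), and `find_items_by_user` stands in for a hypothetical GSI query on the commenter's user ID:

```python
def plan_fanout_updates(record, find_items_by_user):
    """Given a simplified MODIFY stream record for a USER profile item,
    return the update operations needed to refresh the duplicated
    name/image on that user's comment items."""
    new = record["dynamodb"]["NewImage"]
    old = record["dynamodb"]["OldImage"]
    if (new["name"], new["image"]) == (old["name"], old["image"]):
        return []  # nothing that we duplicate actually changed
    user_id = new["pk"].removeprefix("USER#")
    return [
        {"Key": {"pk": item["pk"], "sk": item["sk"]},
         "Set": {"name": new["name"], "image": new["image"]}}
        for item in find_items_by_user(user_id)
    ]
```

A Lambda handler would loop over `event["Records"]`, call this, and issue one `UpdateItem` per planned operation (idempotent, so safe to retry on partial failure).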
By faaaar the best video about single-table modeling in DynamoDB. In comparison, the official AWS videos assume you already know a lot; for those just starting with NoSQL and DynamoDB, the AWS videos are advanced.
For a single table, how do you efficiently handle DynamoDB stream events, and make sure not every write ends up invoking your consumer? Or do we just let all of them hit the stream and filter them out with "--filter-criteria"?
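With a single table, all item types share one stream, so event-source-mapping filter criteria (the `--filter-criteria` option on `aws lambda create-event-source-mapping`) are the usual answer: filtered-out records never invoke the function. A small helper that builds such a document, keeping only records whose sort key starts with a given prefix; the attribute name `sk` is this table's convention, not anything DynamoDB requires:

```python
import json


def stream_filter(sk_prefix: str) -> dict:
    """Build a filter-criteria document that keeps only stream records
    whose sort key begins with sk_prefix (Lambda event filtering
    supports a "prefix" operator on string attributes)."""
    pattern = {"dynamodb": {"Keys": {"sk": {"S": [{"prefix": sk_prefix}]}}}}
    return {"Filters": [{"Pattern": json.dumps(pattern)}]}
```

For example, `json.dumps(stream_filter("POST#"))` gives a document you could pass to `--filter-criteria` so only POST items reach the function.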