Great video! A small point to add: most NoSQL offerings also let users choose a consistency level. If I want to make sure my users always read consistent data, I can choose strong consistency, which means a write is acknowledged only when a quorum of replicas has acknowledged it. This keeps the data consistent even when one of the replicas goes down. The obvious tradeoff is that writes are slower. If availability is preferred over consistency, eventual consistency can be chosen instead: a write is acknowledged as soon as the replica handling it has written it to memory, in the hope that all the other replicas catch up with the write "eventually".
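A toy Python sketch of the quorum-acknowledged write described above. It assumes an in-process model where each replica is just an object that is either up or down. `Replica`, `coordinate_write` and `quorum` are hypothetical names for illustration, not any database's API. Real coordinators send writes in parallel and rely on timeouts rather than an `up` flag.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data; `up` simulates whether the node is reachable."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        if not self.up:
            return False  # a down replica cannot acknowledge
        self.data[key] = value
        return True


def quorum(rf):
    """Majority of the replication factor, e.g. 2 out of 3."""
    return rf // 2 + 1


def coordinate_write(replicas, key, value, required_acks):
    """Send the write to every replica; report success only if enough of them acknowledge."""
    acks = sum(r.write(key, value) for r in replicas)
    return acks >= required_acks
```

With RF = 3 and one replica down, a quorum write (2 acks) still succeeds, requiring all 3 acks fails, and a single-ack "eventual" write succeeds immediately. The failed write is still applied on the live replicas. Nothing is rolled back in this sketch.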
Agree! This is what allows us to aggregate and read fast in NoSQL. I made a mistake in the video by stating that reads are slow. Reads are in fact faster in NoSQL than in a standard RDBMS, as long as consistency requirements are relaxed.
Writes are fast in Cassandra if the consistency level is LOCAL_ONE. If you change it to QUORUM, though, it obviously adds to the latency of each write. It's all about the system requirements at the end of the day. 😃
@gkcs Has anything changed with MongoDB 4.2? Are writes any faster now that we get to keep our consistency? Also, we know that NoSQL databases favor availability over consistency, but with MongoDB 4.2 you can guarantee consistency and still keep availability by scaling across many shards. MongoDB 4.2 (FULLY ACID ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-iuj4Hh5EQvo.html) claims to be "the only database to fully combine the ACID guarantees of traditional relational databases with the speed, flexibility, and power of the document model, and an intelligent distributed systems design to scale-out and place data where you need it." (www.mongodb.com/collateral/mongodb-multi-document-acid-transactions) That raises the question: why should I use an RDBMS over MongoDB, when horizontal scaling is hard with an RDBMS but built into MongoDB 4.2 along with everything it brings? Is MongoDB > RDBMS in 2020, after 4.2?
It is a difficult skill to understand the lower layers of a given technology AND present them in a clear, concise manner that many can grasp. You have this skill, and you are able to present complex stacks in a simple way. This is why being an "instructor" or "presenter" requires skills beyond just knowing the technology really well. Anyway, I appreciate the videos, as they're a wealth of valuable information!
In Cassandra, “strong consistency” is typically expressed as W + R > RF, where W is the write consistency level, R is the read consistency level, and RF is the replication factor.
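A small Python check of that rule, assuming replicas are just numbered 0..RF-1. The hypothetical helper `every_read_overlaps_write` enumerates every possible write set of size W and read set of size R. It confirms the two sets always share at least one replica exactly when W + R > RF, so at least one replica in every read holds the latest write.

```python
from itertools import combinations


def is_strongly_consistent(w, r, rf):
    """The rule from the comment: W + R > RF."""
    return w + r > rf


def every_read_overlaps_write(w, r, rf):
    """Brute force: does every possible read set share a replica with every possible write set?"""
    nodes = range(rf)
    return all(
        set(write_set) & set(read_set)  # non-empty intersection is truthy
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )
```

For RF = 3, QUORUM writes plus QUORUM reads (2 + 2 > 3) always overlap. ONE/ONE (1 + 1) does not, so a read may miss the latest write.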
When a non-DB guy can understand this... nothing could be better! Thanks a ton for your videos. They are one of the things that helped me through the 2020 lockdown.
I was just randomly browsing to learn about NoSQL, and I must say I couldn't move on without watching the full video. I feel confident with the concept now. Thank you Gaurav
Great video Gaurav. You have simplified it so much. I have one doubt though... At 6:20, when you mention the 4th point, you say NoSQL databases are good for metrics/analysis, etc., because it is easier to perform operations like average age, total salary, etc. At 7:30 you say they are not read-optimized, because data has to be read from each blob and then an operation like sum or average performed. I am confused about this part.
This is misleading. Not all NoSQL databases store data as key-value pairs. What you are talking about is only a subset of NoSQL. There are four main types of NoSQL databases: document, key-value, wide-column and graph.
Wow, now I understand many things I had already worked on and faced technical issues with, but never used to get the "why" from my architect's talks. Thank you Gaurav.
Then you were never an actual architect. Most people think they are architects, but it takes time... Unless you have 20 years across multiple industries, companies and environments, you can't truly be an architect. Having worked in MS, Java, web, services, networking, infrastructure, CI/CD, UX and security from every possible angle, I think I have a better understanding. This video is for novices or juniors, not architects.
Hi Gaurav, thanks for this great video and all your other videos. I'm benefiting immensely from them. I'm a Mechanical Engineering graduate with zero CS/dev background, but I currently work as a Technical Writer at one of the top technology giants in the world. I want to transition to product management, and one of the areas I lack is technical design. Your videos are helping me there. Kudos for your great effort. I appreciate every bit of it.
The way I see it, RDBMS and NoSQL databases are chosen based on the requirements or use cases, so while designing any application we need to understand those first, which @GS has done very well. I love your presentation skills, @GS, and I don't mind saying the same on your other videos. Keep it up👍
This is the first video of yours that I've seen, but what an amazing way of explaining, bro. This video is great for someone like me who had absolutely zero idea about NoSQL databases, since I have always worked with relational databases only. Subscribed!
Very informative video. As a follow-up, it would be nice to cover the scenarios or use cases where column, document or graph databases are best suited. What should our cheat sheet be when choosing among databases like Cassandra, MongoDB, Neo4j, Riak, etc.? Just a small note: when we do an update in Cassandra, the entire row is not rewritten; only the updated column(s) are kept in the MemTable and later consolidated into an SSTable.
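A simplified Python model of that column-level update. It assumes each stored row fragment maps a column to a `(timestamp, value)` pair; `merge_row_fragments` is a hypothetical helper. Reads reconcile fragments cell by cell and keep the newest timestamp per column (last write wins), so an UPDATE only needs to write the columns it touched. Timestamp ties are not handled specially here.

```python
def merge_row_fragments(*fragments):
    """Merge partial rows cell by cell, keeping the newest timestamp per column.

    Each fragment maps column name -> (timestamp, value), e.g. one from an older
    SSTable and one from the memtable holding only the columns an UPDATE touched.
    """
    merged = {}
    for fragment in fragments:
        for column, (ts, value) in fragment.items():
            if column not in merged or ts > merged[column][0]:
                merged[column] = (ts, value)
    # Drop the timestamps for the final, reconciled row.
    return {column: value for column, (ts, value) in merged.items()}
```

Because each cell carries its own timestamp, the merge result does not depend on the order in which fragments are read.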
Interesting point. The community has voted for the next topic: Master Slave Architecture. Once it's done, I will conduct another poll. Let's see when we hit graph databases 😁
@gkcs Thanks for the swift response. I haven't found anything that compares, case by case, when to go with document-based/column-based/key-value/graph databases. MySQL has had a JSON data type since 5.7, so storing everything in a single JSON column is supported there too. What would help is picking a few scenarios, doing the data modeling/design with each family (document/column/key-value/graph), and understanding which is best suited for which use case, and why. Please suggest any such article if you come across one. Thanks again.
First off, great video; it must have taken a lot to put together. I have been in this field since 1992, and I have seen it all, mostly: the Crays, RDBMSs, object-oriented DBs, NoSQL DBs and so on. I tell you though, there is a lot you can do in an RDBMS that you can't do with NoSQL. And yes, the reverse is also true. Each has its place and application. You can expect RDBMSs to stay around for another 30 or so years at least! Anyway, great video, keep up the good work.
Hi Gaurav, really nice video. It must have taken a lot of effort. One point: consider altering a table to add a salary column. In SQL it will acquire a lock and slow things down. In NoSQL it will be faster, but the field won't be added to all pre-existing documents. That will break backward compatibility if a user tries to fetch the salary from any pre-existing document. Thanks
Thanks Saurabh! Yes it would, and that's one of the issues with constraint addition here. A check for null with a default value could be added in the application layer perhaps?
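A minimal sketch of that application-layer fallback. It assumes documents arrive as Python dicts and that `0` is an acceptable default salary. Both `read_salary` and `DEFAULT_SALARY` are hypothetical.

```python
DEFAULT_SALARY = 0  # hypothetical default chosen by the application


def read_salary(document, default=DEFAULT_SALARY):
    """Return the salary, falling back to a default for documents written before the field existed.

    An explicit null is treated the same as a missing field.
    """
    salary = document.get("salary")
    return default if salary is None else salary
```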
@gkcs Hi Gaurav, in SQL, the queries in our code would throw ColumnNotFound. I don't know how we fetch in NoSQL; maybe the JSON structure gives us that flexibility.
I think one of the major use cases of NoSQL is fast scanning/fast reads, thanks to the indexing feature that keeps a whole column family together, so I think there is some ambiguity at this particular point.
Love your explanations, but: 1) Joins are made easy in MongoDB; you can use $lookup. Still, since it's NoSQL, it's best to remember the basic reason we chose it in the first place: to avoid complex joins (which take more time) and relationships.
@Gaurav Sen Is the quorum value always computed while reading? If a profile write went to node 5, node 5 crashes, and the next minute the read request goes to node 1, which doesn't have the profile, why would it go to the other replicas to check whether the profile exists? If it queries all nodes anyway, doesn't that defeat the purpose of horizontal scaling? In which cases are all replicas queried and the quorum value computed? Only when the customer has opted for a strongly consistent DB? Or only when a node crashes?
Consistent hashing is used to minimise data movement in a cluster. Databases cannot afford much movement between nodes anyway, so a replication architecture is the way to go.
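A compact Python sketch of a consistent hashing ring. It assumes MD5 as the ring hash and 100 virtual nodes per server, both arbitrary choices. `HashRing` is a hypothetical class, not Cassandra's token allocator. Adding a node only takes over keys whose hashes now land just before one of its positions; every other key stays where it was.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a point on the ring (MD5 here; any stable hash works)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owners = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node gets several positions so load spreads evenly.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = node

    def node_for(self, key):
        """Walk clockwise to the first node position at or after the key's hash."""
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[idx]]
```

Adding a fourth node to three moves roughly a quarter of the keys, and every moved key goes to the new node. With plain `hash(key) % N` placement, going from 3 to 4 servers would relocate most keys.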
Thank you. How do I scale my MySQL database? I am using a 1 vCPU, 2 GB RAM DigitalOcean droplet that runs a Node.js API and a MySQL server. It works fine, but after 10-12 days MySQL queries become slower. How do I scale MySQL? I have 50% of RAM free, and only 3-4% of CPU is in use most of the time.
Hey Gaurav, great video. I have a different question. Do big companies like Facebook take backups of their huge user data? If they do, I'm assuming a backup would put a heavy (read) load on the database. How would they do that while simultaneously serving user requests? My initial assumption is that they don't do the whole backup at once, but rather in parts at different times. Another strategy I can think of is a time-based one: do the backups for a given country during its night, when we can assume the load will be significantly lower, and avoid potential conflicts. Would love to hear your thoughts.....
Hi Gaurav, I have one question, can you please guide me? In real-time big data projects, which NoSQL DB is used most, and which one should we learn first: MongoDB, Cassandra or HBase? I am currently working in ETL testing and want to move into big data testing, so I'm confused about where to start.
Can someone elaborate on the point made at 6:33? Does it mean that calculations are cheaper in NoSQL? Do the datasets have to be pre-processed before being stored in NoSQL?
Question: Do most other NoSQL databases provide aggregate functions? I remember from working with Cosmos DB that I couldn't do GROUP BY. Is it just Azure's Cosmos DB where GROUP BY isn't a thing?
Good explanation. One request: is it possible to change the channel name to something related to technology? It's confusing because it looks like a personal channel rather than a knowledge-sharing one, and you are making more and more tech knowledge videos. Thank you
I know this isn't what this video is really about, but is it a good idea to store "name" as a single value in SQL? I'm a novice programmer and I've been taught to split them up into first and last name as part of the normalization process.
Question. In another video, I saw that NoSQL is used in places where there are a lot of reads. The explanation given was that a join is a costly, computationally expensive operation. I see the opposite in this video. Forgive my ignorance. Could someone explain which is the correct way to go? If both are right, could someone kindly explain the difference?
Gaurav, I have a doubt about quorum. You mentioned two scenarios: 1) RF = 3, quorum = 3: if server 5 fails, even if the data is present on 1 or 2, you said we return a failure. 2) RF = 3, quorum = 2: if server 5 fails and either 1 or 2 has the data, you said we return the data. What exactly is a quorum? From point 1, it seems to be the number of nodes that must agree on a value. From point 2, it seems to be a majority out of the total number of nodes (= the quorum value) that must agree on a value. Please answer my query.
Points 1 and 2 are the same idea. A quorum is like the number of votes required to pass a bill. In point 1, we pass only if we get at least 3 votes; in point 2, we need 2.
Hi, I need a suggestion. I'm using KairosDB and MongoDB in my project. Now I have another requirement to store some particular data in Cassandra. KairosDB is written on top of Cassandra, so will using Cassandra directly disturb my KairosDB setup?
Gaurav, you have a tremendous ability to articulate modern-day computer science concepts. It's great to see someone so young with this charisma and tech flair, which is a rare combo. I have been in software for 20 years, and sadly I was never taught like this; back in the early 2000s there were no YouTube channels like yours. You are redefining online learning with your videos. Keep it up, mate.
Hey, 1. When you say that NoSQL is better for insertion and retrieval, you say that relational DBs are slower because you have to join the two tables with address as a foreign key. In that scenario, we could simply not normalize the table and keep the address in the same table for faster reads. Also, locks are still present in NoSQL implementations because of concurrency. So the two are comparable, and either can be slower than the other depending on how you have decided to structure the data. Of course, retrieving data with a join in SQL will turn out to be slower. 2. Then at 7:58 you say NoSQL is not read-optimized, while in the advantages you mentioned that it's used for aggregations as well. I think it makes sense not to make such comparisons, as NoSQL databases can be implemented in a variety of ways. In-memory databases are also a subset of NoSQL, and they are very fast to read because RAM comes into play. On consistency at scale, one part is the user's choice of strong vs. eventual consistency, but we can't say that relational DBs will be slow because of locks, since locks will still exist in NoSQL implementations because of concurrency. To address that, relational databases have granular locks, so that a lock is placed only on the necessary part. I think the journey to scale any database is pretty complex, and only experience can teach us exactly how to do it. YouTube is running on a relational DB after all, but I am sure it's much more complex, with portions of NoSQL, CDNs, caching and what not flying around in their backend architecture.
Hey Rachit, thanks for the good insights. Here are my thoughts: 1. Deciding that we "shouldn't" normalize a table is difficult. Normalizing offers a logical breakup of the data and ACID properties, and it is the way SQL databases are designed to run. It's true that a lot of systems run their analytics on NoSQL databases. The reason for this is the non-normalized form of the data in these tables. 2. I made a mistake here; you can have a look at the pinned comment for clarifications. Reads and writes are usually faster in NoSQL, because it is rare to take a ton of locks in these tables. Aggregations are faster if you have a columnar database, and that, along with faster read times, contributes to the performance. Implementation decisions influence a lot of how a database performs, but the core ideas on which it runs are very important. NoSQL databases are, by definition, denormalized and expected to take fewer locks. Eventual consistency, quorums and fast updates are selling points of databases like Cassandra. Using them for diametrically opposite goals wouldn't be wise, in my opinion.
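A toy Python illustration of why a columnar layout helps aggregations. It assumes a crude cost model that simply counts how many field values are read. `row_store_avg`, `column_store_avg` and `to_columns` are hypothetical helpers. Real columnar stores also gain from compression and sequential I/O, which this ignores.

```python
def row_store_avg(rows, column):
    """Row/document layout: every record is read in full to pull out one field."""
    fields_touched = 0  # crude cost counter for illustration
    total = 0
    for row in rows:
        fields_touched += len(row)  # the whole record is read to get one value
        total += row[column]
    return total / len(rows), fields_touched


def column_store_avg(columns, column):
    """Columnar layout: only the one column's values are scanned."""
    values = columns[column]
    return sum(values) / len(values), len(values)


def to_columns(rows):
    """Pivot a list of records into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}
```

Both layouts give the same average; the columnar one reads only the values it needs, and the gap grows with the number of fields per record.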
@gkcs I don't get the part at 06:21 about aggregates being faster in NoSQL than in SQL. How is computing the total salary at 06:21 in NoSQL faster than finding the ages of all employees in the company at 7:30? Both require fetching each blob, parsing it, etc., which wouldn't happen with a columnar SQL table. I'd appreciate your reply.
Great video as always :D Just one correction: data is kept in memory in sorted, self-balancing structures (like AVL/red-black trees), and once memory usage passes some threshold (say ~50 KB), the entire memtable (the sorted tree) is dumped into an SSTable on disk, which is efficient because the data is already sorted.
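A minimal Python sketch of that flush. It assumes the threshold is counted in entries rather than bytes and uses a `bisect`-maintained sorted key list as a stand-in for the AVL/red-black tree. `Memtable` is hypothetical; real implementations use concurrent sorted structures and byte-based limits.

```python
import bisect


class Memtable:
    """Toy memtable: keys kept in sorted order, standing in for a balanced tree."""

    def __init__(self, flush_threshold=4):
        self.flush_threshold = flush_threshold  # hypothetical: entries, not bytes
        self._keys = []
        self._values = {}
        self.sstables = []  # each flushed SSTable is an immutable, sorted tuple of (key, value)

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys sorted on every insert
        self._values[key] = value
        if len(self._keys) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the already-sorted contents out in one sequential pass, then start empty."""
        if self._keys:
            self.sstables.append(tuple((k, self._values[k]) for k in self._keys))
        self._keys, self._values = [], {}
```

Keys arrive in any order, but each flushed SSTable comes out sorted without a separate sort step.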
Listen guys... classifying a database as either "SQL" or "NoSQL" is meaningless. The relational data model is pretty well defined, but there are countless non-relational databases, each with its own data model and consistency guarantees. It's meaningless to lump them all into one category and talk about them as if they were all MongoDB.
I am kind of confused about how NoSQL databases can have read time as a disadvantage along with data aggregation as an advantage. Doesn't aggregating data require a lot of reads?
I think the key is database sharding and partitioning. If you don't take this into account, reading across partitions is a very expensive operation. If your database is well designed, then you're OK.
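A small Python sketch of that difference. It assumes rows are placed by hashing a partition key (`user_id`) modulo the shard count. `ShardedTable` and its methods are hypothetical. A lookup by partition key touches one shard, while a filter on any other column has to fan out to all of them.

```python
import zlib


class ShardedTable:
    """Rows are placed on a shard by hashing the partition key (here: user_id)."""

    def __init__(self, num_shards):
        self.shards = [[] for _ in range(num_shards)]

    def _shard_for(self, user_id):
        # crc32 is stable across runs, unlike Python's randomized hash() for strings.
        return zlib.crc32(str(user_id).encode()) % len(self.shards)

    def insert(self, row):
        self.shards[self._shard_for(row["user_id"])].append(row)

    def get_by_user(self, user_id):
        """Single-partition read: returns (rows, shards_contacted) with only one shard consulted."""
        shard = self.shards[self._shard_for(user_id)]
        return [r for r in shard if r["user_id"] == user_id], 1

    def find_by_city(self, city):
        """Not keyed by partition: every shard must be scanned (scatter-gather)."""
        hits = [r for shard in self.shards for r in shard if r["city"] == city]
        return hits, len(self.shards)
```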
@amythpaddy8527 I think what Gaurav means is that most NoSQL databases, like MongoDB, have built-in support for aggregation. Please go through this link; you might find your answer: docs.mongodb.com/manual/core/aggregation-pipeline/
I think read performance can vary a lot between NoSQL databases. To further complicate matters, small reads and large batch reads can have vastly different performance. Data aggregation tends to involve large batch reads, I would assume. Perhaps that is the reason for his statements?
21:50 Two corrections: 1) Cassandra does not store the log file in memory; it stores it on disk. That's how it can recover from failures. 2) Cassandra does not append in memory; it appends to the commit log on disk. In memory, the data is kept in a sorted memtable, and when that reaches a certain limit it is flushed to disk as an SSTable.
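A minimal Python sketch of that write path. It assumes a JSON-lines file as the commit log and a plain dict standing in for the sorted memtable. `Node` is hypothetical, and SSTable flushing and log truncation are left out. Every write is appended to disk before the memtable is updated, so a restarted process can rebuild its memtable by replaying the log.

```python
import json
import os


class Node:
    """Toy write path: append to an on-disk commit log first, then update the in-memory memtable."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}  # dict standing in for the sorted memtable
        self._replay()  # rebuild state that was lost when the process died

    def write(self, key, value):
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"k": key, "v": value}) + "\n")
            log.flush()
            # fsync per write is the strictest option; real databases often batch syncs.
            os.fsync(log.fileno())
        self.memtable[key] = value

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                self.memtable[entry["k"]] = entry["v"]  # later entries overwrite earlier ones
```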
Thanks for the video @Gaurav. Query: if read times are slower in NoSQL, how is it good for aggregations? If I want the average age per city, I still have to go through all the records and the entire blob for each record; that's expensive, right? Can you elaborate on this?
Love your videos dude! I've just watched about 8 of them, and I now use them in the gym because you can pretty much follow along just by listening to you :-) Quick tip: set your camera to manual focus and raise the f-number a little (a smaller aperture keeps more of the scene in focus), as your camera 'hunts' for focus and it's a bit distracting on the eye. Love the quality you're producing, so please keep them coming!
Thanks very much for this great video. I have a question, please answer me: I want to build an Android social media app, but I'm still struggling to choose the type of database. I searched Stack Overflow and found an answer saying I should use a graph database. Is a graph database a great solution for my problem? (English is not my native language)
Your videos are always amazing. Let me know how we can develop programming skills in algorithms and advanced-level coding beyond average programming. I am in BCA 2nd year; I have learnt C and C++ and am now learning DS. Can I go ahead with learning competitive programming? Please suggest the best way to improve coding skills that will really be worthwhile in the future.