I really like and follow your sessions. I found these videos really helpful for building up concepts, but it would be even more helpful if you could provide a practical approach, vendor details, and their alternatives.
Appreciated!! Maybe you could have reduced the time by talking more to the point; it would have been more interesting (just a suggestion ;))
Good video. My 2 cents: the data should be replicated across different Availability Zones rather than just across different Data Centers. It is entirely possible that, due to a geographic catastrophe, two DCs in the same zone are equally susceptible to destruction. This is what most of the top cloud providers do.
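To make that concrete, here is a minimal sketch of zone-aware replica placement, assuming each node carries a zone label; all names here (StorageNode, pickReplicas, the zone strings) are hypothetical illustrations, not any provider's or HDFS's actual placement API:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical node descriptor: each storage node knows which
// availability zone it lives in.
record StorageNode(String id, String zone) {}

class ZoneAwarePlacement {
    // Pick up to `replicas` nodes such that no two share a zone, so a
    // zone-wide catastrophe cannot destroy every copy at once.
    static List<StorageNode> pickReplicas(List<StorageNode> nodes, int replicas) {
        List<StorageNode> chosen = new ArrayList<>();
        Set<String> usedZones = new HashSet<>();
        for (StorageNode node : nodes) {
            if (chosen.size() == replicas) break;
            if (usedZones.add(node.zone())) {   // true only for a zone not used yet
                chosen.add(node);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<StorageNode> cluster = List.of(
            new StorageNode("dc1-a", "us-east-1a"),
            new StorageNode("dc2-a", "us-east-1a"),  // same zone as dc1-a: skipped
            new StorageNode("dc1-b", "us-east-1b"),
            new StorageNode("dc1-c", "us-east-1c"));
        // Prints three replicas, each in a distinct zone.
        System.out.println(pickReplicas(cluster, 3));
    }
}
```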
Thanks for the video. For DFS you have multiple copies of a data file; I understand that when some machines break, it serves as a backup. Okay, that is one benefit, but in terms of efficiency, is it more efficient than having one file system without a backup? How does having multiple copies of a data file make things faster?
Hi, I'd like to ask how we can do the system design of an anti-virus scanner system, upgrading software on a fleet of machines, or a distributed botnet. Are all of these related to the distributed file system logic? How can I find the system design details of the above subjects? Are there any videos of yours that cover them? I looked into your video library but couldn't find a directly related one. Could you please help me? Thank you
In the video, you mentioned that the name node will tell the client which data node to use to upload the file. However, files are stored in chunks across the different data nodes in the cluster. Does the name node also pass along how many chunks need to be created for the given file and which data node to use for each chunk?
The default HDFS block size is 64 MB (128 MB in Hadoop 2.x and later). The block size affects the performance of filesystem operations; larger block sizes are generally more effective for large sequential workloads. You can configure the block size cluster-wide or per file, and the NameNode records the configured block size in each file's metadata.
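For anyone who wants to try this, a minimal sketch using Hadoop's Java client, assuming a reachable HDFS cluster; the path /data/big-file.bin and the sizes are just examples:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Client-side default for files created through this configuration (128 MB).
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024);

        FileSystem fs = FileSystem.get(conf);

        // Per-file override: this create() overload lets the client choose
        // the replication factor and block size for a single file.
        short replication = 3;
        long blockSize = 256L * 1024 * 1024; // 256 MB for this file only
        try (FSDataOutputStream out = fs.create(
                new Path("/data/big-file.bin"),
                true,      // overwrite if the file exists
                4096,      // I/O buffer size
                replication,
                blockSize)) {
            out.writeBytes("hello");
        }
    }
}
```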
@@TechDummiesNarendraL Thanks for the reply. I understand that the block size can be configured, but I am more interested in learning how the file gets split into chunks and how the name node knows about this. Is it the responsibility of the client (the HDFS library) to split the file into chunks and then talk to the data nodes to upload the file?
@@MrSauce714 But who is the source of truth in this case? I would imagine the name node is, but the video is not explicit about this. Not sure why you're talking about a cache here.
@@minostro What I mean is that at runtime the client application would have a chunk-to-node mapping, which could be stored in memory. I may be wrong.
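For what it's worth, in HDFS/GFS-style designs the client library does the splitting, the name node assigns (and remains the source of truth for) block locations, and the client just caches that mapping while it works. A rough sketch of the write path; every name below is hypothetical for illustration, not Hadoop's actual API:

```java
import java.util.List;

// Hypothetical interfaces standing in for the name node RPC and data node transfer.
interface NameNodeClient {
    // The name node picks the data nodes for the next block and records the
    // assignment in its metadata -- it stays the source of truth.
    List<String> allocateBlock(String path, int blockIndex);
}
interface DataNodeClient {
    void writeBlock(String dataNode, byte[] block);
}

class DfsWriter {
    private final NameNodeClient nameNode;
    private final DataNodeClient dataNodes;
    private final long blockSize;

    DfsWriter(NameNodeClient nn, DataNodeClient dn, long blockSize) {
        this.nameNode = nn;
        this.dataNodes = dn;
        this.blockSize = blockSize;
    }

    void upload(String path, byte[] file) {
        int numBlocks = (int) Math.ceil((double) file.length / blockSize);
        for (int i = 0; i < numBlocks; i++) {
            int from = (int) (i * blockSize);
            int to = (int) Math.min(file.length, from + blockSize);
            byte[] block = java.util.Arrays.copyOfRange(file, from, to);

            // 1. Client asks the name node where this block should go.
            List<String> pipeline = nameNode.allocateBlock(path, i);

            // 2. Client streams the block to the assigned data nodes. (In real
            //    HDFS the client writes only to the first node, which forwards
            //    the data down a replication pipeline; flattened here.)
            for (String node : pipeline) {
                dataNodes.writeBlock(node, block);
            }
        }
    }
}
```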
Sir, as you said, if we store our file on Node 1, its replica will be stored on Node 3, right? But if Node 3 crashes, the master/name node will ask Node 1 to replicate this file to Node 2 if space is available. So my question is: if space is full on Node 2, where will the replica of the file be stored?
First paper to read: static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf
Second paper to read: queue.acm.org/detail.cfm?id=1594206