JLCPCB PCB Fab & Assembly from $2! Sign up to Get $60 Coupons: jlcpcb.com/?fr... In this episode we discuss Dynamic RAM and learn about the fundamental challenges that make it slow compared to Static RAM.
It seems your true interest is in electrical/computer engineering, vs. CS - this is a VERY welcome addition to RU-vid, where this kind of clear, concise, well-animated, and perfectly-paced content may already exist for CS, but is essentially nonexistent for CE. Please don't stop :)
I am an electrical engineer, have some knowledge of a few programming and hardware description languages, have been working for many, many years, and am familiar with many educational materials and lectures. I can tell you this much: your way of presenting and showing things is by far the most intuitive and understandable I have ever seen. I am also familiar with the Branch Education videos, which provide an incredible level of detail and make things tangible to the viewer. But your presentation goes so much deeper into the basics that not only newcomers but even experienced people can't help but say FINALLY. I take my hat off to you and your work. The greatest respect! PS: Maybe you could make a video about why NAND flash, the memory in SSDs for example, is slower than DRAM/SRAM. Especially given how well you have described the way SRAM gets its "storing" property when reading, a further video could show why that is not comparable to NAND flash or other non-volatile memory. In my opinion, this would be a good bridge to explaining the last bottleneck (memory) in the chain CPU (cache) -> RAM -> non-volatile memory.
I don't have any actual degrees, but I do have knowledge and understanding of most of these fields, from computer science to software and hardware engineering, and I was thinking the same thing regarding volatile vs non-volatile memory. I'd also be interested in a detailed explanation of atomic operations.
Thanks for such a great, well-explained, well-structured video explaining how DRAM works at the hardware level. I paused and thought many times to let my brain process and understand it thoroughly. Really appreciate your hard work.
When I was writing a piece about Commodore (my background is in economics), I always thought it was weird for Jack Tramiel, the cheapest man in the world, to use SRAM in his first 2 successful home computers. Seeing how complicated DRAM is, and the necessity of DRAM refresh, I now understand why.
@oglothenerd For example, the majority of DDR5 server sticks stay below 5000 MT/s*, while consumer DDR5 quite often runs at 6000 MT/s* and some kits even go above 7000 MT/s*. This of course comes with instability issues (even without the current Intel blunders), so PC build guides recommend keeping speeds modest for professional users. And now I know where that instability comes from! *I use MT/s here, which is likely the correct unit, but the RAM speed units given in specs are a hot mess, so take the unit with a grain of salt. The main point still stands, though.
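The footnote's unit confusion can be made concrete with a little arithmetic. DDR memory transfers data on both clock edges, so a "DDR5-6000" kit runs its I/O clock at 3000 MHz while performing 6000 million transfers per second. The sketch below assumes a 64-bit (8-byte) data path per DIMM (DDR5 splits this into two 32-bit subchannels, but the total width is the same) and ignores ECC bits; the function names are purely illustrative.

```python
# Sketch: relate DDR transfer rate (MT/s), I/O clock (MHz), and peak bandwidth.
# Assumption: a 64-bit (8-byte) data path per DIMM, ECC bits not counted.

def io_clock_mhz(mt_per_s: float) -> float:
    """DDR moves data on both clock edges, so the I/O clock is half the MT/s figure."""
    return mt_per_s / 2

def peak_bandwidth_gb_s(mt_per_s: float, bus_bytes: int = 8) -> float:
    """Peak bandwidth in GB/s (10**9 bytes) = transfers per second * bytes per transfer."""
    return mt_per_s * 1e6 * bus_bytes / 1e9

for rate in (4800, 6000, 7200):
    print(f"DDR5-{rate}: {io_clock_mhz(rate):.0f} MHz I/O clock, "
          f"{peak_bandwidth_gb_s(rate):.1f} GB/s peak per DIMM")
```

These are theoretical peaks; real throughput is lower because of refresh, row activations, and command overhead.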
Manufacturers haven't trusted their capacitors lately, judging by the refresh cycle every ~30 ms (a 32 ms refresh window in DDR5) and the on-die error correction added in DDR5.
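To see what a ~30 ms refresh window means in practice: the controller does not refresh everything at once, but spreads refresh commands evenly across the window. Assuming 8192 REFRESH commands per window (as in common JEDEC DDR4/DDR5 configurations), the average spacing between commands, tREFI, follows directly. This is a back-of-the-envelope sketch, not a timing model of any specific part.

```python
# Sketch: average interval between REFRESH commands (tREFI).
# Assumption: 8192 REF commands per retention window, with a 64 ms window
# (DDR4-style) or a 32 ms window (DDR5-style).

REFRESH_COMMANDS_PER_WINDOW = 8192

def trefi_us(window_ms: float, commands: int = REFRESH_COMMANDS_PER_WINDOW) -> float:
    """Average spacing between REF commands in microseconds."""
    return window_ms * 1000 / commands

print(f"64 ms window: tREFI = {trefi_us(64):.2f} us")
print(f"32 ms window: tREFI = {trefi_us(32):.2f} us")
```

Halving the window doubles how often the controller has to stop and refresh, which is part of why shorter retention costs performance.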
There's a type of memory in between dynamic and static RAM called "lambda memory". It uses a reverse-biased diode as a constant current source and a pair of depletion-mode MOSFETs. It's called lambda memory because the current through it rises then falls with voltage, like the shape of the letter λ. Because of parasitic resistance/leakage, the current actually rises, then falls, then rises again. Due to this, it can store 3 voltage states at constant current (LOW, MID, HIGH). It also has an additional enhancement-mode MOSFET for reading/writing. In total, 1 diode, 2 depletion MOSFETs and 1 enhancement MOSFET give 1 memory cell that has 3 states and uses 7 semiconductor junctions. Compared to a normal static memory cell, which has 2 states and uses at least 12 junctions, it stores about 58% more information (log2 3 ≈ 1.58 bits instead of 1 bit) in roughly half as much space. Quite a bargain. Unfortunately, it is not in use due to the very tight manufacturing tolerances each memory cell requires, since the nonlinear behaviour of the silicon is sensitive to even slight imperfections or doping variations.
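Information does not scale linearly with the number of states: a cell with k distinguishable states holds log2(k) bits. The sketch below redoes the comparison from the comment in bits per cell and bits per junction, taking the commenter's junction counts as given (a MOSFET has two junctions, source-body and drain-body; a diode has one). It is an illustration of the counting, not a claim about any real lambda-memory design.

```python
import math

# Sketch of the information-density comparison from the comment above.
# Junction counts follow the commenter: 2 per MOSFET, 1 per diode.

def bits_per_cell(states: int) -> float:
    """A cell with `states` distinguishable levels stores log2(states) bits."""
    return math.log2(states)

lambda_cell = {"states": 3, "junctions": 1 + 3 * 2}  # 1 diode + 3 MOSFETs = 7
sram_cell = {"states": 2, "junctions": 6 * 2}        # 6T SRAM = 12

for name, cell in (("lambda", lambda_cell), ("6T SRAM", sram_cell)):
    bits = bits_per_cell(cell["states"])
    print(f"{name}: {bits:.3f} bits/cell, {bits / cell['junctions']:.3f} bits/junction")
```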
Very, very well done! I'm also a software engineer, albeit one hiding a logic analyzer and soldering iron behind his back. So, a few comments and some nitpicking. At the level of your video, I think the finer details of newer memory types such as DDR can safely be ignored; those are additional details best left for a closer look. Memory refresh is complicated, and some memory controllers have ample options to configure it. For much, if not most, hardware this is undocumented black magic. This kind of setup is usually performed by firmware during early initialization, right after the CPU itself is ready. Depending on the CPU, the cache SRAMs might contain junk, such as data with invalid parity or ECC, which needs to be cleared before the CPU can perform a cached memory access without blowing itself up. Even an implicit memory access, such as one to the stack, could trigger this, so at this stage subroutine calls are taboo. You'd think hardware would make that easy, but wiring a reset line to everything that needs to be initialized once after reset is something that gets hardware guys rioting and pointing their fingers at the software guys: "you do it" 🙂 Next, memory controllers. The cache may be working, but DRAM still can't be accessed. In older systems that was as simple as writing a few constants into the memory controller. Some systems had to perform strange voodoo to figure out how much memory was actually physically present. More modern systems have a feature known as SPD that allows the system to detect the quantity, type, and speed of memory; software then programs the memory controller accordingly. There is still no stack access, so such code is often an unholy mess of deeply nested C macros. Optimal programming includes the use of features such as interleaving where possible, and many more, so it's not trivial. Once this has been completed, memory may need to be cleared to avoid parity or ECC errors.
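One classic form of the "strange voodoo" for sizing memory is an aliasing probe. If the controller is set up for more memory than is installed, the unused upper address bits are typically ignored, so accesses past the end wrap around to the start. Firmware can write a marker at address 0, then write a different pattern at increasing power-of-two offsets, and stop when the marker at 0 gets overwritten. The sketch below simulates this with a hypothetical `AliasingRAM` model; it assumes the installed size is a power of two and ignores the other checks real firmware would do (for example, verifying that readback works at all).

```python
# Sketch of a memory-sizing probe based on address aliasing.
# Assumptions: installed size is a power of two, and addresses beyond it wrap
# around because the upper address bits are ignored. AliasingRAM is hypothetical.

class AliasingRAM:
    """Hypothetical RAM model: only `installed` bytes exist; addresses wrap."""
    def __init__(self, installed: int):
        self.installed = installed
        self.cells = bytearray(installed)

    def write(self, addr: int, value: int) -> None:
        self.cells[addr % self.installed] = value

    def read(self, addr: int) -> int:
        return self.cells[addr % self.installed]

def probe_size(ram: AliasingRAM, max_size: int) -> int:
    """Return the first power-of-two offset that aliases back to address 0."""
    ram.write(0, 0xA5)              # marker at the base address
    size = 1
    while size < max_size:
        ram.write(size, 0x5A)       # different pattern at offset `size`
        if ram.read(0) != 0xA5:     # base changed, so `size` wrapped to 0
            return size
        size *= 2
    return max_size

print(probe_size(AliasingRAM(4096), 1 << 20))  # detects the 4096 installed bytes
```

The probe is destructive, which is one more reason it runs before anything else lives in RAM.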
After that, sanity arrives: everything else is much simpler now that "normal" programming is possible. Some very old systems are nice in that they don't need any software initialization for their memory controller at all; the hardware is (in hindsight) unsophisticated enough to just know what to do without being told. Finally, caches may not always consist of SRAM. One of the systems I worked with had three levels of cache. The CPU was switched to a different architecture with a different bus, so conversion logic was needed, but that logic slowed down memory access. That was fixed (or kludged, you choose the term) by adding a 64MB L4 DRAM cache. It's the only DRAM cache I know of, but I haven't researched that exhaustively.
Yet another absolutely amazing video! I am so happy you make these videos, because they answer a lot of questions that always bothered me but would take hours or days to research. And the visual aspect helps so much! Do you plan on making a video about clocks and their role in components? They seem crucial for computers, but don't really appear in your videos, to reduce complexity. Still, I'm curious how clocks keep everything running and in sync, so such a video would be amazing!
Nice job. You succeed in simplifying while remaining complete. Continue in this direction... I would like to see more programmers taking an interest in hardware mechanics. It really helps in understanding complexity and improving programs.
This is a very well made video. As an electrical engineering student, I'm sending this channel to all of my classmates for our list of educational RU-vid channels
I love your YT videos. I have always looked for an explanation of how the actual hardware of CPUs works, and I always got these zoomed-out views that never explain how data and code are actually stored in hardware. Thanks!
I felt bad b/c I thought I hadn't subscribed. Realized I had subscribed many videos ago. Good decisions were made. I really hope you keep making these videos. You have a clear talent for it. And I LOVE learning stuff like this. I'd much rather watch this than the brain rot BS others are making. 10/10 channel content
I don't know... but I subscribed to your channel a long time ago and finished all the previous videos... still, I didn't get this one recommended... This channel is seriously, criminally underrated by the RU-vid algos... Your content is truly unique...
16:53 - In earlier computers, each RAM chip handled a single bit of a memory location, and you put several chips together to make up the width of a memory location. Each chip had only half as many address pins as there were address bits, plus strobe pins (RAS & CAS) that indicated whether the value on the address pins at that moment represented a row or a column.
I see why DRAM scales so well! You can basically keep the part with the mux/demux and sense amplifiers and extend the array in the other direction, which only needs a bigger decoder. I wonder whether physical RAM chips use this concept. The memory chips themselves are rectangular, so it is tempting to assume that the mux/demux and sense amplifier part runs along the shorter side.
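The row/column multiplexing described above amounts to splitting the address into two halves and sending them over the same pins one after the other. The sketch assumes a 64K x 1 chip with 16 address bits and 8 shared address pins (in the style of the 4164); which half the system uses as the row is a board-design choice, so treating the high half as the row is an assumption here.

```python
# Sketch: splitting an address into row and column halves for a multiplexed
# DRAM bus. Assumption: a 64K x 1 chip with 16 address bits and 8 shared pins;
# the high half is used as the row address.

ADDRESS_PINS = 8

def split_address(addr: int, pins: int = ADDRESS_PINS) -> tuple[int, int]:
    """Return (row, column): the row is latched on RAS, the column on CAS."""
    mask = (1 << pins) - 1
    row = (addr >> pins) & mask   # driven first, latched when RAS falls
    col = addr & mask             # driven second, latched when CAS falls
    return row, col

row, col = split_address(0xBEEF)
print(f"row=0x{row:02X} (RAS), col=0x{col:02X} (CAS)")
```

Halving the pin count this way is why the chips could fit in small packages, at the cost of a two-step address sequence.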
Simplifying to the essentials to make it understandable to people not involved in designing chips, which is the vast majority of viewers. Great job deciding on what is important to show in detail, and what to show with vague blocks with no internal detail.
This is great. Your explanation was very easy to understand. I wish you had explained the refresh circuitry in more depth; it seems quite difficult to make it work with the existing circuitry you explained before.
Your content is incredible. I did start getting confused around the 8-minute mark. Idk why, but all of a sudden it stopped clicking in my head. Just wanted to provide feedback.
Very nice description of an issue I only knew from the CPU design level. The Z80, for example, has built-in dynamic RAM refresh logic (a refresh address generator).
Please make a video describing all the types of memory, like registers, cache, flash, magnetic disks, RAM, and ROM, and comparing them in terms of cost, speed, etc. It would be very helpful...
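The Z80's refresh support centres on its R register. During each opcode fetch (M1 cycle), the CPU puts R on the lower address lines while it is decoding the instruction and then increments the low 7 bits of R, so successive fetches step through 128 DRAM rows without any external refresh counter. Bit 7 is not touched by the increment (it changes only when software loads R). The sketch models one increment per M1 cycle; prefixed instructions have two M1 cycles and would advance R twice, which is not modelled here.

```python
# Sketch of the Z80 refresh counter (the R register): the low 7 bits wrap
# around after 127, bit 7 is preserved. One call = one M1 (opcode fetch) cycle.

def next_r(r: int) -> int:
    """Increment the low 7 bits of R, leaving bit 7 unchanged."""
    return (r & 0x80) | ((r + 1) & 0x7F)

r = 0x7E
for _ in range(3):
    r = next_r(r)       # 0x7E -> 0x7F -> 0x00 -> 0x01: wraps within 7 bits
print(hex(r))
```

A 7-bit counter covers 128 rows, matching the refresh requirements of the DRAM chips common when the Z80 was introduced.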
As a professional JS hater I really appreciate your hardware-related videos, can you recommend any books or other materials for learning more about electrical engineering?
Since a read operation opens an entire DRAM row (typically a few KB), the memory keeps that row in its row buffer, and computers fetch and cache data in chunks (cache lines, commonly 64 bytes) rather than single bytes. It makes sense to do that since you're already reading the whole row: each extra byte you grab has pretty minimal cost. As far as refresh rates go, iirc 64 ms is the standard interval (halved to 32 ms at high temperatures), but you could go lower, to something like 20-30 ms, if you were really concerned about Rowhammer attacks or similar.
Fun fact: a MOSFET in an integrated circuit is a four-terminal device. The fourth terminal is the bulk (body). In a discrete MOSFET it is usually connected internally to the source, and the more detailed MOSFET schematic symbol makes this connection visible.
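The "you're already reading the whole row" point can be illustrated with a tiny open-page model: if the next access falls in the row that is already open, no new activation is needed. The sketch assumes a single bank, 8 KB rows, and a controller that leaves the last row open; all of these are illustrative simplifications, not the behaviour of any specific memory controller.

```python
import random

# Sketch: row-buffer hit rate under an open-page policy.
# Assumptions (hypothetical): one bank, 8 KB rows, last row stays open.

ROW_BYTES = 8192

def row_hit_rate(addresses, row_bytes: int = ROW_BYTES) -> float:
    open_row = None
    hits = 0
    for addr in addresses:
        row = addr // row_bytes
        if row == open_row:
            hits += 1          # data already in the row buffer: cheap access
        else:
            open_row = row     # row miss: precharge, then activate the new row
    return hits / len(addresses)

N = 100_000
sequential = [i * 64 for i in range(N)]            # consecutive 64-byte lines
rng = random.Random(0)
scattered = [rng.randrange(0, 1 << 30) for _ in range(N)]
print(f"sequential: {row_hit_rate(sequential):.3f}, random: {row_hit_rate(scattered):.3f}")
```

Sequential access reuses each opened row for 128 cache lines, while random access across 1 GiB almost never hits the open row.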
You can get four-terminal FETs in discrete form. Usually the B terminal is used for biasing. You don't see them often, but they are most common in high-frequency usage where the input signal is so difficult to work with that just putting the correct bias on it is difficult - it's sometimes easier to have the bias voltage entirely separate from the signal path.
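What the bulk (B) terminal does electrically is captured by the body effect: raising the source-to-body voltage V_SB raises an NMOS transistor's threshold voltage, Vt = Vt0 + γ(√(2φF + V_SB) − √(2φF)). That is why biasing the B terminal separately is useful. The parameter values below (Vt0, γ, 2φF) are illustrative assumptions, not data for any real device.

```python
import math

# Sketch of the MOSFET body effect for an NMOS device.
# vt0, gamma, and two_phi_f are illustrative assumptions.

def threshold_voltage(v_sb: float, vt0: float = 0.5, gamma: float = 0.4,
                      two_phi_f: float = 0.6) -> float:
    """Vt = Vt0 + gamma * (sqrt(2*phi_F + V_SB) - sqrt(2*phi_F))."""
    return vt0 + gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))

for v_sb in (0.0, 0.5, 1.0, 2.0):
    print(f"V_SB = {v_sb:.1f} V -> Vt = {threshold_voltage(v_sb):.3f} V")
```

Tying the body to the source, as discrete parts usually do, sets V_SB = 0 and removes this dependence.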
Hopefully, that's the next episode. The reason I haven't finished it is because I'm also developing an interactive tool (related to that topic) so you guys can use it in the browser.
GPUs need an entire book, maybe even a couple of books, to explain, primarily because GPUs rely heavily on fixed-function hardware, so you need to explain how every function works and why it is needed.
2:32 The transistor model doesn't actually match the gate model of the static RAM cell: the transistor model is a cross-coupled pair of CMOS inverters with two access transistors, while the gate model is a pair of cross-coupled NAND gates with no separate access path, other than, of course, the second input of each NAND gate.
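For comparison with the transistor-level cell, here is the gate-level model the comment describes: two cross-coupled NAND gates forming an active-low SR latch, where the only way to change the stored bit is through the second input of each gate. The sketch evaluates the feedback loop until the outputs stop changing; it is a logic-level illustration, not a timing model.

```python
# Sketch: active-low SR latch built from two cross-coupled NAND gates.
# s_n and r_n are the "second inputs" of the two gates (active low).

def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1

def sr_latch(s_n: int, r_n: int, q: int, q_n: int) -> tuple[int, int]:
    """Iterate the feedback loop until the outputs settle; returns (Q, Q_n)."""
    for _ in range(4):                 # a few passes are enough to settle
        new_q = nand(s_n, q_n)
        new_q_n = nand(r_n, new_q)
        if (new_q, new_q_n) == (q, q_n):
            break
        q, q_n = new_q, new_q_n
    return q, q_n

state = sr_latch(0, 1, 0, 1)      # pull S_n low: set, Q becomes 1
state = sr_latch(1, 1, *state)    # both inputs high: the latch holds Q = 1
state = sr_latch(1, 0, *state)    # pull R_n low: reset, Q becomes 0
print(state)
```

Unlike the 6T cell, there is no word line or bitline here; reading and writing both happen directly on the gate inputs and outputs.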
One thing I was hoping you'd go over, but it seems you (understandably) didn't, is what makes up the capacitors on the physical die. I know that MOSFETs are said to have parasitic capacitance, so is that what's being used? Or do they have special layers of materials for capacitors, specifically? How big on the die is a capacitor, compared to a transistor? I've seen conflicting answers when I try researching those things on my own. One of the things I remember seeing is a VLSI layout diagram that showed a capacitor being absolutely massive compared to a transistor, which would seem to imply that it should be possible to pack more SRAM into a space than DRAM, but if that were the case then nobody would use DRAM.
Modern DRAM processes use quite complicated techniques to fit as much capacitance as possible into as small an area as possible. They typically build the capacitor either in a deep trench etched into the silicon or as a tall stacked cylinder above the transistor, using a conductive layer, a thin dielectric (oxide) layer, and then another conductive layer. So although the capacitor is quite big compared to the transistor, most of its size is in the vertical direction rather than the horizontal direction.
Very good explanation, but not the reason why DRAM is a bottleneck. The main issues are physical interfaces and process technology. DRAM cells are incompatible with standard CMOS (especially the newer FinFET / GAA nodes), which prevents integrating them close to the computational logic (unlike SRAM). eDRAM was somewhat compatible, but required special considerations, so it had a similar locality issue. If the RAM is not next to the compute, data needs to be moved between the two (i.e. the bottleneck is the bus itself). SRAM caches have a similar issue. For DRAM, this is worsened by moving the memory to an external chip, since the interface then has additional physical requirements (think HBM vs DDR). Granted, the read speed of DRAM arrays is still around 200 MHz compared to 2 GHz for computational elements; however, reads could be sequenced with co-design to provide an effective rate of 2 GHz (the GPU in the PlayStation 2 did something similar through parallelism).
Wouldn't adding a second transistor that acts as an AND gate, requiring both the row and the column to be powered, solve the issue of reading a whole row of bits?
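Why going vertical helps follows from the parallel-plate approximation C = ε0·εr·A/d: capacitance grows with plate area, and a deep trench or tall cylinder adds sidewall area without using more chip footprint. All dimensions and the dielectric constant below are illustrative assumptions, and treating the sidewalls as flat plates is a rough simplification.

```python
# Sketch: planar vs trench/cylinder capacitor with the same footprint,
# using the parallel-plate approximation C = eps0 * eps_r * A / d.
# Dimensions and eps_r are illustrative assumptions.

EPS0 = 8.854e-12  # vacuum permittivity, F/m

def plate_capacitance(area_m2: float, thickness_m: float, eps_r: float) -> float:
    return EPS0 * eps_r * area_m2 / thickness_m

side = 40e-9      # 40 nm x 40 nm opening
depth = 1.5e-6    # 1.5 um deep trench / tall cylinder
t_diel = 5e-9     # 5 nm dielectric
eps_r = 20        # high-k dielectric (assumed)

planar_area = side * side
trench_area = planar_area + 4 * side * depth   # bottom plus four sidewalls

planar_ff = plate_capacitance(planar_area, t_diel, eps_r) * 1e15
trench_ff = plate_capacitance(trench_area, t_diel, eps_r) * 1e15
print(f"planar: {planar_ff:.3f} fF, trench: {trench_ff:.2f} fF, "
      f"gain: {trench_ff / planar_ff:.0f}x for the same footprint")
```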
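The "sequencing reads to hide a slow array" idea is essentially how DDR prefetch works: each internal array access fetches several data words at once, and the I/O streams them out one per transfer. As a worked example, DDR4 uses an 8n prefetch, so a core running at 400 MHz supports DDR4-3200 (3200 MT/s) at the pins. The second call mirrors the comment's 200 MHz array with an assumed 10-way parallelism; the function name is illustrative.

```python
# Sketch: interface transfer rate sustained by a slower DRAM core through
# parallel (prefetched) array accesses.

def interface_rate_mt_s(core_mhz: float, prefetch: int) -> float:
    """Transfers per second at the pins (MT/s) = core rate * words per access."""
    return core_mhz * prefetch

print(interface_rate_mt_s(400, 8))   # DDR4-3200: 400 MHz core, 8n prefetch
print(interface_rate_mt_s(200, 10))  # 200 MHz array, 10-way parallel (assumed)
```

This raises bandwidth but not latency: the first word still waits for the slow array access.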
What happens when there is a cache miss during an instruction such as a load, add, or sub that now has to use slower RAM? And similarly, what happens when it has to access the drive? Is that when a process temporarily goes into D state?
Well, the core has an out-of-order execution unit, so it will execute other instructions in the meantime, or even switch threads. There is always some time wasted waiting; the trick is to do as much useful work as possible while waiting.
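How much a miss costs on average can be estimated with the average memory access time formula, AMAT = hit time + miss rate × miss penalty, applied level by level from DRAM inward. The latencies and miss rates below are illustrative assumptions; out-of-order execution tries to overlap this waiting with other work rather than reduce it.

```python
# Sketch: average memory access time (AMAT) for a multi-level hierarchy.
# Latencies and miss rates are illustrative assumptions.

def amat(levels, memory_ns: float) -> float:
    """levels: list of (hit_time_ns, miss_rate) from L1 outward."""
    time = memory_ns
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time   # cost of this level plus its misses
    return time

caches = [(1.0, 0.05), (4.0, 0.20), (12.0, 0.30)]  # L1, L2, L3
print(f"AMAT with DRAM at 80 ns: {amat(caches, 80.0):.2f} ns")
```

Even a small miss rate matters: every access that goes all the way to DRAM costs dozens of times more than an L1 hit.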
Capacitors are awesome at storing data. DRAM designs just push them to the edge. If you want to pay more for DRAM, you can get some on a better process and with much lower density but very long storage times. It is just not economical nor necessary. Do refresh times affect you personally or your PC experience? Absolutely not. So while I do agree that DRAMs don’t have very long data retention - they absolutely don’t have to. I am using electrolytic capacitors in a relay computer memory. They retain the state for many hours without refresh. You can stop the clock, turn the thing off, later in the day turn it back on and all the memory and register content is retained.
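The difference in retention comes down to the RC time constant. Modelling leakage as a resistor R in parallel with the capacitor C, the voltage decays as V(t) = V0·e^(−t/RC), so it reaches a read threshold Vth after t = RC·ln(V0/Vth). The component values below (a 1000 µF electrolytic with 50 MΩ effective leakage, and a ~20 fF DRAM cell with 10 TΩ) are illustrative assumptions chosen to show orders of magnitude, not measurements.

```python
import math

# Sketch: retention time of a leaking capacitor, t = R * C * ln(V0 / Vth).
# All component values are illustrative assumptions.

def retention_time_s(c_farads: float, r_leak_ohms: float,
                     v0: float, v_threshold: float) -> float:
    return c_farads * r_leak_ohms * math.log(v0 / v_threshold)

relay_mem = retention_time_s(1000e-6, 50e6, v0=12.0, v_threshold=6.0)  # electrolytic
dram_cell = retention_time_s(20e-15, 1e13, v0=1.1, v_threshold=0.55)   # ~20 fF cell

print(f"electrolytic: {relay_mem / 3600:.1f} h, DRAM cell: {dram_cell * 1000:.0f} ms")
```

The tiny cell capacitance, not unusually bad leakage, is what puts DRAM retention in the tens-to-hundreds of milliseconds range.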
Because as soon as the gate of the MOSFET is opened, the capacitor starts to discharge. The voltage across the capacitor is proportional to its charge, so it decreases as it loses charge. In that scenario the bitlines would be outputting a varying voltage. Also, using the bitlines as capacitors means the process doesn't need to completely charge or discharge the cell capacitors, so when they need to be refreshed at the end of the operation, it won't have to wait for a full charge or discharge, which for obvious reasons would take more time than charging or discharging them only "a little bit".
I think we need to remove the ghost from the previous write and also bias the pre amps on the edge. The capacitors are charged if not too old. We just want to know the polarity.
One thing to note is that the bitlines, like every other component, have some capacitance. In fact, when you consider that to reach however many gigabits of capacity a modern RAM chip has, a bitline must cross a huge number of rows, we can see that bitlines are rather large structures, so their capacitance can be presumed to be quite significant. Did I mention that we also want to make the data-storing capacitors as small as possible so we can fit more of them? What this means is that DRAM capacitors are very likely not large enough to fully charge or discharge an entire bitline, not even close. But they don't need to. If we precharge the bitline to right about the threshold voltage at which the sense amplifier switches one way or the other, then just a small change in voltage is enough to tip the balance and read the bit.
Historically, computers were commonly word-based rather than byte-based, with the size of a "word" varying with the architecture (e.g. 16, 24, 32, 48, or 64 bits).
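The charge-sharing step can be quantified. The bitline is precharged to VDD/2, then the cell capacitor Cs is connected to the much larger bitline capacitance Cb. Charge is conserved, so the bitline moves by only ΔV = (V_cell − V_pre)·Cs/(Cs + Cb), a small positive or negative nudge that the sense amplifier amplifies. VDD and the 1:10 capacitance ratio below are illustrative assumptions.

```python
# Sketch of DRAM charge sharing between a cell capacitor and a precharged
# bitline: dV = (V_cell - V_pre) * Cs / (Cs + Cb).
# VDD and the capacitances are illustrative assumptions.

def bitline_swing(v_cell: float, v_pre: float, c_s: float, c_b: float) -> float:
    return (v_cell - v_pre) * c_s / (c_s + c_b)

VDD = 1.1
C_S, C_B = 20e-15, 200e-15   # cell vs bitline capacitance (assumed ~1:10)
V_PRE = VDD / 2              # precharge to the sense amplifier's midpoint

for bit, v_cell in ((1, VDD), (0, 0.0)):
    dv = bitline_swing(v_cell, V_PRE, C_S, C_B)
    print(f"stored {bit}: bitline moves {dv * 1000:+.0f} mV; "
          f"sense amp resolves {'1' if dv > 0 else '0'}")
```

A cell that has leaked part of its charge still pushes the bitline in the right direction, only by less, which is why refresh must happen before the swing becomes too small to sense.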