Rebooting the LMARV-1 RISC-V project!

Robert Baruch

38K subscribers

17K views

Published: 6 Oct 2024

Comments: 102

@bignope5720 3 years ago

junk-in-the-junk-room caused depression is real, it's so much easier to work on projects once you've fixed that.

@ksbs2036 3 years ago

Yeah, it sucks the energy right out of you by continuously peeling away your attention. Eventually the frustration builds up and I personally fall back onto unproductive social media. ADHD, family medical challenges, and pandemic just aren't that helpful either.

@KingJellyfishII 3 years ago

THAT WAS 2 YEARS AGO? wow it feels like yesterday lol. Glad you're revisiting it!

@xnooga 3 years ago

finally! I was thinking about this project yesterday!

@champstyl 3 years ago

I just watched the OLD series 4 days ago and thought it would be nice to see more. Thanks for your great content.

@johnemory7485 3 years ago

You too? Fortuitous timing.

@ab0tj 3 years ago

Thanks for revisiting this! The LMARV series has been my favorite so far.

@AussieRossco 3 years ago

Yay! Soooo happy to see this, thank you. I have just finished another course in my studies. Have to rewatch the series, I’m getting so much more out of it now.

@GloriaTheFox 3 years ago

It's good to see that this project is returning

@veontube 3 years ago

You could use the surplus address lines of RAM to create bank-switched register file and then use it for fast context switching!
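A minimal Python sketch of the bank-switching idea. It assumes a hypothetical SRAM with 10 address lines. The low 5 lines select one of the 32 registers, and the spare upper lines (`bank_bits`, an illustrative parameter) select a bank. A context switch then only changes the value on the upper lines. This is a behavioral model, not a hardware design.

```python
class BankedRegisterFile:
    """Behavioral model of a register file held in a larger SRAM.

    The low 5 address bits select x0..x31; the spare upper address lines
    (a hypothetical bank_bits of them) select a bank, one per context.
    """

    REG_BITS = 5  # 32 integer registers, as in RV32I

    def __init__(self, bank_bits: int = 5):
        self.bank_bits = bank_bits
        self.sram = [0] * (1 << (self.REG_BITS + bank_bits))
        self.bank = 0  # context currently selected by the upper lines

    def _addr(self, reg: int) -> int:
        # Upper address lines carry the bank, lower lines carry the register number.
        return (self.bank << self.REG_BITS) | (reg & 0x1F)

    def switch_context(self, bank: int) -> None:
        # The whole context switch is one new value on the upper address lines.
        self.bank = bank & ((1 << self.bank_bits) - 1)

    def read(self, reg: int) -> int:
        return 0 if reg == 0 else self.sram[self._addr(reg)]  # x0 always reads 0

    def write(self, reg: int, value: int) -> None:
        if reg != 0:  # writes to x0 are discarded
            self.sram[self._addr(reg)] = value & 0xFFFFFFFF
```

For example, writing x5 in bank 0 and then switching to bank 1 leaves that bank's x5 reading as 0. Switching back to bank 0 restores the original value without copying any registers.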

@flutterwind7686 3 years ago

Thanks for revisiting the RISC-V idea! I was a huge fan of the original project series!

@martinmckee5333 3 years ago

Glad to see you taking another swing at this. It's a fascinating project. I'm absolutely looking forward to this and I love the new approach.

@JG-nm9zk 3 years ago

The risc v project was how I discovered this channel. Excited to see how this goes!

@anothertijuano 3 years ago

Thank God you're back ♥️

@davidprice875 3 years ago

Welcome back Robert. Very keen to see where you go with this as I have just started exploring nMigen to build stuff on a ULX3S ECP5 dev board and RISC-V is the main target.

@BrightBlueJim 3 years ago

"Ben Eater" level clarity of explanations. Subscribed and liked. Also, it's good to see an example of implementing the RISC-V architecture, maybe taking away some of the fear and loathing aspect of it.

@BenjaminWheeler0510 2 years ago

I love this series! Can't wait to watch more even though I'm late to the party

@brendanhansknecht4650 3 years ago

I'm really excited for this! Glad it's continuing.

@Roanokekidstech 3 years ago

Awesome, glad to see it. Exciting to see where it will go.

@GegoXaren 3 years ago

I was just thinking about this the other day... Good to see that this project is back.

@vincei4252 3 years ago

Welcome back, Robert. Good to see you're well.

@steveh4595 3 years ago

Robert, I feel your pain when you mention that you have all of these projects on the back burner and you get depressed looking at all of the equipment to dive into in your "junk room". I feel the same way about software projects I want to do and soooo many computer books (many are 30+ years old b/c I like pre-Internet topics) I want to read. I mentioned on a Discord channel relating to Erlang that all of the people I have come across who either program in Erlang or create compilers on top of the BEAM (i.e., the VM for Erlang) are so smart. One guy replied, "I don't think it's that. I think it's because we are stubborn". THAT'S the key, Robert: be stubborn, be unwavering in your resolve. Do small blocks of concentrated effort. They all add up to achieving your goals.

@kaypope1581 3 years ago

Yay! Good to hear that this series is resuming again .

@kunimitanaka1079 3 years ago

Good to see you pick this back up. Looking forward to more.

@ADR69 3 years ago

This is awesome. Good to see you again. Also, I think most technical people have dealt with that analysis paralysis from the junk room. I know I have.

@jamesrivettcarnac 3 years ago

So happy you are back with RISC v

@niclash 3 years ago

Wow! A Vector Graphic MZ!!! It was the 2nd computer I had access to back in 1980. A lot of fun memories from those days, including hand-typing machine code into the built-in memory monitor to play games...

@hjups 3 years ago

It's great to see you back on this project, and designing it in HDL first is probably a very good idea. I think you may be over-complicating parts of your architecture, though. The first thing to keep in mind is that the instructions require two reads and a write per instruction, not per cycle. That's the programmer's state view / architectural view. The actual implementation depends on the microarchitecture, where you could have a single bidirectional data bus which first reads RS, then reads RT, then writes back RD. If you wanted to, you could even pass the flags and ALU op on that same bus. There's nothing wrong with trying to make every operation a single cycle, but it's not necessarily required. Another way to look at it is that you have a large cycle (1 per instruction) broken down into multiple phases, where each phase puts something else on the bus. For your function units, you may want to look at some of the modern out-of-order CPUs, not to implement something out of order, but to see how they broke down the function units. All OoO CPUs basically have fetch logic -> decoder -> issue bus -> function units, which is very similar to your backplane idea (i.e. the issue bus is a backplane which can issue instructions to function units based on the decoded instruction). From that perspective, though, you probably should not combine the sequencer with the memory access component, and should instead implement a load/store card. If you take the multi-phase idea, then you could dual-purpose the load/store card to do both data R/W and instruction reads. Also, if you compress the backplane down to fewer buses, you could run them in parallel, say X, Control, Addr, Data, Mem Control. Then your load/store card could just bridge the two lanes. Or even compress the memory bus down further so you have X, Control, AD, and Mem Control (in that case, your memory bus would look a lot like PCI, which could be an interesting architecture choice - i.e. implement your memory bus using the PCI PHYs, or even using the PCI standard). To better utilize SRAM for a register file, you could multi-phase that as well. So for example, let's say you run your main phase clock at 1 MHz. Then you can run your internal register file clock at 4 MHz, and read 8 bits at a time. That way you can have a single x8 SRAM which stores your 32x32-bit register words. Though, to make sure you don't have synchronization issues, you would probably be better off running the register clock a little higher so that you are guaranteed to have the correct output regardless of the clock phase offset. I believe that would be +1 cycle, so run the register clock at 5x the phase clock. Hopefully that was helpful.
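A rough Python model of the x8-SRAM register file suggested in the comment above. It assumes little-endian byte order within each register, which is an arbitrary choice. Each 32-bit register occupies four consecutive bytes, and one register access costs four register-clock ticks, one byte per tick. The tick counter is only there to make the 4x clock ratio visible. It does not model the extra margin tick the comment recommends.

```python
class ByteSerialRegisterFile:
    """32 x 32-bit registers packed into one byte-wide (x8) SRAM.

    One 32-bit access is modeled as four register-clock ticks, each moving a
    single byte (least significant byte first, an arbitrary choice).
    """

    TICKS_PER_WORD = 4

    def __init__(self):
        self.sram = bytearray(32 * self.TICKS_PER_WORD)  # 128 bytes total
        self.ticks = 0  # register-clock ticks consumed so far

    def read(self, reg: int) -> int:
        base = reg * self.TICKS_PER_WORD
        value = 0
        for tick in range(self.TICKS_PER_WORD):
            value |= self.sram[base + tick] << (8 * tick)  # one byte per tick
            self.ticks += 1
        return value

    def write(self, reg: int, value: int) -> None:
        base = reg * self.TICKS_PER_WORD
        for tick in range(self.TICKS_PER_WORD):
            if reg != 0:  # x0 stays zero, but the tick is still spent
                self.sram[base + tick] = (value >> (8 * tick)) & 0xFF
            self.ticks += 1
```

With a 1 MHz phase clock, moving one word per phase needs 4 ticks, so a 4 MHz register clock. Two operand reads plus a write-back in one phase would need proportionally more ticks per phase.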

@KaneYork 3 years ago

Yeah, multi-phase control is usually the way to go - some instructions won't fit into a single clock cycle budget, no matter how hard you try!

@hjups 3 years ago

@@KaneYork If you look at one of his newer videos, you'll see that he sort of ran into a problem there, where he has to sequence control signals based on sub-phases, which essentially makes each "cycle" multiple cycles long. E.g. he is using two 6-phase clocks, and may need to increase the phase count to avoid multiple bus drivers.

@martandrmc 3 years ago

It's been a long time coming. I can't wait for more videos, as it seems your small hiatus has ended.

@tylerreeves8026 3 years ago

This video got my like and sub, just wonderful work you're doing!

@roganmurphy6198 3 years ago

I'm excited for the rebooted series

@GeorgeTsiros 3 years ago

you still have the 28S's box. i can respect that. historical calculator. the birth of RPL, if i recall correctly

@TomStorey96 3 years ago

Yay, you're back, and LMARV is back too! This is absolutely my favorite series, very interested to see how this turns out with your new design process. Would it be worth having a "register clock" which is essentially 2x the "system clock"? The register clock gives you two cycles to do your work which translates to a single system clock cycle.

@basvalkema4532 3 years ago

Read on the falling clock edge and write on the rising edge. Or the other way around? Lol, only now did I see 20:00.

@BrightBlueJim 3 years ago

Another somewhat similar approach is how Motorola implemented a two-phase clock for the 6800 series: by having two overlapping clocks (in quadrature), so that any of four phases could be synthesized just by ANDing the two. Robert's use of an overlapping clock specifically for write pulses is probably a good solution, though, since it can be used everywhere else in the system that needs similar setup and hold times. If you use a 2x clock, then you still have to AND the clock with something in order to determine WHICH state you are in, which is the way Intel liked to do it back in the 8-bit days.
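A small Python sketch of the quadrature-clock idea. It assumes two square-wave clocks of equal period, with the second lagging the first by a quarter period. The names `clk_a` and `clk_b` are illustrative, not Motorola's signal names. Each quarter-period state is selected by one 2-input AND of the clocks or their complements, so exactly one phase strobe is high at any time.

```python
def quadrature_clocks(quarter: int) -> tuple[int, int]:
    """Levels of two square waves 90 degrees apart at a given quarter-period.

    clk_a is high in quarters 0-1; clk_b lags it by one quarter (high in 1-2).
    """
    q = quarter % 4
    clk_a = 1 if q in (0, 1) else 0
    clk_b = 1 if q in (1, 2) else 0
    return clk_a, clk_b


def decode_phases(clk_a: int, clk_b: int) -> list[int]:
    """Four one-hot phase strobes, each an AND of the clocks or their inverses."""
    na, nb = 1 - clk_a, 1 - clk_b  # the inverters
    return [
        clk_a & nb,     # phase 0: A high, B low
        clk_a & clk_b,  # phase 1: both high
        na & clk_b,     # phase 2: A low, B high
        na & nb,        # phase 3: both low
    ]


for quarter in range(4):
    print(quarter, decode_phases(*quadrature_clocks(quarter)))
```

Compared with a 2x clock, no extra state flip-flop is needed to tell the halves apart. The two clock levels already encode which quarter you are in.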

@JB52520 3 years ago

Getting kinda depressed and asking "What is the point?" is my specialty!

@RyanThompsonrthomp 3 years ago

Glad you’re back bubba

@vanceshipley5828 3 years ago

You're teasing us with Zork I in the background of the intro. Hope to see the conclusion of that project one day.

@RobertBaruch 3 years ago

nMigen truly unlocks a lot of things. So it's more probable that I'll be dusting that project off eventually.

@Dhruv.Wadhwa 3 years ago

I've just found gold!🔥🔥🔥🔥

@nigelhungerford-symes5059 3 years ago

Great work man

@BobBeatski71 3 years ago

Yay !!!

@jope4009 3 years ago

You _could_ use dual-port memories. But you need an additional MUX for the case of reading and writing the same address. One MUX input is the read output of the memory, one input is the write data. In the case of read_addr == write_addr, switch the write data to the output of the MUX, else use the read data of the memory.
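A minimal Python model of the bypass MUX described above. It assumes a memory with one read port and one write port, where the write only commits at the end of the cycle. Without the MUX, a same-cycle read of the address being written would return the stale value. This is a behavioral sketch, not a timing-accurate one.

```python
class BypassedRegisterFile:
    """Register file built from a memory with one read port and one write port.

    If the read address equals the write address in the same cycle, the extra
    MUX passes the incoming write data straight to the output instead of the
    (stale) memory contents.
    """

    def __init__(self, size: int = 32):
        self.mem = [0] * size

    def cycle(self, read_addr: int, write_addr: int, write_data: int, write_en: bool) -> int:
        mem_out = self.mem[read_addr]                  # what the read port returns
        bypass = write_en and read_addr == write_addr  # the address comparator
        result = write_data if bypass else mem_out     # the extra MUX
        if write_en:
            self.mem[write_addr] = write_data          # write commits at end of cycle
        return result
```

The comparator only matters when a write is actually enabled. With `write_en` low, the memory output passes through unchanged.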

@seriousmarble2561 3 years ago

The idea to use nMigen for simulation of logic ICs is nice :D Maybe you can get an estimate of the timing constraints by synthesizing the design for some platform and use a tool like icetime.

@fredo514 3 years ago

It’s great to see this project alive again! Did you plan for interrupt lines from the IO cards?

@petedavis7970 3 years ago

Been cleaning up my workspace for weeks. A little here and a little there. Though it might not look it to an outsider (say, my wife), I've actually made great strides, but I still have a ways to go. lol.

@beatricemeyers4640 3 years ago

So it looks like you'll have "internal" buses and "external" buses. One thing I might suggest is having a simple Memory Management Unit (MMU) that bridges the "internal" and "external" bus. This could be as simple as an address latch and a data latch. You can treat it like a normal functional unit on the internal bus and have it manage the address and data buses for the "external" cards. The "sequencer" (Control Unit?) can then handle data movement to and from the MMU. You might also want to implement a "cache card" to save round trips to the "external" bus, at least for instruction memory. Regarding "internal" bus movement for data, you could make your operand fetch take multiple cycles, using the same bus for bidirectional data transfers. You would need to add input latches/registers (I think you were using buffers before anyway) to the functional unit cards, but your bus can be much smaller. Instead of 3x32b you can do all of it with 1x32b. This also helps you avoid having duplicate register files just to handle simultaneous reads. You can still use your scheme of reading on the rise and writing on the fall; it just might take 2 cycles instead of 1 to fetch both operands. You might leverage that unused write slot during the first cycle for updating the Program Counter (PC += 4) or some other operation.
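A Python sketch of the single-bus, two-cycle schedule proposed above, for one `ADD rd, rs1, rs2`. It assumes each cycle has a read slot on the rising edge and a write slot on the falling edge. The names `latch_a` and `latch_b` stand in for hypothetical input latches on the ALU card. The idle write slot in cycle 1 is used for PC += 4, as the comment suggests.

```python
def execute_add(regs: list[int], pc: int, rs1: int, rs2: int, rd: int) -> tuple[int, list[str]]:
    """Run one ADD rd, rs1, rs2 over a single shared 32-bit bus in two cycles.

    Each cycle has a read slot (rising edge) and a write slot (falling edge).
    latch_a / latch_b stand in for hypothetical input latches on the ALU card.
    Updates regs in place; returns the new PC and a trace of each bus slot.
    """
    trace = []

    # Cycle 1, read slot: the register card drives rs1; the ALU latches it.
    bus = regs[rs1]
    latch_a = bus
    trace.append(f"c1 read : x{rs1} -> latch_a = {bus}")

    # Cycle 1, write slot: otherwise idle, so use it for PC += 4.
    pc = (pc + 4) & 0xFFFFFFFF
    trace.append(f"c1 write: pc = {pc}")

    # Cycle 2, read slot: the same bus now carries rs2 into the second latch.
    bus = regs[rs2]
    latch_b = bus
    trace.append(f"c2 read : x{rs2} -> latch_b = {bus}")

    # Cycle 2, write slot: the ALU result returns over the bus to rd.
    bus = (latch_a + latch_b) & 0xFFFFFFFF
    if rd != 0:  # x0 is hardwired to zero
        regs[rd] = bus
    trace.append(f"c2 write: x{rd} = {bus}")

    return pc, trace
```

The trade-off matches Robert's reply below. This schedule needs one 32-bit bus and one register file copy, but it spends two cycles where three separate buses would need one.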

@RobertBaruch 3 years ago

Design choices! I did think of multiple ideas, and they all ended up being a compromise between number of cycles and amount of hardware. My line sort of settled on fewer cycles for most instructions.

@donwald3436 3 years ago

Let's do this!!!

@AmauryJacquot 3 years ago

yay ! I was afraid the project was just shelved forever !

@obiwanjacobi 3 years ago

Love it! That sequencer block needs more fleshing out. You know at least you need a Fetch and Decode unit in there somewhere. Are you going to do any pipelining?

@0toleranz 3 years ago

Pipelining wouldn’t work on this simple bus architecture because you have simultaneous incoming and outgoing data streams in addition to the internal register transfers. For pipelining you need a matrix switch fabric that can route your incoming data/instruction to the decoder, transfer the register values to the ALU/shifter/whatever and the result back into the registers, and transfer the last operation's output into the data bus output buffer. Your bus is now an active component in the CPU, not just some lines.

@beautifulmind684 2 years ago

Bro, you're seriously awesome! In English: man, you are amazing!!!!

@PeterCCamilleri 3 years ago

Setting an uninitialized value to 0 can hide propagation of bad data errors. Better to use a crazy value like 0xDEADBEEF used by IBM in their dev tools.
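A minimal Python model of the poison-value suggestion, useful for a simulation testbench rather than real hardware. It assumes a word-addressed memory. Unwritten words read back as 0xDEADBEEF, so a leak of uninitialized data is obvious in a trace. The optional `strict` flag is an illustrative extra, not part of the comment: it turns such a read into an error.

```python
POISON = 0xDEADBEEF  # conspicuous fill pattern for "never written" storage


class PoisonedMemory:
    """Word memory whose unwritten locations read back as a recognizable pattern.

    Reading 0xDEADBEEF in a trace hints that uninitialized data leaked through,
    whereas a default of 0 would look like a perfectly legitimate value.
    """

    def __init__(self, words: int):
        self.data = [POISON] * words
        self.written = [False] * words

    def write(self, addr: int, value: int) -> None:
        self.data[addr] = value & 0xFFFFFFFF
        self.written[addr] = True

    def read(self, addr: int, strict: bool = False) -> int:
        if strict and not self.written[addr]:
            raise ValueError(f"read of uninitialized word at {addr:#x}")
        return self.data[addr]
```

A legitimately stored 0 still reads back as 0 in strict mode, because the check looks at the `written` flags rather than at the data value.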

@esra_erimez 3 years ago

"And... this happened" God, that was brilliant!!!

@kalj7 3 years ago

Care to explain the joke to us lesser beings? :P

@veontube 3 years ago

@@kalj7 2:05

@kalj7 3 years ago

@@veontube Yeah, but is that something I should recognize? Isn't it great to take a joke apart...

@ksbs2036 3 years ago

@@kalj7 yeah, I don't get it either but I'm an olde pharte so I figured it's a modern meme of some sort

@xspager 3 years ago

Pet bug!

@kai990 3 years ago

Your Pet Bug is super cute, think you could make a little video about him/her?

@JonnyRobbie 3 years ago

I have a question: do I understand correctly that you use an FPGA-like environment for the concept and proof of work, and then, after you make sure it all works, you'll produce the cards/hardware directly without an FPGA?

@florianrassl2213 3 years ago

Nice

@chuuni6924 3 years ago

There's no part of the RISC-V specification that requires you to do two reg-reads and one reg-write in the same cycle. With latches on the ALU, you could very well sequence the same operation over several cycles instead. You only need 2R1W registers if you plan to make a tightly pipelined, scalar architecture. I'm certainly all for making that kind of architecture :), but there's no reason that you *need* to do that if it makes the project more complicated than you need it to be.

@RobertBaruch 3 years ago

That's true. I should have said that my assumption was to make enough instructions single-cycle.

@arnauddurand127 3 years ago

14:02 Can someone explain why we need two banks to access two registers at the same time? It makes sense for memories due to single addressing but I don't get it for the flip-flops case.

@robiniddon7582 3 years ago

Because otherwise you would need 32 register flip-flops and 2x32 latches (X latch and Y latch) to send the bits from R to X and/or Y. RS1 can be the same register as RS2. So that's 3x32-bit chips instead of 2x32 if you parallel-write the registers.

@robertmenteer3462 3 years ago

Here’s a thought on reading/writing to the same register: use a 6 bit address, 0xxxxx for reading, 1xxxxx for writing.

@BrightBlueJim 3 years ago

Not sure how this helps. You end up stripping off that high bit anyway, I think.

@robertmenteer3462 3 years ago

@@BrightBlueJim The thought was if you used double ported memory the memory would see different addresses for the x/y bus vs the a bus.

@BrightBlueJim 3 years ago

@@robertmenteer3462 I see. I don't think that addresses the problem, though. The problem is that he needs to read two addresses and write one, all in the same cycle. If the two addresses were just offsets within the same chip, then writing could only be done to one of these at a time, so you'd have to use a second clock pulse to write to the second copy of any given location. By using two memory chips, each write updates both chips (both copies of the same location) in the same clock cycle.
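A behavioral Python sketch of the two-chip scheme just described. Two single-port copies of the register file receive every write in the same cycle. Copy X then serves `rs1` and copy Y serves `rs2`, each through its own address lines. Read/write timing within the cycle is deliberately not modeled. The x0-is-zero rule comes from RISC-V and is not part of the comment.

```python
class DualCopyRegisterFile:
    """2-read/1-write register file made from two single-port memories.

    Every write goes to both copies in the same cycle, so copy X can serve
    rs1 while copy Y independently serves rs2.
    """

    def __init__(self):
        self.copy_x = [0] * 32  # drives the X (rs1) bus
        self.copy_y = [0] * 32  # drives the Y (rs2) bus

    def read(self, rs1: int, rs2: int) -> tuple[int, int]:
        # Separate address lines per chip: no conflict even when rs1 == rs2.
        return self.copy_x[rs1], self.copy_y[rs2]

    def write(self, rd: int, value: int) -> None:
        if rd == 0:
            return  # x0 is hardwired to zero in RISC-V
        value &= 0xFFFFFFFF
        self.copy_x[rd] = value  # one write pulse...
        self.copy_y[rd] = value  # ...updates both chips
```

Because both copies always receive identical writes, they never diverge. The cost is twice the storage in exchange for a second read port.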

@pseudo_goose 3 years ago

At 12:05 you say that RV32E has 32 registers, but that's not right. The base integer instruction set, RV32I, has 32 registers, and RV32E reduces it to 16.

@RobertBaruch 3 years ago

That's correct -- I made a mistake. I meant RV32I.

@thesuit4820 2 years ago

More blinkin-lights. Always more blinkin-lights.

@GabrielDalposso 3 years ago

Doesn’t the RV32E have only 16 GP registers instead of 32? That would make the processor an RV32I instead.

@YellowsourceOrg 3 years ago

Why didn't you consider 2 clock regions? That would make the design much clearer.

@BrightBlueJim 3 years ago

In case you haven't seen it yet, this project is featured in a Hackaday blog post: hackaday.com/2020/11/09/the-logic-chip-risc-v-project-reboots/#comment-6293177. I'm looking forward to the rest of the modules!

@Handskemager 1 year ago

Can you use the register card for floating point too, assuming you have an FP card like the ALU card?

@bennetb01 3 years ago

What shades are those? :)

@RobertBaruch 3 years ago

Lutron

@howardjones543 3 years ago

@18:24 this doesn't sound right. The ALU takes a non-zero amount of time to do its operation... wouldn't the write be on the next clock? (or the falling edge or whatever, but not at the same time)

@tomasz-rozanski 3 years ago

Sorry for off-topic, but you sound just like little Luke in 'The Haunting of Hill House' show.

@fnordipard 1 year ago

omg

@microcolonel 3 years ago

Why wouldn't the memory run instructions like the ALU? It needs rs1 and rs2, and can write to registers too. Maybe you could put the opcode on the bus, and decode it on each card? Also, you said RV32E has 32 registers, but it has 16 registers.
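A Python sketch of the "put the opcode on the bus and decode it on each card" idea. It assumes each card sees the full 32-bit instruction and checks only the 7-bit major opcode in bits 6..0. The opcode values are the RV32I LOAD, STORE, OP-IMM and OP encodings. The `Card` class and the card names are hypothetical.

```python
# RV32I major opcodes (bits 6..0 of every 32-bit instruction).
OP_LOAD, OP_STORE = 0b0000011, 0b0100011
OP_IMM, OP_REG = 0b0010011, 0b0110011


class Card:
    """A hypothetical backplane card that decodes the broadcast instruction itself."""

    def __init__(self, name: str, opcodes: set[int]):
        self.name = name
        self.opcodes = opcodes

    def responds_to(self, instr: int) -> bool:
        return (instr & 0x7F) in self.opcodes  # look only at the major opcode


CARDS = [
    Card("alu", {OP_IMM, OP_REG}),
    Card("memory", {OP_LOAD, OP_STORE}),
]


def active_cards(instr: int) -> list[str]:
    """Names of the cards that would act on this instruction."""
    return [card.name for card in CARDS if card.responds_to(instr)]


print(active_cards(0x002081B3))  # add x3, x1, x2
print(active_cards(0x0000A283))  # lw x5, 0(x1)
```

Here `add x3, x1, x2` selects only the ALU card and `lw x5, 0(x1)` only the memory card. An opcode no card claims, such as LUI, selects nothing, which a real design would have to catch as an unhandled instruction.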

@RobertBaruch 3 years ago

Whoops, you're correct. I meant RV32I.

@RobertBaruch 3 years ago

My doodles led me to conclude that if I didn't want to have each instruction take at least twice as long, I needed separate buses. It'll be a choice.

@mandarbamane4268 3 years ago

2:08 This happened lol

@wmlye1 3 years ago

+1 for "Zed". Canadian here :-)

@404Anymouse 3 years ago

Shouldn't it now be LMARV-2?

@RobertBaruch 3 years ago

Wellllll.... originally I meant LMARV-1 to be the discrete version, with FPGAs slowly replacing each piece (LMARV-2) until the whole thing was just one FPGA (LMARV-n). In reality, I'll never do that.

@crasbee 3 years ago

15:19 RIP headphone users :D

@RobertBaruch 3 years ago

I know, right? :

@johnjosephlonergan 3 years ago

Like the channel! nMigen doesn't support delays? I've been building my own 8-bit CPU called SPAM-1 that has a somewhat similar multi-bus design, superficially perhaps: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-VJgfgP1Q89U.html. I've simulated the components in Icarus Verilog and spent a LOT of time approximating the delays from the datasheets of the 7400 devices. I found this invaluable for spotting glitches or other propagation-delay side effects. How do you plan to discover at least some timing issues prior to committing to hardware? I understand "careful design", but simulation and automated tests have been invaluable for spotting the problems that my care and attention didn't avoid.
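As a small illustration of why modeled delays matter, here is a Python sketch of the classic hazard y = a AND (NOT a). It uses made-up delays (2 ns inverter, 1 ns AND gate), not datasheet values. The commenter's actual setup was Icarus Verilog. Logically y is always 0, but while the inverter output lags, both AND inputs are briefly 1 and a glitch appears. This models pure transport delay. Real gates also filter very short pulses (inertial delay), which this sketch ignores.

```python
def delayed(signal: list[int], delay: int, initial: int) -> list[int]:
    """Output of an ideal transport delay: out[t] = in[t - delay]."""
    return [initial] * delay + signal[: len(signal) - delay]


def simulate(t_inv: int, t_and: int, steps: int = 30, t_edge: int = 10) -> list[int]:
    """y = a AND (NOT a) built from an inverter and an AND gate, 1 ns per step.

    t_inv and t_and are hypothetical gate delays in ns; input a rises at t_edge.
    """
    a = [0 if t < t_edge else 1 for t in range(steps)]
    not_a = delayed([1 - v for v in a], t_inv, initial=1)  # inverter output lags
    and_in = [ai & ni for ai, ni in zip(a, not_a)]         # ideal AND of its inputs
    return delayed(and_in, t_and, initial=0)               # then the AND gate delay


def glitch_width(trace: list[int]) -> int:
    """Number of 1 ns steps the supposedly constant-0 output spent high."""
    return sum(trace)


print(glitch_width(simulate(t_inv=2, t_and=1)))  # a 2 ns glitch
print(glitch_width(simulate(t_inv=0, t_and=1)))  # zero-delay model hides it
```

A zero-delay functional simulation reports a clean output, so a glitch like this only shows up once per-gate delays are modeled.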