Finding the Critical Path - Making an 8 Bit pipelined CPU - Part 101 

James Sharman
Views: 14K

Published: 30 Oct 2024

Comments: 177
@weirdboyjim 1 year ago
Join us on Discord: discord.gg/jmf6M3z7XS Follow me on Twitter: twitter.com/WeirdBoyJim Support the channel on Patreon: www.patreon.com/JamesSharman
@johansteenkamp9214 1 year ago
This Critical Path exercise reminds me so much of my FPGA endeavours, where the design simulates perfectly, but fails to run the same on an actual FPGA because of incorrect clocking constraints and critical paths. Seems that no matter what design process one uses for digital circuits, or how modern your tools are, you are always bound to the physics of electrical circuits 🙂 Keep up the good work!
@weirdboyjim 1 year ago
Absolutely! No amount of static analysis will completely replace actual testing!
@WyrdieBeardie 1 year ago
Your videos are always a welcome addition to my feed. Thank you for making these very interesting videos!
@weirdboyjim 1 year ago
Glad you like them!
@Homebrew_CPU 1 year ago
James, nice video - brings back memories. I put in a lot of effort searching for critical paths in my Magic-1 HomebrewCPU back in 2004-2006. Like you, I was mostly interested in validating my design. It didn’t really matter if Magic-1 peaked at 3 MHz or 4 MHz or 5 MHz, but it did matter to me that I wasn’t leaving performance on the table because of a needlessly long critical path. In my case, I found that memory accesses were the first critical path - but realized that with a little redesign I could move some of the address formation into the previous half clock. I had to redo my memory board and some of the microcode, but those changes enabled Magic-1 to go from about 3.5 MHz to the current 4.09 MHz. Once there, though, I identified 3 or 4 new closely occurring paths such that even if I improved the new critical path another would immediately crop up and I wouldn’t really be able to speed up the machine much. So, 4.09 MHz was good enough. As far as methodology, I started with my schematics and data sheets. I guessed at what likely critical paths would be and then added up the worst-case gate delays from the data sheets. Once key paths were identified, I first looked for redesign possibilities to simplify them. Next, I shortened them by substituting key 74LS parts with faster 74F parts. At one point, I thought I’d just make everything 74F - but that was a bad idea. The “F” parts are much noisier and there is no point in using them where they are not needed. Also like you, I found that real-world performance of the devices was generally quite a bit better than the worst-case specs from the data sheets. This exercise was also another in which my logic analyzer was super useful. I wrote some special programs to test paths with the timing module of the analyzer, and for one particular path I even whipped up some special test microcode. Anyway, interesting stuff!
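The data-sheet method described in this comment (guess the likely paths, add up the worst-case gate delays, compare) can be sketched in a few lines of Python. Every stage name and delay below is a made-up placeholder rather than a Magic-1 figure; the sketch only shows that the slowest path sets the minimum clock period.

```python
# Illustrative only: stage names and nanosecond delays are hypothetical
# placeholders, not values from Magic-1 or from any particular data sheet.
CANDIDATE_PATHS = {
    "memory access": [
        ("address register clock-to-output", 30.0),
        ("address adder", 40.0),
        ("memory access time", 150.0),
        ("destination register setup", 20.0),
    ],
    "alu to flags": [
        ("operand register clock-to-output", 30.0),
        ("alu", 60.0),
        ("zero detect", 20.0),
        ("flag register setup", 20.0),
    ],
}


def path_delay_ns(stages):
    """Add up the worst-case delay of every stage along one path."""
    return sum(delay for _, delay in stages)


def critical_path(paths):
    """Return (name, delay_ns) of the slowest path, which limits the clock."""
    name = max(paths, key=lambda n: path_delay_ns(paths[n]))
    return name, path_delay_ns(paths[name])


def max_clock_mhz(period_ns):
    """Convert a minimum clock period in ns to a maximum frequency in MHz."""
    return 1000.0 / period_ns


name, delay = critical_path(CANDIDATE_PATHS)
print(f"critical path: {name}, {delay:.0f} ns, about {max_clock_mhz(delay):.2f} MHz")
```

Swapping one stage for a faster part and re-running shows whether a different path becomes the new limit, which is the "another would immediately crop up" effect mentioned in the comment.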
@weirdboyjim 1 year ago
Thanks Bill! Your project is as always an inspiration! I have a small set of changes I could make that I think would get me about 50% faster but one of the changes would add an extra cycle of latency to my conditional jumps. I'm satisfying myself by adding all my ideas to a future project plan so I can focus on getting this machine finished.
@markrgreenlane 1 year ago
It’s taken over two weeks on and off but I’ve made it through all the CPU build videos, sound and VGA. I hate to think how many hours that must be but I’ve enjoyed every minute. Now to go back to the first one and start again, as I’ve realised I haven’t been giving the early ones a 👍🏻 Thank you James, fantastic content. I’m now tempted to break my fear and start using SMD components as well.
@weirdboyjim 1 year ago
Great to hear you have found the series interesting! Apologies if my video production rate feels slow compared to binge watching the older videos.
@cj09beira 1 year ago
Stay with me in DIP land. We will be stronger together. 😄
@markrgreenlane 1 year ago
That’s quite alright James, we all have work lives as well. I’ve never been one of those who moan at people about getting videos out, as you just end up watching videos done for the sake of it (fillers), and I would rather wait for the good content to drop.
@bobfish7699 1 year ago
Over and above the tracing of the timing of the critical path, it was a great reminder of how all the various components of the CPU link together. Very interesting.
@weirdboyjim 1 year ago
Glad you like it!
@LeeSmith-cf1vo 1 year ago
That's really interesting. Unfortunately the purple line was nearly impossible to see some of the time. I think it could have done with being logarithmically boosted in brightness (so the dim bits are increased a lot while the bright bits are increased only a little). It was also interesting to see the address lines going up and down on the same spot, something you normally only see in those really confusing timing diagrams.
@weirdboyjim 1 year ago
Yeah I appreciate that wasn't always clear. I did what I could when editing but there was only so much. I wish I had thought to swap the yellow / purple traces around as the yellow was far more visible.
@LeeSmith-cf1vo 1 year ago
@weirdboyjim I didn't even think of swapping but yeah, it probably would have helped a fair bit.
@roelandriemens 1 year ago
@weirdboyjim Or perhaps triggering on the other channel could help?
@twobob 1 year ago
@weirdboyjim Simply wear light-enhancing goggles while watching like everyone else... just me then... gets coat
@ligius3 1 year ago
I'm actually having trouble seeing the red lines, the purple ones are quite visible for me. Depending on the video, I just boost the brightness and they are easier to see.
@andre.xuereb 1 year ago
Long-time lurker here. Just came in to say thanks for this interesting series, and kudos on being potentially the only YouTuber with Penrose's Road to Reality in almost every video 🤣
@weirdboyjim 1 year ago
Thanks! And well spotted!
@m1geo 1 year ago
Really interesting, James. Nice to be back on the CPU for a bit :) You'll be at 4 GHz in no time! :) On my FPGA implementation, the ROMs are the issue, too. There are several cases of race conditions where a ROM output feeds a ROM input through some logic, which takes time to settle. That causes some issues. I ended up having to use a faster clock and register the ROM outputs with the fast clock to remove the metastability caused by the feedback.
@weirdboyjim 1 year ago
Thanks George, glad you are still enjoying it! Not sure about your FPGA issue, there are no places in my build where a ROM data output directly feeds a ROM address input. It's always got a 574 d-type in the way.
@m1geo 1 year ago
@weirdboyjim Sorry, yes, I have the same now! I had a bug in an early design that didn't have the latches quite right. It was a task to port the design to Verilog! Working on a V2 now.
@inherentlyflawed 1 year ago
@m1geo Any updates? I'm curious: if you got your hands on the Mandelbrot program weirdboyjim wrote to benchmark his system, exactly how much faster is the FPGA version? Also, what clock speeds can you get it to?
@khatharrmalkavian3306 1 year ago
Awesome episode. I haven't geeked out this hard in quite a while.
@weirdboyjim 1 year ago
Good to hear that you found it interesting!
@Rouverius 1 year ago
Yeah, most of my projects that used glue logic have been simple and didn't require a fast clock. So, it's great to see the effects of propagation delay and how you evaluated them. Seriously, thanks for this.
@weirdboyjim 1 year ago
You're very welcome! It's great to have got the clock rate as fast as this. The VGA circuit actually goes faster but that has the advantage that errors are visual artifacts rather than crashes.
@pipsqueak2009 1 year ago
Fascinating as ever. Thank you for sharing this
@weirdboyjim 1 year ago
Thanks! Glad you are finding it interesting!
@mik310s 1 year ago
Fascinating project. I did something like this back in the 90s but yours is far more complex.
@weirdboyjim 1 year ago
Thanks! Glad you are enjoying it!
@joeysartain6056 1 year ago
Very interesting. Another side to CPU and related circuitry I have never thought about. I have seen some 8 bit systems clocked at 3.7 MHz and wondered why. This may be part of that reason.
@weirdboyjim 1 year ago
Yeah lots of the 8-bit systems fell into that general clock rate range and part of that was the general speed of the logic parts available at the time!
@sjwatt 1 year ago
Hmm, super cool! I can think of a couple of paths I might go down were I in your shoes. 1) Elimination of the zero flag completely; not the best option as you lose the jump-if-equal instruction, but certainly possible. 2) Decoupling the program counter assertion from the fetch, either by having it pre-saturate the address bus at the tail end of the previous instruction or by splitting it into two clock cycles. 3) Adding a prefetch stage to the pipeline that uses a hole in the previous instruction to preemptively fetch the next instruction and hold it in a register, only re-fetching it in the event of a jump. It seems like deliberately delaying the bus control to account for timings is going to be a trip-up at higher speeds… I’d have to give it a lot more thought, but my instincts tell me that giving more cycles to some operations might bear fruit… instructions per cycle might not be the best thing to optimize? I’m not sure. Regardless, I love this thing and I only wish I had the time/resources/creativity to do similar projects. Awesome stuff
@weirdboyjim 1 year ago
You are thinking about the right kind of stuff. I'd latch the flags at the end of the cycle so the decode sees them a cycle behind (but they get eliminated from the critical path).
@peter.stimpel 1 year ago
What a great lesson, thanks a lot.
@weirdboyjim 1 year ago
Glad you liked it! Digging around with the scope and checking assumptions is something everyone should do with their circuits!
@edgeeffect 1 year ago
I'm not saying I haven't learned anything from this channel but most things you've talked about and done, I've at least had an idea beforehand... but I've never even considered what the phrase "critical path analysis" might mean before watching this (something "real" EEs do, that I don't need to bother my pretty little head with?)... thanks for a superb introduction.
@weirdboyjim 1 year ago
That's good to hear. Worrying about what is on/off the critical path is something we do a lot when optimizing game engine code. It's very easy for a less experienced programmer to put a lot of work into optimizing a piece of code and then discover it makes no difference to the frame rate!
@nahkamursu 1 year ago
Happy new year James, hope we see more of you in 2023 :)
@weirdboyjim 1 year ago
Happy New Year!
@mikehibbett3301 1 year ago
Brilliant analysis James. I love the approach.
@weirdboyjim 1 year ago
Thanks Mike! Glad you are still finding it all interesting!
@GeomancerHT 1 year ago
I'm taking this advice for my software microservices architecture, thank you for sharing, I love to watch your videos on the second screen while coding :D
@weirdboyjim 1 year ago
Glad you are enjoying! Hope I'm not too distracting!
@GeomancerHT 1 year ago
@weirdboyjim On the contrary, I'm really loving the series :D
@RelayComputer 1 year ago
From my understanding (possibly oversimplified) of your CPU design, one way to dramatically improve the critical path would be to generate the pipelined control signals all at once, concurrently with the following instruction fetch, and then just latch them until the cycle they are meant to be used in, instead of generating those signals out of the ROMs within their own cycle. I would expect an improvement of maybe one third of the critical path. But of course I may be misunderstanding or overlooking things (?)
@weirdboyjim 1 year ago
No, but you are thinking about the right type of thing. The biggest change I'll make in a future build is designing the pipeline such that it gets the entire cycle for a decode. Having layers of logic form part of the pipeline ROM address input makes for a simpler circuit but adds up badly for high speed. Separating the ROMs into the stages didn't add to the problem (although I did end up with things in stage 2 that would have gone in stage 1 if it wasn't necessary to balance the line counts).
@TheEmbeddedHobbyist 1 year ago
Might have said it before, but in the old days, working with the 8086, the EPROMs and the RAM were too slow and gave us issues with reliability. The workaround was to break the program into odd and even bytes, program two EPROMs, and have two RAM chips, so that they could run at ½ the clock speed. Also, in your case the clock does seem to have quite a long transmission line of undefined impedance, so it may have a few reflections bouncing around at points. This can be where, when you put the scope probe on to see the issue, the impedance changes and the fault hides until the probe is removed.
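The odd/even split mentioned in this comment is easy to picture in code. The Python sketch below only shows how a program image could be divided into two EPROM images and recombined; the function names are hypothetical, and how the two banks were wired and timed on the commenter's 8086 board is not described in the comment.

```python
def split_even_odd(image: bytes):
    """Split a program image into even-address and odd-address byte streams."""
    return image[0::2], image[1::2]


def interleave(even: bytes, odd: bytes) -> bytes:
    """Rebuild the original image by alternating bytes from the two banks."""
    out = bytearray()
    for i, even_byte in enumerate(even):
        out.append(even_byte)
        if i < len(odd):  # an odd-length image has one fewer odd byte
            out.append(odd[i])
    return bytes(out)


# Hypothetical 8-byte "program" standing in for a real binary.
image = bytes([0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17])
even_rom, odd_rom = split_even_odd(image)
print(even_rom.hex(), odd_rom.hex())
```

Each output would be burned to its own chip; with both chips read side by side, each one is asked for a new byte only half as often as a single chip serving every address.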
@weirdboyjim 1 year ago
I've been doing some planning for future builds surrounding the ROMs; having the flag chain as an address input is a nasty limitation. Modifying the design so all the address inputs are resolved at the start of the clock, giving the outputs an entire cycle to settle, is the first big step.
@TheEmbeddedHobbyist 1 year ago
@weirdboyjim I do see as well that you might get lots of other issues while going outside the datasheet spec limits on timings etc. I have always had to work to about 80% of the spec limits, as even going close to them is seen as a bad design idea. I think maybe you should move away from EPROMs in the logic chain and have a look at PALs; these are of the same era.
@lawrencemanning 1 year ago
Jolly interesting analysis. You do indeed need to be very careful giving too much (or any?) weight to observed timings. It’s the datasheets that matter. It would be interesting to see manufacturers’ data on the shape of the distribution curves; they might be wider than we think. Also temperature plays a hand.
@weirdboyjim 1 year ago
It's been suggested that the EEPROMs may be tested and sold based on performance buckets. So the ultra-expensive 70ns parts are the ones that passed the test and the 150ns parts are a bigger window to have a large yield rate.
@lawrencemanning 1 year ago
@weirdboyjim Yup. “Binning” I think it’s called. It would be interesting to buy job lots of 150ns parts, from the same batch and wildly different batches, and see how close to 70ns they get. None of that helps with mass production though when, unless you want to play games like Uncle Clive did, you have to stick to the sheets out of fear that the manufacturer will change some manufacturing parameter, knowing that it will still meet his published spec, as he’s entitled to do.
@OscarSommerbo 1 year ago
It is interesting to see that much of the timing issues are down to using discrete logic (entirely expected) like the 8 input nor-gate not being available isn't really an issue when you bake your own silicon. But I wondered if some modules push the actual output signals through LEDs? I have a vague recollection that this happened early in the build. Sure, the propagation delay will be dwarfed by chip settling time, but there are an awful lot of blinky lights. It is always a good idea to check your assumptions especially when writing code, today we use so much code written by others, and we just assume it is correct and performant.
@kvadratbitter 1 year ago
Interesting. Would you mind expanding a bit, for a “happy amateur”, on the negative effect driving LEDs has on the quality/delay of the signals? Is it negligible or major at these speeds?
@OscarSommerbo 1 year ago
@kvadratbitter Since everything in electronics has a propagation delay, and an LED has an associated resistor in series, we do get "some" delay. It is probably in the sub-nanosecond range, so it is probably negligible on its own, but stack up a few and it starts to matter. And remember we are looking at 250 ns slices, so negligible is relative. Granted, passive components don't have much of a delay (most datasheets don't list it); then it is more about the length of the data path, but again that is probably too small to matter. Stack up 100 LEDs in series and I bet you would start to see some propagation delays. Again, probably not the issue here, and I think James quickly changed to tapping off the signal and using an LED driver to drive the LEDs so that the signal path does not include the LEDs. But some of the boards in the build are rev 1.0 and are fairly old, so the idea popped into my head. As for signal quality, I have no idea how noisy LEDs are, but there seems to be some interference for radio operators. Again, a single one is probably fine but 100 will probably really mess with the S/N ratio. If I had a scope, I would check. You need a square wave, as many LEDs as you can jam into a breadboard, and a scope to check the signal.
@weirdboyjim 1 year ago
The LEDs are not in series so they don't delay the signal like that. What they can do, though, is load the line and slow down the rise time of signals.
@OscarSommerbo 1 year ago
@weirdboyjim Thinking back, I seem to remember an early video where the LEDs were causing issues, and that was when you started using driver chips, but I can be entirely wrong. Since you built it, your knowledge of it supersedes anyone else's. But as kvadratbitter pointed out, the delay would be very small indeed if the LEDs were in the data path, but they aren't, so I was wrong. But at least we fed the algorithm some interaction.
@damouze 1 year ago
A very interesting video! I think 4MHz is already quite an achievement in and of itself, but a bit of optimization never hurts. Two questions popped up in my head while watching, one of which you already answered I think, namely the one about the speed of the ROM and RAM chips. The other one intrigues me. Watching all this and trying to follow your train of thought made me wonder if there is actually something in the microcode itself that could be optimized. Since pipeline stage 1 is smack dab in the middle of the critical path, is there logic in the microcode that could be sped up, or exchanged with logic in pipeline stage 2 to take things out of the critical path that do not need to be there?
@weirdboyjim 1 year ago
Thanks! Glad you are finding it interesting! The logic in the pipeline stages is handled by pretty much using the ROMs as a big lookup table to replace what would be a complex set of logic. I don't think there is anything you could change there unless you had some extremely specialized knowledge about the internal layout of the ROM chips and what address changes would be handled quicker. I would very much be looking at removing any inputs that aren't resolved at the start of the cycle.
@JohnDlugosz 1 year ago
Don't forget to add it to the playlist!
@weirdboyjim 1 year ago
Noted. I've been meaning to ask if you were happy with the Mandelbrot benchmark; you were the first to suggest it.
@Zadster 1 year ago
Lots of interesting points in there, especially surprising to see the Zero line delayed so much. I guess a 74HC4078 or similar is going to solve that. I do wonder if the physical and electrical size of the main bus is causing slow slew rates because of fan-out (drive current) limitations and line capacitance / inductance + load resistance. Rather than use a fixed time delay, you might find it more reliable to derive the main clock from a higher frequency, and use a divider to generate a second phase delayed by 90-degrees, like the 6502 uses.
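One common way to get a second phase delayed by a quarter period from a faster clock is a 2-bit Johnson counter, which divides the fast clock by four. The Python sketch below simulates that idea only; it is an assumption about what the commenter has in mind, not a description of the 6502's clock circuit or of anything in this build.

```python
def quadrature_clocks(fast_ticks: int):
    """Simulate a 2-bit Johnson counter driven by a fast clock.

    Returns two lists sampled after every fast-clock edge: phase_a and
    phase_b. Both run at one quarter of the fast clock, and phase_b lags
    phase_a by one fast tick, i.e. 90 degrees of the divided clock.
    """
    q0 = q1 = 0
    phase_a, phase_b = [], []
    for _ in range(fast_ticks):
        # Both flip-flops update on the same edge: q0 takes NOT q1, q1 takes old q0.
        q0, q1 = 1 - q1, q0
        phase_a.append(q0)
        phase_b.append(q1)
    return phase_a, phase_b


a, b = quadrature_clocks(8)
print("phase A:", a)
print("phase B:", b)
```

Because the delay is a counted fraction of the clock rather than a fixed number of nanoseconds, it scales automatically if the oscillator is changed.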
@weirdboyjim 1 year ago
The zero calculation is only about 20ns. Sure, you can make it faster but you will never make it zero. It would be better to move the dependencies around so it was no longer a constraint.
@KelmarFiresun 1 year ago
One thing I found when I went looking for multiple input nor gates was it seemed to be difficult to find them. Worse they seemed painfully slow. My plan for my design is to use some diodes and a single inverter. For the pipeline ROMs, (and maybe this is considered cheating by your design goals), I considered using some PLAs. The other ideas I had seen were to use some RAM chips that are preloaded with the contents of your lookup ROMs during the reset phase of your CPU. I look forward to seeing how you handle these challenges.
@aaronjamt 1 year ago
Using RAM for the lookup would be very interesting, especially if it could be self-modified. Could make a crazy encryption algorithm that modifies what each instruction _does_ in addition to modifying its own code... it goes past "self-modifying code" and becomes "self-modifying code on a self-modifying architecture" Edit: could also be used to implement new instructions for super specific instructions. If you're trying to run a game or something and you need fast transfers between a specific set of registers, for instance, you could make your program overwrite the instruction table and replace an instruction you don't need with the very specific one you do
@weirdboyjim 1 year ago
Lots of people have been talking about replacing the ROM chips with RAM chips but it would take a bunch of extra components to do, and it's worth asking yourself if you can make an improvement elsewhere with fewer parts. The point of analyzing the critical path is to make it shorter, which can be much more than just asking "Can I make this one thing faster?".
@aaronjamt 1 year ago
@weirdboyjim Makes sense and I understand that that would increase the project scope. However, I wonder how hard it would be to create a daughter board that would plug into the 4 ROM slots, have RAM chips on it and somewhere to put the original ROMs, and connect to the main data/address bus. It would probably be similar to the temporary ROM-in-RAM board you built for program code while testing it, except for 4 chips at once. Maybe if/when you're running out of ideas for how to keep the series going that could be another option ;) I think there's actually a chance that it could be useful to have a self-modifying instruction set, not to mention the fact that this would be (AFAIK) the only architecture with the ability for a program to not only modify itself at runtime, but modify the actual CPU architecture.
@KelmarFiresun 1 year ago
@aaronjamt Could be, but as James points out, there's a trade-off with the complexity of the circuit. This is one of the reasons why I was considering PLAs for my design, which I think most purists would shun; but they are fast and almost identical to how ROMs behave.
@aaronjamt 1 year ago
@KelmarFiresun Yeah, that makes sense. Like I said, I just think it could be a cool idea if/when he runs out of things to add to it.
@jonnoMoto 1 year ago
Just realised my dog finds your videos interesting. Think it might be the blinking LEDs
@weirdboyjim 1 year ago
Everyone likes the blinking lights!
@janhofmann3499 1 year ago
So your CPU is now officially overclocked👍
@weirdboyjim 1 year ago
Apparently so!
@RelayComputer 1 year ago
The Zero flag delay is always a surprise when you design a CPU like this. If you are aware of this problem from the very start in your design, you can optimise the instruction set in a way that you move the condition to be tested to the compare instruction, instead of the conditional jump instruction. This ends with one single flag, that can be computed early in the NEXT (pipelined) instruction cycle, thus avoiding completely the overhead of the Zero flag, which otherwise must be computed at the end of the current cycle.
@weirdboyjim 1 year ago
When I started this build I wasn't thinking about it running at anything like this clock-rate. The main difference I would make to this bit of circuit is put the ALU->Pipeline flags connection through a d-type so they only changed at the start of the next cycle.
@RelayComputer 1 year ago
@weirdboyjim Not sure if I understand what you mean... but you would still have to compute the Zero Flag after the ALU result in the same cycle, right? (I do not mean to suggest a radical redesign of your processor, just pointing out a solution that worked for me)
@weirdboyjim 1 year ago
@RelayComputer The idea is to only use it (outside the ALU) a cycle behind, so we don't need to speed up the generation but the decode will always have a stable value from the start of a cycle.
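The "use it a cycle behind" idea from this reply can be modelled with a single register between the ALU and the decode logic. The Python sketch below is a behavioural toy with hypothetical names, not the actual circuit: the zero flag is computed during a cycle, captured at the clock edge, and only seen by decode in the following cycle.

```python
class DFlipFlop:
    """One-bit register: the output only changes when clock() is called."""

    def __init__(self, initial: bool = False):
        self.q = initial

    def clock(self, d: bool) -> None:
        self.q = d


def run(alu_results):
    """Return the zero-flag value the decode logic sees in each cycle."""
    zero_reg = DFlipFlop()
    seen_by_decode = []
    for result in alu_results:
        # Start of cycle: decode reads the registered flag, stable all cycle.
        seen_by_decode.append(zero_reg.q)
        # During the cycle: the ALU result settles and zero detect follows it.
        zero_now = (result & 0xFF) == 0
        # End of cycle: the clock edge captures the flag for the next cycle.
        zero_reg.clock(zero_now)
    return seen_by_decode


print(run([5, 0, 3, 0]))
```

The flag generation is no faster than before, but it no longer sits in front of the decode lookup within the same cycle; the cost is that a conditional jump sees the flag one cycle later.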
@ligius3 1 year ago
Very nice delay visualisations.
@weirdboyjim 1 year ago
Glad you like them!
@nolan412 1 year ago
This CPU doubles for lights for the season.
@nolan412 1 year ago
Got playing w/ FM radio transmission using a GPIO pin and modulating the cpu clock on a Raspberry Pi...how noisy is 4 MHz at this table top scale?
@weirdboyjim 1 year ago
Hmm, I wonder how deliberate a pattern I can make them flash while playing a tune.
@nolan412 1 year ago
@weirdboyjim Zero or fill every register after every op. 🤔
@nolan412 1 year ago
In the video there's one LED strip that looks like bits shifting. That would look better than on/off.
@sirnukesalot24 1 year ago
As the clock speed rises, we also have to be on the lookout for weird antenna effects in the traces (standing wave ratios, reflections and such). The microwave signal engineers point out that it's less the clock speed itself, but rather the rise/fall time within the signal that becomes the basis for a virtual frequency that needs to be considered as well. You could justify a collaboration with Phil's Lab for a "Design in Review Playlist", either for this build or for the next. If you were to have this or the next build translated to IC (whether that's realistic or not), would you want it to be a system on a chip, or a classic chipset? Just looking into it could be a great excuse for an episode, even if you don't actually do it. Are interrupts actually needed for peripherals or is it possible to do I/O exclusively through sets of dedicated registers and code? Also, would that prevent anything ubiquitous from being accessed by such a system?
@weirdboyjim 1 year ago
Interrupts are never needed, but polling status slowly adds complexity to code so wanting interrupts develops as a way to manage that complexity rather than as a "Interrupts is the one true way to handle X" way people often talk about it these days.
@sirnukesalot24 1 year ago
@weirdboyjim So it just depends on what you're doing. Thanks.
@JohnnyWednesday 1 year ago
It's fascinating to see how it could be made to clock higher - but actually doing so wouldn't be as interesting as adding to the capabilities. Personally? I'd be delighted to see some form of demodulation from audio - preferably an old tape deck ;)
@weirdboyjim 1 year ago
Glad you are finding it interesting. I'm very focused on finishing this build, gathering learnings and ideas for a future build rather than getting distracted. Ahh man, audio tape storage. I could touch on that for curiosity's sake but it would never be my main storage mechanism.
@JohnnyWednesday 1 year ago
@weirdboyjim - I look forward to anything you care to share with us! It's an intellectual treat to follow the thought processes of a genius :P And yes - certainly not preferred storage but given tape's prevalence historically (I'm thinking more home-micros here in the UK) it does feel rather fitting - not to mention interesting! If a 56k modem is near one end - and a ZX Spectrum near the other - what's the simplest possible circuit and how much further did the average home-micro take it? Happy New Year! :)
@janfrederick0061 1 year ago
Very interesting. I wonder if exchanging the 28xx EEPROMs for 27xx EPROMs would give you enough headroom for the bottleneck to be moved to the zero flag / bus control 🤔. I think the 2732 is available with 50ns access time.
@weirdboyjim 1 year ago
Earlier in the project I planned to try exactly that, but the 27's are all on back order at the main suppliers due to the chip shortages.
@frazer26 1 year ago
I should have some 2732 EPROMs I can send. We use them in old arcade machines, will have to check after Christmas but if I do I don’t mind sending you some
@cj09beira 1 year ago
@frazer26 Join the Discord and contact him there.
@andrewlindh5047 1 year ago
Or go MUCH faster with a WS57C43C-25 EPROM chip.
@Zer0ji 1 year ago
I spent most of the video wondering if changing the clock could help the problem. After watching your analysis of the fetch cycle my idea seems superfluous, but what if you had a clock say 150ns high / 100ns low, or something like that? Also do you think a "more square" signal might help?
@weirdboyjim 1 year ago
In this specific case an imbalanced clock might help but I would start to suspect that it's symptomatic of something not being quite right.
@alejandroalzatesanchez 1 year ago
Normie title: OVERCLOCKING THE PC - HOMEMADE 8 BIT COMPUTER - Part 101
@weirdboyjim 1 year ago
Maybe find some way to describe it as a "lifehack".
@Philip8888888 1 year ago
Very interesting. Do you have an idea of how much faster you could get it if you implemented the changes you mentioned?
@weirdboyjim 1 year ago
I think a few tweaks would get you about 50% faster but I'm more interested working out better design principles for future builds.
@RandomUser311 1 year ago
If the next one is CMOS based NOR-Flash might be an interesting alternative.
@weirdboyjim 1 year ago
Yeah, NOR-Flash really is the modern tech for this kind of thing. You used to be able to get really fast UV EPROMs that would have been a drop-in here but they are tough to get now.
@WyrdieBeardie 1 year ago
Are you going to keep up this or a similar video series in the future? Please?
@weirdboyjim 1 year ago
This one still has a few videos to go but yes there will be new series in the future!
@thomasa5722 1 year ago
Will you eventually share the schematic of the main CPU board? Would love to play around with it.
@weirdboyjim 1 year ago
Most of it is already shared on my EasyEDA profile.
@thomasa5722 1 year ago
@weirdboyjim Ah, thank you. I will check that out. Thank you for your amazing work.
@CharlesVanNoland 1 year ago
When we programmers are looking for what limits our project's performance, we are "profiling" our code and its execution to figure out where the slowest parts are, the ones that are serving as performance "bottlenecks". It looks like you're profiling your CPU's component parts to find its performance bottleneck! :D
@weirdboyjim 1 year ago
Fine tuning is a key part of all serious engineering. [Norman Osborn impression] "You know, I'm something of a programmer myself."
@renaissanceman5847
@renaissanceman5847 10 месяцев назад
I'm just finishing up my 8 bit CPU build and have discovered that the EPROMs are the biggest hindrance to CPU speed. Typical EEPROMs have an access time (tACC) of about 150ns... this is the time it takes for the data output to become valid after the address inputs change. So if that's the case, how are you able to achieve 4 MHz with EEPROMs? Then comes the issue that most 74 series chips have a propagation delay of about 25ns on average for LS (some logic chips like the ACT series are as fast as 10ns but have horrendous ringing in these types of builds).
@weirdboyjim
@weirdboyjim 10 месяцев назад
Ok, this is a really tough comment to reply to as I'm not sure how much of my content you have watched. My pipeline ROMs are indeed 150ns EEPROMs. For 4 MHz we have 250ns to play with in a cycle, which gives us 100ns over the raw lookup time. The simplest overview I can give of why it works is that I don't do the lookup and then use the outputs in the same cycle, which would create a long critical path. I do the lookup and then store the data in a 574 for use at the start of the next cycle; this is the essence of pipelining.
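The arithmetic in that reply can be checked with a few lines of Python. This is only a rough sketch: the 4 MHz clock and the 150ns ROM access time come from the comment above, while the function names and the 120ns "downstream logic" figure are made up for illustration and are not measurements from the build.

```python
def cycle_time_ns(clock_mhz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def slack_ns(clock_mhz: float, path_delay_ns: float) -> float:
    """Time left in the cycle after the given path delay (negative means too slow)."""
    return cycle_time_ns(clock_mhz) - path_delay_ns


ROM_ACCESS_NS = 150.0        # from the comment: 150ns EEPROMs
DOWNSTREAM_LOGIC_NS = 120.0  # hypothetical delay of logic that consumes the ROM outputs

# Pipelined: the ROM outputs are latched in a register, so the ROM lookup
# is the whole path for this stage.
pipelined_slack = slack_ns(4.0, ROM_ACCESS_NS)

# Not pipelined: lookup and use happen in the same cycle, so the delays add up.
unpipelined_slack = slack_ns(4.0, ROM_ACCESS_NS + DOWNSTREAM_LOGIC_NS)

print(cycle_time_ns(4.0))   # 250.0
print(pipelined_slack)      # 100.0
print(unpipelined_slack)    # -20.0, this path would not fit in a 250ns cycle
```

With the register between stages, each stage only has to fit its own delay into the cycle, which is why the 150ns lookup is not a problem at 4 MHz on its own.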
@kirknelson156
@kirknelson156 Год назад
Very cool, but it left me thinking just how fast a processor it would be if you built the bulk of it in an FPGA. I realize that isn't the purpose of this build, and I love how you built out circuits using basic logic ICs. I have no experience with FPGAs; they are well above my skill level. I know that some people have replicated entire CPUs in them, maybe a future project for you??? Not that it would be as interesting to watch as this build. I tried watching a video that was an introduction to programming FPGAs and I did not get very far before falling asleep.
@weirdboyjim
@weirdboyjim Год назад
Absolutely, on an FPGA it would be possible to approximate this circuit at a vastly higher clock rate. This build is very much about research / education / learning. Can't have LEDs on the control lines inside an FPGA!
@youhackforme
@youhackforme Год назад
@@weirdboyjim from experience, my non pipelined CPU that I built in an intro course hit 10 MHz first try with no effort so I'm sure that's one of the lowest bounds on performance haha
@contentnation
@contentnation Год назад
Have you thought about using static RAM for the pipeline stage lookups? Yes, you need to add a circuit to copy from EEPROM on reset or power on, but it might give a huge boost in latency. I assume modern CPUs with their microcode update while running need to do something similar. EEPROM/flash access is too slow; no matter what tech you use, RAM will be faster.
@weirdboyjim
@weirdboyjim Год назад
It would indeed be much faster but the copy logic would be adding quite a bit to the pipeline stage circuits. You would probably want to merge those stages so you could share the counters etc. Personally I'm more interested in maximizing the utilization of the components I have. If you solve all the other issues, fairly cheap flash chips would work to several times the current clock rate, so solving the challenges to make that happen is interesting. Especially since the same changes would let you get still more performance out of faster memory technologies.
@contentnation
@contentnation Год назад
@@weirdboyjim It's all about the purpose you choose, and the battles you pick. I'm just throwing ideas into the room, keyboard warrior style. (OK, got at least some experience to back it up...) Using your critical path approach tells you your weak points. If the EEPROMs become your limit and this can be solved by faster chips: go for it :) Btw, I'm curious if there are off-the-shelf hybrid RAM/ROM chips that do the copy on power up or writing. Or if they leave it to the system designer to do this when needed, like in the early BIOS days where they reserved precious RAM for this.
@reinoud6377
@reinoud6377 Год назад
There are hybrids of course, like Flash or NVRAM (battery backed up SRAM). Flash seems to be suitable, but I agree with James that overall critical paths are also important. I liked the idea suggested in these threads to pre-saturate the address bus for instruction/data loads at the end of the former cycle. That could cut out a chunk from the load cycle. The Zero flag could well be done with the diodes and an inverter, yes. Might be a lot faster and easier than cascading NORs with an AND.
@stevelupton2533
@stevelupton2533 Год назад
Would it be a possibility to use SRAM instead of the pipeline ROMs - obviously would need bootstrapping somehow...
@weirdboyjim
@weirdboyjim Год назад
You could put a shadow ram system in but it would be a whole stack of extra circuitry. You would need to be able to cycle all the addresses to write the ram on power up which would either need multiplexers or some clever overriding of the input circuitry upstream.
@m1geo
@m1geo Год назад
Have you thought about changing the ROM chips to actual logic? Should be able to get that to work in much less than 150ns. Just wondering, now the CPU is pretty fixed.
@weirdboyjim
@weirdboyjim Год назад
I thought about it, and I’ve thought about all the things I can change that would make it easier. I don’t think I will though, as I could easily get into a perpetual cycle of refining this thing. My plans for a 2nd generation, while not ROM-less, do implement a number of the changes designed to make decoding without ROMs easier. The next build will have fewer ROMs in that role.
@m1geo
@m1geo Год назад
@@weirdboyjim Excellent. I hacked your ROMs about a little bit to save space.
@weirdboyjim
@weirdboyjim Год назад
@@m1geo If I wanted to reduce the ROM complexity I would first look at all the different assert lines. In the data they have explicit values but most of the time they are actually “don’t care”. For example, LHS and RHS select only matter when ALU Op!=0, Xfer assert only matters when Xfer load is not zero, etc. That would let you reduce the data right down, and I expect you could reorganise the instructions such that some of them can be directly derived from the incoming bits.
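A rough Python sketch of the "don't care" idea follows. The field names are taken from the comment above, but the field values, the encoding, and the sample control words are invented for illustration and do not reflect the real ROM contents.

```python
from typing import NamedTuple


class ControlWord(NamedTuple):
    """Hypothetical subset of one pipeline control word."""
    alu_op: int
    lhs_select: int
    rhs_select: int
    xfer_load: int
    xfer_assert: int


def canonicalize(word: ControlWord) -> ControlWord:
    """Force fields that cannot affect the machine to a fixed value (0)."""
    # LHS/RHS select only matter when the ALU is doing something.
    lhs, rhs = (word.lhs_select, word.rhs_select) if word.alu_op != 0 else (0, 0)
    # Xfer assert only matters when something loads from the transfer bus.
    xfer_assert = word.xfer_assert if word.xfer_load != 0 else 0
    return ControlWord(word.alu_op, lhs, rhs, word.xfer_load, xfer_assert)


# Made-up sample data: pairs of words that differ only in "don't care" fields.
words = [
    ControlWord(0, 1, 2, 3, 1),
    ControlWord(0, 3, 0, 3, 1),  # ALU idle, so the selects are irrelevant
    ControlWord(2, 1, 2, 0, 5),
    ControlWord(2, 1, 2, 0, 7),  # nothing loads, so the assert is irrelevant
]

distinct_raw = len(set(words))
distinct_canonical = len({canonicalize(w) for w in words})
print(distinct_raw, distinct_canonical)  # 4 2
```

Fewer distinct words after canonicalizing means more freedom for whatever logic minimization or instruction reorganisation comes next.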
@m1geo
@m1geo Год назад
@@weirdboyjim I instantly halved the size of the first pipeline ROM by removing the reset bit from it and just sending the reset signal.
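Why dropping one input halves the ROM: the number of entries doubles with every address line. The sketch below assumes the reset bit was one address input of a 15-bit ROM (a 28C256 has 15 address lines) and that reset can be handled by overriding the output outside the ROM. The override function and the reset word value are hypothetical, not the actual modification.

```python
def rom_entries(address_bits: int) -> int:
    """Number of addressable words in a ROM with the given address width."""
    return 1 << address_bits


def read_control_word(rom, address: int, reset: bool, reset_word: int = 0x00) -> int:
    """Hypothetical external reset override: ignore the ROM while reset is active."""
    return reset_word if reset else rom[address]


with_reset_bit = rom_entries(15)     # reset wired to one address line
without_reset_bit = rom_entries(14)  # reset handled outside the ROM

print(with_reset_bit, without_reset_bit)  # 32768 16384
```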
@alexloktionoff6833
@alexloktionoff6833 Год назад
One idea is to use SRAM chips with access times of around a dozen ns instead of ROM.
@tehaxor69
@tehaxor69 Год назад
I've done this sort of thing with an eye diagram scope many times.
@weirdboyjim
@weirdboyjim Год назад
Fair
@Wobblybob2004
@Wobblybob2004 Год назад
This might be a stupid statement but... You do know you can pull the ends off those scope probes.
@weirdboyjim
@weirdboyjim Год назад
You mean to use the pin style tip? Like the yellow probe in the clock header for almost all of the video?
@Wobblybob2004
@Wobblybob2004 Год назад
@@weirdboyjim Yes, exactly that! 😄 I did say it might be stupid. (I shall endeavour to concentrate more)
@tomcombe4813
@tomcombe4813 Год назад
The datasheets show you the speed ratings for when they are driving a certain capacitance, usually equal to just a few CMOS inputs. If a chip is driving a lot of inputs then it will be far slower than what is listed on the datasheet, and if it's driving fewer it will be far faster. So as you've found out, the datasheet speeds aren't even that good as a rule of thumb!
@weirdboyjim
@weirdboyjim Год назад
The load / capacitance can be quite noticeable on the rise/fall times of signals.
@tomcombe4813
@tomcombe4813 Год назад
@@weirdboyjim that's exactly right, it increases delay because it takes longer for the output to reach a valid high/low value.
@vitormoreno1244
@vitormoreno1244 Год назад
For the zero flag the faster way to do it is use a diode array on a NOT gate
@weirdboyjim
@weirdboyjim Год назад
You might save a few nanoseconds that way but this exercise shows it wouldn't make any difference. If fetch were improved it would become an issue, but rather than save a maximum of 21ns (if you made it instant) I would redesign the dependencies to remove the flag calculation from the chain completely.
@DigGil3
@DigGil3 Год назад
Gotta go fast!
@weirdboyjim
@weirdboyjim Год назад
Trying!
@bzuidgeest
@bzuidgeest Год назад
Why not swap the ROMs for faster ones? I get you don't need an increase, but the possibility of that is just a side effect. As you concluded, you are running the current parts out of spec. Why not replace them just to prevent possible instability issues later down the line? Running parts out of spec might lead to unexpected failure modes later on, maybe just from putting wear on them.
@weirdboyjim
@weirdboyjim Год назад
Earlier in the build I had planned to replace the 28C256s with 27C256s, which are pin compatible and much faster (but OTP), but they are not readily available due to the chip shortage. Other ROMs are not pin compatible so I'd have to redo the pipeline PCBs OR make a messy adapter board.
@bzuidgeest
@bzuidgeest Год назад
@@weirdboyjim In that case let's hope you are not forced to redo them later on anyway because of some failure. These ROMs work at this speed, but being outside spec it's not guaranteed all ROMs of that type will do that. Some might be better off the production line than others. Like transistors having A, B, C suffixes for classification in the past, specifying how good they were. Before I forget, happy new year, and I hope to see many more interesting videos as to expand the video and other stuff.
@AndyGoth111
@AndyGoth111 Год назад
At least for me, this video doesn't appear to be in the CPU playlist
@weirdboyjim
@weirdboyjim Год назад
I tend not to add things to the play lists at launch
@DAVIDGREGORYKERR
@DAVIDGREGORYKERR 5 месяцев назад
When this is reduced to an FPGA, something like an AMD/Xilinx Spartan, it should be faster. Or have it fabricated using boron doped diamond, then you will be able to get it up to something like 5 GHz.
@weirdboyjim
@weirdboyjim 5 месяцев назад
We should probably make it quantum while we're at it. 😅
@amanda_bynes226
@amanda_bynes226 Год назад
Bro make this series a playlist pls
@weirdboyjim
@weirdboyjim Год назад
If you go over to the main channel page you will find multiple playlists for different sections of the build. The core cpu one is here: ru-vid.com/group/PLFhc0MFC8MiCDOh3cGFji3qQfXziB9yOw
@relatingdata
@relatingdata Год назад
Have you ever considered an FPGA as a target?
@weirdboyjim
@weirdboyjim Год назад
Not my goal here but some viewers have been trying to do just that. My use of ROM chips makes that goal difficult, my future build plans would be more suitable.
@Jkauppa
@Jkauppa Год назад
float only 8-bit machine
@Jkauppa
@Jkauppa Год назад
or maybe 16-bit single step rom float machine
@Jkauppa
@Jkauppa Год назад
if all operations are atomic, no need to pipeline
@Jkauppa
@Jkauppa Год назад
rom memory ops are all 1-cycle atomic
@Jkauppa
@Jkauppa Год назад
better be cross-point (cross-line) direct access memory (1-cycle memory access read/write)
@Jkauppa
@Jkauppa Год назад
shared memory multiple access
@stevenbliss989
@stevenbliss989 Год назад
Surprised the ROMs are so slow, I expected 15ns or better. Why not use SRAM and fill them from very slow ROM during reset? That way you can get < 10ns! :) The ALU carry and other flags are a surprise!
@weirdboyjim
@weirdboyjim Год назад
I use a shadow ram for the rom in the memory subsystem (That is as much for code flexibility though) but here it's a lot of extra components to do that copy. It would be cool to absolutely maximize the performance of this design but I only have so much time and I think it will be more interesting to explore some other bits of architecture.
@stevenbliss989
@stevenbliss989 Год назад
@@weirdboyjim Ok you lazy bum! :) ..then just solder a couple tiny 3V cells with a resistor and diode etc. on top of each SRAM (assuming you can find pin compatible ones, and "program" them as ROMS. Failing that, and this would take more time, make emulator boards as such.