Fabian Schuiki
I build an 8-bit superscalar out-of-order computer on breadboards and custom PCBs to rediscover modern computing.

#homebrew #8bit #breadboard #superscalar #computer #cpu #core
Carry Flag - Superscalar 8-Bit CPU #30
52:23
10 months ago
Comments
@mistrzdrewna233 8 days ago
are you alive?
@fabianschuiki 8 days ago
Yes! Summer was pretty busy, but the next episode is almost ready 🙂
@mistrzdrewna233 8 days ago
@fabianschuiki I'm glad to hear that.
@cate9541 23 days ago
I have those same varistors lol
@fabianschuiki 23 days ago
😃 They came in an assorted set of different resistances that I got off of Amazon 🙂
@Pixellions 24 days ago
Awesome! What a neat tester. I was just wondering if it's a good idea to add some current limiting resistors to the output of the tester, just in case something goes wrong... Imagine you have a clear short to GND somewhere!
@fabianschuiki 24 days ago
Yeah that would be a very nice feature to have! I hadn't planned for that, but you could insert the resistors you're describing at the pin header. Or try to put the output drivers onto a power rail with a short-circuit detector on it 🤔🙂
@hardiksarraf1221 25 days ago
If possible can you please provide a block diagram for this
@fabianschuiki 24 days ago
Definitely! As things move to PCBs, the schematics will follow 😀
@hardiksarraf1221 25 days ago
These videos are really amazing thank you
@fabianschuiki 24 days ago
Thank you 🙂!
@jasonvandonsel1384 1 month ago
No wire diagrams?
@fabianschuiki 1 month ago
I haven't prepared any for the individual breadboard experimentation, sorry. The later PCB conversions will have the full schematic though! 🙂
@taivas7216 1 month ago
Bro you are like a Ben Eater clone, lol, cool video. btw the animations are great
@fabianschuiki 1 month ago
Haha thanks, that's very high praise 😁!
@quazillionaire 1 month ago
Ooh, I've been looking forward to the PCB-ification process. Also, loved the LED test haha. Great work!
@fabianschuiki 1 month ago
Thanks 😃! Can't wait to get everything onto PCBs such that I can focus on fancier new things.
@imakexyz4968 1 month ago
You have a nice and clean coding style.
@fabianschuiki 1 month ago
Thank you 😃!
@VEC7ORlt 1 month ago
Not sure why, but that is one ugly PCB. The way everything was routed just irks me. Somehow placing the resistor networks at the sides of the buffers seems wrong; placing them on top and moving the LEDs up would have looked better, but that's just OCD talking. Moving to a TSSOP package would also have helped, but those are notorious for getting solder bridges...
@fabianschuiki 1 month ago
But all that beautiful space that would have been wasted by moving the LEDs up! 😏
@skmgeek 1 month ago
There's 15 seconds of black screen at 15:45, you could probably cut that out with the YouTube video editor :3 just letting you know
@fabianschuiki 1 month ago
Right, great point! I forgot about the YouTube editor 🙂
@cherrymountains72 1 month ago
Very nice. Looking forward to the next episode comrade! 😜
@fabianschuiki 1 month ago
😃 Thanks!
@Thomas_G. 1 month ago
Is 0000_ssss dddd_0100 an alternative encoding for mv rd, rs ? [ should be something like: add_no_carry rd = zero + rs ]
@fabianschuiki 1 month ago
Yes, that should definitely work! 🙂 You could argue that you don't really need a separate move instruction and just use the ALU in the way you suggested! RISC-V does a similar thing, where a move is just syntactic sugar for `addi rd, rs, 0`. 👍
@griffinretro 1 month ago
The fact that one can reduce a truth table into DNF is the basis of the PLA chip design. They are structured in the same way, AND gates generating terms that are then ORed for the outputs.
@fabianschuiki 1 month ago
You're totally right! 🙂
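The DNF point above can be made concrete with a small sketch. This is not code from the video: `truth_table_to_dnf` is a hypothetical helper that walks a truth table and emits one AND term per row where the output is 1, then ORs the terms together, which mirrors the AND plane and OR plane of a PLA.

```python
from itertools import product

def truth_table_to_dnf(func, names):
    """Build a DNF string: one AND term (minterm) for every input row where func is true."""
    terms = []
    for bits in product([0, 1], repeat=len(names)):
        if func(*bits):
            # A 1 input uses the plain signal, a 0 input uses its negation.
            literals = [name if bit else f"!{name}" for name, bit in zip(names, bits)]
            terms.append(" & ".join(literals))
    # The OR of all minterms; a function that is never true reduces to constant 0.
    return " | ".join(f"({term})" for term in terms) if terms else "0"

xor_dnf = truth_table_to_dnf(lambda a, b: a ^ b, ["a", "b"])
print(xor_dnf)  # (!a & b) | (a & !b)
```

The result is not minimized; a real PLD toolchain would typically merge terms before fitting them into the device.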
@awesomecronk7183 1 month ago
The led test was great, loved it!
@fabianschuiki 1 month ago
Haha thanks 😃 That was a lot of fun to do 😅
@OscarSommerbo 1 month ago
New post for sanity. @weirdboyjim did use shadow RAM in his build; he started talking about it in part 91 and then implemented it in parts 96 & 97. Assuming worst case scenario it takes 4.5 ish seconds to copy 64kb at 70ns, which might be acceptable, then James's approach is simple and a very reasonable path to take. I wouldn't describe his method as DMA, at least not in the conventional sense; he just uses 4 counter ICs to step through the address range, copying as the counters race through the addresses. Using a battery powered SRAM chip, I think, is a more elegant solution, but it adds complexity, no doubt about that. Either way, I think it is a good approach to reduce the access time for the much slower EEPROM.
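A back-of-the-envelope check of the copy time, as a rough sketch only. It assumes 64 KiB of ROM and exactly one byte transferred per 70 ns access; `copy_time_seconds` and `cycles_per_byte` are hypothetical names, not anything from James's build. Under those assumptions the copy itself lands in the millisecond range, so a boot delay of several seconds would point to a much slower copy clock than one byte per 70 ns.

```python
def copy_time_seconds(num_bytes, access_time_ns, cycles_per_byte=1):
    """Time to copy num_bytes if every byte costs cycles_per_byte accesses of access_time_ns."""
    return num_bytes * cycles_per_byte * access_time_ns * 1e-9

rom_bytes = 64 * 1024  # assumed ROM size: 64 KiB
copy_time = copy_time_seconds(rom_bytes, 70)
print(f"{copy_time * 1e3:.2f} ms")  # 4.59 ms
```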
@fabianschuiki 1 month ago
😃 That would be a very neat approach. I guess initially the 70ns ROM would also work directly as the decoder/program memory. And as soon as we get to some performance analysis and tweaking, an upgrade to an SRAM decoder would make sense.
@OscarSommerbo 1 month ago
@fabianschuiki Looking into SRAM chips, apparently the leakage has almost been eliminated, negating the need for a battery backup to keep the data alive. But using the "always copy on boot" approach like James does can be done rapidly using the ghetto DMA circuit he built. I really go back and forth on checking the SRAM for correct data. There is some fun stuff that can be done, like grouping functions by type in different regions and only updating the regions that have changed. But for simplicity, at the cost of a few seconds of boot time, the "always copy the entire ROM" approach is hard to beat, and an additional benefit is that if we step through the entire address range we zero out the RAM and put the computer in a known state. Which is nice when debugging the hardware.
@fabianschuiki 1 month ago
@OscarSommerbo Yeah I really do like the ghetto DMA 😂👍👍👍
@mrshodz 1 month ago
great work.
@fabianschuiki 1 month ago
Thanks! 🙂
@pup4301 2 months ago
What tool is used for the layout?
@fabianschuiki 2 months ago
It's EasyEDA 🙂
@pup4301 2 months ago
@fabianschuiki Thank you, but I was asking about the tool you used for planning where component systems would go when using EasyEDA.
@fabianschuiki 2 months ago
@pup4301 Ah sorry, I misunderstood. That's draw.io / diagrams.net. It's pretty convenient for architectural sketches and block diagrams 🙂
@pup4301 2 months ago
@fabianschuiki It's okay, I should have been more specific. Have a good day, night, or evening. It is 1am where I am. Late night work and all.
@fabianschuiki 2 months ago
@pup4301 Oh yeah, 1am is definitely late 😴!
@cate9541 2 months ago
Thanks
@75slaine 2 months ago
Simply wonderful, a master class in production 👏
@fabianschuiki 2 months ago
Thank you very much 🙂!
@somethingnonsense5389 2 months ago
funny idea for testing! I know I'd have hooked up an Arduino Mega and made it do a Cylon eye (Galactica) or KITT effect! haha
@fabianschuiki 2 months ago
Haha, that would have been pretty neat as well 😃!
@tmbarral664 2 months ago
Thanks for the huge smile you made me have :D Very nice touch 😇
@tmbarral664 2 months ago
It'll be interesting to watch when this tiny computer of yours makes some sounds ;)
@fabianschuiki 2 months ago
😃 Maybe it *does* need some sound output...
@ke9tv 2 months ago
Style points for Korobeyniki! 😀
@fabianschuiki 2 months ago
I couldn't resist 😃
@JaenEngineering 2 months ago
Excellent 👌 Also, when you get round to the clock PCB, if you want to go with a pure logic build you can do it with just a pair of 2-input quad NAND gate ICs plus a few passives.
@OscarSommerbo 2 months ago
I think that given he is aiming for a superscalar design, the clock needs to be stable and predictable. While a Pierce oscillator could work, Fabian has already built the classical clock generator that Ben Eater and James Sharman use.
@JaenEngineering 2 months ago
@OscarSommerbo Not even that complicated. Do away with the crystal and output cap, use the 74HC132 Schmitt-input NAND, and you have a very stable relaxation oscillator that also has a very handy enable input. You can then use another one of the gates to debounce a push button for stepping (again with a ready built-in enable) and you still handily have a gate left over for combining the outputs. And those oscillators are every bit as stable as the 555-based circuits, as they basically use the same idea.
@ArneChristianRosenfeldt 2 months ago
@JaenEngineering Every home computer uses an Xtal. I don't see why we need to do worse.
@fabianschuiki 2 months ago
A relaxation oscillator like the 555 or a Schmitt trigger as you point out would definitely be nice to have on the clock PCB. 🤩 I was thinking about diving a little bit deeper into the whole frequency synthesis topic at that point and building up a PLL from discrete components on that PCB. I'd love to figure out what exactly the critical path and its length is through the CPU, and then tune a clock to run the build as close to that upper limit as possible. It would be really cool if the CPU could write to a bunch of PLL registers to configure its own clock speed 🤓! That will still need a crystal reference oscillator. Could be a simple 1 MHz one though, since the PLL can synthesize a wide range of frequencies from that. But that's just the free-running part. I'd still want to have a potentiometer-tunable oscillator and the manual stepping button. A Schmitt trigger might even be simpler or cleaner than a 555-based approach, because it can generate a nice triangle waveform compared to the 555's rather lopsided sawtooth-y wave.
@OscarSommerbo 2 months ago
@fabianschuiki A tuneable PLL based clock sounds super cool!
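To illustrate how a PLL could synthesize many clock speeds from a single 1 MHz crystal, here is a minimal sketch of the integer-N relationship, assuming an idealized loop with a reference divider and a feedback divider. The function and parameter names are hypothetical and the divider values are made up; nothing here describes the planned clock PCB.

```python
from fractions import Fraction

def pll_output_hz(f_ref_hz, feedback_div, ref_div=1):
    """Idealized integer-N PLL: locked when f_out / feedback_div == f_ref / ref_div."""
    return Fraction(f_ref_hz) * feedback_div / ref_div

def best_feedback_div(f_ref_hz, f_target_hz, ref_div=1):
    """Largest feedback divider whose output frequency does not exceed the target."""
    return int(Fraction(f_target_hz) * ref_div / f_ref_hz)

# Hypothetical case: the CPU's critical path allows at most 7.3 MHz.
n = best_feedback_div(1_000_000, 7_300_000)
print(n, pll_output_hz(1_000_000, n))
```

A larger reference divider gives finer frequency steps (here `ref_div=4` would allow 0.25 MHz steps), which is the kind of setting the CPU could write into PLL registers.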
@Eugensson 2 months ago
You can save some space there! Have you considered redesigning your register files so they slot on the edge, like an NES cartridge? You can move the LEDs to the top edge so they are still visible, and use the L-shaped through hole jumper arrays so you can plug the register daughter boards into the back plane vertically.
@fabianschuiki 2 months ago
That is a fantastic idea! 😀 I'm not entirely sure how bad things will get once we go dual-issue, but the number of chips for each register will roughly triple. That'll get a bit unwieldy in this flat build style, so making them vertical as you suggest would be a very nice solution. I do like the fact that you can see all ICs in this flat layout, but there's nothing too interesting about 16+ copies of the exact same set of register ICs... So maybe vertical it shall be? 😏
@phookadude 2 months ago
@fabianschuiki The RC2014 uses 90 degree bent pins, rather than a slot. Might be cheaper and easier than card slots.
@Eugensson 2 months ago
@phookadude Yeah, card slots are difficult to comply with: gold edge treatment, precise PCB thickness, etc. L-pins are easier.
@JaenEngineering 2 months ago
I wonder if you can get 45° pins? Then you could kind of "shingle" them and get the benefit of both a lower footprint while still being able to somewhat see the boards and ICs 🤔
@fabianschuiki 2 months ago
My mind also immediately went to card edge connectors and card slots, but yeah, they can be pretty annoying. I like the idea of a 90° pin header though, or the shingles for extra awesomeness and style points 😎. For dual issue out-of-order execution, the registers will need 4 read ports and 4 write ports. That's 64 data lines... might be worth taking a regular two-row pin header and sandwiching the PCB in between the rows, soldering half the connections onto one side and the other half onto the other 🤔
@OscarSommerbo 2 months ago
James Sharman started out with a hot air rework station but moved to a hotplate as it allows for finer heat control and the components don't go flying. Something to think about. And your LED function test was FANTASTIC!
@JaenEngineering 2 months ago
He could go full Marco Reps and do whatever the hell it is he does in this video ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-D28uSzCs7-k.htmlsi=2APRJA3AfDym_uC4
@fabianschuiki 2 months ago
Thanks 🙂! I'm definitely considering getting a hot plate for the soldering. Might do that sooner rather than later. And I had a lot of fun with the LED function test 😄
@fabianschuiki 2 months ago
I love Marco Reps' channel. So many crazy builds and precision electronics 🤓. Absolutely fantastic.
@OscarSommerbo 2 months ago
I forgot, solder bridges on those resistor arrays, very common in James' videos.
@fabianschuiki 2 months ago
@OscarSommerbo Pretty sure my clumsy solder paste application didn't help with that either 🫣
@dmoisset 2 months ago
Hi! I'm slowly catching up, so sorry for the bunch of comments appearing on old videos. Still loving the series! A minor detail about what you said at 7:30: pins 15 and 16 can actually be used as inputs alright. What's missing is the "feedback loop", which means that when you're using pins 15 and 16 as outputs, you cannot use those outputs also as inputs. With the other outputs you can, essentially creating loops! That allows you, for example, to define something like an SR latch using the PLD with something similar to the typical pattern of two gates looped together. But for this application, where there are no loops and everything is combinatorial, you can use as many pins as inputs as you like (you can even use CLK and OE as inputs, given that you don't need tristating for your outputs).
@fabianschuiki 2 months ago
I'm very happy to read all your comments and feedback 🙂! Excellent point about the two somewhat special pins on the output side.
@dmoisset 2 months ago
A cool detail about ATF16V8s is that they have internal pull-ups on pins, so it's actually correct to leave inputs unconnected. Pins will float up and won't suffer from the noise randomly toggling your FETs and eating power. Those are described in section 7 of the datasheet.
@fabianschuiki 2 months ago
Great point! Feels a lot like the 74LS series of logic with all the high-side pullups 🙂. I'm still not entirely sure whether I want to have the operand data buses pulled low by default, which would mean that I'd have to add a buffer in front of the ALU inputs. Otherwise I could also let the ALU pull RD1 and RD2 high through the ATF16V8's pullups.
@anon_y_mousse 2 months ago
You might want to consider '.' as a valid label character as well. Not that prior examples really matter here, but `gcc` does use '.L' as a label prefix, and I'm not sure if it has the same semantics as it does for `nasm`, but in `nasm` it's used for local labels so you could have a .L1 in each function and it would still work. Just 0xf00d for thought.
@fabianschuiki 2 months ago
That's an excellent point! I like the idea of giving special meaning to labels starting with a `.` and treating them as relative/scoped to the previous label that did not have a leading `.`. Very elegant.
@milesrout 2 months ago
The way you add onto strings within a loop (buffer += bytes(...)) is accidentally quadratic. Every time you append a new thing to the end of a string, you have to allocate a whole new string. This will start to bite with bigger programs. You should use bytearray or a list which you b"".join() at the end.
@fabianschuiki 2 months ago
Excellent point! 👍 I'll fix that 🙂
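The `buffer += bytes(...)` concern can be sketched as follows. This is an illustration only and not the assembler's actual code: the function names and the 16-bit little-endian word encoding are assumptions made for the example. All three variants produce the same output; the second and third avoid re-copying the whole buffer on every append.

```python
def assemble_concat(words):
    """Repeated bytes concatenation: each += may copy the entire buffer built so far."""
    buffer = b""
    for word in words:
        buffer += word.to_bytes(2, "little")
    return buffer

def assemble_bytearray(words):
    """bytearray is mutable, so += extends it in place."""
    buffer = bytearray()
    for word in words:
        buffer += word.to_bytes(2, "little")
    return bytes(buffer)

def assemble_join(words):
    """Collect the pieces and join them once at the end."""
    return b"".join(word.to_bytes(2, "little") for word in words)

program = [0x1234, 0xABCD]  # made-up instruction words
print(assemble_join(program).hex())  # 3412cdab
```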
@dmoisset 2 months ago
This channel is great, i love the editing and presentation style. I'm curious, what software do you use to create your animations? Those look amazing and add a lot to the production quality and communication
@fabianschuiki 2 months ago
Thank you very much! 😃 I use Manim by the fantastic Grant Sanderson @3blue1brown
@dmoisset 2 months ago
Oh, that's cool! I've seen his videos, I should have recognized the style 😊
@fabianschuiki 2 months ago
@dmoisset 😃 It has a very distinct way of animating things. I love the style of his videos and how much effort and attention to detail he puts into it. And the fact that it's a programmatic way of animating helps a lot with the more systematic animations, like showing the CPU's internal state.
@milesrout 2 months ago
Good stuff. Pity the syntax highlighting of '1f' and '1b' isn't the same in your text editor though. Custom syntax highlighting is tedious but so nice once you do it. It's pretty easy in Sublime Text too if I remember correctly (I haven't used Sublime in over 10 years!). Just a few regular expressions. I wrote an assembler and emulator during the 0x10c/DCPU-16 craze. I'm pretty sure more assemblers were written and published online as free software in the space of a month than have ever been written in any other calendar month in history.
@fabianschuiki 2 months ago
😃 I'll definitely go and fix the syntax highlighting at some point!
@lawrencemanning 2 months ago
I cheated and just borrowed ARM’s 4 bit codes. 😂 Except I made 0000 “always” as it bugged me otherwise. Edit: and on a previous build I had instruction bits to burn so just had a nybble for “cares” flags and a nybble for what value the cared for bits had to be. It works fine and the programmer can dream up nonsensical tests (eg. Zero and negative) if they want, but it is wasteful.
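The "cares" nybble plus value nybble scheme from the comment above could look roughly like this. The flag bit positions and the name `branch_taken` are hypothetical and chosen only for the sketch.

```python
# Hypothetical flag bit assignment, for illustration only.
FLAG_Z, FLAG_N, FLAG_C, FLAG_V = 0b0001, 0b0010, 0b0100, 0b1000

def branch_taken(flags, care, value):
    """Branch if every flag selected by the care mask matches the corresponding value bit."""
    return (flags & care) == (value & care)

# "Branch if zero": only Z is cared about and it must be 1.
print(branch_taken(FLAG_Z | FLAG_C, care=FLAG_Z, value=FLAG_Z))  # True
# A care mask of 0 ignores every flag, so it acts as "always".
print(branch_taken(FLAG_N, care=0, value=0))  # True
```

The encoding also allows combinations such as "zero and negative", which is the wastefulness the comment mentions: many of the 256 care/value pairs describe conditions no program would need.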
@fabianschuiki 2 months ago
Haha, that's definitely a good approach! 😃 How come you had bits to spare?
@lawrencemanning 2 months ago
@fabianschuiki My very first softcore processor was a 16-bit address and data multi-cycle design. Many instructions took trailing 16-bit immediates, including branching. I was quite pleased with it (got as far as programming Snake; video on my channel if you are interested), but in retrospect it wasn't great. My latest 32-bit core is a mostly RISC-like 2-stage pipeline with embedded immediates. It's not quite as friendly to the assembly programmer, but more interesting technically. I've implemented Boulderdash on that. Yes, I build processors to play 80s computer games! 🤣
@fabianschuiki 2 months ago
@lawrencemanning That's really nice 😃🤓!
@ArneChristianRosenfeldt 2 months ago
@lawrencemanning What stages? A typical home computer had a single system bus. It was used for code, data, graphics, ROM, and sound. So naturally, a CPU grabs the data from the bus at the optimal time and keeps it in a register. Even the 6502 has a fetch stage in its pipeline. So the other stage is decode and execute reg-reg and reg-imm?
@lawrencemanning 2 months ago
I've never seen someone use the flags register state to directly calculate a branch offset. Well done. :) Question: computed branches seem a bit unusual. Computed jumps, sure, many uses. But branching through an offset held in a register? How many ISAs have that?
@fabianschuiki 2 months ago
None that I am aware of 😃. Usually you would just load a base address into a register and add the offset onto that yourself. But since this is an 8 bit machine I thought it might come in handy. And it was basically free hardware-wise. Immediates and the rs2 operand are on the same wires 🙂
@lawrencemanning 2 months ago
Have you done any tests to determine the current fMax? I’m curious. 😊 Very neat design, though the downside of GALs (etc) is they obscure what’s going on. I’ll let you off with the Boolean ops. 😂 I liked James’s solution too and it would have been a bit repetitive to solve it the same way!
@fabianschuiki 2 months ago
I haven't done any tests yet. It might be worth statically computing the timing of the CPU and then comparing it to the actual hardware, and then figuring out where to place registers to cut the long paths.
@lawrencemanning 2 months ago
So you didn’t fancy implementing a zero page mode then. 😂
@lawrencemanning 2 months ago
On decoupling caps: my preferred solution is to put them (and other passives) on the back. Few folks do this and I’m not sure why. The only drawback is you need mounting posts on the corners to bring the board off the table when attaching the parts to the back, with suitable holes for the screws holding the posts in in your stencil if you really need one, but it works well: place the (decent size) cap in the centre of the back of the IC with some fat traces going through a via to the power pads on the front.
@fabianschuiki 2 months ago
It would be fantastic to have components on the back 🥳! I haven't figured out how to make that reliable during soldering though. You probably have to go solder the caps manually afterwards because the hot air would heat the PCB to a point where the front parts would fall off again 🤔 Not sure how this is done industrially. I've seen glue being used to hold components in place occasionally -- not sure if that's the answer though.
@lawrencemanning 2 months ago
@fabianschuiki I've never had a problem with caps falling off. I usually solder the caps etc. on the back with hand-placed paste, then flip the board over and solder the front parts the same way. I used PCB posts on all corners on both the front and back by screwing posts into posts in a stack. That way the board is always flat on the desk. I do not usually solder a whole board in one sitting with a stencil as my boards are bigger, but I have done it before. I don't believe enough heat will travel through the board to make components fall off, but don't hold me to it. I have been doing that for a few years now though with my projects and never had a problem. Also, your caps are disproportionately small compared to the SOIC? 50mil packages. In my opinion anyway. 1206 or 0805 would probably be the size I'd go for.
@lawrencemanning 2 months ago
I'll never like this style of schematic. Far better is to use symbols that don't resemble the physical parts. Do you not have busses in that schematic capture tool? Case in point: what if you suddenly had to switch to a different package type for the IC? This would be much easier if logical symbols were used. It really goes against the principle of what a schematic is for; they are more than a means to design a PCB, they are for explaining how the circuit works. Also how come you didn't use KiCAD, which is far more friendly for people building their own? Sorry for coming across as negative. I love what you're doing.
@fabianschuiki 2 months ago
It's pretty much down to limitations in EasyEDA. The neat thing about it was that it's incredibly easy to get going, you can do all the PCB ordering right in the program, and it has pretty much every LCSC part already in the library. KiCAD will definitely be my tool of choice for future projects 🙂
@lawrencemanning 2 months ago
@fabianschuiki I'm glad you are going to be looking at KiCAD soon. You'll like it. The workflow for getting a board made is really not that scary: export as gerbers (defaults are almost certainly sufficient), export the drill file, zip it up in a directory and upload it to wherever you like. I actually use JLC at the moment, and they will decode the zip, work out what layer is which for you, figure out what layers are internal, silkscreen, mask etc, and tell you how much it costs. It's pretty simple. I wouldn't be surprised if someone has written a KiCAD plugin to make it even easier actually. FYI the pathological example of logical vs physical symbols is the use of individual gates, which become nearly indecipherable when a physical representation of the package is used, compared to its logical symbol. Consider also the realistic possibility that you might decide you only need one NAND gate on a board and could swap out that quad '00 for a single gate package. Check out some old computer schematics for a nice illustration of what to aim for. Someone "digitised" the Amiga schematics and they are like art. 😊
@lawrencemanning 2 months ago
I think you could have done with some explanation as to why you went for a wider instruction word vs a fully encoded instruction format like the Z80. Presumably you didn’t discuss this because it opens up some bigger questions around instruction sequencing and microcode etc. Also 2 ROMs would have been an option that would have been interesting to discuss. That would have been my preferred solution as it avoids some interesting timing considerations. Anyways, loving this series. 😊
@simontillson482 2 months ago
Seems to me he's going for a more RISC style instruction set, which lends itself to a more hardwired instruction opcode decoding. I too was wondering about the timing issues - surely, toggling the latch enable at the same time as the address LSB will cause something of a race. I was a bit surprised that it seems to work so well! (Actually, he does explain why it works further down this thread - it seems the ROM is way slower than the latch, so the latch has effectively stored the old value before the ROM's data lines begin to transition to the new value. Cool.)
@fabianschuiki 2 months ago
I wanted a setting where instructions have a fixed length and somewhat rigid layout. For operation at 1 or 2 instructions per cycle, you need to be able to decode the instruction in a single cycle, including all operands. Variable length encoding makes this much more annoying, especially when you decode two instructions in parallel. (Although it's a very attractive thing in "subscalar" designs where you have multiple clock cycles available per instruction.) The wide instruction word is just to have enough registers and opcode space. If I had only four registers, I could have at most 16 different two-operand 8 bit instructions. With eight registers it would be down to 4 instructions. So the 16 bits were a necessity.
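The opcode-space arithmetic in the comment above can be reproduced with a few lines. `opcode_count` is a hypothetical helper, and the 16-register case in the last line is an assumption based on the 4-bit `ssss`/`dddd` fields quoted elsewhere in this thread.

```python
def opcode_count(instr_bits, num_registers, num_operands=2):
    """Opcodes left after reserving one register field per operand."""
    reg_bits = (num_registers - 1).bit_length()  # bits needed to address a register
    opcode_bits = instr_bits - num_operands * reg_bits
    return 2 ** opcode_bits if opcode_bits >= 0 else 0

print(opcode_count(8, 4))    # 16: two 2-bit register fields leave 4 opcode bits
print(opcode_count(8, 8))    # 4: two 3-bit register fields leave 2 opcode bits
print(opcode_count(16, 16))  # 256: two 4-bit fields leave 8 bits in a 16-bit word
```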
@fabianschuiki 2 months ago
@simontillson482 The timing feels pretty wonky at first 😃. In synchronous digital design you want to have all your signal toggles launched exactly at the clock though. This works here as well since the hold time of the latch (time after the clock that the data needs to stay stable) is around 1-2ns, but the contamination delay of the ROM (time between changing an input and seeing first changes at an output) is somewhere around 40-70ns. I've seen a little bit of instability in the ALU though that is likely related to the latch not holding its data for long enough after it becomes transparent (propagation delay faster than the hold time in the ALU flags register).
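The hold-time argument can be written as a simple slack check: the data feeding the latch must not change until the hold time has elapsed, so the contamination delay of the upstream logic has to exceed it. This is a sketch with the approximate figures quoted above; `hold_slack_ns` and the `clock_skew_ns` parameter are hypothetical additions for illustration.

```python
def hold_slack_ns(contamination_delay_ns, hold_time_ns, clock_skew_ns=0.0):
    """Positive slack means the old data stays stable long enough after the clock edge."""
    return contamination_delay_ns - clock_skew_ns - hold_time_ns

# Worst case from the numbers in the comment: fastest ROM change vs. longest hold time.
rom_to_latch = hold_slack_ns(contamination_delay_ns=40, hold_time_ns=2)
print(rom_to_latch)  # 38.0

# A path whose source reacts faster than the destination's hold time goes negative.
print(hold_slack_ns(contamination_delay_ns=1, hold_time_ns=2) < 0)  # True
```

A negative result corresponds to the kind of instability described for the ALU flags register, where the upstream value can change before the register has safely captured it.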
@lawrencemanning 2 months ago
Nice to see you using minipro and not the awful Windows software. 😊
@fabianschuiki 2 months ago
😃
@lawrencemanning 2 months ago
Can’t believe I haven’t found your videos before! Very very cool. 5:29 surprised not to see basic pipelining (one instruction issue per clock, multiple clocks to complete an instruction) here, since it’s the logical next step after concluding your clock rate won’t scale with every instruction needing to execute in a single clock cycle. Anyway, great stuff!
@fabianschuiki 2 months ago
Thanks! 😃 Basic pipelining will definitely come into the picture.
@perkyelixir2254 2 months ago
i realize this might be a bit much to ask, but if you would make a video on writing an llvm backend (or some other high level language thing) at some point in the future, that would be great
@fabianschuiki 2 months ago
That's a great idea! I've been toying with the thought of taking the assembler and building out a simple IR and register allocation, to conceptually explore what LLVM and other compiler backends do. Adapting LLVM for my CPU would be a very nice thing to do 😃
@sysfab 2 months ago
You can add more things to the assembler, such as .align, .text and .data
@fabianschuiki 2 months ago
Absolutely! Once the assembler supports expression evaluation, a lot of cool other features get unlocked in a sense. I like the idea of having segments like `text` and `data`, and allowing the user to lay those out in memory. Also, having the ability to plop down data and strings would be really handy.
@sysfab 2 months ago
Cool video! Your channel actually helped me a lot with my own assembler written in Lua. Can I ask what ':=' in an if statement means?
@fabianschuiki 2 months ago
Thanks! 🙂 `:=` assigns the value on the right to the variable on the left, and also returns the value on the right. It's a nice way to check if a value is not none in an if, and then have the value available in a variable inside the if block.
@DavidLatham-productiondave 2 months ago
It's called the walrus operator. Which is kinda cute.
@fabianschuiki 2 months ago
@DavidLatham-productiondave Haha, I love that name 😂
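A small example of the `:=` pattern described above, as it might appear in an assembler's line parser. The regular expression and the `classify` function are hypothetical and not taken from the video; the operator itself requires Python 3.8 or newer.

```python
import re

# Hypothetical label syntax: an identifier (optionally starting with '.') followed by ':'.
LABEL_RE = re.compile(r"^([A-Za-z_.][\w.]*):$")

def classify(line):
    """Return ('label', name) for label lines and ('other', text) for everything else."""
    # := stores the match object in m and the if tests it in the same expression.
    if m := LABEL_RE.match(line.strip()):
        return ("label", m.group(1))
    return ("other", line.strip())

print(classify("loop:"))          # ('label', 'loop')
print(classify("  add r1, r2"))   # ('other', 'add r1, r2')
```

Without the walrus operator the match would need its own assignment statement on the line before the `if`.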
@lawrencemanning 2 months ago
I use CustomASM for my softcore. It’s far from perfect, but it’s pretty good. Have you played with it? Not that there’s anything wrong with doing it yourself, even if CustomASM meets all your needs. 😊
@fabianschuiki 2 months ago
I haven't really played around with it. The prospect of writing an assembler totally from scratch was too exciting 😅
@lawrencemanning 2 months ago
@fabianschuiki Yes indeed! I will probably look at it eventually. One of the things CustomASM can't do (AFAIK) is generate linkable objects, so you end up with includes to bring in your "modules". It works, but it's not nice. I shall have a look at your other videos later; I've never been brave enough to build a processor on breadboard and have massive respect for folks who take this on!
@OscarSommerbo 2 months ago
Very nice and tight video. Pacing was just right.
@fabianschuiki 2 months ago
Thanks! 😃 Trying to be a bit more on-point 😉
@mekafinchi 2 months ago
One feature I'd strongly recommend are local labels - where using a prefix (usually '.') prepends the most recent normal label to the logical name of a label. This lets you have descriptive names without global scope rather than being limited to globals or numbers. Local labels could also be accessible from any context by using the full logical name e.g. "normal.local" referring to ".local" in "normal" even outside normal's block
@fabianschuiki 2 months ago
That is a fantastic suggestion, thanks a lot! Will definitely add those 🙂👍
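One way the suggested local labels could be resolved is sketched below. It assumes that a label starting with `.` is prefixed with the most recent label that did not start with `.`, as proposed above. `resolve_labels` is a hypothetical function, and it records line indices rather than real addresses to keep the example short.

```python
def resolve_labels(lines):
    """Map fully qualified label names to the index of the line that defines them."""
    table = {}
    scope = None  # most recent global (non-dot) label
    for index, line in enumerate(lines):
        text = line.strip()
        if not text.endswith(":"):
            continue  # not a label definition
        name = text[:-1]
        if name.startswith("."):
            if scope is None:
                raise ValueError(f"local label {name!r} before any global label")
            name = scope + name  # ".loop" inside "main" becomes "main.loop"
        else:
            scope = name
        if name in table:
            raise ValueError(f"duplicate label {name!r}")
        table[name] = index
    return table

source = ["main:", ".loop:", "  add r1, r2", "other:", ".loop:"]
labels = resolve_labels(source)
print(labels)  # {'main': 0, 'main.loop': 1, 'other': 3, 'other.loop': 4}
```

Because the table stores the full logical name, a reference such as `main.loop` can be looked up from anywhere, while a bare `.loop` reference would first be prefixed with the scope of the referencing line.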
@DavidLatham-productiondave 2 months ago
I also use unnamed labels in ca65. An unnamed label is a : by itself. Then you can branch to a count of unnamed labels in forwards or backwards direction. Unnamed labels are scoped from the previous normal label until immediately before the next normal label. E.g. (6502):
```
count_to_65536:
    ; here unnamed labels are scoped to count_to_65536
    ldx 0
:   ldy 0
:   iny
    beq :-
    inx
    beq :--
next_label:
    ; here unnamed labels are scoped to next_label
```
@fabianschuiki 2 months ago
@DavidLatham-productiondave That's a pretty neat approach! I like how this doesn't clutter the text at all. The relative labels I have implemented tend to be just `1:` and `2:` in practice... Makes sense to leverage that and provide more compact syntax! 👍
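For comparison, numeric relative labels such as `1:` with `1b` / `1f` references could be resolved along these lines. The sketch assumes GNU-as-like semantics (`1b` is the nearest definition at or before the reference, `1f` the nearest one after it); the assembler in the video may behave differently, and `resolve_relative` is a hypothetical name.

```python
import bisect

def resolve_relative(ref, position, definitions):
    """Resolve a reference like '1b' or '1f' made at the given position.

    definitions maps a label number to a sorted list of positions where 'N:' is defined.
    """
    number, direction = int(ref[:-1]), ref[-1]
    positions = definitions.get(number, [])
    split = bisect.bisect_right(positions, position)  # definitions at or before | after
    if direction == "b" and split > 0:
        return positions[split - 1]
    if direction == "f" and split < len(positions):
        return positions[split]
    if direction not in ("b", "f"):
        raise ValueError(f"bad relative reference {ref!r}")
    raise KeyError(f"no matching definition for {ref!r}")

defs = {1: [0, 10], 2: [6]}
print(resolve_relative("1b", 4, defs))  # 0
print(resolve_relative("1f", 4, defs))  # 10
```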
@OscarSommerbo 2 months ago
YES!! Labels! Finally, something I would have added way earlier. But then I come from a C background and I learned assembler backwards by looking at the compiled byte code, and labels are essential. Of course, you can do what Fabian has been doing so far, calculating his own jump offsets, but when you have an incredibly powerful calculator why not use it. 😊 This will be a fun episode I bet.
@fabianschuiki 2 months ago
Yeah, it was about time to add those 😁
@0ffGridTechClub 2 months ago
This is the coolest thing I've ever seen! I'm just starting to learn PCB design as well as I2C design and debugging.
@fabianschuiki 2 months ago
Thanks! 🙂 It comes in pretty handy for testing more complicated behaviors of your PCBs.
@calculus7 2 months ago
I assume your CPU will also be pipelined like James Sharman's? I agree that his ALU design is quite elegant, but since I'm designing a stepped processor, I'm thinking of not running everything through the adder. In James' design, I believe this requires two pipeline stages before an ALU result can be obtained… not a problem for a pipelined processor. I'm not quite sure why two stages are needed, but I'm thinking for my stepped processor design it might be better to keep each calculation circuit separate (add, shift, logic ops, etc.) and only choose which one to apply at the final output of the ALU. Hope that makes sense. My CPU, though a stepped processor, currently takes three cycles for most operations. That's why I'd like to minimize the number of cycles needed in the ALU.
@fabianschuiki 2 months ago
Yes, at a later point I'll start adding pipeline stages in the places where they make sense. It's a good idea though to quickly sketch out on paper what kinds of chips you have feeding into each other, and then sum up the propagation delay along those paths. The adder is pretty fast, probably 1/4th of the time a ROM-based decoder would take to produce an output. AND/OR/XOR and multiplexers are going to be even faster. So you can probably rack up quite a few adders or logic chips in your ALU before they start making your CPU slower (because they overtake the decoder as critical path). Keeping the ALU paths separate for the different functions is a nice idea! That would likely take quite a few additional chips because you have to replicate some work (XOR for subtraction in parallel to logic ops, maybe some redundancy in the logic ops?) but you should be able to get a bit more speed out of it 👍
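The "sum up the propagation delay along those paths" exercise can also be done in a few lines of code. Everything below is made up for illustration: the component names, the connections, and the delay values are hypothetical and not measurements of this CPU.

```python
def longest_path_ns(delays_ns, edges):
    """Return (total_delay, path) of the slowest chain through an acyclic netlist.

    delays_ns maps each component to its propagation delay;
    edges maps a component to the components it feeds into.
    """
    memo = {}

    def arrival(node):
        if node not in memo:
            downstream = [arrival(nxt) for nxt in edges.get(node, [])]
            best = max(downstream, key=lambda result: result[0], default=(0.0, []))
            memo[node] = (delays_ns[node] + best[0], [node] + best[1])
        return memo[node]

    return max((arrival(node) for node in delays_ns), key=lambda result: result[0])

# Hypothetical delays in nanoseconds.
delays = {"decoder_rom": 70, "register_read": 15, "xor": 10,
          "adder": 20, "mux": 10, "writeback": 10}
wiring = {"decoder_rom": ["mux"], "register_read": ["xor"], "xor": ["adder"],
          "adder": ["mux"], "mux": ["writeback"]}

total, path = longest_path_ns(delays, wiring)
print(total, path)  # 90.0 ['decoder_rom', 'mux', 'writeback']
```

With these invented numbers the decoder path (90 ns) is slower than the register, XOR, adder path (65 ns), which matches the observation that several ALU chips can be chained before they overtake the decoder.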