Yeah I think it will come in handy a bit later. It currently has two inputs connected to the carry flag. If I swap one of them to be the sign bit of the LHS input, the ALU can do arithmetic right shifts. Need some op decoding first though 😀
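To make the sign-bit idea concrete, here is a minimal Python sketch of a 1-bit right shifter where the bit fed into the top position is selectable. The 8-bit width and the function names are my own assumptions for illustration, not the actual circuit.

```python
def shift_right(value, fill_bit, width=8):
    """Shift right by one; fill_bit is what gets fed into the MSB.

    Returns (result, carry_out), where carry_out is the bit shifted out.
    """
    mask = (1 << width) - 1
    value &= mask
    carry_out = value & 1
    result = (value >> 1) | ((fill_bit & 1) << (width - 1))
    return result, carry_out


def arithmetic_shift_right(value, width=8):
    """Feed the sign bit of the input back into the MSB (assumed wiring)."""
    sign = (value >> (width - 1)) & 1
    return shift_right(value, sign, width)


def shift_right_through_carry(value, carry_in, width=8):
    """Feed the carry flag into the MSB instead."""
    return shift_right(value, carry_in, width)
```

With the sign bit as the fill, a negative two's-complement value stays negative after the shift, which is the whole point of the arithmetic variant.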
Yeah, great point! I hadn't considered the rotate instructions. Those could be cool to have. 🙂 It's always a battle between making the initial breadboard versions of the subcircuits feature-rich and moving faster while postponing features to future revisions.
I love it! So, it is technically possible to implement right and left bit shifting with other operations. It’s not *necessary* to add additional hardware for bit shifting. However, these become pretty expensive operations that way, which is certainly unintuitive compared to pretty much any real CPU.
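A rough Python sketch of why that gets expensive, assuming an 8-bit machine that only has add, AND, OR and a conditional branch (the function names are made up for illustration): a left shift is a single add, but a right shift turns into a loop over every bit.

```python
def shl_via_add(x, width=8):
    """Left shift by one is just x + x, truncated to the register width."""
    return (x + x) & ((1 << width) - 1)


def shr_via_loop(x, width=8):
    """Right shift by one using only AND, OR, add and a branch.

    Tests each source bit and sets the bit one position lower in the result.
    """
    result = 0
    src_mask = 2  # start by testing bit 1
    dst_mask = 1  # which lands in bit 0
    for _ in range(width - 1):
        if x & src_mask:
            result = result | dst_mask
        src_mask = src_mask + src_mask  # advance the masks with adds
        dst_mask = dst_mask + dst_mask
    return result
```

That is seven loop iterations for one 8-bit right shift, compared with a single pass through dedicated shift hardware.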
I assume your CPU will also be pipelined like James Sharman’s? I agree that his ALU design is quite elegant, but since I’m designing a stepped processor, I’m thinking of not running everything through the adder. In James’ design, I believe this requires two pipeline stages before an ALU result can be obtained… not a problem for a pipelined processor. I’m not quite sure why two stages are needed, but for my stepped processor design I think it might be better to keep each calculation circuit separate (add, shift, logic ops, etc.) and only choose which one to apply at the final output of the ALU. Hope that makes sense. My CPU, though a stepped processor, currently takes three cycles for most operations. That’s why I’d like to minimize the number of cycles needed in the ALU.
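In case it helps to picture the separate-units idea, here is a small Python sketch. The op names, the 8-bit width and the set of units are assumptions for illustration only: every unit computes its result all the time, and a selector at the very end picks one.

```python
def alu(op, a, b, width=8):
    """Model of an ALU with independent function units and an output mux."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    # Each entry stands for its own circuit, always producing an output.
    unit_outputs = {
        "add": (a + b) & mask,
        "sub": (a - b) & mask,
        "and": a & b,
        "or": a | b,
        "xor": a ^ b,
        "shr": a >> 1,
    }
    # The final multiplexer: only this step depends on the opcode.
    return unit_outputs[op]
```

For example, `alu("sub", 5, 10)` gives 251, which is -5 in 8-bit two's complement. In hardware the cost is the extra chips per unit, while the longest path is just the slowest unit plus the output mux.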
Yes, at a later point I'll start adding pipeline stages in the places where they make sense. It's a good idea though to quickly sketch out on paper what kinds of chips you have feeding into each other, and then sum up the propagation delay along those paths. The adder is pretty fast, probably a quarter of the time a ROM-based decoder would take to produce an output. AND/OR/XOR and multiplexers are going to be even faster. So you can probably rack up quite a few adders or logic chips in your ALU before they start making your CPU slower (because they overtake the decoder as the critical path). Keeping the ALU paths separate for the different functions is a nice idea! That would likely take quite a few additional chips because you have to replicate some work (XOR for subtraction in parallel to logic ops, maybe some redundancy in the logic ops?) but you should be able to get a bit more speed out of it 👍
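The paper exercise can also be done in a few lines of Python. The delay numbers below are placeholders I made up to match the rough ratios mentioned above, not datasheet values, and the path names are hypothetical; plug in the figures for your own chips.

```python
# Placeholder propagation delays in nanoseconds (NOT datasheet values).
DELAY_NS = {
    "rom_decoder": 160,
    "adder": 40,
    "xor": 15,
    "mux": 20,
}


def path_delay(path):
    """Sum the delays of the chips a signal passes through, in order."""
    return sum(DELAY_NS[chip] for chip in path)


# Hypothetical signal paths that all have to settle within one clock cycle.
paths = {
    "decode": ["rom_decoder"],
    "alu_sub": ["xor", "adder", "mux"],
    "alu_two_adders": ["xor", "adder", "adder", "mux"],
}

# The slowest path is the critical path and limits the clock speed.
critical = max(paths, key=lambda name: path_delay(paths[name]))
```

With these placeholder numbers the decoder is still the critical path even with two adders chained in the ALU, which is the situation described above.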