RISu064 talk at Latch-Up

2023-04-05

I recently gave a 3-minute talk at the Latch-Up conference.

The slides are available here, and the recording is available on YouTube and Archive.org.

The following is the transcript:

Hello everyone. Today I would like to present one of my recent side projects, the RISu064, and things related to it.

A little bit about myself: I am a digital IC designer at Zero ASIC, and I have various digital and embedded-system side projects that I work on in my spare time.

At the last Latch-Up, I presented the VerilogBoy project, a reimplementation of the GameBoy system that I did as an undergraduate course project. As a sort of logical continuation, I am trying to build a fantasy console this time.

The project I am presenting today is its CPU, called RISu064. It has a 6-stage dual-issue pipeline and implements the RV64IM instruction set. I call it non-blocking because it fully uses out-of-order write-back with no re-ordering logic, so a long-latency instruction doesn’t block the execution of non-dependent instructions. This is possible because RISC-V doesn’t have any instruction that generates an exception late in execution, so precise exceptions can be implemented with out-of-order write-back as long as issue is in order. This does introduce some issues, however: for instance, write-after-write hazards are now a real concern.
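To make the write-after-write problem concrete, here is a minimal Python sketch of in-order issue with out-of-order write-back. It is a toy model, not the RISu064 implementation: the `Instr` fields, the latencies, the single-issue simplification, and the "stall while the destination has a pending write" fix are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    dest: str      # destination register name
    value: int     # result the instruction will write
    latency: int   # cycles from issue to write-back

def run(program, stall_on_waw):
    """Issue in program order (at most one per cycle); write back whenever done."""
    regs = {}
    in_flight = []   # (write-back cycle, instruction) pairs
    pending = set()  # registers with a write in flight (only consulted when stalling)
    cycle = 0
    pc = 0
    while pc < len(program) or in_flight:
        # Write back every instruction whose latency has elapsed.
        for wb_cycle, ins in in_flight:
            if wb_cycle == cycle:
                regs[ins.dest] = ins.value
                pending.discard(ins.dest)
        in_flight = [e for e in in_flight if e[0] != cycle]
        # Issue the next instruction unless the hypothetical WAW check stalls it.
        if pc < len(program):
            ins = program[pc]
            if not (stall_on_waw and ins.dest in pending):
                in_flight.append((cycle + ins.latency, ins))
                pending.add(ins.dest)
                pc += 1
        cycle += 1
    return regs

# A slow write to x1 (think of a divide), an independent write to x2,
# then a fast write to x1 that must be the final value in program order.
PROGRAM = [Instr("x1", 111, 4), Instr("x2", 7, 1), Instr("x1", 222, 1)]

unsafe = run(PROGRAM, stall_on_waw=False)  # older write lands last and clobbers x1
safe = run(PROGRAM, stall_on_waw=True)     # younger write waits for the older one
```

In both runs the independent write to x2 completes while the slow instruction is still in flight, which is the non-blocking behavior. Without the check, the slow write to x1 lands after the fast one and leaves the stale value behind.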

As for results, the highest-IPC configuration gets a best-effort 4.3 CoreMark/MHz, which is not that great considering it only allocates 1 cycle to RAM access. But further optimization is probably something for the future, and I am happy with the result, as it’s my first RISC-V project and the whole thing was written from scratch in 3 weeks. I also implemented it with Yosys and OpenROAD using SiliconCompiler, targeting the Sky130B process node. Without cache it achieves an fmax of 80 MHz, but this drops to 50 MHz when cache is added, with the cache tag comparison becoming the critical path.

As for next steps, other than all the possible micro-architecture improvements, I really hated the verbosity of Verilog while writing it. One obvious solution is to use a higher-level language such as Chisel, and I am already using Chisel in some projects. But I also want to explore ways of making it less verbose while still keeping it in Verilog. A common solution, as far as I know, is to use a preprocessor to generate code.

That’s exactly what I am doing here: I used PyHP, like the OpenPiton project does, and developed my own Python library for generating common things.
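As a rough idea of what such a generator can look like, here is a small Python sketch that emits the repetitive Verilog for a pipeline register stage. The helper name `pipeline_reg`, the signal prefixes, and the field list are hypothetical and are not taken from the RISu064 library or from PyHP itself.

```python
def pipeline_reg(src, dst, fields):
    """Return Verilog text that registers each field from stage `src` into `dst`.

    `fields` is a list of (name, width) tuples; every name and prefix
    used here is made up for illustration.
    """
    lines = []
    # One reg declaration per field, with a range only for multi-bit signals.
    for name, width in fields:
        rng = f"[{width - 1}:0] " if width > 1 else ""
        lines.append(f"reg {rng}{dst}_{name};")
    # A single clocked block that copies every field to the next stage.
    lines.append("always @(posedge clk) begin")
    for name, _ in fields:
        lines.append(f"    {dst}_{name} <= {src}_{name};")
    lines.append("end")
    return "\n".join(lines)

verilog = pipeline_reg("id", "ex", [("pc", 64), ("valid", 1)])
print(verilog)
```

Adding a field to the stage then means adding one tuple to the list instead of editing a declaration and an assignment by hand.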

Of course a game console wouldn’t be complete without graphics capabilities, so I am also designing a simple 3D graphics accelerator as part of the plan, but that’s for another time.

Thanks for listening!