@mp-17 @suehtamacv Thank you very much for your open-source project, which has been of vital help to us. Starting from Ara, we changed the configuration to VLEN=128 and Lane=1, used the random instruction generator to construct about 5,000 random test cases, and fixed a large number of bugs. I noticed that some of these bugs have also been fixed in the latest version. However, during our development and testing we found that Ara's deep pipeline and operand_queue architecture do not seem well suited to small-VLEN scenarios: the pipeline ends up too deep and the area too large. Our team has done some evaluation work and would now like to:
- remove the operand_requester and operand_queue, add VRF read and write ports, and have all computations access the VRF directly;
- offload Ara's FPU computations to the core;
- merge simd_mul64/32/16/8 into a single multiplier unit;
- move addrgen's address requests to the TLB earlier in the pipeline, so that page-fault exceptions are raised earlier and core efficiency improves.
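To put the short-VLEN concern in rough numbers, here is a back-of-the-envelope sketch of how long a single vector instruction keeps a unit busy. It assumes each Ara lane has a 64-bit datapath (processing 64 bits of a register per cycle) and ignores pipeline fill/drain latency; the helper names are hypothetical and exist only for this illustration.

```python
# Back-of-the-envelope model of how long one vector instruction keeps a unit busy.
# Assumption: each lane has a 64-bit datapath; pipeline fill/drain latency is ignored.
LANE_WIDTH_BITS = 64


def elements_per_register(vlen: int, sew: int) -> int:
    """Number of SEW-bit elements held in one vector register (RVV: VLEN / SEW)."""
    return vlen // sew


def busy_cycles(vlen: int, nr_lanes: int, lmul: int = 1) -> int:
    """Cycles needed to stream a full register group through the lanes.

    Independent of SEW, since the lanes process a fixed number of bits per cycle.
    """
    total_bits = vlen * lmul
    bits_per_cycle = LANE_WIDTH_BITS * nr_lanes
    # Ceiling division, in case the register group is narrower than the datapath.
    return -(-total_bits // bits_per_cycle)


if __name__ == "__main__":
    vlen, nr_lanes = 128, 1  # our configuration
    for sew in (8, 16, 32, 64):
        print(f"SEW={sew:2}: {elements_per_register(vlen, sew):2} elements/register")
    for lmul in (1, 2, 4, 8):
        print(f"LMUL={lmul}: {busy_cycles(vlen, nr_lanes, lmul)} busy cycles")
```

Under these assumptions, with VLEN=128 and one lane an LMUL=1 instruction occupies the unit for only 2 cycles, so any fixed per-instruction latency in front of the functional units is amortized over very little work, unlike with long vectors.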
We would like to ask whether these solutions are feasible, or whether you have better suggestions for modifying Ara to make it smaller and better suited to short-VLEN scenarios. If we manage to implement a short-VLEN version of Ara with good overall PPA, we would be happy to share our work with you. Thanks!