
How to achieve short VLEN of ARA #429

@AD738560581

@mp-17 @suehtamacv Thank you very much for your open-source project, which has been a vital help to us. Based on Ara, we changed the configuration to VLEN=128 with a single lane (Lane=1), used a random instruction generator to construct about 5,000 random test cases, and fixed a large number of bugs. We noticed that some of these bugs have also been fixed in the latest version. However, during development and testing we found that Ara's deep pipeline and operand_queue architecture do not seem well suited to small-VLEN scenarios: the pipeline ends up too deep and the area too large. Our team has done some evaluation work and would next like to:

  1. remove operand_requester and operand_queue, add VRF read and write ports, and have all computations access the VRF directly;
  2. send Ara's FPU computations to the core for execution;
  3. merge simd_mul64/32/16/8;
  4. move addrgen's address requests to the TLB earlier in the pipeline, so that page-fault exceptions are raised earlier and core efficiency improves.

We would like to ask whether these changes are feasible, and whether you have any better suggestions for modifying Ara to reduce its area and make it more suitable for short-VLEN scenarios. If we achieve an Ara implementation with a shorter VLEN and good overall PPA, we will be happy to share our work with you. Thanks!
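
As a rough illustration of the short-VLEN concern above, the Python sketch below computes how many elements fit in one vector register and how many datapath beats are needed to stream it through the lanes. The `VLMAX = LMUL * VLEN / SEW` relation comes from the RISC-V vector specification. The 64-bit per-lane datapath width and the function names are assumptions made for this sketch, not details stated in this issue.

```python
def vlmax(vlen_bits: int, sew_bits: int, lmul: int = 1) -> int:
    """Maximum number of elements per vector register group (LMUL * VLEN / SEW)."""
    return (vlen_bits * lmul) // sew_bits


def beats_per_register_group(vlen_bits: int, lanes: int,
                             lane_width_bits: int = 64, lmul: int = 1) -> int:
    """Beats needed to push one register group through all lanes.

    lane_width_bits=64 is an assumption of this sketch (hypothetical default).
    """
    total_bits = vlen_bits * lmul
    bits_per_beat = lanes * lane_width_bits
    return -(-total_bits // bits_per_beat)  # ceiling division


if __name__ == "__main__":
    # Configuration discussed in this issue: VLEN=128, Lane=1.
    for sew in (8, 16, 32, 64):
        print(f"SEW={sew:2d}: VLMAX={vlmax(128, sew)}")
    print("beats per register:", beats_per_register_group(128, lanes=1))
```

Under these assumptions, a full vector register at VLEN=128 with one lane is consumed in only a couple of beats, so a deep pipeline with operand queues has very few beats over which to amortize its latency and area.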
