News

To this end, we propose ChainPIM, the first ReRAM-based processing-in-memory accelerator for HGNNs, featuring high computing parallelism and vertex data reuse. Specifically, we introduce R-chain, ...
The key-value (KV) cache bottleneck presents a significant challenge during large language model inference. While depth pruning accelerates inference, it requires ...