2025 09 arXiv
A paper on joint quantization and sparsification has been released on arXiv. In this work, we propose an error compensation method that reconciles the conflicting requirements of quantization and pruning. The proposed method delivers up to 4.72x speedup and 6.4x memory reduction compared to the dense FP16 baseline.