Hacker News
Lossless LLM compression for efficient GPU inference via dynamic-length float
(arxiv.org)
339 points by CharlesW 15 hours ago | 106 comments