Jethro's Braindump

Quantization

tags
Model Compression, Machine Learning

Quantization refers to techniques for performing computations and storing tensors at lower bitwidths than floating point precision.
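The most common scheme is affine (asymmetric) quantization, which maps a float range onto an integer range via q = round(x / scale) + zero_point. A minimal sketch in plain Python (illustrative helper names, not a PyTorch API):

```python
# Affine int8 quantization sketch: q = round(x / scale) + zero_point,
# with q clamped to the representable int8 range [-128, 127].

def choose_qparams(xs, qmin=-128, qmax=127):
    """Pick scale and zero_point so [min(xs), max(xs)] maps onto [qmin, qmax]."""
    lo = min(min(xs), 0.0)  # range must include 0 so zero is exactly representable
    hi = max(max(xs), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(xs, scale, zero_point, qmin=-128, qmax=127):
    return [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in xs]

def dequantize(qs, scale, zero_point):
    return [(q - zero_point) * scale for q in qs]

xs = [-1.0, 0.0, 0.5, 2.0]
scale, zp = choose_qparams(xs)
qs = quantize(xs, scale, zp)
recovered = dequantize(qs, scale, zp)
```

Each stored value now fits in 8 bits instead of 32, and dequantization recovers the original up to a rounding error of at most scale / 2.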

Quantization is useful for reducing model size and memory requirements: storing weights in INT8 instead of FP32, for example, shrinks the model by roughly 4x.

PyTorch's quantized kernels are backed by two libraries: FBGEMM (for x86 servers) and QNNPACK (for ARM mobile devices).
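The lowest-friction entry point is dynamic quantization, which converts weights to INT8 ahead of time and quantizes activations on the fly. A sketch using the `torch.quantization.quantize_dynamic` API (the model here is an arbitrary example):

```python
import torch
import torch.nn as nn

# An illustrative FP32 model; in practice this would be a trained network.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Replace the Linear layers with dynamically quantized INT8 equivalents.
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model is a drop-in replacement at inference time.
y = qmodel(torch.randn(1, 128))
```

Only the layer types listed in the set (here `nn.Linear`) are converted; the rest of the model runs in FP32 as before.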

Questions
