Quantization refers to techniques for performing computations and storing tensors at lower bitwidths than floating point precision.
Quantization is useful for reducing model size and memory bandwidth requirements.
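To make the idea concrete, here is a minimal sketch of the affine (scale + zero-point) mapping that int8 quantization schemes are built on. The helper names (`quantize`, `dequantize`, `choose_qparams`) are illustrative, not part of any library API:

```python
def choose_qparams(rmin, rmax, qmin=-128, qmax=127):
    """Pick a scale and zero-point so [rmin, rmax] maps onto [qmin, qmax]."""
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = qmin - round(rmin / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to the nearest representable int8 value (with clamping)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Recover an approximate float from its int8 code."""
    return (q - zero_point) * scale

# Example: quantize values observed in the range [-1.0, 1.0].
scale, zp = choose_qparams(-1.0, 1.0)
q = quantize(0.5, scale, zp)
x = dequantize(q, scale, zp)
# x is close to 0.5; the round-trip error is at most half the scale.
```

Storing the int8 codes instead of float32 values is where the roughly 4x size reduction comes from; the scale and zero-point are kept per tensor (or per channel) to dequantize on the fly.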
PyTorch supports two backend libraries for quantized operators: FBGEMM (for x86 servers) and QNNPACK (for ARM mobile devices). An open question:
- Can we quantize Spiking Neural Networks? Their activations are already binary spikes, so the computation is effectively 1-bit — but what about the weights?