Jethro's Braindump

Quantization

tags: Model Compression, Machine Learning

Quantization refers to techniques for performing computations and storing tensors at lower bit widths than floating-point precision, for example as 8-bit integers instead of 32-bit floats.
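As a rough illustration of what storing a tensor at a lower bit width involves, here is a minimal sketch of affine (scale and zero-point) quantization to unsigned 8-bit integers. The function names `quantize` and `dequantize` are made up for this note, and the min/max range calibration is an assumption; real libraries use more careful calibration.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers using a scale and zero point.

    Hypothetical helper: uses simple min/max calibration over `values`.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Fall back to a scale of 1.0 if every value is zero.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    # Round to the nearest integer level, then clamp into [qmin, qmax].
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the integer representation."""
    return [scale * (x - zero_point) for x in q]


values = [-1.0, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize(values)
recovered = dequantize(q, scale, zero_point)
print(q, recovered)
```

The recovered values differ from the originals by at most about one quantization step (`scale`), which is the precision traded away for the smaller representation.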

Quantization is useful for reducing model size and memory requirements.
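To make the size saving concrete, the sketch below compares the raw storage of one million weights held as 32-bit floats versus 8-bit integers, using only the standard-library `array` module. The weight count is an arbitrary assumption, and the figures ignore any per-tensor metadata such as scales and zero points.

```python
from array import array

num_weights = 1_000_000  # arbitrary size, chosen for illustration

# "f" is a C float (typically 4 bytes); "b" is a signed char (1 byte).
fp32_weights = array("f", [0.0]) * num_weights
int8_weights = array("b", [0]) * num_weights

fp32_bytes = fp32_weights.itemsize * len(fp32_weights)
int8_bytes = int8_weights.itemsize * len(int8_weights)

print(f"float32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"reduction factor: {fp32_bytes / int8_bytes:.1f}x")
```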

PyTorch supports two backend libraries for quantization: FBGEMM (for x86 CPUs) and QNNPACK (for ARM CPUs).
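Which quantized backends are available depends on how PyTorch was built. A small sketch for checking this on a given machine, assuming PyTorch is installed (it falls back to an empty list otherwise); the variable names are just for this note:

```python
try:
    import torch

    # Engines compiled into this build, e.g. names such as "fbgemm" or "qnnpack".
    engines = list(torch.backends.quantized.supported_engines)
    # The engine currently selected for quantized operators.
    current = torch.backends.quantized.engine
except ImportError:
    # PyTorch is not installed; nothing to report.
    engines, current = [], None

print("supported engines:", engines)
print("current engine:", current)
```

The exact list varies by platform and PyTorch version, so no particular output should be expected.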

Questions

Can we quantize Spiking Neural Networks? They already do binary precision computation, what about the weights?
