BiDM: Pushing Diffusion Model Quantization to the 1-Bit Limit, Achieving New State-of-the-Art Results

Diffusion models (DMs) have gained significant attention for their ability to generate high-quality and diverse data across domains including images, speech, and video. They do so by iteratively refining random noise through a denoising process that can involve thousands of steps. Although faster sampling techniques have reduced the number of steps required, the floating-point operations at each step remain computationally expensive, which limits the use of DMs in resource-constrained environments.

Compressing diffusion models has therefore become crucial for broader adoption. Existing compression methods focus primarily on quantization, distillation, and pruning, aiming to reduce storage and computational costs while preserving accuracy. Quantization is particularly effective: by representing weights and/or activations as low-bit integers or even binary values, it achieves compact storage and efficient computation during inference. Several studies have applied quantization to diffusion models, compressing and accelerating them while maintaining reasonable generative quality.

1-bit quantization, or binarization, offers the largest reduction in model size and has proven particularly effective in discriminative models such as Convolutional Neural Networks (CNNs). Furthermore, full binarization (binarizing both weights and activations) allows matrix multiplications to be replaced with efficient bitwise operations such as XNOR and bitcount, leading to maximum speedup. Some existing works have explored 1-bit quantization in diffusion models, but they have focused on quantizing weights alone, leaving full binarization largely unaddressed.

Full binarization of both weights and activations in diffusion models, however, presents significant challenges. The rich intermediate representations crucial for the generative capabilities of DMs are highly time-step dependent, and the highly dynamic activation ranges are severely restricted when using binarized weights and activations. Additionally, generating complete images, a hallmark of diffusion models, becomes problematic due to the highly discrete parameter and feature spaces, making it difficult to match the real-valued targets during training. The difficulty of optimization in this discrete space, coupled with the insufficient representational capacity of time-step-dependent representations, often leads to poor convergence or even training failure in binarized diffusion models.

Introducing BiDM: Full Binarization of Weights and Activations

To address these limitations, researchers from Beihang University, ETH Zurich, and other institutions have introduced BiDM, a novel method that pushes the boundaries of diffusion model compression by achieving full binarization of both weights and activations. BiDM is designed to tackle the unique requirements posed by the activation characteristics, model architecture, and generative nature of diffusion models, overcoming the challenges associated with full binarization. BiDM incorporates two key innovations:

1. Time-Step-Friendly Binary Structure (TBS): Recognizing the strong time-step dependency of activation features in diffusion models, TBS employs a learnable activation binarizer to match the dynamic activation ranges of the diffusion model. It also incorporates cross-time-step feature connections, leveraging the similarity between features in adjacent time steps to enhance the representational capacity of the binarized model.

2. Spatial Patch Distillation (SPD): Acknowledging the spatial locality inherent in the convolutional U-Net architecture commonly used in diffusion models, and the nature of image generation tasks, SPD introduces a full-precision model as supervision. By mimicking self-attention on patches, SPD focuses on local features, effectively guiding the optimization of the binarized diffusion model.

Extensive experiments demonstrate BiDM's superior performance, exceeding all existing baselines across various evaluation metrics while maintaining comparable inference efficiency. Specifically, in pixel-space diffusion models, BiDM achieves an Inception Score (IS) of 5.18, approaching the performance of full-precision models and outperforming the best baseline by 0.95. In Latent Diffusion Models (LDMs), BiDM achieves a Fréchet Inception Distance (FID) score of 22.74 on LSUN-Bedrooms, a significant improvement over the state-of-the-art of 59.44, while simultaneously achieving 28.0x storage savings and 52.7x operational efficiency gains. As the first method capable of full binarization for diffusion models, BiDM generates visually acceptable images, enabling efficient deployment of DMs in resource-constrained scenarios.

Implementation Details: Binarized Diffusion Models

Baseline Diffusion Models: Given a data distribution $p(x)$, the forward diffusion process generates a sequence of random variables $x_1, \ldots, x_T$ using a transition kernel $q(x_t \mid x_{t-1})$, typically involving Gaussian perturbations:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right)$$

where $\beta_t \in (0, 1)$ is a noise schedule. The Gaussian transition kernel allows for marginalization of the joint distribution, so that samples can be easily obtained by sampling a Gaussian vector $\epsilon \sim \mathcal{N}(0, I)$ and applying the transformation $x_t = \sqrt{1-\beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon$.
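As a rough illustration of the forward process, the sketch below applies the Gaussian transition step by step in plain Python. The linear schedule, its endpoint values, and all function names are assumptions made for this example and are not taken from the article.

```python
import math
import random


def linear_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule; every beta_t lies in (0, 1)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def forward_step(x_prev, beta_t, rng):
    """One transition: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps."""
    keep = math.sqrt(1.0 - beta_t)
    noise_scale = math.sqrt(beta_t)
    return [keep * v + noise_scale * rng.gauss(0.0, 1.0) for v in x_prev]


def diffuse(x0, betas, seed=0):
    """Run the whole forward chain x_0 -> x_T with a seeded generator."""
    rng = random.Random(seed)
    x = list(x0)
    for beta_t in betas:
        x = forward_step(x, beta_t, rng)
    return x
```

Calling `diffuse([1.0, -1.0, 0.5], linear_schedule(1000))` returns a vector of the same length that has been progressively noised over the full chain.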

The reverse process aims to generate samples by removing noise, approximating the unavailable conditional distribution $q(x_{t-1} \mid x_t)$ with a learnable transition kernel $p_\theta(x_{t-1} \mid x_t)$:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right) \approx q(x_{t-1} \mid x_t)$$

The mean and variance can be obtained using the reparameterization trick. Training typically uses a simplified variant of the variational lower bound as the loss function to improve sample quality:

$$L = \mathbb{E}_{x_0 \sim p(x),\ t \sim U(1, T),\ \epsilon \sim \mathcal{N}(0, I)}\left[\left\| \epsilon - \epsilon_\theta(x_t, t) \right\|^2\right]$$
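The following sketch draws one Monte Carlo sample of such a loss, assuming the standard noise-prediction form of the simplified objective. It uses the well-known closed form $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$ with $\bar{\alpha}_t = \prod_{i \le t} (1-\beta_i)$, which the article does not spell out. The `predict_noise` callable is a hypothetical stand-in for the denoising network.

```python
import math
import random


def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_i) for i = 1..t (t is 1-indexed)."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod


def noisy_sample(x0, eps, a_bar):
    """Closed form: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps."""
    return [math.sqrt(a_bar) * v + math.sqrt(1.0 - a_bar) * e
            for v, e in zip(x0, eps)]


def simple_loss(x0, betas, predict_noise, rng):
    """One sample of ||eps - eps_theta(x_t, t)||^2 for a random time step."""
    t = rng.randint(1, len(betas))            # t ~ U(1, T), both ends inclusive
    eps = [rng.gauss(0.0, 1.0) for _ in x0]   # eps ~ N(0, I)
    x_t = noisy_sample(x0, eps, alpha_bar(betas, t))
    predicted = predict_noise(x_t, t)         # hypothetical network call
    return sum((e - p) ** 2 for e, p in zip(eps, predicted))
```

A predictor that recovers the injected noise exactly drives this value to zero, while a predictor that always returns zeros leaves it at the squared norm of the noise.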

U-Nets are widely used as backbones in diffusion models due to their ability to fuse low-level and high-level features. The input and output blocks of a U-Net can be represented as $D_m$ and $U_m$, where smaller $m$ corresponds to lower levels. Skip connections propagate low-level information from $D_m(\cdot)$ to $U_m(\cdot)$, and the input to $U_m$ thus becomes:

$$\mathrm{input}(U_m) = [\,D_m(\cdot),\ x\,]$$

where $x$ is the feature arriving from the preceding, deeper block and $[\cdot,\cdot]$ denotes concatenation.

Binarization: Quantization compresses and accelerates the noise estimation model by discretizing weights and activations to low bit-widths. In a baseline binarized diffusion model, the weights $W$ are binarized to 1 bit:

$$W^{\mathrm{bi}} = \mathrm{sign}(W) \cdot \|W\|$$

where $\mathrm{sign}(\cdot)$ is the sign function, which restricts each weight to +1 or -1 with a threshold of 0, and $\|W\|$ denotes a floating-point scaling factor that is initialized from the weight norm divided by $n$ (the number of weights) and then learned during training. Activations are typically quantized using a simple BNN quantizer:

$$A^{\mathrm{bi}} = \mathrm{sign}(A) \cdot \|A\|$$
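A minimal sketch of this sign-plus-scale binarizer is shown below. The initialization as the mean absolute value (L1 norm divided by the number of weights) follows the common XNOR-Net convention and is an assumption here. The function names are illustrative, and the training-time update of the scale is omitted.

```python
def sign(value):
    """Sign with a threshold of 0; zero maps to +1 so outputs stay in {-1, +1}."""
    return 1.0 if value >= 0 else -1.0


def init_scale(values):
    """Assumed initialization: mean absolute value, i.e. L1 norm / n."""
    return sum(abs(v) for v in values) / len(values)


def binarize(values, scale):
    """Map every entry to +scale or -scale according to its sign."""
    return [scale * sign(v) for v in values]


weights = [0.5, -1.5, 0.25, -0.75]
scale = init_scale(weights)          # (0.5 + 1.5 + 0.25 + 0.75) / 4 = 0.75
binary_weights = binarize(weights, scale)
```

After binarization only the sign pattern and a single floating-point scale need to be stored per tensor, which is where the storage saving comes from.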

When both weights and activations are quantized to 1 bit, the computation of the denoising model can be replaced by XNOR and bitcount operations, achieving significant compression and acceleration.
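To make the XNOR and bitcount replacement concrete, the sketch below computes the dot product of two {-1, +1} vectors from their packed bit patterns. It relies on the identity dot = 2 * matches - n, where matches is the number of positions with equal signs. The bit layout and function names are choices made for this example; real kernels operate on fixed-width machine words.

```python
def pack_bits(signs):
    """Pack a {-1, +1} vector into an integer: bit i is 1 when signs[i] is +1."""
    word = 0
    for i, s in enumerate(signs):
        if s > 0:
            word |= 1 << i
    return word


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask      # XNOR: 1 where the signs match
    matches = bin(agree).count("1")        # bitcount (popcount)
    return 2 * matches - n


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = xnor_dot(pack_bits(a), pack_bits(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
```

Both `result` and `reference` evaluate to 0 for these inputs, since two positions agree and two disagree.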

Time-Step-Friendly Binary Structure (TBS)

Before detailing the proposed method, the authors summarize their observations on Diffusion Model (DM) properties:

Observation 1: While activation ranges change significantly across long time steps, activation features exhibit similarity between short, adjacent time steps. Previous work, such as TDQ and Q-DM, has shown that the activation distribution of DMs is highly time-step dependent during the denoising process, exhibiting similarity between adjacent time steps but significant differences between distant time steps. Applying a fixed scaling factor across all time steps results in severe distortion of the activation range. This observation motivates a re-examination of existing binarization structures. Binarization, especially full binarization of weights and activations, leads to greater loss of activation range and accuracy compared to low-bit quantization like 4-bit. This makes generating rich activation features more difficult. The insufficiency of activation range and output features severely impairs generative models with rich representations like DMs. Therefore, employing a more flexible activation range binarizer and enhancing the overall expressive power of the model by leveraging its feature output are crucial strategies to improve generative capability after full binarization.
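The effect of a fixed scaling factor can be illustrated with a toy calculation. The activation values below are made up purely for illustration and are not measurements from any model. The per-step scale uses the mean absolute value, which is the least-squares optimal scale for sign binarization.

```python
def sign(value):
    return 1.0 if value >= 0 else -1.0


def mean_abs(values):
    return sum(abs(v) for v in values) / len(values)


def binarize(values, scale):
    return [scale * sign(v) for v in values]


def mse(values, approx):
    return sum((v - a) ** 2 for v, a in zip(values, approx)) / len(values)


# Made-up activations at two distant time steps with very different ranges.
step_a = [4.0, -3.0, 5.0, -4.0]
step_b = [0.2, -0.1, 0.3, -0.2]

# A single scale fitted on step_a and reused unchanged on step_b.
fixed_scale = mean_abs(step_a)
fixed_error = mse(step_b, binarize(step_b, fixed_scale))

# A scale recomputed from the input at step_b.
dynamic_error = mse(step_b, binarize(step_b, mean_abs(step_b)))
```

In this toy setting the reused scale overshoots the small-range activations by more than an order of magnitude, so `fixed_error` is far larger than `dynamic_error`.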

The authors first address the differences between distant time steps. Most existing activation binarizers, such as those of BNN and Bi-Real, directly quantize activations to {+1, -1}. This severely disrupts activation features and reduces the expressive power of the generative model. Improved activation binarizers, such as XNOR++, use a trainable scaling factor $k$:

$$A^{\mathrm{bi}} = k \cdot \mathrm{sign}(A)$$

While this partially recovers the activation feature representation, it does not adapt to the strongly time-step-dependent activations and can still lead to significant performance loss. The authors therefore turn to the original XNOR, which uses a dynamically computed mean to construct the activation binarizer. This naturally preserves the range of activation features and adjusts dynamically to the input range at different time steps. However, due to the rich representation of DM features, local activations exhibit inconsistent ranges before and after passing through a module, indicating that a predetermined $k$ cannot effectively recover the activation representation. Therefore, the authors make $k$ adjustable, allowing it to be learned during training to adapt ...
