Investigating the Limitations of Style Loss as a Metric for Assessing SD-LoRA Model Compatibility
The experiment set out to determine whether style loss can serve as a viable metric for assessing compatibility between Stable Diffusion (SD) models and LoRA models. To this end, sample images were generated under controlled parameters: three SD models (Yesmix V3.5, Aing Diffusion V10.5, and Dream Shaper V7) were each combined with the Liang Xing Style LoRA to produce 20 images, using fixed prompts and image resolution settings.
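The section does not reproduce the generation script itself; a minimal sketch of such a setup, assuming the Hugging Face diffusers library (the original experiment may well have used a WebUI instead), could look as follows. The file names are illustrative placeholders.

```python
# Sketch of the SD + LoRA generation setup, assuming the diffusers library.
# File names are illustrative placeholders, not the experiment's actual paths.
import torch
from diffusers import StableDiffusionPipeline

# Load one of the three base checkpoints (here, hypothetically, Yesmix V3.5).
pipe = StableDiffusionPipeline.from_single_file(
    "yesmix_v35.safetensors",
    torch_dtype=torch.float16,
).to("cuda")

# Apply the style LoRA on top of the base checkpoint.
pipe.load_lora_weights("liang_xing_style.safetensors")

# Generate 20 samples per model under identical parameters.
images = [
    pipe("1girl, raiden shogun, portrait", num_inference_steps=30).images[0]
    for _ in range(20)
]
```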
Three style images were selected as benchmarks for evaluating the generated images: an original artwork by Liang Xing, a piece by the artist Sakimichan, and a self-doodled random image. All images then went through the same preprocessing steps, so the data transformations were uniform across models.
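The individual preprocessing steps are not listed here; a plausible minimal version, modeled on the transform chain in the PyTorch neural transfer tutorial cited below, is sketched next. The 512x512 target size is an assumption, not a reported setting.

```python
# Minimal preprocessing sketch in the spirit of the PyTorch neural transfer
# tutorial. The 512x512 target size is an assumption, not a reported setting.
from PIL import Image
import torchvision.transforms as transforms

loader = transforms.Compose([
    transforms.Resize((512, 512)),  # bring every image to the same resolution
    transforms.ToTensor(),          # [0, 1] float tensor, shape (C, H, W)
])

def image_to_tensor(path):
    """Load an image and add a batch dimension, as VGG19 expects NCHW input."""
    image = Image.open(path).convert("RGB")
    return loader(image).unsqueeze(0)
```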
Comparing the calculated style losses against the images revealed a mismatch between the numbers and the visible differences. Notably, when Liang Xing's original artwork was used as the style image, the style loss values for the three SD models fell within the same order of magnitude, yet the images those models generated were stylistically distinct. This gap illustrates the limited capacity of style loss, on its own, to serve as a compatibility metric.
The discrepancy between the calculated style losses and the observed results was discussed in the context of the limitations acknowledged in the LyCORIS paper: capturing the intricacies of artistic style through style loss alone is difficult, because stylistic elements are multifaceted and defy quantification through a single metric. This points to the need for a more nuanced approach when evaluating compatibility between SD and LoRA models.
The researchers also described the procedure for calculating style loss, following the approach in the official PyTorch documentation for neural style transfer: extract the outputs of several randomly selected convolutional layers of a VGG19 model, compute the Gram matrix of each output, and take the mean squared error (MSE) between the Gram matrices of the input image and the style image.
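A minimal sketch of that computation, adapted from the tutorial, is shown below; the layer indices used here (conv1_1 through conv5_1 of VGG19) are a common choice and stand in for whichever layers the experiment actually sampled.

```python
# Style loss via Gram matrices, adapted from the PyTorch neural transfer
# tutorial. Inputs are assumed to be ImageNet-normalized batched tensors.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

def gram_matrix(features):
    """Gram matrix of a (batch, channels, height, width) feature map."""
    b, c, h, w = features.size()
    flat = features.view(b * c, h * w)
    # Normalize so values are comparable across layers of different sizes.
    return flat @ flat.t() / (b * c * h * w)

@torch.no_grad()
def style_loss(generated, style, layers=(0, 5, 10, 19, 28)):
    """Sum of MSEs between Gram matrices at the selected VGG19 conv layers.

    Indices 0, 5, 10, 19, 28 are conv1_1..conv5_1; the experiment's own
    (randomly sampled) layer choice may differ.
    """
    vgg = vgg19(weights=VGG19_Weights.DEFAULT).features.eval()
    loss = torch.zeros(())
    x, y = generated, style
    for i, layer in enumerate(vgg):
        x, y = layer(x), layer(y)
        if i in layers:
            loss += F.mse_loss(gram_matrix(x), gram_matrix(y))
        if i == max(layers):
            break
    return loss
```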
The experiment also documented the specific generation parameters for the sample images: prompts built from tags such as "1girl," "raiden shogun," and "portrait," along with further detail tags intended to surface subtle variations in style and artistic expression, together with fixed resolution, VAE, and sampling settings. Holding these parameters constant ensured that stylistic differences between outputs could be attributed to the SD-LoRA combination rather than the sampling setup.
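Continuing the hypothetical pipeline sketch above, those controlled settings might be gathered into a single parameter set; only the quoted prompt fragments come from the report, while the remaining values and the scheduler choice are placeholders.

```python
# Hypothetical parameter set: only the prompt fragments "1girl",
# "raiden shogun", and "portrait" appear in the report; resolution, seed,
# step count, guidance scale, and scheduler are placeholders.
import torch
from diffusers import DPMSolverMultistepScheduler

generation_params = {
    "prompt": "1girl, raiden shogun, portrait",
    "negative_prompt": "lowres, bad anatomy",
    "width": 512,
    "height": 768,
    "num_inference_steps": 30,
    "guidance_scale": 7.0,
    "generator": torch.Generator("cuda").manual_seed(42),  # fixed seed
}

# Choosing a sampler corresponds to swapping the pipeline's scheduler; a custom
# VAE could likewise be swapped in (e.g. via AutoencoderKL.from_single_file).
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
image = pipe(**generation_params).images[0]
```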
Finally, the study highlighted the pitfalls of using style loss as the sole criterion for evaluating the compatibility of SD models with LoRA models, arguing for a holistic approach that accounts for the nuances and complexities of artistic style. The source code and test data included in the appendix document the experiment's scope and findings, and provide a starting point for further research into evaluating AI-generated artistic expression.