Stable Diffusion LoRA Automotive Application

Fine-tuned a Stable Diffusion XL (SDXL) model with Low-Rank Adaptation (LoRA) to generate photorealistic images of the 2012 Mercedes-Benz E-Class Sedan. Trained on 87 images from the Stanford Cars Dataset using Kohya sd-scripts, with the setup optimized for Apple Silicon.
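
As a rough illustration of why LoRA keeps fine-tuning lightweight: instead of updating a full weight matrix, it trains two small matrices of rank `dim` and scales their product by `alpha / dim`. The sketch below uses the dim=32 and alpha=16 values listed under Training Configuration. The 1280 x 1280 layer size is a hypothetical example chosen for the arithmetic, not a measurement of this model.

```python
def lora_scale(alpha: float, rank: int) -> float:
    """Scaling factor applied to the low-rank update (alpha / rank)."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one adapted linear layer: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


RANK, ALPHA = 32, 16        # dim and alpha from the training configuration
D_IN = D_OUT = 1280         # hypothetical layer size, for illustration only

full_params = D_IN * D_OUT
adapter_params = lora_param_count(D_IN, D_OUT, RANK)

print("scale:", lora_scale(ALPHA, RANK))
print("full layer params:", full_params)
print("LoRA params:", adapter_params)
print("fraction trained:", adapter_params / full_params)
```

With alpha set to half of dim, the learned update is applied at half strength, which is one common way to keep a small dataset from overpowering the base model.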

Training Pipeline

Data Collection: 87 Mercedes E-Class images from the Stanford Cars Dataset
Caption Generation: Automated captioning via the Qwen2-VL vision model (Ollama)
LoRA Training: Kohya sd-scripts on SDXL, optimized for Apple M4 Max
Inference: ComfyUI with SDXL base + refiner pipeline
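
Between the captioning and training steps, each generated caption has to end up somewhere the trainer can read it. Kohya sd-scripts can load captions from a text file stored next to each image, so a minimal sketch of that hand-off might look like the following. The helpers `build_caption` and `write_caption` are hypothetical names, the `.txt` extension is an assumption about how the trainer was configured, and prepending the trigger word to every caption is an assumption too, since the page does not say how the trigger word was introduced. The Qwen2-VL call itself is left out; `model_caption` stands in for its output.

```python
from pathlib import Path

TRIGGER_WORD = "mercedesbenzeclasssedan2012"  # trigger word from the training configuration


def build_caption(model_caption: str, trigger: str = TRIGGER_WORD) -> str:
    """Collapse whitespace, drop a trailing period, and put the trigger word first."""
    text = " ".join(model_caption.split()).rstrip(".")
    if text.lower().startswith(trigger.lower()):
        return text  # trigger word already present, leave the caption alone
    return f"{trigger}, {text}" if text else trigger


def write_caption(image_path: Path, model_caption: str, extension: str = ".txt") -> Path:
    """Write the caption to a sidecar file next to the image (hypothetical helper)."""
    caption_path = image_path.with_suffix(extension)
    caption_path.write_text(build_caption(model_caption) + "\n", encoding="utf-8")
    return caption_path
```

Calling `write_caption(Path("dataset/car_001.jpg"), caption)` would produce `dataset/car_001.txt`; the path here is only an example.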

Training Configuration

Base Model: SDXL 1.0
Network: LoRA (dim=32, alpha=16)
Optimizer: AdamW (weight_decay=0.01)
Learning Rate: 5e-5 (cosine with restarts)
Resolution: 1024 x 1024 (bucketing)
Epochs: 18
Batch Size: 2
Hardware: 36 GB+ RAM recommended
Trigger Word: mercedesbenzeclasssedan2012
Training Images: 87
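
To get a feel for what these settings imply, the sketch below estimates the number of optimizer steps and traces a cosine-with-restarts learning rate over them. Two values are assumptions because the page does not state them: `REPEATS` (dataset repeats per epoch) and `NUM_CYCLES` (how many times the schedule restarts). The schedule uses the usual hard-restart cosine formula with no warmup, which may differ in detail from the run's actual scheduler, and bucketing can change the real step count slightly because batches are formed per bucket.

```python
import math

NUM_IMAGES = 87
BATCH_SIZE = 2
EPOCHS = 18
BASE_LR = 5e-5
REPEATS = 1      # assumption: dataset repeats are not stated on this page
NUM_CYCLES = 3   # assumption: the number of restarts is not stated either


def total_steps(num_images: int, batch_size: int, epochs: int, repeats: int = 1) -> int:
    """Estimated optimizer steps, ignoring bucketing and gradient accumulation."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def cosine_with_restarts(step: int, total: int, base_lr: float, num_cycles: int) -> float:
    """Cosine decay from base_lr toward 0, restarted num_cycles times, no warmup."""
    if step >= total:
        return 0.0
    # Position inside the current cycle, computed with integers to avoid rounding drift.
    cycle_progress = ((step * num_cycles) % total) / total
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * cycle_progress))


steps = total_steps(NUM_IMAGES, BATCH_SIZE, EPOCHS, REPEATS)
print("estimated steps:", steps)
for s in (0, steps // 6, steps // 3, steps - 1):
    print(s, f"{cosine_with_restarts(s, steps, BASE_LR, NUM_CYCLES):.2e}")
```

Under these assumptions the rate starts at the base value, decays toward zero over each cycle, and jumps back to the base value at every restart.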

Training Progression

Sample outputs at each checkpoint epoch, showing progressive improvement in generation quality.

[Images: Epoch 18, samples 1-3]

GitHub · Training framework: Kohya sd-scripts · Inference: ComfyUI · Dataset: Stanford Cars