Stable Diffusion LoRA Automotive Application
Fine-tuned Stable Diffusion XL (SDXL) with Low-Rank Adaptation (LoRA) to generate photorealistic images of the 2012 Mercedes-Benz E-Class Sedan. The LoRA was trained on 87 images from the Stanford Cars Dataset using Kohya sd-scripts, with settings tuned for Apple Silicon.
Training Pipeline
Data Collection: 87 Mercedes-Benz E-Class images from the Stanford Cars Dataset
Caption Generation: Automated captioning with the Qwen2-VL vision model (run via Ollama)
LoRA Training: Kohya sd-scripts on SDXL, optimized for Apple M4 Max
Inference: ComfyUI with the SDXL base + refiner pipeline
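The caption-generation step could be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it assumes a local Ollama server on its default port, a hypothetical model tag (`qwen2-vl`) and prompt, `.jpg` inputs, and the common convention of writing each caption to a `.txt` file next to its image with the trigger word prefixed. None of those details are confirmed by the project itself.

```python
import base64
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "qwen2-vl"  # assumption: replace with the tag of your local vision model
TRIGGER_WORD = "mercedesbenzeclasssedan2012"
PROMPT = "Describe this car photo in one sentence: angle, color, setting."  # hypothetical prompt


def build_payload(image_bytes: bytes) -> dict:
    """Build a non-streaming /api/generate request body carrying one base64 image."""
    return {
        "model": MODEL_TAG,
        "prompt": PROMPT,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def format_caption(description: str) -> str:
    """Prefix the trigger word (an assumed convention) to the model's description."""
    return f"{TRIGGER_WORD}, {description.strip()}"


def caption_image(image_path: Path) -> str:
    """Send one image to the local Ollama server and return the finished caption."""
    body = json.dumps(build_payload(image_path.read_bytes())).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return format_caption(json.loads(response.read())["response"])


def caption_folder(folder: Path) -> None:
    """Write a .txt caption file next to every .jpg image in the folder."""
    for image_path in sorted(folder.glob("*.jpg")):
        image_path.with_suffix(".txt").write_text(caption_image(image_path) + "\n")
```

Calling `caption_folder(Path("dataset"))` (a hypothetical directory) would caption every image in one pass. Keeping the payload and caption formatting in separate functions makes them easy to check without a running server.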
Training Configuration
Base Model: SDXL 1.0
Network: LoRA (dim=32, alpha=16)
Optimizer: AdamW (weight_decay=0.01)
Learning Rate: 5e-5 (cosine with restarts)
Resolution: 1024 × 1024 (bucketing)
Epochs: 18
Batch Size: 2
Hardware: 36 GB+ RAM recommended
Trigger Word: mercedesbenzeclasssedan2012
Training Images: 87
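To make these settings concrete, the sketch below derives two quantities from them (the LoRA scaling factor alpha/dim and an approximate step count) and assembles matching command-line flags. The flag names reflect my understanding of Kohya sd-scripts' `sdxl_train_network.py` and should be verified against the installed version. The step count assumes one repeat per image and no gradient accumulation, and bucketing can change the real number. Dataset and model paths are deliberately left out.

```python
import math

# Values copied from the configuration above.
CONFIG = {
    "network_dim": 32,
    "network_alpha": 16,
    "weight_decay": 0.01,
    "learning_rate": 5e-5,
    "train_batch_size": 2,
    "max_train_epochs": 18,
    "num_images": 87,
}


def lora_scale(cfg: dict) -> float:
    """LoRA multiplies its low-rank update by alpha / dim."""
    return cfg["network_alpha"] / cfg["network_dim"]


def steps_per_epoch(cfg: dict) -> int:
    """Approximate optimizer steps per epoch (assumes 1 repeat per image)."""
    return math.ceil(cfg["num_images"] / cfg["train_batch_size"])


def total_steps(cfg: dict) -> int:
    """Approximate optimizer steps over the whole run."""
    return steps_per_epoch(cfg) * cfg["max_train_epochs"]


def build_args(cfg: dict) -> list:
    """Assemble training flags; paths to the model and dataset must be added."""
    return [
        "--network_module=networks.lora",
        f"--network_dim={cfg['network_dim']}",
        f"--network_alpha={cfg['network_alpha']}",
        "--optimizer_type=AdamW",
        "--optimizer_args", f"weight_decay={cfg['weight_decay']}",
        f"--learning_rate={cfg['learning_rate']}",
        "--lr_scheduler=cosine_with_restarts",
        "--resolution=1024,1024",
        "--enable_bucket",
        f"--train_batch_size={cfg['train_batch_size']}",
        f"--max_train_epochs={cfg['max_train_epochs']}",
    ]


if __name__ == "__main__":
    print("LoRA scale:", lora_scale(CONFIG))
    print("Steps per epoch:", steps_per_epoch(CONFIG))
    print("Total steps:", total_steps(CONFIG))
    print(" ".join(build_args(CONFIG)))
```

Under these assumptions the update is scaled by 0.5, each epoch takes about 44 steps, and the full run takes about 792 steps. Because alpha is half of dim, the learned update is applied at half strength, which is worth keeping in mind when comparing the learning rate with configurations that set alpha equal to dim.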
Training Progression
Sample outputs from each saved checkpoint epoch, showing generation quality improving as training progresses.



GitHub · Training framework: Kohya sd-scripts · Inference: ComfyUI · Dataset: Stanford Cars