
The latent code after diffusion inversion is not the same as standard Gaussian random noise

Visualizing latents_list gives the following:

Visualizing the last step next to standard noise:

Even at the last step, the outline of the "horse" is still faintly visible.
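The visual impression can also be checked numerically. The sketch below is a hypothetical helper, not part of FreePrompt: `latent_stats` reports the mean, the standard deviation, and the correlation between horizontally adjacent pixels of a latent. For i.i.d. Gaussian noise that correlation should be close to zero, while leftover image structure pushes it up. The `structured` array is synthetic (noise plus a smooth blob) and only stands in for a real inverted latent.

```python
import numpy as np

def latent_stats(x):
    """Summary statistics for a latent of shape (C, H, W)."""
    x = np.asarray(x, dtype=np.float64)
    # Pair every pixel with its right-hand neighbor.
    left = x[:, :, :-1].ravel()
    right = x[:, :, 1:].ravel()
    neighbor_corr = float(np.corrcoef(left, right)[0, 1])
    return {
        "mean": float(x.mean()),
        "std": float(x.std()),
        "neighbor_corr": neighbor_corr,
    }

rng = np.random.default_rng(0)

# Reference: standard Gaussian noise with the shape of an SD latent.
noise = rng.standard_normal((4, 64, 64))

# Synthetic stand-in for an inverted latent (assumption, not real data):
# Gaussian noise plus a smooth blob shared by all channels.
yy, xx = np.mgrid[0:64, 0:64]
blob = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 10.0 ** 2))
structured = rng.standard_normal((4, 64, 64)) + 2.0 * blob

print("noise     :", latent_stats(noise))
print("structured:", latent_stats(structured))
```

To apply it to the real result, something like `latent_stats(start_code[0].float().cpu().numpy())` should work, assuming `start_code` has shape (1, 4, 64, 64) as returned by the inversion.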

The full code (followed by the visualization code) is below:

## Based on the FreePrompt (FPE) code
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import random
from diffusers import DDIMScheduler
from typing import Optional
import numpy as np
from Freeprompt.diffuser_utils import FreePromptPipeline
from Freeprompt.freeprompt_utils import register_attention_control_new
from torchvision.utils import save_image
from torchvision.io import read_image
from Freeprompt.freeprompt import SelfAttentionControlEdit, AttentionStore, AttentionControl

# Note that you may add your Hugging Face token to get access to the models
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model_path = "runwayml/stable-diffusion-v1-5"
scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
pipe = FreePromptPipeline.from_pretrained(model_path, scheduler=scheduler).to(device)

import yaml

def load_image_data(yaml_file):
    with open(yaml_file, 'r') as file:
        data = yaml.safe_load(file)
    return data

def load_image(image_path, device):
    image = read_image(image_path)
    image = image[:3].unsqueeze_(0).float() / 127.5 - 1.  # [-1, 1]
    image = F.interpolate(image, (512, 512))
    image = image.to(device)
    return image

self_replace_steps = .8
NUM_DIFFUSION_STEPS = 50   # replace for 40 steps (0.8 * 50)

out_dir = "examples/outputs_noise_test"
os.makedirs(out_dir, exist_ok=True)  # save_image fails if the directory is missing

# SOURCE_IMAGE_PATH = "examples/img/000141.jpg"
SOURCE_IMAGE_PATH = "/opt/data/private/ywx/EasyNLP/diffusion/FreePromptEditing/data/wild-ti2i/data/horse.png"
source_image = load_image(SOURCE_IMAGE_PATH, device)

source_prompt = ""

# invert the source image
start_code, latents_list = pipe.invert(source_image,
                                       source_prompt,
                                       guidance_scale=7.5,
                                       num_inference_steps=50,
                                       return_intermediates=True)
# latents_list: intermediate features from the last timestep to the first, 51 intermediate latents

# target_prompt = 'a red car'
target_prompt = 'a photo of a pink horse in the beach'

# standard Gaussian noise with the same shape (not used by the pipeline call below)
latents = torch.randn(start_code.shape, device=device)
prompts = [source_prompt, target_prompt]

start_code = start_code.expand(len(prompts), -1, -1, -1)
controller = SelfAttentionControlEdit(prompts, NUM_DIFFUSION_STEPS, self_replace_steps=self_replace_steps)  # custom module

register_attention_control_new(pipe, controller)

# Note: querying the inversion intermediate features latents_list
# may obtain better reconstruction and editing results
results = pipe(prompts,
               latents=start_code,
               guidance_scale=7.5,
               ref_intermediate_latents=latents_list)  # latents_list: 51

save_image(results[0], os.path.join(out_dir, str(target_prompt)+'_recon.jpg'))
save_image(results[1], os.path.join(out_dir, str(target_prompt)+'.jpg'))
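
For context on what an inversion call like `pipe.invert` computes, here is a minimal NumPy sketch of the standard deterministic DDIM inversion update. It is not FreePromptPipeline's actual implementation: `ddim_inversion_step` and `dummy_eps` are hypothetical names, the noise predictor is a placeholder for the UNet, and the usual 1000 training timesteps are assumed for the schedule. The last line prints the cumulative alpha at the final timestep, which is small but not zero, so even the forward noising process at that timestep keeps a small multiple of the original latent.

```python
import numpy as np

# Same beta range as the DDIMScheduler above ("scaled_linear");
# 1000 training timesteps is an assumption (the diffusers default).
betas = np.linspace(0.00085 ** 0.5, 0.012 ** 0.5, 1000) ** 2
alphas_cumprod = np.cumprod(1.0 - betas)

def ddim_inversion_step(x_t, eps, a_t, a_next):
    """Move a latent from cumulative alpha a_t to the noisier level a_next."""
    # Clean-latent estimate implied by the current noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    # Re-noise that estimate to the next level with the same prediction.
    return np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps

def dummy_eps(x, t):
    # Placeholder for the UNet noise prediction; purely illustrative.
    return 0.1 * x

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64, 64))   # stand-in for the VAE-encoded image
timesteps = np.arange(1, 1000, 20)     # 50 increasing timesteps: 1, 21, ..., 981

trajectory = [x]                       # initial latent plus one entry per step
a_cur = alphas_cumprod[0]
for t in timesteps:
    a_next = alphas_cumprod[t]
    x = ddim_inversion_step(x, dummy_eps(x, t), a_cur, a_next)
    trajectory.append(x)
    a_cur = a_next

print(len(trajectory))                 # 50 steps + the initial latent
print(float(alphas_cumprod[-1]))       # cumulative alpha at the last timestep
```

Because the update reuses the model's noise prediction at every step, the resulting latent depends on the image through the network, which is one reason it need not look like an independent Gaussian sample.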

Visualization:

import torch
import matplotlib.pyplot as plt

num_images = len(latents_list)
grid_size = (num_images // 5 + (num_images % 5 > 0), 5)  # compute the row count automatically so every image is shown

fig, axes = plt.subplots(*grid_size, figsize=(15, 15))
axes = axes.flatten()  # flatten the 2D grid to 1D for easy indexing

for i in range(num_images):
    # (1, 4, H, W) -> (H, W, 4); imshow reads the 4 channels as RGBA and clips values outside [0, 1]
    latent_image = latents_list[i].squeeze().cpu().detach().numpy().transpose(1, 2, 0)
    axes[i].imshow(latent_image)
    axes[i].set_title(f"Step {i+1}")
    axes[i].axis('off')

# hide the unused cells of the grid
for j in range(num_images, len(axes)):
    axes[j].axis('off')

plt.tight_layout()
plt.show()
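
Since `imshow` treats a four-channel array as RGBA and clips float values outside [0, 1], the grid above only gives a rough impression of each latent. One alternative, sketched here under the assumption that each latent has shape (4, H, W), is to min-max normalize every channel separately and tile the channels side by side as one grayscale image. `latent_to_gray_grid` is a hypothetical helper, and the random array stands in for a real latent.

```python
import numpy as np

def latent_to_gray_grid(latent):
    """Tile the channels of a (C, H, W) latent into one (H, C*W) image in [0, 1]."""
    latent = np.asarray(latent, dtype=np.float64)
    # Per-channel min and max, kept broadcastable against (C, H, W).
    lo = latent.min(axis=(1, 2), keepdims=True)
    hi = latent.max(axis=(1, 2), keepdims=True)
    # Guard against division by zero for a constant channel.
    scaled = (latent - lo) / np.maximum(hi - lo, 1e-12)
    # Place the channels next to each other along the width.
    return np.concatenate(list(scaled), axis=1)

rng = np.random.default_rng(0)
demo = rng.standard_normal((4, 64, 64)) * 3.0   # stand-in for one latent
grid = latent_to_gray_grid(demo)
print(grid.shape, grid.min(), grid.max())
```

In the plotting loop this could replace the RGBA call, for example `axes[i].imshow(latent_to_gray_grid(latents_list[i].squeeze(0).float().cpu().numpy()), cmap="gray")`.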
