Answer

1. Image vectorization comes first

You can use OpenAI's CLIP ViT model openai/clip-vit-base-patch32 to vectorize the images and compare their similarity. Proceed as follows:

Prepare the images and the model

Make sure you have the two images whose similarity you want to compare. Download and load a CLIP-ViT model such as openai/clip-vit-base-patch32.

Process the images

Resize the images to the input size and format the model expects. CLIP-ViT models typically take 224x224-pixel inputs, and the pixel values usually need to be normalized.
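For reference, such a preprocessing pipeline can be written by hand with torchvision. This is only a sketch; the mean/std values are the normalization constants published in OpenAI's CLIP repository, and the full example further down instead reuses the preprocess function that clip.load returns:

from torchvision import transforms

# Hand-rolled equivalent of CLIP's preprocessing: resize, center-crop to
# 224x224, convert to a tensor, then normalize with CLIP's constants
clip_preprocess = transforms.Compose([
    transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                         std=(0.26862954, 0.26130258, 0.27577711)),
])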

Generate image vectors

Use the CLIP-ViT model to turn each image into its corresponding vector representation. This step calls the model's programming interface (for example, PyTorch or TensorFlow in Python) to compute each image's feature vector.

Compute similarity

Use the generated image vectors to compute the similarity between them. Cosine similarity and Euclidean distance are the usual metrics for how close two vectors are.
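To make the two metrics concrete, here is a minimal sketch in PyTorch; the random vectors are stand-ins for the 512-dimensional embeddings the model produces:

import torch

# Stand-in feature vectors (in practice these come from the model)
v1 = torch.randn(1, 512)
v2 = torch.randn(1, 512)

# Cosine similarity: 1.0 = same direction, -1.0 = opposite direction
cos_sim = torch.cosine_similarity(v1, v2, dim=-1)

# Euclidean (L2) distance: 0.0 = identical vectors; smaller = more similar
l2_dist = torch.dist(v1, v2, p=2)

print(cos_sim.item(), l2_dist.item())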

These steps take some programming background, especially around invoking the model and processing images. Here is a simple example showing how to use Python and PyTorch to load a CLIP ViT model and compute the similarity between two images:

import torch
from PIL import Image
import clip  # OpenAI's CLIP package: pip install git+https://github.com/openai/CLIP.git

# Load the model; clip.load also returns the matching preprocessing pipeline
# (resize to 224x224, center-crop, normalize)
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model.eval()

def image_to_vector(image_path):
    # Open the image, preprocess it, and encode it into a feature vector
    image = Image.open(image_path).convert("RGB")
    image = preprocess(image).unsqueeze(0).to(device)
    with torch.no_grad():
        image_features = model.encode_image(image)
    return image_features

# Example: compute the similarity between two images
image1_path = 'path/to/image1.jpg'
image2_path = 'path/to/image2.jpg'

vector1 = image_to_vector(image1_path)
vector2 = image_to_vector(image2_path)

# Compute cosine similarity
similarity = torch.cosine_similarity(vector1, vector2, dim=-1)
print(f"Similarity between the images: {similarity.item()}")

In this example, the image_to_vector function loads an image, handles resizing and formatting, and uses the CLIP-ViT model to compute the image's feature vector. Cosine similarity is then used to measure how similar the two vectors are.

2. Insert the image vectors into the Qdrant vector database


import os
from PIL import Image
from tqdm import tqdm
from qdrant_client import QdrantClient, models
from transformers import AutoTokenizer, AutoProcessor, AutoModelForZeroShotImageClassification

client = QdrantClient("localhost", port=6333)
print("[INFO] Client created...")
root_dir = "new_dataset"

# Collect every .jpeg image under the dataset directory
image_dataset = []
for subdir, dirs, files in os.walk(root_dir):
    for file in files:
        # Look only for image files with a .jpeg extension
        if file.endswith(".jpeg"):
            image_path = os.path.join(subdir, file)
            try:
                image = Image.open(image_path)
                image_dataset.append(image)
            except Exception as e:
                print(f"Error loading image {image_path}: {e}")

print("[INFO] Loading the model...")
model_name = "openai/clip-vit-base-patch32"
tokenizer = AutoTokenizer.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForZeroShotImageClassification.from_pretrained(model_name)

print("[INFO] Creating qdrant data collection...")
# CLIP ViT-B/32 image embeddings are 512-dimensional
client.create_collection(
    collection_name="animals_img_db",
    vectors_config=models.VectorParams(size=512, distance=models.Distance.COSINE),
)

print("[INFO] Creating a data collection...")
records = []
for idx, sample in tqdm(enumerate(image_dataset), total=len(image_dataset)):
    processed_img = processor(text=None, images=sample, return_tensors="pt")['pixel_values']
    img_embds = model.get_image_features(processed_img).detach().numpy().tolist()[0]
    # Keep the raw pixels and size in the payload so results can be rebuilt as images
    img_px = list(sample.getdata())
    img_size = sample.size
    records.append(models.Record(id=idx, vector=img_embds, payload={"pixel_lst": img_px, "img_size": img_size}))

# Upload in batches of 30, including the final partial batch
for i in range(0, len(records), 30):
    client.upload_records(
        collection_name="animals_img_db",
        records=records[i:i + 30],
    )
    print(f"finished {min(i + 30, len(records))}")
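As a quick sanity check that every batch was uploaded, you can ask Qdrant for the collection's point count; this short sketch uses qdrant-client's count call:

# Verify that all records made it into the collection
count = client.count(collection_name="animals_img_db")
print(f"[INFO] Points in collection: {count.count}")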

3. Image search


import gradio as gr
from PIL import Image
from qdrant_client import QdrantClient
from transformers import AutoTokenizer, AutoProcessor, AutoModelForZeroShotImageClassification

client = QdrantClient("localhost", port=6333)
print("[INFO] Client created...")

# Loading the model
print("[INFO] Loading the model...")
model_name = "openai/clip-vit-base-patch32"
tokenizer = AutoTokenizer.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForZeroShotImageClassification.from_pretrained(model_name)

def process_text(image):
    # Embed the query image with the same CLIP model used at indexing time
    processed_img = processor(text=None, images=image, return_tensors="pt")['pixel_values']
    img_embeddings = model.get_image_features(processed_img).detach().numpy().tolist()[0]
    hits = client.search(
        collection_name="animals_img_db",
        query_vector=img_embeddings,
        limit=5,
    )

    images = []
    for hit in hits:
        img_size = tuple(hit.payload['img_size'])
        pixel_lst = hit.payload['pixel_lst']

        # Rebuild each result image from the pixel data stored in its payload
        new_image = Image.new("RGB", img_size)
        new_image.putdata(list(map(lambda x: tuple(x), pixel_lst)))
        images.append(new_image)

    return images

# Gradio interface (defined after process_text so the callback exists before launch)
iface = gr.Interface(
    title="Building a Reverse Image Search Engine Using Qdrant Vector Search",
    description="by Niranjan Akella",
    fn=process_text,
    inputs=gr.Image(label="Input Image"),
    outputs=gr.Gallery(label="Relevant Images"),
)
iface.launch()

Final result

[Screenshot: the final search results]
