My approach is to use Qwen2.5-VL and simply tell it in the prompt to remove the seal. That's all there is to it.

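For a large model, removing a seal is much the same as removing a watermark, so this should solve your problem. You can test how well it handles a single document in the Bailian console; batch processing requires calling the API.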

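You can upload a screenshot. The prompt needs to state the seal-removal requirement, plus a brief description of what the seal looks like.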
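
As a rough illustration of the prompt advice above, here is a minimal sketch that assembles such a prompt. The function name `build_seal_removal_prompt` and the exact wording are hypothetical, not the prompt I actually used; adjust the seal features to match your documents.

```python
def build_seal_removal_prompt(seal_features):
    """Compose an OCR prompt that asks the model to ignore seals.

    seal_features: list of short descriptions of the seal (hypothetical examples below).
    """
    lines = [
        "Extract all text from this document image.",
        "Ignore any seal or stamp that overlaps the text, "
        "and do not transcribe the text inside the seal.",
    ]
    if seal_features:
        # Briefly describe the seal so the model knows what to skip.
        lines.append("Seal characteristics: " + "; ".join(seal_features) + ".")
    return "\n".join(lines)


prompt = build_seal_removal_prompt(["red", "circular", "five-pointed star in the center"])
print(prompt)
```

The resulting string can be passed as the `prompt` argument of whatever client you use to call the model.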

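This one is qwen-vl-max; for batch processing, calling the qwen-vl-plus API is enough. Newly registered users get some free tokens. The code below was AI-generated and is for reference only.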

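"""Client wrapper for the Qwen-VL model."""
import os
import io
import base64
from PIL import Image
import openai

class QwenVLClient:
    def __init__(self, api_base="https://dashscope.aliyuncs.com/compatible-mode/v1", api_key=None):
        # Read the key from the environment instead of hard-coding it;
        # DASHSCOPE_API_KEY is the variable name used in the DashScope docs.
        self.client = openai.OpenAI(
            base_url=api_base,
            api_key=api_key or os.getenv("DASHSCOPE_API_KEY"),
        )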
    
    def process_image(self, base64_image, prompt):
        """Send a Base64-encoded image and a prompt; return the model's text reply."""
        # Keep the encoded image so _image_to_base64(None) can return it.
        self.base64_img = base64_image
        messages = self._build_messages(prompt, base64_image)
        # Reuse the client created in __init__; a new one per call is unnecessary.
        response = self._call_api(messages)
        return response.choices[0].message.content
    
    def _image_to_base64(self, image_path):
        """Encode an image file as a Base64 JPEG string."""
        if image_path is None and hasattr(self, 'base64_img'):
            # No path given: return the already-encoded string from process_image.
            return self.base64_img
        with Image.open(image_path) as img:
            buffered = io.BytesIO()
            # Convert to RGB first, since JPEG cannot store an alpha channel.
            img.convert('RGB').save(buffered, format="JPEG")
            return base64.b64encode(buffered.getvalue()).decode()
    
    def _build_messages(self, prompt, base64_img):
        """Build an OpenAI-format message list."""
        text = prompt or ""
        if not base64_img:
            # Text-only request.
            return [{"role": "user", "content": text}]
        # Text plus image, with the image passed as a data URL.
        return [{
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64," + base64_img},
                },
            ],
        }]
    
    def _call_api(self, messages, client=None):
        """Call the chat completions API and return the raw response."""
        client = client or self.client
        try:
            # Do not print the messages themselves: they contain the whole Base64 image.
            print("Calling the Qwen-VL API...")
            response = client.chat.completions.create(
                model="qwen-vl-plus",
                messages=messages,
                temperature=0.01,
                stream=False
            )
        except Exception as e:
            print(f"API call failed: {e}")
            raise
        print("API call succeeded")
        print("API response:", response.choices[0].message.content)
        return response
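
For the batch case mentioned earlier, a minimal sketch of a loop over a folder might look like the following. The names `encode_image_file` and `batch_process` are hypothetical helpers, not part of any library, and `process_fn` is assumed to be any callable with the same signature as `process_image(base64_image, prompt)`. Only JPEG files are picked up by default, because the client above labels the payload as `image/jpeg`.

```python
import base64
from pathlib import Path


def encode_image_file(path):
    """Read an image file and return its raw bytes as a Base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def batch_process(folder, process_fn, prompt, suffixes=(".jpg", ".jpeg")):
    """Run process_fn(base64_image, prompt) on every matching file in folder.

    Returns a dict mapping file name to the result. A failed file is
    recorded as None so that one bad image does not stop the batch.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in suffixes:
            continue  # skip non-JPEG files
        try:
            results[path.name] = process_fn(encode_image_file(path), prompt)
        except Exception as exc:
            print(f"Failed on {path.name}: {exc}")
            results[path.name] = None
    return results
```

With the client above, usage would be something like `batch_process("scans", QwenVLClient().process_image, prompt)`, where `"scans"` is a placeholder folder name.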

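I used this to build a simple GUI that takes a screenshot via a hotkey, extracts the text, and copies it straight to the clipboard. The recognition quality is decent.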

In the prompt I added a requirement to remove watermarks: