Is there a good way to remove stamps (seals) from documents so they can be OCR'd afterwards?
My approach is to use Qwen2.5-VL and simply tell it in the prompt to remove the stamp.
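For a large model, removing a stamp is much like removing a watermark, so this should solve your problem. You can test how well it handles a single document in the Bailian (百炼) console; batch processing requires calling the API. Note that the model returns text, not an edited image: "removing" the stamp here means having the model transcribe the document while ignoring the stamp.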

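You can upload a screenshot. The prompt should explicitly ask for the stamp to be removed, and briefly describe what the stamp looks like.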
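As an illustration of a prompt that describes the stamp, the helper below builds one from a short description of the stamp's appearance. The function name `build_stamp_prompt` and the prompt wording are hypothetical, a sketch of the idea rather than the prompt used in this thread; tune the wording against your own documents.

```python
def build_stamp_prompt(stamp_hint: str = "a red circular stamp") -> str:
    """Build an OCR prompt that tells a vision-language model to ignore a stamp.

    stamp_hint: a short, hypothetical description of the stamp's appearance
    (color, shape, typical text) to help the model tell it apart from the body text.
    """
    return (
        "Extract all text from this document image and output it as plain text, "
        "preserving the original line order. "
        f"The document carries {stamp_hint}. "
        "Ignore the stamp: do not output any text that belongs to it, and "
        "transcribe the document text it overlaps as if the stamp were absent. "
        "Output only the extracted text, with no explanations."
    )


# Example: a more specific description for blue oval stamps
prompt = build_stamp_prompt("a blue oval stamp with a company name around the edge")
```

The more concretely the stamp is described (color, shape, the text printed on it), the easier it is for the model to separate stamp text from body text.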
This one was done with VL-Max. For batch processing, calling the qwen-vl-plus API is enough, and newly registered users get free tokens. The code was AI-generated and is for reference only (cleaned up below: indentation restored, the API key read from the `DASHSCOPE_API_KEY` environment variable instead of being hardcoded, and a single client reused across calls).

```python
"""Qwen-VL model client wrapper"""
import base64
import io
import os

import openai
from PIL import Image


class QwenVLClient:
    def __init__(self, api_base="https://dashscope.aliyuncs.com/compatible-mode/v1",
                 api_key=None, model="qwen-vl-plus"):
        # Read the key from the environment rather than hardcoding it in source
        self.client = openai.OpenAI(
            base_url=api_base,
            api_key=api_key or os.getenv("DASHSCOPE_API_KEY"),
        )
        self.model = model

    def process_image(self, base64_image, prompt):
        """Send a Base64-encoded JPEG plus a prompt; return the model's text reply."""
        messages = self._build_messages(prompt, base64_image)
        response = self._call_api(messages)
        return response.choices[0].message.content

    @staticmethod
    def _image_to_base64(image_path):
        """Load an image file, re-encode it as JPEG and return it as a Base64 string."""
        with Image.open(image_path) as img:
            buffered = io.BytesIO()
            # JPEG has no alpha channel, so convert to RGB first
            img.convert("RGB").save(buffered, format="JPEG")
        return base64.b64encode(buffered.getvalue()).decode()

    @staticmethod
    def _build_messages(prompt, base64_img):
        """Build an OpenAI-format message list: text, plus the image if one is given."""
        text = prompt or ""
        if not base64_img:
            return [{"role": "user", "content": text}]
        return [{
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64," + base64_img},
                },
            ],
        }]

    def _call_api(self, messages):
        """Call the chat completions endpoint and log the outcome."""
        # Don't print the messages themselves: they contain the full Base64 image
        print(f"Calling Qwen-VL API (model={self.model})")
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                temperature=0.01,  # near-deterministic output suits OCR
                stream=False,
            )
        except Exception as e:
            print(f"API call failed: {e}")
            raise
        print("API call succeeded, response:", response.choices[0].message.content)
        return response
```
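For the batch case (calling the API over many files), a minimal sketch is a loop that Base64-encodes each JPEG in a folder and passes it to a callable with the same `(base64_image, prompt)` signature as `QwenVLClient.process_image`. The function name `batch_ocr` is hypothetical. It assumes the inputs are already JPEG files, since the client labels every image as `image/jpeg`; other formats would need converting first (e.g. with `_image_to_base64`).

```python
import base64
from pathlib import Path
from typing import Callable, Dict, Iterable


def batch_ocr(folder: str, prompt: str,
              ocr_fn: Callable[[str, str], str],
              suffixes: Iterable[str] = (".jpg", ".jpeg")) -> Dict[str, str]:
    """Run ocr_fn on every JPEG file in `folder`.

    Returns {file name: extracted text}; a failed file maps to "ERROR: <message>"
    so one bad request doesn't abort the whole batch.
    """
    allowed = {s.lower() for s in suffixes}
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subdirectories and non-JPEG files (the client sends image/jpeg)
        if not path.is_file() or path.suffix.lower() not in allowed:
            continue
        b64 = base64.b64encode(path.read_bytes()).decode("ascii")
        try:
            results[path.name] = ocr_fn(b64, prompt)
        except Exception as exc:
            results[path.name] = f"ERROR: {exc}"
    return results
```

Usage would look like `batch_ocr("scans", prompt, QwenVLClient().process_image)`. Passing the function in rather than constructing the client inside keeps the loop independent of any particular API and easy to test.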
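I used this to build a simple GUI that supports taking a screenshot with a hotkey, extracts the text directly and copies it to the clipboard. The recognition quality is decent.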

In the prompt I also added a requirement to remove watermarks:
