VisQueryPDF
VisQueryPDF checks whether the images and text in a document are consistent with each other. It does this in one of two ways: by generating questions about the images and checking whether the document answers them correctly, or by comparing image and text embeddings to measure how closely they are related. This is useful for quality control of documents that mix visuals and writing, such as reports, manuals, and marketing materials. In the project's evaluation, the question-based method achieved a 60% similarity rate, compared to 33% for the embedding-based method.
# VisQueryPDF

The aim of this project is to verify the alignment between images and text in documents. Two different methods were explored.

## Method 1

Images automatically extracted from the document are described using a VLM agent. From these descriptions, a question-generation agent produces questions. The questions are then posed to the document through a RAG system, and the answers are verified.

## Method 2

Images and text are automatically extracted from the document. The text is condensed by a summarization agent. Embeddings of the images and summaries are then extracted with the CLIP model, and their similarities are compared.

The first method achieved a similarity rate of 60%, whereas the second method reached around 33%.

## Usage

```
git clone https://github.com/oztrkoguz/VisQueryPDF.git
cd VisQueryPDF
python main.py
```

## Requirements

```
Python > 3.10
langchain==0.2.6
langchain-chroma==0.1.1
langchain-community==0.0.38
langchain-core==0.1.52
langchain-openai==0.0.5
langchain-text-splitters==0.2.1
langsmith==0.1.82
ollama==0.2.1
```
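## How the similarity check works

The final step of Method 2 can be sketched as a cosine-similarity comparison between precomputed image and text embeddings. The sketch below assumes the embeddings have already been produced by CLIP (or any other joint image-text encoder); the `check_alignment` helper and the `threshold=0.3` default are illustrative choices, not part of the project's actual code.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def check_alignment(image_emb: np.ndarray, text_emb: np.ndarray,
                    threshold: float = 0.3) -> tuple[float, bool]:
    """Compare an image embedding against a text-summary embedding.

    Returns the similarity score and whether it clears the threshold
    (threshold value is a hypothetical example, not from the project).
    """
    sim = cosine_similarity(image_emb, text_emb)
    return sim, sim >= threshold
```

A pair whose score falls below the threshold would be flagged as a possible mismatch between the visual and the surrounding text.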