VisQueryPDF
VisQueryPDF checks whether the images and text in a document are consistent with each other. It does this in one of two ways: by generating questions about the images and checking whether the document answers them correctly, or by comparing image and text embeddings to measure how closely they are related. This is useful for quality control of documents that mix visuals and writing, such as reports, manuals, and marketing materials. In the project's evaluation, the question-based method achieved a 60% similarity rate, compared to 33% for the embedding-based method.
# VisQueryPDF

The aim of this project is to verify the alignment between images and text in documents. Two different methods were explored.

## Method 1

Images automatically extracted from the document are described using a VLM agent. From these descriptions, a question-generation agent produces questions. The questions are then posed to the document through a RAG system, and the answers are verified.

## Method 2

Images and text are automatically extracted from the document. The text is condensed by a summarization agent. Embeddings of the images and summaries are then extracted with the CLIP model, and their similarities are compared.

The first method achieved a similarity rate of 60%, whereas the second method reached around 33%.

## Usage

```
git clone https://github.com/oztrkoguz/VisQueryPDF.git
cd VisQueryPDF
python main.py
```

## Requirements

```
Python > 3.10
langchain==0.2.6
langchain-chroma==0.1.1
langchain-community==0.0.38
langchain-core==0.1.52
langchain-openai==0.0.5
langchain-text-splitters==0.2.1
langsmith==0.1.82
ollama==0.2.1
```
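## How the similarity check works

The final step of Method 2 can be sketched as a cosine-similarity comparison between precomputed image and text embeddings. The sketch below assumes the embeddings have already been produced by CLIP (or any other joint image-text encoder); the `check_alignment` helper and the `threshold=0.3` default are illustrative choices, not part of the project's actual code.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def check_alignment(image_emb: np.ndarray, text_emb: np.ndarray,
                    threshold: float = 0.3) -> tuple[float, bool]:
    """Compare an image embedding against a text-summary embedding.

    Returns the similarity score and whether it clears the threshold
    (threshold value is a hypothetical example, not from the project).
    """
    sim = cosine_similarity(image_emb, text_emb)
    return sim, sim >= threshold
```

A pair whose score falls below the threshold would be flagged as a possible mismatch between the visual and the surrounding text.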