VISIONDOC AI

A GenAI system that interprets and semantically links embedded images within narrative documents for visual question answering and retrieval.

Accepts input formats: PDF, DOCX.
Uses a vision-language model (GEMMA3) to extract semantic meaning from images.
Implements chunking logic that binds pre-image and post-image text with image metadata.
Stores enriched chunks in a vector store (FAISS).
Enables semantic search and retrieval using vector similarity.
Builds a chatbot using retrieval-augmented generation (RAG) over vector data.
Ensures chunk provenance (page, image position) is preserved.
Implements access control and role-based user permissions.
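The chunking step ties the text before and after each image to that image's description and provenance. Below is a minimal sketch of the idea. The `Block` and `ImageChunk` types, the `build_image_chunks` function and the 300-character window are hypothetical, not the project's actual code. The image description is assumed to come from the vision-language model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One layout element in reading order (hypothetical extraction output)."""
    kind: str     # "text" or "image"
    page: int     # 1-based page number
    content: str  # paragraph text, or the VLM-generated image description


@dataclass
class ImageChunk:
    """Enriched chunk: image description plus surrounding text and provenance."""
    page: int
    image_position: int  # 0-based index of the image within the document
    pre_text: str
    post_text: str
    description: str

    def to_text(self) -> str:
        """Text that would be embedded and stored in the vector store."""
        return f"{self.pre_text}\n[IMAGE] {self.description}\n{self.post_text}".strip()


def build_image_chunks(blocks: List[Block], window: int = 300) -> List[ImageChunk]:
    """Bind the nearest text before and after each image to that image."""
    chunks: List[ImageChunk] = []
    image_position = 0
    for i, block in enumerate(blocks):
        if block.kind != "image":
            continue
        # Closest text block preceding the image (empty if none).
        pre = next((b.content for b in reversed(blocks[:i]) if b.kind == "text"), "")
        # Closest text block following the image (empty if none).
        post = next((b.content for b in blocks[i + 1:] if b.kind == "text"), "")
        chunks.append(
            ImageChunk(
                page=block.page,
                image_position=image_position,
                pre_text=pre[-window:],   # keep the tail nearest the image
                post_text=post[:window],  # keep the head nearest the image
                description=block.content,
            )
        )
        image_position += 1
    return chunks
```

Each chunk keeps its page and image position, so a search hit can always be traced back to where the picture sits in the source document.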

PURPOSE

Take, for example, a car brochure for the 2006 Porsche Cayenne Turbo.

The system identifies each image and the relevant text near it.

It also generates a description for each image. When the user asks for an image, the system returns the most relevant picture.
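Finding the most relevant picture comes down to a nearest-neighbour search over embedding vectors. The sketch below runs that search exhaustively in pure Python using cosine similarity. It stands in for the FAISS lookup the project uses and behaves roughly like a flat inner-product index over normalized vectors. The image IDs and three-dimensional vectors are toy values. Real embeddings would come from whichever embedding model the project is configured with.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_relevant_images(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Score every image embedding and return the top k (id, score) pairs."""
    scored = [(image_id, cosine(query_vec, vec)) for image_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings keyed by hypothetical "page_image" IDs.
    index = {
        "p1_img0": [0.9, 0.1, 0.0],
        "p2_img1": [0.1, 0.9, 0.2],
        "p3_img2": [0.0, 0.2, 0.95],
    }
    query = [0.05, 0.85, 0.3]
    print(most_relevant_images(query, index, k=1)[0][0])  # prints p2_img1
```

An exhaustive scan is fine for a handful of brochures. FAISS becomes worthwhile as the number of indexed images grows.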

The project supports Q&A retrieval across multiple DOCX and PDF files.
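When the chatbot answers over chunks drawn from several files, each retrieved chunk must carry its provenance into the generation step. One way to do this is sketched below. The top chunks are numbered and tagged with their file, page and image position before they go into the prompt. The `RetrievedChunk` type, `build_rag_prompt` and the prompt wording are hypothetical. The project's actual prompt and model call are not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievedChunk:
    source: str          # originating file, e.g. a PDF or DOCX name
    page: int            # provenance: page number
    image_position: int  # provenance: image index within the file
    text: str            # enriched chunk text (surrounding text + image description)


def build_rag_prompt(question: str, chunks: List[RetrievedChunk], max_chunks: int = 3) -> str:
    """Assemble a grounded prompt from the top retrieved chunks, tagged with provenance."""
    context_lines = [
        f"[{n}] ({c.source}, page {c.page}, image {c.image_position}) {c.text}"
        for n, c in enumerate(chunks[:max_chunks], start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "Cite the numbered sources you use.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Numbering the sources lets the answer point back to a specific file, page and image.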

LOGIC:

USAGE:

Step 1: Add your documents to the VisionDOC-AI/documents directory (remove any existing documents you don't intend to use).

Step 2: Run extraction/documents_extraction.py (extracts the text and image information from the documents).

Step 3: Run db_build.py (stores the extracted information in the FAISS vector store).

Step 4: Run app.py to start the web chatbot interface.
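Before Step 2, you may want to confirm which files will be picked up. The check below uses a hypothetical helper that is not part of the repository. It lists only the PDF and DOCX files directly inside the documents directory. It assumes the extraction script reads from that single folder and does not recurse into subfolders.

```python
from pathlib import Path
from typing import List

# Formats the README lists as supported input.
SUPPORTED_EXTENSIONS = {".pdf", ".docx"}


def find_input_documents(documents_dir: str = "VisionDOC-AI/documents") -> List[Path]:
    """Return the PDF and DOCX files directly inside documents_dir, sorted by name."""
    root = Path(documents_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Documents directory not found: {root}")
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Anything this returns that you don't want indexed is the leftover Step 1 asks you to remove.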

Log in with username 'admin' and password 'adminpwd' for admin permissions.
You can now ask the model about any image in the documents.
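The README does not say how roles and permissions are enforced beyond the admin login. Below is a minimal sketch of role-based access control with an in-memory user store. The role names, permission strings, `UserStore` and `is_allowed` are hypothetical. The sketch stores salted PBKDF2 hashes instead of plaintext passwords and compares them in constant time.

```python
import hashlib
import hmac
import os
from typing import Dict, Optional, Set, Tuple

# Hypothetical role-to-permission map.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "admin": {"ask", "manage_documents", "rebuild_index"},
    "user": {"ask"},
}


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash so plaintext passwords are never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class UserStore:
    def __init__(self) -> None:
        # username -> (salt, password hash, role)
        self._users: Dict[str, Tuple[bytes, bytes, str]] = {}

    def add_user(self, username: str, password: str, role: str) -> None:
        if role not in ROLE_PERMISSIONS:
            raise ValueError(f"Unknown role: {role}")
        salt = os.urandom(16)
        self._users[username] = (salt, hash_password(password, salt), role)

    def authenticate(self, username: str, password: str) -> Optional[str]:
        """Return the user's role if the credentials match, else None."""
        record = self._users.get(username)
        if record is None:
            return None
        salt, expected, role = record
        if hmac.compare_digest(hash_password(password, salt), expected):
            return role
        return None


def is_allowed(role: Optional[str], permission: str) -> bool:
    """Check whether a role grants a given permission."""
    return role is not None and permission in ROLE_PERMISSIONS.get(role, set())
```

Keeping permissions in one role map means a new action only needs an entry there, not changes to every login check.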
