Component Selection
RAGFlow: Document parsing and unified indexing
Qwen-VL / CLIP: Image understanding and labels
Whisper / FFmpeg: Audio transcription and video frame extraction
vLLM / Ollama: Private or local inference services
Unified retrieval for text, image, audio, and video
Different media require different parsing paths, but the results need one retrievable, traceable, evaluable knowledge layer.
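As a rough illustration of "different parsing paths", the sketch below routes a file to a modality by its extension. The extension sets and the names PARSING_PATHS and parsing_path are assumptions made for this example; they are not part of RAGFlow or any other component listed here, and a real deployment would likely inspect file content as well.

```python
from pathlib import Path

# Hypothetical mapping from modality to file extensions (illustrative only).
PARSING_PATHS = {
    "document": {".pdf", ".docx", ".md", ".txt"},
    "image": {".png", ".jpg", ".jpeg"},
    "audio": {".mp3", ".wav"},
    "video": {".mp4", ".mkv"},
}


def parsing_path(filename: str) -> str:
    """Return the modality whose parsing path should handle this file."""
    suffix = Path(filename).suffix.lower()
    for modality, suffixes in PARSING_PATHS.items():
        if suffix in suffixes:
            return modality
    # Fail loudly so unsupported media never reaches the index silently.
    raise ValueError(f"no parsing path for {filename!r}")
```

Each modality can then feed its own parser, as long as every parser emits records in one shared shape for indexing.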
Documents: preserve headings, paragraphs, tables, and document structure.
Images: generate descriptions, tags, and classifications.
Audio and video: transcribe audio and combine keyframes with subtitles.
Text queries can hit documents, images, videos, and transcripts.
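One way to picture the shared knowledge layer is a single record type that every parsing path writes to, with a source and locator kept for traceability. The sketch below is a minimal, self-contained illustration: KnowledgeRecord, its fields, the sample records, and the term-count scoring are all assumptions made for this example. A real system would more likely rank with an embedding model and a vector index than with word overlap.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeRecord:
    """One retrievable unit, whatever media it was parsed from (hypothetical schema)."""
    record_id: str
    modality: str  # "document", "image", "audio", or "video"
    source: str    # originating file, kept for traceability
    locator: str   # position inside the source: heading path, timestamp, ...
    text: str      # searchable surface: paragraph, caption, or transcript


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, record: KnowledgeRecord) -> float:
    """Cosine similarity over term counts; a stand-in for embedding similarity."""
    q, d = Counter(tokenize(query)), Counter(tokenize(record.text))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, records: list[KnowledgeRecord], top_k: int = 3) -> list[KnowledgeRecord]:
    """Rank all records together, ignoring modality, and drop non-matches."""
    scored = [(score(query, r), r) for r in records]
    scored = [(s, r) for s, r in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored[:top_k]]


# Made-up sample data: one record per parsing path plus an unrelated document.
RECORDS = [
    KnowledgeRecord("doc-1", "document", "manual.pdf", "Chapter 2 > Installation",
                    "Install the pump before connecting the power supply."),
    KnowledgeRecord("img-1", "image", "pump_diagram.png", "whole image",
                    "Diagram of a pump with labeled inlet and outlet valves."),
    KnowledgeRecord("aud-1", "audio", "support_call.mp3", "00:01:12-00:01:30",
                    "The customer says the pump makes noise after startup."),
    KnowledgeRecord("vid-1", "video", "training.mp4", "keyframe at 00:03:05",
                    "Technician opens the pump housing. Subtitle: remove the four screws first."),
    KnowledgeRecord("doc-2", "document", "policy.docx", "Section 1",
                    "Vacation requests require manager approval."),
]

if __name__ == "__main__":
    for hit in search("pump", RECORDS, top_k=10):
        print(hit.modality, hit.source, hit.locator)
```

Because images are indexed through their generated descriptions and audio and video through their transcripts and subtitles, a plain text query can return hits from every modality, and each hit carries the source and locator needed to trace it back.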