Pega Knowledge Buddy 25 adds enhanced file extraction with improved layout and table support. This feature uses a GenAI extraction method to process tabular content while preserving layout and formatting. The enhancement applies to file content ingested through Pega Knowledge, Knowledge Loaders, the REST API, and the Knowledge Buddy Portal.
GenAI extraction is used for supported file types to retain tables and other tabular formats during ingestion. It applies only to file-type content. If GenAI extraction fails, the system falls back to the Standard method.
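The fallback behavior described above can be illustrated with a minimal Python sketch. This is not Pega's implementation: the function name `extract_with_fallback` and the two extractor callables are hypothetical, and the sketch assumes that a GenAI failure surfaces as an exception.

```python
from typing import Callable, Tuple


def extract_with_fallback(
    file_bytes: bytes,
    genai_extract: Callable[[bytes], str],     # hypothetical GenAI extractor
    standard_extract: Callable[[bytes], str],  # hypothetical Standard extractor
) -> Tuple[str, str]:
    """Return (extracted_text, method_used) for one file."""
    try:
        # Try the layout-preserving GenAI method first.
        return genai_extract(file_bytes), "genai"
    except Exception:
        # Assumption: any GenAI failure raises, and the Standard
        # method is then used in its place.
        return standard_extract(file_bytes), "standard"
```

Returning the method name alongside the text makes it easy to log or audit which files were ingested without GenAI extraction.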
The following table lists the file attachment formats that each LLM provider supports. Size limits apply to each format.
| Attachment type/LLM provider | Google Gemini Flash 2.0 | Anthropic Claude Sonnet 3.7 | OpenAI GPT 5o, 5o-mini |
|---|---|---|---|
| Image | PNG - image/png, JPEG - image/jpeg, WEBP - image/webp, HEIC - image/heic, HEIF - image/heif | PNG - image/png, JPEG - image/jpeg, WEBP - image/webp, Non-animated GIF - image/gif | |
| Document | PDF - application/pdf, JavaScript - application/x-javascript, text/javascript, Python - application/x-python, text/x-python, TXT - text/plain, HTML - text/html, CSS - text/css, Markdown - text/md, CSV - text/csv, XML - text/xml, RTF - text/rtf | PDF - application/pdf, DOCX - application/vnd.openxmlformats-officedocument.wordprocessingml.document | PDF - application/pdf |
| Audio | WAV - audio/wav, MP3 - audio/mp3, AIFF - audio/aiff, AAC - audio/aac, OGG Vorbis - audio/ogg, FLAC - audio/flac | Not supported | Not supported |
| Video | MP4 - video/mp4, MPEG - video/mpeg, MOV - video/mov, AVI - video/avi, FLV - video/x-flv, MPG - video/mpg, WEBM - video/webm, WMV - video/wmv, 3GP - video/3gpp | Not supported | Not supported |
NOTE: Files in the PPTX format are currently not supported by any generative AI model.
NOTE: For information about size limitations for specific models and attachment types, please refer to the external documentation of each model provider.
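As an illustration of how the support table might be applied before upload, the following Python sketch encodes the MIME types from the table and checks a file against a provider. The provider keys (`gemini`, `claude`, `openai`) and the function `is_supported` are hypothetical names chosen for this example and are not part of any Pega API. The table leaves the OpenAI image cell blank, so no image types are listed for that provider here.

```python
# MIME types copied from the support table; provider keys are illustrative.
SUPPORTED_MIME_TYPES = {
    "gemini": frozenset({
        # Image
        "image/png", "image/jpeg", "image/webp", "image/heic", "image/heif",
        # Document
        "application/pdf", "application/x-javascript", "text/javascript",
        "application/x-python", "text/x-python", "text/plain", "text/html",
        "text/css", "text/md", "text/csv", "text/xml", "text/rtf",
        # Audio
        "audio/wav", "audio/mp3", "audio/aiff", "audio/aac", "audio/ogg",
        "audio/flac",
        # Video
        "video/mp4", "video/mpeg", "video/mov", "video/avi", "video/x-flv",
        "video/mpg", "video/webm", "video/wmv", "video/3gpp",
    }),
    "claude": frozenset({
        # Image
        "image/png", "image/jpeg", "image/webp", "image/gif",
        # Document
        "application/pdf",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    }),
    "openai": frozenset({
        # Document (the image cell is blank in the table)
        "application/pdf",
    }),
}


def is_supported(provider: str, mime_type: str) -> bool:
    """Return True if the table lists mime_type for the given provider."""
    allowed = SUPPORTED_MIME_TYPES.get(provider.lower(), frozenset())
    return mime_type.strip().lower() in allowed
```

Because no provider lists the PPTX MIME type, `is_supported` returns `False` for PPTX files with every provider, which matches the note above. The check covers format only; size limits still need to be verified against each provider's documentation.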