KBA-01124: Support for full text search of PDF files

Question:

Why can't I find a PDF file by using a full text search?

Answer:

The full text search feature is driven by Microsoft Indexing Service. By default, the Indexing Service supports DOC, DOCX, XLS, XLSX, PPT, TXT and HTM files. Spitfire sfATC extends support to PDF files by extracting the text and storing it alongside the PDF file. The text is either extracted directly from the file or obtained through OCR. Because of this extra processing, there is a delay between when the file is cataloged and when its text is available for indexing. In addition, Adobe supplies a free iFilter download that adds support for PDF files that contain text.

Download the iFilter installer from Adobe, or search for the latest version.

Please be aware that an iFilter can only index PDF files that contain text. Some PDF files (particularly those created by a copier/scanner) contain only images. If the iFilter does not support GZ compressed file streams, use ICTool to add the PDF extension to the list of file types that are excluded from compression.

After downloading and installing the add-in, reboot the server (or restart Indexing Service) and use Enterprise Manager to rebuild your full text catalogs.

Additional Comments:

See the Adobe readme file included with the download for additional information. As you might expect, there are limitations such as:

PDF iFilter will not extract text from PDF files that are password-protected. Password-protected PDF files will not appear on an unfiltered files list.
PDF iFilter will not extract text from PDF files that have protection against copying.
PDF files composed only of images are not supported by the iFilter. The Adobe Reader find text feature does not work for such PDF files either. sfATC attempts to extract the images from these PDF files and OCR them using Microsoft Office Document Imaging.
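
To get a rough idea of which PDF files fall under the limitations above, the raw bytes of each file can be scanned for a few marker names. The following Python sketch is only an illustration and is not part of Spitfire or the Adobe iFilter; the function names and labels are hypothetical. It assumes that protected files carry an /Encrypt entry and that files with no /Font marker are image-only. Because newer PDF files can keep font resources inside compressed streams, a "probably-image-only" result is a hint, not a verdict.

```python
from pathlib import Path


def triage_pdf_bytes(data: bytes) -> str:
    """Roughly classify raw PDF bytes by how likely an iFilter can index them.

    Heuristic only: it searches the uncompressed parts of the file for
    PDF name tokens. The labels returned here are hypothetical.
    """
    if not data.lstrip().startswith(b"%PDF-"):
        return "not-a-pdf"
    # An /Encrypt entry signals password or copy protection.
    if b"/Encrypt" in data:
        return "encrypted"
    # Text is drawn with fonts; no /Font marker suggests a scan made of images.
    # Assumption: fonts are not hidden inside compressed object streams.
    if b"/Font" not in data:
        return "probably-image-only"
    return "may-contain-text"


def triage_folder(folder: str) -> dict[str, str]:
    """Map each *.pdf file name in `folder` to its triage label."""
    return {
        path.name: triage_pdf_bytes(path.read_bytes())
        for path in sorted(Path(folder).glob("*.pdf"))
    }
```

Calling triage_folder on a copy of a document folder returns a dictionary of file names and labels. Files labeled "encrypted" or "probably-image-only" are the ones most likely to be missing from full text search results.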

See also KBA-01327 for similar support for DWG files.

KBA-01124; Last updated: March 6, 2019 at 15:02
