About Indexing PDF Documents

An index stores the content of many PDF files in a compact way, suited to easy search and retrieval.

Use the Document > Advanced Processing > Create Full Text Indexes command to build a new index or update an existing one.

You can index PDF documents written in languages that use Roman characters or Asian characters (Chinese, Japanese, or Korean). You can index not only the document text, but also bookmarks, comments, attachments, digital signatures, form fields, metadata, and other custom document properties.

You can build an index file from all the PDF files in a set of folders you define. Before starting, choose a folder where the index will be stored. Indexing proceeds in the background. A small index definition file is created with the extension zpi. It refers to the index files, which are stored in an automatically created sub-folder that has the same name as the zpi file with the suffix _index.
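
The naming rule above can be illustrated with a short Python sketch. It is not part of the program; the function name and the sample path are hypothetical, and it assumes that "the same name as the zpi file" means the file name without its extension.

```python
from pathlib import Path


def expected_index_folder(zpi_path):
    """Return the sub-folder expected to hold the index files
    that belong to the given zpi index definition file."""
    zpi = Path(zpi_path)
    # Assumption: folder name = zpi file name without extension + "_index",
    # located next to the zpi file.
    return zpi.with_name(zpi.stem + "_index")


if __name__ == "__main__":
    # Hypothetical example path, for illustration only.
    print(expected_index_folder("Archive/Reports.zpi"))
```

A check like this may be handy when copying an index to a shared location, since the zpi file and its _index sub-folder need to travel together.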

These search indexes are not embedded in the PDF files. To make them available to other users, save them to a shared location. To make a single document truly portable, use a different command to create an embedded index for it.

Preparing for indexing

Collect all PDF documents to be indexed into one or more folders. If you choose existing folders, make sure they contain only the PDF files you want indexed.
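
One way to review the folders before indexing is to list every PDF file they contain. The Python sketch below is an independent helper, not a feature of the program; the function name, the folder name in the usage example, and the recursive option are assumptions (the text above does not say whether sub-folders are indexed).

```python
from pathlib import Path


def list_pdfs(folders, recursive=True):
    """Return a sorted list of the PDF files found in the given folders."""
    # Assumption: sub-folders are searched by default; pass recursive=False
    # to look only at the top level of each folder.
    pattern = "**/*" if recursive else "*"
    found = []
    for folder in folders:
        for path in Path(folder).glob(pattern):
            # Match the extension case-insensitively (.pdf, .PDF, ...).
            if path.is_file() and path.suffix.lower() == ".pdf":
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical folder name, for illustration only.
    for pdf in list_pdfs(["Contracts"]):
        print(pdf)
```

Reading through the printed list makes it easier to spot files that should be moved elsewhere before the index is built.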

If you plan to migrate the PDF files with their index, it is better to store them in a single folder.

Add document properties to PDF documents so you can use them as search criteria.

Notes

If you create a full text index before redacting a document, the redacted information is NOT removed from the index and can still be found easily. When redaction finishes, you are invited to inspect the document as well. Accept the offer and remove the index. We advise performing redaction and inspection on a copy of the document; this lets you retain the index in the original document.

Indexing hundreds of large PDF files can take considerable time and computing resources, so it is best run when the computer is not otherwise needed, for example over a lunch break.
