OCR Text Search

Camera scan, indexing, tips and troubleshooting

The OCR feature lets you photograph a book page or load a screenshot to find the corresponding position in the audiobook.

How It Works

The Vision framework recognizes text in the image — entirely on-device, no internet connection required.

The recognized text is compared with the audiobook's transcription index using fuzzy matching, so small recognition errors do not prevent a match.

On a match, the player jumps to the detected position. You'll see a preview with a confidence score.
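For readers curious how the matching step can work, here is a minimal sketch. It is not the app's actual implementation: the segment layout (start time plus transcript text), the `find_position` name, the 0.6 threshold, and the use of Python's `difflib` as the similarity measure are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_position(ocr_text, segments, min_confidence=0.6):
    """Return (start_seconds, confidence) for the best-matching segment, or None.

    `segments` is a hypothetical index: a list of (start_seconds, transcript) pairs.
    """
    query = normalize(ocr_text)
    best = None
    for start_seconds, transcript in segments:
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
        score = SequenceMatcher(None, query, normalize(transcript)).ratio()
        if best is None or score > best[1]:
            best = (start_seconds, score)
    if best is None or best[1] < min_confidence:
        return None  # nothing similar enough to justify jumping
    return best


# Hypothetical index entries for illustration only.
segments = [
    (0.0, "It was the best of times, it was the worst of times"),
    (12.5, "it was the age of wisdom, it was the age of foolishness"),
    (25.0, "it was the epoch of belief, it was the epoch of incredulity"),
]

# The OCR result contains typical misreads ("wisdorn", "aqe") but still matches.
match = find_position("it was the age of wisdorn, it was the aqe of foolishness", segments)
no_match = find_position("zzzz qqqq", segments)
```

The similarity ratio doubles as the confidence score in this sketch, and the threshold keeps the player from jumping on a weak match.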

Input Methods

Camera Scan: Photograph a book page directly with the camera.

Photo Library: Select a screenshot or photo from your gallery.

Text Input: Enter text manually or paste from clipboard.

Indexing

For OCR search to work, the audiobook needs to be transcribed and indexed once. This happens automatically in the background — even when the app is not active. Each chapter is processed individually, with the currently playing chapter prioritized.

You can see indexing progress in the library (e.g., '12/24 chapters'). A fully indexed book is shared via iCloud with your other devices.
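The prioritization and the progress display can be pictured with a short sketch. The function names are hypothetical, and the only ordering rule taken from the description above is that the currently playing chapter comes first; the order of the remaining chapters is an assumption.

```python
def indexing_order(chapter_count, indexed, current_chapter):
    """Return the chapters still to be indexed, with the playing chapter first.

    Chapters are numbered from 1. `indexed` is the set of finished chapters.
    """
    pending = [c for c in range(1, chapter_count + 1) if c not in indexed]
    # sorted() is stable and False sorts before True, so the current chapter
    # moves to the front while the rest keep their natural order (assumed).
    return sorted(pending, key=lambda c: c != current_chapter)


def progress_label(chapter_count, indexed):
    """Format progress the way the library shows it, e.g. '12/24 chapters'."""
    return f"{len(indexed)}/{chapter_count} chapters"


order = indexing_order(5, {1, 2}, current_chapter=4)
label = progress_label(24, set(range(1, 13)))
```

If the currently playing chapter is already indexed, it simply does not appear in the pending list.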

Transcription Engine

You can choose between two engines in settings:

Apple Speech Framework: Built-in speech recognition, free, on-device.

WhisperKit: Local Whisper model for higher accuracy, runs on the Apple Neural Engine.

Tips for Best Results

Ensure good lighting when scanning with the camera.

Hold the camera steady and parallel to the page.

Printed text works better than handwriting.

Make sure the audiobook is fully indexed.