OCR Text Search
Camera scan, indexing, tips and troubleshooting
The OCR feature lets you photograph a book page or load a screenshot to find the corresponding position in the audiobook.
How It Works
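Apple's Vision framework recognizes the text in the image. Recognition runs entirely on-device, so no internet connection is required.
The recognized text is compared with the audiobook's transcription index using fuzzy matching, so small recognition or transcription errors do not prevent a match.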
On a match, the player jumps to the detected position. You'll see a preview with a confidence score.
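The matching step can be illustrated with a small sketch. This is not the app's actual implementation: it assumes the transcription index is a list of (start time in seconds, transcript text) pairs and uses Python's standard difflib similarity ratio as a stand-in for the confidence score. The names normalize, find_position, SAMPLE_SEGMENTS and the min_confidence threshold of 0.6 are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and keep only letter/digit runs, joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def find_position(ocr_text, segments, min_confidence=0.6):
    """Return (start_seconds, confidence) for the best-matching segment.

    `segments` is an assumed index format: a list of (start_seconds, transcript)
    pairs. Returns None when no segment reaches `min_confidence`.
    """
    query = normalize(ocr_text)
    best = None
    for start_seconds, transcript in segments:
        # ratio() is a similarity in [0, 1]; it tolerates small character-level
        # differences such as OCR confusing "l" with "1".
        score = SequenceMatcher(None, query, normalize(transcript), autojunk=False).ratio()
        if best is None or score > best[1]:
            best = (start_seconds, score)
    if best is None or best[1] < min_confidence:
        return None
    return best


# Hypothetical sample index: (start time in seconds, transcript text).
SAMPLE_SEGMENTS = [
    (0.0, "It was the best of times, it was the worst of times"),
    (12.5, "Call me Ishmael. Some years ago, never mind how long precisely"),
    (30.0, "All happy families are alike; each unhappy family is unhappy in its own way"),
]

# OCR output with typical misreads still resolves to the second segment.
match = find_position("Cal1 me Ishmae1. Some years ago, never rnind how long precisely", SAMPLE_SEGMENTS)
```

A threshold like min_confidence is what separates a usable match from a rejected one; a real index would likely match against overlapping windows of text rather than fixed segments.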
Input Methods
Camera Scan: Photograph a book page directly with the camera.
Photo Library: Select a screenshot or photo from your gallery.
Text Input: Enter text manually or paste from clipboard.
Indexing
For OCR search to work, the audiobook needs to be transcribed and indexed once. This happens automatically in the background — even when the app is not active. Each chapter is processed individually, with the currently playing chapter prioritized.
You can see indexing progress in the library (e.g., '12/24 chapters'). Once a book is fully indexed, its index is shared via iCloud with your other devices.
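As a rough illustration of the per-chapter processing and the progress display, the sketch below puts the currently playing chapter first and formats the progress label. The text only states that the playing chapter is prioritized; processing the remaining chapters in book order is an assumption, and the function names indexing_order and progress_label are hypothetical.

```python
def indexing_order(chapter_count, playing_chapter=None):
    """Return chapter indices in processing order, currently playing chapter first.

    Assumption: the remaining chapters follow in their normal book order.
    """
    order = list(range(chapter_count))
    if playing_chapter is not None and 0 <= playing_chapter < chapter_count:
        order.remove(playing_chapter)
        order.insert(0, playing_chapter)
    return order


def progress_label(indexed, total):
    """Format indexing progress the way the library shows it, e.g. '12/24 chapters'."""
    return f"{indexed}/{total} chapters"


# Example: a 5-chapter book while chapter index 2 is playing.
order = indexing_order(5, playing_chapter=2)
label = progress_label(12, 24)
```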
Transcription Engine
You can choose between two engines in settings:
Apple Speech Framework: Built-in speech recognition, free, on-device.
WhisperKit: Local Whisper model for higher accuracy, runs on the Apple Neural Engine.