Spotlight Voice Recognition and OCR
Part 3 of a series
Apple improved metadata using OCR and voice recognition
Have you ever tried to find a song but couldn’t remember its name or who it’s by? Apple thinks it can help…
The problem with metadata is that it’s a bit boring. Here is an excerpt from a digital photograph’s metadata (mdls is a useful command-line tool that lists all of a file’s metadata):
[~/Images/ ]% mdls IMG_0016.jpg
IMG_0016.jpg -------------
kMDItemAcquisitionMake = "Canon"
kMDItemAcquisitionModel = "Canon DIGITAL IXUS"
kMDItemAperture = 2.970856
kMDItemAttributeChangeDate = 2006-01-16 20:14:12 +0000
kMDItemBitsPerSample = 32
kMDItemColorSpace = "RGB"
kMDItemContentCreationDate = 2006-01-10 19:13:19 +0000
kMDItemContentModificationDate = 2006-01-10 19:13:20 +0000
kMDItemContentType = "public.jpeg"
kMDItemExposureTimeSeconds = 0.03333334
kMDItemFlashOnOff = 1
kMDItemFocalLength = 5.40625
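To make a listing like this usable from a script, here is a minimal Python sketch that turns `key = value` lines into a dictionary. It is not how Spotlight works internally. The function names (`parse_value`, `parse_mdls`) are made up for illustration, and the sketch assumes each attribute sits on a single line, so multi-line array values are not handled.

```python
import re
from datetime import datetime

# Matches lines such as: kMDItemAperture = 2.970856
LINE_RE = re.compile(r"^(kMDItem\w+)\s*=\s*(.+)$")
DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def parse_value(raw):
    """Convert one raw value to str, datetime, int, or float (best effort)."""
    raw = raw.strip()
    # Quoted strings: accept straight quotes and the curly quotes blogs tend to add.
    if len(raw) >= 2 and raw[0] in '"“' and raw[-1] in '"”':
        return raw[1:-1]
    try:
        return datetime.strptime(raw, DATE_FORMAT)
    except ValueError:
        pass
    try:
        return int(raw)
    except ValueError:
        pass
    try:
        return float(raw)
    except ValueError:
        return raw  # unknown shape: keep the text as-is


def parse_mdls(text):
    """Parse single-line 'key = value' attributes; other lines are skipped."""
    attrs = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            attrs[match.group(1)] = parse_value(match.group(2))
    return attrs


# A few lines copied from the excerpt above.
SAMPLE = """\
kMDItemAcquisitionMake = "Canon"
kMDItemAperture = 2.970856
kMDItemContentCreationDate = 2006-01-10 19:13:19 +0000
kMDItemFlashOnOff = 1
"""

if __name__ == "__main__":
    for key, value in parse_mdls(SAMPLE).items():
        print(key, repr(value))
```

On a Mac you could feed it the real output of `mdls IMG_0016.jpg` instead of `SAMPLE`.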
This data is great, but it’s not really accessible. What you really want to know is: is this the close-up I took that night? This metadata won’t tell you that. But what if your computer processed it and added new metadata that better described the photograph (nighttime, close-up, the text from any signs in the photo, and so on)?
Sound crazy? Apple has filed a patent application covering this technology and more…
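As a rough illustration of the idea (not Apple's method), the Python sketch below derives a few descriptive tags from the raw values in the excerpt. The tag names, the 1/40 s shutter threshold, and the "evening" hours are all assumptions picked for the example, and the hour is read straight from the UTC timestamp without converting to local time.

```python
from datetime import datetime, timezone


def derive_tags(attrs):
    """Turn raw photo metadata into human-friendly tags (toy heuristics)."""
    tags = set()

    # Assumption: a value of 1 means the flash fired.
    if attrs.get("kMDItemFlashOnOff") == 1:
        tags.add("flash")

    # Assumption: exposures of 1/40 s or longer hint at low light.
    exposure = attrs.get("kMDItemExposureTimeSeconds")
    if exposure is not None and exposure >= 1 / 40:
        tags.add("slow-shutter")

    # Assumption: 18:00 to 05:59 counts as evening or night.
    created = attrs.get("kMDItemContentCreationDate")
    if created is not None and (created.hour >= 18 or created.hour < 6):
        tags.add("evening-or-night")

    return tags


# Values taken from the mdls excerpt above.
photo = {
    "kMDItemFlashOnOff": 1,
    "kMDItemExposureTimeSeconds": 0.03333334,
    "kMDItemContentCreationDate": datetime(
        2006, 1, 10, 19, 13, 19, tzinfo=timezone.utc
    ),
}

if __name__ == "__main__":
    print(sorted(derive_tags(photo)))
```

Deciding whether a photo is a close-up, or reading the text on a sign, needs the image content itself, which is where techniques like OCR come in.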

How is Secondary Metadata Gathered?
The analytical techniques that could potentially be used include:
- Latent semantic analysis (LSA)
- Tokenization
- Stemming
- Concept extraction
- Spectrum analysis and/or filtering
- Optical character recognition (OCR)
- Voice recognition (also referred to as speech-to-text operations)
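Two of the simpler items in this list, tokenization and stemming, can be sketched in a few lines of Python. This is a deliberately naive illustration: the suffix list and the helper names are invented for the example, and a real system would use a proper stemming algorithm (Porter's, for instance) rather than plain suffix stripping.

```python
import re

# Assumption: a tiny suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split text into lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_stem(token):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_terms(text):
    """Return the set of stemmed tokens, suitable as search keys."""
    return {naive_stem(token) for token in tokenize(text)}


if __name__ == "__main__":
    print(tokenize("Night-time photos, 2006!"))
    print(sorted(index_terms("search searching searched")))
```

The point of stemming for search is that "search", "searching", and "searched" all land on the same key, so a query for one finds files containing any of them.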
From the patent application:
Methods and apparatuses for processing metadata are described herein. In one embodiment, when a file (e.g., a text, audio, and/or image files) having metadata is received, the metadata and optionally at least a portion of the content of the file are extracted from the file to generate a first set of metadata. An analysis is performed on the extracted metadata and the content to generate a second set of metadata, which may include metadata in addition to the first set of metadata. The second set of metadata may be stored in a database suitable to be searched to identify or locate the file.
According to certain embodiments of the invention, the metadata that can be searched, for example, to locate or identify a file, may include additional metadata generated based on the original metadata associated with the file and/or at least a portion of content of the file, which may not exist in the original metadata and/or content of the file. In one embodiment, the additional metadata may be generated via an analysis performed on the original metadata and/or at least a portion of the content of the file. The additional metadata may capture a higher level concept or broader scope information regarding the content of the file.
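The flow described in the application (extract a first set of metadata, analyze it to produce a second set, store that in a searchable database) can be sketched roughly as follows. This is only a toy reading of the text, not the patented design: the `CONCEPTS` table, the function names, the single-table schema, and the `notes/trip.txt` path are all hypothetical.

```python
import sqlite3

# Hypothetical word-to-concept table standing in for "concept extraction".
CONCEPTS = {"moon": "night", "stars": "night", "beach": "outdoors"}


def extract_first_set(raw_metadata, content):
    """First set: the file's own metadata plus (a portion of) its content."""
    first = dict(raw_metadata)
    first["content"] = content
    return first


def analyze(first_set):
    """Second set: the first set plus higher-level concepts derived from it."""
    second = dict(first_set)
    words = first_set["content"].lower().split()
    concepts = sorted({CONCEPTS[w] for w in words if w in CONCEPTS})
    if concepts:
        second["concepts"] = " ".join(concepts)
    return second


def open_index():
    """Create an in-memory database holding one row per metadata item."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metadata (path TEXT, key TEXT, value TEXT)")
    return conn


def store(conn, path, metadata):
    conn.executemany(
        "INSERT INTO metadata (path, key, value) VALUES (?, ?, ?)",
        [(path, key, str(value)) for key, value in metadata.items()],
    )
    conn.commit()


def search(conn, term):
    """Return paths of files with any metadata value containing the term."""
    rows = conn.execute(
        "SELECT DISTINCT path FROM metadata WHERE value LIKE ? ORDER BY path",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_index()
    first = extract_first_set(
        {"kMDItemContentType": "public.plain-text"}, "Moon over the beach"
    )
    store(conn, "notes/trip.txt", analyze(first))
    print(search(conn, "night"))
```

Note that the word "night" never appears in the file or its original metadata; the search only succeeds because the analysis step added it, which is the "higher level concept" the application talks about.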


Published February 21st, 2006 in OS X.