Apple is eyeing ways to interact with video capture devices through means such as hand gestures, as evidenced by a patent (number 20110261213) at the US Patent & Trademark Office.

The patent describes a method and apparatus for interacting with and controlling a video capture device. In the described embodiments, video is presented at a display that has contact or proximity sensing capabilities. A gesture can be sensed at or near the display in accordance with the video presented on it, the gesture being associated with a first video processing operation. The video is then modified in real time in accordance with that operation. The inventors are Benjamin A. Rottlier and Michael J. Ingrassia.
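
As a rough illustration of that flow (not Apple's implementation), the Python sketch below maps sensed gestures to frame operations and applies the active operation to each frame as it arrives. The gesture names, the two operations, and the representation of a frame as rows of grayscale values are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional

# Assumed frame format: rows of grayscale pixel values in the range 0-255.
Frame = List[List[int]]


def invert(frame: Frame) -> Frame:
    """Hypothetical video process: invert every pixel."""
    return [[255 - p for p in row] for row in frame]


def brighten(frame: Frame) -> Frame:
    """Hypothetical video process: raise brightness, clamped at 255."""
    return [[min(255, p + 40) for p in row] for row in frame]


# Hypothetical gesture names, each associated with one video process.
GESTURE_OPS: Dict[str, Callable[[Frame], Frame]] = {
    "two_finger_tap": invert,
    "swipe_up": brighten,
}


def process_stream(frames: List[Frame],
                   gestures: List[Optional[str]]) -> List[Frame]:
    """Apply the most recently sensed gesture's process to each frame.

    gestures[i] is the gesture sensed while frame i is displayed, or None.
    Unrecognized gestures are ignored.
    """
    active: Optional[Callable[[Frame], Frame]] = None
    shown: List[Frame] = []
    for frame, gesture in zip(frames, gestures):
        if gesture in GESTURE_OPS:
            active = GESTURE_OPS[gesture]
        shown.append(active(frame) if active else frame)
    return shown
```

In this sketch the stream stays unprocessed until a recognized gesture arrives, after which every later frame is modified before it is shown.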

Here’s Apple’s background and summary of the invention: “The embodiments described herein relate generally to real time interactive control of image processing by a video capture device. More particularly, gestures can be applied in real time to initiate processing of video data by one or more video capture devices.

“Small form factor video capture devices such as camcorders and video capable portable media players often utilize a touch-screen display to facilitate video image capture. Typically, a display portion of the touch screen is used for previewing video during or after playback of the recorded video. However, in order to process the video, the recorded video data must be ported to another device, such as for example, a host computer. Unfortunately, due to the portable nature of these devices, immediate access to a host computer can be difficult or impossible.

“Therefore, there is a need to provide an efficient method and apparatus for using a small form factor video capture device to process video data in real time.

“It is an advantage of the presently described embodiments to provide real time interactive control of image processing performed by a video capture device.

“In one embodiment, a method is described. The method can be carried out by performing at least the following operations. A digital video source provides an unprocessed digital video stream at least some of which is directly presented at a display in real time. A first gesture associated with a first video process is sensed. In response to the sensed gesture, at least a portion of the unprocessed digital video stream presented at the display is modified in real time in accordance with the first video process.

“In one aspect, a second gesture associated with a second video process is sensed in conjunction with the first gesture. The first and second gestures are interpreted as a third gesture associated with a third video process used to modify the unprocessed digital video. For example, when the first gesture corresponds to a selection process for selecting an object image and the second gesture corresponds to an association process for associating at least two object images, the third video process is interpreted as associating the selected object images.

“In another embodiment, a method for local control of video processing performed at a first remotely located video capture device is described. The method can be carried out by performing at least the following operations. At a local control device having a display, receiving in real time from the first remotely located video capture device, a first unprocessed digital video stream that is presented at the local display concurrent with the receiving.

“A first gesture associated with a first video process is sensed at the local control device. The local control device responds by generating and then sending a first instruction to the first remotely located video capture device. The first remotely located video capture device uses the first instruction to modify at least some of the first unprocessed digital video stream in real time in accordance with the first video process.

“In yet another embodiment, a real time method for identifying an object image in a video formed of a plurality of video frames is described. The method can be carried out by performing at least the following operations. Receiving the video wherein at least some of the plurality of video frames includes object information corresponding to the object image.

“Using the object information from at least some of those video frames that include the object information to generate a correlation value. In the described embodiment, the correlation value indicates a degree of correlation between the object image and an object profile in a database of stored object profiles. The object image is identified by associating the object image with the stored object profile having the highest correlation value above a threshold correlation value.

“In still another embodiment a method for associating at least a first and a second object image in a digital video formed of a plurality of digital video frames is described.

“The method can be performed by receiving at least some of the plurality of digital video frames at least some of which include object information corresponding to the first object image and the second object image. A correlation value between the first and second object images is determined. When the correlation value is greater than a threshold correlation value, the first and second object images are determined to be associated and the digital video is modified in accordance with the association.

“In yet another embodiment, a video capture device is described. The video capture device includes at least a video source, a processor, and a display on which is presented in real time an unprocessed digital video stream received from the video source. The video capture device also includes a touch sensitive surface arranged to sense a gesture corresponding to a first video process. In response to the sensing of the gesture, the processor modifies the unprocessed digital video in real time in accordance with the first video process.”
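
The object-identification embodiment in the summary comes down to picking the stored profile with the highest correlation value, provided that value clears a threshold. The patent summary does not say how the correlation value is computed, so the Python sketch below assumes a Pearson correlation over numeric feature vectors; the function names, profile names, and the 0.9 threshold are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length feature vectors (assumed metric)."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("feature vectors must be non-empty and equal in length")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def identify(features: Sequence[float],
             profiles: Dict[str, Sequence[float]],
             threshold: float = 0.9) -> Optional[str]:
    """Return the profile with the highest correlation above the threshold.

    Returns None when no stored profile exceeds the threshold.
    """
    best_name: Optional[str] = None
    best_score = threshold
    for name, profile in profiles.items():
        score = correlation(features, profile)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

The same threshold test could be reused for the association embodiment by comparing two object images with each other instead of comparing one image with a stored profile.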

— Dennis Sellers