According to a new Apple patent (number 20110141219) at the US Patent & Trademark Office, the company is working on a way to use face detection as a metric to stabilize video during a video chat session.

Here’s Apple’s background and summary of the invention: “Many handheld wireless communication devices that are in use today provide video capturing capabilities. An example of such a handheld wireless communication device (“device”) is a mobile phone that includes a digital camera for capturing still images and videos. With such a device, a user can record a video or conduct a live video chat session with a far-end user.

“During a video chat session, the image of the user (typically, the face of the user) is captured by the camera of a near-end device, and then transmitted over the wireless network to a far-end device. The far-end device then displays the image on its screen. At the same time, the image of the far-end user (typically, the face of the far-end user) is captured by the camera of the far-end device, and then transmitted to and displayed on the near-end device.

“During the video capturing process, any relative movement between the camera and the user can reduce the image quality of the video. For example, if the user is walking or otherwise moving when he talks, the image of his face may be unstable. Further, the user’s hand holding the device may be unsteady, which results in unsteady images.

“To improve image stability, a user may mount his device on a stable surface, e.g., on top of a desk. However, mounting the device at a fixed location reduces the mobility of the user, as he cannot easily move outside the range of the camera during a video chat session. Further, even if the device is mounted at a fixed location and the user is sitting during a video chat session, the image captured by the device can still be degraded when the user moves his face or body. In some scenarios, the user may post-process the video captured by the device. However, post-processing techniques are not suitable for a live video chat.

“An embodiment of the invention is directed to a handheld wireless communication device that has a camera on the same side of the housing as a display screen. The camera captures a video during a video chat session that is conducted between a user of the handheld communication device and a far-end user. This input video includes image frames, each of the image frames containing an image of a face of the user.

“A video processor in the device detects the position of the face in each of the image frames. Based on the detected position of the face, a boundary area of each of the image frames is cropped, to produce an output video (while the input video is being captured). The image of the face stays substantially stationary in the output video. The output video is transmitted to the far-end user during the video chat session.
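
To make the cropping idea concrete, here is a minimal sketch in Python. It is not Apple's implementation: it assumes a frame is a plain list of pixel rows and that the face position has already been found by some detector, and the names `crop_around_face` and `make_frame` are hypothetical.

```python
def crop_around_face(frame, face_row, face_col, out_h, out_w):
    """Return an out_h x out_w crop of `frame` positioned around the face.

    `frame` is a list of pixel rows; (face_row, face_col) is the detected
    face position (assumed to come from a separate face detector).
    """
    top = face_row - out_h // 2
    left = face_col - out_w // 2
    # Clamp so the crop window never leaves the captured frame.
    top = max(0, min(top, len(frame) - out_h))
    left = max(0, min(left, len(frame[0]) - out_w))
    return [row[left:left + out_w] for row in frame[top:top + out_h]]


def make_frame(height, width, face_row, face_col):
    """Build a toy frame of zeros with a single 1 marking the face."""
    frame = [[0] * width for _ in range(height)]
    frame[face_row][face_col] = 1
    return frame


# The face moves between two input frames...
frame_a = make_frame(8, 10, face_row=3, face_col=4)
frame_b = make_frame(8, 10, face_row=4, face_col=6)

# ...but lands on the same pixel in both cropped output frames.
out_a = crop_around_face(frame_a, 3, 4, out_h=4, out_w=6)
out_b = crop_around_face(frame_b, 4, 6, out_h=4, out_w=6)
```

In this toy case both outputs are identical, which is the "substantially stationary" effect the summary describes. Once the face gets close enough to the edge that the window has to be clamped, the face drifts in the output.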

“In one embodiment, the video processor calculates a motion vector as the difference between the detected position of the face in a current image frame and a reference position of the face in a previous image frame. The motion vector indicates the direction and the amount of face movement relative to the reference position. The video processor adjusts the size of the boundary area to be cropped based on the motion vector.
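
The motion vector is simple arithmetic, and a short Python sketch may help. The `Point` type and the functions `motion_vector` and `shift_crop` are illustrative names, not taken from the patent, and the coordinates are made-up pixel values.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


def motion_vector(current: Point, reference: Point) -> Point:
    """Face movement relative to the reference position (current - reference)."""
    return Point(current.x - reference.x, current.y - reference.y)


def shift_crop(crop_origin: Point, mv: Point) -> Point:
    """Move the crop window's top-left corner by the motion vector."""
    return Point(crop_origin.x + mv.x, crop_origin.y + mv.y)


reference = Point(320, 240)   # face position in a previous frame
current = Point(332, 231)     # face position in the current frame

mv = motion_vector(current, reference)         # Point(x=12, y=-9)
new_origin = shift_crop(Point(40, 30), mv)     # Point(x=52, y=21)
```

Shifting the crop window by the same vector keeps the face at the same offset inside the window (280, 210 in this example), so the cropped margins grow on one side and shrink on the other.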

“In one embodiment, the boundary area to be cropped from an image frame comprises a top margin, a bottom margin, a right margin and a left margin. The video processor determines the size of the margins in each direction to substantially center the image of the face in the output video. In another embodiment, the handheld communication device provides one or more options for the user to select a fixed position in the output frames of the output video as the position of the face. The selectable fixed location may be anywhere in the output frames as desired by the user.
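
One way to read the margin calculation, sketched in Python under stated assumptions: the output frame has a fixed size smaller than the input frame, coordinates are in pixels with the origin at the top left, and the function name `crop_margins` and its parameters are hypothetical.

```python
def crop_margins(frame_w, frame_h, out_w, out_h, face_x, face_y,
                 target_x=None, target_y=None):
    """Return (top, bottom, left, right) margins to crop from a frame.

    The face at (face_x, face_y) in the input frame ends up at
    (target_x, target_y) in the output frame. With no target given,
    the face is centered in the output.
    """
    if target_x is None:
        target_x = out_w // 2
    if target_y is None:
        target_y = out_h // 2
    left = face_x - target_x
    top = face_y - target_y
    # Whatever is not cropped on one side must be cropped on the other.
    right = frame_w - out_w - left
    bottom = frame_h - out_h - top
    return top, bottom, left, right


# Centered face: 640x480 input, 480x360 output, face detected at (350, 220).
centered = crop_margins(640, 480, 480, 360, 350, 220)
# -> (40, 80, 110, 50)

# User-selected fixed position at (300, 180) in the output frame.
custom = crop_margins(640, 480, 480, 360, 350, 220, target_x=300, target_y=180)
# -> (40, 80, 50, 110)
```

A negative margin in the result would mean the face has moved further than the spare border allows.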

“In one embodiment, when the amount of face movement exceeds the available margin in any of the top, bottom, right and left directions, the handheld communication device generates a visual or audio warning to alert the user. When this occurs, the user can adjust the position of the handheld communication device or adjust the position of the face, to, for example, re-center the face image.
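
The warning condition could be checked along these lines. This is only a sketch: the margin sizes, the message text, and the names `overflow_directions` and `warning_message` are assumptions, and image coordinates are taken to grow rightward in x and downward in y.

```python
def overflow_directions(mv_x, mv_y, margins):
    """List the directions in which face movement exceeds the spare margin.

    `margins` maps 'top', 'bottom', 'left', 'right' to the pixels still
    available for cropping on that side.
    """
    exceeded = []
    if mv_x > margins["right"]:
        exceeded.append("right")
    if -mv_x > margins["left"]:
        exceeded.append("left")
    if mv_y > margins["bottom"]:
        exceeded.append("bottom")
    if -mv_y > margins["top"]:
        exceeded.append("top")
    return exceeded


def warning_message(exceeded):
    """Return alert text for the user, or None if no warning is needed."""
    if not exceeded:
        return None
    return "Face is leaving the frame (" + ", ".join(exceeded) + "); re-center the device."


available = {"top": 60, "bottom": 60, "left": 80, "right": 80}

ok = overflow_directions(10, 5, available)          # []
too_far = overflow_directions(95, -20, available)   # ['right']
alert = warning_message(too_far)
```

A real device would presumably turn that message into the on-screen or audio alert the summary mentions.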

“The handheld communication device may be configured or programmed by its user to support one or more of the above-described features.

“The above summary does not include an exhaustive list of all aspects of embodiments of the present invention. It is contemplated that embodiments of the invention include all systems and methods that can be practiced from all suitable combinations of the various aspects summarized above, as well as those disclosed in the Detailed Description below and particularly pointed out in the claims filed with the application. Such combinations have particular advantages not specifically recited in the above summary.”

— Dennis Sellers