An Apple patent (number 7895252) for single-channel convolution in a vector processing computer system has appeared at the US Patent & Trademark Office. The invention relates generally to signal processing within a computer processor.

The patent describes a system and method for performing convolution in a single channel of a vector processing computer system. The approach takes advantage of the parallel computing capability of the vector processing system and the distributive properties of the discrete-time convolution sum by performing convolution on portions of an overall data stream, or data chunks, simultaneously. Partial solutions are thereby obtained and superimposed to achieve an overall solution data stream. To simplify the convolution sum and eliminate the need for calculating products, a specialized data signal or vector containing a series of ones may be used in the convolution operation. The inventors are Ali Sazegari and Doug Clarke.
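To make the chunk-and-superimpose idea concrete, here is a minimal Python sketch. It is not taken from the patent: the function names, the chunk size, and the sample data are all assumptions for illustration, and the chunks are processed in an ordinary loop rather than in vector hardware. It only shows why the distributive property lets each chunk be convolved independently and the partial results added back together.

```python
def convolve(x, h):
    """Direct discrete-time convolution sum: y[n] = sum over k of x[k] * h[n - k]."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def chunked_convolve(x, h, chunk_size):
    """Convolve x with h one chunk at a time, then superimpose the partial results.

    Hypothetical helper: each chunk's convolution depends on no other chunk,
    which is what would allow a vector unit to compute them simultaneously.
    """
    y = [0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), chunk_size):
        partial = convolve(x[start:start + chunk_size], h)
        for offset, value in enumerate(partial):
            # Tails of neighbouring chunks overlap, so they are added together.
            y[start + offset] += value
    return y


signal = [1, 2, 3, 4, 5, 6, 7]   # made-up sample data
kernel = [1, 0, -1]              # made-up second signal
assert chunked_convolve(signal, kernel, 3) == convolve(signal, kernel)
```

The final assertion checks that the chunked result matches the one-pass result for this sample; the same holds for any chunk size of one or more.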

Here’s Apple’s background and summary of the invention: “One of the most important, value-adding features in a computer is its ability to process large amounts of data and information. Some of the information frequently processed by a computer includes image and other signal information. Frequently, information processed on a computer may relate to the general computer display, computer graphics, scanned images, video, and other data. With each of these types of data, it is often desirable to utilize the convolution function to process the data.

“Convolution is useful in demonstrating the manner in which two signals interact in the time domain, and in expressing a resulting signal from the mixing of the two signals within the time domain.

“One of the main problems in performing convolution using a computer is that the process is inherently linear. For relatively long sequences, therefore, the convolution process can be quite lengthy. Generally, a computer reads each function to be convolved as a stream of data, one element at a time. This requires valuable processor time, and the time required increases proportionately to the complexity and length of the signals to be processed. This is especially problematic, for example, in image processing applications and/or video applications, where signals are complex and memory-intensive. In video applications, another problem arises in that the real-time display of images, which is essential for a user’s understanding in viewing the video information, requires numerous computations at a high rate of speed without delays. If the convolution sum used to process these video signals delays the output of the video, the result may be difficulty in understanding the output signal.

“As processor speeds and users’ demands for quality increase, it is essential that signals which are processed by way of a convolution sum, such as the one shown in Equation 2, are processed in the most efficient manner without sacrificing quality. Even with the increased processor speeds of today, performing convolution as a serial process whereby entire streams of data are input, output, and computed sequentially, slows a computer’s ability to process signals and information, and generally slows the processing of data involved in unrelated functions by the computer.

“Recently, vector processing, which utilizes parallel computing operations, has been implemented in various computer systems. This type of computer processing has the advantage that multiple calculations may be performed simultaneously. This is accomplished by using vector calculations whereby entire matrices may be added, subtracted, multiplied, divided, or otherwise operated upon. However, even with the increased speeds afforded by performing vector calculations in a vector processing computer system, convolution has traditionally been a serial operation that does not take advantage of the vector processing power. As a result, an efficient, vector processing system may perform multiple tasks using parallel computing and not make use of the parallel calculating capability for convolution operations, thereby slowing the entire system while awaiting the results of a convolution calculation. The diminished processing speed is further exacerbated by the fact that linear processing typically occurs in a part of the computer’s central processing unit separate from the vector processor. Consequently, the delays associated with transferring data between the linear and vector processors further slow the overall process.

“Accordingly, it is desirable to create a system and method for performing convolution in a vector processing computer system that utilizes the parallel calculating capability of the system in a manner so as to make the most efficient use of the computer system.

“In accordance with the present invention, these objectives are achieved by a system and method that performs convolution in a single channel of a vector processing computer system. This system and method take advantage of the distributive properties of the discrete-time convolution sum by reading in data, buffering data into a given number of data chunks, transposing the data chunks within a matrix to align the first bit of each data chunk, performing the convolution sums on each of the columns of a matrix simultaneously, storing the results from each column’s convolution sums as partial solutions, superimposing the results of each column’s convolution sums into a single data stream representing an overall solution to be further processed by the computer. According to an embodiment of the invention, the data is transposed and manipulated within a matrix. According to another embodiment of the present invention, one of the data signals or vectors used in the convolution sum is a vector comprising a series of ones. By utilizing a series of ones, a simplification of the overall convolution sum, which is the sum of products is achieved as the operation is reduced to an operation of sums only.”
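The "series of ones" simplification mentioned at the end of the summary can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the patent's implementation: the function name, the window width, and the sample data are hypothetical. When one of the two signals is all ones, every product in the convolution sum is just the other signal's sample, so each output value reduces to a plain sum over a sliding window.

```python
def convolve_with_ones(x, width):
    """Convolve x with a vector of `width` ones using additions only.

    Hypothetical helper: output n is the sum of the samples of x that fall
    inside a window of `width` positions ending at index n.
    """
    return [
        sum(x[max(0, n - width + 1):n + 1])
        for n in range(len(x) + width - 1)
    ]


# Made-up sample data; each value below is a windowed sum, with no multiplications.
print(convolve_with_ones([1, 2, 3, 4, 5], 3))  # prints [1, 3, 6, 9, 12, 9, 5]
```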

— Dennis Sellers