Zack Rusin recently blogged about advances in the Gallium infrastructure, covering work that he, Keith Whitwell and José Fonseca have just finished. With so much happening lately on the open-source graphics driver front, it is sometimes difficult to track what remains to be done in the GPGPU area for open-source Linux drivers.
One of the novelties in Gallium-land is the introduction of the concept of resources in the shader representation. Two new items of functionality have been added; the most interesting of them is Gather4, which comes from DirectX 11:
Gather4: Modern GPUs use dedicated hardware blocks known as texture units to fetch data rapidly into their processing cores. These texture units have historically been optimized for rendering graphics, where techniques such as bilinear filtering are typically used to improve image quality. Compute Shaders often make use of these same units to fetch data as well, but they generally have no use for their filtering capabilities, leaving them underutilized. GPUs with Shader Model 5.0 support have the ability to use the excess fetch capability with the Gather4 operation, which can fetch up to 4 values simultaneously and provide a 4x increase in data bandwidth.
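To make the description above more concrete, here is a rough CPU-side sketch in Python of what a Gather4-style fetch returns: the four unfiltered texels of the 2x2 footprint that bilinear filtering would otherwise blend into one value. It is only an illustration, not Gallium or driver code. The function name `gather4`, the list-of-rows texture layout, the clamp-to-edge addressing and the component order are all assumptions made for this example.

```python
import math


def gather4(texture, u, v):
    """Emulate a Gather4-style fetch on a single-channel 2D texture.

    texture is a list of rows (hypothetical layout chosen for this sketch);
    u and v are normalized coordinates in [0, 1].
    Returns the four texels of the bilinear footprint, unfiltered.
    """
    height = len(texture)
    width = len(texture[0])

    # Texel centers sit at half-integer positions, so shift by 0.5 to find
    # the top-left texel of the 2x2 footprint around the sample point.
    i0 = math.floor(u * width - 0.5)
    j0 = math.floor(v * height - 0.5)

    def clamp(index, size):
        # Clamp-to-edge addressing (assumed; real samplers offer other modes).
        return max(0, min(index, size - 1))

    i0c, i1c = clamp(i0, width), clamp(i0 + 1, width)
    j0c, j1c = clamp(j0, height), clamp(j0 + 1, height)

    # Assumed component order: (i0, j1), (i1, j1), (i1, j0), (i0, j0).
    return (
        texture[j1c][i0c],
        texture[j1c][i1c],
        texture[j0c][i1c],
        texture[j0c][i0c],
    )


# A 4x4 texture whose values encode their own position (row * 4 + column).
demo_texture = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]

print(gather4(demo_texture, 0.5, 0.5))  # (9, 10, 6, 5)
```

A single call hands back four neighbouring values where an ordinary point fetch would hand back one, which is where the quoted 4x figure comes from. On real hardware the gain depends on the shader actually needing all four neighbours.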