Converting video with GPU acceleration tested

Introduction


Over the past couple of years, video cards have become useful for much more than 3D modelling and video games. Nowadays they can be used for, among other things, breaking password protections, medical research and calculations, and video processing.

Generic software cannot use the processing power of a video card automatically. To tap the additional power of the GPU, a program needs to include code for the appropriate interface. The most popular of these interfaces is NVIDIA's CUDA, which is officially supported by the company's video cards. Alternatives include ATI's Stream, OpenCL, whose version 1.0 was recently introduced, and the Compute Shader provided by Microsoft's DirectX 11.
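To illustrate what "including code for the interface" can mean in practice, here is a minimal Python sketch of how a program might probe for a CUDA-capable card before choosing a code path. It assumes the NVIDIA driver library is installed under its usual name (nvcuda.dll on Windows, libcuda.so on Linux) and calls the CUDA driver API functions cuInit and cuDeviceGetCount. The function name cuda_device_count and the fallback logic are our own illustration, not code from any of the programs tested here.

```python
import ctypes
import sys


def cuda_device_count():
    """Return the number of CUDA devices, or 0 if CUDA is unavailable."""
    # Usual driver library names; this is an assumption about the installation.
    if sys.platform.startswith("win"):
        loader, names = ctypes.WinDLL, ["nvcuda.dll"]
    else:
        loader, names = ctypes.CDLL, ["libcuda.so.1", "libcuda.so"]

    for name in names:
        try:
            lib = loader(name)
        except OSError:
            continue  # library not present, try the next name
        # cuInit(0) must return 0 (success) before any other driver call.
        if lib.cuInit(0) != 0:
            return 0
        count = ctypes.c_int(0)
        if lib.cuDeviceGetCount(ctypes.byref(count)) != 0:
            return 0
        return count.value
    return 0


if __name__ == "__main__":
    n = cuda_device_count()
    print("GPU path" if n > 0 else "CPU-only path", "-", n, "CUDA device(s)")
```

A program written this way still runs on machines without an NVIDIA card; it simply stays on the CPU path.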



Software


We decided to get familiar with CUDA and its improvements to video processing because CUDA is now supported out of the box by TMPGEnc 4.0 XPress and PowerDirector.

The goal of the test was to see how much CUDA speeds up video compression and whether the quality of the final product is worse than with CPU-only compression. This could be the case at least with the default software provided by ATI and NVIDIA.

TMPGEnc 4.0 XPress uses CUDA for MPEG-1 and MPEG-2 decoding and for processing certain video filters. CUDA is therefore not used in every situation possible, but according to the TMPGEnc developers, further support for CUDA is in development. In some cases the processing speed can increase by 800 percent when the system includes a CUDA-capable video card.

CyberLink introduced a new version of PowerDirector which includes support for both CUDA and Stream. The new PowerDirector -- released just before CES 2009 -- takes advantage of video card muscle not only in filter processing but also in H.264/AVC encoding. This is a major improvement over TMPGEnc's limited CUDA support and should show some definite results.

Neither program uses CUDA by default. In both, enabling CUDA is simply a matter of ticking a couple of check boxes (pictured below).


Enabling CUDA support in TMPGEnc 4.0 XPress


Enabling CUDA support for video effects in PowerDirector




Enabling CUDA support for H.264/AVC encoding in PowerDirector

To compare the results of CUDA processing we decided to use the open source AVIdemux as a control. It doesn't support CUDA, and its x264 encoding should provide a clear point of comparison for video quality.

Tests


To test the video encoding we used Sony HDW-F900 footage found at W6RZ.net. The video is in a TS container in MPEG-2 format at 1080p resolution. The bitrate of the video data reaches up to 30 Mbps and the average bitrate is around 18 Mbps, so the video should provide enough load for the setup.

The test video was encoded to H.264/AVC format with all three programs. The first goal was to produce a 720p video with 2-pass variable bitrate settings for Internet distribution. We set the average video bitrate to 3 000 kbps and the audio bitrate to 64 kbps -- other settings were left at their defaults.
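The average bitrate settings determine the size of the output almost directly, which is worth keeping in mind when comparing the files. A rough Python sketch of the arithmetic follows; the 120-second duration is a hypothetical value chosen only for illustration (the clip length is not a figure from the test), and container overhead is ignored.

```python
def estimated_size_mb(video_kbps, audio_kbps, duration_s):
    """Approximate file size in megabytes (10^6 bytes), ignoring container overhead."""
    total_bits = (video_kbps + audio_kbps) * 1000 * duration_s
    return total_bits / 8 / 1_000_000


if __name__ == "__main__":
    # 3 000 kbps video + 64 kbps audio as in the 720p test;
    # 120 s is a hypothetical clip length, not a figure from the test.
    print(round(estimated_size_mb(3000, 64, 120), 1), "MB")
```

With a 2-pass variable bitrate encode the encoder distributes the bits unevenly over the clip, but the total should still land close to this estimate.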


TMPGEnc 4.0 XPress -- 720p encoding setup


PowerDirector -- 720p encoding setup




AVIdemux -- 720p encoding setup

For the second test we converted the video to 640x360 resolution (for iPhone and iPod Touch) with 1-pass constant bitrate at 1 Mbps and a 64 kbps audio bitrate.
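Both target sizes keep the 16:9 shape of the source, which can be checked with a few lines of Python. This sketch assumes square pixels and takes the 1080p source to be 1920x1080 and the 720p target to be 1280x720, which are the standard dimensions but are not spelled out in the test description.

```python
from fractions import Fraction


def aspect_ratio(width, height):
    """Return the aspect ratio as a reduced fraction (square pixels assumed)."""
    return Fraction(width, height)


def scaled_height(src_w, src_h, dst_w):
    """Height that preserves the source aspect ratio at the target width."""
    return round(dst_w * src_h / src_w)


if __name__ == "__main__":
    for w, h in [(1920, 1080), (1280, 720), (640, 360)]:
        print(w, "x", h, "->", aspect_ratio(w, h))
    print("640 wide needs height", scaled_height(1920, 1080, 640))
```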


TMPGEnc 4.0 XPress -- iPhone encoding setup


AVIdemux -- iPhone encoding setup

Unfortunately PowerDirector doesn't support the MP4 format with H.264/AVC video, so the m2ts file it produced was converted to MP4 in two steps: mencoder was used to separate the video stream, which was then packaged into MP4 format with MP4Box. PowerDirector's resolution support is quite restricted, and therefore the iPod Touch/iPhone video was not converted with it at all. It also doesn't support AAC audio in m2ts files, so there is no audio in the final versions produced by PowerDirector. For some reason PowerDirector had problems with the aspect ratio of the video even though it was forced to 16:9 mode.
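The two-step conversion from m2ts to MP4 can be sketched in Python as follows. The exact options used in the test are not recorded, so the command lines below are an assumption: they use mencoder's stream copy mode with raw video output and MP4Box's -add and -new options, and all file names are hypothetical.

```python
import subprocess


def remux_commands(src_m2ts, raw_h264, dst_mp4):
    """Build the two command lines: extract the raw video, then wrap it in MP4."""
    extract = [
        "mencoder", src_m2ts,
        "-ovc", "copy",      # copy the video stream without re-encoding
        "-nosound",          # skip audio
        "-of", "rawvideo",   # write a bare elementary stream, no container
        "-o", raw_h264,
    ]
    wrap = ["MP4Box", "-add", raw_h264, "-new", dst_mp4]
    return extract, wrap


def remux(src_m2ts, raw_h264, dst_mp4):
    """Run both steps; requires mencoder and MP4Box on the PATH."""
    for cmd in remux_commands(src_m2ts, raw_h264, dst_mp4):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    for cmd in remux_commands("powerdirector.m2ts", "video.h264", "output.mp4"):
        print(" ".join(cmd))
```

Because the video stream is copied rather than re-encoded, this step should not affect the quality comparison.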

Each conversion was tested both without filters and with the Color Correction filter in every program that could produce it. The conversion times are not comparable between programs because they use different libraries or methods for H.264/AVC encoding and filter processing. You can, however, compare the times with filters on and off within a particular program.



Test setup was as follows:

PC
-1.6 GHz Intel Pentium Dual E2140
-2 GB DDR2
-Club 3D Geforce 9600GT (NVIDIA's Forceware 181.20 drivers)
-Windows Vista SP1

Software
-TMPGEnc 4.0 XPress v4.6.3.268
-PowerDirector v7.0.2416a
-AVIdemux v2.4.3 (r4494) with x264 library r1080

The times were measured with a stopwatch.

Test results


720p without filters
Software              Time without CUDA    Time with CUDA    Time improvement with CUDA
TMPGEnc 4.0 XPress    16 min 16 sec        15 min 17 sec     59 sec
PowerDirector         4 min 54 sec         3 min 51 sec      1 min 3 sec
AVIdemux              11 min 10 sec        -                 -


720p with filters
Software              Time without CUDA    Time with CUDA    Time improvement with CUDA
TMPGEnc 4.0 XPress    25 min 22 sec        16 min 15 sec     9 min 7 sec
PowerDirector         6 min 2 sec          4 min 59 sec      1 min 3 sec
AVIdemux              11 min 18 sec        -                 -
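Since the times are only comparable within a program, one useful figure is how much time the Color Correction filter itself adds with and without CUDA. A small Python sketch using the 720p times from the two tables above (the function and variable names are our own):

```python
def seconds(minutes, secs):
    """Convert a stopwatch reading to seconds."""
    return minutes * 60 + secs


# 720p times from the tables above: (without filters, with filters)
TIMES_720P = {
    "TMPGEnc 4.0 XPress": {"cpu": (seconds(16, 16), seconds(25, 22)),
                           "cuda": (seconds(15, 17), seconds(16, 15))},
    "PowerDirector": {"cpu": (seconds(4, 54), seconds(6, 2)),
                      "cuda": (seconds(3, 51), seconds(4, 59))},
}


def filter_overhead(times):
    """Extra seconds the filter adds in each mode."""
    return {mode: with_f - without_f
            for mode, (without_f, with_f) in times.items()}


if __name__ == "__main__":
    for program, times in TIMES_720P.items():
        print(program, filter_overhead(times))
```

For TMPGEnc the filter costs 546 seconds on the CPU but only 58 seconds with CUDA, which is where most of its gain comes from. For PowerDirector the filter adds 68 seconds either way, so its gain comes from the encoding step.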


iPhone without filters
Software              Time without CUDA    Time with CUDA    Time improvement with CUDA
TMPGEnc 4.0 XPress    3 min 44 sec         3 min 54 sec      -10 sec
AVIdemux              2 min 34 sec         -                 -


iPhone with filters
Software              Time without CUDA    Time with CUDA    Time improvement with CUDA
TMPGEnc 4.0 XPress    8 min 26 sec         5 min 56 sec      2 min 30 sec
AVIdemux              2 min 36 sec         -                 -
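The absolute improvements are easier to judge as a share of the original encoding time. The following Python sketch converts the stopwatch readings from the tables into a percentage of time saved; the function names are our own, and the figures inherit whatever inaccuracy the manual timing had.

```python
import re


def parse_time(text):
    """Convert a string like '16 min 16 sec' to seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*min\s*(\d+)\s*sec\s*", text)
    if match is None:
        raise ValueError(f"unrecognized time: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


def time_saved_percent(without_cuda, with_cuda):
    """Percentage of encoding time saved by CUDA (negative means slower)."""
    base = parse_time(without_cuda)
    return 100.0 * (base - parse_time(with_cuda)) / base


RESULTS = [
    ("TMPGEnc 720p, no filters", "16 min 16 sec", "15 min 17 sec"),
    ("TMPGEnc 720p, filters", "25 min 22 sec", "16 min 15 sec"),
    ("PowerDirector 720p, no filters", "4 min 54 sec", "3 min 51 sec"),
    ("PowerDirector 720p, filters", "6 min 2 sec", "4 min 59 sec"),
    ("TMPGEnc iPhone, no filters", "3 min 44 sec", "3 min 54 sec"),
    ("TMPGEnc iPhone, filters", "8 min 26 sec", "5 min 56 sec"),
]

if __name__ == "__main__":
    for label, cpu, cuda in RESULTS:
        print(f"{label}: {time_saved_percent(cpu, cuda):.1f} % saved")
```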




Video files (MP4)
TMPGEnc 4.0 XPress 720p without CUDA
TMPGEnc 4.0 XPress 720p with CUDA
PowerDirector 720p without CUDA
PowerDirector 720p with CUDA
AVIdemux 720p

TMPGEnc 4.0 XPress iPhone without CUDA
TMPGEnc 4.0 XPress iPhone with CUDA
AVIdemux iPhone

Screen captures (720p)


TMPGEnc 4.0 XPress -- 720p @ 33 sec (without CUDA)


TMPGEnc 4.0 XPress -- 720p @ 33 sec (with CUDA)


PowerDirector -- 720p @ 33 sec (without CUDA)


PowerDirector -- 720p @ 33 sec (with CUDA)


AVIdemux -- 720p @ 33 sec






TMPGEnc 4.0 XPress -- 720p @ 60 sec (without CUDA)


TMPGEnc 4.0 XPress -- 720p @ 60 sec (with CUDA)


PowerDirector -- 720p @ 60 sec (without CUDA)


PowerDirector -- 720p @ 60 sec (with CUDA)


AVIdemux -- 720p @ 60 sec



TMPGEnc 4.0 XPress -- 720p @ 90 sec (without CUDA)


TMPGEnc 4.0 XPress -- 720p @ 90 sec (with CUDA)


PowerDirector -- 720p @ 90 sec (without CUDA)


PowerDirector -- 720p @ 90 sec (with CUDA)




AVIdemux -- 720p @ 90 sec

Screen captures (iPhone)


TMPGEnc 4.0 XPress -- iPhone @ 33 sec (without CUDA)


TMPGEnc 4.0 XPress -- iPhone @ 33 sec (with CUDA)


AVIdemux -- iPhone @ 33 sec


TMPGEnc 4.0 XPress -- iPhone @ 60 sec (without CUDA)


TMPGEnc 4.0 XPress -- iPhone @ 60 sec (with CUDA)


AVIdemux -- iPhone @ 60 sec


TMPGEnc 4.0 XPress -- iPhone @ 90 sec (without CUDA)




TMPGEnc 4.0 XPress -- iPhone @ 90 sec (with CUDA)


AVIdemux -- iPhone @ 90 sec

Conclusion


TMPGEnc 4.0 XPress showed improvements in processing times with CUDA, especially when filters were used -- just as TMPGEnc's own tests anticipated. The use of CUDA in TMPGEnc 4.0 XPress had little effect on the quality of the video, so it is safe to recommend it to anyone with a CUDA-capable setup. In some cases CUDA does slow the process down a bit (the iPhone conversion without filters took 10 seconds longer), so you might want to try it on a couple of videos before committing to it, especially if you often use TMPGEnc for videos of the same format, produced for example by your digital video camera.

PowerDirector used a different encoding profile when CUDA was enabled. This resulted in better quality and an approximately three megabytes larger file because of the higher bitrate. Even though the video card itself didn't improve the quality, it did improve the speed of the conversion, so enabling CUDA is recommended in PowerDirector as well.

Of the three programs, PowerDirector's video clearly had the lowest quality, although it does do the compression quickly. The problems with aspect ratios and the limited settings don't paint a rosy picture of PowerDirector either. The x264 file produced by AVIdemux takes the crown for video quality with flying colors, and we can only hope that it will get help from GPU processing in the future.

Written by: Matti Robinson @ 20 Jan 2009 9:46
  • 7 comments
  • ZippyDSM

    Is it me or are GPUs becoming like math co processors?
    In five to ten years all PCs will come not with some crappy video chipset but with a sturdy and handy GPU that can be combined with a slot card add-on for more graphics power if needed.

    20.1.2009 13:41 #1

  • DXR88

    They had those zippy it was on all Vesa Graphics cards. 3DFX (voodoo)PCI had 2 processors on you could swap out and one you couldn't. Voodoo 2 just soldered them both on the boards Voodoo3 and up usually had just one ship excluding rage.

    i actually looked at the pics and w/o cuda looked better (not as blurred ) w/ cuda looked more vibrant(but was blurred).

    20.1.2009 14:02 #2

  • ZippyDSM

    Originally posted by DXR88: They had those zippy it was on all Vesa Graphics cards. 3DFX (voodoo)PCI had 2 processors on you could swap out and one you couldn't. Voodoo 2 just soldered them both on the boards Voodoo3 and up usually had just one ship excluding rage.

    i actually looked at the pics and w/o cuda looked better (not as blurred ) w/ cuda looked more vibrant(but was blurred).
    I am referring to something more open: the chip makers make a standardized GPU chip of varying power, this chip is installed on most motherboards, and a DVI/VGA port is offered either as an optional add-on or with the other ports. The add-on card will be able to interface with it and have more options with bandwidth and features.

    The main competition can create features and maximized support of their chips and add-ons, creating a more sub driven market.

    20.1.2009 19:15 #3

  • Pop_Smith

    Personally, the CUDA pictures look blurry and even a bit lighter in color compared to the non-CUDA pictures.

    It would have been nice to see the originals as well, to see which picture was closer to the original, which is the ultimate goal.

    With time I hope the CUDA footage will get clearer and better looking and end up being the way to go due to the much quicker encoding times in most situations.

    Peace

    21.1.2009 00:08 #4

  • Mr-Movies

    Originally posted by Pop_Smith: Personally, the CUDA pictures look blurry and even a bit lighter in color compared to the non-CUDA pictures.

    It would have been nice to see the originals as well, to see which picture was closer to the original, which is the ultimate goal.

    With time I hope the CUDA footage will get clearer and better looking and end up being the way to go due to the much quicker encoding times in most situations.

    Peace
    That is exactly how I see it plus some of the Cuda shots seem a bit distorted, I’m not impressed.

    War LOL

    27.1.2009 07:32 #5

  • pmshah

    As a matter of fact the GPUs have inherently more processing power than your "normal" CPUs when you consider only the sheer processing power. The difference is that they are limited in their command sets, again designed for the specific task of video related tasks.

    The ideal processor to use would be IBM's 32 core cell processor, each perhaps equivalent to a 486 (my guess) but each doing its own thing in parallel and capable of full x86 command set. It would probably compute the pants off of anything else in the market.

    I believe PS3 employs it.

    27.1.2009 21:44 #6

  • DXR88

    Originally posted by pmshah: As a matter of fact the GPUs have inherently more processing power than your "normal" CPUs when you consider only the sheer processing power. The difference is that they are limited in their command sets, again designed for the specific task of video related tasks.

    The ideal processor to use would be IBM's 32 core cell processor, each perhaps equivalent to a 486 (my guess) but each doing its own thing in parallel and capable of full x86 command set. It would probably compute the pants off of anything else in the market.

    I believe PS3 employs it.
    In computer engineering that might work, but it would require a major overhaul of standardized parts. Hence the reason major GFX chipset manufacturers have employed their own method of parallelism, called stream processing; not true parallel of course, more like hyper-threading. The PS3 uses RSX, a GPU based on the Nvidia GeForce 7800 architecture, not exactly top of the line anymore.

    27.1.2009 23:42 #7

© 2025 AfterDawn Oy
