Decoder performance
Today I built the decoder with profiling and debugging information disabled to compare its speed with that of the reference implementation. When building the reference implementation I disabled MMX, so I am comparing C code with C code. Eventually my code can also be sped up using SIMD code.
To measure the speed, I used `time`. The first video I tried is a small one, just a few seconds long. Using the reference implementation:
real 0m4.029s
user 0m3.704s
sys 0m0.272s
When using my decoder:
real 0m3.754s
user 0m3.528s
sys 0m0.192s
The second video is longer. Using the reference implementation:
real 0m59.709s
user 0m52.447s
sys 0m5.168s
Using my decoder:
real 0m55.814s
user 0m51.327s
sys 0m2.460s
I had a look at what makes the difference. It appears that I save a lot of time because I cache the halfpel-interpolated reference frames. The reference implementation does not do this; it recalculates the interpolated frame every time.
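To illustrate the idea of caching an interpolated reference frame, here is a minimal sketch in C. It is not the actual decoder code: the RefFrame struct, the function names, and the simple rounded-average filter for the horizontal halfpel position are all assumptions made for the example. The point is only that the interpolated plane is computed on first use and then reused until the reference frame changes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference frame: one 8-bit plane plus a lazily built
 * horizontal halfpel plane. Names and layout are illustrative only. */
typedef struct {
    int width, height;
    uint8_t *full;    /* full-pel samples, width * height bytes */
    uint8_t *half_h;  /* cached halfpel samples, NULL until first use */
} RefFrame;

/* Return the horizontal halfpel plane, building it only once.
 * Assumed filter: rounded average of a pixel and its right neighbour,
 * with the right edge clamped. Returns NULL if allocation fails. */
static const uint8_t *ref_half_h(RefFrame *f)
{
    assert(f != NULL && f->full != NULL);

    if (f->half_h != NULL)
        return f->half_h;  /* cache hit: no recomputation */

    f->half_h = malloc((size_t)f->width * (size_t)f->height);
    if (f->half_h == NULL)
        return NULL;

    for (int y = 0; y < f->height; y++) {
        const uint8_t *src = f->full + (size_t)y * f->width;
        uint8_t *dst = f->half_h + (size_t)y * f->width;
        for (int x = 0; x < f->width; x++) {
            int xr = (x + 1 < f->width) ? x + 1 : x;  /* clamp at edge */
            dst[x] = (uint8_t)((src[x] + src[xr] + 1) >> 1);
        }
    }
    return f->half_h;
}

/* Drop the cached plane; call this whenever the reference frame
 * is replaced by a newly decoded one. */
static void ref_invalidate(RefFrame *f)
{
    free(f->half_h);
    f->half_h = NULL;
}
```

With a scheme like this, every motion-compensated block that points at a halfpel position reads from the cached plane, so the interpolation cost is paid once per reference frame instead of once per block. The trade-off is extra memory for each cached plane.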