Archive for the ‘Summer of Code’ Category

ATA bug fixes

Thursday, August 14th, 2008

Last week I updated the ATA driver code in GRUB 2. Error handling was broken, and delays were long until Colin’s TSC patch was applied. The current code is now quite resilient to errors. In the meanwhile I removed the ATAPI code and added a SCSI module instead. The SCSI module allows other modules to register a SCSI command interface. It exports a disk device that is accessed by sending SCSI commands over this interface. The ATA driver was changed to use scsi.mod.
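
A minimal sketch of what such a registration interface could look like in C. All names here (scsi_transport, scsi_transport_register, dummy_read_cmd and so on) are hypothetical and chosen for illustration; this is not the actual scsi.mod API. The only real-world detail is the READ CAPACITY (10) opcode, 0x25.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not GRUB's actual scsi.mod API. */

/* A transport (ATAPI, USB mass storage, ...) that can carry SCSI commands. */
struct scsi_transport {
    const char *name;
    /* Send a command block, read the reply into buf; return 0 on success. */
    int (*read_cmd)(void *ctx, const unsigned char *cdb, size_t cdb_len,
                    unsigned char *buf, size_t buf_len);
    void *ctx;
    struct scsi_transport *next;
};

static struct scsi_transport *scsi_transports;

/* Called by a transport module to make itself known to the SCSI layer. */
void scsi_transport_register(struct scsi_transport *t)
{
    t->next = scsi_transports;
    scsi_transports = t;
}

/* Look a transport up by name; returns NULL if none is registered. */
struct scsi_transport *scsi_transport_find(const char *name)
{
    struct scsi_transport *t;
    for (t = scsi_transports; t; t = t->next)
        if (strcmp(t->name, name) == 0)
            return t;
    return NULL;
}

/* Disk-side helper: issue a 10-byte READ CAPACITY command (opcode 0x25). */
int scsi_read_capacity(struct scsi_transport *t, unsigned char reply[8])
{
    unsigned char cdb[10] = { 0x25 };
    return t->read_cmd(t->ctx, cdb, sizeof cdb, reply, 8);
}

/* Dummy transport for experiments: echoes the opcode into the reply. */
int dummy_read_cmd(void *ctx, const unsigned char *cdb, size_t cdb_len,
                   unsigned char *buf, size_t buf_len)
{
    (void)ctx;
    if (cdb_len == 0 || buf_len == 0)
        return -1;
    memset(buf, 0, buf_len);
    buf[0] = cdb[0];
    return 0;
}
```

The disk-side code only talks to the function pointer, so the same helper would work unchanged whether the commands travel over ATAPI or USB.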

I am working on USB support for Google Summer of Code. One of my goals is implementing USB Mass Storage Device support (USBMSD). USBMSD also uses SCSI commands. Thus this module will be shared between at least USBMSD and ATAPI. Hopefully someone will add SCSI drivers some day.

Google Summer of Code 2008

Friday, May 2nd, 2008

Last year I implemented a Dirac (BBC) decoder for FFmpeg, as a Google Summer of Code project. This year I will be working on a Summer of Code project again, now for GNU GRUB. The goal is to implement a USB framework including some drivers for GRUB 2. When this is implemented, you can access USB HID and USB Mass Storage devices from GRUB, even when the firmware does not support this. You might want to use this when your BIOS does not support USB. More importantly, this increases the usability of GRUB 2 on top of Coreboot (formerly known as LinuxBIOS). Because of this (and other GRUB 2 features we are working on), Coreboot + GRUB 2 might serve as a good (and Free!) BIOS replacement.

The first thing I will do is read some books about USB, namely “USB Complete” and “Universal Serial Bus System Architecture”. Although I have some doubts about the quality of the second book, it’s better than reading the USB specification from the screen (which I eventually will have to do). I have ordered both books and I am looking forward to getting started.

Because I will read these books first and because I am busy with my studies, I will not produce code right from the start; you can expect the first code somewhere in June. The first thing I plan to look at is libusb, which makes it possible to play with USB without having a complete UHCI driver. It should be easy to integrate this into grub-emu; more on this later.

As for Dirac, it still isn’t finished (unfortunately!). When I have time to look at this, I will. But if someone is interested in working on this, please tell me. The remaining tasks are not hard.

Motion Vector packing

Monday, September 10th, 2007

To implement motion vector packing, I just generate semi-random motion vectors and write these to the bitstream. In that case the rest of the encoding process has to adapt to these (incorrect) vectors. Most importantly: this will result in a bigger residue, thus in a bigger file. But this way this part of the encoder can be tested.

The motion vector packing now works! I have also implemented calculating the residue and fixed many bugs in the other parts of the encoding process. Now the encoder writes intra+inter frames with the semi-random motion vectors for a single reference frame.

So what needs to be done now is searching for (optimal) motion vectors. Perhaps I can use the algorithm from the Snow codec for this, otherwise I will have to implement this myself. I also need support for two reference frames. Actually the support is there, it just has to be enabled properly :-). In order to produce smaller files I need to add quantization.

Despite what’s missing, I think I can say I quite succeeded with writing the Dirac codec during Summer of Code (about 2 months, with a break of one week because of exams). When Summer of Code started, one of the Schrodinger developers called me Biggest Optimist in the Universe and Beyond 2007. Although the encoder is not in a state that it can efficiently code all videos yet, the biggest part has been implemented already. But more importantly, my decoder can play back videos at a decent speed. So I got the “Biggest Optimist in the Universe and Beyond 2007 Award”, now I am waiting for the certificate, medal or whatever comes with the title ;-).

Mike and Michael from the FFmpeg project both wrote a blog entry to wrap up how they look at the last Summer of Code; my work on the Dirac codec is part of this. Summer of Code is finished now, so I will start using the FFmpeg category for my future work on Dirac. Don’t worry, I will keep working on Dirac!

Intra frame coding fixed

Monday, September 3rd, 2007

In a previous post I mentioned that intra frame encoding works, but the reference implementation couldn’t decode my frames yet. I have now fixed this bug. It turned out to be a silly typo: I was using uint8_t instead of int8_t. With this fixed, the reference implementation is capable of decoding the frames encoded by my encoder.
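
To illustrate why this kind of typo matters, here is a small sketch. The post does not say which variable was affected, so the helper names below are hypothetical. The same stored byte widens to a different integer depending on the signedness of the type it is read through.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Read one stored byte through an unsigned 8-bit type: 0xFF stays 255. */
int widen_unsigned(unsigned char raw)
{
    uint8_t v = raw;
    return v;
}

/* Read the same byte through a signed 8-bit type: 0xFF becomes -1 on the
   usual two's-complement targets (the conversion is implementation-defined
   in older C standards, but this is what common compilers do). */
int widen_signed(unsigned char raw)
{
    int8_t v = (int8_t)raw;
    return v;
}
```

Any value that is meant to be negative is silently turned into a large positive one when the unsigned type is used, which is enough to corrupt everything derived from it.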

Of course a lot is missing in the encoder. First I will focus on writing the bitstream for the inter frames. So I need motion vectors to write out. First I will just take some random vectors, calculate the residue using these vectors and write both to the bitstream. That way it will become easier to test if writing out this data works, before I start working on Motion Estimation.
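
A simplified sketch of calculating a residue for a given vector. It assumes a single full-pel vector for the whole frame and repeats edge pixels outside the frame; real Dirac uses per-block vectors with overlapped blocks and sub-pixel precision. The function names are made up for this example.

```c
#include <stdint.h>

/* Clamp v into [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* residue = current frame minus the reference displaced by (mvx, mvy).
   Simplification: one full-pel vector for the whole frame. */
void compute_residue(const uint8_t *cur, const uint8_t *ref, int16_t *res,
                     int width, int height, int mvx, int mvy)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int rx = clamp_int(x + mvx, 0, width - 1);
            int ry = clamp_int(y + mvy, 0, height - 1);
            res[y * width + x] =
                (int16_t)(cur[y * width + x] - ref[ry * width + rx]);
        }
    }
}
```

With a bad (random) vector the residue values are large, which is why such a file becomes bigger; with a good vector most of them are close to zero.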

Decoder performance

Sunday, September 2nd, 2007

Today I built the decoder with profiling and debugging information disabled to compare the speed with that of the reference implementation. When building the reference implementation I disabled MMX, so I am just comparing C code with C code. In the end my code can also be sped up using SIMD code.

To measure the speed, I used `time`. The first video I tried is a small video of just a few seconds. Using the reference implementation:

real 0m4.029s
user 0m3.704s
sys 0m0.272s

When using my decoder:

real 0m3.754s
user 0m3.528s
sys 0m0.192s

The second video is a longer video. Using the reference implementation:

real 0m59.709s
user 0m52.447s
sys 0m5.168s

Using my decoder:

real 0m55.814s
user 0m51.327s
sys 0m2.460s

I had a look at what makes the difference. It appears that I save a lot of time because I cache the halfpel interpolated reference frames. The reference implementation does not do this; it recalculates the interpolated frame every time.
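
The caching idea can be sketched as follows. This is an assumption-laden illustration, not the decoder's actual code: the struct and function names are hypothetical, and the interpolation is a stand-in that simply averages horizontal neighbours instead of using Dirac's real upsampling filter.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference frame with a lazily filled half-pel cache. */
struct ref_frame {
    const uint8_t *pixels;  /* width * height source pixels */
    int width, height;
    uint8_t *halfpel;       /* cached upsampled plane, NULL until first use */
    int interp_count;       /* how often the interpolation actually ran */
};

/* Stand-in interpolation: doubles the width by averaging neighbours. */
static uint8_t *interpolate_halfpel(struct ref_frame *f)
{
    int w2 = f->width * 2;
    uint8_t *out = malloc((size_t)w2 * (size_t)f->height);
    if (!out)
        return NULL;
    for (int y = 0; y < f->height; y++) {
        for (int x = 0; x < f->width; x++) {
            int xn = x + 1 < f->width ? x + 1 : x;  /* repeat last pixel */
            int a = f->pixels[y * f->width + x];
            int b = f->pixels[y * f->width + xn];
            out[y * w2 + 2 * x]     = (uint8_t)a;
            out[y * w2 + 2 * x + 1] = (uint8_t)((a + b + 1) >> 1);
        }
    }
    f->interp_count++;
    return out;
}

/* Return the cached plane, computing it only on the first request. */
const uint8_t *get_halfpel(struct ref_frame *f)
{
    if (!f->halfpel)
        f->halfpel = interpolate_halfpel(f);
    return f->halfpel;
}

/* Drop the cache when the frame leaves the reference buffer. */
void release_ref_frame(struct ref_frame *f)
{
    free(f->halfpel);
    f->halfpel = NULL;
}
```

Every frame that predicts from the same reference then reuses one interpolated plane, at the cost of keeping the larger plane in memory for as long as the reference is alive.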

Intra frame coding works

Saturday, August 18th, 2007

The intra frame coding of the encoder works. This doesn’t mean it is perfect. The encoder can only use the LeGall wavelet. Quantization is not being used, so the files will be rather big. Also the LeGall wavelet is not ideal for intra frames, so I will add the Deslauriers-Dubuc wavelet soon. The encoder currently just does whatever is required to produce a Dirac compatible bitstream and nothing more. The results are playable with my decoder, but unfortunately not with the reference implementation. The reference implementation does not crash; it just outputs garbage, as you can see below.

Output of the reference implementation

Now I will work on regression tests; I will try to get something that makes testing easier into subversion soon. Using this, I will fix any regressions that were introduced last week.

Encoding

Saturday, August 18th, 2007

After the decoder was finished, I got some feedback on my code and worked hard to improve my code where possible. These were mainly small changes, but a lot of small changes are still a lot of work. Because I was quite annoyed by the bad quality of the motion compensation code and how slow it is, I worked a bit on improving this. I mainly merged the MC code with the qpel/eighthpel code. This made it possible to eliminate certain checks for border conditions. It also made it possible to avoid some multiplications. This resulted in a 50%(!!) speed-up of the MC code.

In the meanwhile I worked on the encoder. First I implemented Golomb coding and arithmetic coding. After that I started working on writing the Access Unit Header. At the moment I am working on packing of coefficients. First I will just write the pixels from the frame to encode directly to the file, without doing a DWT first. This will result in big pictures that will not be usable in practice because this is not supported by the standard. But this is a simple way to test that my current code actually works, before I start working on the DWT algorithms. Implementing a DWT (even a simple one like Haar) will then make it possible to encode files that can be played by any Dirac decoder. These files will just be very big :-)
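
As an illustration of how simple a Haar-style transform can be, here is a generic one-level integer Haar lifting step in C. This is a textbook variant under my own naming, not necessarily the exact Haar filter defined in the Dirac specification.

```c
/* floor(v / 2) for any int, without relying on shifting negative values. */
static int floor_half(int v)
{
    return (v - (v < 0)) / 2;
}

/* Forward step: each pair (a, b) becomes a low-pass and a high-pass value.
   n must be even; low and high each receive n / 2 values. */
void haar_forward(const int *data, int *low, int *high, int n)
{
    for (int i = 0; i < n / 2; i++) {
        int a = data[2 * i];
        int b = data[2 * i + 1];
        high[i] = b - a;                    /* difference */
        low[i]  = a + floor_half(high[i]);  /* roughly the average */
    }
}

/* Inverse step: undoes haar_forward exactly, so no information is lost. */
void haar_inverse(const int *low, const int *high, int *data, int n)
{
    for (int i = 0; i < n / 2; i++) {
        int a = low[i] - floor_half(high[i]);
        data[2 * i]     = a;
        data[2 * i + 1] = a + high[i];
    }
}
```

Because the inverse subtracts exactly what the forward step added, the reconstruction is lossless even though the average is rounded.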

Fixing the decoder

Tuesday, August 14th, 2007

Although the decoder worked, there were still a few bugs and regressions. These have all been fixed. One annoying bug was that the weighting code was incorrect. The main reason for this is that the specification was not correct at this point. In the end I just dumped the spatial weighting matrix from the reference decoder and figured out by hand how it should be calculated. The main problem here was related to rounding. Another bug was related to calculating incorrect MC block ranges. The pseudo code in the specification was wrong, so my code was also wrong. Besides that, there were some off-by-one errors, typos, etc.

In the meanwhile I optimized my code significantly. The motion compensation code was completely restructured. This resulted in a 2-3 times performance boost of this code. The IDWT code got a 2x performance boost.

So now the video plays perfectly, the output is exactly the same as the output of the reference implementation. It is still quite slow, but certainly not as slow as it used to be. Because I restructured the code, I think MC can be optimized even further. The most time consuming code is the interpolation code, which can be easily vectorized.

I don’t think I have to say I am really happy with these results :-). Instead of working on the optimizations and vectorizing, I will first work on the encoder. It’s easy for other people to optimize my code; they have more experience. I know quite a lot about Dirac and my code, so my time is better spent on the encoder.

Functional Decoder

Thursday, August 9th, 2007

Finally the decoder works! Today I fixed some bugs in the interpolation code. After that the frames were played back, but not yet in the correct order. After reordering, the video plays back quite well! There are some visual distortions at the border of the screen I have to look into. There are some other distortions that are caused by the encoder, most likely because I used an old encoder to encode this file. I have reencoded a few seconds of the output of my decoder to MPEG so you can get an impression of what the output of the decoder looks like at the moment. I must add that the decoder cannot decode at full speed yet.

Dirac video reencoded to MPEG

Now I am first going to work on the visual problems and compare the output of my decoder with the output of the reference implementation. After that I will update the code so it will work with Dirac files as described by the current specification (from CVS). After some more testing, I will make some obvious optimizations.

The most important optimization I want to make is in motion compensation. At the moment I am looping over all the pixels and applying the blocks to be motion compensated. This is how it was described by the specification and I wanted to have this working first. Besides that, I had no previous experience with motion compensation. Now I noticed that looping over the blocks is way more efficient. You can apply the same options for a big set of pixels and it is more natural. It appears that a lot of decoders work this way. I wonder what the increase in performance will be.
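
A rough sketch of the block-wise loop, under simplifying assumptions: full-pel vectors, no overlap between blocks and no weighting (Dirac itself uses overlapped blocks). The struct and function names are invented for this example.

```c
#include <stdint.h>

struct mv { int x, y; };

/* Clamp a coordinate into [0, max]. */
static int clamp_coord(int v, int max)
{
    return v < 0 ? 0 : (v > max ? max : v);
}

/* Block-wise prediction: the vector is looked up once per block instead of
   once per pixel. mvs holds one vector per block, in raster order. */
void predict_blocks(const uint8_t *ref, uint8_t *pred, int width, int height,
                    const struct mv *mvs, int block)
{
    int blocks_x = (width + block - 1) / block;
    for (int by = 0; by * block < height; by++) {
        for (int bx = 0; bx < blocks_x; bx++) {
            struct mv v = mvs[by * blocks_x + bx];  /* one lookup per block */
            int y_end = (by + 1) * block < height ? (by + 1) * block : height;
            int x_end = (bx + 1) * block < width ? (bx + 1) * block : width;
            for (int y = by * block; y < y_end; y++) {
                int ry = clamp_coord(y + v.y, height - 1);
                for (int x = bx * block; x < x_end; x++) {
                    int rx = clamp_coord(x + v.x, width - 1);
                    pred[y * width + x] = ref[ry * width + rx];
                }
            }
        }
    }
}
```

Everything that depends only on the block (the vector, the block limits, the reference row) is computed outside the innermost loop, which is where the pixel-wise version wastes its time.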

Stupid bug

Tuesday, August 7th, 2007

When I moved the code to remove the reference frames to after the decoding process, I used a buffer to keep track of the indexes of the frames to remove in the reference frame buffer. Of course when removing one frame, the indexes of all frames after this one change. Because of this, the incorrect frame was removed. This was a silly, small and simple bug. Unfortunately it consumed quite some time to find. This bug is fixed now.
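
The post does not say how the bug was fixed. One common way to avoid this class of bug, sketched here with hypothetical names and plain int values standing in for frames, is to process the recorded indexes from highest to lowest so that a removal never shifts an entry that still has to be removed.

```c
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort: sorts ints in descending order. */
static int cmp_desc(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x < y) - (x > y);
}

/* Remove the entries at the given indexes from frames[0..count) and return
   the new count. Assumes the indexes are distinct and within range.
   Removing from the highest index down keeps the pending indexes valid. */
int remove_frames(int *frames, int count, int *idx, int n_idx)
{
    qsort(idx, (size_t)n_idx, sizeof *idx, cmp_desc);
    for (int i = 0; i < n_idx; i++) {
        int k = idx[i];
        memmove(&frames[k], &frames[k + 1],
                (size_t)(count - k - 1) * sizeof *frames);
        count--;
    }
    return count;
}
```

Removing in ascending order without adjusting the later indexes would delete the wrong entry for every index after the first, which matches the symptom described above.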

At the moment I am figuring out how to proceed with the interpolation code. The current code does play back the video, but with a lot of artifacts and sometimes the entire screen is messed up. In the meanwhile, I am trying to find a bug in dirac_get_se_golomb. For some reason, it does not return the correct sign on the PPC.

So I didn’t really do exciting things today, but not every day can be as productive as I hoped it would be. At least some problems are now fixed and the PPC bug is localized.