This week I had to make parts of OpenBSC work with TCH/H and use AMR. This work is needed for On Waves, and when I say parts I mean the strict BSC subset of OpenBSC (in contrast to making the MSC code we have work as well).
The first part is to make TCH/H work, and that was easy as LaF0rge did almost everything to make it work. You have to change the OpenBSC configuration to use TCH/H instead of TCH/F for the given timeslots. The next thing was to make channel assignment work. The Mobile Station (MS) comes in on the Random Access Channel (RACH), asks for a channel and includes a random number (so it can identify the response). Depending on a global indicator (NECI), the MS will ask for different channels.
So the next step was to add a NECI setting to our VTY configuration code and then change the code that decodes the channel request to know about the NECI and pick the right channel. On top of that came a small hack to assign a TCH/H when an MS requests "any" channel as part of paging.
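To make the NECI dependency a bit more concrete, here is a minimal sketch of such a decoder. It is not the OpenBSC code: the names ChanType and chan_type_for_request are made up for this illustration, only three establishment causes are handled, and the bit patterns are quoted from my reading of the Channel Request table in GSM 04.08, so check them against the spec before relying on them.

```cpp
#include <cassert>
#include <cstdint>

// Channel types a BSC could hand out (illustrative names, not OpenBSC's).
enum class ChanType { SDCCH, TCH_F, TCH_H };

// Simplified decode of the 8 bit Channel Request value ("RA").
// The upper bits carry the establishment cause, the rest is the random
// reference. Everything not handled here falls back to an SDCCH.
ChanType chan_type_for_request(uint8_t ra, bool neci)
{
    // 0100xxxx: originating speech call from a dual-rate MS where a
    // TCH/H is sufficient. Only meaningful when the cell sets NECI=1.
    if (neci && (ra & 0xf0) == 0x40)
        return ChanType::TCH_H;

    // 111xxxxx: originating call and a TCH/F is needed (or NECI=0).
    if ((ra & 0xe0) == 0xe0)
        return ChanType::TCH_F;

    // 100xxxxx: answer to paging, "any channel". Mirrors the small hack
    // described above by handing out a TCH/H (assumption of this sketch).
    if ((ra & 0xe0) == 0x80)
        return ChanType::TCH_H;

    return ChanType::SDCCH;
}
```

The same RA value can mean different things depending on NECI, which is why the decoder has to take the flag as an input instead of looking at the request alone.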
Now that TCH/H should work, one has to focus on the speech. GSM 08.08 and GSM 04.08 have different enums for speech. GSM 08.08 differentiates speech versions 1, 2 and 3 for full and half rate, giving six different values. GSM 04.08 has a channel mode that covers speech versions 1, 2 and 3, various data modes and signalling (but does not differentiate between full and half rate channels). After getting this right and selecting speech version 3 it still didn't work. It turned out that one has to fill out the optional Multirate Configuration when using speech version 3. This Multirate Configuration needs to be present in the GSM 04.08 RR Assignment Command and Channel Mode Modify, but also in the RSL messages for Mode Modify Request and Channel Activation.
After this, AMR on a TCH/H should work (when the BTS supports it too). The next step for someone else is to make the MSC code in OpenBSC work with TCH/H and other audio codecs. This would require no longer asking for a TCH/F unconditionally and changing the channel request decoding again.
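The relation between the two enums can be sketched roughly like this. The numeric values are the ones I recall from the two specs (GSM 08.08 speech version identifiers and the GSM 04.08 channel mode), and the type and function names are invented for the example, so treat it as an illustration and verify the constants before use.

```cpp
#include <cassert>
#include <cstdint>

// GSM 08.08: rate and speech version are folded into one identifier.
enum Gsm0808Speech : uint8_t {
    FR_V1 = 0x01, FR_V2 = 0x11, FR_V3 = 0x21,
    HR_V1 = 0x05, HR_V2 = 0x15, HR_V3 = 0x25,
};

// GSM 04.08 channel mode: speech version only, no full/half rate bit.
enum Gsm48ChanMode : uint8_t {
    MODE_SIGN      = 0x00,
    MODE_SPEECH_V1 = 0x01,
    MODE_SPEECH_V2 = 0x21,
    MODE_SPEECH_V3 = 0x41,
};

// What the BSC needs to know to set up the channel (hypothetical struct).
struct ChanConfig {
    Gsm48ChanMode mode;
    bool half_rate;               // decides TCH/H vs. TCH/F
    bool needs_multirate_config;  // true for speech version 3 (AMR)
};

ChanConfig config_for_speech(Gsm0808Speech s)
{
    switch (s) {
    case FR_V1: return {MODE_SPEECH_V1, false, false};
    case FR_V2: return {MODE_SPEECH_V2, false, false};
    case FR_V3: return {MODE_SPEECH_V3, false, true};
    case HR_V1: return {MODE_SPEECH_V1, true,  false};
    case HR_V2: return {MODE_SPEECH_V2, true,  false};
    case HR_V3: return {MODE_SPEECH_V3, true,  true};
    }
    // Unknown identifier: fall back to signalling only.
    return {MODE_SIGN, false, false};
}
```

Carrying the "needs Multirate Configuration" flag next to the mode makes it harder to forget the optional element in one of the messages that has to include it.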
Thursday, November 19, 2009
Wednesday, November 18, 2009
Visiting On Waves in Iceland
Currently I'm sitting in the nice offices of On Waves, and when I'm not trying to convince the embassy of India to give me a visa I'm working on OpenBSC. This week I'm trying to make call handling with the MSC rock solid.
So far I have fixed some bugs, added features to OpenBSC, enabled A5/1 encryption, started using TCH/H, started using AMR, fixed bringup on nanoBTS coldstart and now I'm working on the MGCP to verify that I can hear audio for my calls.
Saturday, November 07, 2009
The benefit of using GCC NEON intrinsics
I'm currently writing NEON code for the Qt PorterDuff SourceOver implementation. At the beginning one has to decide whether to use inline assembly, a separate .S file or the ARM NEON intrinsics.
I have chosen to go with the ARM NEON intrinsics embedded into C++ code for a couple of simple reasons. The first is that they are portable across GCC and RVCT; a .S file or inline assembly would not work for RVCT, which is used by the Symbian people. The second reason is that I get type safety. The NEON registers can be seen as 8bit, 16bit, 32bit or 64bit signed/unsigned registers. When doing low level assembly you might pick the wrong operation and it is hard to see; with the intrinsics you get a compiler warning about your mistake. One downside is that with some easy things I can make my compiler abort with an internal compiler error... but this will change over time.
Next is the myth that GCC is crap and that the intrinsics are badly "scheduled". From looking at the assembly code, it is mostly arranged like I wanted it to be. On a simple operation GCC was putting an LDR right in between the NEON load and store operations; with a simple change in the code this LDR was gone and I should not see any of the described hazards.
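As a small illustration of the type safety point (not taken from the SourceOver code), here is a saturating add of 16 unsigned bytes. The function name is made up, and the scalar branch exists only so that the sketch also builds on machines without NEON.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// dst[i] = min(255, a[i] + b[i]) for 16 unsigned bytes.
void add_saturate_u8x16(const uint8_t *a, const uint8_t *b, uint8_t *dst)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // The vector type spells out lane width and signedness.
    uint8x16_t va = vld1q_u8(a);
    uint8x16_t vb = vld1q_u8(b);
    // vqaddq_u8 takes uint8x16_t operands; handing it e.g. an int16x8_t
    // makes the compiler complain instead of silently picking the wrong
    // operation as can happen in hand-written assembly.
    vst1q_u8(dst, vqaddq_u8(va, vb));
#else
    // Portable fallback with the same result.
    for (int i = 0; i < 16; ++i) {
        unsigned sum = unsigned(a[i]) + unsigned(b[i]);
        dst[i] = sum > 255 ? uint8_t(255) : uint8_t(sum);
    }
#endif
}
```

In assembly the equivalent mistake would be writing a VQADD with the wrong data type suffix, which assembles fine and only shows up as wrong pixels.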
Now my ARM NEON code is slower than the C code (that is using tricks) but that is entirely my fault and I have some things I can try to make it faster. And to be more specific the ARM NEON code is four frames faster than the old C code (that was not using tricks).
Collecting hints to increase performance in Qt (and apps)
I'm working part time on improving the performance of QtWebKit (memory usage and raw speed) and I have created some tools to create an offline copy of a number of webpages (Gmail, Yahoo Mail, Google, news sites...).
Using these sites I have created special purpose benchmark reductions, e.g. only do the image operations we do while loading, only do the image operations we do while loading and painting, or only load all network resources. One thing I have noticed is that with a couple of small things one can achieve a stable and noticeable speedup. These include not calling QImage::scanLine from within a loop, avoiding QByteArray::toLower, and not using QByteArray::append(char) in a loop without a prior QByteArray::reserve.
I have created a small guide to Qt Performance. I will keep it updated and would like to hear more small hints that can be used to improve things. If it makes sense I can migrate it to the techbase as well.
Tuesday, November 03, 2009
Painting on ARM
I'm currently working on making QtWebKit faster on ARM (and hopefully later on MIPS hardware), and in my current sprint I'm focused on the painting speed. Thanks to Samuel Rødal my work is easier than before. He added a new paintengine and graphicssystem that allows tracing the painting done with QPainter and replaying it later. Some of you might be reminded of Carl Worth's post that mostly did the same for cairo.
How to make painting faster? The Setup
- Record a paint trace of your favorite app with tst_cycler -graphicssystem trace, do the rendering and on exit the trace will be generated
- Use qttracereplay to replay the trace on your hardware (I had some issues on my target hardware though)
- Use OProfile to look where the time is spent and do something about it...
- Change the code, then go back to qttracereplay...
What did I do so far?
Most samples are recorded in the comp_func_SourceOver routine. After some searching in the MMX optimized routines and talking to the rasterman, I'm doing the following to speed up the const_alpha=255 path. In qttracereplay I go from about 17.4 fps to around 26 fps on my beagleboard with Qt Embedded Linux on the plain OMAP3 fb, but I still need to do a more careful visual inspection of the result.
- Special-case alpha=0x00 in the source by not doing anything
- Special-case alpha=0xff in the source by simply copying it to the dest
- Unroll the above block eight times, interleaved with preloads...
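The first two special cases can be sketched in plain C++ as below. This is not the Qt routine: it assumes premultiplied ARGB32 pixels and const_alpha=255, the function names are made up, the packed multiply is written in the spirit of BYTE_MUL and not copied from the Qt source, and the unrolling and preloads are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Multiply all four 8 bit channels of x by a/255 (a in 0..255),
// handling two channels per 32 bit multiplication.
static inline uint32_t byte_mul(uint32_t x, uint32_t a)
{
    uint32_t t = (x & 0xff00ff) * a;                     // blue and red
    t = (t + ((t >> 8) & 0xff00ff) + 0x800080) >> 8;     // rounded /255
    t &= 0xff00ff;
    x = ((x >> 8) & 0xff00ff) * a;                       // green and alpha
    x = x + ((x >> 8) & 0xff00ff) + 0x800080;
    x &= 0xff00ff00;
    return x | t;
}

// SourceOver for premultiplied pixels: dest = src + dest * (1 - src_alpha).
void source_over(uint32_t *dest, const uint32_t *src, size_t length)
{
    for (size_t i = 0; i < length; ++i) {
        const uint32_t s = src[i];
        const uint32_t alpha = s >> 24;
        if (alpha == 0x00)        // fully transparent: dest stays as it is
            continue;
        if (alpha == 0xff) {      // fully opaque: plain copy, no multiply
            dest[i] = s;
            continue;
        }
        dest[i] = s + byte_mul(dest[i], 255 - alpha);
    }
}
```

Web content tends to have many fully opaque or fully transparent pixels, so the two early exits skip the multiply for a large share of the spans.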
I will have to clean all this up and merge it with the Symbian optimized copies (which sometimes require armv6 or later)... I will probably look at BYTE_MUL now and see if I can make it faster without using an instruction that needs armv6 or later... or, honestly, first understand how the current BYTE_MUL works...