Sunday, November 29, 2015

osmo-pcu and a case for Free Software

Last year Jacob and I worked on the osmo-sgsn of OpenBSC. We improved the stability and reliability of the system and moved it to the next level. By adding the GSUP interface we were able to connect it to our commercial grade Smalltalk MAP stack and use it in a real-world production GSM network. While working on and manually testing this stack we did not use our osmo-pcu software but a proprietary IP based BTS; after all, we didn't want to debug PCU issues at that point.

This year Jacob has taken over as maintainer of osmo-pcu. He started with a fix for a frequent crash (which was introduced due to us understanding the specification on TBF re-use better, but not the code). He has spent hours and hours reading the specification, studied the log output, fixed defect after defect and then moved on to features. We tried the software at this year's Camp and fixed another round of reliability issues.

Some weeks ago I noticed that the proprietary IP based BTS has been moved from the desk onto the shelf. In contrast to the proprietary BTS, issues in Free Software have a real possibility of being resolved. It might take a long time, it might take paying another entity to do it, but in the end your system will run better. Free Software allows you to genuinely own and use the hardware you have bought!

Sunday, June 07, 2015

Cisco probeless monitoring protocol

The Cisco probeless monitoring protocol (pmp) is a proprietary protocol used by the Cisco ITP. This protocol is used to forward M3UA/MTPL3 messages to another server. The data is being sent on port 33500 using UDP.

Previously I used okteta to study the file format and this time I used Pages and HexFiend. To understand the basic structure one needs to start somewhere. The first assumption for a telco protocol is DER or TLV encoded data. In Wireshark one could already see some ASCII strings, and the first step is to search for a Tag (T) and a Length (L) in front of them. I didn't find a tag but the length was there. At the same time the number of octets used to express the length appears to depend on the data that follows. This means the data is certainly not DER encoded. In front of a block of information I found a header that contains the command. The highest bits seem to encode C/R, there is a sequence number and something else I can't decode (it doesn't look like a MAC address and not like a time).
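The search for a length in front of an ASCII string can be automated in a rough way. The following is only a sketch of that heuristic, not the real pmp layout: it assumes a single length octet directly in front of the string, and the function name, the minimum length and the sample bytes are made up for illustration.

```python
def find_length_prefixed_strings(data: bytes, min_len: int = 4):
    """Return (offset_of_length_octet, text) for each printable ASCII run
    whose preceding octet equals the length of the run.

    Assumption: one length octet directly in front of the string. A real
    capture may use a different or variable-width length encoding.
    """
    hits = []
    i = 1
    while i < len(data):
        length = data[i - 1]
        end = i + length
        chunk = data[i:end]
        printable = all(0x20 <= b < 0x7F for b in chunk)
        if length >= min_len and len(chunk) == length and printable:
            hits.append((i - 1, chunk.decode("ascii")))
            i = end + 1  # the octet at 'end' could be the next length
        else:
            i += 1
    return hits


# Hypothetical sample bytes, not taken from a real capture.
sample = bytes([0x01, 0x80, 0x05]) + b"hello" + bytes([0x00, 0x07]) + b"example" + bytes([0xFF])
for offset, text in find_length_prefixed_strings(sample):
    print(offset, text)
```

A hit only says that an octet happens to match the length of the text behind it, so the candidates still need to be checked by hand in the hex editor.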

I copied the data from HexFiend into an editor document and then used new lines and indentation to illustrate the grouping. This is how I found the "number of messages" field for the data and saw that a message has no size by itself.

After having understood the basic structure I started on a Wireshark dissector, which is around 200 lines of code. It still needs some clean-up, better presentation of the data and checking with fuzzed data packets, and then I can propose it for inclusion in Wireshark.

Tuesday, February 03, 2015

Moving Jenkins jobs from multi-project to project

When I started using Jenkins I set up a configuration of multiple nodes, and most jobs were multi-configuration jobs. As it turns out, looking at the build results for these jobs is annoying and most jobs don't need to be multi-configuration jobs at all.

Jenkins doesn't offer a way to change the type of an existing job, so I decided to edit the config.xml file directly. The first thing I did was to change "matrix-project" to "project". The next, optional, thing is to remove the "axes" nodes, and the last and very important bit is to remove the "executionStrategy" node. If the last step is not done the job will not be parsable and will vanish from the jobs list. After making a configuration change I used the reload configuration option to get an immediate result.
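The three manual steps can be scripted. The following is a minimal sketch using the Python standard library; the function name and the sample document are illustrative, and it assumes that "axes" and "executionStrategy" are direct children of the root element. Any attributes on the root element are left untouched, and the result should be compared against a backup before it replaces a real config.xml.

```python
import xml.etree.ElementTree as ET


def matrix_to_project(config_xml: str) -> str:
    """Turn a matrix-project config.xml document into a plain project one."""
    root = ET.fromstring(config_xml)
    if root.tag != "matrix-project":
        raise ValueError("not a matrix-project configuration")
    root.tag = "project"  # step 1: rename the root element
    # step 2 (optional) and step 3 (required): drop the matrix-only nodes
    for name in ("axes", "executionStrategy"):
        for node in root.findall(name):
            root.remove(node)
    return ET.tostring(root, encoding="unicode")


# Illustrative input only; a real config.xml has many more elements.
sample = (
    "<matrix-project>"
    "<description>demo</description>"
    "<axes><hudson.matrix.LabelAxis><name>label</name></hudson.matrix.LabelAxis></axes>"
    '<executionStrategy class="hudson.matrix.DefaultMatrixExecutionStrategyImpl"/>'
    "<builders/>"
    "</matrix-project>"
)
print(matrix_to_project(sample))
```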

Thursday, December 25, 2014

The state of mobile telecommunication protocol design and the way ahead

I have been implementing various ETSI/3GPP specifications for more than a decade. At GMIT we provided implementation feedback for DVB-H and OMA BCAST. With the Osmocom project and Sysmocom I have several years of implementing GSM (and UTRAN) specifications under my belt.

In general GSM is a great engineering project. It led to the creation of the ETSI, they adopted the English language for their specifications and they applied the information hiding principle. The Groupe Spécial Mobile managed to create well described components that communicate through fully specified interfaces. Somebody implementing a SIM card does not need to know about a VLR. Somebody implementing a VLR does not need to know about the AuC. Somebody implementing a BTS doesn't need to know about the MSC.

This summer and in December severe privacy issues in ETSI/3GPP MAP have been revealed. At the 31C3 there will be two in-depth talks about different aspects of them. Some of the issues have given us a nice laugh, some the OMG feeling and some gave us pity, but I am a software engineer, so the question is: how did we end up in this situation? ETSI/3GPP MAP was designed at a time when SS7 was a walled garden and when, as far as telephony was concerned, one nation could trust another. This trust model fell apart with the liberalization of the telephony industry but the specification was not updated. ETSI/3GPP MAP went through several phases and had some really bad design choices that have thankfully (or thanks to capitalism) been phased out of the network. All of the issues found and disclosed appear to be bugs in the specification and not in a specific implementation. ETSI/3GPP MAP is an old protocol, and while one could improve the specification to fix the protocol it is unlikely that new implementations would be rolled out.

But there is hope, there is a new protocol. The protocol was designed in a world where IP was well understood. We had big security issues, viruses, worms, targeted attacks. The basic trust model had changed to a world where one needs to protect oneself. The protocol is called DIAMETER and will power true 4G networks. It is based on the well known RADIUS protocol that many people may know from eduroam.

So do we just need to wait until most networks deploy 4G and we will be more secure against privacy disclosure and other attacks? 3GPP has even created interworking between MAP and DIAMETER. This is done by mapping one or more MAP operations to calls to DIAMETER. Wait, what? How can this be possible? It is possible because the fundamental design and trust model are the same. If you tell an HLR/HSS that a subscriber is in your network then it will believe it. While RADIUS verified credentials in the home server, in DIAMETER this is not part of what an HSS has to do. This means a subscriber can still be hijacked.

My understanding of DIAMETER is still very limited but it is quite clear that the protocol and mindset are bug-compatible and only the encoding and number of messages have changed.

So where does it leave us? And what should we do?
  • We should have the "public" attend 3GPP meetings to push for better specifications.
  • We should try to stop the DIAMETER roll-out before we end up with another insecure legacy system long before the old one has been phased out.
  • We need to push for mobile stacks that only implement L1 and have Free Software that implements the protocol (who wants to have a SIP, UDP and IPsec stack in a proprietary baseband processor?)
  • We need strong End-to-End encryption. But for that we need to be able to control more of the telephony part. Make it possible to run SIP over TLS servers that can interconnect with each other. Make sure that your Network Operator just gives you IP and you handle your telephony yourself.
  • We need to push for protocol design that limits and reduces the overhead around the IP header.
  • We need interest groups and funding that make that possible.

Saturday, September 06, 2014

Speeding up my SIP/MGCP Smalltalk Parsers

When creating the MGCP and SIP implementation I didn't want to do string splitting/scanning myself but follow the grammar of the two RFCs. I decided to use the PetitParser framework. PetitParser is a parser combinator framework, which mostly means you create small parsers and combine them with things like a sequence (parse this, then that and expect the input to be fully consumed).

For a long time this approach was just okay but I recently started to use the code on an underpowered ARM system and the performance started to hurt. The MGCP/SIP Parser is created at runtime and then the result will be used.


Besides things being too slow it is important to know how slow it is and to see if a change has made a difference or not. GNU Smalltalk has a Time class that provides a monotonic nanosecond clock. The easiest way to benchmark is to use an approach like:

 [ | start | start := Time nanosecondClock. benchmarkCode. Time nanosecondClock - start] value.

This will give me a number. Now with a language like Smalltalk there will be extra runtime code so there will be a noticeable variance.
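One common way to cope with that variance is to repeat the measurement and look at the best and the median run instead of a single number. The following is a sketch of this idea in Python rather than Smalltalk; the function name, the repeat count and the measured workload are arbitrary placeholders.

```python
import statistics
import time


def benchmark(func, repeats: int = 20):
    """Run func several times and return (best, median) in nanoseconds."""
    samples = []
    for _ in range(repeats):
        start = time.monotonic_ns()  # monotonic clock, like the one used above
        func()
        samples.append(time.monotonic_ns() - start)
    return min(samples), statistics.median(samples)


# Placeholder workload; in practice this would be the parser call to measure.
best, median = benchmark(lambda: sum(range(10_000)))
print("best:", best, "ns, median:", median, "ns")
```

The best run is the one least disturbed by other activity on the machine, while the gap between best and median gives a feeling for how noisy the measurement is.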

Improving the PetitParser GNU Smalltalk port

PetitParser needs to use backtracking to go back in the stream when one element of a choice could not be parsed. This means PetitParser will heavily exercise >>#position, >>#position: and >>#atEnd. In Pharo the implementation has a position variable but in GNU Smalltalk the implementation has a pointer that points to the next character. When a method is called in Smalltalk an activation record (stack frame, context) needs to be created. GNU Smalltalk has the optimization to detect methods that simply return a value or instance variable. In the case of >>#position and >>#position: an arithmetic operation will be executed and can not be optimized. I have decided to change the PetitParser code to use >>#pointer and >>#pointer: to avoid the extra activation cost.
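To illustrate why the position accessors are so hot, here is a small sketch of a backtracking choice in Python. It is not PetitParser code: the Stream class, the FAIL marker and the literal and choice helpers are made up for this example. Every alternative that fails has to read the position first and write it back afterwards.

```python
FAIL = object()  # marker for "this parser did not match"


class Stream:
    def __init__(self, text):
        self.text = text
        self.position = 0  # plain attribute, so no method call is needed to read or set it

    def at_end(self):
        return self.position >= len(self.text)

    def next(self):
        ch = self.text[self.position]
        self.position += 1
        return ch


def literal(word):
    def parse(stream):
        start = stream.position
        for expected in word:
            if stream.at_end() or stream.next() != expected:
                stream.position = start  # backtrack on mismatch
                return FAIL
        return word
    return parse


def choice(*parsers):
    def parse(stream):
        start = stream.position
        for parser in parsers:
            result = parser(stream)
            if result is not FAIL:
                return result
            stream.position = start  # rewind before trying the next alternative
        return FAIL
    return parse


method = choice(literal("INVITE"), literal("INFO"))
stream = Stream("INFO sip:bob@example.com SIP/2.0")
result = method(stream)
print(result, stream.position)
```

Here "INVITE" matches the first two characters, fails on the third and the stream is rewound before "INFO" is tried, so even this tiny input touches the position several times.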

Local PetitParser hacks

In my older port of PetitParser.209 the PPRepeatingParser class will store the parsed elements in an OrderedCollection and at the end convert it to an Array. My client code will simply iterate over the result and I don't need to pay the price of the extra memory allocation and copying.

With Smalltalk and open classes I can simply load my own/improved/reduced version of >>#parseOn: and benefit from the speed gain in my implementation.

Speeding up parser construction

In PetitParser one defines the parser in selectors, and on creation everything reachable from >>#start will be turned into the parser. In case some basic parsers are re-used one can create instance variables and the PetitParser base class will then cache the result in that variable. In the past I hadn't used this mode much, but I had a lot of parts that are used more than once. Using it was a significant speed-up in parser construction. At least in PetitParser.209 (I think later versions had some improvements) not every cached parser makes things faster. It is good to benchmark both the number of created parsers and the construction speed.

Simplifying the Grammar

In both MGCP and SIP I have a SIPGrammar/MGCPGrammar class and then a subclass called SIPParser/MGCPParser that creates a structure my client code can work with. When creating the SIPGrammar/MGCPGrammar I followed the RFC BNF but e.g. for matching the possible parameters I had to create a choice parser with many many choices but all of them are in the form of "Key: Value". So instead of checking the valid keys with the Grammar, I should check the structure in the grammar and do the key validation inside the parser class.

Using PetitParser instead of following the Grammar

In the case of SIP there are a lot of rules that specify where whitespace can be. This means I created PPSequenceParsers where some elements consume the space that nobody cares about. The PPParser class already knows the >>#trim selector to deal with such things. It will automatically take away leading and trailing whitespace.

In SIP (e.g. with the challenge BNF) there is one mandatory parameter followed by optional elements separated by commas. PetitParser has a built-in way to express such things. The >>#separatedBy: selector will take a parser (e.g. one that can parse a comma) as parameter and then parse one or more occurrences.
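The idea behind trimming and separated lists can be shown with a few lines of Python. This is a sketch of the two combinators and not the PetitParser implementation: the helper names, the parameter pattern and the sample input are assumptions, and the sketch returns only the elements and drops the separators.

```python
import re


def token(pattern):
    """Parser for a regular expression; returns (text, new_pos) or None."""
    regex = re.compile(pattern)
    def parse(text, pos):
        m = regex.match(text, pos)
        return (m.group(0), m.end()) if m else None
    return parse


def trim(parser):
    """Wrap a parser so that leading and trailing blanks are skipped."""
    def parse(text, pos):
        while pos < len(text) and text[pos] in " \t":
            pos += 1
        result = parser(text, pos)
        if result is None:
            return None
        value, pos = result
        while pos < len(text) and text[pos] in " \t":
            pos += 1
        return value, pos
    return parse


def separated_by(element, separator):
    """Parse one or more elements with a separator between them."""
    def parse(text, pos):
        first = element(text, pos)
        if first is None:
            return None
        values, pos = [first[0]], first[1]
        while True:
            sep = separator(text, pos)
            if sep is None:
                break
            following = element(text, sep[1])
            if following is None:
                break  # leave a dangling separator unconsumed
            values.append(following[0])
            pos = following[1]
        return values, pos
    return parse


# Simplified key=value pattern, not the full SIP parameter grammar.
param = trim(token(r"[A-Za-z]+=[^,\s]+"))
params = separated_by(param, token(","))
text = "realm=example , nonce=abc123,algorithm=MD5"
result = params(text, 0)
print(result)
```

With these two helpers the grammar no longer needs separate sequence elements whose only job is to eat whitespace around the commas.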

Using PPObjectPredicateParser

In PetitParser one can write #digit asParser / #word asParser and this will either parse a single digit or a single character. In both cases a PPObjectPredicateParser with a PPCharSetPredicate (with an internal look-up table) will be created. I had many such occurrences and could simplify the code.

Creating a custom parser

In SIP some parameters can be of the nature of key=value or key="VALUE". The latter is a quoted-string that permits certain characters and certain escape sequences. The rules to parse this were nested character parsers of choices of choices. The resulting parser was very slow. Parsing a simple string like a nonce with a quoted-string could take 40ms. I decided to write my own SIPQuotedStringParser.
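A hand-written scanner for such a value is short. The following Python sketch shows the general shape and is not the SIPQuotedStringParser itself: the function name is made up, and it only handles the surrounding quotes and backslash escapes without checking which characters the RFC permits.

```python
def parse_quoted_string(text, pos=0):
    """Scan a quoted string starting at pos.

    Returns (unescaped_value, next_pos) or None when there is no opening
    quote or the string is not terminated.
    """
    if pos >= len(text) or text[pos] != '"':
        return None
    out = []
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            return "".join(out), i + 1  # position after the closing quote
        if ch == "\\":
            i += 1  # take the escaped character literally
            if i >= len(text):
                return None
            ch = text[i]
        out.append(ch)
        i += 1
    return None  # ran off the end without a closing quote


header = 'nonce="abc\\"def", realm="x"'
print(parse_quoted_string(header, len("nonce=")))
```

A single loop over the characters replaces the nested choices, which is where the time was going.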


The MGCPParser construction time was in the range of 20 seconds; after the changes we are in the ballpark of a second or so, and the situation was similar for the SIPParser. A testcase to parse a 401 SIP message went from 200ms to 70ms. This is on a slow ARM with the plain interpreter and there should still be some room for improvements.


I think PetitParser could have some further optimizations. Instead of the PPCharSetPredicate and the PPObjectPredicateParser there could be a PPCharSetPredicateParser that avoids the call to >>#value:. And instead of creating two PPCharSetPredicateParsers and joining them with a PPChoiceParser one could simply join the two look-up tables. This would optimize #digit asParser / #blank asParser and save one LookupTable. The next thing would be to create a sparse PPCharSetPredicate when one knows that only a subset will be filled.
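Joining two look-up tables is a simple element-wise OR. The following Python sketch shows the idea with 256-entry boolean tables; the table size and the helper names are assumptions for illustration and do not mirror the PetitParser classes.

```python
def make_table(predicate):
    """Build a 256-entry look-up table from a character predicate."""
    return [predicate(chr(code)) for code in range(256)]


def join_tables(left, right):
    """One table that accepts whatever either input table accepts."""
    return [a or b for a, b in zip(left, right)]


def matches(table, ch):
    code = ord(ch)
    return code < len(table) and table[code]


digit = make_table(lambda ch: "0" <= ch <= "9")
blank = make_table(lambda ch: ch in " \t")
digit_or_blank = join_tables(digit, blank)

print(matches(digit_or_blank, "7"), matches(digit_or_blank, " "), matches(digit_or_blank, "x"))
```

A choice between two character classes then costs one table access instead of trying the first parser, failing and trying the second.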

In terms of GNU Smalltalk we need to port to GNU lightning 2.0 and we would gain JIT support for ARM and can take it from there as well. Another option would be to start having ByteCode to ByteCode optimizations like inlining often called methods (with and without OSR).

Last but not least we could use MrGwen's GNU lightning bindings and JIT a parser from the PetitParser representation.

Tuesday, June 17, 2014

Playing with QV4

QtDeclarative is where the fun is. Starting with Qt5.2 a JavaScript Engine written by Digia is used. Compared to JavaScriptCore and V8 this engine is very basic but tightly integrated with the QML and Quick code. Motivated by attending the Qt Developer Summit and my work on GNU Smalltalk I started to look at the VM.

There are some environment variables that help to see what the JavaScript VM is doing and also how it is doing things. The table below gives a quick overview of the available flags and what they do. Especially the usage of QV4_SHOW_IR and QV4_SHOW_ASM helps to understand what is going on.

Name                    Function
----------------------  ------------------------------------------------------------
QV4_NO_SSA              Do not convert the IR::Function to SSA representation. This disables optimizations as well.
QV4_NO_OPT              Do not run the optimizer. This disables dead-code-elimination, constant propagation, copy propagation.
QV4_SHOW_IR             Show the Intermediate Representation at the various stages of the compilation/optimization.
QV4_SHOW_ASM            Show the disassembled code. This requires QtDeclarative to be compiled with CONFIG+=disassembler and without PCH.
QV4_NO_REGALLOC         Do not use the linear register allocator.
QV4_FORCE_INTERPRETER   Do not use the JIT but force the interpreter.
QV4_MM_AGGRESSIVE_GC    Run the GC on every allocation.
QV4_MM_STATS            Print the time it took to mark and sweep.
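The flags are set in the environment of the process that loads the QML code. The following Python sketch prepares such an environment for a launcher script. Setting each flag to "1" is an assumption (the engine may only check whether the variable is present), and qmlscene with a main.qml file is just a hypothetical example command.

```python
import os


def qv4_debug_env(*flags, base=None):
    """Return a copy of the environment with the given QV4 flags set."""
    env = dict(os.environ if base is None else base)
    for flag in flags:
        env[flag] = "1"  # assumption: any value switches the flag on
    return env


env = qv4_debug_env("QV4_SHOW_IR", "QV4_FORCE_INTERPRETER")
command = ["qmlscene", "main.qml"]  # hypothetical command line
# To actually start it: import subprocess; subprocess.run(command, env=env)
print(sorted(name for name in env if name.startswith("QV4_")))
```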

Thursday, March 27, 2014

Long Time no See (好久不見)

Almost a year ago I took my Qt/DirectFB Jenkins down. With that, CI for MIPS/uclibc and DirectFB stopped (as far as I know). It was a difficult decision but it reflected my situation. I hadn't done a Qt project for a long time and didn't have any Qt related work on the horizon.

My main work involves using Smalltalk (GNU Smalltalk and Pharo), writing a lot of C code for the various Osmocom/GSM related projects and the joy of integrating software to build HW products. I didn't think I would use/write/develop using Qt anytime soon. For sentimental reasons I stayed on the qt-devel mailing list (I still read it day to day) and followed the work done by Lars, Simon and all the others with great joy.

For an internal project of sysmocom I needed to parse, generate and stream (through HTTP) JSON content. I built a first prototype using GNU Smalltalk and the Iliad Framework. The system was quickly built and we were able to gain some experience with it. Given the amount of data we intended to pipe through the system we wanted to move to a fully compiled version (instead of spending the time tuning the VM). I decided to use the JSON support of QtCore, QtNetwork for the HTTP client and planned to use libsoup as the HTTP server. After some research I stumbled across Tufao, used that one instead and don't regret it. Thanks to QThread and queued signals and slots I was able to make use of more than one CPU core and the application was fun to develop.

I am a Unix Dinosaur so for the buildsystem I briefly considered using qmake but then ended up using autotools. Thanks to the autotroll m4 macros building Qt applications is not that bad. I decided against using cmake as it still feels backward (e.g. no config.log, most scripts don't use pkg-config but some hand rolled magic like the FindBoost thing).

We are building Debian packages using an internal OBS appliance and this way can easily update/deploy our software. After deploying, the application/thread would crash every couple of hours. The backtrace pointed to QNAM, deleteLater and QObject. David Faure had documented and fixed various race conditions and after we upgraded our application to use Qt5.2 the crashes stopped occurring. The application was last re-started on the 5th of February and works quite reliably.

This month we started a simple REST server using Qt and Tufao. Maybe I should search for a nice Qt consumer related project too? Qt is definitely here to stay.