Analytics

Saturday, September 06, 2014

Speeding up my SIP/MGCP Smalltalk Parsers

When creating the MGCP and SIP implementation I didn't want to do string splitting/scanning myself but follow the grammar of the two RFCs. I decided to use the PetitParser framework. PetitParser is a parsing
combinator. Which mostly mean you create small parsers and combine them with things like a sequence (parse this, than that and expect the input to be fully consumed).

For a long time this approach was just okay but I recently started to use the code on a under powered ARM system and the performance started to hurt. The MGCP/SIP Parser is created at runtime and then the result will be used.

Measuring

Besides things being too slow it is important to know how slow it is and to see if a change has made a difference or not. GNU Smalltalk has a Time class that provides a monotonic nanosecond clock. The easiest way to benchmark is to use an approach like:

 [ | start | start := Time nanosecondClock. benchmarkCode. start - Time nanosecondClock] value.

This will give me a number. Now with a language like Smalltalk there will be extra runtime code so there will be a noticeable variance.

Improving the PetitParser GNU Smalltalk port

PetitParser needs to use backtracking to go back in the stream when one element of a choice could not be parsed. This means PetitParser will heavily exercise >>#position, >>#position: and >>#atEnd. In Pharo the implementation has a position variable but in GNU Smalltalk the implementation has a pointer that points to the next character. When a method is called in Smalltalk an activation record (stack frame, context) needs to be created. GNU Smalltalk has the optimization to detect methods that simply return a value or instance variable. In the case of >>#position and >>#position: an arithmetic operation will be executed and can not be optimized. I have decided to change the PetitParser code to use >>#pointer and >>#pointer: to avoid the extra activation cost.

Local PetitParser hacks

In my older port of PetitParser.209 the PPRepeatingParser class will store the parsed elements in an OrderedCollection and at the end convert it to an Array. My client code will simply iterate over the
result and I don't need to pay the price of the extra memory collection and copying.

With Smalltalk and open classes I can simply load my own/improved/reduced version of the >>#parseOn:
and benefit for the speed-gain in my implementation.

Speeding up parser construction

In PetitParser one defines the parser in selectors and on creating everything from the >>#start will be turned into the parser. In case some basic parsers are re-used one can create instance variables and the PetitParser baseclass will then cache the result in that variable. In the past I haven't used this mode too much but I had a lot of parts that are used more than once. This was a significant speed-up in parser construction. At least in PetitParser.209 (I think later versions had some improvements) not every caching will make things more quick. It is good to benchmark both number of created parsers and number construction speed.

Simplifying the Grammar

In both MGCP and SIP I have a SIPGrammar/MGCPGrammar class and then a subclass called SIPParser/MGCPParser that creates a structure my client code can work with. When creating the SIPGrammar/MGCPGrammar I followed the RFC BNF but e.g. for matching the possible parameters I had to create a choice parser with many many choices but all of them are in the form of "Key: Value". So instead of checking the valid keys with the Grammar, I should check the structure in the grammar and do the key validation inside the parser class.

Using PetitParser instead of following the Grammar

In the case of SIP there a lot of rules that specify where whitespace can be. This means I created PPSequenceParsers where some elements consume the space that nobody cares about it. The PPParser class already knows the >>#trim selector to deal with such things. It will automatically take away leading and trailing whitespace.

In SIP (e.g. with the challenge BNF) there is one mandatory parameter followed by optional elements separated by comas. PetitParser has a built-in way to express such things. The >>#separatedBy: selector will take a parser (e.g. one that can parse a comma) as parameter and then parse one or more occurrences.

Using PPObjectPredicateParser

In PetitParser one can write #digit asParser / #word asParser and this will either parse a single digit or a single character. In both cases a PPObjectPredicateParser with a PPCharsetPredicate (with an internal look-up table) will be created. I had many of such occurrences and could simplify the code.

Creating a custom parser

In SIP some parameters can be of the nature of key=value or key="VALUE". The later is a quoted-string that permits certain characters and certain escape sequences. The rule to parse this were nested character parsers of choices of choices. The resulting parser was very slow. A simple string to parse a nonce with a quoted-string could take 40ms. I decided to write my own SIPQuotedStringParser.

Summary

The MGCPParser construction time was in the range of 20 seconds, after the change we are in the ballpark of a second or such and the situation was similar for the SIPParser. A testcase to parse a 401 SIP message went from 200ms to 70ms. This is on a slow ARM with the plain interpreter and there should still be some room for improvements.

Outlook

I think PetitParser could have some further optimizations. Instead of the PPCharSetPredicate and the PPObjectPredicateParser there could be a PPCharSetPredicateParser that avoids the call to >>#value:. and one creating two PPCharSetPredicateParsers and joining them with a PPChoiceParser one could simply join the two look-up tables. This would optimize #digit asParser / #blank asParser and save one LookupTable. The next thing would be to create sparse PPCharSetPredicate when one knows that only a subset will be filled.


In terms of GNU Smalltalk we need to port to GNU lightning 2.0 and we would gain JIT support for ARM and can take it from there as well. Another option would be to start having ByteCode to ByteCode optimizations like inlining often called methods (with and without OSR).

Last but not least we could use MrGwen's GNU lightning bindings and JIT a parser from the PetitParser representation.



Tuesday, June 17, 2014

Playing with QV4

QtDeclarative is where the fun is. Starting with Qt5.2 a JavaScript Engine written by Digia is used. Compared to JavaScriptCore and V8 this engine is very basic but tightly integrated with the QML and Quick code. Motivated by attending the Qt Developer Summit and my work on GNU Smalltalk I started to look at the VM.

There are some environment variables that help to see what the JavaScript VM is doing and also how it is doing things. The below table gives a quick overview of the available flags and what they do. Specially the usage of QV4_SHOW_IR and QV4_SHOW_ASM helps to understand what is going on.

Name Function
QV4_NO_SSA Do not convert the IR::Function to SSA representation. This disables optimizations as well.
QV4_NO_OPT Do not run the optimizer. This disables dead-code-elimination, constant propagation, copy propagation
QV4_SHOW_IR Show the Intermediate Representation at the various stages of the compilation/optimization.
QV4_SHOW_ASM Show the disassembled code. This requires QtDeclrative to be compiled with CONFIG+=disassembler and without PCH
QV4_NO_REGALLOC Do not use the linear register allocator
QV4_FORCE_INTERPRETER Do not use the JIT but force the interpreter
QV4_MM_AGGRESSIVE_GC Run the GC on every allocation
QV4_MM_STATS Print the time it took to mark and sweep

Thursday, March 27, 2014

Long Time no See (好久不見)

Almost a year ago I took my Qt/DirectFB Jenkins down. With that CI for MIPS/uclibc and DirectFB stopped being done (as far as I know). It was a difficult decision but it reflected my situation. I didn't do a Qt project for a long time and didn't have any Qt related work on the horizon.

My main work involves using Smalltalk (GNU Smalltalk and Pharo) and writing a lot of a C code for the various Osmocom/GSM related projects and the joy of integrating software to build  HW products. I didn't think I would use/write/develop using Qt anytime soon. For sentimental reasons I stayed on the qt-devel mailinglist (I still read it day to day) and followed the work done by Lars, Simon and all the others with great joy.

For an internal project of sysmocom I needed to parse, generate and stream (through HTTP) JSON content. I built a first prototype using GNU Smalltalk and the Iliad Framework. The system was quickly built and we were able to gain some experience with it. Given the amount of data we intended to pipe through the system we wanted to move to a fully compiled version (instead of spending the time tuning the VM). I decided to use the Json support of QtCore, QtNetwork for the HTTP client and planned to use libsoup as the HTTP Server. After some research I stumbled across Tufao and used that one instead and don't regret it. Thanks to QThread and queued signal and slots I was able to make use of more than one CPU core and the application was fun to develop.

I am a Unix Dinosaur so for the buildsystem I shortly considered using qmake but then ended up using autotools. Thanks to the autotroll m4 macros building Qt applications is not that bad. I decided against using cmake as it still feels backward (e.g. no config.log, most scripts don't use pkg-config but some hand rolled magic like the FindBoost thing).

We are building Debian packages using an internal OBS appliance and this way can easily update/deploy our software thanks to that. After deploying every couple of hours the application/thread would crash. The backtrace pointed to QNAM, deleteLater and QObject.. David Faure had documented and fixed various race conditions and after we upgraded our application to use Qt5.2 the crashes stopped occurring. The application was last re-started on the 5th of February and works quite reliable.

This month we started a simple REST server using Qt and Tufao. Maybe I should search for a nice Qt consumer related project too? Qt is definitely here to stay.

Wednesday, December 18, 2013

Ported DBI and DBD-PostgreSQL from GNU Smalltalk to Pharo

We are building a gsmSCF in Pharo. Normally I would use MongoDB and Voyage to store my objects but as a gsmSCF deals with the balance of users I wanted to use something that supports transactions.

Looking at the SQL support in Pharo. I found SqueakDBX, DBXTalk, native Postgres drivers but all of them lacked prepared statement support and one really doesn't want to format queries by hand.

This is when I decided to port GNU Smalltalks DBI framework and the DBD-PostgreSQL driver. It is not the greatest database framework (QtSql is really close to that) but it does support prepared statements and I have running systems that use this framework.

I have used the gst-convert utility to convert the syntax from GST to FileOut syntax as understood by Pharo and converted the use of DateTime to DateAndTime, Dictionary>>#from: to Dictionary>>#newFrom: and some other incompatibilities. I have fixed some other incompatibilities by hand, e.g. >>#subString: aChar changed to >>#subString: aString and >>#nl to >>#cr. This gave me a port of ROE, DBI and DBD-PostgreSQL relatively quickly. I had some issues with Monticello not properly detecting certain extensions and I had to re-save the methodSource to fix that.

I spent the most time porting the DBD-PostgreSQL FFI code from GNU Smalltalk to NativeBoost of Pharo. The easy part could be converted easily. The literal array in Pharo is interesting and the usage in the >>#nbCallOut: is quite funny. The most difficult part was to create a char** for one call-out and type conversation. I had a de-tour using the NBExternalArrayType and then went to use NativeBoost class>>#allocate: and >>#asNBExternalString. It didn't help that when NativeBoost can't convert a parameter and there will be use-less error message. This is specially annoying when a function takes like 6 arguments or more. Anyway, I had some success yesterday night.

In GNU Smalltalk we are using namespaces so we have DBI.Connection as the entry point. In Pharo this is Connection right now but I should call it DBIConnection in the future. A small example is below and with simple tests it looks like there are no memory leaks in the code. I need to make the code work reliable across image save/restore.

 | connection statement result |
connection := Connection connect: 'dbi:PostgreSQL:dbname=aDb' user: nil password: nil.
statement := connection prepare: 'SELECT * from table WHERE col > $1 AND type = $3'.
result := statement executeWithAll: #(1 'abc').
[result atEnd] whileFalse: [
   | row |
   row := result next.
   row at: 'columnName'.
].

Tuesday, June 25, 2013

Using GNU autotest for running unit tests

This is part of a series of blog posts about testing inside the OpenBSC/Osmocom project. In this post I am focusing on our usage of GNU autotest.

The GNU autoconf ships with a not well known piece of software. It is called GNU autotest and we will focus about it in this blog post.

GNU autotest is a very simple framework/test runner. One needs to define a testsuite and this testsuite will launch test applications and record the exit code, stdout and stderr of the test application. It can diff the output with expected one and fail if it is not matching. Like any of the GNU autotools a log file is kept about the execution of each test. This tool can be nicely integrated with automake's make check and make distcheck. This will execute the testsuite and in case of a test failure fail the build.

The way we use it is also quite simple as well. We create a simple application inside the test/testname directory and most of the time just capture the output on stdout. Currently no unit-testing framework is used, instead a simple application is built that is mostly using OSMO_ASSERT to assert the expectations. In case of a failure the application will abort and print a backtrace. This means that in case of a failure the stdout will not not be as expected and the exit code will be wrong as well and the testcase will be marked as FAILED.

The following will go through the details of enabling autotest in a project.

Enabling GNU autotest

The configure.ac file needs to get a line like this: AC_CONFIG_TESTDIR(tests). It needs to be put after the AC_INIT and AM_INIT_AUTOMAKE directives and make sure AC_OUTPUT lists tests/atlocal


Integrating with the automake

The next thing is to define a testsuite inside the tests/Makefile.am. This is some boilerplate code that creates the testsuite and makes sure it is invoked as part of the build process.


 # The `:;' works around a Bash 3.2 bug when the output is not writeable.  
 $(srcdir)/package.m4: $(top_srcdir)/configure.ac  
  :;{ \  
         echo '# Signature of the current package.' && \  
         echo 'm4_define([AT_PACKAGE_NAME],' && \  
         echo ' [$(PACKAGE_NAME)])' &&; \  
         echo 'm4_define([AT_PACKAGE_TARNAME],' && \  
         echo ' [$(PACKAGE_TARNAME)])' && \  
         echo 'm4_define([AT_PACKAGE_VERSION],' && \  
         echo ' [$(PACKAGE_VERSION)])' && \  
         echo 'm4_define([AT_PACKAGE_STRING],' && \  
         echo ' [$(PACKAGE_STRING)])' && \  
         echo 'm4_define([AT_PACKAGE_BUGREPORT],' && \  
         echo ' [$(PACKAGE_BUGREPORT)])'; \  
         echo 'm4_define([AT_PACKAGE_URL],' && \  
         echo ' [$(PACKAGE_URL)])'; \  
        } &>'$(srcdir)/package.m4'  
 EXTRA_DIST = testsuite.at $(srcdir)/package.m4 $(TESTSUITE)  
 TESTSUITE = $(srcdir)/testsuite  
 DISTCLEANFILES = atconfig  
 check-local: atconfig $(TESTSUITE)  
  $(SHELL) '$(TESTSUITE)' $(TESTSUITEFLAGS)  
 installcheck-local: atconfig $(TESTSUITE)  
  $(SHELL) '$(TESTSUITE)' AUTOTEST_PATH='$(bindir)' \  
  $(TESTSUITEFLAGS)  
 clean-local:  
  test ! -f '$(TESTSUITE)' || \  
  $(SHELL) '$(TESTSUITE)' --clean  
 AUTOM4TE = $(SHELL) $(top_srcdir)/missing --run autom4te  
 AUTOTEST = $(AUTOM4TE) --language=autotest  
 $(TESTSUITE): $(srcdir)/testsuite.at $(srcdir)/package.m4  
  $(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at  
  mv $@.tmp $@  


Defining a testsuite

The next part is to define which tests will be executed. One needs to create a testsuite.at file with content like the one below:


 AT_INIT  
 AT_BANNER([Regression tests.])  
 AT_SETUP([gsm0408])  
 AT_KEYWORDS([gsm0408])  
 cat $abs_srcdir/gsm0408/gsm0408_test.ok > expout  
 AT_CHECK([$abs_top_builddir/tests/gsm0408/gsm0408_test], [], [expout], [ignore])  
 AT_CLEANUP  

This will initialize the testsuite, create a banner. The lines between AT_SETUP and AT_CLEANUP represent one testcase. In there we are copying the expected output from the source directory into a file called expout and then inside the AT_CHECK directive we specify what to execute and what to do with the output.

Executing a testsuite and dealing with failure

The testsuite will be automatically executed as part of make check and make distcheck. It can also be manually executed by entering the test directory and executing the following.


 $ make testsuite  
 make: `testsuite' is up to date.  
 $ ./testsuite  
 ## ---------------------------------- ##  
 ## openbsc 0.13.0.60-1249 test suite. ##  
 ## ---------------------------------- ##  
 Regression tests.  
  1: gsm0408                     ok  
  2: db                       ok  
  3: channel                     ok  
  4: mgcp                      ok  
  5: gprs                      ok  
  6: bsc-nat                     ok  
  7: bsc-nat-trie                  ok  
  8: si                       ok  
  9: abis                      ok  
 ## ------------- ##  
 ## Test results. ##  
 ## ------------- ##  
 All 9 tests were successful.  

In case of a failure the following information will be printed and can be inspected to understand why things went wrong.


  ...  
  2: db                       FAILED (testsuite.at:13)  
 ...  
 ## ------------- ##  
 ## Test results. ##  
 ## ------------- ##  
 ERROR: All 9 tests were run,  
 1 failed unexpectedly.  
 ## -------------------------- ##  
 ## testsuite.log was created. ##  
 ## -------------------------- ##  
 Please send `tests/testsuite.log' and all information you think might help:  
   To: 
   Subject: [openbsc 0.13.0.60-1249] testsuite: 2 failed  
 You may investigate any problem if you feel able to do so, in which  
 case the test suite provides a good starting point. Its output may  
 be found below `tests/testsuite.dir'.  

You can go to tests/testsuite.dir and have a look at the failing tests. For each failing test there will be one directory that contains a log file about the run and the output of the application. We are using GNU autotest in libosmocore, libosmo-abis, libosmo-sccp, OpenBSC, osmo-bts and cellmgr_ng.

Friday, May 31, 2013

Don't use Hubstaff, if not for the ethical issues, for the security issues

I was recently pointed to a software called Hubstaff. It is meant for virtual companies that do not trust their employees and want to see what their staff is doing. Two central features are to measure the activity and to regularly send screenshots.

Hubstaff does not respect copyright! Their software is installing various GNU GPL and GNU LGPL licensed libraries without respecting the license of these libraries. The sourcecode was not offered at the same place and I didn't see a written offer.

Hubstaff is using HTTP (not encrypted) for all the traffic! The application is sending the login and password in clear text. If you use Hubstaff at the airport, at a local coffee place, in a shared office, everyone can see your password and take over your account. The activities, notes and screenshots are transferred in clear as well. This means that everybody can look at potential confidential information sent from your employee to the Hubstaff server.

To make it worse, it appears trivial that your employees will send you wrong activity information and screenshots. In general this is a game your employees will win but Hubstaff gives a huge head start to your employees.

In short Hubstaff does not respect copyright law, they don't value your data and they have not the slightest clue about security/privacy. No sane business will trust them.


Update: Hubstaff claims that starting from version 0.8.0 SSL is used for the connection to their server, they also claim that their GPL violations have been addressed. I have not verified those claims.

Monday, May 06, 2013

Spring cleaning of the qt-jenkins.moiji-mobile.com

About a week ago I asked for help/support on the Continuous Integration of Qt for the combination of Linux/MIPS/UCLIBC/DirectFB and due the lack of feedback I have removed the DNS entry and cleaned up my system.

This means that currently there is no (public) build testing for any of DirectFB, MIPS, UCLIBC and the bitrot will increase over time. I have asked on the mailinglist about the removal of the Broadcom 97425 device support as it likely to be unused now.