This is the idea of how to test it and manage its quality. The main strategy is to track differences and notice early when things break, before a user has a chance to notice. The most common issues are missing sources, failed configure runs, failed builds, and wrongly packaged files.
- Test whether this application/image, in this configuration and with that toolchain, compiles and links properly. We can track binary size differences and build times and monitor how they change.
- Test the packaging. Check whether files end up in a package where they shouldn't be; this applies to .so files and .debug directories. Also check whether .la and .pc files point to the right directories, and whether the architecture of each file is compatible with the TARGET.
- Test -native packages after we have compiled them and before we stage them: run the test suites of these applications and refuse to continue when they are broken. This is tricky, as there are sometimes known issues, e.g. with gcc, where a lot of tests fail but are known to fail. So we should only fail on a genuinely unexpected failure.
- The really nice thing to test is whether the application actually works. It may not work because of a known software issue, because we have done something wrong, or because the application makes bad assumptions about the target architecture. We need test cases for these things. Many of the GNU utilities come with tests; these test cases must be packaged so that they can execute on a phone using the installed software. This is tricky as well: even if we manage to get the test case to execute on the target platform, we still need to get the results back. With DogTails
- Track test coverage of the software we build. Always report how many lines are untested.
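The size and build-time tracking from the first item could look roughly like the following minimal sketch, written in Python since BitBake itself is Python. The `BuildRecord` type, the `find_regressions` function, and the tolerance values are hypothetical; collecting the numbers from a real build is left out.

```python
from dataclasses import dataclass


@dataclass
class BuildRecord:
    """Hypothetical record of one package build."""
    package: str
    size_bytes: int
    build_seconds: float


def find_regressions(previous, current, size_tolerance=0.05, time_tolerance=0.25):
    """Compare two builds (dicts keyed by package name) and report packages
    whose binary size or build time grew by more than the given fraction."""
    report = []
    for name, new in sorted(current.items()):
        old = previous.get(name)
        if old is None:
            report.append(f"{name}: new package, no baseline")
            continue
        if old.size_bytes and (new.size_bytes - old.size_bytes) / old.size_bytes > size_tolerance:
            report.append(f"{name}: size {old.size_bytes} -> {new.size_bytes} bytes")
        if old.build_seconds and (new.build_seconds - old.build_seconds) / old.build_seconds > time_tolerance:
            report.append(f"{name}: build time {old.build_seconds:.0f}s -> {new.build_seconds:.0f}s")
    # A package that vanished usually means a failed build or packaging step.
    for name in sorted(set(previous) - set(current)):
        report.append(f"{name}: missing from current build")
    return report
```

Comparing against the previous build rather than absolute limits keeps the check generic: it only flags differences, which matches the "track differences" strategy.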
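For the -native test suites, "only fail on unexpected failures" amounts to a set difference between what failed and what is known to fail. The sketch below assumes DejaGnu-style output (as used by the gcc and binutils test suites), where each result is a line such as `FAIL: name`; the known-failure list is something we would have to maintain per package.

```python
import re

# DejaGnu prints one line per result, e.g. "FAIL: gcc.dg/foo.c execution test".
# The ^ anchor keeps expected failures ("XFAIL: ...") from matching.
FAIL_LINE = re.compile(r"^FAIL: (.+)$")


def unexpected_failures(log_text, known_failures):
    """Return failing tests that are not on the known-failure list."""
    failed = set()
    for line in log_text.splitlines():
        match = FAIL_LINE.match(line)
        if match:
            failed.add(match.group(1).strip())
    return sorted(failed - set(known_failures))
```

Staging would then be refused only when the returned list is non-empty.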
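For coverage, one possible source is gcov's annotated output, where each source line is prefixed by an execution count, `-` for non-executable lines, and `#####` (or `=====`) for lines that were never executed. A minimal sketch that turns one such file into the "how many lines are untested" number; `coverage_summary` is a hypothetical helper.

```python
def coverage_summary(gcov_text):
    """Count executable and never-executed lines in one gcov output file."""
    executable = untested = 0
    for line in gcov_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) < 3:
            continue  # branch/call summary lines, separators
        count, lineno = parts[0].strip(), parts[1].strip()
        if not lineno.isdigit() or lineno == "0" or count == "-":
            continue  # file header or non-executable line
        executable += 1
        if count in ("#####", "====="):
            untested += 1
    return executable, untested
```

A report would then print something like `f"{untested} of {executable} lines untested"` for each package.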
The architecture to solve some of these issues consists of patches to our metadata and bbclasses, e.g. the "Cross Compile Badness" check, which looks for some known-wrong include directories when cross compiling and aborts. This way, applications fail when they have -I/usr/include in their INCLUDEPATH, and packagers are made aware of the issue when they create the package and add it to OpenEmbedded. With the insane BitBake class we will also be able to see the following issues: insecure RPATHs in binaries, wrongly packaged files, packages whose ARCH and content differ, and wrong dependencies. Together with BitTest we will be able to see undocumented keys, missing sources, and wrong RDEPENDS for -native packages. I think we are slowly getting to the point where we have the tools to secure the quality of our OpenEmbedded product.
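The idea behind the "Cross Compile Badness" check can be illustrated in a few lines. This is a standalone sketch, not the code in our bbclasses: `cross_compile_badness` and the `HOST_DIRS` list are assumptions, and it only inspects `-I`/`-L` flags on a compiler command line.

```python
import shlex

# Host directories that must never show up when cross compiling (assumed list).
HOST_DIRS = ("/usr/include", "/usr/lib", "/usr/local/include", "/usr/local/lib")


def cross_compile_badness(command_line):
    """Return the -I/-L flags in a compiler command line that point at
    host directories instead of the staging area."""
    bad = []
    tokens = shlex.split(command_line)
    for i, token in enumerate(tokens):
        if token in ("-I", "-L") and i + 1 < len(tokens):
            flag, path = token, tokens[i + 1]  # "-I /usr/include" form
        elif token.startswith(("-I", "-L")) and len(token) > 2:
            flag, path = token[:2], token[2:]  # "-I/usr/include" form
        else:
            continue
        path = path.rstrip("/") or "/"
        if any(path == d or path.startswith(d + "/") for d in HOST_DIRS):
            bad.append(flag + path)
    return bad
```

A build would abort as soon as the returned list is non-empty, which is what makes the packager notice the problem before the recipe is added.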
On the server side we are currently lagging behind. We have the crappy tinderbox3 from the Mozilla Foundation, but I have started writing a new one using Python and Django. This setup will help us relate the bug tracker, IRC, build results, and test results. You will be able to run queries like "Show me everything about the package gimp-foo", see test results and compilations, and filter by week, month, or architecture.
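The kind of query described above can be sketched without Django as a filter over a list of result rows. The `Result` fields and `everything_about` are hypothetical and do not reflect the actual tinderbox model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    """Hypothetical row relating a package to a build, test, bug or IRC event."""
    package: str
    arch: str
    kind: str  # "build", "test", "bug" or "irc"
    day: date
    summary: str


def everything_about(results, package, arch=None, since=None):
    """'Show me everything about the package X', optionally filtered by
    architecture and by a start date (e.g. the start of a week or month)."""
    return [
        r for r in sorted(results, key=lambda r: r.day)
        if r.package == package
        and (arch is None or r.arch == arch)
        and (since is None or r.day >= since)
    ]
```

In a Django version the same filter would map onto a `QuerySet.filter()` call with field lookups such as `day__gte`; the field names here are only illustrative.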
But why is this important? A complete setup will allow us to spot regressions fast and inform the developers immediately, e.g. through RSS feeds or IRC. It will let us see which software works in which configurations, so it will be possible to declare known good configurations. For example, compile servers could make their images available; these get booted regularly and run on QEMU devices for ARM, MIPS, and PowerPC and on other real devices, which send reports back to the system, e.g. including coverage data and memory footprints. So we will always know that the applications compiled and ran. We only need the discipline to add tests after finding issues, and to encourage the projects we compile to create test suites themselves.
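Deriving known good configurations from the reports, and the regressions worth announcing on IRC or RSS, could be as simple as the following sketch. The report tuple layout and both function names are assumptions; the machine and distro names in the usage are only examples.

```python
from collections import defaultdict


def known_good(reports):
    """reports: iterable of (machine, distro, image, compiled, booted) tuples
    sent back by build servers and QEMU or real devices. A configuration is
    known good only if every report for it compiled and booted."""
    status = defaultdict(lambda: True)
    for machine, distro, image, compiled, booted in reports:
        key = (machine, distro, image)
        status[key] = status[key] and compiled and booted
    return sorted(key for key, ok in status.items() if ok)


def new_regressions(previous_good, current_good):
    """Configurations that were good before but are not any more."""
    return sorted(set(previous_good) - set(current_good))
```

For example, `known_good([("qemuarm", "angstrom", "minimal", True, True)])` yields that single configuration, and `new_regressions` against an earlier list yields what to announce.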
Oh, and finally, it is fun to create and set up such a system. The current status is okay as well: we have a Bonsai system which is linked to our bug trackers. Going here, you will be able to do simple queries. If a string like "fix #123" is found, you will be redirected to the bug tracker. This Bonsai will be used to relate build results to changes in the SCM.
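Linking commit messages to bugs boils down to a regular expression. A minimal sketch; the accepted keywords and the tracker URL are assumptions (the URL uses a placeholder domain with a Bugzilla-style path), not Bonsai's actual configuration.

```python
import re

# Matches references such as "fix #123", "fixes #42" or "closed #7".
BUG_REF = re.compile(r"\b(?:fix(?:e[sd])?|close[sd]?)\s+#(\d+)", re.IGNORECASE)

# Hypothetical bug tracker URL; replace with the real one.
BUG_URL = "https://bugs.example.org/show_bug.cgi?id={}"


def bug_links(commit_message):
    """Return tracker URLs for every bug a commit message claims to fix."""
    return [BUG_URL.format(number) for number in BUG_REF.findall(commit_message)]
```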
Imagine we track all the SCMs we build against. Imagine you build the gimp and the memory footprint during the tests increased by two megabytes and you want to track it down. You will be able to see all the changes made to the dependencies of the gimp between this build and the last known one. You should be able to find the source of this issue quickly.
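The first step of that investigation, narrowing the search down to the dependencies that actually changed, can be sketched as a dictionary diff. Here each build is assumed to record a mapping from dependency name to the SCM revision it was built from; `changed_dependencies` is a hypothetical helper.

```python
def changed_dependencies(last_build, this_build):
    """Each build maps dependency name -> SCM revision it was built from.
    Return (name, old_rev, new_rev) for every dependency that changed,
    appeared (old_rev is None) or disappeared (new_rev is None)."""
    changes = []
    for name in sorted(set(last_build) | set(this_build)):
        old, new = last_build.get(name), this_build.get(name)
        if old != new:
            changes.append((name, old, new))
    return changes
```

Each returned revision range can then be handed to the SCM's log to list the individual commits between the two builds.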
For the tinderbox itself I have written a first model which I will try to fill with some data to find out how it performs. Actually I'm really excited about how well these things can be integrated.