rsc's Diary: ELC-E 2019 - Day 2
The 2nd day at ELC-E started again with lots of keynotes, so I took the time to meet people at the sponsor showcase area. It's impressive to see that more and more of our industry customers come to these community conferences as well! A better integration of corporate developers with the Embedded Linux community is definitely a good move.
We Need to Talk about systemd
My first talk of the day was Chris Simmonds talking about boot time optimization for systemd. Chris has talked about boot time before, but back then he focused on the bootloader and the kernel. This time he had a look at systemd ("it is not just an init daemon - it is a way of life") and started by explaining how systemd works, what its units, services and targets are and how to interact with them.
Boot time optimization means minimizing the time from power-on to the critical application. You basically do that by making the configuration of a generic system less generic: leave out tasks that you don't need and change the order of the remaining ones. A useful tool to analyze what's going on is systemd-analyze, which can show how long each unit took to start and print a tree of the time-critical chain from startup to the final target.
The example system he optimized was a PocketBeagle running Debian Stretch: unoptimized, the system took 1m6.597s to boot up. Using "systemd-analyze critical-chain" he found out that one of the biggest delays came from a service waiting for a non-existent serial device, which took 47s. Changing from the graphical to the multi-user target and removing other services that didn't make sense brought the time down to 29.255s, which was already a reduction of about 37s.
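A quick way to check the boot times quoted above is to convert them to seconds and subtract. The following is only a small sketch; the helper name `to_seconds` is made up for this post and the parser only handles the compact notation used here (for example "1m6.597s" or "29.255s").

```python
import re

# Matches the compact notation used in this post, e.g. "1m6.597s" or "29.255s".
_SPAN = re.compile(r"^(?:(\d+)m(?:in)?\s*)?(\d+(?:\.\d+)?)s$")


def to_seconds(span: str) -> float:
    """Convert a time span such as '1m6.597s' into seconds."""
    match = _SPAN.match(span.strip())
    if match is None:
        raise ValueError(f"unrecognised time span: {span!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


before = to_seconds("1m6.597s")   # unoptimized boot
after = to_seconds("29.255s")     # after changing the target and removing services
saved = before - after
print(f"saved {saved:.3f}s, {saved / before:.0%} of the original boot time")
```

With the numbers from the talk, this works out to a bit more than 37 seconds saved, or a little over half of the original boot time.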
Chris pointed out that the watchdog is another interesting feature for embedded use cases (a feature my colleague Michael Olbrich implemented some years ago at Pengutronix). Being able to limit the resources of applications via systemd is also pretty useful for keeping the system stable.
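To give an idea of what the watchdog looks like from the application side: a supervised service regularly sends a "WATCHDOG=1" datagram to the socket that systemd passes in the NOTIFY_SOCKET environment variable. The following is a minimal sketch of that notification protocol, not code from the talk; the function names `sd_notify` and `main_loop_iteration` are chosen for illustration.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send a notification datagram to systemd.

    Returns False when NOTIFY_SOCKET is not set, i.e. when the process
    was not started by systemd with notification enabled.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


def main_loop_iteration() -> None:
    """One round of (hypothetical) application work, then pet the watchdog."""
    # ... do the actual work of the application here ...
    sd_notify("WATCHDOG=1")
```

For this to have any effect, the service unit has to enable the watchdog with `WatchdogSec=`; resource limits are configured in the same unit file, for example with `MemoryMax=` or `CPUQuota=`.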
While Chris definitely does an excellent job when it comes to introduction level talks, my impression is that boot time optimization actually starts where his talk ended, so it fell a little short of my expectations.
SocketCAN and J1939
Marc Kleine-Budde and Oleksij Rempel from the Pengutronix kernel team recently brought J1939 into the Linux kernel - a protocol based on CAN which is used to control heavy duty vehicles. Marc started with a little overview of the kernel's CAN support, which is part of the network stack. Before SocketCAN was there, people had lots of different CAN interfaces, so several years ago, SocketCAN was started to bring a unified CAN layer into the kernel.
While the same technology is also used in marine and military applications, our focus is mainly on agricultural applications (J1939 and ISOBUS). J1939 offers an addressing scheme via "PGNs" (parameter group numbers) and several other mechanisms useful for transporting engine status information. As a classic CAN frame carries at most 8 bytes of payload, J1939 offers a way of doing higher level things, such as transporting bigger messages (up to 112 MiB with the Extended Transport Protocol).
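To make the addressing scheme a bit more concrete, here is a small sketch of how a J1939 message maps onto the 29-bit CAN identifier: 3 bits of priority, the 18-bit PGN and the 8-bit source address. It also recomputes the transport size limits. This is not the kernel API; the function name `j1939_can_id` and the constants are made up for illustration.

```python
def j1939_can_id(priority: int, pgn: int, source: int, destination: int = 0xFF) -> int:
    """Build the 29-bit CAN identifier for a J1939 message."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit into 3 bits")
    if not 0 <= pgn <= 0x3FFFF:
        raise ValueError("PGN must fit into 18 bits")
    if not 0 <= source <= 0xFF or not 0 <= destination <= 0xFF:
        raise ValueError("addresses must fit into 8 bits")
    pdu_format = (pgn >> 8) & 0xFF
    if pdu_format < 240:
        # PDU1 (peer-to-peer): the low byte of the PGN is zero and its
        # place in the identifier carries the destination address.
        pgn = (pgn & 0x3FF00) | destination
    return (priority << 26) | (pgn << 8) | source


# Each transport protocol data frame carries 7 payload bytes
# (the first byte is the sequence number).
MAX_TP_BYTES = 255 * 7              # TP: at most 255 packets
MAX_ETP_BYTES = (2**24 - 1) * 7     # ETP: 24-bit packet count

# Engine speed (EEC1, PGN 0xF004) broadcast by address 0x00 with priority 3:
print(hex(j1939_can_id(3, 0xF004, 0x00)))
print(MAX_TP_BYTES, "bytes with TP,", f"{MAX_ETP_BYTES / 2**20:.1f} MiB with ETP")
```

The 8-bytes-per-frame limit is the reason for the transport protocols: everything bigger has to be split into 7-byte chunks, and the size of the packet counter determines the upper limit.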
The implementation in the kernel was challenging: before we started, the situation was similar to when SocketCAN was started back in 2006: there were several incompatible userspace implementations. The userspace implementations had a lot of overhead, leading to lots of problems especially with low end embedded ARM processors. There were all kinds of different approaches, from multi process daemons and libraries up to stacks integrated into the applications, with all of the usual maintenance and overhead issues. The most common case was the all-in-one application, with an insufficient separation of infrastructure and application parts.
After giving an overview, Oleksij gave an insight into how to write your J1939 code in a C userspace application and had a short demo that transferred a 100 kB file over J1939.
In the meantime, the code has been accepted for the 5.4 mainline kernel, which also contains some documentation. However, some challenges remain, such as exporting the address claiming cache to userspace, a method to handle quirky busses and of course a lot of test automation.
GARDENA IoT Gateway
Andreas Müller and Reto Schneider gave a war story of how they established an open source setup inside a gardening company: when they started, Gardena didn't have any experience with electronics and software at all. Products were developed, and when they were finished, they were finished. In that situation, they started to develop a gateway to network their water distribution devices; their challenges were to do that with a distributed part time team, with a short timeline of one year from project start to sale.
Their gateway is based on a MediaTek MT7688 SoC, with a 580 MHz MIPS core, integrated WiFi, 128 MB DDR2 RAM, 8 MB NOR and 128 MB SPI NAND flash, and a sub-GHz transceiver to talk to the devices. To update the system, they have split the available space into two slots plus an overlay.
When they started, someone had asked for the GPL source code of the previous-generation software and did not get it, so they started talking to their management. When McHardy started to sue people, that information helped them a lot to speed up proper compliance processes in the company.
Being open source guys themselves, they started to establish a real open source policy at Gardena, in the sense that the customers really own the devices they buy. Before they joined, the company had a more closed policy, but it didn't really help, because people just reverse engineered the devices at that time. It was a long journey until they had convinced everyone in the company to, for example, give customers open access to the root console on the local serial port.
Regarding mainline status, there was no real support for their device in U-Boot and Linux. They hired experienced upstream developers to write proper U-Boot support and to add the missing drivers to the Linux kernel (such as SPI NAND); then they had to hire the mt76 WiFi maintainer as well (due to bad test results), but the issues were solved more or less over a weekend. Finally, at the beginning of the 2019 gardening season, they had 4.19 LTS support. Moving to that kernel also solved some hard NAND/UBI issues...
Now in October 2019, they are more or less upstream in U-Boot and Linux and just have a handful of patches left.
Analyzing the project from today's perspective, they found out that the mainlining cost was less than 10% of the project budget, while greatly reducing the risk. "Being faster" and "lower risk" were also very convincing arguments for keeping the management on track :-)
RAUC - Behind the Scenes of an Update Framework
Enrico Jörns started his RAUC talk about field upgrading with an overview of the common setups: typically, our customers at Pengutronix have a set of embedded devices which have to be updated either over a network or manually via USB.
One challenge when updating devices is fail safety: embedded devices don't have any kind of operator, so after an update, a system usually is either fully updated or fully bricked, and there is nothing in between. To avoid bricking of the device, RAUC makes use of redundancy.
RAUC is an image based update system. It started back in 2015 when we noticed at Pengutronix that each customer started to reinvent updating over and over again, so we had the impression that there really was a demand for a framework. Two days earlier, we had released version 1.2, and the project has attracted a community of about 50 contributors so far.
The standard case for RAUC is an A+B setup, but for resource constrained devices there is also A+recovery. RAUC was designed with security in mind, so all update bundles are signed. It consists of a host tool (for creating and signing bundles) and a device side for actually installing update bundles on the device. There is a D-Bus interface to communicate with RAUC and to integrate it into applications, and also a command line interface.
RAUC has the concept of "update slots", which covers everything you can put an update into. Slots can not only contain root filesystems, but also for example bootloaders. Images specify what kind of slots they are made for. The bundles themselves are SquashFS images, so they are compressed and you can mount them. You can append information to them, and we append X.509 (CMS) signatures. For actually writing an update into a slot, RAUC has update handlers to deal with the different types of slots. There needs to be a certain common understanding between RAUC and the bootloader in order to mark a slot good or bad. Interaction with the bootloader currently works with Barebox, U-Boot, Grub, UEFI and even custom bootloaders.
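The interplay between redundant slots and the good/bad marking can be illustrated with a little toy model. This is explicitly not RAUC's code or its state format, just a sketch of the general A+B idea under simple assumptions: two slots, a hypothetical budget of three boot attempts per slot, and all class and attribute names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

MAX_ATTEMPTS = 3  # hypothetical boot attempt budget per slot


@dataclass
class Slot:
    name: str
    image: str
    good: bool = True
    attempts_left: int = MAX_ATTEMPTS


@dataclass
class Device:
    slots: Dict[str, Slot] = field(
        default_factory=lambda: {"A": Slot("A", "v1"), "B": Slot("B", "v1")}
    )
    primary: str = "A"  # slot the bootloader tries first
    booted: str = "A"   # slot the system is currently running from

    def install(self, image: str) -> None:
        """Update side: always write to the slot we are NOT running from."""
        target = next(s for s in self.slots.values() if s.name != self.booted)
        target.image = image
        target.good = True
        target.attempts_left = MAX_ATTEMPTS
        self.primary = target.name

    def boot(self, starts_ok: Callable[[str], bool]) -> str:
        """Bootloader side: try the primary slot, fall back to the other one."""
        order = sorted(self.slots, key=lambda name: name != self.primary)
        for name in order:
            slot = self.slots[name]
            while slot.good and slot.attempts_left > 0:
                slot.attempts_left -= 1
                if starts_ok(slot.image):
                    slot.attempts_left = MAX_ATTEMPTS  # running system marks the slot good
                    self.primary = name
                    self.booted = name
                    return name
            slot.good = False  # attempts exhausted: mark the slot bad
        raise RuntimeError("no bootable slot left")


device = Device()
device.install("v2")
print("booted slot:", device.boot(lambda image: True))
device.install("broken")
print("booted slot:", device.boot(lambda image: image != "broken"))
```

The important property is that the running slot is never touched by an update, so a broken image in the other slot only costs a few boot attempts before the device falls back to the known-good system.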
Signing bundles and having them checked on the device is a complex story of its own: RAUC supports even corner cases like re-signing of signed bundles (in case keys reach the end of their lifetime), intermediate certificates in bundles, multiple signers and hardware security modules (via PKCS#11).
Finally, RAUC has quite a lot of customisation options to support even exotic update cases. In order to make sure that only the "right" firmware gets installed on the system, it is possible to use RAUC with verified boot, with dm-verity and dm-integrity.
A critical operation is updating the bootloader; with RAUC, we currently support two variants of updating the bootloader in an atomic way, one on eMMCs and one specifically on NAND on i.MX6.
As a server component, hawkbit can be used to provide bundles to an embedded device, which will be shown later today at the technical showcase.
Arnaud Pouliquen then talked about RPMsg to accelerate the transition between multi-SoC and multi-processor SoC solutions. Arnaud works on the STM32MP1, especially on audio and the coprocessor subsystem. The topic caught my interest, as we have recently been working on the MP1 as well, and the coprocessor subsystem offers interesting new options to offload time sensitive tasks.
While previous generations of hardware had different CPUs communicate over serial links, I2C or SPI, the MP1 team is using VirtIO, RPMsg and the OpenAMP framework to talk to the in-SoC coprocessor. On the coprocessor side, one can use proprietary code as well as open source operating systems. The team has implemented rpmsg_tty, rpmsg_i2c and rpmsg_spi drivers which tunnel the respective interfaces via rpmsg into the Linux system on the Cortex-A inside the MP1. On Linux, the RPMsg serdev drivers are already finished, but I2C/SPI are still missing. The plan is to upstream the Linux RPMsg client drivers and serial drivers next and to support the coprocessor side in OpenAMP. In addition, they want to support the virtual serial drivers in Zephyr and Mbed.
Customize Real-Time Linux for Rocket Flight Control System
In the last talk of the day, George Kang explained how they are using Linux for flight control for the guidance/navigation/control units in their rockets.
The controller they built to control the rocket is using an AM43xx CPU and PREEMPT_RT, and the actual sensor subsystem is operated by a set of redundant PRUs, each connected to a separate sensor. Data processed by the PRUs is stored in a piece of shared memory mmapped to the userspace.
The computing model does the sensor flaw reduction, transforms coordinates and takes the measurements of different sensors into account in order to increase accuracy. Based on the processed sensor data and guidance data, the rocket status is manipulated by sending control commands.
They use NASA's core Flight System (cFS) as the base for their control core; it is released as open source, is based on POSIX and contains an OS abstraction layer and the core Flight Executive (cFE). cFS takes care of all the memory handling and inter process communication (based on message queues) in a realtime capable way.
Part of their control system is precise time synchronisation, both to track mission elapsed time (MET) and to correlate it to the ground epoch (STCF).
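As far as I understand the cFE time services, the two values relate in a simple way: the MET is a free-running counter on board, and the spacecraft time correlation factor (STCF) is the offset that maps it onto the ground epoch, so that TAI = MET + STCF and UTC = TAI - leap seconds. The following sketch only illustrates this arithmetic; the class and field names are invented and this is not the cFE API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeState:
    met: float             # mission elapsed time in seconds (on-board counter)
    stcf: float            # correlation factor: offset of MET zero from the ground epoch
    leap_seconds: int = 0  # leap seconds between TAI and UTC

    def tai(self) -> float:
        """Seconds since the ground epoch, on the TAI time scale."""
        return self.met + self.stcf

    def utc(self) -> float:
        """Seconds since the ground epoch, on the UTC time scale."""
        return self.tai() - self.leap_seconds


# Made-up numbers: the counter started 1,000,000 s after the epoch
# and has been running for 120.5 s since then.
state = TimeState(met=120.5, stcf=1_000_000.0, leap_seconds=37)
print(state.tai(), state.utc())
```

The nice thing about this split is that the ground can correct the time on board by adjusting the STCF, without ever touching the MET counter that the control loops rely on.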
Their realtime I/O bus is EtherCAT, with a cycle time of < 100 µs and a jitter of < 1 µs, using IgH's Etherlab master library and SDO/PDO services to communicate process data.
The whole system can not only operate on a real rocket, but can also be simulated, using hardware-in-the-loop, software-in-the-loop and processor-in-the-loop techniques.
The actual rocket they are building is scheduled to launch in 2021, so we are looking forward to seeing how Linux will work in space!
The day ended with the Technical Showcase, and we had a desk where we demonstrated some of our community projects (RAUC & Etnaviv). Like every year, it was a good opportunity to get in touch with the attendees and talk about all kinds of interesting topics!