This is a collection of tips, tricks, and useful information for anyone who wants to pick up kernel and/or bringup work, especially in Chrome OS. It is not intended as a detailed guide to any of the items, but as a set of general hints that these tools are available.
The first step is to figure out what the problems are:
Look at kernel messages (dmesg), on boot or at any time, and find errors or warnings that should not be there, e.g. watch what dmesg -w prints.
Look at /var/log/messages (contains kernel logs from dmesg, as well as logs from most system services).
Look at top output to check whether certain processes are hogging CPU or memory.
Build the kernel with USE=kasan. KASan is a great tool to find memory issues in the kernel (use it with the other tests below).
Other debugging kernel options:
USE=ubsan
USE=lockdep
USE=kmemleak
USE=failslab. Then configure in /sys/kernel/debug/failslab (setting probability to 10 and times to 1000 is a good start).
FAIL_MMC_REQUEST: enable CONFIG_FAIL_MMC_REQUEST, CONFIG_FAULT_INJECTION and CONFIG_FAULT_INJECTION_DEBUG_FS, and then configure in /sys/kernel/debug/mmc{n}/fail_mmc_request/.
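The failslab settings mentioned above can be applied with a small helper. This is a minimal sketch, not part of any Chrome OS tooling: the function name configure_failslab and its argument order are made up for illustration, and it assumes debugfs is mounted and the kernel was built with failslab support.

```shell
# Hypothetical helper: write fault-injection knobs into a failslab-style
# debugfs directory. Defaults follow the starting values suggested above.
# Usage: configure_failslab [dir] [probability] [times]
configure_failslab() {
    local dir="${1:-/sys/kernel/debug/failslab}"
    local probability="${2:-10}"
    local times="${3:-1000}"

    if [ ! -d "$dir" ]; then
        echo "configure_failslab: $dir not found" >&2
        return 1
    fi

    # probability: chance (in percent) that an eligible allocation fails.
    # times: upper bound on how many failures are injected.
    echo "$probability" > "$dir/probability" &&
    echo "$times" > "$dir/times"
}
```

On a device, running configure_failslab with no arguments applies the suggested starting point; passing a different directory as the first argument makes it easy to try the function out without a fault-injection kernel.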
Stress tests:
Single iteration suspend test: powerd_dbus_suspend
Multi-iteration suspend test: suspend_stress_test
Reboot loops, keeping ramoops at each reboot to analyse failures (set up SSH keys first):
    #!/bin/bash
    IP=$1
    i=0
    while true; do
        while ! scp root@$IP:/sys/fs/pstore/console-ramoops-0 ramoops-$i; do
            sleep 1
        done
        ssh root@$IP reboot
        sleep 20
        i=$((i+1))
    done
Then run this to extract the bad ramoops:
    mkdir bad; ls ramoops-* | xargs -I{} sh -c \
      'tail -n 1 {} | grep -v "reboot: Restarting system" && cp {} bad/{}'
Run restart ui in a loop.
Run tests (autotests, CTS, etc.)
Stress test cpufreq by changing frequency constantly.
Balloons (from crbug.com/468342, or src/platform/microbenchmarks/mmm_donut.py).
Unbind/rebind drivers (may be nice with kasan/kmemleak, too):
    cd /usr/local
    find /sys/bus/*/drivers/*/* -maxdepth 0 -type l | grep -v "module$" > list
    sync
    cat list | xargs -I{} sh -c 'echo {}; cd `dirname {}`; echo `basename {}` > unbind; echo `basename {}` > bind; sleep 5'
    # see what crashes, edit list to remove bad drivers, continue
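The cpufreq stress test from the list above can be sketched as a loop over a policy's available frequencies. This is only an illustration: the function name cpufreq_stress and its arguments are hypothetical, and it assumes the cpufreq driver exposes scaling_available_frequencies and offers the userspace governor, which not every driver does.

```shell
# Hypothetical helper: cycle one cpufreq policy through every frequency
# it advertises, a fixed number of times.
# Usage: cpufreq_stress [policy_dir] [iterations] [delay_seconds]
cpufreq_stress() {
    local policy="${1:-/sys/devices/system/cpu/cpufreq/policy0}"
    local iterations="${2:-100}"
    local delay="${3:-0.1}"
    local i freq

    if [ ! -r "$policy/scaling_available_frequencies" ]; then
        echo "cpufreq_stress: no frequency list in $policy" >&2
        return 1
    fi

    # scaling_setspeed is only honoured by the "userspace" governor.
    echo userspace > "$policy/scaling_governor" || return 1

    i=0
    while [ "$i" -lt "$iterations" ]; do
        for freq in $(cat "$policy/scaling_available_frequencies"); do
            echo "$freq" > "$policy/scaling_setspeed" || return 1
            sleep "$delay"
        done
        i=$((i + 1))
    done
}
```

Running it in the background while another stress test (suspend, reboot loop, autotests) is active tends to be more revealing than running it alone. Remember to restore the original governor afterwards.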
Add prints (dev_[info/warn/err] or pr_[info/warn/err]), reboot, see what happens.
dev_dbg/pr_debug in the kernel code can be enabled by setting #define DEBUG at the top of the source file (before all includes).
Adding dump_stack() calls in places may also be very useful.
Sometimes adding too many printk calls changes behaviour (Heisenbug), or makes the system unusable.
Use ftrace, see below.
Use ratelimit to minimize the number of prints. See example CL.
BUG/WARN and friends provide nice backtraces. These can be very useful for figuring out what code path is triggering a hard-to-reproduce issue.
For 4.19 kernel, update as needed:
    ../third_party/kernel/v4.19/scripts/decode_stacktrace.sh \
      /build/kukui/usr/lib/debug/boot/vmlinux \
      /mnt/host/source/src/third_party/kernel/v4.19 \
      /build/kukui/usr/lib/debug/lib/modules/4.19.*/
Sometimes gdb is more useful (aarch64, update as needed):
    aarch64-cros-linux-gnu-gdb /build/kukui/usr/lib/debug/boot/vmlinux
    (gdb) disas /m function
ftrace allows you to trace events in the kernel (e.g. function calls, graphs), without introducing too much overhead. This is especially useful to debug timing/performance issues, or for cases when adding printk changes the behaviour.
It is possible to add custom messages by using trace_printk.
Example, to trace functions starting with rt5677 and mtk_spi:
    cd /sys/kernel/debug/tracing
    echo "rt5677*" > set_ftrace_filter
    echo "mtk_spi_*" >> set_ftrace_filter
    echo function > current_tracer
    echo 1 > tracing_on
    # Look at the trace
    cat trace
trace-cmd, available on test images, provides a nice frontend to the tracing infrastructure. With trace-cmd, the above becomes:
    # 'record' configures ftrace and writes to trace.dat (default file).
    trace-cmd record -p function -l 'rt5677*' -l 'mtk_spi_*'
    # Hit Ctrl-C to stop recording
    # 'report' formats trace.dat, dumps to stdout.
    trace-cmd report
See the trace-cmd man pages or the LWN trace-cmd HOWTO for more info.
Other tricks:
It is also possible to start tracing on boot by adding kernel parameters (useful to debug early hangs).
It is possible to ask the kernel to dump the ftrace buffer to the UART on oops, which is useful to debug hangs/crashes:
    echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
Dumping the whole buffer may take an enormous amount of time at serial rate, but sometimes it's worth it.
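For tracing from boot, the kernel accepts the ftrace=, ftrace_filter= and ftrace_dump_on_oops command-line parameters. The sketch below only assembles such a parameter string; the function name ftrace_bootargs is made up for illustration, and how the string gets onto the kernel command line depends on your setup.

```shell
# Hypothetical helper: print kernel command-line parameters that enable
# a tracer at boot and dump the trace buffer on oops.
# Usage: ftrace_bootargs TRACER [FILTER_GLOB...]
ftrace_bootargs() {
    local tracer="${1:-}"
    local args filters

    if [ -z "$tracer" ]; then
        echo "usage: ftrace_bootargs TRACER [FILTER_GLOB...]" >&2
        return 1
    fi
    shift

    args="ftrace=$tracer"
    if [ "$#" -gt 0 ]; then
        # ftrace_filter= takes a comma-separated list of function globs;
        # "$*" joins the arguments with the first character of IFS.
        filters=$(IFS=,; echo "$*")
        args="$args ftrace_filter=$filters"
    fi

    echo "$args ftrace_dump_on_oops"
}
```

For the earlier example, ftrace_bootargs function 'rt5677*' 'mtk_spi_*' prints a string that traces the same functions from early boot. Quote the globs so the shell does not expand them against local files.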
(chroot) emerge-kukui chromeos-kernel-4_19 && ./update_kernel.sh --remote=$IP --remote_bootargs
cros_workon_make is faster than emerge if you just want to do a build test.
You need to pass --install, though, if you want to deploy the resulting kernel (and in that case emerge is equally fast).
One common issue is figuring out how to recover if you flash a bad kernel. Booting from USB and running chromeos-install is one solution, but that's slow. There are a couple of approaches that can be useful to recover quickly.
Always have a good USB stick connected to the device.
Make sure you use a serial-enabled coreboot firmware.
If the kernel on internal storage does not boot anymore:
Boot from USB (slam Ctrl-U during FW bootup)
Copy kernel and modules back to internal storage (instructions below assume eMMC)
    dd if=/dev/sda2 of=/dev/mmcblk0p2
    mkdir /tmp/mnt
    mount /dev/mmcblk0p3 /tmp/mnt
    rm -rf /tmp/mnt/lib/modules/4.1*
    cp -a /lib/modules/4.1* /tmp/mnt/lib/modules/
    dd if=/dev/sda2 of=/dev/mmcblk0p4
    umount /tmp/mnt
    mount /dev/mmcblk0p5 /tmp/mnt
    rm -rf /tmp/mnt/lib/modules/4.1*
    cp -a /lib/modules/4.1* /tmp/mnt/lib/modules/
    umount /tmp/mnt
    # Optional, only if USB stick has rootfs verification on
    # /usr/share/vboot/bin/make_dev_ssd.sh --remove_rootfs_verification -i /dev/mmcblk0
    sync
    reboot
System should boot from internal storage again
Alternatively, flash a known-good image to the device and then use update_kernel.sh to target the other kernel partition (typically KERN-B) instead of the live kernel partition. Boot the device into the A slot kernel (KERN-A) and then run update_kernel.sh like this:
(chroot) ./update_kernel.sh --remote=$IP --rootfs=/dev/sda5 --partition=/dev/sda4 --bootonce
The bootloader will attempt to boot the kernel on the sda4 partition (KERN-B), and kernel modules will be installed to the sda5 rootfs partition (ROOT-B). If the kernel crashes early on, a reboot will fall back to the A slot kernel and rootfs, which are known to be good.
/boot/ on the device
/build/$BOARD/boot/config in the chroot
For example, to enable the console on a recovery image on USB stick /dev/sdb:
    sudo make_dev_ssd.sh -i /dev/sdb --partitions 2 --save_config ./foo
    vi ./foo
    # add the updated command line, for example: earlycon=uart,mmio32,0xfedc6000,115200,48000000
    # save & exit vi
    sudo make_dev_ssd.sh -i /dev/sdb --partitions 2 --set_config ./foo
    sudo make_dev_ssd.sh -i /dev/sdb --recovery_key
    (chroot) cros_workon-${board} start depthcharge
    vi /src/platform/depthcharge/src/board/${board}/board.c
Call the commandline_append() function with your command line addition:
    #include "boot/commandline.h"

    static int board_setup(void)
    {
        commandline_append("earlycon=uart,mmio32,0xfedc6000,115200,48000000");
        return 0;
    }
Rebuild depthcharge, and build it into the image.
See the kernel_faq guidelines for UPSTREAM, BACKPORT and FROMLIST tags.
Kconfig changes (changes that affect chromeos/config) should be normalized by running chromeos/scripts/kernelconfig olddefconfig.
Make sure that your patch builds fine with allmodconfig:
    mkdir -p ../build/x86-64 ../build/arm64
    # Native build (x86-64)
    make O=../build/x86-64 allmodconfig
    make O=../build/x86-64 all -j50 2>&1 | tee ../build/x86-64.log
    # arm64 build (O= must be given on the make command line, not in the environment)
    CROSS_COMPILE=aarch64-cros-linux-gnu- ARCH=arm64 make O=../build/arm64 allmodconfig
    CROSS_COMPILE=aarch64-cros-linux-gnu- ARCH=arm64 make O=../build/arm64 -j64 >/dev/null
Test build with Chrome OS config:
    cd src/third_party/kernel/v4.19
    git checkout linux-next/master
    # Checkout config options only
    git checkout m/master -- chromeos
    # Normal emerge
    (chroot) emerge-kukui -av chromeos-kernel-4_19
    ../../../platform/dev/contrib/fromupstream.py -b b:123489157 \
      -t "Deploy kukui kernel with USE=kmemleak, no kmemleak warning in __arm_v7s_alloc_table" \
      'git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git#next/032ebd8548c9d05e8d2bdc7a7ec2fe29454b0ad0'
Add the project URL to ~/.pwclientrc:
    [options]
    default=kernel
    [kernel]
    url=https://patchwork.kernel.org/
    [lore]
    url=https://lore.kernel.org/patchwork/xmlrpc/
Then run:
    ../../../platform/dev/contrib/fromupstream.py -b b:132314838 -t "no crash with CONFIG_FAILSLAB" 'pw://10957015/'
    # or
    ../../../platform/dev/contrib/fromupstream.py -b b:132314838 -t "no crash with CONFIG_FAILSLAB" 'pw://kernel/10957015/'
In the CrOS chroot (gerrit deps prints dependencies from top to bottom, so it's better to use tac so that the bottom-most CL is set to ready first):
    gerrit deps ${CL} --raw | tee deps-${CL}
    gerrit verify `tac deps-${CL}` 1
    gerrit ready `tac deps-${CL}` 2
So you have an email/patch on patchwork, but you didn't subscribe to the mailing list, so you can't reply to/review the change.
To fetch the email into your IMAP/gmail account:
python2.7 ./imap_upload.py patch.mbox --gmail
The mbox downloaded from patchwork doesn't include replies to the patch (e.g. reviewer comments). To obtain an mbox containing replies, download the mbox.gz files from https://lore.kernel.org/lkml/ instead.