Ubuntu on Raspberry Pi: flash-kernel exited with return code 1

I ran into an issue while updating my Ubuntu installation on a Raspberry Pi 4. It wasn't the first time, but until now I couldn't find the cause. The problem occurs while running an update and looks like this:

mv: cannot move '/boot/firmware/overlays/rpi-sense-v2.dtbo' to '/boot/firmware/overlays/rpi-sense-v2.dtbo.bak': No such file or directory
run-parts: /etc/initramfs/post-update.d//flash-kernel exited with return code 1
dpkg: error processing package initramfs-tools (--configure):
 installed initramfs-tools package post-installation script subprocess returned error exit status 1
Errors were encountered while processing:
 initramfs-tools
needrestart is being skipped since dpkg has failed
E: Sub-process /usr/bin/dpkg returned an error code (1)

Sometimes the update failed while configuring the initramfs-tools package. When it failed, it was always while moving files for backup purposes, as shown above. Every time I checked, the file existed, and running touch on the file system worked as expected. In most cases I fixed it by moving the file manually and creating an empty one; the affected file was never one I used anyway. Since this was not the first time and it annoyed me every time, let's investigate.

user:~$ which flash-kernel
/usr/sbin/flash-kernel
user:~$ file $(which flash-kernel)
/usr/sbin/flash-kernel: POSIX shell script, Unicode text, UTF-8 text executable

Luckily, flash-kernel is a shell script, which can be examined easily. It sources another file, /usr/share/flash-kernel/functions, and the interesting part is:

backup_and_install() {
    local source="$1"
    local dest="$2"
    local do_dot_bak=$(get_dot_bak_preference)
    local mtd_backup_dir=$(get_mtd_backup_dir)
    if [ -e "$dest" ]; then
        if [ -n "$do_dot_bak" ]; then
            echo "Taking backup of $(basename "$dest")." >&2
            mv "$dest" "$dest.bak"
        else
            echo "Skipping backup of $(basename "$dest")." >&2
        fi
    fi
    # If we are installing to a filesystem which is not normally mounted
    # then take a second copy in /var/backups, where they can e.g. be
    # backed up.
    if [ -n "$boot_mnt_dir" ] && [ -n "$mtd_backup_dir" ] ; then
        local bak="$mtd_backup_dir/"$(basename "$dest")
        #echo "Saving $boot_device:"$(basename "$source")" in $bak"
        mkdir -p "$mtd_backup_dir"
        cp "$source" "$bak"
    fi
    echo "Installing new $(basename "$dest")." >&2
    mv "$source" "$dest"
    maybe_defrag "$dest"
}
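
Reduced to its essentials, the backup step can be tried out safely in a scratch directory. The following is only a simplified sketch, not the code flash-kernel runs: backup_then_install and the example.dtbo file names are made up for illustration, and the .bak preference and the /var/backups copy are left out.

```shell
#!/bin/sh
# Simplified sketch of the backup step; NOT the real flash-kernel function.
# backup_then_install is a hypothetical name used only for this illustration.
backup_then_install() {
    src="$1"
    dest="$2"
    if [ -e "$dest" ]; then
        # Same idea as flash-kernel: keep the old file as <name>.bak.
        mv "$dest" "$dest.bak" || return 1
    fi
    mv "$src" "$dest"
}

# Try it in a scratch directory instead of /boot/firmware.
scratch=$(mktemp -d)
echo "old overlay" > "$scratch/example.dtbo"
echo "new overlay" > "$scratch/example.dtbo.new"
backup_then_install "$scratch/example.dtbo.new" "$scratch/example.dtbo"
ls "$scratch"
```

On a healthy file system this leaves example.dtbo with the new content and example.dtbo.bak with the old one, so a failing mv right after a successful existence test points at something below the script.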

Nothing fancy here. If the destination file exists, it is backed up with the extension .bak. Let's see what happens when running similar commands on the file by hand:

root:~# mv /boot/firmware/overlays/rpi-sense-v2.dtbo /boot/firmware/overlays/rpi-sense-v2.dtbo.bak
mv: cannot move '/boot/firmware/overlays/rpi-sense-v2.dtbo' to '/boot/firmware/overlays/rpi-sense-v2.dtbo.bak': No such file or directory
root:~# rm /boot/firmware/overlays/rpi-sense-v2.dtbo
rm: cannot remove '/boot/firmware/overlays/rpi-sense-v2.dtbo': No such file or directory

Interesting 🤔 The moment I read the error message of the rm command, I got an idea. Let's check the file system!

root:~# lsblk -e7
NAME                                   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda                                      8:0    0 931.5G  0 disk
├─sda1                                   8:1    0   256M  0 part  /boot/firmware
...

Unmount the device, run a file system check, mount it again, and fix the broken packages:

root:~# umount /boot/firmware
root:~# fsck.vfat -av /dev/sda1
fsck.fat 4.2 (2021-01-31)
Checking we can access the last sector of the filesystem
Boot sector contents:
System ID "mkfs.fat"
Media byte 0xf8 (hard disk)
       512 bytes per logical sector
       512 bytes per cluster
        32 reserved sectors
First FAT starts at byte 16384 (sector 32)
         2 FATs, 32 bit entries
   2064896 bytes per FAT (= 4033 sectors)
Root directory start at cluster 2 (arbitrary size)
Data area starts at byte 4146176 (sector 8098)
    516190 data clusters (264289280 bytes)
32 sectors/track, 16 heads
         0 hidden sectors
    524288 sectors total
Long filename fragment "spi3-2cs.dtbo" found outside a LFN sequence.
  (Maybe the start bit is missing on the last fragment)
  Not auto-correcting this.
Orphaned long file name part ".bak"
  Auto-deleting.
Long filename fragment "midi-uart2.dt" found outside a LFN sequence.
  (Maybe the start bit is missing on the last fragment)
  Not auto-correcting this.
Orphaned long file name part ".bak"
  Auto-deleting.
Long filename fragment "spi4-1cs.dtbo" found outside a LFN sequence.
  (Maybe the start bit is missing on the last fragment)
  Not auto-correcting this.
Orphaned long file name part "pi4.dtbo.bak"
  Auto-deleting.
Long filename fragment "-pcm512x-audi" found outside a LFN sequence.
  (Maybe the start bit is missing on the last fragment)
  Not auto-correcting this.
Long filename fragment "allo-boss-dac" found outside a LFN sequence.
  (Maybe the start bit is missing on the last fragment)
  Not auto-correcting this.
A new long file name starts within an old one.
  Not auto-correcting this.
Orphaned long file name part "midi-uart1-pi5.dtbo"
  Auto-deleting.
Reclaiming unconnected clusters.
Reclaimed 34 unused clusters (17408 bytes) in 11 chains.
Checking free cluster summary.
*** Filesystem was changed ***
Writing changes.
/dev/sda1: 1169 files, 340372/516190 clusters
root:~# mount /boot/firmware
root:~# apt --fix-broken install

Finally! It was just a corrupt file system. That could be a sign of a failing disk, so it was a good time to check the S.M.A.R.T. values of the disk using smartmontools. Querying the disk failed at first, which is something I often run into with disks connected via USB. I had little hope of reading out the S.M.A.R.T. values, but I searched for a solution and found it [1]: update-smart-drivedb did the trick!

root:~# apt install smartmontools
root:~# update-smart-drivedb
root:~# smartctl -c /dev/sda
...
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 182) minutes.
...
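
The polling time reported for the short self-test is how long to wait before reading the self-test log. Instead of reading it off by eye, it could be extracted from the smartctl -c output. This is a rough sketch that assumes the two-line layout shown above; short_test_minutes is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read the recommended polling time (in minutes) for the short
# self-test from `smartctl -c` output on stdin.
# short_test_minutes is a hypothetical helper name.
short_test_minutes() {
    awk '
        /^Short self-test routine/ { in_short = 1; next }
        in_short && /recommended polling time/ {
            gsub(/[^0-9]/, "")   # keep only the digits of this line
            print
            exit
        }
    '
}
```

Usage would be along the lines of smartctl -c /dev/sda | short_test_minutes, with the result used as the number of minutes to sleep before querying the self-test log.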

root:~# smartctl -t short /dev/sda
root:~# sleep 2m && smartctl -l selftest /dev/sda
...

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     31445         -
root:~# smartctl -H /dev/sda
...
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

Okay, looks good. I hope it keeps running. Maybe I should set up smartmontools properly, so that I get an email when the S.M.A.R.T. values get bad.
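
Until the smartd daemon that ships with smartmontools is configured for email alerts, a small wrapper around smartctl -H could serve as a stopgap, for example from a cron job. This is a minimal sketch: smart_health_ok is a hypothetical helper name, and it only recognizes the exact PASSED line shown above, so other devices that report their health with different wording would need a different pattern.

```shell
#!/bin/sh
# Sketch: succeed only if `smartctl -H` output on stdin contains the
# PASSED line shown above. smart_health_ok is a hypothetical helper name.
smart_health_ok() {
    grep -q 'SMART overall-health self-assessment test result: PASSED'
}
```

A possible call is smartctl -H /dev/sda | smart_health_ok || echo "S.M.A.R.T. health check did not pass" >&2, which stays silent as long as the disk reports PASSED.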