Wednesday, February 15, 2012

libblkid maintainer's brain dump

This article is about the low-level probing code in libblkid, and it's really just a dump, nothing more ;-)

High and Low level

The library contains two APIs.
  • high-level - this is the original library code from e2fsprogs. All results are cached in the file /etc/blkid.tab (or /run/blkid/blkid.tab). The advantage is that information about LABELs and UUIDs is accessible to non-root users, and the cache has a positive impact on performance.

    This advantage no longer applies on many systems, where all the necessary information is stored in the udev db and things like LABEL and UUID are accessible through the /dev/disk/by-* udev symlinks.

    This is the reason why the blkid_evaluate_* functions are recommended for newly written programs; they are able to use the udev symlinks as well as the original libblkid cache. This functionality is also accessible from the command line with the blkid -L|-U command.

  • low-level - this part of the API completely bypasses the cache and allows you to work directly with the library's probing functions. The rest of this article is about the low-level part of the library.
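To make the udev side of the high-level item concrete, here is a minimal C sketch of looking up a device through a /dev/disk/by-* symlink. It is not how the blkid_evaluate_* functions are implemented; the helper names by_tag_path() and resolve_tag() are made up for this note, and the sketch assumes the LABEL or UUID needs no escaping (udev encodes some characters, for example spaces, in symlink names).

```c
#define _XOPEN_SOURCE 700   /* for realpath() and PATH_MAX */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build "/dev/disk/by-<kind>/<value>" into buf,
 * where kind is "label" or "uuid".
 * Returns the path length, or -1 if the path does not fit into buf.
 * Assumption: value contains no characters that udev would escape. */
static int by_tag_path(const char *kind, const char *value,
                       char *buf, size_t bufsz)
{
    int n = snprintf(buf, bufsz, "/dev/disk/by-%s/%s", kind, value);

    return (n < 0 || (size_t)n >= bufsz) ? -1 : n;
}

/* Hypothetical helper: follow the udev symlink to the real device node
 * (for example /dev/sda1). out must have room for PATH_MAX bytes.
 * Returns 0 on success, -1 if the symlink does not exist or cannot be
 * resolved. */
static int resolve_tag(const char *kind, const char *value, char *out)
{
    char link[PATH_MAX];

    if (by_tag_path(kind, value, link, sizeof(link)) < 0)
        return -1;
    return realpath(link, out) ? 0 : -1;
}
```

A caller would try resolve_tag("label", "root", out) first and fall back to the libblkid cache or to low-level probing when it returns -1.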
The library contains three chains of probing functions:
  1. superblocks
  2. partitions
  3. topology
Superblocks probing is enabled by default. The command "blkid -p -o udev" (or the built-in code in udevd) enables the partitions probing chain too.

There are two basic probing methods:
  • safeprobe - this is the recommended method. It checks for collisions between filesystems, RAIDs and partition tables.
  • fullprobe - doesn't check for conflicts; used for example in wipefs(8)
For superblocks, only a NAME=value based API is available. For topology and partitions, a binary interface is available too. See the docs link below.
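The difference between the two probing methods can be shown with a toy model in C. This is only a sketch of the idea, not libblkid code: struct sig, toy_safeprobe() and toy_fullprobe() are invented here, and the real safeprobe is more tolerant than "more than one hit means conflict". The return values mimic those documented for blkid_do_safeprobe() (0 = success, 1 = nothing detected, -2 = ambivalent result).

```c
#include <stddef.h>

/* Hypothetical record for one signature found on a device. */
struct sig {
    const char *type;   /* e.g. "ext4", "LVM2_member", "dos" */
};

/* Toy safeprobe: accept a result only if it is unambiguous.
 * Returns 0 and sets *found if exactly one signature was detected,
 * 1 if nothing was detected, -2 if the result is ambivalent. */
static int toy_safeprobe(const struct sig *sigs, size_t n,
                         const struct sig **found)
{
    *found = NULL;
    if (n == 0)
        return 1;
    if (n > 1)
        return -2;      /* collision: refuse to guess */
    *found = &sigs[0];
    return 0;
}

/* Toy fullprobe: report every signature, without any conflict check.
 * Stores up to max pointers in out and returns how many were stored.
 * A wipefs-like tool wants this behaviour, because it has to see all
 * signatures in order to erase them. */
static size_t toy_fullprobe(const struct sig *sigs, size_t n,
                            const struct sig **out, size_t max)
{
    size_t i;

    for (i = 0; i < n && i < max; i++)
        out[i] = &sigs[i];
    return i;
}
```

With a device that carries both an MBR and a stale LVM signature, toy_safeprobe() returns -2 while toy_fullprobe() lists both entries.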

Superblocks
  • four basic "usage" groups: filesystems, RAIDs, crypto and others
  • RAIDs (MD, LVM, ...) are probed before filesystems
  • don't check for filesystems when a RAID signature is detected
  • don't check for RAIDs or others (swap-area) on CD-ROMs
  • don't check for RAIDs on tiny devices (< 1 MiB)
  • don't read the whole FAT root directory (to look up LABEL) on tiny devices (< 1 MiB)
exceptions / extra cases:
  • MD RAID is ignored if detected within a valid partition during whole-disk probing

    [use case: partitioned disk where the last partition is used as a RAID member and the RAID has its metadata at the end of that partition (so at the end of the disk)]

  • LVM signature is ignored if another signature is detected within the first 8 KiB of the device (LVM wipes this area, so there should not be any filesystem superblock)

    [use case: disk with LVM, the user stops using LVM and creates a new partition table with fdisk; the result is an MBR and an obsolete LVM signature on the same device]
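The superblocks rules above can be written down as a small decision model in C. This is a sketch for illustration, not the library's code: the GROUP_* names, struct dev_info and all three functions are invented here, and offsets and sizes are assumed to be in bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit flags for the "usage" groups. */
#define GROUP_FILESYSTEM (1u << 0)
#define GROUP_RAID       (1u << 1)
#define GROUP_CRYPTO     (1u << 2)
#define GROUP_OTHER      (1u << 3)

#define TINY_DEVICE_SIZE (1024u * 1024u)    /* 1 MiB */

/* Hypothetical device description. */
struct dev_info {
    uint64_t size;      /* device size in bytes */
    bool     is_cdrom;
};

/* Which groups are worth probing on this device at all. */
static unsigned groups_to_probe(const struct dev_info *dev)
{
    unsigned groups = GROUP_FILESYSTEM | GROUP_RAID |
                      GROUP_CRYPTO | GROUP_OTHER;

    if (dev->is_cdrom)                  /* no RAIDs or others on CD-ROMs */
        groups &= ~(GROUP_RAID | GROUP_OTHER);
    if (dev->size < TINY_DEVICE_SIZE)   /* no RAIDs on tiny devices */
        groups &= ~GROUP_RAID;
    return groups;
}

/* RAIDs are probed first; a RAID hit disables the filesystem checks. */
static unsigned groups_after_raid_pass(unsigned groups, bool raid_found)
{
    return raid_found ? (groups & ~GROUP_FILESYSTEM) : groups;
}

/* Exception for whole-disk probing: an MD superblock that lies inside
 * a valid partition belongs to that partition, not to the disk. */
static bool md_belongs_to_partition(uint64_t sb_offset,
                                    uint64_t part_start, uint64_t part_size)
{
    return sb_offset >= part_start && sb_offset - part_start < part_size;
}
```

For example, a CD-ROM ends up with only the filesystem and crypto groups, and a RAID hit on a regular disk leaves the RAID, crypto and other groups active.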
Partitions
  • disabled by default, enabled for udev (see ID_PART_ENTRY_* in udev db)
  • parse partition tables (aix, minix, bsd, mbr, gpt, mac, sgi, solaris, sun, ultrix and unixware)
  • detect nested partition tables (e.g. BSD) within partitions
  • if the given device is a partition (e.g. sda1), then open the whole disk (e.g. sda) to read details about the partition from the partition table. This feature has to be enabled with the BLKID_PARTS_ENTRY_DETAILS flag.
  • partition table is ignored if a valid RAID superblock is detected at the end of the device

    [use case: partitioned RAID1 (mirror) -- the partition table is visible from the underlying devices]
Topology
  • rarely used
  • designed for mkfs-like or fdisk-like programs to get info about I/O topology
  • for kernel >= 2.6.3x uses ioctl or sysfs
  • as fallback for old kernels uses code originally from xfsprogs
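As a rough illustration of the sysfs way of getting I/O topology, the following C sketch reads one attribute for a whole disk. It is not the library's code; topology_attr_path() and read_ulong_attr() are names made up for this note. The attribute names are the ones the kernel exports under /sys/block/<disk>/ (queue/minimum_io_size, queue/optimal_io_size, queue/physical_block_size, queue/logical_block_size and alignment_offset).

```c
#include <stdio.h>

/* Hypothetical helper: build "/sys/block/<disk>/<attr>" into buf,
 * for example disk = "sda", attr = "queue/minimum_io_size".
 * Returns the path length, or -1 if the path does not fit into buf. */
static int topology_attr_path(const char *disk, const char *attr,
                              char *buf, size_t bufsz)
{
    int n = snprintf(buf, bufsz, "/sys/block/%s/%s", disk, attr);

    return (n < 0 || (size_t)n >= bufsz) ? -1 : n;
}

/* Hypothetical helper: read one unsigned decimal number from a file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed
 * (for example on a kernel that does not export the attribute). */
static int read_ulong_attr(const char *path, unsigned long *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%lu", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A mkfs-like program would build the path for each attribute it cares about, read the value, and treat a failure as "unknown" rather than as an error.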

Tips for users

  • please use wipefs(8) before fdisk, mkfs or mkswap. The latest version is able to remove really all possible backup signatures, partition tables and other things that are invisible at first glance. Don't rely on mkfs developers :-)
  • think twice before you start using complex setups (for example partitioned RAIDs), to avoid misinterpretation by the kernel or system tools.
  • don't forget that blkid without -p might return cached results
Tips for developers

.... I'll try to keep these notes updated.