Wednesday, April 20, 2011

bind mounts, mtab and read-only

Bind mounts have been supported since Linux 2.4. That is a pretty long time, but many users still think that bind mounts are something completely different from normal mounts.

Example 1:
 # mount /dev/sdb1 /mnt/A
# mount /dev/sdb1 /mnt/B
This is not a bug. It's possible to mount the same filesystem in two places.

Example 2:
 # mount /dev/sdb1 /mnt/A
# mount --bind /mnt/A /mnt/B
The result of both examples is the same, see /proc/self/mountinfo:
 # grep mnt /proc/self/mountinfo
48 20 8:17 / /mnt/A rw,relatime - ext4 /dev/sdb1 rw,barrier=1,stripe=64,data=ordered
49 20 8:17 / /mnt/B rw,relatime - ext4 /dev/sdb1 rw,barrier=1,stripe=64,data=ordered
This is very important: from the kernel's point of view, it is the same thing. The same filesystem is mounted in two places.

The kernel does not keep any record that /mnt/B was created by a bind mount (the MS_BIND flag of the mount(2) syscall). There is no dependency between /mnt/A and /mnt/B (for example, you can umount /mnt/A and /mnt/B stays mounted).
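To make the "same filesystem, two places" point concrete, here is a minimal sketch that groups mountinfo entries by device number and filesystem root. The helper name same_fs_mounts and the sample_mountinfo variable are made up for illustration; the sample data is just the two mountinfo lines shown above.

```shell
# Sample input: the two mountinfo lines from the examples above.
sample_mountinfo='48 20 8:17 / /mnt/A rw,relatime - ext4 /dev/sdb1 rw,barrier=1,stripe=64,data=ordered
49 20 8:17 / /mnt/B rw,relatime - ext4 /dev/sdb1 rw,barrier=1,stripe=64,data=ordered'

# same_fs_mounts (hypothetical helper): read mountinfo lines on stdin and
# print "major:minor root: mountpoint mountpoint ..." for every filesystem
# subtree that is visible at more than one mountpoint.
same_fs_mounts() {
    awk '
        {
            # field 3 = major:minor, field 4 = root within the filesystem,
            # field 5 = mountpoint
            key = $3 " " $4
            if (count[key]++ == 0) {
                order[++n] = key
                points[key] = $5
            } else {
                points[key] = points[key] " " $5
            }
        }
        END {
            for (i = 1; i <= n; i++)
                if (count[order[i]] > 1)
                    print order[i] ": " points[order[i]]
        }
    '
}

printf '%s\n' "$sample_mountinfo" | same_fs_mounts
```

For the sample data this prints "8:17 /: /mnt/A /mnt/B". Nothing in the input says which of the two mountpoints came first or which one was a bind mount. On a real system you could feed it /proc/self/mountinfo instead of the sample.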

Unfortunately, the situation in the /etc/mtab file is completely different:
 # grep mnt /etc/mtab
/dev/sdb1 /mnt/A ext4 rw 0 0
/mnt/A /mnt/B none rw,bind 0 0
This is confusing for many users. Try:
 # umount /mnt/A
# rm -rf /mnt/A

# grep mnt /etc/mtab
/mnt/A /mnt/B none rw,bind 0 0
Does the information in mtab make any sense? I don't think so... Keeping this kind of information in userspace is a mistake. Yeah, mtab is evil.


Everyone who uses bind mounts on a system without mtab (where mtab is a symlink to /proc/mounts) has to understand that the "bind" flag is no longer stored anywhere. For example, you have to explicitly add the flag to the mount options if you want to use a read-only bind mount.
 # rm -f /etc/mtab
# ln -s /proc/mounts /etc/mtab
(or install Fedora 15:-)

Let's use findmnt(8) rather than grep /proc/self/mountinfo:
 # findmnt -o TARGET,VFS-OPTIONS,FS-OPTIONS /dev/sda1
TARGET VFS-OPTIONS FS-OPTIONS
/mnt/A rw,relatime rw,errors=continue,user_xattr,acl,barrier=0,data=ordered
/mnt/B rw,relatime rw,errors=continue,user_xattr,acl,barrier=0,data=ordered
What will happen if we try to remount with the bind flag? See:
  # mount -o remount,ro,bind /mnt/B

# findmnt -o TARGET,VFS-OPTIONS,FS-OPTIONS /dev/sda1
TARGET VFS-OPTIONS FS-OPTIONS
/mnt/A rw,relatime rw,errors=continue,user_xattr,acl,barrier=0,data=ordered
/mnt/B ro,relatime rw,errors=continue,user_xattr,acl,barrier=0,data=ordered
The filesystem (superblock) is still read-write, but the /mnt/B mountpoint is marked as read-only in the VFS.

And now the same thing without the bind flag:
 # mount -o remount,ro /mnt/B

# findmnt -o TARGET,VFS-OPTIONS,FS-OPTIONS /dev/sda1
TARGET VFS-OPTIONS FS-OPTIONS
/mnt/A rw,relatime ro,errors=continue,user_xattr,acl,barrier=0,data=ordered
/mnt/B ro,relatime ro,errors=continue,user_xattr,acl,barrier=0,data=ordered
The superblock has been remounted read-only, so the filesystem is read-only everywhere in the system.
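The difference between the two remounts can also be read straight from mountinfo: field 6 holds the per-mountpoint (VFS) options and the last field holds the superblock (FS) options. Below is a minimal sketch of that split. The helper name mount_ro_layers is made up, and the sample lines are illustrative only: the options are copied from the findmnt output after the remount,ro,bind step, while the mount IDs and the 8:1 device number are assumed.

```shell
# Illustrative mountinfo lines for the state after "mount -o remount,ro,bind /mnt/B".
# Mount IDs and the device number are assumptions, not real output.
sample_after_bind_remount='48 20 8:1 / /mnt/A rw,relatime - ext4 /dev/sda1 rw,errors=continue,user_xattr,acl,barrier=0,data=ordered
49 20 8:1 / /mnt/B ro,relatime - ext4 /dev/sda1 rw,errors=continue,user_xattr,acl,barrier=0,data=ordered'

# mount_ro_layers (hypothetical helper): read mountinfo lines on stdin and
# print, for each mountpoint, whether the VFS layer and the FS (superblock)
# layer are ro or rw.
mount_ro_layers() {
    awk '
        # Return "ro" if the comma-separated option list contains "ro".
        function ro_or_rw(opts,    parts, k, i) {
            k = split(opts, parts, ",")
            for (i = 1; i <= k; i++)
                if (parts[i] == "ro") return "ro"
            return "rw"
        }
        {
            # Optional fields may sit between field 6 and the "-" separator,
            # so locate the separator instead of assuming a fixed position.
            sep = 0
            for (i = 7; i <= NF; i++)
                if ($i == "-") { sep = i; break }
            if (sep == 0 || sep + 3 > NF) next
            # sep+1 = fstype, sep+2 = source, sep+3 = superblock options
            print $5, "vfs=" ro_or_rw($6), "fs=" ro_or_rw($(sep + 3))
        }
    '
}

printf '%s\n' "$sample_after_bind_remount" | mount_ro_layers
```

For the sample this prints "/mnt/A vfs=rw fs=rw" and "/mnt/B vfs=ro fs=rw". After the remount without the bind flag, both lines would report fs=ro.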

Again, all this works regardless of how /mnt/B was mounted (examples 1 and 2).

BTW, you can also set the block device read-only with blockdev --setro. So we have three layers (device -> FS -> VFS) where it is possible to set the read-only attribute :-)
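For the device layer, a rough sketch of how the state could be checked from a script. It assumes the kernel exposes the read-only state in the sysfs "ro" attribute of the block device (1 means read-only), which is where I would expect blockdev --setro to show up. The helper name device_ro_state and its optional second argument are made up; the second argument only exists so the sketch can be tried against a fake directory tree.

```shell
# device_ro_state (hypothetical helper): print "ro" or "rw" for a block device
# name such as sdb1, based on the sysfs "ro" attribute (assumed: 1 = read-only).
# The optional second argument overrides the sysfs root (default /sys).
device_ro_state() {
    name=$1
    sysfs=${2:-/sys}
    attr="$sysfs/class/block/$name/ro"
    if [ ! -r "$attr" ]; then
        echo "unknown"
        return 1
    fi
    if [ "$(cat "$attr")" = "1" ]; then
        echo ro
    else
        echo rw
    fi
}

# Usage (device name is an example):
#   device_ro_state sdb1
```

blockdev --getro reports the same kind of information, but needs access to the device node.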