[v5,3/4] block: add support for notifications

Message ID ca0022886e8f211a323a716653a1396a3bc91653.1722365899.git.daniel@makrotopia.org
State New, archived
Series block: preparations for NVMEM provider

Commit Message

Daniel Golle July 30, 2024, 7:26 p.m. UTC
Add a notifier chain to notify other subsystems about the addition or
removal of block devices.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
 block/Kconfig          |  6 +++
 block/Makefile         |  1 +
 block/blk-notify.c     | 87 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h | 11 ++++++
 4 files changed, 105 insertions(+)
 create mode 100644 block/blk-notify.c

Comments

Christoph Hellwig July 30, 2024, 7:36 p.m. UTC | #1
Same NAK as last time.  Random modules should not be able to hook
directly into block device / partition probing.

What you want to do can be done trivially in userspace in initramfs,
please do that as recommended multiple times before.
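For illustration, the userspace route suggested here usually starts from attributes the kernel already exports per block device: a partition's /sys/class/block/<dev>/uevent file contains KEY=value lines such as DEVNAME=, PARTN= and, for named (e.g. GPT) partitions, PARTNAME=. The sketch below is a hypothetical helper (uevent_get is not an existing library function) that extracts one value from such a file's contents; reading the file and deciding which partition holds the factory data are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: look up KEY in the text of a
 * /sys/class/block/<dev>/uevent file, whose lines have the form KEY=value.
 * Copies the value (truncated to outlen - 1 bytes) into out and
 * NUL-terminates it. Returns 0 on success, -1 if the key is absent.
 */
static int uevent_get(const char *uevent, const char *key,
		      char *out, size_t outlen)
{
	const char *line;
	size_t klen;

	assert(uevent != NULL && key != NULL && out != NULL && outlen > 0);
	klen = strlen(key);
	line = uevent;

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		/* Match "KEY=" exactly, so PARTN does not match PARTNAME. */
		if (len > klen && strncmp(line, key, klen) == 0 &&
		    line[klen] == '=') {
			size_t vlen = len - klen - 1;

			if (vlen >= outlen)
				vlen = outlen - 1;
			memcpy(out, line + klen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return -1;
}
```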
Daniel Golle July 30, 2024, 9:28 p.m. UTC | #2
On Tue, Jul 30, 2024 at 12:36:59PM -0700, Christoph Hellwig wrote:
> Same NAK as last time.  Random modules should not be able to hook
> directly into block device / partition probing.

Would using delayed_work be indirect enough for your taste?
If so, that would of course be rather easy to implement.
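A minimal userspace model of what the delayed_work variant would change: the add/remove path only records an event, and listener callbacks run later from a separate worker step. In the kernel, the worker role would presumably be a work item queued with schedule_delayed_work(); the ring buffer, the event names and the integer device IDs below are hypothetical stand-ins, and the model has no locking, which real kernel code would need.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model, NOT kernel code. The add/remove path only queues an
 * event; listeners run later when the worker drains the queue, which is
 * the decoupling a delayed_work item would provide. No locking: a kernel
 * version would have to protect the queue.
 */

enum blk_event { EV_ADD = 1, EV_REMOVE = 2 };

struct deferred_event {
	enum blk_event type;
	int dev_id;		/* hypothetical stand-in for struct device * */
};

#define QUEUE_MAX 16

struct event_queue {
	struct deferred_event ev[QUEUE_MAX];	/* ring buffer */
	size_t head, count;
	void (*listener)(enum blk_event type, int dev_id);
};

/* Producer: record the event only. Returns -1 if the ring is full. */
static int queue_event(struct event_queue *q, enum blk_event type, int dev_id)
{
	size_t tail;

	assert(q != NULL);
	if (q->count == QUEUE_MAX)
		return -1;
	tail = (q->head + q->count) % QUEUE_MAX;
	q->ev[tail].type = type;
	q->ev[tail].dev_id = dev_id;
	q->count++;
	return 0;
}

/* Worker: drain queued events in FIFO order; returns how many were drained. */
static size_t run_worker(struct event_queue *q)
{
	size_t drained = 0;

	assert(q != NULL);
	while (q->count > 0) {
		struct deferred_event e = q->ev[q->head];

		q->head = (q->head + 1) % QUEUE_MAX;
		q->count--;
		if (q->listener)
			q->listener(e.type, e.dev_id);
		drained++;
	}
	return drained;
}

/* Example listener that only counts what it receives. */
static int adds_seen, removes_seen;

static void count_events(enum blk_event type, int dev_id)
{
	(void)dev_id;
	if (type == EV_ADD)
		adds_seen++;
	else if (type == EV_REMOVE)
		removes_seen++;
}
```

The trade-off is that a listener sees a device some time after it appeared, so consumers have to tolerate that delay.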

> 
> What you want to do can be done trivially in userspace in initramfs,
> please do that as recommended multiple times before.
> 

While the average desktop or server **general-purpose** Linux
distribution uses an initramfs, often generated dynamically on the
target system during installation or kernel updates, this is NOT how
things work in the embedded Linux world, and for OpenWrt
specifically.

For the OpenWrt community, the great thing is that the Linux kernel, and
even an identical userland, can run on embedded devices with as little as
8 megabytes of NOR flash as well as on much more capable systems
with a large eMMC or even NVMe disks, but almost always with exactly one
non-volatile storage device. All of those devices come without
complex boot firmware, so no ACPI, no UEFI, ... just U-Boot and a DT
blob which gets glued to the kernel in one way or another. And it would
of course be nice if they all woke up with correct MAC addresses
and working WiFi, even if they come with larger (typically
block-oriented) storage. In terms of hardware, such boards are often just
two or three IC packages: a SoC (sometimes including RAM) and some sort
of non-volatile memory big enough to store a Linux-based firmware,
factory data (MAC addresses, WiFi calibration, serial number) and
user settings.

The same Linux kernel source tree is also used to build kernels running
on countless large servers (and a comparatively small number of desktop
systems) with complex (proprietary) boot firmware and typically a
handful of flashes and EEPROMs on the motherboard alone. On such systems,
Ethernet NICs are dedicated chips or even PCIe cards, sometimes with
dedicated EEPROMs storing their MAC addresses. Or they are virtual
machines, with the host taking care of all of that.

Coexistence of all those different scales, without forcing the ways of
large systems onto the small ones (and vice versa), has been a huge
strength in my opinion.

When it comes to the small (sub-$100, often much less) boards for
plastic-case network appliances such as routers and access points,
oftentimes the exact same board can be bought either with on-board
SPI-NAND (used with UBI) or with an eMMC. Of course, the vendors keep
things as similar as possible, so the layout used for the NVMEM bits is
often identical; the only difference is that in one case those bits
(typically less than a memory page's worth) are stored on an MTD
partition or directly inside a UBI volume, and in the other case they
are stored either at a fixed offset on the mmcblk0boot[01] device or
inside a GPT partition. This is simply what reality already looks like
today for this class of devices. In previous iterations of the series,
I provided multiple examples of mainstream device vendors (Adtran, ASUS,
GL.iNet, ...) to illustrate that.
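To make the shared-layout point concrete: once the factory-data bytes have been read from whichever medium holds them, extracting a MAC address is the same operation everywhere. The sketch below assumes a hypothetical offset within the blob and applies the same two checks as the kernel's is_valid_ether_addr() (not all-zero, not multicast); mac_from_blob and mac_is_valid are illustrative names, not existing APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Usable as a station address: not multicast/broadcast (bit 0 of the
 * first octet clear) and not all-zero. */
static bool mac_is_valid(const unsigned char mac[ETH_ALEN])
{
	static const unsigned char zero[ETH_ALEN];

	if (mac[0] & 0x01)
		return false;
	return memcmp(mac, zero, ETH_ALEN) != 0;
}

/* Copy the MAC address stored at byte offset off of a factory-data blob
 * (however it was read: MTD, UBI volume, eMMC boot partition or GPT
 * partition) and validate it. Returns 0 on success, -1 on failure. */
static int mac_from_blob(const unsigned char *blob, size_t blob_len,
			 size_t off, unsigned char mac[ETH_ALEN])
{
	assert(blob != NULL && mac != NULL);
	if (off > blob_len || blob_len - off < ETH_ALEN)
		return -1;	/* would read past the end of the blob */
	memcpy(mac, blob + off, ETH_ALEN);
	return mac_is_valid(mac) ? 0 : -1;
}
```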

Hence I fail to understand why different rules should apply to block
devices than to EEPROMs, e-fuses, raw or SPI-connected NOR or NAND
flashes, or UBI, especially as this is about something completely
optional and disabled by default.

Effectively, if an interface to reference and access block-oriented
storage devices as NVMEM providers in the same way as MTD, UBI, ... is
rejected by the Linux kernel, it just means we will have to carry it
as a downstream patch in OpenWrt in order to support those devices in a
decent way. Generating a device-specific initramfs for each and every
device would not be decent IMHO. Carrying information about all devices
in the filesystem used on every device is also not decent. Our goal is
precisely to get rid of the board-specific switch-case shell script
madness in userspace instead of adding more of it...

Traversing DT in userspace (via /sys/firmware/) would of course be
possible, but it's often simply too late (i.e. after the rootfs has been
mounted, and that includes initramfs) for many use cases (e.g. nfsroot),
and it would be a redundant implementation of things which are already
implemented in the kernel. We don't like to repeat ourselves, nor do we
like to deal with board-specific details in userland.
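For comparison, this is roughly what traversing DT from userspace involves for a string-list property such as compatible, exposed as /sys/firmware/devicetree/base/compatible: the file holds NUL-terminated strings stored back to back. The helper name and the board compatible string used in the test are hypothetical; only the encoding follows the device tree format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if want is one of the NUL-terminated strings packed into the
 * property value prop[0..len), 0 otherwise. A trailing string that is not
 * NUL-terminated within len bytes is treated as malformed and ends the scan.
 */
static int dt_stringlist_contains(const char *prop, size_t len,
				  const char *want)
{
	size_t pos = 0;

	assert(prop != NULL && want != NULL);
	while (pos < len) {
		const char *s = prop + pos;
		const char *nul = memchr(s, '\0', len - pos);

		if (!nul)
			return 0;	/* unterminated string: malformed */
		if (strcmp(s, want) == 0)
			return 1;
		pos += (size_t)(nul - s) + 1;	/* skip string and its NUL */
	}
	return 0;
}
```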

Having a complex do-it-all initramfs like those of common x86-centric
desktop or server distributions is also not an option: it would never fit
into the storage of lower-end devices with only a few megabytes of NOR
flash. You'd need two copies of libc and BusyBox (one in the initramfs and
one in the actual rootfs), and even the extreme case of a single static
ELF binary used as initrd would still occupy hundreds of kilobytes of
storage, and be hell to maintain. If that sounds like very little to
you, that means you haven't been dealing with that class of devices.


Thank you for your consideration


Daniel

Patch

diff --git a/block/Kconfig b/block/Kconfig
index 5b623b876d3b4..67cd4f92378af 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -209,6 +209,12 @@  config BLK_INLINE_ENCRYPTION_FALLBACK
 	  by falling back to the kernel crypto API when inline
 	  encryption hardware is not present.
 
+config BLOCK_NOTIFIERS
+	bool "Enable support for notifications in the block layer"
+	help
+	  Enable this option to provide notifiers for other subsystems
+	  upon addition or removal of block devices.
+
 source "block/partitions/Kconfig"
 
 config BLK_MQ_PCI
diff --git a/block/Makefile b/block/Makefile
index ddfd21c1a9ffc..a131fa7d6b26e 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -38,3 +38,4 @@  obj-$(CONFIG_BLK_INLINE_ENCRYPTION)	+= blk-crypto.o blk-crypto-profile.o \
 					   blk-crypto-sysfs.o
 obj-$(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK)	+= blk-crypto-fallback.o
 obj-$(CONFIG_BLOCK_HOLDER_DEPRECATED)	+= holder.o
+obj-$(CONFIG_BLOCK_NOTIFIERS) += blk-notify.o
diff --git a/block/blk-notify.c b/block/blk-notify.c
new file mode 100644
index 0000000000000..fd727288ea19a
--- /dev/null
+++ b/block/blk-notify.c
@@ -0,0 +1,87 @@ 
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Notifiers for addition and removal of block devices
+ *
+ * Copyright (c) 2024 Daniel Golle <daniel@makrotopia.org>
+ */
+
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/slab.h>
+#include "blk.h"
+
+struct blk_device_list {
+	struct device *dev;
+	struct list_head list;
+};
+
+static RAW_NOTIFIER_HEAD(blk_notifier_list);
+static DEFINE_MUTEX(blk_notifier_lock);
+static LIST_HEAD(blk_devices);
+
+void blk_register_notify(struct notifier_block *nb)
+{
+	struct blk_device_list *existing_blkdev;
+
+	mutex_lock(&blk_notifier_lock);
+	raw_notifier_chain_register(&blk_notifier_list, nb);
+
+	list_for_each_entry(existing_blkdev, &blk_devices, list)
+		nb->notifier_call(nb, BLK_DEVICE_ADD, existing_blkdev->dev);
+
+	mutex_unlock(&blk_notifier_lock);
+}
+EXPORT_SYMBOL_GPL(blk_register_notify);
+
+void blk_unregister_notify(struct notifier_block *nb)
+{
+	mutex_lock(&blk_notifier_lock);
+	raw_notifier_chain_unregister(&blk_notifier_list, nb);
+	mutex_unlock(&blk_notifier_lock);
+}
+EXPORT_SYMBOL_GPL(blk_unregister_notify);
+
+static int blk_call_notifier_add(struct device *dev)
+{
+	struct blk_device_list *new_blkdev;
+
+	new_blkdev = kmalloc(sizeof(*new_blkdev), GFP_KERNEL);
+	if (!new_blkdev)
+		return -ENOMEM;
+
+	new_blkdev->dev = dev;
+	mutex_lock(&blk_notifier_lock);
+	list_add_tail(&new_blkdev->list, &blk_devices);
+	raw_notifier_call_chain(&blk_notifier_list, BLK_DEVICE_ADD, dev);
+	mutex_unlock(&blk_notifier_lock);
+	return 0;
+}
+
+static void blk_call_notifier_remove(struct device *dev)
+{
+	struct blk_device_list *old_blkdev, *tmp;
+
+	mutex_lock(&blk_notifier_lock);
+	list_for_each_entry_safe(old_blkdev, tmp, &blk_devices, list) {
+		if (old_blkdev->dev != dev)
+			continue;
+
+		list_del(&old_blkdev->list);
+		kfree(old_blkdev);
+	}
+	raw_notifier_call_chain(&blk_notifier_list, BLK_DEVICE_REMOVE, dev);
+	mutex_unlock(&blk_notifier_lock);
+}
+
+static struct class_interface blk_notifications_bus_interface __refdata = {
+	.class = &block_class,
+	.add_dev = &blk_call_notifier_add,
+	.remove_dev = &blk_call_notifier_remove,
+};
+
+static int __init blk_notifications_init(void)
+{
+	return class_interface_register(&blk_notifications_bus_interface);
+}
+device_initcall(blk_notifications_init);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e85ec73a07d57..2f871158d2860 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1682,4 +1682,15 @@  static inline bool bdev_can_atomic_write(struct block_device *bdev)
 
 #define DEFINE_IO_COMP_BATCH(name)	struct io_comp_batch name = { }
 
+struct notifier_block;
+#define BLK_DEVICE_ADD		1
+#define BLK_DEVICE_REMOVE	2
+#if defined(CONFIG_BLOCK_NOTIFIERS)
+void blk_register_notify(struct notifier_block *nb);
+void blk_unregister_notify(struct notifier_block *nb);
+#else
+static inline void blk_register_notify(struct notifier_block *nb) { }
+static inline void blk_unregister_notify(struct notifier_block *nb) { }
+#endif
+
 #endif /* _LINUX_BLKDEV_H */
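The replay-on-register behaviour of blk-notify.c can be modelled in plain C to show why a late registrant does not miss devices: blk_register_notify() first joins the chain and then calls the new notifier with BLK_DEVICE_ADD for every device already on blk_devices. This is a single-threaded userspace model, not kernel code: fixed arrays stand in for list_head and the raw notifier chain, integers stand in for struct device pointers, and all model_* names are hypothetical. Locking is omitted; the patch holds blk_notifier_lock across each list update and chain call.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of blk-notify.c; no locking, single-threaded. */

#define BLK_DEVICE_ADD		1
#define BLK_DEVICE_REMOVE	2
#define MAX_DEVS		8
#define MAX_NBS			4

struct model_nb {
	void (*call)(struct model_nb *nb, unsigned long action, int dev);
	int adds, removes;	/* bookkeeping for the example listener */
};

static int devs[MAX_DEVS];		/* stands in for blk_devices */
static size_t ndevs;
static struct model_nb *chain[MAX_NBS];	/* stands in for the raw chain */
static size_t nnbs;

static void call_chain(unsigned long action, int dev)
{
	for (size_t i = 0; i < nnbs; i++)
		chain[i]->call(chain[i], action, dev);
}

/* Like blk_register_notify(): join the chain, then replay ADD for every
 * device that already exists, in the order the devices were added. */
static void model_register(struct model_nb *nb)
{
	assert(nnbs < MAX_NBS);
	chain[nnbs++] = nb;
	for (size_t i = 0; i < ndevs; i++)
		nb->call(nb, BLK_DEVICE_ADD, devs[i]);
}

/* Like blk_call_notifier_add(): append the device, then notify. */
static void model_add(int dev)
{
	assert(ndevs < MAX_DEVS);
	devs[ndevs++] = dev;
	call_chain(BLK_DEVICE_ADD, dev);
}

/* Like blk_call_notifier_remove(): drop the device, then notify. */
static void model_remove(int dev)
{
	for (size_t i = 0; i < ndevs; i++) {
		if (devs[i] != dev)
			continue;
		for (size_t j = i + 1; j < ndevs; j++)
			devs[j - 1] = devs[j];	/* keep insertion order */
		ndevs--;
		break;
	}
	call_chain(BLK_DEVICE_REMOVE, dev);
}

/* Example listener: counts the events it receives. */
static void counting_cb(struct model_nb *nb, unsigned long action, int dev)
{
	(void)dev;
	if (action == BLK_DEVICE_ADD)
		nb->adds++;
	else if (action == BLK_DEVICE_REMOVE)
		nb->removes++;
}
```

In the patch, the replay and the chain registration happen under the same mutex as every add and remove, so each device is reported to each notifier exactly once even when registration races with probing; the model cannot show that property because it has no concurrency.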