diff mbox series

dm: make it possible to open underlying devices in shareable mode

Message ID dcaed20c-2132-d453-9c04-b22417ba4010@redhat.com (mailing list archive)
State New
Headers show
Series dm: make it possible to open underlying devices in shareable mode | expand

Commit Message

Mikulas Patocka Nov. 11, 2024, 4:49 p.m. UTC
Hi

On Fri, 8 Nov 2024, Jooyung Han wrote:

> Hi Mikulas,
>
> Would it be reasonable to allow this? Or, to reduce the risk or impact 
> on other users, how about introducing a flag for dm-linear constructor 
> to open a table device without setting the holder. I don't know how to 
> describe this, not-exclusive? The dm-linear constructor would look like 
> "<dev_path> <offset> [no_exclusive]", when "no_exclusive" is set, it 
> will open a table device without setting a holder.
> 
> I'd like to get your opinion about this idea. Especially if there's any 
> risks about not setting the holder when opening a DM table device. 
> 
> Besides, there's one more question about dm-verity. Even though there's 
> a field named "data_start" (comment saying it's "data offset in 512-byte 
> sectors"): https://elixir.bootlin.com/linux/v6.11.4/source/drivers/md/dm-verity.h#L49, 
> it's not set in its constructor and always zero after initialization. I 
> think it's reasonable for the constructor to set it with an argument 
> passed. What do you think?

Yes, we could set it in the constructor.

> Thanks.

I've created this (totally untested) patch that makes it possible to add a 
flag DM_OPEN_SHARED to the flags in the create ioctl - when the flag is 
used, all devices open by the specified device are open in a shared mode. 
Does the patch work for you?

Mikulas



From: Mikulas Patocka <mpatocka@redhat.com>

This commit adds a new flag DM_OPEN_SHARED. The flag may be used on the
create ioctl. When it is used, the underlying devices that are opened
will be open in a shareable mode, so that the user can mount filesystems
on them or perform other activity.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-core.h          |    1 +
 drivers/md/dm-ioctl.c         |    4 ++--
 drivers/md/dm.c               |    9 +++++++--
 include/linux/device-mapper.h |    2 +-
 include/uapi/linux/dm-ioctl.h |    6 ++++++
 5 files changed, 17 insertions(+), 5 deletions(-)

Comments

Jooyung Han Nov. 12, 2024, 6:26 a.m. UTC | #1
Thanks Mikulas for the patch!

This works like a charm. Here's how I tested: On Android, for an APEX
file in /system partition, I create a dm-linear device with "shared"
mode on top of the dm-verity device underlying the system partition.
Previously, this failed with EBUSY because the dm-verity device is
owned by the /system filesystem. But I was able to create it in a
shared mode with your patch applied. Thanks again!

Jooyung

On Tue, Nov 12, 2024 at 1:49 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> Hi
>
> On Fri, 8 Nov 2024, Jooyung Han wrote:
>
> > Hi Mikulas,
> >
> > Would it be reasonable to allow this? Or, to reduce the risk or impact
> > on other users, how about introducing a flag for dm-linear constructor
> > to open a table device without setting the holder. I don't know how to
> > describe this, not-exclusive? The dm-linear constructor would look like
> > "<dev_path> <offset> [no_exclusive]", when "no_exclusive" is set, it
> > will open a table device without setting a holder.
> >
> > I'd like to get your opinion about this idea. Especially if there's any
> > risks about not setting the holder when opening a DM table device.
> >
> > Besides, there's one more question about dm-verity. Even though there's
> > a field named "data_start" (comment saying it's "data offset in 512-byte
> > sectors"): https://elixir.bootlin.com/linux/v6.11.4/source/drivers/md/dm-verity.h#L49,
> > it's not set in its constructor and always zero after initialization. I
> > think it's reasonable for the constructor to set it with an argument
> > passed. What do you think?
>
> Yes, we could set it in the constructor.
>
> > Thanks.
>
> I've created this (totally untested) patch that makes it possible to add a
> flag DM_OPEN_SHARED to the flags in the create ioctl - when the flag is
> used, all devices open by the specified device are open in a shared mode.
> Does the patch work for you?
>
> Mikulas
>
>
>
> From: Mikulas Patocka <mpatocka@redhat.com>
>
> This commit adds a new flag DM_OPEN_SHARED. The flag may be used on the
> create ioctl. When it is used, the underlying devices that are opened
> will be open in a shareable mode, so that the user can mount filesystems
> on them or perform other activity.
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
>
> ---
>  drivers/md/dm-core.h          |    1 +
>  drivers/md/dm-ioctl.c         |    4 ++--
>  drivers/md/dm.c               |    9 +++++++--
>  include/linux/device-mapper.h |    2 +-
>  include/uapi/linux/dm-ioctl.h |    6 ++++++
>  5 files changed, 17 insertions(+), 5 deletions(-)
>
> Index: linux-2.6/drivers/md/dm-core.h
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-core.h 2024-07-30 14:06:48.000000000 +0200
> +++ linux-2.6/drivers/md/dm-core.h      2024-11-11 17:34:13.000000000 +0100
> @@ -161,6 +161,7 @@ struct mapped_device {
>  #define DMF_SUSPENDED_INTERNALLY 7
>  #define DMF_POST_SUSPENDING 8
>  #define DMF_EMULATE_ZONE_APPEND 9
> +#define DMF_OPEN_SHARED 10
>
>  void disable_discard(struct mapped_device *md);
>  void disable_write_zeroes(struct mapped_device *md);
> Index: linux-2.6/drivers/md/dm-ioctl.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-ioctl.c        2024-09-02 17:35:39.000000000 +0200
> +++ linux-2.6/drivers/md/dm-ioctl.c     2024-11-11 17:35:51.000000000 +0100
> @@ -888,7 +888,7 @@ static int dev_create(struct file *filp,
>         if (param->flags & DM_PERSISTENT_DEV_FLAG)
>                 m = MINOR(huge_decode_dev(param->dev));
>
> -       r = dm_create(m, &md);
> +       r = dm_create(m, param->flags, &md);
>         if (r)
>                 return r;
>
> @@ -2283,7 +2283,7 @@ int __init dm_early_create(struct dm_ioc
>                 m = MINOR(huge_decode_dev(dmi->dev));
>
>         /* alloc dm device */
> -       r = dm_create(m, &md);
> +       r = dm_create(m, 0, &md);
>         if (r)
>                 return r;
>
> Index: linux-2.6/drivers/md/dm.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm.c      2024-11-07 12:49:13.000000000 +0100
> +++ linux-2.6/drivers/md/dm.c   2024-11-11 17:35:05.000000000 +0100
> @@ -737,7 +737,9 @@ static struct table_device *open_table_d
>                 return ERR_PTR(-ENOMEM);
>         refcount_set(&td->count, 1);
>
> -       bdev_file = bdev_file_open_by_dev(dev, mode, _dm_claim_ptr, NULL);
> +       bdev_file = bdev_file_open_by_dev(dev, mode,
> +               test_bit(DMF_OPEN_SHARED, &md->flags) ? NULL :_dm_claim_ptr,
> +               NULL);
>         if (IS_ERR(bdev_file)) {
>                 r = PTR_ERR(bdev_file);
>                 goto out_free_td;
> @@ -2489,7 +2491,7 @@ static struct dm_table *__unbind(struct
>  /*
>   * Constructor for a new device.
>   */
> -int dm_create(int minor, struct mapped_device **result)
> +int dm_create(int minor, uint32_t flags, struct mapped_device **result)
>  {
>         struct mapped_device *md;
>
> @@ -2497,6 +2499,9 @@ int dm_create(int minor, struct mapped_d
>         if (!md)
>                 return -ENXIO;
>
> +       if (flags & DM_OPEN_SHARED)
> +               set_bit(DMF_OPEN_SHARED, &md->flags);
> +
>         dm_ima_reset_data(md);
>
>         *result = md;
> Index: linux-2.6/include/linux/device-mapper.h
> ===================================================================
> --- linux-2.6.orig/include/linux/device-mapper.h        2024-10-16 16:41:33.000000000 +0200
> +++ linux-2.6/include/linux/device-mapper.h     2024-11-11 17:31:32.000000000 +0100
> @@ -483,7 +483,7 @@ void dm_consume_args(struct dm_arg_set *
>   * DM_ANY_MINOR chooses the next available minor number.
>   */
>  #define DM_ANY_MINOR (-1)
> -int dm_create(int minor, struct mapped_device **md);
> +int dm_create(int minor, uint32_t flags, struct mapped_device **md);
>
>  /*
>   * Reference counting for md.
> Index: linux-2.6/include/uapi/linux/dm-ioctl.h
> ===================================================================
> --- linux-2.6.orig/include/uapi/linux/dm-ioctl.h        2023-09-05 14:46:02.000000000 +0200
> +++ linux-2.6/include/uapi/linux/dm-ioctl.h     2024-11-11 17:33:33.000000000 +0100
> @@ -382,4 +382,10 @@ enum {
>   */
>  #define DM_IMA_MEASUREMENT_FLAG        (1 << 19) /* In */
>
> +/*
> + * Open underlying devices in the shareable mode, so that the user can mount
> + * filesystems on them.
> + */
> +#define DM_OPEN_SHARED         (1 << 20) /* In */
> +
>  #endif                         /* _LINUX_DM_IOCTL_H */
Mikulas Patocka Nov. 13, 2024, 5:25 p.m. UTC | #2
On Tue, 12 Nov 2024, Jooyung Han wrote:

> Thanks Mikulas for the patch!
> 
> This works like a charm. Here's how I tested: On Android, for an APEX
> file in /system partition, I create a dm-linear device with "shared"
> mode on top of the dm-verity device underlying the system partition.
> Previously, this failed with EBUSY because the dm-verity device is
> owned by the /system filesystem. But I was able to create it in a
> shared mode with your patch applied. Thanks again!
> 
> Jooyung

Hi

I discussed this patch with Alasdair Kergon <agk@redhat.com> and he said 
that it may confuse dependencies in udev, so the patch will likely not go 
in.

But there are some alternatives:

1. create a linear device that maps the dm-verity device in 1:1 
relationship. Then, mount the filesystem on this linear device. The 
filesystem will claim only this linear device, so that you can create more 
linear devices on the top of the dm-verity device.

Will there be some performance degradation due to the extra linear device? 
Could you measure it?

2. use the loop device and enable direct I/O mode (use the ioctl 
LOOP_SET_DIRECT_IO) - this should avoid double-caching and it should 
increase the performance of the loop device. Does it improve performance?

3. use two dm-verity instances over one physical device - then, you can 
mount the filesystem on one instance and create linear devices on the 
other.

4. use the dm-loop target - https://www.sourceware.org/lvm2/wiki/DMLoop 
https://patchwork.kernel.org/project/dm-devel/patch/1b84af841912065fc57cfe395d5214f4eee0f0fc.1516124587.git.heinzm@redhat.com/ 
Some times ago we had a dm-loop target that queried the filesystem for a 
layout of a file and then remapped the bios quickly according to this 
layout tree. But now it's unmaintained and not updated to the latest 
kernel. Perhaps it could be used for your purpose. I'm not sure what 
filesystme maintainers will say about it.

Mikulas
Jooyung Han Nov. 14, 2024, 1:32 a.m. UTC | #3
Thank you Mikulas and Alasdair for your thoughts and ideas about alternatives.

I'm aware of those alternatives. Android already uses DIO for loop
devices for read performance for APEX mounts. Having an extra DM
device was in fact our immediate fallback solution when I found this
issue. Then, I thought that, if DM-device owned by another DM-device
can be shared, why not share it with another read-only filesystem.

> confuse dependencies in udev,

What would it be like when a DM device is shared by two other DM
devices like your suggestion #1 or #3? In #1 for example, a dm-verity
device is shared by two dm-linear devices: one for /system filesystem
mount and the other for APEX mount. I hope we can fix the issue around
udev dependencies if there's any.

> Will there be some performance degradation due to the extra linear device?
> Could you measure it?

I'll! There're two aspects: device creation and read performance. Both
will affect syshealth. Fewer devices should definitely be a win.
That's why I brought this to you ;-) If your patch can't make it, #1
is most likely the only option I have so I hope there's not much
degradation.

Thanks again.

On Thu, Nov 14, 2024 at 2:25 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
>
> On Tue, 12 Nov 2024, Jooyung Han wrote:
>
> > Thanks Mikulas for the patch!
> >
> > This works like a charm. Here's how I tested: On Android, for an APEX
> > file in /system partition, I create a dm-linear device with "shared"
> > mode on top of the dm-verity device underlying the system partition.
> > Previously, this failed with EBUSY because the dm-verity device is
> > owned by the /system filesystem. But I was able to create it in a
> > shared mode with your patch applied. Thanks again!
> >
> > Jooyung
>
> Hi
>
> I discussed this patch with Alasdair Kergon <agk@redhat.com> and he said
> that it may confuse dependencies in udev, so the patch will likely not go
> in.
>
> But there are some alternatives:
>
> 1. create a linear device that maps the dm-verity device in 1:1
> relationship. Then, mount the filesystem on this linear device. The
> filesystem will claim only this linear device, so that you can create more
> linear devices on the top of the dm-verity device.
>
> Will there be some performance degradation due to the extra linear device?
> Could you measure it?
>
> 2. use the loop device and enable direct I/O mode (use the ioctl
> LOOP_SET_DIRECT_IO) - this should avoid double-caching and it should
> increase the performance of the loop device. Does it improve performance?
>
> 3. use two dm-verity instances over one physical device - then, you can
> mount the filesystem on one instance and create linear devices on the
> other.
>
> 4. use the dm-loop target - https://www.sourceware.org/lvm2/wiki/DMLoop
> https://patchwork.kernel.org/project/dm-devel/patch/1b84af841912065fc57cfe395d5214f4eee0f0fc.1516124587.git.heinzm@redhat.com/
> Some times ago we had a dm-loop target that queried the filesystem for a
> layout of a file and then remapped the bios quickly according to this
> layout tree. But now it's unmaintained and not updated to the latest
> kernel. Perhaps it could be used for your purpose. I'm not sure what
> filesystme maintainers will say about it.
>
> Mikulas
>
diff mbox series

Patch

Index: linux-2.6/drivers/md/dm-core.h
===================================================================
--- linux-2.6.orig/drivers/md/dm-core.h	2024-07-30 14:06:48.000000000 +0200
+++ linux-2.6/drivers/md/dm-core.h	2024-11-11 17:34:13.000000000 +0100
@@ -161,6 +161,7 @@  struct mapped_device {
 #define DMF_SUSPENDED_INTERNALLY 7
 #define DMF_POST_SUSPENDING 8
 #define DMF_EMULATE_ZONE_APPEND 9
+#define DMF_OPEN_SHARED 10
 
 void disable_discard(struct mapped_device *md);
 void disable_write_zeroes(struct mapped_device *md);
Index: linux-2.6/drivers/md/dm-ioctl.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-ioctl.c	2024-09-02 17:35:39.000000000 +0200
+++ linux-2.6/drivers/md/dm-ioctl.c	2024-11-11 17:35:51.000000000 +0100
@@ -888,7 +888,7 @@  static int dev_create(struct file *filp,
 	if (param->flags & DM_PERSISTENT_DEV_FLAG)
 		m = MINOR(huge_decode_dev(param->dev));
 
-	r = dm_create(m, &md);
+	r = dm_create(m, param->flags, &md);
 	if (r)
 		return r;
 
@@ -2283,7 +2283,7 @@  int __init dm_early_create(struct dm_ioc
 		m = MINOR(huge_decode_dev(dmi->dev));
 
 	/* alloc dm device */
-	r = dm_create(m, &md);
+	r = dm_create(m, 0, &md);
 	if (r)
 		return r;
 
Index: linux-2.6/drivers/md/dm.c
===================================================================
--- linux-2.6.orig/drivers/md/dm.c	2024-11-07 12:49:13.000000000 +0100
+++ linux-2.6/drivers/md/dm.c	2024-11-11 17:35:05.000000000 +0100
@@ -737,7 +737,9 @@  static struct table_device *open_table_d
 		return ERR_PTR(-ENOMEM);
 	refcount_set(&td->count, 1);
 
-	bdev_file = bdev_file_open_by_dev(dev, mode, _dm_claim_ptr, NULL);
+	bdev_file = bdev_file_open_by_dev(dev, mode,
+		test_bit(DMF_OPEN_SHARED, &md->flags) ? NULL :_dm_claim_ptr,
+		NULL);
 	if (IS_ERR(bdev_file)) {
 		r = PTR_ERR(bdev_file);
 		goto out_free_td;
@@ -2489,7 +2491,7 @@  static struct dm_table *__unbind(struct
 /*
  * Constructor for a new device.
  */
-int dm_create(int minor, struct mapped_device **result)
+int dm_create(int minor, uint32_t flags, struct mapped_device **result)
 {
 	struct mapped_device *md;
 
@@ -2497,6 +2499,9 @@  int dm_create(int minor, struct mapped_d
 	if (!md)
 		return -ENXIO;
 
+	if (flags & DM_OPEN_SHARED)
+		set_bit(DMF_OPEN_SHARED, &md->flags);
+
 	dm_ima_reset_data(md);
 
 	*result = md;
Index: linux-2.6/include/linux/device-mapper.h
===================================================================
--- linux-2.6.orig/include/linux/device-mapper.h	2024-10-16 16:41:33.000000000 +0200
+++ linux-2.6/include/linux/device-mapper.h	2024-11-11 17:31:32.000000000 +0100
@@ -483,7 +483,7 @@  void dm_consume_args(struct dm_arg_set *
  * DM_ANY_MINOR chooses the next available minor number.
  */
 #define DM_ANY_MINOR (-1)
-int dm_create(int minor, struct mapped_device **md);
+int dm_create(int minor, uint32_t flags, struct mapped_device **md);
 
 /*
  * Reference counting for md.
Index: linux-2.6/include/uapi/linux/dm-ioctl.h
===================================================================
--- linux-2.6.orig/include/uapi/linux/dm-ioctl.h	2023-09-05 14:46:02.000000000 +0200
+++ linux-2.6/include/uapi/linux/dm-ioctl.h	2024-11-11 17:33:33.000000000 +0100
@@ -382,4 +382,10 @@  enum {
  */
 #define DM_IMA_MEASUREMENT_FLAG	(1 << 19) /* In */
 
+/*
+ * Open underlying devices in the shareable mode, so that the user can mount
+ * filesystems on them.
+ */
+#define DM_OPEN_SHARED		(1 << 20) /* In */
+
 #endif				/* _LINUX_DM_IOCTL_H */