[RFC,0/5] device mapper atomic write support

Message ID 20250106124119.1318428-1-john.g.garry@oracle.com (mailing list archive)

Message

John Garry Jan. 6, 2025, 12:41 p.m. UTC
This series introduces initial device mapper atomic write support.

Since we already support stacking atomic writes limits, it's quite
straightforward to support.

Only dm-linear is supported for now, but other personalities could
be supported.

Patch #1 is a proper fix, but the rest of the series is RFC - this is
because I have not fully tested and we are close to the end of this
development cycle.

Based on v6.13-rc6

John Garry (5):
  block: Ensure start sector is aligned for stacking atomic writes
  block: Change blk_stack_atomic_writes_limits() unit_min check
  dm-table: Atomic writes support
  dm: Ensure cloned bio is same length for atomic write
  dm-linear: Enable atomic writes

 block/blk-settings.c          |  9 ++++++---
 drivers/md/dm-linear.c        |  3 ++-
 drivers/md/dm-table.c         | 12 ++++++++++++
 drivers/md/dm.c               |  3 +++
 include/linux/blkdev.h        | 21 ++++++++++++---------
 include/linux/device-mapper.h |  3 +++
 6 files changed, 38 insertions(+), 13 deletions(-)

Comments

Mike Snitzer Jan. 6, 2025, 5:26 p.m. UTC | #1
On Mon, Jan 06, 2025 at 12:41:14PM +0000, John Garry wrote:
> This series introduces initial device mapper atomic write support.
> 
> Since we already support stacking atomic writes limits, it's quite
> straightforward to support.
> 
> Only dm-linear is supported for now, but other personalities could
> be supported.
> 
> Patch #1 is a proper fix, but the rest of the series is RFC - this is
> because I have not fully tested and we are close to the end of this
> development cycle.

In general, looks reasonable.  But I would prefer to see atomic write
support added to dm-striped as well.  Not that I have some need, but
because it will help verify the correctness of the general stacking
code changes (in both block and DM core).  I wrote and/or fixed a fair
amount of the non-atomic block limits stacking code over the
years.. so this is just me trying to inform this effort based on
limits stacking gotchas we've experienced to this point.

Looks like adding dm-striped support would just need to ensure that
the chunk_size is a multiple of the atomic write size (so chunk_size >=
atomic write size).

Relative to linear, testing limits stacking in terms of linear should
also verify that concatenated volumes work.
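One property a concatenated-volume test would need to exercise is that an atomic write never straddles the boundary between two linear targets, since that would force a split. Below is a minimal user-space sketch of that check, not kernel code: `struct linear_target` and `write_fits_one_target()` are hypothetical names, and the table is assumed to hold targets described by a start sector and a length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of one entry in a linear (concatenated) table. */
struct linear_target {
	uint64_t start; /* first logical sector covered by this target */
	uint64_t len;   /* number of sectors covered by this target */
};

/*
 * Return true if [sector, sector + nr_sectors) lies entirely inside a single
 * target, i.e. the write would not need to be split at a target boundary.
 */
static bool write_fits_one_target(const struct linear_target *tbl, size_t n,
				  uint64_t sector, uint64_t nr_sectors)
{
	assert(nr_sectors > 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t end = tbl[i].start + tbl[i].len;

		/* Find the target that contains the first sector of the write. */
		if (sector >= tbl[i].start && sector < end)
			return nr_sectors <= end - sector;
	}

	return false; /* the start sector is beyond the end of the table */
}
```

With two targets covering sectors 0-1023 and 1024-3071, an 8-sector write at sector 1016 fits in the first target, while the same write at sector 1020 crosses the boundary and is rejected by this check.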

Thanks,
Mike

John Garry Jan. 6, 2025, 6:14 p.m. UTC | #2
On 06/01/2025 17:26, Mike Snitzer wrote:
> On Mon, Jan 06, 2025 at 12:41:14PM +0000, John Garry wrote:
>> This series introduces initial device mapper atomic write support.
>>
>> Since we already support stacking atomic writes limits, it's quite
>> straightforward to support.
>>
>> Only dm-linear is supported for now, but other personalities could
>> be supported.
>>
>> Patch #1 is a proper fix, but the rest of the series is RFC - this is
>> because I have not fully tested and we are close to the end of this
>> development cycle.
> In general, looks reasonable.  But I would prefer to see atomic write
> support added to dm-striped as well.  Not that I have some need, but
> because it will help verify the correctness of the general stacking
> code changes (in both block and DM core). 

That should be fine. We already have md raid0 support working (for 
atomic writes), so I would expect much of the required support is 
already available.

> I wrote and/or fixed a fair
> amount of the non-atomic block limits stacking code over the
> years.. so this is just me trying to inform this effort based on
> limits stacking gotchas we've experienced to this point.

Yeah, understood. And that is why I am on the lookout for points at which
we try to split atomic writes in the submission path. The only reason
that it should happen is due to the limits being incorrectly calculated.

> 
> Looks like adding dm-striped support would just need to ensure that
> the chunk_size is a multiple of the atomic write size (so chunk_size >=
> atomic write size).

Right, so the block queue limits code already will throttle the atomic 
write max so that chunk_size % atomic write upper limit == 0.
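As a rough illustration of that invariant, the sketch below reduces a power-of-two atomic write max until it divides the chunk size. It is a user-space model only: `clamp_atomic_max_to_chunk()` is a hypothetical name, both values are assumed to be in the same unit (bytes here), and the actual logic in the block limits stacking code may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: halve a power-of-two atomic write max until
 * chunk_size % atomic_max == 0. The loop always terminates because
 * any chunk size is a multiple of 1.
 */
static uint32_t clamp_atomic_max_to_chunk(uint32_t atomic_max,
					  uint32_t chunk_size)
{
	assert(chunk_size > 0);
	/* atomic_max must be a non-zero power of two */
	assert(atomic_max > 0 && (atomic_max & (atomic_max - 1)) == 0);

	while (chunk_size % atomic_max)
		atomic_max >>= 1;

	return atomic_max;
}
```

For example, a 64 KiB atomic write max on a 96 KiB chunk size would come out as 32 KiB, and on a 4 KiB chunk size as 4 KiB.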

> 
> Relative to linear, testing limits stacking in terms of linear should
> also verify that concatenated volumes work.

ok,

Thanks,
John

Mikulas Patocka Jan. 7, 2025, 5:13 p.m. UTC | #3
On Mon, 6 Jan 2025, John Garry wrote:

> On 06/01/2025 17:26, Mike Snitzer wrote:
> > On Mon, Jan 06, 2025 at 12:41:14PM +0000, John Garry wrote:
> > > This series introduces initial device mapper atomic write support.
> > > 
> > > Since we already support stacking atomic writes limits, it's quite
> > > straightforward to support.
> > > 
> > > Only dm-linear is supported for now, but other personalities could
> > > be supported.
> > > 
> > > Patch #1 is a proper fix, but the rest of the series is RFC - this is
> > > because I have not fully tested and we are close to the end of this
> > > development cycle.
> > In general, looks reasonable.  But I would prefer to see atomic write
> > support added to dm-striped as well.  Not that I have some need, but
> > because it will help verify the correctness of the general stacking
> > code changes (in both block and DM core). 
> 
> That should be fine. We already have md raid0 support working (for atomic
> writes), so I would expect much of the required support is already available.

BTW. could it be possible to add dm-mirror support as well? dm-mirror is 
used when the user moves the logical volume to another physical volume, so 
it would be nice if this worked without resulting in not-supported errors.

dm-mirror uses dm-io to perform the writes on multiple mirror legs (see 
the function do_write() -> dm_io()), I looked at the code and it seems 
that the support for atomic writes in dm-mirror and dm-io would be 
straightforward.

Another possibility would be dm-snapshot support, assuming that the atomic 
i/o size <= snapshot chunk size, the support should be easy - i.e. just 
pass the flag REQ_ATOMIC through. Perhaps it could be supported for 
dm-thin as well.
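The condition that makes the pass-through safe is that the atomic write lands inside a single chunk. A minimal user-space sketch of that check follows; `atomic_write_within_chunk()` is a hypothetical name and not a dm-snapshot function, and all values are assumed to be in sectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: return true if the write
 * [sector, sector + nr_sectors) starts and ends in the same chunk,
 * so it could be passed through without being split.
 */
static bool atomic_write_within_chunk(uint64_t sector, uint32_t nr_sectors,
				      uint32_t chunk_sectors)
{
	assert(nr_sectors > 0 && chunk_sectors > 0);

	uint64_t first_chunk = sector / chunk_sectors;
	uint64_t last_chunk = (sector + nr_sectors - 1) / chunk_sectors;

	return first_chunk == last_chunk;
}
```

If atomic writes are naturally aligned to their own power-of-two length and the chunk size is also a power of two (both assumptions here), then any atomic write no larger than the chunk passes this check, which is why the size condition alone would be enough.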

Mikulas

John Garry Jan. 7, 2025, 5:58 p.m. UTC | #4
On 07/01/2025 17:13, Mikulas Patocka wrote:
> On Mon, 6 Jan 2025, John Garry wrote:
> 
>> On 06/01/2025 17:26, Mike Snitzer wrote:
>>> On Mon, Jan 06, 2025 at 12:41:14PM +0000, John Garry wrote:
>>>> This series introduces initial device mapper atomic write support.
>>>>
>>>> Since we already support stacking atomic writes limits, it's quite
>>>> straightforward to support.
>>>>
>>>> Only dm-linear is supported for now, but other personalities could
>>>> be supported.
>>>>
>>>> Patch #1 is a proper fix, but the rest of the series is RFC - this is
>>>> because I have not fully tested and we are close to the end of this
>>>> development cycle.
>>> In general, looks reasonable.  But I would prefer to see atomic write
>>> support added to dm-striped as well.  Not that I have some need, but
>>> because it will help verify the correctness of the general stacking
>>> code changes (in both block and DM core).
>> That should be fine. We already have md raid0 support working (for atomic
>> writes), so I would expect much of the required support is already available.
> BTW. could it be possible to add dm-mirror support as well? dm-mirror is
> used when the user moves the logical volume to another physical volume, so
> it would be nice if this worked without resulting in not-supported errors.
> 
> dm-mirror uses dm-io to perform the writes on multiple mirror legs (see
> the function do_write() -> dm_io()), I looked at the code and it seems
> that the support for atomic writes in dm-mirror and dm-io would be
> straightforward.

FWIW, we do support atomic writes for md raid1. The key principle is
that we atomically write to each disk. Obviously we cannot write to
multiple disks atomically. So the copies in each mirror may be
out-of-sync after an unexpected power failure, but that is ok as each
copy will have either all old or all new data, which is what we guarantee.

> 
> Another possibility would be dm-snapshot support, assuming that the atomic
> i/o size <= snapshot chunk size, the support should be easy - i.e. just
> pass the flag REQ_ATOMIC through. Perhaps it could be supported for
> dm-thin as well.

Do you think that there will be users for these?

Atomic writes provide guarantees for users, and it would be hard to
detect when these guarantees become broken through software bugs. I
would just be concerned that we enable atomic writes for many of these
more complicated personalities, and they are not actively used and break.

Thanks,
John

Mikulas Patocka Jan. 7, 2025, 6:56 p.m. UTC | #5
On Tue, 7 Jan 2025, John Garry wrote:

> On 07/01/2025 17:13, Mikulas Patocka wrote:
> > On Mon, 6 Jan 2025, John Garry wrote:
> > 
> > BTW. could it be possible to add dm-mirror support as well? dm-mirror is
> > used when the user moves the logical volume to another physical volume, so
> > it would be nice if this worked without resulting in not-supported errors.
> > 
> > dm-mirror uses dm-io to perform the writes on multiple mirror legs (see
> > the function do_write() -> dm_io()), I looked at the code and it seems
> > that the support for atomic writes in dm-mirror and dm-io would be
> > straightforward.
> 
> FWIW, we do support atomic writes for md raid1. The key principle is that we
> atomically write to each disk. Obviously we cannot write to multiple disks
> atomically. So the copies in each mirror may be out-of-sync after an
> unexpected power failure, but that is ok as each copy will have either all
> old or all new data, which is what we guarantee.

Yes - something like that can be implemented for dm-mirror too.

> > Another possibility would be dm-snapshot support, assuming that the atomic
> > i/o size <= snapshot chunk size, the support should be easy - i.e. just
> > pass the flag REQ_ATOMIC through. Perhaps it could be supported for
> > dm-thin as well.
> 
> Do you think that there will be users for these?
> 
> Atomic writes provide guarantees for users, and it would be hard to detect
> when these guarantees become broken through software bugs. I would just be
> concerned that we enable atomic writes for many of these more complicated
> personalities, and they are not actively used and break.
> 
> Thanks,
> John

dm-snapshot is not used much, but dm-thin is. I added Joe to the
recipients list, so that he can decide whether dm-thin should support
atomic writes or not.

Mikulas