diff mbox

dm thin: superblock may write succeed before other metadata blocks because of wirting metadata in async mode.

Message ID 20180522005336.GA30152@yyp. (mailing list archive)
State Rejected, archived
Delegated to: Mike Snitzer
Headers show

Commit Message

monty_pavel@sina.com May 22, 2018, 12:53 a.m. UTC
If dm_bufio_write_dirty_buffers func is called by __commit_transaction
func and power loss happens during executing it, coincidencely
superblock wrote correctly but some metadata blocks didn't. The reason
is we write all metadata in async mode. We can guarantee that we send
superblock after other blocks but we cannot guarantee that superblock
write completely early than other blocks.
So, We need to commit other metadata blocks before change superblock.

Signed-off-by: Monty Pavel <monty_pavel@sina.com>
---
 drivers/md/dm-thin-metadata.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

Comments

Mike Snitzer June 19, 2018, 1:11 p.m. UTC | #1
On Mon, May 21 2018 at  8:53pm -0400,
Monty Pavel <monty_pavel@sina.com> wrote:

> 
> If dm_bufio_write_dirty_buffers func is called by __commit_transaction
> func and power loss happens during executing it, coincidencely
> superblock wrote correctly but some metadata blocks didn't. The reason
> is we write all metadata in async mode. We can guarantee that we send
> superblock after other blocks but we cannot guarantee that superblock
> write completely early than other blocks.
> So, We need to commit other metadata blocks before change superblock.
> 
> Signed-off-by: Monty Pavel <monty_pavel@sina.com>
> ---
>  drivers/md/dm-thin-metadata.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
> index 36ef284..897d7d6 100644
> --- a/drivers/md/dm-thin-metadata.c
> +++ b/drivers/md/dm-thin-metadata.c
> @@ -813,6 +813,14 @@ static int __commit_transaction(struct dm_pool_metadata *pmd)
>  	if (r)
>  		return r;
>  
> +	r = dm_tm_commit(pmd->tm, sblock);
> +	if (r)
> +		return r;
> +
> +	r = superblock_lock(pmd, &sblock);
> +	if (r)
> +		return r;
> +
>  	disk_super = dm_block_data(sblock);
>  	disk_super->time = cpu_to_le32(pmd->time);
>  	disk_super->data_mapping_root = cpu_to_le64(pmd->root);
> -- 
> 1.7.1

Have you actually found this patch to be effective?  It should be
unnecessary.  But I must admit that in looking at the related code I
couldn't convince myself it was.

But then Joe pointed me to this comment block from
dm-transaction-manager.h:

/*
 * We use a 2-phase commit here.
 *
 * i) Make all changes for the transaction *except* for the superblock.
 * Then call dm_tm_pre_commit() to flush them to disk.
 *
 * ii) Lock your superblock.  Update.  Then call dm_tm_commit() which will
 * unlock the superblock and flush it.  No other blocks should be updated
 * during this period.  Care should be taken to never unlock a partially
 * updated superblock; perform any operations that could fail *before* you
 * take the superblock lock.
 */
int dm_tm_pre_commit(struct dm_transaction_manager *tm);
int dm_tm_commit(struct dm_transaction_manager *tm, struct dm_block *superblock);

So given __commit_transaction() is using dm_tm_pre_commit() prior to the
dm_tm_commit() to flush the superblock -- it would seem that there isn't
any conceptual potential for corruption.

If you've found the dm_tm_pre_commit() to be lacking (whereby not all
metadata getting flushed to disk before the superblock) then please
explain your findings.

Thanks,
Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Joe Thornber June 19, 2018, 2:43 p.m. UTC | #2
On Tue, Jun 19, 2018 at 09:11:06AM -0400, Mike Snitzer wrote:
> On Mon, May 21 2018 at  8:53pm -0400,
> Monty Pavel <monty_pavel@sina.com> wrote:
> 
> > 
> > If dm_bufio_write_dirty_buffers func is called by __commit_transaction
> > func and power loss happens during executing it, coincidencely
> > superblock wrote correctly but some metadata blocks didn't. The reason
> > is we write all metadata in async mode. We can guarantee that we send
> > superblock after other blocks but we cannot guarantee that superblock
> > write completely early than other blocks.
> > So, We need to commit other metadata blocks before change superblock.
> > 
> > Signed-off-by: Monty Pavel <monty_pavel@sina.com>
> > ---
> >  drivers/md/dm-thin-metadata.c |    8 ++++++++
> >  1 files changed, 8 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
> > index 36ef284..897d7d6 100644
> > --- a/drivers/md/dm-thin-metadata.c
> > +++ b/drivers/md/dm-thin-metadata.c
> > @@ -813,6 +813,14 @@ static int __commit_transaction(struct dm_pool_metadata *pmd)
> >  	if (r)
> >  		return r;
> >  
> > +	r = dm_tm_commit(pmd->tm, sblock);
> > +	if (r)
> > +		return r;
> > +
> > +	r = superblock_lock(pmd, &sblock);
> > +	if (r)
> > +		return r;
> > +
> >  	disk_super = dm_block_data(sblock);
> >  	disk_super->time = cpu_to_le32(pmd->time);
> >  	disk_super->data_mapping_root = cpu_to_le64(pmd->root);

I don't believe you've tested this; sblock is passed to dm_tm_commit()
uninitialised, and you didn't even bother to remove the later (and correct)
call to dm_tm_commit().  See dm-transaction-manager.h for an explanation of
how the 2-phase commit works.

What is the issue that started you looking in this area?

- Joe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mike Snitzer June 19, 2018, 3 p.m. UTC | #3
On Tue, Jun 19 2018 at 10:43am -0400,
Joe Thornber <thornber@redhat.com> wrote:

> On Tue, Jun 19, 2018 at 09:11:06AM -0400, Mike Snitzer wrote:
> > On Mon, May 21 2018 at  8:53pm -0400,
> > Monty Pavel <monty_pavel@sina.com> wrote:
> > 
> > > 
> > > If dm_bufio_write_dirty_buffers func is called by __commit_transaction
> > > func and power loss happens during executing it, coincidencely
> > > superblock wrote correctly but some metadata blocks didn't. The reason
> > > is we write all metadata in async mode. We can guarantee that we send
> > > superblock after other blocks but we cannot guarantee that superblock
> > > write completely early than other blocks.
> > > So, We need to commit other metadata blocks before change superblock.
> > > 
> > > Signed-off-by: Monty Pavel <monty_pavel@sina.com>
> > > ---
> > >  drivers/md/dm-thin-metadata.c |    8 ++++++++
> > >  1 files changed, 8 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
> > > index 36ef284..897d7d6 100644
> > > --- a/drivers/md/dm-thin-metadata.c
> > > +++ b/drivers/md/dm-thin-metadata.c
> > > @@ -813,6 +813,14 @@ static int __commit_transaction(struct dm_pool_metadata *pmd)
> > >  	if (r)
> > >  		return r;
> > >  
> > > +	r = dm_tm_commit(pmd->tm, sblock);
> > > +	if (r)
> > > +		return r;
> > > +
> > > +	r = superblock_lock(pmd, &sblock);
> > > +	if (r)
> > > +		return r;
> > > +
> > >  	disk_super = dm_block_data(sblock);
> > >  	disk_super->time = cpu_to_le32(pmd->time);
> > >  	disk_super->data_mapping_root = cpu_to_le64(pmd->root);
> 
> I don't believe you've tested this; sblock is passed to dm_tm_commit()
> uninitialised, and you didn't even bother to remove the later (and correct)
> call to dm_tm_commit().

I pointed out to Joe that the patch, in isolation, is decieving.  It
_looks_ like sblock may be uninitialized, etc.  But once the patch is
applied and you look at the entirety of __commit_transaction() it is
clear that you're reusing the existing superblock_lock() to safely
accomplish your additional call to dm_tm_commit().

> What is the issue that started you looking in this area?

Right, as my previous reply asked: please clarify if you _know_ your
patch fixes an actual problem you've experienced.  The more details the
better.

Thanks,
Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mike Snitzer June 20, 2018, 2:51 p.m. UTC | #4
On Wed, Jun 20 2018 at  1:03pm -0400,
monty <monty_pavel@sina.com> wrote:

> 
> On Tue, Jun 19, 2018 at 11:00:32AM -0400, Mike Snitzer wrote:
> > 
> > On Tue, Jun 19 2018 at 10:43am -0400,
> > Joe Thornber <thornber@redhat.com> wrote:
> > 
> > > On Tue, Jun 19, 2018 at 09:11:06AM -0400, Mike Snitzer wrote:
> > > > On Mon, May 21 2018 at  8:53pm -0400,
> > > > Monty Pavel <monty_pavel@sina.com> wrote:
> > > > 
> > > > > 
> > > > > If dm_bufio_write_dirty_buffers func is called by __commit_transaction
> > > > > func and power loss happens during executing it, coincidencely
> > > > > superblock wrote correctly but some metadata blocks didn't. The reason
> > > > > is we write all metadata in async mode. We can guarantee that we send
> > > > > superblock after other blocks but we cannot guarantee that superblock
> > > > > write completely early than other blocks.
> > > > > So, We need to commit other metadata blocks before change superblock.
> > > > > 
> > > > > Signed-off-by: Monty Pavel <monty_pavel@sina.com>
> > > > > ---
> > > > >  drivers/md/dm-thin-metadata.c |    8 ++++++++
> > > > >  1 files changed, 8 insertions(+), 0 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
> > > > > index 36ef284..897d7d6 100644
> > > > > --- a/drivers/md/dm-thin-metadata.c
> > > > > +++ b/drivers/md/dm-thin-metadata.c
> > > > > @@ -813,6 +813,14 @@ static int __commit_transaction(struct dm_pool_metadata *pmd)
> > > > >  	if (r)
> > > > >  		return r;
> > > > >  
> > > > > +	r = dm_tm_commit(pmd->tm, sblock);
> > > > > +	if (r)
> > > > > +		return r;
> > > > > +
> > > > > +	r = superblock_lock(pmd, &sblock);
> > > > > +	if (r)
> > > > > +		return r;
> > > > > +
> > > > >  	disk_super = dm_block_data(sblock);
> > > > >  	disk_super->time = cpu_to_le32(pmd->time);
> > > > >  	disk_super->data_mapping_root = cpu_to_le64(pmd->root);
> > > 
> > > I don't believe you've tested this; sblock is passed to dm_tm_commit()
> > > uninitialised, and you didn't even bother to remove the later (and correct)
> > > call to dm_tm_commit().
> > 
> > I pointed out to Joe that the patch, in isolation, is decieving.  It
> > _looks_ like sblock may be uninitialized, etc.  But once the patch is
> > applied and you look at the entirety of __commit_transaction() it is
> > clear that you're reusing the existing superblock_lock() to safely
> > accomplish your additional call to dm_tm_commit().
> > 
> > > What is the issue that started you looking in this area?
> > 
> > Right, as my previous reply asked: please clarify if you _know_ your
> > patch fixes an actual problem you've experienced.  The more details the
> > better.
> > 
> > Thanks,
> > Mike
> > 
> Hi, Mike and Joe. Thanks for your reply. I read __commit_transaction
> many times and didn't find any problem of 2-phase commit. I use
> md-raid1(PCIe nvme and md-raid5) in write-behind mode to store dm-thin
> metadata.
> Test case:
> 1. I do copy-diff test on thin device and then reboot my machine.
> 2. Rebuild our device stack and exec "vgchang -ay".
> The thin-pool can not be established(details_root become a bitmap node
> and metadata's bitmap_root become a btree_node).

But are you saying your double commit in __commit_transaction() serves
as a workaround for the corruption you're seeing?

Is it just a case where raid5's writebehind mode is _not_ safe for your
storage config?  By "reboot" do you mean a clean shutdown?  Or a forced
powerfail scenario?

Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Joe Thornber June 20, 2018, 3 p.m. UTC | #5
On Wed, Jun 20, 2018 at 01:03:57PM -0400, monty wrote:
> Hi, Mike and Joe. Thanks for your reply. I read __commit_transaction
> many times and didn't find any problem of 2-phase commit. I use
> md-raid1(PCIe nvme and md-raid5) in write-behind mode to store dm-thin
> metadata.
> Test case:
> 1. I do copy-diff test on thin device and then reboot my machine.
> 2. Rebuild our device stack and exec "vgchang -ay".
> The thin-pool can not be established(details_root become a bitmap node
> and metadata's bitmap_root become a btree_node).

As you simplify your setup does the problem go away?  eg, turn off write-behind, use just the nvme dev etc.

The only effect of your change is to call flush twice rather than once.

- Joe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
monty_pavel@sina.com June 20, 2018, 5:03 p.m. UTC | #6
On Tue, Jun 19, 2018 at 11:00:32AM -0400, Mike Snitzer wrote:
> 
> On Tue, Jun 19 2018 at 10:43am -0400,
> Joe Thornber <thornber@redhat.com> wrote:
> 
> > On Tue, Jun 19, 2018 at 09:11:06AM -0400, Mike Snitzer wrote:
> > > On Mon, May 21 2018 at  8:53pm -0400,
> > > Monty Pavel <monty_pavel@sina.com> wrote:
> > > 
> > > > 
> > > > If dm_bufio_write_dirty_buffers func is called by __commit_transaction
> > > > func and power loss happens during executing it, coincidencely
> > > > superblock wrote correctly but some metadata blocks didn't. The reason
> > > > is we write all metadata in async mode. We can guarantee that we send
> > > > superblock after other blocks but we cannot guarantee that superblock
> > > > write completely early than other blocks.
> > > > So, We need to commit other metadata blocks before change superblock.
> > > > 
> > > > Signed-off-by: Monty Pavel <monty_pavel@sina.com>
> > > > ---
> > > >  drivers/md/dm-thin-metadata.c |    8 ++++++++
> > > >  1 files changed, 8 insertions(+), 0 deletions(-)
> > > > 
> > > > diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
> > > > index 36ef284..897d7d6 100644
> > > > --- a/drivers/md/dm-thin-metadata.c
> > > > +++ b/drivers/md/dm-thin-metadata.c
> > > > @@ -813,6 +813,14 @@ static int __commit_transaction(struct dm_pool_metadata *pmd)
> > > >  	if (r)
> > > >  		return r;
> > > >  
> > > > +	r = dm_tm_commit(pmd->tm, sblock);
> > > > +	if (r)
> > > > +		return r;
> > > > +
> > > > +	r = superblock_lock(pmd, &sblock);
> > > > +	if (r)
> > > > +		return r;
> > > > +
> > > >  	disk_super = dm_block_data(sblock);
> > > >  	disk_super->time = cpu_to_le32(pmd->time);
> > > >  	disk_super->data_mapping_root = cpu_to_le64(pmd->root);
> > 
> > I don't believe you've tested this; sblock is passed to dm_tm_commit()
> > uninitialised, and you didn't even bother to remove the later (and correct)
> > call to dm_tm_commit().
> 
> I pointed out to Joe that the patch, in isolation, is decieving.  It
> _looks_ like sblock may be uninitialized, etc.  But once the patch is
> applied and you look at the entirety of __commit_transaction() it is
> clear that you're reusing the existing superblock_lock() to safely
> accomplish your additional call to dm_tm_commit().
> 
> > What is the issue that started you looking in this area?
> 
> Right, as my previous reply asked: please clarify if you _know_ your
> patch fixes an actual problem you've experienced.  The more details the
> better.
> 
> Thanks,
> Mike
> 
Hi, Mike and Joe. Thanks for your reply. I read __commit_transaction
many times and didn't find any problem of 2-phase commit. I use
md-raid1(PCIe nvme and md-raid5) in write-behind mode to store dm-thin
metadata.
Test case:
1. I do copy-diff test on thin device and then reboot my machine.
2. Rebuild our device stack and exec "vgchang -ay".
The thin-pool can not be established(details_root become a bitmap node
and metadata's bitmap_root become a btree_node).

Thanks,
Monty

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
monty_pavel@sina.com June 21, 2018, 4:54 p.m. UTC | #7
On Wed, Jun 20, 2018 at 10:51:17AM -0400, Mike Snitzer wrote:
> 
> On Wed, Jun 20 2018 at  1:03pm -0400,
> monty <monty_pavel@sina.com> wrote:
> 
> > 
> > On Tue, Jun 19, 2018 at 11:00:32AM -0400, Mike Snitzer wrote:
> > > 
> > > On Tue, Jun 19 2018 at 10:43am -0400,
> > > Joe Thornber <thornber@redhat.com> wrote:
> > > 
> > > > On Tue, Jun 19, 2018 at 09:11:06AM -0400, Mike Snitzer wrote:
> > > > > On Mon, May 21 2018 at  8:53pm -0400,
> > > > > Monty Pavel <monty_pavel@sina.com> wrote:
> > > > > 
> > > > > > 
> > > > > > If dm_bufio_write_dirty_buffers func is called by __commit_transaction
> > > > > > func and power loss happens during executing it, coincidencely
> > > > > > superblock wrote correctly but some metadata blocks didn't. The reason
> > > > > > is we write all metadata in async mode. We can guarantee that we send
> > > > > > superblock after other blocks but we cannot guarantee that superblock
> > > > > > write completely early than other blocks.
> > > > > > So, We need to commit other metadata blocks before change superblock.
> > > > > > 
> > > > > > Signed-off-by: Monty Pavel <monty_pavel@sina.com>
> > > > > > ---
> > > > > >  drivers/md/dm-thin-metadata.c |    8 ++++++++
> > > > > >  1 files changed, 8 insertions(+), 0 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
> > > > > > index 36ef284..897d7d6 100644
> > > > > > --- a/drivers/md/dm-thin-metadata.c
> > > > > > +++ b/drivers/md/dm-thin-metadata.c
> > > > > > @@ -813,6 +813,14 @@ static int __commit_transaction(struct dm_pool_metadata *pmd)
> > > > > >  	if (r)
> > > > > >  		return r;
> > > > > >  
> > > > > > +	r = dm_tm_commit(pmd->tm, sblock);
> > > > > > +	if (r)
> > > > > > +		return r;
> > > > > > +
> > > > > > +	r = superblock_lock(pmd, &sblock);
> > > > > > +	if (r)
> > > > > > +		return r;
> > > > > > +
> > > > > >  	disk_super = dm_block_data(sblock);
> > > > > >  	disk_super->time = cpu_to_le32(pmd->time);
> > > > > >  	disk_super->data_mapping_root = cpu_to_le64(pmd->root);
> > > > 
> > > > I don't believe you've tested this; sblock is passed to dm_tm_commit()
> > > > uninitialised, and you didn't even bother to remove the later (and correct)
> > > > call to dm_tm_commit().
> > > 
> > > I pointed out to Joe that the patch, in isolation, is decieving.  It
> > > _looks_ like sblock may be uninitialized, etc.  But once the patch is
> > > applied and you look at the entirety of __commit_transaction() it is
> > > clear that you're reusing the existing superblock_lock() to safely
> > > accomplish your additional call to dm_tm_commit().
> > > 
> > > > What is the issue that started you looking in this area?
> > > 
> > > Right, as my previous reply asked: please clarify if you _know_ your
> > > patch fixes an actual problem you've experienced.  The more details the
> > > better.
> > > 
> > > Thanks,
> > > Mike
> > > 
> > Hi, Mike and Joe. Thanks for your reply. I read __commit_transaction
> > many times and didn't find any problem of 2-phase commit. I use
> > md-raid1(PCIe nvme and md-raid5) in write-behind mode to store dm-thin
> > metadata.
> > Test case:
> > 1. I do copy-diff test on thin device and then reboot my machine.
> > 2. Rebuild our device stack and exec "vgchang -ay".
> > The thin-pool can not be established(details_root become a bitmap node
> > and metadata's bitmap_root become a btree_node).
> 
> But are you saying your double commit in __commit_transaction() serves
> as a workaround for the corruption you're seeing?
> 
> Is it just a case where raid5's writebehind mode is _not_ safe for your
> storage config?  By "reboot" do you mean a clean shutdown?  Or a forced
> powerfail scenario?
> 
> Mike
> 
Reboot my machine means exec reboot command.
My patch seems unnecessary, because 2-phase commit have ensure that
committing superblock after other metadata have written to metadata
device.
This problem is hard to recreate, it didn't happen after applying the
patch above. I will check my device stack if there is any possible that
superblock write complete early than other metadata block.

Monty

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox

Patch

diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
index 36ef284..897d7d6 100644
--- a/drivers/md/dm-thin-metadata.c
+++ b/drivers/md/dm-thin-metadata.c
@@ -813,6 +813,14 @@  static int __commit_transaction(struct dm_pool_metadata *pmd)
 	if (r)
 		return r;
 
+	r = dm_tm_commit(pmd->tm, sblock);
+	if (r)
+		return r;
+
+	r = superblock_lock(pmd, &sblock);
+	if (r)
+		return r;
+
 	disk_super = dm_block_data(sblock);
 	disk_super->time = cpu_to_le32(pmd->time);
 	disk_super->data_mapping_root = cpu_to_le64(pmd->root);