Message ID | 20180522005336.GA30152@yyp. (mailing list archive) |
---|---|
State | Rejected, archived |
Delegated to: | Mike Snitzer |
Headers | show |
On Mon, May 21 2018 at 8:53pm -0400, Monty Pavel <monty_pavel@sina.com> wrote: > > If dm_bufio_write_dirty_buffers func is called by __commit_transaction > func and power loss happens during executing it, coincidencely > superblock wrote correctly but some metadata blocks didn't. The reason > is we write all metadata in async mode. We can guarantee that we send > superblock after other blocks but we cannot guarantee that superblock > write completely early than other blocks. > So, We need to commit other metadata blocks before change superblock. > > Signed-off-by: Monty Pavel <monty_pavel@sina.com> > --- > drivers/md/dm-thin-metadata.c | 8 ++++++++ > 1 files changed, 8 insertions(+), 0 deletions(-) > > diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c > index 36ef284..897d7d6 100644 > --- a/drivers/md/dm-thin-metadata.c > +++ b/drivers/md/dm-thin-metadata.c > @@ -813,6 +813,14 @@ static int __commit_transaction(struct dm_pool_metadata *pmd) > if (r) > return r; > > + r = dm_tm_commit(pmd->tm, sblock); > + if (r) > + return r; > + > + r = superblock_lock(pmd, &sblock); > + if (r) > + return r; > + > disk_super = dm_block_data(sblock); > disk_super->time = cpu_to_le32(pmd->time); > disk_super->data_mapping_root = cpu_to_le64(pmd->root); > -- > 1.7.1 Have you actually found this patch to be effective? It should be unnecessary. But I must admit that in looking at the related code I couldn't convince myself it was. But then Joe pointed me to this comment block from dm-transaction-manager.h: /* * We use a 2-phase commit here. * * i) Make all changes for the transaction *except* for the superblock. * Then call dm_tm_pre_commit() to flush them to disk. * * ii) Lock your superblock. Update. Then call dm_tm_commit() which will * unlock the superblock and flush it. No other blocks should be updated * during this period. Care should be taken to never unlock a partially * updated superblock; perform any operations that could fail *before* you * take the superblock lock. */ int dm_tm_pre_commit(struct dm_transaction_manager *tm); int dm_tm_commit(struct dm_transaction_manager *tm, struct dm_block *superblock); So given __commit_transaction() is using dm_tm_pre_commit() prior to the dm_tm_commit() to flush the superblock -- it would seem that there isn't any conceptual potential for corruption. If you've found the dm_tm_pre_commit() to be lacking (whereby not all metadata getting flushed to disk before the superblock) then please explain your findings. Thanks, Mike -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Tue, Jun 19, 2018 at 09:11:06AM -0400, Mike Snitzer wrote: > On Mon, May 21 2018 at 8:53pm -0400, > Monty Pavel <monty_pavel@sina.com> wrote: > > > > > If dm_bufio_write_dirty_buffers func is called by __commit_transaction > > func and power loss happens during executing it, coincidencely > > superblock wrote correctly but some metadata blocks didn't. The reason > > is we write all metadata in async mode. We can guarantee that we send > > superblock after other blocks but we cannot guarantee that superblock > > write completely early than other blocks. > > So, We need to commit other metadata blocks before change superblock. > > > > Signed-off-by: Monty Pavel <monty_pavel@sina.com> > > --- > > drivers/md/dm-thin-metadata.c | 8 ++++++++ > > 1 files changed, 8 insertions(+), 0 deletions(-) > > > > diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c > > index 36ef284..897d7d6 100644 > > --- a/drivers/md/dm-thin-metadata.c > > +++ b/drivers/md/dm-thin-metadata.c > > @@ -813,6 +813,14 @@ static int __commit_transaction(struct dm_pool_metadata *pmd) > > if (r) > > return r; > > > > + r = dm_tm_commit(pmd->tm, sblock); > > + if (r) > > + return r; > > + > > + r = superblock_lock(pmd, &sblock); > > + if (r) > > + return r; > > + > > disk_super = dm_block_data(sblock); > > disk_super->time = cpu_to_le32(pmd->time); > > disk_super->data_mapping_root = cpu_to_le64(pmd->root); I don't believe you've tested this; sblock is passed to dm_tm_commit() uninitialised, and you didn't even bother to remove the later (and correct) call to dm_tm_commit(). See dm-transaction-manager.h for an explanation of how the 2-phase commit works. What is the issue that started you looking in this area? - Joe -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Tue, Jun 19 2018 at 10:43am -0400, Joe Thornber <thornber@redhat.com> wrote: > On Tue, Jun 19, 2018 at 09:11:06AM -0400, Mike Snitzer wrote: > > On Mon, May 21 2018 at 8:53pm -0400, > > Monty Pavel <monty_pavel@sina.com> wrote: > > > > > > > > If dm_bufio_write_dirty_buffers func is called by __commit_transaction > > > func and power loss happens during executing it, coincidencely > > > superblock wrote correctly but some metadata blocks didn't. The reason > > > is we write all metadata in async mode. We can guarantee that we send > > > superblock after other blocks but we cannot guarantee that superblock > > > write completely early than other blocks. > > > So, We need to commit other metadata blocks before change superblock. > > > > > > Signed-off-by: Monty Pavel <monty_pavel@sina.com> > > > --- > > > drivers/md/dm-thin-metadata.c | 8 ++++++++ > > > 1 files changed, 8 insertions(+), 0 deletions(-) > > > > > > diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c > > > index 36ef284..897d7d6 100644 > > > --- a/drivers/md/dm-thin-metadata.c > > > +++ b/drivers/md/dm-thin-metadata.c > > > @@ -813,6 +813,14 @@ static int __commit_transaction(struct dm_pool_metadata *pmd) > > > if (r) > > > return r; > > > > > > + r = dm_tm_commit(pmd->tm, sblock); > > > + if (r) > > > + return r; > > > + > > > + r = superblock_lock(pmd, &sblock); > > > + if (r) > > > + return r; > > > + > > > disk_super = dm_block_data(sblock); > > > disk_super->time = cpu_to_le32(pmd->time); > > > disk_super->data_mapping_root = cpu_to_le64(pmd->root); > > I don't believe you've tested this; sblock is passed to dm_tm_commit() > uninitialised, and you didn't even bother to remove the later (and correct) > call to dm_tm_commit(). I pointed out to Joe that the patch, in isolation, is decieving. It _looks_ like sblock may be uninitialized, etc. But once the patch is applied and you look at the entirety of __commit_transaction() it is clear that you're reusing the existing superblock_lock() to safely accomplish your additional call to dm_tm_commit(). > What is the issue that started you looking in this area? Right, as my previous reply asked: please clarify if you _know_ your patch fixes an actual problem you've experienced. The more details the better. Thanks, Mike -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Wed, Jun 20 2018 at 1:03pm -0400, monty <monty_pavel@sina.com> wrote: > > On Tue, Jun 19, 2018 at 11:00:32AM -0400, Mike Snitzer wrote: > > > > On Tue, Jun 19 2018 at 10:43am -0400, > > Joe Thornber <thornber@redhat.com> wrote: > > > > > On Tue, Jun 19, 2018 at 09:11:06AM -0400, Mike Snitzer wrote: > > > > On Mon, May 21 2018 at 8:53pm -0400, > > > > Monty Pavel <monty_pavel@sina.com> wrote: > > > > > > > > > > > > > > If dm_bufio_write_dirty_buffers func is called by __commit_transaction > > > > > func and power loss happens during executing it, coincidencely > > > > > superblock wrote correctly but some metadata blocks didn't. The reason > > > > > is we write all metadata in async mode. We can guarantee that we send > > > > > superblock after other blocks but we cannot guarantee that superblock > > > > > write completely early than other blocks. > > > > > So, We need to commit other metadata blocks before change superblock. > > > > > > > > > > Signed-off-by: Monty Pavel <monty_pavel@sina.com> > > > > > --- > > > > > drivers/md/dm-thin-metadata.c | 8 ++++++++ > > > > > 1 files changed, 8 insertions(+), 0 deletions(-) > > > > > > > > > > diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c > > > > > index 36ef284..897d7d6 100644 > > > > > --- a/drivers/md/dm-thin-metadata.c > > > > > +++ b/drivers/md/dm-thin-metadata.c > > > > > @@ -813,6 +813,14 @@ static int __commit_transaction(struct dm_pool_metadata *pmd) > > > > > if (r) > > > > > return r; > > > > > > > > > > + r = dm_tm_commit(pmd->tm, sblock); > > > > > + if (r) > > > > > + return r; > > > > > + > > > > > + r = superblock_lock(pmd, &sblock); > > > > > + if (r) > > > > > + return r; > > > > > + > > > > > disk_super = dm_block_data(sblock); > > > > > disk_super->time = cpu_to_le32(pmd->time); > > > > > disk_super->data_mapping_root = cpu_to_le64(pmd->root); > > > > > > I don't believe you've tested this; sblock is passed to dm_tm_commit() > > > uninitialised, and you didn't even bother to remove the later (and correct) > > > call to dm_tm_commit(). > > > > I pointed out to Joe that the patch, in isolation, is decieving. It > > _looks_ like sblock may be uninitialized, etc. But once the patch is > > applied and you look at the entirety of __commit_transaction() it is > > clear that you're reusing the existing superblock_lock() to safely > > accomplish your additional call to dm_tm_commit(). > > > > > What is the issue that started you looking in this area? > > > > Right, as my previous reply asked: please clarify if you _know_ your > > patch fixes an actual problem you've experienced. The more details the > > better. > > > > Thanks, > > Mike > > > Hi, Mike and Joe. Thanks for your reply. I read __commit_transaction > many times and didn't find any problem of 2-phase commit. I use > md-raid1(PCIe nvme and md-raid5) in write-behind mode to store dm-thin > metadata. > Test case: > 1. I do copy-diff test on thin device and then reboot my machine. > 2. Rebuild our device stack and exec "vgchang -ay". > The thin-pool can not be established(details_root become a bitmap node > and metadata's bitmap_root become a btree_node). But are you saying your double commit in __commit_transaction() serves as a workaround for the corruption you're seeing? Is it just a case where raid5's writebehind mode is _not_ safe for your storage config? By "reboot" do you mean a clean shutdown? Or a forced powerfail scenario? Mike -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Wed, Jun 20, 2018 at 01:03:57PM -0400, monty wrote: > Hi, Mike and Joe. Thanks for your reply. I read __commit_transaction > many times and didn't find any problem of 2-phase commit. I use > md-raid1(PCIe nvme and md-raid5) in write-behind mode to store dm-thin > metadata. > Test case: > 1. I do copy-diff test on thin device and then reboot my machine. > 2. Rebuild our device stack and exec "vgchang -ay". > The thin-pool can not be established(details_root become a bitmap node > and metadata's bitmap_root become a btree_node). As you simplify your setup does the problem go away? eg, turn off write-behind, use just the nvme dev etc. The only effect of your change is to call flush twice rather than once. - Joe -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Tue, Jun 19, 2018 at 11:00:32AM -0400, Mike Snitzer wrote: > > On Tue, Jun 19 2018 at 10:43am -0400, > Joe Thornber <thornber@redhat.com> wrote: > > > On Tue, Jun 19, 2018 at 09:11:06AM -0400, Mike Snitzer wrote: > > > On Mon, May 21 2018 at 8:53pm -0400, > > > Monty Pavel <monty_pavel@sina.com> wrote: > > > > > > > > > > > If dm_bufio_write_dirty_buffers func is called by __commit_transaction > > > > func and power loss happens during executing it, coincidencely > > > > superblock wrote correctly but some metadata blocks didn't. The reason > > > > is we write all metadata in async mode. We can guarantee that we send > > > > superblock after other blocks but we cannot guarantee that superblock > > > > write completely early than other blocks. > > > > So, We need to commit other metadata blocks before change superblock. > > > > > > > > Signed-off-by: Monty Pavel <monty_pavel@sina.com> > > > > --- > > > > drivers/md/dm-thin-metadata.c | 8 ++++++++ > > > > 1 files changed, 8 insertions(+), 0 deletions(-) > > > > > > > > diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c > > > > index 36ef284..897d7d6 100644 > > > > --- a/drivers/md/dm-thin-metadata.c > > > > +++ b/drivers/md/dm-thin-metadata.c > > > > @@ -813,6 +813,14 @@ static int __commit_transaction(struct dm_pool_metadata *pmd) > > > > if (r) > > > > return r; > > > > > > > > + r = dm_tm_commit(pmd->tm, sblock); > > > > + if (r) > > > > + return r; > > > > + > > > > + r = superblock_lock(pmd, &sblock); > > > > + if (r) > > > > + return r; > > > > + > > > > disk_super = dm_block_data(sblock); > > > > disk_super->time = cpu_to_le32(pmd->time); > > > > disk_super->data_mapping_root = cpu_to_le64(pmd->root); > > > > I don't believe you've tested this; sblock is passed to dm_tm_commit() > > uninitialised, and you didn't even bother to remove the later (and correct) > > call to dm_tm_commit(). > > I pointed out to Joe that the patch, in isolation, is decieving. It > _looks_ like sblock may be uninitialized, etc. But once the patch is > applied and you look at the entirety of __commit_transaction() it is > clear that you're reusing the existing superblock_lock() to safely > accomplish your additional call to dm_tm_commit(). > > > What is the issue that started you looking in this area? > > Right, as my previous reply asked: please clarify if you _know_ your > patch fixes an actual problem you've experienced. The more details the > better. > > Thanks, > Mike > Hi, Mike and Joe. Thanks for your reply. I read __commit_transaction many times and didn't find any problem of 2-phase commit. I use md-raid1(PCIe nvme and md-raid5) in write-behind mode to store dm-thin metadata. Test case: 1. I do copy-diff test on thin device and then reboot my machine. 2. Rebuild our device stack and exec "vgchang -ay". The thin-pool can not be established(details_root become a bitmap node and metadata's bitmap_root become a btree_node). Thanks, Monty -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Wed, Jun 20, 2018 at 10:51:17AM -0400, Mike Snitzer wrote: > > On Wed, Jun 20 2018 at 1:03pm -0400, > monty <monty_pavel@sina.com> wrote: > > > > > On Tue, Jun 19, 2018 at 11:00:32AM -0400, Mike Snitzer wrote: > > > > > > On Tue, Jun 19 2018 at 10:43am -0400, > > > Joe Thornber <thornber@redhat.com> wrote: > > > > > > > On Tue, Jun 19, 2018 at 09:11:06AM -0400, Mike Snitzer wrote: > > > > > On Mon, May 21 2018 at 8:53pm -0400, > > > > > Monty Pavel <monty_pavel@sina.com> wrote: > > > > > > > > > > > > > > > > > If dm_bufio_write_dirty_buffers func is called by __commit_transaction > > > > > > func and power loss happens during executing it, coincidencely > > > > > > superblock wrote correctly but some metadata blocks didn't. The reason > > > > > > is we write all metadata in async mode. We can guarantee that we send > > > > > > superblock after other blocks but we cannot guarantee that superblock > > > > > > write completely early than other blocks. > > > > > > So, We need to commit other metadata blocks before change superblock. > > > > > > > > > > > > Signed-off-by: Monty Pavel <monty_pavel@sina.com> > > > > > > --- > > > > > > drivers/md/dm-thin-metadata.c | 8 ++++++++ > > > > > > 1 files changed, 8 insertions(+), 0 deletions(-) > > > > > > > > > > > > diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c > > > > > > index 36ef284..897d7d6 100644 > > > > > > --- a/drivers/md/dm-thin-metadata.c > > > > > > +++ b/drivers/md/dm-thin-metadata.c > > > > > > @@ -813,6 +813,14 @@ static int __commit_transaction(struct dm_pool_metadata *pmd) > > > > > > if (r) > > > > > > return r; > > > > > > > > > > > > + r = dm_tm_commit(pmd->tm, sblock); > > > > > > + if (r) > > > > > > + return r; > > > > > > + > > > > > > + r = superblock_lock(pmd, &sblock); > > > > > > + if (r) > > > > > > + return r; > > > > > > + > > > > > > disk_super = dm_block_data(sblock); > > > > > > disk_super->time = cpu_to_le32(pmd->time); > > > > > > disk_super->data_mapping_root = cpu_to_le64(pmd->root); > > > > > > > > I don't believe you've tested this; sblock is passed to dm_tm_commit() > > > > uninitialised, and you didn't even bother to remove the later (and correct) > > > > call to dm_tm_commit(). > > > > > > I pointed out to Joe that the patch, in isolation, is decieving. It > > > _looks_ like sblock may be uninitialized, etc. But once the patch is > > > applied and you look at the entirety of __commit_transaction() it is > > > clear that you're reusing the existing superblock_lock() to safely > > > accomplish your additional call to dm_tm_commit(). > > > > > > > What is the issue that started you looking in this area? > > > > > > Right, as my previous reply asked: please clarify if you _know_ your > > > patch fixes an actual problem you've experienced. The more details the > > > better. > > > > > > Thanks, > > > Mike > > > > > Hi, Mike and Joe. Thanks for your reply. I read __commit_transaction > > many times and didn't find any problem of 2-phase commit. I use > > md-raid1(PCIe nvme and md-raid5) in write-behind mode to store dm-thin > > metadata. > > Test case: > > 1. I do copy-diff test on thin device and then reboot my machine. > > 2. Rebuild our device stack and exec "vgchang -ay". > > The thin-pool can not be established(details_root become a bitmap node > > and metadata's bitmap_root become a btree_node). > > But are you saying your double commit in __commit_transaction() serves > as a workaround for the corruption you're seeing? > > Is it just a case where raid5's writebehind mode is _not_ safe for your > storage config? By "reboot" do you mean a clean shutdown? Or a forced > powerfail scenario? > > Mike > Reboot my machine means exec reboot command. My patch seems unnecessary, because 2-phase commit have ensure that committing superblock after other metadata have written to metadata device. This problem is hard to recreate, it didn't happen after applying the patch above. I will check my device stack if there is any possible that superblock write complete early than other metadata block. Monty -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c index 36ef284..897d7d6 100644 --- a/drivers/md/dm-thin-metadata.c +++ b/drivers/md/dm-thin-metadata.c @@ -813,6 +813,14 @@ static int __commit_transaction(struct dm_pool_metadata *pmd) if (r) return r; + r = dm_tm_commit(pmd->tm, sblock); + if (r) + return r; + + r = superblock_lock(pmd, &sblock); + if (r) + return r; + disk_super = dm_block_data(sblock); disk_super->time = cpu_to_le32(pmd->time); disk_super->data_mapping_root = cpu_to_le64(pmd->root);
If dm_bufio_write_dirty_buffers func is called by __commit_transaction func and power loss happens during executing it, coincidencely superblock wrote correctly but some metadata blocks didn't. The reason is we write all metadata in async mode. We can guarantee that we send superblock after other blocks but we cannot guarantee that superblock write completely early than other blocks. So, We need to commit other metadata blocks before change superblock. Signed-off-by: Monty Pavel <monty_pavel@sina.com> --- drivers/md/dm-thin-metadata.c | 8 ++++++++ 1 files changed, 8 insertions(+), 0 deletions(-)