diff mbox series

[-next] dm thin: Use last transaction's pmd->root when commit failed

Message ID 20221208142802.1479048-1-chengzhihao1@huawei.com (mailing list archive)
State New, archived
Headers show
Series [-next] dm thin: Use last transaction's pmd->root when commit failed | expand

Commit Message

Zhihao Cheng Dec. 8, 2022, 2:28 p.m. UTC
Recently we found a softlock up problem in dm thin pool looking up btree
caused by corrupted metadata:
 Kernel panic - not syncing: softlockup: hung tasks
 CPU: 7 PID: 2669225 Comm: kworker/u16:3
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
 Workqueue: dm-thin do_worker [dm_thin_pool]
 Call Trace:
   <IRQ>
   dump_stack+0x9c/0xd3
   panic+0x35d/0x6b9
   watchdog_timer_fn.cold+0x16/0x25
   __run_hrtimer+0xa2/0x2d0
   </IRQ>
   RIP: 0010:__relink_lru+0x102/0x220 [dm_bufio]
   __bufio_new+0x11f/0x4f0 [dm_bufio]
   new_read+0xa3/0x1e0 [dm_bufio]
   dm_bm_read_lock+0x33/0xd0 [dm_persistent_data]
   ro_step+0x63/0x100 [dm_persistent_data]
   btree_lookup_raw.constprop.0+0x44/0x220 [dm_persistent_data]
   dm_btree_lookup+0x16f/0x210 [dm_persistent_data]
   dm_thin_find_block+0x12c/0x210 [dm_thin_pool]
   __process_bio_read_only+0xc5/0x400 [dm_thin_pool]
   process_thin_deferred_bios+0x1a4/0x4a0 [dm_thin_pool]
   process_one_work+0x3c5/0x730

Following process may generate a broken btree mixed with fresh and stale
nodes, which could let dm thin trapped into an infinite loop while looking
up data block:
 Transaction 1: pmd->root = A, A->B->C   // One path in btree
                pmd->root = X, X->Y->Z   // Copy-up
 Transaction 2: X,Z is updated on disk, Y is written failed.
                // Commit failed, dm thin becomes read-only.
                process_bio_read_only
		 dm_thin_find_block
		  __find_block
		   dm_btree_lookup(pmd->root)
The pmd->root points to a broken btree, Y may contain stale node pointing
to any block, for example X, which lets dm thin trapped into a dead loop
while looking up Z.

Fix it by setting pmd->root in __open_metadata(), so that dm thin will use
last transaction's pmd->root if commit failed.

Fetch a reproducer in [Link].

Linke: https://bugzilla.kernel.org/show_bug.cgi?id=216790
Fixes: 991d9fa02da0 ("dm: add thin provisioning target")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 drivers/md/dm-thin-metadata.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Joe Thornber Dec. 8, 2022, 4:53 p.m. UTC | #1
Acked-by: Joe Thornber <ejt@redhat.com>

On Thu, Dec 8, 2022 at 2:07 PM Zhihao Cheng <chengzhihao1@huawei.com> wrote:

> Recently we found a softlock up problem in dm thin pool looking up btree
> caused by corrupted metadata:
>  Kernel panic - not syncing: softlockup: hung tasks
>  CPU: 7 PID: 2669225 Comm: kworker/u16:3
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
>  Workqueue: dm-thin do_worker [dm_thin_pool]
>  Call Trace:
>    <IRQ>
>    dump_stack+0x9c/0xd3
>    panic+0x35d/0x6b9
>    watchdog_timer_fn.cold+0x16/0x25
>    __run_hrtimer+0xa2/0x2d0
>    </IRQ>
>    RIP: 0010:__relink_lru+0x102/0x220 [dm_bufio]
>    __bufio_new+0x11f/0x4f0 [dm_bufio]
>    new_read+0xa3/0x1e0 [dm_bufio]
>    dm_bm_read_lock+0x33/0xd0 [dm_persistent_data]
>    ro_step+0x63/0x100 [dm_persistent_data]
>    btree_lookup_raw.constprop.0+0x44/0x220 [dm_persistent_data]
>    dm_btree_lookup+0x16f/0x210 [dm_persistent_data]
>    dm_thin_find_block+0x12c/0x210 [dm_thin_pool]
>    __process_bio_read_only+0xc5/0x400 [dm_thin_pool]
>    process_thin_deferred_bios+0x1a4/0x4a0 [dm_thin_pool]
>    process_one_work+0x3c5/0x730
>
> Following process may generate a broken btree mixed with fresh and stale
> nodes, which could let dm thin trapped into an infinite loop while looking
> up data block:
>  Transaction 1: pmd->root = A, A->B->C   // One path in btree
>                 pmd->root = X, X->Y->Z   // Copy-up
>  Transaction 2: X,Z is updated on disk, Y is written failed.
>                 // Commit failed, dm thin becomes read-only.
>                 process_bio_read_only
>                  dm_thin_find_block
>                   __find_block
>                    dm_btree_lookup(pmd->root)
> The pmd->root points to a broken btree, Y may contain stale node pointing
> to any block, for example X, which lets dm thin trapped into a dead loop
> while looking up Z.
>
> Fix it by setting pmd->root in __open_metadata(), so that dm thin will use
> last transaction's pmd->root if commit failed.
>
> Fetch a reproducer in [Link].
>
> Linke: https://bugzilla.kernel.org/show_bug.cgi?id=216790
> Fixes: 991d9fa02da0 ("dm: add thin provisioning target")
> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
> ---
>  drivers/md/dm-thin-metadata.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
> index 1a62226ac34e..26c42ee183ed 100644
> --- a/drivers/md/dm-thin-metadata.c
> +++ b/drivers/md/dm-thin-metadata.c
> @@ -724,6 +724,15 @@ static int __open_metadata(struct dm_pool_metadata
> *pmd)
>                 goto bad_cleanup_data_sm;
>         }
>
> +       /*
> +        * For pool metadata opening process, root setting is redundant
> +        * because it will be set again in __begin_transaction(). But dm
> +        * pool aborting process really needs to get last transaction's
> +        * root in case accessing broken btree.
> +        */
> +       pmd->root = le64_to_cpu(disk_super->data_mapping_root);
> +       pmd->details_root = le64_to_cpu(disk_super->device_details_root);
> +
>         __setup_btree_details(pmd);
>         dm_bm_unlock(sblock);
>
> --
> 2.31.1
>
>
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
diff mbox series

Patch

diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
index 1a62226ac34e..26c42ee183ed 100644
--- a/drivers/md/dm-thin-metadata.c
+++ b/drivers/md/dm-thin-metadata.c
@@ -724,6 +724,15 @@  static int __open_metadata(struct dm_pool_metadata *pmd)
 		goto bad_cleanup_data_sm;
 	}
 
+	/*
+	 * For pool metadata opening process, root setting is redundant
+	 * because it will be set again in __begin_transaction(). But dm
+	 * pool aborting process really needs to get last transaction's
+	 * root in case accessing broken btree.
+	 */
+	pmd->root = le64_to_cpu(disk_super->data_mapping_root);
+	pmd->details_root = le64_to_cpu(disk_super->device_details_root);
+
 	__setup_btree_details(pmd);
 	dm_bm_unlock(sblock);