diff mbox series

xen/blkback: Avoid unmapping unmapped grant pages

Message ID 20191126153605.27564-1-sjpark@amazon.com (mailing list archive)
State New, archived
Headers show
Series xen/blkback: Avoid unmapping unmapped grant pages | expand

Commit Message

SeongJae Park Nov. 26, 2019, 3:36 p.m. UTC
From: SeongJae Park <sjpark@amazon.de>

For each I/O request, blkback first maps the foreign pages for the
request to its local pages.  If an allocation of a local page for the
mapping fails, it should unmap every mapping already made for the
request.

However, blkback's handling mechanism for the allocation failure does
not mark the remaining foreign pages as unmapped.  Therefore, the unmap
function merely tries to unmap every valid grant page for the request,
including the pages not mapped due to the allocation failure.  On a
system that fails the allocation frequently, this problem leads to
following kernel crash.

  [  372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
  [  372.012546] IP: [<ffffffff814071ac>] gnttab_unmap_refs.part.7+0x1c/0x40
  [  372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0
  [  372.012562] Oops: 0002 [#1] SMP
  [  372.012566] Modules linked in: act_police sch_ingress cls_u32
  ...
  [  372.012746] Call Trace:
  [  372.012752]  [<ffffffff81407204>] gnttab_unmap_refs+0x34/0x40
  [  372.012759]  [<ffffffffa0335ae3>] xen_blkbk_unmap+0x83/0x150 [xen_blkback]
  ...
  [  372.012802]  [<ffffffffa0336c50>] dispatch_rw_block_io+0x970/0x980 [xen_blkback]
  ...
  Decompressing Linux... Parsing ELF... done.
  Booting the kernel.
  [    0.000000] Initializing cgroup subsys cpuset

This commit fixes this problem by marking the grant pages of the given
request that didn't mapped due to the allocation failure as invalid.

Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings")

Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: David Woodhouse <dwmw@amazon.de>
Reviewed-by: Maximilian Heyne <mheyne@amazon.de>
Reviewed-by: Paul Durrant <pdurrant@amazon.co.uk>
---
 drivers/block/xen-blkback/blkback.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Roger Pau Monné Nov. 27, 2019, 9:13 a.m. UTC | #1
On Tue, Nov 26, 2019 at 04:36:05PM +0100, SeongJae Park wrote:
> From: SeongJae Park <sjpark@amazon.de>
> 
> For each I/O request, blkback first maps the foreign pages for the
> request to its local pages.  If an allocation of a local page for the
> mapping fails, it should unmap every mapping already made for the
> request.
> 
> However, blkback's handling mechanism for the allocation failure does
> not mark the remaining foreign pages as unmapped.  Therefore, the unmap
> function merely tries to unmap every valid grant page for the request,
> including the pages not mapped due to the allocation failure.  On a
> system that fails the allocation frequently, this problem leads to
> following kernel crash.
> 
>   [  372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
>   [  372.012546] IP: [<ffffffff814071ac>] gnttab_unmap_refs.part.7+0x1c/0x40
>   [  372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0
>   [  372.012562] Oops: 0002 [#1] SMP
>   [  372.012566] Modules linked in: act_police sch_ingress cls_u32
>   ...
>   [  372.012746] Call Trace:
>   [  372.012752]  [<ffffffff81407204>] gnttab_unmap_refs+0x34/0x40
>   [  372.012759]  [<ffffffffa0335ae3>] xen_blkbk_unmap+0x83/0x150 [xen_blkback]
>   ...
>   [  372.012802]  [<ffffffffa0336c50>] dispatch_rw_block_io+0x970/0x980 [xen_blkback]
>   ...
>   Decompressing Linux... Parsing ELF... done.
>   Booting the kernel.
>   [    0.000000] Initializing cgroup subsys cpuset
> 
> This commit fixes this problem by marking the grant pages of the given
> request that didn't mapped due to the allocation failure as invalid.
> 
> Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings")
> 
> Signed-off-by: SeongJae Park <sjpark@amazon.de>
> Reviewed-by: David Woodhouse <dwmw@amazon.de>
> Reviewed-by: Maximilian Heyne <mheyne@amazon.de>
> Reviewed-by: Paul Durrant <pdurrant@amazon.co.uk>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
SeongJae Park Dec. 3, 2019, 2:33 p.m. UTC | #2
On 27.11.19 10:13, Roger Pau Monné wrote:
> On Tue, Nov 26, 2019 at 04:36:05PM +0100, SeongJae Park wrote:
>> From: SeongJae Park <sjpark@amazon.de>
>>
>> For each I/O request, blkback first maps the foreign pages for the
>> request to its local pages.  If an allocation of a local page for the
>> mapping fails, it should unmap every mapping already made for the
>> request.
>>
>> However, blkback's handling mechanism for the allocation failure does
>> not mark the remaining foreign pages as unmapped.  Therefore, the unmap
>> function merely tries to unmap every valid grant page for the request,
>> including the pages not mapped due to the allocation failure.  On a
>> system that fails the allocation frequently, this problem leads to
>> following kernel crash.
>>
>>   [  372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
>>   [  372.012546] IP: [<ffffffff814071ac>] gnttab_unmap_refs.part.7+0x1c/0x40
>>   [  372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0
>>   [  372.012562] Oops: 0002 [#1] SMP
>>   [  372.012566] Modules linked in: act_police sch_ingress cls_u32
>>   ...
>>   [  372.012746] Call Trace:
>>   [  372.012752]  [<ffffffff81407204>] gnttab_unmap_refs+0x34/0x40
>>   [  372.012759]  [<ffffffffa0335ae3>] xen_blkbk_unmap+0x83/0x150 [xen_blkback]
>>   ...
>>   [  372.012802]  [<ffffffffa0336c50>] dispatch_rw_block_io+0x970/0x980 [xen_blkback]
>>   ...
>>   Decompressing Linux... Parsing ELF... done.
>>   Booting the kernel.
>>   [    0.000000] Initializing cgroup subsys cpuset
>>
>> This commit fixes this problem by marking the grant pages of the given
>> request that didn't mapped due to the allocation failure as invalid.
>>
>> Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings")
>>
>> Signed-off-by: SeongJae Park <sjpark@amazon.de>
>> Reviewed-by: David Woodhouse <dwmw@amazon.de>
>> Reviewed-by: Maximilian Heyne <mheyne@amazon.de>
>> Reviewed-by: Paul Durrant <pdurrant@amazon.co.uk>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>
> Thanks, Roger.


May I ask some more comments?



Thanks,

SeongJae Park
Jens Axboe Dec. 3, 2019, 9:03 p.m. UTC | #3
On 11/26/19 8:36 AM, SeongJae Park wrote:
> From: SeongJae Park <sjpark@amazon.de>
> 
> For each I/O request, blkback first maps the foreign pages for the
> request to its local pages.  If an allocation of a local page for the
> mapping fails, it should unmap every mapping already made for the
> request.
> 
> However, blkback's handling mechanism for the allocation failure does
> not mark the remaining foreign pages as unmapped.  Therefore, the unmap
> function merely tries to unmap every valid grant page for the request,
> including the pages not mapped due to the allocation failure.  On a
> system that fails the allocation frequently, this problem leads to
> following kernel crash.
> 
>    [  372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
>    [  372.012546] IP: [<ffffffff814071ac>] gnttab_unmap_refs.part.7+0x1c/0x40
>    [  372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0
>    [  372.012562] Oops: 0002 [#1] SMP
>    [  372.012566] Modules linked in: act_police sch_ingress cls_u32
>    ...
>    [  372.012746] Call Trace:
>    [  372.012752]  [<ffffffff81407204>] gnttab_unmap_refs+0x34/0x40
>    [  372.012759]  [<ffffffffa0335ae3>] xen_blkbk_unmap+0x83/0x150 [xen_blkback]
>    ...
>    [  372.012802]  [<ffffffffa0336c50>] dispatch_rw_block_io+0x970/0x980 [xen_blkback]
>    ...
>    Decompressing Linux... Parsing ELF... done.
>    Booting the kernel.
>    [    0.000000] Initializing cgroup subsys cpuset
> 
> This commit fixes this problem by marking the grant pages of the given
> request that didn't mapped due to the allocation failure as invalid.
> 
> Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings")

Queued up with Roger's reviewed-by.
diff mbox series

Patch

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index fd1e19f1a49f..3666afa639d1 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -936,6 +936,8 @@  static int xen_blkbk_map(struct xen_blkif_ring *ring,
 out_of_memory:
 	pr_alert("%s: out of memory\n", __func__);
 	put_free_pages(ring, pages_to_gnt, segs_to_map);
+	for (i = last_map; i < num; i++)
+		pages[i]->handle = BLKBACK_INVALID_HANDLE;
 	return -ENOMEM;
 }