diff mbox series

[-next] iommu/amd: fix a warning in increase_address_space

Message ID 1571254542-13998-1-git-send-email-cai@lca.pw (mailing list archive)
State Not Applicable
Headers show
Series [-next] iommu/amd: fix a warning in increase_address_space | expand

Commit Message

Qian Cai Oct. 16, 2019, 7:35 p.m. UTC
After the commit 754265bcab78 ("iommu/amd: Fix race in
increase_address_space()"), it could still possible trigger a race
condition under some heavy memory pressure below. The race to trigger a
warning is,

CPU0:				CPU1:
in alloc_pte():		in increase_address_space():
while (address > PM_LEVEL_SIZE(domain->mode)) [1]

				spin_lock_irqsave(&domain->lock
				domain->mode    += 1;
				spin_unlock_irqrestore(&domain->lock

in increase_address_space():
spin_lock_irqsave(&domain->lock
if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))

[1] domain->mode = 5

It is unclear the triggering of the warning is the root cause of the
smartpqi offline yet, but let's fix it first by lifting the locking.

WARNING: CPU: 57 PID: 124314 at drivers/iommu/amd_iommu.c:1474
iommu_map_page+0x718/0x7e0
smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
address=0xffffffffffec0000 flags=0x0010]
smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
address=0xffffffffffec1000 flags=0x0010]
CPU: 57 PID: 124314 Comm: oom01 Tainted: G           O
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40
07/10/2019
RIP: 0010:iommu_map_page+0x718/0x7e0
Code: 88 a5 70 ff ff ff e9 5d fa ff ff 48 8b b5 70 ff ff ff 4c 89 ef e8
08 32 2f 00 41 80 fc 01 0f 87 b7 3d 00 00 41 83 e4 01 eb be <0f> 0b 48
8b b5 70 ff ff ff 4c 89 ef e8 e7 31 2f 00 eb dd 0f 0b 48
RSP: 0018:ffff888da4816cb8 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff8885fe689000 RCX: ffffffff96f4a6c4
RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8885fe689124
RBP: ffff888da4816da8 R08: ffffed10bfcd120e R09: ffffed10bfcd120e
R10: ffffed10bfcd120d R11: ffff8885fe68906b R12: 0000000000000000
smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
address=0xffffffffffec1a00 flags=0x0010]
R13: ffff8885fe689068 R14: ffff8885fe689124 R15: 0000000000000000
smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
address=0xffffffffffec1e00 flags=0x0010]
FS:  00007f29722ba700(0000) GS:ffff88902f880000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f27f82d8000 CR3: 000000102ed9c000 CR4: 00000000003406e0
Call Trace:
smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
address=0xffffffffffec2000 flags=0x0010]
 map_sg+0x1ce/0x2f0
smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
address=0xffffffffffec2400 flags=0x0010]
 scsi_dma_map+0xd7/0x160
 pqi_raid_submit_scsi_cmd_with_io_request+0x1b8/0x420 [smartpqi]
smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
address=0xffffffffffec2800 flags=0x0010]
 pqi_scsi_queue_command+0x8ab/0xe00 [smartpqi]
smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
address=0xffffffffffec2c00 flags=0x0010]
 scsi_queue_rq+0xd19/0x1360
smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
address=0xffffffffffec3000 flags=0x0010]
 __blk_mq_try_issue_directly+0x295/0x3f0
smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
address=0xffffffffffec3400 flags=0x0010]
AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x0000
address=0xffffffffffec3800 flags=0x0010]
 blk_mq_request_issue_directly+0xb5/0x100
AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x0000
address=0xffffffffffec3c00 flags=0x0010]
 blk_mq_try_issue_list_directly+0xa9/0x160
 blk_mq_sched_insert_requests+0x228/0x380
 blk_mq_flush_plug_list+0x448/0x7e0
 blk_flush_plug_list+0x1eb/0x230
 blk_finish_plug+0x43/0x5d
 shrink_node_memcg+0x9c5/0x1550
smartpqi 0000:23:00.0: controller is offline: status code 0x14803
smartpqi 0000:23:00.0: controller offline

Fixes: 754265bcab78 ("iommu/amd: Fix race in increase_address_space()")
Signed-off-by: Qian Cai <cai@lca.pw>
---

BTW, Joerg, this line from the commit "iommu/amd: Remove domain->updated" looks
suspicious. Not sure what the purpose of it.

*updated = increase_address_space(domain, gfp) || *updated;

 drivers/iommu/amd_iommu.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Jerry Snitselaar Oct. 16, 2019, 10:04 p.m. UTC | #1
On Wed Oct 16 19, Qian Cai wrote:
>After the commit 754265bcab78 ("iommu/amd: Fix race in
>increase_address_space()"), it could still possible trigger a race
>condition under some heavy memory pressure below. The race to trigger a
>warning is,
>
>CPU0:				CPU1:
>in alloc_pte():		in increase_address_space():
>while (address > PM_LEVEL_SIZE(domain->mode)) [1]
>
>				spin_lock_irqsave(&domain->lock
>				domain->mode    += 1;
>				spin_unlock_irqrestore(&domain->lock
>
>in increase_address_space():
>spin_lock_irqsave(&domain->lock
>if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))
>
>[1] domain->mode = 5
>
>It is unclear the triggering of the warning is the root cause of the
>smartpqi offline yet, but let's fix it first by lifting the locking.
>
>WARNING: CPU: 57 PID: 124314 at drivers/iommu/amd_iommu.c:1474
>iommu_map_page+0x718/0x7e0
>smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
>address=0xffffffffffec0000 flags=0x0010]
>smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
>address=0xffffffffffec1000 flags=0x0010]
>CPU: 57 PID: 124314 Comm: oom01 Tainted: G           O
>Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40
>07/10/2019
>RIP: 0010:iommu_map_page+0x718/0x7e0
>Code: 88 a5 70 ff ff ff e9 5d fa ff ff 48 8b b5 70 ff ff ff 4c 89 ef e8
>08 32 2f 00 41 80 fc 01 0f 87 b7 3d 00 00 41 83 e4 01 eb be <0f> 0b 48
>8b b5 70 ff ff ff 4c 89 ef e8 e7 31 2f 00 eb dd 0f 0b 48
>RSP: 0018:ffff888da4816cb8 EFLAGS: 00010046
>RAX: 0000000000000000 RBX: ffff8885fe689000 RCX: ffffffff96f4a6c4
>RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8885fe689124
>RBP: ffff888da4816da8 R08: ffffed10bfcd120e R09: ffffed10bfcd120e
>R10: ffffed10bfcd120d R11: ffff8885fe68906b R12: 0000000000000000
>smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
>address=0xffffffffffec1a00 flags=0x0010]
>R13: ffff8885fe689068 R14: ffff8885fe689124 R15: 0000000000000000
>smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
>address=0xffffffffffec1e00 flags=0x0010]
>FS:  00007f29722ba700(0000) GS:ffff88902f880000(0000)
>knlGS:0000000000000000
>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>CR2: 00007f27f82d8000 CR3: 000000102ed9c000 CR4: 00000000003406e0
>Call Trace:
>smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
>address=0xffffffffffec2000 flags=0x0010]
> map_sg+0x1ce/0x2f0
>smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
>address=0xffffffffffec2400 flags=0x0010]
> scsi_dma_map+0xd7/0x160
> pqi_raid_submit_scsi_cmd_with_io_request+0x1b8/0x420 [smartpqi]
>smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
>address=0xffffffffffec2800 flags=0x0010]
> pqi_scsi_queue_command+0x8ab/0xe00 [smartpqi]
>smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
>address=0xffffffffffec2c00 flags=0x0010]
> scsi_queue_rq+0xd19/0x1360
>smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
>address=0xffffffffffec3000 flags=0x0010]
> __blk_mq_try_issue_directly+0x295/0x3f0
>smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000
>address=0xffffffffffec3400 flags=0x0010]
>AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x0000
>address=0xffffffffffec3800 flags=0x0010]
> blk_mq_request_issue_directly+0xb5/0x100
>AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x0000
>address=0xffffffffffec3c00 flags=0x0010]
> blk_mq_try_issue_list_directly+0xa9/0x160
> blk_mq_sched_insert_requests+0x228/0x380
> blk_mq_flush_plug_list+0x448/0x7e0
> blk_flush_plug_list+0x1eb/0x230
> blk_finish_plug+0x43/0x5d
> shrink_node_memcg+0x9c5/0x1550
>smartpqi 0000:23:00.0: controller is offline: status code 0x14803
>smartpqi 0000:23:00.0: controller offline
>
>Fixes: 754265bcab78 ("iommu/amd: Fix race in increase_address_space()")
>Signed-off-by: Qian Cai <cai@lca.pw>
>---
>
>BTW, Joerg, this line from the commit "iommu/amd: Remove domain->updated" looks
>suspicious. Not sure what the purpose of it.
>
>*updated = increase_address_space(domain, gfp) || *updated;
>

Looking at it again I think that isn't an issue really, it would just
not lose updated being set in a previous loop iteration, but now
I'm wondering about the loop itself. In the cases where it would return
false, how does the evaluation of the condition for the while loop
change?

> drivers/iommu/amd_iommu.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>index 2369b8af81f3..a5754068aa29 100644
>--- a/drivers/iommu/amd_iommu.c
>+++ b/drivers/iommu/amd_iommu.c
>@@ -1465,12 +1465,9 @@ static void free_pagetable(struct protection_domain *domain)
> static bool increase_address_space(struct protection_domain *domain,
> 				   gfp_t gfp)
> {
>-	unsigned long flags;
> 	bool ret = false;
> 	u64 *pte;
>
>-	spin_lock_irqsave(&domain->lock, flags);
>-
> 	if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))
> 		/* address space already 64 bit large */
> 		goto out;
>@@ -1487,8 +1484,6 @@ static bool increase_address_space(struct protection_domain *domain,
> 	ret = true;
>
> out:
>-	spin_unlock_irqrestore(&domain->lock, flags);
>-
> 	return ret;
> }
>
>@@ -1499,14 +1494,19 @@ static u64 *alloc_pte(struct protection_domain *domain,
> 		      gfp_t gfp,
> 		      bool *updated)
> {
>+	unsigned long flags;
> 	int level, end_lvl;
> 	u64 *pte, *page;
>
> 	BUG_ON(!is_power_of_2(page_size));
>
>+	spin_lock_irqsave(&domain->lock, flags);
>+
> 	while (address > PM_LEVEL_SIZE(domain->mode))
> 		*updated = increase_address_space(domain, gfp) || *updated;
>
>+	spin_unlock_irqrestore(&domain->lock, flags);
>+
> 	level   = domain->mode - 1;
> 	pte     = &domain->pt_root[PM_LEVEL_INDEX(level, address)];
> 	address = PAGE_SIZE_ALIGN(address, page_size);
>-- 
>1.8.3.1
>
>_______________________________________________
>iommu mailing list
>iommu@lists.linux-foundation.org
>https://lists.linuxfoundation.org/mailman/listinfo/iommu
Jerry Snitselaar Oct. 16, 2019, 10:58 p.m. UTC | #2
On Wed Oct 16 19, Jerry Snitselaar wrote:
>On Wed Oct 16 19, Qian Cai wrote:
>>
>>BTW, Joerg, this line from the commit "iommu/amd: Remove domain->updated" looks
>>suspicious. Not sure what the purpose of it.
>>
>>*updated = increase_address_space(domain, gfp) || *updated;
>>
>
>Looking at it again I think that isn't an issue really, it would just
>not lose updated being set in a previous loop iteration, but now
>I'm wondering about the loop itself. In the cases where it would return
>false, how does the evaluation of the condition for the while loop
>change?
>

I guess the mode level 6 check is really for other potential callers
increase_address_space, none exist at the moment, and the condition
of the while loop in alloc_pte should fail if the mode level is 6.
Qian Cai Oct. 17, 2019, 11:36 a.m. UTC | #3
> On Oct 16, 2019, at 6:59 PM, Jerry Snitselaar <jsnitsel@redhat.com> wrote:
> 
> I guess the mode level 6 check is really for other potential callers
> increase_address_space, none exist at the moment, and the condition
> of the while loop in alloc_pte should fail if the mode level is 6.

Because there is no locking around iommu_map_page(), if there are several concurrent callers of it for the same domain, could it be that it silently corrupt data due to invalid access?
diff mbox series

Patch

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 2369b8af81f3..a5754068aa29 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1465,12 +1465,9 @@  static void free_pagetable(struct protection_domain *domain)
 static bool increase_address_space(struct protection_domain *domain,
 				   gfp_t gfp)
 {
-	unsigned long flags;
 	bool ret = false;
 	u64 *pte;
 
-	spin_lock_irqsave(&domain->lock, flags);
-
 	if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))
 		/* address space already 64 bit large */
 		goto out;
@@ -1487,8 +1484,6 @@  static bool increase_address_space(struct protection_domain *domain,
 	ret = true;
 
 out:
-	spin_unlock_irqrestore(&domain->lock, flags);
-
 	return ret;
 }
 
@@ -1499,14 +1494,19 @@  static u64 *alloc_pte(struct protection_domain *domain,
 		      gfp_t gfp,
 		      bool *updated)
 {
+	unsigned long flags;
 	int level, end_lvl;
 	u64 *pte, *page;
 
 	BUG_ON(!is_power_of_2(page_size));
 
+	spin_lock_irqsave(&domain->lock, flags);
+
 	while (address > PM_LEVEL_SIZE(domain->mode))
 		*updated = increase_address_space(domain, gfp) || *updated;
 
+	spin_unlock_irqrestore(&domain->lock, flags);
+
 	level   = domain->mode - 1;
 	pte     = &domain->pt_root[PM_LEVEL_INDEX(level, address)];
 	address = PAGE_SIZE_ALIGN(address, page_size);