[2/8] scsi: take the DMA max mapping size into account

Message ID 20190617122000.22181-3-hch@lst.de (mailing list archive)
State Changes Requested
Series [1/8] scsi: add a host / host template field for the virt boundary

Commit Message

Christoph Hellwig June 17, 2019, 12:19 p.m. UTC
We need to limit the device's max_sectors to what the DMA mapping
implementation can support.  Otherwise we risk running out of swiotlb
buffers easily.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi_lib.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Bart Van Assche June 17, 2019, 8:56 p.m. UTC | #1
On 6/17/19 5:19 AM, Christoph Hellwig wrote:
> We need to limit the devices max_sectors to what the DMA mapping
> implementation can support.  If not we risk running out of swiotlb
> buffers easily.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/scsi/scsi_lib.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index d333bb6b1c59..f233bfd84cd7 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1768,6 +1768,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
>   		blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
>   	}
>   
> +	shost->max_sectors = min_t(unsigned int, shost->max_sectors,
> +			dma_max_mapping_size(dev) << SECTOR_SHIFT);
>   	blk_queue_max_hw_sectors(q, shost->max_sectors);
>   	if (shost->unchecked_isa_dma)
>   		blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);

Does dma_max_mapping_size() return a value in bytes? Is 
shost->max_sectors a number of sectors? If so, are you sure that "<< 
SECTOR_SHIFT" is the proper conversion? Shouldn't that be ">> 
SECTOR_SHIFT" instead?

Additionally, how about adding a comment above dma_max_mapping_size() 
that documents the unit of the returned number?

Thanks,

Bart.
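
As a side illustration of the unit question raised above, here is a minimal
userspace sketch, not the kernel code. It assumes dma_max_mapping_size()
reports bytes and max_sectors counts 512-byte sectors; bytes_to_sectors() and
clamp_max_sectors() are hypothetical helper names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Same value as the kernel's SECTOR_SHIFT: one sector is 512 bytes. */
#define SECTOR_SHIFT 9

static_assert((1u << SECTOR_SHIFT) == 512, "a sector is 512 bytes");

/* Hypothetical helper: bytes -> sectors is a division by 512 (shift right). */
static unsigned int bytes_to_sectors(size_t bytes)
{
	size_t sectors = bytes >> SECTOR_SHIFT;

	/* Saturate so a huge byte limit does not truncate when narrowed. */
	return sectors > UINT_MAX ? UINT_MAX : (unsigned int)sectors;
}

/* Hypothetical helper: clamp a sector count to a DMA limit given in bytes. */
static unsigned int clamp_max_sectors(unsigned int max_sectors,
				      size_t dma_max_bytes)
{
	unsigned int limit = bytes_to_sectors(dma_max_bytes);

	return max_sectors < limit ? max_sectors : limit;
}
```

With an example limit of 262144 bytes (256 KiB), the right shift gives 512
sectors, so a max_sectors of 1024 is clamped to 512. Shifting left instead
yields 134217728, which is larger than 1024, so the min would leave
max_sectors unchanged and the limit would never apply.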
Ming Lei July 22, 2019, 6 a.m. UTC | #2
On Tue, Jun 18, 2019 at 4:57 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 6/17/19 5:19 AM, Christoph Hellwig wrote:
> > We need to limit the devices max_sectors to what the DMA mapping
> > implementation can support.  If not we risk running out of swiotlb
> > buffers easily.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >   drivers/scsi/scsi_lib.c | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index d333bb6b1c59..f233bfd84cd7 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1768,6 +1768,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
> >               blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
> >       }
> >
> > +     shost->max_sectors = min_t(unsigned int, shost->max_sectors,
> > +                     dma_max_mapping_size(dev) << SECTOR_SHIFT);
> >       blk_queue_max_hw_sectors(q, shost->max_sectors);
> >       if (shost->unchecked_isa_dma)
> >               blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);
>
> Does dma_max_mapping_size() return a value in bytes? Is
> shost->max_sectors a number of sectors? If so, are you sure that "<<
> SECTOR_SHIFT" is the proper conversion? Shouldn't that be ">>
> SECTOR_SHIFT" instead?

Now that the patch has been committed, '<< SECTOR_SHIFT' needs to be fixed.

Also, the following kernel oops is triggered on qemu, and it looks like
device->dma_mask is NULL.

[    5.826483] scsi host0: Virtio SCSI HBA
[    5.829302] st: Version 20160209, fixed bufsize 32768, s/g segs 256
[    5.831042] SCSI Media Changer driver v0.25
[    5.832491] ==================================================================
[    5.833332] BUG: KASAN: null-ptr-deref in dma_direct_max_mapping_size+0x30/0x94
[    5.833332] Read of size 8 at addr 0000000000000000 by task kworker/u17:0/7
[    5.835506] nvme nvme0: pci function 0000:00:07.0
[    5.833332]
[    5.833332] CPU: 2 PID: 7 Comm: kworker/u17:0 Not tainted 5.3.0-rc1 #1328
[    5.836999] ahci 0000:00:1f.2: version 3.0
[    5.833332] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4
[    5.833332] Workqueue: events_unbound async_run_entry_fn
[    5.833332] Call Trace:
[    5.833332]  dump_stack+0x6f/0x9d
[    5.833332]  ? dma_direct_max_mapping_size+0x30/0x94
[    5.833332]  __kasan_report+0x161/0x189
[    5.833332]  ? dma_direct_max_mapping_size+0x30/0x94
[    5.833332]  kasan_report+0xe/0x12
[    5.833332]  dma_direct_max_mapping_size+0x30/0x94
[    5.833332]  __scsi_init_queue+0xd8/0x1f3
[    5.833332]  scsi_mq_alloc_queue+0x62/0x89
[    5.833332]  scsi_alloc_sdev+0x38c/0x479
[    5.833332]  scsi_probe_and_add_lun+0x22d/0x1093
[    5.833332]  ? kobject_set_name_vargs+0xa4/0xb2
[    5.833332]  ? mutex_lock+0x88/0xc4
[    5.833332]  ? scsi_free_host_dev+0x4a/0x4a
[    5.833332]  ? _raw_spin_lock_irqsave+0x8c/0xde
[    5.833332]  ? _raw_write_unlock_irqrestore+0x23/0x23
[    5.833332]  ? ata_tdev_match+0x22/0x45
[    5.833332]  ? attribute_container_add_device+0x160/0x17e
[    5.833332]  ? rpm_resume+0x26a/0x7c0
[    5.833332]  ? kobject_get+0x12/0x43
[    5.833332]  ? rpm_put_suppliers+0x7e/0x7e
[    5.833332]  ? _raw_spin_lock_irqsave+0x8c/0xde
[    5.833332]  ? _raw_write_unlock_irqrestore+0x23/0x23
[    5.833332]  ? scsi_target_destroy+0x135/0x135
[    5.833332]  __scsi_scan_target+0x14b/0x6aa
[    5.833332]  ? pvclock_clocksource_read+0xc0/0x14e
[    5.833332]  ? scsi_add_device+0x20/0x20
[    5.833332]  ? rpm_resume+0x1ae/0x7c0
[    5.833332]  ? rpm_put_suppliers+0x7e/0x7e
[    5.833332]  ? _raw_spin_lock_irqsave+0x8c/0xde
[    5.833332]  ? _raw_write_unlock_irqrestore+0x23/0x23
[    5.833332]  ? pick_next_task_fair+0x976/0xa3d
[    5.833332]  ? mutex_lock+0x88/0xc4
[    5.833332]  scsi_scan_channel+0x76/0x9e
[    5.833332]  scsi_scan_host_selected+0x131/0x176
[    5.833332]  ? scsi_scan_host+0x241/0x241
[    5.833332]  do_scan_async+0x27/0x219
[    5.833332]  ? scsi_scan_host+0x241/0x241
[    5.833332]  async_run_entry_fn+0xdc/0x23d
[    5.833332]  process_one_work+0x327/0x539
[    5.833332]  worker_thread+0x330/0x492
[    5.833332]  ? rescuer_thread+0x41f/0x41f
[    5.833332]  kthread+0x1c6/0x1d5
[    5.833332]  ? kthread_park+0xd3/0xd3
[    5.833332]  ret_from_fork+0x1f/0x30
[    5.833332] ==================================================================



Thanks,
Ming Lei
Dexuan-Linux Cui July 22, 2019, 6:18 a.m. UTC | #3
On Sun, Jul 21, 2019 at 11:01 PM Ming Lei <tom.leiming@gmail.com> wrote:
>
> On Tue, Jun 18, 2019 at 4:57 AM Bart Van Assche <bvanassche@acm.org> wrote:
> >
> > On 6/17/19 5:19 AM, Christoph Hellwig wrote:
> > > We need to limit the devices max_sectors to what the DMA mapping
> > > implementation can support.  If not we risk running out of swiotlb
> > > buffers easily.
> > >
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >   drivers/scsi/scsi_lib.c | 2 ++
> > >   1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > index d333bb6b1c59..f233bfd84cd7 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -1768,6 +1768,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
> > >               blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
> > >       }
> > >
> > > +     shost->max_sectors = min_t(unsigned int, shost->max_sectors,
> > > +                     dma_max_mapping_size(dev) << SECTOR_SHIFT);
> > >       blk_queue_max_hw_sectors(q, shost->max_sectors);
> > >       if (shost->unchecked_isa_dma)
> > >               blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);
> >
> > Does dma_max_mapping_size() return a value in bytes? Is
> > shost->max_sectors a number of sectors? If so, are you sure that "<<
> > SECTOR_SHIFT" is the proper conversion? Shouldn't that be ">>
> > SECTOR_SHIFT" instead?
>
> Now the patch has been committed, '<< SECTOR_SHIFT' needs to be fixed.
>
> Also the following kernel oops is triggered on qemu, and looks
> device->dma_mask is NULL.
>
> Ming Lei

FYI: we also see the panic with a Linux kernel 5.2.0-next-20190719
running on Hyper-V:

[    7.429053] RIP: 0010:dma_direct_max_mapping_size+0x26/0x80
[    7.429053] Code: 0f b6 c0 c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 4c 14 00 00 84 c0 74 45 48 8b 83 28 02 00 00 4c 8b a3 38 02 00 00 <48> 8b 00 48 85 c0 74 0c 4d 85 e4 74 36 49 39 c4 4c 0f 47 e0 48 89
[    7.429053] RSP: 0018:ffffc1d5005efbc0 EFLAGS: 00010202
[    7.429053] RAX: 0000000000000000 RBX: ffff9cf86d24c428 RCX: 0000000000000000
[    7.429053] RDX: ffff9cf86d12dd00 RSI: 0000000000000200 RDI: ffff9cf86d24c428
[    7.429053] RBP: ffffc1d5005efbd0 R08: ffff9cf86fcaf0e0 R09: ffff9cf86e0072c0
[    7.429053] R10: ffffc1d5005efa70 R11: 00000000000301a0 R12: 0000000000000000
[    7.429053] R13: ffff9cf86d24c428 R14: 0000000000000400 R15: ffff9cf825cff000
[    7.429053] FS:  0000000000000000(0000) GS:ffff9cf86fc80000(0000) knlGS:0000000000000000
[    7.429053] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.429053] CR2: 0000000000000000 CR3: 00000003c700a001 CR4: 00000000003606e0
[    7.456569] NET: Registered protocol family 17
[    7.429053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    7.469803] Key type dns_resolver registered
[    7.429053] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    7.429053] Call Trace:
[    7.429053]  dma_max_mapping_size+0x39/0x50
[    7.429053]  __scsi_init_queue+0x7f/0x140
[    7.429053]  scsi_mq_alloc_queue+0x38/0x60
[    7.429053]  scsi_alloc_sdev+0x1da/0x2b0
[    7.429053]  scsi_probe_and_add_lun+0x471/0xe60
[    7.429053]  __scsi_scan_target+0xfc/0x610
[    7.429053]  scsi_scan_channel+0x66/0xa0
[    7.429053]  scsi_scan_host_selected+0xf3/0x160
[    7.429053]  do_scsi_scan_host+0x93/0xa0
[    7.429053]  do_scan_async+0x1c/0x190
[    7.429053]  async_run_entry_fn+0x3c/0x150
[    7.429053]  process_one_work+0x1f7/0x3f0
[    7.429053]  worker_thread+0x34/0x400
[    7.429053]  kthread+0x121/0x140
[    7.429053]  ret_from_fork+0x35/0x40
[    7.429053] Modules linked in:
[    7.429053] CR2: 0000000000000000
[    7.766122] BUG: kernel NULL pointer dereference, address: 0000000000000000

Thanks,
-- Dexuan
Damien Le Moal July 22, 2019, 7:40 a.m. UTC | #4
On 2019/07/22 15:01, Ming Lei wrote:
> On Tue, Jun 18, 2019 at 4:57 AM Bart Van Assche <bvanassche@acm.org> wrote:
>>
>> On 6/17/19 5:19 AM, Christoph Hellwig wrote:
>>> We need to limit the devices max_sectors to what the DMA mapping
>>> implementation can support.  If not we risk running out of swiotlb
>>> buffers easily.
>>>
>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>> ---
>>>   drivers/scsi/scsi_lib.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>> index d333bb6b1c59..f233bfd84cd7 100644
>>> --- a/drivers/scsi/scsi_lib.c
>>> +++ b/drivers/scsi/scsi_lib.c
>>> @@ -1768,6 +1768,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
>>>               blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
>>>       }
>>>
>>> +     shost->max_sectors = min_t(unsigned int, shost->max_sectors,
>>> +                     dma_max_mapping_size(dev) << SECTOR_SHIFT);
>>>       blk_queue_max_hw_sectors(q, shost->max_sectors);
>>>       if (shost->unchecked_isa_dma)
>>>               blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);
>>
>> Does dma_max_mapping_size() return a value in bytes? Is
>> shost->max_sectors a number of sectors? If so, are you sure that "<<
>> SECTOR_SHIFT" is the proper conversion? Shouldn't that be ">>
>> SECTOR_SHIFT" instead?
> 
> Now the patch has been committed, '<< SECTOR_SHIFT' needs to be fixed.
> 
> Also the following kernel oops is triggered on qemu, and looks
> device->dma_mask is NULL.

Just hit the exact same problem using tcmu-runner (ZBC file handler) on bare
metal (no QEMU). dev->dma_mask is NULL. No problem with real disks though.
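
The reports above point to a NULL dma_mask being dereferenced. The following
is a toy userspace model of one possible guard, not the fix that was applied
upstream. struct fake_device, max_mapping_size_or_unlimited() and the rule
that a mask narrower than 64 bits implies bounce buffering are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the one field of struct device used here. */
struct fake_device {
	const uint64_t *dma_mask;	/* NULL for a device that does no DMA */
};

/*
 * Hypothetical guard: return "no limit" (SIZE_MAX) when there is no mask
 * to inspect, instead of dereferencing a NULL pointer.
 */
static size_t max_mapping_size_or_unlimited(const struct fake_device *dev,
					    size_t bounce_limit)
{
	if (!dev || !dev->dma_mask)
		return SIZE_MAX;

	/* Assumption: a mask narrower than 64 bits needs bounce buffers. */
	if (*dev->dma_mask != UINT64_MAX)
		return bounce_limit;

	return SIZE_MAX;
}
```

In this model a virtual host with no dma_mask gets SIZE_MAX, so a caller that
converts the result to sectors and takes the minimum leaves max_sectors as it
was.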

> 
> [    5.826483] scsi host0: Virtio SCSI HBA
> [    5.829302] st: Version 20160209, fixed bufsize 32768, s/g segs 256
> [    5.831042] SCSI Media Changer driver v0.25
> [    5.832491] ==================================================================
> [    5.833332] BUG: KASAN: null-ptr-deref in dma_direct_max_mapping_size+0x30/0x94
> [    5.833332] Read of size 8 at addr 0000000000000000 by task kworker/u17:0/7
> [    5.835506] nvme nvme0: pci function 0000:00:07.0
> [    5.833332]
> [    5.833332] CPU: 2 PID: 7 Comm: kworker/u17:0 Not tainted 5.3.0-rc1 #1328
> [    5.836999] ahci 0000:00:1f.2: version 3.0
> [    5.833332] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4
> [    5.833332] Workqueue: events_unbound async_run_entry_fn
> [    5.833332] Call Trace:
> [    5.833332]  dump_stack+0x6f/0x9d
> [    5.833332]  ? dma_direct_max_mapping_size+0x30/0x94
> [    5.833332]  __kasan_report+0x161/0x189
> [    5.833332]  ? dma_direct_max_mapping_size+0x30/0x94
> [    5.833332]  kasan_report+0xe/0x12
> [    5.833332]  dma_direct_max_mapping_size+0x30/0x94
> [    5.833332]  __scsi_init_queue+0xd8/0x1f3
> [    5.833332]  scsi_mq_alloc_queue+0x62/0x89
> [    5.833332]  scsi_alloc_sdev+0x38c/0x479
> [    5.833332]  scsi_probe_and_add_lun+0x22d/0x1093
> [    5.833332]  ? kobject_set_name_vargs+0xa4/0xb2
> [    5.833332]  ? mutex_lock+0x88/0xc4
> [    5.833332]  ? scsi_free_host_dev+0x4a/0x4a
> [    5.833332]  ? _raw_spin_lock_irqsave+0x8c/0xde
> [    5.833332]  ? _raw_write_unlock_irqrestore+0x23/0x23
> [    5.833332]  ? ata_tdev_match+0x22/0x45
> [    5.833332]  ? attribute_container_add_device+0x160/0x17e
> [    5.833332]  ? rpm_resume+0x26a/0x7c0
> [    5.833332]  ? kobject_get+0x12/0x43
> [    5.833332]  ? rpm_put_suppliers+0x7e/0x7e
> [    5.833332]  ? _raw_spin_lock_irqsave+0x8c/0xde
> [    5.833332]  ? _raw_write_unlock_irqrestore+0x23/0x23
> [    5.833332]  ? scsi_target_destroy+0x135/0x135
> [    5.833332]  __scsi_scan_target+0x14b/0x6aa
> [    5.833332]  ? pvclock_clocksource_read+0xc0/0x14e
> [    5.833332]  ? scsi_add_device+0x20/0x20
> [    5.833332]  ? rpm_resume+0x1ae/0x7c0
> [    5.833332]  ? rpm_put_suppliers+0x7e/0x7e
> [    5.833332]  ? _raw_spin_lock_irqsave+0x8c/0xde
> [    5.833332]  ? _raw_write_unlock_irqrestore+0x23/0x23
> [    5.833332]  ? pick_next_task_fair+0x976/0xa3d
> [    5.833332]  ? mutex_lock+0x88/0xc4
> [    5.833332]  scsi_scan_channel+0x76/0x9e
> [    5.833332]  scsi_scan_host_selected+0x131/0x176
> [    5.833332]  ? scsi_scan_host+0x241/0x241
> [    5.833332]  do_scan_async+0x27/0x219
> [    5.833332]  ? scsi_scan_host+0x241/0x241
> [    5.833332]  async_run_entry_fn+0xdc/0x23d
> [    5.833332]  process_one_work+0x327/0x539
> [    5.833332]  worker_thread+0x330/0x492
> [    5.833332]  ? rescuer_thread+0x41f/0x41f
> [    5.833332]  kthread+0x1c6/0x1d5
> [    5.833332]  ? kthread_park+0xd3/0xd3
> [    5.833332]  ret_from_fork+0x1f/0x30
> [    5.833332] ==================================================================
> 
> 
> 
> Thanks,
> Ming Lei
>
Patch

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d333bb6b1c59..f233bfd84cd7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1768,6 +1768,8 @@  void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
 		blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
 	}
 
+	shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+			dma_max_mapping_size(dev) << SECTOR_SHIFT);
 	blk_queue_max_hw_sectors(q, shost->max_sectors);
 	if (shost->unchecked_isa_dma)
 		blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);