diff mbox

[v2,1/3] scsi: Fix a scsi_show_rq() NULL pointer dereference

Message ID 20171206005753.28734-2-bart.vanassche@wdc.com (mailing list archive)
State Accepted
Headers show

Commit Message

Bart Van Assche Dec. 6, 2017, 12:57 a.m. UTC
Avoid that scsi_show_rq() triggers a NULL pointer dereference if
called after sd_uninit_command(). Swap the NULL pointer assignment
and the mempool_free() call in sd_uninit_command() to make it less
likely that scsi_show_rq() triggers a use-after-free. Note: even
with these changes scsi_show_rq() can trigger a use-after-free but
that's a lesser evil than e.g. suppressing debug information for
T10-PI commands completely. This patch fixes the following oops:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: scsi_format_opcode_name+0x1a/0x1c0
CPU: 1 PID: 1881 Comm: cat Not tainted 4.14.0-rc2.blk_mq_io_hang+ #516
Call Trace:
 __scsi_format_command+0x27/0xc0
 scsi_show_rq+0x5c/0xc0
 __blk_mq_debugfs_rq_show+0x116/0x130
 blk_mq_debugfs_rq_show+0xe/0x10
 seq_read+0xfe/0x3b0
 full_proxy_read+0x54/0x90
 __vfs_read+0x37/0x160
 vfs_read+0x96/0x130
 SyS_read+0x55/0xc0
 entry_SYSCALL_64_fastpath+0x1a/0xa5

Fixes: 0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()")
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: stable@vger.kernel.org
---
 drivers/scsi/scsi_debugfs.c | 6 ++++--
 drivers/scsi/sd.c           | 4 +++-
 2 files changed, 7 insertions(+), 3 deletions(-)

Comments

Ming Lei Dec. 8, 2017, 1:45 a.m. UTC | #1
On Tue, Dec 05, 2017 at 04:57:51PM -0800, Bart Van Assche wrote:
> Avoid that scsi_show_rq() triggers a NULL pointer dereference if
> called after sd_uninit_command(). Swap the NULL pointer assignment
> and the mempool_free() call in sd_uninit_command() to make it less
> likely that scsi_show_rq() triggers a use-after-free. Note: even
> with these changes scsi_show_rq() can trigger a use-after-free but
> that's a lesser evil than e.g. suppressing debug information for
> T10-PI commands completely. This patch fixes the following oops:
> 
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: scsi_format_opcode_name+0x1a/0x1c0
> CPU: 1 PID: 1881 Comm: cat Not tainted 4.14.0-rc2.blk_mq_io_hang+ #516
> Call Trace:
>  __scsi_format_command+0x27/0xc0
>  scsi_show_rq+0x5c/0xc0
>  __blk_mq_debugfs_rq_show+0x116/0x130
>  blk_mq_debugfs_rq_show+0xe/0x10
>  seq_read+0xfe/0x3b0
>  full_proxy_read+0x54/0x90
>  __vfs_read+0x37/0x160
>  vfs_read+0x96/0x130
>  SyS_read+0x55/0xc0
>  entry_SYSCALL_64_fastpath+0x1a/0xa5
> 
> Fixes: 0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()")
> Reported-by: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: stable@vger.kernel.org
> ---
>  drivers/scsi/scsi_debugfs.c | 6 ++++--
>  drivers/scsi/sd.c           | 4 +++-
>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
> index 01f08c03f2c1..c3765d29fd3f 100644
> --- a/drivers/scsi/scsi_debugfs.c
> +++ b/drivers/scsi/scsi_debugfs.c
> @@ -8,9 +8,11 @@ void scsi_show_rq(struct seq_file *m, struct request *rq)
>  {
>  	struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req);
>  	int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
> -	char buf[80];
> +	const u8 *const cdb = READ_ONCE(cmd->cmnd);
> +	char buf[80] = "(?)";
>  
> -	__scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len);
> +	if (cdb)
> +		__scsi_format_command(buf, sizeof(buf), cdb, cmd->cmd_len);
>  	seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
>  		   cmd->retries, msecs / 1000, msecs % 1000);
>  }

As I explained in [1], the use-after-free is inevitable no matter if
clearing 'SCpnt->cmnd' before mempool_free() in sd_uninit_command() or not,
so we need to comment the fact that cdb may point to garbage data, and this
function(especially __scsi_format_command() has to survive that, so that
people won't be surprised when kasan complains use-after-free, and guys will
be careful when they try to change the code in future.

Once this comment is added, with or without clearing 'SCpnt->cmnd' before
mempool_free(), I am fine with this patch.

[1] https://marc.info/?l=linux-block&m=151252302112512&w=2

> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index d175c5c5ccf8..d841743b2107 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1284,6 +1284,7 @@ static int sd_init_command(struct scsi_cmnd *cmd)
>  static void sd_uninit_command(struct scsi_cmnd *SCpnt)
>  {
>  	struct request *rq = SCpnt->request;
> +	u8 *cmnd;
>  
>  	if (SCpnt->flags & SCMD_ZONE_WRITE_LOCK)
>  		sd_zbc_write_unlock_zone(SCpnt);
> @@ -1292,9 +1293,10 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt)
>  		__free_page(rq->special_vec.bv_page);
>  
>  	if (SCpnt->cmnd != scsi_req(rq)->cmd) {
> -		mempool_free(SCpnt->cmnd, sd_cdb_pool);
> +		cmnd = SCpnt->cmnd;
>  		SCpnt->cmnd = NULL;
>  		SCpnt->cmd_len = 0;
> +		mempool_free(cmnd, sd_cdb_pool);
>  	}
>  }
>  
> -- 
> 2.15.0
>
Martin K. Petersen Dec. 8, 2017, 2:46 a.m. UTC | #2
Ming,

> As I explained in [1], the use-after-free is inevitable no matter if
> clearing 'SCpnt->cmnd' before mempool_free() in sd_uninit_command() or
> not, so we need to comment the fact that cdb may point to garbage
> data, and this function(especially __scsi_format_command() has to
> survive that, so that people won't be surprised when kasan complains
> use-after-free, and guys will be careful when they try to change the
> code in future.

Longer term we really need to get rid of the separate CDB allocation. It
was a necessary evil when I did it. And not much of a concern since I
did not expect anybody sane to use Type 2 (it's designed for use inside
disk arrays).

However, I keep hearing about people using Type 2 drives. Some vendors
source drives formatted that way and use the same SKU for arrays and
standalone servers.

So we should really look into making it possible for a queue to have a
bigger than 16-byte built-in CDB. For Type 2 devices, 32-byte reads and
writes are a prerequisite. So it would be nice to be able to switch a
queue to a larger allocation post creation (we won't know the type until
after READ CAPACITY(16) has been sent).

Last I looked at this it was not entirely trivial given how we tag
things on to the end. But that really is my preferred fix.
Ming Lei Dec. 8, 2017, 8:44 a.m. UTC | #3
Hi Martin,

On Thu, Dec 07, 2017 at 09:46:21PM -0500, Martin K. Petersen wrote:
> 
> Ming,
> 
> > As I explained in [1], the use-after-free is inevitable no matter if
> > clearing 'SCpnt->cmnd' before mempool_free() in sd_uninit_command() or
> > not, so we need to comment the fact that cdb may point to garbage
> > data, and this function(especially __scsi_format_command() has to
> > survive that, so that people won't be surprised when kasan complains
> > use-after-free, and guys will be careful when they try to change the
> > code in future.
> 
> Longer term we really need to get rid of the separate CDB allocation. It
> was a necessary evil when I did it. And not much of a concern since I
> did not expect anybody sane to use Type 2 (it's designed for use inside
> disk arrays).
> 
> However, I keep hearing about people using Type 2 drives. Some vendors
> source drives formatted that way and use the same SKU for arrays and
> standalone servers.
> 
> So we should really look into making it possible for a queue to have a
> bigger than 16-byte built-in CDB. For Type 2 devices, 32-byte reads and
> writes are a prerequisite. So it would be nice to be able to switch a
> queue to a larger allocation post creation (we won't know the type until
> after READ CAPACITY(16) has been sent).

I am wondering why you don't make __cmd[] in scsi_request defined as 32bytes?
Even for some hosts with thousands of tag, the memory waste is still not
too much.

Or if you prefer to do post creation, we have sbitmap_queue now, which can
help to build a pre-allocated memory pool easily, and its allocation/free is
pretty efficient.

Thanks,
Ming
Martin K. Petersen Dec. 12, 2017, 2:57 a.m. UTC | #4
Bart,

> Avoid that scsi_show_rq() triggers a NULL pointer dereference if
> called after sd_uninit_command(). Swap the NULL pointer assignment
> and the mempool_free() call in sd_uninit_command() to make it less
> likely that scsi_show_rq() triggers a use-after-free. Note: even
> with these changes scsi_show_rq() can trigger a use-after-free but
> that's a lesser evil than e.g. suppressing debug information for
> T10-PI commands completely. This patch fixes the following oops:

Applied to 4.15/scsi-fixes. Thanks, Bart.
diff mbox

Patch

diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
index 01f08c03f2c1..c3765d29fd3f 100644
--- a/drivers/scsi/scsi_debugfs.c
+++ b/drivers/scsi/scsi_debugfs.c
@@ -8,9 +8,11 @@  void scsi_show_rq(struct seq_file *m, struct request *rq)
 {
 	struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req);
 	int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
-	char buf[80];
+	const u8 *const cdb = READ_ONCE(cmd->cmnd);
+	char buf[80] = "(?)";
 
-	__scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len);
+	if (cdb)
+		__scsi_format_command(buf, sizeof(buf), cdb, cmd->cmd_len);
 	seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
 		   cmd->retries, msecs / 1000, msecs % 1000);
 }
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d175c5c5ccf8..d841743b2107 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1284,6 +1284,7 @@  static int sd_init_command(struct scsi_cmnd *cmd)
 static void sd_uninit_command(struct scsi_cmnd *SCpnt)
 {
 	struct request *rq = SCpnt->request;
+	u8 *cmnd;
 
 	if (SCpnt->flags & SCMD_ZONE_WRITE_LOCK)
 		sd_zbc_write_unlock_zone(SCpnt);
@@ -1292,9 +1293,10 @@  static void sd_uninit_command(struct scsi_cmnd *SCpnt)
 		__free_page(rq->special_vec.bv_page);
 
 	if (SCpnt->cmnd != scsi_req(rq)->cmd) {
-		mempool_free(SCpnt->cmnd, sd_cdb_pool);
+		cmnd = SCpnt->cmnd;
 		SCpnt->cmnd = NULL;
 		SCpnt->cmd_len = 0;
+		mempool_free(cmnd, sd_cdb_pool);
 	}
 }