sg, bsg: mitigate read/write abuse, block uaccess in release

Message ID 20180615152335.208202-1-jannh@google.com
State New

Commit Message

Jann Horn June 15, 2018, 3:23 p.m. UTC
As Al Viro noted in commit 128394eff343 ("sg_write()/bsg_write() is not fit
to be called under KERNEL_DS"), sg and bsg improperly access userspace
memory outside the provided buffer, permitting kernel memory corruption via
splice().
But they don't just do it on ->write(), also on ->read() and (in the case
of bsg) even on ->release().

As a band-aid, make sure that the ->read() and ->write() handlers can not
be called in weird contexts (kernel context or credentials different from
file opener), like for ib_safe_file_access().
Also, completely prevent user memory accesses from ->release().

If someone needs to use these interfaces from different security contexts,
a new interface should be written that goes through the ->ioctl() handler.

I've mostly copypasted ib_safe_file_access() over as
scsi_safe_file_access() because I couldn't find a good common header -
please tell me if you know a better way.
The duplicate pr_err_once() calls are so that each of them fires once;
otherwise, this would probably have to be a macro.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jann Horn <jannh@google.com>
---

I'm CC-ing security@ on this patch in case someone cares a lot, but
since you already need to have some pretty high privileges to use these
devices in the first place, I think this can be handled publicly.

In case anyone is interested in how I found these: I was looking at a
reverse callgraph of __might_fault and spotted the ->release handler of
block/bsg.c in there.

 block/bsg-lib.c          |  5 ++++-
 block/bsg.c              | 29 +++++++++++++++++++++--------
 drivers/scsi/sg.c        | 11 ++++++++++-
 include/linux/bsg.h      |  3 ++-
 include/scsi/scsi_cmnd.h | 19 +++++++++++++++++++
 5 files changed, 56 insertions(+), 11 deletions(-)
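
The check that the commit message describes (same credentials as the file opener, and not running in kernel context) can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: struct model_cred, struct model_file, struct model_task and both function names are hypothetical stand-ins, not the kernel's types or the actual scsi_safe_file_access() helper from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's credentials object. */
struct model_cred {
	int uid;
};

/* Hypothetical stand-in for struct file. */
struct model_file {
	const struct model_cred *f_cred;  /* credentials captured at open() time */
};

/* Hypothetical stand-in for the calling task's state. */
struct model_task {
	const struct model_cred *cred;    /* credentials the caller holds now */
	bool kernel_ds;                   /* models uaccess_kernel() being true */
};

/* True only if the caller still holds the opener's creds and is in user context. */
static bool model_safe_file_access(const struct model_file *filp,
				   const struct model_task *cur)
{
	assert(filp != NULL && cur != NULL);
	return filp->f_cred == cur->cred && !cur->kernel_ds;
}

/* Models a ->read()/->write() entry point that refuses unsafe callers. */
static int model_rw_entry(const struct model_file *filp,
			  const struct model_task *cur)
{
	if (!model_safe_file_access(filp, cur))
		return -EINVAL;
	return 0;
}
```

The comparison is on the identity of the credentials object rather than on its contents, which mirrors the "credentials different from file opener" wording above.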

Comments

Al Viro June 15, 2018, 4:40 p.m. UTC | #1
On Fri, Jun 15, 2018 at 05:23:35PM +0200, Jann Horn wrote:

> I've mostly copypasted ib_safe_file_access() over as
> scsi_safe_file_access() because I couldn't find a good common header -
> please tell me if you know a better way.
> The duplicate pr_err_once() calls are so that each of them fires once;
> otherwise, this would probably have to be a macro.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Jann Horn <jannh@google.com>
> ---

WTF do you mean, in ->release()?  That makes no sense whatsoever -
what kind of copy_{to,from}_user() would be possible in there?
Jann Horn June 15, 2018, 4:44 p.m. UTC | #2
On Fri, Jun 15, 2018 at 6:40 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Fri, Jun 15, 2018 at 05:23:35PM +0200, Jann Horn wrote:
>
> > I've mostly copypasted ib_safe_file_access() over as
> > scsi_safe_file_access() because I couldn't find a good common header -
> > please tell me if you know a better way.
> > The duplicate pr_err_once() calls are so that each of them fires once;
> > otherwise, this would probably have to be a macro.
> >
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Jann Horn <jannh@google.com>
> > ---
>
> WTF do you mean, in ->release()?  That makes no sense whatsoever -
> what kind of copy_{to,from}_user() would be possible in there?

bsg_release -> bsg_put_device -> bsg_complete_all_commands ->
blk_complete_sgv4_hdr_rq -> bsg_scsi_complete_rq -> copy_to_user.
I don't think that was intentional.

Basically, the sense buffer is copied to a userspace address supplied
in the previous ->write() when you ->read() the reply. But when you
->release() the file without reading the reply, they have to clean it
up, and for that, they reuse the same code they use for ->read() - so
the sense buffer is written to userspace on ->release().
Al Viro June 15, 2018, 4:49 p.m. UTC | #3
On Fri, Jun 15, 2018 at 05:23:35PM +0200, Jann Horn wrote:
> As Al Viro noted in commit 128394eff343 ("sg_write()/bsg_write() is not fit
> to be called under KERNEL_DS"), sg and bsg improperly access userspace
> memory outside the provided buffer, permitting kernel memory corruption via
> splice().
> But they don't just do it on ->write(), also on ->read() and (in the case
> of bsg) even on ->release().
> 
> As a band-aid, make sure that the ->read() and ->write() handlers can not
> be called in weird contexts (kernel context or credentials different from
> file opener), like for ib_safe_file_access().
> Also, completely prevent user memory accesses from ->release().

Band-aid it is, and a bloody awful one, at that.  What the hell is going on
in bsg_put_device() and can it _ever_ hit that call chain?  I.e.
	bsg_release()
		bsg_put_device()
			blk_complete_sgv4_hdr_rq()
				->complete_rq()
					copy_to_user()
If it can, the whole thing is FUBAR by design - ->release() may bloody well
be called in a context that has no userspace at all.

This is completely insane; what's going on there?
Al Viro June 15, 2018, 4:53 p.m. UTC | #4
On Fri, Jun 15, 2018 at 06:44:51PM +0200, Jann Horn wrote:
> On Fri, Jun 15, 2018 at 6:40 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Fri, Jun 15, 2018 at 05:23:35PM +0200, Jann Horn wrote:
> >
> > > I've mostly copypasted ib_safe_file_access() over as
> > > scsi_safe_file_access() because I couldn't find a good common header -
> > > please tell me if you know a better way.
> > > The duplicate pr_err_once() calls are so that each of them fires once;
> > > otherwise, this would probably have to be a macro.
> > >
> > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > Cc: <stable@vger.kernel.org>
> > > Signed-off-by: Jann Horn <jannh@google.com>
> > > ---
> >
> > WTF do you mean, in ->release()?  That makes no sense whatsoever -
> > what kind of copy_{to,from}_user() would be possible in there?
> 
> bsg_release -> bsg_put_device -> bsg_complete_all_commands ->
> blk_complete_sgv4_hdr_rq -> bsg_scsi_complete_rq -> copy_to_user.
> I don't think that was intentional.
> 
> Basically, the sense buffer is copied to a userspace address supplied
> in the previous ->write() when you ->read() the reply. But when you
> ->release() the file without reading the reply, they have to clean it
> up, and for that, they reuse the same code they use for ->read() - so
> the sense buffer is written to userspace on ->release().

Pardon me, that has only one fix - git rm.  This is too broken for words -
if your reading is correct, the interface is unsalvageable.  I hope you
*are* misreading it, but if not... how did that insanity get through
review at merge time?
Jann Horn June 15, 2018, 4:58 p.m. UTC | #5
On Fri, Jun 15, 2018 at 6:49 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Fri, Jun 15, 2018 at 05:23:35PM +0200, Jann Horn wrote:
> > As Al Viro noted in commit 128394eff343 ("sg_write()/bsg_write() is not fit
> > to be called under KERNEL_DS"), sg and bsg improperly access userspace
> > memory outside the provided buffer, permitting kernel memory corruption via
> > splice().
> > But they don't just do it on ->write(), also on ->read() and (in the case
> > of bsg) even on ->release().
> >
> > As a band-aid, make sure that the ->read() and ->write() handlers can not
> > be called in weird contexts (kernel context or credentials different from
> > file opener), like for ib_safe_file_access().
> > Also, completely prevent user memory accesses from ->release().
>
> Band-aid it is, and a bloody awful one, at that.  What the hell is going on
> in bsg_put_device() and can it _ever_ hit that call chain?  I.e.
>         bsg_release()
>                 bsg_put_device()
>                         blk_complete_sgv4_hdr_rq()
>                                 ->complete_rq()
>                                         copy_to_user()
> If it can, the whole thing is FUBAR by design - ->release() may bloody well
> be called in a context that has no userspace at all.
>
> This is completely insane; what's going on there?

Perhaps I should have split my patch into two parts; it consists of
two somewhat related changes.

The first change is that ->read() and ->write() violate the normal
contract and, as a band-aid, should not be called in uaccess_kernel()
context or with changed creds.

The second change is an actual fix: AFAICS ->release() accidentally
accessed userspace, which I've fixed using the added "cleaning_up"
parameter.
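
The effect of the "cleaning_up" parameter can be illustrated with a small userspace model. It is a sketch, not the kernel code: struct model_cmd and model_complete() are hypothetical names, and memcpy() merely stands in for copy_to_user().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a queued command; not the kernel's struct bsg_command. */
struct model_cmd {
	unsigned char sense[8];   /* reply data held by the driver */
	size_t sense_len;
	unsigned char *user_buf;  /* address the submitter passed in write() */
	size_t user_len;
};

/*
 * Completion path shared by read() and release(). With cleaning_up set
 * (the release() case) the reply is dropped instead of being copied to
 * the submitter's buffer.
 */
static int model_complete(struct model_cmd *cmd, bool cleaning_up)
{
	size_t len;

	assert(cmd != NULL);
	len = cmd->sense_len < cmd->user_len ? cmd->sense_len : cmd->user_len;

	if (len == 0 || cmd->user_buf == NULL)
		return 0;
	if (cleaning_up)
		return -EINVAL;                  /* never touch the caller's memory */
	memcpy(cmd->user_buf, cmd->sense, len);  /* stands in for copy_to_user() */
	return (int)len;
}
```

Called with cleaning_up set to true the buffer is left untouched; called with false it receives the sense data, which corresponds to the ->read() path.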
Jann Horn June 15, 2018, 5:02 p.m. UTC | #6
On Fri, Jun 15, 2018 at 6:58 PM Jann Horn <jannh@google.com> wrote:
>
> On Fri, Jun 15, 2018 at 6:49 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Fri, Jun 15, 2018 at 05:23:35PM +0200, Jann Horn wrote:
> > > As Al Viro noted in commit 128394eff343 ("sg_write()/bsg_write() is not fit
> > > to be called under KERNEL_DS"), sg and bsg improperly access userspace
> > > memory outside the provided buffer, permitting kernel memory corruption via
> > > splice().
> > > But they don't just do it on ->write(), also on ->read() and (in the case
> > > of bsg) even on ->release().
> > >
> > > As a band-aid, make sure that the ->read() and ->write() handlers can not
> > > be called in weird contexts (kernel context or credentials different from
> > > file opener), like for ib_safe_file_access().
> > > Also, completely prevent user memory accesses from ->release().
> >
> > Band-aid it is, and a bloody awful one, at that.  What the hell is going on
> > in bsg_put_device() and can it _ever_ hit that call chain?  I.e.
> >         bsg_release()
> >                 bsg_put_device()
> >                         blk_complete_sgv4_hdr_rq()
> >                                 ->complete_rq()
> >                                         copy_to_user()
> > If it can, the whole thing is FUBAR by design - ->release() may bloody well
> > be called in a context that has no userspace at all.
> >
> > This is completely insane; what's going on there?
>
> Perhaps I should have split my patch into two parts; it consists of
> two somewhat related changes.
>
> The first change is that ->read() and ->write() violate the normal
> contract and, as a band-aid, should not be called in uaccess_kernel()
> context or with changed creds.
>
> The second change is an actual fix: AFAICS ->release() accidentally
> accessed userspace, which I've fixed using the added "cleaning_up"
> parameter.

FWIW, the demo code I'm using to test this in a QEMU VM:

$ cat test.c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <linux/bsg.h>
#include <string.h>
#include <err.h>
#include <stdio.h>

int main(void) {
  int fd = open("/dev/bsg/0:0:0:0", O_RDWR);
  if (fd == -1)
    err(1, "open");
  __u8 buf1[255];
  __u8 request[10] = {
    [0] = 0x5a, // MODE_SENSE_10
    [2] = 0x41,
    [8] = 0x10
  };
  __u8 sense[32];
  memset(sense, 'A', sizeof(sense));
  memset(buf1, 'A', sizeof(buf1));
  struct sg_io_v4 req = {
    .guard = 'Q',
    .protocol = BSG_PROTOCOL_SCSI,
    .subprotocol = BSG_SUB_PROTOCOL_SCSI_CMD,
    .request_len = sizeof(request),
    .request = (__u64)request,
    .max_response_len = sizeof(sense),
    .response = (__u64)sense,
    .din_xfer_len = sizeof(buf1),
    .din_xferp = (__u64)buf1,
    .timeout = 1000
  };
  if (write(fd, &req, sizeof(req)) != sizeof(req))
    err(1, "write");
  printf("sense[0] after write: 0x%02hhx\n", sense[0]);

  /*
  struct sg_io_v4 resp;
  if (splice(fd, NULL, pipe_fds[1], NULL, sizeof(struct sg_io_v4), 0) != sizeof(struct sg_io_v4))
    err(1, "splice");
    */

  sleep(1);
  printf("sense[0] after sleep: 0x%02hhx\n", sense[0]);
  close(fd);
  printf("sense[0] after close: 0x%02hhx\n", sense[0]);
}
$ gcc -o test test.c -Wall && sudo ./test
sense[0] after write: 0x41
sense[0] after sleep: 0x41
sense[0] after close: 0xf0
$ uname -a
Linux debian 4.17.0+ #10 SMP Fri Jun 15 14:48:42 CEST 2018 x86_64 GNU/Linux
Al Viro June 15, 2018, 5:10 p.m. UTC | #7
On Fri, Jun 15, 2018 at 05:53:10PM +0100, Al Viro wrote:
> On Fri, Jun 15, 2018 at 06:44:51PM +0200, Jann Horn wrote:
> > On Fri, Jun 15, 2018 at 6:40 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > On Fri, Jun 15, 2018 at 05:23:35PM +0200, Jann Horn wrote:
> > >
> > > > I've mostly copypasted ib_safe_file_access() over as
> > > > scsi_safe_file_access() because I couldn't find a good common header -
> > > > please tell me if you know a better way.
> > > > The duplicate pr_err_once() calls are so that each of them fires once;
> > > > otherwise, this would probably have to be a macro.
> > > >
> > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > Cc: <stable@vger.kernel.org>
> > > > Signed-off-by: Jann Horn <jannh@google.com>
> > > > ---
> > >
> > > WTF do you mean, in ->release()?  That makes no sense whatsoever -
> > > what kind of copy_{to,from}_user() would be possible in there?
> > 
> > bsg_release -> bsg_put_device -> bsg_complete_all_commands ->
> > blk_complete_sgv4_hdr_rq -> bsg_scsi_complete_rq -> copy_to_user.
> > I don't think that was intentional.
> > 
> > Basically, the sense buffer is copied to a userspace address supplied
> > in the previous ->write() when you ->read() the reply. But when you
> > ->release() the file without reading the reply, they have to clean it
> > up, and for that, they reuse the same code they use for ->read() - so
> > the sense buffer is written to userspace on ->release().
> 
> Pardon me, that has only one fix - git rm.  This is too broken for words -
> > if your reading is correct, the interface is unsalvageable.  I hope you
> *are* misreading it, but if not... how did that insanity get through
> review at merge time?

AFAICS, it went in as part of commit 3d6392cfbd7d "bsg: support for full
generic block layer SG v3", so your 2.6.12-rc2 is too old...
Jann Horn June 15, 2018, 5:13 p.m. UTC | #8
On Fri, Jun 15, 2018 at 7:10 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Fri, Jun 15, 2018 at 05:53:10PM +0100, Al Viro wrote:
> > On Fri, Jun 15, 2018 at 06:44:51PM +0200, Jann Horn wrote:
> > > On Fri, Jun 15, 2018 at 6:40 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > > On Fri, Jun 15, 2018 at 05:23:35PM +0200, Jann Horn wrote:
> > > >
> > > > > I've mostly copypasted ib_safe_file_access() over as
> > > > > scsi_safe_file_access() because I couldn't find a good common header -
> > > > > please tell me if you know a better way.
> > > > > The duplicate pr_err_once() calls are so that each of them fires once;
> > > > > otherwise, this would probably have to be a macro.
> > > > >
> > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > > Cc: <stable@vger.kernel.org>
> > > > > Signed-off-by: Jann Horn <jannh@google.com>
> > > > > ---
> > > >
> > > > WTF do you mean, in ->release()?  That makes no sense whatsoever -
> > > > what kind of copy_{to,from}_user() would be possible in there?
> > >
> > > bsg_release -> bsg_put_device -> bsg_complete_all_commands ->
> > > blk_complete_sgv4_hdr_rq -> bsg_scsi_complete_rq -> copy_to_user.
> > > I don't think that was intentional.
> > >
> > > Basically, the sense buffer is copied to a userspace address supplied
> > > in the previous ->write() when you ->read() the reply. But when you
> > > ->release() the file without reading the reply, they have to clean it
> > > up, and for that, they reuse the same code they use for ->read() - so
> > > the sense buffer is written to userspace on ->release().
> >
> > Pardon me, that has only one fix - git rm.  This is too broken for words -
> > > if your reading is correct, the interface is unsalvageable.  I hope you
> > *are* misreading it, but if not... how did that insanity get through
> > review at merge time?
>
> AFAICS, it went in as part of commit 3d6392cfbd7d "bsg: support for full
> generic block layer SG v3", so your 2.6.12-rc2 is too old...

I picked 2.6.12-rc2 for the Fixes tag because the bad copy_to_user()
in sg_new_read() is at least that old.
Do you think I should split this up into two patches or so - one for
the creds/uaccess_kernel checks, one for the ->release() bug?
Douglas Gilbert June 15, 2018, 8:47 p.m. UTC | #9
On 2018-06-15 12:40 PM, Al Viro wrote:
> On Fri, Jun 15, 2018 at 05:23:35PM +0200, Jann Horn wrote:
> 
>> I've mostly copypasted ib_safe_file_access() over as
>> scsi_safe_file_access() because I couldn't find a good common header -
>> please tell me if you know a better way.
>> The duplicate pr_err_once() calls are so that each of them fires once;
>> otherwise, this would probably have to be a macro.
>>
>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Jann Horn <jannh@google.com>
>> ---
> 
> WTF do you mean, in ->release()?  That makes no sense whatsoever -
> what kind of copy_{to,from}_user() would be possible in there?

The folks responsible are no longer active in kernel development ***
but as far as I know the async write(command), read(response) were
added to bsg over 10 years ago as proof-of-concept and never properly
worked in this async mode. The biggest design problem with it that I'm
aware of is that two tasks which issue write(commands) at about the
same time to the same device can receive one another's read(response).
The tracking of which response belongs to which task is in part the
reason why the sg driver's data structures are more complex than the
bsg driver's are. Hence the code work to fix that problem in the bsg
driver is not trivial and probably a reason why no-one has attempted it.

Once real world users (needing an async SCSI (or general storage)
pass-through) find out about that bsg "feature", they don't use it.
They use the sg driver or something else. So the fact that bsg has
other glaring errors in it in async mode is no surprise to me.

When I took over the maintenance of the sg driver in 1998, it only
had an async (i.e. write(command), read(response)) interface.
The SG_IO ioctl was added at the suggestion of Jørg Schilling (of
cdrecord "fame"). The sg driver implementation was essentially to
put a write(command) and read(response) back-to-back. The bsg driver
came along later and started with the synchronous SG_IO ioctl
interface only. The async write(command)/read(response) functionality
was added later to bsg. Perhaps that part of the bsg driver should be
deprecated/withdrawn if a maintainer/rewriter cannot be found.
[BTW the bsg sync SG_IO ioctl implementation can probably get the
wrong response, it's just that the window is a lot narrower.]

That said, the bsg driver has lots of other users. For example it is
the only generic pass-through in Linux for the SAS Management Protocol
(SMP) used to control SAS based storage enclosures. I have a user space
package based on it (in Linux) called smp_utils which works well IMO.
However disk enclosures won't typically have contention between users
trying to control them and I'm not aware of any disk enclosures that
support Persistent Reservations. So the bsg driver's "async" problems
are not a practical issue in this case. Also I believe some high end
storage hardware uses bsg to communicate with their hardware from their
user space tools.


Just some observations from an interested observer ...

Doug Gilbert


*** Well Jens Axbø's Copyright notice is on the bsg driver, together with
     that of Peter M. Jones. Since I have been watching the bsg driver I'm
     not aware of any substantial patches or reworks from them. As far as
     I know FUJITA Tomonori did a ground up rewrite of it and he no longer
     works in this area. Makes you wonder what exactly Copyright banners
     mean on some code; 10, 15, 20 years on.
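
The response mix-up described above (two tasks writing commands to the same device and receiving one another's responses) can be sketched with a toy completion queue. This is a simplified userspace model under stated assumptions; struct model_resp, struct model_dev and the model_* functions are hypothetical and do not reflect how sg or bsg actually store their commands.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_QUEUE_LEN 4

/* Hypothetical completed-command record; owner is the submitting task's id. */
struct model_resp {
	int owner;
	int payload;
	int in_use;
};

struct model_dev {
	struct model_resp done[MODEL_QUEUE_LEN];
};

/* Queue a completed response; returns 0 on success, -1 if the queue is full. */
static int model_complete_cmd(struct model_dev *d, int owner, int payload)
{
	assert(d != NULL);
	for (size_t i = 0; i < MODEL_QUEUE_LEN; i++) {
		if (!d->done[i].in_use) {
			d->done[i] = (struct model_resp){ owner, payload, 1 };
			return 0;
		}
	}
	return -1;
}

/* Untracked read: hands out the first completed response, whoever asks. */
static int model_read_any(struct model_dev *d, int *payload)
{
	assert(d != NULL && payload != NULL);
	for (size_t i = 0; i < MODEL_QUEUE_LEN; i++) {
		if (d->done[i].in_use) {
			*payload = d->done[i].payload;
			d->done[i].in_use = 0;
			return 0;
		}
	}
	return -1;
}

/* Tracked read: only returns a response that the reading task submitted. */
static int model_read_owned(struct model_dev *d, int reader, int *payload)
{
	assert(d != NULL && payload != NULL);
	for (size_t i = 0; i < MODEL_QUEUE_LEN; i++) {
		if (d->done[i].in_use && d->done[i].owner == reader) {
			*payload = d->done[i].payload;
			d->done[i].in_use = 0;
			return 0;
		}
	}
	return -1;
}
```

With model_read_any(), a task that reads first gets whatever completed first, even if another task submitted it. model_read_owned() avoids that at the cost of carrying an owner field per command, which hints at why tracking ownership makes a driver's data structures more complex.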
Benjamin Block June 18, 2018, 3:26 p.m. UTC | #10
On Fri, Jun 15, 2018 at 04:47:47PM -0400, Douglas Gilbert wrote:
> On 2018-06-15 12:40 PM, Al Viro wrote:
> > On Fri, Jun 15, 2018 at 05:23:35PM +0200, Jann Horn wrote:
> > 
> >> I've mostly copypasted ib_safe_file_access() over as
> >> scsi_safe_file_access() because I couldn't find a good common header -
> >> please tell me if you know a better way.
> >> The duplicate pr_err_once() calls are so that each of them fires once;
> >> otherwise, this would probably have to be a macro.
> >>
> >> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> >> Cc: <stable@vger.kernel.org>
> >> Signed-off-by: Jann Horn <jannh@google.com>
> >> ---
> > 
> > WTF do you mean, in ->release()?  That makes no sense whatsoever -
> > what kind of copy_{to,from}_user() would be possible in there?
> 
> The folks responsible are no longer active in kernel development ***
> but as far as I know the async write(command), read(response) were
> added to bsg over 10 years ago as proof-of-concept and never properly
> worked in this async mode. The biggest design problem with it that I'm
> aware of is that two tasks which issue write(commands) at about the
> same time to the same device can receive one another's read(response).
> The tracking of which response belongs to which task is in part the
> reason why the sg driver's data structures are more complex than the
> bsg driver's are. Hence the code work to fix that problem in the bsg
> driver is not trivial and probably a reason why no-one has attempted it.
> 
> Once real world users (needing an async SCSI (or general storage)
> pass-through) find out about that bsg "feature", they don't use it.
> They use the sg driver or something else. So the fact that bsg has
> other glaring errors in it in async mode is no surprise to me.
> 
> When I took over the maintenance of the sg driver in 1998, it only
> had an async (i.e. write(command), read(response)) interface.
> The SG_IO ioctl was added at the suggestion of Jørg Schilling (of
> cdrecord "fame"). The sg driver implementation was essentially to
> put a write(command) and read(response) back-to-back. The bsg driver
> came along later and started with the synchronous SG_IO ioctl
> interface only. The async write(command)/read(response) functionality
> was added later to bsg. Perhaps that part of the bsg driver should be
> deprecated/withdrawn if a maintainer/rewriter cannot be found.
> [BTW the bsg sync SG_IO ioctl implementation can probably get the
> wrong response, it's just that the window is a lot narrower.]
> 
> That said, the bsg driver has lots of other users. For example it is
> the only generic pass-through in Linux for the SAS Management Protocol
> (SMP) used to control SAS based storage enclosures. I have a user space
> package based on it (in Linux) called smp_utils which works well IMO.
> However disk enclosures won't typically have contention between users
> trying to control them and I'm not aware of any disk enclosures that
> support Persistent Reservations. So the bsg driver's "async" problems
> are not a practical issue in this case. Also I believe some high end
> storage hardware uses bsg to communicate with their hardware from their
> user space tools.
> 

We definitely use the BSG IOCTL part for FC-Command passthrough (and
other SCSI transports need it for theirs too, SAS, iSCSI, ...). And I am
not aware of any other way to do that right now. So this can not go away
without someone giving us a different way to do that.

That said.. the read()/write() interface is just pointless atm, I don't
see any sane way of using that in userspace.

FWIW, I actually thought about rewriting that part so it becomes
somewhat sane (tracking which thread sends what command and so on), but
haven't had time to really do it. With the whole discussion now, I am
not sure anyone really needs that anyway.


- Benjamin

--
With Best Regards, Benjamin Block      /      Linux on IBM Z Kernel Development
IBM Systems & Technology Group   /  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz       /      Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294
Jens Axboe June 18, 2018, 3:37 p.m. UTC | #11
On 6/15/18 2:47 PM, Douglas Gilbert wrote:
> On 2018-06-15 12:40 PM, Al Viro wrote:
>> On Fri, Jun 15, 2018 at 05:23:35PM +0200, Jann Horn wrote:
>>
>>> I've mostly copypasted ib_safe_file_access() over as
>>> scsi_safe_file_access() because I couldn't find a good common header -
>>> please tell me if you know a better way.
>>> The duplicate pr_err_once() calls are so that each of them fires once;
>>> otherwise, this would probably have to be a macro.
>>>
>>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>>> Cc: <stable@vger.kernel.org>
>>> Signed-off-by: Jann Horn <jannh@google.com>
>>> ---
>>
>> WTF do you mean, in ->release()?  That makes no sense whatsoever -
>> what kind of copy_{to,from}_user() would be possible in there?
> 
> The folks responsible are no longer active in kernel development ***
> but as far as I know the async write(command), read(response) were
> added to bsg over 10 years ago as proof-of-concept and never properly
> worked in this async mode. The biggest design problem with it that I'm

It was born with that mode, but I don't think anyone ever really used it.
So it might be feasible to simply yank it. That said, just doing a prune
mode at ->release() time doesn't seem like such a hard task.

> aware of is that two tasks which issue write(commands) at about the
> same time to the same device can receive one another's read(response).
> The tracking of which response belongs to which task is in part the
> reason why the sg driver's data structures are more complex than the
> bsg driver's are. Hence the code work to fix that problem in the bsg
> driver is not trivial and probably a reason why no-one has attempted it.
> 
> Once real world users (needing an async SCSI (or general storage)
> pass-through) find out about that bsg "feature", they don't use it.
> They use the sg driver or something else. So the fact that bsg has
> other glaring errors in it in async mode is no surprise to me.
> 
> When I took over the maintenance of the sg driver in 1998, it only
> had an async (i.e. write(command), read(response)) interface.
> The SG_IO ioctl was added at the suggestion of Jørg Schilling (of
> cdrecord "fame"). The sg driver implementation was essentially to
> put a write(command) and read(response) back-to-back. The bsg driver
> came along later and started with the synchronous SG_IO ioctl
> interface only. The async write(command)/read(response) functionality
> was added later to bsg. Perhaps that part of the bsg driver should be
> deprecated/withdrawn if a maintainer/rewriter cannot be found.
> [BTW the bsg sync SG_IO ioctl implementation can probably get the
> wrong response, it's just that the window is a lot narrower.]

Feature wise, I don't think it ever changed. The read/write async
mode was included from the get-go.

> 
> That said, the bsg driver has lots of other users. For example it is
> the only generic pass-through in Linux for the SAS Management Protocol
> (SMP) used to control SAS based storage enclosures. I have a user space
> package based on it (in Linux) called smp_utils which works well IMO.
> However disk enclosures won't typically have contention between users
> trying to control them and I'm not aware of any disk enclosures that
> support Persistent Reservations. So the bsg driver's "async" problems
> are not a practical issue in this case. Also I believe some high end
> storage hardware uses bsg to communicate with their hardware from their
> user space tools.
> 
> 
> Just some observations from an interested observer ...
> 
> Doug Gilbert
> 
> 
> *** Well Jens Axbø's Copyright notice is on the bsg driver, together with
>      that of Peter M. Jones. Since I have been watching the bsg driver I'm
>      not aware of any substantial patches or reworks from them. As far as
>      I know FUJITA Tomonori did a ground up rewrite of it and he no longer
>      works in this area. Makes you wonder what exactly Copyright banners
>      mean on some code; 10, 15, 20 years on.

It was never re-written. I handed it over to Fujita about 11 years ago,
but there was never any rewrite.

BTW, don't ever write my name like that, the 'oe' is not a spelled out
ascii variant, it's my name. For Jörg, it's o with umlaut, not the
Danish/Norwegian variant (he's German).
Al Viro June 18, 2018, 4:16 p.m. UTC | #12
On Mon, Jun 18, 2018 at 09:37:01AM -0600, Jens Axboe wrote:

> > The folks responsible are no longer active in kernel development ***
> > but as far as I know the async write(command), read(response) were
> > added to bsg over 10 years ago as proof-of-concept and never properly
> > worked in this async mode. The biggest design problem with it that I'm
> 
> It was born with that mode, but I don't think anyone ever really used it.
> > So it might be feasible to simply yank it. That said, just doing a prune
> mode at ->release() time doesn't seem like such a hard task.

"prune mode" being...?
Jens Axboe June 18, 2018, 4:23 p.m. UTC | #13
On 6/18/18 10:16 AM, Al Viro wrote:
> On Mon, Jun 18, 2018 at 09:37:01AM -0600, Jens Axboe wrote:
> 
>>> The folks responsible are no longer active in kernel development ***
>>> but as far as I know the async write(command), read(response) were
>>> added to bsg over 10 years ago as proof-of-concept and never properly
>>> worked in this async mode. The biggest design problem with it that I'm
>>
>> It was born with that mode, but I don't think anyone ever really used it.
>> So it might be feasible to simply yank it. That said, just doing a prune
>> mode at ->release() time doesn't seem like such a hard task.
> 
> "prune mode" being...?

Basically what Jann posted, not doing any copy-back of data. Need to
verify if the bio unmapping is handled correctly, as some of those
will also copy when the end_io handling is invoked.
Christoph Hellwig June 21, 2018, 12:40 p.m. UTC | #14
Can you resend a patch for the sg driver alone?  Also I think
we just want the scsi_safe_file_access code inside sg itself,
it really has nothing to do with the rest of the contents in
scsi_cmnd.h
Jann Horn June 21, 2018, 12:54 p.m. UTC | #15
On Thu, Jun 21, 2018 at 2:40 PM Christoph Hellwig <hch@infradead.org> wrote:
> Can you resend a patch for the sg driver alone?

Okay, will do.

> Also I think
> we just want the scsi_safe_file_access code inside sg itself,
> it really has nothing to do with the rest of the contents in
> scsi_cmnd.h

Okay. (I put it there because I couldn't figure out a better common
header and didn't want to create two new copies of the code.)

Patch

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index 9419def8c017..cf5d4fdddbeb 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -53,7 +53,8 @@  static int bsg_transport_fill_hdr(struct request *rq, struct sg_io_v4 *hdr,
 	return 0;
 }
 
-static int bsg_transport_complete_rq(struct request *rq, struct sg_io_v4 *hdr)
+static int bsg_transport_complete_rq(struct request *rq, struct sg_io_v4 *hdr,
+		bool cleaning_up)
 {
 	struct bsg_job *job = blk_mq_rq_to_pdu(rq);
 	int ret = 0;
@@ -79,6 +80,8 @@  static int bsg_transport_complete_rq(struct request *rq, struct sg_io_v4 *hdr)
 	if (job->reply_len && hdr->response) {
 		int len = min(hdr->max_response_len, job->reply_len);
 
+		if (unlikely(cleaning_up))
+			ret = -EINVAL;
-		if (copy_to_user(uptr64(hdr->response), job->reply, len))
+		else if (copy_to_user(uptr64(hdr->response), job->reply, len))
 			ret = -EFAULT;
 		else
diff --git a/block/bsg.c b/block/bsg.c
index 132e657e2d91..e64ef807d2d0 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -159,7 +159,8 @@  static int bsg_scsi_fill_hdr(struct request *rq, struct sg_io_v4 *hdr,
 	return 0;
 }
 
-static int bsg_scsi_complete_rq(struct request *rq, struct sg_io_v4 *hdr)
+static int bsg_scsi_complete_rq(struct request *rq, struct sg_io_v4 *hdr,
+		bool cleaning_up)
 {
 	struct scsi_request *sreq = scsi_req(rq);
 	int ret = 0;
@@ -179,7 +180,9 @@  static int bsg_scsi_complete_rq(struct request *rq, struct sg_io_v4 *hdr)
 		int len = min_t(unsigned int, hdr->max_response_len,
 					sreq->sense_len);
 
-		if (copy_to_user(uptr64(hdr->response), sreq->sense, len))
+		if (cleaning_up)
+			ret = -EINVAL;
+		else if (copy_to_user(uptr64(hdr->response), sreq->sense, len))
 			ret = -EFAULT;
 		else
 			hdr->response_len = len;
@@ -383,11 +386,12 @@  static struct bsg_command *bsg_get_done_cmd(struct bsg_device *bd)
 }
 
 static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr,
-				    struct bio *bio, struct bio *bidi_bio)
+				    struct bio *bio, struct bio *bidi_bio,
+				    bool cleaning_up)
 {
 	int ret;
 
-	ret = rq->q->bsg_dev.ops->complete_rq(rq, hdr);
+	ret = rq->q->bsg_dev.ops->complete_rq(rq, hdr, cleaning_up);
 
 	if (rq->next_rq) {
 		blk_rq_unmap_user(bidi_bio);
@@ -453,7 +457,7 @@  static int bsg_complete_all_commands(struct bsg_device *bd)
 			break;
 
 		tret = blk_complete_sgv4_hdr_rq(bc->rq, &bc->hdr, bc->bio,
-						bc->bidi_bio);
+						bc->bidi_bio, true);
 		if (!ret)
 			ret = tret;
 
@@ -488,7 +492,7 @@  __bsg_read(char __user *buf, size_t count, struct bsg_device *bd,
 		 * bsg_complete_work() cannot do that for us
 		 */
 		ret = blk_complete_sgv4_hdr_rq(bc->rq, &bc->hdr, bc->bio,
-					       bc->bidi_bio);
+					       bc->bidi_bio, false);
 
 		if (copy_to_user(buf, &bc->hdr, sizeof(bc->hdr)))
 			ret = -EFAULT;
@@ -532,6 +536,12 @@  bsg_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 	int ret;
 	ssize_t bytes_read;
 
+	if (!scsi_safe_file_access(file)) {
+		pr_err_once("%s: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n",
+			__func__, task_tgid_vnr(current), current->comm);
+		return -EINVAL;
+	}
+
 	bsg_dbg(bd, "read %zd bytes\n", count);
 
 	bsg_set_block(bd, file);
@@ -608,8 +618,11 @@  bsg_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
 
 	bsg_dbg(bd, "write %zd bytes\n", count);
 
-	if (unlikely(uaccess_kernel()))
+	if (!scsi_safe_file_access(file)) {
+		pr_err_once("%s: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n",
+			__func__, task_tgid_vnr(current), current->comm);
 		return -EINVAL;
+	}
 
 	bsg_set_block(bd, file);
 
@@ -859,7 +872,7 @@  static long bsg_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 
 		at_head = (0 == (hdr.flags & BSG_FLAG_Q_AT_TAIL));
 		blk_execute_rq(bd->queue, NULL, rq, at_head);
-		ret = blk_complete_sgv4_hdr_rq(rq, &hdr, bio, bidi_bio);
+		ret = blk_complete_sgv4_hdr_rq(rq, &hdr, bio, bidi_bio, false);
 
 		if (copy_to_user(uarg, &hdr, sizeof(hdr)))
 			return -EFAULT;
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 53ae52dbff84..997e06a22527 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -393,6 +393,12 @@  sg_read(struct file *filp, char __user *buf, size_t count, loff_t * ppos)
 	struct sg_header *old_hdr = NULL;
 	int retval = 0;
 
+	if (!scsi_safe_file_access(filp)) {
+		pr_err_once("%s: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n",
+			__func__, task_tgid_vnr(current), current->comm);
+		return -EINVAL;
+	}
+
 	if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
 		return -ENXIO;
 	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
@@ -581,8 +587,11 @@  sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 	sg_io_hdr_t *hp;
 	unsigned char cmnd[SG_MAX_CDB_SIZE];
 
-	if (unlikely(uaccess_kernel()))
+	if (!scsi_safe_file_access(filp)) {
+		pr_err_once("%s: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n",
+			__func__, task_tgid_vnr(current), current->comm);
 		return -EINVAL;
+	}
 
 	if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
 		return -ENXIO;
diff --git a/include/linux/bsg.h b/include/linux/bsg.h
index dac37b6e00ec..c22bc359552a 100644
--- a/include/linux/bsg.h
+++ b/include/linux/bsg.h
@@ -11,7 +11,8 @@  struct bsg_ops {
 	int	(*check_proto)(struct sg_io_v4 *hdr);
 	int	(*fill_hdr)(struct request *rq, struct sg_io_v4 *hdr,
 				fmode_t mode);
-	int	(*complete_rq)(struct request *rq, struct sg_io_v4 *hdr);
+	int	(*complete_rq)(struct request *rq, struct sg_io_v4 *hdr,
+				bool cleaning_up);
 	void	(*free_rq)(struct request *rq);
 };
 
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index aaf1e971c6a3..d22118a38aa4 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -8,6 +8,9 @@ 
 #include <linux/types.h>
 #include <linux/timer.h>
 #include <linux/scatterlist.h>
+#include <linux/cred.h> /* for scsi_safe_file_access() */
+#include <linux/fs.h> /* for scsi_safe_file_access() */
+#include <linux/uaccess.h> /* for scsi_safe_file_access() */
 #include <scsi/scsi_device.h>
 #include <scsi/scsi_request.h>
 
@@ -363,4 +366,21 @@  static inline unsigned scsi_transfer_length(struct scsi_cmnd *scmd)
 	return xfer_len;
 }
 
+/*
+ * The SCSI interfaces that use read() and write() as an asynchronous variant of
+ * ioctl(..., SG_IO, ...) are fundamentally unsafe, since there are lots of ways
+ * to trigger read() and write() calls from various contexts with elevated
+ * privileges. This can lead to kernel memory corruption (e.g. if these
+ * interfaces are called through splice()) and privilege escalation inside
+ * userspace (e.g. if a process with access to such a device passes a file
+ * descriptor to a SUID binary as stdin/stdout/stderr).
+ *
+ * This function provides protection for the legacy API by restricting the
+ * calling context.
+ */
+static inline bool scsi_safe_file_access(struct file *filp)
+{
+	return filp->f_cred == current_cred() && !uaccess_kernel();
+}
+
 #endif /* _SCSI_SCSI_CMND_H */
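
The intent of the `cleaning_up` flag in the bsg hunks above can be sketched in userspace as follows. This is only a model under stated assumptions, not kernel code: `model_copy_to_user()`, `struct model_hdr` and `model_complete_rq()` are hypothetical stand-ins for `copy_to_user()`, `struct sg_io_v4` and the `->complete_rq()` handlers, and the copy counter exists purely to show that the release path touches no user memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static int model_user_copies;	/* counts simulated writes to user memory */

/* Hypothetical stand-in for copy_to_user(); returns 0 on success. */
static unsigned long model_copy_to_user(void *to, const void *from, size_t n)
{
	model_user_copies++;
	memcpy(to, from, n);
	return 0;
}

/* Hypothetical, heavily reduced stand-in for struct sg_io_v4. */
struct model_hdr {
	void *response;
	unsigned int max_response_len;
	unsigned int response_len;
};

/* Mirrors the if / else-if / else chain the patch puts in the handlers. */
static int model_complete_rq(struct model_hdr *hdr, const void *sense,
			     unsigned int sense_len, bool cleaning_up)
{
	int ret = 0;

	hdr->response_len = 0;
	if (sense_len && hdr->response) {
		unsigned int len = hdr->max_response_len < sense_len ?
				   hdr->max_response_len : sense_len;

		if (cleaning_up)
			ret = -EINVAL;		/* release path: skip the copy */
		else if (model_copy_to_user(hdr->response, sense, len))
			ret = -EFAULT;
		else
			hdr->response_len = len;
	}
	return ret;
}

/* Quick self-check of both paths; returns 0 on success. */
static int model_complete_rq_selftest(void)
{
	char user_buf[8] = { 0 };
	const char sense[4] = { 1, 2, 3, 4 };
	struct model_hdr hdr = {
		.response = user_buf,
		.max_response_len = sizeof(user_buf),
	};

	model_user_copies = 0;
	assert(model_complete_rq(&hdr, sense, sizeof(sense), true) == -EINVAL);
	assert(model_user_copies == 0 && hdr.response_len == 0);

	assert(model_complete_rq(&hdr, sense, sizeof(sense), false) == 0);
	assert(model_user_copies == 1 && hdr.response_len == 4);
	return 0;
}
```

The `else` in front of the copy matters: without it the error code is set but the copy still runs, which defeats the point of the flag.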