diff mbox series

RDMA/uverbs: Consider capability of the process that opens the file

Message ID 20250313050832.113030-1-parav@nvidia.com (mailing list archive)
State New
Headers show
Series RDMA/uverbs: Consider capability of the process that opens the file | expand

Commit Message

Parav Pandit March 13, 2025, 5:08 a.m. UTC
Currently, the capability check is done on the current process which
may have the CAP_NET_RAW capability, but such process may not have
opened the file. A file may could have been opened by a lesser
privilege process that does not possess the CAP_NET_RAW capability.

To avoid such situations, perform the capability checks against
the file's credentials. This approach ensures that the capabilities
of the process that opened the file are enforced.

Fixes: c938a616aadb ("IB/core: Add raw packet QP type")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Suggested-by: Eric W. Biederman <ebiederm@xmission.com>

---

Eric,

Shouldn't we check the capabilities of the process that opened the
file and also the current process that is issuing the create_flow()
ioctl? This way, the minimum capabilities of both processes are
considered.

---
 drivers/infiniband/core/uverbs_cmd.c  | 2 +-
 drivers/infiniband/core/uverbs_main.c | 2 +-
 include/rdma/uverbs_types.h           | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

Comments

Jason Gunthorpe March 17, 2025, 7:31 p.m. UTC | #1
On Thu, Mar 13, 2025 at 07:08:32AM +0200, Parav Pandit wrote:
> Currently, the capability check is done on the current process which
> may have the CAP_NET_RAW capability, but such process may not have
> opened the file. A file may could have been opened by a lesser
> privilege process that does not possess the CAP_NET_RAW capability.

> To avoid such situations, perform the capability checks against
> the file's credentials. This approach ensures that the capabilities
> of the process that opened the file are enforced.
> 
> Fixes: c938a616aadb ("IB/core: Add raw packet QP type")
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
> 
> ---
> 
> Eric,
> 
> Shouldn't we check the capabilities of the process that opened the
> file and also the current process that is issuing the create_flow()
> ioctl? This way, the minimum capabilities of both processes are
> considered.

I would say no, that is not our model in RDMA. The process that opens
the file is irrelevant. We only check the current system call context
for capability, much like any other systemcall.

Jason
Parav Pandit March 18, 2025, 3:43 a.m. UTC | #2
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, March 18, 2025 1:02 AM
> 
> On Thu, Mar 13, 2025 at 07:08:32AM +0200, Parav Pandit wrote:
> > Currently, the capability check is done on the current process which
> > may have the CAP_NET_RAW capability, but such process may not have
> > opened the file. A file may could have been opened by a lesser
> > privilege process that does not possess the CAP_NET_RAW capability.
> 
> > To avoid such situations, perform the capability checks against the
> > file's credentials. This approach ensures that the capabilities of the
> > process that opened the file are enforced.
> >
> > Fixes: c938a616aadb ("IB/core: Add raw packet QP type")
> > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
> >
> > ---
> >
> > Eric,
> >
> > Shouldn't we check the capabilities of the process that opened the
> > file and also the current process that is issuing the create_flow()
> > ioctl? This way, the minimum capabilities of both processes are
> > considered.
> 
> I would say no, that is not our model in RDMA. The process that opens the file
> is irrelevant. We only check the current system call context for capability,
> much like any other systemcall.
> 
Eric explained the motivation [1] and [2] for this fix is:
A lesser privilege process A opens the fd (currently caps are not checked), passes the fd to a higher privilege process B.
And somehow let process B pass the needed capabilities check for resource creation, after which process A continue to use the resource without capability.

[1] https://lore.kernel.org/linux-rdma/87ecz4q27k.fsf@email.froward.int.ebiederm.org/
[2] https://lore.kernel.org/linux-rdma/87msdsoism.fsf@email.froward.int.ebiederm.org/


> Jason
Jason Gunthorpe March 18, 2025, 11:20 a.m. UTC | #3
On Tue, Mar 18, 2025 at 03:43:07AM +0000, Parav Pandit wrote:

> > I would say no, that is not our model in RDMA. The process that opens the file
> > is irrelevant. We only check the current system call context for capability,
> > much like any other systemcall.
> >
> Eric explained the motivation [1] and [2] for this fix is:
> A lesser privilege process A opens the fd (currently caps are not
> checked), passes the fd to a higher privilege process B.

> And somehow let process B pass the needed capabilities check for
> resource creation, after which process A continue to use the
> resource without capability.

Yes, I'd say that is fine within our model, and may even be desirable
in some cases.

We don't use a file descriptor linked security model, it is always
secured based on the individual ioctl system call. The file descriptor
is just a way to route the system calls.

The "setuid cat" risk is interesting, but we are supposed to be
preventing that by using ioctl, no 'cat' program is going to randomly
execute ioctls on stdout.

You would not say that if process B creates a CAP_NET_RAW socket FD
and passes it to process A without CAP_NET_RAW then A should not be
able to use the FD.

The same principle holds here too, the object handles scoped inside
the FD should have the same kind of security properties as a normal FD
would.

Jason
Parav Pandit March 18, 2025, 12:30 p.m. UTC | #4
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, March 18, 2025 4:51 PM
> 
> On Tue, Mar 18, 2025 at 03:43:07AM +0000, Parav Pandit wrote:
> 
> > > I would say no, that is not our model in RDMA. The process that
> > > opens the file is irrelevant. We only check the current system call
> > > context for capability, much like any other systemcall.
> > >
> > Eric explained the motivation [1] and [2] for this fix is:
> > A lesser privilege process A opens the fd (currently caps are not
> > checked), passes the fd to a higher privilege process B.
> 
> > And somehow let process B pass the needed capabilities check for
> > resource creation, after which process A continue to use the resource
> > without capability.
> 
> Yes, I'd say that is fine within our model, and may even be desirable in some
> cases.
>
Is this subsystem specific?
I was thinking it is generic enough to all configurations done through ioctl().
For example, I don't see any difference between [1] and rdma.

[1] https://github.com/torvalds/linux/blob/76b6905c11fd3c6dc4562aefc3e8c4429fefae1e/block/ioctl.c#L441
 
> We don't use a file descriptor linked security model, it is always secured based
> on the individual ioctl system call. The file descriptor is just a way to route the
> system calls.
If I understood right, Eric suggests to improve this model by file level additional checks.

> 
> The "setuid cat" risk is interesting, but we are supposed to be preventing that
> by using ioctl, no 'cat' program is going to randomly execute ioctls on stdout.
> 
> You would not say that if process B creates a CAP_NET_RAW socket FD and
> passes it to process A without CAP_NET_RAW then A should not be able to
> use the FD.
Well, process B is higher privilege which shared the socket FD.

This is what I was asking in this patch to Eric above, should we have the min() check of both the process?

Or may be sharing the fd is somewhat special case, and generically it should be check when sharing the fd itself?

Each has tradeoffs; would like to understand what is the current generic model that we can adopt in rdma subsystem too.

> 
> The same principle holds here too, the object handles scoped inside the FD
> should have the same kind of security properties as a normal FD would.
> 
> Jason
Jason Gunthorpe March 18, 2025, 12:44 p.m. UTC | #5
On Tue, Mar 18, 2025 at 12:30:18PM +0000, Parav Pandit wrote:
> 
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, March 18, 2025 4:51 PM
> > 
> > On Tue, Mar 18, 2025 at 03:43:07AM +0000, Parav Pandit wrote:
> > 
> > > > I would say no, that is not our model in RDMA. The process that
> > > > opens the file is irrelevant. We only check the current system call
> > > > context for capability, much like any other systemcall.
> > > >
> > > Eric explained the motivation [1] and [2] for this fix is:
> > > A lesser privilege process A opens the fd (currently caps are not
> > > checked), passes the fd to a higher privilege process B.
> > 
> > > And somehow let process B pass the needed capabilities check for
> > > resource creation, after which process A continue to use the resource
> > > without capability.
> > 
> > Yes, I'd say that is fine within our model, and may even be desirable in some
> > cases.
>
> Is this subsystem specific?

Probably.. How a FD works and it's security model is very specific to
each FD type.

> I was thinking it is generic enough to all configurations done through ioctl().
> For example, I don't see any difference between [1] and rdma.
> 
> [1] https://github.com/torvalds/linux/blob/76b6905c11fd3c6dc4562aefc3e8c4429fefae1e/block/ioctl.c#L441

Isn't that the same thing? roset is an ioctl on a block char dev file
descriptor. It doesn't check the capability of the process that opened
the file.

> > We don't use a file descriptor linked security model, it is always secured based
> > on the individual ioctl system call. The file descriptor is just a way to route the
> > system calls.
> If I understood right, Eric suggests to improve this model by file level additional checks.

Yes, but I'm not sure it is an improvement, or that it won't cause
regressions.

> > You would not say that if process B creates a CAP_NET_RAW socket FD and
> > passes it to process A without CAP_NET_RAW then A should not be able to
> > use the FD.
> Well, process B is higher privilege which shared the socket FD.

Yes, and?
 
> This is what I was asking in this patch to Eric above, should we have the min() check of both the process?

No. We don't do it above for sockets, we shouldn't do it for RDMA
objects either.

Jason
Eric W. Biederman March 18, 2025, 8 p.m. UTC | #6
Jason Gunthorpe <jgg@nvidia.com> writes:

> On Tue, Mar 18, 2025 at 03:43:07AM +0000, Parav Pandit wrote:
>
>> > I would say no, that is not our model in RDMA. The process that opens the file
>> > is irrelevant. We only check the current system call context for capability,
>> > much like any other systemcall.
>> >
>> Eric explained the motivation [1] and [2] for this fix is:
>> A lesser privilege process A opens the fd (currently caps are not
>> checked), passes the fd to a higher privilege process B.
>
>> And somehow let process B pass the needed capabilities check for
>> resource creation, after which process A continue to use the
>> resource without capability.
>
> Yes, I'd say that is fine within our model, and may even be desirable
> in some cases.
>
> We don't use a file descriptor linked security model, it is always
> secured based on the individual ioctl system call. The file descriptor
> is just a way to route the system calls.
>
> The "setuid cat" risk is interesting, but we are supposed to be
> preventing that by using ioctl, no 'cat' program is going to randomly
> execute ioctls on stdout.

I guess I see a few places where inifiniband uses ioctl.

There are also a lot of places where inifinband uses raw read/write on
file descriptors.  I think last time I looked infiniband wasn't even using
ioctl.

Now maybe using an ioctl is the best you can do at this point, because
of some backwards compatibility. 

> You would not say that if process B creates a CAP_NET_RAW socket FD
> and passes it to process A without CAP_NET_RAW then A should not be
> able to use the FD.

But that is exactly what the infiniband security check were are talking
about appears to be doing.  It is using the credentials of process A
and failing after it was passed by process B.

> The same principle holds here too, the object handles scoped inside
> the FD should have the same kind of security properties as a normal FD
> would.

Which is fine as far as I understand it is fine.  The creation check is
what we were talking about.

Taking from your example above.  If process B with CAP_NET_RAW creates a
FD for opening queue pairs and passes it to process A without
CAP_NET_RAW then A is not able to create queue pairs.

That is what the code in
drivers/infiniband/core/ubvers_cmd.c:create_qp() currenty says.

That is what has us confused.  Exactly the kind of thing you said should
not be happening is happening.

Eric
Jason Gunthorpe March 18, 2025, 10:57 p.m. UTC | #7
On Tue, Mar 18, 2025 at 03:00:15PM -0500, Eric W. Biederman wrote:

> There are also a lot of places where inifinband uses raw read/write on
> file descriptors.  I think last time I looked infiniband wasn't even using
> ioctl.

Yeah, that's all deprecated now, and it had some major security issue
with the 'setuid cat' attack. IIRC it was mitigated by disallowing
read/write from a process with different credentials than the process
that opened the FD. This caused regressions which were resolved by
moving to ioctl.

Today you can compile the read/write interface out of the kernel - for
the last uh 6 years or so the userspace has exclusively used ioctl.

> > You would not say that if process B creates a CAP_NET_RAW socket FD
> > and passes it to process A without CAP_NET_RAW then A should not be
> > able to use the FD.
> 
> But that is exactly what the infiniband security check were are talking
> about appears to be doing.  It is using the credentials of process A
> and failing after it was passed by process B.

I'm not sure what you are refering too? The model should be that the
process invoking the system call is the one that provides the
capability set.

It is entirely possible that the code is wrong, but the above was the
intention.

> Taking from your example above.  If process B with CAP_NET_RAW creates a
> FD for opening queue pairs and passes it to process A without
> CAP_NET_RAW then A is not able to create queue pairs.

Yes that's right, because the FD itself has no security properties at
all, it is just a conduit for calling into the kernel.

Process A cannot create raw queue pairs in the same way that Process A
cannot create raw sockets, it doesn't matter what where the FD came
from.

> That is what the code in
> drivers/infiniband/core/ubvers_cmd.c:create_qp() currenty says.

I'm not sure what you are referring to here? That function is called
on the system call path, and at least the intention was that this:

        case IB_QPT_RAW_PACKET:
                if (!capable(CAP_NET_RAW))
                        return -EPERM;
                break;

Would check the current task invoking the system call to see if that
task has the required capability.

Jason
diff mbox series

Patch

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 96d639e1ffa0..e028454bcd7e 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3217,7 +3217,7 @@  static int ib_uverbs_ex_create_flow(struct uverbs_attr_bundle *attrs)
 	if (cmd.comp_mask)
 		return -EINVAL;
 
-	if (!capable(CAP_NET_RAW))
+	if (!file_ns_capable(attrs->ufile->filp, &init_user_ns, CAP_NET_RAW))
 		return -EPERM;
 
 	if (cmd.flow_attr.flags >= IB_FLOW_ATTR_FLAGS_RESERVED)
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 973fe2c7ef53..8e5ee702e9f8 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -993,7 +993,7 @@  static int ib_uverbs_open(struct inode *inode, struct file *filp)
 	srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
 
 	setup_ufile_idr_uobject(file);
-
+	file->filp = filp;
 	return stream_open(inode, filp);
 
 err_module:
diff --git a/include/rdma/uverbs_types.h b/include/rdma/uverbs_types.h
index 26ba919ac245..06f57d28d349 100644
--- a/include/rdma/uverbs_types.h
+++ b/include/rdma/uverbs_types.h
@@ -181,6 +181,7 @@  struct ib_uverbs_file {
 	struct xarray		idr;
 
 	struct mutex disassociation_lock;
+	struct file *filp;
 };
 
 extern const struct uverbs_obj_type_class uverbs_idr_class;