diff mbox series

[for-next,9/9] RDMA/rxe: Protect pending send packets

Message ID 20230721205021.5394-10-rpearsonhpe@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Jason Gunthorpe
Headers show
Series RDMA/rxe: Misc fixes and cleanups | expand

Commit Message

Bob Pearson July 21, 2023, 8:50 p.m. UTC
Network interruptions may cause long delays in the processing of
send packets during which time the rxe driver may be unloaded.
This will cause seg faults when the packet is ultimately freed as
it calls the destructor function in the rxe driver. This has been
observed in cable pull fail over fail back testing.

This patch takes a reference on the driver to protect the packet
from this happening.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c     | 26 ++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe.h     |  3 +++
 drivers/infiniband/sw/rxe/rxe_net.c |  7 +++++++
 3 files changed, 36 insertions(+)

Comments

Jason Gunthorpe July 31, 2023, 6:17 p.m. UTC | #1
On Fri, Jul 21, 2023 at 03:50:22PM -0500, Bob Pearson wrote:
> Network interruptions may cause long delays in the processing of
> send packets during which time the rxe driver may be unloaded.
> This will cause seg faults when the packet is ultimately freed as
> it calls the destructor function in the rxe driver. This has been
> observed in cable pull fail over fail back testing.

No, module reference counts are only for code that is touching
function pointers.

If your driver is becoming removed and that messes it up then you need
to prevent the driver from unloading by adding something to the remove
function (dellink, I guess in this case)

Jason
Bob Pearson July 31, 2023, 6:26 p.m. UTC | #2
On 7/31/23 13:17, Jason Gunthorpe wrote:
> On Fri, Jul 21, 2023 at 03:50:22PM -0500, Bob Pearson wrote:
>> Network interruptions may cause long delays in the processing of
>> send packets during which time the rxe driver may be unloaded.
>> This will cause seg faults when the packet is ultimately freed as
>> it calls the destructor function in the rxe driver. This has been
>> observed in cable pull fail over fail back testing.
> 
> No, module reference counts are only for code that is touching
> function pointers.

this is exactly the case here. it is the skb destructor function that
is carried by the skb.

> 
> If your driver is becoming removed and that messes it up then you need
> to prevent the driver from unloading by adding something to the remove
> function (dellink, I guess in this case)
> 
> Jason
Jason Gunthorpe July 31, 2023, 6:32 p.m. UTC | #3
On Mon, Jul 31, 2023 at 01:26:23PM -0500, Bob Pearson wrote:
> On 7/31/23 13:17, Jason Gunthorpe wrote:
> > On Fri, Jul 21, 2023 at 03:50:22PM -0500, Bob Pearson wrote:
> >> Network interruptions may cause long delays in the processing of
> >> send packets during which time the rxe driver may be unloaded.
> >> This will cause seg faults when the packet is ultimately freed as
> >> it calls the destructor function in the rxe driver. This has been
> >> observed in cable pull fail over fail back testing.
> > 
> > No, module reference counts are only for code that is touching
> > function pointers.
> 
> this is exactly the case here. it is the skb destructor function that
> is carried by the skb.

It can't possibly call it correctly without also having the rxe
ib_device reference too though??

Jason
Bob Pearson July 31, 2023, 6:44 p.m. UTC | #4
On 7/31/23 13:32, Jason Gunthorpe wrote:
> On Mon, Jul 31, 2023 at 01:26:23PM -0500, Bob Pearson wrote:
>> On 7/31/23 13:17, Jason Gunthorpe wrote:
>>> On Fri, Jul 21, 2023 at 03:50:22PM -0500, Bob Pearson wrote:
>>>> Network interruptions may cause long delays in the processing of
>>>> send packets during which time the rxe driver may be unloaded.
>>>> This will cause seg faults when the packet is ultimately freed as
>>>> it calls the destructor function in the rxe driver. This has been
>>>> observed in cable pull fail over fail back testing.
>>>
>>> No, module reference counts are only for code that is touching
>>> function pointers.
>>
>> this is exactly the case here. it is the skb destructor function that
>> is carried by the skb.
> 
> It can't possibly call it correctly without also having the rxe
> ib_device reference too though??

Nope. This was causing seg faults in testing when there was a long network
hang and the admin tried to reload the rxe driver. The skb code doesn't care
about the ib device at all.

Bob
> 
> Jason
Jason Gunthorpe Aug. 1, 2023, 10:56 p.m. UTC | #5
On Mon, Jul 31, 2023 at 01:44:47PM -0500, Bob Pearson wrote:
> On 7/31/23 13:32, Jason Gunthorpe wrote:
> > On Mon, Jul 31, 2023 at 01:26:23PM -0500, Bob Pearson wrote:
> >> On 7/31/23 13:17, Jason Gunthorpe wrote:
> >>> On Fri, Jul 21, 2023 at 03:50:22PM -0500, Bob Pearson wrote:
> >>>> Network interruptions may cause long delays in the processing of
> >>>> send packets during which time the rxe driver may be unloaded.
> >>>> This will cause seg faults when the packet is ultimately freed as
> >>>> it calls the destructor function in the rxe driver. This has been
> >>>> observed in cable pull fail over fail back testing.
> >>>
> >>> No, module reference counts are only for code that is touching
> >>> function pointers.
> >>
> >> this is exactly the case here. it is the skb destructor function that
> >> is carried by the skb.
> > 
> > It can't possibly call it correctly without also having the rxe
> > ib_device reference too though??
> 
> Nope. This was causing seg faults in testing when there was a long network
> hang and the admin tried to reload the rxe driver. The skb code doesn't care
> about the ib device at all.

I don't get it, there aren't globals in rxe, so WTF is it doing if it
isn't somehow tracing back to memory that is under the ib_device
lifetime?

Jason
Bob Pearson Aug. 2, 2023, 2:39 p.m. UTC | #6
On 8/1/23 17:56, Jason Gunthorpe wrote:
> On Mon, Jul 31, 2023 at 01:44:47PM -0500, Bob Pearson wrote:
>> On 7/31/23 13:32, Jason Gunthorpe wrote:
>>> On Mon, Jul 31, 2023 at 01:26:23PM -0500, Bob Pearson wrote:
>>>> On 7/31/23 13:17, Jason Gunthorpe wrote:
>>>>> On Fri, Jul 21, 2023 at 03:50:22PM -0500, Bob Pearson wrote:
>>>>>> Network interruptions may cause long delays in the processing of
>>>>>> send packets during which time the rxe driver may be unloaded.
>>>>>> This will cause seg faults when the packet is ultimately freed as
>>>>>> it calls the destructor function in the rxe driver. This has been
>>>>>> observed in cable pull fail over fail back testing.
>>>>>
>>>>> No, module reference counts are only for code that is touching
>>>>> function pointers.
>>>>
>>>> this is exactly the case here. it is the skb destructor function that
>>>> is carried by the skb.
>>>
>>> It can't possibly call it correctly without also having the rxe
>>> ib_device reference too though??
>>
>> Nope. This was causing seg faults in testing when there was a long network
>> hang and the admin tried to reload the rxe driver. The skb code doesn't care
>> about the ib device at all.
> 
> I don't get it, there aren't globals in rxe, so WTF is it doing if it
> isn't somehow tracing back to memory that is under the ib_device
> lifetime?
> 
> Jason

When the rxe driver builds a send packet it puts the address of its destructor
subroutine in the skb before calling ip_local_out and sending it. The address of
driver software is now hanging around. If you don't delay the module exit routine
until all the skb's are freed you can cause seg faults. The only way to cause this to
happen is to call rmmod on the driver too early but people have done this occasionally
and report it as a bug.

Bob
Jason Gunthorpe Aug. 2, 2023, 2:57 p.m. UTC | #7
On Wed, Aug 02, 2023 at 09:39:55AM -0500, Bob Pearson wrote:
> On 8/1/23 17:56, Jason Gunthorpe wrote:
> > On Mon, Jul 31, 2023 at 01:44:47PM -0500, Bob Pearson wrote:
> >> On 7/31/23 13:32, Jason Gunthorpe wrote:
> >>> On Mon, Jul 31, 2023 at 01:26:23PM -0500, Bob Pearson wrote:
> >>>> On 7/31/23 13:17, Jason Gunthorpe wrote:
> >>>>> On Fri, Jul 21, 2023 at 03:50:22PM -0500, Bob Pearson wrote:
> >>>>>> Network interruptions may cause long delays in the processing of
> >>>>>> send packets during which time the rxe driver may be unloaded.
> >>>>>> This will cause seg faults when the packet is ultimately freed as
> >>>>>> it calls the destructor function in the rxe driver. This has been
> >>>>>> observed in cable pull fail over fail back testing.
> >>>>>
> >>>>> No, module reference counts are only for code that is touching
> >>>>> function pointers.
> >>>>
> >>>> this is exactly the case here. it is the skb destructor function that
> >>>> is carried by the skb.
> >>>
> >>> It can't possibly call it correctly without also having the rxe
> >>> ib_device reference too though??
> >>
> >> Nope. This was causing seg faults in testing when there was a long network
> >> hang and the admin tried to reload the rxe driver. The skb code doesn't care
> >> about the ib device at all.
> > 
> > I don't get it, there aren't globals in rxe, so WTF is it doing if it
> > isn't somehow tracing back to memory that is under the ib_device
> > lifetime?
> > 
> > Jason
> 
> When the rxe driver builds a send packet it puts the address of its destructor
> subroutine in the skb before calling ip_local_out and sending it. The address of
> driver software is now hanging around. If you don't delay the module exit routine
> until all the skb's are freed you can cause seg faults. The only way to cause this to
> happen is to call rmmod on the driver too early but people have done this occasionally
> and report it as a bug.

You are missing the point, the destructor currently does this:

static void rxe_skb_tx_dtor(struct sk_buff *skb)
{
	struct sock *sk = skb->sk;
	struct rxe_qp *qp = sk->sk_user_data;

So you've already UAF'd because rxe_qp is freed memory well before you
get to unloading the module.

This series changed it to do this:
 
 static void rxe_skb_tx_dtor(struct sk_buff *skb)
 {
	struct rxe_dev *rxe;
	unsigned int index;
	struct rxe_qp *qp;
	int skb_out;

	/* takes a ref on ib device if success */
	rxe = get_rxe_from_skb(skb);
	if (!rxe)
		goto out;

 
static struct rxe_dev *get_rxe_from_skb(struct sk_buff *skb)
{
	struct rxe_dev *rxe;
	struct net_device *ndev = skb->dev;

	rxe = rxe_get_dev_from_net(ndev);
	if (!rxe && is_vlan_dev(ndev))
		rxe = rxe_get_dev_from_net(vlan_dev_real_dev(ndev));


Which seems totally nutz, you are now relying on the global hash table
in ib_core to resolve the ib device.

Again, why can't this code do something sane like refcount the qp or
ib_device so the destruction doesn't progress until all the SKBs are
flushed out?

Jason
diff mbox series

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 54c723a6edda..6b55c595f8f8 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -208,10 +208,33 @@  static struct rdma_link_ops rxe_link_ops = {
 	.newlink = rxe_newlink,
 };
 
+static struct rxe_module {
+	struct kref		ref_cnt;
+	struct completion	complete;
+} rxe_module;
+
+static void rxe_module_release(struct kref *kref)
+{
+	complete(&rxe_module.complete);
+}
+
+int rxe_module_get(void)
+{
+	return kref_get_unless_zero(&rxe_module.ref_cnt);
+}
+
+int rxe_module_put(void)
+{
+	return kref_put(&rxe_module.ref_cnt, rxe_module_release);
+}
+
 static int __init rxe_module_init(void)
 {
 	int err;
 
+	kref_init(&rxe_module.ref_cnt);
+	init_completion(&rxe_module.complete);
+
 	err = rxe_alloc_wq();
 	if (err)
 		return err;
@@ -229,6 +252,9 @@  static int __init rxe_module_init(void)
 
 static void __exit rxe_module_exit(void)
 {
+	rxe_module_put();
+	wait_for_completion(&rxe_module.complete);
+
 	rdma_link_unregister(&rxe_link_ops);
 	ib_unregister_driver(RDMA_DRIVER_RXE);
 	rxe_net_exit();
diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h
index d33dd6cf83d3..077e3ad8f39a 100644
--- a/drivers/infiniband/sw/rxe/rxe.h
+++ b/drivers/infiniband/sw/rxe/rxe.h
@@ -158,4 +158,7 @@  void rxe_port_up(struct rxe_dev *rxe);
 void rxe_port_down(struct rxe_dev *rxe);
 void rxe_set_port_state(struct rxe_dev *rxe);
 
+int rxe_module_get(void);
+int rxe_module_put(void);
+
 #endif /* RXE_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index c1b2eaf82334..0e447420a441 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -384,6 +384,7 @@  static void rxe_skb_tx_dtor(struct sk_buff *skb)
 out_put_ibdev:
 	ib_device_put(&rxe->ib_dev);
 out:
+	rxe_module_put();
 	return;
 }
 
@@ -400,6 +401,12 @@  static int rxe_send(struct sk_buff *skb, struct rxe_pkt_info *pkt)
 	sock_hold(sk);
 	skb->sk = sk;
 
+	/* the module may be potentially removed while this packet
+	 * is waiting on the tx queue causing seg faults. So need
+	 * to protect the module.
+	 */
+	rxe_module_get();
+
 	atomic_inc(&pkt->qp->skb_out);
 
 	sk->sk_user_data = (void *)(long)qp->elem.index;