Message ID | 20240801074415.1033323-1-huangjunxian6@hisilicon.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [for-rc] RDMA/srpt: Fix UAF when srpt_add_one() failed | expand |
On Thu, Aug 01, 2024 at 03:44:15PM +0800, Junxian Huang wrote: > Currently cancel_work_sync() is not called when srpt_refresh_port() > failed in srpt_add_one(). There is a probability that sdev has been > freed while the previously initiated sport->work is still running, > leading to a UAF as the log below: > > [ T880] ib_srpt MAD registration failed for hns_1-1. > [ T880] ib_srpt srpt_add_one(hns_1) failed. > [ T376] Unable to handle kernel paging request at virtual address 0000000000010008 > ... > [ T376] Workqueue: events srpt_refresh_port_work [ib_srpt] > ... > [ T376] Call trace: > [ T376] srpt_refresh_port+0x94/0x264 [ib_srpt] > [ T376] srpt_refresh_port_work+0x1c/0x2c [ib_srpt] > [ T376] process_one_work+0x1d8/0x4cc > [ T376] worker_thread+0x158/0x410 > [ T376] kthread+0x108/0x13c > [ T376] ret_from_fork+0x10/0x18 > > Add cancel_work_sync() to the exception branch to fix this UAF. > > Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1") > Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> > --- > drivers/infiniband/ulp/srpt/ib_srpt.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c > index 9632afbd727b..244e5c115bf7 100644 > --- a/drivers/infiniband/ulp/srpt/ib_srpt.c > +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c > @@ -3148,8 +3148,8 @@ static int srpt_add_one(struct ib_device *device) > { > struct srpt_device *sdev; > struct srpt_port *sport; > + u32 i, j; > int ret; > - u32 i; > > pr_debug("device = %p\n", device); > > @@ -3226,7 +3226,6 @@ static int srpt_add_one(struct ib_device *device) > if (ret) { > pr_err("MAD registration failed for %s-%d.\n", > dev_name(&sdev->device->dev), i); > - i--; > goto err_port; > } > } > @@ -3241,6 +3240,8 @@ static int srpt_add_one(struct ib_device *device) > return 0; > > err_port: > + for (j = i, i--; j > 0; j--)a > + cancel_work_sync(&sdev->port[j - 1].work); There is no need in extra variable, the following code will do the same: while (i--) cancel_work_sync(&sdev->port[i].work); > srpt_unregister_mad_agent(sdev, i); > err_cm: > if (sdev->cm_id) > -- > 2.33.0 >
On 2024/8/1 18:37, Leon Romanovsky wrote: > On Thu, Aug 01, 2024 at 03:44:15PM +0800, Junxian Huang wrote: >> Currently cancel_work_sync() is not called when srpt_refresh_port() >> failed in srpt_add_one(). There is a probability that sdev has been >> freed while the previously initiated sport->work is still running, >> leading to a UAF as the log below: >> >> [ T880] ib_srpt MAD registration failed for hns_1-1. >> [ T880] ib_srpt srpt_add_one(hns_1) failed. >> [ T376] Unable to handle kernel paging request at virtual address 0000000000010008 >> ... >> [ T376] Workqueue: events srpt_refresh_port_work [ib_srpt] >> ... >> [ T376] Call trace: >> [ T376] srpt_refresh_port+0x94/0x264 [ib_srpt] >> [ T376] srpt_refresh_port_work+0x1c/0x2c [ib_srpt] >> [ T376] process_one_work+0x1d8/0x4cc >> [ T376] worker_thread+0x158/0x410 >> [ T376] kthread+0x108/0x13c >> [ T376] ret_from_fork+0x10/0x18 >> >> Add cancel_work_sync() to the exception branch to fix this UAF. >> >> Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1") >> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> >> --- >> drivers/infiniband/ulp/srpt/ib_srpt.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c >> index 9632afbd727b..244e5c115bf7 100644 >> --- a/drivers/infiniband/ulp/srpt/ib_srpt.c >> +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c >> @@ -3148,8 +3148,8 @@ static int srpt_add_one(struct ib_device *device) >> { >> struct srpt_device *sdev; >> struct srpt_port *sport; >> + u32 i, j; >> int ret; >> - u32 i; >> >> pr_debug("device = %p\n", device); >> >> @@ -3226,7 +3226,6 @@ static int srpt_add_one(struct ib_device *device) >> if (ret) { >> pr_err("MAD registration failed for %s-%d.\n", >> dev_name(&sdev->device->dev), i); >> - i--; >> goto err_port; >> } >> } >> @@ -3241,6 +3240,8 @@ static int srpt_add_one(struct ib_device *device) >> return 0; >> >> err_port: >> + for (j = i, i--; j > 0; j--)a >> + cancel_work_sync(&sdev->port[j - 1].work); > > There is no need in extra variable, the following code will do the same: > > while (i--) > cancel_work_sync(&sdev->port[i].work); > >> srpt_unregister_mad_agent(sdev, i); i is also used here. Junxian >> err_cm: >> if (sdev->cm_id) >> -- >> 2.33.0 >> >
On Thu, Aug 01, 2024 at 07:02:41PM +0800, Junxian Huang wrote: > > > On 2024/8/1 18:37, Leon Romanovsky wrote: > > On Thu, Aug 01, 2024 at 03:44:15PM +0800, Junxian Huang wrote: > >> Currently cancel_work_sync() is not called when srpt_refresh_port() > >> failed in srpt_add_one(). There is a probability that sdev has been > >> freed while the previously initiated sport->work is still running, > >> leading to a UAF as the log below: > >> > >> [ T880] ib_srpt MAD registration failed for hns_1-1. > >> [ T880] ib_srpt srpt_add_one(hns_1) failed. > >> [ T376] Unable to handle kernel paging request at virtual address 0000000000010008 > >> ... > >> [ T376] Workqueue: events srpt_refresh_port_work [ib_srpt] > >> ... > >> [ T376] Call trace: > >> [ T376] srpt_refresh_port+0x94/0x264 [ib_srpt] > >> [ T376] srpt_refresh_port_work+0x1c/0x2c [ib_srpt] > >> [ T376] process_one_work+0x1d8/0x4cc > >> [ T376] worker_thread+0x158/0x410 > >> [ T376] kthread+0x108/0x13c > >> [ T376] ret_from_fork+0x10/0x18 > >> > >> Add cancel_work_sync() to the exception branch to fix this UAF. > >> > >> Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1") > >> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> > >> --- > >> drivers/infiniband/ulp/srpt/ib_srpt.c | 5 +++-- > >> 1 file changed, 3 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c > >> index 9632afbd727b..244e5c115bf7 100644 > >> --- a/drivers/infiniband/ulp/srpt/ib_srpt.c > >> +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c > >> @@ -3148,8 +3148,8 @@ static int srpt_add_one(struct ib_device *device) > >> { > >> struct srpt_device *sdev; > >> struct srpt_port *sport; > >> + u32 i, j; > >> int ret; > >> - u32 i; > >> > >> pr_debug("device = %p\n", device); > >> > >> @@ -3226,7 +3226,6 @@ static int srpt_add_one(struct ib_device *device) > >> if (ret) { > >> pr_err("MAD registration failed for %s-%d.\n", > >> dev_name(&sdev->device->dev), i); > >> - i--; > >> goto err_port; > >> } > >> } > >> @@ -3241,6 +3240,8 @@ static int srpt_add_one(struct ib_device *device) > >> return 0; > >> > >> err_port: > >> + for (j = i, i--; j > 0; j--)a > >> + cancel_work_sync(&sdev->port[j - 1].work); > > > > There is no need in extra variable, the following code will do the same: > > > > while (i--) > > cancel_work_sync(&sdev->port[i].work); > > > >> srpt_unregister_mad_agent(sdev, i); > > i is also used here. So put cancel_work_sync() there. Thanks > > Junxian > > >> err_cm: > >> if (sdev->cm_id) > >> -- > >> 2.33.0 > >> > > >
On 2024/8/1 19:30, Leon Romanovsky wrote: > On Thu, Aug 01, 2024 at 07:02:41PM +0800, Junxian Huang wrote: >> >> >> On 2024/8/1 18:37, Leon Romanovsky wrote: >>> On Thu, Aug 01, 2024 at 03:44:15PM +0800, Junxian Huang wrote: >>>> Currently cancel_work_sync() is not called when srpt_refresh_port() >>>> failed in srpt_add_one(). There is a probability that sdev has been >>>> freed while the previously initiated sport->work is still running, >>>> leading to a UAF as the log below: >>>> >>>> [ T880] ib_srpt MAD registration failed for hns_1-1. >>>> [ T880] ib_srpt srpt_add_one(hns_1) failed. >>>> [ T376] Unable to handle kernel paging request at virtual address 0000000000010008 >>>> ... >>>> [ T376] Workqueue: events srpt_refresh_port_work [ib_srpt] >>>> ... >>>> [ T376] Call trace: >>>> [ T376] srpt_refresh_port+0x94/0x264 [ib_srpt] >>>> [ T376] srpt_refresh_port_work+0x1c/0x2c [ib_srpt] >>>> [ T376] process_one_work+0x1d8/0x4cc >>>> [ T376] worker_thread+0x158/0x410 >>>> [ T376] kthread+0x108/0x13c >>>> [ T376] ret_from_fork+0x10/0x18 >>>> >>>> Add cancel_work_sync() to the exception branch to fix this UAF. >>>> >>>> Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1") >>>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> >>>> --- >>>> drivers/infiniband/ulp/srpt/ib_srpt.c | 5 +++-- >>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c >>>> index 9632afbd727b..244e5c115bf7 100644 >>>> --- a/drivers/infiniband/ulp/srpt/ib_srpt.c >>>> +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c >>>> @@ -3148,8 +3148,8 @@ static int srpt_add_one(struct ib_device *device) >>>> { >>>> struct srpt_device *sdev; >>>> struct srpt_port *sport; >>>> + u32 i, j; >>>> int ret; >>>> - u32 i; >>>> >>>> pr_debug("device = %p\n", device); >>>> >>>> @@ -3226,7 +3226,6 @@ static int srpt_add_one(struct ib_device *device) >>>> if (ret) { >>>> pr_err("MAD registration failed for %s-%d.\n", >>>> dev_name(&sdev->device->dev), i); >>>> - i--; >>>> goto err_port; >>>> } >>>> } >>>> @@ -3241,6 +3240,8 @@ static int srpt_add_one(struct ib_device *device) >>>> return 0; >>>> >>>> err_port: >>>> + for (j = i, i--; j > 0; j--)a >>>> + cancel_work_sync(&sdev->port[j - 1].work); >>> >>> There is no need in extra variable, the following code will do the same: >>> >>> while (i--) >>> cancel_work_sync(&sdev->port[i].work); >>> >>>> srpt_unregister_mad_agent(sdev, i); >> >> i is also used here. > > So put cancel_work_sync() there. > > Thanks > Sure. Junxian >> >> Junxian >> >>>> err_cm: >>>> if (sdev->cm_id) >>>> -- >>>> 2.33.0 >>>> >>> >>
在 2024/8/1 15:44, Junxian Huang 写道: > Currently cancel_work_sync() is not called when srpt_refresh_port() > failed in srpt_add_one(). There is a probability that sdev has been > freed while the previously initiated sport->work is still running, > leading to a UAF as the log below: > > [ T880] ib_srpt MAD registration failed for hns_1-1. > [ T880] ib_srpt srpt_add_one(hns_1) failed. > [ T376] Unable to handle kernel paging request at virtual address 0000000000010008 > ... > [ T376] Workqueue: events srpt_refresh_port_work [ib_srpt] > ... > [ T376] Call trace: > [ T376] srpt_refresh_port+0x94/0x264 [ib_srpt] > [ T376] srpt_refresh_port_work+0x1c/0x2c [ib_srpt] > [ T376] process_one_work+0x1d8/0x4cc > [ T376] worker_thread+0x158/0x410 > [ T376] kthread+0x108/0x13c > [ T376] ret_from_fork+0x10/0x18 > > Add cancel_work_sync() to the exception branch to fix this UAF. Can you share the method to reproduce this problem? I am interested in this problem. Thanks, Zhu Yanjun > > Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1") > Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> > --- > drivers/infiniband/ulp/srpt/ib_srpt.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c > index 9632afbd727b..244e5c115bf7 100644 > --- a/drivers/infiniband/ulp/srpt/ib_srpt.c > +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c > @@ -3148,8 +3148,8 @@ static int srpt_add_one(struct ib_device *device) > { > struct srpt_device *sdev; > struct srpt_port *sport; > + u32 i, j; > int ret; > - u32 i; > > pr_debug("device = %p\n", device); > > @@ -3226,7 +3226,6 @@ static int srpt_add_one(struct ib_device *device) > if (ret) { > pr_err("MAD registration failed for %s-%d.\n", > dev_name(&sdev->device->dev), i); > - i--; > goto err_port; > } > } > @@ -3241,6 +3240,8 @@ static int srpt_add_one(struct ib_device *device) > return 0; > > err_port: > + for (j = i, i--; j > 0; j--) > + cancel_work_sync(&sdev->port[j - 1].work); > srpt_unregister_mad_agent(sdev, i); > err_cm: > if (sdev->cm_id)
On 2024/8/2 5:55, Zhu Yanjun wrote: > 在 2024/8/1 15:44, Junxian Huang 写道: >> Currently cancel_work_sync() is not called when srpt_refresh_port() >> failed in srpt_add_one(). There is a probability that sdev has been >> freed while the previously initiated sport->work is still running, >> leading to a UAF as the log below: >> >> [ T880] ib_srpt MAD registration failed for hns_1-1. >> [ T880] ib_srpt srpt_add_one(hns_1) failed. >> [ T376] Unable to handle kernel paging request at virtual address 0000000000010008 >> ... >> [ T376] Workqueue: events srpt_refresh_port_work [ib_srpt] >> ... >> [ T376] Call trace: >> [ T376] srpt_refresh_port+0x94/0x264 [ib_srpt] >> [ T376] srpt_refresh_port_work+0x1c/0x2c [ib_srpt] >> [ T376] process_one_work+0x1d8/0x4cc >> [ T376] worker_thread+0x158/0x410 >> [ T376] kthread+0x108/0x13c >> [ T376] ret_from_fork+0x10/0x18 >> >> Add cancel_work_sync() to the exception branch to fix this UAF. > > Can you share the method to reproduce this problem? > I am interested in this problem. > > Thanks, > Zhu Yanjun > I was testing bonding in 5.10 kernel, doing ifenslave bond0 eth0 eth1; ifenslave -d bond0 eth0 eth1 and ethtool --reset eth0 dedicated; ethtool --reset eth1 dedicated concurrently and infinitely. But I think this problem has been fixed in latest mainline probably. Please look into my reply to Bart in v2. Junxian >> >> Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1") >> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> >> --- >> drivers/infiniband/ulp/srpt/ib_srpt.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c >> index 9632afbd727b..244e5c115bf7 100644 >> --- a/drivers/infiniband/ulp/srpt/ib_srpt.c >> +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c >> @@ -3148,8 +3148,8 @@ static int srpt_add_one(struct ib_device *device) >> { >> struct srpt_device *sdev; >> struct srpt_port *sport; >> + u32 i, j; >> int ret; >> - u32 i; >> pr_debug("device = %p\n", device); >> @@ -3226,7 +3226,6 @@ static int srpt_add_one(struct ib_device *device) >> if (ret) { >> pr_err("MAD registration failed for %s-%d.\n", >> dev_name(&sdev->device->dev), i); >> - i--; >> goto err_port; >> } >> } >> @@ -3241,6 +3240,8 @@ static int srpt_add_one(struct ib_device *device) >> return 0; >> err_port: >> + for (j = i, i--; j > 0; j--) >> + cancel_work_sync(&sdev->port[j - 1].work); >> srpt_unregister_mad_agent(sdev, i); >> err_cm: >> if (sdev->cm_id) >
在 2024/8/2 10:28, Junxian Huang 写道: > > > On 2024/8/2 5:55, Zhu Yanjun wrote: >> 在 2024/8/1 15:44, Junxian Huang 写道: >>> Currently cancel_work_sync() is not called when srpt_refresh_port() >>> failed in srpt_add_one(). There is a probability that sdev has been >>> freed while the previously initiated sport->work is still running, >>> leading to a UAF as the log below: >>> >>> [ T880] ib_srpt MAD registration failed for hns_1-1. >>> [ T880] ib_srpt srpt_add_one(hns_1) failed. >>> [ T376] Unable to handle kernel paging request at virtual address 0000000000010008 >>> ... >>> [ T376] Workqueue: events srpt_refresh_port_work [ib_srpt] >>> ... >>> [ T376] Call trace: >>> [ T376] srpt_refresh_port+0x94/0x264 [ib_srpt] >>> [ T376] srpt_refresh_port_work+0x1c/0x2c [ib_srpt] >>> [ T376] process_one_work+0x1d8/0x4cc >>> [ T376] worker_thread+0x158/0x410 >>> [ T376] kthread+0x108/0x13c >>> [ T376] ret_from_fork+0x10/0x18 >>> >>> Add cancel_work_sync() to the exception branch to fix this UAF. >> >> Can you share the method to reproduce this problem? >> I am interested in this problem. >> >> Thanks, >> Zhu Yanjun >> > > I was testing bonding in 5.10 kernel, doing > ifenslave bond0 eth0 eth1; ifenslave -d bond0 eth0 eth1 > and > ethtool --reset eth0 dedicated; ethtool --reset eth1 dedicated > concurrently and infinitely. > > But I think this problem has been fixed in latest mainline probably. > Please look into my reply to Bart in v2. Thanks a lot. I am working on srq. Because ib_srpt also supports srq, I just want to confirm if srq of ib_srpt can also work well on RoCEv2 and IB with this commit. Zhu Yanjun > > Junxian > >>> >>> Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1") >>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> >>> --- >>> drivers/infiniband/ulp/srpt/ib_srpt.c | 5 +++-- >>> 1 file changed, 3 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c >>> index 9632afbd727b..244e5c115bf7 100644 >>> --- a/drivers/infiniband/ulp/srpt/ib_srpt.c >>> +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c >>> @@ -3148,8 +3148,8 @@ static int srpt_add_one(struct ib_device *device) >>> { >>> struct srpt_device *sdev; >>> struct srpt_port *sport; >>> + u32 i, j; >>> int ret; >>> - u32 i; >>> pr_debug("device = %p\n", device); >>> @@ -3226,7 +3226,6 @@ static int srpt_add_one(struct ib_device *device) >>> if (ret) { >>> pr_err("MAD registration failed for %s-%d.\n", >>> dev_name(&sdev->device->dev), i); >>> - i--; >>> goto err_port; >>> } >>> } >>> @@ -3241,6 +3240,8 @@ static int srpt_add_one(struct ib_device *device) >>> return 0; >>> err_port: >>> + for (j = i, i--; j > 0; j--) >>> + cancel_work_sync(&sdev->port[j - 1].work); >>> srpt_unregister_mad_agent(sdev, i); >>> err_cm: >>> if (sdev->cm_id) >>
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 9632afbd727b..244e5c115bf7 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -3148,8 +3148,8 @@ static int srpt_add_one(struct ib_device *device) { struct srpt_device *sdev; struct srpt_port *sport; + u32 i, j; int ret; - u32 i; pr_debug("device = %p\n", device); @@ -3226,7 +3226,6 @@ static int srpt_add_one(struct ib_device *device) if (ret) { pr_err("MAD registration failed for %s-%d.\n", dev_name(&sdev->device->dev), i); - i--; goto err_port; } } @@ -3241,6 +3240,8 @@ static int srpt_add_one(struct ib_device *device) return 0; err_port: + for (j = i, i--; j > 0; j--) + cancel_work_sync(&sdev->port[j - 1].work); srpt_unregister_mad_agent(sdev, i); err_cm: if (sdev->cm_id)
Currently cancel_work_sync() is not called when srpt_refresh_port() failed in srpt_add_one(). There is a probability that sdev has been freed while the previously initiated sport->work is still running, leading to a UAF as the log below: [ T880] ib_srpt MAD registration failed for hns_1-1. [ T880] ib_srpt srpt_add_one(hns_1) failed. [ T376] Unable to handle kernel paging request at virtual address 0000000000010008 ... [ T376] Workqueue: events srpt_refresh_port_work [ib_srpt] ... [ T376] Call trace: [ T376] srpt_refresh_port+0x94/0x264 [ib_srpt] [ T376] srpt_refresh_port_work+0x1c/0x2c [ib_srpt] [ T376] process_one_work+0x1d8/0x4cc [ T376] worker_thread+0x158/0x410 [ T376] kthread+0x108/0x13c [ T376] ret_from_fork+0x10/0x18 Add cancel_work_sync() to the exception branch to fix this UAF. Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> --- drivers/infiniband/ulp/srpt/ib_srpt.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)