diff mbox series

[for-next] RDMA/hns: Add support function clear when removing module

Message ID 1555154941-55510-1-git-send-email-oulijun@huawei.com (mailing list archive)
State Superseded

Commit Message

Lijun Ou April 13, 2019, 11:29 a.m. UTC
To avoid resources remaining unreleased when a ULP aborts
abnormally, the hardware adds the capability of reclaiming those
resources when the module is removed. This patch enables that
capability.

Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_device.h |   1 +
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 164 ++++++++++++++++++++++++++++
 drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |  27 ++++-
 3 files changed, 191 insertions(+), 1 deletion(-)

Comments

Leon Romanovsky April 16, 2019, 12:16 p.m. UTC | #1
On Sat, Apr 13, 2019 at 07:29:01PM +0800, Lijun Ou wrote:
> To avoid resource unreleased while ULP aborted abnormally,
> the hardware adds the capability of restoring the resource
> while removing module, this patch enables this capability.

Can anyone help me understand what this means?
How can a ULP "abort" without releasing resources?

Thanks
Lijun Ou April 19, 2019, 7:46 a.m. UTC | #2
On 2019/4/16 20:16, Leon Romanovsky wrote:
> On Sat, Apr 13, 2019 at 07:29:01PM +0800, Lijun Ou wrote:
>> To avoid resource unreleased while ULP aborted abnormally,
>> the hardware adds the capability of restoring the resource
>> while removing module, this patch enables this capability.
> Can anyone help me to understand what does it mean?
> How can ULP "abort" without releasing resources?
>
> Thanks

Maybe the commit description is not correct enough.

The entire patch solves the following scenario: when a function is abnormal, the hardware
needs to release the related hardware resources, and the entire release process is the same
as the FLR process.

Firmware is used to resolve this: the hardware design adds a firmware command to clear the
hardware state and check that the hardware resources have been freed.

As a result, the driver needs to implement this command.

Thanks

Lijun Ou
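The flow Lijun describes above is a write-then-poll firmware command: issue the clear command once, then repeatedly read back a done flag until it is set or the retry budget runs out. A minimal userspace sketch of that pattern (the function names and the fake done-flag model are illustrative, not the real hns command-queue API):

```c
#include <assert.h>
#include <stdbool.h>

/* Fake hardware model: the done flag reads back as set after N polls.
 * In the real driver this is HNS_ROCE_OPC_FUNC_CLEAR sent over the
 * command queue, re-read until FUNC_CLEAR_RST_FUN_DONE_S is set. */
static int g_polls_until_done;

static int fw_send_func_clear(int vf_id)
{
	(void)vf_id;
	return 0;			/* 0 == command accepted by firmware */
}

static bool fw_read_clear_done(int vf_id)
{
	(void)vf_id;
	return --g_polls_until_done <= 0;
}

#define FUNC_CLEAR_MAX_POLLS 5

/* Write the clear command, then poll the done flag a bounded number of
 * times; returns 0 on success, -1 on write failure or poll exhaustion. */
static int func_clear(int vf_id)
{
	int i;

	if (fw_send_func_clear(vf_id) != 0)
		return -1;

	for (i = 0; i < FUNC_CLEAR_MAX_POLLS; i++)
		if (fw_read_clear_done(vf_id))
			return 0;

	return -1;
}
```

The bounded retry matters: if the firmware never reports done, the driver must give up and report the failure rather than spin forever.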
Leon Romanovsky April 22, 2019, 12:22 p.m. UTC | #3
On Fri, Apr 19, 2019 at 03:46:32PM +0800, oulijun wrote:
> On 2019/4/16 20:16, Leon Romanovsky wrote:
> > On Sat, Apr 13, 2019 at 07:29:01PM +0800, Lijun Ou wrote:
> >> To avoid resource unreleased while ULP aborted abnormally,
> >> the hardware adds the capability of restoring the resource
> >> while removing module, this patch enables this capability.
> > Can anyone help me to understand what does it mean?
> > How can ULP "abort" without releasing resources?
> >
> > Thanks
>
> Maybe the commit description is not correct enough.
>
> The entire PATCH is to solve the following scenarios. When a  function is abnormal, the hardware
>
> need to release the relatived hardware reource and the entire release process is the same as the flr process.
>
> It uses the firmware to reslove.  The hw design adds a firmware cmd to clear the hardware state and judge
>
> the resource of hardware have freed.
>
> As a result, the driver need to implement this cmd.

You explained what you are doing, but not why you are doing it.

Thanks

>
> Thanks
>
> Lijun Ou
>
>
Lijun Ou April 22, 2019, 1:38 p.m. UTC | #4
On 2019/4/22 20:22, Leon Romanovsky wrote:
> On Fri, Apr 19, 2019 at 03:46:32PM +0800, oulijun wrote:
>> On 2019/4/16 20:16, Leon Romanovsky wrote:
>>> On Sat, Apr 13, 2019 at 07:29:01PM +0800, Lijun Ou wrote:
>>>> To avoid resource unreleased while ULP aborted abnormally,
>>>> the hardware adds the capability of restoring the resource
>>>> while removing module, this patch enables this capability.
>>> Can anyone help me to understand what does it mean?
>>> How can ULP "abort" without releasing resources?
>>>
>>> Thanks
>> Maybe the commit description is not correct enough.
>>
>> The entire PATCH is to solve the following scenarios. When a  function is abnormal, the hardware
>>
>> need to release the relatived hardware reource and the entire release process is the same as the flr process.
>>
>> It uses the firmware to reslove.  The hw design adds a firmware cmd to clear the hardware state and judge
>>
>> the resource of hardware have freed.
>>
>> As a result, the driver need to implement this cmd.
> You explained what you are doing, but not why are you doing.
Hi Leon,
   If an unload operation is carried out while an RDMA application is running, the hardware does not
have time to release its resources, and they remain in the hardware. Under these circumstances, an
error may happen if the hns driver is loaded and the application is run again. To resolve this, the
hardware adds a function clear command to stop the function and clear the residual hardware
resources in it.

Thanks.
Lijun Ou
> Thanks
>
>> Thanks
>>
>> Lijun Ou
>>
>>
> .
>
Leon Romanovsky April 23, 2019, 3:23 p.m. UTC | #5
On Mon, Apr 22, 2019 at 09:38:37PM +0800, oulijun wrote:
> On 2019/4/22 20:22, Leon Romanovsky wrote:
> > On Fri, Apr 19, 2019 at 03:46:32PM +0800, oulijun wrote:
> >> On 2019/4/16 20:16, Leon Romanovsky wrote:
> >>> On Sat, Apr 13, 2019 at 07:29:01PM +0800, Lijun Ou wrote:
> >>>> To avoid resource unreleased while ULP aborted abnormally,
> >>>> the hardware adds the capability of restoring the resource
> >>>> while removing module, this patch enables this capability.
> >>> Can anyone help me to understand what does it mean?
> >>> How can ULP "abort" without releasing resources?
> >>>
> >>> Thanks
> >> Maybe the commit description is not correct enough.
> >>
> >> The entire PATCH is to solve the following scenarios. When a  function is abnormal, the hardware
> >>
> >> need to release the relatived hardware reource and the entire release process is the same as the flr process.
> >>
> >> It uses the firmware to reslove.  The hw design adds a firmware cmd to clear the hardware state and judge
> >>
> >> the resource of hardware have freed.
> >>
> >> As a result, the driver need to implement this cmd.
> > You explained what you are doing, but not why are you doing.
> Hi, Leon
>    if carried out unload operation When rdma app running, the hardware is too late to release and remain in hardware.

It is the responsibility of the disassociate flow and the various
unwind flows to clean up such a mess.

> Under these circumstances, it maybe happen error if loaded hns driver and run app again.  In order to reslove it,
> the hardware adds a function clear function to stop this function and clear the residual hardware resources in the function.

First, your initialization flow should always do it; second, you need to
find the root cause of the resource leakage in the case where the application was aborted.

>
> Thanks.
> Lijun Ou
> > Thanks
> >
> >> Thanks
> >>
> >> Lijun Ou
> >>
> >>
> > .
> >
>
>
Yixian Liu April 26, 2019, 10:12 a.m. UTC | #6
On 2019/4/23 23:23, Leon Romanovsky wrote:
> On Mon, Apr 22, 2019 at 09:38:37PM +0800, oulijun wrote:
>> On 2019/4/22 20:22, Leon Romanovsky wrote:
>>> On Fri, Apr 19, 2019 at 03:46:32PM +0800, oulijun wrote:
>>>> On 2019/4/16 20:16, Leon Romanovsky wrote:
>>>>> On Sat, Apr 13, 2019 at 07:29:01PM +0800, Lijun Ou wrote:
>>>>>> To avoid resource unreleased while ULP aborted abnormally,
>>>>>> the hardware adds the capability of restoring the resource
>>>>>> while removing module, this patch enables this capability.
>>>>> Can anyone help me to understand what does it mean?
>>>>> How can ULP "abort" without releasing resources?
>>>>>
>>>>> Thanks
>>>> Maybe the commit description is not correct enough.
>>>>
>>>> The entire PATCH is to solve the following scenarios. When a  function is abnormal, the hardware
>>>>
>>>> need to release the relatived hardware reource and the entire release process is the same as the flr process.
>>>>
>>>> It uses the firmware to reslove.  The hw design adds a firmware cmd to clear the hardware state and judge
>>>>
>>>> the resource of hardware have freed.
>>>>
>>>> As a result, the driver need to implement this cmd.
>>> You explained what you are doing, but not why are you doing.
>> Hi, Leon
>>    if carried out unload operation When rdma app running, the hardware is too late to release and remain in hardware.
> 
> It is responsibility of disassociate flow to clean such mess and various
> unwind flows.
> 
>> Under these circumstances, it maybe happen error if loaded hns driver and run app again.  In order to reslove it,
>> the hardware adds a function clear function to stop this function and clear the residual hardware resources in the function.
> 
> First, your initialization flow should do it always, second you need to
> find the root cause of resource leakage in case of application was aborted.
> 

Hi Leon,

    Thanks very much for your valuable questions, which made us think about this functionality more deeply.

    Sorry that our description and responses led the discussion to focus on the assumption that
    the application may abort abnormally and can no longer notify the hardware to release
    previously requested resources, so that these resources remain in the hardware. Actually, the
    current OFED driver framework already has a mechanism to destroy these objects during rmmod
    of the ko, both when the application is still running and when it has aborted abnormally. In
    other words, our assumption was wrong; there is no need to worry about leaking resources
    requested by the application.

    However, I have talked with our chip team about the function clear functionality. We think it
    is necessary to inform the chip to finish outstanding tasks, do some cleanup work, and restore
    hardware resources in time when the ko is removed. Otherwise, it is dangerous to reuse the
    hardware, as there is no guarantee that this work is done correctly without the notification
    from our driver.

    Therefore, the function clear functionality in this patch makes sure our hardware works properly.

Thanks.
Jason Gunthorpe April 26, 2019, 2:36 p.m. UTC | #7
On Fri, Apr 26, 2019 at 06:12:11PM +0800, Liuyixian (Eason) wrote:

>     However, I have talked with our chip team about function clear
>     functionality. We think it is necessary to inform the chip to
>     perform the outstanding task and some cleanup work and restore
>     hardware resources in time when rmmod ko. Otherwise, it is
>     dangerous to reuse the hardware as it can not guarantee those
>     work can be done well without the notification from our driver.

If it is dangerous to reuse the hardware then you have to do this
cleanup on device startup, not on device removal.

Jason
Leon Romanovsky April 26, 2019, 9:05 p.m. UTC | #8
On Fri, Apr 26, 2019 at 11:36:56AM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 26, 2019 at 06:12:11PM +0800, Liuyixian (Eason) wrote:
>
> >     However, I have talked with our chip team about function clear
> >     functionality. We think it is necessary to inform the chip to
> >     perform the outstanding task and some cleanup work and restore
> >     hardware resources in time when rmmod ko. Otherwise, it is
> >     dangerous to reuse the hardware as it can not guarantee those
> >     work can be done well without the notification from our driver.
>
> If it is dangerous to reuse the hardware then you have to do this
> cleanup on device startup, not on device removal.

Right, I can think about gazillion ways to brick such HW.
The simplest way will be to call SysRq during RDMA traffic
and no cleanup function will be called in such case.

Thanks

>
> Jason
Yixian Liu April 30, 2019, 8:27 a.m. UTC | #9
On 2019/4/27 5:05, Leon Romanovsky wrote:
> On Fri, Apr 26, 2019 at 11:36:56AM -0300, Jason Gunthorpe wrote:
>> On Fri, Apr 26, 2019 at 06:12:11PM +0800, Liuyixian (Eason) wrote:
>>
>>>     However, I have talked with our chip team about function clear
>>>     functionality. We think it is necessary to inform the chip to
>>>     perform the outstanding task and some cleanup work and restore
>>>     hardware resources in time when rmmod ko. Otherwise, it is
>>>     dangerous to reuse the hardware as it can not guarantee those
>>>     work can be done well without the notification from our driver.
>>
>> If it is dangerous to reuse the hardware then you have to do this
>> cleanup on device startup, not on device removal.
> 
> Right, I can think about gazillion ways to brick such HW.
> The simplest way will be to call SysRq during RDMA traffic
> and no cleanup function will be called in such case.
> 
> Thanks

Hi Jason and Leon,

	As hip08 is a fake PCIe device, we cannot disassociate and stop hardware access
	through the chain break mechanism as a real PCIe device can. Alternatively, function clear
	is used as a notification to the hardware to stop accessing memory and to ensure it does
	not read or write DDR afterwards. That is, the role of function clear for hip08 is similar
	to that of the chain break for a PCIe device.

	Without function clear, the following problems would happen:
	1) With the current hardware design, a bus request from the hardware may not be able to
	   wait for its response, as the request (read or write) may arrive at the processor after
	   the hardware has already returned from the destroy verbs to the application; in this
	   case, an access error may happen.
	2) The traffic buffer requested from the scheduling module could not be returned, which
	   would affect the traffic of other functions.

	Thus, we think it is more reasonable to do function clear on device removal.

Thanks.
Jason Gunthorpe May 2, 2019, 1:03 p.m. UTC | #10
On Tue, Apr 30, 2019 at 04:27:41PM +0800, Liuyixian (Eason) wrote:
> 
> 
> On 2019/4/27 5:05, Leon Romanovsky wrote:
> > On Fri, Apr 26, 2019 at 11:36:56AM -0300, Jason Gunthorpe wrote:
> >> On Fri, Apr 26, 2019 at 06:12:11PM +0800, Liuyixian (Eason) wrote:
> >>
> >>>     However, I have talked with our chip team about function clear
> >>>     functionality. We think it is necessary to inform the chip to
> >>>     perform the outstanding task and some cleanup work and restore
> >>>     hardware resources in time when rmmod ko. Otherwise, it is
> >>>     dangerous to reuse the hardware as it can not guarantee those
> >>>     work can be done well without the notification from our driver.
> >>
> >> If it is dangerous to reuse the hardware then you have to do this
> >> cleanup on device startup, not on device removal.
> > 
> > Right, I can think about gazillion ways to brick such HW.
> > The simplest way will be to call SysRq during RDMA traffic
> > and no cleanup function will be called in such case.
> > 
> > Thanks
> 
> Hi Jason and Leon,
> 
> 	As hip08 is a fake pcie device, we could not disassociate and stop the hardware access
> 	through the chain break mechanism as a real pcie device. Alternatively, function clear
> 	is used as a notification to the hardware to stop accessing and ensure to not read or
> 	write DDR later. That is, the role of function clear to hip08 is similar as the chain
> 	break to pcie device.

What? This hardware is broken and doesn't respond to the bus master
enable bit in the PCI config space??

Jason
Yixian Liu May 9, 2019, 10:50 a.m. UTC | #11
On 2019/5/2 21:03, Jason Gunthorpe wrote:
> On Tue, Apr 30, 2019 at 04:27:41PM +0800, Liuyixian (Eason) wrote:
>>
>>
>> On 2019/4/27 5:05, Leon Romanovsky wrote:
>>> On Fri, Apr 26, 2019 at 11:36:56AM -0300, Jason Gunthorpe wrote:
>>>> On Fri, Apr 26, 2019 at 06:12:11PM +0800, Liuyixian (Eason) wrote:
>>>>
>>>>>     However, I have talked with our chip team about function clear
>>>>>     functionality. We think it is necessary to inform the chip to
>>>>>     perform the outstanding task and some cleanup work and restore
>>>>>     hardware resources in time when rmmod ko. Otherwise, it is
>>>>>     dangerous to reuse the hardware as it can not guarantee those
>>>>>     work can be done well without the notification from our driver.
>>>>
>>>> If it is dangerous to reuse the hardware then you have to do this
>>>> cleanup on device startup, not on device removal.
>>>
>>> Right, I can think about gazillion ways to brick such HW.
>>> The simplest way will be to call SysRq during RDMA traffic
>>> and no cleanup function will be called in such case.
>>>
>>> Thanks
>>
>> Hi Jason and Leon,
>>
>> 	As hip08 is a fake pcie device, we could not disassociate and stop the hardware access
>> 	through the chain break mechanism as a real pcie device. Alternatively, function clear
>> 	is used as a notification to the hardware to stop accessing and ensure to not read or
>> 	write DDR later. That is, the role of function clear to hip08 is similar as the chain
>> 	break to pcie device.
> 
> What? This hardware is broken and doesn't respond to the bus master
> enable bit in the PCI config space??
> 
Hi Jason,

Sorry to reply to you late.

Yes, the bus master enable bit should be set by a PCIe device at startup and cleared at removal.
The hns (NIC) module uses it as well. However, we cannot operate this bit in hip08, as RoCE
shares the PF (physical function) with the NIC. Therefore, we need function clear to notify
the hardware to do the cleanup and write back its caches.

Thanks.
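For comparison, on a device where the driver owns the whole function, stopping DMA at removal comes down to clearing Bus Master Enable, bit 2 of the 16-bit PCI command register (this is what the kernel's pci_set_master()/pci_clear_master() helpers toggle). A sketch of that bit arithmetic, with hypothetical helper names, shows what the hip08 RoCE function cannot do on its own because the register is shared with the NIC:

```c
#include <assert.h>
#include <stdint.h>

/* Bus Master Enable is bit 2 of the PCI command register
 * (matches PCI_COMMAND_MASTER in the kernel's pci_regs.h). */
#define PCI_COMMAND_MASTER (1u << 2)

/* Hypothetical helpers: the pure bit operations that
 * pci_set_master()/pci_clear_master() perform on the config word. */
static uint16_t cmd_set_master(uint16_t cmd)
{
	return cmd | PCI_COMMAND_MASTER;
}

static uint16_t cmd_clear_master(uint16_t cmd)
{
	return cmd & (uint16_t)~PCI_COMMAND_MASTER;
}
```

Clearing this bit on a shared PF would also stop the NIC's DMA, which is why the thread falls back to a firmware notification instead.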
Yixian Liu May 15, 2019, 9:38 a.m. UTC | #12
On 2019/5/9 18:50, Liuyixian (Eason) wrote:
> 
> 
> On 2019/5/2 21:03, Jason Gunthorpe wrote:
>> On Tue, Apr 30, 2019 at 04:27:41PM +0800, Liuyixian (Eason) wrote:
>>>
>>>
>>> On 2019/4/27 5:05, Leon Romanovsky wrote:
>>>> On Fri, Apr 26, 2019 at 11:36:56AM -0300, Jason Gunthorpe wrote:
>>>>> On Fri, Apr 26, 2019 at 06:12:11PM +0800, Liuyixian (Eason) wrote:
>>>>>
>>>>>>     However, I have talked with our chip team about function clear
>>>>>>     functionality. We think it is necessary to inform the chip to
>>>>>>     perform the outstanding task and some cleanup work and restore
>>>>>>     hardware resources in time when rmmod ko. Otherwise, it is
>>>>>>     dangerous to reuse the hardware as it can not guarantee those
>>>>>>     work can be done well without the notification from our driver.
>>>>>
>>>>> If it is dangerous to reuse the hardware then you have to do this
>>>>> cleanup on device startup, not on device removal.
>>>>
>>>> Right, I can think about gazillion ways to brick such HW.
>>>> The simplest way will be to call SysRq during RDMA traffic
>>>> and no cleanup function will be called in such case.
>>>>
>>>> Thanks
>>>
>>> Hi Jason and Leon,
>>>
>>> 	As hip08 is a fake pcie device, we could not disassociate and stop the hardware access
>>> 	through the chain break mechanism as a real pcie device. Alternatively, function clear
>>> 	is used as a notification to the hardware to stop accessing and ensure to not read or
>>> 	write DDR later. That is, the role of function clear to hip08 is similar as the chain
>>> 	break to pcie device.
>>
>> What? This hardware is broken and doesn't respond to the bus master
>> enable bit in the PCI config space??
>>
> Hi Jason,
> 
> Sorry to reply to you late.
> 
> Yes, the bus master enable bit should be set by a pcie device when startup and removal.
> The hns (nic) module use it as well. However, we couldn't use/operate this bit in hip08
> as it shares the PF(physical function) with nic. Therefore, we need function clear to
> notify the hardware to do the cleanup thing and cache write back.
> 
> Thanks.
> 

Hi Jason and Leon,

Do you have any further suggestions?

Thanks.
Jason Gunthorpe May 15, 2019, 11:49 a.m. UTC | #13
On Wed, May 15, 2019 at 05:38:02PM +0800, Liuyixian (Eason) wrote:
> 
> 
> On 2019/5/9 18:50, Liuyixian (Eason) wrote:
> > 
> > 
> > On 2019/5/2 21:03, Jason Gunthorpe wrote:
> >> On Tue, Apr 30, 2019 at 04:27:41PM +0800, Liuyixian (Eason) wrote:
> >>>
> >>>
> >>> On 2019/4/27 5:05, Leon Romanovsky wrote:
> >>>> On Fri, Apr 26, 2019 at 11:36:56AM -0300, Jason Gunthorpe wrote:
> >>>>> On Fri, Apr 26, 2019 at 06:12:11PM +0800, Liuyixian (Eason) wrote:
> >>>>>
> >>>>>>     However, I have talked with our chip team about function clear
> >>>>>>     functionality. We think it is necessary to inform the chip to
> >>>>>>     perform the outstanding task and some cleanup work and restore
> >>>>>>     hardware resources in time when rmmod ko. Otherwise, it is
> >>>>>>     dangerous to reuse the hardware as it can not guarantee those
> >>>>>>     work can be done well without the notification from our driver.
> >>>>>
> >>>>> If it is dangerous to reuse the hardware then you have to do this
> >>>>> cleanup on device startup, not on device removal.
> >>>>
> >>>> Right, I can think about gazillion ways to brick such HW.
> >>>> The simplest way will be to call SysRq during RDMA traffic
> >>>> and no cleanup function will be called in such case.
> >>>>
> >>>> Thanks
> >>>
> >>> Hi Jason and Leon,
> >>>
> >>> 	As hip08 is a fake pcie device, we could not disassociate and stop the hardware access
> >>> 	through the chain break mechanism as a real pcie device. Alternatively, function clear
> >>> 	is used as a notification to the hardware to stop accessing and ensure to not read or
> >>> 	write DDR later. That is, the role of function clear to hip08 is similar as the chain
> >>> 	break to pcie device.
> >>
> >> What? This hardware is broken and doesn't respond to the bus master
> >> enable bit in the PCI config space??
> >>
> > Hi Jason,
> > 
> > Sorry to reply to you late.
> > 
> > Yes, the bus master enable bit should be set by a pcie device when startup and removal.
> > The hns (nic) module use it as well. However, we couldn't use/operate this bit in hip08
> > as it shares the PF(physical function) with nic. Therefore, we need function clear to
> > notify the hardware to do the cleanup thing and cache write back.
> > 
> > Thanks.
> > 
> 
> Hi Jason and Leon,
> 
> Do you have further more suggestions?

The approach seems completely wrong to me - no other driver is doing
something so sketchy. 

You need to explain why hns is so special.

Jason
Yixian Liu May 16, 2019, 1:44 p.m. UTC | #14
On 2019/5/15 19:49, Jason Gunthorpe wrote:
> On Wed, May 15, 2019 at 05:38:02PM +0800, Liuyixian (Eason) wrote:
>>
>>
>> On 2019/5/9 18:50, Liuyixian (Eason) wrote:
>>>
>>>
>>> On 2019/5/2 21:03, Jason Gunthorpe wrote:
>>>> On Tue, Apr 30, 2019 at 04:27:41PM +0800, Liuyixian (Eason) wrote:
>>>>>
>>>>>
>>>>> On 2019/4/27 5:05, Leon Romanovsky wrote:
>>>>>> On Fri, Apr 26, 2019 at 11:36:56AM -0300, Jason Gunthorpe wrote:
>>>>>>> On Fri, Apr 26, 2019 at 06:12:11PM +0800, Liuyixian (Eason) wrote:
>>>>>>>
>>>>>>>>     However, I have talked with our chip team about function clear
>>>>>>>>     functionality. We think it is necessary to inform the chip to
>>>>>>>>     perform the outstanding task and some cleanup work and restore
>>>>>>>>     hardware resources in time when rmmod ko. Otherwise, it is
>>>>>>>>     dangerous to reuse the hardware as it can not guarantee those
>>>>>>>>     work can be done well without the notification from our driver.
>>>>>>>
>>>>>>> If it is dangerous to reuse the hardware then you have to do this
>>>>>>> cleanup on device startup, not on device removal.
>>>>>>
>>>>>> Right, I can think about gazillion ways to brick such HW.
>>>>>> The simplest way will be to call SysRq during RDMA traffic
>>>>>> and no cleanup function will be called in such case.
>>>>>>
>>>>>> Thanks
>>>>>
>>>>> Hi Jason and Leon,
>>>>>
>>>>> 	As hip08 is a fake pcie device, we could not disassociate and stop the hardware access
>>>>> 	through the chain break mechanism as a real pcie device. Alternatively, function clear
>>>>> 	is used as a notification to the hardware to stop accessing and ensure to not read or
>>>>> 	write DDR later. That is, the role of function clear to hip08 is similar as the chain
>>>>> 	break to pcie device.
>>>>
>>>> What? This hardware is broken and doesn't respond to the bus master
>>>> enable bit in the PCI config space??
>>>>
>>> Hi Jason,
>>>
>>> Sorry to reply to you late.
>>>
>>> Yes, the bus master enable bit should be set by a pcie device when startup and removal.
>>> The hns (nic) module use it as well. However, we couldn't use/operate this bit in hip08
>>> as it shares the PF(physical function) with nic. Therefore, we need function clear to
>>> notify the hardware to do the cleanup thing and cache write back.
>>>
>>> Thanks.
>>>
>>
>> Hi Jason and Leon,
>>
>> Do you have further more suggestions?
> 
> The approach seems completely wrong to me - no other driver is doing
> something so sketchy. 
> 
> You need to explain why hns is so special
> 
> Jason

Thanks, Jason. I will revisit the solution and provide feedback soon.
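The hns_roce_clear_func() routine in the patch below wraps the done-flag read-back in a jiffies deadline and bails out early when a reset is observed (in which case the reset flow owns the cleanup). A simplified model of that control flow, with a plain tick budget standing in for jiffies and all names illustrative:

```c
#include <assert.h>
#include <stdbool.h>

enum clear_result { CLEAR_DONE, CLEAR_TIMEOUT, CLEAR_RESET_SEEN };

/* Fake device state: ticks_left stands in for time_before(jiffies, end),
 * done_after for how many reads until the hardware reports done, and
 * resetting for get_hw_reset_stat()/ae_dev_resetting(). */
struct fake_dev {
	int ticks_left;
	int done_after;
	bool resetting;
};

static enum clear_result poll_func_clear(struct fake_dev *dev)
{
	while (dev->ticks_left-- > 0) {
		if (dev->resetting)
			return CLEAR_RESET_SEEN; /* reset flow cleans up */
		if (--dev->done_after <= 0)
			return CLEAR_DONE;
	}
	return CLEAR_TIMEOUT; /* deadline expired, report failure */
}
```

The three outcomes mirror the patch's paths: success, timeout (dev_err and the rst_prc fallback), and a concurrent reset that makes the clear unnecessary.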

Patch

diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 563cf39..2e35469 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -990,6 +990,7 @@  struct hns_roce_dev {
 	void			*priv;
 	struct workqueue_struct *irq_workq;
 	const struct hns_roce_dfx_hw *dfx;
+	u32			func_num;
 };
 
 static inline struct hns_roce_dev *to_hr_dev(struct ib_device *ib_dev)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index f155d2d..efaf4ee 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -1129,6 +1129,165 @@  static int hns_roce_cmq_query_hw_info(struct hns_roce_dev *hr_dev)
 	return 0;
 }
 
+static bool hns_roce_func_clr_chk_rst(struct hns_roce_dev *hr_dev)
+{
+	struct hns_roce_v2_priv *priv = (struct hns_roce_v2_priv *)hr_dev->priv;
+	struct hnae3_handle *handle = priv->handle;
+	const struct hnae3_ae_ops *ops = handle->ae_algo->ops;
+	unsigned long reset_cnt;
+	bool sw_resetting;
+	bool hw_resetting;
+
+	reset_cnt = ops->ae_dev_reset_cnt(handle);
+	hw_resetting = ops->get_hw_reset_stat(handle);
+	sw_resetting = ops->ae_dev_resetting(handle);
+
+	if (reset_cnt != hr_dev->reset_cnt || hw_resetting || sw_resetting)
+		return true;
+
+	return false;
+}
+
+static void hns_roce_func_clr_rst_prc(struct hns_roce_dev *hr_dev, int retval,
+				      int flag)
+{
+	struct hns_roce_v2_priv *priv = (struct hns_roce_v2_priv *)hr_dev->priv;
+	struct hnae3_handle *handle = priv->handle;
+	const struct hnae3_ae_ops *ops = handle->ae_algo->ops;
+	unsigned long instance_stage;
+	unsigned long reset_cnt;
+	unsigned long end;
+	bool sw_resetting;
+	bool hw_resetting;
+
+	instance_stage = handle->rinfo.instance_state;
+	reset_cnt = ops->ae_dev_reset_cnt(handle);
+	hw_resetting = ops->get_hw_reset_stat(handle);
+	sw_resetting = ops->ae_dev_resetting(handle);
+
+	if (reset_cnt != hr_dev->reset_cnt) {
+		hr_dev->dis_db = true;
+		hr_dev->is_reset = true;
+		dev_info(hr_dev->dev, "Func clear success after reset.\n");
+	} else if (hw_resetting) {
+		hr_dev->dis_db = true;
+
+		dev_warn(hr_dev->dev,
+			 "Func clear is pending, device in resetting state.\n");
+		end = msecs_to_jiffies(HNS_ROCE_V2_HW_RST_TIMEOUT) + jiffies;
+		while (time_before(jiffies, end)) {
+			if (!ops->get_hw_reset_stat(handle)) {
+				hr_dev->is_reset = true;
+				dev_info(hr_dev->dev,
+					 "Func clear success after reset.\n");
+				return;
+			}
+			msleep(HNS_ROCE_V2_HW_RST_COMPLETION_WAIT);
+		}
+
+		dev_warn(hr_dev->dev, "Func clear failed.\n");
+	} else if (sw_resetting && instance_stage == HNS_ROCE_STATE_INIT) {
+		hr_dev->dis_db = true;
+
+		dev_warn(hr_dev->dev,
+			 "Func clear is pending, device in resetting state.\n");
+		end = msecs_to_jiffies(HNS_ROCE_V2_HW_RST_TIMEOUT) + jiffies;
+		while (time_before(jiffies, end)) {
+			if (ops->ae_dev_reset_cnt(handle) !=
+			    hr_dev->reset_cnt) {
+				hr_dev->is_reset = true;
+				dev_info(hr_dev->dev,
+					 "Func clear success after sw reset\n");
+				return;
+			}
+			msleep(HNS_ROCE_V2_HW_RST_COMPLETION_WAIT);
+		}
+
+		dev_warn(hr_dev->dev, "Func clear failed because of unfinished sw reset\n");
+	} else {
+		if (retval && !flag)
+			dev_warn(hr_dev->dev,
+				 "Func clear read failed, ret = %d.\n", retval);
+
+		dev_warn(hr_dev->dev, "Func clear failed.\n");
+	}
+}
+
+static void hns_roce_query_func_num(struct hns_roce_dev *hr_dev)
+{
+	struct hns_roce_pf_func_num *resp;
+	struct hns_roce_cmq_desc desc;
+	int ret;
+
+	hns_roce_cmq_setup_basic_desc(&desc, HNS_ROCE_OPC_QUERY_VF_NUM, true);
+	ret = hns_roce_cmq_send(hr_dev, &desc, 1);
+	if (ret) {
+		dev_err(hr_dev->dev, "Query vf count fail, ret = %d.\n",
+			 ret);
+		return;
+	}
+
+	resp = (struct hns_roce_pf_func_num *)desc.data;
+	hr_dev->func_num = resp->pf_own_func_num;
+}
+
+static void hns_roce_clear_func(struct hns_roce_dev *hr_dev, int vf_id)
+{
+	bool fclr_write_fail_flag = false;
+	struct hns_roce_func_clear *resp;
+	struct hns_roce_cmq_desc desc;
+	unsigned long end;
+	int ret = 0;
+
+	if (hns_roce_func_clr_chk_rst(hr_dev))
+		goto out;
+
+	hns_roce_cmq_setup_basic_desc(&desc, HNS_ROCE_OPC_FUNC_CLEAR, false);
+	resp = (struct hns_roce_func_clear *)desc.data;
+	resp->rst_funcid_en = vf_id;
+
+	ret = hns_roce_cmq_send(hr_dev, &desc, 1);
+	if (ret) {
+		fclr_write_fail_flag = true;
+		dev_err(hr_dev->dev, "Func clear write failed, ret = %d.\n",
+			 ret);
+		goto out;
+	}
+
+	end = msecs_to_jiffies(HNS_ROCE_V2_FUNC_CLEAR_TIMEOUT_MSECS) + jiffies;
+
+	msleep(HNS_ROCE_V2_READ_FUNC_CLEAR_FLAG_INTERVAL);
+	while (time_before(jiffies, end)) {
+		if (hns_roce_func_clr_chk_rst(hr_dev))
+			goto out;
+
+		hns_roce_cmq_setup_basic_desc(&desc, HNS_ROCE_OPC_FUNC_CLEAR,
+					      true);
+		resp->rst_funcid_en = vf_id;
+
+		ret = hns_roce_cmq_send(hr_dev, &desc, 1);
+		if (ret) {
+			msleep(HNS_ROCE_V2_READ_FUNC_CLEAR_FLAG_FAIL_WAIT);
+			continue;
+		}
+
+		if (roce_get_bit(resp->func_done, FUNC_CLEAR_RST_FUN_DONE_S)) {
+			if (vf_id == 0)
+				hr_dev->is_reset = true;
+			return;
+		}
+	}
+
+out:
+	dev_err(hr_dev->dev, "Func clear read vf_id %d fail.\n", vf_id);
+	hns_roce_func_clr_rst_prc(hr_dev, ret, fclr_write_fail_flag);
+}
+
+static void hns_roce_function_clear(struct hns_roce_dev *hr_dev)
+{
+	hns_roce_clear_func(hr_dev, 0);
+}
+
 static int hns_roce_query_fw_ver(struct hns_roce_dev *hr_dev)
 {
 	struct hns_roce_query_fw_info *resp;
@@ -1479,6 +1638,8 @@  static int hns_roce_v2_profile(struct hns_roce_dev *hr_dev)
 		return ret;
 	}
 
+	hns_roce_query_func_num(hr_dev);
+
 	if (hr_dev->pci_dev->revision == 0x21) {
 		ret = hns_roce_query_pf_timer_resource(hr_dev);
 		if (ret) {
@@ -1890,6 +2051,9 @@  static void hns_roce_v2_exit(struct hns_roce_dev *hr_dev)
 {
 	struct hns_roce_v2_priv *priv = hr_dev->priv;
 
+	if (hr_dev->pci_dev->revision == 0x21)
+		hns_roce_function_clear(hr_dev);
+
 	hns_roce_free_link_table(hr_dev, &priv->tpq);
 	hns_roce_free_link_table(hr_dev, &priv->tsq);
 }
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
index edfdbe2..3f2c85f 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
@@ -96,7 +96,10 @@ 
 #define HNS_ROCE_V2_UC_RC_SGE_NUM_IN_WQE	2
 #define HNS_ROCE_V2_RSV_QPS			8
 
-#define HNS_ROCE_V2_HW_RST_TIMEOUT             1000
+/* Time out for hardware to complete reset */
+#define HNS_ROCE_V2_HW_RST_TIMEOUT		1000
+
+#define HNS_ROCE_V2_HW_RST_COMPLETION_WAIT	20
 
 #define HNS_ROCE_CONTEXT_HOP_NUM		1
 #define HNS_ROCE_SCCC_HOP_NUM			1
@@ -236,11 +239,13 @@  enum hns_roce_opcode_type {
 	HNS_ROCE_OPC_CFG_EXT_LLM			= 0x8403,
 	HNS_ROCE_OPC_CFG_TMOUT_LLM			= 0x8404,
 	HNS_ROCE_OPC_QUERY_PF_TIMER_RES			= 0x8406,
+	HNS_ROCE_OPC_QUERY_VF_NUM			= 0x8407,
 	HNS_ROCE_OPC_CFG_SGID_TB			= 0x8500,
 	HNS_ROCE_OPC_CFG_SMAC_TB			= 0x8501,
 	HNS_ROCE_OPC_POST_MB				= 0x8504,
 	HNS_ROCE_OPC_QUERY_MB_ST			= 0x8505,
 	HNS_ROCE_OPC_CFG_BT_ATTR			= 0x8506,
+	HNS_ROCE_OPC_FUNC_CLEAR				= 0x8508,
 	HNS_ROCE_OPC_CLR_SCCC				= 0x8509,
 	HNS_ROCE_OPC_QUERY_SCCC				= 0x850a,
 	HNS_ROCE_OPC_RESET_SCCC				= 0x850b,
@@ -1226,6 +1231,26 @@  struct hns_roce_query_fw_info {
 	__le32 rsv[5];
 };
 
+struct hns_roce_func_clear {
+	__le32 rst_funcid_en;
+	__le32 func_done;
+	__le32 rsv[4];
+};
+
+struct hns_roce_pf_func_num {
+	__le32 pf_own_func_num;
+	__le32 func_done;
+	__le32 rsv[4];
+};
+
+#define FUNC_CLEAR_RST_FUN_EN_S 8
+
+#define FUNC_CLEAR_RST_FUN_DONE_S 0
+
+#define HNS_ROCE_V2_FUNC_CLEAR_TIMEOUT_MSECS	(512 * 100)
+#define HNS_ROCE_V2_READ_FUNC_CLEAR_FLAG_INTERVAL	40
+#define HNS_ROCE_V2_READ_FUNC_CLEAR_FLAG_FAIL_WAIT	20
+
 struct hns_roce_cfg_llm_a {
 	__le32 base_addr_l;
 	__le32 base_addr_h;