net: qrtr: Unprepare MHI channels during remove

Message ID 1605723625-11206-1-git-send-email-bbhatt@codeaurora.org (mailing list archive)
State Not Applicable
Delegated to: Johannes Berg

Commit Message

Bhaumik Bhatt Nov. 18, 2020, 6:20 p.m. UTC
Reset MHI device channels when driver remove is called due to
module unload or any crash scenario. This will make sure that
MHI channels no longer remain enabled for transfers since the
MHI stack does not take care of this anymore after the auto-start
channels feature was removed.

Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
---
 net/qrtr/mhi.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Jeffrey Hugo Nov. 18, 2020, 6:34 p.m. UTC | #1
On 11/18/2020 11:20 AM, Bhaumik Bhatt wrote:
> Reset MHI device channels when driver remove is called due to
> module unload or any crash scenario. This will make sure that
> MHI channels no longer remain enabled for transfers since the
> MHI stack does not take care of this anymore after the auto-start
> channels feature was removed.
> 
> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
> ---
>   net/qrtr/mhi.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/net/qrtr/mhi.c b/net/qrtr/mhi.c
> index 7100f0b..2bf2b19 100644
> --- a/net/qrtr/mhi.c
> +++ b/net/qrtr/mhi.c
> @@ -104,6 +104,7 @@ static void qcom_mhi_qrtr_remove(struct mhi_device *mhi_dev)
>   	struct qrtr_mhi_dev *qdev = dev_get_drvdata(&mhi_dev->dev);
>   
>   	qrtr_endpoint_unregister(&qdev->ep);
> +	mhi_unprepare_from_transfer(mhi_dev);
>   	dev_set_drvdata(&mhi_dev->dev, NULL);
>   }
>   
> 

I admit, I didn't pay much attention to the auto-start being removed, 
but this seems odd to me.

As a client, the MHI device is being removed, likely because of some 
factor outside of my control, but I still need to clean it up?  This 
really feels like something MHI should be handling.
Bhaumik Bhatt Nov. 18, 2020, 7:13 p.m. UTC | #2
Hi Jeff,
On 2020-11-18 10:34 AM, Jeffrey Hugo wrote:
> On 11/18/2020 11:20 AM, Bhaumik Bhatt wrote:
>> Reset MHI device channels when driver remove is called due to
>> module unload or any crash scenario. This will make sure that
>> MHI channels no longer remain enabled for transfers since the
>> MHI stack does not take care of this anymore after the auto-start
>> channels feature was removed.
>> 
>> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
>> ---
>>   net/qrtr/mhi.c | 1 +
>>   1 file changed, 1 insertion(+)
>> 
>> diff --git a/net/qrtr/mhi.c b/net/qrtr/mhi.c
>> index 7100f0b..2bf2b19 100644
>> --- a/net/qrtr/mhi.c
>> +++ b/net/qrtr/mhi.c
>> @@ -104,6 +104,7 @@ static void qcom_mhi_qrtr_remove(struct mhi_device *mhi_dev)
>>   	struct qrtr_mhi_dev *qdev = dev_get_drvdata(&mhi_dev->dev);
>>
>>   	qrtr_endpoint_unregister(&qdev->ep);
>> +	mhi_unprepare_from_transfer(mhi_dev);
>>   	dev_set_drvdata(&mhi_dev->dev, NULL);
>>   }
>> 
> 
> I admit, I didn't pay much attention to the auto-start being removed,
> but this seems odd to me.
It allows fair and common treatment for all client drivers.
> 
> As a client, the MHI device is being removed, likely because of some
> factor outside of my control, but I still need to clean it up?  This
> really feels like something MHI should be handling.
It isn't outside of a client's control every time. If a client driver module
is unloaded, for example, it should be the client's responsibility to clean up
and send commands to close those channels, which allows the device to clean up
its context.

In the event of a kernel panic or some device crash outside of a client's
control, this function will just free some memory and return right away, as MHI
knows not to pursue it over the link anyway.

Some client drivers depend on USB or other drivers, which gives them
flexibility as to when to call these MHI prepare/unprepare for transfer APIs.
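
To make the two cases above concrete, here is a minimal userspace sketch of the behaviour being described: unprepare tells the device to reset the channel only while the link is usable, and otherwise just frees host memory. Every name in it (`model_device`, `model_prepare`, `model_unprepare`, the link states) is hypothetical and chosen for illustration; it is not the MHI API and makes no claim about how the MHI core is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; these names are NOT the real MHI API. */
enum model_link_state { MODEL_LINK_UP, MODEL_LINK_DOWN };

struct model_channel {
	bool prepared;   /* host-side channel context exists */
	void *ring;      /* stands in for host ring memory */
	int resets_sent; /* RESET commands sent to the device */
};

struct model_device {
	enum model_link_state link;
	struct model_channel chan;
};

/* Allocate host-side context; returns 0 on success, -1 on failure. */
int model_prepare(struct model_device *dev)
{
	assert(dev != NULL);
	if (dev->chan.prepared)
		return 0;
	dev->chan.ring = malloc(64);
	if (dev->chan.ring == NULL)
		return -1;
	dev->chan.prepared = true;
	return 0;
}

/*
 * Tear the channel down. The device is only told to reset the channel
 * while the link is usable; after a crash only host memory is freed.
 */
void model_unprepare(struct model_device *dev)
{
	assert(dev != NULL);
	if (!dev->chan.prepared)
		return;
	if (dev->link == MODEL_LINK_UP)
		dev->chan.resets_sent++;
	free(dev->chan.ring);
	dev->chan.ring = NULL;
	dev->chan.prepared = false;
}
```

In this model a client can call `model_unprepare()` from its remove path without knowing why remove was triggered, and calling it twice is harmless.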

Thanks,
Bhaumik
---
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
Jeffrey Hugo Nov. 18, 2020, 7:34 p.m. UTC | #3
On 11/18/2020 12:14 PM, Loic Poulain wrote:
> 
> 
> Le mer. 18 nov. 2020 à 19:34, Jeffrey Hugo <jhugo@codeaurora.org 
> <mailto:jhugo@codeaurora.org>> a écrit :
> 
>     On 11/18/2020 11:20 AM, Bhaumik Bhatt wrote:
>      > Reset MHI device channels when driver remove is called due to
>      > module unload or any crash scenario. This will make sure that
>      > MHI channels no longer remain enabled for transfers since the
>      > MHI stack does not take care of this anymore after the auto-start
>      > channels feature was removed.
>      >
>      > Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org
>     <mailto:bbhatt@codeaurora.org>>
>      > ---
>      >   net/qrtr/mhi.c | 1 +
>      >   1 file changed, 1 insertion(+)
>      >
>      > diff --git a/net/qrtr/mhi.c b/net/qrtr/mhi.c
>      > index 7100f0b..2bf2b19 100644
>      > --- a/net/qrtr/mhi.c
>      > +++ b/net/qrtr/mhi.c
>      > @@ -104,6 +104,7 @@ static void qcom_mhi_qrtr_remove(struct
>     mhi_device *mhi_dev)
>      >       struct qrtr_mhi_dev *qdev = dev_get_drvdata(&mhi_dev->dev);
>      >
>      >       qrtr_endpoint_unregister(&qdev->ep);
>      > +     mhi_unprepare_from_transfer(mhi_dev);
>      >       dev_set_drvdata(&mhi_dev->dev, NULL);
>      >   }
>      >
>      >
> 
>     I admit, I didn't pay much attention to the auto-start being removed,
>     but this seems odd to me.
> 
>     As a client, the MHI device is being removed, likely because of some
>     factor outside of my control, but I still need to clean it up?  This
>     really feels like something MHI should be handling.
> 
> 
> I think this is just about balancing operations, what is done in probe 
> should be undone in remove, so here channels are started in probe and 
> stopped/reset in remove.

I understand that perspective, but that doesn't quite match what is
going on here.  Regardless of whether the channel was started (prepared) in
probe, it now needs to be stopped in remove.  That is not balanced in all
cases.

Let's assume, in response to probe(), my client driver goes and creates
some other object, maybe a socket.  In response to that socket being
opened/activated by the client of my driver, I go and start the MHI
channel.  Now, normally, when the socket is closed/deactivated, I stop
the MHI channel.  In this case, stopping the MHI channel in remove() is
unbalanced with respect to probe(), but is now a requirement.
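
The socket scenario can be sketched as a small userspace model. All names here (`model_client`, `model_socket_open`, and so on) are hypothetical and not taken from any real driver; the point is only that `model_probe()` never starts the channel, yet `model_remove()` has to stop it if a socket is still open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical client-driver model; not taken from any real driver. */
struct model_client {
	bool bound;           /* true between probe() and remove() */
	bool socket_open;     /* the object created on behalf of a user */
	bool channel_started; /* channel prepared for transfer */
	int starts;           /* how many times the channel was started */
	int stops;            /* how many times the channel was stopped */
};

/* probe() only binds; it does not start the channel. */
void model_probe(struct model_client *c)
{
	assert(c != NULL);
	c->bound = true;
}

/* The channel is started lazily, when a user opens the socket. */
int model_socket_open(struct model_client *c)
{
	assert(c != NULL);
	if (!c->bound)
		return -1;
	c->socket_open = true;
	if (!c->channel_started) {
		c->channel_started = true;
		c->starts++;
	}
	return 0;
}

/* Normally the channel is stopped when the socket is closed. */
void model_socket_close(struct model_client *c)
{
	assert(c != NULL);
	c->socket_open = false;
	if (c->channel_started) {
		c->channel_started = false;
		c->stops++;
	}
}

/* remove() must stop a channel that probe() never started. */
void model_remove(struct model_client *c)
{
	assert(c != NULL);
	if (c->channel_started) {
		c->channel_started = false;
		c->stops++;
	}
	c->bound = false;
}
```

If remove runs while the socket is open, the stop happens in `model_remove()` and the later `model_socket_close()` finds nothing left to stop, which is the imbalance with respect to probe described above.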

Now you may argue, I should close the object in response to remove, 
which will then trigger the stop on the channel.  That doesn't apply to 
everything.  For example, you cannot close an open file in the kernel. 
You need to wait for userspace to close it.  By the time that happens, 
the mhi_dev is long gone I expect.

So if, somehow, the client driver is the one causing the remove to 
occur, then yes it should probably be the one doing the stop, but that's 
a narrow set of conditions, and I think having that requirement for all 
scenarios is limiting.
Bhaumik Bhatt Nov. 19, 2020, 7:02 p.m. UTC | #4
On 2020-11-18 11:34 AM, Jeffrey Hugo wrote:
> On 11/18/2020 12:14 PM, Loic Poulain wrote:
>> 
>> 
>> Le mer. 18 nov. 2020 à 19:34, Jeffrey Hugo <jhugo@codeaurora.org 
>> <mailto:jhugo@codeaurora.org>> a écrit :
>> 
>>     On 11/18/2020 11:20 AM, Bhaumik Bhatt wrote:
>>      > Reset MHI device channels when driver remove is called due to
>>      > module unload or any crash scenario. This will make sure that
>>      > MHI channels no longer remain enabled for transfers since the
>>      > MHI stack does not take care of this anymore after the 
>> auto-start
>>      > channels feature was removed.
>>      >
>>      > Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org
>>     <mailto:bbhatt@codeaurora.org>>
>>      > ---
>>      >   net/qrtr/mhi.c | 1 +
>>      >   1 file changed, 1 insertion(+)
>>      >
>>      > diff --git a/net/qrtr/mhi.c b/net/qrtr/mhi.c
>>      > index 7100f0b..2bf2b19 100644
>>      > --- a/net/qrtr/mhi.c
>>      > +++ b/net/qrtr/mhi.c
>>      > @@ -104,6 +104,7 @@ static void qcom_mhi_qrtr_remove(struct
>>     mhi_device *mhi_dev)
>>      >       struct qrtr_mhi_dev *qdev = 
>> dev_get_drvdata(&mhi_dev->dev);
>>      >
>>      >       qrtr_endpoint_unregister(&qdev->ep);
>>      > +     mhi_unprepare_from_transfer(mhi_dev);
>>      >       dev_set_drvdata(&mhi_dev->dev, NULL);
>>      >   }
>>      >
>>      >
>> 
>>     I admit, I didn't pay much attention to the auto-start being 
>> removed,
>>     but this seems odd to me.
>> 
>>     As a client, the MHI device is being removed, likely because of 
>> some
>>     factor outside of my control, but I still need to clean it up?  
>> This
>>     really feels like something MHI should be handling.
>> 
>> 
>> I think this is just about balancing operations, what is done in probe 
>> should be undone in remove, so here channels are started in probe and 
>> stopped/reset in remove.
> 
> I understand that perspective, but that doesn't quite match what is
> going on here.  Regardless of if the channel was started (prepared) in
> probe, it now needs to be stopped in remove.  That not balanced in all
> cases
> 
> Lets assume, in response to probe(), my client driver goes and creates
> some other object, maybe a socket.  In response to that socket being
> opened/activated by the client of my driver, I go and start the mhi
> channel.  Now, normally, when the socket is closed/deactivated, I stop
> the MHI channel.  In this case, stopping the MHI channel in remove()
> is unbalanced with respect to probe(), but is now a requirement.
> 
> Now you may argue, I should close the object in response to remove,
> which will then trigger the stop on the channel.  That doesn't apply
> to everything.  For example, you cannot close an open file in the
> kernel. You need to wait for userspace to close it.  By the time that
> happens, the mhi_dev is long gone I expect.
> 
> So if, somehow, the client driver is the one causing the remove to
> occur, then yes it should probably be the one doing the stop, but
> that's a narrow set of conditions, and I think having that requirement
> for all scenarios is limiting.
It should be the client's responsibility to perform a clean-up though.

We cannot assume that the remove() call was due to factors outside of the
client's control at all times. You may not know if the remove() was due to the
device actually crashing or just an unbind/module unload. So, it would be
better to call it, as the device should ideally not be left with a stale
channel context.

We had an issue where a client was issuing a driver unbind without unpreparing
the MHI channels. Without Loic's patch [1], we would not issue a channel RESET
to the device, resulting in incoming data to the host on those channels after
host clean-up, an unmapped memory access, and a kernel panic.

If the MHI dev will be gone, then a NULL/status check must be present in
anything that userspace could potentially use.
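
As an illustration of the kind of NULL/status check meant here, the following userspace sketch models a file that userspace keeps open after the device has gone away. The types and functions (`model_file`, `model_file_write`, `model_dev_removed`) are hypothetical, and the sketch deliberately ignores locking, which a real kernel driver would need around the pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the device a file operates on. */
struct model_dev {
	int bytes_queued;
};

/* Hypothetical open file that may outlive the device. */
struct model_file {
	struct model_dev *dev; /* cleared when the device is removed */
};

/* Write path: refuse to touch a device that is already gone. */
int model_file_write(struct model_file *f, int len)
{
	assert(f != NULL);
	if (f->dev == NULL)
		return -ENODEV;
	f->dev->bytes_queued += len;
	return len;
}

/* Called from the remove path; the file itself stays open. */
void model_dev_removed(struct model_file *f)
{
	assert(f != NULL);
	f->dev = NULL;
}
```

After `model_dev_removed()`, writes through the still-open file fail with `-ENODEV` instead of dereferencing a stale pointer.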

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/bus/mhi?h=next-20201119&id=a7f422f2f89e7d48aa66e6488444a4c7f01269d5

Thanks,
Bhaumik
---
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
Jakub Kicinski Nov. 20, 2020, 5:10 a.m. UTC | #5
On Wed, 18 Nov 2020 10:20:25 -0800 Bhaumik Bhatt wrote:
> Reset MHI device channels when driver remove is called due to
> module unload or any crash scenario. This will make sure that
> MHI channels no longer remain enabled for transfers since the
> MHI stack does not take care of this anymore after the auto-start
> channels feature was removed.
> 
> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>

Patch seems reasonable. Mani, are you taking it or should I?

Bhaumik, would you mind adding a Fixes tag to be clear where
the issue was introduced?
Manivannan Sadhasivam Nov. 20, 2020, 6:15 a.m. UTC | #6
On Thu, Nov 19, 2020 at 09:10:46PM -0800, Jakub Kicinski wrote:
> On Wed, 18 Nov 2020 10:20:25 -0800 Bhaumik Bhatt wrote:
> > Reset MHI device channels when driver remove is called due to
> > module unload or any crash scenario. This will make sure that
> > MHI channels no longer remain enabled for transfers since the
> > MHI stack does not take care of this anymore after the auto-start
> > channels feature was removed.
> > 
> > Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
> 
> Patch seems reasonable, Mani are you taking it or should I?
> 

Since I already picked up one qrtr patch, it makes sense to pick this
one up also.

> Bhaumik would you mind adding a Fixes tag to be clear where 
> the issue was introduced?

This is due to the MHI auto-start change which just got queued into
mhi-next. I don't think we need a Fixes tag.

Jakub, can you please provide your ack so that I can take it?

Thanks,
Mani
Jakub Kicinski Nov. 20, 2020, 6:18 a.m. UTC | #7
On Fri, 20 Nov 2020 11:45:12 +0530 Manivannan Sadhasivam wrote:
> Jakub, can you please provide your ack so that I can take it?

Sure:

Acked-by: Jakub Kicinski <kuba@kernel.org>
Manivannan Sadhasivam Nov. 20, 2020, 6:23 a.m. UTC | #8
On Thu, Nov 19, 2020 at 10:18:28PM -0800, Jakub Kicinski wrote:
> On Fri, 20 Nov 2020 11:45:12 +0530 Manivannan Sadhasivam wrote:
> > Jakub, can you please provide your ack so that I can take it?
> 
> Sure:
> 
> Acked-by: Jakub Kicinski <kuba@kernel.org>

Patch applied to mhi-ath11k-immutable.

Thanks,
Mani
Jeffrey Hugo Nov. 25, 2020, 6:01 p.m. UTC | #9
On 11/19/2020 12:02 PM, Bhaumik Bhatt wrote:
> On 2020-11-18 11:34 AM, Jeffrey Hugo wrote:
>> On 11/18/2020 12:14 PM, Loic Poulain wrote:
>>>
>>>
>>> Le mer. 18 nov. 2020 à 19:34, Jeffrey Hugo <jhugo@codeaurora.org 
>>> <mailto:jhugo@codeaurora.org>> a écrit :
>>>
>>>     On 11/18/2020 11:20 AM, Bhaumik Bhatt wrote:
>>>      > Reset MHI device channels when driver remove is called due to
>>>      > module unload or any crash scenario. This will make sure that
>>>      > MHI channels no longer remain enabled for transfers since the
>>>      > MHI stack does not take care of this anymore after the auto-start
>>>      > channels feature was removed.
>>>      >
>>>      > Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org
>>>     <mailto:bbhatt@codeaurora.org>>
>>>      > ---
>>>      >   net/qrtr/mhi.c | 1 +
>>>      >   1 file changed, 1 insertion(+)
>>>      >
>>>      > diff --git a/net/qrtr/mhi.c b/net/qrtr/mhi.c
>>>      > index 7100f0b..2bf2b19 100644
>>>      > --- a/net/qrtr/mhi.c
>>>      > +++ b/net/qrtr/mhi.c
>>>      > @@ -104,6 +104,7 @@ static void qcom_mhi_qrtr_remove(struct
>>>     mhi_device *mhi_dev)
>>>      >       struct qrtr_mhi_dev *qdev = dev_get_drvdata(&mhi_dev->dev);
>>>      >
>>>      >       qrtr_endpoint_unregister(&qdev->ep);
>>>      > +     mhi_unprepare_from_transfer(mhi_dev);
>>>      >       dev_set_drvdata(&mhi_dev->dev, NULL);
>>>      >   }
>>>      >
>>>      >
>>>
>>>     I admit, I didn't pay much attention to the auto-start being 
>>> removed,
>>>     but this seems odd to me.
>>>
>>>     As a client, the MHI device is being removed, likely because of some
>>>     factor outside of my control, but I still need to clean it up? This
>>>     really feels like something MHI should be handling.
>>>
>>>
>>> I think this is just about balancing operations, what is done in 
>>> probe should be undone in remove, so here channels are started in 
>>> probe and stopped/reset in remove.
>>
>> I understand that perspective, but that doesn't quite match what is
>> going on here.  Regardless of if the channel was started (prepared) in
>> probe, it now needs to be stopped in remove.  That not balanced in all
>> cases
>>
>> Lets assume, in response to probe(), my client driver goes and creates
>> some other object, maybe a socket.  In response to that socket being
>> opened/activated by the client of my driver, I go and start the mhi
>> channel.  Now, normally, when the socket is closed/deactivated, I stop
>> the MHI channel.  In this case, stopping the MHI channel in remove()
>> is unbalanced with respect to probe(), but is now a requirement.
>>
>> Now you may argue, I should close the object in response to remove,
>> which will then trigger the stop on the channel.  That doesn't apply
>> to everything.  For example, you cannot close an open file in the
>> kernel. You need to wait for userspace to close it.  By the time that
>> happens, the mhi_dev is long gone I expect.
>>
>> So if, somehow, the client driver is the one causing the remove to
>> occur, then yes it should probably be the one doing the stop, but
>> that's a narrow set of conditions, and I think having that requirement
>> for all scenarios is limiting.
> It should be the client's responsibility to perform a clean-up though.
> 
> We cannot assume that the remove() call was due to factors outside of the
> client's control at all times. You may not know if the remove() was due to
> device actually crashing or just an unbind/module unload. So, it would be
> better if you call it as the device should ideally not be left with a stale
> channel context.
>
> We had an issue where a client was issuing a driver unbind without 
> unpreparing
> the MHI channels and without Loic's patch [1], we would not issue a channel
> RESET to the device resulting in incoming data to the host on those 
> channels
> after host clean-up and an unmapped memory access and kernel panic.

So the client drivers have to do the right thing, otherwise the kernel 
could crash?  Sounds like you are choosing to not do defensive coding in 
MHI and making your problems the client's problems.

Before releasing the resources, why haven't you issued an MHI_RESET of
the state machine, and ensured the device has ack'd the reset?
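
The defensive alternative being asked about can be sketched as a core-side sweep: before releasing anything, the core resets every channel a client left enabled and waits for the acknowledgement. This is a userspace model under stated assumptions. The names (`model_core`, `model_core_teardown`, `model_reset_and_wait`) are hypothetical, the device is assumed to acknowledge every reset immediately, and nothing here reflects how the MHI core actually behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NUM_CHANNELS 4

/* Hypothetical per-channel state kept by the bus core. */
struct model_core_channel {
	bool enabled;        /* a client left this channel started */
	bool reset_acked;    /* device acknowledged the reset */
	bool resources_held; /* host memory still mapped for the channel */
};

struct model_core {
	struct model_core_channel ch[MODEL_NUM_CHANNELS];
};

/* Assumption: the device acknowledges every reset immediately. */
static void model_reset_and_wait(struct model_core_channel *ch)
{
	ch->enabled = false;
	ch->reset_acked = true;
}

/*
 * Core-side teardown: reset whatever clients left enabled, and only
 * then release resources. Returns how many channels had to be reset
 * on a client's behalf.
 */
int model_core_teardown(struct model_core *core)
{
	int swept = 0;

	assert(core != NULL);
	for (size_t i = 0; i < MODEL_NUM_CHANNELS; i++) {
		struct model_core_channel *ch = &core->ch[i];

		if (ch->enabled) {
			model_reset_and_wait(ch);
			swept++;
		}
		ch->resources_held = false; /* released only after the reset */
	}
	return swept;
}
```

With a sweep like this, a client that forgets to unprepare would not leave the device writing into released host memory, at the cost of the core doing work that the patch assigns to the client.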

> If MHI dev will be gone that NULL/status check must be present in 
> something that
> userspace could potentially use.
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/bus/mhi?h=next-20201119&id=a7f422f2f89e7d48aa66e6488444a4c7f01269d5 
> 
> 
> Thanks,
> Bhaumik
> ---
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project

Patch

diff --git a/net/qrtr/mhi.c b/net/qrtr/mhi.c
index 7100f0b..2bf2b19 100644
--- a/net/qrtr/mhi.c
+++ b/net/qrtr/mhi.c
@@ -104,6 +104,7 @@  static void qcom_mhi_qrtr_remove(struct mhi_device *mhi_dev)
 	struct qrtr_mhi_dev *qdev = dev_get_drvdata(&mhi_dev->dev);
 
 	qrtr_endpoint_unregister(&qdev->ep);
+	mhi_unprepare_from_transfer(mhi_dev);
 	dev_set_drvdata(&mhi_dev->dev, NULL);
 }