ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal.

Message ID: 20170316143039.375-1-james.morse@arm.com (mailing list archive)
State: Accepted, archived
Delegated to: Rafael Wysocki

Commit Message

James Morse March 16, 2017, 2:30 p.m. UTC
When removing a GHES device notified by SCI, list_del_rcu() is used.
ghes_remove() should call synchronize_rcu() before it goes on to call
kfree(ghes), otherwise concurrent RCU readers may still hold this list
entry after it has been freed.

Signed-off-by: James Morse <james.morse@arm.com>
Cc: Huang Ying <ying.huang@intel.com>

---
It looks like 81e88fdc432a lifted this into ACPI_HEST_NOTIFY_NMI, missing
that ACPI_HEST_NOTIFY_SCI needed it too.

If there is only ever one SCI GHES entry this is safe today as
unregister_acpi_hed_notifier() takes a write lock on its semaphore, meaning
any RCU readers will have finished.
If there can be more than one SCI GHES entry...

Fixes: 81e88fdc432a ("ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support")

 drivers/acpi/apei/ghes.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Huang Ying March 17, 2017, 1:23 a.m. UTC | #1
Hi, James,

James Morse <james.morse@arm.com> writes:

> When removing a GHES device notified by SCI, list_del_rcu() is used,
> ghes_remove() should call synchronize_rcu() before it goes on to call
> kfree(ghes), otherwise concurrent RCU readers may still hold this list
> entry after it has been freed.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Cc: Huang Ying <ying.huang@intel.com>
>
> ---
> It looks like 81e88fdc432a lifted this into ACPI_HEST_NOTIFY_NMI, missing
> that ACPI_HEST_NOTIFY_SCI needed it too.
>
> If there is only ever one SCI GHES entry this is safe today as
> unregister_acpi_hed_notifier() takes a write lock on its semaphore, meaning
> any RCU readers will have finished.
> If there can be more than one SCI GHES entry...
>
> Fixes: 81e88fdc432a ("ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support")
>
>  drivers/acpi/apei/ghes.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index b192b42a8351..79b3c9c5a3bc 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -1073,6 +1073,7 @@ static int ghes_remove(struct platform_device *ghes_dev)
>  		if (list_empty(&ghes_sci))
>  			unregister_acpi_hed_notifier(&ghes_notifier_sci);

In remove path

unregister_acpi_hed_notifier()
  blocking_notifier_chain_unregister()
    down_write(&nh->rwsem)

While in notifier call path

acpi_hed_notify()
  blocking_notifier_call_chain()
    __blocking_notifier_call_chain()
      down_read(&nh->rwsem)

So when unregister succeeds, the notifier call should have
finished.

Best Regards,
Huang, Ying

>  		mutex_unlock(&ghes_list_mutex);
> +		synchronize_rcu();
>  		break;
>  	case ACPI_HEST_NOTIFY_NMI:
>  		ghes_nmi_remove(ghes);
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
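
The locking argument in comment #1 (a writer taking the chain's rwsem cannot proceed until in-flight readers drop it) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it uses a POSIX pthread_rwlock_t as a stand-in for the kernel rw_semaphore, and the names chain_lock, model_notifier_call() and model_unregister_waits() are hypothetical, not taken from ghes.c or the notifier code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for nh->rwsem; a pthread rwlock, not a kernel rwsem. */
static pthread_rwlock_t chain_lock = PTHREAD_RWLOCK_INITIALIZER;
static atomic_int reader_in_callback;
static atomic_int reader_done;

/* Stand-in for the notifier call path: holds the lock for reading. */
static void *model_notifier_call(void *unused)
{
	struct timespec pause = { 0, 20 * 1000 * 1000 }; /* 20 ms */

	(void)unused;
	pthread_rwlock_rdlock(&chain_lock);   /* models down_read(&nh->rwsem) */
	atomic_store(&reader_in_callback, 1);
	nanosleep(&pause, NULL);              /* callback is walking the list */
	atomic_store(&reader_done, 1);
	pthread_rwlock_unlock(&chain_lock);
	return NULL;
}

/*
 * Stand-in for the unregister path. Returns 1 if the reader had finished
 * by the time the write lock was granted, 0 if not, -1 on thread error.
 */
static int model_unregister_waits(void)
{
	pthread_t reader;
	int done, rc;

	atomic_store(&reader_in_callback, 0);
	atomic_store(&reader_done, 0);
	if (pthread_create(&reader, NULL, model_notifier_call, NULL) != 0)
		return -1;
	while (!atomic_load(&reader_in_callback))
		sched_yield();                    /* wait until the read lock is held */

	pthread_rwlock_wrlock(&chain_lock);   /* models down_write(&nh->rwsem) */
	done = atomic_load(&reader_done);     /* reader must have released by now */
	pthread_rwlock_unlock(&chain_lock);

	rc = pthread_join(reader, NULL);
	assert(rc == 0);
	(void)rc;
	return done;
}
```

In this model the write lock is only granted after the reader has set reader_done and released the lock, which mirrors the claim that a completed unregister implies the notifier call has finished. It says nothing about removals that skip the unregister call.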
James Morse March 17, 2017, 10:54 a.m. UTC | #2
Hi Huang, Ying

On 17/03/17 01:23, Huang, Ying wrote:
> James Morse <james.morse@arm.com> writes:
> 
>> When removing a GHES device notified by SCI, list_del_rcu() is used,
>> ghes_remove() should call synchronize_rcu() before it goes on to call
>> kfree(ghes), otherwise concurrent RCU readers may still hold this list
>> entry after it has been freed.

>> ---
>> It looks like 81e88fdc432a lifted this into ACPI_HEST_NOTIFY_NMI, missing
>> that ACPI_HEST_NOTIFY_SCI needed it too.
>>
>> If there is only ever one SCI GHES entry this is safe today as
>> unregister_acpi_hed_notifier() takes a write lock on its semaphore, meaning
>> any RCU readers will have finished.


> In remove path
> 
> unregister_acpi_hed_notifier()
>   blocking_notifier_chain_unregister()
>     down_write(&nh->rwsem)
> 
> While in notifier call path
> 
> acpi_hed_notify()
>   blocking_notifier_call_chain()
>     __blocking_notifier_call_chain()
>       down_read(&nh->rwsem)
> 
> So when unregister succeeds, the notifier call should have
> finished.

You are only protected like this if the unregister call is made for every
list_del_rcu(), which would only be the case if there is only ever one SCI GHES
entry.

Is this how NOTIFY_SCI is expected to be used?

If so I agree it's safe (but confusing!) today, and we need to take account of
this behaviour in Shiju Jose's patch.


If there can be multiple SCI entries, you only get this unregister protection
for the last one to be freed, as the list_empty() check skips the unregister for
all but the last entry.

>	case ACPI_HEST_NOTIFY_SCI:
>		mutex_lock(&ghes_list_mutex);
>		list_del_rcu(&ghes->list);
>		if (list_empty(&ghes_sci))
>			unregister_acpi_hed_notifier(&ghes_notifier_sci);
>		mutex_unlock(&ghes_list_mutex);
>		break;



Thanks,

James





Huang Ying March 20, 2017, 6:10 a.m. UTC | #3
James Morse <james.morse@arm.com> writes:

> Hi Huang, Ying
>
> On 17/03/17 01:23, Huang, Ying wrote:
>> James Morse <james.morse@arm.com> writes:
>> 
>>> When removing a GHES device notified by SCI, list_del_rcu() is used,
>>> ghes_remove() should call synchronize_rcu() before it goes on to call
>>> kfree(ghes), otherwise concurrent RCU readers may still hold this list
>>> entry after it has been freed.
>
>>> ---
>>> It looks like 81e88fdc432a lifted this into ACPI_HEST_NOTIFY_NMI, missing
>>> that ACPI_HEST_NOTIFY_SCI needed it too.
>>>
>>> If there is only ever one SCI GHES entry this is safe today as
>>> unregister_acpi_hed_notifier() takes a write lock on its semaphore, meaning
>>> any RCU readers will have finished.
>
>
>> In remove path
>> 
>> unregister_acpi_hed_notifier()
>>   blocking_notifier_chain_unregister()
>>     down_write(&nh->rwsem)
>> 
>> While in notifier call path
>> 
>> acpi_hed_notify()
>>   blocking_notifier_call_chain()
>>     __blocking_notifier_call_chain()
>>       down_read(&nh->rwsem)
>> 
>> So when unregister succeeds, the notifier call should have
>> finished.
>
> You are only protected like this if the unregister call is made for every
> list_del_rcu(), which would only be the case if there is only ever one SCI GHES
> entry.
>
> Is this how NOTIFY_SCI is expected to be used?
>
> If so I agree it's safe (but confusing!) today, and we need to take account of
> this behaviour in Shiju Jose's patch.
>
>
> If there can be multiple SCI entries, you only get this unregister protection
> for the last one to be freed, as the list_empty() check skips the unregister for
> all but the last entry.

Yes.  You are right.  Feel free to add

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

In your patch.

Best Regards,
Huang, Ying

>>	case ACPI_HEST_NOTIFY_SCI:
>>		mutex_lock(&ghes_list_mutex);
>>		list_del_rcu(&ghes->list);
>>		if (list_empty(&ghes_sci))
>>			unregister_acpi_hed_notifier(&ghes_notifier_sci);
>>		mutex_unlock(&ghes_list_mutex);
>>		break;
>
>
>
> Thanks,
>
> James

Patch

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index b192b42a8351..79b3c9c5a3bc 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1073,6 +1073,7 @@  static int ghes_remove(struct platform_device *ghes_dev)
 		if (list_empty(&ghes_sci))
 			unregister_acpi_hed_notifier(&ghes_notifier_sci);
 		mutex_unlock(&ghes_list_mutex);
+		synchronize_rcu();
 		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_remove(ghes);
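
The point settled in the thread (without the fix, only the removal that empties ghes_sci waits for readers before the entry is freed) can be checked with a small counting model. This is a hedged sketch, not kernel code: struct removal_stats and model_sci_removal() are hypothetical names, and the model simply assumes that unregister_acpi_hed_notifier() and synchronize_rcu() each drain in-flight readers, as argued above.

```c
#include <assert.h>

/*
 * Hypothetical user-space model of the ACPI_HEST_NOTIFY_SCI branch of
 * ghes_remove(). It only counts events; nothing here is kernel API.
 */
struct removal_stats {
	int removed;          /* entries taken off the list */
	int freed_after_wait; /* entries freed only after readers were drained */
};

static struct removal_stats model_sci_removal(int entries, int with_fix)
{
	struct removal_stats stats = { 0, 0 };
	int on_list = entries;

	assert(entries >= 0);

	while (on_list > 0) {
		int readers_drained = 0;

		on_list--;                   /* models list_del_rcu(&ghes->list) */
		if (on_list == 0)            /* models list_empty(&ghes_sci) */
			readers_drained = 1; /* unregister takes the rwsem for writing */
		if (with_fix)
			readers_drained = 1; /* models the added synchronize_rcu() */

		stats.removed++;
		stats.freed_after_wait += readers_drained; /* kfree(ghes) follows */
	}
	return stats;
}
```

Calling model_sci_removal(3, 0) reports three removals of which one waited for readers, while model_sci_removal(3, 1) reports that all three did. With a single entry, both variants wait, which matches the "safe today if there is only ever one SCI GHES entry" observation.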