
[v6,2/3] Documentation: add an isolation strategy sysfs node for uacce

Message ID 20220730083246.55646-3-yekai13@huawei.com (mailing list archive)
State Superseded
Delegated to: Herbert Xu
Series crypto: hisilicon - supports device isolation feature

Commit Message

yekai (A) July 30, 2022, 8:32 a.m. UTC
Update the documentation to describe the sysfs node that lets users
in user space configure the isolation strategy, and the sysfs node
that reports the device's isolated state.

Signed-off-by: Kai Ye <yekai13@huawei.com>
---
 Documentation/ABI/testing/sysfs-driver-uacce | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Greg KH July 30, 2022, 11:06 a.m. UTC | #1
On Sat, Jul 30, 2022 at 04:32:45PM +0800, Kai Ye wrote:
> Update documentation describing sysfs node that could help to
> configure isolation strategy for users in the user space. And
> describing sysfs node that could read the device isolated state.
> 
> Signed-off-by: Kai Ye <yekai13@huawei.com>
> ---
>  Documentation/ABI/testing/sysfs-driver-uacce | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
> index 08f2591138af..1601f9dac29c 100644
> --- a/Documentation/ABI/testing/sysfs-driver-uacce
> +++ b/Documentation/ABI/testing/sysfs-driver-uacce
> @@ -19,6 +19,23 @@ Contact:        linux-accelerators@lists.ozlabs.org
>  Description:    Available instances left of the device
>                  Return -ENODEV if uacce_ops get_available_instances is not provided
>  
> +What:           /sys/class/uacce/<dev_name>/isolate_strategy
> +Date:           Jul 2022
> +KernelVersion:  5.20
> +Contact:        linux-accelerators@lists.ozlabs.org
> +Description:    (RW) Configure the frequency size for the hardware error
> +                isolation strategy. This size is a configured integer value.
> +                The default is 0. The maximum value is 65535. This value is a
> +                threshold based on your driver handling strategy.

What is a "driver handling strategy"?  What units exactly is this in?
Is there any documentation for how to use this?

thanks,

greg k-h
yekai (A) Aug. 1, 2022, 2:20 a.m. UTC | #2
On 2022/7/30 19:06, Greg KH wrote:
> On Sat, Jul 30, 2022 at 04:32:45PM +0800, Kai Ye wrote:
>> Update documentation describing sysfs node that could help to
>> configure isolation strategy for users in the user space. And
>> describing sysfs node that could read the device isolated state.
>>
>> Signed-off-by: Kai Ye <yekai13@huawei.com>
>> ---
>>  Documentation/ABI/testing/sysfs-driver-uacce | 17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
>> index 08f2591138af..1601f9dac29c 100644
>> --- a/Documentation/ABI/testing/sysfs-driver-uacce
>> +++ b/Documentation/ABI/testing/sysfs-driver-uacce
>> @@ -19,6 +19,23 @@ Contact:        linux-accelerators@lists.ozlabs.org
>>  Description:    Available instances left of the device
>>                  Return -ENODEV if uacce_ops get_available_instances is not provided
>>  
>> +What:           /sys/class/uacce/<dev_name>/isolate_strategy
>> +Date:           Jul 2022
>> +KernelVersion:  5.20
>> +Contact:        linux-accelerators@lists.ozlabs.org
>> +Description:    (RW) Configure the frequency size for the hardware error
>> +                isolation strategy. This size is a configured integer value.
>> +                The default is 0. The maximum value is 65535. This value is a
>> +                threshold based on your driver handling strategy.
> what is a "driver handling strategy"?  What exactly is this units in?
> Any documentation for how to use this?
>
> thanks,
>
> greg k-h
> .
The unit is a number of occurrences, which is what "frequency size" means here.

For example, in the HiSilicon ACC engine, we first time-stamp every slot AER
error. Then, whenever a device AER error occurs, we check the AER error log.
If the number of slot AER errors for the device within one hour exceeds the
preset count, the isolated state is set to true and the device is isolated.
AER error log entries older than one hour are cleared. Of course, different
drivers can define different strategies.
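
The scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the HiSilicon driver code: the names err_log, err_log_expire and err_log_record are hypothetical, time stamps are assumed to be monotonic seconds, and the log has an assumed fixed capacity of 64 entries, so thresholds of 64 or more can never be exceeded in this sketch. The thread does not say what a threshold of 0 means, so the sketch simply applies "count exceeds threshold" literally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical and exist only for this sketch. */
#define ERR_WINDOW_SECS 3600u /* one hour, as in the description above */
#define ERR_LOG_CAP     64u   /* assumed fixed log capacity */

struct err_log {
	uint64_t stamp[ERR_LOG_CAP]; /* time stamps (seconds) of logged errors */
	size_t count;                /* number of valid entries in stamp[] */
	uint32_t threshold;          /* value written to isolate_strategy */
	bool isolated;               /* value reported by isolate */
};

/* Clear log entries that are more than one hour older than 'now'. */
static void err_log_expire(struct err_log *log, uint64_t now)
{
	size_t kept = 0;

	for (size_t i = 0; i < log->count; i++) {
		if (now - log->stamp[i] <= ERR_WINDOW_SECS)
			log->stamp[kept++] = log->stamp[i];
	}
	log->count = kept;
}

/*
 * Time-stamp one error, then check the log: if more than 'threshold'
 * errors fall within the last hour, mark the device as isolated.
 * Returns the resulting isolated state, which stays set once reached.
 */
static bool err_log_record(struct err_log *log, uint64_t now)
{
	assert(log != NULL);

	err_log_expire(log, now);

	/* If the log is full, drop the oldest entry to make room. */
	if (log->count == ERR_LOG_CAP) {
		memmove(&log->stamp[0], &log->stamp[1],
			(ERR_LOG_CAP - 1) * sizeof(log->stamp[0]));
		log->count--;
	}

	log->stamp[log->count++] = now;
	if (log->count > log->threshold)
		log->isolated = true;

	return log->isolated;
}
```

A caller would initialise a struct err_log with the configured threshold and call err_log_record() from its error handler each time an error is reported. Once the isolated flag is set, this sketch never clears it.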

thanks
Kai
Greg KH Aug. 1, 2022, 6:13 a.m. UTC | #3
On Mon, Aug 01, 2022 at 10:20:27AM +0800, yekai (A) wrote:
> 
> 
> On 2022/7/30 19:06, Greg KH wrote:
> > On Sat, Jul 30, 2022 at 04:32:45PM +0800, Kai Ye wrote:
> >> Update documentation describing sysfs node that could help to
> >> configure isolation strategy for users in the user space. And
> >> describing sysfs node that could read the device isolated state.
> >>
> >> Signed-off-by: Kai Ye <yekai13@huawei.com>
> >> ---
> >>  Documentation/ABI/testing/sysfs-driver-uacce | 17 +++++++++++++++++
> >>  1 file changed, 17 insertions(+)
> >>
> >> diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
> >> index 08f2591138af..1601f9dac29c 100644
> >> --- a/Documentation/ABI/testing/sysfs-driver-uacce
> >> +++ b/Documentation/ABI/testing/sysfs-driver-uacce
> >> @@ -19,6 +19,23 @@ Contact:        linux-accelerators@lists.ozlabs.org
> >>  Description:    Available instances left of the device
> >>                  Return -ENODEV if uacce_ops get_available_instances is not provided
> >>  
> >> +What:           /sys/class/uacce/<dev_name>/isolate_strategy
> >> +Date:           Jul 2022
> >> +KernelVersion:  5.20
> >> +Contact:        linux-accelerators@lists.ozlabs.org
> >> +Description:    (RW) Configure the frequency size for the hardware error
> >> +                isolation strategy. This size is a configured integer value.
> >> +                The default is 0. The maximum value is 65535. This value is a
> >> +                threshold based on your driver handling strategy.
> > what is a "driver handling strategy"?  What exactly is this units in?
> > Any documentation for how to use this?
> >
> > thanks,
> >
> > greg k-h
> > .
> The unit is the number of times, also means frequency size.
> e.g.
> In the  hisilicon acc engine, First we will time-stamp every slot AER error. Then check the AER error log when the device
> AER error occurred. if the device slot AER error count  exceeds the preset the number of times in one hour, the isolated state
> will be set to true. So the device will be isolated.  And the AER error log that exceed one hour  will be cleared.  Of course,
> different strategy can be defined in different drivers.

Ok, can you please explain this better here when you redo the patch
series?

thanks,

greg k-h
yekai (A) Aug. 1, 2022, 9:25 a.m. UTC | #4
On 2022/8/1 14:13, Greg KH wrote:
> On Mon, Aug 01, 2022 at 10:20:27AM +0800, yekai (A) wrote:
>>
>> On 2022/7/30 19:06, Greg KH wrote:
>>> On Sat, Jul 30, 2022 at 04:32:45PM +0800, Kai Ye wrote:
>>>> Update documentation describing sysfs node that could help to
>>>> configure isolation strategy for users in the user space. And
>>>> describing sysfs node that could read the device isolated state.
>>>>
>>>> Signed-off-by: Kai Ye <yekai13@huawei.com>
>>>> ---
>>>>  Documentation/ABI/testing/sysfs-driver-uacce | 17 +++++++++++++++++
>>>>  1 file changed, 17 insertions(+)
>>>>
>>>> diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
>>>> index 08f2591138af..1601f9dac29c 100644
>>>> --- a/Documentation/ABI/testing/sysfs-driver-uacce
>>>> +++ b/Documentation/ABI/testing/sysfs-driver-uacce
>>>> @@ -19,6 +19,23 @@ Contact:        linux-accelerators@lists.ozlabs.org
>>>>  Description:    Available instances left of the device
>>>>                  Return -ENODEV if uacce_ops get_available_instances is not provided
>>>>  
>>>> +What:           /sys/class/uacce/<dev_name>/isolate_strategy
>>>> +Date:           Jul 2022
>>>> +KernelVersion:  5.20
>>>> +Contact:        linux-accelerators@lists.ozlabs.org
>>>> +Description:    (RW) Configure the frequency size for the hardware error
>>>> +                isolation strategy. This size is a configured integer value.
>>>> +                The default is 0. The maximum value is 65535. This value is a
>>>> +                threshold based on your driver handling strategy.
>>> what is a "driver handling strategy"?  What exactly is this units in?
>>> Any documentation for how to use this?
>>>
>>> thanks,
>>>
>>> greg k-h
>>> .
>> The unit is the number of times, also means frequency size.
>> e.g.
>> In the  hisilicon acc engine, First we will time-stamp every slot AER error. Then check the AER error log when the device
>> AER error occurred. if the device slot AER error count  exceeds the preset the number of times in one hour, the isolated state
>> will be set to true. So the device will be isolated.  And the AER error log that exceed one hour  will be cleared.  Of course,
>> different strategy can be defined in different drivers.
> Ok, can you please explain this better here when you redo the patch
> series?
>
> thanks,
>
> greg k-h
> .
>
OK, I will do this in the next v7 patch series.

thanks

Kai

Patch

diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
index 08f2591138af..1601f9dac29c 100644
--- a/Documentation/ABI/testing/sysfs-driver-uacce
+++ b/Documentation/ABI/testing/sysfs-driver-uacce
@@ -19,6 +19,23 @@  Contact:        linux-accelerators@lists.ozlabs.org
 Description:    Available instances left of the device
                 Return -ENODEV if uacce_ops get_available_instances is not provided
 
+What:           /sys/class/uacce/<dev_name>/isolate_strategy
+Date:           Jul 2022
+KernelVersion:  5.20
+Contact:        linux-accelerators@lists.ozlabs.org
+Description:    (RW) Configure the frequency size for the hardware error
+                isolation strategy. This size is a configured integer value.
+                The default is 0. The maximum value is 65535. This value is a
+                threshold based on your driver handling strategy.
+
+What:           /sys/class/uacce/<dev_name>/isolate
+Date:           Jul 2022
+KernelVersion:  5.20
+Contact:        linux-accelerators@lists.ozlabs.org
+Description:    (R) Read the device's isolated state. The value 1 means the
+                device is unavailable. The value 0 means the device is
+                available.
+
 What:           /sys/class/uacce/<dev_name>/algorithms
 Date:           Feb 2020
 KernelVersion:  5.7