diff mbox series

drm/buddy: Fix alloc_range() error handling code

Message ID 20240207174456.341121-1-Arunpravin.PaneerSelvam@amd.com (mailing list archive)
State New, archived
Headers show
Series drm/buddy: Fix alloc_range() error handling code | expand

Commit Message

Paneer Selvam, Arunpravin Feb. 7, 2024, 5:44 p.m. UTC
Few users have observed display corruption when they boot
the machine to KDE Plasma or playing games. We have root
caused the problem that whenever alloc_range() couldn't
find the required memory blocks the function was returning
SUCCESS in some of the corner cases.

The right approach would be if the total allocated size
is less than the required size, the function should
return -ENOSPC.

Gitlab ticket link - https://gitlab.freedesktop.org/drm/amd/-/issues/3097
Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/gpu/drm/drm_buddy.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Christian König Feb. 8, 2024, 6:57 a.m. UTC | #1
Am 07.02.24 um 18:44 schrieb Arunpravin Paneer Selvam:
> Few users have observed display corruption when they boot
> the machine to KDE Plasma or playing games. We have root
> caused the problem that whenever alloc_range() couldn't
> find the required memory blocks the function was returning
> SUCCESS in some of the corner cases.
>
> The right approach would be if the total allocated size
> is less than the required size, the function should
> return -ENOSPC.
>
> Gitlab ticket link - https://gitlab.freedesktop.org/drm/amd/-/issues/3097
> Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation")
> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
> Tested-by: Mario Limonciello <mario.limonciello@amd.com>

Acked-by: Christian König <christian.koenig@amd.com>

CC: stable.. ?

> ---
>   drivers/gpu/drm/drm_buddy.c | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
> index f57e6d74fb0e..c1a99bf4dffd 100644
> --- a/drivers/gpu/drm/drm_buddy.c
> +++ b/drivers/gpu/drm/drm_buddy.c
> @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm,
>   	} while (1);
>   
>   	list_splice_tail(&allocated, blocks);
> +
> +	if (total_allocated < size) {
> +		err = -ENOSPC;
> +		goto err_free;
> +	}
> +
>   	return 0;
>   
>   err_undo:
Paneer Selvam, Arunpravin Feb. 8, 2024, 12:06 p.m. UTC | #2
Hi Christian,

On 2/8/2024 12:27 PM, Christian König wrote:
> Am 07.02.24 um 18:44 schrieb Arunpravin Paneer Selvam:
>> Few users have observed display corruption when they boot
>> the machine to KDE Plasma or playing games. We have root
>> caused the problem that whenever alloc_range() couldn't
>> find the required memory blocks the function was returning
>> SUCCESS in some of the corner cases.
>>
>> The right approach would be if the total allocated size
>> is less than the required size, the function should
>> return -ENOSPC.
>>
>> Gitlab ticket link - 
>> https://gitlab.freedesktop.org/drm/amd/-/issues/3097
>> Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation")
>> Signed-off-by: Arunpravin Paneer Selvam 
>> <Arunpravin.PaneerSelvam@amd.com>
>> Tested-by: Mario Limonciello <mario.limonciello@amd.com>
>
> Acked-by: Christian König <christian.koenig@amd.com>
>
> CC: stable.. ?
I will check and add the stable kernel version.

Thanks,
Arun.
>
>> ---
>>   drivers/gpu/drm/drm_buddy.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
>> index f57e6d74fb0e..c1a99bf4dffd 100644
>> --- a/drivers/gpu/drm/drm_buddy.c
>> +++ b/drivers/gpu/drm/drm_buddy.c
>> @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm,
>>       } while (1);
>>         list_splice_tail(&allocated, blocks);
>> +
>> +    if (total_allocated < size) {
>> +        err = -ENOSPC;
>> +        goto err_free;
>> +    }
>> +
>>       return 0;
>>     err_undo:
>
Matthew Auld Feb. 8, 2024, 1:30 p.m. UTC | #3
On 07/02/2024 17:44, Arunpravin Paneer Selvam wrote:
> Few users have observed display corruption when they boot
> the machine to KDE Plasma or playing games. We have root
> caused the problem that whenever alloc_range() couldn't
> find the required memory blocks the function was returning
> SUCCESS in some of the corner cases.

Can you please give an example here?

> 
> The right approach would be if the total allocated size
> is less than the required size, the function should
> return -ENOSPC.
> 
> Gitlab ticket link - https://gitlab.freedesktop.org/drm/amd/-/issues/3097
> Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation")
> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
> Tested-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
>   drivers/gpu/drm/drm_buddy.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
> index f57e6d74fb0e..c1a99bf4dffd 100644
> --- a/drivers/gpu/drm/drm_buddy.c
> +++ b/drivers/gpu/drm/drm_buddy.c
> @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm,
>   	} while (1);
>   
>   	list_splice_tail(&allocated, blocks);
> +
> +	if (total_allocated < size) {
> +		err = -ENOSPC;
> +		goto err_free;
> +	}
> +
>   	return 0;
>   
>   err_undo:
Paneer Selvam, Arunpravin Feb. 8, 2024, 1:47 p.m. UTC | #4
Hi Matthew,

On 2/8/2024 7:00 PM, Matthew Auld wrote:
> On 07/02/2024 17:44, Arunpravin Paneer Selvam wrote:
>> Few users have observed display corruption when they boot
>> the machine to KDE Plasma or playing games. We have root
>> caused the problem that whenever alloc_range() couldn't
>> find the required memory blocks the function was returning
>> SUCCESS in some of the corner cases.
>
> Can you please give an example here?
>
In the try hard contiguous allocation, for example the requested memory 
is 1024 pages,
it might go and pick the highest and last block (of size 512 pages) in 
the freelist where
there are no more space exist in the total address range. In this kind 
of corner case,
alloc_range was returning success though the allocated size is less than 
the requested size.
Hence in try_hard_contiguous_allocation, we will not proceed to the LHS 
allocation and
we return only with the RHS allocation having only the 512 pages of 
allocation. This
leads to display corruption in many use cases (I think mainly when 
requested for contiguous huge buffer)
mainly on APU platforms.

Thanks,
Arun.
>>
>> The right approach would be if the total allocated size
>> is less than the required size, the function should
>> return -ENOSPC.
>>
>> Gitlab ticket link - 
>> https://gitlab.freedesktop.org/drm/amd/-/issues/3097
>> Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation")
>> Signed-off-by: Arunpravin Paneer Selvam 
>> <Arunpravin.PaneerSelvam@amd.com>
>> Tested-by: Mario Limonciello <mario.limonciello@amd.com>
>> ---
>>   drivers/gpu/drm/drm_buddy.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
>> index f57e6d74fb0e..c1a99bf4dffd 100644
>> --- a/drivers/gpu/drm/drm_buddy.c
>> +++ b/drivers/gpu/drm/drm_buddy.c
>> @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm,
>>       } while (1);
>>         list_splice_tail(&allocated, blocks);
>> +
>> +    if (total_allocated < size) {
>> +        err = -ENOSPC;
>> +        goto err_free;
>> +    }
>> +
>>       return 0;
>>     err_undo:
Mario Limonciello Feb. 8, 2024, 1:59 p.m. UTC | #5
On 2/8/2024 06:06, Arunpravin Paneer Selvam wrote:
> Hi Christian,
> 
> On 2/8/2024 12:27 PM, Christian König wrote:
>> Am 07.02.24 um 18:44 schrieb Arunpravin Paneer Selvam:
>>> Few users have observed display corruption when they boot
>>> the machine to KDE Plasma or playing games. We have root
>>> caused the problem that whenever alloc_range() couldn't
>>> find the required memory blocks the function was returning
>>> SUCCESS in some of the corner cases.
>>>
>>> The right approach would be if the total allocated size
>>> is less than the required size, the function should
>>> return -ENOSPC.
>>>
>>> Gitlab ticket link - 
>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3097

Syntax should be "Closes: $URL"

>>> Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation")
>>> Signed-off-by: Arunpravin Paneer Selvam 
>>> <Arunpravin.PaneerSelvam@amd.com>
>>> Tested-by: Mario Limonciello <mario.limonciello@amd.com>
>>
>> Acked-by: Christian König <christian.koenig@amd.com>
>>
>> CC: stable.. ?
> I will check and add the stable kernel version.

Should be 6.7.

> 
> Thanks,
> Arun.
>>
>>> ---
>>>   drivers/gpu/drm/drm_buddy.c | 6 ++++++
>>>   1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
>>> index f57e6d74fb0e..c1a99bf4dffd 100644
>>> --- a/drivers/gpu/drm/drm_buddy.c
>>> +++ b/drivers/gpu/drm/drm_buddy.c
>>> @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm,
>>>       } while (1);
>>>         list_splice_tail(&allocated, blocks);
>>> +
>>> +    if (total_allocated < size) {
>>> +        err = -ENOSPC;
>>> +        goto err_free;
>>> +    }
>>> +
>>>       return 0;
>>>     err_undo:
>>
>
Matthew Auld Feb. 8, 2024, 2:17 p.m. UTC | #6
On 08/02/2024 13:47, Arunpravin Paneer Selvam wrote:
> Hi Matthew,
> 
> On 2/8/2024 7:00 PM, Matthew Auld wrote:
>> On 07/02/2024 17:44, Arunpravin Paneer Selvam wrote:
>>> Few users have observed display corruption when they boot
>>> the machine to KDE Plasma or playing games. We have root
>>> caused the problem that whenever alloc_range() couldn't
>>> find the required memory blocks the function was returning
>>> SUCCESS in some of the corner cases.
>>
>> Can you please give an example here?
>>
> In the try hard contiguous allocation, for example the requested memory 
> is 1024 pages,
> it might go and pick the highest and last block (of size 512 pages) in 
> the freelist where
> there are no more space exist in the total address range. In this kind 
> of corner case,
> alloc_range was returning success though the allocated size is less than 
> the requested size.
> Hence in try_hard_contiguous_allocation, we will not proceed to the LHS 
> allocation and
> we return only with the RHS allocation having only the 512 pages of 
> allocation. This
> leads to display corruption in many use cases (I think mainly when 
> requested for contiguous huge buffer)
> mainly on APU platforms.

Ok, I guess other thing is doing:

lhs_offset = drm_buddy_block_offset(block) - lhs_size;

I presume it's possible for block_offset < lhs_size here, which might be 
funny?

> 
> Thanks,
> Arun.
>>>
>>> The right approach would be if the total allocated size
>>> is less than the required size, the function should
>>> return -ENOSPC.
>>>
>>> Gitlab ticket link - 
>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3097
>>> Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation")
>>> Signed-off-by: Arunpravin Paneer Selvam 
>>> <Arunpravin.PaneerSelvam@amd.com>
>>> Tested-by: Mario Limonciello <mario.limonciello@amd.com>
>>> ---
>>>   drivers/gpu/drm/drm_buddy.c | 6 ++++++
>>>   1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
>>> index f57e6d74fb0e..c1a99bf4dffd 100644
>>> --- a/drivers/gpu/drm/drm_buddy.c
>>> +++ b/drivers/gpu/drm/drm_buddy.c
>>> @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm,
>>>       } while (1);
>>>         list_splice_tail(&allocated, blocks);
>>> +
>>> +    if (total_allocated < size) {
>>> +        err = -ENOSPC;
>>> +        goto err_free;
>>> +    }
>>> +
>>>       return 0;
>>>     err_undo:
>
Matthew Auld Feb. 8, 2024, 2:44 p.m. UTC | #7
On 08/02/2024 14:17, Matthew Auld wrote:
> On 08/02/2024 13:47, Arunpravin Paneer Selvam wrote:
>> Hi Matthew,
>>
>> On 2/8/2024 7:00 PM, Matthew Auld wrote:
>>> On 07/02/2024 17:44, Arunpravin Paneer Selvam wrote:
>>>> Few users have observed display corruption when they boot
>>>> the machine to KDE Plasma or playing games. We have root
>>>> caused the problem that whenever alloc_range() couldn't
>>>> find the required memory blocks the function was returning
>>>> SUCCESS in some of the corner cases.
>>>
>>> Can you please give an example here?
>>>
>> In the try hard contiguous allocation, for example the requested 
>> memory is 1024 pages,
>> it might go and pick the highest and last block (of size 512 pages) in 
>> the freelist where
>> there are no more space exist in the total address range. In this kind 
>> of corner case,
>> alloc_range was returning success though the allocated size is less 
>> than the requested size.
>> Hence in try_hard_contiguous_allocation, we will not proceed to the 
>> LHS allocation and
>> we return only with the RHS allocation having only the 512 pages of 
>> allocation. This
>> leads to display corruption in many use cases (I think mainly when 
>> requested for contiguous huge buffer)
>> mainly on APU platforms.
> 
> Ok, I guess other thing is doing:
> 
> lhs_offset = drm_buddy_block_offset(block) - lhs_size;
> 
> I presume it's possible for block_offset < lhs_size here, which might be 
> funny?

I think would also be good to add some basic unit test here:
https://patchwork.freedesktop.org/patch/577497/?series=129671&rev=1

Test passes with your patch, and ofc fails without it.

Just the question of the lhs_offset above,
Reviewed-by: Matthew Auld <matthew.auld@intel.com>

> 
>>
>> Thanks,
>> Arun.
>>>>
>>>> The right approach would be if the total allocated size
>>>> is less than the required size, the function should
>>>> return -ENOSPC.
>>>>
>>>> Gitlab ticket link - 
>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3097
>>>> Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation")
>>>> Signed-off-by: Arunpravin Paneer Selvam 
>>>> <Arunpravin.PaneerSelvam@amd.com>
>>>> Tested-by: Mario Limonciello <mario.limonciello@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/drm_buddy.c | 6 ++++++
>>>>   1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
>>>> index f57e6d74fb0e..c1a99bf4dffd 100644
>>>> --- a/drivers/gpu/drm/drm_buddy.c
>>>> +++ b/drivers/gpu/drm/drm_buddy.c
>>>> @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm,
>>>>       } while (1);
>>>>         list_splice_tail(&allocated, blocks);
>>>> +
>>>> +    if (total_allocated < size) {
>>>> +        err = -ENOSPC;
>>>> +        goto err_free;
>>>> +    }
>>>> +
>>>>       return 0;
>>>>     err_undo:
>>
Paneer Selvam, Arunpravin Feb. 9, 2024, 12:58 p.m. UTC | #8
On 2/8/2024 7:47 PM, Matthew Auld wrote:
> On 08/02/2024 13:47, Arunpravin Paneer Selvam wrote:
>> Hi Matthew,
>>
>> On 2/8/2024 7:00 PM, Matthew Auld wrote:
>>> On 07/02/2024 17:44, Arunpravin Paneer Selvam wrote:
>>>> Few users have observed display corruption when they boot
>>>> the machine to KDE Plasma or playing games. We have root
>>>> caused the problem that whenever alloc_range() couldn't
>>>> find the required memory blocks the function was returning
>>>> SUCCESS in some of the corner cases.
>>>
>>> Can you please give an example here?
>>>
>> In the try hard contiguous allocation, for example the requested 
>> memory is 1024 pages,
>> it might go and pick the highest and last block (of size 512 pages) 
>> in the freelist where
>> there are no more space exist in the total address range. In this 
>> kind of corner case,
>> alloc_range was returning success though the allocated size is less 
>> than the requested size.
>> Hence in try_hard_contiguous_allocation, we will not proceed to the 
>> LHS allocation and
>> we return only with the RHS allocation having only the 512 pages of 
>> allocation. This
>> leads to display corruption in many use cases (I think mainly when 
>> requested for contiguous huge buffer)
>> mainly on APU platforms.
>
> Ok, I guess other thing is doing:
>
> lhs_offset = drm_buddy_block_offset(block) - lhs_size;
>
> I presume it's possible for block_offset < lhs_size here, which might 
> be funny?
yes, seems it is possible, I will modify the lhs_offset calculation and 
send the patch for review.

Thanks,
Arun.
>
>>
>> Thanks,
>> Arun.
>>>>
>>>> The right approach would be if the total allocated size
>>>> is less than the required size, the function should
>>>> return -ENOSPC.
>>>>
>>>> Gitlab ticket link - 
>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3097
>>>> Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory 
>>>> allocation")
>>>> Signed-off-by: Arunpravin Paneer Selvam 
>>>> <Arunpravin.PaneerSelvam@amd.com>
>>>> Tested-by: Mario Limonciello <mario.limonciello@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/drm_buddy.c | 6 ++++++
>>>>   1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
>>>> index f57e6d74fb0e..c1a99bf4dffd 100644
>>>> --- a/drivers/gpu/drm/drm_buddy.c
>>>> +++ b/drivers/gpu/drm/drm_buddy.c
>>>> @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm,
>>>>       } while (1);
>>>>         list_splice_tail(&allocated, blocks);
>>>> +
>>>> +    if (total_allocated < size) {
>>>> +        err = -ENOSPC;
>>>> +        goto err_free;
>>>> +    }
>>>> +
>>>>       return 0;
>>>>     err_undo:
>>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index f57e6d74fb0e..c1a99bf4dffd 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -539,6 +539,12 @@  static int __alloc_range(struct drm_buddy *mm,
 	} while (1);
 
 	list_splice_tail(&allocated, blocks);
+
+	if (total_allocated < size) {
+		err = -ENOSPC;
+		goto err_free;
+	}
+
 	return 0;
 
 err_undo: