Message ID | 20240207174456.341121-1-Arunpravin.PaneerSelvam@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/buddy: Fix alloc_range() error handling code | expand |
Am 07.02.24 um 18:44 schrieb Arunpravin Paneer Selvam: > Few users have observed display corruption when they boot > the machine to KDE Plasma or playing games. We have root > caused the problem that whenever alloc_range() couldn't > find the required memory blocks the function was returning > SUCCESS in some of the corner cases. > > The right approach would be if the total allocated size > is less than the required size, the function should > return -ENOSPC. > > Gitlab ticket link - https://gitlab.freedesktop.org/drm/amd/-/issues/3097 > Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation") > Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> > Tested-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Christian König <christian.koenig@amd.com> CC: stable.. ? > --- > drivers/gpu/drm/drm_buddy.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c > index f57e6d74fb0e..c1a99bf4dffd 100644 > --- a/drivers/gpu/drm/drm_buddy.c > +++ b/drivers/gpu/drm/drm_buddy.c > @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm, > } while (1); > > list_splice_tail(&allocated, blocks); > + > + if (total_allocated < size) { > + err = -ENOSPC; > + goto err_free; > + } > + > return 0; > > err_undo:
Hi Christian, On 2/8/2024 12:27 PM, Christian König wrote: > Am 07.02.24 um 18:44 schrieb Arunpravin Paneer Selvam: >> Few users have observed display corruption when they boot >> the machine to KDE Plasma or playing games. We have root >> caused the problem that whenever alloc_range() couldn't >> find the required memory blocks the function was returning >> SUCCESS in some of the corner cases. >> >> The right approach would be if the total allocated size >> is less than the required size, the function should >> return -ENOSPC. >> >> Gitlab ticket link - >> https://gitlab.freedesktop.org/drm/amd/-/issues/3097 >> Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation") >> Signed-off-by: Arunpravin Paneer Selvam >> <Arunpravin.PaneerSelvam@amd.com> >> Tested-by: Mario Limonciello <mario.limonciello@amd.com> > > Acked-by: Christian König <christian.koenig@amd.com> > > CC: stable.. ? I will check and add the stable kernel version. Thanks, Arun. > >> --- >> drivers/gpu/drm/drm_buddy.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c >> index f57e6d74fb0e..c1a99bf4dffd 100644 >> --- a/drivers/gpu/drm/drm_buddy.c >> +++ b/drivers/gpu/drm/drm_buddy.c >> @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm, >> } while (1); >> list_splice_tail(&allocated, blocks); >> + >> + if (total_allocated < size) { >> + err = -ENOSPC; >> + goto err_free; >> + } >> + >> return 0; >> err_undo: >
On 07/02/2024 17:44, Arunpravin Paneer Selvam wrote: > Few users have observed display corruption when they boot > the machine to KDE Plasma or playing games. We have root > caused the problem that whenever alloc_range() couldn't > find the required memory blocks the function was returning > SUCCESS in some of the corner cases. Can you please give an example here? > > The right approach would be if the total allocated size > is less than the required size, the function should > return -ENOSPC. > > Gitlab ticket link - https://gitlab.freedesktop.org/drm/amd/-/issues/3097 > Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation") > Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> > Tested-by: Mario Limonciello <mario.limonciello@amd.com> > --- > drivers/gpu/drm/drm_buddy.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c > index f57e6d74fb0e..c1a99bf4dffd 100644 > --- a/drivers/gpu/drm/drm_buddy.c > +++ b/drivers/gpu/drm/drm_buddy.c > @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm, > } while (1); > > list_splice_tail(&allocated, blocks); > + > + if (total_allocated < size) { > + err = -ENOSPC; > + goto err_free; > + } > + > return 0; > > err_undo:
Hi Matthew, On 2/8/2024 7:00 PM, Matthew Auld wrote: > On 07/02/2024 17:44, Arunpravin Paneer Selvam wrote: >> Few users have observed display corruption when they boot >> the machine to KDE Plasma or playing games. We have root >> caused the problem that whenever alloc_range() couldn't >> find the required memory blocks the function was returning >> SUCCESS in some of the corner cases. > > Can you please give an example here? > In the try hard contiguous allocation, for example the requested memory is 1024 pages, it might go and pick the highest and last block (of size 512 pages) in the freelist where there are no more space exist in the total address range. In this kind of corner case, alloc_range was returning success though the allocated size is less than the requested size. Hence in try_hard_contiguous_allocation, we will not proceed to the LHS allocation and we return only with the RHS allocation having only the 512 pages of allocation. This leads to display corruption in many use cases (I think mainly when requested for contiguous huge buffer) mainly on APU platforms. Thanks, Arun. >> >> The right approach would be if the total allocated size >> is less than the required size, the function should >> return -ENOSPC. >> >> Gitlab ticket link - >> https://gitlab.freedesktop.org/drm/amd/-/issues/3097 >> Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation") >> Signed-off-by: Arunpravin Paneer Selvam >> <Arunpravin.PaneerSelvam@amd.com> >> Tested-by: Mario Limonciello <mario.limonciello@amd.com> >> --- >> drivers/gpu/drm/drm_buddy.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c >> index f57e6d74fb0e..c1a99bf4dffd 100644 >> --- a/drivers/gpu/drm/drm_buddy.c >> +++ b/drivers/gpu/drm/drm_buddy.c >> @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm, >> } while (1); >> list_splice_tail(&allocated, blocks); >> + >> + if (total_allocated < size) { >> + err = -ENOSPC; >> + goto err_free; >> + } >> + >> return 0; >> err_undo:
On 2/8/2024 06:06, Arunpravin Paneer Selvam wrote: > Hi Christian, > > On 2/8/2024 12:27 PM, Christian König wrote: >> Am 07.02.24 um 18:44 schrieb Arunpravin Paneer Selvam: >>> Few users have observed display corruption when they boot >>> the machine to KDE Plasma or playing games. We have root >>> caused the problem that whenever alloc_range() couldn't >>> find the required memory blocks the function was returning >>> SUCCESS in some of the corner cases. >>> >>> The right approach would be if the total allocated size >>> is less than the required size, the function should >>> return -ENOSPC. >>> >>> Gitlab ticket link - >>> https://gitlab.freedesktop.org/drm/amd/-/issues/3097 Syntax should be "Closes: $URL" >>> Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation") >>> Signed-off-by: Arunpravin Paneer Selvam >>> <Arunpravin.PaneerSelvam@amd.com> >>> Tested-by: Mario Limonciello <mario.limonciello@amd.com> >> >> Acked-by: Christian König <christian.koenig@amd.com> >> >> CC: stable.. ? > I will check and add the stable kernel version. Should be 6.7. > > Thanks, > Arun. >> >>> --- >>> drivers/gpu/drm/drm_buddy.c | 6 ++++++ >>> 1 file changed, 6 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c >>> index f57e6d74fb0e..c1a99bf4dffd 100644 >>> --- a/drivers/gpu/drm/drm_buddy.c >>> +++ b/drivers/gpu/drm/drm_buddy.c >>> @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm, >>> } while (1); >>> list_splice_tail(&allocated, blocks); >>> + >>> + if (total_allocated < size) { >>> + err = -ENOSPC; >>> + goto err_free; >>> + } >>> + >>> return 0; >>> err_undo: >> >
On 08/02/2024 13:47, Arunpravin Paneer Selvam wrote: > Hi Matthew, > > On 2/8/2024 7:00 PM, Matthew Auld wrote: >> On 07/02/2024 17:44, Arunpravin Paneer Selvam wrote: >>> Few users have observed display corruption when they boot >>> the machine to KDE Plasma or playing games. We have root >>> caused the problem that whenever alloc_range() couldn't >>> find the required memory blocks the function was returning >>> SUCCESS in some of the corner cases. >> >> Can you please give an example here? >> > In the try hard contiguous allocation, for example the requested memory > is 1024 pages, > it might go and pick the highest and last block (of size 512 pages) in > the freelist where > there are no more space exist in the total address range. In this kind > of corner case, > alloc_range was returning success though the allocated size is less than > the requested size. > Hence in try_hard_contiguous_allocation, we will not proceed to the LHS > allocation and > we return only with the RHS allocation having only the 512 pages of > allocation. This > leads to display corruption in many use cases (I think mainly when > requested for contiguous huge buffer) > mainly on APU platforms. Ok, I guess other thing is doing: lhs_offset = drm_buddy_block_offset(block) - lhs_size; I presume it's possible for block_offset < lhs_size here, which might be funny? > > Thanks, > Arun. >>> >>> The right approach would be if the total allocated size >>> is less than the required size, the function should >>> return -ENOSPC. >>> >>> Gitlab ticket link - >>> https://gitlab.freedesktop.org/drm/amd/-/issues/3097 >>> Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation") >>> Signed-off-by: Arunpravin Paneer Selvam >>> <Arunpravin.PaneerSelvam@amd.com> >>> Tested-by: Mario Limonciello <mario.limonciello@amd.com> >>> --- >>> drivers/gpu/drm/drm_buddy.c | 6 ++++++ >>> 1 file changed, 6 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c >>> index f57e6d74fb0e..c1a99bf4dffd 100644 >>> --- a/drivers/gpu/drm/drm_buddy.c >>> +++ b/drivers/gpu/drm/drm_buddy.c >>> @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm, >>> } while (1); >>> list_splice_tail(&allocated, blocks); >>> + >>> + if (total_allocated < size) { >>> + err = -ENOSPC; >>> + goto err_free; >>> + } >>> + >>> return 0; >>> err_undo: >
On 08/02/2024 14:17, Matthew Auld wrote: > On 08/02/2024 13:47, Arunpravin Paneer Selvam wrote: >> Hi Matthew, >> >> On 2/8/2024 7:00 PM, Matthew Auld wrote: >>> On 07/02/2024 17:44, Arunpravin Paneer Selvam wrote: >>>> Few users have observed display corruption when they boot >>>> the machine to KDE Plasma or playing games. We have root >>>> caused the problem that whenever alloc_range() couldn't >>>> find the required memory blocks the function was returning >>>> SUCCESS in some of the corner cases. >>> >>> Can you please give an example here? >>> >> In the try hard contiguous allocation, for example the requested >> memory is 1024 pages, >> it might go and pick the highest and last block (of size 512 pages) in >> the freelist where >> there are no more space exist in the total address range. In this kind >> of corner case, >> alloc_range was returning success though the allocated size is less >> than the requested size. >> Hence in try_hard_contiguous_allocation, we will not proceed to the >> LHS allocation and >> we return only with the RHS allocation having only the 512 pages of >> allocation. This >> leads to display corruption in many use cases (I think mainly when >> requested for contiguous huge buffer) >> mainly on APU platforms. > > Ok, I guess other thing is doing: > > lhs_offset = drm_buddy_block_offset(block) - lhs_size; > > I presume it's possible for block_offset < lhs_size here, which might be > funny? I think would also be good to add some basic unit test here: https://patchwork.freedesktop.org/patch/577497/?series=129671&rev=1 Test passes with your patch, and ofc fails without it. Just the question of the lhs_offset above, Reviewed-by: Matthew Auld <matthew.auld@intel.com> > >> >> Thanks, >> Arun. >>>> >>>> The right approach would be if the total allocated size >>>> is less than the required size, the function should >>>> return -ENOSPC. >>>> >>>> Gitlab ticket link - >>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3097 >>>> Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation") >>>> Signed-off-by: Arunpravin Paneer Selvam >>>> <Arunpravin.PaneerSelvam@amd.com> >>>> Tested-by: Mario Limonciello <mario.limonciello@amd.com> >>>> --- >>>> drivers/gpu/drm/drm_buddy.c | 6 ++++++ >>>> 1 file changed, 6 insertions(+) >>>> >>>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c >>>> index f57e6d74fb0e..c1a99bf4dffd 100644 >>>> --- a/drivers/gpu/drm/drm_buddy.c >>>> +++ b/drivers/gpu/drm/drm_buddy.c >>>> @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm, >>>> } while (1); >>>> list_splice_tail(&allocated, blocks); >>>> + >>>> + if (total_allocated < size) { >>>> + err = -ENOSPC; >>>> + goto err_free; >>>> + } >>>> + >>>> return 0; >>>> err_undo: >>
On 2/8/2024 7:47 PM, Matthew Auld wrote: > On 08/02/2024 13:47, Arunpravin Paneer Selvam wrote: >> Hi Matthew, >> >> On 2/8/2024 7:00 PM, Matthew Auld wrote: >>> On 07/02/2024 17:44, Arunpravin Paneer Selvam wrote: >>>> Few users have observed display corruption when they boot >>>> the machine to KDE Plasma or playing games. We have root >>>> caused the problem that whenever alloc_range() couldn't >>>> find the required memory blocks the function was returning >>>> SUCCESS in some of the corner cases. >>> >>> Can you please give an example here? >>> >> In the try hard contiguous allocation, for example the requested >> memory is 1024 pages, >> it might go and pick the highest and last block (of size 512 pages) >> in the freelist where >> there are no more space exist in the total address range. In this >> kind of corner case, >> alloc_range was returning success though the allocated size is less >> than the requested size. >> Hence in try_hard_contiguous_allocation, we will not proceed to the >> LHS allocation and >> we return only with the RHS allocation having only the 512 pages of >> allocation. This >> leads to display corruption in many use cases (I think mainly when >> requested for contiguous huge buffer) >> mainly on APU platforms. > > Ok, I guess other thing is doing: > > lhs_offset = drm_buddy_block_offset(block) - lhs_size; > > I presume it's possible for block_offset < lhs_size here, which might > be funny? yes, seems it is possible, I will modify the lhs_offset calculation and send the patch for review. Thanks, Arun. > >> >> Thanks, >> Arun. >>>> >>>> The right approach would be if the total allocated size >>>> is less than the required size, the function should >>>> return -ENOSPC. >>>> >>>> Gitlab ticket link - >>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3097 >>>> Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory >>>> allocation") >>>> Signed-off-by: Arunpravin Paneer Selvam >>>> <Arunpravin.PaneerSelvam@amd.com> >>>> Tested-by: Mario Limonciello <mario.limonciello@amd.com> >>>> --- >>>> drivers/gpu/drm/drm_buddy.c | 6 ++++++ >>>> 1 file changed, 6 insertions(+) >>>> >>>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c >>>> index f57e6d74fb0e..c1a99bf4dffd 100644 >>>> --- a/drivers/gpu/drm/drm_buddy.c >>>> +++ b/drivers/gpu/drm/drm_buddy.c >>>> @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm, >>>> } while (1); >>>> list_splice_tail(&allocated, blocks); >>>> + >>>> + if (total_allocated < size) { >>>> + err = -ENOSPC; >>>> + goto err_free; >>>> + } >>>> + >>>> return 0; >>>> err_undo: >>
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index f57e6d74fb0e..c1a99bf4dffd 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm, } while (1); list_splice_tail(&allocated, blocks); + + if (total_allocated < size) { + err = -ENOSPC; + goto err_free; + } + return 0; err_undo: