diff mbox

[libdrm,0/4] Dynamicly disable suites and tets.

Message ID faf30a58-a8cf-05b2-6028-065438b38f07@amd.com (mailing list archive)
State New, archived
Headers show

Commit Message

Andrey Grodzovsky Nov. 13, 2017, 2:57 p.m. UTC
On 11/13/2017 07:39 AM, Christian König wrote:

> Am 13.11.2017 um 12:32 schrieb Michel Dänzer:
>> On 12/11/17 10:35 AM, Christian König wrote:
>>> A few comments on the code:
>>>
>>>> +/* Validate bo size is bit bigger then the request domain */
>>>> +static inline bool amdgpu_bo_validate_bo_size(struct amdgpu_device
>>>> *adev,
>>>> +                      unsigned long size, u32 domain)
>>> Drop the inline keyword and the second _bo_ in the name here.
>>>
>>>> +{
>>>> +    struct ttm_mem_type_manager *man = NULL;
>>>> +
>>>> +    if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
>>>> +        man = &adev->mman.bdev.man[TTM_PL_VRAM];
>>>> +
>>>> +        if (man && size < (man->size << PAGE_SHIFT))
>>> Drop the extra check that man is not NULL. We get the pointer to an
>>> array element, that can't be NULL.
>>>
>>>> +            return true;
>>> Mhm, domain is a bitmask of allowed domains.
>>>
>>> So we should check all valid domains if the size fit, not just the 
>>> first
>>> one.
>> Assuming VRAM <-> system migration of BOs larger than the GTT domain
>> works, I'd say we should only require that the BO can fit in any of the
>> allowed domains. Otherwise it must also always fit in GTT.
> Good point, and yes VRAM <-> system migration of BOs larger than the 
> GTT domain works now.
>
> I can agree on that VRAM should probably be optional, otherwise we 
> can't allocate anything large when the driver uses only very low 
> amounts of stolen VRAM on APUs.
>
> But I think when userspace requests VRAM and GTT at the same time we 
> still should be able to fall back to GTT.

Attached V2 patch, I still don't understand why I experience the SIGSEV 
in the tester when the check fails and the IOCTLs will return ENOMEM

I will update the libdrm test to correctly handle mem failure, it 
segfaults at the moment.


Thanks,
Andey
>
> Regards,
> Christian.

Comments

Christian König Nov. 13, 2017, 3:27 p.m. UTC | #1
Am 13.11.2017 um 15:57 schrieb Andrey Grodzovsky:
> On 11/13/2017 07:39 AM, Christian König wrote:
>
>> Am 13.11.2017 um 12:32 schrieb Michel Dänzer:
>>> On 12/11/17 10:35 AM, Christian König wrote:
>>>> A few comments on the code:
>>>>
>>>>> +/* Validate bo size is bit bigger then the request domain */
>>>>> +static inline bool amdgpu_bo_validate_bo_size(struct amdgpu_device
>>>>> *adev,
>>>>> +                      unsigned long size, u32 domain)
>>>> Drop the inline keyword and the second _bo_ in the name here.
>>>>
>>>>> +{
>>>>> +    struct ttm_mem_type_manager *man = NULL;
>>>>> +
>>>>> +    if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
>>>>> +        man = &adev->mman.bdev.man[TTM_PL_VRAM];
>>>>> +
>>>>> +        if (man && size < (man->size << PAGE_SHIFT))
>>>> Drop the extra check that man is not NULL. We get the pointer to an
>>>> array element, that can't be NULL.
>>>>
>>>>> +            return true;
>>>> Mhm, domain is a bitmask of allowed domains.
>>>>
>>>> So we should check all valid domains if the size fit, not just the 
>>>> first
>>>> one.
>>> Assuming VRAM <-> system migration of BOs larger than the GTT domain
>>> works, I'd say we should only require that the BO can fit in any of the
>>> allowed domains. Otherwise it must also always fit in GTT.
>> Good point, and yes VRAM <-> system migration of BOs larger than the 
>> GTT domain works now.
>>
>> I can agree on that VRAM should probably be optional, otherwise we 
>> can't allocate anything large when the driver uses only very low 
>> amounts of stolen VRAM on APUs.
>>
>> But I think when userspace requests VRAM and GTT at the same time we 
>> still should be able to fall back to GTT.
>
> Attached V2 patch, I still don't understand why I experience the 
> SIGSEV in the tester when the check fails and the IOCTLs will return 
> ENOMEM
>

Reviewed-by: Christian König <christian.koenig@amd.com> for this one, 
but please use git send-email to send out patches.

> I will update the libdrm test to correctly handle mem failure, it 
> segfaults at the moment.

Sounds like it just tries to use the BO for VM or CPU mapping while the 
underlying function has failed (or we have another bug somewhere).

Please commit the kernel patch and leave me a note so that I can push 
the libdrm patches. BTW: Do you have the link where you request an 
account at hand? I want to ping the admins once more.

Regards,
Christian.

>
>
> Thanks,
> Andey
>>
>> Regards,
>> Christian.
>
Andrey Grodzovsky Nov. 13, 2017, 3:36 p.m. UTC | #2
On 11/13/2017 10:27 AM, Christian König wrote:
> Am 13.11.2017 um 15:57 schrieb Andrey Grodzovsky:
>> On 11/13/2017 07:39 AM, Christian König wrote:
>>
>>> Am 13.11.2017 um 12:32 schrieb Michel Dänzer:
>>>> On 12/11/17 10:35 AM, Christian König wrote:
>>>>> A few comments on the code:
>>>>>
>>>>>> +/* Validate bo size is bit bigger then the request domain */
>>>>>> +static inline bool amdgpu_bo_validate_bo_size(struct amdgpu_device
>>>>>> *adev,
>>>>>> +                      unsigned long size, u32 domain)
>>>>> Drop the inline keyword and the second _bo_ in the name here.
>>>>>
>>>>>> +{
>>>>>> +    struct ttm_mem_type_manager *man = NULL;
>>>>>> +
>>>>>> +    if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
>>>>>> +        man = &adev->mman.bdev.man[TTM_PL_VRAM];
>>>>>> +
>>>>>> +        if (man && size < (man->size << PAGE_SHIFT))
>>>>> Drop the extra check that man is not NULL. We get the pointer to an
>>>>> array element, that can't be NULL.
>>>>>
>>>>>> +            return true;
>>>>> Mhm, domain is a bitmask of allowed domains.
>>>>>
>>>>> So we should check all valid domains if the size fit, not just the 
>>>>> first
>>>>> one.
>>>> Assuming VRAM <-> system migration of BOs larger than the GTT domain
>>>> works, I'd say we should only require that the BO can fit in any of 
>>>> the
>>>> allowed domains. Otherwise it must also always fit in GTT.
>>> Good point, and yes VRAM <-> system migration of BOs larger than the 
>>> GTT domain works now.
>>>
>>> I can agree on that VRAM should probably be optional, otherwise we 
>>> can't allocate anything large when the driver uses only very low 
>>> amounts of stolen VRAM on APUs.
>>>
>>> But I think when userspace requests VRAM and GTT at the same time we 
>>> still should be able to fall back to GTT.
>>
>> Attached V2 patch, I still don't understand why I experience the 
>> SIGSEV in the tester when the check fails and the IOCTLs will return 
>> ENOMEM
>>
>
> Reviewed-by: Christian König <christian.koenig@amd.com> for this one, 
> but please use git send-email to send out patches.
>
>> I will update the libdrm test to correctly handle mem failure, it 
>> segfaults at the moment.
>
> Sounds like it just tries to use the BO for VM or CPU mapping while 
> the underlying function has failed (or we have another bug somewhere).

Yes, the segfault is because I am using gpu_mem_alloc which continues 
executing after amdgpu_bo_alloc failed, the segfault is in amdgpu_bo_va_op.

>
> Please commit the kernel patch and leave me a note so that I can push 
> the libdrm patches. 

Areyou gonna push patches 1-3 from the original series and then I need 
to resend patch 4 to fix the segfault ?


> BTW: Do you have the link where you request an account at hand? I want 
> to ping the admins once more.

https://bugs.freedesktop.org/show_bug.cgi?id=103566

Thanks,
Andrey

>
> Regards,
> Christian.
>
>>
>>
>> Thanks,
>> Andey
>>>
>>> Regards,
>>> Christian.
>>
>
diff mbox

Patch

From 37369b3a58027dedf7e9dc65fd9bc0f9e86d0d80 Mon Sep 17 00:00:00 2001
From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Date: Fri, 10 Nov 2017 18:35:56 -0500
Subject: drm/amdgpu: Implement BO size validation V2

Validates BO size against each requested domain's total memory.

v2:
Make GTT size check a MUST to allow fall back to GTT.
Rmove redundant NULL check.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 41 ++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index a937c49..5acf20c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -281,6 +281,44 @@  void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
 		*cpu_addr = NULL;
 }
 
+/* Validate bo size is bit bigger then the request domain */
+static bool amdgpu_bo_validate_size(struct amdgpu_device *adev,
+					  unsigned long size, u32 domain)
+{
+	struct ttm_mem_type_manager *man = NULL;
+
+	/*
+	 * If GTT is part of requested domains the check must succeed to
+	 * allow fall back to GTT
+	 */
+	if (domain & AMDGPU_GEM_DOMAIN_GTT) {
+		man = &adev->mman.bdev.man[TTM_PL_TT];
+
+		if (size < (man->size << PAGE_SHIFT))
+			return true;
+		else
+			goto fail;
+	}
+
+	if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
+		man = &adev->mman.bdev.man[TTM_PL_VRAM];
+
+		if (size < (man->size << PAGE_SHIFT))
+			return true;
+		else
+			goto fail;
+	}
+
+
+	/* TODO add more domains checks, such as AMDGPU_GEM_DOMAIN_CPU */
+	return true;
+
+fail:
+	DRM_ERROR("BO size %lu > total memory in domain: %llu\n", size,
+					      man->size << PAGE_SHIFT);
+	return false;
+}
+
 static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 			       unsigned long size, int byte_align,
 			       bool kernel, u32 domain, u64 flags,
@@ -299,6 +337,9 @@  static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 	page_align = roundup(byte_align, PAGE_SIZE) >> PAGE_SHIFT;
 	size = ALIGN(size, PAGE_SIZE);
 
+	if (!amdgpu_bo_validate_size(adev, size, domain))
+		return -ENOMEM;
+
 	if (kernel) {
 		type = ttm_bo_type_kernel;
 	} else if (sg) {
-- 
2.7.4