drm/prime: Support page array >= 4GB

Message ID	20230821200201.24685-1-Philip.Yang@amd.com (mailing list archive)
State	New, archived
Headers	From: Philip Yang <Philip.Yang@amd.com>
	To: <dri-devel@lists.freedesktop.org>
	Cc: Philip Yang <Philip.Yang@amd.com>, Felix.Kuehling@amd.com, christian.koenig@amd.com
	Subject: [PATCH] drm/prime: Support page array >= 4GB
	Date: Mon, 21 Aug 2023 16:02:01 -0400
	Message-ID: <20230821200201.24685-1-Philip.Yang@amd.com>
Series	drm/prime: Support page array >= 4GB


Commit Message

Philip Yang Aug. 21, 2023, 8:02 p.m. UTC

Without an unsigned long typecast, nr_pages << PAGE_SHIFT is computed in
32 bits, so when the page array size is >= 4GB (nr_pages >= 0x100000) the
size passed in is truncated (to zero at exactly 4GB), and the converted
sg list then has the first and the last chunk lost.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/drm_prime.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christian König Aug. 22, 2023, 9:43 a.m. UTC | #1

On 2023-08-21 22:02, Philip Yang wrote:
> Without an unsigned long typecast, nr_pages << PAGE_SHIFT is computed in
> 32 bits, so when the page array size is >= 4GB (nr_pages >= 0x100000) the
> size passed in is truncated (to zero at exactly 4GB), and the converted
> sg list then has the first and the last chunk lost.

Good catch, but I'm not sure if this is enough to make it work.

In addition to that, I don't think we have a use case for BOs > 4GiB.

Christian.


Christian König Aug. 23, 2023, 5:49 a.m. UTC | #2

On 2023-08-22 20:27, Philip Yang wrote:
>
> On 2023-08-22 05:43, Christian König wrote:
>
>>
>> On 2023-08-21 22:02, Philip Yang wrote:
>>> Without an unsigned long typecast, nr_pages << PAGE_SHIFT is computed in
>>> 32 bits, so when the page array size is >= 4GB (nr_pages >= 0x100000) the
>>> size passed in is truncated (to zero at exactly 4GB), and the converted
>>> sg list then has the first and the last chunk lost.
>>
>> Good catch, but I'm not sure if this is enough to make it work.
>>
>> In addition to that, I don't think we have a use case for BOs > 4GiB.
>
> Buffers >4GB are normal for compute applications. The issue was reported
> as "Maelstrom generated exerciser detects miscompares when GPU accesses
> larger remote GPU memory" on the GFX 9.4.3 APU, which uses the GTT domain
> to allocate VRAM and so triggers the bug in this DRM PRIME helper. With
> this fix, the test passed.
>

Why is the application allocating all the data as a single BO?

Usually you have a single texture, image, array, etc. in a single BO,
but here it looks a bit like the application tries to allocate all
its memory in a single BO (it could of course be that this isn't the case
and that it's really just one giant data structure).

Swapping such large BOs out at once is quite impractical, so should we
ever have a use case like suspend/resume or checkpoint/restore with
this, it will most likely fail.

Christian.


Felix Kuehling Aug. 23, 2023, 3:38 p.m. UTC | #3

On 2023-08-23 01:49, Christian König wrote:
> On 2023-08-22 20:27, Philip Yang wrote:
>>
>> On 2023-08-22 05:43, Christian König wrote:
>>
>>>
>>> On 2023-08-21 22:02, Philip Yang wrote:
>>>> Without an unsigned long typecast, nr_pages << PAGE_SHIFT is computed in
>>>> 32 bits, so when the page array size is >= 4GB (nr_pages >= 0x100000) the
>>>> size passed in is truncated (to zero at exactly 4GB), and the converted
>>>> sg list then has the first and the last chunk lost.
>>>
>>> Good catch, but I'm not sure if this is enough to make it work.
>>>
>>> In addition to that, I don't think we have a use case for BOs > 4GiB.
>>
>> Buffers >4GB are normal for compute applications. The issue was reported
>> as "Maelstrom generated exerciser detects miscompares when GPU accesses
>> larger remote GPU memory" on the GFX 9.4.3 APU, which uses the GTT domain
>> to allocate VRAM and so triggers the bug in this DRM PRIME helper. With
>> this fix, the test passed.
>>
>
> Why is the application allocating all the data as a single BO?
>
> Usually you have a single texture, image, array, etc. in a single BO,
> but here it looks a bit like the application tries to allocate all
> its memory in a single BO (it could of course be that this isn't the
> case and that it's really just one giant data structure).

Compute applications work with pretty big data structures. For example,
huge multi-dimensional matrices are not uncommon in large
machine-learning models.


>
>
> Swapping such large BOs out at once is quite impractical, so should we
> ever have a use case like suspend/resume or checkpoint/restore with
> this, it will most likely fail.
Checkpointing and restoring multiple GB at a time should not be a 
problem. I'm pretty sure we have tested that. On systems with 100s of 
GBs of memory, HBM memory bandwidth approaching TB/s and PCIe/CXL bus 
bandwidths going into 10s of GB/s, dealing with multi-GB BOs should not 
be a fundamental problem.

That said, if you wanted to impose limits on the size of single 
allocations, then I would expect some policy somewhere that prohibits 
large allocations. On the contrary, I see long or 64-bit data types all 
over the VRAM manager and TTM code, which tells me that >4GB allocations 
must be part of the plan.

This patch is clearly addressing a bug in the code that results in data
corruption when mapping large BOs on multiple GPUs. You could address
this with an allocation policy change, if you want, and leave the bug in
place. Then we would have to update ROCm user mode to break large allocations
into multiple BOs. That would break applications that try to share such
large allocations via DMABufs (e.g. with an RDMA NIC), because it would
become impossible to share large allocations with a single DMABuf handle.

Regards,
   Felix



Felix Kuehling Aug. 28, 2023, 3:41 p.m. UTC | #4

On 2023-08-21 16:02, Philip Yang wrote:
> Without an unsigned long typecast, nr_pages << PAGE_SHIFT is computed in
> 32 bits, so when the page array size is >= 4GB (nr_pages >= 0x100000) the
> size passed in is truncated (to zero at exactly 4GB), and the converted
> sg list then has the first and the last chunk lost.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>

The patch looks reasonable to me. I don't have authority to approve it. 
But FWIW,

Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>

Can anyone give a Reviewed-by?

Thanks,
   Felix



Patch

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index f924b8b4ab6b..2630ad2e504d 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -830,7 +830,7 @@ struct sg_table *drm_prime_pages_to_sg(struct drm_device *dev,
 	if (max_segment == 0)
 		max_segment = UINT_MAX;
 	err = sg_alloc_table_from_pages_segment(sg, pages, nr_pages, 0,
-						nr_pages << PAGE_SHIFT,
+						(unsigned long)nr_pages << PAGE_SHIFT,
 						max_segment, GFP_KERNEL);
 	if (err) {
 		kfree(sg);
