
[v2] drm/ttm: Allow direct reclaim to allocate local memory

Message ID 20240708160636.1147308-1-rajneesh.bhardwaj@amd.com (mailing list archive)
State New, archived

Commit Message

Rajneesh Bhardwaj July 8, 2024, 4:06 p.m. UTC
Limiting the allocation of higher-order pages to the closest NUMA node
and enabling direct memory reclaim not only provides a failsafe against
situations where memory becomes too fragmented for the allocator to
satisfy the request from the local node, causing it to fall back to
remote pages (HUGEPAGE), but also offers a performance improvement.
Accessing remote pages suffers from bandwidth limitations and can be
avoided once memory is defragmented, in most cases without resorting to
manual compaction (/proc/sys/vm/compact_memory).

Note: On certain distros such as RHEL, proactive compaction is
disabled. (https://tinyurl.com/4f32f7rs)

Cc: Dave Airlie <airlied@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Vlastimil Babka July 12, 2024, 1:40 p.m. UTC | #1
On 7/8/24 6:06 PM, Rajneesh Bhardwaj wrote:
> Limiting the allocation of higher order pages to the closest NUMA node
> and enabling direct memory reclaim provides not only failsafe against
> situations when memory becomes too much fragmented and the allocator is
> not able to satisfy the request from the local node but falls back to
> remote pages (HUGEPAGE) but also offers performance improvement.
> Accessing remote pages suffers due to bandwidth limitations and could be
> avoided if memory becomes defragmented and in most cases without using
> manual compaction. (/proc/sys/vm/compact_memory)
> 
> Note: On certain distros such as RHEL, the proactive compaction is
> disabled. (https://tinyurl.com/4f32f7rs)
> 
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
> ---
>  drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index 6e1fd6985ffc..cc27d5c7afe8 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -91,7 +91,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
>  	 */
>  	if (order)
>  		gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN |
> -			__GFP_KSWAPD_RECLAIM;
> +			__GFP_RECLAIM | __GFP_THISNODE;

__GFP_RECLAIM includes ___GFP_DIRECT_RECLAIM; what if the caller is in a
context that can't sleep? Then it would be a bug to add that.
OTOH the only caller I see is ttm_pool_alloc(), which seems to start with
GFP_USER, and that already includes __GFP_RECLAIM, so either way I see no
reason to add it here, other than that it could become a bug if other
callers with a more restricted context are added later.

As for __GFP_THISNODE addition that should be fine as this seems to be an
opportunistic allocation with a fallback that's decreasing the attempted order.

Also I note that the caller might be adding __GFP_RETRY_MAYFAIL

        if (ctx->gfp_retry_mayfail)
                gfp_flags |= __GFP_RETRY_MAYFAIL;

But here adding __GFP_NORETRY makes __GFP_RETRY_MAYFAIL ineffective, as
__alloc_pages_slowpath() evaluates __GFP_NORETRY (to decide to give up)
earlier than __GFP_RETRY_MAYFAIL (to decide to try a bit harder).

That's not introduced by this patch but maybe something to look into, if
__GFP_RETRY_MAYFAIL is expected to be useful here for trying harder. If it's
instead intended only for the final fallback order-0 case where the
gfp_flags adjustment above doesn't apply, then __GFP_RETRY_MAYFAIL will
cause the allocation to fail instead of applying the infamous implicit "too
small to fail" for order-0 allocation and invoking OOM. If that's the reason
for it to be used here, all is fine and great.

Vlastimil

>  
>  	if (!pool->use_dma_alloc) {
>  		p = alloc_pages_node(pool->nid, gfp_flags, order);
Christian König July 12, 2024, 1:49 p.m. UTC | #2
On 7/12/24 3:40 PM, Vlastimil Babka wrote:
> On 7/8/24 6:06 PM, Rajneesh Bhardwaj wrote:
>> Limiting the allocation of higher order pages to the closest NUMA node
>> and enabling direct memory reclaim provides not only failsafe against
>> situations when memory becomes too much fragmented and the allocator is
>> not able to satisfy the request from the local node but falls back to
>> remote pages (HUGEPAGE) but also offers performance improvement.
>> Accessing remote pages suffers due to bandwidth limitations and could be
>> avoided if memory becomes defragmented and in most cases without using
>> manual compaction. (/proc/sys/vm/compact_memory)
>>
>> Note: On certain distros such as RHEL, the proactive compaction is
>> disabled. (https://tinyurl.com/4f32f7rs)
>>
>> Cc: Dave Airlie <airlied@redhat.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
>> index 6e1fd6985ffc..cc27d5c7afe8 100644
>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>> @@ -91,7 +91,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
>>   	 */
>>   	if (order)
>>   		gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN |
>> -			__GFP_KSWAPD_RECLAIM;
>> +			__GFP_RECLAIM | __GFP_THISNODE;
> __GFP_RECLAIM includes ___GFP_DIRECT_RECLAIM, what if the caller is a
> context that can't sleep?

That's unproblematic, all callers should be able to sleep.

>   Then it would be a bug to add that.
> OTOH the only caller I see is ttm_pool_alloc() which seems to start with
> GFP_USER and that already includes __GFP_RECLAIM, so either way I see no
> reason to be adding it here, other than it might be a potential
> bug in case other callers are added later and have more restricted context.

Ah! Good point, I was already wondering why changing that didn't have
any effect.

In this case we should probably just drop __GFP_KSWAPD_RECLAIM as well 
or stop using GFP_USER as base.

> As for __GFP_THISNODE addition that should be fine as this seems to be an
> opportunistic allocation with a fallback that's decreasing the attempted order.

That's what we wanted to hear, thanks!

>
> Also I note that the caller might be adding __GFP_RETRY_MAYFAIL
>
>          if (ctx->gfp_retry_mayfail)
>                  gfp_flags |= __GFP_RETRY_MAYFAIL;
>
> But here adding __GFP_NORETRY makes __GFP_RETRY_MAYFAIL ineffective, as
> __alloc_pages_slowpath() evaluates __GFP_NORETRY (to decide to give up)
> earlier than __GFP_RETRY_MAYFAIL (to decide to try a bit harder).

That's expected, __GFP_RETRY_MAYFAIL is just relevant for the order==0 
case when we don't add any additional flags.

We could drop that as well since I don't think anybody uses that any more.

Thanks,
Christian.

>
> That's not introduced by this patch but maybe something to look into, if
> __GFP_RETRY_MAYFAIL is expected to be useful here for trying harder. If it's
> instead intended only for the final fallback order-0 case where the
> gfp_flags adjustment above doesn't apply, then __GFP_RETRY_MAYFAIL will
> cause the allocation to fail instead of applying the infamous implicit "too
> small to fail" for order-0 allocation and invoking OOM. If that's the reason
> for it to be used here, all is fine and great.
>
> Vlastimil
>
>>   
>>   	if (!pool->use_dma_alloc) {
>>   		p = alloc_pages_node(pool->nid, gfp_flags, order);

Patch

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 6e1fd6985ffc..cc27d5c7afe8 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -91,7 +91,7 @@  static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 	 */
 	if (order)
 		gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN |
-			__GFP_KSWAPD_RECLAIM;
+			__GFP_RECLAIM | __GFP_THISNODE;
 
 	if (!pool->use_dma_alloc) {
 		p = alloc_pages_node(pool->nid, gfp_flags, order);