[v2,0/2] Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy

Message ID cover.1709909210.git.donettom@linux.ibm.com (mailing list archive)

Message

Donet Tom March 8, 2024, 3:15 p.m. UTC
This patchset optimizes cross-socket memory access under the
MPOL_PREFERRED_MANY policy.

To test this patchset, we ran the following test on a 3-node system.
 Node 0 - 2GB   - Tier 1
 Node 1 - 11GB  - Tier 1
 Node 6 - 10GB  - Tier 2

The following changes were made to memcached to set the memory policy.
They select Node 0 and Node 1 as the preferred nodes.

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <numaif.h>
    #include <numa.h>

    unsigned long nodemask;
    int ret;

    /* Bits 0 and 1 set: prefer Node 0 and Node 1. */
    nodemask = 0x03;
    ret = set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING,
                        &nodemask, 10);
    /* If MPOL_F_NUMA_BALANCING isn't supported,
     * fall back to MPOL_PREFERRED_MANY */
    if (ret < 0 && errno == EINVAL) {
        printf("set mem policy normal\n");
        ret = set_mempolicy(MPOL_PREFERRED_MANY, &nodemask, 10);
    }
    if (ret < 0) {
        perror("Failed to call set_mempolicy");
        exit(-1);
    }
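
The nodemask value 0x03 used above is a bitmask with bit N set for
each preferred node N. As a minimal sketch, a mask for an arbitrary
node list could be built as shown here. nodes_to_mask() is a
hypothetical helper written for illustration; it is not part of
memcached or libnuma, and it assumes all node IDs fit in a single
unsigned long.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper (not part of memcached or libnuma): build a
 * single-word nodemask with one bit set per listed NUMA node ID. */
static unsigned long nodes_to_mask(const int *nodes, int count)
{
    const int bits = (int)(sizeof(unsigned long) * CHAR_BIT);
    unsigned long mask = 0;

    for (int i = 0; i < count; i++) {
        /* This sketch only handles node IDs that fit in one word. */
        assert(nodes[i] >= 0 && nodes[i] < bits);
        mask |= 1UL << nodes[i];
    }
    return mask;
}
```

Passing the node list {0, 1} yields 0x03, the value used in the
memcached change.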

Test Procedure:
===============
1. Make sure memory tiring and demotion are enabled.
2. Start memcached.

   # ./memcached -b 100000 -m 204800 -u root -c 1000000 -t 7 \
       -d -s "/tmp/memcached.sock"

3. Run memtier_benchmark to store 3200000 keys.

   # ./memtier_benchmark -S "/tmp/memcached.sock" \
       --protocol=memcache_binary --threads=1 --pipeline=1 --ratio=1:0 \
       --key-pattern=S:S --key-minimum=1 --key-maximum=3200000 \
       -n allkeys -c 1 -R -x 1 -d 1024

4. Start a memory eater on nodes 0 and 1. This demotes all memcached
   pages to node 6.
5. Make sure all the memcached pages were demoted to the lower tier by
   reading /proc/<memcached PID>/numa_maps.

    # cat /proc/2771/numa_maps
     ---
    default anon=1009 dirty=1009 active=0 N6=1009 kernelpagesize_kB=64
    default anon=1009 dirty=1009 active=0 N6=1009 kernelpagesize_kB=64
     ---

6. Kill memory eater.
7. Read the pgpromote_success counter.
8. Start reading the keys by running memtier_benchmark.

   # ./memtier_benchmark -S "/tmp/memcached.sock" \
       --protocol=memcache_binary --pipeline=1 --distinct-client-seed \
       --ratio=0:3 --key-pattern=R:R --key-minimum=1 \
       --key-maximum=3200000 -n allkeys --threads=64 -c 1 -R -x 6

9. Read the pgpromote_success counter.
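
Steps 5, 7 and 9 above are manual checks. As a rough sketch, they
could be scripted with the two helpers below. Both function names are
hypothetical. The first assumes a numa_maps line carries per-node page
counts as "N<node>=<pages>" fields, as in the sample output in step 5.
The second assumes the counter is read from vmstat-style text with one
"<name> <value>" pair per line (for example a per-node file such as
/sys/devices/system/node/node0/vmstat, which is an assumption; the
procedure above does not say where the counter was read from).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the page count reported for NUMA node
 * `node` in one numa_maps line (its "N<node>=<pages>" field), or 0 if
 * the line has no such field. */
static unsigned long pages_on_node(const char *line, int node)
{
    const char *p = line;

    assert(line != NULL);
    while (*p) {
        /* Skip whitespace to reach the start of the next field. */
        while (*p && isspace((unsigned char)*p))
            p++;
        if (*p == 'N' && isdigit((unsigned char)p[1])) {
            char *end;
            long id = strtol(p + 1, &end, 10);

            if (*end == '=' && id == node)
                return strtoul(end + 1, NULL, 10);
        }
        /* Skip the rest of this field. */
        while (*p && !isspace((unsigned char)*p))
            p++;
    }
    return 0;
}

/* Hypothetical helper: find `name` in vmstat-style text and store its
 * value. Returns 0 on success, -1 if the counter is absent. */
static int vmstat_counter(const char *text, const char *name,
                          unsigned long long *value)
{
    size_t len;
    const char *line = text;

    assert(text != NULL && name != NULL && value != NULL);
    len = strlen(name);
    while (line && *line) {
        /* Match the whole counter name, followed by a space. */
        if (strncmp(line, name, len) == 0 && line[len] == ' ' &&
            sscanf(line + len, "%llu", value) == 1)
            return 0;
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

Calling vmstat_counter() once before and once after the read phase and
subtracting the two values gives the number of pages promoted during
the test.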

Test Results:
=============
Without Patch
------------------
1. pgpromote_success  before test
Node 0:  pgpromote_success 11
Node 1:  pgpromote_success 140974

pgpromote_success  after test
Node 0:  pgpromote_success 11
Node 1:  pgpromote_success 140974

2. Memtier-benchmark result.
AGGREGATED AVERAGE RESULTS (6 runs)
===================================================================
Type    Ops/sec    Hits/sec   Misses/sec  Avg. Latency  p50 Latency
-------------------------------------------------------------------
Sets    0.00       ---        ---         ---           ---
Gets    305792.03  305791.93  0.10        0.18949       0.16700
Waits   0.00       ---        ---         ---           ---
Totals  305792.03  305791.93  0.10        0.18949       0.16700

==========================================
Type    p99 Latency  p99.9 Latency  KB/sec
------------------------------------------
Sets    ---          ---            0.00
Gets    0.44700      1.71100        11542.69
Waits   ---          ---            ---
Totals  0.44700      1.71100        11542.69

With Patch
---------------
1. pgpromote_success  before test
Node 0:  pgpromote_success 5
Node 1:  pgpromote_success 89386

pgpromote_success  after test
Node 0:  pgpromote_success 57895
Node 1:  pgpromote_success 141463

2. Memtier-benchmark result.
AGGREGATED AVERAGE RESULTS (6 runs)
===================================================================
Type    Ops/sec    Hits/sec   Misses/sec  Avg. Latency  p50 Latency
-------------------------------------------------------------------
Sets    0.00       ---        ---         ---           ---
Gets    521942.24  521942.07  0.17        0.11459       0.10300
Waits   0.00       ---        ---         ---           ---
Totals  521942.24  521942.07  0.17        0.11459       0.10300

==========================================
Type    p99 Latency  p99.9 Latency  KB/sec
------------------------------------------
Sets    ---          ---            0.00
Gets    0.23100      0.31900        19701.68
Waits   ---          ---            ---
Totals  0.23100      0.31900        19701.68


Test Result Analysis:
=====================
1. With the patch, pages are promoted: pgpromote_success increases on
   both nodes during the test, while it is unchanged without the patch.
2. The memtier_benchmark results show that, with the patch, throughput
   increased by about 70%.

 Ops/sec without fix -  305792.03
 Ops/sec with fix    -  521942.24
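
For reference, the relative change between the two runs can be
computed as sketched below. percent_change() is a hypothetical helper
written for illustration, not part of the patchset. Applied to the
Ops/sec figures above (305792.03 and 521942.24) it gives roughly
+70.7%, and applied to the average latency figures (0.18949 and
0.11459) it gives roughly -39.5%.

```c
#include <assert.h>

/* Hypothetical helper: relative change from `before` to `after`,
 * expressed in percent. `before` must be positive. */
static double percent_change(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```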

Changes:
v2:
- Rebased on latest upstream (v6.8-rc7)
- Used 'numa_node_id()' to get the current execution node ID, and added
  'lockdep_assert_held' to make sure that 'mpol_misplaced()' is called
  with the ptl held.
- The migration condition has been updated; now, migration will only
  occur if the execution node is present in the policy nodemask.

-v1: https://lore.kernel.org/linux-mm/9c3f7b743477560d1c5b12b8c111a584a2cc92ee.1708097962.git.donettom@linux.ibm.com/#t


Donet Tom (2):
  mm/mempolicy: Use numa_node_id() instead of cpu_to_node()
  mm/numa_balancing:Allow migrate on protnone reference with
    MPOL_PREFERRED_MANY policy

 include/linux/mempolicy.h |  5 +++--
 mm/huge_memory.c          |  2 +-
 mm/internal.h             |  2 +-
 mm/memory.c               |  8 +++++---
 mm/mempolicy.c            | 34 ++++++++++++++++++++++++++--------
 5 files changed, 36 insertions(+), 15 deletions(-)

Comments

Huang, Ying March 11, 2024, 1:45 a.m. UTC | #1
Donet Tom <donettom@linux.ibm.com> writes:

> This patchset is to optimize the cross-socket memory access with
> MPOL_PREFERRED_MANY policy.
>
> To test this patch we ran the following test on a 3 node system.
>  Node 0 - 2GB   - Tier 1
>  Node 1 - 11GB  - Tier 1
>  Node 6 - 10GB  - Tier 2
>
> Below changes are made to memcached to set the memory policy,
> It select Node0 and Node1 as preferred nodes.
>
>    #include <numaif.h>
>    #include <numa.h>
>
>     unsigned long nodemask;
>     int ret;
>
>     nodemask = 0x03;
>     ret = set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING,
>                                                &nodemask, 10);
>     /* If MPOL_F_NUMA_BALANCING isn't supported,
>      * fall back to MPOL_PREFERRED_MANY */
>     if (ret < 0 && errno == EINVAL){
>        printf("set mem policy normal\n");
>         ret = set_mempolicy(MPOL_PREFERRED_MANY, &nodemask, 10);
>     }
>     if (ret < 0) {
>        perror("Failed to call set_mempolicy");
>        exit(-1);
>     }
>
> Test Procedure:
> ===============
> 1. Make sure memory tiring and demotion are enabled.

Nit picking.

s/tiring/tiering/

--
Best Regards,
Huang, Ying
