Message ID | 20220827095815.698-1-jszhang@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Series | [v2] riscv: enable THP_SWAP for RV64 |
Hey Jisheng,

On 27/08/2022 10:58, Jisheng Zhang wrote:
> I have a Sipeed Lichee RV dock board which only has 512MB DDR, so
> memory optimizations such as swap on zram are helpful. As is seen
> in commit d0637c505f8a ("arm64: enable THP_SWAP for arm64") and
> commit bd4c82c22c367e ("mm, THP, swap: delay splitting THP after
> swapped out"), THP_SWAP can improve the swap throughput significantly.
>
> Enable THP_SWAP for RV64, testing the micro-benchmark which is
> introduced by commit d0637c505f8a ("arm64: enable THP_SWAP for arm64")
> shows below numbers on the Lichee RV dock board:
>
> thp swp throughput w/o patch: 66908 bytes/ms (mean of 10 tests)
> thp swp throughput w/ patch: 322638 bytes/ms (mean of 10 tests)

I know the original commit message contains this, but it's a little
odd. If the patch /enables/ THP then how would there be THP swap
prior to the patch?

>
> Improved by 382%!

I could not replicate the after numbers on my nezha, so I suspect
I am missing something in my config/setup. zswap is enabled and is
working, TRANSPARENT_HUGEPAGE is enabled etc. Not that it matters
for acceptance of the patch though.

I gave it a try and nothing went up in flames while using zswap so:
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> ---
> Since v1:
> - collect reviewed-by tag
> - make ARCH_WANTS_THP_SWAP rely on HAVE_ARCH_TRANSPARENT_HUGEPAGE
> instead
>
> arch/riscv/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index ed66c31e4655..79e52441e18b 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -45,6 +45,7 @@ config RISCV
>  	select ARCH_WANT_FRAME_POINTERS
>  	select ARCH_WANT_GENERAL_HUGETLB
>  	select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
> +	select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
>  	select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
>  	select BUILDTIME_TABLE_SORT if MMU
>  	select CLONE_BACKWARDS
On Sat, Aug 27, 2022 at 09:13:03PM +0000, Conor.Dooley@microchip.com wrote:
> Hey Jisheng,

Hi Conor,

> On 27/08/2022 10:58, Jisheng Zhang wrote:
> > thp swp throughput w/o patch: 66908 bytes/ms (mean of 10 tests)
> > thp swp throughput w/ patch: 322638 bytes/ms (mean of 10 tests)
>
> I know the original commit message contains this, but it's a little
> odd. If the patch /enables/ THP then how would there be THP swap
> prior to the patch?

hmm, it's swap
I'll send a v3 to correct the description.

> >
> > Improved by 382%!
>
> I could not replicate the after numbers on my nezha, so I suspect
> I am missing something in my config/setup. zswap is enabled and is

swap on zram rather than zswap ;)

> working, TRANSPARENT_HUGEPAGE is enabled etc. Not that it matters
> for acceptance of the patch though.
>
> I gave it a try and nothing went up in flames while using zswap so:
> Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
On 29/08/2022 15:10, Jisheng Zhang wrote:
> On Sat, Aug 27, 2022 at 09:13:03PM +0000, Conor.Dooley@microchip.com wrote:
>> I could not replicate the after numbers on my nezha, so I suspect
>> I am missing something in my config/setup. zswap is enabled and is
>
> swap on zram rather than zswap ;)

I think I tried about 30 different config variations, initially not
using zswap and later using it.
My zramctl looks like so (although I did try zstd too) after running
the demo application from that commit:

NAME       ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 lzo-rle       241M  22M  8.4M  9.1M       1 [SWAP]

I am using the default riscv defconfig + the following:
CONFIG_ZRAM=y
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_LZO=y
CONFIG_CRYPTO_ZSTD=y
CONFIG_ZRAM_MEMORY_TRACKING=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
CONFIG_THP_SWAP=y

Am I just missing something obvious here?
Sorry,
Conor.
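Editorial note: when comparing setups like the one above, it can help to check the generated .config mechanically. The following is a minimal sketch, not part of the thread. It assumes that THP_SWAP additionally requires SWAP and ARCH_WANTS_THP_SWAP (the symbol this patch selects); the option list and the helper names `enabled_options` and `missing_options` are illustrative choices, not anything the kernel provides.

```python
import re

# Options named in this thread, plus CONFIG_SWAP and
# CONFIG_ARCH_WANTS_THP_SWAP, which are assumed here to be
# prerequisites of CONFIG_THP_SWAP.
WANTED = (
    "CONFIG_SWAP",
    "CONFIG_ZRAM",
    "CONFIG_TRANSPARENT_HUGEPAGE",
    "CONFIG_ARCH_WANTS_THP_SWAP",
    "CONFIG_THP_SWAP",
)


def enabled_options(config_text):
    """Return the set of CONFIG_* symbols set to y or m in a .config text."""
    enabled = set()
    for line in config_text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=([ym])$", line.strip())
        if match:
            enabled.add(match.group(1))
    return enabled


def missing_options(config_text, wanted=WANTED):
    """Return the wanted symbols that are not enabled, in the order given."""
    enabled = enabled_options(config_text)
    return [option for option in wanted if option not in enabled]


if __name__ == "__main__":
    import sys

    # Usage: python check_config.py /path/to/.config
    with open(sys.argv[1]) as config_file:
        for option in missing_options(config_file.read()):
            print("not enabled:", option)
```

On a kernel without the patch, such a check would be expected to report CONFIG_ARCH_WANTS_THP_SWAP (and therefore CONFIG_THP_SWAP) as not enabled, even if CONFIG_THP_SWAP=y was requested in a config fragment.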
On Mon, Aug 29, 2022 at 05:27:48PM +0000, Conor.Dooley@microchip.com wrote:
> I think I tried about 30 different config variations, initially not
> using zswap and later using it.
> My zramctl looks like so (although I did try zstd too) after running
> the demo application from that commit:
>
> NAME       ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
> /dev/zram0 lzo-rle       241M  22M  8.4M  9.1M       1 [SWAP]
>
> I am using the default riscv defconfig + the following:
> CONFIG_ZRAM=y
> CONFIG_CRYPTO_DEFLATE=y
> CONFIG_CRYPTO_LZO=y
> CONFIG_CRYPTO_ZSTD=y
> CONFIG_ZRAM_MEMORY_TRACKING=y
> CONFIG_TRANSPARENT_HUGEPAGE=y
> CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
> CONFIG_THP_SWAP=y
>
> Am I just missing something obvious here?

similar config options here. what's your rootfs? Is your board busy
with something? I used a minimal rootfs built from buildroot.
can you plz show your numbers w/ and w/o the patch?

I also tried the simple benchmark on qemu (just for reference, since
I have no other riscv boards except the lichee RV dock board):
swp out w/o patch: 30066 bytes/ms (mean of 10 tests)
swp out w/ patch: 130055 bytes/ms (mean of 10 tests)
so improved by 332.7%
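Editorial note: the "improved by" figures in this thread are relative increases, (after - before) / before. The sketch below recomputes them from the rounded means quoted in the thread, so the last digit may differ slightly from what was posted; the function name `improvement_percent` is only illustrative.

```python
def improvement_percent(before, after):
    """Relative increase of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Mean throughputs in bytes/ms as quoted in the thread: (w/o patch, w/ patch).
RESULTS = {
    "Lichee RV dock": (66908, 322638),
    "QEMU": (30066, 130055),
}

for name, (before, after) in RESULTS.items():
    ratio = after / before
    percent = improvement_percent(before, after)
    print(f"{name}: {ratio:.2f}x throughput, +{percent:.1f}%")
```

Both results agree with the posted figures to within a tenth of a percentage point. Note that "improved by 382%" corresponds to roughly 4.8 times the original throughput, not 3.8 times.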
On 30/08/2022 14:59, Jisheng Zhang wrote:
> similar config options here. what's your rootfs? Is your board busy
> with something? I used a minimal rootfs built from buildroot.
> can you plz show your numbers w/ and w/o the patch?

I was using fedora for the testing, downloaded directly from
koji. My before/after numbers varied, but were around 80,000
bytes/ms most of the time.

If I increased the size to 500 * 1024 * 1024 I got around 130k.

Before/after the patch, the numbers did not really change, but
things did fluctuate quite wildly - from about 50k to 90k using
the 400 size.

>
> I also tried the simple benchmark on qemu(just for reference, since
> I have no other riscv boards except the lichee RV dock board):
> swp out w/o patch: 30066 bytes/ms (mean of 10 tests)
> swp out w/ patch: 130055 bytes/ms (mean of 10 tests)
> so improved by 332.7%

I'll give QEMU a go so :)
On 30/08/2022 15:15, Conor Dooley - M52691 wrote:
> On 30/08/2022 14:59, Jisheng Zhang wrote:
>> similar config options here. what's your rootfs? Is your board busy
>> with something? I used a minimal rootfs built from buildroot.
>> can you plz show your numbers w/ and w/o the patch?
>
> I was using fedora for the testing, downloaded directly from
> koji. My before/after numbers varied, but were around 80,000
> bytes/ms most of the time.
>
> If I increased the size to 500 * 1024 * 1024 I got around 130k.

130k before & after.**

>
> Before/after the patch, the numbers did not really change, but
> things did fluctuate quite wildly - from about 50k to 90k using
> the 400 size.

What I mean is: before/after the patch had no visible performance
difference because it was always fluctuating in the same range.
On Tue, Aug 30, 2022 at 02:26:38PM +0000, Conor.Dooley@microchip.com wrote:
> On 30/08/2022 15:15, Conor Dooley - M52691 wrote:
> > I was using fedora for the testing, downloaded directly from
> > koji. My before/after numbers varied, but were around 80,000
> > bytes/ms most of the time.
> >
> > If I increased the size to 500 * 1024 * 1024 I got around 130k.
>
> 130k before & after.**
>
> >
> > Before/after the patch, the numbers did not really change, but
> > things did fluctuate quite wildly - from about 50k to 90k using
> > the 400 size.
>
> What I mean is: before/after the patch had no visible performance
> difference because it was always fluctuating in the same range.

I see the difference -- w/ a minimal buildroot rootfs, the numbers aren't
exactly the same from run to run, but the difference is trivial; I even
got two or three identical numbers during 10 rounds of testing. But your
numbers were always fluctuating, so I guess your system may be busy
with something for short periods, i.e. the OS env is full of noise.
I guess you may get a similar improvement percentage when trying buildroot.
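Editorial note: with throughput this noisy, one way to tell whether a benchmark run actually swapped out huge pages in one piece is to compare the `thp_swpout` and `thp_swpout_fallback` counters in /proc/vmstat before and after the run. The sketch below is not from the thread; it assumes a kernel built with CONFIG_TRANSPARENT_HUGEPAGE so that both counters exist, and the helper names are hypothetical.

```python
def parse_vmstat(text):
    """Parse '/proc/vmstat'-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def thp_swap_delta(before_text, after_text,
                   keys=("thp_swpout", "thp_swpout_fallback")):
    """Return how much each THP swap-out counter grew between two snapshots.

    Counters missing from a snapshot are treated as 0 (an assumption made
    so the sketch also runs on kernels that lack them).
    """
    before = parse_vmstat(before_text)
    after = parse_vmstat(after_text)
    return {key: after.get(key, 0) - before.get(key, 0) for key in keys}


def read_vmstat(path="/proc/vmstat"):
    """Read the current counters; Linux only."""
    with open(path) as vmstat_file:
        return vmstat_file.read()
```

A typical use would be to call `read_vmstat()` once, run the benchmark, call it again, and pass both snapshots to `thp_swap_delta()`. If `thp_swpout` does not grow while `thp_swpout_fallback` does, the huge pages were split before swap-out, and no THP_SWAP speedup should be expected from that run.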
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index ed66c31e4655..79e52441e18b 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -45,6 +45,7 @@ config RISCV
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_WANT_GENERAL_HUGETLB
 	select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
+	select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
 	select BUILDTIME_TABLE_SORT if MMU
 	select CLONE_BACKWARDS