Message ID | 20240206231908.1792529-1-hao.xiang@bytedance.com (mailing list archive) |
---|---|
Series | Introduce multifd zero page checking. |
On Tue, Feb 06, 2024 at 11:19:02PM +0000, Hao Xiang wrote:
> This patchset is based on Juan Quintela's old series here
> https://lore.kernel.org/all/20220802063907.18882-1-quintela@redhat.com/
>
> In the multifd live migration model, there is a single migration main
> thread scanning the page map, queuing the pages to multiple multifd
> sender threads. The migration main thread runs zero page checking on
> every page before queuing the page to the sender threads. Zero page
> checking is a CPU intensive task and hence having a single thread doing
> all that doesn't scale well. This change introduces a new function
> to run the zero page checking on the multifd sender threads. This
> patchset also lays the ground work for future changes to offload zero
> page checking task to accelerator hardwares.
>
> Use two Intel 4th generation Xeon servers for testing.
>
> Architecture: x86_64
> CPU(s): 192
> Thread(s) per core: 2
> Core(s) per socket: 48
> Socket(s): 2
> NUMA node(s): 2
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 143
> Model name: Intel(R) Xeon(R) Platinum 8457C
> Stepping: 8
> CPU MHz: 2538.624
> CPU max MHz: 3800.0000
> CPU min MHz: 800.0000
>
> Perform multifd live migration with below setup:
> 1. VM has 100GB memory. All pages in the VM are zero pages.
> 2. Use tcp socket for live migration.
> 3. Use 4 multifd channels and zero page checking on migration main thread.
> 4. Use 1/2/4 multifd channels and zero page checking on multifd sender
> threads.
> 5. Record migration total time from sender QEMU console's "info migrate"
> command.
> 6. Calculate throughput with "100GB / total time".
>
> +------------------------------------------------------+
> |zero-page-checking | total-time(ms) | throughput(GB/s)|
> +------------------------------------------------------+
> |main-thread        | 9629           | 10.38GB/s       |
> +------------------------------------------------------+
> |multifd-1-threads  | 6182           | 16.17GB/s       |
> +------------------------------------------------------+
> |multifd-2-threads  | 4643           | 21.53GB/s       |
> +------------------------------------------------------+
> |multifd-4-threads  | 4143           | 24.13GB/s       |
> +------------------------------------------------------+

This "throughput" is slightly confusing; I was initially surprised to see a
large throughput for idle guests. IMHO the "total-time" would explain.
Feel free to drop that column if there's a repost.

Did you check why 4 channels mostly already reached the top line? Is it
because main thread is already spinning 100%?

Thanks,
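For readers following the numbers, the throughput column in the quoted table can be re-derived from the total-time column using step 6 of the setup ("100GB / total time"). The sketch below is only an illustration of that arithmetic; the function and variable names are made up here and do not come from the patchset or from QEMU.

```python
# Total migration times in milliseconds, copied from the table in the
# cover letter. The dictionary keys reuse the table's row labels.
TOTAL_TIME_MS = {
    "main-thread": 9629,
    "multifd-1-threads": 6182,
    "multifd-2-threads": 4643,
    "multifd-4-threads": 4143,
}

GUEST_RAM_GB = 100  # step 1 of the test setup: VM has 100GB memory


def throughput_gb_per_s(total_time_ms, ram_gb=GUEST_RAM_GB):
    """Step 6 of the setup: guest RAM size divided by total migration time."""
    return ram_gb / (total_time_ms / 1000.0)


def speedup_vs_main_thread(config):
    """How many times faster a configuration is than the main-thread run."""
    return TOTAL_TIME_MS["main-thread"] / TOTAL_TIME_MS[config]


if __name__ == "__main__":
    for config, ms in TOTAL_TIME_MS.items():
        print(f"{config:18s} {ms:5d} ms  "
              f"{throughput_gb_per_s(ms):6.2f} GB/s  "
              f"{speedup_vs_main_thread(config):.2f}x")
```

Because every page in this test is a zero page, the derived figure most likely reflects how quickly guest memory is scanned rather than how many bytes cross the network, which is why total-time (or the speedup ratio) is the less ambiguous metric.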
On Tue, Feb 6, 2024 at 7:39 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Feb 06, 2024 at 11:19:02PM +0000, Hao Xiang wrote:
> > [...]
> >
> > +------------------------------------------------------+
> > |zero-page-checking | total-time(ms) | throughput(GB/s)|
> > +------------------------------------------------------+
> > |main-thread        | 9629           | 10.38GB/s       |
> > +------------------------------------------------------+
> > |multifd-1-threads  | 6182           | 16.17GB/s       |
> > +------------------------------------------------------+
> > |multifd-2-threads  | 4643           | 21.53GB/s       |
> > +------------------------------------------------------+
> > |multifd-4-threads  | 4143           | 24.13GB/s       |
> > +------------------------------------------------------+
>
> This "throughput" is slightly confusing; I was initially surprised to see a
> large throughput for idle guests. IMHO the "total-time" would explain.
> Feel free to drop that column if there's a repost.
>
> Did you check why 4 channels mostly already reached the top line? Is it
> because main thread is already spinning 100%?
>
> Thanks,
>
> --
> Peter Xu

Sure I will drop "throughput" to avoid confusion. In my testing, 1
multifd channel already makes the main thread spin at 100%. So the
total-time is the same across 1/2/4 multifd channels as long as zero
page is run on the main migration thread. Of course, this is based on
the fact that the network is not the bottleneck. One interesting
finding is that multifd 1 channel with multifd zero page has better
performance than multifd 1 channel with main migration thread.
On Wed, Feb 07, 2024 at 04:47:27PM -0800, Hao Xiang wrote:
> Sure I will drop "throughput" to avoid confusion. In my testing, 1
> multifd channel already makes the main thread spin at 100%. So the
> total-time is the same across 1/2/4 multifd channels as long as zero
> page is run on the main migration thread. Of course, this is based on
> the fact that the network is not the bottleneck. One interesting
> finding is that multifd 1 channel with multifd zero page has better
> performance than multifd 1 channel with main migration thread.

It's probably because the main thread has even more work to do than
"detecting zero page" alone.

When zero detection is done in the main thread and the guest is fully
idle, scanning those pages already consumes a major portion of the main
thread's CPU time. Considering that all pages are zero, the multifd
threads should be fully idle, so n_channels may not matter here.

When 1 multifd thread is created with zero-page offloading, zero page
detection is fully offloaded from the main thread to the multifd thread,
even if there is only one. The effect is similar to forking the main
thread into two threads, so the main thread can be more efficient on
other tasks (fetching/scanning dirty bits, etc.).

Thanks,
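The division of labor described above can be sketched roughly as follows. This is not QEMU code: the names `is_zero_page`, `split_batch`, `migrate` and the 4096-byte `PAGE_SIZE` are assumptions made for illustration, and a Python thread pool only models who does the work, not the actual performance.

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096  # assumed page size, for illustration only


def is_zero_page(page):
    """Return True when every byte in the page is zero."""
    return page.count(0) == len(page)


def split_batch(batch):
    """Work done on a sender thread: sort page indices into zero / normal."""
    zero, normal = [], []
    for index, page in enumerate(batch):
        (zero if is_zero_page(page) else normal).append(index)
    return zero, normal


def migrate(batches, n_channels):
    """Main thread only queues batches; zero checking runs on the workers."""
    with ThreadPoolExecutor(max_workers=n_channels) as senders:
        # map() preserves the order in which the batches were queued.
        return list(senders.map(split_batch, batches))
```

With zero checking inside `split_batch`, the queuing side does no per-page scanning at all, which matches the observation that even a single sender thread relieves the main thread.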