Message ID | 20190116181859.D1504459@viggo.jf.intel.com (mailing list archive) |
---|---|
Series | Allow persistent memory to be used like normal RAM |

Dave Hansen <dave.hansen@linux.intel.com> writes:

> Persistent memory is cool. But, currently, you have to rewrite
> your applications to use it. Wouldn't it be cool if you could
> just have it show up in your system like normal RAM and get to
> it like a slow blob of memory? Well... have I got the patch
> series for you!

So, isn't that what memory mode is for?
  https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/

Why do we need this code in the kernel?

-Jeff

On Thu, Jan 17, 2019 at 11:29:10AM -0500, Jeff Moyer wrote:
> Dave Hansen <dave.hansen@linux.intel.com> writes:
> > Persistent memory is cool. But, currently, you have to rewrite
> > your applications to use it. Wouldn't it be cool if you could
> > just have it show up in your system like normal RAM and get to
> > it like a slow blob of memory? Well... have I got the patch
> > series for you!
>
> So, isn't that what memory mode is for?
>   https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
>
> Why do we need this code in the kernel?

I don't think those are the same thing. The "memory mode" in the link
refers to platforms that sequester DRAM to side cache memory access, where
this series doesn't have that platform dependency nor hides faster DRAM.

On Thu, Jan 17, 2019 at 8:29 AM Jeff Moyer <jmoyer@redhat.com> wrote:
>
> Dave Hansen <dave.hansen@linux.intel.com> writes:
>
> > Persistent memory is cool. But, currently, you have to rewrite
> > your applications to use it. Wouldn't it be cool if you could
> > just have it show up in your system like normal RAM and get to
> > it like a slow blob of memory? Well... have I got the patch
> > series for you!
>
> So, isn't that what memory mode is for?
>   https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/

That's a hardware cache that privately manages DRAM in front of PMEM.
It benefits from some help from software [1].

> Why do we need this code in the kernel?

This goes further and enables software managed allocation decisions
with the full DRAM + PMEM address space.

[1]: https://lore.kernel.org/lkml/154767945660.1983228.12167020940431682725.stgit@dwillia2-desk3.amr.corp.intel.com/

Keith Busch <keith.busch@intel.com> writes:

> On Thu, Jan 17, 2019 at 11:29:10AM -0500, Jeff Moyer wrote:
>> Dave Hansen <dave.hansen@linux.intel.com> writes:
>> > Persistent memory is cool. But, currently, you have to rewrite
>> > your applications to use it. Wouldn't it be cool if you could
>> > just have it show up in your system like normal RAM and get to
>> > it like a slow blob of memory? Well... have I got the patch
>> > series for you!
>>
>> So, isn't that what memory mode is for?
>>   https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
>>
>> Why do we need this code in the kernel?
>
> I don't think those are the same thing. The "memory mode" in the link
> refers to platforms that sequester DRAM to side cache memory access, where
> this series doesn't have that platform dependency nor hides faster DRAM.

OK, so you are making two arguments, here. 1) platforms may not support
memory mode, and 2) this series allows for performance differentiated
memory (even though applications may not be modified to make use of
that...).

With this patch set, an unmodified application would either use:

1) whatever memory it happened to get
2) only the faster dram (via numactl --membind=)
3) only the slower pmem (again, via numactl --membind=)
4) preferentially one or the other (numactl --preferred=)

The other options are:
- as mentioned above, memory mode, which uses DRAM as a cache for the
  slower persistent memory. Note that it isn't all or nothing--you can
  configure your system with both memory mode and appdirect. The
  limitation, of course, is that your platform has to support this.

  This seems like the obvious solution if you want to make use of the
  larger pmem capacity as regular volatile memory (and your platform
  supports it). But maybe there is some other limitation that motivated
  this work?

- libmemkind or pmdk. These options typically* require application
  modifications, but allow those applications to actively decide which
  data lives in fast versus slow media.

  This seems like the obvious answer for applications that care about
  access latency.

* you could override the system malloc, but some libraries/application
  stacks already do that, so it isn't a universal solution.

Listing something like this in the headers of these patch series would
considerably reduce the head-scratching for reviewers.

Keith, you seem to be implying that there are platforms that won't
support memory mode. Do you also have some insight into how customers
want to use this, beyond my speculation? It's really frustrating to see
patch sets like this go by without any real use cases provided.

Cheers,
Jeff

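As a rough illustration of the four numactl cases listed in this message, the sketch below builds the corresponding command lines in Python. The node numbers (0 for DRAM, 1 for PMEM) and the helper name `numactl_argv` are assumptions made for this example only; real node IDs depend on the platform and can be listed with `numactl --hardware`.

```python
# Hypothetical node layout: these IDs are assumptions, not taken from the thread.
DRAM_NODE = 0
PMEM_NODE = 1


def numactl_argv(app, policy="default", node=None):
    """Return the argv that launches `app` under the given memory policy.

    policy is one of:
      "default"   - no numactl wrapper, the app gets whatever memory it gets
      "membind"   - allocate only from `node` (numactl --membind=)
      "preferred" - prefer `node`, fall back elsewhere (numactl --preferred=)
    """
    app = list(app)
    if policy == "default":
        return app
    if node is None:
        raise ValueError("a node is required for policy %r" % policy)
    if policy == "membind":
        return ["numactl", "--membind=%d" % node, "--"] + app
    if policy == "preferred":
        return ["numactl", "--preferred=%d" % node, "--"] + app
    raise ValueError("unknown policy %r" % policy)


if __name__ == "__main__":
    app = ["./my_app", "--some-flag"]  # placeholder application
    for policy, node in [("default", None),
                         ("membind", DRAM_NODE),
                         ("membind", PMEM_NODE),
                         ("preferred", DRAM_NODE)]:
        print(" ".join(numactl_argv(app, policy, node)))
```

The helper only assembles the command line; passing the result to `subprocess.run` would launch the unmodified application under that policy.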
On Thu, Jan 17, 2019 at 12:20:06PM -0500, Jeff Moyer wrote:
> Keith Busch <keith.busch@intel.com> writes:
> > On Thu, Jan 17, 2019 at 11:29:10AM -0500, Jeff Moyer wrote:
> >> Dave Hansen <dave.hansen@linux.intel.com> writes:
> >> > Persistent memory is cool. But, currently, you have to rewrite
> >> > your applications to use it. Wouldn't it be cool if you could
> >> > just have it show up in your system like normal RAM and get to
> >> > it like a slow blob of memory? Well... have I got the patch
> >> > series for you!
> >>
> >> So, isn't that what memory mode is for?
> >>   https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
> >>
> >> Why do we need this code in the kernel?
> >
> > I don't think those are the same thing. The "memory mode" in the link
> > refers to platforms that sequester DRAM to side cache memory access, where
> > this series doesn't have that platform dependency nor hides faster DRAM.
>
> OK, so you are making two arguments, here. 1) platforms may not support
> memory mode, and 2) this series allows for performance differentiated
> memory (even though applications may not be modified to make use of
> that...).
>
> With this patch set, an unmodified application would either use:
>
> 1) whatever memory it happened to get
> 2) only the faster dram (via numactl --membind=)
> 3) only the slower pmem (again, via numactl --membind=)
> 4) preferentially one or the other (numactl --preferred=)

Yes, numactl and mbind are good ways for unmodified applications
to use these different memory types when they're available.

Tangentially related, I have another series[1] that provides supplementary
information that can be used to help make these decisions for platforms
that provide HMAT (heterogeneous memory attribute tables).

> The other options are:
> - as mentioned above, memory mode, which uses DRAM as a cache for the
>   slower persistent memory. Note that it isn't all or nothing--you can
>   configure your system with both memory mode and appdirect. The
>   limitation, of course, is that your platform has to support this.
>
>   This seems like the obvious solution if you want to make use of the
>   larger pmem capacity as regular volatile memory (and your platform
>   supports it). But maybe there is some other limitation that motivated
>   this work?

The hardware supported implementation is one way it may be used, and its
upside is that accessing the cached memory is transparent to the OS and
applications. They can use memory unaware that this is happening, so it
has a low barrier for applications to make use of the large available
address space.

There are some minimal things software may do that improve this mode,
as Dan mentioned in his reply [2], but it is still usable even without
such optimizations.

On the downside, a reboot would be required if you want to change the
memory configuration at a later time, for example if you decide that more
or less DRAM is needed as cache. This series has runtime hot pluggable
capabilities.

It's also possible the customer may know better which applications require
more hot vs cold data, but the memory mode caching doesn't give them as
much control since the faster memory is hidden.

> - libmemkind or pmdk. These options typically* require application
>   modifications, but allow those applications to actively decide which
>   data lives in fast versus slow media.
>
>   This seems like the obvious answer for applications that care about
>   access latency.
>
> * you could override the system malloc, but some libraries/application
>   stacks already do that, so it isn't a universal solution.
>
> Listing something like this in the headers of these patch series would
> considerably reduce the head-scratching for reviewers.
>
> Keith, you seem to be implying that there are platforms that won't
> support memory mode. Do you also have some insight into how customers
> want to use this, beyond my speculation? It's really frustrating to see
> patch sets like this go by without any real use cases provided.

Right, most NFIT reporting platforms today don't have memory mode, and
the kernel currently only supports the persistent DAX mode with these.
This series adds another option for those platforms.

I think numactl as you mentioned is the first consideration for how
customers may make use. Dave or Dan might have other use cases in mind.
Just thinking out loud, if we wanted an in-kernel use case, it may be
interesting to make slower memory a swap tier so the host can manage
the cache rather than the hardware.

[1] https://lore.kernel.org/patchwork/cover/1032688/
[2] https://lore.kernel.org/lkml/154767945660.1983228.12167020940431682725.stgit@dwillia2-desk3.amr.corp.intel.com/

Keith Busch <keith.busch@intel.com> writes:

>> Keith, you seem to be implying that there are platforms that won't
>> support memory mode. Do you also have some insight into how customers
>> want to use this, beyond my speculation? It's really frustrating to see
>> patch sets like this go by without any real use cases provided.
>
> Right, most NFIT reporting platforms today don't have memory mode, and
> the kernel currently only supports the persistent DAX mode with these.
> This series adds another option for those platforms.

All NFIT reporting platforms today are shipping NVDIMM-Ns, where it makes
absolutely no sense to use them as regular DRAM. I don't think that's a
good argument to make.

> I think numactl as you mentioned is the first consideration for how
> customers may make use. Dave or Dan might have other use cases in mind.

Well, it sure looks like this took a lot of work, so I thought there
were known use cases or users asking for this functionality.

Cheers,
Jeff

On 1/17/19 8:29 AM, Jeff Moyer wrote:
>> Persistent memory is cool. But, currently, you have to rewrite
>> your applications to use it. Wouldn't it be cool if you could
>> just have it show up in your system like normal RAM and get to
>> it like a slow blob of memory? Well... have I got the patch
>> series for you!
> So, isn't that what memory mode is for?
>   https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
>
> Why do we need this code in the kernel?

So, my bad for not mentioning memory mode. This patch set existed
before we could talk about it publicly, so it simply ignores its
existence. It's a pretty glaring omission at this point, sorry.

I'll add this to the patches, but here are a few reasons you might want
this instead of memory mode:

1. Memory mode is all-or-nothing. Either 100% of your persistent
   memory is used for memory mode, or nothing is. With this set, you
   can (theoretically) have very granular (128MB) assignment of PMEM to
   either volatile or persistent uses. We have a few practical matters
   to fix to get us down to that 128MB value, but we can get there.
2. The capacity of memory mode is the size of your persistent memory.
   DRAM capacity is "lost" because it is used for cache. With this,
   you get PMEM+DRAM capacity for memory.
3. DRAM acts as a cache with memory mode, and caches can lead to
   unpredictable latencies. Since memory mode is all-or-nothing, your
   entire memory space is exposed to these unpredictable latencies.
   This solution lets you guarantee DRAM latencies if you need them.
4. The new "tier" of memory is exposed to software. That means that you
   can build tiered applications or infrastructure. A cloud provider
   could sell cheaper VMs that use more PMEM and more expensive ones
   that use DRAM. That's impossible with memory mode.

Don't take this as criticism of memory mode. Memory mode is awesome,
and doesn't strictly require *any* software changes (we have software
changes proposed for optimizing it though). It has tons of other
advantages over *this* approach. Basically, they are complementary
enough that we think both can live side-by-side.

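The capacity and granularity points in this message (reasons 1 and 2) come down to simple arithmetic, which the following Python sketch spells out. The function names and the machine sizes used in the example are assumptions chosen for illustration; only the 128MB granularity and the "PMEM only" versus "PMEM+DRAM" capacity rules come from the message itself.

```python
MIB = 1 << 20
GIB = 1 << 30
BLOCK = 128 * MIB  # assignment granularity quoted in reason 1


def memory_mode_capacity(dram, pmem):
    """Memory mode: DRAM is used as a cache, so only PMEM counts as capacity."""
    return pmem


def series_capacity(dram, pmem, pmem_as_ram):
    """This series: DRAM plus whatever share of PMEM is assigned to volatile use.

    The remainder (pmem - pmem_as_ram) stays available for persistent uses.
    """
    if pmem_as_ram % BLOCK != 0:
        raise ValueError("PMEM must be assigned in 128MB blocks")
    if not 0 <= pmem_as_ram <= pmem:
        raise ValueError("cannot assign more PMEM than is installed")
    return dram + pmem_as_ram


if __name__ == "__main__":
    # Hypothetical machine: sizes are made up for the example.
    dram, pmem = 192 * GIB, 1536 * GIB
    print("memory mode:", memory_mode_capacity(dram, pmem) // GIB, "GiB")
    print("all PMEM as RAM:", series_capacity(dram, pmem, pmem) // GIB, "GiB")
    print("512 GiB of PMEM as RAM:",
          series_capacity(dram, pmem, 512 * GIB) // GIB, "GiB")
```

With the last call, the remaining 1024 GiB of PMEM would stay in persistent use, which is the split that memory mode as described above does not offer.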
>With this patch set, an unmodified application would either use:
>
>1) whatever memory it happened to get
>2) only the faster dram (via numactl --membind=)
>3) only the slower pmem (again, via numactl --membind=)
>4) preferentially one or the other (numactl --preferred=)

Yet another option:

MemoryOptimizer -- hot page accounting and migration daemon
https://github.com/intel/memory-optimizer

Once PMEM NUMA nodes are available, we may run a user space daemon to
walk page tables of virtual machines (EPT) or processes, collect the
"accessed" bits to find out hot pages, and finally migrate hot pages to
DRAM and cold pages to PMEM.

In that scenario, only the kernel and the migration daemon need to be
aware of the PMEM nodes. Unmodified virtual machines and processes can
enjoy the added memory space without knowing whether they are using DRAM
or PMEM.

Thanks,
Fengguang
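
The hot/cold placement idea described in this message can be sketched as a toy model in Python. This is not the MemoryOptimizer algorithm and reads no real page tables: the per-page access counts, the `plan_migrations` helper, and the fixed DRAM budget are all assumptions made for illustration. It only shows the decision step, i.e. ranking pages by how often their "accessed" bit was seen set and deciding which pages to move in each direction.

```python
def plan_migrations(access_counts, placement, dram_pages):
    """Decide which pages to promote to DRAM and which to demote to PMEM.

    access_counts: dict page_id -> number of scans in which the page's
                   "accessed" bit was found set (hypothetical input)
    placement:     dict page_id -> "dram" or "pmem" (current location)
    dram_pages:    how many pages fit in DRAM (hypothetical budget)

    Returns (promote, demote) as sorted lists of page ids.
    """
    # Hottest pages first; ties are broken by page id for determinism.
    ranked = sorted(access_counts, key=lambda p: (-access_counts[p], p))
    hot = set(ranked[:dram_pages])

    promote = sorted(p for p in hot if placement[p] == "pmem")
    demote = sorted(p for p in placement
                    if p not in hot and placement[p] == "dram")
    return promote, demote


if __name__ == "__main__":
    counts = {0: 9, 1: 0, 2: 5, 3: 1}
    where = {0: "pmem", 1: "dram", 2: "dram", 3: "pmem"}
    promote, demote = plan_migrations(counts, where, dram_pages=2)
    print("promote to DRAM:", promote)
    print("demote to PMEM:", demote)
```

In a real daemon the resulting lists would be handed to a page migration interface; that step is left out here because it depends on the platform and on how the PMEM nodes are exposed.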