another pmem variant V2
diff mbox

Message ID 551BE96A.9060603@plexistor.com
State New, archived
Headers show

Commit Message

Boaz Harrosh April 1, 2015, 12:49 p.m. UTC
On 03/31/2015 07:16 PM, Christoph Hellwig wrote:
> On Tue, Mar 31, 2015 at 06:14:15PM +0300, Boaz Harrosh wrote:
>> We can not accept it as is right now.
> 
> Who is we?
> 
>> We have conducted farther tests. And it messes up NUMA.
> 
> Only you if you use the memmap option in weird ways.
> 

No weird ways just the single range. I would be very happy if
you can teach me the proper way, believe me I'm trying for 2 days.

> Sounds like I should simply remove the memmap= option so people don't
> abuse it.  The main point is to parse the e820 tables, which works fine.
> 

What abuse? the single range and the problem shows up.

So you are just pasting over the problem sir. The patch that you
are submitting has grave problems and covering it up with not
allowing memmap=! will not fix it.

The Kernel as is after your patch does not like this half baked
beast as we defined it, the defined pmem-memory-range messes
things up.

> And those people having fake pmem, or pcie cards that they are too lazy
> to submit proper drivers for can stick to their out of tree hacks?
> 

I have now conducted more tests on real type-12 DDR3 system and
the exact same problem as I reported exists with real type-12
chips!
And who are we kidding? the "memmap=!" yes or no, makes no
difference at all. All it does is edit the table as if it was
the table the BIOS gave us. There is no extra processing done
on memmap=.

Your e820 patch trashes NUMA.

But I fix it for good this time here is the fix below.
After I apply below patch every thing boots and work just as
expected. All the problems I reported disappear. Any configuration,
any number of ranges, cross NUMA or not, just works, exactly as
before with my patches.

The fix is over your V2, I will post one later that fixes
your V3 and adds back the memmap=!

---
If you inspect my fix below You will see that what happened is
that the original patch was too aggressive in making pmem
look like ram. In fact it started the ARCH side memory
initialization and was only skipping the generic initialization
of memory. This messed up internal real-memory structures.

With this fix below block/drivers/pmem.ko loads just fine
finds its resources and maps them, and everything just
works. including any "abuse" to memmap=! and any NUMA
configurations. Both real HW, type-12 HW and NUMA Vms.
(Preliminary testing we are conducting the full test
 rig as we speak)

(I'm still going over it, I might send some more cleaning)

Cheers

---
git diff --stat -p -M HEAD
 arch/x86/kernel/e820.c  |  7 ++++---
 arch/x86/kernel/pmem.c  | 17 -----------------
 arch/x86/kernel/setup.c |  2 --
 3 files changed, 4 insertions(+), 22 deletions(-)

Patch
diff mbox

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 07246e5..dce0e84 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -963,9 +963,10 @@  void __init e820_reserve_resources(void)
 		 * pci device BAR resource and insert them later in
 		 * pcibios_resource_survey()
 		 */
-		if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) {
-			if (e820.map[i].type != E820_PRAM)
-				res->flags |= IORESOURCE_BUSY;
+		if (((e820.map[i].type != E820_RESERVED) &&
+		     (e820.map[i].type != E820_PRAM)) ||
+		     res->start < (1ULL<<20)) {
+			res->flags |= IORESOURCE_BUSY;
 			insert_resource(&iomem_resource, res);
 		}
 		res++;
diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c
index f970048..fcdbc20 100644
--- a/arch/x86/kernel/pmem.c
+++ b/arch/x86/kernel/pmem.c
@@ -9,23 +9,6 @@ 
 #include <asm/page_types.h>
 #include <asm/setup.h>
 
-void __init reserve_pmem(void)
-{
-	int i;
-
-	for (i = 0; i < e820.nr_map; i++) {
-		struct e820entry *ei = &e820.map[i];
-
-		if (ei->type != E820_PRAM)
-			continue;
-
-		memblock_reserve(ei->addr, ei->addr + ei->size);
-		max_pfn_mapped = init_memory_mapping(
-				ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr,
-				ei->addr + ei->size);
-	}
-}
-
 static __init void register_pmem_device(struct resource *res)
 {
 	struct platform_device *pdev;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f2bed2b..0a2421c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1158,8 +1158,6 @@  void __init setup_arch(char **cmdline_p)
 
 	early_acpi_boot_init();
 
-	reserve_pmem();
-
 	initmem_init();
 	dma_contiguous_reserve(max_pfn_mapped << PAGE_SHIFT);