From patchwork Wed Apr 1 12:49:46 2015
X-Patchwork-Submitter: Boaz Harrosh
X-Patchwork-Id: 6140081
Message-ID: <551BE96A.9060603@plexistor.com>
Date: Wed, 01 Apr 2015 15:49:46 +0300
From: Boaz Harrosh
To: Christoph Hellwig
Cc: axboe@kernel.dk, linux-nvdimm@ml01.01.org, x86@kernel.org,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [Linux-nvdimm] another pmem variant V2
References: <1427358764-6126-1-git-send-email-hch@lst.de>
 <55143A8B.2060304@plexistor.com> <20150331092526.GA25958@lst.de>
 <551AB9C7.6020505@plexistor.com> <20150331161648.GA1318@lst.de>
In-Reply-To: <20150331161648.GA1318@lst.de>

On 03/31/2015 07:16 PM, Christoph Hellwig wrote:
> On Tue, Mar 31, 2015 at 06:14:15PM +0300, Boaz Harrosh wrote:
>> We can not accept it as is right now.
>
> Who is we?
>
>> We have conducted farther tests. And it messes up NUMA.
>
> Only you if you use the memmap option in weird ways.
>

No weird ways, just a single range. I would be very happy if you could
teach me the proper way; believe me, I have been trying for 2 days.

> Sounds like I should simply remove the memmap= option so people don't
> abuse it. The main point is to parse the e820 tables, which works fine.
>

What abuse? A single range, and the problem shows up. So you are just
papering over the problem, sir. The patch you are submitting has grave
problems, and covering them up by disallowing memmap=! will not fix
them. The kernel, as it is after your patch, does not like this
half-baked beast as we defined it; the defined pmem memory range messes
things up.

> And those people having fake pmem, or pcie cards that they are too lazy
> to submit proper drivers for can stick to their out of tree hacks?
>

I have now conducted more tests on a real type-12 DDR3 system, and the
exact same problem I reported exists with real type-12 chips!

And who are we kidding? Whether "memmap=!" is used or not makes no
difference at all. All it does is edit the table as if it were the
table the BIOS gave us. There is no extra processing done for memmap=.

Your e820 patch trashes NUMA.

But I have fixed it for good this time; the fix is below.

After I apply the patch below, everything boots and works just as
expected. All the problems I reported disappear. Any configuration, any
number of ranges, cross-NUMA or not, just works, exactly as before with
my patches.

The fix is on top of your V2; I will post one later that fixes your V3
and adds back the memmap=!

---
If you inspect my fix below you will see that the original patch was
too aggressive in making pmem look like RAM. In fact it started the
arch-side memory initialization and only skipped the generic memory
initialization. This messed up the internal real-memory structures.

With the fix below, block/drivers/pmem.ko loads just fine, finds its
resources and maps them, and everything just works.
Including any "abuse" of memmap=! and any NUMA configuration.
Both real HW (type-12 HW) and NUMA VMs.
(This is preliminary testing; we are running the full test rig as we speak.)

(I'm still going over it; I might send some more cleanups.)

Cheers
---
git diff --stat -p -M HEAD
 arch/x86/kernel/e820.c  |  7 ++++---
 arch/x86/kernel/pmem.c  | 17 -----------------
 arch/x86/kernel/setup.c |  2 --
 3 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 07246e5..dce0e84 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -963,9 +963,10 @@ void __init e820_reserve_resources(void)
 		 * pci device BAR resource and insert them later in
 		 * pcibios_resource_survey()
 		 */
-		if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) {
-			if (e820.map[i].type != E820_PRAM)
-				res->flags |= IORESOURCE_BUSY;
+		if (((e820.map[i].type != E820_RESERVED) &&
+		     (e820.map[i].type != E820_PRAM)) ||
+		     res->start < (1ULL<<20)) {
+			res->flags |= IORESOURCE_BUSY;
 			insert_resource(&iomem_resource, res);
 		}
 		res++;
diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c
index f970048..fcdbc20 100644
--- a/arch/x86/kernel/pmem.c
+++ b/arch/x86/kernel/pmem.c
@@ -9,23 +9,6 @@
 #include
 #include
 
-void __init reserve_pmem(void)
-{
-	int i;
-
-	for (i = 0; i < e820.nr_map; i++) {
-		struct e820entry *ei = &e820.map[i];
-
-		if (ei->type != E820_PRAM)
-			continue;
-
-		memblock_reserve(ei->addr, ei->addr + ei->size);
-		max_pfn_mapped = init_memory_mapping(
-				ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr,
-				ei->addr + ei->size);
-	}
-}
-
 static __init void register_pmem_device(struct resource *res)
 {
 	struct platform_device *pdev;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f2bed2b..0a2421c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1158,8 +1158,6 @@ void __init setup_arch(char **cmdline_p)
 
 	early_acpi_boot_init();
 
-	reserve_pmem();
-
 	initmem_init();
 	dma_contiguous_reserve(max_pfn_mapped << PAGE_SHIFT);
 
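
For anyone who wants to see the e820.c condition change in isolation,
here is a minimal userspace sketch. It is not kernel code: the
sketch_* names, the enum, and the result struct are hypothetical
stand-ins that I made up for illustration, and the enum values only
need to be distinct. It just records, per entry, whether the loop in
e820_reserve_resources() would call insert_resource() and whether it
would set IORESOURCE_BUSY, with the V2 condition and with the fixed
one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's E820_* constants (assumption:
 * only the distinction between the three kinds matters here). */
enum sketch_type { SKETCH_RAM = 1, SKETCH_RESERVED = 2, SKETCH_PRAM = 12 };

/* What happens to one e820 entry inside the resource loop. */
struct sketch_result {
	bool inserted;	/* insert_resource() would be called here */
	bool busy;	/* IORESOURCE_BUSY would be set */
};

#define SKETCH_1MB (1ULL << 20)

/* Condition as in V2 (the lines the patch removes). */
static struct sketch_result sketch_v2(enum sketch_type type, uint64_t start)
{
	struct sketch_result r = { false, false };

	if (type != SKETCH_RESERVED || start < SKETCH_1MB) {
		if (type != SKETCH_PRAM)
			r.busy = true;
		r.inserted = true;
	}
	return r;
}

/* Condition after the fix (the lines the patch adds). */
static struct sketch_result sketch_fixed(enum sketch_type type, uint64_t start)
{
	struct sketch_result r = { false, false };

	if ((type != SKETCH_RESERVED && type != SKETCH_PRAM) ||
	    start < SKETCH_1MB) {
		r.busy = true;
		r.inserted = true;
	}
	return r;
}

/* Hypothetical self-check: a pmem range starting at 4G, before and after. */
void sketch_self_check(void)
{
	struct sketch_result before = sketch_v2(SKETCH_PRAM, 1ULL << 32);
	struct sketch_result after = sketch_fixed(SKETCH_PRAM, 1ULL << 32);

	assert(before.inserted && !before.busy);
	assert(!after.inserted && !after.busy);
}
```

In this sketch a PRAM range above 1M goes from "inserted, not busy"
under V2 to "not inserted at this point", which is the same treatment
a reserved range gets. RAM entries behave identically in both versions.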