Message ID | HK2PR03MB168447008C658172FFDA402992840@HK2PR03MB1684.apcprd03.prod.outlook.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sun, May 06, 2018 at 04:17:06PM +0000, Huaisheng HS1 Ye wrote: > Upload my current patch and testing platform info for reference. This patch has been tested > on a two sockets platform. Thank you! > It works, but some drivers or subsystem shall be modified to fit > these new type __GFP flags. > They use these flags directly to realize bit manipulations like this > below. > > eg. > swiotlb-xen.c (drivers\xen): flags &= ~(__GFP_DMA | __GFP_HIGHMEM); > extent_io.c (fs\btrfs): mask &= ~(__GFP_DMA32|__GFP_HIGHMEM); > > Because of these flags have been encoded within this patch, the > above operations can cause problem. I don't think this actually causes problems. At least, no additional problems. These users will successfully clear __GFP_DMA and __GFP_HIGHMEM no matter what values GFP_DMA and GFP_HIGHMEM have; the only problem will be if someone calls them with a zone type they're not expecting (eg DMA32 for the first one or DMA for the second; or MOVABLE for either of them). The thing is, they're already buggy in those circumstances. > */ > -#define __GFP_DMA ((__force gfp_t)___GFP_DMA) > -#define __GFP_HIGHMEM ((__force gfp_t)___GFP_HIGHMEM) > -#define __GFP_DMA32 ((__force gfp_t)___GFP_DMA32) > +#define __GFP_DMA ((__force gfp_t)OPT_ZONE_DMA ^ ZONE_NORMAL) > +#define __GFP_HIGHMEM ((__force gfp_t)ZONE_MOVABLE ^ ZONE_NORMAL) > +#define __GFP_DMA32 ((__force gfp_t)OPT_ZONE_DMA32 ^ ZONE_NORMAL) > #define __GFP_MOVABLE ((__force gfp_t)___GFP_MOVABLE) /* ZONE_MOVABLE allowed */ [...] > static inline enum zone_type gfp_zone(gfp_t flags) > { > enum zone_type z; > - int bit = (__force int) (flags & GFP_ZONEMASK); > + z = ((__force unsigned int)flags & ___GFP_ZONE_MASK) ^ ZONE_NORMAL; > > - z = (GFP_ZONE_TABLE >> (bit * GFP_ZONES_SHIFT)) & > - ((1 << GFP_ZONES_SHIFT) - 1); > - VM_BUG_ON((GFP_ZONE_BAD >> bit) & 1); > + if (z > OPT_ZONE_HIGHMEM) { > + z = OPT_ZONE_HIGHMEM + > + !!((__force unsigned int)flags & ___GFP_MOVABLE); > + } > return z; > } How about: +#define __GFP_HIGHMEM ((__force gfp_t)OPT_ZONE_HIGHMEM ^ ZONE_NORMAL) -#define __GFP_MOVABLE ((__force gfp_t)___GFP_MOVABLE) /* ZONE_MOVABLE allowed */ +#define __GFP_MOVABLE ((__force gfp_t)ZONE_MOVABLE ^ ZONE_NORMAL | \ + ___GFP_MOVABLE) Then I think you can just make it: static inline enum zone_type gfp_zone(gfp_t flags) { return ((__force int)flags & ___GFP_ZONE_MASK) ^ ZONE_NORMAL; } > @@ -370,42 +368,15 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags) > #error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer > #endif You should be able to delete GFP_ZONES_SHIFT too.
Dear Matthew, I will try to explain them in depth. Correct me if anything wrong. > > On Sun, May 06, 2018 at 04:17:06PM +0000, Huaisheng HS1 Ye wrote: > > Upload my current patch and testing platform info for reference. This patch > has been tested > > on a two sockets platform. > > Thank you! My pleasure. > > It works, but some drivers or subsystem shall be modified to fit > > these new type __GFP flags. > > They use these flags directly to realize bit manipulations like this > > below. > > > > eg. > > swiotlb-xen.c (drivers\xen): flags &= ~(__GFP_DMA | __GFP_HIGHMEM); > > extent_io.c (fs\btrfs): mask &= ~(__GFP_DMA32|__GFP_HIGHMEM); > > > > Because of these flags have been encoded within this patch, the > > above operations can cause problem. > > I don't think this actually causes problems. At least, no additional > problems. These users will successfully clear __GFP_DMA and > __GFP_HIGHMEM > no matter what values GFP_DMA and GFP_HIGHMEM have; the only problem > will be if someone calls them with a zone type they're not expecting (eg DMA32 > for the first one or DMA for the second; or MOVABLE for either of them). > The thing is, they're already buggy in those circumstances. I hope it couldn't cause problem, but based on my analyzation it has the potential to go wrong if users still use the flags as usual, which are __GFP_DMA, __GFP_DMA32 and __GFP_HIGHMEM. Let me take an example with my testing platform, these logics are much abstract, an example will be helpful. There is a two sockets X86_64 server, No HIGHMEM and it has 16 + 16GB memories. Its zone types shall be like this below, ZONE_DMA 0 0b0000 ZONE_DMA32 1 0b0001 ZONE_NORMAL 2 0b0010 (OPT_ZONE_HIGHMEM) 2 0b0010 ZONE_MOVABLE 3 0b0011 ZONE_DEVICE 4 0b0100 (virtual zone) __MAX_NR_ZONES 5 __GFP_DMA = ZONE_DMA ^ ZONE_NORMAL= 0b0010 __GFP_DMA32 = ZONE_DMA32 ^ ZONE_NORMAL= 0b0011 __GFP_HIGHMEM = OPT_ZONE_HIGHMEM ^ ZONE_NORMAL = 0b0000 __GFP_MOVABLE = ZONE_MOVABLE ^ ZONE_NORMAL | ___GFP_MOVABLE = 0b1001 Eg. If a driver uses flags like this below, Step 1: gfp_mask | __GFP_DMA32; (0b 0000 | 0b 0011 = 0b 0011) gfp_mask's low four bits shall equal to 0011, assuming no __GFP_MOVABLE Step 2: gfp_mask & ~__GFP_DMA; (0b 0011 & ~0b0010 = 0b0001) gfp_mask's low four bits shall equal to 0001 now, then when it enter gfp_zone(), return ((__force int)flags & ___GFP_ZONE_MASK) ^ ZONE_NORMAL; (0b0001 ^ 0b0010 = 0b0011) You know 0011 means that ZONE_MOVABLE will be returned. In this case, error can be found, because gfp_mask needs to get ZONE_DMA32 originally. But with existing GFP_ZONE_TABLE/BAD, it is correct. Because the bits are way of 0x1, 0x2, 0x4, 0x8 I just want to show a case of failure, please don't blame me that use case was invented. Again, your idea is great in my eyes, which has much advantages than ZONE_TABLE/BAD. But if we use this idea, that means other subsystem or driver shall not use the flags as existing way. Of course, this limitation only exists in low 3 bits of gfp_t. The remaining high bits can be used as usual. This is my opinion, maybe it is not accurate, but I really worry about it. > > */ > > -#define __GFP_DMA ((__force gfp_t)___GFP_DMA) > > -#define __GFP_HIGHMEM ((__force gfp_t)___GFP_HIGHMEM) > > -#define __GFP_DMA32 ((__force gfp_t)___GFP_DMA32) > > +#define __GFP_DMA ((__force gfp_t)OPT_ZONE_DMA ^ > ZONE_NORMAL) > > +#define __GFP_HIGHMEM ((__force gfp_t)ZONE_MOVABLE ^ > ZONE_NORMAL) > > +#define __GFP_DMA32 ((__force gfp_t)OPT_ZONE_DMA32 ^ > ZONE_NORMAL) > > #define __GFP_MOVABLE ((__force gfp_t)___GFP_MOVABLE) /* > ZONE_MOVABLE allowed */ > [...] > > static inline enum zone_type gfp_zone(gfp_t flags) > > { > > enum zone_type z; > > - int bit = (__force int) (flags & GFP_ZONEMASK); > > + z = ((__force unsigned int)flags & ___GFP_ZONE_MASK) ^ > ZONE_NORMAL; > > > > - z = (GFP_ZONE_TABLE >> (bit * GFP_ZONES_SHIFT)) & > > - ((1 << GFP_ZONES_SHIFT) - 1); > > - VM_BUG_ON((GFP_ZONE_BAD >> bit) & 1); > > + if (z > OPT_ZONE_HIGHMEM) { > > + z = OPT_ZONE_HIGHMEM + > > + !!((__force unsigned int)flags & ___GFP_MOVABLE); > > + } > > return z; > > } > > How about: > > +#define __GFP_HIGHMEM ((__force gfp_t)OPT_ZONE_HIGHMEM ^ > ZONE_NORMAL) > -#define __GFP_MOVABLE ((__force gfp_t)___GFP_MOVABLE) /* > ZONE_MOVABLE allowed */ > +#define __GFP_MOVABLE ((__force gfp_t)ZONE_MOVABLE ^ > ZONE_NORMAL | \ > + ___GFP_MOVABLE) > > Then I think you can just make it: > > static inline enum zone_type gfp_zone(gfp_t flags) > { > return ((__force int)flags & ___GFP_ZONE_MASK) ^ ZONE_NORMAL; > } Sorry, I think it has risk in this way, let me introduce a failure case for example. Now suppose that, there is a flag should represent DMA flag with movable. It should be like this below, __GFP_DMA | __GFP_MOVABLE (0b 0010 | 0b 1001 = 0b 1011) Normally, gfp_zone shall return ZONE_DMA but with MOVABLE policy, right? But with your code, gfp_zone will return ZONE_DMA32 with MOVABLE policy. (0b 1011 ^ 0b 0010 = 1001) You can find that something wrong happens, so that is why I make gfp_zone more complicated than yours. > > @@ -370,42 +368,15 @@ static inline bool gfpflags_allow_blocking(const > gfp_t gfp_flags) > > #error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer > > #endif > > You should be able to delete GFP_ZONES_SHIFT too. Yes, you are right. Sincerely, Huaisheng Ye | εΆζθ Linux kernel | Lenovo
On Mon, May 07, 2018 at 05:16:50PM +0000, Huaisheng HS1 Ye wrote: > I hope it couldn't cause problem, but based on my analyzation it has the potential to go wrong if users still use the flags as usual, which are __GFP_DMA, __GFP_DMA32 and __GFP_HIGHMEM. > Let me take an example with my testing platform, these logics are much abstract, an example will be helpful. > > There is a two sockets X86_64 server, No HIGHMEM and it has 16 + 16GB memories. > Its zone types shall be like this below, > > ZONE_DMA 0 0b0000 > ZONE_DMA32 1 0b0001 > ZONE_NORMAL 2 0b0010 > (OPT_ZONE_HIGHMEM) 2 0b0010 > ZONE_MOVABLE 3 0b0011 > ZONE_DEVICE 4 0b0100 (virtual zone) > __MAX_NR_ZONES 5 > > __GFP_DMA = ZONE_DMA ^ ZONE_NORMAL= 0b0010 > __GFP_DMA32 = ZONE_DMA32 ^ ZONE_NORMAL= 0b0011 > __GFP_HIGHMEM = OPT_ZONE_HIGHMEM ^ ZONE_NORMAL = 0b0000 > __GFP_MOVABLE = ZONE_MOVABLE ^ ZONE_NORMAL | ___GFP_MOVABLE = 0b1001 > > Eg. > If a driver uses flags like this below, > Step 1: > gfp_mask | __GFP_DMA32; > (0b 0000 | 0b 0011 = 0b 0011) > gfp_mask's low four bits shall equal to 0011, assuming no __GFP_MOVABLE > > Step 2: > gfp_mask & ~__GFP_DMA; > (0b 0011 & ~0b0010 = 0b0001) > gfp_mask's low four bits shall equal to 0001 now, then when it enter gfp_zone(), > > return ((__force int)flags & ___GFP_ZONE_MASK) ^ ZONE_NORMAL; > (0b0001 ^ 0b0010 = 0b0011) > You know 0011 means that ZONE_MOVABLE will be returned. > In this case, error can be found, because gfp_mask needs to get ZONE_DMA32 originally. > But with existing GFP_ZONE_TABLE/BAD, it is correct. Because the bits are way of 0x1, 0x2, 0x4, 0x8 Yes, I understand your point here. My point was that this was already a bug; the caller shouldn't simply be clearing __GFP_DMA; they really mean to clear all of the GFP_ZONE bits so that they allocate from ZONE_NORMAL. And for that, they should be using ~GFP_ZONEMASK Unless they already know, of course. For example, this one in arch/x86/mm/pgtable.c is fine: if (strcmp(arg, "nohigh") == 0) __userpte_alloc_gfp &= ~__GFP_HIGHMEM; because it knows that __userpte_alloc_gfp can only have __GFP_HIGHMEM set. But something like btrfs should almost certainly be using ~GFP_ZONEMASK. > > +#define __GFP_HIGHMEM ((__force gfp_t)OPT_ZONE_HIGHMEM ^ > > ZONE_NORMAL) > > -#define __GFP_MOVABLE ((__force gfp_t)___GFP_MOVABLE) /* > > ZONE_MOVABLE allowed */ > > +#define __GFP_MOVABLE ((__force gfp_t)ZONE_MOVABLE ^ > > ZONE_NORMAL | \ > > + ___GFP_MOVABLE) > > > > Then I think you can just make it: > > > > static inline enum zone_type gfp_zone(gfp_t flags) > > { > > return ((__force int)flags & ___GFP_ZONE_MASK) ^ ZONE_NORMAL; > > } > Sorry, I think it has risk in this way, let me introduce a failure case for example. > > Now suppose that, there is a flag should represent DMA flag with movable. > It should be like this below, > __GFP_DMA | __GFP_MOVABLE > (0b 0010 | 0b 1001 = 0b 1011) > Normally, gfp_zone shall return ZONE_DMA but with MOVABLE policy, right? No, if you somehow end up with __GFP_MOVABLE | __GFP_DMA, it should give you ZONE_DMA. > But with your code, gfp_zone will return ZONE_DMA32 with MOVABLE policy. > (0b 1011 ^ 0b 0010 = 1001) ___GFP_ZONE_MASK is 0x7, so it excludes __GFP_MOVABLE.
On Mon, May 07, 2018 at 11:44:10AM -0700, Matthew Wilcox wrote: > On Mon, May 07, 2018 at 05:16:50PM +0000, Huaisheng HS1 Ye wrote: > > I hope it couldn't cause problem, but based on my analyzation it has the potential to go wrong if users still use the flags as usual, which are __GFP_DMA, __GFP_DMA32 and __GFP_HIGHMEM. > > Let me take an example with my testing platform, these logics are much abstract, an example will be helpful. > > > > There is a two sockets X86_64 server, No HIGHMEM and it has 16 + 16GB memories. > > Its zone types shall be like this below, > > > > ZONE_DMA 0 0b0000 > > ZONE_DMA32 1 0b0001 > > ZONE_NORMAL 2 0b0010 > > (OPT_ZONE_HIGHMEM) 2 0b0010 > > ZONE_MOVABLE 3 0b0011 > > ZONE_DEVICE 4 0b0100 (virtual zone) > > __MAX_NR_ZONES 5 > > > > __GFP_DMA = ZONE_DMA ^ ZONE_NORMAL= 0b0010 > > __GFP_DMA32 = ZONE_DMA32 ^ ZONE_NORMAL= 0b0011 > > __GFP_HIGHMEM = OPT_ZONE_HIGHMEM ^ ZONE_NORMAL = 0b0000 > > __GFP_MOVABLE = ZONE_MOVABLE ^ ZONE_NORMAL | ___GFP_MOVABLE = 0b1001 > > > > Eg. > > If a driver uses flags like this below, > > Step 1: > > gfp_mask | __GFP_DMA32; > > (0b 0000 | 0b 0011 = 0b 0011) > > gfp_mask's low four bits shall equal to 0011, assuming no __GFP_MOVABLE > > > > Step 2: > > gfp_mask & ~__GFP_DMA; > > (0b 0011 & ~0b0010 = 0b0001) > > gfp_mask's low four bits shall equal to 0001 now, then when it enter gfp_zone(), > > > > return ((__force int)flags & ___GFP_ZONE_MASK) ^ ZONE_NORMAL; > > (0b0001 ^ 0b0010 = 0b0011) > > You know 0011 means that ZONE_MOVABLE will be returned. > > In this case, error can be found, because gfp_mask needs to get ZONE_DMA32 originally. > > But with existing GFP_ZONE_TABLE/BAD, it is correct. Because the bits are way of 0x1, 0x2, 0x4, 0x8 > > Yes, I understand your point here. My point was that this was already a bug; > the caller shouldn't simply be clearing __GFP_DMA; they really mean to clear > all of the GFP_ZONE bits so that they allocate from ZONE_NORMAL. And for > that, they should be using ~GFP_ZONEMASK > > Unless they already know, of course. For example, this one in > arch/x86/mm/pgtable.c is fine: > > if (strcmp(arg, "nohigh") == 0) > __userpte_alloc_gfp &= ~__GFP_HIGHMEM; > > because it knows that __userpte_alloc_gfp can only have __GFP_HIGHMEM set. > > But something like btrfs should almost certainly be using ~GFP_ZONEMASK. Agreed, the direct use of __GFP_DMA32 was added in 3ba7ab220e8918176c6f to substitute GFP_NOFS, so the allocation flags are less restrictive but still acceptable for allocation from slab. The requirement from btrfs is to avoid highmem, the 'must be acceptable for slab' requirement is more MM internal and should have been hidden under some opaque flag mask. There was no strong need for that at the time.
> On Mon, May 07, 2018 at 05:16:50PM +0000, Huaisheng HS1 Ye wrote: > > I hope it couldn't cause problem, but based on my analyzation it has the > potential to go wrong if users still use the flags as usual, which are __GFP_DMA, > __GFP_DMA32 and __GFP_HIGHMEM. > > Let me take an example with my testing platform, these logics are much > abstract, an example will be helpful. > > > > There is a two sockets X86_64 server, No HIGHMEM and it has 16 + 16GB > memories. > > Its zone types shall be like this below, > > > > ZONE_DMA 0 0b0000 > > ZONE_DMA32 1 0b0001 > > ZONE_NORMAL 2 0b0010 > > (OPT_ZONE_HIGHMEM) 2 0b0010 > > ZONE_MOVABLE 3 0b0011 > > ZONE_DEVICE 4 0b0100 (virtual zone) > > __MAX_NR_ZONES 5 > > > > __GFP_DMA = ZONE_DMA ^ ZONE_NORMAL= 0b0010 > > __GFP_DMA32 = ZONE_DMA32 ^ ZONE_NORMAL= 0b0011 > > __GFP_HIGHMEM = OPT_ZONE_HIGHMEM ^ ZONE_NORMAL = 0b0000 > > __GFP_MOVABLE = ZONE_MOVABLE ^ ZONE_NORMAL | > ___GFP_MOVABLE = 0b1001 > > > > Eg. > > If a driver uses flags like this below, > > Step 1: > > gfp_mask | __GFP_DMA32; > > (0b 0000 | 0b 0011 = 0b 0011) > > gfp_mask's low four bits shall equal to 0011, assuming no __GFP_MOVABLE > > > > Step 2: > > gfp_mask & ~__GFP_DMA; > > (0b 0011 & ~0b0010 = 0b0001) > > gfp_mask's low four bits shall equal to 0001 now, then when it enter > gfp_zone(), > > > > return ((__force int)flags & ___GFP_ZONE_MASK) ^ ZONE_NORMAL; > > (0b0001 ^ 0b0010 = 0b0011) > > You know 0011 means that ZONE_MOVABLE will be returned. > > In this case, error can be found, because gfp_mask needs to get > ZONE_DMA32 originally. > > But with existing GFP_ZONE_TABLE/BAD, it is correct. Because the bits are > way of 0x1, 0x2, 0x4, 0x8 > > Yes, I understand your point here. My point was that this was already a bug; > the caller shouldn't simply be clearing __GFP_DMA; they really mean to clear > all of the GFP_ZONE bits so that they allocate from ZONE_NORMAL. And for > that, they should be using ~GFP_ZONEMASK That is great, if they can follow this principle, I don't worry it. Maybe I am too cautious. > > Unless they already know, of course. For example, this one in > arch/x86/mm/pgtable.c is fine: > > if (strcmp(arg, "nohigh") == 0) > __userpte_alloc_gfp &= ~__GFP_HIGHMEM; > > because it knows that __userpte_alloc_gfp can only have __GFP_HIGHMEM set. > > But something like btrfs should almost certainly be using ~GFP_ZONEMASK. > > > +#define __GFP_HIGHMEM ((__force gfp_t)OPT_ZONE_HIGHMEM ^ > > > ZONE_NORMAL) > > > -#define __GFP_MOVABLE ((__force gfp_t)___GFP_MOVABLE) /* > > > ZONE_MOVABLE allowed */ > > > +#define __GFP_MOVABLE ((__force gfp_t)ZONE_MOVABLE ^ > > > ZONE_NORMAL | \ > > > + ___GFP_MOVABLE) > > > > > > Then I think you can just make it: > > > > > > static inline enum zone_type gfp_zone(gfp_t flags) > > > { > > > return ((__force int)flags & ___GFP_ZONE_MASK) ^ ZONE_NORMAL; > > > } > > Sorry, I think it has risk in this way, let me introduce a failure case for > example. > > > > Now suppose that, there is a flag should represent DMA flag with movable. > > It should be like this below, > > __GFP_DMA | __GFP_MOVABLE > > (0b 0010 | 0b 1001 = 0b 1011) > > Normally, gfp_zone shall return ZONE_DMA but with MOVABLE policy, right? > > No, if you somehow end up with __GFP_MOVABLE | __GFP_DMA, it should give > you ZONE_DMA. Exactly, it should return ZONE_DMA, that's what I thought. > > > But with your code, gfp_zone will return ZONE_DMA32 with MOVABLE > >policy. > > (0b 1011 ^ 0b 0010 = 1001) > > ___GFP_ZONE_MASK is 0x7, so it excludes __GFP_MOVABLE. Sorry, I made a mistake here. I rewrite it as below. ((__GFP_DMA | __GFP_MOVABLE) & ___GFP_ZONE_MASK) ((0b 0010 | 0b 1001 = 0b 1011) & 0b 0111) = 0b 0011 0b 0011 ^ 0b 0010 = 0b 0001 So ZONE_DMA32 will be returned, but what user needs is ZONE_DMA. Thanks, Huaisheng
> On Mon, May 07, 2018 at 11:44:10AM -0700, Matthew Wilcox wrote: > > On Mon, May 07, 2018 at 05:16:50PM +0000, Huaisheng HS1 Ye wrote: > > > I hope it couldn't cause problem, but based on my analyzation it has the potential > to go wrong if users still use the flags as usual, which are __GFP_DMA, __GFP_DMA32 > and __GFP_HIGHMEM. > > > Let me take an example with my testing platform, these logics are much abstract, > an example will be helpful. > > > > > > There is a two sockets X86_64 server, No HIGHMEM and it has 16 + 16GB memories. > > > Its zone types shall be like this below, > > > > > > ZONE_DMA 0 0b0000 > > > ZONE_DMA32 1 0b0001 > > > ZONE_NORMAL 2 0b0010 > > > (OPT_ZONE_HIGHMEM) 2 0b0010 > > > ZONE_MOVABLE 3 0b0011 > > > ZONE_DEVICE 4 0b0100 (virtual zone) > > > __MAX_NR_ZONES 5 > > > > > > __GFP_DMA = ZONE_DMA ^ ZONE_NORMAL= 0b0010 > > > __GFP_DMA32 = ZONE_DMA32 ^ ZONE_NORMAL= 0b0011 > > > __GFP_HIGHMEM = OPT_ZONE_HIGHMEM ^ ZONE_NORMAL = 0b0000 > > > __GFP_MOVABLE = ZONE_MOVABLE ^ ZONE_NORMAL | ___GFP_MOVABLE = 0b1001 > > > > > > Eg. > > > If a driver uses flags like this below, > > > Step 1: > > > gfp_mask | __GFP_DMA32; > > > (0b 0000 | 0b 0011 = 0b 0011) > > > gfp_mask's low four bits shall equal to 0011, assuming no __GFP_MOVABLE > > > > > > Step 2: > > > gfp_mask & ~__GFP_DMA; > > > (0b 0011 & ~0b0010 = 0b0001) > > > gfp_mask's low four bits shall equal to 0001 now, then when it enter gfp_zone(), > > > > > > return ((__force int)flags & ___GFP_ZONE_MASK) ^ ZONE_NORMAL; > > > (0b0001 ^ 0b0010 = 0b0011) > > > You know 0011 means that ZONE_MOVABLE will be returned. > > > In this case, error can be found, because gfp_mask needs to get ZONE_DMA32 originally. > > > But with existing GFP_ZONE_TABLE/BAD, it is correct. Because the bits are way of > 0x1, 0x2, 0x4, 0x8 > > > > Yes, I understand your point here. My point was that this was already a bug; > > the caller shouldn't simply be clearing __GFP_DMA; they really mean to clear > > all of the GFP_ZONE bits so that they allocate from ZONE_NORMAL. And for > > that, they should be using ~GFP_ZONEMASK > > > > Unless they already know, of course. For example, this one in > > arch/x86/mm/pgtable.c is fine: > > > > if (strcmp(arg, "nohigh") == 0) > > __userpte_alloc_gfp &= ~__GFP_HIGHMEM; > > > > because it knows that __userpte_alloc_gfp can only have __GFP_HIGHMEM set. > > > > But something like btrfs should almost certainly be using ~GFP_ZONEMASK. > > Agreed, the direct use of __GFP_DMA32 was added in 3ba7ab220e8918176c6f > to substitute GFP_NOFS, so the allocation flags are less restrictive but > still acceptable for allocation from slab. > > The requirement from btrfs is to avoid highmem, the 'must be acceptable > for slab' requirement is more MM internal and should have been hidden > under some opaque flag mask. There was no strong need for that at the > time. Hi Matthew, Should we add an error detection in gfp_zone? How about this? @@ -377,6 +377,8 @@ static inline enum zone_type gfp_zone(gfp_t flags) z = OPT_ZONE_HIGHMEM + !!((__force unsigned int)flags & ___GFP_MOVABLE); } + + VM_BUG_ON(z > ZONE_MOVABLE); return z; } Sincerely, Huaisheng Ye
diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 1a4582b..1647385 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -16,9 +16,7 @@ */ /* Plain integer GFP bitmasks. Do not use this directly. */ -#define ___GFP_DMA 0x01u -#define ___GFP_HIGHMEM 0x02u -#define ___GFP_DMA32 0x04u +#define ___GFP_ZONE_MASK 0x07u #define ___GFP_MOVABLE 0x08u #define ___GFP_RECLAIMABLE 0x10u #define ___GFP_HIGH 0x20u @@ -53,11 +51,11 @@ * without the underscores and use them consistently. The definitions here may * be used in bit comparisons. */ -#define __GFP_DMA ((__force gfp_t)___GFP_DMA) -#define __GFP_HIGHMEM ((__force gfp_t)___GFP_HIGHMEM) -#define __GFP_DMA32 ((__force gfp_t)___GFP_DMA32) +#define __GFP_DMA ((__force gfp_t)OPT_ZONE_DMA ^ ZONE_NORMAL) +#define __GFP_HIGHMEM ((__force gfp_t)ZONE_MOVABLE ^ ZONE_NORMAL) +#define __GFP_DMA32 ((__force gfp_t)OPT_ZONE_DMA32 ^ ZONE_NORMAL) #define __GFP_MOVABLE ((__force gfp_t)___GFP_MOVABLE) /* ZONE_MOVABLE allowed */ -#define GFP_ZONEMASK (__GFP_DMA|__GFP_HIGHMEM|__GFP_DMA32|__GFP_MOVABLE) +#define GFP_ZONEMASK ((__force gfp_t)___GFP_ZONE_MASK | ___GFP_MOVABLE) /* * Page mobility and placement hints @@ -370,42 +368,15 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags) #error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer #endif -#define GFP_ZONE_TABLE ( \ - (ZONE_NORMAL << 0 * GFP_ZONES_SHIFT) \ - | (OPT_ZONE_DMA << ___GFP_DMA * GFP_ZONES_SHIFT) \ - | (OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * GFP_ZONES_SHIFT) \ - | (OPT_ZONE_DMA32 << ___GFP_DMA32 * GFP_ZONES_SHIFT) \ - | (ZONE_NORMAL << ___GFP_MOVABLE * GFP_ZONES_SHIFT) \ - | (OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * GFP_ZONES_SHIFT) \ - | (ZONE_MOVABLE << (___GFP_MOVABLE | ___GFP_HIGHMEM) * GFP_ZONES_SHIFT)\ - | (OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * GFP_ZONES_SHIFT)\ -) - -/* - * GFP_ZONE_BAD is a bitmap for all combinations of __GFP_DMA, __GFP_DMA32 - * __GFP_HIGHMEM and __GFP_MOVABLE that are not permitted. One flag per - * entry starting with bit 0. Bit is set if the combination is not - * allowed. - */ -#define GFP_ZONE_BAD ( \ - 1 << (___GFP_DMA | ___GFP_HIGHMEM) \ - | 1 << (___GFP_DMA | ___GFP_DMA32) \ - | 1 << (___GFP_DMA32 | ___GFP_HIGHMEM) \ - | 1 << (___GFP_DMA | ___GFP_DMA32 | ___GFP_HIGHMEM) \ - | 1 << (___GFP_MOVABLE | ___GFP_HIGHMEM | ___GFP_DMA) \ - | 1 << (___GFP_MOVABLE | ___GFP_DMA32 | ___GFP_DMA) \ - | 1 << (___GFP_MOVABLE | ___GFP_DMA32 | ___GFP_HIGHMEM) \ - | 1 << (___GFP_MOVABLE | ___GFP_DMA32 | ___GFP_DMA | ___GFP_HIGHMEM) \ -) - static inline enum zone_type gfp_zone(gfp_t flags) { enum zone_type z; - int bit = (__force int) (flags & GFP_ZONEMASK); + z = ((__force unsigned int)flags & ___GFP_ZONE_MASK) ^ ZONE_NORMAL; - z = (GFP_ZONE_TABLE >> (bit * GFP_ZONES_SHIFT)) & - ((1 << GFP_ZONES_SHIFT) - 1); - VM_BUG_ON((GFP_ZONE_BAD >> bit) & 1); + if (z > OPT_ZONE_HIGHMEM) { + z = OPT_ZONE_HIGHMEM + + !!((__force unsigned int)flags & ___GFP_MOVABLE); + } return z; }