diff mbox

md/dm-ioctl.c: optimize memory allocation in copy_params

Message ID 53B68068.2060102@intel.com (mailing list archive)
State Rejected, archived
Delegated to: Mike Snitzer
Headers show

Commit Message

xinhui.pan July 4, 2014, 10:22 a.m. UTC
KMALLOC_MAX_SIZE is too big, it easily cause memory fragmented and low memory when param_kernel->data_size
is also big. That's not nice. So use vmalloc instead of kmalloc when size is larger than (PAGE_SIZE << 2).
There are other drivers using kmalloc to gain a big size memory. And that cause our devices to hit hang and
many worse issues. We don't benefit much when using kmalloc in such scenario.

Signed-off-by: xinhui.pan <xinhuiX.pan@intel.com>
Signed-off-by: yanmin.zhang <yanmin_zhang@linux.intel.com>
---
 drivers/md/dm-ioctl.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Mikulas Patocka July 8, 2014, 10:39 p.m. UTC | #1
Hi

I don't really know what is the purpose of this patch. In existing device 
mapper code, if kmalloc fails, the allocation is retried with __vmalloc. 
So there is no need to avoid kmalloc aritifically.

kmalloc doesn't cause memory fragmentation. If the memory is too 
fragmented, kmalloc fails. If it isn't, it succeeds. But it doesn't cause 
memory being fragmented.

If you have some other driver that fails because of large kmalloc failure, 
you should fix it to use scatter-gather DMA or (if the hardware can't do 
it) preallocate the memory when the driver is loaded.

Mikulas



On Fri, 4 Jul 2014, xinhui.pan wrote:

> KMALLOC_MAX_SIZE is too big, it easily cause memory fragmented and low memory when param_kernel->data_size
> is also big. That's not nice. So use vmalloc instead of kmalloc when size is larger than (PAGE_SIZE << 2).
> There are other drivers using kmalloc to gain a big size memory. And that cause our devices to hit hang and
> many worse issues. We don't benefit much when using kmalloc in such scenario.
> 
> Signed-off-by: xinhui.pan <xinhuiX.pan@intel.com>
> Signed-off-by: yanmin.zhang <yanmin_zhang@linux.intel.com>
> ---
>  drivers/md/dm-ioctl.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
> index 5152142..31c3af9 100644
> --- a/drivers/md/dm-ioctl.c
> +++ b/drivers/md/dm-ioctl.c
> @@ -1709,7 +1709,7 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern
>          * Use kmalloc() rather than vmalloc() when we can.
>          */
>         dmi = NULL;
> -       if (param_kernel->data_size <= KMALLOC_MAX_SIZE) {
> +       if (param_kernel->data_size <= (PAGE_SIZE << 2)) {
>                 dmi = kmalloc(param_kernel->data_size, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
>                 if (dmi)
>                         *param_flags |= DM_PARAMS_KMALLOC;
> -- 
> 1.7.9.5
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Yanmin Zhang July 9, 2014, 2:01 a.m. UTC | #2
On 2014/7/9 6:39, Mikulas Patocka wrote:

> Hi

Mikulas,

Thanks for your kind comments.

> I don't really know what is the purpose of this patch. In existing device
> mapper code, if kmalloc fails, the allocation is retried with __vmalloc.
> So there is no need to avoid kmalloc aritifically.
>
> kmalloc doesn't cause memory fragmentation. If the memory is too
> fragmented, kmalloc fails. If it isn't, it succeeds. But it doesn't cause
> memory being fragmented.

I agree with you. The patch's original description is not appropriate.
Basically, memory fragmentation is not caused by this kmalloc.

The right description is: When memory is fragmented and most memory is used up,
kmalloc a big memory might cause lots of OutOFMemory and system might kill
lots of processes. Then, system might hang.

We are enabling Android on mobiles and tablets. We hit OOM often at
Monkey and other stress testing. Sometimes, kernel prints big memory allocation
warning.

Theoretically, it's a good idea to call kmalloc firstly. If it fails, use vmalloc.
However, kernel assumes drivers do the right thing. When drivers call kmalloc to
allocate a big memory, memory management would do the best to fulfill it. When memory
is fragmented and most memory is used up, such allocation would cause big memory
recollection. Some processes might be killed. The worse scenario is after a process
is killed, it is restarted and allocates memory again. That causes system hang,
or mobile users feel a big stall. We did fix a similar issue in one of our drivers.

Usually, kernel and drivers can allocate big continuous memory at booting or
initialization stage. After that, they need allocate small order memory. The best
is to just allocate order-0 memory.


>
> If you have some other driver that fails because of large kmalloc failure,
> you should fix it to use scatter-gather DMA or (if the hardware can't do
> it) preallocate the memory when the driver is loaded.

I agree with you. Here the patch fixes the issue, where dm might allocate
big continuous memory after booting. Monkey testing triggers it by operating
menu Setting=>Security=>InstallfromSDcard.

We are catching all places in kernel where big memory allocation happens.
This patch is just to fix one case.

Thanks,
Yanmin

>
> Mikulas
>
>
>
> On Fri, 4 Jul 2014, xinhui.pan wrote:
>
>> KMALLOC_MAX_SIZE is too big, it easily cause memory fragmented and low memory when param_kernel->data_size
>> is also big. That's not nice. So use vmalloc instead of kmalloc when size is larger than (PAGE_SIZE << 2).
>> There are other drivers using kmalloc to gain a big size memory. And that cause our devices to hit hang and
>> many worse issues. We don't benefit much when using kmalloc in such scenario.
>>
>> Signed-off-by: xinhui.pan <xinhuiX.pan@intel.com>
>> Signed-off-by: yanmin.zhang <yanmin_zhang@linux.intel.com>
>> ---
>>   drivers/md/dm-ioctl.c |    2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
>> index 5152142..31c3af9 100644
>> --- a/drivers/md/dm-ioctl.c
>> +++ b/drivers/md/dm-ioctl.c
>> @@ -1709,7 +1709,7 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern
>>           * Use kmalloc() rather than vmalloc() when we can.
>>           */
>>          dmi = NULL;
>> -       if (param_kernel->data_size <= KMALLOC_MAX_SIZE) {
>> +       if (param_kernel->data_size <= (PAGE_SIZE << 2)) {
>>                  dmi = kmalloc(param_kernel->data_size, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
>>                  if (dmi)
>>                          *param_flags |= DM_PARAMS_KMALLOC;
>> -- 
>> 1.7.9.5
>>
>> --
>> dm-devel mailing list
>> dm-devel@redhat.com
>> https://www.redhat.com/mailman/listinfo/dm-devel
>>

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
xinhui.pan July 9, 2014, 3:37 a.m. UTC | #3
? 2014?07?09? 10:01, Zhang, Yanmin ??:
> On 2014/7/9 6:39, Mikulas Patocka wrote:
> 
>> Hi
> 
> Mikulas,
> 
> Thanks for your kind comments.
> 
>> I don't really know what is the purpose of this patch. In existing device
>> mapper code, if kmalloc fails, the allocation is retried with __vmalloc.
>> So there is no need to avoid kmalloc aritifically.
>>
>> kmalloc doesn't cause memory fragmentation. If the memory is too
>> fragmented, kmalloc fails. If it isn't, it succeeds. But it doesn't cause
>> memory being fragmented.
> 
> I agree with you. The patch's original description is not appropriate.
> Basically, memory fragmentation is not caused by this kmalloc.
> 
> The right description is: When memory is fragmented and most memory is used up,
> kmalloc a big memory might cause lots of OutOFMemory and system might kill
> lots of processes. Then, system might hang.
> 
> We are enabling Android on mobiles and tablets. We hit OOM often at
> Monkey and other stress testing. Sometimes, kernel prints big memory allocation
> warning.
> 
> Theoretically, it's a good idea to call kmalloc firstly. If it fails, use vmalloc.
> However, kernel assumes drivers do the right thing. When drivers call kmalloc to
> allocate a big memory, memory management would do the best to fulfill it. When memory
> is fragmented and most memory is used up, such allocation would cause big memory
> recollection. Some processes might be killed. The worse scenario is after a process
> is killed, it is restarted and allocates memory again. That causes system hang,
> or mobile users feel a big stall. We did fix a similar issue in one of our drivers.
> 
> Usually, kernel and drivers can allocate big continuous memory at booting or
> initialization stage. After that, they need allocate small order memory. The best
> is to just allocate order-0 memory.
> 
> 
>>
>> If you have some other driver that fails because of large kmalloc failure,
>> you should fix it to use scatter-gather DMA or (if the hardware can't do
>> it) preallocate the memory when the driver is loaded.
> 
> I agree with you. Here the patch fixes the issue, where dm might allocate
> big continuous memory after booting. Monkey testing triggers it by operating
> menu Setting=>Security=>InstallfromSDcard.
> 

hi, Yanmin && Mikulas
	Thanks for your nice comments. And sorry for confusing you as my comment is not clear enough.
In android, there is a command "dumpstate", it run many other commands to collect information. In our scenario,
it run command "vdc dump", and vdc uses socket to pass some parameters to "vold", then vold generates ioctl. 

thanks.
> We are catching all places in kernel where big memory allocation happens.
> This patch is just to fix one case.
> 
> Thanks,
> Yanmin
> 
>>
>> Mikulas
>>
>>
>>
>> On Fri, 4 Jul 2014, xinhui.pan wrote:
>>
>>> KMALLOC_MAX_SIZE is too big, it easily cause memory fragmented and low memory when param_kernel->data_size
>>> is also big. That's not nice. So use vmalloc instead of kmalloc when size is larger than (PAGE_SIZE << 2).
>>> There are other drivers using kmalloc to gain a big size memory. And that cause our devices to hit hang and
>>> many worse issues. We don't benefit much when using kmalloc in such scenario.
>>>
>>> Signed-off-by: xinhui.pan <xinhuiX.pan@intel.com>
>>> Signed-off-by: yanmin.zhang <yanmin_zhang@linux.intel.com>
>>> ---
>>>   drivers/md/dm-ioctl.c |    2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
>>> index 5152142..31c3af9 100644
>>> --- a/drivers/md/dm-ioctl.c
>>> +++ b/drivers/md/dm-ioctl.c
>>> @@ -1709,7 +1709,7 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern
>>>           * Use kmalloc() rather than vmalloc() when we can.
>>>           */
>>>          dmi = NULL;
>>> -       if (param_kernel->data_size <= KMALLOC_MAX_SIZE) {
>>> +       if (param_kernel->data_size <= (PAGE_SIZE << 2)) {
>>>                  dmi = kmalloc(param_kernel->data_size, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
>>>                  if (dmi)
>>>                          *param_flags |= DM_PARAMS_KMALLOC;
>>> -- 
>>> 1.7.9.5
>>>
>>> -- 
>>> dm-devel mailing list
>>> dm-devel@redhat.com
>>> https://www.redhat.com/mailman/listinfo/dm-devel
>>>
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka July 9, 2014, 2:53 p.m. UTC | #4
On Wed, 9 Jul 2014, Zhang, Yanmin wrote:

> On 2014/7/9 6:39, Mikulas Patocka wrote:
> 
> > Hi
> 
> Mikulas,
> 
> Thanks for your kind comments.
> 
> > I don't really know what is the purpose of this patch. In existing device
> > mapper code, if kmalloc fails, the allocation is retried with __vmalloc.
> > So there is no need to avoid kmalloc aritifically.
> > 
> > kmalloc doesn't cause memory fragmentation. If the memory is too
> > fragmented, kmalloc fails. If it isn't, it succeeds. But it doesn't cause
> > memory being fragmented.
> 
> I agree with you. The patch's original description is not appropriate.
> Basically, memory fragmentation is not caused by this kmalloc.
> 
> The right description is: When memory is fragmented and most memory is used
> up,
> kmalloc a big memory might cause lots of OutOFMemory and system might kill
> lots of processes. Then, system might hang.

The question is - does this particular kmalloc in device mapper cause out 
of memory or killing of other tasks? It has flags __GFP_NORETRY, 
__GFP_NOMEMALLOC, __GFP_NOWARN so it shouldn't cause any trouble. It 
should just fail silently if memory is fragmented.

Do you have some stacktrace that identifies this kmalloc as a problem?

Do this test - prepare two kernels that are identical, except that one 
kernel has that one-line change in dm-ioctl. Boot each kernel 10 times, do 
exactly the same operation after boot. Does the kernel with the patch 
always behave correctly and does the kernel without the patch always fail? 
Report the result - how many failures did you get with or without that 
one-line patch. Without such a test - I just don't believe that your patch 
makes any difference.

Another question - your patch only makes change if some device mapper 
ioctl has more than 16kB arugments. Which ioctl with more than 16kB 
arguments do you use? Do you load such a big table to device mapper? How 
often do you call that ioctl with such big arguments?

Mikulas


> We are enabling Android on mobiles and tablets. We hit OOM often at 
> Monkey and other stress testing. Sometimes, kernel prints big memory 
> allocation warning.
> 
> Theoretically, it's a good idea to call kmalloc firstly. If it fails, 
> use vmalloc. However, kernel assumes drivers do the right thing. When 
> drivers call kmalloc to allocate a big memory, memory management would 
> do the best to fulfill it. When memory is fragmented and most memory is 
> used up, such allocation would cause big memory recollection. Some 
> processes might be killed. The worse scenario is after a process is 
> killed, it is restarted and allocates memory again. That causes system 
> hang, or mobile users feel a big stall. We did fix a similar issue in 
> one of our drivers.
> 
> Usually, kernel and drivers can allocate big continuous memory at 
> booting or initialization stage. After that, they need allocate small 
> order memory. The best is to just allocate order-0 memory.
> 
> 
> > 
> > If you have some other driver that fails because of large kmalloc failure,
> > you should fix it to use scatter-gather DMA or (if the hardware can't do
> > it) preallocate the memory when the driver is loaded.
> 
> I agree with you. Here the patch fixes the issue, where dm might allocate
> big continuous memory after booting. Monkey testing triggers it by operating
> menu Setting=>Security=>InstallfromSDcard.
> 
> We are catching all places in kernel where big memory allocation happens.
> This patch is just to fix one case.
> 
> Thanks,
> Yanmin

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Yanmin Zhang July 22, 2014, 1:02 a.m. UTC | #5
On 2014/7/9 22:53, Mikulas Patocka wrote:
>
> On Wed, 9 Jul 2014, Zhang, Yanmin wrote:
>
>> On 2014/7/9 6:39, Mikulas Patocka wrote:
>>
>>> Hi
>> Mikulas,
>>
>> Thanks for your kind comments.
>>
>>> I don't really know what is the purpose of this patch. In existing device
>>> mapper code, if kmalloc fails, the allocation is retried with __vmalloc.
>>> So there is no need to avoid kmalloc aritifically.
>>>
>>> kmalloc doesn't cause memory fragmentation. If the memory is too
>>> fragmented, kmalloc fails. If it isn't, it succeeds. But it doesn't cause
>>> memory being fragmented.
>> I agree with you. The patch's original description is not appropriate.
>> Basically, memory fragmentation is not caused by this kmalloc.
>>
>> The right description is: When memory is fragmented and most memory is used
>> up,
>> kmalloc a big memory might cause lots of OutOFMemory and system might kill
>> lots of processes. Then, system might hang.

Sorry for replying you too late. I am very busy in some other critical issues.

> The question is - does this particular kmalloc in device mapper cause out
> of memory or killing of other tasks? It has flags __GFP_NORETRY,

When memory is fragmented, drivers need allocate small pages instead of big
memory. Even with __GFP_NORETRY, driver might get a big memory by luck. That means
other drivers would get fewer chances to fulfill their memory requests, such like
allocating 2 pages for task_struct. Later on, OOM might happen.

> __GFP_NOMEMALLOC, __GFP_NOWARN so it shouldn't cause any trouble. It
> should just fail silently if memory is fragmented.

It's hard to say this call causes out of memory. There are many such places
in kernel to allocate big continuous memory. One is in seq_read, where we
created a patch to use vmalloc instead of kmalloc to fix it, but got far
worse comments as it's very old code. Another is in our own gfx driver.
We want to fix all. We can't blame the OOM to just one place.

Monkey testing is popular for Android development. We run the testing frequently.
It might start lots of applications. Eventually, it is a comprehensive testing.

>
> Do you have some stacktrace that identifies this kmalloc as a problem?

Sometimes, when OOM happens, kernel log shows some backtrace of big continuous
memory allocation failure. Sometimes, when board can't respond and watchdog might
reset the board after saving thread callchain into disk.

>
> Do this test - prepare two kernels that are identical, except that one
> kernel has that one-line change in dm-ioctl. Boot each kernel 10 times, do
> exactly the same operation after boot. Does the kernel with the patch
> always behave correctly and does the kernel without the patch always fail?

No. Instead of just one, many places have impact on the OOM issue.

> Report the result - how many failures did you get with or without that
> one-line patch. Without such a test - I just don't believe that your patch
> makes any difference.
>
> Another question - your patch only makes change if some device mapper
> ioctl has more than 16kB arugments. Which ioctl with more than 16kB
> arguments do you use? Do you load such a big table to device mapper? How
> often do you call that ioctl with such big arguments?

Xinhui's email mentions the ioctl details. In android, there is a command
"dumpstate", it run many other commands to collect information. In our
scenario, it run command "vdc dump", and vdc uses socket to pass some
parameters to "vold", then vold generates ioctl.

Thanks for your patience.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Alasdair G Kergon July 22, 2014, 1:23 a.m. UTC | #6
>>> On 2014/7/9 6:39, Mikulas Patocka wrote:
>> Which ioctl with more than 16kB
>> arguments do you use? 

Unanswered.  Let's ask the same question in a different way:

Please supply the output of these three commands on the real-world system on
which you believe that that particular allocation is causing you a problem:
  dmsetup info -c
  dmsetup table
  dmsetup status

Alasdair

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Alasdair G Kergon July 22, 2014, 2:04 a.m. UTC | #7
On Tue, Jul 22, 2014 at 02:23:52AM +0100, Alasdair G Kergon wrote:
> Unanswered.  Let's ask the same question in a different way:
 
A quick search for 'vold' returns:
  https://android.googlesource.com/platform/system/vold/
and the code there requests a fixed 64K allocation to hold the names of
active volumes.

So unlike libdevmapper-based applications where a smaller allocation is
used at first and only extended if needed, Android just assumes that
64KB is enough for everyone.

So is your claim that:

1. This 64KB allocation for the brief duration of the ioctl to store the
names of active device-mapper volumes leads to memory problems?
[Mustn't the system *already* be in a bad state if this pushes it over
the limit?]

and

2. The systems on which this memory shortage occurs have so many volumes
(with long names?) that a smaller allocation would not suffice?

Alasdair

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Yanmin Zhang July 23, 2014, 3:06 a.m. UTC | #8
On 2014/7/22 9:23, Alasdair G Kergon wrote:
>>>> On 2014/7/9 6:39, Mikulas Patocka wrote:
>>> Which ioctl with more than 16kB
>>> arguments do you use?
> Unanswered.  Let's ask the same question in a different way:
>
> Please supply the output of these three commands on the real-world system on
> which you believe that that particular allocation is causing you a problem:
>    dmsetup info -c
>    dmsetup table
>    dmsetup status

Android doesn't include the command. We compiled lvm2-2_02_107.tar.gz and copy
dmsetup to the board. But command reports No devices.

# /data/bin/dmsetup info
No devices found

# cat /proc/devices
Character devices:
   1 mem
   4 /dev/vc/0
   4 tty
   4 ttyS
   4 ttyMFD
   5 /dev/tty
   5 /dev/console
   5 /dev/ptmx
   7 vcs
  10 misc
  13 input
  21 sg
  29 fb
  81 video4linux
  89 i2c
108 ppp
116 alsa
128 ptm
136 pts
153 spi
166 ttyACM
180 usb
188 ttyUSB
189 usb_device
202 cpu/msr
203 cpu/cpuid
226 drm
241 mdm_ctrl
242 sep54
243 roccat
244 hidraw
245 ttyGS
246 usbmon
247 ttyPTI
248 gsmtty
249 bsg
250 iio
251 ptp
252 pps
253 media
254 rtc

Block devices:
   1 ramdisk
259 blkext
   7 loop
   8 sd
   9 md
  11 sr
  65 sd
  66 sd
  67 sd
  68 sd
  69 sd
  70 sd
  71 sd
128 sd
129 sd
130 sd
131 sd
132 sd
133 sd
134 sd
135 sd
179 mmc
253 device-mapper
254 mdp

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Yanmin Zhang July 23, 2014, 3:15 a.m. UTC | #9
On 2014/7/22 10:04, Alasdair G Kergon wrote:
> On Tue, Jul 22, 2014 at 02:23:52AM +0100, Alasdair G Kergon wrote:
>> Unanswered.  Let's ask the same question in a different way:
>   
> A quick search for 'vold' returns:
>    https://android.googlesource.com/platform/system/vold/
> and the code there requests a fixed 64K allocation to hold the names of
> active volumes.
>
> So unlike libdevmapper-based applications where a smaller allocation is
> used at first and only extended if needed, Android just assumes that
> 64KB is enough for everyone.
>
> So is your claim that:
>
> 1. This 64KB allocation for the brief duration of the ioctl to store the
> names of active device-mapper volumes leads to memory problems?
> [Mustn't the system *already* be in a bad state if this pushes it over
> the limit?]

It's a good question.
1) Usually, Android mobile runs for a long time. It's very command that
users don't turn off the phones for months. Users might start lots of
applications. memory is used up in the end. Kernel might recollect memory
over and over again.
2) We never blames this Out of memory issue fully to DM.
3) We want to improve the OOM issue.

>
> and
>
> 2. The systems on which this memory shortage occurs have so many volumes
> (with long names?) that a smaller allocation would not suffice?

64K is small, comparing with 2GB, even 4GB total memory. However, this 64K
by kmalloc has a strict criteria. It might be continuous physical memory
and align with 64K. If memory is used up, freed memory is very fragmented.
Such 64K criteria is a hard request.
Usually, driver can allocate such memory at booting initialization. After
that, it should allocate many sparse pages instead of continuous memory.
Here 64K allocation is after booting.

Thanks for the kind comments.

Yanmin

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka July 23, 2014, 12:16 p.m. UTC | #10
On Wed, 23 Jul 2014, Zhang, Yanmin wrote:

> On 2014/7/22 9:23, Alasdair G Kergon wrote:
> > > > > On 2014/7/9 6:39, Mikulas Patocka wrote:
> > > > Which ioctl with more than 16kB
> > > > arguments do you use?
> > Unanswered.  Let's ask the same question in a different way:
> > 
> > Please supply the output of these three commands on the real-world system on
> > which you believe that that particular allocation is causing you a problem:
> >    dmsetup info -c
> >    dmsetup table
> >    dmsetup status
> 
> Android doesn't include the command. We compiled lvm2-2_02_107.tar.gz and copy
> dmsetup to the board. But command reports No devices.
> 
> # /data/bin/dmsetup info
> No devices found

So, it means that you do not use device mapper at all. So, why are you 
trying to change memory allocation in device mapper?

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka July 23, 2014, 12:39 p.m. UTC | #11
On Tue, 22 Jul 2014, Zhang, Yanmin wrote:

> Sorry for replying you too late. I am very busy in some other critical issues.
> 
> > The question is - does this particular kmalloc in device mapper cause out
> > of memory or killing of other tasks? It has flags __GFP_NORETRY,
> 
> When memory is fragmented, drivers need allocate small pages instead of 
> big memory. Even with __GFP_NORETRY, driver might get a big memory by 
> luck. That means other drivers would get fewer chances to fulfill their 
> memory requests, such like allocating 2 pages for task_struct. Later on, 
> OOM might happen.

You claim that that big kmalloc causes memory fragmentation. But memory 
can get fragmented even if no big kmalloc is used. For example, this 
program will cause memory fragmentation despite the fact that it never 
does any multi-page allocation:
int main(void)
{
        int i;
        char *array[65536];
        for (i = 0; i < 65536; i++) {
                array[i] = mmap(NULL, 4096, PROT_READ | PROT_WRITE, 
MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
                if (array[i] == MAP_FAILED) perror("mmap"), exit(1);
                array[i][0] = 3;
        }
        for (i = 0; i < 65536; i += 2) {
                if (munmap(array[i], 4096)) perror("munmap"), exit(1);
        }
        pause();
        return 0;
}

If you have problems with memory fragmentation - find the piece of code 
that is failing because of fragmented memory and fix it.

To fix it:
* if it is DMA memory for device
  * use continuous memory allocator, or
  * preallocate the chunk of memory when the driver is loaded and never 
    free it
* if it is general memory allocation with kmalloc
  * change the algorithm, so that it doesn't require big allocation
  * use the same trick as device mapper - try kmalloc and if it fails, try 
    vmalloc.

> > __GFP_NOMEMALLOC, __GFP_NOWARN so it shouldn't cause any trouble. It
> > should just fail silently if memory is fragmented.
> 
> It's hard to say this call causes out of memory. There are many such places
> in kernel to allocate big continuous memory. One is in seq_read, where we
> created a patch to use vmalloc instead of kmalloc to fix it, but got far
> worse comments as it's very old code. Another is in our own gfx driver.
> We want to fix all. We can't blame the OOM to just one place.

vmalloc is much slower than kmalloc, so you can't just go over the Linux 
source and change every large kmalloc to vmalloc.

You can change seq_read to use the same trick as device mapper (use 
kmalloc and if it fails, fall back to vmalloc).

> Monkey testing is popular for Android development. We run the testing 
> frequently. It might start lots of applications. Eventually, it is a 
> comprehensive testing.

I ask again - do you have some statistically significant test results that 
show that your patch makes any difference to stability? I suppose, no...

> > Do you have some stacktrace that identifies this kmalloc as a problem?
> 
> Sometimes, when OOM happens, kernel log shows some backtrace of big 
> continuous memory allocation failure. Sometimes, when board can't 
> respond and watchdog might reset the board after saving thread callchain 
> into disk.

Find places, where the OOM happens (those, that you see on your 
stacktrace) and fix them.

> > Do this test - prepare two kernels that are identical, except that one
> > kernel has that one-line change in dm-ioctl. Boot each kernel 10 times, do
> > exactly the same operation after boot. Does the kernel with the patch
> > always behave correctly and does the kernel without the patch always fail?
> 
> No. Instead of just one, many places have impact on the OOM issue.

I repeat again - find the piece of code that is failing because of 
fragmented memory and fix it.

Device mapper isn't failing (because it falls back to vmalloc), so leave 
it alone.

> > Report the result - how many failures did you get with or without that
> > one-line patch. Without such a test - I just don't believe that your patch
> > makes any difference.
> > 
> > Another question - your patch only makes change if some device mapper
> > ioctl has more than 16kB arugments. Which ioctl with more than 16kB
> > arguments do you use? Do you load such a big table to device mapper? How
> > often do you call that ioctl with such big arguments?
> 
> Xinhui's email mentions the ioctl details. In android, there is a command
> "dumpstate", it run many other commands to collect information. In our
> scenario, it run command "vdc dump", and vdc uses socket to pass some
> parameters to "vold", then vold generates ioctl.
> 
> Thanks for your patience.

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Alasdair G Kergon July 23, 2014, 12:54 p.m. UTC | #12
On Wed, Jul 23, 2014 at 08:16:58AM -0400, Mikulas Patocka wrote:
> So, it means that you do not use device mapper at all. So, why are you 
> trying to change memory allocation in device mapper?
 
So the *test* they run is asking device-mapper to briefly reserve a 64KB
buffer when there is no data to report:  The answer is not to run that
pointless test:)

And if a single 64KB allocation really is a big deal, then patch 'vold'
in userspace so it doesn't ask for 64KB when it clearly doesn't need it!

+ int Devmapper::dumpState(SocketClient *c) {
+    char *buffer = (char *) malloc(1024 * 64);

The code has just
#define DEVMAPPER_BUFFER_SIZE 4096
for all the other dm ioctls it issues.

Only use a larger value when it is needed i.e. if DM_BUFFER_FULL_FLAG gets set.

Alasdair

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka July 23, 2014, 5:14 p.m. UTC | #13
On Wed, 23 Jul 2014, Alasdair G Kergon wrote:

> On Wed, Jul 23, 2014 at 08:16:58AM -0400, Mikulas Patocka wrote:
> > So, it means that you do not use device mapper at all. So, why are you 
> > trying to change memory allocation in device mapper?
>  
> So the *test* they run is asking device-mapper to briefly reserve a 64KB
> buffer when there is no data to report:  The answer is not to run that
> pointless test:)
> 
> And if a single 64KB allocation really is a big deal, then patch 'vold'
> in userspace so it doesn't ask for 64KB when it clearly doesn't need it!
> 
> + int Devmapper::dumpState(SocketClient *c) {
> +    char *buffer = (char *) malloc(1024 * 64);
> 
> The code has just
> #define DEVMAPPER_BUFFER_SIZE 4096
> for all the other dm ioctls it issues.
> 
> Only use a larger value when it is needed i.e. if DM_BUFFER_FULL_FLAG gets set.
> 
> Alasdair

Device mapper shouldn't depend on allocation on any contiguous memory - it 
will fall back to vmalloc. I still can't believe that their suggested 
patch makes any difference.

This pattern is being repeated over and over in the kernel - for example:

        if (PIDLIST_TOO_LARGE(count))
                return vmalloc(count * sizeof(pid_t));
        else
                return kmalloc(count * sizeof(pid_t), GFP_KERNEL);


        if (is_vmalloc_addr(p))
                vfree(p);
        else
                kfree(p);

- I think we should make two functions that do this operation (for example 
kvalloc and kvfree) and convert device mapper and other users to these 
functions. Then, other kernel subsystems can start to use them to fix 
memory fragmentation issues.

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Yanmin Zhang July 24, 2014, 5:53 a.m. UTC | #14
On 2014/7/24 1:14, Mikulas Patocka wrote:
>
> On Wed, 23 Jul 2014, Alasdair G Kergon wrote:
>
>> On Wed, Jul 23, 2014 at 08:16:58AM -0400, Mikulas Patocka wrote:
>>> So, it means that you do not use device mapper at all. So, why are you
>>> trying to change memory allocation in device mapper?
>>   
>> So the *test* they run is asking device-mapper to briefly reserve a 64KB
>> buffer when there is no data to report:  The answer is not to run that
>> pointless test:)
>>
>> And if a single 64KB allocation really is a big deal, then patch 'vold'
>> in userspace so it doesn't ask for 64KB when it clearly doesn't need it!
>>
>> + int Devmapper::dumpState(SocketClient *c) {
>> +    char *buffer = (char *) malloc(1024 * 64);
>>
>> The code has just
>> #define DEVMAPPER_BUFFER_SIZE 4096
>> for all the other dm ioctls it issues.
>>
>> Only use a larger value when it is needed i.e. if DM_BUFFER_FULL_FLAG gets set.
>>
>> Alasdair
> Device mapper shouldn't depend on allocation on any contiguous memory - it
> will fall back to vmalloc. I still can't believe that their suggested
> patch makes any difference.
>
> This pattern is being repeated over and over in the kernel - for example:
>
>          if (PIDLIST_TOO_LARGE(count))
>                  return vmalloc(count * sizeof(pid_t));
>          else
>                  return kmalloc(count * sizeof(pid_t), GFP_KERNEL);
>
>
>          if (is_vmalloc_addr(p))
>                  vfree(p);
>          else
>                  kfree(p);
>
> - I think we should make two functions that do this operation (for example
> kvalloc and kvfree) and convert device mapper and other users to these
> functions. Then, other kernel subsystems can start to use them to fix
> memory fragmentation issues.

Thank Mikulas and Alasdair. Before sending out the patch, we know the result. :)
It's hard to balance between performance and stability.

Anyway, we would try to change seq_read.

Yanmin

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox

Patch

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 5152142..31c3af9 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1709,7 +1709,7 @@  static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern
         * Use kmalloc() rather than vmalloc() when we can.
         */
        dmi = NULL;
-       if (param_kernel->data_size <= KMALLOC_MAX_SIZE) {
+       if (param_kernel->data_size <= (PAGE_SIZE << 2)) {
                dmi = kmalloc(param_kernel->data_size, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
                if (dmi)
                        *param_flags |= DM_PARAMS_KMALLOC;