Message ID: 20170426132337.GA6482@amd (mailing list archive)
State: New, archived
Hi,

On 26.04.2017 16:23, Pavel Machek wrote:
> Hi!
>
>>>> I don't see why it would be hard to open files or have threads inside
>>>> a library. There are several libraries that do that already, especially
>>>> the ones designed to be used on multimedia apps.
>>>
>>> Well, this is what libv4l2 says:
>>>
>>>     This file implements libv4l2, which offers v4l2_ prefixed versions
>>>     of open/close/etc. The API is 100% the same as directly opening
>>>     /dev/videoX using regular open/close/etc, the big difference is
>>>     that format conversion
>>>
>>> but if I open additional files in v4l2_open(), the API is no longer the
>>> same, as unix open() is defined to open just one file descriptor.
>>>
>>> Now. There is autogain support in libv4lconvert, but it expects to use
>>> the same fd for the camera and for the gain... which does not work with
>>> subdevs.
>>>
>>> Of course, opening subdevs by name like this is not really
>>> acceptable. But can you suggest a method that is?
>>
>> There are two separate things here:
>>
>> 1) Autofocus for a device that doesn't use subdev API
>> 2) libv4l2 support for devices that require MC and subdev API
>
> Actually there are three: 0) autogain. Unfortunately, I need autogain
> first before autofocus has a chance...
>
> And that means... bayer10 support for autogain.
>
> Plus, I changed avg_lum to long long. A quick calculation tells me int
> could overflow with a few-megapixel sensor.
>
> Oh, btw http://ytse.tricolour.net/docs/LowLightOptimization.html no
> longer works.
>
> Regards,
>                                                                 Pavel
>
> diff --git a/lib/libv4lconvert/processing/autogain.c b/lib/libv4lconvert/processing/autogain.c
> index c6866d6..0b52d0f 100644
> --- a/lib/libv4lconvert/processing/autogain.c
> +++ b/lib/libv4lconvert/processing/autogain.c
> @@ -68,6 +71,41 @@ static void autogain_adjust(struct v4l2_queryctrl *ctrl, int *value,
>  	}
>  }
>
> +static int get_luminosity_bayer10(uint16_t *buf, const struct v4l2_format *fmt)
> +{
> +	long long avg_lum = 0;
> +	int x, y;
> +
> +	buf += fmt->fmt.pix.height * fmt->fmt.pix.bytesperline / 4 +
> +		fmt->fmt.pix.width / 4;
> +
> +	for (y = 0; y < fmt->fmt.pix.height / 2; y++) {
> +		for (x = 0; x < fmt->fmt.pix.width / 2; x++)

That would take some time :). AIUI, we have NEON support in ARM kernels
(CONFIG_KERNEL_MODE_NEON); I wonder if it makes sense (for me) to convert
the above loop to NEON-optimized code when it comes to it? Are there any
drawbacks in using NEON code in the kernel?

> +			avg_lum += *buf++;
> +		buf += fmt->fmt.pix.bytesperline - fmt->fmt.pix.width / 2;
> +	}
> +	avg_lum /= fmt->fmt.pix.height * fmt->fmt.pix.width / 4;
> +	avg_lum /= 4;
> +	return avg_lum;
> +}
> +
> +static int get_luminosity_bayer8(unsigned char *buf, const struct v4l2_format *fmt)
> +{
> +	long long avg_lum = 0;
> +	int x, y;
> +
> +	buf += fmt->fmt.pix.height * fmt->fmt.pix.bytesperline / 4 +
> +		fmt->fmt.pix.width / 4;
> +
> +	for (y = 0; y < fmt->fmt.pix.height / 2; y++) {
> +		for (x = 0; x < fmt->fmt.pix.width / 2; x++)

ditto.

> +			avg_lum += *buf++;
> +		buf += fmt->fmt.pix.bytesperline - fmt->fmt.pix.width / 2;
> +	}
> +	avg_lum /= fmt->fmt.pix.height * fmt->fmt.pix.width / 4;
> +	return avg_lum;
> +}
> +
>  /* auto gain and exposure algorithm based on the knee algorithm described here:
>     http://ytse.tricolour.net/docs/LowLightOptimization.html */
>  static int autogain_calculate_lookup_tables(
> @@ -100,17 +142,16 @@ static int autogain_calculate_lookup_tables(
>  	switch (fmt->fmt.pix.pixelformat) {
> +	case V4L2_PIX_FMT_SGBRG10:
> +	case V4L2_PIX_FMT_SGRBG10:
> +	case V4L2_PIX_FMT_SBGGR10:
> +	case V4L2_PIX_FMT_SRGGB10:
> +		avg_lum = get_luminosity_bayer10((void *) buf, fmt);
> +		break;
> +
>  	case V4L2_PIX_FMT_SGBRG8:
>  	case V4L2_PIX_FMT_SGRBG8:
>  	case V4L2_PIX_FMT_SBGGR8:
>  	case V4L2_PIX_FMT_SRGGB8:
> -		buf += fmt->fmt.pix.height * fmt->fmt.pix.bytesperline / 4 +
> -			fmt->fmt.pix.width / 4;
> -
> -		for (y = 0; y < fmt->fmt.pix.height / 2; y++) {
> -			for (x = 0; x < fmt->fmt.pix.width / 2; x++)
> -				avg_lum += *buf++;
> -			buf += fmt->fmt.pix.bytesperline - fmt->fmt.pix.width / 2;
> -		}
> -		avg_lum /= fmt->fmt.pix.height * fmt->fmt.pix.width / 4;
> +		avg_lum = get_luminosity_bayer8(buf, fmt);
>  		break;
>
>  	case V4L2_PIX_FMT_RGB24:
> diff --git a/lib/libv4lconvert/processing/libv4lprocessing.c b/lib/libv4lconvert/processing/libv4lprocessing.c
> index b061f50..b98d024 100644
> --- a/lib/libv4lconvert/processing/libv4lprocessing.c
> +++ b/lib/libv4lconvert/processing/libv4lprocessing.c
> @@ -164,6 +165,10 @@ void v4lprocessing_processing(struct v4lprocessing_data *data,
>  	case V4L2_PIX_FMT_SGRBG8:
>  	case V4L2_PIX_FMT_SBGGR8:
>  	case V4L2_PIX_FMT_SRGGB8:
> +	case V4L2_PIX_FMT_SGBRG10:
> +	case V4L2_PIX_FMT_SGRBG10:
> +	case V4L2_PIX_FMT_SBGGR10:
> +	case V4L2_PIX_FMT_SRGGB10:
>  	case V4L2_PIX_FMT_RGB24:
>  	case V4L2_PIX_FMT_BGR24:
>  		break;
Hi!

>>> There are two separate things here:
>>>
>>> 1) Autofocus for a device that doesn't use subdev API
>>> 2) libv4l2 support for devices that require MC and subdev API
>>
>> Actually there are three: 0) autogain. Unfortunately, I need autogain
>> first before autofocus has a chance...
>>
>> And that means... bayer10 support for autogain.
>>
>> Plus, I changed avg_lum to long long. A quick calculation tells me int
>> could overflow with a few-megapixel sensor.
>>
>> Oh, btw http://ytse.tricolour.net/docs/LowLightOptimization.html no
>> longer works.
>>
>> Regards,
>>                                                                 Pavel
>>
>> diff --git a/lib/libv4lconvert/processing/autogain.c b/lib/libv4lconvert/processing/autogain.c
>> index c6866d6..0b52d0f 100644
>> --- a/lib/libv4lconvert/processing/autogain.c
>> +++ b/lib/libv4lconvert/processing/autogain.c
>> @@ -68,6 +71,41 @@ static void autogain_adjust(struct v4l2_queryctrl *ctrl, int *value,
>>  	}
>>  }
>>
>> +static int get_luminosity_bayer10(uint16_t *buf, const struct v4l2_format *fmt)
>> +{
>> +	long long avg_lum = 0;
>> +	int x, y;
>> +
>> +	buf += fmt->fmt.pix.height * fmt->fmt.pix.bytesperline / 4 +
>> +		fmt->fmt.pix.width / 4;
>> +
>> +	for (y = 0; y < fmt->fmt.pix.height / 2; y++) {
>> +		for (x = 0; x < fmt->fmt.pix.width / 2; x++)
>
> That would take some time :). AIUI, we have NEON support in ARM kernels
> (CONFIG_KERNEL_MODE_NEON); I wonder if it makes sense (for me) to convert
> the above loop to NEON-optimized code when it comes to it? Are there any
> drawbacks in using NEON code in the kernel?

Well, thanks for the offer. This is actually libv4l2.

But I'd say a NEON conversion is not necessary anytime soon. First,
this is just trying to get average luminosity. We can easily skip
quite a lot of pixels and still get a reasonable answer.

Second, omap3isp actually has a hardware block computing statistics
for us. We just don't use it, for simplicity.

(But if you want to play with the camera, I'll get you patches; there's
a ton of work to be done, both kernel and userspace :-).

Regards,
								Pavel
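[To illustrate the pixel-skipping point above, here is a minimal sketch. It is not the libv4l2 code or part of the patch; the function name, the subsampling factor and the final scaling are assumptions made for the example.]

#include <stdint.h>
#include <linux/videodev2.h>

/*
 * Illustrative sketch: instead of summing a quarter of the frame,
 * sample every 8th pixel of every 8th row.  That cuts the summing
 * work by roughly 64x and still tracks overall scene brightness
 * closely enough for autogain purposes.
 */
static int get_luminosity_bayer10_sparse(const uint16_t *buf,
                                         const struct v4l2_format *fmt)
{
        const unsigned int step = 8;    /* arbitrary subsampling factor */
        long long sum = 0, count = 0;
        unsigned int x, y;

        for (y = 0; y < fmt->fmt.pix.height; y += step) {
                /* bytesperline is in bytes; samples are 16 bits wide */
                const uint16_t *row = buf + y * (fmt->fmt.pix.bytesperline / 2);

                for (x = 0; x < fmt->fmt.pix.width; x += step) {
                        sum += row[x];
                        count++;
                }
        }

        /* scale the 10-bit average down to the 8-bit range the existing
         * lookup tables were written for */
        return count ? (int)(sum / count / 4) : 0;
}

[Even a step of 4 already reduces the per-frame cost by an order of magnitude compared with summing a quarter of the frame.]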
On 27.04.2017 01:51, Pavel Machek wrote:
> Hi!
>
>>>> There are two separate things here:
>>>>
>>>> 1) Autofocus for a device that doesn't use subdev API
>>>> 2) libv4l2 support for devices that require MC and subdev API
>>>
>>> Actually there are three: 0) autogain. Unfortunately, I need autogain
>>> first before autofocus has a chance...
>>>
>>> And that means... bayer10 support for autogain.
>>>
>>> Plus, I changed avg_lum to long long. A quick calculation tells me int
>>> could overflow with a few-megapixel sensor.
>>>
>>> Oh, btw http://ytse.tricolour.net/docs/LowLightOptimization.html no
>>> longer works.
>>>
>>> Regards,
>>>                                                                 Pavel
>>>
>>> diff --git a/lib/libv4lconvert/processing/autogain.c b/lib/libv4lconvert/processing/autogain.c
>>> index c6866d6..0b52d0f 100644
>>> --- a/lib/libv4lconvert/processing/autogain.c
>>> +++ b/lib/libv4lconvert/processing/autogain.c
>>> @@ -68,6 +71,41 @@ static void autogain_adjust(struct v4l2_queryctrl *ctrl, int *value,
>>>  	}
>>>  }
>>>
>>> +static int get_luminosity_bayer10(uint16_t *buf, const struct v4l2_format *fmt)
>>> +{
>>> +	long long avg_lum = 0;
>>> +	int x, y;
>>> +
>>> +	buf += fmt->fmt.pix.height * fmt->fmt.pix.bytesperline / 4 +
>>> +		fmt->fmt.pix.width / 4;
>>> +
>>> +	for (y = 0; y < fmt->fmt.pix.height / 2; y++) {
>>> +		for (x = 0; x < fmt->fmt.pix.width / 2; x++)
>>
>> That would take some time :). AIUI, we have NEON support in ARM kernels
>> (CONFIG_KERNEL_MODE_NEON); I wonder if it makes sense (for me) to convert
>> the above loop to NEON-optimized code when it comes to it? Are there any
>> drawbacks in using NEON code in the kernel?
>
> Well, thanks for the offer. This is actually libv4l2.
>

Oh, somehow I got confused that this is kernel code :)

> But I'd say a NEON conversion is not necessary anytime soon. First,
> this is just trying to get average luminosity. We can easily skip
> quite a lot of pixels and still get a reasonable answer.
>
> Second, omap3isp actually has a hardware block computing statistics
> for us. We just don't use it, for simplicity.
>

Right, I forgot about that.

> (But if you want to play with the camera, I'll get you patches; there's
> a ton of work to be done, both kernel and userspace :-).

Well, I saw a low-hanging fruit I thought I could convert to NEON in a day
or two, while having some rest from the huge "project" I am devoting all my
spare time to recently (rebasing hildon/maemo 5 on top of Devuan Jessie).
Still, if there is something relatively small to be done, just email me and
I'll have a look.

Regards,
Ivo
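[For reference, a userspace NEON version of the inner summation along the lines suggested above could look roughly like the sketch below. It uses standard arm_neon.h intrinsics; the function name and the way it would be wired into the per-row loop are assumptions, not part of any posted patch.]

#include <stdint.h>
#include <arm_neon.h>

/*
 * Sum n 16-bit samples, widening into 64-bit lanes so that even very
 * large frames cannot overflow the accumulator.  Leftover samples are
 * handled in plain C.
 */
static long long sum_row_u16_neon(const uint16_t *p, unsigned int n)
{
        uint64x2_t acc = vdupq_n_u64(0);
        long long sum;
        unsigned int i = 0;

        for (; i + 8 <= n; i += 8) {
                /* load 8 samples, widen pairwise 16->32, accumulate 32->64 */
                uint16x8_t v = vld1q_u16(p + i);
                acc = vpadalq_u32(acc, vpaddlq_u16(v));
        }

        sum = (long long)(vgetq_lane_u64(acc, 0) + vgetq_lane_u64(acc, 1));
        while (i < n)                   /* tail samples */
                sum += p[i++];
        return sum;
}

[The outer loop and the bytesperline stride handling would stay in plain C, so the gain is limited to the inner sum; that is part of why simply skipping pixels, as discussed above, is probably the easier win.]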
On Wed 2017-04-26 15:23:37, Pavel Machek wrote:
> Hi!
>
>>>> I don't see why it would be hard to open files or have threads inside
>>>> a library. There are several libraries that do that already, especially
>>>> the ones designed to be used on multimedia apps.
>>>
>>> Well, this is what libv4l2 says:
>>>
>>>     This file implements libv4l2, which offers v4l2_ prefixed versions
>>>     of open/close/etc. The API is 100% the same as directly opening
>>>     /dev/videoX using regular open/close/etc, the big difference is
>>>     that format conversion
>>>
>>> but if I open additional files in v4l2_open(), the API is no longer the
>>> same, as unix open() is defined to open just one file descriptor.
>>>
>>> Now. There is autogain support in libv4lconvert, but it expects to use
>>> the same fd for the camera and for the gain... which does not work with
>>> subdevs.
>>>
>>> Of course, opening subdevs by name like this is not really
>>> acceptable. But can you suggest a method that is?
>>
>> There are two separate things here:
>>
>> 1) Autofocus for a device that doesn't use subdev API
>> 2) libv4l2 support for devices that require MC and subdev API
>
> Actually there are three: 0) autogain. Unfortunately, I need autogain
> first before autofocus has a chance...
>
> And that means... bayer10 support for autogain.
>
> Plus, I changed avg_lum to long long. A quick calculation tells me int
> could overflow with a few-megapixel sensor.
>
> Oh, btw http://ytse.tricolour.net/docs/LowLightOptimization.html no
> longer works.

Can I get some comments here? The patch will need a fixup (constants need
adjusting), but is the style/design acceptable?

Thanks,
								Pavel

> diff --git a/lib/libv4lconvert/processing/autogain.c b/lib/libv4lconvert/processing/autogain.c
> index c6866d6..0b52d0f 100644
> --- a/lib/libv4lconvert/processing/autogain.c
> +++ b/lib/libv4lconvert/processing/autogain.c
> @@ -68,6 +71,41 @@ static void autogain_adjust(struct v4l2_queryctrl *ctrl, int *value,
>  	}
>  }
>
> +static int get_luminosity_bayer10(uint16_t *buf, const struct v4l2_format *fmt)
> +{
> +	long long avg_lum = 0;
> +	int x, y;
> +
> +	buf += fmt->fmt.pix.height * fmt->fmt.pix.bytesperline / 4 +
> +		fmt->fmt.pix.width / 4;
> +
> +	for (y = 0; y < fmt->fmt.pix.height / 2; y++) {
> +		for (x = 0; x < fmt->fmt.pix.width / 2; x++)
> +			avg_lum += *buf++;
> +		buf += fmt->fmt.pix.bytesperline - fmt->fmt.pix.width / 2;
> +	}
> +	avg_lum /= fmt->fmt.pix.height * fmt->fmt.pix.width / 4;
> +	avg_lum /= 4;
> +	return avg_lum;
> +}
> +
> +static int get_luminosity_bayer8(unsigned char *buf, const struct v4l2_format *fmt)
> +{
> +	long long avg_lum = 0;
> +	int x, y;
> +
> +	buf += fmt->fmt.pix.height * fmt->fmt.pix.bytesperline / 4 +
> +		fmt->fmt.pix.width / 4;
> +
> +	for (y = 0; y < fmt->fmt.pix.height / 2; y++) {
> +		for (x = 0; x < fmt->fmt.pix.width / 2; x++)
> +			avg_lum += *buf++;
> +		buf += fmt->fmt.pix.bytesperline - fmt->fmt.pix.width / 2;
> +	}
> +	avg_lum /= fmt->fmt.pix.height * fmt->fmt.pix.width / 4;
> +	return avg_lum;
> +}
> +
>  /* auto gain and exposure algorithm based on the knee algorithm described here:
>     http://ytse.tricolour.net/docs/LowLightOptimization.html */
>  static int autogain_calculate_lookup_tables(
> @@ -100,17 +142,16 @@ static int autogain_calculate_lookup_tables(
>  	switch (fmt->fmt.pix.pixelformat) {
> +	case V4L2_PIX_FMT_SGBRG10:
> +	case V4L2_PIX_FMT_SGRBG10:
> +	case V4L2_PIX_FMT_SBGGR10:
> +	case V4L2_PIX_FMT_SRGGB10:
> +		avg_lum = get_luminosity_bayer10((void *) buf, fmt);
> +		break;
> +
>  	case V4L2_PIX_FMT_SGBRG8:
>  	case V4L2_PIX_FMT_SGRBG8:
>  	case V4L2_PIX_FMT_SBGGR8:
>  	case V4L2_PIX_FMT_SRGGB8:
> -		buf += fmt->fmt.pix.height * fmt->fmt.pix.bytesperline / 4 +
> -			fmt->fmt.pix.width / 4;
> -
> -		for (y = 0; y < fmt->fmt.pix.height / 2; y++) {
> -			for (x = 0; x < fmt->fmt.pix.width / 2; x++)
> -				avg_lum += *buf++;
> -			buf += fmt->fmt.pix.bytesperline - fmt->fmt.pix.width / 2;
> -		}
> -		avg_lum /= fmt->fmt.pix.height * fmt->fmt.pix.width / 4;
> +		avg_lum = get_luminosity_bayer8(buf, fmt);
>  		break;
>
>  	case V4L2_PIX_FMT_RGB24:
> diff --git a/lib/libv4lconvert/processing/libv4lprocessing.c b/lib/libv4lconvert/processing/libv4lprocessing.c
> index b061f50..b98d024 100644
> --- a/lib/libv4lconvert/processing/libv4lprocessing.c
> +++ b/lib/libv4lconvert/processing/libv4lprocessing.c
> @@ -164,6 +165,10 @@ void v4lprocessing_processing(struct v4lprocessing_data *data,
>  	case V4L2_PIX_FMT_SGRBG8:
>  	case V4L2_PIX_FMT_SBGGR8:
>  	case V4L2_PIX_FMT_SRGGB8:
> +	case V4L2_PIX_FMT_SGBRG10:
> +	case V4L2_PIX_FMT_SGRBG10:
> +	case V4L2_PIX_FMT_SBGGR10:
> +	case V4L2_PIX_FMT_SRGGB10:
>  	case V4L2_PIX_FMT_RGB24:
>  	case V4L2_PIX_FMT_BGR24:
>  		break;
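[To put rough numbers on the int-overflow concern, assuming the worst case of an all-white 10-bit frame: the loop sums width/2 * height/2 samples, i.e. a quarter of the frame. For a 5 MP sensor that is about 1,250,000 samples of at most 1023 each, a sum of roughly 1.28e9, already more than half of INT_MAX (2,147,483,647); from about 8.4 MP upwards (2,100,000 * 1023 is roughly 2.15e9) a 32-bit int can overflow, so widening avg_lum to long long is indeed needed.]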
On Wed, Apr 26, 2017 at 06:43:54PM +0300, Ivaylo Dimitrov wrote:
>> +static int get_luminosity_bayer10(uint16_t *buf, const struct v4l2_format *fmt)
>> +{
>> +	long long avg_lum = 0;
>> +	int x, y;
>> +
>> +	buf += fmt->fmt.pix.height * fmt->fmt.pix.bytesperline / 4 +
>> +		fmt->fmt.pix.width / 4;
>> +
>> +	for (y = 0; y < fmt->fmt.pix.height / 2; y++) {
>> +		for (x = 0; x < fmt->fmt.pix.width / 2; x++)
>
> That would take some time :). AIUI, we have NEON support in ARM kernels
> (CONFIG_KERNEL_MODE_NEON); I wonder if it makes sense (for me) to convert
> the above loop to NEON-optimized code when it comes to it? Are there any
> drawbacks in using NEON code in the kernel?

Using NEON without the VFP state saved and restored corrupts userspace's
FP state, so you have to save the entire VFP state to use NEON in kernel
mode. There are helper functions for this: kernel_neon_begin() and
kernel_neon_end().

You can't build C code with the compiler believing that NEON is available,
as the compiler could emit NEON instructions in unprotected kernel code.

Note that kernel_neon_begin() is only allowed to be called outside
interrupt context and with preemption disabled.

Given that, do we really want to be walking over multi-megabytes of image
data in the kernel with preemption disabled? It sounds like a recipe for
a very sluggish system. I think this should (and can only sensibly) be
done in userspace.
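[For completeness, the call pattern described above would look roughly like the hedged sketch below. kernel_neon_begin() and kernel_neon_end() are the real kernel helpers; sum_frame_neon() is a made-up name standing in for a routine that would have to be built in a separate object compiled with NEON enabled. None of this is needed for the libv4l2 patch under discussion.]

#include <linux/types.h>
#include <asm/neon.h>           /* kernel_neon_begin() / kernel_neon_end() */

/* hypothetical helper, implemented with NEON and built separately */
u64 sum_frame_neon(const u16 *buf, size_t n);

static u64 sum_frame(const u16 *buf, size_t n)
{
        u64 sum;

        /* process context only: must not be called from an interrupt
         * handler, and preemption stays off until kernel_neon_end() */
        kernel_neon_begin();
        sum = sum_frame_neon(buf, n);
        kernel_neon_end();

        return sum;
}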
On Wed 2017-05-03 20:05:56, Russell King - ARM Linux wrote:
> On Wed, Apr 26, 2017 at 06:43:54PM +0300, Ivaylo Dimitrov wrote:
>>> +static int get_luminosity_bayer10(uint16_t *buf, const struct v4l2_format *fmt)
>>> +{
>>> +	long long avg_lum = 0;
>>> +	int x, y;
>>> +
>>> +	buf += fmt->fmt.pix.height * fmt->fmt.pix.bytesperline / 4 +
>>> +		fmt->fmt.pix.width / 4;
>>> +
>>> +	for (y = 0; y < fmt->fmt.pix.height / 2; y++) {
>>> +		for (x = 0; x < fmt->fmt.pix.width / 2; x++)
>>
>> That would take some time :). AIUI, we have NEON support in ARM kernels
>> (CONFIG_KERNEL_MODE_NEON); I wonder if it makes sense (for me) to convert
>> the above loop to NEON-optimized code when it comes to it? Are there any
>> drawbacks in using NEON code in the kernel?
>
> Using NEON without the VFP state saved and restored corrupts userspace's
> FP state, so you have to save the entire VFP state to use NEON in kernel
> mode. There are helper functions for this: kernel_neon_begin() and
> kernel_neon_end().

...

> Given that, do we really want to be walking over multi-megabytes of image
> data in the kernel with preemption disabled? It sounds like a recipe for
> a very sluggish system. I think this should (and can only sensibly) be
> done in userspace.

The patch was for libv4l2. (And I explained why we don't need to
overoptimize this.)

								Pavel
Hi!

> Oh, somehow I got confused that this is kernel code :)
>
>> But I'd say a NEON conversion is not necessary anytime soon. First,
>> this is just trying to get average luminosity. We can easily skip
>> quite a lot of pixels and still get a reasonable answer.
>>
>> Second, omap3isp actually has a hardware block computing statistics
>> for us. We just don't use it, for simplicity.
>
> Right, I forgot about that.
>
>> (But if you want to play with the camera, I'll get you patches; there's
>> a ton of work to be done, both kernel and userspace :-).
>
> Well, I saw a low-hanging fruit I thought I could convert to NEON in a day
> or two, while having some rest from the huge "project" I am devoting all my
> spare time to recently (rebasing hildon/maemo 5 on top of Devuan Jessie).
> Still, if there is something relatively small to be done, just email me and
> I'll have a look.

Well, there's a ton of work on the camera, and some work on
libcmtspeechdata. The latter is rather self-contained.

								Pavel
diff --git a/lib/libv4lconvert/processing/autogain.c b/lib/libv4lconvert/processing/autogain.c
index c6866d6..0b52d0f 100644
--- a/lib/libv4lconvert/processing/autogain.c
+++ b/lib/libv4lconvert/processing/autogain.c
@@ -68,6 +71,41 @@ static void autogain_adjust(struct v4l2_queryctrl *ctrl, int *value,
 	}
 }
 
+static int get_luminosity_bayer10(uint16_t *buf, const struct v4l2_format *fmt)
+{
+	long long avg_lum = 0;
+	int x, y;
+
+	buf += fmt->fmt.pix.height * fmt->fmt.pix.bytesperline / 4 +
+		fmt->fmt.pix.width / 4;
+
+	for (y = 0; y < fmt->fmt.pix.height / 2; y++) {
+		for (x = 0; x < fmt->fmt.pix.width / 2; x++)
+			avg_lum += *buf++;
+		buf += fmt->fmt.pix.bytesperline - fmt->fmt.pix.width / 2;
+	}
+	avg_lum /= fmt->fmt.pix.height * fmt->fmt.pix.width / 4;
+	avg_lum /= 4;
+	return avg_lum;
+}
+
+static int get_luminosity_bayer8(unsigned char *buf, const struct v4l2_format *fmt)
+{
+	long long avg_lum = 0;
+	int x, y;
+
+	buf += fmt->fmt.pix.height * fmt->fmt.pix.bytesperline / 4 +
+		fmt->fmt.pix.width / 4;
+
+	for (y = 0; y < fmt->fmt.pix.height / 2; y++) {
+		for (x = 0; x < fmt->fmt.pix.width / 2; x++)
+			avg_lum += *buf++;
+		buf += fmt->fmt.pix.bytesperline - fmt->fmt.pix.width / 2;
+	}
+	avg_lum /= fmt->fmt.pix.height * fmt->fmt.pix.width / 4;
+	return avg_lum;
+}
+
 /* auto gain and exposure algorithm based on the knee algorithm described here:
    http://ytse.tricolour.net/docs/LowLightOptimization.html */
 static int autogain_calculate_lookup_tables(
@@ -100,17 +142,16 @@ static int autogain_calculate_lookup_tables(
 	switch (fmt->fmt.pix.pixelformat) {
+	case V4L2_PIX_FMT_SGBRG10:
+	case V4L2_PIX_FMT_SGRBG10:
+	case V4L2_PIX_FMT_SBGGR10:
+	case V4L2_PIX_FMT_SRGGB10:
+		avg_lum = get_luminosity_bayer10((void *) buf, fmt);
+		break;
+
 	case V4L2_PIX_FMT_SGBRG8:
 	case V4L2_PIX_FMT_SGRBG8:
 	case V4L2_PIX_FMT_SBGGR8:
 	case V4L2_PIX_FMT_SRGGB8:
-		buf += fmt->fmt.pix.height * fmt->fmt.pix.bytesperline / 4 +
-			fmt->fmt.pix.width / 4;
-
-		for (y = 0; y < fmt->fmt.pix.height / 2; y++) {
-			for (x = 0; x < fmt->fmt.pix.width / 2; x++)
-				avg_lum += *buf++;
-			buf += fmt->fmt.pix.bytesperline - fmt->fmt.pix.width / 2;
-		}
-		avg_lum /= fmt->fmt.pix.height * fmt->fmt.pix.width / 4;
+		avg_lum = get_luminosity_bayer8(buf, fmt);
 		break;
 
 	case V4L2_PIX_FMT_RGB24:
diff --git a/lib/libv4lconvert/processing/libv4lprocessing.c b/lib/libv4lconvert/processing/libv4lprocessing.c
index b061f50..b98d024 100644
--- a/lib/libv4lconvert/processing/libv4lprocessing.c
+++ b/lib/libv4lconvert/processing/libv4lprocessing.c
@@ -164,6 +165,10 @@ void v4lprocessing_processing(struct v4lprocessing_data *data,
 	case V4L2_PIX_FMT_SGRBG8:
 	case V4L2_PIX_FMT_SBGGR8:
 	case V4L2_PIX_FMT_SRGGB8:
+	case V4L2_PIX_FMT_SGBRG10:
+	case V4L2_PIX_FMT_SGRBG10:
+	case V4L2_PIX_FMT_SBGGR10:
+	case V4L2_PIX_FMT_SRGGB10:
 	case V4L2_PIX_FMT_RGB24:
 	case V4L2_PIX_FMT_BGR24:
 		break;
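[A note on the extra avg_lum /= 4 in the 10-bit helper, as a reading of the patch rather than a statement of its author's intent: dividing by 4 maps the 0-1023 sample range onto the 0-255 scale the 8-bit path reports (512, the 10-bit mid-grey, becomes 128, the 8-bit mid-grey), which presumably lets the knee algorithm's thresholds treat both formats alike, subject to the constant adjustments mentioned in the thread above.]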