Commit Graph

234 Commits

Author SHA1 Message Date
Yaowu Xu
d338d14c6b optimize fast_quantizer c version
As the zbin and rounding constants are normalized, rounding effectively
does the zbinning, therefore the zbin operation can be removed. In
addition, the memset on the two arrays are no longer necessary.

Change-Id: If39c353c42d7e052296cb65322e5218810b5cc4c
2010-10-06 13:28:36 -07:00
Paul Wilkins
2931b05ac5 Merge "Tune effect of motion on KF/GF boost in two pass;" 2010-10-05 06:58:24 -07:00
Jan Kratochvil
1fc294116a nasm: movhps compatibility QWORD->MMWORD
Filed for nasm as:
https://sourceforge.net/tracker/?func=detail&atid=106208&aid=3081103&group_id=6208

nasm just does not accept any size parameter for movhps:
1.asm:2: error: mismatch in operand sizes

Some parts of libvpx already use MMWORD for movhps and MMWORD is
defined-out so it is compatible both with yasm and nasm.

Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu.

Change-Id: I4008a317ca87ec07c9ada958fcdc10a0cb589bbc
2010-10-04 20:47:19 -04:00
Jan Kratochvil
5cdc3a4c29 nasm: address labels 'rel label' vice 'wrt rip'
nasm does not support `label wrt rip', it requires `rel label'. It is
still fully compatible with yasm.

Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50
2010-10-04 19:47:54 -04:00
Jan Kratochvil
e114f699f6 nasm: match instruction length (movd/movq) to parameters
nasm requires the instruction length (movd/movq) to match to its
parameters. I find it more clear to really use 64bit instructions when
we use 64bit registers in the assembly.

Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91
2010-10-04 23:36:29 +02:00
Yaowu Xu
2d4ef37507 Merge "enable trellis quantization for 2nd order blocks" 2010-10-04 10:41:20 -07:00
Paul Wilkins
788c0eb54e Tune effect of motion on KF/GF boost in two pass;
This code adjust the impact of the amount and speed of motion
on GF and KF boost.

Sections with lots of slow motion will tend to have a
somewhat bigger boost and sections with fast motion may
have less.

There is a knock on effect to the selection of the active
quantizer range.

This will likely require further tuning but helps with a couple
of particularly bad edge cases.

Change-Id: Ic2449cda7305672b69acf42fc0a845b77ac98d40
2010-10-02 17:31:46 +01:00
Yaowu Xu
dcd29e369f enable trellis quantization for 2nd order blocks
Experimented with different value for Y2_RD_MULT ranging f[1, 32],
without adapting the value to MB coding mode/frame type/Q value,
4 works out best among all values, providing overall 0.1% coding
gain on the test set.

Change-Id: I6b2583a8aa5db5e7e5c65c646301909c0c58f876
2010-10-02 06:20:33 -07:00
Johann
f143a81191 Merge "Fix valgrind errors in the NEON loop filters." 2010-10-01 06:18:53 -07:00
Adrian Grange
999bc00301 Made temporal filter default to use centered mode
If temporal filtering is enabled but a filter type is not specified
centered filter mode is used by default.

Change-Id: I87306f267c1390074c806c506a69b4ba914d92a2
2010-10-01 10:14:01 +01:00
Timothy B. Terriberry
a465076e02 Fix valgrind errors in the NEON loop filters.
Like the ARMv6 code, these functions were accessing values below
 the stack pointer, which can be corrupted by signal delivery at
 any time.
2010-09-30 20:40:45 -07:00
John Koleszar
0faa8a0861 Merge "Rename mode_ref_lf_test_function" 2010-09-30 10:26:31 -07:00
John Koleszar
a047fee606 Merge "Fix loopfilter delta zero transitions" 2010-09-30 10:26:10 -07:00
Adrian Grange
8ee7284d60 Changed defaults & range checking for AltRef params
Modified the range checking of parameters used in the
AltRef temporal filter (arnr-max-frames, arnr-strength,
arnr-type) and default values for each of them.

Change-Id: Ib261028d501b9523f6e44cb4790cc52167b6e92b
2010-09-30 10:06:09 +01:00
John Koleszar
7e5e31516c Rename mode_ref_lf_test_function
This function graduated from being a test func to something that's on
by default. Rename it and remove some spurious comments that confuse
its status.

Change-Id: I689695a3ad29c35e9a72a43ec93766733ac6c20b
2010-09-29 13:53:14 -04:00
Fritz Koenig
439b2ecd74 Merge "Optimizations on the loopfilters." 2010-09-29 10:47:01 -07:00
John Koleszar
b9be7a464f Fix loopfilter delta zero transitions
Loopfilter deltas are initialized to zero on keyframes in the decoder.
The values then persist from the previous frame unless an update bit
is set in the bitstream. This data is not included in the entropy
data saved by the 'refresh entropy' bit in the bitstream, so it is
effectively an additional contextual element beyond the 3 ref-frames
and the entropy data.

The encoder was treating this delta update bit as update-if-nonzero,
meaning that the value would be refreshed even if it hadn't changed,
and more significantly, if the correct value for the delta changed
to zero, the update wouldn't be sent, and the decoder would preserve
the last (presumably non-zero) value.

This patch updates the encoder to send an update only if the value
has changed from the previously transmitted value. It also forces the
value to be transmitted in error resilient mode, to account for lost
context in the event of lost frames.

Change-Id: I56671d5b42965d0166ac226765dbfce3e5301868
2010-09-29 13:04:04 -04:00
Paul Wilkins
7288cdf79d Change to coefficient optimization rules.
Allow coefficient optimization for good quality speed 0.

Change-Id: Id0cb363df6823c6798671584fbba097916a7df2c
2010-09-29 13:22:05 +01:00
Adrian Grange
4f92b96bdb Merge "Moved row-specific computation of MV bounds out of col loop" 2010-09-29 05:13:41 -07:00
Adrian Grange
0e7c45b391 Moved row-specific computation of MV bounds out of col loop
Moved the bounds computation on vertical MV component out
of the loop that processes MBs within a MB row.
2010-09-29 13:03:07 +01:00
Paul Wilkins
ff3068d6da Control of active min quantizer for two pass.
Create  look up tables for controlling the active quantizer range.
Some initial tuning to improve quality circa 0.5% on test set.
Clean up of some stats output code

Change-Id: Ia698a8525f8b8129a503cadace3ee73fe888f543
2010-09-29 12:03:19 +01:00
Fritz Koenig
0964ef0e71 Optimizations on the loopfilters.
- Scheduling for Atom processors
- Combining of macros to allow for better interleaving
- Change from multiplies to adds for main filter
- Use of movhps/movlps to fill xmm registers without
  shifting and orring

Change-Id: I0b3500a5f58abf7085253ec92d64c8a96723040b
2010-09-28 12:01:34 -07:00
Adrian Grange
47fc8f2683 Enabled AltRef motion map creation
Enabled the first-pass encode to output the
map of macroblock coding modes required by
the AltRef filter.
2010-09-28 16:52:19 +01:00
Adrian Grange
0090328164 Merge "Made AltRef filter adaptive & added motion compensation" 2010-09-28 08:34:44 -07:00
Adrian Grange
1b2f8308e4 Made AltRef filter adaptive & added motion compensation
Modified AltRef temporal filter to adapt filter length based
on macroblock coding modes selected during first-pass
encode.

Also added sub-pixel motion compensation to the AltRef
filter.
2010-09-28 15:23:41 +01:00
Timothy B. Terriberry
18dc92fd66 Add 4-tap version of 2nd-pass ARMv6 MC filter.
The existing code applied a 6-tap filter with 0's on either end.
We're already paying the branch penalty to avoid computing the two
 extra columns needed as input to this filter.
We might as well save time computing the filter as well.
This reduces the inner loop from 21 instructions to 16, the number
 of loads per iteration from 4 to 1, and the number of multiplies
 from 7 to 4.
The gain in overall decoding performance, however, is small (less
 than 1%).

This change also means we now valgrind clean on ARMv6, which is
 its real purpose.
The errors reported here were valgrind's fault (it does not detect
 that 0 times an uninitialized value is initialized), but Julian
 Seward says it would slow down valgrind considerably to make such
 checks.
Speeding up libvpx rather, even by a small amount, seems a much
 better idea if only to enable proper valgrind checking of the
 rest of the codec.

Change-Id: Ifb376ea195e086b60f61daf1097d8910c4d8ff16
2010-09-27 18:25:45 -07:00
Paul Wilkins
305be4e417 Badly placed initialization of rolling rate monitors.
This affects control of the active quantizer range.

Change-Id: I30511fc81ac9f75ff20d9f1372382423d56739da
2010-09-27 12:50:55 -04:00
John Koleszar
2b521ab551 move reconintra_mt to decoder (fixup)
Missed the .h file in the move.

Change-Id: Ib408183fbb4d019fd46394b362f89ca6ea9d10bc
2010-09-27 12:48:31 -04:00
John Koleszar
9fdcdc511d Merge "disable compilation of debugging code" 2010-09-27 07:00:03 -07:00
Johann
063be9b82a Merge "combine max values and compare once" 2010-09-27 06:39:20 -07:00
Timothy B. Terriberry
e2795e9978 Fix valgrind errors in vp8_sixtap_predict8x4_armv6().
This function was accessing values below the stack pointer, which
 can be corrupted by signal delivery at any time.

Change-Id: I92945b30817562eb0340f289e74c108da72aeaca
2010-09-24 14:34:18 -07:00
Johann
f30e8dd7bd combine max values and compare once
previous implementation compared each set of values to limit and then
&'d them together, requiring a compare and & for each value.

this does the accumulation first, requiring only one compare

Change-Id: Ia5e3a1a50e47699c88470b8c41964f92a0dc1323
2010-09-24 15:42:50 -04:00
John Koleszar
dbd57c2663 Merge "move reconintra_mt to decoder (for now)" 2010-09-24 08:46:35 -07:00
John Koleszar
8ca779aba8 disable compilation of debugging code
This patch avoids compiling some debugging code in onyx_if.c. The most
significant fix is to avoid generating code for vp8_write_yuv_frame,
which is never called. Some other code was removed by the dead code
elimination performed by the compiler, and this patch does it with the
preprocessor instead. There are advantages both ways.

Change-Id: I044fd43179d2e947553f0d6f2cad5b40907ac458
2010-09-24 11:42:22 -04:00
Yunqing Wang
aab0f5b121 Merge "Adjust multi-thread sync ranges according to image sizes" 2010-09-24 08:34:07 -07:00
John Koleszar
48e76ff4fd move reconintra_mt to decoder (for now)
reconintra_mt.c is only required for building the decoder right now.
It could definitely be used for the encoder in the future, but it
currently depends on decoder only data structures. (onyxd_int.h,
VP8D_COMP, etc). Move it from common/ to decoder/ until the
necessary changes to the common multithread code are complete.

This patch is needed to build with --disable-vp8-decoder.

Change-Id: I568c52221a2b309234d269675cba97131ce35c86
2010-09-24 11:23:06 -04:00
John Koleszar
329aaaf453 Merge "Add getter functions for the interface data symbols" 2010-09-24 05:39:48 -07:00
John Koleszar
fa7a55bb04 Add getter functions for the interface data symbols
Having these symbols be available as functions rather than data is
occasionally more convenient. Implemented this way rather than a
get-codec-by-id style to avoid creating a link-time dependency
between the encoder and the decoder.

Fixes issue #169

Change-Id: I319f281277033a5e7e3ee3b092b9a87cce2f463d
2010-09-23 14:58:43 -04:00
Yunqing Wang
8db5da2906 Adjust multi-thread sync ranges according to image sizes
In multi-threaded decoder, set different sync ranges for
different video resolutions.

Change-Id: Iea48fd36f51919e0152c8ed3b1f10e1b723c0ca7
2010-09-23 13:53:09 -04:00
Johann
7fed3832e7 Remove dead code
The new loopfilter was originally introduced as an experimental change.
It's permanent now.

Change-Id: I25dbedb6ceff3e9f9c04e18bb29f84c3ecb7e546
2010-09-22 11:07:34 -04:00
John Koleszar
cdd2066687 unset execute bit on c source
Change-Id: I6625ee41f8872908cb015ce0729e1c7a105b5217
2010-09-21 19:48:06 -04:00
John Koleszar
6f4c0435d1 Merge "Don't reset mb clamping state during splitmv decoding" 2010-09-21 09:06:59 -07:00
John Koleszar
4d391e8ed2 Don't reset mb clamping state during splitmv decoding
The MV decoding changes in c5fb0eb introduced a bug where the
macroblock clamping state was reset for each partition, so if an
earlier partition needed clamping but a subsequent one didn't,
the MB wouldn't receive clamping. Instead, the state is only
set during splitmv decoding, never cleared.

Change-Id: I224fe258493405ee0f6a04596acdb622c475e845
2010-09-21 11:58:48 -04:00
John Koleszar
015cfcafbd Merge "Add high limit check for unsigned parameters" 2010-09-21 05:36:46 -07:00
Yunqing Wang
a23ccf8f8c Merge "Restructure multi-threaded decoder" 2010-09-21 05:00:30 -07:00
Fritz Koenig
b7dc9398f2 Use movq instead of movdqu.
Movdqu is more expensive (throughput, uops) than movq.  Minimal
impact for newer big cores, but ~2.25% gain on Atom.

Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f
2010-09-20 11:34:26 -07:00
Fritz Koenig
1c906448cc Merge "Better choice of instruction filter mask comparision." 2010-09-20 11:01:51 -07:00
Johann
6cf2b4aa0e Merge "reorder data to use wider instructions" 2010-09-20 10:47:33 -07:00
Johann
9c9afbab85 Merge "Update NEON wide idcts" 2010-09-20 10:47:22 -07:00
Fritz Koenig
8eae7fe7e8 Better choice of instruction filter mask comparision.
Use pmaxub instead of a combination of psubusb/por to
determine if any comparisons go over the limit.

Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82
2010-09-20 10:20:38 -07:00