Commit Graph

651 Commits

Author SHA1 Message Date
Tero Rintaluoma
6fdc9aa79f ARMv6 optimized subtract functions
Adds following ARMv6 optimized functions to encoder:
  - vp8_subtract_b_armv6
  - vp8_subtract_mby_armv6
  - vp8_subtract_mbuv_armv6

Gives 1-5% speed-up depending on input sequence and encoding
parameters. Functions have one stall cycle inside the loop body
on Cortex pipeline.

Change-Id: I19cca5408b9861b96f378e818eefeb3855238639
2011-03-29 16:52:00 +03:00
Tero Rintaluoma
f5e433464b Half pixel variance further optimized for ARMv6
Half pixel interpolations optimized in variance calculations. Separate
function calls to vp8_filter_block2d_bil_x_pass_armv6 are avoided. On
average, performance improvement is 6-7% for VGA@30fps sequences.

Change-Id: Idb5f118a9d51548e824719d2cfe5be0fa6996628
2011-03-28 09:51:51 +03:00
John Koleszar
b8a78cfa49 Merge remote branch 'origin/master' into experimental
Change-Id: Ibffdedc3bd2e1ec349e79ba038b065c98db77d06
2011-03-25 00:05:04 -04:00
John Koleszar
cdada23377 Merge remote branch 'internal/upstream' into HEAD 2011-03-25 00:05:04 -04:00
Johann
beaafefcf1 Merge "use asm_offsets with vp8_regular_quantize_b_sse2" 2011-03-24 11:06:36 -07:00
Johann
8edaf6e2f2 use asm_offsets with vp8_regular_quantize_b_sse2
remove helper function and avoid shadowing all the arguments to the
stack on 64bit systems

when running with --good --cpu-used=0:
~2% on linux x86 and x86_64
~2% on win32 x86 msys and visual studio
more on darwin10 x86_64
significantly more on x86_64-win64-vs9

Change-Id: Ib7be12edf511fbf2922f191afd5b33b19a0c4ae6
2011-03-24 13:34:48 -04:00
John Koleszar
3f4291e6e0 Merge remote branch 'origin/master' into experimental
Change-Id: I2e36f806ae5551c5015243de697aac3e9e29334d
2011-03-24 00:05:06 -04:00
John Koleszar
1f1526f8b8 Merge remote branch 'internal/upstream' into HEAD 2011-03-24 00:05:05 -04:00
Johann
4cde2ab765 Merge "ARMv6 optimized fdct4x4" 2011-03-23 07:52:51 -07:00
John Koleszar
51bcf621c1 Merge remote branch 'internal/upstream' into HEAD
Conflicts:
	vp8/decoder/decodemv.c
	vp8/decoder/onyxd_if.c
	vp8/encoder/ratectrl.c
	vp8/encoder/rdopt.c

Change-Id: Ia1c1c5e589f4200822d12378c7749ba62bd17ae2
2011-03-23 00:27:52 -04:00
John Koleszar
5f6db3591c Merge remote branch 'origin/master' into experimental
Conflicts:
	vp8/encoder/ratectrl.c
	vp8/encoder/rdopt.c

Change-Id: I4cc58acb432662d2c47aceda1680e52982adbc06
2011-03-23 00:24:25 -04:00
Yunqing Wang
73065b67e4 Merge "Fix multithreaded encoding for 1 MB wide frame" 2011-03-21 07:41:31 -07:00
John Koleszar
2cbd962088 Remove unused vp8_get4x4sse_cs_mmx declaration
This declaration did not match the prototype_sad() prototype, but was
unused in this translation unit, so it is removed instead. Fixes
issue 290.

Change-Id: I168854f88a85f73ca9aaf61d1e5dc0f43fc3fdb3
2011-03-21 07:53:53 -04:00
John Koleszar
769c74c0ac Merge "Increase static linkage, remove unused functions" 2011-03-21 04:51:51 -07:00
Tero Rintaluoma
a61785b6a1 ARMv6 optimized fdct4x4
Optimized fdct4x4 (8x4) for ARMv6 instruction set.
  - No interlocks in Cortex-A8 pipeline
  - One interlock cycle in ARM11 pipeline
  - About 2.16 times faster than current C-code compiled with -O3

Change-Id: I60484ecd144365da45bb68a960d30196b59952b8
2011-03-21 13:33:45 +02:00
Attila Nagy
bfe803bda3 Fix multithreaded encoding for 1 MB wide frame
Thread synchronization was not correct when frame width was 1 MB.
Number of allocated encoding threads is limited by the sync_range.
There is no point having more because each thread lags sync_range MBs
behind the thread processing the row above.

http://code.google.com/p/webm/issues/detail?id=302

Change-Id: Icaf67a883beecc5ebf2f11e9be47b6997fdf6f26
2011-03-18 12:35:30 +02:00
John Koleszar
4a1c3cf7d8 Merge remote branch 'origin/master' into experimental
Change-Id: If77de7e96a971edd8666ea0b1bd5eac6b09c6912
2011-03-18 00:05:07 -04:00
John Koleszar
cba980e3eb Merge remote branch 'internal/upstream' into HEAD 2011-03-18 00:05:06 -04:00
John Koleszar
429dc676b1 Increase static linkage, remove unused functions
A large number of functions were defined with external linkage, even
though they were only used from within one file. This patch changes
their linkage to static and removes the vp8_ prefix from their names,
which should make it more obvious to the reader that the function is
contained within the current translation unit. Functions that were
not referenced were removed.

These symbols were identified by:

  $ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
    | sort | grep '^ *1 '

Change-Id: I59609f58ab65312012c047036ae1e0634f795779
2011-03-17 20:53:47 -04:00
John Koleszar
8431e768c9 Merge "Fix "used uninitialized" warning in vp8_pack_bitstream()" 2011-03-17 14:25:04 -07:00
John Koleszar
386ceca8d2 Merge remote branch 'origin/master' into experimental
Change-Id: If09b27454f82265fd5e3b25c85c1eea70c6c637f
2011-03-16 00:05:07 -04:00
John Koleszar
dc3451b086 Merge remote branch 'internal/upstream' into HEAD 2011-03-16 00:05:06 -04:00
Attila Nagy
71bcd9f1af Add vp8_variance8x8_armv6 and vp8_sub_pixel_variance8x8_armv6 functions
Change-Id: I08edaffc62514907fa5e90e1689269e467c857f5
2011-03-15 15:50:44 +02:00
John Koleszar
54c59a03f3 Merge remote branch 'origin/master' into experimental
Change-Id: Ice13978071e98a88cf8ae5c069c6423d74425dea
2011-03-15 00:05:07 -04:00
John Koleszar
b210797a6a Merge remote branch 'internal/upstream' into HEAD 2011-03-15 00:05:07 -04:00
Johann
d0ec28b3d3 Merge "Add vp8_mse16x16_armv6 function" 2011-03-14 12:47:42 -07:00
John Koleszar
ba83622a00 Merge remote branch 'internal/upstream' into HEAD
Conflicts:
	vp8/encoder/onyx_if.c

Change-Id: Ieef9a58a2effdc68cf52bc5f14d90c31a1dbc13a
2011-03-14 08:53:02 -04:00
John Koleszar
eeb8c8004e Merge remote branch 'origin/master' into experimental
Conflicts:
	vp8/encoder/onyx_if.c

Change-Id: I230b63cef209cd1ac98357729a91ec07597756bd
2011-03-14 08:48:44 -04:00
Attila Nagy
e54dcfe88d Add vp8_mse16x16_armv6 function
Change-Id: I77e9f2f521a71089228f96e2db72524189364ffb
2011-03-14 14:38:31 +02:00
Johann
3788b3564c Merge "Move build_intra_predictors_mby to RTCD framework" 2011-03-11 10:23:48 -08:00
John Koleszar
27972d2c1d Move build_intra_predictors_mby to RTCD framework
The vp8_build_intra_predictors_mby and vp8_build_intra_predictors_mby_s
functions had global function pointers rather than using the RTCD
framework. This can show up as a potential data race with tools such as
helgrind. See https://bugzilla.mozilla.org/show_bug.cgi?id=640935
for an example.

Change-Id: I29c407f828ac2bddfc039f852f138de5de888534
2011-03-11 13:04:50 -05:00
Johann
5c60a646f3 Merge "ARMv6 optimized quantization" 2011-03-11 08:29:00 -08:00
Paul Wilkins
6e73748492 Clean up of vp8_init_config()
Clean up vp8_init_config() a bit and remove null pointer case,
as this code can't be called any more and is not an adequate
trap anyway, as a null pointer would cause exceptions before
hitting the test.

Change-Id: I937c00167cc039b3aa3f645f29c319d58ae8d3ee
2011-03-11 11:06:51 -05:00
John Koleszar
170b87390e Merge "1 Pass CQ and VBR bug fixes" 2011-03-11 08:06:09 -08:00
Paul Wilkins
2ae91fbef0 1 Pass CQ and VBR bug fixes
Issue 291 highlighted the fact that CQ mode was not working
as expected in 1 pass mode.

This commit fixes that specific problem but in so doing I also
uncovered an overflow issue in the VBR code for 1 pass and
some data values not being correctly initialized.

For some clips (particularly short clips), the resulting
improvement is dramatic.

Change-Id: Ieefd6c6e4776eb8f1b0550dbfdfb72f86b33c960
2011-03-11 10:59:34 -05:00
John Koleszar
e34e417d94 Merge "Fix incorrect macroblock counts in twopass rate control" 2011-03-11 06:06:04 -08:00
Yunqing Wang
3c9dd6c3ef Merge "Align SAD output array to be 16-byte aligned" 2011-03-11 05:56:02 -08:00
John Koleszar
c5c5dcd0be Merge "vp8cx - psnr converted to call assemblerized sse" 2011-03-11 05:54:00 -08:00
John Koleszar
29c46b64a2 Merge "vp8cx- alternate ssim function with optimizations" 2011-03-11 05:53:41 -08:00
Jim Bankoski
3dc382294b vp8cx - psnr converted to call assemblerized sse
Change-Id: Ie388d4618c44b131f96b9fe526618b457f020dfa
2011-03-11 08:51:22 -05:00
Jim Bankoski
3f6f7289aa vp8cx- alternate ssim function with optimizations
Change-Id: I91921b0a90dbaddc7010380b038955be347964b3
2011-03-11 08:51:21 -05:00
Yunqing Wang
b2aa401776 Align SAD output array to be 16-byte aligned
Use aligned store.

Change-Id: Icab4c0c53da811d0c52bb7e8134927f249ba2499
2011-03-11 08:24:23 -05:00
Yunqing Wang
76ec21928c Merge "Encoder loopfilter running in its own thread" 2011-03-11 04:55:05 -08:00
Attila Nagy
9c836daf65 Fix "used uninitialized" warning in vp8_pack_bitstream()
Change-Id: Iadcbdba717439f47a2c24e65fd69a3a1464174b5
2011-03-11 12:36:28 +02:00
Attila Nagy
3ae2465788 Encoder loopfilter running in its own thread
In multithreaded mode the loopfilter is running in its own thread (filter level
calculation and frame filtering). Filtering is mostly done in parallel with the
bitstream packing. Before starting the packing the loopfilter level has
to be calculated. Also any needed reference frame copying is done in the
filter thread.

Currently the encoder will create n+1 threads, where n > 1 is the number of
threads specified by the application and 1 is the extra filter thread. With n = 1
the encoder runs in single thread mode. There will never be more than n threads
running concurrently.

Change-Id: I4fb29b559a40275d6d3babb8727245c40fba931b
2011-03-11 10:52:51 +02:00
Tero Rintaluoma
7ab08e1fee ARMv6 optimized quantization
Adds new ARMv6 optimized function vp8_fast_quantize_b_armv6
to the encoder.

Change-Id: I40277ec8f82e8a6cbc453cf295a0cc9b2504b21e
2011-03-11 10:48:42 +02:00
John Koleszar
314631ca61 Merge remote branch 'origin/master' into experimental
Change-Id: Ibc4a75dbbc8b35ce298477e055e5a88df080d4b3
2011-03-11 00:05:09 -05:00
John Koleszar
31ce8f419c Merge remote branch 'internal/upstream' into HEAD 2011-03-11 00:05:07 -05:00
Adrian Grange
6daacdb785 Added missing format specifier in print statement
Printout of firstpass stats for frame had one fewer
format specifiers than arguments.

Change-Id: I5a42c85aa79c471e1a70afd75e24a91546b7a1cd
2011-03-10 12:43:49 -08:00
Adrian Grange
ed40ff9e2d Removed firstpass motion map
The firstpass motion map consists of an 8-bit flag for
each MB indicating how strongly the firstpass code
believes it should be filtered during the second pass
ARNR filtering.

For long or large format material the motion map can
become extremely large and hamper the operation of
the encoding process.

This change removes the motion map altogether, leaving
the second pass to rely on the magnitude of the motion
compensated error to determine the filter weight to
use for the MB during ARNR filtering.

Tests on the derf set indicate that the effect of this
change is neutral, with some small wins and losses. The
motion map has therefore been removed based on
a cost/benefit evaluation.

Change-Id: I53e07d236f5ce09a6f0c54e7c4ffbb490fb870f6
2011-03-10 11:32:48 -08:00
James Berry
f3e9e2a0f8 Fix incorrect macroblock counts in twopass rate control
The previous calculation of macroblock count (w*h)/256
is not correct when the width/height are not multiples of
16. Use the precalculated macroblock count from
cpi->common instead. This manifested itself as a divide
by zero when the number of pixels was less than 256.
num_mbs is updated in estimate_max_q, estimate_q,
estimate_kf_group_q, and estimate_cq.

Change-Id: I92ff98587864c801b1ee5485cfead964673a9973
2011-03-10 13:33:06 -05:00
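The difference described in commit f3e9e2a0f8 above can be illustrated with a minimal sketch. Both helper names are hypothetical, and the rounded-up form is only an assumption about how a precalculated macroblock count would be derived; the commit itself just says the value is taken from cpi->common.

```c
/* Naive count from the commit message: truncates, and yields 0 when
 * width * height is less than 256, which later causes a divide by zero. */
static int mb_count_naive(int width, int height)
{
    return (width * height) / 256;
}

/* Hypothetical helper (assumption): round each dimension up to a whole
 * number of 16x16 macroblocks before multiplying. */
static int mb_count_rounded(int width, int height)
{
    int mb_cols = (width + 15) >> 4;
    int mb_rows = (height + 15) >> 4;
    return mb_rows * mb_cols;
}
```

For dimensions that are multiples of 16 the two agree; they diverge only for other sizes, which matches the failure case the commit describes.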
John Koleszar
dc29ed27bd Merge remote branch 'origin/master' into experimental
Change-Id: Icb795cef47a205f33f180f3852d88c36113b673e
2011-03-10 00:05:06 -05:00
John Koleszar
820b2b927f Merge remote branch 'internal/upstream' into HEAD 2011-03-10 00:05:04 -05:00
Yunqing Wang
7b8e7f0f3a Add vp8_sub_pixel_variance16x8_ssse3 function
Added SSSE3 function

Change-Id: I8c304c92458618d93fda3a2f62bd09ccb63e75ad
2011-03-09 12:33:21 -05:00
Yunqing Wang
4561109a69 Remove unused functions
Removed some unused functions

Change-Id: Ifdfc27453e53cfc75997b38492901d193a16b245
2011-03-09 10:45:03 -05:00
Yunqing Wang
7966dd5287 Merge "Improve SSE2 half-pixel filter funtions" 2011-03-09 07:23:06 -08:00
John Koleszar
fa836faede Merge "Configuration updates:Making a clear distinction between Init and Change" 2011-03-09 05:07:11 -08:00
John Koleszar
016fb2b554 Merge remote branch 'origin/master' into experimental
Change-Id: Ie52ff118b00ce462bb110ae349108e55d3d8ff3b
2011-03-09 00:05:07 -05:00
John Koleszar
96208f2e45 Merge remote branch 'internal/upstream' into HEAD 2011-03-09 00:05:06 -05:00
Yunqing Wang
419f638910 Improve SSE2 half-pixel filter funtions
Rewrote these functions to process 16 pixels at once instead of 8.

Change-Id: Ic67e80124467a446a3df4cfecfb76a4248602adb
2011-03-08 16:25:06 -05:00
Yunqing Wang
859abd6b5d Merge "Add zero offset checking in SSE2 sub-pixel filter function" 2011-03-08 12:26:58 -08:00
Yunqing Wang
8432a1729f Add zero offset checking in SSE2 sub-pixel filter function
Skip filter at zero offset.

Change-Id: I95fc7e211869bc0ab5bcfb7ab2e3259d1c0ccf38
2011-03-08 15:22:07 -05:00
Yunqing Wang
e8f7b0f7f5 Merge "Write SSSE3 sub-pixel filter function" 2011-03-08 10:58:30 -08:00
Yunqing Wang
244e2e1451 Write SSSE3 sub-pixel filter function
1. Process 16 pixels at one time instead of 8.
2. Add check for both xoffset=0 and yoffset=0, which happens
   during motion search.
This change gave encoder 1%~3% performance gain.

Change-Id: Idaa39506b48f4f8b2fbbeb45aae8226fa32afb3e
2011-03-08 13:29:01 -05:00
Ralph Giles
e6948bf0f9 Fix a multi-line format-string warning.
GCC 4.5 and 4.6 both issue a warning about the multi-line format
string introduced in bc9c30a0, which also changed the whitespace
in the associated stt file by line-wrapping the long format string.

Instead, use multiple string constants, which the compiler will
concatenate. This maintains the original formatting, but remains
legible within the standard line length.

Change-Id: I27c9f92d46be82d408105a3a4091f145f677e00e
2011-03-08 07:14:12 -08:00
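The technique named in commit e6948bf0f9 above, splitting a long format string into adjacent string constants, can be sketched as follows. The function and field names are hypothetical and the format is much shorter than any real stats line; only the C rule that adjacent string literals are concatenated is relied upon.

```c
#include <stdio.h>

/* The two literals below are joined by the compiler into the single
 * format string "%d %d %d\n", so the output is unchanged while each
 * source line stays short. Names are illustrative only. */
static int format_stats(char *buf, size_t len,
                        int frame, int intra_error, int coded_error)
{
    return snprintf(buf, len,
                    "%d %d"
                    " %d\n",
                    frame, intra_error, coded_error);
}
```

Unlike a literal that continues across a line break, this form adds no extra whitespace to the output.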
Paul Wilkins
de87c420ef Corrected minor typos.
Change-Id: Icc9f12bd1e1bdaf51256dc8a90d08aa9be89ef34
2011-03-08 14:46:22 +00:00
Paul Wilkins
0eccee4378 Merge changes I00c3e823,If8bca004
* changes:
  Improved key frame detection.
  Improved KF insertion after fades to still.
2011-03-08 06:40:11 -08:00
John Koleszar
5d1d9911cb correct zbin boost for splitmv mode
Disable zbin boost in SPLITMV mode as intended. Was incorrectly looking
at vp8_ref_frame_order instead of vp8_mode_order when comparing against
SPLITMV. This condition should have always been false, as SPLITMV is
not in the range of valid reference frames.

Change-Id: I0408cc7595eff68f00efef6d008e79f5b60d14bf
2011-03-07 20:58:37 -05:00
Paul Wilkins
bc9c30a003 Improved key frame detection.
In some cases where clips have been encoded with
borders (eg. some wide-screen content where there is a
border top and bottom and slide shows containing portrait
format photographs (border left and right)) key frames were
not being correctly detected.

The new code looks to measure cases where a portion of
the image can be coded equally easily using intra or inter
modes and where the resulting error score is also very low.
These "neutral" areas are then discounted in the key frame
detection code.

Change-Id: I00c3e8230772b8213cdc08020e1990cf83b780d8
2011-03-07 15:58:07 +00:00
Paul Wilkins
9fc8cb39aa Improved KF insertion after fades to still.
This code extends what was previously done for GFs, to pick
cases where insertion of a key frame after a fade (or other
transition or complex motion) followed by a still section, will
be beneficial and will reduce the number of forced key frames.

Change-Id: If8bca00457f0d5f83dc3318a587f61c17d90f135
2011-03-07 15:11:09 +00:00
John Koleszar
01eb7c2874 Merge remote branch 'origin/master' into experimental
Change-Id: I70ac5a4f8388a7bfa058178c0ae53f6bdb0bb6e5
2011-03-05 00:05:07 -05:00
John Koleszar
89d66cbb20 Merge remote branch 'internal/upstream' into HEAD 2011-03-05 00:05:05 -05:00
John Koleszar
0bc31f1887 Merge "Fixing divide by zero" 2011-03-04 05:40:33 -08:00
Mikhal Shemer
84f7f20985 Configuration updates:Making a clear distinction between Init and Change
Change-Id: I7b2fb326e1aabc08b032177a7b914a5b8bb7376f
2011-03-03 10:35:09 -08:00
Mikhal Shemer
1de99a2a81 Fixing divide by zero
Change-Id: I9d8a98a2f7ed1e3116d0bae35164618c41998bac
2011-03-03 10:33:36 -08:00
John Koleszar
2c5638334e Merge remote branch 'origin/master' into experimental
Conflicts:
	vp8/vp8_cx_iface.c

Change-Id: Ib30d0cfbdaeb605ee4b846f683d204cd07e0c028
2011-03-03 09:01:10 -05:00
John Koleszar
ca29f6a7c4 Merge remote branch 'internal/upstream' into HEAD
Conflicts:
	vp8/vp8_cx_iface.c

Change-Id: Iecfd4532ab1c722d10ecce8a5ec473e96093cf3b
2011-03-03 08:59:34 -05:00
John Koleszar
738a791917 Merge remote branch 'internal/upstream-experimental' into HEAD
Conflicts:
	vp8/common/blockd.h

Change-Id: Ica2bd1c3da614eab5ce23acfb597e777d16b3983
2011-03-03 08:58:57 -05:00
John Koleszar
36be4f7f06 Fix drastic undershoot in long form content
When the modified_error_left accumulator exceeds INT_MAX, an incorrect
cast to int resulted in a negative value, causing the rate control to
allocate no bits to that keyframe group, leading to severe undershoot
and subsequent poor quality.

This error was exposed by the recent change to the rolling target and
actual spend accumulators in commit 305be4e4 which fixed them to
actually calculate the average value rather than be re-initialized
on every frame to the average per-frame bitrate. When this bug was
triggered, the target bitrate could be 0, so the rolling target
becomes small, which causes the undershoot. The code prior to 305be4e4
did not exhibit this behavior because the rolling target was always
set to a reasonable value and was independent of the actual target
bitrate. With this patch, the actual target bitrate is calculated
correctly, and the rate control tracks as expected.

This cast was likely added to silence a compiler warning on a comparison
between a double (modified_error_left) and an int (0). Instead, this
patch removes the cast and changes the comparison to be against 0.0,
which should prevent the warning from recurring.

This fixes issue #289. Special thanks to gnafu for his efforts in
reporting and debugging this fix.

Change-Id: Ie5cc1a7b516c578a76c3a50c892a6f04a11621fe
2011-03-02 22:52:27 -05:00
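The fix described in commit 36be4f7f06 above, comparing the double against 0.0 rather than casting it to int, can be sketched as below. The function name is hypothetical; only the variable name modified_error_left comes from the commit message. In C, converting a double whose value does not fit in an int is undefined behaviour, which is why the problematic form appears only in a comment.

```c
/* Problematic form (shown as a comment only):
 *     if ((int)modified_error_left > 0) ...
 * Once the accumulator exceeds INT_MAX the conversion is undefined,
 * and per the commit it produced a negative value in practice.
 *
 * Hypothetical helper: compare the double directly against 0.0. */
static int has_error_left(double modified_error_left)
{
    return modified_error_left > 0.0;
}
```

Comparing against a double constant also avoids a mixed double/int comparison, which the commit suggests was the original reason for the cast.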
Johann
6f5189c044 Merge "ARMv6 optimized half pixel variance calculations" 2011-03-02 05:48:46 -08:00
Yunqing Wang
cfaee9f7c6 Merge "Add prefetch before variance calculation" 2011-02-28 11:42:28 -08:00
Scott LaVarnway
3e6d476ac3 Merge "Avoid double copying of key frames into alt and golden buffer" 2011-02-28 10:16:33 -08:00
Yunqing Wang
d96ba65a23 Add prefetch before variance calculation
This improved encoding performance by 0.5% (good, speed 1) to
1.5% (good, speed 5).

Change-Id: I843d72a0d68a90b5f694adf770943e4a4618f50e
2011-02-28 11:25:55 -05:00
Johann
31dab574cc Merge "Remove a second check for invalid ptr in vp8_get_compressed_data" 2011-02-25 11:44:18 -08:00
Johann
e4fa638653 Merge "Remove temporal alt ref from realtime only build" 2011-02-25 06:55:17 -08:00
Attila Nagy
d8fc974ac0 Avoid double copying of key frames into alt and golden buffer
Change-Id: I726976a297a593a35ed6cba3c660e372562f7b27
2011-02-25 09:03:16 +02:00
Attila Nagy
6da2018789 Remove a second check for invalid ptr in vp8_get_compressed_data
Check is done first when function is entered.

Change-Id: Ief0d0cbd4860aaf492b78728f8d22f24029b1174
2011-02-25 08:41:13 +02:00
John Koleszar
1a7ce50a6c Merge remote branch 'origin/master' into experimental
Change-Id: I52f21ff6f9a1dca7099a8459657f6f288c5bfe40
2011-02-25 00:05:08 -05:00
Scott LaVarnway
861175ef00 Removed vp8_block2type
and used defines instead.

Change-Id: Idb56e0295d004793f406dfd2d8d8c546aad62e03
2011-02-24 14:35:18 -05:00
Scott LaVarnway
d53492bba4 Merge "Revisited rd_pick_intra4x4block" 2011-02-24 11:25:21 -08:00
Scott LaVarnway
658454a04c Revisited rd_pick_intra4x4block
Removed unnecessary copies. No noticeable speed gains.

Change-Id: I996c50c23fedd06d54ee7a3e762cbf559cc4a9d1
2011-02-24 13:31:47 -05:00
Paul Wilkins
b862c108dd Overflow of frame error accumulators.
This fixes an overflow problem in the frame error accumulators.

The overflow condition is extreme but did trigger when Frank B.
coded some high motion interlaced HD content.

The observed effect was a catastrophic breakdown of the rate
control leading to massive undershoot and poor bit allocation.

All the error values should really be unsigned but I will look at this
separately.

Change-Id: I9745f5c5ca2783620426b66b568b2088b579151f
2011-02-24 15:49:41 +00:00
Tero Rintaluoma
8ae92aef66 ARMv6 optimized half pixel variance calculations
Adds following ARMv6 optimized functions to the encoder:
 - vp8_variance_halfpixvar16x16_h_armv6
 - vp8_variance_halfpixvar16x16_v_armv6
 - vp8_variance_halfpixvar16x16_hv_armv6

Change-Id: I1e9c2af7acd2a51b72b3845beecd990db4bebd29
2011-02-23 13:27:27 +02:00
Attila Nagy
7af0d906e3 Remove temporal alt ref from realtime only build
It is not used in realtime mode. Reduces memory footprint.

Change-Id: I7f163225762368df5457cfd413050161d3704a3f
2011-02-22 12:53:32 +02:00
John Koleszar
b21fe3b278 Merge remote branch 'internal/upstream' into HEAD 2011-02-19 00:05:44 -05:00
John Koleszar
bbfca323fb Merge remote branch 'origin/master' into experimental
Change-Id: Ia3197f432b424213a34b20071e5171a413ba1aaf
2011-02-19 00:05:11 -05:00
Johann
945dad277d Revert "use unaligned load"
This reverts commit f50f2fd2a7.

Change Ib7506e3e aligns the buffer

Change-Id: Ie0f8bd3e57cfdfef81d39638a1451458ebbae2e0
2011-02-18 10:23:02 -05:00
John Koleszar
c764c2a20f Merge "clean up unused files" 2011-02-18 06:33:05 -08:00
John Koleszar
3ed8fe8778 remove unused vp8_predict_dc function
Change-Id: I64fa47889c54cfed094a674c49ef0996d49bdd42
2011-02-18 09:12:20 -05:00
John Koleszar
cbf923b12c clean up unused files
Removed a number of files that were unused or little-used.

Change-Id: If9ae5e5b11390077581a9a879e8a0defe709f5da
2011-02-18 09:09:49 -05:00
John Koleszar
d371ca93e5 cosmetic: remove unnecessary scope
Clean up some unnecessary scoping around pick_filter_level.

Change-Id: Ic57fa33e3fcae37fe6beae977e5743783399d5af
2011-02-18 08:46:07 -05:00
John Koleszar
597d02b508 Merge "Dont pick encoder filter level when loopfilter is disabled." 2011-02-18 05:26:23 -08:00
Attila Nagy
fb5a692d27 Reinitialize quantizer only when any delta is changing
No need to reinitialize for base Q changes.

Change-Id: Ie76ec21dd3c5582d5183dbed75ed73a1eed3e291
2011-02-18 14:23:37 +02:00
Attila Nagy
c6ef75690f Dont pick encoder filter level when loopfilter is disabled.
Change-Id: I58154faf4f3ece24f9927a5c3ab7e830e0887fb6
2011-02-18 08:53:00 +02:00
John Koleszar
f13212b728 Merge remote branch 'internal/upstream' into HEAD 2011-02-18 00:05:13 -05:00
John Koleszar
4fafc4d985 Merge remote branch 'origin/master' into experimental
Change-Id: I8999a33db82d38eb85482f3c423db238d6ee3ed9
2011-02-18 00:05:11 -05:00
John Koleszar
562f1470ce Use endian-neutral bitstream packing/unpacking
Eliminate unnecessary checks on target endianness and associated
macros.

Change-Id: I1d4e6a9dcee9bfc8940c8196838d31ed31b0e4aa
2011-02-17 15:20:53 -05:00
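As a rough illustration of what "endian-neutral" packing in commit 562f1470ce above can mean, the sketch below reads and writes a 32-bit value one byte at a time with shifts, so the result does not depend on the host byte order and needs no endianness macros. The function names and the choice of little-endian byte order are assumptions made for the example; the commit message does not specify either.

```c
#include <stdint.h>

/* Hypothetical helper: store v in little-endian byte order using shifts,
 * independent of the host CPU's endianness. */
static void write_le32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v & 0xff);
    dst[1] = (uint8_t)((v >> 8) & 0xff);
    dst[2] = (uint8_t)((v >> 16) & 0xff);
    dst[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Hypothetical helper: load a little-endian 32-bit value byte by byte. */
static uint32_t read_le32(const uint8_t *src)
{
    return (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16)
         | ((uint32_t)src[3] << 24);
}
```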
John Koleszar
c351aa7f1b Merge "Fix relative include paths" 2011-02-17 04:13:44 -08:00
John Koleszar
c88dbb2dce Merge remote branch 'internal/upstream' into HEAD 2011-02-17 00:05:14 -05:00
John Koleszar
1293116895 Merge remote branch 'origin/master' into experimental
Change-Id: I3efb725e4da4e7c75b2512b80db6af51dec51f79
2011-02-17 00:05:13 -05:00
Yunqing Wang
da9402fbf6 Merge "Allocate source buffers to be multiples of 16" 2011-02-16 11:35:06 -08:00
Yunqing Wang
da227b901d Allocate source buffers to be multiples of 16
Currently, when the video frame width is not a multiple of 16, the
source buffer has a stride that is not a multiple of 16, which forces
an unaligned load in SAD function and hurts the performance. To
avoid that, this change allocates source buffers to be multiples
of 16.

Change-Id: Ib7506e3eb2cea06657d56be5a899f38dfe3eeb39
2011-02-16 12:57:17 -05:00
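The rounding implied by commit da227b901d above can be sketched with a small helper. The name align16 is hypothetical, and this shows only the arithmetic for rounding a width up to the next multiple of 16, not the actual allocation code.

```c
/* Hypothetical helper: round a non-negative width up to the next
 * multiple of 16 so that every row starts on a 16-element boundary. */
static int align16(int width)
{
    return (width + 15) & ~15;
}
```

A width of 177, for instance, is padded to 192, while 176 is left unchanged.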
Johann
0c2cfff9b0 Merge "ARMv6 optimized sad16x16" 2011-02-16 05:22:38 -08:00
John Koleszar
e786bd3a01 Merge remote branch 'internal/upstream' into HEAD 2011-02-16 00:05:13 -05:00
John Koleszar
9e95a1a0cd Merge remote branch 'origin/master' into experimental
Change-Id: If846b0e4ec862b54b98d08608f4b5f9a7b7f94ef
2011-02-16 00:05:10 -05:00
James Zern
0030303b69 Remove redundant ptr checks in calls to vpx_free
vpx_free, if used, contains this check. If it is replaced, a well-behaved free
will behave similarly.

Change-Id: I25483aaa8b39255b9a8cf388d6e5eaa20a908ae1
2011-02-15 12:43:35 -08:00
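The reasoning in commit 0030303b69 above rests on the fact that releasing a null pointer is harmless. A minimal sketch using the standard free (not vpx_free itself) is shown below; the struct and function names are hypothetical.

```c
#include <stdlib.h>

/* Illustrative container; not a libvpx type. */
struct buffers {
    unsigned char *data;
};

/* No "if (b->data)" guard is needed: the C standard defines free(NULL)
 * as a no-op, so the call is safe whether or not data was allocated. */
static void release(struct buffers *b)
{
    free(b->data);
    b->data = NULL;
}
```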
John Koleszar
c6ea558c05 Merge remote branch 'internal/upstream' into HEAD 2011-02-15 00:05:39 -05:00
John Koleszar
cf8aa08348 Merge remote branch 'origin/master' into experimental
Change-Id: I4b1a7a2ad0d62bdcabfed66c9dfdbe9b6bfa8b5e
2011-02-15 00:05:29 -05:00
Yunqing Wang
7725a7eb56 Merge "Improve vp8_sad16x16_sse3 function" 2011-02-14 14:09:25 -08:00
Yaowu Xu
27dad21548 Merge "Improved vp8_rd_pick_intra_mbuv_mode" 2011-02-14 13:58:12 -08:00
Scott LaVarnway
94d4fee08f Improved vp8_rd_pick_intra_mbuv_mode
Eliminated unnecessary calculations. Very small change
to performance.

Change-Id: Ib7213d43c64e36955177c4d47950ff472266f822
2011-02-14 16:34:33 -05:00
Yunqing Wang
2debd5b5f7 Improve vp8_sad16x16_sse3 function
In real-time mode, vp8_sad16x16 function is called heavily in
motion search part. Improvement of this function gives 1.2%
encoding performance gain (real-time mode, tulip clip).

Change-Id: I23c401fc40c061f732a9767e8d383737a179bd58
2011-02-14 16:23:49 -05:00
Yaowu Xu
404e998eb7 Merge "mem leak fix for cpi->tplist" 2011-02-14 11:29:22 -08:00
James Berry
d3dfcde0f7 mem leak fix for cpi->tplist
checks added to make sure that cpi->tplist
is freed correctly in vp8_dealloc_compressor_data
and vp8_alloc_compressor_data.

Change-Id: I66149dbbd25c958800ad94f4379d723191d9680d
2011-02-14 14:02:52 -05:00
Scott LaVarnway
d419b93e3e Improved rd_pick_intra4x4block
Eliminated unnecessary calculations.  Improved performance
by 10% on keyframes and 1.6% overall for the test clip used.

Change-Id: I87671b26af5e2cc439e81d0fee3b15c7cd2a3309
2011-02-14 13:32:58 -05:00
John Koleszar
1f8e42e7b8 Merge remote branch 'internal/upstream' into HEAD 2011-02-12 00:05:14 -05:00
John Koleszar
70dc0ed003 Merge remote branch 'origin/master' into experimental
Change-Id: I1cd33708d12bd51dfd1e78db4a7500653abc53c9
2011-02-12 00:05:11 -05:00
Yunqing Wang
353246bd60 Merge "Add improved_mv_pred flag in real-time mode" 2011-02-11 07:20:17 -08:00
Yunqing Wang
9d0b2cbbce Add improved_mv_pred flag in real-time mode
As mentioned in check-in "Improve motion search in real-time mode",
MV prediction calculation causes speed loss for speed 7 and above.
This change added a flag to turn off this calculation for speed>6
in real-time mode.

Change-Id: I9f4ae5a8bf449222d1784b54e7d315fc8347b2d1
2011-02-11 09:59:41 -05:00
Tero Rintaluoma
1ef86980b9 ARMv6 optimized sad16x16
Adds a new ARMv6 optimized function vp8_sad16x16_armv6 to encoder.

Change-Id: Ibbd7edb8b25cb7a5b522d391b1e9a690fe150e57
2011-02-11 11:14:07 +02:00
Yaowu Xu
4f8a166058 Merge "Redefining good quality speed settings" 2011-02-10 21:38:19 -08:00
John Koleszar
64aebb6c7a Merge remote branch 'internal/upstream' into HEAD 2011-02-11 00:05:19 -05:00
John Koleszar
809dae2458 Merge remote branch 'origin/master' into experimental
Change-Id: Icf1a7c61a3b07da2ccfd94bca9e8810c01e46b2c
2011-02-11 00:05:14 -05:00
Yunqing Wang
6f53e59641 Merge "Improve motion search in real-time mode" 2011-02-10 12:42:44 -08:00
John Koleszar
02321de0f2 Fix relative include paths
Allow compiling without adding vp8/{common,encoder,decoder} to the
include paths.

Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
2011-02-10 15:09:44 -05:00
John Koleszar
ec3b8f1f32 Merge remote branch 'internal/upstream' into HEAD
Conflicts:
	vp8/decoder/onyxd_int.h

Change-Id: Id9aa577f03e37b4f406ba3b593c3c4330812a49e
2011-02-10 14:26:40 -05:00
Yunqing Wang
41e6eceb28 Improve motion search in real-time mode
Applied better MV prediction in real-time mode, which improves
the encoding quality.

Used quarter-pixel search instead of iterative sub-pixel search
for speed >=5 to improve encoding performance.

Tests on the test set showed:
1. For speed=-5, quality improvement: 1.7% on AvgPSNR and 2.1%
on SSIM, performance improvement: 3.6% (This counts in the
performance loss caused by MV prediction calculation in "Improve
MV prediction in vp8_pick_inter_mode() for speed>3").
2. For speed=-8, quality improvement: 2.1% on AvgPSNR and 2.5%
on SSIM, but a 6.9% performance decrease because of MV prediction
calculation. This should be improved later.

Change-Id: I349a96c452bd691081d8c8e3e54419e7f477bebd
2011-02-10 13:40:24 -05:00
Johann
7d8199f0c3 Merge "Adds armv6 optimized variance calculation" 2011-02-10 06:06:46 -08:00
John Koleszar
96ddc5c26e Merge remote branch 'origin/master' into experimental
Change-Id: Ie85d40c44bb23d56a519010356b2856c02fb4c05
2011-02-10 00:05:10 -05:00
Scott LaVarnway
19054ab6da Redefining good quality speed settings
Created a new speed 1 which is in the middle of the old
speed 0 and speed 1. (for both quality and performance)

Change-Id: I4802133cdb43f359ca787646c090899679dd5d84
2011-02-09 17:18:28 -05:00
James Berry
fffa2a61d7 fixed stride in vp8_temporal_filter_predictors_mb_c
stride would not be calculated correctly for material
with odd sized frame widths.

Change-Id: I1710f6aef9ebb93d36249c9239c68c5baa9791f8
2011-02-09 16:55:39 -05:00
John Koleszar
c2b43164bd Merge "correct cost for implicit bit in mvs" 2011-02-09 11:20:12 -08:00
John Koleszar
9954d05ca6 correct cost for implicit bit in mvs
Use 0xFFF0 instead of 240 (0xF0) for determining whether the sometimes
implicit bit 3 will be transmitted. This is consistent with the decoder
and encode_mvcomponent().

Change-Id: Ic1304d0ab56844bed8236edd1c5243a6767fc6b1
2011-02-09 12:50:17 -05:00
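The mask change in commit 9954d05ca6 above matters only for magnitudes with bits set above bit 7. The sketch below contrasts the two masks. Both function names are hypothetical, and the framing (a nonzero masked value means bit 3 is transmitted explicitly) is an assumption drawn from the commit's wording, not a copy of the encoder code.

```c
/* Narrow mask from before the fix: only looks at bits 4..7. */
static int bit3_sent_narrow(unsigned x)
{
    return (x & 0x00F0) != 0;
}

/* Wider mask from the commit: looks at bits 4..15. */
static int bit3_sent_wide(unsigned x)
{
    return (x & 0xFFF0) != 0;
}
```

For a magnitude such as 0x100 the narrow mask reports that the bit is implicit while the wide mask reports that it is sent, so a cost estimate based on the narrow mask would disagree with what is actually written.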
John Koleszar
a39b5af10b Merge "Put more code under #if CONFIG_MULTITHREAD." 2011-02-09 08:31:36 -08:00
Gaute Strokkenes
315e3c2518 Put more code under #if CONFIG_MULTITHREAD.
Change-Id: Icf4b692099d7d249fe3553852b1022b027b28e4b
2011-02-09 11:21:18 -05:00
Scott LaVarnway
85e79ce288 Merge "Added early breakout for vp8_rd_pick_intra4x4mby_modes" 2011-02-09 07:55:04 -08:00
Tero Rintaluoma
cb14764fab Adds armv6 optimized variance calculation
Adds vp8_sub_pixel_variance16x16_armv6 function to encoder. Integrates
ARMv6 optimized bilinear interpolations from vp8/common/arm/armv6
and adds new assembly file for variance16x16 calculation.
 - vp8_filter_block2d_bil_first_pass_armv6   (integrated)
 - vp8_filter_block2d_bil_second_pass_armv6  (integrated)
 - vp8_variance16x16_armv6 (new)
 - bilinearfilter_arm.h (new)
Change-Id: I18a8331ce7d031ceedd6cd415ecacb0c8f3392db
2011-02-09 10:23:43 -05:00
John Koleszar
b2ad177942 Merge remote branch 'internal/upstream' into HEAD
Conflicts:
	vp8/vp8_common.mk

Change-Id: I2094ddf20834c0b7dfe912feac6a79500bb8cce2
2011-02-09 08:34:48 -05:00
John Koleszar
6e6b46d972 Merge remote branch 'origin/master' into experimental
Change-Id: Ibc762883a5e117f5db64dc01a46a9c78438e6c33
2011-02-09 00:05:12 -05:00
Scott LaVarnway
13db80c282 Added early breakout for vp8_rd_pick_intra4x4mby_modes
Improved performance of good quality, speed 0 (3% average)
with no average quality loss.

Change-Id: Ica34473f99bd74260eaebde6b132185e09e3c09d
2011-02-08 16:50:43 -05:00