2581 Commits

Author SHA1 Message Date
Scott LaVarnway
63a77cbed9 Merge "Remove usage of predict buffer for decode" 2011-10-19 10:24:48 -07:00
Scott LaVarnway
ed9c66f584 Remove usage of predict buffer for decode
Instead of using the predict buffer, the decoder now writes
the predictor into the recon buffer.  For blocks with eob=0,
unnecessary idcts can be eliminated.  This gave a performance
boost of ~1.8% for the HD clips used.

Tero: Added needed changes to ARM side and scheduled some
      assembly code to prevent interlocks.

Patch Set 6:  Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3)
into this commit because of similarities in the idct
functions.
Patch Set 7: EC bug fix.

Change-Id: Ie31d90b5d3522e1108163f2ac491e455e3f955e6
2011-10-18 12:06:50 -04:00
Yaowu Xu
152ce6b2b9 fixed the wrong rounding in inverse haar transform
Given the current forward haar transform:
 f0 = I0 + I1 + I2 + I3
 f1 = I0 + I1 - I2 - I3
 f2 = I0 - I1 + I2 - I3
 f3 = I0 - I1 - I2 + I3
the output of the inverse haar prior rounding:
 i0 = f0 + f1 + f2 + f3 = I0 * 4;
 i1 = f0 + f1 - f2 - f3 = I1 * 4;
 i2 = f0 - f1 + f2 - f3 = I2 * 4;
 i3 = f0 - f1 - f2 + f3 = I3 * 4;
As all the numbers are 4 multiples, simply >>2 always produces prefect
results in term of forward-inverse transform round trip error.

Change-Id: Id6658b00ea819ee61cfeef8c5985d4cd3e77f44e
2011-10-14 09:33:54 -07:00
Attila Nagy
a5cd42feb9 Fix: vp8cx_pack_tokens_into_partitions_armv5 crash
It was crashing when number of partitions was bigger than the number
of MB rows (ex. 128x96 with 8 partitions).
Start point was not checked against mb_rows, plus extra
"empty" partitions were not written out.

Change-Id: I9c2f013b9ec022354b658fab4ef799ff8b1de93d
2011-10-14 10:53:04 +03:00
Adrian Grange
04182a121a Merge "Added rate-targeted temporal scalability" 2011-10-11 12:54:52 -07:00
Adrian Grange
217591fde5 Added rate-targeted temporal scalability
Added the ability to create rate-targeted, temporally
scalable, VP8 compatible bitstreams.

The application vp8_scalable_patterns.c demonstrates how
to use this capability. Users can create output bitstreams
containing upto 5 temporally separable streams encoded
as a single VP8 bitstream.
(previously abandoned as:
I92d1483e887adb274d07ce9e567e4d0314881b0a)

Change-Id: I156250a3fe930be57c069d508c41b6a7a4ea8d6a
2011-10-11 12:49:12 -07:00
John Koleszar
07ba411914 Reset FPU state after calc_plane_error()
Fixes a MMX/SSE2 mismatch when building with --enable-internal-stats.

Change-Id: I0c50a1f246f6916b7a5fc6f36864ceb362f25520
2011-10-11 08:43:30 -07:00
James Berry
05bde9d4a4 bug fix - starting/optimal/max and buffer_level changed from int to int64_t
buffer_level in VP8_COMP and starting_buffer_level, optimal_buffer_level
and maximum_buffer_size in VP8_CONFIG changed from int to int64_t
to avoid potential crash issues for larger target bit rates.

Change-Id: I0d5ab6c8a44c2fef51f30cd8df4bb4b739c5df26
2011-10-10 12:16:55 -04:00
Attila Nagy
c0de35b413 enc: save entropy probs only when needed for refresh
Previous entropy probs need to be saved (and restored) only when
current updates are not propagated.

Change-Id: Ie6ee0543066e30874e56258be0a6b7d2dd2fdb2b
2011-10-10 13:44:54 +03:00
Yaowu Xu
3ca849691c fixed a decoder bug
When 8x8 transform is enabled, the decoder does an extra reconstruct
on MBs that are coded using 8x8. This commit fixed the logic around
the decoding of mb encoded with 8x8 transform.

Change-Id: I6926557c9ef00eecb375f62946f7e140c660bf6f
2011-10-08 15:48:53 -07:00
Scott LaVarnway
af12c23e8e Merge "Improved tokenize" 2011-10-04 09:57:42 -07:00
John Koleszar
8f8b526b54 Merge "Fix uninitialized new_mv_count in first pass file" 2011-10-04 07:40:49 -07:00
Yunqing Wang
538865dfa5 Merge "Multithreaded encoder, late sync loopfilter" 2011-10-04 07:04:30 -07:00
John Koleszar
86712c50f2 Fix uninitialized new_mv_count in first pass file
Uninitialized data could be written to the first pass file when no
motion vectors are present in the frame.

Also fix a number of compiler warnings.

Change-Id: Icc9f53b6d33da9de4563d86d9fd591910473ea90
2011-10-04 09:50:52 -04:00
Johann
2aa408524c Merge "Reduce computational complexity of generic C loop filter." 2011-09-30 16:17:56 -07:00
Johann
48b1917112 Merge "combine loopfilter data access" 2011-09-30 15:47:56 -07:00
Scott LaVarnway
ab00d209bc Improved tokenize
For a realtime HD encodings, up to 1.6% gains seen.



Change-Id: If45028e23db95124da63f9d38ffe06e05596cc6e
2011-09-30 12:49:46 -04:00
Paul Wilkins
156b221a7f Segment coding of mode and reference frame.
Proof of concept test code that encodes mode and reference
frame data at the segment level.

Decode-able bit stream but some issues not yet resolved.
As it this helps a little on a couple of clips but hurts on most as
the basis for segmentation is unsound.

To build and test, configure with
--enable-experimental --enable-segfeatures

Change-Id: I22a60774f69273523fb152db8c31f4b10b07c7f4
2011-09-30 16:45:16 +01:00
Paul Wilkins
45e49e6e19 Experimental: segfeature added.
New setting added to configure script
2011-09-30 16:08:37 +01:00
Johann
3556deaca3 combine loopfilter data access
The data processed by the loopfilter overlaps. At the block level, this
results in some redundant transforms. Grouping the filtering allows for
a single 16x16 transpose (and inversion) instead of three 16x8 transposes
(and three more inversions).

This implementation is x86_64 only. We retain the previous
implementation for x86.

Improvements are obviously material dependant, but it seems to be ~%1 in
tests here.

Change-Id: I467b7ec3655be98fb5f1a94b5d145e5e5a660007
2011-09-30 07:38:35 -07:00
Alpha Lam
7bce513afe Call vp8_find_near_mvs lazily
vp8_find_near_mvs() is being called on all possible reference frames
but the data computed may be used if the loop exits early, which can
be due to x->skip beign set to 1.

Optimize this by call vp8_find_near_mvs() laziy only if it is going
to be used and not computed yet.

Change-Id: Iccdbd4c962a670c9f2c99b8aca8096042ca5dc98
2011-09-30 14:48:18 +01:00
Paul Wilkins
a572ac8327 Merge "CQ and two pass rate control." 2011-09-30 02:57:54 -07:00
Paul Wilkins
b6e27d5f0b CQ and two pass rate control.
Changes to the selection of Q limits for two pass
and two pass CQ mode.

Allowance made for Mode and motion vector costs.
Some refactoring of common code.

For Derf and YT sets CQ mode average improvement
circa 1% (SSIM and Global PSNR).

Some increased tendency to undershoot even when
user CQ not reached.

Patch2: Removed some test code accidentally merged.

Change-Id: Icf74d13af77437c08602571dc7a97e747cce5066
2011-09-30 10:55:52 +01:00
Aaron Watry
69aa303d96 Reduce computational complexity of generic C loop filter.
Change-Id: I1e7f9ed3cd907844a495b9e0073bc140b87e5c06
2011-09-29 17:25:48 -05:00
Attila Nagy
380d64ecb1 Multithreaded encoder, late sync loopfilter
Sync with loopfilter thread just at the beginning of next frame encoding.
This returns control to application faster and allows a better multicore scaling.
When PSNR packets are generated the final filtered frame is needed imediatly
so we cannot delay the sync.

Change-Id: I288d97b5e331d41d6f5bb49d97986fa12ac6f066
2011-09-29 10:06:24 +03:00
John Koleszar
6f9457ec12 Merge "clamp_mvs() using the wrong motion vector information" 2011-09-22 11:54:15 -07:00
John Koleszar
3c85c532bb Merge changes Ie650e9b8,I2427e494
* changes:
  vpxenc: get version string programatically
  Install missing default_coef_probs.h
2011-09-22 11:18:00 -07:00
Johann
9f41a8b0aa Merge "Replace vpx_ports/config.h with vpx_config.h" 2011-09-22 09:30:18 -07:00
John Koleszar
4a6ac727fe Install missing default_coef_probs.h
Make sure that this header is listed as one of the sources, so that it
will be installed if necessary.

Change-Id: I2427e494488126b179151dc21043c1e2c8ba5991
2011-09-22 11:08:24 -04:00
Attila Nagy
1a7d25a484 Replace vpx_ports/config.h with vpx_config.h
Just a clean-up.

Change-Id: Iea5b6dc925dcfa7db548bc1ab1a13d26ed5a2c9a
2011-09-22 13:33:54 +03:00
John Koleszar
305084d5fa Merge remote branch 'internal/upstream' into HEAD 2011-09-21 00:05:04 -04:00
Fritz Koenig
bd0c3409a8 Move neon only arm functions under arm/neon.
These files don't contain generic arm code, so should
only be compiled by neon.

Change-Id: Ie712823aa04d4235e7cfe7a3b725e73ee4c3e564
2011-09-20 10:51:06 -07:00
Johann
6829e62718 Merge "NEON FDCT updated to match current C code" 2011-09-20 09:51:05 -07:00
Johann
86e07525d5 Merge "NEON walsh transform updated to match C" 2011-09-20 09:50:42 -07:00
Johann
3a16276cf7 Merge "Updated ARMv6 forward transforms to match C" 2011-09-20 09:50:36 -07:00
Johann
fdd51829b1 Merge "Fixed armv5te multiplications" 2011-09-20 09:50:19 -07:00
Tero Rintaluoma
0c2529a812 NEON FDCT updated to match current C code
- Removed fast_fdct4x4_neon and fast_fdct8x4_neon
- Uses now short_fdct4x4 and short_fdct8x4
- Gives ~1-2% speed-up on Cortex-A8/A9

Change-Id: Ib62f2cb2080ae719f8fa1d518a3a5e71278a41ec
2011-09-20 10:20:55 +03:00
Tero Rintaluoma
3c19bc3fb3 Fixed armv5te multiplications
Rd and Rm registers should be different in 'mul'. This register
combination results in unpredictable behaviour. GCC will give
a warning and RVCT an error in this case.

Restriction applies only to armv5 targets and not for armv6 and above.

Change-Id: I378d17c51e1f16a6820814fbed43e115aaabb03e
2011-09-20 09:59:27 +03:00
John Koleszar
feea724296 Merge remote branch 'internal/upstream' into HEAD 2011-09-20 00:05:04 -04:00
Stefan Holmer
e529a825f7 Fix necessary for input partitions iface to match the RTP profile
These changes fixes a glitch between the RTP profile and the input
partitions interface. Since there's no way for the user to know the
actual number of partitions, the decoder have to read the
multi_token_paritition bits also when input partitions mode is
enabled.

Included are also a couple of fixes for issues with independent
partitions and uninitialized memory reads.

Change-Id: I6f93b15287d291169ed681898ed3fbcc5dc81837
2011-09-19 15:00:21 +02:00
Tero Rintaluoma
4c3ad66b7f Updated ARMv6 forward transforms to match C
- Updated walsh transform to match C
  (based on Change Id24f3392)
- Changed fast_fdct4x4 and 8x4 to short_fdct4x4 and 8x4
  correspondingly

Change-Id: I704e862f40e315b0a79997633c7bd9c347166a8e
2011-09-19 10:26:59 +03:00
Tero Rintaluoma
2a4b2a000c NEON walsh transform updated to match C
Modified original patch If2f07220885c4c3a0cae0dace34ea0e36124f001
according to comments. Scheduled code a little bit to prevent some
interlocks.

Change-Id: I338f02b881098782f82af63d97f042b85e63e902
2011-09-19 10:15:33 +03:00
John Koleszar
f3fce80954 Merge remote branch 'internal/upstream' into HEAD 2011-09-17 00:05:04 -04:00
Yaowu Xu
1d44e7ce1f enable selecting&transmitting to for intra mode entropy
This commit added a 3 bit index to the bitstream, the index is used to
look into the intra mode coding entropy context table. The commit uses
the mode stats to calculate the cost of transmitting modes using 8
possible entropy distributions, and selects the distribution that
provides the lowest cost to do the actual mode coding.

Initial test show this provides additional .2%~.3% gain over quantizer
adaptive intra mode coding. So the adaptive intra mode coding provides
a total of .5%(psnr) to .6% gain(ssim) combined for all-key-encoding

To build and test, configure with
--enable-experimental --enable-qimode

Change-Id: I7c41cd8bfb352bc1fe7c5da1848a58faea5ed74a
2011-09-16 16:33:19 -07:00
Yaowu Xu
aac2c12663 add quantizer adaptive intra mb mode encoding
make intra mode coding entropy distribution adaptive to baseQindex, an
encoding test on hd clips with all key frame shows universal gain on
all clips in both .2%(psnr) and (ssim).3%.

To build and test, configure with
--enable-experimental --enable-qimode

Change-Id: Iaa69241b984d4fdd8baa6d77ee78c0140f5ac00a
2011-09-16 16:26:35 -07:00
Yaowu Xu
ca6b85aa4e add 8x8 intra prediction modes
Patch 1 to Patch 3 is an initial implementation of 8x8 intra prediction
modes, here are with the following assumptions:
a. 8x8 has 4 prediction modes DC, H, V and TM
b. UV 4x4 block use the same mode as corresponding 8x8 area
c. i8x8 modes are enabled for key frame only for now
Patch 4:
d. removed debug code from previous patches
Patch 5:
e. added stats code to collect entropy stats and further cleaned up
Patch 6:
f. changed mode stats code to collect finer stats of modes
Patch 7:
g. normalized i8x8 modes distribution to total at 256 (8bits).
Patch 8:
h. fixed a bug in decoder and removed debug printf output.
Patch 9:
i. more cleanups to address paul's comment
Patch 10:
j. messy rebase/merges to bring the commit up to date.

Tests on HD clips encoded with all key frame showing consistent gain
on all clips and all metrics:~0.5%(psnr) and 0.6%(ssim):
http://www.corp.google.com/~yaowu/no_crawl/i8x8hd_allkey_fixedq.html

To build and test, configure with:
--enable-experimental --enable-i8x8

Change-Id: I9813fe07ae48cab5fdb5d904bca022514ad01e7f
2011-09-16 15:55:19 -07:00
John Koleszar
35ce4eb01d Merge "Fixes the boundary checks for extrapolated and interpolated MVs." 2011-09-16 08:09:44 -07:00
Scott LaVarnway
c0ee870b0a clamp_mvs() using the wrong motion vector information
In the "Removed bmi copy to/from BLOCKD" commit, the copy
to the bmi in BLOCKD was eliminated.  The clamp_mvs() used
the bmi in BLOCKD, which now contains incorrect values.  This
patch fixes this problem.

Change-Id: I8eca1eaf4015052b0b63e90876f7ad321aba7cff
2011-09-16 11:03:53 -04:00
John Koleszar
62371d382a Merge remote branch 'internal/upstream' into HEAD
Conflicts:
	vp8/decoder/decodframe.c
	vp8/encoder/encodeframe.c
	vp8/encoder/encodemb.c

Change-Id: I6e0d1669e4409a2dfd73ba2c7038d730842d3953
2011-09-16 09:22:29 -04:00
Stefan Holmer
b854bbd844 Fixes the boundary checks for extrapolated and interpolated MVs.
Change-Id: I5b47d39d1604f2650d2f2d1ca2a3f40843c8e1ea
2011-09-16 11:58:57 +02:00