5401 Commits

Author SHA1 Message Date
Dmitry Kovalev
8283d893eb Merge "Renaming 'nmv' to 'mv' for several functions." 2013-06-20 10:17:12 -07:00
Dmitry Kovalev
77186ee61a Merge "Function decomposition inside vp9_decodemv.c file." 2013-06-20 10:17:05 -07:00
Deb Mukherjee
7947a33d72 Improving model rd with variance and quant step
Improves the rd modeling function and implements it using interpolation
from a table which is a little faster. Also uses sse as input to the
modeling function rather than var - since there is no dc prediction
used and as a result the sse works a little better.

derfraw300: +0.05%
Speedup: ~1%

Change-Id: I151353c6451e0e8fe3ae18ab9842f8f67e5151ff
2013-06-20 10:06:28 -07:00
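
Note: the table-interpolation idea in the commit above can be sketched as follows. This is a minimal illustration, not the libvpx code: the function name `interp_from_table`, the uniform sample spacing, and the table values are all assumptions made for the example. The commit does not state the table contents or how sse and the quantizer step are mapped to the lookup index.

```python
def interp_from_table(table, step, x):
    """Linearly interpolate a function sampled at 0, step, 2*step, ...

    `table[i]` holds the (hypothetical) model value at x = i * step.
    Inputs outside the sampled range are clamped to the end values.
    """
    if x <= 0:
        return table[0]
    pos = x / step
    i = int(pos)                      # index of the sample at or below x
    if i >= len(table) - 1:
        return table[-1]              # clamp beyond the last sample
    frac = pos - i                    # fractional distance to the next sample
    return table[i] + frac * (table[i + 1] - table[i])


# Illustrative table only; the real model values are not given in the commit.
example_table = [0.0, 10.0, 40.0]
print(interp_from_table(example_table, 2.0, 3.0))
```

A lookup plus one multiply replaces evaluating the model function directly, which is presumably where a small speedup of this kind would come from.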
Johann
d94aee6854 Cast value to avoid size_t/int warning on win64
dboolhuff.c(50) : warning C4267: 'initializing' : conversion from
'size_t' to 'int'

Change-Id: I6b85759efb2fa19f362f406623d8a7583a55c036
2013-06-20 09:52:08 -07:00
Jim Bankoski
9f2a1ae23e adds force partitioning greater than or less than block size
adds a new speed feature to force partitioning to be greater than
or less than a certain size

Change-Id: I8c048eeeef93700ae822eccf98f8751a45b2e7d0
2013-06-20 09:51:42 -07:00
Jim Bankoski
18bdf708e7 adds a set partitioning to speed features
this feature lets you set a partitioning size to be used by the entire
frame.

Change-Id: I208a4c8c701375cbb054418266f677768b6f8f06
2013-06-20 09:50:44 -07:00
Jim Bankoski
476d73d294 partition by variance using var from last frame
This uses variance to split partition. Variance is calculated using
nearest mv, always from last ref frame.

Change-Id: Idd015b4a9aa3bc82591759eac239680c07496896
2013-06-20 09:48:22 -07:00
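
Note: a rough sketch of splitting a partition by variance, as described in the commit above. Everything beyond "split when variance is high" is an assumption for illustration: the names `variance` and `partition`, the quadtree recursion, the threshold, and the minimum size are hypothetical. The input `block` stands in for the difference between the source and a prediction from the last reference frame.

```python
def variance(block):
    """Population variance of all samples in a square 2-D block."""
    vals = [v for row in block for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def partition(block, threshold, min_size):
    """Return the block size if it is kept whole, else a list of four
    sub-results (top-left, top-right, bottom-left, bottom-right)."""
    n = len(block)
    if n <= min_size or variance(block) <= threshold:
        return n                       # low variance: keep as one partition
    h = n // 2
    quads = [[row[c:c + h] for row in block[r:r + h]]
             for r in (0, h) for c in (0, h)]
    return [partition(q, threshold, min_size) for q in quads]


# Left half flat at 0, right half flat at 100: the 4x4 block splits once.
example = [[0, 0, 100, 100] for _ in range(4)]
print(partition(example, threshold=10, min_size=2))
```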
Jim Bankoski
1f94b97694 convert all speed things to speed features
Change-Id: Ie24489a4d39f3e53e816eeebf75a1c9c7d94515a
2013-06-20 09:42:44 -07:00
Jim Bankoski
727fa7b1e4 new partition via variance
Change-Id: Ideee45cad8b38087c509cd404484728e85d0c427
2013-06-20 09:42:05 -07:00
Jim Bankoski
0fad6a9d99 fix to set up new speed feature
This uses the speed feature functionality for code.

Change-Id: I9cd16c0c5f98520ae27ebba81aa2c178546587f8
2013-06-20 09:35:02 -07:00
Jim Bankoski
df2314cfdd don't copy partitions for key frames or altrefs
force us to go through slow partitioning for keyframes, altref and
overlays.

Change-Id: I1a286361bf74083e71973575a7296be46eb98742
2013-06-20 09:34:32 -07:00
Ronald S. Bultje
8fb6c58191 Implement sse2 and ssse3 versions for all sub_pixel_variance sizes.
Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 ->
3min58). Specific changes to timings for each function compared to
original assembly-optimized versions (or just new version timings if
no previous assembly-optimized version was available):

sse2   4x4:    99 ->   82 cycles
sse2   4x8:           128 cycles
sse2   8x4:           121 cycles
sse2   8x8:   149 ->  129 cycles
sse2   8x16:  235 ->  245 cycles (?)
sse2  16x8:   269 ->  203 cycles
sse2  16x16:  441 ->  349 cycles
sse2  16x32:          641 cycles
sse2  32x16:          643 cycles
sse2  32x32: 1733 -> 1154 cycles
sse2  32x64:         2247 cycles
sse2  64x32:         2323 cycles
sse2  64x64: 6984 -> 4442 cycles

ssse3  4x4:           100 cycles (?)
ssse3  4x8:           103 cycles
ssse3  8x4:            71 cycles
ssse3  8x8:           147 cycles
ssse3  8x16:          158 cycles
ssse3 16x8:   188 ->  162 cycles
ssse3 16x16:  316 ->  273 cycles
ssse3 16x32:          535 cycles
ssse3 32x16:          564 cycles
ssse3 32x32:          973 cycles
ssse3 32x64:         1930 cycles
ssse3 64x32:         1922 cycles
ssse3 64x64:         3760 cycles

Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
2013-06-20 09:34:25 -07:00
Jim Bankoski
f954490bbf disable speed > 1 speed corrections in firstpass
need to rework these

Change-Id: I17dc2c88d2faadd2f8fb117c52c25f04ea2e9856
2013-06-20 09:34:03 -07:00
Jim Bankoski
2c6bdbbc78 new debug modes code
The new print out includes skips and has prefixed sections so you can
grep to find things like transforms chosen on each frame.

Change-Id: I195043424647d9514cfc3ff6720a5b20d010fa1b
2013-06-20 09:33:11 -07:00
Jim Bankoski
fbcce4dd6f Merge "copy partitioning from last frame" 2013-06-20 09:32:43 -07:00
Jim Bankoski
f033b44e74 copy partitioning from last frame
Change-Id: I26e80ede80cb4389378a95afa95d229092a9859a
2013-06-20 09:32:19 -07:00
Jingning Han
362809dfbf Add unit tests for 4x4 ADST
Enable sign bias check and round-trip error unit tests for 4x4 hybrid
transform modules.

Change-Id: Icd3d839f098d4b92b00ff76eac146765b039d0d3
2013-06-20 09:24:48 -07:00
John Koleszar
db938c2988 Merge "test_libvpx: disable pthreads in gtest" 2013-06-20 09:05:58 -07:00
Yaowu Xu
6e3b34bdc3 Removed a number of unnecessary checks on ref_frame
Since intra block decoding is handled by decode_sb_intra() separately.

Change-Id: I42d757884714084c92fc23ec5d35d4dc946f4b15
2013-06-19 17:53:07 -07:00
Dmitry Kovalev
15eaba103d Function decomposition inside vp9_decodemv.c file.
Change-Id: Iab96e6a50aec543c63e15cd134f9d5f01ca7ceff
2013-06-19 13:09:34 -07:00
James Zern
90a9900abb test_libvpx: disable pthreads in gtest
currently threading is internal to libvpx so thread safety is unneeded
in libgtest -- visual studio builds already operate in this way as they
do not have pthread.h available by default.

this removes an unconditional link to libpthread using $(extralibs)
should libvpx require it.

Change-Id: Ieae1d693406653a54b54fba818c598836797d33b
2013-06-19 11:50:20 -07:00
John Koleszar
639db571df Add some unaligned test vectors
Tests resolutions of 8, 10, 16, 18, 32, 34, 64, 66 to exercise the
border conditions, as well as non-SB aligned sizes.

Change-Id: Ie7c2b7860ac3727e23202042f2e86792652912f8
2013-06-19 11:46:09 -07:00
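
Note: the sizes listed in the commit above can be checked against the superblock grid with a short sketch. It assumes the VP9 superblock size of 64 pixels; the helper names `sb_count` and `is_sb_aligned` are hypothetical and not part of the test suite.

```python
SIZES = [8, 10, 16, 18, 32, 34, 64, 66]   # resolutions named in the commit
SB_SIZE = 64                               # assumed superblock dimension


def sb_count(n, sb=SB_SIZE):
    """Number of superblocks needed to cover n pixels (rounded up)."""
    return (n + sb - 1) // sb


def is_sb_aligned(n, sb=SB_SIZE):
    """True when n is an exact multiple of the superblock size."""
    return n % sb == 0


for n in SIZES:
    print(n, sb_count(n), is_sb_aligned(n))
```

Under this assumption only 64 is superblock-aligned, and 66 is the one size that spills into a second superblock, which is the kind of border case the vectors are meant to exercise.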
Yunqing Wang
3656835771 Merge "Add two-pass quantization" 2013-06-19 11:35:40 -07:00
Yunqing Wang
b5bf7b13a8 Add two-pass quantization
Optimized the quantization function by making it a two-pass
process. The first pass quickly checks the transform
coefficients against the base ZBIN and keeps only the
coefficients large enough to need quantization. A skip
check is added: if all coefficients are within the base ZBIN, no
quantization is needed. The second pass is the actual quantization
pass, which processes only the coefficient subset determined
in the first pass. This reduces the computation. Furthermore, an
alternative method is used for large transform sizes, which often
have sparse nonzero quantized coefficients.

Overall, the encoder speedup is about 4%. The quantization function
itself gets 20% faster.

Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22
2013-06-19 10:35:02 -07:00
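
Note: the two-pass flow described in the commit above can be sketched as follows. This is a simplified illustration under stated assumptions, not the libvpx quantizer: the name `two_pass_quantize`, the single scalar `zbin` and `quant_step`, and the truncating division are hypothetical stand-ins for the real per-coefficient tables and rounding.

```python
def two_pass_quantize(coeffs, zbin, quant_step):
    """Return (quantized_coeffs, skip).

    Pass 1 keeps the indices of coefficients whose magnitude reaches the
    base zero bin. Pass 2 quantizes only those indices. If pass 1 finds
    nothing, the block is flagged as skippable and pass 2 never runs.
    """
    # Pass 1: cheap threshold check against the base zero bin.
    candidates = [i for i, c in enumerate(coeffs) if abs(c) >= zbin]

    out = [0] * len(coeffs)
    if not candidates:
        return out, True               # everything is inside the zero bin

    # Pass 2: quantize only the surviving subset (simplified arithmetic).
    for i in candidates:
        c = coeffs[i]
        q = abs(c) // quant_step
        out[i] = q if c >= 0 else -q
    return out, False


print(two_pass_quantize([10, -9, 2], zbin=5, quant_step=4))
```

The saving comes from pass 1 being a single comparison per coefficient, so blocks with few or no significant coefficients avoid most of the quantization arithmetic.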
Yaowu Xu
12180c8329 Remove unnecessary copying of probs.
Change-Id: Ic924f07c6ab0c929c6cdf11880d3c625806e272c
2013-06-18 23:02:27 -07:00
Dmitry Kovalev
87e1fa7627 Renaming 'nmv' to 'mv' for several functions.
Change-Id: I183a38997a9d01e4a1b869e92509f6915216fa09
2013-06-18 18:28:10 -07:00
John Koleszar
2319b7aaf1 Merge "tests: clear system state after non-API calls" 2013-06-18 16:40:15 -07:00
Jingning Han
7088426976 Merge "Make fdct32 computation flow within 16bit range" 2013-06-18 11:40:14 -07:00
James Zern
5b756748fd tests: clear system state after non-API calls
add ClearSystemState() to reset MMX registers, avoiding corruption of
subsequent tests.

Change-Id: I668deb09aa7aa467709776e5819f936910698bc0
2013-06-18 11:32:27 -07:00
Dmitry Kovalev
f231a3edee Merge "Code cleanup inside the decoder code." 2013-06-18 10:16:46 -07:00
Dmitry Kovalev
dfc0385291 Merge "Removing vp9_invtrans.{c, h} files." 2013-06-18 10:16:25 -07:00
Jingning Han
a41a4860c0 Make fdct32 computation flow within 16bit range
This commit makes use of dual fdct32x32 versions for rate-distortion
optimization loop and encoding process, respectively. The one for
rd loop requires only 16 bits precision for intermediate steps.
The original fdct32x32 that allows higher intermediate precision (18
bits) was retained for the encoding process only.

This allows speed-up for fdct32x32 in the rd loop. No performance
loss observed.

Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3
2013-06-18 09:46:24 -07:00
Ronald S. Bultje
9524765557 Merge "Move subpixel variance function from common/ to encoder/." 2013-06-18 09:07:21 -07:00
Ronald S. Bultje
b1a17e4ba7 Merge "Use assembly-optimized variance functions in sub_pixel_{avg}_var()." 2013-06-18 09:07:11 -07:00
John Koleszar
a15ca3fc0a Merge "vpx_ports/x86.h: de-dup #elif block" 2013-06-18 08:26:54 -07:00
James Zern
e7b599f683 convolve_test: align filter arrays
fixes issue #583

Change-Id: I4b855a5b5b168c8961410cef6ab5e6d86f14d301
2013-06-17 23:14:15 -07:00
James Zern
9fb6f40677 vpx_ports/x86.h: de-dup #elif block
Change-Id: I052647e13dd24354888c890f6b4a987d989552ae
2013-06-17 21:58:00 -07:00
Dmitry Kovalev
6f06450cec Code cleanup inside the decoder code.
Change-Id: I927c7223996cdeb44f46e0e6c2e2054d458c300b
2013-06-17 17:19:00 -07:00
Ronald S. Bultje
d9fc451666 Move subpixel variance function from common/ to encoder/.
This seems to be used only in the encoder. Also remove an empty wrapper
file that contained forward declarations for this function but didn't
define any actual functions.

Change-Id: Ifc561eef7ebe374a7d03698055e51e105f6d614b
2013-06-17 16:54:09 -07:00
Dmitry Kovalev
686b99741c Removing vp9_invtrans.{c, h} files.
Moving single function from vp9_invtrans.c to vp9_encodemb.c.

Change-Id: I26bf6bb90de342a3036c0dbfba78a7dd75a61fe7
2013-06-17 16:09:03 -07:00
Ronald S. Bultje
a2f33e2505 Use assembly-optimized variance functions in sub_pixel_{avg}_var().
2.5% faster when encoding first 50 frames of bus @ 1500kbps.

Change-Id: I5a64703996cf7fd39b07e32c72311c4b125ec6d4
2013-06-17 14:57:13 -07:00
Dmitry Kovalev
b1caa7c59c Merge "Fixing compilation error on Mac OS." 2013-06-17 14:40:29 -07:00
Ronald S. Bultje
d1bfa55d68 Merge "Fix typo ('weight' instead of 'width')." 2013-06-17 14:30:57 -07:00
Ronald S. Bultje
53729c7786 Fix typo ('weight' instead of 'width').
Change-Id: I5d3944051d091b4bf3eb13e2a30132d34203ef74
2013-06-17 13:56:24 -07:00
Dmitry Kovalev
ccd9886ddc Fixing compilation error on Mac OS.
The error happened because of vp8_decrypt_cb typedef redefinition in both
treereader.h and vp8dx.h. Removing typedef from vp8dx.h in favor of raw
function pointer declaration.

Change-Id: I0266eb341ce433d40caf0abf8748694d505ee786
2013-06-17 13:50:22 -07:00
John Koleszar
859a474718 Merge "Removed hardcoded global->limit" 2013-06-17 12:33:53 -07:00
Scott LaVarnway
0450a8891a Removed hardcoded global->limit
Looks like test code.

Change-Id: I5deae2bf14ea6fdcbb9b9d993966c9abef95eb2e
2013-06-17 15:28:45 -04:00
Jeff Petkau
368c72374e Change the encryption feature to use a callback for decryption.
This allows code calling the library to choose an arbitrary
encryption algorithm.

Decoder control parameter VP8_SET_DECRYPT_KEY is renamed to
VP8D_SET_DECRYPTOR, and now takes a small config struct instead
of just a byte array.

Change-Id: I0462b3388d8d45057e4f79a6b6777fe713dc546e
2013-06-17 11:32:16 -07:00
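
Note: the callback-based design described in the commit above can be illustrated with a small sketch. It shows only the general pattern of passing a callback plus opaque state in a config struct; the names `DecryptConfig`, `xor_decrypt`, and `read_payload` are hypothetical, and the XOR cipher is a toy chosen for the example, not the library's API or a recommended algorithm.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DecryptConfig:
    """Hypothetical stand-in for a small config struct: a callback and
    the opaque state that is handed back to it on every call."""
    decrypt_cb: Callable[[Any, bytes], bytes]
    decrypt_state: Any


def xor_decrypt(state, data):
    """Toy caller-supplied algorithm: XOR each byte with a repeating key."""
    key = state
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def read_payload(data, config=None):
    """Decoder-side hook: run the caller's callback if one is set."""
    if config is None:
        return data                    # no decryptor installed
    return config.decrypt_cb(config.decrypt_state, data)


key = b"\x0f\xf0"
encrypted = xor_decrypt(key, b"vp8 data")   # XOR is its own inverse
print(read_payload(encrypted, DecryptConfig(xor_decrypt, key)))
```

Compared with passing only a key as a byte array, the callback form leaves the choice of algorithm entirely to the caller, which is the motivation the commit message gives.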
John Koleszar
f616cfe4d7 Merge "Add vp9 test vectors unit test" 2013-06-17 10:32:08 -07:00
John Koleszar
61ecc282b5 Merge "Remove unused need_to_clamp_mvs" 2013-06-17 10:31:58 -07:00