5339 Commits

Author SHA1 Message Date
Dmitry Kovalev
6fb10f2de4 Renaming "nmv" to "mv".
Change-Id: I8299f55c3b930221e52c2237f2ddea65b94fd33b
2013-06-25 15:19:18 -07:00
Dmitry Kovalev
dc0f457c94 Using get_plane_block_{width, height} instead of custom code.
Change-Id: I453ed11b965e857a14c18ea5c0f4a0a48e7dc0d9
2013-06-25 14:11:18 -07:00
Ronald S. Bultje
0441e0a2fc Merge "Only do metrics on cropped (visible) area of picture." 2013-06-25 13:51:18 -07:00
Ronald S. Bultje
1d0ae2e63c Merge "Don't skip right/bottom border pixels in SSIM calculations." 2013-06-25 13:51:04 -07:00
Ronald S. Bultje
c5be54eef3 Merge "Add averaging-SAD functions for 8-point comp-inter motion search." 2013-06-25 13:50:53 -07:00
Jingning Han
d52c359d43 Merge "Tune the rounding operations in 8x8 ADST/DCT sse2" 2013-06-25 13:17:05 -07:00
Ronald S. Bultje
450c7b57a8 Only do metrics on cropped (visible) area of picture.
The part where we align it by 8 or 16 is an implementation detail that
shouldn't matter to the outside world.

Change-Id: I9edd6f08b51b31c839c0ea91f767640bccb08d53
2013-06-25 12:57:28 -07:00
Ronald S. Bultje
44f349df62 Don't skip right/bottom border pixels in SSIM calculations.
Change-Id: I75acb55ade54bef6ad7703ed5e691581fa2f8fe1
2013-06-25 12:57:28 -07:00
Ronald S. Bultje
c24d922396 Add averaging-SAD functions for 8-point comp-inter motion search.
Makes first 50 frames of bus @ 1500kbps encode from 3min22.7 to 3min18.2,
i.e. 2.3% faster. In addition, use the sub_pixel_avg functions to calc
the variance of the averaging predictor. This is slightly suboptimal
because the function is subpixel-position-aware, but it will (at least
for the SSE2 version) not actually use a bilinear filter for a full-pixel
position, thus leading to approximately the same performance compared to
if we implemented an actual average-aware full-pixel variance function.
That gains another 0.3 seconds (i.e. encode time goes to 3min17.4), thus
leading to a total gain of 2.7%.

Change-Id: I3f059d2b04243921868cfed2568d4fa65d7b5acd
2013-06-25 12:57:28 -07:00
Jingning Han
0084e61d5f Tune the rounding operations in 8x8 ADST/DCT sse2
Improve the round-trip precision to meet the unit test setttings.

Change-Id: I303febae56b4b990ea3798b8ebed94c0510ecf79
2013-06-25 12:02:26 -07:00
Ronald S. Bultje
5ebe47747d Merge "Don't re-allocate comp_pred buffers for each call to comp motion search." 2013-06-25 12:00:36 -07:00
Dmitry Kovalev
9467571777 Moving subexp encoding functions in separate vp9_dsubexp.c file.
Change-Id: Idbb2ea80f764fa830fe2ddcfc54ef7fe232f05a8
2013-06-25 11:53:17 -07:00
Dmitry Kovalev
5ae096778e Merge "Removing unused code." 2013-06-25 11:50:55 -07:00
Jingning Han
cd6932db77 Merge "Add 8x8 dct/adst unit tests" 2013-06-25 11:21:17 -07:00
Dmitry Kovalev
87ee34aacb Removing unused code.
Removing block index (ib) parameter from get_tx_type_{8x8, 16x16}
functions.

Change-Id: Ia213335aae7a7cb027f97b9cc9b04519840250f1
2013-06-25 10:17:19 -07:00
Dmitry Kovalev
70e9622185 Merge "Removing find_seg_id and using vp9_get_pred_mi_segid instead." 2013-06-25 10:16:06 -07:00
Dmitry Kovalev
529679bd52 Merge "Transforming scale_mv_component_q4 into scale_mv_q4 function." 2013-06-25 10:15:33 -07:00
Jingning Han
ab362621fe Add 8x8 dct/adst unit tests
This commit enables 8x8 DCT and hybrid transform unit tests. It
also tunes the forward hybrid transform rounding opertions for
more precise round-trip performance.

Change-Id: If05c1ce59d75d641b9c6c91527d02d3a6ef498c3
2013-06-25 09:57:01 -07:00
Jingning Han
67365520e7 Merge "Use aligned buffer operations in 8x8/16x16 2D-DCT" 2013-06-25 09:49:03 -07:00
Yaowu Xu
b9c934df8e Merge "Enable sse2 implmentation of 8x8 ADST/DCT" 2013-06-25 09:13:22 -07:00
Jingning Han
82d504b50f Use aligned buffer operations in 8x8/16x16 2D-DCT
This reduces 16x16 2D-DCT runtime from 865 cycles to 837 cycles.

Change-Id: I137758b81cd127b936175284310e81378db64552
2013-06-24 19:56:23 -07:00
Jingning Han
a32a086d23 Enable sse2 implmentation of 8x8 ADST/DCT
This commit makes use of the butterfly structure to enable the sse2
version implementation of 8x8 ADST/DCT hybrid transform coding.

The runtime of hybrid transform module goes down from 1170 cycles
to 245 cycles. Overall speed-up around 1.5%.

Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f
2013-06-24 18:41:33 -07:00
Yaowu Xu
e371cd73a3 change to enable use_largest_txform feature
for all regular inter frames at speed 1

Change-Id: I0a8b301273ecf2b8730ab1f6b7a05f89f4d498e0
2013-06-24 16:43:26 -07:00
John Koleszar
4ecd6dbead Move vp9_counts_to_nmv_context to encoder
This function only used from within vp9_encodemv.c.

Change-Id: Ib3fc7c30b1e2d27321397ac474cbc8976bc1f4b1
2013-06-24 15:58:18 -07:00
John Koleszar
08b1798ae7 Move vp9_full_to_model_counts to encoder
This function is not called from the decoder, so it doesn't need to be
in common/.

Change-Id: I6977dd462a25b4ff39c9c7e1b0b5b16aa58ee733
2013-06-24 15:46:15 -07:00
Ronald S. Bultje
4dc70fa7f9 Don't re-allocate comp_pred buffers for each call to comp motion search.
Instead, just allocate a few bytes on the stack, this is 4k, which isn't
all that much.

Change-Id: I82af6ee89e6ed01faaa23ff891ee7ced76df8c16
2013-06-24 14:05:13 -07:00
Dmitry Kovalev
f27f76dfb3 Transforming scale_mv_component_q4 into scale_mv_q4 function.
Using MV instead of int_mv for function arguments.

Change-Id: Ic25e13dccbc98fac1fa1b3255127e00cca2a57f6
2013-06-21 15:34:29 -07:00
Ronald S. Bultje
fc033b38ee Remove emms - that shouldn't be there.
Change-Id: I8fcab81e390f93dc17e9666bbf8f77883b5aa897
2013-06-21 14:45:04 -07:00
Dmitry Kovalev
40141681c0 Removing find_seg_id and using vp9_get_pred_mi_segid instead.
Change-Id: Ia40229903c08f14020e90e94cfdf494aba1be827
2013-06-21 13:05:10 -07:00
Ronald S. Bultje
ba42c02654 Add missing SECTION .text marker in assembly file.
Fixes a crash on Windows when building with MSVC.

Change-Id: I124ac756a1be55d190fadda5fcc46d23b1445dbf
2013-06-21 12:55:46 -07:00
Ronald S. Bultje
54b2a59623 Implement SSE2 block_error.
Change vp9_block_error() to return a 64bit error variable, change all
callers to expect a 64bit return value (this will prevent overflows,
which we basically don't check for at all right now). Remove duplicate
block_error() function, which fixed that through truncation. Remove
old (incompatible) mmx/sse2 block_error SIMD versions and replace with
a new one that returns a 64bit value.

Encoding time of first 50 frames of bus @ 1500kbps goes from 3min29 to
3min23, i.e. a 3% overall speedup.

Change-Id: Ib71ac5508b5ee8a80f1753cd85d72df1629abe68
2013-06-21 12:54:52 -07:00
Ronald S. Bultje
7756e9892b Merge "Add subtract_block SSE2 version and unit test." 2013-06-21 12:49:50 -07:00
Ronald S. Bultje
9a480482cb Merge "SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance()." 2013-06-21 12:49:43 -07:00
Ronald S. Bultje
25c588b1e4 Add subtract_block SSE2 version and unit test.
3% faster overall (3min35.0 to 3min28.5).

Change-Id: I5ff8a5c2c91586b6632ca5009ad1ea51ce94af5e
2013-06-21 09:35:37 -07:00
Yaowu Xu
869d770610 Merge "Get some speed back for cpuused 1" 2013-06-20 22:37:01 -07:00
Yaowu Xu
45e25a7814 Get some speed back for cpuused 1
and remove unused code.

Change-Id: If380440c4450294b5450b7a9eeb94a376846ec01
2013-06-20 19:05:18 -07:00
Yaowu Xu
61721181ec Merge "rename variables to avoid build error in MSVC" 2013-06-20 19:04:30 -07:00
Yaowu Xu
ee07a261a0 rename variables to avoid build error in MSVC
Change-Id: I7960178c95c54d5c4497e44cfc8c493566294b34
2013-06-20 18:31:48 -07:00
Yaowu Xu
e6cd5ed307 Merge "Implement sse2 and ssse3 versions for all sub_pixel_variance sizes." 2013-06-20 17:42:50 -07:00
Ronald S. Bultje
1e6a32f1af SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance().
Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to
3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions
which use a bilinear filter (x_offset & 7 || y_offset & 7) aren't
perfectly interleaved, and can probably be improved further in the
future. I've marked this with a few TODOs/FIXMEs in the code.

Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9
2013-06-20 15:59:48 -07:00
Dmitry Kovalev
8283d893eb Merge "Renaming 'nmv' to 'mv' for several functions." 2013-06-20 10:17:12 -07:00
Deb Mukherjee
7947a33d72 Improving model rd with variance and quant step
Improves the rd modeling function and implements them using interpolation
from a table which is a little faster. Also uses sse as input to the
modeling function rather than var - since there is no dc prediction
used and as a result the sse works a little better.

derfraw300: +0.05%
Speedup: ~1%

Change-Id: I151353c6451e0e8fe3ae18ab9842f8f67e5151ff
2013-06-20 10:06:28 -07:00
Jim Bankoski
9f2a1ae23e adds force partitioning greater than or less than block size
adds a new speed feature to force partitioning to be greater than
or less than a certain size

Change-Id: I8c048eeeef93700ae822eccf98f8751a45b2e7d0
2013-06-20 09:51:42 -07:00
Jim Bankoski
18bdf708e7 adds a set partitioning to speed features
this feature lets you set a partitioning size to be used by the entire
frame.

Change-Id: I208a4c8c701375cbb054418266f677768b6f8f06
2013-06-20 09:50:44 -07:00
Jim Bankoski
476d73d294 partition by variance using var from last frame
This uses variance to split partition. Variance is calculated using
nearest mv,  always from last ref frame.

Change-Id: Idd015b4a9aa3bc82591759eac239680c07496896
2013-06-20 09:48:22 -07:00
Jim Bankoski
1f94b97694 convert all speed things to speed features
Change-Id: Ie24489a4d39f3e53e816eeebf75a1c9c7d94515a
2013-06-20 09:42:44 -07:00
Jim Bankoski
727fa7b1e4 new partition via variance
Change-Id: Ideee45cad8b38087c509cd404484728e85d0c427
2013-06-20 09:42:05 -07:00
Jim Bankoski
0fad6a9d99 fix to set up new speed feature
This uses the speed feature functionality for code.

Change-Id: I9cd16c0c5f98520ae27ebba81aa2c178546587f8
2013-06-20 09:35:02 -07:00
Jim Bankoski
df2314cfdd don't copy partitions for key frames or altrefs
force us to go through slow partitioning for keyframes, altref and
overlays.

Change-Id: I1a286361bf74083e71973575a7296be46eb98742
2013-06-20 09:34:32 -07:00
Ronald S. Bultje
8fb6c58191 Implement sse2 and ssse3 versions for all sub_pixel_variance sizes.
Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 ->
3min58). Specific changes to timings for each function compared to
original assembly-optimized versions (or just new version timings if
no previous assembly-optimized version was available):

sse2   4x4:    99 ->   82 cycles
sse2   4x8:           128 cycles
sse2   8x4:           121 cycles
sse2   8x8:   149 ->  129 cycles
sse2   8x16:  235 ->  245 cycles (?)
sse2  16x8:   269 ->  203 cycles
sse2  16x16:  441 ->  349 cycles
sse2  16x32:          641 cycles
sse2  32x16:          643 cycles
sse2  32x32: 1733 -> 1154 cycles
sse2  32x64:         2247 cycles
sse2  64x32:         2323 cycles
sse2  64x64: 6984 -> 4442 cycles

ssse3  4x4:           100 cycles (?)
ssse3  4x8:           103 cycles
ssse3  8x4:            71 cycles
ssse3  8x8:           147 cycles
ssse3  8x16:          158 cycles
ssse3 16x8:   188 ->  162 cycles
ssse3 16x16:  316 ->  273 cycles
ssse3 16x32:          535 cycles
ssse3 32x16:          564 cycles
ssse3 32x32:          973 cycles
ssse3 32x64:         1930 cycles
ssse3 64x32:         1922 cycles
ssse3 64x64:         3760 cycles

Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
2013-06-20 09:34:25 -07:00