Ronald S. Bultje
1d0ae2e63c
Merge "Don't skip right/bottom border pixels in SSIM calculations."
2013-06-25 13:51:04 -07:00
Ronald S. Bultje
c5be54eef3
Merge "Add averaging-SAD functions for 8-point comp-inter motion search."
2013-06-25 13:50:53 -07:00
Jingning Han
d52c359d43
Merge "Tune the rounding operations in 8x8 ADST/DCT sse2"
2013-06-25 13:17:05 -07:00
Ronald S. Bultje
450c7b57a8
Only do metrics on cropped (visible) area of picture.
...
The part where we align it by 8 or 16 is an implementation detail that
shouldn't matter to the outside world.
Change-Id: I9edd6f08b51b31c839c0ea91f767640bccb08d53
2013-06-25 12:57:28 -07:00
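The idea in 450c7b57a8 (metrics on the visible area only) can be illustrated with a minimal sketch. The function name `sse_visible` and its parameters are hypothetical and not taken from the libvpx source; the sketch only assumes that the buffers are padded to an aligned stride while the metric loops stop at the visible width and height.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sum of squared errors over the visible picture only.
 * The strides may include padding added by 8- or 16-pixel alignment; the
 * loops stop at visible_w x visible_h, so padding never enters the sum. */
uint64_t sse_visible(const uint8_t *a, int a_stride,
                     const uint8_t *b, int b_stride,
                     int visible_w, int visible_h) {
  uint64_t sse = 0;
  assert(visible_w <= a_stride && visible_w <= b_stride);
  for (int y = 0; y < visible_h; y++) {
    for (int x = 0; x < visible_w; x++) {
      const int diff = a[x] - b[x];
      sse += (uint64_t)(diff * diff);
    }
    a += a_stride;  /* advance by the padded row length */
    b += b_stride;
  }
  return sse;
}
```

Passing the aligned width instead of the visible width would fold padding pixels into the result, which is the behavior the commit message argues should not be visible to the outside world.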
Ronald S. Bultje
44f349df62
Don't skip right/bottom border pixels in SSIM calculations.
...
Change-Id: I75acb55ade54bef6ad7703ed5e691581fa2f8fe1
2013-06-25 12:57:28 -07:00
Ronald S. Bultje
c24d922396
Add averaging-SAD functions for 8-point comp-inter motion search.
...
Makes first 50 frames of bus @ 1500kbps encode from 3min22.7 to 3min18.2,
i.e. 2.3% faster. In addition, use the sub_pixel_avg functions to calc
the variance of the averaging predictor. This is slightly suboptimal
because the function is subpixel-position-aware, but it will (at least
for the SSE2 version) not actually use a bilinear filter for a full-pixel
position, thus leading to approximately the same performance compared to
if we implemented an actual average-aware full-pixel variance function.
That gains another 0.3 seconds (i.e. encode time goes to 3min17.4), thus
leading to a total gain of 2.7%.
Change-Id: I3f059d2b04243921868cfed2568d4fa65d7b5acd
2013-06-25 12:57:28 -07:00
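As a rough scalar illustration of what an averaging SAD in c24d922396 computes: the source block is compared against the average of the reference block and a second predictor. The name `sad_avg_ref`, the rounding (`+ 1` before the shift), and the assumption that the second predictor is stored contiguously with a stride equal to the block width are all assumptions of this sketch, not confirmed by the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference for an averaging SAD.
 * Assumption: second_pred is packed, i.e. its stride equals width. */
unsigned int sad_avg_ref(const uint8_t *src, int src_stride,
                         const uint8_t *ref, int ref_stride,
                         const uint8_t *second_pred,
                         int width, int height) {
  unsigned int sad = 0;
  assert(width > 0 && height > 0);
  for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
      /* Rounded average of the two predictors (assumed rounding). */
      const int avg = (ref[x] + second_pred[x] + 1) >> 1;
      sad += (unsigned int)abs(src[x] - avg);
    }
    src += src_stride;
    ref += ref_stride;
    second_pred += width;
  }
  return sad;
}
```

Folding the average into the SAD loop avoids building the averaged predictor in a separate pass for every candidate position, which is plausibly where the reported speedup comes from.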
Jingning Han
0084e61d5f
Tune the rounding operations in 8x8 ADST/DCT sse2
...
Improve the round-trip precision to meet the unit test settings.
Change-Id: I303febae56b4b990ea3798b8ebed94c0510ecf79
2013-06-25 12:02:26 -07:00
Ronald S. Bultje
5ebe47747d
Merge "Don't re-allocate comp_pred buffers for each call to comp motion search."
2013-06-25 12:00:36 -07:00
Dmitry Kovalev
9467571777
Moving subexp encoding functions in separate vp9_dsubexp.c file.
...
Change-Id: Idbb2ea80f764fa830fe2ddcfc54ef7fe232f05a8
2013-06-25 11:53:17 -07:00
Dmitry Kovalev
5ae096778e
Merge "Removing unused code."
2013-06-25 11:50:55 -07:00
Jingning Han
cd6932db77
Merge "Add 8x8 dct/adst unit tests"
2013-06-25 11:21:17 -07:00
Dmitry Kovalev
87ee34aacb
Removing unused code.
...
Removing block index (ib) parameter from get_tx_type_{8x8, 16x16}
functions.
Change-Id: Ia213335aae7a7cb027f97b9cc9b04519840250f1
2013-06-25 10:17:19 -07:00
Dmitry Kovalev
70e9622185
Merge "Removing find_seg_id and using vp9_get_pred_mi_segid instead."
2013-06-25 10:16:06 -07:00
Dmitry Kovalev
529679bd52
Merge "Transforming scale_mv_component_q4 into scale_mv_q4 function."
2013-06-25 10:15:33 -07:00
Jingning Han
ab362621fe
Add 8x8 dct/adst unit tests
...
This commit enables 8x8 DCT and hybrid transform unit tests. It
also tunes the forward hybrid transform rounding operations for
more precise round-trip performance.
Change-Id: If05c1ce59d75d641b9c6c91527d02d3a6ef498c3
2013-06-25 09:57:01 -07:00
Jingning Han
67365520e7
Merge "Use aligned buffer operations in 8x8/16x16 2D-DCT"
2013-06-25 09:49:03 -07:00
Yaowu Xu
b9c934df8e
Merge "Enable sse2 implementation of 8x8 ADST/DCT"
2013-06-25 09:13:22 -07:00
Jingning Han
82d504b50f
Use aligned buffer operations in 8x8/16x16 2D-DCT
...
This reduces 16x16 2D-DCT runtime from 865 cycles to 837 cycles.
Change-Id: I137758b81cd127b936175284310e81378db64552
2013-06-24 19:56:23 -07:00
Jingning Han
a32a086d23
Enable sse2 implementation of 8x8 ADST/DCT
...
This commit makes use of the butterfly structure to enable the sse2
version implementation of 8x8 ADST/DCT hybrid transform coding.
The runtime of hybrid transform module goes down from 1170 cycles
to 245 cycles. Overall speed-up around 1.5%.
Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f
2013-06-24 18:41:33 -07:00
Yaowu Xu
e371cd73a3
change to enable use_largest_txform feature
...
for all regular inter frames at speed 1
Change-Id: I0a8b301273ecf2b8730ab1f6b7a05f89f4d498e0
2013-06-24 16:43:26 -07:00
John Koleszar
4ecd6dbead
Move vp9_counts_to_nmv_context to encoder
...
This function is only used from within vp9_encodemv.c.
Change-Id: Ib3fc7c30b1e2d27321397ac474cbc8976bc1f4b1
2013-06-24 15:58:18 -07:00
John Koleszar
08b1798ae7
Move vp9_full_to_model_counts to encoder
...
This function is not called from the decoder, so it doesn't need to be
in common/.
Change-Id: I6977dd462a25b4ff39c9c7e1b0b5b16aa58ee733
2013-06-24 15:46:15 -07:00
Ronald S. Bultje
4dc70fa7f9
Don't re-allocate comp_pred buffers for each call to comp motion search.
...
Instead, just allocate the buffer on the stack. This is 4k, which isn't
all that much.
Change-Id: I82af6ee89e6ed01faaa23ff891ee7ced76df8c16
2013-06-24 14:05:13 -07:00
Dmitry Kovalev
f27f76dfb3
Transforming scale_mv_component_q4 into scale_mv_q4 function.
...
Using MV instead of int_mv for function arguments.
Change-Id: Ic25e13dccbc98fac1fa1b3255127e00cca2a57f6
2013-06-21 15:34:29 -07:00
Ronald S. Bultje
fc033b38ee
Remove emms - that shouldn't be there.
...
Change-Id: I8fcab81e390f93dc17e9666bbf8f77883b5aa897
2013-06-21 14:45:04 -07:00
Dmitry Kovalev
40141681c0
Removing find_seg_id and using vp9_get_pred_mi_segid instead.
...
Change-Id: Ia40229903c08f14020e90e94cfdf494aba1be827
2013-06-21 13:05:10 -07:00
Ronald S. Bultje
ba42c02654
Add missing SECTION .text marker in assembly file.
...
Fixes a crash on Windows when building with MSVC.
Change-Id: I124ac756a1be55d190fadda5fcc46d23b1445dbf
2013-06-21 12:55:46 -07:00
Ronald S. Bultje
54b2a59623
Implement SSE2 block_error.
...
Change vp9_block_error() to return a 64bit error variable, change all
callers to expect a 64bit return value (this will prevent overflows,
which we basically don't check for at all right now). Remove duplicate
block_error() function, which fixed that through truncation. Remove
old (incompatible) mmx/sse2 block_error SIMD versions and replace with
a new one that returns a 64bit value.
Encoding time of first 50 frames of bus @ 1500kbps goes from 3min29 to
3min23, i.e. a 3% overall speedup.
Change-Id: Ib71ac5508b5ee8a80f1753cd85d72df1629abe68
2013-06-21 12:54:52 -07:00
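The overflow concern in 54b2a59623 can be shown with a small scalar sketch. The name `block_error_ref` and its signature are hypothetical and do not reproduce the actual `vp9_block_error()` prototype; the point is only that each squared difference is widened to 64 bits before accumulation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference: sum of squared differences between the
 * original and dequantized coefficients, accumulated in 64 bits. */
int64_t block_error_ref(const int16_t *coeff, const int16_t *dqcoeff, int n) {
  int64_t error = 0;
  assert(n >= 0);
  for (int i = 0; i < n; i++) {
    const int diff = coeff[i] - dqcoeff[i];
    /* Widen before multiplying: diff * diff can exceed 32 bits. */
    error += (int64_t)diff * diff;
  }
  return error;
}
```

With 16-bit inputs, a single difference can reach 65535, whose square already exceeds the range of a signed 32-bit integer, so a 32-bit return value cannot represent the worst case even for one coefficient.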
Ronald S. Bultje
7756e9892b
Merge "Add subtract_block SSE2 version and unit test."
2013-06-21 12:49:50 -07:00
Ronald S. Bultje
9a480482cb
Merge "SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance()."
2013-06-21 12:49:43 -07:00
Ronald S. Bultje
25c588b1e4
Add subtract_block SSE2 version and unit test.
...
3% faster overall (3min35.0 to 3min28.5).
Change-Id: I5ff8a5c2c91586b6632ca5009ad1ea51ce94af5e
2013-06-21 09:35:37 -07:00
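For reference, the operation that 25c588b1e4 accelerates is a per-pixel residual computation. The following is a minimal scalar sketch; the name `subtract_block_ref` and the argument order are assumptions made for illustration and may differ from the libvpx function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference: diff = src - pred for a rows x cols block.
 * The residual needs a signed 16-bit type because it spans [-255, 255]. */
void subtract_block_ref(int rows, int cols,
                        int16_t *diff, int diff_stride,
                        const uint8_t *src, int src_stride,
                        const uint8_t *pred, int pred_stride) {
  assert(rows >= 0 && cols >= 0);
  for (int r = 0; r < rows; r++) {
    for (int c = 0; c < cols; c++) {
      diff[c] = (int16_t)(src[c] - pred[c]);
    }
    diff += diff_stride;
    src += src_stride;
    pred += pred_stride;
  }
}
```

A scalar version like this is also the natural baseline for the unit test the commit mentions: the SIMD output can be compared against it element by element.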
Yaowu Xu
869d770610
Merge "Get some speed back for cpuused 1"
2013-06-20 22:37:01 -07:00
Yaowu Xu
45e25a7814
Get some speed back for cpuused 1
...
and remove unused code.
Change-Id: If380440c4450294b5450b7a9eeb94a376846ec01
2013-06-20 19:05:18 -07:00
Yaowu Xu
61721181ec
Merge "rename variables to avoid build error in MSVC"
2013-06-20 19:04:30 -07:00
Yaowu Xu
ee07a261a0
rename variables to avoid build error in MSVC
...
Change-Id: I7960178c95c54d5c4497e44cfc8c493566294b34
2013-06-20 18:31:48 -07:00
Yaowu Xu
e6cd5ed307
Merge "Implement sse2 and ssse3 versions for all sub_pixel_variance sizes."
2013-06-20 17:42:50 -07:00
Ronald S. Bultje
1e6a32f1af
SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance().
...
Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to
3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions
which use a bilinear filter (x_offset & 7 || y_offset & 7) aren't
perfectly interleaved, and can probably be improved further in the
future. I've marked this with a few TODOs/FIXMEs in the code.
Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9
2013-06-20 15:59:48 -07:00
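A minimal scalar sketch of the full-pixel case of 1e6a32f1af, i.e. when both x_offset and y_offset are zero and no bilinear filter applies. The bilinear filtering step is deliberately omitted. The name `avg_variance_fullpel_ref`, the rounded average, and the packed layout of the second predictor are assumptions of this sketch, not details confirmed by the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference: variance of (avg(ref, second_pred) - src)
 * at a full-pixel position. Assumption: second_pred has stride == width. */
unsigned int avg_variance_fullpel_ref(const uint8_t *src, int src_stride,
                                      const uint8_t *ref, int ref_stride,
                                      const uint8_t *second_pred,
                                      int width, int height,
                                      unsigned int *sse_out) {
  int64_t sum = 0;
  uint64_t sse = 0;
  assert(width > 0 && height > 0);
  for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
      const int avg = (ref[x] + second_pred[x] + 1) >> 1;
      const int diff = avg - src[x];
      sum += diff;
      sse += (uint64_t)(diff * diff);
    }
    src += src_stride;
    ref += ref_stride;
    second_pred += width;
  }
  *sse_out = (unsigned int)sse;
  /* variance = SSE - (sum^2 / N), using integer division. */
  return (unsigned int)(sse - (uint64_t)((sum * sum) / (width * height)));
}
```

The sub-pixel variants would first interpolate the reference with a bilinear filter and then run the same averaging and accumulation.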
Dmitry Kovalev
8283d893eb
Merge "Renaming 'nmv' to 'mv' for several functions."
2013-06-20 10:17:12 -07:00
Deb Mukherjee
7947a33d72
Improving model rd with variance and quant step
...
Improves the rd modeling function and implements them using interpolation
from a table which is a little faster. Also uses sse as input to the
modeling function rather than var - since there is no dc prediction
used and as a result the sse works a little better.
derfraw300: +0.05%
Speedup: ~1%
Change-Id: I151353c6451e0e8fe3ae18ab9842f8f67e5151ff
2013-06-20 10:06:28 -07:00
Jim Bankoski
9f2a1ae23e
adds force partitioning greater than or less than block size
...
adds a new speed feature to force partitioning to be greater than
or less than a certain size
Change-Id: I8c048eeeef93700ae822eccf98f8751a45b2e7d0
2013-06-20 09:51:42 -07:00
Jim Bankoski
18bdf708e7
adds a set partitioning to speed features
...
this feature lets you set a partitioning size to be used by the entire
frame.
Change-Id: I208a4c8c701375cbb054418266f677768b6f8f06
2013-06-20 09:50:44 -07:00
Jim Bankoski
476d73d294
partition by variance using var from last frame
...
This uses variance to split partition. Variance is calculated using
nearest mv, always from last ref frame.
Change-Id: Idd015b4a9aa3bc82591759eac239680c07496896
2013-06-20 09:48:22 -07:00
Jim Bankoski
1f94b97694
convert all speed things to speed features
...
Change-Id: Ie24489a4d39f3e53e816eeebf75a1c9c7d94515a
2013-06-20 09:42:44 -07:00
Jim Bankoski
727fa7b1e4
new partition via variance
...
Change-Id: Ideee45cad8b38087c509cd404484728e85d0c427
2013-06-20 09:42:05 -07:00
Jim Bankoski
0fad6a9d99
fix to set up new speed feature
...
This uses the speed feature functionality for code.
Change-Id: I9cd16c0c5f98520ae27ebba81aa2c178546587f8
2013-06-20 09:35:02 -07:00
Jim Bankoski
df2314cfdd
don't copy partitions for key frames or altrefs
...
force us to go through slow partitioning for keyframes, altref and
overlays.
Change-Id: I1a286361bf74083e71973575a7296be46eb98742
2013-06-20 09:34:32 -07:00
Ronald S. Bultje
8fb6c58191
Implement sse2 and ssse3 versions for all sub_pixel_variance sizes.
...
Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 ->
3min58). Specific changes to timings for each function compared to
original assembly-optimized versions (or just new version timings if
no previous assembly-optimized version was available):
sse2 4x4: 99 -> 82 cycles
sse2 4x8: 128 cycles
sse2 8x4: 121 cycles
sse2 8x8: 149 -> 129 cycles
sse2 8x16: 235 -> 245 cycles (?)
sse2 16x8: 269 -> 203 cycles
sse2 16x16: 441 -> 349 cycles
sse2 16x32: 641 cycles
sse2 32x16: 643 cycles
sse2 32x32: 1733 -> 1154 cycles
sse2 32x64: 2247 cycles
sse2 64x32: 2323 cycles
sse2 64x64: 6984 -> 4442 cycles
ssse3 4x4: 100 cycles (?)
ssse3 4x8: 103 cycles
ssse3 8x4: 71 cycles
ssse3 8x8: 147 cycles
ssse3 8x16: 158 cycles
ssse3 16x8: 188 -> 162 cycles
ssse3 16x16: 316 -> 273 cycles
ssse3 16x32: 535 cycles
ssse3 32x16: 564 cycles
ssse3 32x32: 973 cycles
ssse3 32x64: 1930 cycles
ssse3 64x32: 1922 cycles
ssse3 64x64: 3760 cycles
Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
2013-06-20 09:34:25 -07:00
Jim Bankoski
f954490bbf
disable speed > 1 speed corrections in firstpass
...
need to rework these
Change-Id: I17dc2c88d2faadd2f8fb117c52c25f04ea2e9856
2013-06-20 09:34:03 -07:00
Jim Bankoski
fbcce4dd6f
Merge "copy partitioning from last frame"
2013-06-20 09:32:43 -07:00
Jim Bankoski
f033b44e74
copy partitioning from last frame
...
Change-Id: I26e80ede80cb4389378a95afa95d229092a9859a
2013-06-20 09:32:19 -07:00