Dmitry Kovalev
87ee34aacb
Removing unused code.
...
Removing block index (ib) parameter from get_tx_type_{8x8, 16x16}
functions.
Change-Id: Ia213335aae7a7cb027f97b9cc9b04519840250f1
2013-06-25 10:17:19 -07:00
Dmitry Kovalev
70e9622185
Merge "Removing find_seg_id and using vp9_get_pred_mi_segid instead."
2013-06-25 10:16:06 -07:00
Dmitry Kovalev
529679bd52
Merge "Transforming scale_mv_component_q4 into scale_mv_q4 function."
2013-06-25 10:15:33 -07:00
Jingning Han
ab362621fe
Add 8x8 dct/adst unit tests
...
This commit enables 8x8 DCT and hybrid transform unit tests. It
also tunes the forward hybrid transform rounding opertions for
more precise round-trip performance.
Change-Id: If05c1ce59d75d641b9c6c91527d02d3a6ef498c3
2013-06-25 09:57:01 -07:00
Jingning Han
67365520e7
Merge "Use aligned buffer operations in 8x8/16x16 2D-DCT"
2013-06-25 09:49:03 -07:00
Scott LaVarnway
c787f40bc4
Small mode_info_context cleanup in filter_block_plane
...
Unnecessary updates to xd->mode_info_context.
Change-Id: I36d2d68ca48366f727548526726b1b5437f62968
2013-06-25 12:28:50 -04:00
Yaowu Xu
b9c934df8e
Merge "Enable sse2 implmentation of 8x8 ADST/DCT"
2013-06-25 09:13:22 -07:00
Yaowu Xu
ca976db44d
Merge "change to enable use_largest_txform feature"
2013-06-25 09:07:01 -07:00
Jingning Han
82d504b50f
Use aligned buffer operations in 8x8/16x16 2D-DCT
...
This reduces 16x16 2D-DCT runtime from 865 cycles to 837 cycles.
Change-Id: I137758b81cd127b936175284310e81378db64552
2013-06-24 19:56:23 -07:00
Jingning Han
a32a086d23
Enable sse2 implmentation of 8x8 ADST/DCT
...
This commit makes use of the butterfly structure to enable the sse2
version implementation of 8x8 ADST/DCT hybrid transform coding.
The runtime of hybrid transform module goes down from 1170 cycles
to 245 cycles. Overall speed-up around 1.5%.
Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f
2013-06-24 18:41:33 -07:00
Yaowu Xu
e371cd73a3
change to enable use_largest_txform feature
...
for all regular inter frames at speed 1
Change-Id: I0a8b301273ecf2b8730ab1f6b7a05f89f4d498e0
2013-06-24 16:43:26 -07:00
John Koleszar
4ecd6dbead
Move vp9_counts_to_nmv_context to encoder
...
This function only used from within vp9_encodemv.c.
Change-Id: Ib3fc7c30b1e2d27321397ac474cbc8976bc1f4b1
2013-06-24 15:58:18 -07:00
John Koleszar
08b1798ae7
Move vp9_full_to_model_counts to encoder
...
This function is not called from the decoder, so it doesn't need to be
in common/.
Change-Id: I6977dd462a25b4ff39c9c7e1b0b5b16aa58ee733
2013-06-24 15:46:15 -07:00
John Koleszar
ece724ae16
Merge "Remove unused vp9_build_intra_predictors_sb{y,uv}_s"
2013-06-24 15:08:58 -07:00
John Koleszar
ee4a7e4e46
Merge "Remove unused vp9_model_to_full_probs_sb()"
2013-06-24 15:08:54 -07:00
Scott LaVarnway
dfa2ecc3f1
Changed size of mb_mode_context to 8 bits
...
This reduced the size of the MODE_INFO array (mip and prev_mip)
by 425,568 bytes each for 1080p resolutions.
Change-Id: Ifa513ec2d0a49e8ec0867ec90620762fb7f1261d
2013-06-24 17:11:16 -04:00
Ronald S. Bultje
4dc70fa7f9
Don't re-allocate comp_pred buffers for each call to comp motion search.
...
Instead, just allocate a few bytes on the stack, this is 4k, which isn't
all that much.
Change-Id: I82af6ee89e6ed01faaa23ff891ee7ced76df8c16
2013-06-24 14:05:13 -07:00
John Koleszar
858475a03a
Fix loopfilter of leftmost 4x4 edges in SB
...
For cases where there's no transform set in bit 0 (the left edge of
the SB) but bit 0 of mask_4x4_int is set (the edge 4 pixels from the
left edge needs filtering), it was incorrectly being skipped before.
This situation only happens on the leftmost edge of the image, as
the edge at column 0 is intentionally skipped since there aren't
pixels to the left to read.
Change-Id: Ib2fbbcb40166e90af31b1a0e13b85b68c226cbd3
2013-06-24 08:26:00 -07:00
John Koleszar
9e7019f7df
Remove unused vp9_build_intra_predictors_sb{y,uv}_s
...
The functions no longer referenced.
Change-Id: If2705dfbc607f79ec8ec2242d5e03bec27a35aaf
2013-06-21 16:10:05 -07:00
John Koleszar
5c32215e27
Remove unused vp9_model_to_full_probs_sb()
...
This function never referenced.
Change-Id: I1c42cd355bfa88e17d169f7335a44be682af58cc
2013-06-21 15:38:55 -07:00
Dmitry Kovalev
f27f76dfb3
Transforming scale_mv_component_q4 into scale_mv_q4 function.
...
Using MV instead of int_mv for function arguments.
Change-Id: Ic25e13dccbc98fac1fa1b3255127e00cca2a57f6
2013-06-21 15:34:29 -07:00
Ronald S. Bultje
fc033b38ee
Remove emms - that shouldn't be there.
...
Change-Id: I8fcab81e390f93dc17e9666bbf8f77883b5aa897
2013-06-21 14:45:04 -07:00
Dmitry Kovalev
40141681c0
Removing find_seg_id and using vp9_get_pred_mi_segid instead.
...
Change-Id: Ia40229903c08f14020e90e94cfdf494aba1be827
2013-06-21 13:05:10 -07:00
Ronald S. Bultje
ba42c02654
Add missing SECTION .text marker in assembly file.
...
Fixes a crash on Windows when building with MSVC.
Change-Id: I124ac756a1be55d190fadda5fcc46d23b1445dbf
2013-06-21 12:55:46 -07:00
Ronald S. Bultje
54b2a59623
Implement SSE2 block_error.
...
Change vp9_block_error() to return a 64bit error variable, change all
callers to expect a 64bit return value (this will prevent overflows,
which we basically don't check for at all right now). Remove duplicate
block_error() function, which fixed that through truncation. Remove
old (incompatible) mmx/sse2 block_error SIMD versions and replace with
a new one that returns a 64bit value.
Encoding time of first 50 frames of bus @ 1500kbps goes from 3min29 to
3min23, i.e. a 3% overall speedup.
Change-Id: Ib71ac5508b5ee8a80f1753cd85d72df1629abe68
2013-06-21 12:54:52 -07:00
Ronald S. Bultje
7756e9892b
Merge "Add subtract_block SSE2 version and unit test."
2013-06-21 12:49:50 -07:00
Ronald S. Bultje
9a480482cb
Merge "SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance()."
2013-06-21 12:49:43 -07:00
Ronald S. Bultje
25c588b1e4
Add subtract_block SSE2 version and unit test.
...
3% faster overall (3min35.0 to 3min28.5).
Change-Id: I5ff8a5c2c91586b6632ca5009ad1ea51ce94af5e
2013-06-21 09:35:37 -07:00
Yaowu Xu
869d770610
Merge "Get some speed back for cpuused 1"
2013-06-20 22:37:01 -07:00
Yaowu Xu
45e25a7814
Get some speed back for cpuused 1
...
and remove unused code.
Change-Id: If380440c4450294b5450b7a9eeb94a376846ec01
2013-06-20 19:05:18 -07:00
Yaowu Xu
61721181ec
Merge "rename variables to avoid build error in MSVC"
2013-06-20 19:04:30 -07:00
Yaowu Xu
ee07a261a0
rename variables to avoid build error in MSVC
...
Change-Id: I7960178c95c54d5c4497e44cfc8c493566294b34
2013-06-20 18:31:48 -07:00
Yaowu Xu
e6cd5ed307
Merge "Implement sse2 and ssse3 versions for all sub_pixel_variance sizes."
2013-06-20 17:42:50 -07:00
Ronald S. Bultje
1e6a32f1af
SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance().
...
Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to
3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions
which use a bilinear filter (x_offset & 7 || y_offset & 7) aren't
perfectly interleaved, and can probably be improved further in the
future. I've marked this with a few TODOs/FIXMEs in the code.
Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9
2013-06-20 15:59:48 -07:00
Frank Galligan
c259af4f73
Fix win64 warning.
...
- size_t vs int.
Change-Id: Ib47ebd932a4b69db9f52a43000bb69d0a96b9134
2013-06-20 14:07:11 -07:00
Dmitry Kovalev
8283d893eb
Merge "Renaming 'nmv' to 'mv' for several functions."
2013-06-20 10:17:12 -07:00
Dmitry Kovalev
77186ee61a
Merge "Function decomposition inside vp9_decodemv.c file."
2013-06-20 10:17:05 -07:00
Deb Mukherjee
7947a33d72
Improving model rd with variance and quant step
...
Improves the rd modeling function and implements them using interpolation
from a table which is a little faster. Also uses sse as input to the
modeling function rather than var - since there is no dc prediction
used and as a result the sse works a little better.
derfraw300: +0.05%
Speedup: ~1%
Change-Id: I151353c6451e0e8fe3ae18ab9842f8f67e5151ff
2013-06-20 10:06:28 -07:00
Jim Bankoski
9f2a1ae23e
adds force partitioning greater than or less than block size
...
adds a new speed feature to force partitioning to be greater than
or less than a certain size
Change-Id: I8c048eeeef93700ae822eccf98f8751a45b2e7d0
2013-06-20 09:51:42 -07:00
Jim Bankoski
18bdf708e7
adds a set partitioning to speed features
...
this feature lets you set a partitioning size to be used by the entire
frame.
Change-Id: I208a4c8c701375cbb054418266f677768b6f8f06
2013-06-20 09:50:44 -07:00
Jim Bankoski
476d73d294
partition by variance using var from last frame
...
This uses variance to split partition. Variance is calculated using
nearest mv, always from last ref frame.
Change-Id: Idd015b4a9aa3bc82591759eac239680c07496896
2013-06-20 09:48:22 -07:00
Jim Bankoski
1f94b97694
convert all speed things to speed features
...
Change-Id: Ie24489a4d39f3e53e816eeebf75a1c9c7d94515a
2013-06-20 09:42:44 -07:00
Jim Bankoski
727fa7b1e4
new partition via variance
...
Change-Id: Ideee45cad8b38087c509cd404484728e85d0c427
2013-06-20 09:42:05 -07:00
Jim Bankoski
0fad6a9d99
fix to set up new speed feature
...
This uses the speed feature functionality for code.
Change-Id: I9cd16c0c5f98520ae27ebba81aa2c178546587f8
2013-06-20 09:35:02 -07:00
Jim Bankoski
df2314cfdd
don't copy partitions for key frames or altrefs
...
force us to go through slow partitioning for keyframes, altref and
overlays.
Change-Id: I1a286361bf74083e71973575a7296be46eb98742
2013-06-20 09:34:32 -07:00
Ronald S. Bultje
8fb6c58191
Implement sse2 and ssse3 versions for all sub_pixel_variance sizes.
...
Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 ->
3min58). Specific changes to timings for each function compared to
original assembly-optimized versions (or just new version timings if
no previous assembly-optimized version was available):
sse2 4x4: 99 -> 82 cycles
sse2 4x8: 128 cycles
sse2 8x4: 121 cycles
sse2 8x8: 149 -> 129 cycles
sse2 8x16: 235 -> 245 cycles (?)
sse2 16x8: 269 -> 203 cycles
sse2 16x16: 441 -> 349 cycles
sse2 16x32: 641 cycles
sse2 32x16: 643 cycles
sse2 32x32: 1733 -> 1154 cycles
sse2 32x64: 2247 cycles
sse2 64x32: 2323 cycles
sse2 64x64: 6984 -> 4442 cycles
ssse3 4x4: 100 cycles (?)
ssse3 4x8: 103 cycles
ssse3 8x4: 71 cycles
ssse3 8x8: 147 cycles
ssse3 8x16: 158 cycles
ssse3 16x8: 188 -> 162 cycles
ssse3 16x16: 316 -> 273 cycles
ssse3 16x32: 535 cycles
ssse3 32x16: 564 cycles
ssse3 32x32: 973 cycles
ssse3 32x64: 1930 cycles
ssse3 64x32: 1922 cycles
ssse3 64x64: 3760 cycles
Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
2013-06-20 09:34:25 -07:00
Jim Bankoski
f954490bbf
disable speed > 1 speed corrections in firstpass
...
need to rework these
Change-Id: I17dc2c88d2faadd2f8fb117c52c25f04ea2e9856
2013-06-20 09:34:03 -07:00
Jim Bankoski
2c6bdbbc78
new debug modes code
...
The new print out includes skips and has prefixed sections so you can
grep to find things like transforms chosen on each frame.
Change-Id: I195043424647d9514cfc3ff6720a5b20d010fa1b
2013-06-20 09:33:11 -07:00
Jim Bankoski
fbcce4dd6f
Merge "copy partitioning from last fame"
2013-06-20 09:32:43 -07:00
Jim Bankoski
f033b44e74
copy partitioning from last fame
...
Change-Id: I26e80ede80cb4389378a95afa95d229092a9859a
2013-06-20 09:32:19 -07:00