hkuang
f5574fb44c
Merge "Add more sse2 code for intra prediction."
2015-05-08 17:26:30 +00:00
Parag Salasakar
7c5f00f868
mips msa vp9 idct 8x8 optimization
...
average improvement ~4x-6x
Change-Id: I5edf713721b9e24c7e0ce2e69d8fc3ecab625d91
2015-05-08 12:23:27 +05:30
Parag Salasakar
a8a9c2bb45
Merge "mips msa vp9 idct 32x32 optimization"
2015-05-08 04:27:44 +00:00
James Zern
7e55ff1593
build_intra_predictors*: reduce left_col size
...
this should only need to be the size of the largest block, i.e., 32, not
64.
Change-Id: Ib8cb2424771fdd2a64c55379597248b2722a5ceb
2015-05-07 16:16:42 -07:00
James Zern
fd3658b0e4
replace DECLARE_ALIGNED_ARRAY w/DECLARE_ALIGNED
...
this macro was used inconsistently and only differs in behavior from
DECLARE_ALIGNED when an alignment attribute is unavailable. this macro
is used with calls to assembly, while generic c-code doesn't rely on it,
so in a c-only build without an alignment attribute the code will
function as expected.
Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79
2015-05-07 11:55:08 -07:00
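A minimal sketch of the behavior described above, assuming a typical compiler-attribute-based alignment macro. `SKETCH_DECLARE_ALIGNED` and `sketch_buffer_is_aligned` are hypothetical names used for illustration; they are not the project's definitions. The last branch shows the case the message refers to: without an alignment attribute the macro degrades to a plain declaration, which is acceptable for generic C code but not for assembly that assumes aligned loads.

```c
#include <stdint.h>

/* Hypothetical stand-in for an alignment macro; not the project's definition. */
#if defined(__GNUC__)
#define SKETCH_DECLARE_ALIGNED(n, typ, val) typ val __attribute__((aligned(n)))
#elif defined(_MSC_VER)
#define SKETCH_DECLARE_ALIGNED(n, typ, val) __declspec(align(n)) typ val
#else
/* No alignment attribute available: fall back to a plain declaration. */
#define SKETCH_DECLARE_ALIGNED(n, typ, val) typ val
#endif

/* Returns 1 if a local buffer declared through the macro is 16-byte aligned. */
static int sketch_buffer_is_aligned(void) {
  SKETCH_DECLARE_ALIGNED(16, uint8_t, buf[64]);
  buf[0] = 0; /* touch the buffer so the declaration is actually used */
  return ((uintptr_t)buf % 16) == 0;
}
```

With a compiler that supports one of the attributes, `sketch_buffer_is_aligned()` is expected to return 1; in the fallback branch the result depends on where the compiler happens to place the buffer.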
Johann
76a08210b6
Merge "Move shared SAD code to vpx_dsp"
2015-05-07 18:33:06 +00:00
hkuang
086934136b
Merge "Remove an unnecessary check."
2015-05-07 15:51:11 +00:00
Parag Salasakar
1601c1385a
mips msa vp9 idct 32x32 optimization
...
average improvement ~4x-6x
Change-Id: Idaba7e49fbd7f388caee0d73773ccf6e4807ef17
2015-05-07 12:42:23 +05:30
hkuang
7153b822ed
Add more sse2 code for intra prediction.
...
vp9_dc_left_predictor_16x16
vp9_dc_top_predictor_32x32
vp9_dc_left_predictor_32x32
vp9_dc_128_predictor_32x32
Change-Id: Ib9861deefd01c3527235b92ff6b3d571ef6b4bc6
2015-05-06 17:17:00 -07:00
Johann
d5d9289800
Move shared SAD code to vpx_dsp
...
Create a new component, vpx_dsp, for code that can be shared
between codecs. Move the SAD code into the component.
This reduces the size of vpxenc/dec by 36k on x86_64 builds.
Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a
2015-05-06 16:58:20 -07:00
hkuang
240767b29d
Remove an unnecessary check.
...
Change-Id: Id0f224ac4667dd173363b0f05711678448291d4e
2015-05-06 14:15:00 -07:00
hkuang
623e6eed5e
Merge "Optimize the read_partition."
2015-05-06 17:29:52 +00:00
Parag Salasakar
d1cdda88bd
Merge "mips msa vp9 idct 16x16 optimization"
2015-05-06 06:40:56 +00:00
hkuang
4c1a8be29d
Optimize the read_partition.
...
Change-Id: I5a796425ce5706824a2fc17c6f24f983c5b9e43b
2015-05-05 15:51:04 -07:00
James Zern
ccae5d99d2
fix and enable vp9_dc_128_predictor_16x16
...
widen the loads and stores to 128-bit.
this was added, but not enabled in:
493a857 Add some sse2 code for intra prediction.
Change-Id: I277d7db608a7db7d75cc0bde86f48fa66ad487e4
2015-05-05 11:40:13 -07:00
hkuang
e47811ef8f
Merge "Add some sse2 code for intra prediction."
2015-05-05 17:11:07 +00:00
Parag Salasakar
60052b618f
mips msa vp9 idct 16x16 optimization
...
average improvement ~4x-6x
Change-Id: I55e95b7f2ba403dff11813958dc7c73a900dd022
2015-05-05 12:37:06 +05:30
James Zern
670b2c09ce
vp9_idct_intrin_sse2: cosmetics: reindent
...
+ fix some whitespace
Change-Id: Id61b739282014288a7e5d3c17a9d6448d9d4cda2
2015-05-01 16:07:54 -07:00
James Zern
c77b1f5acd
vp9: RECON_AND_STORE4X4: remove dest offset
...
offsetting by a variable stride prevents instruction reordering,
resulting in poor assembly
Change-Id: Id62d6b3299cdd23f8c44f97b630abf4fea241446
2015-04-30 19:14:17 -07:00
James Zern
778845da05
vp9_idct_intrin_*: RECON_AND_STORE: remove dest offset
...
offsetting by a variable stride prevents instruction reordering,
resulting in poor assembly.
additionally reroll 16x16/32x32 loops to reduce register spill with this
new format
Change-Id: I0635b8ba21ecdb88116e927dbdab53acdf256e11
2015-04-30 19:14:17 -07:00
Yaowu Xu
2061359fcf
Merge "Remove vp9_idct16x16_10_add_ssse3()"
2015-04-30 23:13:33 +00:00
hkuang
493a8579f1
Add some sse2 code for intra prediction.
...
Change-Id: I16c0a62e52dab62837c547345df31e7518620ed4
2015-04-30 15:42:57 -07:00
Yaowu Xu
47767609fe
Remove vp9_idct16x16_10_add_ssse3()
...
The rotation computation using 2x cos(pi/16) can potentially overflow
32 bits; this commit disables the function to allow further
investigation and optimization.
Change-Id: I4a9803bc71303d459cb1ec5bbd7c4aaf8968e5cf
2015-04-30 09:07:30 -07:00
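The overflow concern in the message above can be illustrated with a small range check. This is a sketch under stated assumptions, not the removed function: the constant is assumed to be round(2^14 * cos(pi/16)) = 16069, doubled, and the two-term sum is only one plausible shape for a rotation step. `SKETCH_COS_PI_16_X2` and `sketch_rotation_fits_int32` are hypothetical names.

```c
#include <stdint.h>

/* Assumed constant: round(2^14 * cos(pi/16)), doubled. Illustrative only. */
#define SKETCH_COS_PI_16_X2 (2 * 16069)

/* Returns 1 if x*c + y*c fits in a signed 32-bit integer.
 * The sum is evaluated in 64 bits so the check itself cannot overflow. */
static int sketch_rotation_fits_int32(int32_t x, int32_t y) {
  const int64_t sum = (int64_t)x * SKETCH_COS_PI_16_X2 +
                      (int64_t)y * SKETCH_COS_PI_16_X2;
  return sum >= INT32_MIN && sum <= INT32_MAX;
}
```

Under these assumptions, operands within the signed 16-bit range still fit, while intermediate values somewhat larger than that (for example 40000) push the sum past the 32-bit limit.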
Parag Salasakar
95cb130f32
Merge "mips msa vp9 copy and avg convolve optimization"
2015-04-30 04:39:13 +00:00
Yaowu Xu
d45870be8d
Merge "Disable ssse3 version idct16x16_256_add()"
2015-04-30 03:09:23 +00:00
Yaowu Xu
486a73a9ce
Disable ssse3 version idct16x16_256_add()
...
This version currently produces different results from the C version
for some inputs. Disable its use for now to allow time to
investigate the source of the mismatch.
Change-Id: Id039455494ee531db4886a9f1fa4761174ef6df3
2015-04-29 16:58:59 -07:00
Parag Salasakar
2301d10f73
mips msa vp9 copy and avg convolve optimization
...
average improvement ~3x-5x
Change-Id: I422e4c33ea7e6d6783ba40029438ccf21b0e76bb
2015-04-29 12:28:17 +05:30
James Zern
f58011ada5
vpx_mem: remove vpx_memset
...
vestigial. replace instances with memset(), which they were already
defined to.
Change-Id: Ie030cfaaa3e890dd92cf1a995fcb1927ba175201
2015-04-28 20:00:59 -07:00
James Zern
f274c2199b
vpx_mem: remove vpx_memcpy
...
vestigial. replace instances with memcpy(), which they were already
defined to.
Change-Id: Icfd1b0bc5d95b70efab91b9ae777ace1e81d2d7c
2015-04-28 19:59:41 -07:00
Frank Galligan
2be50a1c9c
Merge "WIP: Use LUT for y_dequant/uv_dequant"
2015-04-28 16:12:10 +00:00
Scott LaVarnway
afcb62b414
WIP: Use LUT for y_dequant/uv_dequant
...
instead of calculating every block.
Change-Id: Ib19ff2546be8441f8755ae971ba2910f29412029
2015-04-28 07:52:06 -07:00
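A minimal sketch of the lookup-table idea described above: compute the dequantization step for every quantizer index once, then read it per block instead of recalculating it. The table size, the step formula, and all `sketch_*` names are assumptions made for illustration; the actual change may be structured differently.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_QINDEX_RANGE 256 /* assumed number of quantizer indices */

static int16_t sketch_y_dequant[SKETCH_QINDEX_RANGE];

/* Hypothetical per-index step size; a real codec uses its own tables. */
static int16_t sketch_compute_step(int qindex) {
  return (int16_t)(4 + 2 * qindex);
}

/* Fill the table once, e.g. at decoder initialization. */
static void sketch_init_dequant_lut(void) {
  int q;
  for (q = 0; q < SKETCH_QINDEX_RANGE; ++q) {
    sketch_y_dequant[q] = sketch_compute_step(q);
  }
}

/* Per-block lookup replaces the per-block computation. */
static int16_t sketch_lookup_step(int qindex) {
  assert(qindex >= 0 && qindex < SKETCH_QINDEX_RANGE);
  return sketch_y_dequant[qindex];
}
```

After one call to `sketch_init_dequant_lut()`, each block only pays for an array read in `sketch_lookup_step()`.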
Yunqing Wang
297b2b99de
Fix debugmodes file to print modes and MVs correctly
...
This patch fixes the issues in the debugmodes file caused by the recent
changes to the MODE_INFO struct.
Change-Id: I4df83379ecc887c1f009d4a8329c9809c5b299d6
2015-04-27 17:09:38 -07:00
Parag Salasakar
1c9af9833d
Merge "mips msa vp9 convolve8 horiz optimization"
2015-04-21 22:08:25 -07:00
Johann
931c0a954f
Merge "Rename neon convolve avg file"
2015-04-21 15:45:29 -07:00
Johann
66b9933b8d
Rename neon convolve avg file
...
Some build systems use just the basename for object files.
Change-Id: I333e1107ee866f3906cc46476ef8d04c6200a8a0
2015-04-21 14:18:17 -07:00
Scott LaVarnway
8b17f7f4eb
Revert "Remove mi_grid_* structures."
...
(see I3a05cf1610679fed26e0b2eadd315a9ae91afdd6)
For the test clip used, the decoder performance improved by ~2%.
This is also an intermediate step towards adding back the
mode_info streams.
Change-Id: Idddc4a3f46e4180fbebddc156c4bbf177d5c2e0d
2015-04-21 11:16:45 -07:00
Parag Salasakar
ca90d4fd96
mips msa vp9 convolve8 horiz optimization
...
average improvement ~6x-8x
Change-Id: I7c91eec41aada3b0a5231dda7869b3b968f3ad18
2015-04-21 12:31:26 +05:30
Parag Salasakar
ef51c1ab5b
mips msa vp9 convolve8 hv optimization
...
average improvement ~5x-8x
Change-Id: I3214734cb3716e742907ce0d2d7a042d953df82b
2015-04-21 09:17:49 +05:30
Parag Salasakar
2e36149ccd
Merge "mips msa vp9 convolve8 vert optimization"
2015-04-18 23:39:25 -07:00
Parag Salasakar
27d083c1b9
mips msa vp9 convolve8 vert optimization
...
average improvement ~6x-10x
Change-Id: Ie3f3ab3a9005be84935919701e56b404e420affa
2015-04-18 08:13:04 +05:30
Marco Paniconi
f76ccce5bc
Revert "Revert "Force_split on 16x16 blocks in variance partition.""
...
This reverts commit 004b9d83e37d355f590a6976a27b7b845d19a869
Change-Id: I2f2d0bdb9368c2c07f1d29a69cd461267a3a8743
2015-04-16 17:52:13 -07:00
Johann
14ef4aeafb
Reorganize *_rtcd() calling conventions
...
Change-Id: Ib1e17d8aae9b713b87f560ab5e49952ee2bfdcc2
2015-04-15 11:12:05 -04:00
Yunqing Wang
004b9d83e3
Revert "Force_split on 16x16 blocks in variance partition."
...
This reverts commit eb8c667570aa83134c7db0690de9dbdde4d90291.
The patch caused a mismatch when using multiple threads.
Change-Id: Icd646340af25b5d91e32f03ed3ea212e00e3e0be
2015-04-14 15:19:31 -07:00
Marco
eb8c667570
Force_split on 16x16 blocks in variance partition.
...
Force split of a 16x16 block (to 8x8) based on the minmax over the 8x8 sub-blocks.
Also increase the variance threshold for 32x32, and add an exit condition in choose_partition
(with a very safe threshold) based on the sad used to select the reference frame.
Some visual improvement near moving boundaries.
Average gain in psnr/ssim: ~0.6%; some clips go up ~1 or 2%.
Encoding time increases by ~1-4% (due to more 8x8 blocks), depending on clip.
Change-Id: I4759bb181251ac41517cd45e326ce2997dadb577
2015-04-13 12:05:07 -07:00
Parag Salasakar
2f693be8f8
Merge "mips msa vp9 common headers added"
2015-04-09 21:50:15 -07:00
Jingning Han
93d9c50419
Merge "SSSE3 assembly implementation of 8x8 Hadamard transform"
2015-04-09 11:16:11 -07:00
Parag Salasakar
481fb7640c
mips msa vp9 common headers added
...
Change-Id: Ia31ada59172eb1818e1eb91009f83cbb1f581223
2015-04-09 15:35:12 +05:30
Jingning Han
7f629dfca4
SSSE3 assembly implementation of 8x8 Hadamard transform
...
It uses about 10% fewer CPU cycles than the SSE2 intrinsic
implementation.
Change-Id: I91017c0c068679a214b98cdd4cff3a6facfb7499
2015-04-04 09:59:37 -07:00
James Zern
44e3640923
Merge "vp9: enable sse4 sad functions"
2015-04-03 14:57:52 -07:00
James Zern
b644384bb5
Merge "vp9: fix high-bitdepth NEON build"
2015-04-01 23:36:17 -07:00