Commit Graph

208 Commits

Author SHA1 Message Date
Scott LaVarnway
e7fc39fdf5 Merge "VPX: x86 asm version of vpx_idct32x32_34_add()" 2015-11-20 15:11:00 +00:00
Alex Converse
6aa2163b69 bitreader/writer: Change shift to signed
Silences several legal but suspicious unsigned overflows found with
clang -fsanitize=integer.

Change-Id: I69399751492a183167932b0a10751c433c32ca7b
2015-11-19 15:13:39 -08:00
Alex Converse
42b7c44b2f Fix a signed shift overflow in vpx_rb_read_inv_signed_literal.
Found with clang -fsanitize=integer

Change-Id: I17cb2166c06ff463abfaf9b0e6bc749d0d6fdf94
2015-11-19 15:04:20 -08:00
Jian Zhou
d76032ae87 Speed up h_predictor_4x4
Modify h_predictor_4x4 with XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.

Change-Id: Id01c34c48e75b9d56dfc2e93af12cf0c0326a279
2015-11-19 11:34:22 -08:00
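For context on what the commit above speeds up: a horizontal (H) intra predictor fills each row of the block with that row's left-neighbour pixel. The scalar sketch below only illustrates that idea. The name `h_predictor_4x4_sketch` and its signature are assumptions made for this example, not the libvpx API, and the commit itself replaces scalar logic of this kind with XMM-register code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference: every pixel in row r takes the value
 * left[r]. dst points at the top-left pixel of the 4x4 block and stride
 * is the distance in bytes between consecutive rows. */
static void h_predictor_4x4_sketch(uint8_t *dst, ptrdiff_t stride,
                                   const uint8_t *left) {
  for (int r = 0; r < 4; ++r) {
    memset(dst + r * stride, left[r], 4);
  }
}
```

Because each row is a single byte broadcast four times, the operation maps naturally onto SIMD shuffles, which is presumably where the reported gain comes from.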
Jian Zhou
79b68626ae Speed up tm_predictor_4x4
tm_predictor_4x4 is implemented with SSE2 using XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.

Change-Id: I25074b78d476a2cb17f81cf654bdfd80df2070e0
2015-11-18 16:44:25 -08:00
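For context on the commit above: a TM (TrueMotion) predictor computes each pixel as left + above - top-left, clamped to the 8-bit range. The sketch below is a scalar illustration under stated assumptions. The function names are hypothetical, and the top-left pixel is passed as an explicit argument here for clarity rather than read from the `above` array.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the valid 8-bit pixel range. */
static uint8_t clip_pixel_sketch(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar reference:
 * pred(r, c) = clip(left[r] + above[c] - top_left). */
static void tm_predictor_4x4_sketch(uint8_t *dst, ptrdiff_t stride,
                                    const uint8_t *above,
                                    const uint8_t *left, uint8_t top_left) {
  for (int r = 0; r < 4; ++r) {
    for (int c = 0; c < 4; ++c) {
      dst[r * stride + c] =
          clip_pixel_sketch(left[r] + above[c] - top_left);
    }
  }
}
```

The subtraction and clamp work on values wider than 8 bits, so a SIMD version has to widen the pixels before the arithmetic and pack them back afterwards.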
Scott LaVarnway
ed833048c2 VPX: x86 asm version of vpx_idct32x32_34_add()
Change-Id: Ic81f38998fb1b8d33f5a5d7424c2c41002786cef
2015-11-17 17:42:24 -08:00
James Zern
0ccad4d649 Revert "VPX: x86 asm version of vpx_idct32x32_34_add()"
This reverts commit 9aeaa2016e.

This causes some test vectors to fail.

Change-Id: I3659a2068404ec5a0591fba5c88b1bec0c9059a4
2015-11-11 11:12:38 -08:00
James Zern
e3efed7f4c Merge "convolve_copy_sse2: replace SSE w/SSE2 code" 2015-11-10 22:35:12 +00:00
Scott LaVarnway
f48321974b Merge "VPX: x86 asm version of vpx_idct32x32_34_add()" 2015-11-10 21:40:11 +00:00
Scott LaVarnway
9aeaa2016e VPX: x86 asm version of vpx_idct32x32_34_add()
Change-Id: I8a933c63b7fbf3c65e2c06dbdca9646cadd0b7cb
2015-11-10 11:54:56 -08:00
James Zern
40dab58941 convolve_copy_sse2: replace SSE w/SSE2 code
this should be neutral or slightly faster on modern (P4+) architectures

Change-Id: Iec4c080275941eb8c9e05a66a2daf0405d86a69b
2015-11-09 23:45:16 -08:00
Debargha Mukherjee
65dd056e41 Merge "Optimize vpx_quantize_{b,b_32x32} assembler." 2015-10-26 18:04:49 +00:00
Ronald S. Bultje
53dc9fd0a0 vp10: merge ext_ipred_bltr experiment into misc_fixes.
Change-Id: I2f2deb700748408b8278b7f5c29ee1f2e39785ec
2015-10-21 22:27:34 -04:00
Geza Lore
9cfba09ac0 Optimize vpx_quantize_{b,b_32x32} assembler.
Added optimization of the 8 bit assembly quantizer routines. This makes
these functions up to 100% faster, depending on encoding parameters.

This patch makes the encoder faster in both the high bitdepth and 8bit
configurations. In the high bitdepth configuration, it affects profile 0
only.

Based on my profiling using 1080p input the net gain is between 1-3% for
the 8 bit config, and around 2.5-4.5% for the high bitdepth config,
depending on target bitrate. The difference between the 8 bit and high
bitdepth configurations for the same encoder run is reduced by 1% in all
cases I have profiled.

Change-Id: I86714a6b7364da20cd468cd784247009663a5140
2015-10-20 10:11:19 +01:00
Ronald S. Bultje
c7dc1d78bf vp10: add extended-intra prediction edges experiment.
This experiment allows using full above/right edges for all transform
sizes whenever available (for d45/d63), and adds bottom/left edges for
d207.

See issue 1043.

Change-Id: I5cf7f345e783e8539bb6b6d2c9972fb1d6d0a78b
2015-10-16 19:30:39 -04:00
Johann
ec623a0bb7 Upstream Mozilla fix for older Apple clang builds
Also use the _mm_broadcastsi128_si256 intrinsic for
Apple clang versions 4.[012]

https://bugzilla.mozilla.org/show_bug.cgi?id=1085607
https://code.google.com/p/webm/issues/detail?id=1082

Change-Id: I6bc821d8163387194ef663e94bfed91fa7281d88
2015-10-14 07:41:23 -07:00
hui su
6f31722950 Fix compiler warnings
Change-Id: I761256a8100d83abf1b937f3739580237e3fad2a
2015-10-13 10:33:17 -07:00
Alex Converse
0c00af126d Add vpx_highbd_convolve_{copy,avg}_sse2
single-threaded:
swanky (silvermont): ~1% faster overall
peppy (celeron,haswell): ~1.5% faster overall

Change-Id: Ib74f014374c63c9eaf2d38191cbd8e2edcc52073
2015-10-09 11:50:25 -07:00
Geza Lore
cbada4a982 Remove 4 mova insts from quantize_ssse3_x86_64.asm
Change-Id: If3cb9345b44162e600e6c74873e0cb4c207fc7fb
2015-10-09 07:52:04 -07:00
Julia Robson
37c68efee2 SSSE3 optimisation for quantize in high bit depth
When configured with high bit depth enabled, the 8bit quantize
function stopped using optimised code. This made 8bit content
decode slowly. This commit re-enables the SSSE3 optimisations.

Change-Id: I194b505dd3f4c494e5c5e53e020f5d94534b16b5
2015-10-06 13:32:02 +01:00
Scott LaVarnway
b212094839 Merge "VPX: refactor vpx_idct32x32_1_add_sse2()" 2015-10-06 11:35:15 +00:00
Julia Robson
5e6533e707 SSE2 optimisation for quantize in high bit depth
When configured with high bit depth enabled, the 8bit quantize
function stopped using optimised code. This made 8bit content
decode slowly. This commit re-enables the SSE2 optimisation
(but not the SSSE3 optimisation).

Change-Id: Id015fe3c1c44580a4bff3f4bd985170f2806a9d9
2015-10-05 10:59:16 -07:00
Scott LaVarnway
23d1c06268 VPX: refactor vpx_idct32x32_1_add_sse2()
Change-Id: Ia1a2cac0e9dc05f3207b3433a6c1589fa7f2aee3
2015-10-05 06:33:42 -07:00
Ronald S. Bultje
3fedf4a59b Merge "vp10: reimplement d45/4x4 to match vp8 instead of vp9." 2015-10-02 17:15:59 +00:00
Debargha Mukherjee
cb5c47f20d Merge "Accelerated transform in high bit depth" 2015-10-02 06:55:55 +00:00
Ronald S. Bultje
62a1579525 vp10: reimplement d45/4x4 to match vp8 instead of vp9.
This is more a proof of concept than anything else. The problem here
isn't so much how to code it, but rather where to place the resulting
code. All intrapred DSP code lives in vpx_dsp, so do we want the vp10
specific intra pred functions to live there, or in vp10/?

See issue 1015.

Change-Id: I675f7badcc8e18fd99a9553910ecf3ddf81f0a05
2015-10-01 10:11:54 -04:00
Ronald S. Bultje
c26a9ecaa2 vp8: change build_intra4x4_predictors() to use vpx_dsp.
I've added a few new functions (d45e, d63e, he, ve) to cover the
filtered h/v 4x4 predictors that are vp8-specific, the "correct"
d45 with the correctly filtered bottom-right pixel (as opposed to
the unfiltered version in vp9), and the "broken" d63 with weirdly
filtered bottom-right pixels (which is correctly filtered in vp9).

There may be a minor performance impact on all systems because we
have to do an extra copy of the Above pixel array to incorporate
the topleft pixel in the same array (thus fitting the vpx_dsp API).
In addition, armv6 will have a more serious performance impact b/c
I removed the armv6/vp8-specific assembly. I'm not sure anyone
cares...

Change-Id: I7f9e5ebee11d8e21aca2cd517a69eefc181b2e86
2015-09-30 18:45:49 -04:00
Ronald S. Bultje
54d48955f6 vp8: change build_intra_predictors_mby_s to use vpx_dsp.
Change-Id: I2000820e0c04de2c975d370a0cf7145330289bb2
2015-09-30 18:45:40 -04:00
Julia Robson
406030d1b0 Accelerated transform in high bit depth
When configured with high bitdepth enabled, the 8bit transform
stopped using optimised code. This made 8bit content decode slowly.

Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea
2015-09-28 21:09:16 -07:00
Johann
dd4f953350 Remove vpx_filter_block1d16_v8_intrin_ssse3
This was rewritten and moved to vpx_dsp/x86/vpx_subpixel_8t_ssse3.asm
in 195883023b

Change-Id: I117ce983dae12006e302679ba7f175573dd9e874
2015-09-18 16:05:43 -07:00
James Zern
683b5a3161 vpx_subpixel_8t_ssse3: fix reg counts/access
fixes build on windows x64; previously 'heightq' i.e., the 64-bit register
was accessed when only the 32-bit value was needed. given this is from a
stack variable the upper bits were undefined.

+ bump register/xmm counts; users of SETUP_LOCAL_VARS touch xmm13 in
64-bit builds and filter_block1d16_v* uses one extra temp variable

Change-Id: I9c768c0b2047481d1d3b11c2e16b2f8de6eb0d80
2015-09-17 12:27:34 -07:00
Ronald S. Bultje
a3df343cda vp10: code sign bit before absolute value in non-arithcoded header.
For reading, this makes the operation branchless, although it still
requires two shifts. For writing, this makes the operation as fast
as writing an unsigned value, branchlessly. This is also how other
codecs typically code signed, non-arithcoded bitstream elements.

See issue 1039.

Change-Id: I6a8182cc88a16842fb431688c38f6b52d7f24ead
2015-09-16 19:35:03 -04:00
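One way to get the properties described in the commit above (a write that costs the same as an unsigned write, and a branchless read that needs two shifts) is to store the value in a (bits + 1)-bit field whose leading bit is the sign, and to sign-extend on read by shifting left and then right. The sketch below assumes that approach. It is not taken from the commit, and `pack_signed_sketch` and `unpack_signed_sketch` are hypothetical names operating on a plain integer rather than on a real bit buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Writer side: keep the low (bits + 1) bits of value. No branch on the
 * sign is needed, so this costs the same as writing an unsigned field. */
static uint32_t pack_signed_sketch(int32_t value, int bits) {
  assert(bits >= 1 && bits <= 30);
  const uint32_t mask = (UINT32_C(1) << (bits + 1)) - 1;
  return (uint32_t)value & mask;
}

/* Reader side: move the field's leading (sign) bit to bit 31, then shift
 * back down so the sign is replicated. The left shift is done on an
 * unsigned value to avoid signed overflow. The conversion to int32_t and
 * the right shift of a negative value are implementation-defined in C,
 * but behave as a two's-complement arithmetic shift on common compilers. */
static int32_t unpack_signed_sketch(uint32_t field, int bits) {
  assert(bits >= 1 && bits <= 30);
  const int nbits = 32 - (bits + 1);
  const uint32_t shifted = field << nbits;
  return (int32_t)shifted >> nbits;
}
```

For example, with `bits = 4` the value -3 packs to the 5-bit field 11101, and unpacking that field returns -3.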
Debargha Mukherjee
1c8567ff09 Remove some trailing whitespaces
Change-Id: Icf06d35ca347713253d1eba341a894b51efa81a9
2015-09-08 01:31:04 -07:00
Scott LaVarnway
195883023b VPX: subpixel_8t_ssse3 asm using x86inc
This is based on the original patch optimized for 32bit
platforms by Tamar/Ilya and now uses the x86inc style asm.
The assembly was also modified to support 64bit platforms.

Change-Id: Ice12f249bbbc162a7427e3d23fbf0cbe4135aff2
2015-09-03 20:35:51 -07:00
Johann
c5f11912ae Include vpx_dsp_common.h when using VPXMIN/MAX
Change-Id: I2e387a06484a06301f3cd6600c4ba2f4335b61ee
2015-08-31 14:36:35 -07:00
Angie Chiang
45db71d0ac Expand the idct4_c() function in idct8_c()
Change-Id: I5afa3c351ba7c5e7deb3889f7471619ac60af255
2015-08-28 10:53:11 -07:00
Johann Koenig
5c245a46d8 Merge changes I53b5bdc5,Ib81168a7,Ie0113945
* changes:
  Only build ssse3 filter functions on 64 bit
  Clean up unused function warnings in vp8 encoder
  Clean up unused function warnings in vp8 onyx_if.c
2015-08-27 20:58:53 +00:00
Johann Koenig
18ea2a7e0c Merge "Add sse2 versions of halfpix variance" 2015-08-27 20:56:32 +00:00
Johann
a28b2c6ff0 Add sse2 versions of halfpix variance
These were lost in the great sub pixel variance move of
6a82f0d7fb

Not having these functions caused a ~10% performance regression in
some realtime vp8 encodes.

Change-Id: I50658483d9198391806b27899f2c0d309233c4b5
2015-08-27 11:58:38 -07:00
James Zern
5e16d397bd vpx_dsp_common: add VPX prefix to MIN/MAX
prevents redeclaration warnings;
vp8 has its own define which will be resolved in a future commit

Change-Id: Ic941fef3dd4262fcdce48b73075fe6b375f11c9c
2015-08-26 20:11:32 -07:00
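As an illustration of the rename in the commit above: giving the macros a project prefix avoids clashing with `MIN`/`MAX` definitions that system headers or other modules may already provide. The definitions below are a conventional sketch of such macros and are assumed rather than copied from vpx_dsp_common.h. `clamp_int_sketch` is a hypothetical helper added only to show usage.

```c
/* Prefixed so they cannot collide with a MIN/MAX defined elsewhere.
 * Note that each argument may be evaluated twice, so avoid passing
 * expressions with side effects. */
#define VPXMIN(x, y) (((x) < (y)) ? (x) : (y))
#define VPXMAX(x, y) (((x) > (y)) ? (x) : (y))

/* Hypothetical usage: clamp v to the inclusive range [lo, hi]. */
static int clamp_int_sketch(int v, int lo, int hi) {
  return VPXMAX(lo, VPXMIN(v, hi));
}
```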
Johann
f5507b514c Only build ssse3 filter functions on 64 bit
Avoid an unused function warning by only building the functions when
they will be used.

Change-Id: I53b5bdc5a180c79d63b34e4c8921d679bbc54009
2015-08-26 10:32:18 -07:00
Scott LaVarnway
6c0f6dd817 Merge "VPX: scaled convolve : fix windows build errors" 2015-08-21 12:06:34 +00:00
Scott LaVarnway
acf24cc1b8 VPX: scaled convolve : fix windows build errors
Change-Id: Ic81d435ea928183197040cdf64b6afd7dbaf57e4
2015-08-20 13:09:27 -07:00
Scott LaVarnway
6a21ca20cc Merge "VPX ssse3 scaled convolve" 2015-08-19 22:12:21 +00:00
Jingning Han
b1339751b9 Merge "Rename inv_txfm_sse2.asm to inv_wht_sse2.asm" 2015-08-19 18:26:30 +00:00
Jingning Han
49f6ff1103 Rename inv_txfm_sse2.asm to inv_wht_sse2.asm
Change-Id: I43bcc70680503e4c18d8f021097307778cf9ea70
2015-08-19 10:29:53 -07:00
Scott LaVarnway
2030c49cf8 VPX ssse3 scaled convolve
Change-Id: I71d5994e21813554a927d35ebcc26bf7a68984fd
2015-08-18 15:13:02 -07:00
Jingning Han
5de049b067 Turn on dspr2 loop filter functions in vpx_dsp
Add the dspr2 files to vpx_dsp.mk and enable these functions in
vpx_dsp_rtcd_defs.pl file.

Change-Id: I79feb5af24f174f4a0788dc6f3b6df7f4e1fa467
2015-08-17 16:15:24 -07:00
James Zern
1794624c18 Merge changes I2fe52bfb,I5e5084eb
* changes:
  VPX: removed filter == 128 checks from mips convolve code
  VPX: removed step checks from mips convolve code
2015-08-14 19:45:27 +00:00
James Zern
78629508f2 Merge "VPX: removed step checks from neon convolve code" 2015-08-14 19:23:46 +00:00
Yaowu Xu
94ba3939cd vpx_highbd_ssim_parms_8x8: make parameter types consistent
Change-Id: Ie1fe6603232adc22dbe4d51bd1008c856a6d40ca
2015-08-14 09:18:07 -07:00
Scott LaVarnway
89dcc13939 VPX: removed filter == 128 checks from mips convolve code
The check is handled by the predictor table.

Change-Id: I2fe52bfbbfccb2edd13ba250986e3a4b4b589459
2015-08-13 12:57:01 -07:00
Scott LaVarnway
aeea00cc4f VPX: removed step checks from mips convolve code
The check is handled by the predictor table.

Change-Id: I5e5084ebb46be8087c8c9d80b5f76e919a1cd05b
2015-08-13 11:27:04 -07:00
Scott LaVarnway
fa47212933 VPX: removed step checks from neon convolve code
The check is handled by the predictor table.

Change-Id: I42479f843e77a2d40cdcdfc9e2e6c48a05a36561
2015-08-12 16:46:53 -07:00
Scott LaVarnway
6cf95bd1e7 Merge "VPX: remove step == 16 and filter[3] != 128 checks" 2015-08-12 20:13:33 +00:00
James Zern
345b11cd73 Merge "fix build w/only mmx+sse enabled" 2015-08-12 02:26:08 +00:00
Jingning Han
3ee6db6c81 Fork VP9 and VP10 codebase
This commit forks the VP9 and VP10 codebase and makes libvpx
support VP8, VP9, and VP10.

Change-Id: I81782e0b809acb3c9844bee8c8ec8f4d5e8fa356
2015-08-11 17:05:28 -07:00
James Zern
23532eb7b6 fix build w/only mmx+sse enabled
many _sse2.asm have sse implementations as well

Change-Id: Idfa1f5cab593e4913aaad37f7223e8430188c44a
2015-08-11 15:52:43 -07:00
Scott LaVarnway
b04dad328c Merge "VPX: remove scaled calls from FUN_CONV_1D" 2015-08-11 21:46:50 +00:00
Scott LaVarnway
4ef08dcec8 Merge "VPX: Add rtcd support for scaling." 2015-08-11 13:19:00 +00:00
Aℓex Converse
b152472ba7 Merge "Move vp9_systemdependent.h to vpx_ports bitops.h and system_state.h" 2015-08-11 01:18:39 +00:00
Alex Converse
a8a08ce57e Move vp9_systemdependent.h to vpx_ports bitops.h and system_state.h
Use system_state.h in vpx_dsp and remove unneeded includes of
vp9_systemdependent.h.

Change-Id: I92557ec6dd5aa790160b4f31fe7967db0d7ec3c4
2015-08-10 15:37:14 -07:00
James Zern
9265bad906 Merge changes from topic 'x86inc'
* changes:
  Only use .text sections for aout
  Use newer x86inc.asm
  Use .text instead of .rodata on macho
  Copy PIC handling code from x86_abi_support
  Set 'private_extern' visibility for macho targets
  Avoid 'amdnop' when building with nasm
  Catch all elf formats
  Expand PIC default to macho64 and respect CONFIG_PIC from libvpx
  Use libvpx defines to set name mangling rules
  Customize x86inc.asm for libvpx
2015-08-10 21:20:38 +00:00
Scott LaVarnway
a229dbc1f0 VPX: remove step == 16 and filter[3] != 128 checks
from FUN_CONV_1D and FUN_CONV_2D macros.  The functions
will not be called with these inputs.

Change-Id: I67ec75e4edafc0acee70190521a80ea85dfa521b
2015-08-10 13:44:32 -07:00
Alex Converse
4ea7f2be43 fastssim: Add some missing consts
Change-Id: Id36f180032c8a92c686da6f716a7468332b23b94
2015-08-10 09:48:25 -07:00
Johann
41a0a0cb35 Use newer x86inc.asm
Rename updated version of x86inc.asm

Use "private_prefix" instead of "program_name" and make vpx the default
prefix.

Change-Id: I4883a99b2aee8e5dc9f2c16a2e6f4b5d6e4de458
2015-08-07 16:44:44 -07:00
Alex Converse
26f4f2dc8e ssim: Add missing statics and consts
Change-Id: I2aa2a545bd2f8f170c66c2e267ea9d617ff10d87
2015-08-07 12:01:19 -07:00
Alex Converse
c1f911a2ea psnrhvs: Add missing consts and static consts.
Change-Id: I63932edaef4c4d4d0a57e6f7d3e4aa42651a5c47
2015-08-07 12:01:14 -07:00
Alex Converse
c65e79d2e5 ssim: Replace unsigned long with uint32_t.
The assembly only writes the low 4 bytes, and the HBD version only uses
uint32_t bytes.

Change-Id: Ie3694ecda511c231e55870df814cbae30e588073
2015-08-07 11:48:31 -07:00
Alex Converse
17cfee3cb5 fastssim: Add stdlib.h for malloc/free
Change-Id: I4d734febc14c534dba20b67cf6bd628996cc9ab7
2015-08-07 11:20:05 -07:00
Alex Converse
c7b7011b9b Move VP9 SSIM metrics to vpx_dsp.
Change-Id: I20c7b42631b579fade6cf7ebf6d4c69b2fcb5e5e
2015-08-06 18:25:25 -07:00
Aℓex Converse
7ac505c726 Merge "Narrow a load in iwht4x4_16_add." 2015-08-06 22:21:16 +00:00
Alex Converse
0572052725 Narrow a load in iwht4x4_16_add.
The top half is unused.

Change-Id: I29b2f6a93e20ea43aff4ad0bd2d52257e1e752b6
2015-08-05 12:16:12 -07:00
Scott LaVarnway
4e6b5079c6 VPX: remove scaled calls from FUN_CONV_1D
and FUN_CONV_2D macros.  The predict lut now handles
this case.  The encoder now calls vpx_scaled_2d() instead
of vpx_convolve8() for scaling.

Change-Id: Ia1c8af8a31e4cb4887a587143108cb45835f7df7
2015-08-05 10:47:06 -07:00
James Zern
afd2f68dae Revert "VP9_COPY_CONVOLVE_SSE2 optimization"
This reverts commit a5e97d874b.

Additionally:
Revert "vpx_convolve_copy_sse2: fix win64"

This reverts commit 22a8474fe7.

This change performs poorly on various x86_64 devices, reducing
performance by 1-3% at 1080p. Performance on Chromebook-like devices was
mixed, neutral to slightly negative, so there should be minimal change
there.

Change-Id: I95831233b4b84ee96369baa192a2d4cc7639658c
2015-08-04 17:57:01 -07:00
Jingning Han
d621de7e8d Change vp9_quantize to vpx_quantize
This commit removes all remaining uses of the vp9_ prefix in vpx_dsp. It gets
the vp9 folder ready to branch out vp10.

Change-Id: I2906eec179ee792b4af8c9b4161313653050e931
2015-08-04 15:31:49 -07:00
Jingning Han
3ad75fc623 Merge "Replace vp9_ prefix with vpx_ prefix in vpx_dsp function names" 2015-08-04 22:30:36 +00:00
Jingning Han
08a453b9de Replace vp9_ prefix with vpx_ prefix in vpx_dsp function names
This commit clears the function naming convention in vpx_dsp. It
replaces vp9_ prefix of global functions with vpx_ prefix. It also
removes the vp9_ prefix from static functions.

Change-Id: I6394359a63b71a51dda01342eec6a3cc08dfeedf
2015-08-04 13:46:11 -07:00
Jingning Han
5f138986fc Exclude inv_txfm dspr2 files from make file when highbd is on
Add a guard to exclude dspr2 inverse transform files from vpx_dsp
make file, when high bit-depth is turned on. This fixes the jenkins
nightly build.

Change-Id: Ibacd86563af1ec4810c550905b3fa0397baeeafc
2015-08-04 09:47:31 -07:00
Parag Salasakar
814e1346a6 Merge "mips msa vpx convolve optimization" 2015-08-04 04:30:22 +00:00
Parag Salasakar
cc4c5de22f Merge "mips msa vpx subpel variance optimization" 2015-08-04 04:30:11 +00:00
Jingning Han
bfad9d2fe6 Move inverse transform dspr2 functions from vp9 to vpx_dsp
Change-Id: Ia9cf7c31cab4ba3dd6b9bb668c4b3e84bd55cf69
2015-08-03 11:59:50 -07:00
Jingning Han
92b08f516a Add common_dspr2.c file to vpx_dsp/mips
Move the declaration of commonly referenced variable to
vpx_dsp/mips/common_dspr2.c.

Change-Id: Ia51287b02e2ac5cfae0fba98c721f0810618f28e
2015-08-03 10:53:47 -07:00
Jingning Han
a68356202d Remove vpx_ prefix from the dspr2 file name in vpx_dsp/mips
Make it consistent with other formats.

Change-Id: I28f0d05ff7c5bf2b815989b3f1bd6c6b25608677
2015-08-03 09:59:14 -07:00
Scott LaVarnway
8f6b943100 VPX: Add rtcd support for scaling.
Change-Id: If34bfb0d918967445aea7dc30cd7b55ebfedb1f2
2015-08-03 09:43:34 -07:00
Jingning Han
d10fc5af8f Merge "Add vpx_dsp_rtcd.h to inv_txfm_sse2.c" 2015-08-03 16:03:09 +00:00
Jingning Han
b096db5ad4 Merge "Remove vp9_common.h from idct16x16_neon.c" 2015-08-03 16:03:02 +00:00
Parag Salasakar
1579bb88c5 mips msa vpx convolve optimization
Removed redundant clip/saturate code from 2tap filter functions
average improvement 10%-40%

Change-Id: I1dafb5f7d2ce7a021d883d8af30fb93cd9ace173
2015-08-03 14:03:40 +05:30
Parag Salasakar
9b375871db mips msa vpx subpel variance optimization
Removed redundant clip/saturate code from 2tap filter functions
average improvement 20%-40%

Change-Id: I362540b0c7d5d3d69932c39d61b7d2a44da533d2
2015-08-03 13:00:55 +05:30
Jingning Han
da7dc59837 Merge "Factor out mips/msa inverse transform implementations" 2015-08-03 03:18:39 +00:00
Jingning Han
0fcfc613c6 Merge "Add x86inc flag guard to inv_txfm_sse2.asm" 2015-08-02 21:56:09 +00:00
Jingning Han
6eabf229e2 Remove vp9_common.h from idct16x16_neon.c
Change-Id: I3df35a99900ef8ce549d315866849a10db1a4c7b
2015-08-02 09:57:25 -07:00
Jingning Han
4f7a7d29fa Add x86inc flag guard to inv_txfm_sse2.asm
Fix the VS build failure.

Change-Id: I4fb9d1c83980c4b52d5a848a9cb02ec72493dccb
2015-08-02 08:43:51 -07:00
Jingning Han
80ae856c8b Add vpx_dsp_rtcd.h to inv_txfm_sse2.c
Change-Id: Ibab434fb4bd6da02dba087582ed74811f555c3ed
2015-08-02 08:25:13 -07:00
James Zern
22a8474fe7 vpx_convolve_copy_sse2: fix win64
xmm6-7 need to be stored

Change-Id: I6c51559598d335946ec91be6246b49589c63b724
2015-08-01 11:45:49 -07:00
Jingning Han
44849516d4 Factor out mips/msa inverse transform implementations
Move mips/msa inverse transform implementations from vp9 folder to
vpx_dsp.

Change-Id: Ic4cf3f05247c3c63db7b532a0e5000017a962391
2015-08-01 09:25:12 -07:00
Jingning Han
b4c7d0523a Merge "Factor inverse transform functions into vpx_dsp" 2015-08-01 16:20:24 +00:00
Jingning Han
e8b133c79c Factor inverse transform functions into vpx_dsp
This commit moves the module inverse transform functions from vp9
to vpx_dsp folder. The hybrid transform wrapper functions stay in
the vp9 folder, since it involves codec-specific data structures.

Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
2015-07-31 16:21:00 -07:00
Scott LaVarnway
a5e97d874b VP9_COPY_CONVOLVE_SSE2 optimization
This function suffers from a couple of problems on small cores (tablets):
- The load of the next iteration is blocked by the store of the previous iteration
- 4k aliasing (between future stores and older loads)
- Current small-core machines are in-order, so the store will spin the rehabQ until the load is finished
Fixed by:
- prefetching 2 lines ahead
- unrolling the copy of 2 rows of the block
- pre-loading all xmm registers before the loop, with final stores after the loop
The function is optimized by:
copy_convolve_sse2 64x64 - 16%
copy_convolve_sse2 32x32 - 52%
copy_convolve_sse2 16x16 - 6%
copy_convolve_sse2 8x8 - 2.5%
copy_convolve_sse2 4x4 - 2.7%
Credit goes to Tom Craver (tom.r.craver@intel.com) and Ilya Albrekht (ilya.albrekht@intel.com)

Change-Id: I63d3428799c50b2bf7b5677c8268bacb9fc29671
2015-07-31 14:51:51 -07:00
Zoe Liu
7cfdc00337 Refactor mips/dspr2 on convolution.
Change-Id: If59a39d5a92c261537342726f94bb7f7f26dfff3
2015-07-31 10:27:42 -07:00