Commit Graph

283 Commits

Author SHA1 Message Date
JackyChen
b29612fbbe vp9: Refactor vp9_denoiser_NxM_sse2.
Denoiser is ~1.5% faster in speed 6~8.

Change-Id: I7b350f3c50cce6773d9c4eded4c0c1b722d0a5fc
2016-05-02 13:19:40 -07:00
JackyChen
1a53c0c9e1 vp9: Simplify the logic in denoiser SSE2 code.
Block size passed into denoiser filter is always >= BLOCK_8X8 (in
vp9_pick_inter_mode), it is not necessary to check smaller block
size. Passed the bitexact test on clips with different resolutions and
noise levels.

Change-Id: I19fa3195d18c27d9e5de60dc11cff1522ef3714e
2016-04-26 09:04:39 -07:00
Alex Converse
fac947df77 Restore previous motion search bit-error scale.
The bit to error transformation got doubled as a result of going from
8-bit to 9-bit costs (change d13385c).

Use defines to derive the scale numbers and comment some of the fields.

derf: -0.023 BDRATE
hevcmr: +0.067 BDRATE
stdhd: +0.098 BDRATE
(These are substantially smaller than than the original gains from 8 to
9 bit costing.)

Change-Id: I6a2b3b029b2f1415e4f90a05709b2333ec0eea9b
2016-02-09 13:20:25 -08:00
James Zern
ecd32d6faa Merge "Vidyo patch: Optimization for 1-to-2 downsampling and upsampling." 2016-02-05 02:36:03 +00:00
Scott LaVarnway
989c69303d Vidyo patch: Optimization for 1-to-2 downsampling and upsampling.
Change-Id: I9cc9780f506e025aea57485a9e21f0835faf173c
2016-02-04 14:50:26 -08:00
James Zern
497b6744ea Merge changes from topic 'dead-code'
* changes:
  yv12config.c: remove dead CONFIG_ALPHA code
  vp9_encoder.c: remove unused macro
  vpx_mem.c: remove unused macro
  vp9_svc_layercontext.c: relocate a macro definition
  vp9_encoder.c: protect SNPRINT* macros w/CONFIG check
  vp9_aq_360.c: remove some unused macros
  vp9_diamond_search_sad_avx.c: rename (un)likely macros
  vp9_resize.c: add missing include
  vp9_aq_complexity.c: remove unused macros
  vp9_detokenize.c: remove unused macros
2016-02-04 06:25:44 +00:00
James Zern
8647792975 vp9_diamond_search_sad_avx.c: rename (un)likely macros
avoid using '__' this is a reserved namespace for the compiler

Change-Id: I7d2be4dba2bdddc6f1010a16ad9e59a2e211b064
2016-02-02 18:01:49 -08:00
hui su
1c9b0918b3 Fix some interger overflow errors
Change-Id: I7e44bd952f28ce9925e8bdf6ee8ca2bb13de1b49
2016-02-02 17:32:15 -08:00
James Zern
d36659cec7 move vp9_avg to vpx_dsp
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
2015-12-14 14:42:12 -08:00
James Zern
60760f710f fix vp9_satd_sse2
accumulate satd in 32-bits
+ add unit test

Change-Id: I6748183df3662ddb9d635f9641f9586f2fd38ad5
2015-11-20 14:35:46 -08:00
James Zern
3e0138edb7 vp9_satd: return an int
the final sum may use up to 26 bits

+ add a unit test
+ disable the sse2 as the result will rollover; this will be fixed in a
future commit

Change-Id: I2a49811dfaa06abfd9fa1e1e65ed7cd68e4c97ce
2015-11-20 14:35:38 -08:00
Geza Lore
5eefd3ebfd Add AVX vectorized vp9_diamond_search_sad
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
  - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
  - For all i: mvsadcost[0][i] == mvsadcost[1][i]
        (equal per component cost)
  - For all i: mvsadcost[0][i] == mvsadcost[0][-i]
        (Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.

Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
2015-11-11 14:03:47 +00:00
James Zern
30466f26b4 Revert "Add AVX vectorized vp9_diamond_search_sad"
This reverts commit f1342a7b07.

This breaks 32-bit builds:
 runtime error: load of misaligned address 0xf72fdd48 for type 'const
__m128i' (vector of 2 'long long' values), which requires 16 byte
alignment

+ _mm_set1_epi64x is incompatible with some versions of visual studio

Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
2015-11-06 13:15:01 -08:00
Geza Lore
f1342a7b07 Add AVX vectorized vp9_diamond_search_sad
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
  - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
  - For all i: mvsadcost[0][i] == mvsadcost[1][i]
        (equal per component cost)
  - For all i: mvsadcost[0][i] == mvsadcost[0][-i]
        (Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.

Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
2015-11-05 10:02:17 +00:00
Geza Lore
aa8f85223b Optimize vp9_highbd_block_error_8bit assembly.
A new version of vp9_highbd_error_8bit is now available which is
optimized with AVX assembly. AVX itself does not buy us too much, but
the non-destructive 3 operand format encoding of the 128bit SSEn integer
instructions helps to eliminate move instructions. The Sandy Bridge
micro-architecture cannot eliminate move instructions in the processor
front end, so AVX will help on these machines.

Further 2 optimizations are applied:

1. The common case of computing block error on 4x4 blocks is optimized
as a special case.
2. All arithmetic is speculatively done on 32 bits only. At the end of
the loop, the code detects if overflow might have happened and if so,
the whole computation is re-executed using higher precision arithmetic.
This case however is extremely rare in real use, so we can achieve a
large net gain here.

The optimizations rely on the fact that the coefficients are in the
range [-(2^15-1), 2^15-1], and that the quantized coefficients always
have the same sign as the input coefficients (in the worst case they are
0). These are the same assumptions that the old SSE2 assembly code for
the non high bitdepth configuration relied on. The unit tests have been
updated to take this constraint into consideration when generating test
input data.

Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
2015-10-21 12:30:40 +01:00
Geza Lore
0134764fa6 Optimization of 8bit block error for high bitdepth
If high bit depth configuration is enabled, but encoding in profile 0,
the code now falls back on optimized SSE2 assembler to compute the
block errors, similar to when high bit depth is not enabled.

Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
2015-10-08 14:05:25 -07:00
Johann
41a0a0cb35 Use newer x86inc.asm
Rename updated version of x86inc.asm

Use "private_prefix" instead of "program_name" and make vpx the default
prefix.

Change-Id: I4883a99b2aee8e5dc9f2c16a2e6f4b5d6e4de458
2015-08-07 16:44:44 -07:00
Alex Converse
c7b7011b9b Move VP9 SSIM metrics to vpx_dsp.
Change-Id: I20c7b42631b579fade6cf7ebf6d4c69b2fcb5e5e
2015-08-06 18:25:25 -07:00
Jingning Han
e8b133c79c Factor inverse transform functions into vpx_dsp
This commit moves the module inverse transform functions from vp9
to vpx_dsp folder. The hybrid transform wrapper functions stay in
the vp9 folder, since it involves codec-specific data structures.

Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
2015-07-31 16:21:00 -07:00
Jingning Han
4b5109cd73 Replace vp9_ prefix in 2D-DCT functions with vpx_
Clean up the forward 2D-DCT function names in vpx_dsp.

Change-Id: I3117978596d198b690036e7eb05fe429caf3bc25
2015-07-28 16:06:44 -07:00
Jingning Han
d19033fa4e Move DC only forward 2D-DCT functions to vpx_dsp
This completes the forward transform functions layout refactoring.

Change-Id: I996fb0fb795f41e2040f7b21db985774098aedbd
2015-07-28 14:52:30 -07:00
Jingning Han
a6a4659bea Factor 32x32 fwd DCT to vpx_dsp folder
Move the 32x32 2D-DCT implementations from vp9/ to vpx_dsp/.

Change-Id: Id3980696f8b69906ff7a59ff9fb2b9013d60047d
2015-07-28 11:13:41 -07:00
Jingning Han
8eefb36ca9 Move forward dct sse2 header file to vpx_dsp
Change-Id: Iba03852ce778c956200818e3473cfb2b48cf8d8e
2015-07-27 14:59:57 -07:00
Jingning Han
a9a1d4e8e5 Replace vp9_idct.h for precise dependency
This commit replaces vp9_idct.h with txfm_common.h in many SIMD
implementation files for precise file dependency.

Change-Id: If73dd726bb16537e7494f28538b0a169810f9756
2015-07-27 11:55:31 -07:00
Jingning Han
5ebc8febdc Refactor vp9_idct.h file
Separate the common coefficient constant into vpx_dsp/txfm_common.h.
Move the SSE2 macro definitions to vpx_dsp/x86/txfm_common_sse2.h.
This clears the use case of vp9_idct.h in vpx_dsp folder.

Change-Id: I319735a2abf42888e5080ac14cfbcde34be7b121
2015-07-26 08:26:32 -07:00
Jingning Han
48de07d882 Remove redundant function definitions in vp9_dct_sse2.h
Change-Id: I283d364a4e65ca9bf6ff581da1d0b498433c5402
2015-07-24 21:12:06 +00:00
Jingning Han
b67821f37b Factor forward 2D-DCT transforms into vpx_dsp
This commit factors the 4x4, 8x8, and 16x16 2D-DCT forward
transform operations into vpx_dsp folder.

Change-Id: I084b117b79c0925edcbcabb93f62b9f4bf8dbe7d
2015-07-22 15:48:17 -07:00
Jingning Han
07d5d538c2 Clean up vp9_dct32x32_sse2_impl.h header files
Remove redundant file dependency.

Change-Id: I4708218157617dabe00e2e33e237be2838c16603
2015-07-20 17:22:12 -07:00
Jingning Han
e253eaa036 Unify the high bit-depth forward hybrid transforms
The SSE2 version high bit-depth forward hybrid transforms are
essentially using the C functions via cross referencing to 1-D
functions in vp9_dct.c. This commit unifies the two versions and
removes the unnecessary dependency.

Change-Id: Ib4d0702a138f8daf7d0bd97c141ee7088f293765
2015-07-20 11:17:49 -07:00
Yunqing Wang
38f1fbbb75 Migrate quantization functions from vp9/ to vpx_dsp/
The following quantization functions were moved:
vp9_quantize_b
vp9_quantize_b_32x32
vp9_highbd_quantize_b
vp9_highbd_quantize_b_32x32

vp9_quantize_dc
vp9_quantize_dc_32x32
vp9_highbd_quantize_dc
vp9_highbd_quantize_dc_32x32

The purpose of doing that was to allow these functions to be shared
by multiple codecs.

Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
2015-07-17 16:38:14 -07:00
Yaowu Xu
b58c99eb71 Remove clamp operations.
The clamp calls with INT32_MIN and INT32_MAX have no effect at all on
int values passed in, therefore this commit removes those effectless
clamps and also adds more const intermediate results to make the code
more readable.

Change-Id: I66d8811f58bb74ec31cbec9a6c441983a662352e
2015-07-08 17:44:19 -07:00
Yaowu Xu
c369daf3ea Clean out more MSVC warnings
Change-Id: I1bab0c104df2ec4825d050cd516e26ab635a7b3e
2015-07-08 15:09:20 -07:00
Johann
6a82f0d7fb Move sub pixel variance to vpx_dsp
Change-Id: I66bf6720c396c89aa2d1fd26d5d52bf5d5e3dff1
2015-07-07 15:51:04 -07:00
Jingning Han
432cd4bfb7 Move subtract functions from vp9 to vpx_dsp
Factor out the subtraction operator as common function.

Change-Id: I526e703477c6a290e0e3e3c8898f8bb1ca82779b
2015-07-06 12:22:47 -07:00
Jingning Han
176c291d9c Fix potential overflow issue in hadamard_16x16()
This commit fixes a potential integer overflow issue in function
hadamard_16x16. It adds corresponding dynamic range comment.

Change-Id: Iec22f3be345fb920ec79178e016378e2f65b20be
2015-06-12 10:56:18 -07:00
Johann
eb88b172fe Make vp9 subpixel match vp8
The only difference between the two was that the vp9 function allowed
for every step in the bilinear filter (16 steps) while vp8 only allowed
for half of those. Since all the call sites in vp9 (<< 1) the input, it
only ever used the same steps as vp8.

This will allow moving the subpel variance to vpx_dsp with the rest of
the variance functions.

Change-Id: I6fa2509350a2dc610c46b3e15bde98a15a084b75
2015-06-03 22:10:51 -07:00
Johann
c3bdffb0a5 Move variance functions to vpx_dsp
subpel functions will be moved in another patch.

Change-Id: Idb2e049bad0b9b32ac42cc7731cd6903de2826ce
2015-05-26 12:01:52 -07:00
James Zern
a989c66b84 rename vp9_dct_impl_sse2.c to vp9_dct_sse2_impl.h
this file shouldn't be built directly, it is included in vp9_dct_sse2.c
to create a non-high-bitdepth and a high-bitdepth version

silences missing prototype warnings for the unused FDCT* functions

Change-Id: Ide6ff8c24ab31bdb0f833260505ae33660a1ad5b
2015-05-15 17:01:19 -07:00
James Zern
587a71f1d6 rename vp9_dct32x32_sse2.c to vp9_dct32x32_sse2_impl.h
this file shouldn't be built directly, it is included in vp9_dct_sse2.c
to create a non-high-bitdepth and a high-bitdepth version

silences missing prototype warnings for the unused FDCT32x32* functions

Change-Id: I0e38f16dae5ea1728de184ee2c89287d48675c51
2015-05-15 16:59:52 -07:00
James Zern
4ec47249bc rename vp9_dct32x32_avx2.c to vp9_dct32x32_avx2_impl.h
this file shouldn't be built directly, it is included in vp9_dct_avx2.c
to create a non-high-bitdepth and a high-bitdepth version

silences missing prototype warnings for the unused FDCT32x32* functions

Change-Id: I4c19935c0e035b393be513bde735e9a78064a494
2015-05-15 16:47:51 -07:00
James Zern
330fba41e2 vp9 intrinsics: add vp9_rtcd include
silences a missing declaration warning

Change-Id: I59a34e1a1377cf3529b678d7ec0122bd43ab1bf1
2015-05-15 10:43:47 -07:00
James Zern
43d5cc7fe1 vp9_variance_sse2: sync function signatures
+ include vp9_rtcd.h
silences missing prototype warnings

Change-Id: I77902f07a454029baad4fe5fe6fc37c65644e6f7
2015-05-15 10:43:47 -07:00
James Zern
8515e62e6b vp9_dct_sse2: make some functions static
silences missing prototype warnings

Change-Id: I773b6a6b5bd7c57db18c3b17c519534f80e131de
2015-05-15 10:43:47 -07:00
Johann
1d7ccd5325 Relocate memory operations for common code
With the sad functions, and hopefully the variance functions soon,
moving to the vpx_dsp location, place the defines used in the
reference C code in a common location.

Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
2015-05-13 11:41:15 -07:00
James Zern
fd3658b0e4 replace DECLARE_ALIGNED_ARRAY w/DECLARE_ALIGNED
this macro was used inconsistently and only differs in behavior from
DECLARE_ALIGNED when an alignment attribute is unavailable. this macro
is used with calls to assembly, while generic c-code doesn't rely on it,
so in a c-only build without an alignment attribute the code will
function as expected.

Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79
2015-05-07 11:55:08 -07:00
Johann
d5d9289800 Move shared SAD code to vpx_dsp
Create a new component, vpx_dsp, for code that can be shared
between codecs. Move the SAD code into the component.

This reduces the size of vpxenc/dec by 36k on x86_64 builds.

Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a
2015-05-06 16:58:20 -07:00
James Zern
f58011ada5 vpx_mem: remove vpx_memset
vestigial. replace instances with memset() which they already were being
defined to.

Change-Id: Ie030cfaaa3e890dd92cf1a995fcb1927ba175201
2015-04-28 20:00:59 -07:00
James Zern
f274c2199b vpx_mem: remove vpx_memcpy
vestigial. replace instances with memcpy() which they already were being
defined to.

Change-Id: Icfd1b0bc5d95b70efab91b9ae777ace1e81d2d7c
2015-04-28 19:59:41 -07:00
Marco Paniconi
f76ccce5bc Revert "Revert "Force_split on 16x16 blocks in variance partition.""
This reverts commit 004b9d83e3

Change-Id: I2f2d0bdb9368c2c07f1d29a69cd461267a3a8743
2015-04-16 17:52:13 -07:00
Yunqing Wang
004b9d83e3 Revert "Force_split on 16x16 blocks in variance partition."
This reverts commit eb8c667570.
The patch caused mismatch while using multi-threads.

Change-Id: Icd646340af25b5d91e32f03ed3ea212e00e3e0be
2015-04-14 15:19:31 -07:00