Commit Graph

71 Commits

Author SHA1 Message Date
Alex Converse
ec8a3272fa Merge "Add an x86inc MMX fwht4x4." 2014-05-09 13:48:49 -07:00
Jingning Han
9412785b02 Merge changes I3edd4b95,I4514f974,Ie7fa4386
* changes:
  Turn on unit tests for SSSE3 8x8 forward and inverse 2D-DCT
  Change eob threshold for partial inverse 8x8 2D-DCT to 12
  SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero
2014-05-09 09:58:39 -07:00
Alex Converse
b5422fab46 Add an x86inc MMX fwht4x4.
Change-Id: Ib0a73d4863478f9b8a00976379d25d2f6ebbb197
2014-05-08 12:01:27 -07:00
Jingning Han
41a350a83d Change eob threshold for partial inverse 8x8 2D-DCT to 12
The scanning order has the first 12 coefficients of the 8x8 2D-DCT
sitting in the top left 4x4 block. Hence the partial inverse 8x8
2D-DCT allows to handle cases with eob below 12.

The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
166 cycles (using SSE2) to 150 cycles (using SSSE3).

Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
2014-05-08 09:48:58 -07:00
Jingning Han
9e7b09bc5d SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero
This commit enables ssse3 assembly implementation of the 8x8
inverse 2D-DCT with only first 10 coefficients non-zero. The
average runtime for this unit goes down from 198 cycles to 129
cycles (34.8% faster).

Change-Id: Ie7fa4386f6d3a2fe0d47a2eb26fc2a6bbc592ac7
2014-05-07 17:40:02 -07:00
Paul Wilkins
33b1c457ed Revert "Add an MMX fwht4x4"
Includes changes that are not compatible with VS windows builds.
Amongst other things stdint.h is not supported in VS.

This reverts commit 89fbf3de50.

Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd
2014-05-07 12:53:27 +01:00
Alex Converse
75d05d5ed4 Merge "Add an MMX fwht4x4" 2014-05-06 11:12:27 -07:00
Jingning Han
d289deb04c Merge "SSSE3 implementation of full inverse 8x8 2D-DCT" 2014-05-06 09:17:22 -07:00
Dmitry Kovalev
e8bbb3d9db Making vp9_get_sse_sum_{8x8, 16x16} static.
Change-Id: Ifb7937c977308c682986f0ce9645a0807d2aa46a
2014-05-05 19:12:38 -07:00
Alex Converse
89fbf3de50 Add an MMX fwht4x4
7% faster encoding a desktop lossless at RT speed 4.

Change-Id: I41627f5b737752616b6512bb91a36ec45995bf64
2014-05-05 15:10:48 -07:00
Jingning Han
52ae97b6aa SSSE3 implementation of full inverse 8x8 2D-DCT
This commit enables SSSE3 version full inverse 8x8 2D-DCT and
reconstruction. It makes the runtime of vp9_idct8x8_64_add down
from 256 cycles (SSE2) to 246 cycles.

Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3
2014-05-05 10:49:27 -07:00
Jingning Han
39761eb5d6 Merge "Enable SSSE3 implementation of 8x8 forward 2D-DCT" 2014-04-30 13:41:36 -07:00
Dmitry Kovalev
d2bc8816a1 Merge "Adding search_site_config struct." 2014-04-29 16:59:47 -07:00
Jingning Han
1eaa3a76dc Enable SSSE3 implementation of 8x8 forward 2D-DCT
Assembly implementation of ssse3 8x8 forward 2D-DCT. The current
version is turned on only for x86_64. The average unit runtime
goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster.
This translates into about 1.5% speed-up for pedestrian_area 1080p
at speed 2.

Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4
2014-04-29 15:49:18 -07:00
Dmitry Kovalev
aa464eca5e Adding search_site_config struct.
Change-Id: I2ad333553e673dbabcdc0f0366aea311e90849bf
2014-04-29 10:34:53 -07:00
Dmitry Kovalev
6e01079cc0 Removing unused vp9_variance_halfpixvar*() functions.
Change-Id: I99695564a3aa9bc8c79ac0a551d257e2ff3ad3c3
2014-04-25 11:50:07 -07:00
Dmitry Kovalev
03e7deae4f Removing unused vp9_sub_pixel_mse* functions.
Change-Id: I8d906da3bd6de0d3042676846f61a8b2a3444508
2014-04-24 11:49:12 -07:00
Dmitry Kovalev
63fa722179 Removing unused cost arguments from mcomp functions.
Change-Id: Id81a76d18be6b2de69f81bb563d74c3bb356d434
2014-04-11 10:24:36 -07:00
Yunqing Wang
4e66293fcb Use source frame difference to make partition decision
Calculate the difference variance between last source frame and
current source frame. The variance is calculated at 16x16 block
level. The variances are compared to several thresholds to decide
final partition sizes.

An adaptive strategy is implemented to decide using
SOURCE_VAR_BASED_PARTITION or FIXED_PARTITION based on motions
in the video. The switching test is done once every
search_type_check_frequency frames.

The selection of source_var_thresh needs to be investigated
further later.

RTC set Borg test showed 0.424% overall psnr gain, and 0.357%
ssim gain. For clips with large enough static area, the
encoding speedup is around 2% to 15%.

Change-Id: Id7d268f1d8cbca7fb8026aa4a53b3c77459dc156
2014-04-08 17:03:02 -07:00
levytamar82
0fa8b668c1 AVX2 SAD Optimization:
2 functions were optimized for avx2 by using full 256 bit register
In order to handle 32 elements in parallel instead of only 16 in parallel:
1. vp9_sad32x32x4d
2. vp9_sad64x64x4d

The function level gain is 66% and the user level gain is ~1%.

Change-Id: I4efbb3bc7d8bc03b64b6c98f5cd5c4a9dd3212cb
2014-03-21 13:53:32 -07:00
James Zern
805078a1bf build: convert rtcd.sh to perl
significantly speeds up file generation.

the goal of this change is to convert rtcd.sh to perl as directly as
possible to allow for simple comparison. future changes can make it more
perl-like.

---
Linux
    [CREATE] vpx_scale_rtcd.h
real    0m0.485s ->    0m0.022s
    [CREATE] vp8_rtcd.h
real    0m4.619s ->    0m0.060s
    [CREATE] vp9_rtcd.h
real    0m10.102s ->    0m0.087s

Windows
    [CREATE] vpx_scale_rtcd.h
real    0m8.360s ->    0m0.080s
    [CREATE] vp8_rtcd.h
real    1m8.083s ->    0m0.160s
    [CREATE] vp9_rtcd.h
real    2m6.489s ->    0m0.233s

Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee
2014-03-03 14:47:11 -08:00