38 Commits

Author SHA1 Message Date
Yaowu Xu
7e89c102c4 vp9-highbitdepth -> vpx-highbitdepth
Change-Id: I1e90cf7ab4bb02c0ef119b0bd1596771edefedff
2016-08-05 15:41:33 -07:00
Yaowu Xu
476e44a689 Merge "Replace variants of 'vp8' and 'vp9' with 'vpx'" into nextgenv2 2016-08-04 23:48:59 +00:00
Yaowu Xu
fe291b647e Replace variants of 'vp8' and 'vp9' with 'vpx'
Change-Id: Id6cb96b0b15efdda63348d8bfe59fc0533c85ba1
2016-08-04 22:20:38 +00:00
Yaowu Xu
56a91f139d Fix a number of msvc warnings
Change-Id: Ic5ddba3ca0c87245617b6dbc78c0f13dc952ce8b
2016-08-04 21:42:56 +00:00
Yaowu Xu
8bf837f153 Cherry pick from AOM:
68e7e4d0 Remove VP9_CAP_POSTPROC
0738390c Remove vp9_temporal denoise
b89861a4 Remove vp9-postproc

Change-Id: I4ecaa0ac83a519c8174a494378fc23df610ff2a8
2016-08-02 15:29:50 -07:00
Yaowu Xu
6fe07a207b Merge branch 'master' into nextgenv2
Change-Id: Ia3c0f2103fd997613d9f16156795028f89f63265
2016-07-14 16:05:48 -07:00
Jingning Han
edbbce8e61 Fix highbd inter prediction filter sse4 overwriting issue
Properly handle the case where the height is an integer multiple
of 4.

Change-Id: I11ac188c13f78db20902e2e333c60ce76ce837c5
2016-07-13 12:51:02 -07:00
Yi Luo
fde48c980a Merge "HBD convolution filtering (10/12 taps) SSE4.1 optimization" into nextgenv2 2016-07-12 19:28:48 +00:00
Yi Luo
8cacca73bf HBD convolution filtering (10/12 taps) SSE4.1 optimization
- For experiment EXT_INTERP under high bit depth.
- Add unit test to verify bit-exact.
- Speed performance improvement:
  On Xeon E5-2680, park_joy_1080p_12.y4m, 50 frames, encoding time
  drops from 6682503 ms to 5390270 ms.

Change-Id: Iea4debf5414f3accf1eb5672abeab56a0539ac77
2016-07-12 10:13:30 -07:00
James Zern
08bd57ef0d vp10_convolve_ssse3.c: make some functions static
quiets -Wmissing-prototypes warnings

BUG=b/29584271

Change-Id: I4d2eb7f4b45d7b829421976641b3212bcf29e7dd
2016-07-11 16:52:10 -07:00
James Zern
bc4341fd94 vp10: add some missing includes
quiets some -Wmissing-prototypes warnings

BUG=b/29584271

Change-Id: I9174728459fcabb6d9ac0028ae58029e52c0da92
2016-07-11 16:52:07 -07:00
Yi Luo
8404253f81 Fix bugs in convolution filter optimization
- Fix the over-writing bug in horizontal filtering as width = 2.
- Fix 10-tap vertical filtering which no longer reads one row of
  pixel above the block.
- Fix 10-tap filter zero padding.
- Encoder speed slow down ~4.0%, compared to,
  81ad953 Convolution vertical filter SSSE3 optimization

Change-Id: I9bb294a4529300081c29bf284e6bc6eb081cc536
2016-06-27 10:23:38 -07:00
Yi Luo
81ad95363a Convolution vertical filter SSSE3 optimization
- Apply 8-pixel vertical filtering direction parallelism.
- Add unit tests to verify bit exact.
- Encoder speed improves ~29% (enable EXT_INTERP) on Xeon E5-2680.
- Combinational cycle count of vp10_convolve() drops from 26.06%
  to 6.73%.

Change-Id: Ic1ae48f8fb1909991577947a8c00d07832737e57
2016-06-23 12:56:47 -07:00
Yi Luo
229690a95c Convolution horizontal filter SSSE3 optimization
- Apply signal direction/4-pixel vertical/8-pixel vertical
  parallelism.
- Add unit test to verify the bit exact result.
- Overall encoding time improves ~24% on Xeon E5-2680 CPU.

Change-Id: I104dcbfd43451476fee1f94cd16ca5f965878e59
2016-06-20 11:10:30 -07:00
Yi Luo
bfe4c0ae07 Integrate HBD inverse HT flip types sse4.1 optimization
- tx_size: 4x4, 8x8, 16x16.
- tx_type: FLIPADST_DCT, DCT_FLIPADST, FLIPADST_FLIPADST,
  ADST_FLIPADST, FLIPADST_ADST.
- Encoder speed improvement:
  park_joy_1080p_12: ~11%, crowd_run_1080p_12: ~7%.
- Add unit test cases for bit-exact against C.

Change-Id: Ia69d069031fa76c4625e845bfbfe7e6f6ed6e841
2016-05-25 12:32:10 -07:00
Yi Luo
28cdee448d HBD inverse HT 8x8 and 16x16 sse4.1 optimization
- Covers tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Encoding speed improves ~27% on crowd_run_1080p_12.
- Merge 4x4, 8x8, 16x16 unit tests in one test file.

Change-Id: I058ef5254d068a9523a826480c78ebbdd231824c
2016-05-24 12:55:30 -07:00
Yi Luo
346d2449f0 Fix to conform Google's coding convention
- Confirm input coeff buffer is 16-byte aligned.
- sizeof() prefer variable name instead of type.
- Fix function name (Capital first letter then Pascal case).
- Long base class name uses a newline (with colon and 4 space indent).
- Remove a unnecessary reference function variable.
- Method declaration precedes variable declaration in class definition.

Change-Id: I317f7e679926b5219f58c5f7d14512e94985e7fe
2016-05-18 18:15:53 -07:00
Yi Luo
ceabb00704 Merge "HBD inverse HT 4x4 SSE4.1 optimization" into nextgenv2 2016-05-16 21:15:08 +00:00
Yi Luo
a3a69b400c HBD inverse HT 4x4 SSE4.1 optimization
- Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Encoder overall instruction count drops 2.91%.
- Decoder overall instruction count drops 1.01%.
- Add unit test to test bit-exact result against C.

Change-Id: I908c9e0e5106c58f67dd72d28760e6c9ce54278e
2016-05-13 12:08:43 -07:00
Angie Chiang
1954fa390f Add flip option for vp10_fwd_txfm2d_#x#_c
Will add unit test to test/vp10_fwd_txfm2d_test.cc later

Change-Id: I626900c67fca4eee2ad0ae1828188527a04a5362
2016-05-10 18:14:57 -07:00
Yi Luo
412ad22f46 HBD hybrid transform 16x16 SSE4.1 optimization
- Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Update vp10_fht16x16_test.cc to do bit-exact test against
  latest C version.
- HBD encoder speed improves ~1.8%.

Change-Id: Icfc799a212e5289bcf6cedcae3722032133a2bc6
2016-05-09 11:07:01 -07:00
Yaowu Xu
f2512710d5 Replace inline with INLINE
This fixes build issues under MSVC

Change-Id: I6db6a43cba2e8ddb099b676f1ae019fe2742f366
2016-05-05 18:28:04 -07:00
Jim Bankoski
fce3cee8dd Move vpx_add_plane from codec to vpx_dsp and dedup.
Change-Id: I12218d8331c0558c0587a66321e3ca46da7e5cc7
2016-05-02 12:17:39 -07:00
Yi Luo
299c5fc202 HBD hybrid transform 8x8 SSE4.1 optimization
- Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Update bit-exact unit test against current C version.
- HBD encoder speed improves ~3.8%.

Change-Id: Ie13925ba11214eef2b5326814940638507bf68ec
2016-04-29 17:04:52 -07:00
Alex Converse
97673cb128 Fix vp10 txfm on MSVC 2015.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1187

Change-Id: Ied6d3d003ed6ab9cf4f03cdd1d0037ae755254f4
2016-04-27 19:40:02 +00:00
Yi Luo
a4593f17ca HBD hybrid transform 4x4 SSE4.1 optimization
- Optimization on tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Overall encoder speed improves ~4.5%-6%.
- Update bit-exact unit test against current C version.

Change-Id: If751c030612245b1c2470200c9570cf40d655504
2016-04-25 09:53:09 -07:00
Yi Luo
cf7f00691f Change hybrid transform function argument from TXFM_2D_CFG* to int
Unit test shows manually developed SSE4.1 code would performs ~30%
  better if TXFM_2D_CFG configuration is set in lower level. This
  change only updates function signature. There is no performance
  impact.

Change-Id: I62692bd50a21ffc8a944bbd6c155c0a2020ad77b
2016-04-21 18:37:21 -07:00
Angie Chiang
25520d8dc3 change vp10_fwd_txfm2d_#x#_sse2 to vp10_fwd_txfm2d_#x#_sse4_1
The speed performance for running 20k times  is as follows

Notice that the vp10_highbd_fdct#x#_sse2 version is
16-bit version plus range check

The rest are 32-bit version

vp10_fwd_txfm2d_4x4_c (2 ms)
vp10_fwd_txfm2d_8x8_c (9 ms)
vp10_fwd_txfm2d_16x16_c (45 ms)
vp10_fwd_txfm2d_32x32_c (233 ms)

vp10_fwd_txfm2d_4x4_sse4_1 (2 ms)
vp10_fwd_txfm2d_8x8_sse4_1 (3 ms)
vp10_fwd_txfm2d_16x16_sse4_1 (16 ms)
vp10_fwd_txfm2d_32x32_sse4_1 (80 ms)

vp10_highbd_fdct4x4_c (1 ms)
vp10_highbd_fdct8x8_c (3 ms)
vp10_highbd_fdct16x16_c (17 ms)
highbd_fdct32x32_c (160 ms)

vp10_highbd_fdct4x4_sse2 (0 ms)
vp10_highbd_fdct8x8_sse2 (2 ms)
vp10_highbd_fdct16x16_sse2 (8 ms)
highbd_fdct32x32_sse2 (105 ms)

Change-Id: I24daf1e0d4d66e91e4ce61ef71cefa7b70ee90ce
2016-03-30 15:25:26 -07:00
Angie Chiang
f2b311f580 Simplify rounding in vp10_[fwd/inv]_txfm[1/2]d_#x#
Change-Id: I24ce46e157dc5b9c0d75000a1a48e9c136ed4ee1
2016-03-30 15:25:26 -07:00
Angie Chiang
11d2bb5429 Add vp10_fwd_txfm2d_sse2
Change-Id: Idfbe3c7f5a7eb799c03968171006f21bf3d96091
2016-03-30 15:25:26 -07:00
Geza Lore
4f5108090a Flip the result of the inverse transform for FLIPADST.
When using FLIPADST, the vp10_inv_txfm_add functions used to flip
the destination array, add the result of the inverse transform, to it
and then flip the destination back. This has been replaced by
flipping the result of the inverse transform before adding it to the
destination. Up-Down flipping is done by negating the destination
stride, and staring from the bottom, so it should now be free.
Left-right flipping is done with the usual SSE2 instructions in the
optimized code.

The C functions match the SSE2 functions as expected, so the C functions
now do the flipping as well when required. Adding this cleanly required
some refactoring of the C functions, but there is no measurable
performance impact when ext-tx is not enabled.

Encode speedup with ext-tx enabled is about 3%.

Change-Id: I5b04e5d720f0b9f0d54fd8607a8764f2314c7234
2015-11-04 17:11:44 +00:00
Debargha Mukherjee
f18322262f Backports highbitdepth accelerations into vp10
Ports the changes in
https://chromium-review.googlesource.com/#/c/302372/3
into vp10.

Change-Id: I334c409f693691227ad16fc703c91899592dd8dc
2015-10-02 00:57:37 -07:00
Angie Chiang
894ab8be7e fix implicit declaration
include vpx_dsp_rtcd.h to avoid implicit declaration of
vp10_highbd_fdct32x32_rd_c

Change-Id: I0b9ad50381a302750138deab14d2d5ac31f286ee
2015-09-11 12:17:15 -07:00
Angie Chiang
501efcad4a Merge "Isolate vp10's fwd_txfm from vp9" 2015-09-11 00:10:45 +00:00
Angie Chiang
ee5b80597e Isolate vp10's fwd_txfm from vp9
1) copy fw_txfm related files from vpx_dsp tp vp10

    vpx_dsp/fwd_txfm.h → vp10/common/vp10_fwd_txfm.h
    vpx_dsp/fwd_txfm.c → vp10/common/vp10_fwd_txfm.c
    vpx_dsp/x86/fwd_dct32x32_impl_sse2.h →  vp10/common/x86/vp10_fwd_dct32x32_impl_sse2.h
    vpx_dsp/x86/fwd_txfm_sse2.c →  vp10/common/x86/vp10_fwd_txfm_sse2.c
    vpx_dsp/x86/fwd_txfm_impl_sse2.h → vp10/common/vp10_fwd_txfm_impl_sse2.h

Change-Id: Ie9428b2ab1ffeb28e17981bb8a142ebe204f3bba
2015-09-10 15:19:43 -07:00
Angie Chiang
87175ed592 Isolate vp10's inv_txfm from vp9
1) copy following files from vpx_dsp/ to vp10/common/
vp10_inv_txfm.c
vp10_inv_txfm.h
vp10_inv_txfm_sse2.c
vp10_inv_txfm_sse2.h

2) change the function prefix "vpx_" to "vp10_" in above files

3) add unit test at vp10_inv_txfm_test.cc

Change-Id: I206f10f60c8b27d872c84b7482c3bb1d1cb4b913
2015-09-10 15:08:37 -07:00
Jingning Han
54d66ef165 Remove vp9_ prefix from vp10 files
Remove the vp9_ prefix from vp10 file names.

Change-Id: I513a211b286a57d6126fc1b0fbfd6405120014f1
2015-08-11 21:24:08 -07:00
Jingning Han
3ee6db6c81 Fork VP9 and VP10 codebase
This commit folks the VP9 and VP10 codebase and makes libvpx
support VP8, VP9, and VP10.

Change-Id: I81782e0b809acb3c9844bee8c8ec8f4d5e8fa356
2015-08-11 17:05:28 -07:00