Commit Graph

17907 Commits

Author SHA1 Message Date
Kyle Siefring
1b2f92ee8e Extend 16 wide AVX2 convolve8 code to support averaging.
Also adds vpx_convolve8_avg_horiz_avx2.

Change-Id: I38783d972ac26bec77610e9e15a0a058ed498cbf
2017-10-09 19:10:03 -04:00
Kyle Siefring
9ca06bcdd2 Add AVX2 version of vpx_convolve8_avg.
vpx_convolve8_avg works by first running a normal horizontal filter then a
vertical filter averages at the end.

The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the
horizontal step.

vpx_convolve8_avg_vert_avx2 is also added, but only uses ssse3 code.

Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983
2017-10-07 23:37:48 -04:00
James Zern
807248ec81 Merge "ppc: Add vpx_idct32x32_1024_add_vsx" 2017-10-07 19:08:26 +00:00
Marco Paniconi
5bc4c37a89 Merge "Revert "Speed >=5 real-time: add TM intra mode for high_source_sad."" 2017-10-06 22:41:34 +00:00
Marco Paniconi
bcbc6ed82d Revert "Speed >=5 real-time: add TM intra mode for high_source_sad."
This reverts commit 9311ef18b4.

Reason for revert:
Notice small regression in some clips.
Will revisit in another change.

Original change's description:
> Speed >=5 real-time: add TM intra mode for high_source_sad.
> 
> Small/neutral change in metrics or speed for ytlive.
> Some improvement in quality on frames with big content change.
> 
> Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d

TBR=marpan@google.com,builds@webmproject.org,jianj@google.com

Change-Id: I9d8ec5195bb05ddf329d325699355185affb9b13
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
2017-10-06 22:14:56 +00:00
Marco
e405eb06b1 Adjust threshold in scene detection
For 1 pass vbr: increase min_thresh slightly, and also add
condition on golden/arf update for using full nonrd_pick_partition.

Reduces possible false detection for scene cut detection.

Neutral/small change in metrics or speed for speed 5.

Change-Id: I388f4d9a56e3cc763e0148338c1bc0381e58ad76
2017-10-06 11:08:56 -07:00
Marco Paniconi
7af6c6c9ca Merge "Speed >=5 real-time: add TM intra mode for high_source_sad." 2017-10-06 06:29:46 +00:00
Marco
9311ef18b4 Speed >=5 real-time: add TM intra mode for high_source_sad.
Small/neutral change in metrics or speed for ytlive.
Some improvement in quality on frames with big content change.

Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d
2017-10-05 23:07:03 -07:00
James Zern
d2fb834ebd Merge "vpx_codec.h: namespace local defines" 2017-10-06 05:30:16 +00:00
James Zern
e8ed030da3 vpx_codec.h: namespace local defines
add VPX_ to UNUSED/*DEPRECATED to avoid conflicts with other headers.

Change-Id: Ie16bdac3575bc1af57a05d37e65b994370585377
2017-10-05 15:12:20 -07:00
James Zern
107eb6a9d4 vp9_ethread_test: abort early/add more detailed output
in the case compare_fp_stats fails report the 2 values and their index

Change-Id: I927a832b7a1e24c392961093b7caee1134223def
2017-10-05 15:02:51 -07:00
Marco Paniconi
e095bcce44 Merge "Adjust threshold for adapt_partition for speed 6." 2017-10-05 03:28:06 +00:00
Marco
18262a8576 Adjust threshold for adapt_partition for speed 6.
Lower SAD threshold to select non_rd pickmode partition
at superblock level more often.
Small gain in metrics, small/negligible decrease in speed.

Change-Id: I0f728236b91a604e4ca7e02039adc54d5985c4dc
2017-10-04 18:04:09 -07:00
Marco Paniconi
014976c251 Merge "Avoid nonrd_pick_partition for speed >= 6." 2017-10-04 23:36:27 +00:00
Marco
4bc1fc58b6 Avoid nonrd_pick_partition for speed >= 6.
For 1 pass vbr speed >= 6: when REFERENCE_PARTITION is selected,
avoid doing the full nonrd_pickmode based partition.
No change in overall metrics or speed.
Reduces encode times on scene cuts by 10-20%.

Change-Id: I0310b1610cc1c83793a509e0a9059840e8f18308
2017-10-04 15:31:54 -07:00
Marco Paniconi
6a42bdd25f Merge "Modify early exit for alt_ref in nonrd_pickmode." 2017-10-04 19:38:49 +00:00
Marco
77e51e2035 Modify early exit for alt_ref in nonrd_pickmode.
For 1 pass vbr mode:
On no-show_frame/ARF: instead of skipping alt_ref_frame
completely in mode testing, allow for checking (0, 0) on alt_ref.

Small gain in metrics, ~0.18%, no change in speed.

Change-Id: I32a3c24faca64ab70dd5091071a0dc301db7dd1e
2017-10-04 11:53:39 -07:00
Linfeng Zhang
9a71811d98 Merge changes Id6a8c549,Ib1e0650b,Ic369dd86
* changes:
  Refactor x86/vpx_subpixel_8t_intrin_ssse3.c
  Add vpx_dsp/x86/mem_sse2.h
  Add transpose_8bit_{4x4,8x8}() x86 optimization
2017-10-04 16:15:14 +00:00
Jerome Jiang
ffa3a3c441 Merge "Fix image width alignment. Enable ImageSizeSetting test." 2017-10-04 14:48:03 +00:00
Marco
98dbf31c87 Enable arf usage for speed >= 6, 1 pass vbr.
For speed 6 on ytlive set:
On average, speed slowdown ~5%, quality gain ~2%.

Change-Id: Ia18237cc1d52c54d7e2cb3c71f571cf37ef61b44
2017-10-03 17:18:33 -07:00
Marco
ab2bd340ac vp9: 1 pass vbr: Limit qpdelta on high_source_sad.
For 1 pass vbr: when significant content/scene change is detected
(high_source_sad = 1) reduce/turnoff the additional qdelta on the
active_worst_quality. This helps somewhat to reduce the occurrence
of large frame sizes and large encode times.
Allow it only when use_altef_onepass is enabled.

Neutral/no change on metrics.

Change-Id: I1dd97dd2ab892d65f707b841b27a5de300b714ea
2017-10-03 16:27:17 -07:00
James Zern
66b6b87471 Merge "vpx: fix nasm build errors" 2017-10-03 21:47:49 +00:00
Scott LaVarnway
bc4bc9b622 vpx: fix nasm build errors
BUG=webm:1462,766721

Change-Id: Icfa536a8e38623636b96c396e3c94889bfde7a98
2017-10-03 20:02:21 +00:00
Linfeng Zhang
6543213e87 Refactor x86/vpx_subpixel_8t_intrin_ssse3.c
Change-Id: Id6a8c549709a3c516ed5d7b719b05117c5ef8bac
2017-10-03 13:02:05 -07:00
Linfeng Zhang
0f756a307d Add vpx_dsp/x86/mem_sse2.h
Add some load and store sse2 inline functions.

Change-Id: Ib1e0650b5a3d8e2b3736ab7c7642d6e384354222
2017-10-03 12:59:05 -07:00
Marco
c8678fb7f3 Use adapt_partition for ARF in 1 pass.
For speed 6 real-time mode: use adapt_partition
on ARF frame instead of REFERENCE_PARTITION (which is slower).
This requires enabling compute_source_sad_onepass for no-show_frames.

Speedup of ~3-5% on some clips that heavily use ARF,
small loss (~0.2%) in quality on ytlive set.

Change-Id: Ib50acc97df06458244a6ac55d2bd882c30012536
2017-10-03 11:49:55 -07:00
Linfeng Zhang
67c38c92e7 Add transpose_8bit_{4x4,8x8}() x86 optimization
Change-Id: Ic369dd86b3b81686f68fbc13ad34ab8ea8846878
2017-10-03 10:00:30 -07:00
Marco Paniconi
fe7b869104 Merge "ARF in 1 pass vbr: modify skip ref_frame in nonrd_pickmode." 2017-10-03 03:01:14 +00:00
Marco
33e10dfa7e ARF in 1 pass vbr: modify skip ref_frame in nonrd_pickmode.
Speedup of ~2-3% on 1080p clips speed 6.
Neutral/negligible loss in metrics on ytlive.

Change-Id: I7ac47a4d8b58c566920bae29a94a0e8d59c36dee
2017-10-02 19:04:03 -07:00
Linfeng Zhang
0e55b0b0a7 Add 4 to 3 scaling NEON optimization
Speed comparing with the one calling vpx_scaled_2d_neon()
  ~1.7 x in general
  ~2.8x for BILINEAR filter

BUG=webm:1419

Change-Id: I8f0a54c2013e61ea086033010f97c19ecf47c7c6
2017-10-02 15:04:09 -07:00
Linfeng Zhang
2c560c3c22 Specialize 4 to 3 frame scaling in C
Scale 3x3 block instead of 16x16 block in each loop. Disabled by
default.

Benefits:
1. Reduced number of different phase_scaler from 16 to 3.
   Optimization code will be smaller and faster.
2. Maximum phase_scaler drifting will be reduced from 5/16 to 1/24.
   (The drifting is 1/(3*16) in each step.)

BUG=webm:1419

Change-Id: I59a1f7496d89a1b090498c935d30cfcf1d0c282b
2017-10-02 11:56:15 -07:00
Scott LaVarnway
3c700052be Merge "vpxdsp: [x86] add highbd_d135_predictor functions" 2017-10-02 15:00:19 +00:00
Alexandra Hájková
fb7fc1dbda ppc: Add vpx_idct32x32_1024_add_vsx
Change-Id: I55cd0a1569ccc47a53d0ecf751aac259d510e10d
2017-09-30 19:31:20 +00:00
Marco
c8f6e7b99e Fix partition selection in speed features for arf overlay frame.
For real-time mode. Move the switch to fixed partition
for is_src_frame_alt_ref so all speeds may use it
if use_altref_onepass is set.

Improves metrics by ~2% for ytlive set at speed 4
(where use_altref_onepass is currently used).

Change-Id: I033240386598c9dbd0364da89ccbcca64bc663ee
2017-09-29 15:02:28 -07:00
Marco
f2c3d0a7a3 Enable use_altref_onepass for speed 4 real-time mode.
Used for VBR mode with lag-in-frames > 0.
On ytlive set at speed 4: ~3% average gain.

Change-Id: I45dad1700bf8be9d8f177815dc062774f6f2f0de
2017-09-29 10:56:14 -07:00
Scott LaVarnway
3bbd62ed27 vpxdsp: [x86] add highbd_d135_predictor functions
C vs SSE2 speed gains:
_4x4 : ~1.81x

C vs SSSE3 speed gains:
_8x8 : ~1.96x
_16x16 : ~1.88x
_32x32 : ~2.02x

BUG=webm:1411

Change-Id: Iefaf8b39afbbfe34c1ad1d21e3a003b20f1f61e0
2017-09-29 08:56:38 -07:00
Scott LaVarnway
4cae64c32c vpxdsp: [x86] add highbd_d117_predictor functions
C vs SSE2 speed gains:
_4x4 : ~2.04x

C vs SSSE3 speed gains:
_8x8 : ~2.82x
_16x16 : ~5.93x
_32x32 : ~2.79x

BUG=webm:1411

Change-Id: I31d949695991c067dac89d91e0bed3e666c94993
2017-09-28 14:45:28 -07:00
Jerome Jiang
5a40c8fde1 Fix image width alignment. Enable ImageSizeSetting test.
BUG=b/64710201

Change-Id: I5465f6c6481d3c9a5e00fcab024cf4ae562b6b01
2017-09-28 11:25:24 -07:00
Marco
a2ef180dd0 Set rc->high_source_sad = 0 before scene detection.
Only has effect when sf->use_altref_onepass is enabled,
as in that case scene detection is skipped for non-show frame
and so high_source_sad does not get reset to 0.

No change in metrics or speed.

Change-Id: I421f066d239341449c18826089e1810b9fc5967f
2017-09-28 10:49:45 -07:00
Marco Paniconi
3b8cc214ef Merge "vp9: Modification to adapt the ARF usage for 1 pass vbr" 2017-09-28 16:52:28 +00:00
Marco
03e8f13337 vp9: Modification to adapt the ARF usage for 1 pass vbr
Add stats for past ARF usage, and use it to disable
ARF usage based on some conditions.

Overall improvement on ytlive set, reduces the regression
on the problem clips for this feature.

Only affects when sf->use_altref_onepass is enabled
(currently off by default).

Change-Id: I66267f227ea132dc86acb730e9882f85bead2cdb
2017-09-28 09:10:30 -07:00
Marco
c493ea1a6b Add use_svc condition to the scene detection in 1 pass.
Scene detection is not currently used in SVC 1 pass code.
Speedup of ~0.4%.

Change-Id: I0ab769300919de710cd2da1402014fa3f22a1f86
2017-09-27 14:51:46 -07:00
Marco Paniconi
786b124e20 Merge "Revert "Remove the speed condition on scene detection in 1 pass code."" 2017-09-27 20:42:48 +00:00
Scott LaVarnway
80992a746c Merge "vpxdsp: [x86] add highbd_d153_predictor functions" 2017-09-27 20:40:21 +00:00
Marco Paniconi
8d438dc313 Revert "Remove the speed condition on scene detection in 1 pass code."
This reverts commit 535b7b915a.

This is actually used in CBR to reset the rate control if high source sad is detected.

Original change's description:
> Remove the speed condition on scene detection in 1 pass code.
> 
> Scene detection is used for VBR mode and for screen_content mode.
> 
> It was also enabled for CBR mode via the speed condition,
> but currently the analysis in the scene detection is not used
> in CRB mode (similar computations are done locally at superblock level
> when the source_sad feature is enabled).
> 
> For 1 pass code.
> No change in behavior. Small speed gain, ~0.5%.
> 
> Change-Id: I59991d7ef2af320bea7af4b907596e057affa42f

TBR=marpan@google.com,builds@webmproject.org,jianj@google.com

Change-Id: Ib4e6b02047f75632503e7b0fc870af97fa9291c3
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
2017-09-27 19:42:48 +00:00
James Zern
690fa6bb6e Merge "fix signed integer overflow of idct" 2017-09-27 19:39:11 +00:00
James Zern
6175f285a9 Merge "vp9_dx_iface: Stop using iter parameter incorrectly" 2017-09-27 18:37:20 +00:00
Linfeng Zhang
dbbbd44304 fix signed integer overflow of idct
Exposed by fuzz test in high bitdepth.
The bug is introduced in commit 64653fa.

BUG=webm:1466

Change-Id: Idd77d5c6a60efb9241471611ce1aba0646cb6ff5
2017-09-27 11:17:54 -07:00
Scott LaVarnway
19c45ccd43 vpxdsp: [x86] add highbd_d153_predictor functions
C vs SSE2 speed gains:
_4x4 : ~1.95x

C vs SSSE3 speed gains:
_8x8 : ~3.30x
_16x16 : ~5.67x
_32x32 : ~3.87x

BUG=webm:1411

Change-Id: Ib483989b25614aa89b635e8c087d0879a5d71904
2017-09-27 11:01:11 -07:00
Marco
535b7b915a Remove the speed condition on scene detection in 1 pass code.
Scene detection is used for VBR mode and for screen_content mode.

It was also enabled for CBR mode via the speed condition,
but currently the analysis in the scene detection is not used
in CRB mode (similar computations are done locally at superblock level
when the source_sad feature is enabled).

For 1 pass code.
No change in behavior. Small speed gain, ~0.5%.

Change-Id: I59991d7ef2af320bea7af4b907596e057affa42f
2017-09-27 10:32:54 -07:00