1015 Commits

Author SHA1 Message Date
Jingning Han
876c8b03e6 Store predicted motion vectors
Change-Id: I51307a217eeba14dbdaa2522be474530316a4faa
2016-02-19 14:25:34 -08:00
Yi Luo
5456aee6fc Initial SSE2 function fdst4_sse2().
Applied DST sse2 to 4x4 transform.

Fixed DST coefficient packing to satisfy 4x4 transpose requirement.

Change-Id: I9164714c77049523dbbc9e145ebb10d7911fba9d
2016-02-19 11:13:37 -08:00
Yaowu Xu
5712456bd9 Merge "Properly normalize HBD sse computation" into nextgenv2 2016-02-19 02:26:47 +00:00
Yaowu Xu
0c0f3efdeb Properly normalize HBD sse computation
This fixes a bug in HBD sum of squared error computation introduced
in #abd00505d1c658cc106bad51369197270a299f92.

Change-Id: I9d4e8627eb8ea491bac44794c40c7f1e6ba135dc
2016-02-18 15:42:19 -08:00
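
Note on the entry above: the message does not say how the HBD (high bitdepth) sum of squared errors is normalized. A minimal sketch, assuming that "normalize" means bringing a 10-bit or 12-bit SSE back to the 8-bit scale by dividing by 2^(2*(bit_depth - 8)) with rounding. The function name `normalize_hbd_sse` is hypothetical and is not taken from the patch.

```python
def normalize_hbd_sse(sse, bit_depth):
    """Scale a sum of squared errors measured at `bit_depth` to the 8-bit scale.

    Assumption (not from the commit): a sample difference at bit depth b is
    2**(b - 8) times larger than at 8 bits, so squared errors are
    2**(2 * (b - 8)) times larger and are shifted down by that many bits.
    """
    shift = 2 * (bit_depth - 8)
    if shift == 0:
        return sse
    # Round to nearest instead of truncating.
    return (sse + (1 << (shift - 1))) >> shift


# 16 pixels, each off by 3 at 8 bits, i.e. by 12 at 10 bits and 48 at 12 bits.
sse_8 = 16 * 3 * 3
sse_10 = 16 * 12 * 12
sse_12 = 16 * 48 * 48
print(sse_8, normalize_hbd_sse(sse_10, 10), normalize_hbd_sse(sse_12, 12))
```

With this scaling, the same relative error gives the same SSE at every bit depth, which is what lets 8-bit thresholds be reused for HBD input.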
Debargha Mukherjee
9a019bce84 Merge "cost_coeff speed improvements" into nextgenv2 2016-02-18 19:31:18 +00:00
Julia Robson
c6eba0b47a cost_coeff speed improvements
Preliminary tests indicated that these changes make cost_coeffs
approximately 20% faster, which is a 2% improvement overall.

Change-Id: Iaf013ba75884415cd824e98349f654ffb1c3ef33
2016-02-18 13:18:39 +00:00
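
Note on the entry above: the two figures can be related with simple Amdahl-style arithmetic. A rough sketch, assuming "20% faster" means the time spent in cost_coeffs drops by 20% and "2% overall" means total encode time drops by 2%. The message does not define either figure, and the helper names are hypothetical.

```python
def implied_time_share(local_saving, overall_saving):
    """Share of total run time spent in a function, given that cutting the
    function's own time by `local_saving` cut total time by `overall_saving`."""
    if not 0 < local_saving <= 1:
        raise ValueError("local_saving must be in (0, 1]")
    return overall_saving / local_saving


def total_saving(time_share, local_saving):
    """Inverse direction: total time saved when a function that takes
    `time_share` of the run time gets `local_saving` faster."""
    return time_share * local_saving


share = implied_time_share(local_saving=0.20, overall_saving=0.02)
print(f"cost_coeffs share of encode time: {share:.0%}")
```

Under those assumptions the quoted numbers imply that cost_coeffs accounted for roughly a tenth of encode time before the change.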
Yaowu Xu
7823fbb45c Merge "Move PSNR related functions into vpx_dsp/psnr.c" into nextgenv2 2016-02-18 01:00:54 +00:00
James Zern
345489c9ec vp10/resize: add missing alloc checks
Change-Id: I96669ddbcdda508a295c68ecf103d10f364e0ad5
2016-02-17 16:13:51 -08:00
James Zern
7fe96753d7 vp10/encoder: add missing alloc checks
Change-Id: I5f81250d054bfd1cc69308a491b8fd21b77e4ee1
2016-02-17 14:36:06 -08:00
James Zern
5d233390db vp10_cyclic_refresh_alloc: correct cleanup on error
previously only the CYCLIC_REFRESH allocation was being freed

Change-Id: I600eb496ec4b62accf1a6483c8170eabb046787d
2016-02-17 14:36:05 -08:00
Yaowu Xu
7538501ad1 Move PSNR related functions into vpx_dsp/psnr.c
This puts all metric computation in the same place and also gets
rid of duplicate code between vp9 and vp10.

Change-Id: I24a2707d183a2419cd18a8343010adae185ffcd4
2016-02-17 13:05:34 -08:00
Jingning Han
dd1391a005 Merge "Fix enc/dec mismatch in dynamic mv referenceing experiment" into nextgenv2 2016-02-17 19:03:14 +00:00
Debargha Mukherjee
35d9eadf08 Merge "Extends ext-tx to support 32x32 masked transforms" into nextgenv2 2016-02-17 18:33:10 +00:00
Debargha Mukherjee
7485498773 Extends ext-tx to support 32x32 masked transforms
Adds new 32x32 masked 1-d transforms that combine 1-D length-16
DCT with length-16 identity transforms.

To be continued in subsequent patches.

Change-Id: I0b4f66492d44c079b3c3b531ba48a97201de1484
2016-02-17 09:31:34 -08:00
Jingning Han
95247be0bf Fix enc/dec mismatch in dynamic mv referenceing experiment
This commit fixes an enc/dec mismatch in the dynamic motion vector
referencing experiment introduced in 837ef00.

Change-Id: I9fbe116fce118a80ef0f96bf41ce1f802547c2ee
2016-02-17 09:29:54 -08:00
Yaowu Xu
6ed7f7a516 Merge branch 'master' into nextgenv2 2016-02-17 07:23:58 -08:00
James Zern
fdc977afc6 vp10,encoder: relocate setjmp
move to encoder_encode() as vp10_get_compressed_data() allocates data and
would require some modification to make its error return meaningful.

Change-Id: Ia5267c35d16ccd42b6da6d2136402b13e28f9159
2016-02-16 19:33:16 -08:00
Yue Chen
907f88c4e6 Fixing a bug in obmc prediction in the rd loop
This bug made the rd loop use one-side obmc (compound of the current
predictor and the predictors of the left mi's, while the above ones
are ignored by mistake) to determine whether to use obmc. This fix
improved the compression performance by ~0.6% on different test sets.

Coding gain (%) of obmc experiment on derflr/derfhd/hevcmr/hevchd:
1.568/TBD/1.628/TBD

Change-Id: I43b239bedf9a8eebfd02315b1b036e140a998140
2016-02-16 14:43:45 -08:00
Debargha Mukherjee
f9c25498eb Merge "Tweak encoding flags for supertx." into nextgenv2 2016-02-16 22:10:30 +00:00
Debargha Mukherjee
907544a328 Merge "Code cleanup: remove redundant DST1 code" into nextgenv2 2016-02-16 19:43:25 +00:00
Geza Lore
c582aacb7a Tweak encoding flags for supertx.
Change-Id: I46f69d3a176897294d33c3f6d30b23c75b6267a8
2016-02-16 11:24:17 -08:00
Debargha Mukherjee
1badceada8 Code cleanup: remove redundant DST1 code
Removes the USE_DST2 flag that was on by default. DST2 performs
slightly better than DST1 and is faster to compute.

Change-Id: Ifb788f3f0a0e1995d7625230cec144b876f01206
2016-02-16 10:36:02 -08:00
Hui Su
0107373234 Merge "Add a speed feature to skip transform type selection" into nextgenv2 2016-02-16 18:31:18 +00:00
Debargha Mukherjee
8cc04ef505 Merge "Further supertx costing fixes." into nextgenv2 2016-02-16 18:02:24 +00:00
Debargha Mukherjee
6f49446dfa Merge "Fix double counting of compound reference bit cost." into nextgenv2 2016-02-16 17:55:49 +00:00
Geza Lore
abd00505d1 Add optimized vpx_sum_squares_2d_i16 for vp10.
Using this we can eliminate a large number of calls to predict intra,
and it is also faster than most of the variance functions it replaces.
This is an equivalence transform so coding performance is unaffected.

Encoder speedup is approx 7% when var_tx, super_tx and ext_tx are all
enabled.

Change-Id: I0d4c83afc4a97a1826f3abd864bd68e41bb504fb
2016-02-15 16:54:52 +00:00
Yue Chen
d1cad9c3f5 Overlapped block motion compensation experiment
In this experiment, an obmc inter prediction mode is enabled for
>= 8X8 inter blocks. When the obmc flag is on, the regular block-
based motion compensation will be refined by using predictors of
the above and left blocks.
Fixed some compatibility issues with vp9_highbitdepth, supertx,
ref_mv, and ext_interp.

Coding gain (%) on derflr/hevcmr/hevchd
OBMC:
1.047/1.022/0.708
OBMC + SUPERTX:
1.652/1.616/1.137
SUPERTX:
0.862/0.779/0.630

Change-Id: I5d8d3c4729c6d3ccb03ec7034563107893103b7f
2016-02-12 13:36:25 -08:00
Alex Converse
a45d5d3f94 Merge "Port switch to 9-bit rate cost to vp10." into nextgenv2 2016-02-12 21:15:35 +00:00
Geza Lore
599003969d Further supertx costing fixes.
Change-Id: I85897168c7fda3fd79daaba985b6607fd7df476b
2016-02-12 11:47:26 -08:00
Jingning Han
18eaf8e6fc Merge "Refactor vp10_drl_idx concept" into nextgenv2 2016-02-12 19:39:44 +00:00
Debargha Mukherjee
8b0a5b8718 Adding loop wiener restoration
Adds a wiener filter based restoration scheme in loop which can
be optionally selected instead of the bilateral filter.

The LMMSE filter generated per frame is a separable symmetric 7
tap filter. Three parameters for each of horizontal and vertical
filters are transmitted in the bitstream. The fourth parameter
is obtained assuming the sum is normalized to 1.
Also integerizes the bilateral filters, along with other
refactoring necessary in order to support the new switchable
restoration type framework.

derflr: -0.75% BDRATE

[A lot of videos still prefer bilateral, however since many frames
now use the simpler separable filter, the decoding speed is
much better].

Further experiments to follow, related to replacing the bilateral.

Change-Id: I6b1879983d50aab7ec5647340b6aef6b22299636
2016-02-12 09:56:24 -08:00
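
Note on the entry above: the message says each 7-tap filter is symmetric, that three parameters are transmitted, and that the fourth follows from the taps summing to 1. A minimal sketch of that reconstruction, assuming an integer representation in which "1" is a fixed scale of 128. The scale, the function names and the edge handling are assumptions for illustration, not details from the patch.

```python
def wiener_taps(p0, p1, p2, scale=128):
    """Rebuild a symmetric 7-tap filter from its three outer taps.

    The centre tap is whatever is left so that all seven taps sum to `scale`
    (the integer stand-in for 1.0; the value 128 is an assumption).
    """
    center = scale - 2 * (p0 + p1 + p2)
    return [p0, p1, p2, center, p2, p1, p0]


def filter_1d(samples, taps, scale=128):
    """Apply the taps along one dimension, replicating the edge samples."""
    half = len(taps) // 2
    last = len(samples) - 1
    out = []
    for i in range(len(samples)):
        acc = 0
        for k, tap in enumerate(taps):
            j = min(max(i + k - half, 0), last)
            acc += tap * samples[j]
        out.append((acc + scale // 2) // scale)  # round to nearest
    return out


taps = wiener_taps(3, -7, 15)
print(taps, sum(taps))
print(filter_1d([100] * 10, taps))
```

Because the filter is separable, a 2-D pass would run `filter_1d` over every row with the horizontal taps and then over every column with the vertical taps. The normalization guarantees that flat regions pass through unchanged.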
Yaowu Xu
13efa8a089 Merge "Refactor internal stats code" into nextgenv2 2016-02-12 16:37:16 +00:00
Yaowu Xu
1a69cb286f Refactor internal stats code
Also removed the use of postprocessing in computing internal stats.

Change-Id: Ib8fdbdfe7b7ca05cd1a034a373aa7762fa44323c
2016-02-12 07:31:29 -08:00
Yaowu Xu
89a1ab395c Merge "Enable computing PSNRHVS for hbd build" into nextgenv2 2016-02-12 15:24:28 +00:00
James Zern
8628898acf vp10_receive_raw_frame: add missing setjmp
allocations done within this function are protected with
vpx_internal_error; adding the setjmp fixes a crash in
vp10_lookahead_push() under low memory conditions.

Change-Id: I5515017cd71b218840c506791b3a517da7ffc93e
2016-02-11 19:21:28 -08:00
Jingning Han
a39e83d743 Refactor vp10_drl_idx concept
Remove the implicit assumption on offsetting the index by 1.

Change-Id: I6f1d391e067d57b7e45b9287e866014dbc16da71
2016-02-11 16:38:13 -08:00
Debargha Mukherjee
c1924b9ff0 Merge "Complete high bitdepth VAR_TX implementation." into nextgenv2 2016-02-12 00:16:18 +00:00
Angie Chiang
368e3d9293 Merge "Refactor: add predict_interp_filter() to simplify the flow in handle_inter_mode" into nextgenv2 2016-02-12 00:16:13 +00:00
Yaowu Xu
bb8ca08816 Enable computing PSNRHVS for hbd build
This commit adds computation of PSNRHVS for the highbitdepth build. It
also adds tests to make sure the calculation of the psnrhvs metric for
10 and 12 bit is correct.

Change-Id: Iac8a8073d2b3e3ba5d368829d770793212fa63b6
2016-02-11 13:17:59 -08:00
Jingning Han
57c83b330e Remove redundant parameters from vp10_txfm_rd_in_plane_supertx()
Change-Id: Icb164403239f88f18fd64de75d4881d33d3ab1cc
2016-02-11 11:53:22 -08:00
Jingning Han
f70134f729 Align rate-distortion cost metric for chroma components
This commit aligns the rate-distortion metrics for both luma and
chroma components in super transform rate-distortion optimization.
It improves the coding gains due to var-tx and supertx experiments
by 0.2% for high resolution test sets.

Change-Id: Ib89d99e29cb5ee27b1f867e301954d4164d8b364
2016-02-11 11:07:09 -08:00
Jingning Han
5c772f38fa Format clean-ups in transform experiments
Change-Id: Ib2843cb03ae452ce9fec3a94c709431ea0202d8b
2016-02-11 11:07:00 -08:00
Alex Converse
b3ad81288f Port switch to 9-bit rate cost to vp10.
Brings the following commits to vp10:
269428e Tie the bit cost scale to a define.
d13385c Switch to 9-bit rate cost constants built on a 256 probability denominator.
ad43a73 Fix a signed overflow in vp9 motion cost.
1c9b091 Fix some interger overflow errors
fac947d Restore previous motion search bit-error scale.

Change-Id: I598ba7ee7efcde18439c31dfa96b86cbf297a580
2016-02-11 09:54:24 -08:00
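
Note on the entry above: "9-bit rate cost constants built on a 256 probability denominator" can be read as costs expressed in units of 1/512 bit for probabilities p/256. A sketch of how such constants could be generated under that reading. The name `PROB_COST_SHIFT` and the rounding rule are assumptions, and this is not the table from the ported commits.

```python
import math

PROB_COST_SHIFT = 9  # hypothetical name: 1 bit == 1 << 9 cost units


def prob_cost(p):
    """Cost, in 1/512-bit units, of coding a symbol with probability p/256."""
    if not 1 <= p <= 255:
        raise ValueError("p must be in 1..255")
    return round(-math.log2(p / 256.0) * (1 << PROB_COST_SHIFT))


# Full table, indexed by p; index 0 is unused.
cost_table = [0] + [prob_cost(p) for p in range(1, 256)]

print(prob_cost(128), prob_cost(64), prob_cost(255))
```

Tying the scale to a single constant, as the first ported commit title describes, means every consumer of the table can convert back to bits with the same shift.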
Geza Lore
432e875dce Complete high bitdepth VAR_TX implementation.
VAR_TX now works in the high bitdepth configuration.

Change-Id: I4114d7d9ed59c598f1e4d35b8e75876c07074ba7
2016-02-11 10:49:56 +00:00
Yaowu Xu
00380700fb Merge "Enable computing of FastSSIM for HBD build" into nextgenv2 2016-02-11 05:43:55 +00:00
Hui Su
6779be2487 Merge "Refactor rd_pick_intra_angle_" into nextgenv2 2016-02-11 01:44:14 +00:00
Yaowu Xu
c0874f2441 Enable computing of FastSSIM for HBD build
This commit adds the computation of fastSSIM for highbitdepth build,
it also modifies the hbdmetric test to be more generic and applicable
for fastSSIM.

The 255 used for calculating ssim constants c1 and c2 is not exactly
scaled by 4x and 16x to 1023 and 4095, and therefore requires the metric
test to have a threshold more tolerant than 0, currently at 0.03dB.

Change-Id: I631829da7773de400e77fc36004156e5e126c7e0
2016-02-10 17:11:58 -08:00
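
Note on the entry above: the mismatch is plain arithmetic. Scaling the 8-bit peak 255 by 4x and 16x gives 1020 and 4080, not the true 10-bit and 12-bit peaks 1023 and 4095. A small sketch using the usual SSIM definitions c1 = (k1*L)^2 and c2 = (k2*L)^2 with k1 = 0.01 and k2 = 0.03. Those k values are the common defaults and are an assumption here, as are the function names.

```python
def true_peak(bit_depth):
    """Largest sample value at the given bit depth."""
    return (1 << bit_depth) - 1


def scaled_8bit_peak(bit_depth):
    """The 8-bit peak (255) scaled up by 2**(bit_depth - 8)."""
    return 255 << (bit_depth - 8)


def ssim_constants(peak, k1=0.01, k2=0.03):
    """SSIM stabilising constants for a given dynamic range."""
    return (k1 * peak) ** 2, (k2 * peak) ** 2


for bd in (8, 10, 12):
    exact = ssim_constants(true_peak(bd))
    scaled = ssim_constants(scaled_8bit_peak(bd))
    print(bd, true_peak(bd), scaled_8bit_peak(bd), exact, scaled)
```

The constants differ by well under one percent, which is consistent with the test needing a small non-zero tolerance rather than an exact match.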
hui su
5a7c8d8c1d Refactor rd_pick_intra_angle_
Change-Id: I6c78188bdedb52655678c63f6a767567b256a880
2016-02-10 15:41:04 -08:00
Angie Chiang
c0035cc480 Refactor: add predict_interp_filter() to
simplify the flow in handle_inter_mode

Change-Id: Ic7934c0a5d0a79bdf546b4d2d106035449b475a6
2016-02-10 15:32:10 -08:00
hui su
329e340dc5 Add a speed feature to skip transform type selection
Setting FIXED_TX_TYPE to 1 makes the encoder skip the tx_type search,
which makes it about twice as fast.

This speed feature is off by default; we can turn it on when we
want to quickly test new ideas.

Change-Id: Ieab5807d17fcd54fce3e8ae2f59a18b42eb79408
2016-02-10 15:11:01 -08:00