408 Commits

Author SHA1 Message Date
Yi Luo
0f80b1f754 Optimized HBD block subtraction for all block sizes
- Interface function takes a local MxN function to call based on the
  block size.
- Repetition call (w/o cache line miss) shows improvement:
  ~63% - ~340%.
- Overall encoder speed improvement: ~0.9%.

Change-Id: Ieff8f3d192415c61d6d58d8b99bb2a722004823f
2016-04-12 12:04:43 -07:00
Geza Lore
61af8981b0 Extend variance based partitioning to 128x128 superblocks
Change-Id: I41edf266d5540a9b070a5e65bc397dd3da210507
2016-04-12 09:40:11 +01:00
Yi Luo
e5f4e8eab9 Some cosmetic improvements since HBD variance 4x4 optimization
Change-Id: I414c1fabd2e3a9b1d9daa8a90f85a0bace8bd3cd
2016-04-08 10:32:13 -07:00
Geza Lore
454989ff32 Make superblock size variable at the frame level.
The uncompressed frame header contains a bit to signal whether the
frame is encoded using 64x64 or 128x128 superblocks. This can vary
between any 2 frames.

vpxenc gained the --sb-size={64,128,dynamic} option, which allows the
configuration of the superblock size used (default is dynamic). 64/128
will force the encoder to always use the specified superblock size.
Dynamic would enable the encoder to choose the sb size for each
frame, but this is not implemented yet (dynamic does the same as 128
for now).

Constraints on tile sizes depend on the superblock size, the following
is a summary of the current bitstream syntax and semantics:

If both --enable-ext-tile is OFF and --enable-ext-partition is OFF:
     The tile coding in this case is the same as VP9. In particular,
     tiles have a minimum width of 256 pixels and a maximum width of
     4096 pixels. The tile width must be multiples of 64 pixels
     (except for the rightmost tile column). There can be a maximum
     of 64 tile columns and 4 tile rows.

If --enable-ext-tile is OFF and --enable-ext-partition is ON:
     Same constraints as above, except that tile width must be
     multiples of 128 pixels (except for the rightmost tile column).

There is no change in the bitstream syntax used for coding the tile
configuration if --enable-ext-tile is OFF.

If --enable-ext-tile is ON and --enable-ext-partition is ON:
     This is the new large scale tile coding configuration. The
     minimum/maximum tile width and height are 64/4096 pixels. Tile
     width and height must be multiples of 64 pixels. The uncompressed
     header contains two 6 bit fields that hold the tile width/heigh
     in units of 64 pixels. The maximum number of tile rows/columns
     is only limited by the maximum frame size of 65536x65536 pixels
     that can be coded in the bitstream. This yields a maximum of
     1024x1024 tile rows and columns (of 64x64 tiles in a 65536x65536
     frame).

If both --enable-ext-tile is ON and --enable-ext-partition is ON:
     Same applies as above, except that in the bitstream the 2 fields
     containing the tile width/height are in units of the superblock
     size, and the superblock size itself is also coded in the bitstream.
     If the uncompressed header signals the use of 64x64 superblocks,
     then the tile width/height fields are 6 bits wide and are in units
     of 64 pixels. If the uncompressed header signals the use of 128x128
     superblocks, then the tile width/height fields are 5 bits wide and
     are in units of 128 pixels.

The above is a summary of the bitstream. The user interface to vpxenc
(and the equivalent encoder API) behaves a follows:

If --enable-ext-tile is OFF:
     No change in the user interface. --tile-columns and --tile-rows
     specify the base 2 logarithm of the desired number of tile columns
     and tile rows. The actual number of tile rows and tile columns,
     and the particular tile width and tile height are computed by the
     codec ensuring all of the above constraints are respected.

If --enable-ext-tile is ON, but --enable-ext-partition is OFF:
     No change in the user interface. --tile-columns and --tile-rows
     specify the WIDTH and HEIGHT of the tiles in unit of 64 pixels.
     The valid values are in the range [1, 64] (which corresponds to
     [64, 4096] pixels in increments of 64.

If both --enable-ext-tile is ON and --enable-ext-partition is ON:
     If --sb-size=64 (default):
         The user interface is the same as in the previous point.
         --tile-columns and --tile-rows specify tile WIDTH and HEIGHT,
         in units of 64 pixels, in the range [1, 64] (which corresponds
         to [64, 4096] pixels in increments of 64).
     If --sb-size=128 or --sb-size=dynamic:
         --tile-columns and --tile-rows specify tile WIDTH and HEIGHT,
         in units of 128 pixels in the range [1, 32] (which corresponds
         to [128, 4096] pixels in increments of 128).

Change-Id: Idc9beee1ad12ff1634e83671985d14c680f9179a
2016-04-07 10:34:25 +01:00
James Zern
5ab46e0ecd Merge changes I7a1c0cba,Ie02b5caf,I2cbd85d7,I644f35b0
* changes:
  vpx_fdct16x16_1_sse2: improve load pattern
  vpx_fdct16x16_1_c/msa: fix accumulator overflow
  vpx_fdctNxN_1_sse2: reduce store size
  dct32x32_test: add PartialTrans32x32Test, Random
2016-04-06 02:51:53 +00:00
James Zern
38bc1d0f4b vpx_fdct16x16_1_sse2: improve load pattern
load the full row rather than doing 2 8-wide columns

Change-Id: I7a1c0cba06b0dc1ae86046410922b1efccb95c95
2016-04-04 16:03:42 -07:00
James Zern
eb64ea3e89 vpx_fdct16x16_1_c/msa: fix accumulator overflow
tran_low_t is only signed 16-bits in non-high-bitdepth mode

Change-Id: Ie02b5caf2658e8e71f995c17dd5ce666a4d64918
2016-04-04 16:03:41 -07:00
James Zern
3735def667 vpx_fdctNxN_1_sse2: reduce store size
only output[0] needs to be set, store_output is more involved than a
movdqa in the high bitdepth case

Change-Id: I2cbd85d7cf74688bdf47eb767934fe42e02bff67
2016-04-04 16:02:06 -07:00
Yi Luo
250935cab3 Optimized HBD 4x4 variance calculation
vpx_highbd_8/10/12_variance4x4_sse4_1 improves performance ~7%-11%.

Change-Id: Ida22bb2a2f7a58037cfd73e186d4f6267a960c02
2016-04-04 11:28:59 -07:00
James Zern
c21d437052 vpx_fdct32x32_1_msa: fix accumulator overflow
Change-Id: I33a5432eda3416382e1cea06b45082c0c65faa75
2016-04-02 11:04:38 -07:00
James Zern
f4cae05cd4 vpx_fdctNxN_1_c: remove unnecessary store
only output[0] needs to be set, the other values will be ignored in this
case.

Change-Id: I8e9692fc0d6d85700ba46f70c2e899a956023910
2016-04-01 12:21:59 -07:00
James Zern
0269df41c1 vpx_fdct32x32_1_c: fix accumulator overflow
tran_low_t is only 16-bits in non-high-bitdepth mode

Change-Id: Ifc06110c95e86e6d790c44250d52a538b2e9713b
2016-03-30 15:20:20 -07:00
Geza Lore
552d5cd715 Extend superblock size fo 128x128 pixels.
If --enable-ext-partition is used at build time, the superblock size
(sometimes also referred to as coding unit (CU) size) is extended to
128x128 pixels.

Change-Id: Ie09cec6b7e8d765b7555ff5d80974aab60803f3a
2016-03-30 18:23:06 +01:00
Yaowu Xu
c810740c36 Merge branch 'masterbase' into nextgenv2
Conflicts:
	vp9/encoder/vp9_encoder.c
	vpx_dsp/x86/convolve.h

Change-Id: I60c3532936bedd796a75dfe78245a95ec21e2e55
2016-03-28 17:44:28 -07:00
Yunqing Wang
5f5552d846 Optimize HBD up-sampled prediction functions
Optimized 2 up-sampled reference prediction functions in high-bit
depth case. This reduced the HBD encoding time by 3%.

Change-Id: I8663ffb5234f5e70168c0fc9ca676309fe8e98f2
2016-03-14 19:04:33 -07:00
Yunqing Wang
e6e2d886d3 Add high-precision sub-pixel search as a speed feature
Using the up-sampled reference frames in sub-pixel motion search is
enabled as a speed feature for good-quality mode speed 0 and speed 1.

Change-Id: Ieb454bf8c646ddb99e87bd64c8e74dbd78d84a50
2016-03-11 16:32:11 -08:00
Debargha Mukherjee
f34deab243 Adds compound wedge prediction modes
Incorporates wedge compound prediction modes.

Change-Id: Ie73b54b629105b9dcc5f3763be87f35b09ad2ec7
2016-03-10 07:19:54 -08:00
Scott LaVarnway
67c4c8244a VPX: loopfilter_mmx.asm using x86inc 2
This reverts commit 9aa083d164e0d39086aa0c83f0d1a0d0f0d1ba61.

Fixes a decoder mismatch with 32bit PIC builds.

Change-Id: I94717df662834810302fe3594b38c53084a4e284
2016-03-08 04:24:47 -08:00
Geza Lore
938b8dfc73 Extend convolution functions to 128x128 for ext-partition.
Change-Id: I7f7e26cd1d58eb38417200550c6fbf4108c9f942
2016-03-07 11:39:27 +00:00
James Zern
9aa083d164 Revert "VPX: loopfilter_mmx.asm using x86inc"
This reverts commit 15ecdc3970462c15fdf7185d373cb52664f40c0f.

breaks 32-bit pic builds

Change-Id: I8bb1b9471a293f05ac7423aaba0339d408931b7a
2016-03-04 18:23:45 -08:00
Geza Lore
697bf5beff Add 128 pixel variance and SAD functions
Change-Id: I8fde245b32c9e586683a28aa6925da0b83850b39
2016-03-03 10:24:29 +00:00
Debargha Mukherjee
1d69ceee5c Adds masked variance and sad functions for wedge
Adds masked variance and sad functions needed for wedge
prediction modes to come.

Change-Id: I25b231bbc345e6a494316abb0a7d5cd5586a3a54
2016-03-01 17:28:56 -08:00
Yunqing Wang
342a368fd4 Do sub-pixel motion search in up-sampled reference frames
Up-sampled the reference frames to 8 times in each dimension using
the 8-tap interpolation filter. In sub-pixel motion search, use the
up-sampled reference frames to find the best matching blocks. This
largely improved the motion search precision, and thus, improved
the compression quality. There was no change in decoder side.

Borg test and speed test results:
1. On derflr set,
Overall PSNR gain: 1.306%, and SSIM gain: 1.512%.
Average speed loss on derf set was 6.0%.
2. On stdhd set,
Overall PSNR gain: 0.754%, and SSIM gain: 0.814%.
On hevchd set,
Overall PSNR gain: 0.465%, and SSIM gain: 0.527%.
Speed loss on HD clips was 3.5%.

Change-Id: I300ebaafff57e88914f3dedc8784cb21d316b04f
2016-02-29 12:14:47 -08:00
Scott LaVarnway
dd6729f826 VPX: Remove pmin/pmax from subpixel functions.
These instructions are unnecessary if the adds
are done in the correct order.

Change-Id: I4e533b8267c32e610a4b94203ad052dc9fdabd71
2016-02-27 05:47:56 -08:00
Scott LaVarnway
51beb29f52 Merge "VPX: vpx_filter_block1d16_(v8, v8_avg)" 2016-02-27 13:31:18 +00:00
hui su
4aeabf1b0d Fix compiler warnings
Change-Id: Id7240260cec471a3f8d0986b9c8df06efda925f9
2016-02-26 13:52:49 -08:00
Yaowu Xu
a570cefcf8 Merge "Extend vpxssim to handle more HBD combinations" into nextgenv2 2016-02-26 15:57:40 +00:00
James Zern
654d2163c9 x86/convolve.h: remove redundant check in FUN_CONV_2D
the filter will be the same in this case

Change-Id: I95159bcb05bbfb71b57da741393e80cc7ffc5cff
2016-02-25 23:31:50 -08:00
James Zern
6d8c8c6201 x86/convolve.h: replace while w/if for w < 16
in non-hbd configurations; any high-bitdepth changes will be done in a
follow-up

Change-Id: Ia74e30971b744c1faab68c92fdeda1a053988c77
2016-02-25 21:44:06 -08:00
Scott LaVarnway
1f736e400f VPX: vpx_filter_block1d16_(v8, v8_avg)
Store result with one 16 byte store instead of
two 8 byte stores.

Change-Id: I43acbc5edfd6d6055a926f9b9605d47127400f09
2016-02-25 06:15:24 -08:00
James Zern
b3ceb629ba x86/convolve.h: change filter[] || chains to |
Change-Id: I661f64390f232826857b259e7a67e77f5a3a91ad
2016-02-24 19:47:43 -08:00
hui su
8537826eb4 Fix some compiler warnings.
"taking the absolute value of unsigned type 'unsigned int' has no effect"

Change-Id: Iea1f67c2a3171a98ca89d5dc7192a5508d086c16
2016-02-24 11:17:33 -08:00
Yaowu Xu
aa6c754635 Merge remote-tracking branch 'webm/master' into nextgenv2 2016-02-24 10:53:17 -08:00
Scott LaVarnway
06d0e2fe6c BUG FIX: vpx_filter_block1d(8,4)_(v8, v8_avg)
Change-Id: Ic7ea79988ed0864e7ddbfeb312516bcf77eaaac1
2016-02-23 12:23:41 -08:00
Yaowu Xu
eeaf8e6b6c Extend vpxssim to handle more HBD combinations
Change-Id: I38426d946b74c9090a265d34b89e2db6693927c2
2016-02-22 16:09:08 -08:00
Yaowu Xu
38cfc45e07 Cleanup psnr.h
Change-Id: Id026e72ee655ee5bd645a89e378da0d462be367d
2016-02-22 15:37:40 -08:00
Yaowu Xu
d1c5cd4a30 Add shift stage in FASTSSIM computation
This commits adds a shift stage for FASTSSIM computaton when source
bit depth is different from working bit depth, to make sure metric
results are calculated in bit_depth consistent with source.

Change-Id: I997799634076ef7b00fd051710544681ed536185
2016-02-22 14:58:10 -08:00
Yaowu Xu
195bf52bca Add shift stage for PSNRHVS computation
This commit adds the ability to shift down the working buffer when
source bit_depth is different than working bit_depth. It does so
by shift down to be consistent with source bit_depth.

Change-Id: Idfdbfc614d73fe445d62e35e642cc7d75e9dc4ff
2016-02-22 10:22:42 -08:00
Yaowu Xu
6e695da2d9 Move psnrhvs function declaration to psnr.h
From "ssim.h"

Change-Id: Ie53378794149ef8a844b4eb47ad4f08579de4b60
2016-02-22 08:38:49 -08:00
Scott LaVarnway
15ecdc3970 VPX: loopfilter_mmx.asm using x86inc
Change-Id: Idcf29281d617b275e3ca50f77e6d00c60992a36d
2016-02-18 15:34:58 -08:00
Yaowu Xu
acc4addb60 Merge "Add tests for Highbitdepth PSNR metric computations" into nextgenv2 2016-02-18 01:01:00 +00:00
Yaowu Xu
7823fbb45c Merge "Move PSNR related functions into vpx_dsp/psnr.c" into nextgenv2 2016-02-18 01:00:54 +00:00
Yaowu Xu
9fb593d0fc Add tests for Highbitdepth PSNR metric computations
Change-Id: I07324155f73bbdbe25bb7a7ccd587ebf9010ac7a
2016-02-17 21:28:22 +00:00
Yaowu Xu
7538501ad1 Move PSNR related functions into vpx_dsp/psnr.c
This makes all metric computation to locate at some place, also gets
rid of duplicate code between vp9 and vp10.

Change-Id: I24a2707d183a2419cd18a8343010adae185ffcd4
2016-02-17 13:05:34 -08:00
Debargha Mukherjee
35d9eadf08 Merge "Extends ext-tx to support 32x32 masked transforms" into nextgenv2 2016-02-17 18:33:10 +00:00
Debargha Mukherjee
7485498773 Extends ext-tx to support 32x32 masked transforms
Adds new 32x32 masked 1-d transforms that combine 1-D length-16
DCT with length-16 identity transforms.

To be continued in subsequent patches.

Change-Id: I0b4f66492d44c079b3c3b531ba48a97201de1484
2016-02-17 09:31:34 -08:00
Yaowu Xu
6ed7f7a516 Merge branch 'master' into nextgenv2 2016-02-17 07:23:58 -08:00
James Zern
9b44d9d00f split vpx_highbd_lpf_horizontal_16 in two
replace with vpx_highbd_lpf_horizontal_edge_16 and
vpx_highbd_lpf_horizontal_edge_8 to avoid passing a count parameter

Change-Id: I551f8cec0fce57032cb2652584bb802e2248644d
2016-02-16 23:13:58 -08:00
James Zern
1b519fb666 split vpx_lpf_horizontal_16 in two
replace with vpx_lpf_horizontal_edge_16 and vpx_lpf_horizontal_edge_8 to
avoid passing a count parameter

Change-Id: I848c95c02a3c6ebaa6c2bdf0983dce05cd645271
2016-02-16 22:57:45 -08:00
James Zern
e7a23d703b vpx_highbd_lpf_horizontal_4: remove unused count param
Change-Id: I655a771e1b1a8753be5669ef9348a312ba6cfdbc
2016-02-16 22:57:45 -08:00