Commit Graph

8148 Commits

Author SHA1 Message Date
Yaowu Xu
149822e399 Merge "Correctly report "Unsupported bitstream profile"" 2015-07-20 22:49:54 +00:00
Jingning Han
f62805fae0 Make local functions in vp9_dct.c static
This commit limits the scope of 1-D DCT and ADST functions within
vp9_dct.c and makes them static. This largely clears out the cross
referencing issue between vp9_dct.c and the SIMD optimizations.

Change-Id: If7cac478b11bb32328ccf70a9f60b709dad43d7f
2015-07-20 15:15:27 -07:00
Yaowu Xu
add779e425 Merge "Remove vp9_ prefix from bit writer files" 2015-07-20 21:21:53 +00:00
Yaowu Xu
7a63e6446b Merge "Move bit writer files to vpx_dsp/" 2015-07-20 21:21:41 +00:00
Jingning Han
f987e64476 Merge "Unify the high bit-depth forward hybrid transforms" 2015-07-20 20:19:03 +00:00
Jingning Han
9e23c6d534 Merge "Refactor highbd forward transform use case" 2015-07-20 20:18:22 +00:00
Yaowu Xu
1fcef81cb0 Remove vp9_ prefix from bit writer files
Change-Id: I07647c7482b9ec498fbad3a9c9901f72b2336500
2015-07-20 11:20:03 -07:00
Yaowu Xu
c5ad31e518 Move bit writer files to vpx_dsp/
Change-Id: Id27e0007a0feac821ca66bcecbf3a723305da82d
2015-07-20 11:20:02 -07:00
Jingning Han
e253eaa036 Unify the high bit-depth forward hybrid transforms
The SSE2 version high bit-depth forward hybrid transforms are
essentially using the C functions via cross referencing to 1-D
functions in vp9_dct.c. This commit unifies the two versions and
removes the unnecessary dependency.

Change-Id: Ib4d0702a138f8daf7d0bd97c141ee7088f293765
2015-07-20 11:17:49 -07:00
hui su
f744613be9 Fix uninitialized value warning
Change-Id: Ib919a8ec2ec66d460d2f8a26d72aabc09dcbbd72
2015-07-20 11:13:00 -07:00
Jingning Han
389ed6da10 Refactor highbd forward transform use case
Separate the hybrid transform case from 2D-DCT case. This will
allow us to clear up cross dependency between c and SIMD
implementations later.

Change-Id: Iaa499e8b096850a1c5a0c50a3b6e63e15d0184bf
2015-07-20 10:31:17 -07:00
Yaowu Xu
345ff1a2f2 Merge "Removed vp9_ prefix from vpx_dsp/bitreader file names" 2015-07-20 17:12:08 +00:00
Yunqing Wang
f65473c036 Merge "Migrate quantization functions from vp9/ to vpx_dsp/" 2015-07-20 16:20:07 +00:00
Yaowu Xu
87d2c3c063 Removed vp9_ prefix from vpx_dsp/bitreader file names
Change-Id: I0426126d0a65f13f9250983e44cc366b1b1a9c4a
2015-07-20 08:57:35 -07:00
Yaowu Xu
b0e6811ace Merge "Move bit reader files to vpx_dsp" 2015-07-20 14:52:50 +00:00
Jingning Han
a2b623d467 Merge "Remove dspr2 loop filter files from vp9_common.mk" 2015-07-18 01:33:35 +00:00
Jingning Han
44925b4c17 Merge "Rename loop filter function from vp9_ to vpx_" 2015-07-18 01:33:15 +00:00
Jingning Han
fd15cd5ad9 Remove dspr2 loop filter files from vp9_common.mk
These files have been moved to vpx_dsp directory. Clean the
vp9_common make file accordingly.

Change-Id: I9b1e820376421c801f705157e60cc7a55487f469
2015-07-17 16:38:53 -07:00
Yunqing Wang
38f1fbbb75 Migrate quantization functions from vp9/ to vpx_dsp/
The following quantization functions were moved:
vp9_quantize_b
vp9_quantize_b_32x32
vp9_highbd_quantize_b
vp9_highbd_quantize_b_32x32

vp9_quantize_dc
vp9_quantize_dc_32x32
vp9_highbd_quantize_dc
vp9_highbd_quantize_dc_32x32

The purpose of doing that was to allow these functions to be shared
by multiple codecs.

Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
2015-07-17 16:38:14 -07:00
Jingning Han
2992739b5d Rename loop filter function from vp9_ to vpx_
Change-Id: I6f424bb8daec26bf8482b5d75dd9b0e45c11a665
2015-07-17 15:55:02 -07:00
Yaowu Xu
97279ed2e2 Move bit reader files to vpx_dsp
Change-Id: Ib1cb1fbe92a39ff5312cee069559be6d3ea458d0
2015-07-17 15:38:40 -07:00
Marco
479c669a61 Merge "Dynamic resize 1 pass mode: fix buffer underflow threshold." 2015-07-17 21:31:56 +00:00
Jingning Han
4735edd00f Migrate mips dspr2 loop filter implementation from vp9 to vpx
This commit moves the loop filter dspr2 implementation from vp9 to
vpx_dsp directory. It also fixes header file format issues.

Change-Id: I09203ed4bd267d7fd76bb79a6ee84a37646206b2
2015-07-17 11:51:05 -07:00
Marco
7501de267c Dynamic resize 1 pass mode: fix buffer underflow threshold.
Remove the use of drop_frames_water_mark, as this is used for
frame dropping control. Use fixed threshold for now on buffer underflow.

Change-Id: If0ddda9f7f6fa96067cdcb0eccb42e17bda37c32
2015-07-17 11:25:15 -07:00
Yaowu Xu
7c0c62df1d Correctly report "Unsupported bitstream profile"
For vp9 decoder build without profile 2 and profile 3 support, this
commit changes to report error "Unsupported bitstream profile" for
input streams in profile 2 or 3, rather than other misleading error
information.

In addition, one of the invalid files in unit tests is actually coded
profile 2, this commit makes it tested only when the decoder is built
with vp9-highbitdepth.

This fixes issue #1028.

Change-Id: I8b6c1210787c8f89c703a546687dcf973ac20fc0
2015-07-17 10:51:02 -07:00
Jingning Han
d0750d287f Resolve dspr2 loop filter dependency complexity
Narrow the scope of dependency required by the dspr2 implementation
of loop filters.

Change-Id: Ib8d99dc7d9c231f69dd31d02e0a89e5bd0545a28
2015-07-17 10:38:35 -07:00
Jingning Han
55e80a3cc6 Replace vp9_common_dspr2.h with common_dspr2.h
Narrow the scope of dependency in dspr2 loop filter implementation.

Change-Id: I30426d7e4d41575a82286f1d3c5881aeb99a3250
2015-07-17 10:31:38 -07:00
Jingning Han
b8ff84b7f8 Create common dspr2 header file in vpx_dsp
Move the common prefetch_load/store in dspr2 to header file in
vpx_dsp/mips.

Change-Id: I8acc22970f2a0ef97d73061e39a3ae65c6955eac
2015-07-17 09:54:02 -07:00
Jingning Han
3590a4b437 Merge "Simplify dependencies in dspr2 related codes" 2015-07-17 16:12:52 +00:00
Jingning Han
845aad42b8 Merge "Migrate loop filter functions from vp9/ to vpx_dsp/" 2015-07-17 16:12:01 +00:00
Jingning Han
d190ad228f Simplify dependencies in dspr2 related codes
The common_dspr2.h should be independent of codec-specific data
structures.

Change-Id: I34ee1f9552c2d2d205fd7f1813cdf312c7ff5d2b
2015-07-16 18:22:48 -07:00
Jingning Han
50adfdf5ba Migrate loop filter functions from vp9/ to vpx_dsp/
The various tap loop filter operations are common functions across
codec. This commit moves them along with SIMD optimizations to
vpx_dsp folder.

Change-Id: Ia5fa0b2e5289cdb98467502a549c380b9c60e92c
2015-07-16 16:40:47 -07:00
Marco
f83f9dbb3a Merge "Dynamic resize for 1 pass: update of golden frame." 2015-07-16 19:38:27 +00:00
Marco
7ae1aa6b37 Dynamic resize for 1 pass: update of golden frame.
In aq-mode=3 under a resizing action (i.e., resize_pending != 0),
force an update of the golden reference frame.

Change-Id: I14806f6db71b5f8c827678cc5e1fc913c138a9a4
2015-07-16 09:27:20 -07:00
paulwilkins
7d15444d07 Fix bug in setting sf->use_square_partition_only.
Fix bug in setting this flag for animated content.
The bug did cause quality to increase because far
more frames are not boosted than boosted.

However, the speed trade off to gain is a lot less
favorable and the behavior was not as intended.

Change-Id: I89fb70419c88b26f40b3534de0481730a1b3fcfa
2015-07-16 16:20:39 +01:00
Frank Galligan
8be1dcb4cb Merge "Add vp9_int_pro_col_neon." 2015-07-16 05:45:17 +00:00
Jingning Han
b946e5ce0f Merge "Add vpx_dsp_common.h file" 2015-07-15 22:41:54 +00:00
Jingning Han
de740b258b Merge "Remove redundant header files in vp9_loopfilter_filers.c" 2015-07-15 22:41:11 +00:00
Marco
eaf1ffd837 Merge "Fix to resize logic for 1 pass mode." 2015-07-15 21:43:07 +00:00
Jingning Han
db8e731b8d Add vpx_dsp_common.h file
Move the clamp functions to vpx_dsp_common.h file. Clear out the
dependency of vp9_loopfilter_filters.c on vp9_common.h file.

Change-Id: I9c4b928bcd7f597106b5aa96354356d3775a3431
2015-07-15 13:03:23 -07:00
Jingning Han
3fe83cdf81 Remove redundant header files in vp9_loopfilter_filers.c
This cleans out the unnecessary dependency on vp9 codec-specific
data structures.

Change-Id: Iadbe431174a0f9bf9423f39ab854fc18be554bea
2015-07-15 12:44:47 -07:00
Marco
2f66fdd375 Adjust some logic for dynamic_resize 1 pass mode.
Use drop_frames_water_mark for threshold on buffer underflow,
and change threshold for resize down.

Change-Id: I2de19adce50abe9bcdc0b107528cec8cc1857fcc
2015-07-15 11:54:04 -07:00
Frank Galligan
1c39998e39 Add vp9_int_pro_col_neon.
BUG=https://code.google.com/p/webm/issues/detail?id=1023

Change-Id: I212a1d67b23ce3b5ce08800de369b25b9e375e7d
2015-07-15 09:04:28 -07:00
Marco
7b756183aa Fix to source scaling for dynamic_resize.
The fast scaling for 1 pass mode was being used only on the
first frame after resizing event (because resize_scale_num/den
is set to 1 and only changed for first frame following resize event).

Change-Id: I723b63e21823eb858f25f5662d2bbe4f1842e61f
2015-07-15 08:28:59 -07:00
Marco
dc7da005d7 Fix to resize logic for 1 pass mode.
Proper use/update of resize_state and resize_pending to constrain
the total amount of downsizing to be at most one scale down, for now.

Change-Id: Id18fc32499f2fbdbec16728dcdc9e4eac09098f0
2015-07-14 16:23:57 -07:00
Alex Converse
fa94dbda81 Merge "Add an SSE2 version of vp9_iwht4x4_16_add" 2015-07-14 22:11:47 +00:00
Alex Converse
d8426d6f12 Add an SSE2 version of vp9_iwht4x4_16_add
Roughly half as many cycles as plain C.

Change-Id: I8c16c29940b76d54ee7e4fb874c328ce90bff5d4
2015-07-14 14:23:32 -07:00
paulwilkins
e11878c8e3 Merge "Add extra resize trigger for frames above maximum allowed size." 2015-07-14 18:24:13 +00:00
Debargha Mukherjee
3c5244886a Fixes part of merge regression from adding arf parameters.
From Change  Ibf0c30b72074b3f71918ab278ccccc02a95a70a0
There is still an issue relating to one animated test clip with repeat
patterns where this change effectively increase the default  maximum
arf interval by +1. This can be examined seperately.

Change-Id: Idd01d5480fc45202d8a059a0c3afc0997cc5bdd1
2015-07-14 18:32:38 +01:00
Jingning Han
cda17e12ed Merge "Refactor intra block prediction and reconstruction process" 2015-07-14 16:22:42 +00:00
Jingning Han
d5975b733b Merge "Refactor intra block prediction function" 2015-07-14 16:22:21 +00:00
Jingning Han
cb1e817c77 Refactor intra block prediction and reconstruction process
Flaten the intra block decoding process. It removes the legacy
foreach_transformed_block use in the decoder. This saves cycles
spent on retrieving the transform block position.

Change-Id: I21969afa50bb0a8ca292ef72f3569f33f663ef00
2015-07-13 22:24:17 +00:00
Jingning Han
81452cf0b7 Refactor intra block prediction function
This commit simplifies the intra block boundary condition logic.
It removes the block index from the argument set.

Change-Id: If00142512eb88992613d6609356dfd73ba390138
2015-07-13 15:20:47 -07:00
Marco
e03b8b78b2 Merge "Dynamic resize for real-time: source scaling" 2015-07-13 19:06:26 +00:00
Yaowu Xu
0cdc85d8cf Merge "Revert "Add an SSE2 version of vp9_iwht4x4_16_add."" 2015-07-13 16:27:10 +00:00
Yaowu Xu
ae5394b9e2 Revert "Add an SSE2 version of vp9_iwht4x4_16_add."
This reverts commit f8d3501640.

Change-Id: If8c7af403c091b7fb447a6f0c73fecdbccbc51b3
2015-07-13 16:26:27 +00:00
Jim Bankoski
c243835303 Merge "Revert "Fill buffer speed up"" 2015-07-13 14:39:01 +00:00
Jim Bankoski
da9db83270 Revert "Fill buffer speed up"
This reverts commit 9b4f9f45ee.

Change-Id: I23545ac8c7464127f7466fc6a58de517874fe0cf
2015-07-13 13:47:46 +00:00
Marco
4bbd95512a Dynamic resize for real-time: source scaling
Use faster scaling on source.

Change-Id: I968df97239a86834c96126b86832d3d6d0875a53
2015-07-10 11:04:18 -07:00
Jim Bankoski
db50037ece Merge "Fill buffer speed up" 2015-07-09 20:26:23 +00:00
Jim Bankoski
9b4f9f45ee Fill buffer speed up
Eliminates the byte by byte read from bool decoder,  by reading
in a size_t and then shifting it into place.

Change-Id: Id89241977103fc3b973e4ed172a5cbf246998e5d
2015-07-09 11:41:30 -07:00
paulwilkins
4b44e46de0 Merge "Changes to use of rectangular partitions." 2015-07-09 18:34:41 +00:00
Yaowu Xu
49fa5276fe Merge "Remove clamp operations." 2015-07-09 17:49:18 +00:00
Yaowu Xu
f70c80289c Merge "Clean out more MSVC warnings" 2015-07-09 17:49:08 +00:00
Scott LaVarnway
e8103f3676 Merge "Eliminate num_8x8 and num_4x4 width/height lookups" 2015-07-09 17:16:22 +00:00
Alex Converse
74f869b962 Merge "Add an SSE2 version of vp9_iwht4x4_16_add." 2015-07-09 16:57:03 +00:00
paulwilkins
2d637ca36d Merge "Change speed and rd features for formatting bars." 2015-07-09 16:38:38 +00:00
Scott LaVarnway
13a4f14710 Eliminate num_8x8 and num_4x4 width/height lookups
Also some log2 lookups.

Pass in 8x8 block width/height and log2 num4x4s instead.

Change-Id: I8ea9a1ec1e0bbab23f8ba556954a1b5433f4d613
2015-07-09 05:30:46 -07:00
Yaowu Xu
b58c99eb71 Remove clamp operations.
The clamp calls with INT32_MIN and INT32_MAX have no effect at all on
int values passed in, therefore this commit removes those effectless
clamps and also adds more const intermediate results to make the code
more readable.

Change-Id: I66d8811f58bb74ec31cbec9a6c441983a662352e
2015-07-08 17:44:19 -07:00
Jingning Han
535cc6d87f Format fixes in vp9_encodeframe.c and vp9_encodemb.c
Change-Id: Ib1303dac9043ab1b1f8fce54611cf4ea8a208038
2015-07-09 00:04:28 +00:00
Jingning Han
8783a8a97c Refactor transform block loop for inter mode decoding
Rework the inter mode transform block decoding loop. Replace the
block index with the row and col index as the input argument. It
saves function call to compute the row and col index according to
the block index and overall block size, and many if statements
associated with the transform block position relative to the coding
block. For the test bit-stream pedestrian_area 1080p at 5 Mbps,
the decoding speed goes up from 81.13 fps to 81.92 fps.

Note that the intra coded block decoding needs more refactoring
work than the inter ones. So keep it using foreach_transforme_block
as for now.

Change-Id: I5622bdae7be28ed5af96693274057f55ba9b4fb4
2015-07-08 22:55:16 +00:00
Yaowu Xu
c369daf3ea Clean out more MSVC warnings
Change-Id: I1bab0c104df2ec4825d050cd516e26ab635a7b3e
2015-07-08 15:09:20 -07:00
Alex Converse
f8d3501640 Add an SSE2 version of vp9_iwht4x4_16_add.
80% fewer cycles than C

Change-Id: I841bde1e268ddd33ae2ee75eee94737a400e2cde
2015-07-08 15:00:51 -07:00
Alex Converse
8bf791e7ef Merge "Don't allocate dqcoeff in MACROBLOCKD." 2015-07-08 20:42:36 +00:00
Alex Converse
89090d8046 Don't allocate dqcoeff in MACROBLOCKD.
The encoder gets its dqcoeff from the context tree. In the decoder move
it to directly after MACROBLOCKD.

Change-Id: I46c9b76f26956a360d17de0b26ecb994dae34ecb
2015-07-08 12:37:55 -07:00
Jingning Han
66da771040 Merge "Refactor inverse_transform_block argument list" 2015-07-08 19:28:25 +00:00
Jingning Han
0497d3a827 Merge "Reset dqcoeff[0] only if eob is 1" 2015-07-08 19:27:22 +00:00
Frank Galligan
b770def572 Merge "VP9_LPF_VERTICAL_16_DUAL_SSE2 optimization" 2015-07-08 18:15:39 +00:00
paulwilkins
a6f2a9619b Add extra resize trigger for frames above maximum allowed size.
Even if the recode loop is not enabled for the current frame type
trap the case where the projected size of a a frame is above the
maximum allowed in recode_loop_test()

Change-Id: I453004694b8f8699e3c2a83252e9f83adccdda4e
2015-07-08 18:15:10 +01:00
paulwilkins
8dd466edc8 Changes to use of rectangular partitions.
Changes to allow more use of rectangular partitions at
speeds 1 and 2 for content classed by the first pass as
animation and for blocks near the active image edge.

This has quite a big impact in quality for the animated
test sequence but also hurts encode speed for speed 2.

For other content types the impact on both speed and
quality is small.

Added some plumbing for detection of internal vertical
image edges.

Change-Id: I3fc48de2349f8cb87946caaf0b06dbb0ea261a9a
2015-07-08 18:14:12 +01:00
paulwilkins
a126b6ce7d Change speed and rd features for formatting bars.
Change speed features / behavior for split mode when there
is an internal active edge (e.g. formatting bars).

Remove some threshold constraints in rd code near the active
edge of the image.

Add some plumbing for left and right active edge detection.

Patch set 5. Limit rd pass through for sub 8x8 to internal active edges.
This takes away any speed penalty for most clips but keeps the enhanced
edge coding for the more critical case of internal image edges

Change-Id: If644e4762874de4fe9cbb0a66211953fa74c13a5
2015-07-08 17:51:42 +01:00
Jingning Han
7e0d0de211 Refactor inverse_transform_block argument list
Replace block index with transform type in the argument list. This
allows to save an extra fetch to the prediction mode. For pedestrian
area 1080p coded at 5 Mbps with single tile, the average decoding
speed goes up from 80.55 fps (before the refactoring series) to
81.13 fps.

Change-Id: Icbebf84ce63c19c0c92f3690ed201f6c3eab7881
2015-07-08 09:26:02 -07:00
James Zern
892128f6ca Merge "vp9_entropymv: remove vp9_get_mv_mag()" 2015-07-08 01:27:13 +00:00
Frank Galligan
5327fcf857 Merge "Add vp9_int_pro_row_neon." 2015-07-08 00:16:03 +00:00
Johann
ac7f403cbe Merge "Move sub pixel variance to vpx_dsp" 2015-07-07 23:57:18 +00:00
Jingning Han
55c2646666 Merge "Rework scan order fetch logic for decoder" 2015-07-07 23:09:39 +00:00
Johann
6a82f0d7fb Move sub pixel variance to vpx_dsp
Change-Id: I66bf6720c396c89aa2d1fd26d5d52bf5d5e3dff1
2015-07-07 15:51:04 -07:00
Marco
155b9416b3 Merge "Update to speed 5 non-rd mode partition search." 2015-07-07 22:47:47 +00:00
Jingning Han
c2d0f9ddeb Merge "Add vp9_ prefix to init_macroblockd" 2015-07-07 22:35:45 +00:00
Jingning Han
6e6c57da9a Merge "Reduce dqcoeff array size in decoder" 2015-07-07 22:35:31 +00:00
Jingning Han
76ccba9ec8 Reset dqcoeff[0] only if eob is 1
If only the first dequantized coefficient is non-zero, reset
dqcoeff[0] to zero directly.

Change-Id: I0197ba72028a8ec436f0b1b9abcc1c0ae5d70abe
2015-07-07 15:20:34 -07:00
Jingning Han
97d1f1aaae Rework scan order fetch logic for decoder
Save redundant call for getting prediction mode to obtain scan
order for detokenization.

Change-Id: I0683ef119f1579d1261ed5d59052a1745b68ef6f
2015-07-07 15:03:21 -07:00
Jingning Han
a652048efd Add vp9_ prefix to init_macroblockd
Change-Id: I202d4924e627eec94838741df004ed9259d38b88
2015-07-07 12:00:01 -07:00
Marco
478fbc8f23 Update to speed 5 non-rd mode partition search.
If the pre-selected partition size (from variance partition) is
32x32, also apply nonrd partition search for 32x32 and 16x16 size.

Overall small positive gain in metrics, average ~1%.
Some visual improvement, for lower resolutions.

Change-Id: I69cb425bda94f7d13d34c451ab30e9276335a30e
2015-07-07 11:52:01 -07:00
Jingning Han
cccad1c5de Reduce dqcoeff array size in decoder
The decoding process handles detokenization and reconstruction per
transform block sequentially. There is no need to offset the dqcoeff
buffer according to the transform block index. This allows to
reduce the memory spill and improve cache performance.

Change-Id: Ibb8bfe532a7a08fcabaf6d42cbec1e986901d32d
2015-07-07 11:36:05 -07:00
Yaowu Xu
a8f8b83cef Allows using optimzed version vp9_fdct8x8
Change-Id: I59cecb7178a93cdee7ad535fa996ef0caa6e988c
2015-07-07 10:28:42 -07:00
paulwilkins
02b3b05278 Merge "Alter partition search at image edge." 2015-07-07 12:44:28 +00:00
paulwilkins
8051b6d256 Merge "Error score recalibration for inactive regions." 2015-07-07 08:44:35 +00:00
paulwilkins
00c0cbb445 Merge "ARF Boost correction for inactive regions." 2015-07-07 08:44:17 +00:00
James Zern
c6d90f0535 vp9_entropymv: remove vp9_get_mv_mag()
inline the code directly in read_mv_component(), the only place where it
was being used; this removes a function call in a hot function

Change-Id: I66f99c0c9ce3bc310101dbca4a470f023cc6fb55
2015-07-06 22:30:21 -07:00