2230 Commits

Author SHA1 Message Date
Jingning Han
56df76bf1b Merge "Optimize 32x32 2D inverse DCT for speed-up" 2013-08-01 11:53:39 -07:00
Dmitry Kovalev
ff4bfa726b Merge "Adding missing const to vp9_extra_bits array." 2013-08-01 10:19:51 -07:00
Dmitry Kovalev
5b65246a71 Adding missing const to vp9_extra_bits array.
Change-Id: Icd128ab58719e0b9066bdfa66a5d0d427a84d6df
2013-07-31 18:51:18 -07:00
Jingning Han
9d67495f72 Optimize 32x32 2D inverse DCT for speed-up
This commit exploits the sparsity of the quantized coefficient matrix.
It detects each 32x8 array and skips the corresponding inverse
transformation if all entries are zero.

For ped1080p at 8000 kbps, this on average reduces the runtime of
the 32x32 inverse 2D-DCT SSE2 function from 6256 cycles to 5200
cycles. It makes the overall encoding process about 2% faster at
speed 0. The speed-up is more pronounced for the decoding process.

Change-Id: If20056c3566bd117642a76f8884c83e8bc8efbcf
2013-07-31 17:13:31 -07:00
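The zero-skip idea described in commit 9d67495f72 can be sketched in plain C as follows. This is a minimal illustration and not the SSE2 code from that commit: it assumes a row-major 32x32 block of int16_t coefficients and treats each "32x8 array" as a group of 8 consecutive rows, which is a guess at the layout. The names nonzero_group_mask, BLOCK_SIZE, GROUP_ROWS and NUM_GROUPS are hypothetical.

```c
#include <stdint.h>

/* Hypothetical layout constants: a 32x32 block split into 4 groups of 8 rows. */
enum { BLOCK_SIZE = 32, GROUP_ROWS = 8, NUM_GROUPS = BLOCK_SIZE / GROUP_ROWS };

/* Returns a bitmask with bit g set when group g (rows 8g..8g+7 of a
 * row-major 32x32 block) holds at least one non-zero coefficient. */
static unsigned nonzero_group_mask(const int16_t *coeffs) {
  unsigned mask = 0;
  for (int g = 0; g < NUM_GROUPS; ++g) {
    const int16_t *group = coeffs + g * GROUP_ROWS * BLOCK_SIZE;
    for (int i = 0; i < GROUP_ROWS * BLOCK_SIZE; ++i) {
      if (group[i] != 0) {
        mask |= 1u << g;
        break; /* one non-zero entry is enough for this group */
      }
    }
  }
  return mask;
}
```

A caller would run the transform pass only for groups whose bit is set, since the inverse transform of an all-zero group is all zeros.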
Jingning Han
12f5762756 Remove unnecessary arguments in rd_pick_ref_frame
This commit removes redundant arguments passed to
rd_pick_reference_frame. This resolves the clang warnings about
potential use of uninitialized values.

Change-Id: Ic68f949a9f8fcd0a583786b0c75321104ea44739
2013-07-31 17:04:13 -07:00
Dmitry Kovalev
8259cdf298 vp9_decodemv.c cleanup.
Inlining VP9_NMV_UPDATE_PROB constant, consistent local variable names.

Change-Id: I01692501982568fa535882d6b320e3c692f88abb
2013-07-31 15:03:36 -07:00
Dmitry Kovalev
9239e96536 Removing get_mi_{row, col} functions.
Passing mi_row and mi_col parameters to functions explicitly. Removing
unused xd argument from scale_mv function.

Change-Id: Icb4c495ec72d26fb066c14470d3ae0b741fbf18a
2013-07-31 14:06:55 -07:00
Dmitry Kovalev
3be9fd9120 Merge "Removing unused "ishp" arguments." 2013-07-31 12:03:04 -07:00
Dmitry Kovalev
0e0a6f840b Merge "Consistent update for inter_mode probabilities." 2013-07-31 12:02:35 -07:00
Dmitry Kovalev
500ade243a Removing unused "ishp" arguments.
Using different variable names "allow_hp" and "use_hp" instead of "usehp".

Change-Id: I0cd5996ddeb46bd754473b680a993c0aaf8eb879
2013-07-31 11:27:53 -07:00
Jingning Han
ac7bab7575 Merge "Make the use of ref_frame index consistent" 2013-07-31 09:11:37 -07:00
Jingning Han
86c384d398 Make the use of ref_frame index consistent
Refactor the frame buffer referencing in choose_partition and make
it consistent with other places. This is meant to prevent potential
issues when we extend the reference frame buffer.

Change-Id: I5ff33ed5f671e1f4cc7049622212769a9b4578d9
2013-07-30 19:49:36 -07:00
Dmitry Kovalev
8701bc11df Consistent update for inter_mode probabilities.
Using inter-mode counts instead of inter-mode-tree branch counts inside
FRAME_COUNTS structure.

Change-Id: I60dde13af37d06146d7d15543311c1b5044e9e04
2013-07-30 18:06:34 -07:00
Adrian Grange
fbd73648dd Merge "Cleanup typos, remove unnecessary lines, replace switch" 2013-07-30 12:59:46 -07:00
Adrian Grange
b30a06b930 Cleanup typos, remove unnecessary lines, replace switch
Removed unnecessary code lines, replaced switch with an if,
fixed spelling errors and formatting.

Change-Id: Ie48aa4604aa0ed48362ca359d792fb21b2ec1dc6
2013-07-30 12:10:32 -07:00
Yaowu Xu
88e48444da Merge "removed duplication" 2013-07-30 09:38:02 -07:00
Yaowu Xu
a15d1f3134 removed duplication
Change-Id: Ica23b66f6664e5a5b168499584f0afffbc54794f
2013-07-30 09:09:14 -07:00
Jingning Han
525745b17a Remove a redundant branching in tokenize_b
The tokenize_b function is only called when the output flag is on, so
the conditional branch on that flag inside it is removed.

Change-Id: Ib709f47f23f39ca05a695faf86fa3377f11f2dd0
2013-07-29 17:08:13 -07:00
Jingning Han
455f2de20b Tune tokenization/detokenization flow for speed-up
This commit optimizes the tokenization and detokenization operational
flow for speed-up. It makes the coding process about 0.3% faster at
speed 0.

Change-Id: I28008df7482874e4b5f237f2d418ff82a249dd56
2013-07-29 16:15:30 -07:00
Jingning Han
b5323ed89a Skip redundant tokenization in rd loop
This commit makes the encoder skip the redundant tokenization process
in the rate-distortion optimization search loop, while updating the
entropy contexts accordingly. It makes the speed 0 encoding process
about 0.5% faster with no performance change.

Change-Id: I34a4155a0b5332afeb45c93a51c7f35a294d685c
2013-07-29 16:09:16 -07:00
Jingning Han
5875d7a4a4 Merge "16x16 inverse 2D-DCT with DC only" 2013-07-29 15:29:25 -07:00
John Koleszar
9c6fafb25b Merge "Remove unnecessary 64 byte alignment" 2013-07-29 15:09:15 -07:00
Jingning Han
a7c4de22e1 16x16 inverse 2D-DCT with DC only
This commit provides special handling for the 16x16 inverse 2D-DCT
when only the DC coefficient is quantized to a non-zero value.

Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c
2013-07-29 14:45:53 -07:00
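When only the DC coefficient is non-zero, the inverse 2D-DCT output is the same value at every pixel, so the full transform can be replaced by one division and a fill. The sketch below illustrates this under the assumption of an orthonormal n x n DCT, where a DC coefficient of dc reconstructs to dc / n everywhere; the scaling and rounding in the actual commit are not reproduced here, and idct_dc_only_add and clip_pixel are hypothetical names.

```c
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Adds the reconstruction of a DC-only n x n block to the prediction in dest.
 * Assumes orthonormal 2D-DCT scaling: every residual sample equals dc / n. */
static void idct_dc_only_add(int dc, uint8_t *dest, int stride, int n) {
  /* Round to nearest, symmetric around zero. */
  const int residual = dc >= 0 ? (dc + n / 2) / n : -((-dc + n / 2) / n);
  for (int r = 0; r < n; ++r) {
    for (int c = 0; c < n; ++c) {
      dest[r * stride + c] = clip_pixel(dest[r * stride + c] + residual);
    }
  }
}
```

The same function covers the 8x8 and 16x16 cases by passing n = 8 or n = 16.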
Dmitry Kovalev
828119d6ab Renaming txfm to tx for consistency in some places.
Change-Id: I2a6a646570e2af66315e7c658d00d99f80c4b127
2013-07-29 14:35:55 -07:00
John Koleszar
a31effca75 Remove unnecessary 64 byte alignment
Fixes a warning on MSVS 2012 where the alignment of vp9_default_iscan_8x8
didn't match between its declaration and definition.

Change-Id: I1466a15635f4b22594d705d570b7e399bfb6cf21
2013-07-29 14:02:02 -07:00
Dmitry Kovalev
730a34416f Renaming NB_TXFM_MODES constant to TX_MODES.
Change-Id: I10bf06e3a3d5271221ae6a42a36074d01d493039
2013-07-29 13:38:40 -07:00
Dmitry Kovalev
23391ea835 Renaming TX_SIZE_MAX_SB to TX_SIZES.
Change-Id: I6aa4191935aa93461a07c41b59fdae1eb5f5f107
2013-07-29 12:25:34 -07:00
Jingning Han
decb1b94de Merge "Shortcut 8x8/16x16 inverse 2D-DCT" 2013-07-29 11:04:07 -07:00
Dmitry Kovalev
cc0ff7ecfa Cleanup: replacing xd->mode_info_context with temp variable.
Change-Id: I5a3e83102784cabb918a5404405fcab99c5bb9b6
2013-07-26 19:05:37 -07:00
Ronald S. Bultje
118ccdcd30 Inverse dimension order in token_cost array.
This allows us to increment the position at the band-level only as
we go from one band to the next; more importantly, that allows us to
use an add instead of multiply instruction, and omit the instruction
altogether if the band doesn't change from one coef to the next, thus
being slightly faster (probably more noticeable on systems where a
multiply is expensive, like arm).

Change-Id: I4343fe35b9f9a47fa00b217bdcbf5f91ff96c381
2013-07-26 17:30:04 -07:00
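The effect of making the band the outermost dimension (commit 118ccdcd30) can be sketched as below. This is an illustration under assumptions, not the code from that commit: the table is a flat, band-major array, bands are visited in increasing order, and band_counts gives the number of coefficients in each band. CTX, TOKENS, BAND_STRIDE and block_token_cost are hypothetical names and sizes.

```c
/* Hypothetical table dimensions for the sketch. */
enum { CTX = 3, TOKENS = 4, BAND_STRIDE = CTX * TOKENS };

/* Sums token costs for n coefficients. `costs` points at the band-0 slice of
 * a band-major table; band_counts[] must hold positive counts covering n. */
static int block_token_cost(const int *costs, const int *band_counts,
                            const int *ctx, const int *tok, int n) {
  int total = 0;
  int left = *band_counts++; /* coefficients remaining in the current band */
  for (int i = 0; i < n; ++i) {
    total += costs[ctx[i] * TOKENS + tok[i]];
    if (--left == 0 && i + 1 < n) {
      left = *band_counts++;
      costs += BAND_STRIDE; /* next band: one addition, no band multiply */
    }
  }
  return total;
}
```

While consecutive coefficients stay in the same band, the base pointer is not touched at all, which is the saving the commit message describes.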
Dmitry Kovalev
35e7e7b614 Merge "vp9_decodemv.c cleanup." 2013-07-26 17:24:34 -07:00
Ronald S. Bultje
6f3054b65d Merge "d45 intra prediction SSSE3 optimizations." 2013-07-26 17:21:09 -07:00
Ronald S. Bultje
dcacce6dd9 Merge "Save pixels instead of coefficients in intra4x4 RD loop." 2013-07-26 17:20:58 -07:00
Ronald S. Bultje
d30c8f41ef Merge "Add best_rd breakout in intra4x4 RD loop." 2013-07-26 17:20:51 -07:00
Jingning Han
38fa487164 Shortcut 8x8/16x16 inverse 2D-DCT
This commit brings back the shortcut implementation of the 8x8/16x16
inverse 2D-DCT. When eob <= 10, it skips the inverse transform
operations on rows 4:7/4:15 in the first round. For bus_cif at 1000
kbps, this provides about 2% speed-up at speed 0.

Change-Id: I453e2d72956467d75be4ad8c04b4482ab889d572
2013-07-26 17:19:14 -07:00
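One way to see why a small eob lets the first pass skip the later rows: in a scan that visits low frequencies first, the first few positions all lie near the top-left corner. The sketch below uses the classic zigzag scan as a stand-in (VP9's own scan tables differ, so this is an assumption for illustration only) and reports the largest row index among the first eob scan positions. The function name max_row_within_eob is hypothetical.

```c
/* Returns the largest row index touched by the first `eob` positions of a
 * classic zigzag scan over an n x n block, or -1 when eob is 0. */
static int max_row_within_eob(int n, int eob) {
  int count = 0;
  int max_row = -1;
  /* s is the anti-diagonal index: row + col == s. */
  for (int s = 0; s <= 2 * (n - 1) && count < eob; ++s) {
    const int lo = s > n - 1 ? s - (n - 1) : 0; /* smallest row on diagonal */
    const int hi = s < n - 1 ? s : n - 1;       /* largest row on diagonal */
    for (int k = 0; k <= hi - lo && count < eob; ++k) {
      /* Odd diagonals run top to bottom, even ones bottom to top. */
      const int row = (s & 1) ? lo + k : hi - k;
      if (row > max_row) max_row = row;
      ++count;
    }
  }
  return max_row;
}
```

With this scan, the first 10 positions fill exactly the first four anti-diagonals, so for eob <= 10 no coefficient sits in row 4 or beyond, for both the 8x8 and the 16x16 block.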
Dmitry Kovalev
d42e60d2d8 vp9_decodemv.c cleanup.
Renaming:
  read_intra_mode_info  -> read_intra_frame_mode_info
  read_inter_mode_info  -> read_inter_frame_mode_info
  read_intra_block_part -> read_intra_block_mode_info
  read_inter_block_part -> read_inter_block_mode_info
  read_ref_frame        -> read_ref_frames
  read_reference_frame  -> read_is_inter_block

Using num_4x4_blocks_{wide, high}_lookup instead of bit shifts.

Change-Id: I83c81573b4ef6f53f2f8d24683895014bebfba61
2013-07-26 16:49:49 -07:00
Jingning Han
b9c3dd481a Merge "Special handle on DC only inverse 8x8 2D-DCT" 2013-07-26 16:04:14 -07:00
Dmitry Kovalev
620861dedc Merge "Making read_inter_mode_info function more clear." 2013-07-26 15:47:40 -07:00
hkuang
aaa9755746 Merge "Fix some format error and code error in neon code." 2013-07-26 15:24:28 -07:00
Jingning Han
325e0aa650 Special handle on DC only inverse 8x8 2D-DCT
This commit enables special handling for the 8x8 inverse 2D-DCT
when only the DC coefficient is quantized to a non-zero value. For
bus_cif at 2000 kbps, it provides about 1% speed-up at speed 0.

Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
2013-07-26 14:16:51 -07:00
hkuang
588b4daf54 Fix some format error and code error in neon code.
Change-Id: I748dee8938dfb19f417f24eed005f3d216f83a82
2013-07-26 14:14:57 -07:00
Dmitry Kovalev
c09b81719f Merge "General cleanups." 2013-07-26 13:59:39 -07:00
Ronald S. Bultje
94b0c6791d d45 intra prediction SSSE3 optimizations.
Change-Id: Ie48035ff4f93c41f8a9b3023e6444fd10432d8fb
2013-07-26 13:30:02 -07:00
Yaowu Xu
4f75a1f4ed Merge "Auto min and max partition size experiment." 2013-07-26 12:10:27 -07:00
Paul Wilkins
fe5e2a91bb Auto min and max partition size experiment.
Speed feature experiment to set an upper and lower
partition size limit based on what has been seen
in spatial neighbors.

This seems to give quite reasonable speed gains in local tests
(10-15%) and when used with speed 0 the losses are small
(0.25% derf, 0.35% stdhd). However, for now I am only
enabling it on speed 1 as there may be clashes with the existing
temporal partition selection in speed 2.

Using a tighter min / max around the range derived from the
neighbors increases speed further but at the cost of a
bigger quality loss. However, I think this spatial method could
be combined with data from either the last frame or a variance
method (or both) to refine the range of minimum and maximum
partition size. I.e. consider the min and max from spatial and
temporal neighbors and the variance recommendation.

Change-Id: I1b96bf8b84368d6aad0c7aa600fe141b4f07435f
2013-07-26 18:30:49 +01:00
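The basic idea of deriving a partition size range from spatial neighbors (commit fe5e2a91bb) can be sketched as follows. This is a simplified illustration, not the encoder's code: the BlockSize enum, the PartitionRange struct and the function name are hypothetical, and how the neighbor list is gathered is left to the caller.

```c
/* Hypothetical square block sizes, ordered from smallest to largest. */
typedef enum { BLK_4X4, BLK_8X8, BLK_16X16, BLK_32X32, BLK_64X64 } BlockSize;

typedef struct {
  BlockSize min;
  BlockSize max;
} PartitionRange;

/* Returns the smallest and largest block size seen among the neighbors.
 * With no neighbors available, the full range is allowed. */
static PartitionRange partition_range_from_neighbors(const BlockSize *neighbors,
                                                     int count) {
  PartitionRange range = { BLK_4X4, BLK_64X64 };
  if (count <= 0) return range;
  range.min = range.max = neighbors[0];
  for (int i = 1; i < count; ++i) {
    if (neighbors[i] < range.min) range.min = neighbors[i];
    if (neighbors[i] > range.max) range.max = neighbors[i];
  }
  return range;
}
```

The partition search would then only try sizes inside the returned range. Combining this with temporal or variance-based hints, as the message suggests, would amount to merging several such ranges.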
Yunqing Wang
52256cdbca Modify static threshold calculation
Use 3 * standard_deviation in the internal threshold calculation
instead of a fitted curve. This actually fits the algorithm
better.
For comparison, similar tests were run.
The overall psnr loss is smaller than before.
1. derf set:
when static-thresh = 1, psnr loss is 0.329%;
when static-thresh = 500, psnr loss is 0.970%;
2. stdhd set:
when static-thresh = 1, psnr loss is 0.922%;
when static-thresh = 500, psnr loss is 1.307%;

Similar speedup is achieved. For example,
clip            bitrate  static-thresh psnr    time
akiyo(cif)       500        0          48.952  5.077s(50f)
akiyo            500        500        48.866  4.169s(50f)

parkjoy(1080p)   4000       0          30.388  78.20s(30f)
parkjoy          4000       500        30.367  70.85s(30f)

sunflower(1080p) 4000       0          44.402  74.55s(30f)
sunflower        4000       500        44.414  68.69s(30f)

Change-Id: Ic78833642ce1911dbbd1cb6c899a2d7e2dfcc1f3
2013-07-25 19:59:33 -07:00
Dmitry Kovalev
048e9c0991 Making read_inter_mode_info function more clear.
Now read_inter_mode_info calls read_intra_block_part (renamed from
read_intra_block_modes) or read_inter_block_part (just added).

Change-Id: I541badea6b663e0ae692ec158665efb90ed20c03
2013-07-25 15:30:18 -07:00
Johann
67b07c520d Merge "Add const to vp9_accum_mv_refs parameter" 2013-07-25 15:10:52 -07:00
Yunqing Wang
845fd5011c Merge "Add encoding option --static-thresh" 2013-07-25 14:58:00 -07:00
Yunqing Wang
d36852b702 Add encoding option --static-thresh
This option exists in VP8, and it was rewritten in VP9 to support
skipping on different partition levels. After prediction is done,
we can check if the residuals in the partition block will all be
quantized to 0. If this is true, the skip flag is set, and only
prediction data are needed in reconstruction. Based on the DCT's energy
conservation property, the skipping check can be estimated in
the spatial domain.

The prediction error is calculated and compared to a threshold.
The threshold is determined by the dequant values, and also
adjusted by partition sizes. To be precise, the DC and AC parts
for the Y, U, and V planes are checked to decide whether to skip.

Tests showed that
1. derf set:
when static-thresh = 1, psnr loss is 0.666%;
when static-thresh = 500, psnr loss is 1.162%;
2. stdhd set:
when static-thresh = 1, psnr loss is 1.249%;
when static-thresh = 500, psnr loss is 1.668%;

Across different clips, the encoding speedup ranges from a few
percent to over 20% when static-thresh <= 500. For example,
clip            bitrate  static-thresh psnr    time
akiyo(cif)       500        0          48.923  5.635s(50f)
akiyo            500        500        48.863  4.402s(50f)

parkjoy(1080p)   4000       0          30.380  77.54s(30f)
parkjoy          4000       500        30.384  69.59s(30f)

sunflower(1080p) 4000       0          44.461  85.2s(30f)
sunflower        4000       500        44.418  78.1s(30f)

Higher static-thresh values give larger speedup with larger
quality loss.

Change-Id: I857031ceb466ff314ab580ac5ec5d18542203c53
2013-07-25 14:28:05 -07:00
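The energy-conservation argument in commit d36852b702 can be illustrated with a small spatial-domain check. This sketch makes two assumptions that are not taken from the commit: the transform is orthonormal, so the sum of squared coefficients equals the sum of squared residual samples, and the quantizer maps any coefficient with magnitude below dequant to zero. Under those assumptions, a residual whose squared error is below dequant squared cannot produce a non-zero quantized coefficient. The commit's actual thresholds (separate DC and AC checks per plane, adjusted by partition size) are not reproduced, and the function name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 when every transform coefficient of the residual is guaranteed
 * to quantize to zero, given an orthonormal transform and a quantizer that
 * zeroes magnitudes below `dequant`. Returns 0 when no guarantee holds. */
static int residual_quantizes_to_zero(const int16_t *residual, int count,
                                      int dequant) {
  int64_t sse = 0; /* sum of squared residual samples */
  for (int i = 0; i < count; ++i) {
    sse += (int64_t)residual[i] * residual[i];
  }
  return sse < (int64_t)dequant * dequant;
}
```

This bound is conservative: a block can fail the check and still quantize to all zeros, which is one reason a practical threshold may be looser than this one.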