610 Commits

Author SHA1 Message Date
Paul Wilkins
f76f52df61 Limit Key frame Intra modes checks.
Most of the focus so far has been on inter frames.

At high speed settings the key frame is now taking a high %
of the cycles.

This patch puts in some masking to reduce the number
of INTRA modes searched during key frame coding (as already
happens for inter frames) at higher speed settings

TODO: Develop this further with either adaptive rd thresholds
when choosing which intra modes to consider or some other
heuristic.

Impact.
At high speed settings on some clips the key frame was starting
to dominate. In a coding of the first 50 frames of AKIYO at speed
2 limiting the key frame intra modes to DC or TM_PRED resulted in
~30% overall speedup. For Bus the number was lower at ~4-5%.

Change-Id: I7bde68aee04995f9d9beb13a1902143112e341e2
2013-08-23 16:10:30 +01:00
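The mode masking described in f76f52df61 can be pictured as a bitmask that is consulted before each intra mode is evaluated. The following is a minimal sketch under stated assumptions: the enum, the function names, and the choice of speed 2 as the cut-over point are hypothetical, chosen only to mirror the DC/TM_PRED experiment quoted in the message.

```c
#include <assert.h>

/* Hypothetical, reduced set of intra modes for illustration only. */
typedef enum {
  EX_DC_PRED,
  EX_V_PRED,
  EX_H_PRED,
  EX_TM_PRED,
  EX_INTRA_MODES
} ex_intra_mode;

#define EX_MODE_BIT(m) (1u << (m))
#define EX_ALL_MODES ((1u << EX_INTRA_MODES) - 1u)

/* Assumption: masking starts at speed 2, matching the experiment in the
 * commit message (key frame limited to DC and TM prediction). */
static unsigned int ex_key_frame_intra_mask(int speed) {
  assert(speed >= 0);
  if (speed >= 2) return EX_MODE_BIT(EX_DC_PRED) | EX_MODE_BIT(EX_TM_PRED);
  return EX_ALL_MODES;
}

/* Counts how many modes a search loop would evaluate under the mask. */
static int ex_count_modes_searched(unsigned int mask) {
  int searched = 0;
  int mode;
  for (mode = 0; mode < EX_INTRA_MODES; ++mode) {
    if (!(mask & EX_MODE_BIT(mode))) continue; /* masked out: skip the rd check */
    ++searched;
  }
  return searched;
}
```

In a real search loop the `continue` would bypass the rate-distortion evaluation of the masked mode, which is where the saving comes from.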
Dmitry Kovalev
640dea4d9d Adding vp9_is_scaled function.
Change-Id: Ieb7077ca3586b9491912027eed450a4f6fd38d30
2013-08-22 14:04:59 -07:00
Jingning Han
fcb890d751 Merge "Enable zero coeff check in sub8x8 UV rd loop" 2013-08-21 22:07:00 -07:00
Dmitry Kovalev
2f1a0a0e2c Removing PLANE_TYPE argument from cost_coeffs function.
We can determine plane_type from other function arguments.

Change-Id: I85331877aedb357632ae916a37b5b15f22c0bb1f
2013-08-21 13:02:28 -07:00
Adrian Grange
ce28d0ca89 Fix typos and minor stylistic cleanup
Change-Id: I32e43474e8651ef2eb181d24860a8f118cfea7bf
2013-08-21 08:45:42 -07:00
Dmitry Kovalev
7f814c6bf8 Merge "Passing plane_bsize to foreach_transformed_block_visitor." 2013-08-20 14:25:01 -07:00
Jingning Han
1bf1428654 Enable zero coeff check in sub8x8 UV rd loop
Check the minimum rate-distortion cost of regular quantization and
all zero coeffs cases in the sub8x8 inter prediction rd loop for
luma components. Use this as the cumulative rdcost sent to UV rd
estimation.

Change-Id: Ia4bc7700437d5e13d7cdad4cf9ae57ab036d3e97
2013-08-20 10:33:42 -07:00
Deb Mukherjee
2ffe64ad5c Cleanup/enhancements of switchable filter search
Cleans up the switchable filter search logic. Also adds a
speed feature - a variance threshold - to disable filter search
if source variance is lower than this value.

Results: derfraw300
threshold = 16, psnr -0.238%, 4-5% speedup (tested on football)
threshold = 32, psnr -0.381%, 8-9% speedup (tested on football)
threshold = 64, psnr -0.611%, 12-13% speedup (tested on football)
threshold = 96, psnr -0.804%, 16-17% speedup (tested on football)

Based on these results, the threshold is chosen as 16 for speed 1,
32 for speed 2, 64 for speed 3 and 96 for speed 4.

Change-Id: Ib630d39192773b1983d3d349b97973768e170c04
2013-08-20 09:47:04 -07:00
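As a minimal sketch of the variance-threshold speed feature described in 2ffe64ad5c, the fragment below maps a speed setting to the thresholds quoted in that message and decides whether the filter search should run. The function names, the feature being off at speed 0, and the clamping of speeds above 4 are assumptions made for illustration, not the encoder's actual code.

```c
#include <assert.h>

/* Thresholds from the commit message: 16, 32, 64, 96 for speeds 1 to 4.
 * Assumption: speed 0 uses threshold 0 (feature off) and speeds above 4
 * reuse the speed 4 value. */
static unsigned int ex_filter_search_var_thresh(int speed) {
  static const unsigned int thresh[] = { 0, 16, 32, 64, 96 };
  assert(speed >= 0);
  return thresh[speed > 4 ? 4 : speed];
}

/* Returns 1 when the switchable filter search should run for a block,
 * 0 when the source variance is below the threshold and it is skipped. */
static int ex_should_search_filters(unsigned int source_variance, int speed) {
  return source_variance >= ex_filter_search_var_thresh(speed);
}
```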
Jingning Han
3275ad701a Enable early termination in uv rd loop
This commit enables early termination in the rate-distortion
optimization search loop for chroma components. When the cumulative
rd cost is above the current best value, skip the remaining per-block
transform/quantization/coeff_cost steps and continue to the next
prediction mode.

For bus_cif at 2000 kbps, the average run-time goes down from
168546ms -> 164678ms, (2% speed-up) at speed 0
 36197ms ->  34465ms, (4% speed-up) at speed 1

Change-Id: I9d3043864126e62bd0166250d66b3170d520b3c0
2013-08-19 16:31:19 -07:00
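The early termination in 3275ad701a amounts to accumulating per-block cost and abandoning a prediction mode as soon as the running total exceeds the best cost found so far. A minimal sketch, assuming the per-block costs are already available in an array (in the encoder they would be computed on the fly, which is the work that gets skipped); all names here are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums per-block rd costs for one prediction mode. Stops as soon as the
 * running total exceeds best_rd and returns INT64_MAX to signal that the
 * mode cannot win. blocks_visited reports how many blocks were processed. */
static int64_t ex_accumulate_rd_cost(const int64_t *block_cost, size_t n,
                                     int64_t best_rd, size_t *blocks_visited) {
  int64_t total = 0;
  size_t i;
  assert(blocks_visited != NULL);
  for (i = 0; i < n; ++i) {
    total += block_cost[i];
    if (total > best_rd) { /* already worse than the best mode: give up */
      *blocks_visited = i + 1;
      return INT64_MAX;
    }
  }
  *blocks_visited = n;
  return total;
}
```

A caller would move on to the next prediction mode whenever the function returns `INT64_MAX`, and otherwise update its best cost with the returned total.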
Dmitry Kovalev
82d4d9a008 Passing plane_bsize to foreach_transformed_block_visitor.
Updating all foreach_transformed_block_visitor functions to work with
plane block size instead of general block. Removing a lot of duplicated
code.

Change-Id: I6a9069e27528c611f5a648e1da0c5a5fd17f1bb4
2013-08-19 15:47:24 -07:00
Jingning Han
31c97c2bdf Merge "Fix potential use of uninitialized value" 2013-08-19 15:15:58 -07:00
Jingning Han
5dc0b309ab Merge "Fix the returned distortion value in rd_pick_intra" 2013-08-19 14:34:19 -07:00
Dmitry Kovalev
2e3478a593 Using plane_bsize instead of bsize.
This change set is intermediate. The next one will remove all repetitive
plane_bsize calculations, because it will be passed as argument to
foreach_transformed_block_visitor.

Change-Id: Ifc12e0b330e017c6851a28746b3a5460b9bf7f0b
2013-08-19 13:20:21 -07:00
Jingning Han
b34ce04378 Fix potential use of uninitialized value
Initialize the best mode and tx_size values in the rate-distortion
optimization search loop.

Change-Id: Ibfb5c0895691f172abcd4265c23aef4cb99fa8af
2013-08-19 11:15:53 -07:00
Jingning Han
f67919ae86 Fix the returned distortion value in rd_pick_intra
Return the distortion value in vp9_rd_pick_intra_mode_sb as sum of
dist_y and dist_uv. Remove the right shift operation on dist_uv,
and make it consistent with that of vp9_rd_pick_inter_mode_sb.

Change-Id: I9d564e242d9add38e32595d33b0e0dddb1d55e5b
2013-08-16 21:23:22 -07:00
Dmitry Kovalev
26e5b5e25d Removing unused or redundant arguments from *_args structures.
Redundant dst, pre[2] from build_inter_predictors_args, unused cm from
encode_b_args.

Change-Id: I2c476cd328c5c0cca4c78ba451ca6ba2a2c37e2d
2013-08-16 12:51:20 -07:00
Dmitry Kovalev
367cb10fcf Merge "Moving from ss_txfrm_size to tx_size." 2013-08-16 12:46:45 -07:00
Adrian Grange
79f4c1b9a4 Fixed typos and formatting
Change-Id: I3814984a624bc64147c57efa74fbdda8eda47262
2013-08-16 09:15:26 -07:00
Dmitry Kovalev
afd9bd3e3c Moving from ss_txfrm_size to tx_size.
Updating foreach_transformed_block_visitor and corresponding functions
to accept tx_size instead of ss_txfrm_size. List of functions per file:

vp9_decodframe.c
  decode_block
  decode_block_intra

vp9_detokenize.c
  decode_block

vp9_encodemb.c
  optimize_block
  vp9_xform_quant
  vp9_encode_block_intra

vp9_rdopt.c
  dist_block
  rate_block
  block_yrd_txfm

vp9_tokenize.c
  set_entropy_context_b
  tokenize_b
  is_skippable

Change-Id: I351bf563eb36cf34db71c3f06b9bbc9a61b55b73
2013-08-15 17:03:03 -07:00
Jingning Han
5e80a49307 Merge "Refactor rd loop for chroma components" 2013-08-15 16:02:12 -07:00
Dmitry Kovalev
9451e8d37e Merge "Converting code from using ss_txfrm_size to tx_size." 2013-08-15 15:21:09 -07:00
Dmitry Kovalev
939b1e4a8c Merge "Moving segmentation struct from MACROBLOCKD to VP9_COMMON." 2013-08-15 15:14:32 -07:00
Jingning Han
68369ca897 Refactor rd loop for chroma components
This commit makes the rate-distortion optimization search of chroma
components consistent across all block sizes. It removes redundant
code.

Change-Id: I7e76f54d045e8efdd41d84a164c71f55b484471b
2013-08-15 14:54:48 -07:00
Jingning Han
ca983f34f7 Merge "Unify luma and chroma rd-cost estimation" 2013-08-15 13:48:15 -07:00
Dmitry Kovalev
bb3b817c1e Converting code from using ss_txfrm_size to tx_size.
Updated function signatures:
  txfrm_block_to_raster_block
  txfrm_block_to_raster_xy
  extend_for_intra
  vp9_optimize_b

Change-Id: I7213f4c4b1b9ec802f90621d5ba61d5e4dac5e0a
2013-08-15 11:44:57 -07:00
Dmitry Kovalev
6f4fa44c42 Using { 0 } for initialization instead of memset.
Change-Id: I4fad357465022d14bfc7e13b348c6da267587314
2013-08-15 11:37:56 -07:00
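For readers unfamiliar with the idiom in 6f4fa44c42, the sketch below contrasts the two ways of zeroing a local aggregate. The struct is hypothetical and exists only for the comparison. In C, `= { 0 }` sets the first member to zero and zero-initializes every remaining member, so the named fields end up identical to the memset version (padding bytes are the only thing not guaranteed to match).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical aggregate used only for this comparison. */
typedef struct {
  int row;
  int col;
  int costs[4];
} ex_stats;

/* Old style: declare, then clear every byte with memset. */
static ex_stats ex_make_with_memset(void) {
  ex_stats s;
  memset(&s, 0, sizeof(s));
  return s;
}

/* New style: zero-initialize all members at the point of declaration. */
static ex_stats ex_make_with_initializer(void) {
  ex_stats s = { 0 };
  return s;
}

/* Self-check: both approaches yield the same member values. */
static void ex_check_equivalent(void) {
  const ex_stats a = ex_make_with_memset();
  const ex_stats b = ex_make_with_initializer();
  int i;
  assert(a.row == b.row && a.col == b.col);
  for (i = 0; i < 4; ++i) assert(a.costs[i] == b.costs[i]);
}
```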
Dmitry Kovalev
b7616e387e Moving segmentation struct from MACROBLOCKD to VP9_COMMON.
VP9_COMMON is the right place for the segmentation struct because it has
global segmentation parameters, not something specific to macroblock
processing.

Change-Id: Ib9ada0c06c253996eb3b5f6cccf6a323fbbba708
2013-08-15 10:47:48 -07:00
Jingning Han
ec01f52ffa Unify luma and chroma rd-cost estimation
This commit unifies the rate-distortion cost calculation process of
luma and chroma components. It allows early termination to be enabled
later in the rd search loop of chroma components, consistent with
the luma components.

Change-Id: I2e52a7c6496176bf2a5e3ef338d34ceb8aad9b3d
2013-08-15 09:41:33 -07:00
Paul Wilkins
26fead7ecf Renaming in MB_MODE_INFO
The macro block mode info context originally contained an
entry for each 16x16 macroblock. In VP9 each entry refers
to an 8x8 region not a macro block, so the naming is misleading.

This first stage clean up changes the names of 3 entries in the
structure to remove the mb_ prefix.

TODO clean up the nomenclature more widely in respect of
mbmi and bmi.

Change-Id: Ia7305c6d0cb805dfe8cdc98dad21338f502e49c6
2013-08-14 12:47:52 +01:00
Jingning Han
7e0f88b6be Use lookup table to find largest txfm size
Refactor choose_largest_txfm_size_ and make it find the largest
transform size via lookup table.

Change-Id: I685e0396d71111b599d5367ab1b9c934bd5490c8
2013-08-13 10:32:14 -07:00
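The lookup-table approach named in 7e0f88b6be can be sketched as follows. This is an illustration under assumptions, not the refactored function itself: the enums cover square block sizes only, the table contents are a plausible mapping rather than a copy of the codebase's table, and the cap by a maximum allowed transform size is a hypothetical stand-in for whatever frame-level limit applies.

```c
#include <assert.h>

/* Hypothetical, square-only block sizes for illustration. */
typedef enum {
  EX_BLOCK_4X4,
  EX_BLOCK_8X8,
  EX_BLOCK_16X16,
  EX_BLOCK_32X32,
  EX_BLOCK_64X64,
  EX_BLOCK_COUNT
} ex_block_size;

typedef enum { EX_TX_4X4, EX_TX_8X8, EX_TX_16X16, EX_TX_32X32 } ex_tx_size;

/* Largest transform that fits in the block, capped by max_allowed.
 * A table lookup replaces a chain of if/else comparisons on block size. */
static ex_tx_size ex_largest_tx_size(ex_block_size bsize,
                                     ex_tx_size max_allowed) {
  static const ex_tx_size max_tx_lookup[EX_BLOCK_COUNT] = {
    EX_TX_4X4,   /* 4x4   */
    EX_TX_8X8,   /* 8x8   */
    EX_TX_16X16, /* 16x16 */
    EX_TX_32X32, /* 32x32 */
    EX_TX_32X32  /* 64x64: no 64x64 transform in this sketch */
  };
  ex_tx_size largest;
  assert(bsize >= 0 && bsize < EX_BLOCK_COUNT);
  largest = max_tx_lookup[bsize];
  return largest < max_allowed ? largest : max_allowed;
}
```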
Jingning Han
dc70fbe42d Merge "Refactor model based tx search in super_block_yrd" 2013-08-13 08:48:49 -07:00
Jingning Han
78136edcdc SSE2 high precision 32x32 forward DCT
Enable SSE2 implementation of high precision 32x32 forward DCT. The
intermediate stacks are of 32-bits. The run-time goes down from
32126 cycles to 13442 cycles.

Change-Id: Ib5ccafe3176c65bd6f2dbdef790bd47bbc880e56
2013-08-12 16:52:53 -07:00
Jingning Han
14cc7b319f Refactor model based tx search in super_block_yrd
Remove unnecessary conditional branches in model-based transform
size search.

Change-Id: Ic862dc33ed6710a186f6248239dd5f09b5c19981
2013-08-12 16:34:48 -07:00
Dmitry Kovalev
1aedfc992a Using MV* instead of int_mv* as argument of vp9_clamp_mv_min_max.
Change-Id: I3c45916a9059f11b41e9d798e34ffee052969a44
2013-08-12 13:56:04 -07:00
Dmitry Kovalev
3c43ec206c Renaming BLOCK_SIZE_TYPES constant to BLOCK_SIZES.
There will be another change set to rename BLOCK_SIZE_TYPE enum to
BLOCK_SIZE.

Change-Id: I8d1dfc873d6186fa5e554262f5169e929978085e
2013-08-09 17:47:32 -07:00
Dmitry Kovalev
e7c5ca8983 Merge "Inlining 16 as a stride for BLOCK_OFFSET macro." 2013-08-09 17:22:46 -07:00
James Zern
ef101af8ae Merge "vp9_rd_pick_inter_mode_sb: fix uninitialized value" 2013-08-09 17:13:32 -07:00
Dmitry Kovalev
f1559bdeaf Inlining 16 as a stride for BLOCK_OFFSET macro.
Change-Id: I7f23d174eb089e5500f268a10db09648634c1b82
2013-08-09 16:40:05 -07:00
James Zern
f295774d43 vp9_rd_pick_inter_mode_sb: fix uninitialized value
'skippable' can remain unset and negatively affect later decisions

address one aspect of issue #599

Change-Id: Iffdf0ac2e49ac481c27dc27c87fa546d4167bb28
2013-08-09 16:26:22 -07:00
Deb Mukherjee
2158909fc3 Merge "Adds a new subpel motion function" 2013-08-08 12:26:55 -07:00
Deb Mukherjee
1ba91a84ad Adds a new subpel motion function
Adds a new subpel motion estimation function that uses a 2-level
tree-structured decision tree to eliminate redundant computations.
It searches fewer points than iterative search (which can search
the same point multiple times) but has roughly the same quality.

This is made the default setting at speeds 0 and 1, while at
speed 2 and above only a 1-level search is used.

Also includes various cleanups for consistency and redundancy removal.

Results:
derf: +0.012% psnr
stdhd: +0.09% psnr
Speedup of about 2-3%

Change-Id: Iedde4866f5475586dea0f0ba4cb7428fba24eee9
2013-08-08 11:41:49 -07:00
Dmitry Kovalev
8db2675b97 Adding ss_size_lookup table.
Removing the old one bsize_from_dim_lookup. Now we have a way to determine
block size for plane using its subsampling values (ss_size_lookup). And
then we can find the number of pixels in the block (num_pels_log2_lookup).

Change-Id: I6fc981da2ae093de81741d3d78eaefed11015db9
2013-08-07 15:33:17 -07:00
Deb Mukherjee
71b43b0ff0 Clean ups of the subpel search functions
Removes some unused code and speed features, and organizes the
interfaces for fractional mv step functions for use in new speed
features to come.

In the process a new speed feature - number of iterations per
step during the subpel search - is exposed.

No change when this parameter is set as the original value of 3.

Results:
subpel_iters_per_step = 3: baseline
subpel_iters_per_step = 2: psnr -0.067%, 1% speedup
subpel_iters_per_step = 1: psnr -0.331%, 3-4% speedup

Change-Id: I2eba8a21f6461be8caf56af04a5337257a5693a8
2013-08-06 17:23:50 -07:00
Deb Mukherjee
fac7c8c9f9 Merge "Flexible support for various pattern searches" 2013-08-06 14:03:27 -07:00
Deb Mukherjee
15b5a6a2c7 Flexible support for various pattern searches
Adds a few pattern searches to achieve various tradeoffs
between motion estimation complexity and performance.
The search framework is unified across these searches so that a
common pattern search function is used for all. Besides it will
be easier to experiment with various patterns or combinations
thereof at different scales in the future.

The new pattern search is multi-scale and is capable of using
different patterns at different scales.

The new hex search uses 8 points at the smallest scale
and 6 points at other scales.
Two other pattern searches - big-diamond and square are
also added. Big diamond uses 4 points at the smallest scale and
8 points in diamond shape at the larger scales.
Square is very similar conceptually to the default n-step search
but is somewhat faster since it keeps only one survivor across
all scales.

Psnr/speed-up results on derf300:

hex: -1.6% psnr, 6-8% speed-up
big-diamond: -0.96% psnr, 4-5% speedup
square: -0.93% psnr, 4-5% speedup

Change-Id: I02a7ef5193f762601e0994e2c99399a3535a43d2
2013-08-06 11:56:39 -07:00
Dmitry Kovalev
0c80065694 Inlining vp9_get_pred_probs_switchable_interp function.
There was no benefit having this function. For example, inside
read_switchable_filter_type switchable filter context was calculated twice.

Change-Id: I79cd5bf95cbc0f6d8bf91a2e32289e01b18dcff1
2013-08-06 11:04:31 -07:00
Dmitry Kovalev
3e51acafec Merge "Finally removing all old block size constants." 2013-08-06 10:30:37 -07:00
Dmitry Kovalev
4a692e4168 Merge "Changing the order switchable filter enum constants." 2013-08-06 10:30:26 -07:00
Dmitry Kovalev
25b7dc08cd Merge "Removing unused functions." 2013-08-06 10:29:57 -07:00
Deb Mukherjee
33afddadb9 Merge "Add variance based mode/skipping" 2013-08-06 10:19:15 -07:00
Dmitry Kovalev
b9c7d04e95 Finally removing all old block size constants.
Change-Id: I3aae21e88b876d53ecc955260479980ffe04ad8d
2013-08-05 15:23:49 -07:00
Deb Mukherjee
8b3faccb9e Add variance based mode/skipping
Adds a speed feature to skip all intra modes other than
DC_PRED if the source variance is small. This feature is
made part of speed 1 and up.

Results on derf300: psnr -0.07%, speedup about 1-2%

Also uses the source variance to fine-tune the early
termination criteria when FLAG_EARLY_TERMINATE is on.
This feature is made part of speed 2 and up.

Results on derf300: psnr -0.52%, speedup about 5-7%

Change-Id: I59e38aa836557cfa5405ae706fc64815cbfe4232
2013-08-05 14:14:01 -07:00
Jim Bankoski
9f988a2edf Merge "cleanups after bw bh code" 2013-08-05 14:02:02 -07:00
Dmitry Kovalev
3f611555d7 Changing the order switchable filter enum constants.
This changeset allows us to remove the vp9_switchable_interp and
vp9_switchable_interp_map arrays and makes the code much clearer. We
still have to use these mappings, but only inside the read_interp_filter_type
and write_interp_filter_type functions.

Change-Id: I4026c6f8c4acefba6c81421b7bacbaa52cc45f50
2013-08-05 12:26:15 -07:00
Jim Bankoski
5d2cb7ead0 cleanups after bw bh code
Made bw/bh params const where they should have been. Additional formatting.

Change-Id: Icd36a5c9dc17dadd7284315ac0d6fef1a565ca16
2013-08-05 12:15:52 -07:00
Dmitry Kovalev
d007446b3f Replacing long block size enum values with shorter ones (2).
Change-Id: I428c4d42212b757112e3acfe5b81314cfbb5fd6b
2013-08-05 10:51:02 -07:00
Dmitry Kovalev
fe2a201eb1 Replacing "txfm" with "tx" in identifiers.
Consistent names with TX_SIZE and TX_MODE.

Change-Id: I79592218bf5a40ace89197a34a06ee7de581ed8d
2013-08-02 17:28:23 -07:00
Dmitry Kovalev
fec4ec4edd Removing unused functions.
Removed functions:
  model_rd_for_sb_y,
  block_error_sby,
  get_sb_variance

Change-Id: Iec458df180caf6f8eac3605773841a4121dd3a8f
2013-08-02 16:41:09 -07:00
Dmitry Kovalev
25b77e2569 Changing function arg type from int_mv* to MV*.
Change-Id: Ic878d31df2ce783a2c9a8c4bc9ed301ec8ffe25e
2013-08-02 15:26:32 -07:00
Adrian Grange
60ff123536 Merge "Fixed typos and added a few explanatory comments" 2013-08-02 11:37:47 -07:00
Adrian Grange
075b11f004 Merge "Changed name of rd_pick_intra4x4mby_modes" 2013-08-02 11:36:46 -07:00
Dmitry Kovalev
741537f3ce Cleanup: replacing xd->seg with seg, and xd->lf with lf.
Change-Id: I73b59d7699a8e7e7acd3bf8041cb6c98ce9ba4bf
2013-08-01 15:38:16 -07:00
Dmitry Kovalev
ce8dedc353 Cleanup: removing unused function arguments.
Change-Id: I27471768980fc631916069f24bc7c482a5c9ca17
2013-08-01 13:41:38 -07:00
Dmitry Kovalev
b621e2d72e Nice looking motion vector clamping functions.
Removing assign_and_clamp_mv function, making the implementation of clamp_mv
and clamp_mv2 clearer and more consistent.

Change-Id: Iecd08e1c1bf0379f8314ebe01811f8253f4ade58
2013-08-01 13:40:26 -07:00
Adrian Grange
89e73c63c0 Fixed typos and added a few explanatory comments
Change-Id: Ib4e4b41094b54874ee34343dd77c0c131ceed9d2
2013-08-01 09:23:49 -07:00
Adrian Grange
5271d47892 Changed name of rd_pick_intra4x4mby_modes
The function name rd_pick_intra4x4mby_modes is confusing, so
I changed it to rd_pick_intra_sub_8x8_y_modes to better
reflect what the function does. Also added const qualifiers
to some of the input parameters and removed camel-case.

Change-Id: I23d53d4c7af5d79ed8a471acd59a09bbb47add39
2013-08-01 09:23:49 -07:00
Dmitry Kovalev
9239e96536 Removing get_mi_{row, col} functions.
Passing mi_row and mi_col parameters to functions explicitly. Removing
unused xd argument from scale_mv function.

Change-Id: Icb4c495ec72d26fb066c14470d3ae0b741fbf18a
2013-07-31 14:06:55 -07:00
Dmitry Kovalev
500ade243a Removing unused "ishp" arguments.
Using different variable names "allow_hp" and "use_hp" instead of "usehp".

Change-Id: I0cd5996ddeb46bd754473b680a993c0aaf8eb879
2013-07-31 11:27:53 -07:00
Adrian Grange
fbd73648dd Merge "Cleanup typos, remove unnecessary lines, replace switch" 2013-07-30 12:59:46 -07:00
Adrian Grange
b30a06b930 Cleanup typos, remove unnecessary lines, replace switch
Removed unnecessary code lines, replaced switch with an if,
fixed spelling errors and formatting.

Change-Id: Ie48aa4604aa0ed48362ca359d792fb21b2ec1dc6
2013-07-30 12:10:32 -07:00
Dmitry Kovalev
730a34416f Renaming NB_TXFM_MODES constant to TX_MODES.
Change-Id: I10bf06e3a3d5271221ae6a42a36074d01d493039
2013-07-29 13:38:40 -07:00
Dmitry Kovalev
23391ea835 Renaming TX_SIZE_MAX_SB to TX_SIZES.
Change-Id: I6aa4191935aa93461a07c41b59fdae1eb5f5f107
2013-07-29 12:25:34 -07:00
Ronald S. Bultje
118ccdcd30 Inverse dimension order in token_cost array.
This allows us to increment the position at the band-level only as
we go from one band to the next; more importantly, that allows us to
use an add instead of multiply instruction, and omit the instruction
altogether if the band doesn't change from one coef to the next, thus
being slightly faster (probably more noticeable on systems where a
multiply is expensive, like arm).

Change-Id: I4343fe35b9f9a47fa00b217bdcbf5f91ff96c381
2013-07-26 17:30:04 -07:00
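The benefit described in 118ccdcd30 can be illustrated with a small sketch: when the band is the outermost array dimension, the code can keep a pointer to the current band's sub-table and advance it with a single pointer increment only when the band changes, instead of recomputing a band offset for every coefficient. The array sizes, the [band][context][token] layout, the `band_len` table, and all names are assumptions for illustration; the real array has more dimensions.

```c
#include <assert.h>

enum { EX_BANDS = 3, EX_CTXS = 2, EX_TOKENS = 4 }; /* hypothetical sizes */

/* Sums token costs over n coefficients. band_len[b] is the number of
 * consecutive coefficients that belong to band b (each assumed >= 1);
 * coefficients past the last band boundary stay in the last band. */
static int ex_sum_token_costs(const int (*cost)[EX_CTXS][EX_TOKENS],
                              const int *band_len, const int *tokens,
                              const int *ctxs, int n) {
  const int (*band_cost)[EX_CTXS][EX_TOKENS] = cost; /* current band table */
  int band = 0;
  int left = band_len[0];
  int total = 0;
  int i;
  assert(n >= 0);
  for (i = 0; i < n; ++i) {
    total += (*band_cost)[ctxs[i]][tokens[i]];
    if (--left == 0 && band + 1 < EX_BANDS) {
      /* Band changed: one pointer increment, no per-coefficient multiply. */
      ++band;
      ++band_cost;
      left = band_len[band];
    }
  }
  return total;
}
```

While the band is unchanged, the loop body touches only the context and token indices, which is the case the commit message calls out as omitting the instruction altogether.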
Ronald S. Bultje
dcacce6dd9 Merge "Save pixels instead of coefficients in intra4x4 RD loop." 2013-07-26 17:20:58 -07:00
Ronald S. Bultje
d30c8f41ef Merge "Add best_rd breakout in intra4x4 RD loop." 2013-07-26 17:20:51 -07:00
Dmitry Kovalev
c09b81719f Merge "General cleanups." 2013-07-26 13:59:39 -07:00
Yunqing Wang
52256cdbca Modify static threshold calculation
Used 3 * standard_deviation in the internal threshold calculation
instead of a fitted curve. This matches the algorithm more
closely.
For comparison, similar tests were done:
The overall psnr loss is less than before.
1. derf set:
when static-thresh = 1, psnr loss is 0.329%;
when static-thresh = 500, psnr loss is 0.970%;
2. stdhd set:
when static-thresh = 1, psnr loss is 0.922%;
when static-thresh = 500, psnr loss is 1.307%;

Similar speedup is achieved. For example,
clip            bitrate  static-thresh psnr    time
akiyo(cif)       500        0          48.952  5.077s(50f)
akiyo            500        500        48.866  4.169s(50f)

parkjoy(1080p)   4000       0          30.388  78.20s(30f)
parkjoy          4000       500        30.367  70.85s(30f)

sunflower(1080p) 4000       0          44.402  74.55s(30f)
sunflower        4000       500        44.414  68.69s(30f)

Change-Id: Ic78833642ce1911dbbd1cb6c899a2d7e2dfcc1f3
2013-07-25 19:59:33 -07:00
Yunqing Wang
845fd5011c Merge "Add encoding option --static-thresh" 2013-07-25 14:58:00 -07:00
Yunqing Wang
d36852b702 Add encoding option --static-thresh
This option exists in VP8, and it was rewritten in VP9 to support
skipping on different partition levels. After prediction is done,
we can check if the residuals in the partition block will be all
quantized to 0. If this is true, the skip flag is set, and only
prediction data are needed in reconstruction. Based on DCT's energy
conservation property, the skipping check can be estimated in
spatial domain.

The prediction error is calculated and compared to a threshold.
The threshold is determined by the dequant values, and also
adjusted by partition sizes. To be precise, the DC and AC parts
for Y, U, and V planes are checked to decide skipping or not.

Tests showed that
1. derf set:
when static-thresh = 1, psnr loss is 0.666%;
when static-thresh = 500, psnr loss is 1.162%;
2. stdhd set:
when static-thresh = 1, psnr loss is 1.249%;
when static-thresh = 500, psnr loss is 1.668%;

For different clips, the encoding speedup ranges from a few
percent to 20+% when static-thresh <= 500. For example,
clip            bitrate  static-thresh psnr    time
akiyo(cif)       500        0          48.923  5.635s(50f)
akiyo            500        500        48.863  4.402s(50f)

parkjoy(1080p)   4000       0          30.380  77.54s(30f)
parkjoy          4000       500        30.384  69.59s(30f)

sunflower(1080p) 4000       0          44.461  85.2s(30f)
sunflower        4000       500        44.418  78.1s(30f)

Higher static-thresh values give larger speedup with larger
quality loss.

Change-Id: I857031ceb466ff314ab580ac5ec5d18542203c53
2013-07-25 14:28:05 -07:00
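The skip decision described in d36852b702 checks the DC and AC parts of the prediction error for each of the Y, U, and V planes against thresholds. The sketch below shows only that final comparison step. How the thresholds are derived from the dequant values and partition size is not reproduced here, so they are passed in as inputs; the struct, the field names, and the choice of "less than or equal" as the passing condition are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-plane inputs: measured prediction error and the
 * thresholds it must not exceed for the block to be skippable. */
typedef struct {
  unsigned int dc_error;
  unsigned int ac_error;
  unsigned int dc_thresh;
  unsigned int ac_thresh;
} ex_plane_check;

/* Returns 1 only if every plane's DC and AC error is within its threshold,
 * i.e. the residual is expected to quantize to all zeros. */
static int ex_can_skip_block(const ex_plane_check *planes, int num_planes) {
  int p;
  assert(planes != NULL && num_planes > 0);
  for (p = 0; p < num_planes; ++p) {
    if (planes[p].dc_error > planes[p].dc_thresh ||
        planes[p].ac_error > planes[p].ac_thresh) {
      return 0; /* one plane fails: the block must be coded normally */
    }
  }
  return 1;
}
```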
Dmitry Kovalev
7131cb0e3d General cleanups.
Removing unused constants, macros, and function declarations. Using
ROUND_POWER_OF_TWO macro, vp9_zero, vp9_copy where possible. Moving
#include from *.h to *.c. Merging for loops for motion vectors.

Change-Id: Ic3bf841764a2bb177128bb3a6d7aa8f68229cd13
2013-07-25 14:13:48 -07:00
Adrian Grange
e862c6f9eb Merge "Simplify handling of sub-partition motion vectors" 2013-07-25 12:58:38 -07:00
Adrian Grange
6f0f0e4907 Merge "Use local variables rather than structure members" 2013-07-25 12:57:52 -07:00
Adrian Grange
be700e140a Simplify handling of sub-partition motion vectors
Simplified the code that extracts and uses the motion
vectors for the 4 sub-partitions in rd_pick_partition.

Change-Id: Iaf698ef7ee3aef9edd59015e1ae065dd359b17d9
2013-07-25 11:51:51 -07:00
Dmitry Kovalev
fcc34796d2 Removing CONFIG_BALANCED_COEFTREE experiment.
Change-Id: I61a8b0101eac3ee2e0621d56151b90c269fd4db4
2013-07-24 15:53:42 -07:00
Dmitry Kovalev
9139ee0908 Adding condition inside get_tx_type_{4x4, 8x8, 16x16}.
Adding plane type check condition because it was always used outside of
get_tx_type_{4x4, 8x8, 16x16}.

Change-Id: I02f0bbfee8063474865bd903eb25b54d26e07230
2013-07-24 12:55:45 -07:00
Adrian Grange
4cfd36d8fd Use local variables rather than structure members
Although local copies of the mode member variables
(mode, ref_frame) were made, they were not used in
all places. Also, made a local copy of the
second_ref_frame member.

Change-Id: I84d8c822e5cb3d8a02fc3de8a4037ca3fea8bfad
2013-07-24 11:17:44 -07:00
Ronald S. Bultje
7817d3221f Save pixels instead of coefficients in intra4x4 RD loop.
Prevents doing duplicate IDCTs; encoding of first 50 frames of bus
(speed 0) @ 1500kbps goes from 1min4.0 to 1min3.5, i.e. 0.87% faster
overall.

Change-Id: I2df39e29ed9d5ea5e7d2704a34940ba622832ddd
2013-07-24 09:03:20 -07:00
Ronald S. Bultje
b72ecbb1b9 Add best_rd breakout in intra4x4 RD loop.
Encoding time of first 50 frames of bus (speed 0) @ 1500kbps goes from
1min5.4 to 1min4.0, i.e. 2.2% faster overall.

Change-Id: I8c32f2aff9a649ce7dd49d910dc5ba16b99c3bc6
2013-07-24 09:02:05 -07:00
Ronald S. Bultje
47336afd8d Merge "More optimizations for cost_coeffs()." 2013-07-23 21:36:12 -07:00
Dmitry Kovalev
db7f5d28b9 Removing vp9_is_interpolating_filter array.
All filters are interpolating now, so we don't need this array, all
values from this array are evaluated to true.

Change-Id: I9af6d8219ae0eb984063cd15e4e2296374ae4961
2013-07-23 14:24:39 -07:00
Dmitry Kovalev
2855d8aea1 Merge "Adding update_tx_counts function." 2013-07-23 13:57:59 -07:00
James Zern
8dede954c7 Merge "vp9: make some static tables const" 2013-07-23 11:37:01 -07:00
Jim Bankoski
86a9dec73c clean up bw, bh
Many structures use bw and bh, and they have different meanings. This CL
starts the cleanup and removes unnecessary two-step operations (a log lookup
followed by a shift).

also removed partition type multiple operation code in bitstream.c.

Change-Id: I7e03e552bdfc0939738e430862e3073d30fdd5db
2013-07-23 06:51:44 -07:00
Paul Wilkins
7c134bc0cd Merge "Reworked the auto_mv_step_size speed feature" 2013-07-23 04:49:55 -07:00
James Zern
3c8cce353f vp9: make some static tables const
Change-Id: I8bcae51271673da8755c66a51aea005dfe6a3739
2013-07-22 19:19:13 -07:00
Ronald S. Bultje
e20fcd9585 More optimizations for cost_coeffs().
4x4:    163 ->  123 cycles (33% faster)
8x8:    491 ->  399 cycles (23% faster)
16x16: 1889 -> 1763 cycles (7% faster)
32x32: 8311 -> 8180 cycles (1.6% faster)

Overall encoding time of first 50 frames of bus (speed 0) @ 1500kbps
goes from 1min4.33 to 1min3.00, i.e. 2.11% faster.

Change-Id: Ib52d1dbb5649b14de769d3e7a74af67440b5284f
2013-07-22 16:09:09 -07:00
Dmitry Kovalev
b2fc6fa969 Adding update_tx_counts function.
Moving common encoder/decoder code to update_tx_counts. Also renaming
vp9_get_pred_probs_tx_size to get_tx_probs2 and adding get_tx_probs to
call vp9_get_pred_context_tx_size inside read_selected_tx_size only once
(twice before).

Change-Id: Ia50247f3893de88ef8e9041b0d44be44a40aaa4d
2013-07-22 14:57:43 -07:00
Yaowu Xu
fc186dcad6 fix a build error
Change-Id: I3b05687f439ff6a7c426d2c97a6c58c831fa51ac
2013-07-22 12:37:30 -07:00
Jingning Han
416f315e82 Merge "Skip buffer update in sub8x8 rd loop" 2013-07-22 12:08:22 -07:00
Jingning Han
a5a9f5f7f3 Merge "Optimize operation flow in sub8x8 rd loop" 2013-07-22 12:08:15 -07:00