Commit Graph

6888 Commits

Author SHA1 Message Date
Jingning Han
f60a3910c4 Move token_cache from cost_coeffs to MACROBLOCK
This commit moves token_cache buffer into macroblock struct, instead
of defining as a local variable in cost_coeffs. This avoids repeatedly
re-allocating memory space in the rate-distortion optimization loop.

The runtime at speed 0 reduces:
bus 2000kbps, 161692ms to 159951ms
football 600kbps, 229505ms to 225821ms

Change-Id: If7da6b0b6d8c5138a16271a33c4548fba33d8840
2013-10-14 10:45:56 -07:00
James Zern
798cf80c1e add a test vector with frame parallel mode enabled
vp90-2-07-frame_parallel.webm:
vpxenc stefan_sif.y4m \
  --codec=vp9 -p 2 \
  --frame-parallel=1 \
  --limit=10 \
  --auto-alt-ref=1 \
  --lag-in-frames=5

Change-Id: I7381a69aaaec238b309169a51b34cb6bf29a9c50
2013-10-14 18:50:55 +02:00
Dmitry Kovalev
706d4a7c29 Merge "Moving libmkv library to third_party folder." 2013-10-13 12:24:56 -07:00
Dmitry Kovalev
f36ba3da20 Merge "Making input pointer of any inverse transform constant." 2013-10-13 12:22:55 -07:00
Dmitry Kovalev
898c217cbc Merge "Adding TREE_SIZE macro + cleanup." 2013-10-13 12:21:09 -07:00
Erik Niemeyer
7336f1f156 Merge "Adjust icc compiler options" 2013-10-13 09:20:18 -07:00
Yunqing Wang
51af8a5103 Adjust icc compiler options
"-no-prec-div" option helps codec performance, so it was added back.
"-no-intel-extensions" was added to suppress link warning #10237.
option '-use-asm' is deprecated and removed.

Tested icc 32bit build and 64bit build.

Change-Id: I736ec2619857efd425ef76338dc52f8fbc0bcc7e
2013-10-11 20:17:59 -07:00
Dmitry Kovalev
65f118d72f Making input pointer of any inverse transform constant.
Also renaming dest_stride to stride in some places.

Change-Id: I75f602b623a5a7071d4922b747c45fa0b7d7a940
2013-10-11 18:27:12 -07:00
Johann
1ea04d980c Merge "Get libvpx to compile on VS2013." 2013-10-11 17:26:29 -07:00
Dmitry Kovalev
860e467643 Adding TREE_SIZE macro + cleanup.
Using TREE_SIZE for the following trees:
  vp9_intra_mode_tree
  vp9_inter_mode_tree
  vp9_partition_tree
  vp9_switchable_interp_tree
  vp9_mv_joint_tree
  vp9_mv_class_tree
  vp9_mv_class0_tree
  vp9_mv_fp_tree

Change-Id: I0212bb4c1ee6648249f68517e28a67a56591ee1b
2013-10-11 16:25:50 -07:00
Dmitry Kovalev
ac468dde46 Consistent names for inverse hybrid transforms (2 of 2).
Renames:
  vp9_iht_add       -> vp9_iht4x4_add
  vp9_iht_add_8x8   -> vp9_iht8x8_add
  vp9_iht_add_16x16 -> vp9_iht16x16_add

Change-Id: I8f1a2913e02d90d41f174f27e4ee2fad0dbd4a21
2013-10-11 15:49:05 -07:00
Dmitry Kovalev
107897cf05 Merge "Consistent names for inverse hybrid transforms (1 of 2)." 2013-10-11 15:33:00 -07:00
Scott Graham
3806bab283 Get libvpx to compile on VS2013.
`round` is defined in the runtime library now.
https://codereview.chromium.org/23922008/

Change-Id: I3852740058d32f63ce283579acbe284865e32dba
2013-10-11 14:27:00 -07:00
Dmitry Kovalev
e765aade0b Merge "Replacing {VP9_COEF, MODE}_UPDATE_PROB with DIFF_UPDATE_PROB." 2013-10-11 14:15:46 -07:00
Deb Mukherjee
c222b96bfd Merge "Change in rddiv parameter to make it a power of 2" 2013-10-11 13:53:59 -07:00
Dmitry Kovalev
7ef573914d Consistent names for inverse hybrid transforms (1 of 2).
Renames:
  vp9_short_iht4x4_add     -> vp9_iht4x4_16_add
  vp9_short_iht8x8_add     -> vp9_iht8x8_64_add
  vp9_short_iht16x16_add_c -> vp9_iht16x16_256_add

Change-Id: Ibca7a188fd062b196787ac5efc1ea545e7f166c0
2013-10-11 13:31:32 -07:00
Dmitry Kovalev
1ab7eb1406 Merge "Adding const to the input argument of all 1D transforms." 2013-10-11 13:20:57 -07:00
Yaowu Xu
4c20bff9d2 Merge "Masking intra mode choice adaptively" 2013-10-11 11:25:52 -07:00
Dmitry Kovalev
44195fda71 Adding const to the input argument of all 1D transforms.
Also adding static to iadst16_1d and fadst16 functions.

Change-Id: I13c7df3b776f0f8efc6e80099bdb0a2f6d29edaf
2013-10-11 11:19:58 -07:00
Dmitry Kovalev
4a0f9478ef Replacing {VP9_COEF, MODE}_UPDATE_PROB with DIFF_UPDATE_PROB.
Values of MODE_UPDATE_PROB and VP9_COEF_UPDATE_PROB are equal, so replacing
them with one constant. Inlining appropriate arguments for functions:
  vp9_cond_prob_diff_update (encoder)
  vp9_diff_update_prob (decoder)

Change-Id: I1255a1cb477743b799b3bfbbcd8de6b32b067338
2013-10-11 10:47:22 -07:00
Dmitry Kovalev
6e21ca7635 Merge "Removing vp9_tree_p typedef." 2013-10-11 10:44:04 -07:00
Dmitry Kovalev
9c8f3063b1 Merge "Removing vp9_idct4_1d_sse2 function." 2013-10-11 10:43:56 -07:00
Deb Mukherjee
d9655e42b8 Change in rddiv parameter to make it a power of 2
Converts the constant rddiv parameter to 128 (from 100) and
implements RDCOST with bit-shift rather than multiplication.
Other parameters are also adjusted to roughly keep the same
balance between Rate and Distortion.

There is a slight speed-up of about 0.5-1% (at speed 0) as
testted on football_cif.

There is a slight change in performance due to small change
in the parameters.
derfraw300: +0.033%
stdhdraw250; +0.102%

Change-Id: I70ac69f58fa71c83108f68fe41796cd19d1fc760
2013-10-11 10:43:02 -07:00
Yaowu Xu
8b175679be Masking intra mode choice adaptively
The commit changes to mask available intra prediction modes for test
based on prediction block size.

With this patch, encoding time of CpuUsed 2 reduces from 10% to 20% for
HD clips with a compression drop of 0.2%

Change-Id: I65f320f1237c0f5ae3a355bf7caf447f55625455
2013-10-11 10:29:53 -07:00
Yunqing Wang
dc079ab138 Merge "Code cleanup" 2013-10-11 09:38:24 -07:00
Jingning Han
54e702b5d7 Merge "Restore mode skip feature in sub8x8 rd loop" 2013-10-11 09:21:06 -07:00
Yunqing Wang
57b97b56f6 Code cleanup
Minor code cleanup.

Change-Id: I47c1f794842d4570bb39cfd23b80f54f5606bba6
2013-10-11 09:08:41 -07:00
Paul Wilkins
b30445edd6 Merge "Experimental rate control change." 2013-10-11 08:45:13 -07:00
Paul Wilkins
39c0e4e034 Merge "Disable recode loop." 2013-10-11 08:45:00 -07:00
Yunqing Wang
3a0b59e3fd Merge "SSE2 8-tap sub-pixel filter optimization" 2013-10-11 08:44:56 -07:00
Paul Wilkins
704028d435 Experimental rate control change.
When the codec in VBR (or cq) mode hits its max q limits and is
struggling to hit a target bandwidth, the bit target per frame collapses.

In the first instance normal frames cap out at the maximum allowed
Q and then the ARF and GFs do the same. This latter behavior is not
generally desirable as GFs and ARFs are only effective from a quality
and data rate perspective if they have at lease some level of -Q delta
compared to the surrounding frames.

In this patch I define a separate max Q for GFs and ARFs that is
derived from but somewhat lower than that defined for normal frames.
In effect there is a minimum Q delta that will always be available for
GFs and ARFs regardless of the target rate and MAXQ setting.

This may of course mean that the absolute lowest rate obtainable for
a given clip is somewhat higher.

Change-Id: I268868b28401900d0cd87e51e609cd3b784ab54a
2013-10-11 13:40:54 +01:00
Paul Wilkins
8b989f5b23 Disable recode loop.
For VBR coding disable the recode loop for speeds > 0.

Results pending.

Change-Id: I2cd9a87c3fcbe39c05b954798d0671a4ca62c37f
2013-10-11 13:38:52 +01:00
Paul Wilkins
899ab95c8c Adjustment to allowed range in resize unit test
Change-Id: I5222e3db2627a3a9f7fc34f2ab4554aa5807ed51
2013-10-11 13:38:24 +01:00
Dmitry Kovalev
98400c1bc4 Removing vp9_tree_p typedef.
It is used only two times and it is more clear to use real type instead
of typedef.

Change-Id: Idc25c16504c3da4d040e0cdb33a2987631bb6a5b
2013-10-10 17:16:20 -07:00
Dmitry Kovalev
ddf1b76205 Removing vp9_idct4_1d_sse2 function.
We have two SSE2-optimized functions for idct4_1d:
  vp9_idct4_1d_sse2 <-- removing this one
  idct4_1d_sse2

vp9_idct4_1d_sse2 was used only by the following functions which already
have SSE2 optimized variants:
  vp9_idct4x4_16_add_c   -> vp9_idct4x4_16_add_see2
  idct8_1d               -> vp9_idct8x8_{16, 10, 1}_see2
  vp9_short_iht4x4_add_c -> vp9_short_iht4x4_add_see2

Change-Id: Ib0a7f6d1373dbaf7a4a41208cd9d0671fdf15edb
2013-10-10 16:50:43 -07:00
Scott LaVarnway
83936e8cd5 d207 intra prediction ssse3 using bytes
byte version of ronalds d207 ssse3 optimizations
(commit: f891f84d3ba9345b0074e682f0fea09b8ddf4f1e)

Change-Id: If15f71a589ea16f78ac86a501b0c5c6231dc9af1
2013-10-10 15:50:31 -07:00
Dmitry Kovalev
2be3b84aed Merge "Giving consistent names to IDCT 32x32 functions." 2013-10-10 15:31:25 -07:00
Dmitry Kovalev
3309b040c8 Merge "Consistent names for FDCT functions." 2013-10-10 15:29:29 -07:00
Yunqing Wang
86528586a3 Merge "d153 intra prediction (32x32) ssse3 using bytes" 2013-10-10 15:16:45 -07:00
Yunqing Wang
3fb728c749 SSE2 8-tap sub-pixel filter optimization
To ensure fast encoding/decoding on devices without ssse3 support,
SSE2 optimization of sub-pixel filters was done. Test using 1080p
clip showed the decoder speeds were ~70fps with ssse3 filters, ~60fps
with sse2 filters, and ~15fps with c filters.

Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
2013-10-10 14:12:47 -07:00
Adrian Grange
61c607fd79 Merge "Fix typo in comment message" 2013-10-10 14:05:51 -07:00
Yaowu Xu
e2d6e37a54 Merge "change to avoid out-of-range computation" 2013-10-10 13:38:16 -07:00
Jingning Han
09aca3089f Merge "Re-design rate-distortion cost tracking buffers" 2013-10-10 12:57:31 -07:00
Guillaume Martres
b364176c08 Prevent accidental changes to the previous frame mode_infos
This is needed to fix mbgraph but shouldn't affect anything else

Change-Id: I2f515052f62e348cd3794b7ff0c139802225ea95
2013-10-10 12:18:12 -07:00
Jingning Han
f0772dc5b8 Fix typo in comment message
Change-Id: Ifef756a3a91423bb9f5411f06fa092027be21ecf
2013-10-10 12:17:10 -07:00
Dmitry Kovalev
fc82dbb434 Consistent names for FDCT functions.
Renames:
  fdct4_1d   -> fdct4
  fadst4_1d  -> fadst4
  fdct8_1d   -> fdct8
  fadst8_1d  -> fadst8
  fdct16_1d  -> fdct16
  fadst16_1d -> fadst16

"_1d" suffix is redundant, so removing it. The same will happen with idct
in the next change sets.

Change-Id: Ibf421cd2f569146c6079269df7a31819c098265e
2013-10-10 11:53:55 -07:00
Dmitry Kovalev
1e766b50e2 Giving consistent names to IDCT 32x32 functions.
Renames:
  vp9_short_idct32x32_add   -> vp9_idct32x32_1024_add
  vp9_short_idct32x32_1_add -> vp9_idct32x32_1_add
  vp9_idct_add_32x32        -> vp9_idct32x32_add

Change-Id: Id85306f5814bac6c47463a6b5901a93082510666
2013-10-10 11:27:39 -07:00
Jingning Han
fc19243ced Re-design rate-distortion cost tracking buffers
This commit re-designs the per transformed block rate-distortion
costs tracking buffers. It removes redundant buffer usage, makes
the needed context memory allocation per VP9_COMP instance and
reuses the same buffer sets inside the rate-distortion optimization
search loop, thereby avoiding repeatedly requiring memory space.

It reduces speed 0 runtime:

bus at 2000 kbps from 166763ms to 158967ms,
football at 600 kbps from 246614ms to 234257ms.

Both about 5% speed-up. Local tests suggest about 2% to 5% speed-up
for speed 1 and 2 settings. This does not change compression
performance.

Change-Id: I363514c5276b5cf9a38c7251088ffc6ab7f9a4c3
2013-10-10 11:03:44 -07:00
Yaowu Xu
b47cef056e change to avoid out-of-range computation
Change-Id: Id5e31833a0ef40de9f64c2f5674af7083233bf14
2013-10-10 11:01:50 -07:00
Dmitry Kovalev
1e8fc24af8 Merge "Removing inv_txm4x4_1_add and inv_txm4x4_add function pointers." 2013-10-10 10:49:27 -07:00