Commit Graph

1247 Commits

Author SHA1 Message Date
Yunqing Wang
c647ec4462 Merge mr_pick_inter_mode and pick_inter_mode
Merged multi-resolution motion estimation with regular motion
estimation function in order to remove duplicated part. This
caused slight changes in multi-resulotion encoder quality &
performance.

Change-Id: Ib4ecc7acfebfe5eea959b5b91febae6db7b95fd1
2011-12-16 18:02:29 -05:00
James Berry
24196dd987 fix: make sure ss_err is large enough
increase size of ss_err by one to make
sure there is room for 64 elements.

Change-Id: I355cb8c499aa7da3b9675f2326a8d25a74bb88d2
2011-12-16 17:43:55 -05:00
John Koleszar
26c6a44c66 Avoid heap allocation of firstpass stats
The total_stats, this_frame_stats, and total_left_stats structures
were previously create by a heap allocation, despite being of fixed
size. These structures were allocated and deallocated during
{de,}allocate_compressor_data, which is reinvoked whenever the frame
size changes. Unfortunately, this clobbers the total_stats and
total_left_stats data.

Historically, these were variable size at one time, due to the first
pass motion map, which necessitated their being created by a unique
heap allocation. However, this bug with the total_stats being
clobbered has probably been present since that initial implementation.

These structures are instead moved to be stored within the struct
twopass_rc directly, rather than being heap allocated separately.

Change-Id: I7f9e519e25c58b92969071f0e99fa80307e0682b
2011-12-16 11:40:23 -08:00
Scott LaVarnway
0ccefd2c8f Fixed mb_skip_coeff bug
When mb_skip_coeff is set, the idct is not necessary.  Prior
to this patch, the code would call idcts based on leftover
eob information.  This patch will now skip the idct for
SPLIT_MV and clear out the eobs for B_PRED, forcing the idct
to be skipped.

Change-Id: If5b0d2ed3ebd07789d30ec5160df927485fcaa17
2011-12-16 13:48:01 -05:00
Scott LaVarnway
a53d5a4c44 Moved dequant idct into common
These functions are now used by the encoder.
This is WIP with the goal of creating a common idct/add for
the encoder and decoder.  A boost of 1.8% was seen for
the HD rt test clip used.

[Tero] Added needed changes to ARM side.

Change-Id: Ibbb8000be09034203d7adffc457d3c3f8b06a5bf
2011-12-15 14:23:41 -05:00
Yunqing Wang
c8df1656bd Merge "Only call vp8_find_near_mvs() once for each macroblock" 2011-12-15 09:53:25 -08:00
Yunqing Wang
e06c242baa Only call vp8_find_near_mvs() once for each macroblock
While doing motion search on a macroblock, we usually call
vp8_find_near_mvs once per reference frame. Actually, for
different reference frames, the only difference in calculating
these near_mvs is they may have different sign_bias, which
causes a sign change in resulting near_mvs. In this change, we
only do find_near_mvs for the first reference frame. For other
reference frames, only need to adjust the near_mvs according to
that reference frame's sign_bias value.

Change-Id: I661394b49c6ad79fed7d0f2eb2be239b9c56f149
2011-12-15 11:19:18 -05:00
Yunqing Wang
d7e09b6ada Merge "Force realtime version 1 streams to only use simple loopfilter" 2011-12-15 05:38:57 -08:00
John Koleszar
72f459c77f Merge "Avoid multiple test for same lvl in auto filter lvl pick" 2011-12-14 16:28:13 -08:00
John Koleszar
4f8f360098 Merge "add check to ensure that cq_level falls within min and max q" 2011-12-14 12:10:06 -08:00
John Koleszar
e542627b0c Merge "fix: active_worst_quality could be set above 127" 2011-12-14 11:22:00 -08:00
James Berry
c1c47e83b0 add check to ensure that cq_level falls within min and max q
Add the notion of deferred validation of parameters. We don't want to
validate the cq_level at initialization time, because it won't have
been set via set_param() yet.

Change-Id: Ia1308395e8c10e0b1dc4e9af3a09b2bd6744cc30
2011-12-14 10:39:06 -08:00
Attila Nagy
51c4f9e6b1 Avoid multiple test for same lvl in auto filter lvl pick
Sometimes same level is tested 2-3 times; store and reuse the
calculated error value.

Change-Id: Ia1c04a2568232edf9a5a62c4e2d8e8a50d85e00e
2011-12-14 15:56:29 +02:00
Attila Nagy
55fbdd58ac Force realtime version 1 streams to only use simple loopfilter
...regardless of the speed settings.

Change-Id: I4b91ac7a7208efd690dfc69e175f8eb769b6ce03
2011-12-14 12:57:49 +02:00
James Berry
f8b431c334 fix: active_worst_quality could be set above 127
add check to set active_worst_quality to 127 if it
is set above 127

Change-Id: I7db353d5c1b1c8516a116542b6ed21c0110bb512
2011-12-13 14:58:59 -05:00
John Koleszar
d6020f9d52 tokenizer: use correct block type context in stuff1st_order_b
The fast-path for skipped MBs was not correctly respecting the
block type during update of the coefficient counts. Extracted
this from part of change I365cfb6ac636f19c545f682e3aeac185253abaef

Change-Id: I53d8cf0a00a98034b97b0ed3414b703bae74a739
2011-12-13 11:58:56 -08:00
Jim Bankoski
6b2792b0e0 Merge "vp8e - entropy stats per frame type" 2011-12-12 09:08:34 -08:00
Scott LaVarnway
c4aa1d508e Removed unused diff buffer
Change-Id: I9211358cca89b1c4f84b53a202a63ecf9e79ae4c
2011-12-12 11:06:55 -05:00
Scott LaVarnway
afa1b66108 Merge "Improved mmx/sse2 versions of iwalsh" 2011-12-12 06:40:28 -08:00
Jim Bankoski
6de67cd6e8 vp8e - entropy stats per frame type
Change-Id: I4168eb6ea22ae541471738a7a3453e7d52059275
2011-12-09 16:56:18 -08:00
Scott LaVarnway
9fa6132fc5 Improved mmx/sse2 versions of iwalsh
Removed unnecessary transposes.

Change-Id: I029fbaf8afafee34d54a4f3333c22023c15003c3
2011-12-08 14:37:59 -05:00
Johann
a69810b893 Merge "Reduce mem copies in encoder loopfilter level picking" 2011-12-07 10:41:00 -08:00
Attila Nagy
e570b0406d Reduce mem copies in encoder loopfilter level picking
Do the test filtering in the existing backup frame buffer instead of
the original. Copy the original data into extra buffer before doing
the  filtering. This way there is no need to restore the original
unfiltered  frame at the end of level picking process.

This came up in some discussions with Johann. Thanks!

Change-Id: I495f4301d983854673276c34ec0ddf9a9d622122
2011-12-07 09:59:50 +02:00
Yunqing Wang
aa7335e610 Multiple-resolution encoder
The example encoder down-samples the input video frames a number of
times with a down-sampling factor, and then encodes and outputs
bitstreams with different resolutions.

Support arbitrary down-sampling factor, and down-sampling factor
can be different for each encoding level.

For example, the encoder can be tested as follows.
1. Configure with multi-resolution encoding enabled:
../libvpx/configure --target=x86-linux-gcc --disable-codecs
--enable-vp8 --enable-runtime_cpu_detect --enable-debug
--disable-install-docs --enable-error-concealment
--enable-multi-res-encoding
2. Run make
3. Encode:
If input video is 1280x720, run:
./vp8_multi_resolution_encoder 1280 720 input.yuv 1.ivf 2.ivf 3.ivf 1
(output: 1.ivf(1280x720); 2.ivf(640x360); 3.ivf(320x180).
The last parameter is set to 1/0 to show/not show PSNR.)
4. Decode:
./simple_decoder 1.ivf 1.yuv
./simple_decoder 2.ivf 2.yuv
./simple_decoder 3.ivf 3.yuv
5. View video:
mplayer 1.yuv -demuxer rawvideo -rawvideo w=1280:h=720 -loop 0 -fps 30
mplayer 2.yuv -demuxer rawvideo -rawvideo w=640:h=360 -loop 0 -fps 30
mplayer 3.yuv -demuxer rawvideo -rawvideo w=320:h=180 -loop 0 -fps 30

The encoding parameters can be modified in vp8_multi_resolution_encoder.c,
for example, target bitrate, frame rate...

Modified API. John helped a lot with that. Thanks!

Change-Id: I03be9a51167eddf94399f92d269599fb3f3d54f5
2011-12-05 17:59:42 -05:00
John Koleszar
6127af60c1 Merge "Speed selection support for disabled reference frames" 2011-12-05 14:36:54 -08:00
Yunqing Wang
06fc0f83b6 Populate q_index in multi-thread encoding
This value needs to be copied to each thread's data structure.
This fixed artifact problem in multi-thread encoder.

Change-Id: Iab6d9745a1d44846aa503184705376f63a505597
2011-11-28 15:58:28 -05:00
Scott LaVarnway
34d7c8b3d4 Added vp8_dequant_idct_add_y_block_sse2 setup
In Change I83202ffd, I deleted one too many lines.

Change-Id: If05d7c8988eb5c00898dc7c833ad7d99b5eb23e7
2011-11-28 13:06:13 -05:00
Scott LaVarnway
f46e17fd6f Merge "Modified the inverse walsh to output directly" 2011-11-28 07:26:07 -08:00
Scott LaVarnway
4a91541c94 Modified the inverse walsh to output directly
to the dqcoeff or qcoeff buffer.  The encoder would
populate the dc coeffs of the y blocks as a separate
stage (recon_dcblock) and the decoder would use a special
version of the idct.  This change eliminates the extra copy
and reduces the code footprint.

[Tero] Added needed changes to armv6 and NEON assembly.

Change-Id: I83202ffdbaf83f6e5dd69f4ba2519fcf0b13b3ba
2011-11-25 09:24:04 +02:00
Johann
e2bacd581a Merge "Move shared data to shared location" 2011-11-23 11:20:54 -08:00
Johann
15ea268d62 Merge "Fix encoder partitioned output on ARM" 2011-11-23 08:44:21 -08:00
Attila Nagy
97259b460c Fix encoder partitioned output on ARM
API was not returning correct partition sizes on arm targets.
The armv5 token packing functions were not storing the information to the
partition size table.
As a fix, have one boolcoder instance allocated for each partition so
that partition sizes are internally available after all partitions
were encoded. This will also allow more flexibility in producing
several partitions in parallel.

Use buffer validation (overflow check) in all ARM bitpacking
functions.

Change-Id: I31c8a11d8a7613676f0ff50928cb2a2ab14fd169
2011-11-23 12:29:43 +02:00
John Koleszar
b79879c2e3 Merge "Decoder fixes to better support reference picture selection." 2011-11-22 17:12:06 -08:00
Stefan Holmer
b5ee7b12d2 Decoder fixes to better support reference picture selection.
Change-Id: Id3388985d754706b9fd1f079c47121e79a63efdf
2011-11-21 10:25:21 +01:00
Johann
f2cd4ded22 Move shared data to shared location
Storing vp8_bilinear_filters_mmx in an mmx file and using it in an sse2
file is bad

Moving towards allowing --disable-mmx

Change-Id: I20493b35bdedcdcfc0915e6f05fdbe6c81a4a742
2011-11-18 16:23:14 -08:00
John Koleszar
e55974bf86 Speed selection support for disabled reference frames
There was an implicit reference frame test order (typically LAST,
GOLD, ARF) in the mode selection logic, but this doesn't provide the
expected results when some reference frames are disabled. For
instance, in real-time mode, the speed selection logic often disables
the ARF modes. So if the user disables the LAST and GOLD frames, the
encoder was always choosing INTRA, when in reality searching the ARF
in this case has the same speed penalty as searching LAST would have
had.

Instead, introduce the notion of a reference frame search order. This
patch preserves the former priorities, so if a frame is disabled, the
other frames bump up a slot to take its place. This patch lays the
groundwork for doing something smarter in the frame test order, for
example considering temporal distance or looking at the frames used by
nearby blocks.

Change-Id: I1199149f8662a408537c653d2c021c7f1d29a700
2011-11-18 13:53:21 -08:00
Attila Nagy
c84d42f864 Validate encoder buffer writes for single token partition
Extend buffer write validation (overflow check) to single token
partition packing, both mb and row based functions.

Change-Id: I36e19b7d37fc43712d05c70e3ad223d3eb5b973d
2011-11-18 12:49:27 +02:00
Scott LaVarnway
3c755577b8 Merge "Added predictor stride argument(s) to subtract functions" 2011-11-17 10:17:53 -08:00
Scott LaVarnway
edd98b7310 Added predictor stride argument(s) to subtract functions
Patch set 2: 64 bit build fix
Patch set 3: 64 bit crash fix

[Tero]
Patch set 4: Updated ARMv6 and NEON assembly.
             Added also minor NEON optimizations to subtract
             functions.

Patch set 5: x86 stride bug fix

Change-Id: I1fcca93e90c89b89ddc204e1c18f208682675c15
2011-11-15 12:53:01 -05:00
John Koleszar
bdd35c13cc avoid resetting framerate during vpx_codec_enc_config_set()
The calculated frame_rate is a state variable in the codec, and
shouldn't be maintained in the configuration struct. Move it to the
main part of cpi so that it isn't clobbered when the configuration
struct is updated. The initial framerate estimate is moved from the
vp8_cx_iface.c wrapper into the body of init_config() in onyx_if.c, so
that it is only called once and not reset on every call to
vp8_change_config().

Change-Id: I8d9a3d1283330d1ee297d07e9d78d1f2875f2465
2011-11-11 14:45:58 -08:00
Scott LaVarnway
df49c7c58d SSE2 optimizations for vp8_build_intra_predictors_mby{,_s}()
Ronald recently sent me this patch that he did in April.
> From: Ronald S. Bultje <rbultje@google.com>
> Date: Thu, 28 Apr 2011 17:30:15 -0700
> Subject: [PATCH] SSE2 optimizations for
> vp8_build_intra_predictors_mby{,_s}().
HD decode tests have shown a performance boost up to 1.5%,
depending on material.
Patch set 3: Fixed encoder crash.

Change-Id: Ie1fd1fa3dc750eec1a7a20bfa2decc079dcf48c8
2011-11-09 15:30:35 -05:00
Scott LaVarnway
9532bda0fb Merge "Relocated idct/add calls for encoder" 2011-11-09 10:17:43 -08:00
Johann
ea2229bab6 Merge "ARMv6 optimized Intra4x4 prediction" 2011-11-09 09:36:33 -08:00
John Koleszar
2999ca3094 Merge "Reset FPU state after calc_plane_error()" 2011-11-09 09:35:08 -08:00
John Koleszar
3fcf0e3668 Merge "Compiler warning fix for const array." 2011-11-09 09:34:50 -08:00
John Koleszar
82e8884ad8 Merge "Remove unused file recon.c" 2011-11-09 09:31:23 -08:00
Scott LaVarnway
861ed6a5c1 Relocated idct/add calls for encoder
Call the idct/add after the tokenize.  This is WIP with
the goal of creating a common idct/add for the encoder and
decoder. This move is necessary because the decoder's version
of the idct clobbers qcoeff, which is used by the tokenize.

Change-Id: I6b08d8e8397cd873647fa4fb9469884e3c876756
2011-11-09 10:41:05 -05:00
Tero Rintaluoma
5a2fd63a2a ARMv6 optimized Intra4x4 prediction
Added ARM optimized intra 4x4 prediction
 - 2x faster on Profiler compared to C-code compiled with -O3
 - Function interface changed a little to improve BLOCKD structure
   access

Change-Id: I9bc2b723155943fe0cf03dd9ca5f1760f7a81f54
2011-11-09 09:13:51 +02:00
James Zern
9d60506130 threading: avoid defining _WIN32_WINNT
The referenced function (SignalObjectAndWait) isn't used. Reduces the
warnings with mingw32-w64 which defines this.

Change-Id: I4ce592879ec9372bf196dac640204c4d370bd210
2011-11-08 18:50:45 -08:00
John Koleszar
f89e109f56 Remove unused file recon.c
File not referenced from anywhere and no longer compiles.

Change-Id: I38b11bd60db615c2c2c9d7ad35caba3a1adf1750
2011-11-08 15:54:56 -08:00
Yunqing Wang
4c14efd234 Fix checks in MB quantizer initialization
vp8cx_mb_init_quantizer() needs to be called at least once to get
all values calculated. This change added one check to decide if
we could skip initialization or not.

Change-Id: I3f65eb548be57580a61444328336bc18c25c085b
2011-11-08 12:11:48 -05:00
Adrian Grange
b615a6d47f Third set of checks of buffer level against maximum buffer size
Additional check of buffer level to ensure it doesn't exceed the
maximum buffer size.

Change-Id: I1ba4f8b09bbec89646885040ff47470196af521e
2011-11-07 17:15:28 -08:00
Adrian Grange
fa25a31ed4 Additional clipping of buffer level to maximum buffer size
Added additional check of buffer level against maximum
buffer size.

Change-Id: Iaf1fbaf008601161e402b43ce82c3dbc129bf740
2011-11-07 16:54:40 -08:00
Adrian Grange
9dc95b0a12 Added check to make sure maximum buffer size not exceeded
Added code to clip the buffer level to the maximum buffer
size. Without this the buffer level would increase
unchecked.

This bug was found when encoding an essentially static
scene at 2Mb/s. The encoder is unable to generate frames
consistent with the high data-rate because Q bottoms out
at Qmin.

As frames generated are consistently undersized the buffer
level increases and does not get checked against the
maximum size specified by the user (or default).

Change-Id: Id8a3c6323d3246da50f7cb53ddbf78b5528032c6
2011-11-07 16:28:13 -08:00
James Zern
f89ea3432f fix file permissions
all of googletest import (0ab00a22) was marked executable

Change-Id: Id7b7ee03efc21ab998bb03349bd91644e8af25da
2011-11-04 18:50:35 -07:00
Fritz Koenig
f0c01413fb Compiler warning fix for const array.
Fix compiler warning for passing a non const array
to a function expecting a const array by using an
intermediary pointer and casting.

Change-Id: I9bdd358ebdc926223993fb8fb2098ffedd2f3fc7
2011-11-04 18:19:26 -07:00
Yunqing Wang
e1a55b504a Merge "Add checks in MB quantizer initialization" 2011-11-04 11:52:27 -07:00
Scott LaVarnway
44b5f76e34 Merge "Fix issue 374: eob read incorrectly" 2011-11-04 11:31:17 -07:00
John Koleszar
7ca6c91732 Merge "Changing decoder input partition API to input fragments." 2011-11-04 09:36:27 -07:00
Tero Rintaluoma
d497ec688d Fix issue 374: eob read incorrectly
Updated eob changes to check_reset_2nd_coeffs function.

Change-Id: Id1b21c91c7f0fd286640b487ffe47867009b717d
2011-11-04 09:36:49 +02:00
Scott LaVarnway
46639567a0 Merge "Change use of eob in the encoder" 2011-11-03 08:06:06 -07:00
Tero Rintaluoma
e4f2ec7a52 Change use of eob in the encoder
Changed 'int eob' to 'char *eob' in BLOCKD so that both encoder and
decoder will use eobs[25] array from MACROBLOCKD structure. In future,
this will enable use of the decoder side IDCT in the encoder.

Change-Id: I6e1c011628cb8864fd4a0b80f0279ce16a5ca978
2011-11-03 16:08:09 +02:00
Yaowu Xu
8002c31804 Merge "added code to clear 2nd order block when appropriate" 2011-11-02 08:22:58 -07:00
John Koleszar
63bf108731 Merge "Fix: Increase default cx_data_size" 2011-11-01 15:25:47 -07:00
Stefan Holmer
1427205215 Changing decoder input partition API to input fragments.
Adding support for several partitions within one input fragment.
This is necessary to fully support all possible packetization
combinations in the VP8 RTP profile. Several partitions can
be transmitted in the same packet, and they can only be split
by reading the partition lengths from the bitstream.

Change-Id: If7d7ea331cc78cb7efd74c4a976b720c9a655463
2011-11-01 14:44:37 -07:00
Yunqing Wang
e44720af84 Add checks in MB quantizer initialization
In some situations (f.g. error-resilient is turned on), vp8cx_mb
_init_quantizer() was called once per macroblock. Added checks
to avoid calculations when there is no change.

Change-Id: Ie4f0a5ade2202041254990a4e9d5b03bd1ac5aea
2011-11-01 17:41:22 -04:00
John Koleszar
9bf3bc9a72 Correct SPLITMV clamping
Prior to this fix, the clamping state of the last subblock partition
dominated, whereas the correct behavior is to clamp if any partition
needs clamping. This bug was introduced by v0.9.6-232-g6b25501

See also:
  [1]: http://code.google.com/p/webm/issues/detail?id=371
  [2]: https://bugzilla.mozilla.org/show_bug.cgi?id=696390

Change-Id: I444db492b4c4f05f039c7da6f4216da8207dc138
2011-10-31 14:42:51 -07:00
Yaowu Xu
88e24f07ae added code to clear 2nd order block when appropriate
It is discovered that in rare situations the 2nd order block may
produce a few small magnitude coefficients that has no effect on
reconstruction. The situations are a combination of low quantizer
values (high quality) and low energy in residual signals (content
dependent). This commit added code to detect such cases and reset
the 2nd order block to all 0.

Patch 1 to 4 used code to do all-zero-check on idct result buffer,
and tests on derf set showed a consistent gain of .12%-.14% on all
metrics.But due to a recent change Ie31d90b, the idct result buffer
is not longer populated. So patch 5&6 use an alternative method to
detect the situations. Tests on derf set now shows a consistent
quality gain of .16%-.20%.

As suggested by Jim, Patch 7&8 removed the condition of all first
order block not having any coefficient, instead we reset 2nd order
coefficients to all 0 if sum of absolute value of the coefficients
is small. So it does slightly more than just detecting the oddity
as discussed above, but tests on derf set now show a consistent
gain of .20%-.23% on all metrics.

It is worth noting here that this change does not have any effect
on mid/high quantizer range, it only affects the quantizer value
18 or blow. Within this range, the change helps compression by up
to 2.5% on clips in the derf set.

Change-Id: I718e19cf59a4fc2462cb7070832759beb9f7e7dd
2011-10-28 12:07:21 -07:00
Scott LaVarnway
e0309e1509 Merge "Improved decode_split_mv()" 2011-10-28 09:27:17 -07:00
Johann
cd1ef53d12 Merge "Fix ARM build problem introduced by CL I3fab6f2b" 2011-10-27 11:17:54 -07:00
Scott LaVarnway
6064384d59 Improved decode_split_mv()
Tests showed ~1.2% performance boost on the HD clip used.
Performance will vary based on material.

Change-Id: Icbcf1a828750d5b4ae5252bf596b3ef594042e8a
2011-10-27 11:26:30 -04:00
Scott LaVarnway
0db5599957 Merge "Improved mv_bias" 2011-10-27 06:14:00 -07:00
Attila Nagy
9452dce181 Fix ARM build problem introduced by CL I3fab6f2b
Update ARM asm implementation of vp8_start_encode to new definition.

Change-Id: Ic44791c969e351082331ba6146c3384c01a0dfad
2011-10-27 09:06:45 +03:00
Johann
294777b915 Merge "Reduce partial frame copy in encoder's pick_filter_level_fast" 2011-10-26 11:33:14 -07:00
Scott LaVarnway
21970d1dc2 Improved mv_bias
Small performance gains.

Change-Id: I709b9390a8a27a70f5f23574313b8db85ac7f23d
2011-10-26 11:46:10 -04:00
Scott LaVarnway
efa69d26a1 Merge "Improved read_mb_modes_mv()" 2011-10-26 08:26:30 -07:00
Scott LaVarnway
ff1d170e69 Improved read_mb_modes_mv()
Interleaved vp8_find_near_mvs and vp8_mv_ref_probs.
2.5% to 4% performance improvement for the HD clips used.

Change-Id: Id888b667cf5ae2f0e19da18743140f055ff7de8d
2011-10-26 10:46:36 -04:00
Attila Nagy
de82809444 Reduce partial frame copy in encoder's pick_filter_level_fast
The partial frame copy function used to copy an extra 8 lines above
and  below. The partial frame filtering can only modify 3 pixel rows
above the partial frame. Reduce copy to bare minimum needed, which is
4 lines, so that partial filtering on copied frame is possible.

Define the "magic" fraction number for partial filtering in
loopfilter.h .

Change-Id: I4791ffc541b6884b12759a0d0714a8faf16147ec
2011-10-26 15:25:07 +03:00
Johann
f9dba66877 Merge "remove uninitialized variable warning" 2011-10-25 14:42:21 -07:00
Scott LaVarnway
e03330bd80 Merge "Improved token decoder" 2011-10-25 10:04:11 -07:00
Johann
9409af2083 remove uninitialized variable warning
Restructure if statement to clarify the error condition. Trigger the
error before clobbering pc-> variables.

Change-Id: Id01cab798a341ce9899078fdcec265a0e942a0b7
2011-10-25 09:24:40 -07:00
Scott LaVarnway
3579baa115 Merge "Removed read_mv_ref" 2011-10-25 08:01:32 -07:00
Johann
a82cc0205d remove unused variable warning
Change-Id: I4fcd6e4656d9823aead941616cd63501aecbd6e2
2011-10-24 16:33:45 -07:00
Scott LaVarnway
49ea2bc3f4 Removed read_mv_ref
Decode the mv mode with if-then-elses instead of traversing
the vp8_mv_ref_tree data structure.  This will make it
easier to interleave vp8_find_near_mvs and vp8_mv_ref_probs.

Change-Id: I1e798d6ec40fcaeeff06ccc82f81201978d12f74
2011-10-24 16:16:08 -04:00
Scott LaVarnway
f182376dd6 Moved the split motion vector decode
into a function.

Change-Id: Ia023a0587100a52cb084f5d9d5512efa6198dad3
2011-10-24 13:52:15 -04:00
Scott LaVarnway
231339932b Merge "Removed redundant mv clamps for nearmv and nearestmv" 2011-10-24 10:27:53 -07:00
James Berry
bac6c229e5 Fix: Increase default cx_data_size
Prior size could be too small in some instances
resulting in an error.

Change-Id: Ic601e49cbae92c98a0e7fb51ba8c186b352ffba6
2011-10-24 11:50:27 -04:00
Scott LaVarnway
a99c20c0f4 Removed redundant mv clamps for nearmv and nearestmv
Did some cleanup as well.

Patchset 2:  Fixed bug.  Will revisit the segmentation logic.

Change-Id: Idf9fbcff9aaf467bdace9fbd58ef2cea6c602049
2011-10-24 11:37:52 -04:00
Scott LaVarnway
e59d53e999 Merge "Remove unused DETOK structure" 2011-10-21 06:30:57 -07:00
Tero Rintaluoma
bdb4fb8991 Remove unused DETOK structure
DETOK structure is not used anymore.

Change-Id: Id22e1af78fb85d4bb151237a60290d9364faf217
2011-10-21 09:33:49 +03:00
John Koleszar
2c0b4a24b9 Merge "Fix: check cx_data buffer prior to write" 2011-10-20 17:36:40 -07:00
James Berry
bc7151131d Fix: check cx_data buffer prior to write
check to make sure that cx_data buffer has enough room before
writting to it, prior behavior did not which could result in a crash.

Change-Id: I3fab6f2bc4a96d7c675ea81acd39ece121738b28
2011-10-20 15:55:00 -04:00
Johann
7cdc986cdf Don't copy borders for loop_filter_pick
During the _pick only the Y plane is examined. In addition, data beyond
the borders of the frame is not read.

Change-Id: Ic549adfca70fc6e0b55f8aab0efe81f0afac89f9
2011-10-19 18:54:14 -07:00
Johann
f382173225 Merge "enc: save entropy probs only when needed for refresh" 2011-10-19 14:36:29 -07:00
Scott LaVarnway
5e54085703 Improved token decoder
Tests showed over 2% improvement on various HD clips.

Change-Id: I94a30d209c92cbd5fef285122f9fc570688635fe
2011-10-19 13:38:35 -04:00
Scott LaVarnway
63a77cbed9 Merge "Remove usage of predict buffer for decode" 2011-10-19 10:24:48 -07:00
Scott LaVarnway
ed9c66f584 Remove usage of predict buffer for decode
Instead of using the predict buffer, the decoder now writes
the predictor into the recon buffer.  For blocks with eob=0,
unnecessary idcts can be eliminated.  This gave a performance
boost of ~1.8% for the HD clips used.

Tero: Added needed changes to ARM side and scheduled some
      assembly code to prevent interlocks.

Patch Set 6:  Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3)
into this commit because of similarities in the idct
functions.
Patch Set 7: EC bug fix.

Change-Id: Ie31d90b5d3522e1108163f2ac491e455e3f955e6
2011-10-18 12:06:50 -04:00
Attila Nagy
a5cd42feb9 Fix: vp8cx_pack_tokens_into_partitions_armv5 crash
It was crashing when number of partitions was bigger than the number
of MB rows (ex. 128x96 with 8 partitions).
Start point was not checked against mb_rows, plus extra
"empty" partitions were not written out.

Change-Id: I9c2f013b9ec022354b658fab4ef799ff8b1de93d
2011-10-14 10:53:04 +03:00
Adrian Grange
04182a121a Merge "Added rate-targeted temporal scalability" 2011-10-11 12:54:52 -07:00
Adrian Grange
217591fde5 Added rate-targeted temporal scalability
Added the ability to create rate-targeted, temporally
scalable, VP8 compatible bitstreams.

The application vp8_scalable_patterns.c demonstrates how
to use this capability. Users can create output bitstreams
containing upto 5 temporally separable streams encoded
as a single VP8 bitstream.
(previously abandoned as:
I92d1483e887adb274d07ce9e567e4d0314881b0a)

Change-Id: I156250a3fe930be57c069d508c41b6a7a4ea8d6a
2011-10-11 12:49:12 -07:00
John Koleszar
07ba411914 Reset FPU state after calc_plane_error()
Fixes a MMX/SSE2 mismatch when building with --enable-internal-stats.

Change-Id: I0c50a1f246f6916b7a5fc6f36864ceb362f25520
2011-10-11 08:43:30 -07:00
James Berry
05bde9d4a4 bug fix - starting/optimal/max and buffer_level changed from int to int64_t
buffer_level in VP8_COMP and starting_buffer_level, optimal_buffer_level
and maximum_buffer_size in VP8_CONFIG changed from int to int64_t
to avoid potential crash issues for larger target bit rates.

Change-Id: I0d5ab6c8a44c2fef51f30cd8df4bb4b739c5df26
2011-10-10 12:16:55 -04:00
Attila Nagy
c0de35b413 enc: save entropy probs only when needed for refresh
Previous entropy probs need to be saved (and restored) only when
current updates are not propagated.

Change-Id: Ie6ee0543066e30874e56258be0a6b7d2dd2fdb2b
2011-10-10 13:44:54 +03:00
Scott LaVarnway
af12c23e8e Merge "Improved tokenize" 2011-10-04 09:57:42 -07:00
John Koleszar
8f8b526b54 Merge "Fix uninitialized new_mv_count in first pass file" 2011-10-04 07:40:49 -07:00
Yunqing Wang
538865dfa5 Merge "Multithreaded encoder, late sync loopfilter" 2011-10-04 07:04:30 -07:00
John Koleszar
86712c50f2 Fix uninitialized new_mv_count in first pass file
Uninitialized data could be written to the first pass file when no
motion vectors are present in the frame.

Also fix a number of compiler warnings.

Change-Id: Icc9f53b6d33da9de4563d86d9fd591910473ea90
2011-10-04 09:50:52 -04:00
Johann
2aa408524c Merge "Reduce computational complexity of generic C loop filter." 2011-09-30 16:17:56 -07:00
Johann
48b1917112 Merge "combine loopfilter data access" 2011-09-30 15:47:56 -07:00
Scott LaVarnway
ab00d209bc Improved tokenize
For a realtime HD encodings, up to 1.6% gains seen.



Change-Id: If45028e23db95124da63f9d38ffe06e05596cc6e
2011-09-30 12:49:46 -04:00
Johann
3556deaca3 combine loopfilter data access
The data processed by the loopfilter overlaps. At the block level, this
results in some redundant transforms. Grouping the filtering allows for
a single 16x16 transpose (and inversion) instead of three 16x8 transposes
(and three more inversions).

This implementation is x86_64 only. We retain the previous
implementation for x86.

Improvements are obviously material dependant, but it seems to be ~%1 in
tests here.

Change-Id: I467b7ec3655be98fb5f1a94b5d145e5e5a660007
2011-09-30 07:38:35 -07:00
Alpha Lam
7bce513afe Call vp8_find_near_mvs lazily
vp8_find_near_mvs() is being called on all possible reference frames
but the data computed may be used if the loop exits early, which can
be due to x->skip beign set to 1.

Optimize this by call vp8_find_near_mvs() laziy only if it is going
to be used and not computed yet.

Change-Id: Iccdbd4c962a670c9f2c99b8aca8096042ca5dc98
2011-09-30 14:48:18 +01:00
Paul Wilkins
a572ac8327 Merge "CQ and two pass rate control." 2011-09-30 02:57:54 -07:00
Paul Wilkins
b6e27d5f0b CQ and two pass rate control.
Changes to the selection of Q limits for two pass
and two pass CQ mode.

Allowance made for Mode and motion vector costs.
Some refactoring of common code.

For Derf and YT sets CQ mode average improvement
circa 1% (SSIM and Global PSNR).

Some increased tendency to undershoot even when
user CQ not reached.

Patch2: Removed some test code accidentally merged.

Change-Id: Icf74d13af77437c08602571dc7a97e747cce5066
2011-09-30 10:55:52 +01:00
Aaron Watry
69aa303d96 Reduce computational complexity of generic C loop filter.
Change-Id: I1e7f9ed3cd907844a495b9e0073bc140b87e5c06
2011-09-29 17:25:48 -05:00
Attila Nagy
380d64ecb1 Multithreaded encoder, late sync loopfilter
Sync with loopfilter thread just at the beginning of next frame encoding.
This returns control to application faster and allows a better multicore scaling.
When PSNR packets are generated the final filtered frame is needed imediatly
so we cannot delay the sync.

Change-Id: I288d97b5e331d41d6f5bb49d97986fa12ac6f066
2011-09-29 10:06:24 +03:00
John Koleszar
6f9457ec12 Merge "clamp_mvs() using the wrong motion vector information" 2011-09-22 11:54:15 -07:00
John Koleszar
3c85c532bb Merge changes Ie650e9b8,I2427e494
* changes:
  vpxenc: get version string programatically
  Install missing default_coef_probs.h
2011-09-22 11:18:00 -07:00
Johann
9f41a8b0aa Merge "Replace vpx_ports/config.h with vpx_config.h" 2011-09-22 09:30:18 -07:00
John Koleszar
4a6ac727fe Install missing default_coef_probs.h
Make sure that this header is listed as one of the sources, so that it
will be installed if necessary.

Change-Id: I2427e494488126b179151dc21043c1e2c8ba5991
2011-09-22 11:08:24 -04:00
Attila Nagy
1a7d25a484 Replace vpx_ports/config.h with vpx_config.h
Just a clean-up.

Change-Id: Iea5b6dc925dcfa7db548bc1ab1a13d26ed5a2c9a
2011-09-22 13:33:54 +03:00
Fritz Koenig
bd0c3409a8 Move neon only arm functions under arm/neon.
These files don't contain generic arm code, so should
only be compiled by neon.

Change-Id: Ie712823aa04d4235e7cfe7a3b725e73ee4c3e564
2011-09-20 10:51:06 -07:00
Johann
6829e62718 Merge "NEON FDCT updated to match current C code" 2011-09-20 09:51:05 -07:00
Johann
86e07525d5 Merge "NEON walsh transform updated to match C" 2011-09-20 09:50:42 -07:00
Johann
3a16276cf7 Merge "Updated ARMv6 forward transforms to match C" 2011-09-20 09:50:36 -07:00
Johann
fdd51829b1 Merge "Fixed armv5te multiplications" 2011-09-20 09:50:19 -07:00
Tero Rintaluoma
0c2529a812 NEON FDCT updated to match current C code
- Removed fast_fdct4x4_neon and fast_fdct8x4_neon
- Uses now short_fdct4x4 and short_fdct8x4
- Gives ~1-2% speed-up on Cortex-A8/A9

Change-Id: Ib62f2cb2080ae719f8fa1d518a3a5e71278a41ec
2011-09-20 10:20:55 +03:00
Tero Rintaluoma
3c19bc3fb3 Fixed armv5te multiplications
Rd and Rm registers should be different in 'mul'. This register
combination results in unpredictable behaviour. GCC will give
a warning and RVCT an error in this case.

Restriction applies only to armv5 targets and not for armv6 and above.

Change-Id: I378d17c51e1f16a6820814fbed43e115aaabb03e
2011-09-20 09:59:27 +03:00
Stefan Holmer
e529a825f7 Fix necessary for input partitions iface to match the RTP profile
These changes fixes a glitch between the RTP profile and the input
partitions interface. Since there's no way for the user to know the
actual number of partitions, the decoder have to read the
multi_token_paritition bits also when input partitions mode is
enabled.

Included are also a couple of fixes for issues with independent
partitions and uninitialized memory reads.

Change-Id: I6f93b15287d291169ed681898ed3fbcc5dc81837
2011-09-19 15:00:21 +02:00
Tero Rintaluoma
4c3ad66b7f Updated ARMv6 forward transforms to match C
- Updated walsh transform to match C
  (based on Change Id24f3392)
- Changed fast_fdct4x4 and 8x4 to short_fdct4x4 and 8x4
  correspondingly

Change-Id: I704e862f40e315b0a79997633c7bd9c347166a8e
2011-09-19 10:26:59 +03:00
Tero Rintaluoma
2a4b2a000c NEON walsh transform updated to match C
Modified original patch If2f07220885c4c3a0cae0dace34ea0e36124f001
according to comments. Scheduled code a little bit to prevent some
interlocks.

Change-Id: I338f02b881098782f82af63d97f042b85e63e902
2011-09-19 10:15:33 +03:00
John Koleszar
35ce4eb01d Merge "Fixes the boundary checks for extrapolated and interpolated MVs." 2011-09-16 08:09:44 -07:00
Scott LaVarnway
c0ee870b0a clamp_mvs() using the wrong motion vector information
In the "Removed bmi copy to/from BLOCKD" commit, the copy
to the bmi in BLOCKD was eliminated.  The clamp_mvs() used
the bmi in BLOCKD, which now contains incorrect values.  This
patch fixes this problem.

Change-Id: I8eca1eaf4015052b0b63e90876f7ad321aba7cff
2011-09-16 11:03:53 -04:00
Stefan Holmer
b854bbd844 Fixes the boundary checks for extrapolated and interpolated MVs.
Change-Id: I5b47d39d1604f2650d2f2d1ca2a3f40843c8e1ea
2011-09-16 11:58:57 +02:00
Scott LaVarnway
5bc7b3a68e Fixed encoder crash
caused by the "Removed bmi copy to/from BLOCKD" commit.

Change-Id: I9fae71bdc34c8ecc07bb81cd3ccf498b91ce3ec7
2011-09-13 11:46:33 -04:00
Scott LaVarnway
c4b9089bb9 Merge "Skip computation of distortion in vp8_pick_inter_mode if active_map is used" 2011-08-31 07:18:52 -07:00
Scott LaVarnway
222c72e50f Merge "Removed bmi copy to/from BLOCKD" 2011-08-31 06:57:20 -07:00
Alpha Lam
0e05f2c6c9 Skip computation of distortion in vp8_pick_inter_mode if active_map is used
If a block is marked to be inactive then set distortion to 0.

Change-Id: Ib415f19642a2ff7b5cf5cfaedd60ebbd79732272
2011-08-31 14:06:55 +01:00
John Koleszar
800b70a3bf Merge "Recalculate zbin_extra only if regular quantizer is being used" 2011-08-30 12:49:24 -07:00
Alpha Lam
bc9293b815 Recalculate zbin_extra only if regular quantizer is being used
vp8_update_zbin_extra() is called all the time even though the fast
quantizer doesn't use it. Skip this call if fast quantizer is used.

Change-Id: Ia711c38431930cc2486cf59b8466060ef0e9d9db
2011-08-30 19:23:34 +01:00
Yunqing Wang
1f20202e2c Minor modification on key frame decision
This change makes sure that no key frame recoding in real-time mode
even if CONFIG_REALTIME_ONLY is not configured.

Change-Id: Ifc34141f3217a6bb63cc087d78b111fadb35eec2
2011-08-25 16:54:45 -04:00
Fritz Koenig
4797a97215 Quiet warning by removing unused variable.
fwd_boost_score was not being computed or
referenced, so remove declaration.

Change-Id: Iece36cde1ec113e3c6afaff1407d24cdf12bd0a8
2011-08-24 15:47:09 -07:00
Scott LaVarnway
b870947d42 Removed bmi copy to/from BLOCKD
for SPLITMV and B_PRED modes.  Modified code to use the bmi
found in mode_info_context instead of BLOCKD.  On the decode
side, the uvmvs are calculated only when required, instead of
every macroblock.  This is WIP. (bmi should eventually be
removed from BLOCKD)
Small performance gains noticed for RT encodes and decodes.(VGA)

Change-Id: I2ed7f0fd5ca733655df684aa82da575c77a973e7
2011-08-24 14:42:26 -04:00
Fritz Koenig
112bd4e2b4 Fix naming of sse2 idct functions.
Prepend idct function names with vp8_
so that under profiling they show up
associated with libvpx.

Change-Id: I4fe357b50236cb7730a4cc00164c0a3487a1d8b4
2011-08-24 10:25:32 -07:00
Scott LaVarnway
1de5da80c9 Merge "Faster vp8_default_coef_probs" 2011-08-24 07:52:10 -07:00
Johann
85358d04cd Fix data accesses for simple loopfilters
The data that the simple horizontal loopfilter reads is aligned, treat
it accordingly.

For the vertical, we only use the bottom 4 bytes, so don't read in 16
(and incur the penalty for unaligned access).

This shows a small improvement on older processors which have a
significant penalty for unaligned reads.

postproc_mmx.c is unused

Change-Id: I87b29bbc0c3b19ee1ca1de3c4f47332a53087b3d
2011-08-23 20:42:45 -04:00
Fritz Koenig
c5f890af2c Use local labels for jumps/loops in x86 assembly.
Prepend . to local labels in assembly code.  This
allows non unique labels within a file.  Also
makes profiling information more informative
by keeping the function name with the loop name.

Change-Id: I7a983cb3a5ba2413d5dafd0a37936b268fb9e37f
2011-08-23 09:05:29 -07:00
Fritz Koenig
694d4e7777 Reclassify optimized ssim calculations as SSE2.
Calculations were incorrectly classified as either
SSE3 or SSSE3.  Only using SSE2 instructions.
Cleanup function names and make non-RTCD code work
as well.

Change-Id: I48ad0218af0cc51c5078070a08511dee43ecfe09
2011-08-22 12:36:28 -07:00
Fritz Koenig
b7a6f1d20e Merge "Revert "Reclasify optimized ssim calculations as SSE2."" 2011-08-22 12:32:12 -07:00
Fritz Koenig
734b1b2041 Revert "Reclasify optimized ssim calculations as SSE2."
This reverts commit 01376858cd
2011-08-22 11:31:12 -07:00