Channel: Media SDK for Windows*

Video Pre-Processing using Intel® Media SDK


Features / Description

The Intel® Media Software Development Kit (Intel® Media SDK) Video Processing (VPP) Sample demonstrates how to use the Intel® Media SDK API to create a simple console application that performs video processing for raw video sequences.

The Intel® Media SDK VPP Sample supports the following video formats:

Input (uncompressed): YV12*, NV12*, YUY2*, RGB3 (RGB 24-bit), RGB4 (RGB 32-bit)
Output (uncompressed): NV12
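For orientation, the core of such a VPP pipeline in the Intel® Media SDK API looks roughly like the sketch below (a hedged sketch, not the sample's actual code; error handling is omitted and session, VPPParams, pSurfIn, and pSurfOut are placeholders assumed to be set up as in the sample):

    MFXVideoVPP mfxVPP(session);
    sts = mfxVPP.Init(&VPPParams);                                   // describe input/output frame info and filters

    mfxSyncPoint syncp;
    sts = mfxVPP.RunFrameVPPAsync(pSurfIn, pSurfOut, NULL, &syncp);  // submit one raw frame for processing
    if (MFX_ERR_NONE == sts)
        sts = session.SyncOperation(syncp, 60000);                   // wait until the processed frame is ready

    mfxVPP.Close();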

System Requirements

Hardware:

  • An IA-32 or Intel® 64 architecture processor based on the Intel® Core™ processor family or later is required for this Developer's release.
  • 200 MB of free hard disk space is required for this release.
  • The software implementation DLLs, libmfxsw32.dll and libmfxsw64.dll, require a compatible IA-32 or Intel® 64 architecture processor with support for Intel® Streaming SIMD Extensions 2 instructions.

Software:

  • Microsoft* Windows* Vista* with Service Pack 2, or Microsoft* Windows* 7 Operating System
  • Microsoft* Visual* C++ 2005 with Service Pack 1, or a more recent version of this tool

Microsoft DirectShow* filters


    Features / Description

    The Intel® Media Software Development Kit (Intel® Media SDK) Plug-ins Sample using Microsoft* DirectShow* demonstrates how to use the Intel® Media SDK Library with DirectShow filters to compress and decompress video files (streams).

    This sample supports the following input/output formats for compressing/decompressing:

    • H.264 (AVC)
    • VC-1 (WMV)—decoding only
    • MPEG-2 Video
    • MVC—decoding only

    System Requirements

    Hardware:

    • An IA-32 or Intel® 64 architecture processor based on the Intel® Core™ processor family or later is required for this Developer's release.
    • 200 MB of free hard disk space is required for this release.
    • The software implementation DLLs, libmfxsw32.dll and libmfxsw64.dll, require a compatible IA-32 or Intel® 64 architecture processor with support for Intel® Streaming SIMD Extensions 2 instructions.

    Software:

    • Microsoft* Windows* Vista* with Service Pack 2, or Microsoft* Windows* 7 Operating System
    • Microsoft* Visual* C++ 2005 with Service Pack 1
    • For the Microsoft* DirectShow* samples: Microsoft Windows SDK 6.1 or greater
Application Sample using Microsoft* DirectShow


    Features / Description

    The Intel® Media Software Development Kit (Intel® Media SDK) Application Sample using Microsoft* DirectShow* demonstrates how to use the Sample DirectShow filters to play and transcode media files (streams).

    The Intel® Media SDK Application Sample using Microsoft DirectShow supports the following input formats:

    • MPEG-2 Program Stream: MPEG-2 video; MP3 audio
    • MPEG-2 Transport Stream: H.264, MPEG-2 video; AAC, MP3 audio
    • MP4: H.264, MVC video; AAC, MP3 audio
    • WMV: VC-1 video; WMA audio

    The Intel® Media SDK Microsoft DirectShow Application Sample supports the following output formats for transcoding:

    • MPEG-2 Transport Stream: MPEG-2 video; MP3 audio
    • MPEG-2 Transport Stream: H.264 video; AAC audio
    • MP4: H.264 video; AAC, MP3 audio

    System Requirements

    Hardware:

    • An IA-32 or Intel® 64 architecture processor based on the Intel® Core™ processor family or later is required for this Developer's release.
    • 200 MB of free hard disk space is required for this release.
    • The software implementation DLLs, libmfxsw32.dll and libmfxsw64.dll, require a compatible IA-32 or Intel® 64 architecture processor with support for Intel® Streaming SIMD Extensions 2 instructions.

    For S3D playback the following hardware configuration is required:

    • 2nd Generation Intel® Core™ Processors with Intel® HD Graphics 3000/2000
    • HDMI 1.4, eDP 1.1 or similar based monitor/TV
    • Active shutter glasses

    Software:

    • Microsoft* Windows* Vista* with Service Pack 2, or Microsoft* Windows* 7 Operating System
    • Microsoft* Visual* C++ 2005 with Service Pack 1
    • For the Microsoft* DirectShow* samples: Microsoft Windows SDK 6.1 or greater
Video Transcoder using Intel® Media SDK


    Features / Description

    The Intel® Media Software Development Kit (Intel® Media SDK) Multi-Transcoding Sample demonstrates how to use the Intel® Media SDK API to create a console application that performs the transcoding (decoding and encoding) of a video stream from one compressed video format to another, with optional video processing (resizing) of uncompressed video prior to encoding. The application supports multiple input and output streams, meaning it can execute multiple transcoding sessions concurrently.

    The main goal of this sample is to demonstrate CPU/GPU balancing in order to get maximum throughput on Intel hardware-accelerated platforms (with encoding support). This is achieved by running several transcoding pipelines in parallel and fully loading both CPU and GPU.

    This sample also demonstrates the integration of user-defined functions for video processing (a picture rotation plug-in) into the Intel® Media SDK transcoding pipeline.

    This version of the sample also demonstrates surface-type-neutral transcoding (opaque memory usage).

    The Intel® Media SDK Multi-Transcoding Sample supports the following video formats:

    Input (compressed): H.264 (AVC, MVC – Multi-View Coding), MPEG-2 video, VC-1
    Output (compressed): H.264 (AVC, MVC – Multi-View Coding), MPEG-2 video

    System Requirements

    Hardware:

    • An IA-32 or Intel® 64 architecture processor based on the Intel® Core™ processor family or later is required for this Developer's release.
    • 200 MB of free hard disk space is required for this release.
    • The software implementation DLLs, libmfxsw32.dll and libmfxsw64.dll, require a compatible IA-32 or Intel® 64 architecture processor with support for Intel® Streaming SIMD Extensions 2 instructions.

    Software:

    • Microsoft* Windows* Vista* with Service Pack 2, or Microsoft* Windows* 7 Operating System
    • Microsoft* Visual* C++ 2005 with Service Pack 1, or a more recent version of this tool
Application Sample using Microsoft* Multimedia Framework Plug-ins


    Features / Description

    The Intel® Media Software Development Kit (Intel® Media SDK) Application Sample using Microsoft* Multimedia Framework Plug-ins demonstrates how to use the Sample Media Foundation* plug-ins and the Sample DirectShow* plug-ins to play and/or transcode media files (streams).

    The Intel® Media SDK Application Sample using Microsoft Multimedia Framework Plug-ins supports only Microsoft DirectShow* plug-ins when executed within the Windows Vista* Operating System. When using Microsoft Windows 7 Operating System, Intel® Media SDK supports both DirectShow and Media Foundation plug-ins.

    The Intel® Media SDK Application Sample using Microsoft* Multimedia Framework Plug-ins supports the following input formats:

    • MPEG-2 Program Stream: MPEG-2 video; MP3 audio
    • MPEG-2 Transport Stream: H.264, MPEG-2 video; AAC, MP3 audio
    • MP4: H.264 video; AAC, MP3 audio
    • WMV: VC-1 video

    The Intel® Media SDK Application Sample using Microsoft* Multimedia Framework Plug-ins supports the following output formats for transcoding:

    • MP4: H.264 (AVC) video; AAC audio

    System Requirements

    Hardware:

    • An IA-32 or Intel® 64 architecture processor based on the Intel® Core™ processor family or later is required for this Developer's release.
    • 200 MB of free hard disk space is required for this release.
    • The software implementation DLLs, libmfxsw32.dll and libmfxsw64.dll, require a compatible IA-32 or Intel® 64 architecture processor with support for Intel® Streaming SIMD Extensions 2 instructions.

    Software:

Microsoft Media Foundation Transforms (MFT)


    Features / Description

    The Intel® Media Software Development Kit (Intel® Media SDK) Plug-ins Sample using Microsoft* Media Foundation* demonstrates how to use the Intel Media SDK Library with Media Foundation plug-ins to compress and decompress video files (streams).

    This sample supports the following input/output formats for compressing/decompressing:

    • H.264 (AVC)
    • VC-1 (WMV)—decoding only
    • MPEG-2 Video—decoding only

    System Requirements

    Hardware:

    • An IA-32 or Intel® 64 architecture processor based on the Intel® Core™ processor family or later is required for this Developer's release.
    • 200 MB of free hard disk space is required for this release.
    • The software implementation DLLs, libmfxsw32.dll and libmfxsw64.dll, require a compatible IA-32 or Intel® 64 architecture processor with support for Intel® Streaming SIMD Extensions 2 instructions.

    Software:

    • Microsoft* Windows* Vista* with Service Pack 2, or Microsoft* Windows* 7 Operating System
    • Microsoft* Visual* C++ 2005 with Service Pack 1
    • For the Microsoft* Media Foundation* samples: Microsoft Windows SDK for Windows 7
Intel® Media SDK for Servers


    Intel® Media SDK 2014 for Servers is an SDK for optimizing datacenter and embedded media applications on Linux and Windows server operating systems so that they can utilize the hardware acceleration capabilities of Intel® Iris™ Pro and Intel® HD Graphics. It lets you quickly and easily develop optimized media applications (encode, decode, and transcode) for real-time streaming, teleconferencing, and video analytics on Linux and Windows server operating systems.

    Features:

    • For the Intel® Xeon® processor E3-1200 v3 product family and 4th generation Intel® Core™ processor-based platforms with Intel® Iris™ Pro and Intel® HD Graphics
    • Encode, decode, and transcode for server-based streaming
    • Supports Ubuntu*, SUSE* Linux Enterprise, Windows Server 2012* operating systems
    • Supports H.264 (AVC), H.265 (HEVC), MPEG-2, VC-1 formats

    A cross-platform API for developing server media solutions.

    • Streaming Density: Hardware accelerated video server workloads
    • Future Proof: Develop now for today and tomorrow's server platforms
    • Save Time and Money: Spend more time delivering content and solutions

    Named-User License: $499. A free 30-day evaluation version is also available; see the End User License Agreement for terms.

    ooVoo Intel Enabling – HD Video Conferencing


    Downloads


    Download ooVoo Intel Enabling – HD Video Conferencing [PDF 754KB]

    Throughout Q1/Q2 of 2013 Intel and ooVoo collaborated to enable multiple hardware accelerated video conferencing use cases. These include standard person-to-person video conferencing at 720p, multi-party video conferencing (up to 12 participants), and media sharing. To enable this, ooVoo collaborated with Intel to make the most of the wide range of performance options available on systems running both Intel® Atom™ and 4th generation Intel® Core™ processors.

    Materials presented within this paper target Microsoft Windows* 8.x operating systems. Several key Intel resources and technologies were leveraged during the optimization process.

    To provide a complete picture of the optimization process, this paper discusses the overall analysis and testing approach, specific optimizations made, and illustrates before and after quality improvements.

    Overall, focus was placed on ensuring video quality first and foremost. No significant emphasis was placed on power-efficiency optimization at this time. The quality improvements enabled by leveraging Intel hardware offloading capabilities are impressive and can be enjoyed in either person-to-person conferencing or in multi-party conferences for primary speakers.

    Primary Challenge


    Eliminating sporadic corruption occurring during network spikes and on bandwidth-limited connections was by far the biggest challenge faced during the optimization process. Initially, all HD calls experienced persistent corruption.

    Analysis & Testing Approach


    Before pursuing optimization efforts, a test plan and analysis approach was defined to ensure repeatability of results. Several key learnings were observed during our initial testing cycles. These included:

    • Establish a performance baseline – Before beginning to optimize our application, we made sure we had eliminated network inconsistencies and unnecessary traffic / noise, and that lighting and cameras were adequate. With the major variables under control, we were able to confirm that our testing approach and results were repeatable. From this baseline, we could begin the optimization process.
    • Establish minimum network bandwidth (BW) requirements – Poor quality networks lead to poor quality video conferencing. A minimum of 1-1.5 Mbits/sec available for uplink / downlink was required to realize a full HD conference.
    • Camera quality is important – Poor quality cameras lead to poor quality video conferencing. As camera quality degrades, the amount of noise, blocking, and other artifacts increases. This also tends to increase the overall BW required to drive the call.
    • Define, test, and validate the network – To be certain you are not reporting issues created by BW problems on your own network, measure the packet loss, jitter, etc. present on the network before testing begins.

    The following data was collected during each testing cycle to enable triage / investigation of corruption issues affecting video quality. Initial data collected at the beginning of the optimization process showed that even reliable Ethernet connections experience corruption problems despite very low packet loss and jitter on a ~17 MBit connection.

    Test          | Intvl  | Tx Size     | BW             | Jitter   | Lost/Total Datagrams | Out of order | enc fps | dec fps | Quality
    Ethernet VGA  | 10 sec | 20.2 MBytes | 17.0 Mbits/sec | 1.042 ms | 0.00%                | 1            | 15      | 15      | Some corruption and coarse image
                  | 10 sec | 20.2 MBytes | 16.2 Mbits/sec | 1.016 ms | 0.00%                | 1            | 15      | 15      |
    Ethernet 720p | 10 sec | 20.2 MBytes | 16.2 Mbits/sec | 1.016 ms | 0.00%                | 1            | 15.41   | 14.47   | Frequent corruption
                  | 10 sec | 20.2 MBytes | 16.2 Mbits/sec | 1.016 ms | 0.00%                | 1            | 15.22   | 15.01   |


    Key Optimizations


    There were three major optimization phases: phase 1 focused on quality of service (QoS), phase 2 examined the user interface behavior, and phase 3 took a hard look at the rendering pipeline itself.

    • Phase 1 - QoS approach optimization – Resulting in a switch from 1% to 5% acceptable packet loss.
    • Phase 2 – User interface optimization – Resulting in I/O pattern set to output to system memory versus video memory due to the use of CPU-centric rendering APIs and reduction in GDI workload.
    • Phase 3 - Rendering pipeline optimization – Resulting in the elimination of per pixel copies, use of Intel IPP for memory copies where possible, and update to Intel IPP v7.1.

    Phase 1 – QoS Approach Optimization
    Quality of service algorithms seek to ensure that a consistent level of service is provided to end users. During our initial evaluation of the software, it was unclear whether the QoS solution within ooVoo was too aggressive or whether our network environment was too unstable. Initial measurements of network integrity indicated that it was unlikely that the network conditions were causing our issues: jitter was measured at around 1.0-1.8 ms, packet loss was at or near 0%, and very few (if any) packets were being received out of order. All in all, this indicated a potential issue on the QoS side of the application.

    To find the root cause of the issue, it was necessary to perform a low-level analysis on the actual bitstream data being sent / received by the ooVoo application. Our configuration for this process was as follows:

    Direct analysis of the bitstream being encoded on the transmit side and the bitstream reconstructed on the receiver side indicated that frames were clearly being lost somewhere. Further analysis of the encode bitstream showed that all frames could be accounted for on the transmit side of the call; however, the receive side of the call was not seeing the entire bitstream encoded on the transmit side.

    As can be seen from the diagram above, the receiver (decoder) stream is missing frames throughout the entire call. After working closely with the ooVoo development team, it was found that relaxing the QoS to accommodate up to 5% packet loss improved things significantly during point-to-point 720p video calls.

    Phase 2 – User Interface Optimizations
    Today, graphics and media developers have a wide variety of APIs to select from to meet engineering needs. Some APIs offer richer feature sets targeting newer hardware while others offer backwards compatibility. Since backwards compatibility was a key requirement for the ooVoo application, legacy APIs developed by Microsoft Corporation such as GDI and DirectShow* are necessary.

    The following simplified pipeline illustrates the key area (“Draw UI” in green) where optimizations took place during this phase.

    Before diving into the details, a quick word regarding video and system memory is in order. In simple terms, video memory is typically accessible to the GPU while system memory is typically accessible to the CPU. Memory can be copied between video and system memory; however, this comes at a significant performance cost. When working with graphics APIs, it is important to know whether the API you are using is CPU-centric. If it is, then it is critical to set the Media SDK IOPattern to output to system memory; failure to do so when using a CPU-centric rendering API may lead to very poor performance. In cases where APIs such as GDI are used to operate on the surface data provided by the Media SDK, operations that require surface locking are the most costly.
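    As a hedged sketch (not ooVoo's actual code), requesting system-memory output from a Media SDK decoder so that a CPU-centric API such as GDI can operate on the frames looks roughly like this; session, mfxDEC, and decParams are placeholder names:

    MFXVideoDECODE mfxDEC(session);                              // session assumed to be an initialized MFXVideoSession
    mfxVideoParam decParams;
    memset(&decParams, 0, sizeof(decParams));
    decParams.mfx.CodecId = MFX_CODEC_AVC;
    decParams.IOPattern   = MFX_IOPATTERN_OUT_SYSTEM_MEMORY;     // CPU-centric rendering path (e.g., GDI)
    // With a GPU-centric renderer (e.g., Direct3D), MFX_IOPATTERN_OUT_VIDEO_MEMORY is the better choice.
    mfxStatus sts = mfxDEC.Init(&decParams);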

    In the case of the ooVoo client application, it was observed that fullscreen rendering required significantly more processing power than when running in windowed mode. This puts us squarely in the case of needing to account for a CPU-centric API in our rendering pipeline.

    A detailed look at the overall workload when in fullscreen mode illustrates the following GDI activity (see yellow). Measurements below were made on an Ivy Bridge platform with a total of 4 cores yielding 400% total processing power.

    Continuing the investigation, it became clear that there was a significant difference in how the ooVoo application handled windowed versus fullscreen display modes. Note that the GDI workload virtually disappears in windowed mode.

    The Intel® VTune™ analyzer was used to identify the area of code where the GDI workload was being introduced. After discussing the issue with the ooVoo team, it became clear that this was unexpected behavior; the ooVoo team found that the application was calling GDI too frequently during fullscreen rendering. The solution was to limit the number of GDI calls made per frame when in fullscreen mode. Despite the simple nature of this change, significant improvements were observed across the board:

    GDI workload reduction impact and observations:

    • Limited rendering via legacy APIs such as GDI+ is possible for video conferencing applications if resources are already available in system memory and very few GDI+ calls are made per frame.
    • Reducing the GDI+ call frequency within the ooVoo application virtually eliminated all GDI overhead.
    • Overall application CPU utilization for ooVoo went down by ~4-5%. Within the app, the share of time spent on GDI+ work dropped from 10% to 0.2%.
    • The overall workload associated with the ooVoo application is more organized and predictable, with fewer CPU spikes due to less surface locking by GDI.
    • System-wide CPU reduction of ~20%.

    The following diagram illustrates the ooVoo application workload and related GDI effort after our optimizations:

    Final measurements post optimization follow:

    Metric              | Previous Build   | Latest Build       | Delta / Improvement
    System CPU Peaks    | ~200%            | ~175%              | Reduced ~25%
    ooVoo.exe CPU Peaks | ~40%             | ~37%               | Reduced 3%
    GDI+ Workload       | 10% of ooVoo.exe | 0.12% of ooVoo.exe | Reduced 10%
    Total System CPU    | 159% of 400%     | 137% of 400%       | Reduced ~22%
    ooVoo Total CPU     | 114% of 400%     | 110% of 400%       | Reduced ~4-5%


    Phase 3 – Rendering Pipeline Optimizations
    Our final step in the optimization process was to take a hard look at the backend rendering pipeline for any un-optimized copies or pixel format conversions that might be affecting performance. Three key things to watch out for include:

    • Per-pixel copies – A copy operation executed serially for each pixel. For this type of operation it is always best to leverage Intel IPP; the Intel IPP package comes with copy operations optimized for Intel hardware (see the sketch after this list).
    • Copies across the video / system memory boundary – Instead of copying Media SDK frame data from video to system memory yourself, it is more effective to let the Media SDK stream to system memory for you.
    • Fourcc conversions – Fourcc color conversions are always expensive. If possible, get your data in the format you need and stay there. If converting between the YUV and RGB colorspaces, you can use either Intel IPP or pixel shaders to speed things up.
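    A hedged illustration of the first point, replacing a per-pixel loop with an Intel IPP ROI copy for one 8-bit plane (the function and variable names here are placeholders, not ooVoo's code):

    #include <ipp.h>

    void CopyPlane(const Ipp8u* pSrc, int srcStep, Ipp8u* pDst, int dstStep, int width, int height)
    {
        // Naive per-pixel copy (the pattern to avoid):
        // for (int y = 0; y < height; y++)
        //     for (int x = 0; x < width; x++)
        //         pDst[y * dstStep + x] = pSrc[y * srcStep + x];

        IppiSize roi = { width, height };
        ippiCopy_8u_C1R(pSrc, srcStep, pDst, dstStep, roi);    // single SIMD-optimized copy of the whole ROI
    }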

    Early on in the process of profiling the ooVoo application, it was clear that memory copies were affecting performance; however, it was not clear what opportunities existed to address the issue. The ooVoo team performed a detailed code review and found cases where Intel IPP copy operations were not being used, places where per pixel copies were used, and ultimately upgraded to Intel IPP v7.1 to benefit from the latest updates.

    The results were impressive, giving us our first look at video conferencing at 720p on both Intel Core and Intel Atom platforms. The following before/after shots illustrate the improvements.

    Point-to-Point
    Note the elimination of blocky corruption in the facial area:

    Configuration: Point to point, 720p, 15 fps, 1-1.5 MBits/sec, IVB:IVB, 4G network

    Multi-Party
    Note the level of detail enabled for primary speaker during multi-party conference.

    Configuration: Multi-party via ooVoo server, 4 callers + YouTube, 15 fps, IVB


    Intel, the Intel logo, Atom, Core, and VTune are trademarks of Intel Corporation in the U.S. and/or other countries.
    Copyright © 2013 Intel Corporation. All rights reserved.
    *Other names and brands may be claimed as the property of others.


OpenCL™ Technology and Intel® Media SDK Interoperability


    Download Code Sample | Download Documentation

    Features / Description

    The Intel® Media SDK Interoperability sample demonstrates how to use Intel® Media SDK and Intel® SDK for OpenCL™ Applications together for efficient video decoding and fast post-processing.

    The sample demonstrates the Intel® Media SDK pipeline combined with post-processing filters written in OpenCL, showing how to:

    • Integrate Intel® SDK for OpenCL Applications processing into an Intel® Media SDK pipeline and benefit from hardware-accelerated (if available) video decoding with the Intel® Media SDK
    • Organize efficient sharing between Intel® Media SDK frames and OpenCL images using the cl_khr_dx9_media_sharing extension (see the sketch after this list)
    • Implement simple video effects in OpenCL
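    As a rough sketch of that sharing step under assumptions (clContext, clQueue, platform, and d3dSurface are placeholders; the sample's own code may differ), the cl_khr_dx9_media_sharing entry points are queried at run time and used to wrap and acquire the D3D9 surface that backs a Media SDK frame:

    #include <CL/cl.h>
    #include <CL/cl_dx9_media_sharing.h>

    // Extension entry points must be obtained at run time.
    clCreateFromDX9MediaSurfaceKHR_fn pCreateFromDX9MediaSurfaceKHR =
        (clCreateFromDX9MediaSurfaceKHR_fn)
        clGetExtensionFunctionAddressForPlatform(platform, "clCreateFromDX9MediaSurfaceKHR");
    clEnqueueAcquireDX9MediaSurfacesKHR_fn pAcquireDX9MediaSurfacesKHR =
        (clEnqueueAcquireDX9MediaSurfacesKHR_fn)
        clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueAcquireDX9MediaSurfacesKHR");
    clEnqueueReleaseDX9MediaSurfacesKHR_fn pReleaseDX9MediaSurfacesKHR =
        (clEnqueueReleaseDX9MediaSurfacesKHR_fn)
        clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueReleaseDX9MediaSurfacesKHR");

    // Wrap the D3D9 surface behind a Media SDK frame as an OpenCL image (plane 0 = Y plane of NV12).
    cl_dx9_surface_info_khr surfInfo = { d3dSurface, NULL };
    cl_int err;
    cl_mem clImage = pCreateFromDX9MediaSurfaceKHR(clContext, CL_MEM_READ_ONLY,
                                                   CL_ADAPTER_D3D9EX_KHR, &surfInfo, 0, &err);

    // Acquire before running kernels on it, release afterwards so D3D/Media SDK can reuse the surface.
    pAcquireDX9MediaSurfacesKHR(clQueue, 1, &clImage, 0, NULL, NULL);
    // ... enqueue OpenCL post-processing kernels that read clImage ...
    pReleaseDX9MediaSurfacesKHR(clQueue, 1, &clImage, 0, NULL, NULL);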

    Supported Devices: Intel® Processor Graphics
    Supported OS: Windows* OS
    Complexity Level: Advanced

    Refer to the sample release notes for information on system requirements.
    For more information about the sample refer to the sample User's Guide inside the sample package.

    * OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

Intel® Media SDK 2014 Professional Camera Pack – Coming Soon!


    A new accelerator addition to the Intel® Media SDK product family for Professional usage is coming soon! The Intel® Media Software Development Kit (Intel® Media SDK) Professional Camera Pack enables high-quality, high-performance GPU-accelerated camera raw processing modules for RAW photo application developers and emerging 4K RAW video application developers.

    The beta professional camera pack, which will be available soon, contains a library of plug-ins that expose GPU-accelerated RAW image processing capabilities of Intel® platforms. Intel® Iris Pro Graphics is recommended for 4K RAW processing.

    To be notified once the Intel® Media SDK 2014 Professional Camera Pack beta is released, please submit your email address below. In the meantime, check out our other Intel® Media SDK products.


    Framework for developing applications using Media SDK



    This article is geared toward developers who are new to the Media SDK. Oftentimes, we use the existing samples and tutorials to understand how to develop applications with the Media SDK and modify these samples/tutorials to plug in our own code. Sometimes it works. But when it is time to add more features to the code or to optimize it, we have to invest a lot more time understanding and re-writing our code. My belief is that if we understand the basic steps required to develop an application with the Media SDK, the actual time to code collapses to simply understanding and using select APIs, not to coding and debugging the setup part. Enough motivation, let's begin!

    Media SDK is a framework that enables developing media applications by providing APIs for ease of development. These APIs are optimized for the underlying hardware/accelerators and provide good abstractions for most of the heavy-duty media algorithm implementations. So, as developers, we need to understand the sequence in which these APIs should be called to set up the media pipeline and say go. Much of this article focuses on that sequence, and it also provides some detail on the different options available to the developer to add more features. Shown below is the basic structure of a Media SDK application; we will discuss each of these stages in more detail.

    Initialize Session -> Set Parameters -> Query -> Allocate Surfaces -> Find Free Surface -> Processing Loop -> Drain and Cleanup

    Initialize Session

    The first part of the set-up stage is initializing the Media SDK session, within which the media pipeline we will define will execute. Sessions use the dispatcher to map function calls to their DLL implementation.

    MFXVideoSession session;
    mfxIMPL impl = MFX_IMPL_AUTO_ANY;       // software, hardware, or best available implementation
    mfxVersion ver = {{0, 1}};              // request API version 1.0 (any 1.x implementation)
    sts = session.Init(impl, &ver);         // wraps MFXInit(impl, &ver, &session)

    The MFXInit() function initializes the media session for the implementation specified and for the version available.

    "mfxIMPL impl" -> Use software, or hardware or best available implementation. We recommend using MFX_IMPL_HARDWARE, or MFX_IMPL_AUTO if you are unsure of the underlying driver support.

    "mfxVersion ver" -> If you specify 0, it uses the API version from the SDK release with which an application is built. Alternately, you can use MFXQueryVersion function to query for the version.

    Set Parameters

    In this stage, we will specify all the video parameters required for the decode/encode/vpp processing. For each pipeline stage (decode/encode/vpp) in your application, we will populate the mfxVideoParam structure with the appropriate values. We will illustrate this with the following example use-case:

    Use-case: Input YUV stream -> Pre-processing using VPP -> Encode to H264-> Decode the h264 stream
    Input Resolution: 352x288
    Output Resolution: 176x144
    Required bitrate for encode: 3000 Kbps, Constant Bitrate
    Frame rate: 30 fps
    Pre-processing: Resize and Denoise filter with window size of 5
    Video formats: YUV420p and H264

    The fields that must be filled in are mandated by the use-case we defined above.

    In the code snippet below, we show it for generic stage and will point out specifics:

    mfxVideoParam Params;    // Require one each for decode/encode/vpp
    memset(&Params, 0, sizeof(Params));

    For encode/decode/transcode, the mfxVideoParam::mfxInfoMFX structure needs to be filled in, as it contains variables that define the video properties (for input and output), and variables to control the properties of the encode/decode process. For VPP, the mfxVideoParam::mfxInfoVPP is the main structure that specifies the properties of the input and output VPP frames. Note that the properties specified for the output frames are queried later for validity and support.
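    For the encode stage of the use-case above, a minimal parameter fill might look like the following sketch (the variable names and the MSDK_ALIGN16 helper macro come from the Media SDK samples and are assumptions, not part of this article's own code):

    mfxVideoParam EncParams;
    memset(&EncParams, 0, sizeof(EncParams));
    EncParams.mfx.CodecId                 = MFX_CODEC_AVC;            // H.264 output
    EncParams.mfx.TargetUsage             = MFX_TARGETUSAGE_BALANCED;
    EncParams.mfx.RateControlMethod       = MFX_RATECONTROL_CBR;      // constant bitrate
    EncParams.mfx.TargetKbps              = 3000;                     // 3000 Kbps
    EncParams.mfx.FrameInfo.FrameRateExtN = 30;                       // 30 fps
    EncParams.mfx.FrameInfo.FrameRateExtD = 1;
    EncParams.mfx.FrameInfo.FourCC        = MFX_FOURCC_NV12;
    EncParams.mfx.FrameInfo.ChromaFormat  = MFX_CHROMAFORMAT_YUV420;
    EncParams.mfx.FrameInfo.PicStruct     = MFX_PICSTRUCT_PROGRESSIVE;
    EncParams.mfx.FrameInfo.CropW         = 176;                      // output resolution after VPP resize
    EncParams.mfx.FrameInfo.CropH         = 144;
    EncParams.mfx.FrameInfo.Width         = MSDK_ALIGN16(176);        // surface width must be 16-aligned
    EncParams.mfx.FrameInfo.Height        = MSDK_ALIGN16(144);
    EncParams.IOPattern                   = MFX_IOPATTERN_IN_VIDEO_MEMORY;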
        
    NOTE: For decoder, you can call the function DecodeHeader that parses the input bit stream and fills the mfxVideoParam structure with appropriate values, such as resolution and frame rate. The application can then pass the resulting mfxVideoParam structure to the MFXVideoDECODE_Init function for decoder initialization. You can call this function at any time before or after decoder initialization.

    sts = mfxDEC.DecodeHeader(&mfxBS, &Params);
    MSDK_IGNORE_MFX_STS(sts, MFX_WRN_PARTIAL_ACCELERATION);    // Ignore this warning; it indicates the SW implementation will be used instead of the HW implementation.
    MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);

    Apart from these, some common properties for encode/decode/transcode/VPP in the mfxVideoParam structure are IOPattern, AsyncDepth, NumExtParam, and ExtParam. IOPattern is a mandated field for decode, encode, and VPP; it itemizes the memory access pattern for SDK functions, i.e., whether the input and output surfaces live in video memory, system memory, or opaque memory. When using the hardware implementation, video memory is suggested. The AsyncDepth parameter specifies how many asynchronous operations can be performed before the application explicitly synchronizes.
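    As a small sketch (using the Params variable declared earlier), these two fields might be set like this for a hardware pipeline:

    Params.IOPattern  = MFX_IOPATTERN_IN_VIDEO_MEMORY | MFX_IOPATTERN_OUT_VIDEO_MEMORY;
    Params.AsyncDepth = 4;    // allow up to 4 operations in flight before SyncOperation is required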

    External Buffers for added features

    The NumExtParam and ExtParam parameters specify the number of extra configuration structures attached and the pointer to these configurations, respectively. For example, in the use-case above, we want to resize and denoise the input stream before we encode. We use external buffers to specify these operations and attach them to the video parameter structure for processing. The ExtendedBufferID enumerator lists the possible operations that can be attached to this structure. The simple_4_vpp_resize_denoise_vmem tutorial gives an example of using this structure for the denoising operation.

        // Initialize extended buffer for frame processing
        // - Denoise           VPP denoise filter
        // - mfxExtVPPDoUse:   Define the processing algorithm to be used
        // - mfxExtVPPDenoise: Denoise configuration
        // - mfxExtBuffer:     Add extended buffers to VPP parameter configuration
        mfxExtVPPDoUse extDoUse;
        memset(&extDoUse, 0, sizeof(extDoUse));
        mfxU32 tabDoUseAlg[1];
        extDoUse.Header.BufferId = MFX_EXTBUFF_VPP_DOUSE;
        extDoUse.Header.BufferSz = sizeof(mfxExtVPPDoUse);
        extDoUse.NumAlg = 1;
        extDoUse.AlgList = tabDoUseAlg;
        tabDoUseAlg[0] = MFX_EXTBUFF_VPP_DENOISE;
    
        mfxExtVPPDenoise denoiseConfig;
        memset(&denoiseConfig, 0, sizeof(denoiseConfig));
        denoiseConfig.Header.BufferId = MFX_EXTBUFF_VPP_DENOISE;
        denoiseConfig.Header.BufferSz = sizeof(mfxExtVPPDenoise);
        denoiseConfig.DenoiseFactor = 5;        // can be 1-100
    
        mfxExtBuffer* ExtBuffer[2];
        ExtBuffer[0] = (mfxExtBuffer*) &extDoUse;
        ExtBuffer[1] = (mfxExtBuffer*) &denoiseConfig;
        VPPParams.NumExtParam = 2;
        VPPParams.ExtParam = (mfxExtBuffer**) &ExtBuffer[0];
    
        // Initialize Media SDK VPP
        sts = mfxVPP.Init(&VPPParams);

    With this, we have finished setting the parameters for the media pipeline. You can find more details about the control options and properties in the documentation for Media SDK. Until now, the developer has specified what he wants to achieve with the media pipeline. The SDK may or may not support all the transformations, thus, querying is an important and often ignored step that should follow the set-up.

    Query

    Once the video parameters are specified in the mfxVideoParam structure, you can Query to check for the validity and SDK support of these parameters. The Query functionality provided by Media SDK is powerful and very useful. 

    The Query functions can be used to check: the implementation supported by the platform - software or hardware (MFXQueryIMPL); the version of the SDK (MFXQueryVersion); whether the required transformations are supported by the SDK and, if not, the closest configuration that can be achieved (Encode_Query, Decode_Query, VPP_Query); and the number of surfaces required for the transformations (Encode_QueryIOSurf, Decode_QueryIOSurf, VPP_QueryIOSurf).

    MFXQueryIMPL(session, &impl);       // returns the actual implementation of the session
    MFXQueryVersion(session, &version); // returns the version of the SDK implementation
    sts = mfxVPP.Query(&Params_in, &Params_out);
    // *Params_in points to the requested features, *Params_out to the features that can best be achieved.
    // If sts is MFX_ERR_NONE, the requested features can be met. If the status returns warnings for
    // incompatible video parameters, *Params_out is filled with the best achievable parameters.

    Now that we have initialized the session, set the parameters and queried their support, we have to allocate the surface buffers that will be used by the SDK pipeline.

    Allocate Surfaces

    Once we have set the video parameters, our next step is to allocate surfaces for these operations. The QueryIOSurf function returns the minimum and suggested numbers of frame surfaces required for encoding/decoding/VPP initialization and their type. This number depends on multiple parameters such as number of asynchronous operations before synchronization, number of operations in the pipeline stage and buffer sizes. As mentioned above, AsyncDepth is a good parameter to constrain the number of surfaces.

    mfxFrameAllocRequest Request;
    memset(&Request, 0, sizeof(Request));
    sts = mfxDEC.QueryIOSurf(&Params, &Request);    // decoder shown; encode and VPP have equivalent calls
    mfxU16 numSurfaces = Request.NumFrameSuggested;

    Now, we proceed with allocation of the surface buffers for the suggested number of surfaces.

    Allocation of surfaces is very important for performance: the SDK's best performance comes from using the underlying hardware. Allocating the buffers the SDK operates on in video memory is therefore highly desirable, since this eliminates copying them between system memory and video memory. During initialization, set IOPattern to use video memory for input and output, and during allocation use the SDK-provided allocator functions instead of memset() or new(), which would place the buffers in system memory.

    /* Using system memory for allocation - NOT RECOMMENDED, since it lowers performance
    mfxU32 surfaceSize = width * height * bitsPerPixel / 8;
    mfxU8* surfaceBuffers = (mfxU8*) new mfxU8[surfaceSize * numSurfaces];
    */

    /* Using video memory for allocation - RECOMMENDED */
    mfxFrameAllocResponse DecResponse;
    sts = mfxAllocator.Alloc(mfxAllocator.pthis, &Request, &DecResponse);
    mfxFrameSurface1** pmfxSurfaces = new mfxFrameSurface1*[numSurfaces];
    MSDK_CHECK_POINTER(pmfxSurfaces, MFX_ERR_MEMORY_ALLOC);
    for (int i = 0; i < numSurfaces; i++) {
        pmfxSurfaces[i] = new mfxFrameSurface1;
        memset(pmfxSurfaces[i], 0, sizeof(mfxFrameSurface1));
        memcpy(&(pmfxSurfaces[i]->Info), &(Params.mfx.FrameInfo), sizeof(mfxFrameInfo));
        pmfxSurfaces[i]->Data.MemId = DecResponse.mids[i];   // MID (memory ID) represents one D3D NV12 surface
    }

    After the surface allocation, we can now proceed with initializing the encoder/decoder/VPP. NOTE about opaque surfaces: opaque surfaces, as the name suggests, are managed by the SDK and are not visible to or controllable by the developer. These surfaces are best used when the functionality required is basic and concrete (like decoding or encoding) and will not be expanded upon later. As a rule of thumb, we recommend using the hardware implementation with video surfaces.

    Find Free Surface

    With the allocation of surfaces, the set-up part is complete. Yes, it looks tedious, but once you get the hang of it, it is quite intuitive: start the dialogue (initialize session), tell the SDK what you want (set parameters), ask if the SDK can do it (query), and allocate resources (allocate surfaces) to start doing it! In the following sections, we will touch upon how to run the processing stage and clean up afterwards.

    The SDK uses the surfaces allocated and initialized above to do the processing. To begin processing, use the GetFreeSurface() function to get input and output surfaces that are free (not locked by another process) and can be used for processing. This functionality is as simple as finding an unlocked surface to use.

    nIndex = GetFreeSurfaceIndex(pmfxSurfaces, numSurfaces);        // Find free frame surface

    After finding the free surface, read the bit stream (for decoder), or the frame (for encoder) into the surface and pass the surface for processing.
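    For reference, a helper like GetFreeSurfaceIndex is usually just a scan of the surface pool for a surface the SDK is not holding; a minimal sketch (the tutorials' exact version may differ):

    int GetFreeSurfaceIndex(mfxFrameSurface1** pSurfacesPool, mfxU16 nPoolSize)
    {
        for (mfxU16 i = 0; i < nPoolSize; i++)
            if (0 == pSurfacesPool[i]->Data.Locked)
                return i;                    // this surface is free for the application to use
        return MFX_ERR_NOT_FOUND;            // every surface is still locked by the SDK
    }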

    Processing Loop

    The Media SDK provides the developer with synchronous and asynchronous function calls for the processing - be it encode, decode or VPP. As the name suggests, if the function call is synchronous, it is required to wait on sync after every processing function call. In essence, you cannot fire multiple frame processing in parallel.

    As a rule of thumb, we recommend the use of asynchronous calls to process frames. In a loop that terminates when the input is empty, fire asynchronous functions for either decode/encode/VPP. You can specify the number of asynchronous operations you'd like to perform before synchronizing using the AsyncDepth parameter. This way, you are processing multiple frames in parallel thus improving the performance significantly.

    while (MFX_ERR_NONE <= sts || MFX_ERR_MORE_DATA == sts || MFX_ERR_MORE_SURFACE == sts)
    {
        /* Asynchronous call handling */
        nTaskIdx = GetFreeTaskIndex(pTasks, taskPoolSize);              // Find a free task (task pool size == AsyncDepth)
        if (MFX_ERR_NOT_FOUND == nTaskIdx) {
            // No free task: synchronize the oldest submitted task first
            sts = session.SyncOperation(pTasks[nFirstSyncTask].syncp, 60000);   // Wait until the processed frame is ready
            pTasks[nFirstSyncTask].syncp = NULL;
            nFirstSyncTask = (nFirstSyncTask + 1) % taskPoolSize;
        }

        nIndex = GetFreeSurfaceIndex(pmfxSurfaces, numSurfaces);        // Find a free frame surface
        sts = {Decode|Encode|VPP}FrameAsync(...);                       // Submit one asynchronous operation

        /* Synchronous call handling */
        if (MFX_ERR_NONE == sts)
            sts = session.SyncOperation(syncp, 60000);                  // Wait until the processed frame is ready
    }

    Drain and Cleanup

    This stage is similar to the previous Processing Loop, except that the input parameter to the FrameAsync function is NULL, since we are draining the pipeline at this stage. In pseudo-code, all you need to do is replace the first parameter of the FrameAsync call with NULL in the while loop above. Once the pipeline draining is done, we deallocate all the buffers used and close any open file handles.
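    For the decoder, for example, the drain loop might look like this sketch (variable names follow the earlier snippets and are assumptions):

    while (MFX_ERR_NONE <= sts || MFX_ERR_MORE_SURFACE == sts)
    {
        nIndex = GetFreeSurfaceIndex(pmfxSurfaces, numSurfaces);
        sts = mfxDEC.DecodeFrameAsync(NULL, pmfxSurfaces[nIndex], &pmfxOutSurface, &syncp);  // NULL bitstream = drain
        if (MFX_ERR_NONE == sts)
            sts = session.SyncOperation(syncp, 60000);   // wait for each drained frame
    }
    // MFX_ERR_MORE_DATA here means there are no more buffered frames: draining is complete.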

    With this we conclude the basic steps for developing using Media SDK. Hope this was helpful. I will add more information as and when I find it.


    Texture Sharing Between the Intel Media SDK and OpenGL


    Code Sample

    Overview

    On Windows*, video processing typically uses Direct3D. Many applications, however, rely on OpenGL* to keep the GUI and look-and-feel consistent across platforms. Recent Intel graphics drivers support the NV_DX_interop extension, which enables surface sharing between D3D and OpenGL for use with the Intel® Media SDK. The Intel® Media SDK can be configured to use Direct3D, and thanks to NV_DX_interop the Media SDK frame buffer can be consumed directly by OpenGL. This removes the need for expensive copies of textures from the GPU to the CPU and back for processing. This code sample and white paper walk through configuring the Intel® Media SDK to use D3D for video encoding and decoding, converting colors from NV12 (the Media SDK's native color format) to RGBA (OpenGL's native color format), and mapping the D3D surface onto an OpenGL texture. The procedure has no GPU-to-CPU texture copy step, which has always been the main difficulty when using OpenGL with the Intel® Media SDK.

    System Requirements

    The code sample was written with Visual Studio* 2013. It is intended to demonstrate (1) how Miracast works and (2) texture sharing between the Intel® Media SDK and OpenGL. During sharing, decoded Intel® Media SDK surfaces are mapped to OpenGL textures without any copying, which significantly improves efficiency. On Haswell and later processors, the MJPEG decoder is hardware accelerated; on earlier processors the Media SDK automatically falls back to the software decoder. A camera with MJPEG support is required (either a built-in camera or one connected over USB).
    Most of the procedures used in the code sample and white paper also apply to Visual Studio 2012 (except for identifying the Miracast connection type). The code sample is based on Intel® Media SDK 2014 for Clients, which can be downloaded from https://software.intel.com/sites/default/files/MediaSDK2014Clients.zip. Installing the SDK creates a set of environment variables so that Visual Studio can find the correct paths to the header files and libraries.

    Application Overview

    The application detects the camera as an MJPEG input device, decodes that video, encodes the stream to H264, and finally decodes it again to complete the pipeline. Both the camera's MJPEG video stream (after decoding) and the fully processed stream are displayed in an MFC-based GUI. On Haswell systems, the two decoders and one encoder are run sequentially (at 1080p) for readability; thanks to hardware acceleration this does not take much time, and the only limit on the frame rate is the speed of the camera. In a real-world application the encoders and decoders would run on separate threads, so performance should not be a problem.

    With a single monitor, the OpenGL-based GUI shows the camera stream picture-in-picture on top of the processed video (Figure 1). With Miracast, the program automatically detects the Miracast-capable monitor and displays the processed video full screen on it, while the main GUI shows the raw camera video. This mode makes it easy to compare the original and encoded video. The View -> Monitor Topology menu shows the current monitor topology and lets you change it. Unfortunately, a Miracast connection cannot be started from this menu; it can only be started from the OS charms menu (charms on the right -> Devices -> Project). There is currently no API for initiating a Miracast connection. A Miracast monitor can, however, be disconnected by changing the monitor topology to "internal only". With multiple wired monitors, the topology can be changed from this menu at any time.

    Figure 1. Single-monitor topology. The MJPEG camera video is shown in the lower-right corner, and the processed video is shown full screen. When a multi-monitor mode is enabled (for example, Miracast), the program detects the change and the MJPEG camera video and the processed video are automatically sent to different monitors.

    Main Entry Point for Pipeline Setup

    The code sample is based on MFC. The main entry point for setting up the pipeline is CChildView::OnCreate(). It initializes the camera, then the MJPEG-to-H264 transcoder, then the H264 decoder, and finally binds the textures from the transcoder and decoder in the OpenGL renderer. The transcoder is a subclass of the decoder that adds an encoder on top of the base decoder. OnCreate starts a thread that drives and sequences the camera streaming. When a frame is read from the stream, the worker thread posts a message to OnCamRead, which decodes the MJPEG video, encodes it to H264, decodes that, and updates the textures in the OpenGL renderer. At the top level the whole flow is transparent and simple.

    Decoder/Transcoder Initialization

    The decoder and transcoder must be initialized to use D3D9Ex. The Intel® Media SDK can be configured to use the software path, D3D9, or D3D11; this sample uses D3D9 to simplify the color conversion. The Intel® Media SDK's native color format is NV12. The color conversion to RGBA can be done with either IDirect3DDevice9::StretchRect or IDirectXVideoProcessor::VideoProcessBlt. For simplicity this paper uses StretchRect, but in practice VideoProcessBlt is recommended because it offers additional post-processing capabilities. Unfortunately, D3D11 does not support StretchRect, which can make the color conversion more complicated. Also, for experimentation (such as mixing different software and hardware paths), this paper uses separate D3D devices for the decoder and the transcoder; to save memory, a single D3D device could be shared by both. With this pipeline configuration, the decoder output is of type (mfxFrameSurface1 *), which is a wrapper for D3D9: mfxFrameSurface1->Data.MemId can be cast to (IDirect3DSurface9 *). After decoding, this surface can be used with StretchRect or VideoProcessBlt in CDecodeD3d9::ColorConvert. The Media SDK surfaces themselves cannot be shared, but the color conversion is required for OpenGL anyway, so shared surfaces are created to hold the conversion output.
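    As an illustrative sketch under assumptions (pD3DDevice, pSharedRgbaSurface, and pmfxOutSurface are placeholder names; the sample's CDecodeD3d9::ColorConvert may differ), the conversion step amounts to:

    // The decoded Media SDK frame wraps a D3D9 surface; convert/scale it into a shareable RGBA surface.
    IDirect3DSurface9* pDecoded = (IDirect3DSurface9*)pmfxOutSurface->Data.MemId;
    HRESULT hr = pD3DDevice->StretchRect(pDecoded,           NULL,    // source: NV12 decode output
                                         pSharedRgbaSurface, NULL,    // destination: shared RGBA surface
                                         D3DTEXF_LINEAR);             // filtering applied during the blit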

    Transcoder Initialization

    Data decoded by the transcoder is fed directly into the encoder. Make sure MFX_MEMTYPE_FROM_DECODE is used when allocating the surfaces.

    Binding Textures Between D3D and OpenGL

    The texture binding code is in CRenderOpenGL::BindTexture. Check that the WGLEW_NV_DX_interop extension is defined, then call wglDXOpenDeviceNV, wglDXSetResourceShareHandleNV, and wglDXRegisterObjectNV in sequence; this binds the D3D surface to an OpenGL texture. The textures are not updated automatically: they are refreshed by calling wglDXLockObjectsNV/wglDXUnlockObjectsNV (see CRenderOpenGL::UpdateCamTexture and CRenderOpenGL::UpdateDecoderTexture). Once updated, the texture can be used like any other texture in OpenGL.
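    A condensed sketch of that sequence (handle and variable names are placeholders; see CRenderOpenGL::BindTexture for the sample's actual code):

    HANDLE hGLDevice = wglDXOpenDeviceNV(pD3DDevice);                    // open the D3D9 device for interop
    wglDXSetResourceShareHandleNV(pSharedRgbaSurface, sharedHandle);     // hand the D3D share handle to GL

    GLuint glTex;
    glGenTextures(1, &glTex);
    HANDLE hGLTexture = wglDXRegisterObjectNV(hGLDevice, pSharedRgbaSurface,
                                              glTex, GL_TEXTURE_2D,
                                              WGL_ACCESS_READ_ONLY_NV);  // bind the D3D surface to the GL texture

    // The texture is refreshed by locking/unlocking it around OpenGL usage.
    wglDXLockObjectsNV(hGLDevice, 1, &hGLTexture);
    // ... render with glBindTexture(GL_TEXTURE_2D, glTex) ...
    wglDXUnlockObjectsNV(hGLDevice, 1, &hGLTexture);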

    Considerations When Changing the Multi-Monitor Topology

    It might seem easy to put another window on an external monitor and drive it by detecting topology changes. In practice, the OS can take some time to initiate the mode switch, finish configuring the monitor, and display the content. Given the encoder/decoder/D3D/OpenGL components and everything that goes with them, debugging this process can be quite difficult. The code sample reuses most of the pipeline when switching modes, but a simpler option may be to tear the pipeline down and reinitialize it, because if adding a monitor takes more than about 10 seconds, various problems can occur, even with HDMI or VGA cable connections.

    Future Work

    The code sample for this paper was written for D3D9 and does not include D3D11 support. It is not yet clear what the most efficient way to convert NV12 to RGBA is when StretchRect and VideoProcessBlt are not available. The paper and code sample will be updated once the D3D11 question is resolved.

    Acknowledgements

    Thanks to Petter Larsson, Michel Jeronimo, Thomas Eaton, and Piotr Bialecki for their help in creating this document.



     

    Intel, the Intel logo, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.
    *Other names and brands may be claimed as the property of others.
    Copyright © 2013 Intel Corporation. All rights reserved.


    Query Functionality in Media SDK


    Intel Media SDK is a framework for developing media applications. It provides hardware implementations of some of the foundational media algorithms (encode, decode, and video processing) and also exposes a number of relevant parameters that can be used to tune the application to one's needs. The hardware implementation is the recommended one to use, but sometimes the implementation can default to software due to a lack of underlying hardware support. To leverage the full potential of the SDK, the developer has to ensure the features he wants (specified through the parameters) are supported by the underlying hardware.

    If you would like the application to be portable and easily deployable across many platforms, it is not practical to check the compatibility of each system and customize the media application accordingly. This is where the QUERY functionality of Media SDK comes into the picture. The developer can use this functionality not only to check the capabilities of the underlying system, but also to get the best possible configuration achievable on that system. Below are some of the commonly used Query functions and their usage.

    All the above-mentioned query functions greatly aid in portability and programmability. The first two (QueryImpl and QueryVersion) can help you choose the best possible implementation for the underlying hardware. The other query functions (Query and QueryIOSurf) are focused on the pipeline being used, and help with feature-check and ensuring enough resources are allocated for the requested features.

    Code Snippets:

    MFXQueryIMPL(session, &impl);       // returns the actual implementation of the session
    MFXQueryVersion(session, &version); // returns the version of the SDK implementation
    sts = mfxVPP.Query(&Params_in, &Params_out);
    // *Params_in points to the requested features, *Params_out to the features that can best be achieved.
    // If sts is MFX_ERR_NONE, the requested features can be met. If the status returns warnings for
    // incompatible video parameters, *Params_out is filled with the best achievable parameters.

    Query for Surfaces

    mfxFrameAllocRequest Request;
    memset(&Request, 0, sizeof(Request));
    sts = QueryIOSurf(session, &Params, &Request);    // Returns the number of surfaces (minimum, maximum, and suggested) needed for the specified pipeline and parameters.
    mfxU16 numSurfaces = Request.NumFrameSuggested;

    Some optimization tips:
    The hardware implementation is generally faster than its software counterpart. You can achieve the best performance with hardware by using video memory, which is local to the hardware (GPU or fixed-function logic). When defaulting to the software implementation, make sure you use system memory and not video memory. You can do this by adding a simple check on the output of the QueryIMPL function and conditionally allocating the surfaces based on the result. For example,

    MFXQueryIMPL(session, &impl);    // returns the actual implementation of the session
    if (MFX_IMPL_SOFTWARE == impl)
        m_mfxEncParams.IOPattern = MFX_IOPATTERN_IN_SYSTEM_MEMORY;
    else
        m_mfxEncParams.IOPattern = MFX_IOPATTERN_IN_VIDEO_MEMORY;

    For surface allocation, refer to the tutorials such as simple_3_encode_vmem and surface_3_encode. The former uses video memory, while the latter uses system memory.


    Full Pipeline Optimization for Immersive Video in Tencent QQ*


    Download PDF

    Abstract

    Tencent (Tencent Technology Company Ltd) integrated the Intel® Media SDK to optimize performance and reduce power consumption of its video conferencing app, QQ*. The app went from a maximum resolution of 480p at a low frame rate to 720p at 15-30 frames per second (fps) while consuming only 35% of the original amount of power. It now supports 4-way conferencing while lowering CPU utilization from 80% to <20%, reducing power consumption from 14 W to 6 W, and cutting RAM usage in half.

    These techniques to optimize the entire pipeline using the hardware acceleration of Intel® graphics from camera capture through decoding, encoding, and final display can also be used by other media applications.

    Introduction

    Tencent QQ is a popular instant messaging service for mobile devices and computers. QQ boasts a worldwide base of more than one billion registered users and is particularly popular in China. QQ has more than 100 million people logged in at any time and offers not only video calls, voice chats, rich texting, and built-in translation (text) but also file and photo sharing.

    Like all video on the Internet, QQ performs best when there's plenty of data bandwidth available, but video conferencing is bi-directional, so both uplink and downlink speeds are important. Unfortunately, in many countries, including China, uplink speed may be only 512 kbps. So to please customers, Tencent needed good compression and low latency while still leaving CPU and RAM bandwidth available for multitasking. Plus, the devices need to remain cool and power efficient while balancing high quality with available bandwidth.

    So Tencent engineers worked with Intel engineer Youwei Wang to first diagnose the bottlenecks and power consumption of their app and then improve performance of the data flow pipeline. The main changes involved using the CPU and GPU in parallel to increase performance while making major memory handling changes to decrease memory usage, both of which provided a significant decrease in power consumption.

    This article details how the improvements were accomplished using the special features of Intel® processors by integrating the Intel Media SDK and using Intel® Streaming SIMD Extensions (Intel® SSE4) instructions.

    Performance and Power Analysis Tools

    Significant data capture and analysis can be done using tools currently available free on the Internet. From Microsoft, the team used the Windows* Assessment and Deployment Kit (Windows ADK) (available at http://go.microsoft.com/fwlink/p/?LinkID=293840), which includes:

    • Windows Performance Analyzer (WPA)
    • Windows Performance Toolkit (WPT)
    • GPUView
    • Windows Performance Recorder (WPR)

    The Intel® tools used were:

    The Intel® Media Software Development Kit (Intel® Media SDK)

    The Intel® Media SDK is a cross-platform API that includes features for video editing and processing, media conversion, streaming and playback, and video conferencing. The SDK makes it easy for developers to optimize applications for Intel® HD Graphics hardware acceleration, which is available starting with the 2nd generation Intel® Core™ processors as well as the latest Intel® Celeron® and Intel® Atom™ processors.

    Features of the Intel® Media SDK include:

    • Low Latency Encode and Decode
    • Allows dynamic control of latency and buffering via encoder settings, including:
      mfxVideoParam::AsyncDepth (limits internal frame buffering and forces per-frame sync)
      mfxInfoMFX::GopRefDist (stops the use of B frames)
      mfxInfoMFX::NumRefFrame (can be set to use only the previous P frame)
      mfxExtCodingOption::MaxDecFrameBuffering
      (limits decoded frame buffering; can be set so a frame is shown immediately)
      (a minimal configuration sketch appears after this feature list)
    • Dynamic Bit Rate and Resolution Control
    • Adapts target and max Kbps to actual bandwidth at any time, OR customizes bit rate encoding per frame with the Constant Quantization Parameter (CQP) DataFlag.
    • Reference List Selection
    • Uses client side frame reception feedback to adjust reference frames, can improve robustness and error resilience.
    • Provides 3 types of Lists: Preferred, Rejected, and Long Term.
    • Reference Picture Marking Repetition SEI Message
    • Repeats the decoded reference picture marking syntax structures of earlier decoded pictures to maintain status of the reference picture buffer and reference picture lists - even if frames were lost.
    • Long Term Reference
    • Allows temporal scalability through use of layers providing different frame rates.
    • MJPEG decoder
    • Accelerates H.264 encode/decode and video processing filters. Allows delivery of NV12 and RGB4 color format decoded video frames.
    • Blit Process
    • Option to combine multiple input video samples into a single output frame. Then post-processing can apply filters to the image buffer (before display) and use de-interlacing, color-space conversion, and sub-stream mixing.
    • Hardware-accelerated and software-optimized media libraries built on top of Microsoft DirectX*, DirectX Video Acceleration (DVXA) APIs, and platform graphics drivers.
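
    A minimal configuration sketch for the low-latency settings listed above (the values are illustrative, not the ones Tencent shipped; resolution, frame rate, and the other mandatory fields are omitted):

        mfxVideoParam par;
        memset(&par, 0, sizeof(par));
        par.AsyncDepth            = 1;                        // limit internal buffering, sync every frame
        par.mfx.CodecId           = MFX_CODEC_AVC;
        par.mfx.GopRefDist        = 1;                        // no B frames
        par.mfx.NumRefFrame       = 1;                        // reference only the previous P frame
        par.mfx.RateControlMethod = MFX_RATECONTROL_CBR;
        par.mfx.TargetKbps        = 512;                      // match the uplink budget
        par.IOPattern             = MFX_IOPATTERN_IN_VIDEO_MEMORY;

        mfxExtCodingOption co;
        memset(&co, 0, sizeof(co));
        co.Header.BufferId      = MFX_EXTBUFF_CODING_OPTION;
        co.Header.BufferSz      = sizeof(co);
        co.MaxDecFrameBuffering = 1;                          // display a frame as soon as it is decoded

        mfxExtBuffer* extBuffers[1] = { (mfxExtBuffer*)&co };
        par.ExtParam    = extBuffers;
        par.NumExtParam = 1;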

    Understanding the Video Pipeline

    Sending video data between devices is more complex than most people imagine. Figure 3 shows the key steps that the QQ app takes to send video data from a camera (device A) to the user’s screen (device B).


    Figure 3: Serial processing

    As you can see, many steps require data format conversion, or "data swizzling". When these are handled serially on the CPU, significant latency occurs. The pre-optimized version of QQ had limited pre- and post-processing. But since each packet of data is independent of the next, the Intel Media SDK can parallelize the tasks, split them between the CPU and GPU, and optimize the flow.


    Figure 4: Optimized multi-thread flow

    Changing SIMD instructions

    Another major improvement came from replacing the older Intel SIMD instruction set (MMX) with Intel® Streaming SIMD Extensions (Intel® SSE4) instructions. This doubled throughput by moving from the 64-bit MMX registers (where two 32-bit integers can be swizzled simultaneously) to 128-bit registers (using the _mm_stream_load_si128 and _mm_store_si128 intrinsics). Besides being larger, the SSE registers are a dedicated set rather than aliasing the x87 floating-point registers as MMX does, so the processor can operate on multiple data elements in a single instruction, which greatly improves data throughput and execution efficiency. The change from MMX to Intel SSE4 calls alone increased QQ performance 10x. (See Additional Resources at the end of this article for more information on rewriting copy functions with Intel SSE4 and converting SIMD instructions.)

    Additionally, Tencent was using C library routines for the many large memory copies required for each frame, which was too slow for HD video. The code was changed so that the software pipeline uses system memory only, while the hardware pipeline was changed so that D3D surfaces are shared across all sessions/threads. For copies between system memory and the D3D surfaces, the engineers used Intel SSE and Intel® Advanced Vector Extensions 2 (Intel® AVX2) instructions to speed up the transfers and eliminate unnecessary memory copies in the pipeline.
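
    A minimal sketch of such a 128-bit streaming copy is shown below (illustrative only; the actual QQ copy routines are not published, and an AVX2 variant would use 256-bit loads in the same way). It assumes 16-byte-aligned pointers, a byte count that is a multiple of 16, and a write-combined source such as a mapped D3D surface.

        #include <smmintrin.h>   // SSE4.1: _mm_stream_load_si128 (also pulls in SSE2 _mm_store_si128)

        static void CopyFrameSSE4(__m128i* src, __m128i* dst, size_t bytes)
        {
            for (size_t i = 0; i < bytes / sizeof(__m128i); ++i) {
                __m128i v = _mm_stream_load_si128(src + i);   // streaming load from write-combined memory
                _mm_store_si128(dst + i, v);                  // regular 128-bit store to system memory
            }
        }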

    Using Dynamic Features of the Intel® Media SDK

    Another improvement was to use the proper codec level when encoding (with MJPEG decoding done on the GPU). The team used the Intel Media SDK's dynamic buffer control and dynamic bit and frame rates, which decreased latency and reduced buffer use. Moving the pre- and post-processing into hardware improved compression, helping performance on low-bandwidth networks.
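
    A minimal sketch of a dynamic bit-rate change (a hypothetical helper; newKbps would come from the application's own bandwidth estimator):

        // Apply a new target bit rate to an already-initialized encoder.
        mfxStatus UpdateBitrate(MFXVideoENCODE& encoder, mfxVideoParam& par, mfxU16 newKbps)
        {
            par.mfx.TargetKbps = newKbps;
            par.mfx.MaxKbps    = newKbps;
            return encoder.Reset(&par);   // re-configures the session without closing it
        }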

    For the user experience, the teams added de-noise in pre-processing and used post-processing to adjust colors (hue/saturation/contrast). The integrated skin-tone detection and face-color adjustment further improved the result.
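
    A minimal sketch of how such filters can be attached to VPP through extended buffers (the filter strengths are illustrative, not the values Tencent used; vppParams stands for the application's mfxVideoParam for VPP):

        // Denoise filter: factor range 0..100, higher means stronger filtering
        mfxExtVPPDenoise denoise;
        memset(&denoise, 0, sizeof(denoise));
        denoise.Header.BufferId = MFX_EXTBUFF_VPP_DENOISE;
        denoise.Header.BufferSz = sizeof(denoise);
        denoise.DenoiseFactor   = 20;

        // ProcAmp filter: hue/saturation/contrast/brightness adjustment
        mfxExtVPPProcAmp procamp;
        memset(&procamp, 0, sizeof(procamp));
        procamp.Header.BufferId = MFX_EXTBUFF_VPP_PROCAMP;
        procamp.Header.BufferSz = sizeof(procamp);
        procamp.Hue        = 0.0;    // degrees
        procamp.Saturation = 1.1;
        procamp.Contrast   = 1.0;
        procamp.Brightness = 0.0;

        // Attach both buffers to the VPP configuration before initialization
        mfxExtBuffer* vppExt[2] = { (mfxExtBuffer*)&denoise, (mfxExtBuffer*)&procamp };
        vppParams.NumExtParam = 2;
        vppParams.ExtParam    = vppExt;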


    Figure 5: Optimized Skin Tones

     

    Changing Reference Frames

    Regardless of the efficiency of the encode and decode processing, the user's experience in a video conference will suffer if the network connection can't consistently deliver the data. Without data, the decoder has to skip ahead to a new reference frame (since the incremental frames arrive late or are missing). Both frame-type selection and accurate bit-rate control are necessary for a stable bit-stream transfer. Tencent found that allocating 30% of the bandwidth to I-frames gave the best balance. In addition, the Intel Media SDK allowed the elimination of B frames and allows changes to the maximum frame size and the buffer size.

    Moving beyond only I (intra) frames and P (inter) frames, the SP frames defined in H.264 allow switching between streams of different bit rates without requiring an intra frame. Tencent moved to using SP frames between P frames (reducing the importance of the P frames), which allowed dynamic adjustment to get the best balance between network conditions and video quality.

     

    Reducing Power Consumption

    In addition to improving the performance of QQ, the changes to memory copies, reference frames, and post-processing also reduced the power consumption of the app. This is a natural consequence of doing the same amount of work in less time. But Tencent further reduced the amount of power required by throttling down the power states of the processor cores when they weren't actually processing. Using the findings from the power tests, the engineers reworked areas that were keeping the processor unnecessarily active. Video conferencing apps don't need to run the CPU continuously, since data supplied by the network never arrives continuously and there is no value in drawing new frames faster than the screen refresh rate. Tencent added short, timed lower-power states using the Windows API Sleep and WaitForSingleObject functions; the latter is triggered by events such as data arriving on the network. The resulting improvements can be seen in Figure 8:


    Figure 8: Power savings per release
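
    As a rough illustration of the idle-wait pattern described above (a minimal sketch; hNetEvent and the 16 ms budget are illustrative, not taken from the QQ code):

        #include <windows.h>

        void WaitForWorkOrIdle(HANDLE hNetEvent)
        {
            // Block for up to one display refresh interval (~16 ms at 60 Hz) instead
            // of spinning; the core can drop into a lower power state while waiting.
            if (WaitForSingleObject(hNetEvent, 16) == WAIT_TIMEOUT) {
                Sleep(1);   // no data arrived; yield briefly before polling again
            }
        }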

    Summary of QQ Improvements

    Using the Intel Media SDK and changing to the Intel SSE4 instruction set, Tencent made the following improvements to the QQ app:

    • Offloaded H.264 and MJPEG encode and decode tasks to GPU
    • Moved pre and post process tasks (when possible) to hardware
    • Used both CPU and GPU simultaneously
    • Reduced memory copies
    • Reduced processor high power states (sleep calls, WaitForSingleObject, and timers)
    • Changed MMX to Intel SSE4 instructions
    • Optimized reference frame flow


    Figure 9: Pre-optimized 640x480 versus optimized 1280x720

    Conclusion

    The performance of Tencent QQ was dramatically increased by using key features of the Intel Media SDK. QQ was transformed from an app that could deliver 480p resolution images at low frame rate over a DSL connection into an app that could deliver 720p resolution images at 30 fps over that same DSL connection and support 4-way conferencing.

    After moving key functions into hardware using the Intel Media SDK, the power consumption of QQ was reduced to almost 50% of its initial value. Then, by optimizing processor power states, power usage was further reduced to about 35% of its initial value. This is a remarkable power savings that permits QQ users to run the optimized app for more than twice as long as the pre-optimized app. While improving customer satisfaction, QQ became a more capable (and greener) app by integrating the Intel Media SDK.

    If you are a media software developer, be sure to evaluate how the Intel Media SDK can help increase performance and decrease memory usage and power consumption of your app by providing an efficient data flow pipeline with improved video quality and user experience even with limited bandwidth. And don’t forget the Intel tools available to help find problem spots and bottlenecks.

    Additional Resources:

    About the Authors

    Colleen Culbertson is an Application Engineer in Intel’s Developer Relation Division Scale Enabling in Oregon. She has worked for Intel for more than 15 years. She works with various teams and customers to enable developers to optimize their code.

    Youwei Wang is an Application Engineer in Intel’s Developer Relation Division Client Enabling in Shanghai. Youwei has worked at Intel for more than 10 years. He works with ISVs on performance and power optimization of applications.

    Testing Configuration

    Some performance results were provided by Tencent. Intel performance results were obtained on a Lenovo* Yoga 2 Pro 2-in-1 platform with a 4th generation Intel® Core™ mobile processor and Intel® HD Graphics 4400.

     

    Intel, the Intel logo, Intel Atom, Intel Celeron, and Intel Core are trademarks of Intel Corporation in the U.S. and/or other countries.
    Copyright © 2014 Intel Corporation. All rights reserved.
    *Other names and brands may be claimed as the property of others.

     

  • Media Processing
  • Developers
  • Business client
  • User experience
  • Windows*
  • Media SDK for Windows*
  • Intel® Advanced Vector Extensions
  • Intel® Streaming SIMD Extensions
  • URL
  • Topic area: 

    IDZone

    Video Encoding Using the Integrated Intel HD Graphics

    $
    0
    0

    This article covers H.264 video encoding on the GPU integrated into modern Intel processors and the experience our company, Inventos, gained while building and optimizing Streambuilder, our media server for streaming video processing.

     

    Introduction

    Our task was to add support for Intel Quick Sync technology to our media server, Streambuilder. The media server is an all-in-one tool that can do the following:

    • Encode/resample audio/video from almost all popular streaming formats to HLS and RTMP;

    • Capture signals from SDI and DVB sources;

    • Provide redundancy and scaling of encoding servers;

    • Describe the encoder configuration in a built-in language (DSL);

    • Offer various modules for audio normalization, amplification, video de-interlacing, and so on.

     

    This product is the backend of the Webcaster.pro media platform. The solution is written in C++ using the libavcodec libraries, which implement many well-known codecs such as H.264, MPEG-4, and so on. This approach allows a content-delivery infrastructure of any complexity to be deployed fairly quickly and is flexible to configure.

     

    Using configuration files, we can describe processing graphs that satisfy almost any need. In a greatly simplified form, such a graph looks like this:

    [Figure: simplified encoding pipeline graph]

     

    Although the libavcodec code is well optimized, it is designed to run on a CPU with a finite number of execution units, such as an x86-based processor. Increasing the number of cores only partially solves the problem: it is not cheap, and there is always plenty to keep the cores busy besides video encoding. So the logical next step was to try using graphics accelerators for this task.

     

    The Intel solution

    As an experiment, we decided to try accelerating video encoding with the Intel HD Graphics co-processors built into modern Intel processors. Intel kindly provides its Media SDK for encoding, decoding, resampling, and other video-processing algorithms. To our great delight, this SDK is now also available for Linux, which is essential for production use; it was the appearance of Linux support that got us interested in this solution. Our colleagues at Intel were also interested in the results of using the SDK in a production environment. I should note that throughout the development period the Intel staff helped us a great deal, answered our questions (of which there were many at first), and gave genuinely valuable advice.

     

    The Media SDK ships with good documentation and examples for almost every use case. The examples greatly simplified the integration of the Intel Media SDK; without them, it must be said, the process would not seem trivial. In essence, the integration consisted of replacing the most hardware-demanding software modules for encoding/decoding/resampling with corresponding modules that use the hardware capabilities of Intel HD Graphics.

     

    Testing

    For testing our software, we chose a 1U server with the following configuration:

    | Motherboard | Supermicro X10SLH-F                    |
    | CPU         | Intel® Xeon® CPU E3-1225 v3 @ 3.20GHz  |
    | RAM         | 16 GB                                  |

    The OS on the server was Ubuntu 12.04.4 LTS with kernel 3.8.0-23-generic. The main requirement for Quick Sync to work is that the chipset specification lists C226; only chips with that designation support hardware video encoding. It is also desirable that the motherboard has no built-in video, otherwise there may be problems detecting, and therefore using, the Intel GPU with the Intel Media SDK.

    The motherboard described above has integrated graphics (built-in video) on board, and we had to tinker to get the SDK working on this hardware. When installing the SDK on the new server, the Media SDK installation script did not see the device ID, and we could not enable the processor's integrated graphics from the BIOS. The search for a solution led to a BIOS update, after which the coveted setting appeared in the BIOS. However, we still had to disable the motherboard's built-in video by switching a jumper. In this configuration IPMI and the monitor output do not work, but we access the server over SSH, so that is not critical.

    In addition, there are some restrictions on the Linux kernel used in the system. For servers this is Ubuntu 12.04 LTS with kernels 3.2.0-41 or 3.8.0-23, or SUSE Linux Enterprise Server 11 SP3 with kernel 3.0.76-11.

     

    CPU: E3-1225 v3, 16 GB RAM, Intel® HD Graphics P4600

    |               | ffmpeg     | sample_full_transcode | streambuilder (no optimization) | streambuilder (optimization) |
    |---------------|------------|-----------------------|---------------------------------|------------------------------|
    | time          | 8 min 42 s | 1 min 19 s            | 2 min 19 s                      | 1 min 40 s                   |
    | cpu (max)     | 750%       | 55%                   | 125%                            | 50%                          |
    | mem (max)     | 3.3%       | 4.6%                  | 0.5%                            | 0.4%                         |
    | PSNR          | 48.107     | 46.68                 |                                 |                              |
    | Average PSNR  | 51.204     | 49.52                 |                                 |                              |
    | SSIM          | 0.99934    | 0.9956                |                                 |                              |
    | MSE           | 1.623      | 2.969                 |                                 |                              |

     

    CPU: i7-3770, 3 GB RAM, Intel® HD Graphics 4000

    |               | ffmpeg     | sample_full_transcode | streambuilder (no optimization) | streambuilder (optimization) |
    |---------------|------------|-----------------------|---------------------------------|------------------------------|
    | time          | 8 min 48 s | 1 min 24 s            | 2 min 31 s                      | 1 min 23 s                   |
    | cpu (max)     | 750%       | 19%                   | 150%                            | 45%                          |
    | mem (max)     | 18%        | 20%                   | 2.8%                            | 2.3%                         |
    | PSNR          | 48.107     | 46.495                |                                 |                              |
    | Average PSNR  | 51.204     | 49.27                 |                                 |                              |
    | SSIM          | 0.99934    | 0.991                 |                                 |                              |
    | MSE           | 1.623      | 3.036                 |                                 |                              |

     

    CPU: E3-1285 v3, 16 GB RAM, Intel® HD Graphics P4700

    |               | ffmpeg     | sample_full_transcode | streambuilder (no optimization) | streambuilder (optimization) |
    |---------------|------------|-----------------------|---------------------------------|------------------------------|
    | time          | 8 min 1 s  | 1 min 11 s            | 2 min 11 s                      | 1 min 34 s                   |
    | cpu (max)     | 750%       | 55%                   | 130%                            | 55%                          |
    | mem (max)     | 3.3%       | 4.6%                  | 0.5%                            | 0.4%                         |
    | PSNR          | 48.107     | 46.68                 |                                 |                              |
    | Average PSNR  | 51.204     | 49.52                 |                                 |                              |
    | SSIM          | 0.99934    | 0.9956                |                                 |                              |
    | MSE           | 1.623      | 2.969                 |                                 |                              |

     

    Analysis of results

    The quality metrics for streambuilder match those obtained for the sample_full_transcode test utility, so I have omitted them.

    The tables show that in this experiment the server processors with Intel® HD Graphics P4700/P4600 run faster and give better encoding quality than the i7-3770 with Intel® HD Graphics 4000. This is not always the case, however, since Intel improves encoding quality with each new chip and SDK version, and the speed on newer chips may be lower. At the same time, the CPU load on the former is slightly higher; why this is so is not yet clear.

    In addition, optimizing memory handling gave roughly a 2x gain in performance.

     

    The encoding quality on Intel® HD Graphics P4700 was the same as on Intel® HD Graphics P4600, but the E3-1285 v3 is roughly 14% faster at the same resource load. The E3-1285 v3 is also roughly 10% faster than the E3-1225 v3 when encoding with ffmpeg.

    A server running streambuilder with Quick Sync support can encode a single source into 12 Full HD (1080p) renditions, 24 HD (720p) renditions, or 46 SD (480p) renditions with HLS segmentation. If the input is a raw signal captured from SDI, the number of simultaneously encoded renditions is slightly higher.

    You can experiment with streambuilder (for now only the libavcodec-based version) by downloading it from the product site: http://streambuilder.pro. It comes with a standard config that lets you record any source to HLS.

     

    Summary

    Intel Quick Sync technology makes it possible to build a comparatively inexpensive, high-performance video encoding server with acceptable quality. While adopting the technology we ran into some technical problems related to the video integrated on the motherboard, but they were all solvable. As a reminder, the main things to look for when choosing hardware for this purpose are a chipset with the C226 specification and a motherboard without integrated video (with the on-board video disabled, IPMI and the VGA output may not work).

    The advantages of this solution, in my view, are that the CPU is almost unused and memory consumption is low. The freed-up resources can be used for other tasks or for additional CPU-based encoding.

  • Intel Media SDK
  • Intel® Quick Sync Video
  • video encoding
  • Developers
  • Linux*
  • Server
  • C/C++
  • Intermediate
  • Media SDK for Windows*
  • Server
  • URL
  • Performance improvement
  • Topic area: 

    IDZone
  • Server

  • Video Composition using Intel® Media SDK

    $
    0
    0


    Intel® Media SDK is a framework for developing media applications; it provides APIs for ease of development and optimizes them for the underlying hardware for best performance. The SDK provides optimized versions of the basic building-block algorithms of the media domain, and articles such as this one provide insights into developing use-case scenarios with the SDK.

    In this article, we are going to see how to achieve video composition using the SDK. Video composition is a very useful feature for many video applications, such as video walls, advertisements, and any application that wants to display multiple streams intelligently.

    For Linux platforms, the Media SDK provides a hardware implementation of the composition feature (as of SDK version 1.9, this feature is not available on Windows), which can composite up to 16 different video streams. The SDK provides structures that control the specifics of composition for each individual stream, such as the resolution, size, source co-ordinates on the input stream, and placement co-ordinates on the destination surface. We will explain each of these using the example below:

    Example use case: compositing two video streams, one of resolution 352x288 and the other of resolution 176x144. The destination surface has a resolution of 352x288. We want to show the smaller-resolution stream as an inset on the first input stream.

    Step 1: Input Parameters: Parameter file (.par) with per-input stream information

    We need to specify each input stream being composited, as well as its parameters such as resolution, crop dimensions, placement on the destination image, and other secondary parameters such as the alpha factor. It can be tedious to specify these parameters individually for each stream, so the Media SDK allows specifying them in a parameter file and passing that file as input to the application. For the use case defined above, here is the parameter file:

    stream=/path/to/stream/in_352_288.yuv
    width=352
    height=288
    cropx=0
    cropy=0
    cropw=352
    croph=288
    dstx=0
    dsty=0
    dstw=352
    dsth=288
    fourcc=nv12
    
    stream=/path/to/stream/in_176_144.yuv
    width=176
    height=144
    cropx=0
    cropy=0
    cropw=88
    croph=72
    dstx=0
    dsty=0
    dstw=88
    dsth=72
    fourcc=yuv
    

    Step 2: Set-up the video parameters in the application

    In this step, we populate the video parameters for both input and output. For input VPP parameters, the details such as resolution and crop dimensions should correspond to the largest input stream. Otherwise, the process of filling these parameters is similar to the explanation in this section of the article.

    /******
    Initialize VPP parameters
    For simplicity, we have filled these parameters for the streams used here.
    The developer is encouraged to generalize the mfxVideoParams filling using either command-line options or par file usage
    ******/
        mfxVideoParam VPPParams;
        memset(&VPPParams, 0, sizeof(VPPParams));
        // Input data
        VPPParams.vpp.In.FourCC = MFX_FOURCC_NV12;
        VPPParams.vpp.In.ChromaFormat = MFX_CHROMAFORMAT_YUV420;
        VPPParams.vpp.In.CropX = 0;
        VPPParams.vpp.In.CropY = 0;
        VPPParams.vpp.In.CropW = inputWidth;
        VPPParams.vpp.In.CropH = inputHeight;
        VPPParams.vpp.In.PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
        VPPParams.vpp.In.FrameRateExtN = 30;
        VPPParams.vpp.In.FrameRateExtD = 1;
          // width must be a multiple of 16
          // height must be a multiple of 16 in case of frame picture and a multiple of 32 in case of field picture
        VPPParams.vpp.In.Width = MSDK_ALIGN16(inputWidth);
        VPPParams.vpp.In.Height =
            (MFX_PICSTRUCT_PROGRESSIVE == VPPParams.vpp.In.PicStruct) ?
            MSDK_ALIGN16(inputHeight) :
            MSDK_ALIGN32(inputHeight);
    
        // Output data
        VPPParams.vpp.Out.FourCC = MFX_FOURCC_NV12;
        VPPParams.vpp.Out.ChromaFormat = MFX_CHROMAFORMAT_YUV420;
        VPPParams.vpp.Out.CropX = 0;
        VPPParams.vpp.Out.CropY = 0;
        VPPParams.vpp.Out.CropW = inputWidth;
        VPPParams.vpp.Out.CropH = inputHeight;
        VPPParams.vpp.Out.PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
        VPPParams.vpp.Out.FrameRateExtN = 30;
        VPPParams.vpp.Out.FrameRateExtD = 1;
        // width must be a multiple of 16
        // height must be a multiple of 16 in case of frame picture and a multiple of 32 in case of field picture
        VPPParams.vpp.Out.Width = MSDK_ALIGN16(VPPParams.vpp.Out.CropW);
        VPPParams.vpp.Out.Height =
            (MFX_PICSTRUCT_PROGRESSIVE == VPPParams.vpp.Out.PicStruct) ?
            MSDK_ALIGN16(VPPParams.vpp.Out.CropH) :
            MSDK_ALIGN32(VPPParams.vpp.Out.CropH);
    
        // Video memory surfaces are used to store the raw frames. Use with HW acceleration for better performance
        VPPParams.IOPattern = MFX_IOPATTERN_IN_VIDEO_MEMORY | MFX_IOPATTERN_OUT_VIDEO_MEMORY;
    

    Step 3: Populate per-stream mfxFrameInfo with details from parameter file

    In this step, we will specify the parameters specific to each input file such as resolution and crop dimensions. During the VPP processing loop when each stream is loaded, the surface info parameters are set to these parameters.

    /*************************************************************************************************
    COMPOSITION-SPECIFIC BEGINS: Setting Phase
    
    How are we compositing?
    We crop the second stream to W/2 x H/2 starting at the (0,0) co-ordinate. This cropped stream is composited onto the first stream, which is used at its original resolution. You can also choose where the cropped second stream should go on the output surface; let's say we want it at the (0,0) co-ordinates.
    NOTE: For clean implementation, we recommend these values be read in from the parameter file.
    *************************************************************************************************/
    
        mfxU16 W1 = 352, H1 = 288;
        mfxU16 Cx1 = 0, Cy1 = 0, Cw1 = W1, Ch1 = H1;
    
        mfxU16 W2 = 176, H2 = 144;
        mfxU16 Cx2 = 0, Cy2 = 0, Cw2 = W2 >> 1, Ch2 = H2 >> 1;
    
        /** Fill frame params in mFrameInfo structures with the above parameters **/
        for (mfxU16 i = 0; i < NUM_STREAMS; i++){
            memcpy(&inputStreams[i],&(VPPParams.vpp.In), sizeof(mfxFrameInfo));
            inputStreams[i].Width = i == 0 ? W1 : W2;
            inputStreams[i].Height = i == 0 ? H1 : H2;
            inputStreams[i].CropX = i == 0 ? Cx1 : Cx2;
            inputStreams[i].CropY = i == 0 ? Cy1 : Cy2;
            inputStreams[i].CropW = i == 0 ? Cw1 : Cw2;
            inputStreams[i].CropH = i == 0 ? Ch1 : Ch2;
        }

    Step 4: Initialize extended buffer for Composition

    As explained in this section, to perform auxiliary functions such as denoising, stabilization or composition, extended buffers are used by VPP. These extended buffers are passed to the VPP parameters (as shown in Step 5).

    // Initialize extended buffer for Composition
        mfxExtVPPComposite composite;
        memset(&composite, 0, sizeof(composite));
        composite.Header.BufferId = MFX_EXTBUFF_VPP_COMPOSITE;
        composite.Header.BufferSz = sizeof(mfxExtVPPComposite);
        composite.NumInputStream = 2;
        composite.Y = 10;
        composite.U = 80;
        composite.V = 80;
        composite.InputStream = new mfxVPPCompInputStream[2];                     // per-stream composition descriptors
        memset(composite.InputStream, 0, 2 * sizeof(mfxVPPCompInputStream));
    
        composite.InputStream[0].DstX = (mfxU32)0;
        composite.InputStream[0].DstY = (mfxU32)0;
        composite.InputStream[0].DstW = (mfxU32)W1;
        composite.InputStream[0].DstH = (mfxU32)H1;
    
        composite.InputStream[1].DstX = (mfxU32)0;		//Co-ordinates for where the second stream should go on the output surface
        composite.InputStream[1].DstY = (mfxU32)0;
        composite.InputStream[1].DstW = (mfxU32)Cw2;
        composite.InputStream[1].DstH = (mfxU32)Ch2;
    
        mfxExtBuffer* ExtBuffer[1];
        ExtBuffer[0] = (mfxExtBuffer*) &composite;
        VPPParams.NumExtParam = 1;
        VPPParams.ExtParam = (mfxExtBuffer**) &ExtBuffer[0];
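
    Note that InputStream points to a heap-allocated array; once the VPP component has been closed, remember to release it. A minimal sketch:

        // After mfxVPP.Close(), free the per-stream descriptor array
        delete[] composite.InputStream;
        composite.InputStream = NULL;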

    Step 5: VPP Processing loop - Read & process each input stream

        // Stage 1: Main processing loop
        //
        while (MFX_ERR_NONE <= sts || MFX_ERR_MORE_DATA == sts) {
            nSurfIdxIn = GetFreeSurfaceIndex(pVPPSurfacesIn, nVPPSurfNumIn);        // Find free input frame surface
            MSDK_CHECK_ERROR(MFX_ERR_NOT_FOUND, nSurfIdxIn, MFX_ERR_MEMORY_ALLOC);
    
            // Surface locking required when read/write video surfaces
            sts = mfxAllocator.Lock(mfxAllocator.pthis, pVPPSurfacesIn[nSurfIdxIn]->Data.MemId, &(pVPPSurfacesIn[nSurfIdxIn]->Data));
            MSDK_BREAK_ON_ERROR(sts);
    
            /******************************************************************************************************************
            COMPOSITION-SPECIFIC CODE BEGINS:
            Loading data from each of the input streams, and
            Setting the surface parameters to the Crop, Width, Height, values of the input stream being loaded
            ******************************************************************************************************************/
            streamNum %= NUM_STREAMS;
            memcpy(&(pVPPSurfacesIn[nSurfIdxIn]->Info), &(inputStreams[streamNum]), sizeof(mfxFrameInfo));
            sts = LoadRawFrame_YV12toNV12(pVPPSurfacesIn[nSurfIdxIn], fSource[streamNum], &inputStreams[streamNum]);        // Load frame from file into surface
            streamNum++;
            MSDK_BREAK_ON_ERROR(sts);
            /******************************************************************************************************************
            COMPOSITION-SPECIFIC CODE ENDS:
            ******************************************************************************************************************/
    
            sts = mfxAllocator.Unlock(mfxAllocator.pthis, pVPPSurfacesIn[nSurfIdxIn]->Data.MemId, &(pVPPSurfacesIn[nSurfIdxIn]->Data));
            MSDK_BREAK_ON_ERROR(sts);
    
            nSurfIdxOut = GetFreeSurfaceIndex(pVPPSurfacesOut, nVPPSurfNumOut);     // Find free output frame surface
            MSDK_CHECK_ERROR(MFX_ERR_NOT_FOUND, nSurfIdxOut, MFX_ERR_MEMORY_ALLOC);
    
            for (;;) {
            // Process a frame asynchronously (returns immediately)
                sts = mfxVPP.RunFrameVPPAsync(pVPPSurfacesIn[nSurfIdxIn], pVPPSurfacesOut[nSurfIdxOut], NULL, &syncp);
                if (MFX_WRN_DEVICE_BUSY == sts) {
                    MSDK_SLEEP(1);  // Wait if device is busy, then repeat the same call
                } else
                    break;
            }
    
            if (MFX_ERR_MORE_DATA == sts)   // Fetch more input surfaces for VPP
                continue;
    
            // MFX_ERR_MORE_SURFACE means output is ready but need more surface (example: Frame Rate Conversion 30->60)
            // * Not handled in this example!
    
            MSDK_BREAK_ON_ERROR(sts);
    
            sts = session.SyncOperation(syncp, 60000);      // Synchronize. Wait until frame processing is ready
            MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);
    
            ++nFrame;
            // Surface locking required when read/write video surfaces
            sts = mfxAllocator.Lock(mfxAllocator.pthis, pVPPSurfacesOut[nSurfIdxOut]->Data.MemId, &(pVPPSurfacesOut[nSurfIdxOut]->Data));
            MSDK_BREAK_ON_ERROR(sts);
    
            sts = WriteRawFrame(pVPPSurfacesOut[nSurfIdxOut], fSink);
            MSDK_BREAK_ON_ERROR(sts);
    
            sts = mfxAllocator.Unlock(mfxAllocator.pthis, pVPPSurfacesOut[nSurfIdxOut]->Data.MemId, &(pVPPSurfacesOut[nSurfIdxOut]->Data));
            MSDK_BREAK_ON_ERROR(sts);
    
            printf("Frame number: %d\n", nFrame);
            fflush(stdout);
        }

    This concludes our walk-through of composition using the Media SDK. A thumbnail of the composited output stream for the above example is shown below. You can easily extend this example to process more streams. You can find documentation on composition in the mediasdk-man.pdf document.

  • Developers
  • Media SDK for Windows*
  • Intel® Media Server Studio Essentials Edition
  • Intel® Media Server Studio Professional Edition
  • Education
  • URL
  • Code sample
  • Topic area: 

    IDZone

    Media: Video Transcoding Sample

    $
    0
    0

    A native console application sample that transcodes an elementary video stream from one compressed format to another. It includes the following features:

    • multiple video streams transcoding
    • video resizing, de-interlacing
    • video rotation via User Plug-in Sample
    • video rotation via User Plug-in Sample using Intel® OpenCL™

    Download the sample here

  • Intel INDE
  • media client
  • Intel INDE media client
  • openCL
  • windows
  • Developers
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8
  • Windows*
  • C/C++
  • Advanced
  • Beginner
  • Intermediate
  • Intel® Integrated Native Developer Experience (INDE)
  • Media SDK for Windows*
  • OpenCL*
  • Media processing
  • Microsoft Windows* 8 Desktop
  • Laptop
  • Tablet
  • Desktop
  • URL
  • Code sample
  • Topic area: 

    IDZone

    Video encoding using the integrated Intel HD Graphics

    $
    0
    0

    In this article we'd like to write about H.264 video processing on Intel GPUs on Linux and the experience our company, Inventos, gained in the process of enhancing StreamBuilder, our streaming media server.

     

    Introduction

    When the beta of the Intel Media SDK for Linux Servers was released, we were very keen to implement Intel Quick Sync Video technology in StreamBuilder, our versatile media server software that works as the backend for Webcaster.pro. At that moment StreamBuilder was able to:

    • capture input streams from SDI, IP multicasts, and RTMP

    • transcode and resample virtually any audio/video streams to H.264 HLS or RTMP

    • support a distributed and fault-tolerant deployment scheme where ingesting, encoding, and streaming are performed on independent, redundant nodes

    • apply filters (audio normalization/amplification, video de-interlacing, crop, resize, etc.)

    • offer flexible configuration (with its own DSL) that allows building pipelines (and even trees) for sequential media processing using the filters mentioned above

     

    StreamBuilder is based on libavcodec, and although that library is already well optimized, it was designed to run on x86 CPUs. Increasing the number of CPU cores speeds up encoding almost linearly, but it is expensive, and there are always other tasks for the CPU besides video encoding. Using the GPU for encoding could make processing faster and cheaper, with a higher channel-per-rack-unit density.

     

    Intel solution

    So the plan was set: rewrite a major part of the StreamBuilder core and implement the Intel Media SDK for Servers to get a significant performance boost. Our goal was to encode at least 4 Full HD streams on budget-priced hardware. To jump slightly ahead, the goal was exceeded.

     

    Colleagues from Intel were also interested in a real-life use case of the Media SDK for Linux Servers. They did a great job helping us during the development and implementation process, answering our questions, providing code samples, and offering valuable advice.

     

    The Media SDK for Servers comes with documentation and examples that cover almost all possible use cases. They helped us a lot and simplified the implementation greatly. In fact, implementation in our case came down to replacing the decoding/encoding/resampling modules with Intel Quick Sync-enabled modules that use the capabilities of Intel HD Graphics.

     

    Staging hardware and software

    We used a 1U (rack unit) server with the following specs:

    | Motherboard | Supermicro X10SLH-F |
    | CPUs        | #1 Intel® Xeon® E3-1225 v3 (Intel® HD Graphics P4600); #2 Intel® Core™ i7-3770 (Intel® HD Graphics 4000); #3 Intel® Xeon® E3-1285 v3 (Intel® HD Graphics P4700) |
    | RAM         | 16 GB |
    | OS          | Ubuntu 12.04.4 LTS 3.8.0-23-generic |

     

    The motherboard chipset must be the C226 PCH; at the time of writing, only chipsets with that designation support hardware encoding. It is also highly recommended to use a motherboard without a built-in GPU, otherwise there can be issues with identifying and using the Intel GPU.

     

    The motherboard we used had a built-in GPU, and that caused us a lot of headaches in getting things to work. The Intel Media SDK did not recognize the device ID at first, and we couldn't enable Quick Sync Video. After a BIOS update the required setting appeared in the BIOS, but we still had to manually turn off the motherboard's GPU with an on-board jumper. That configuration blocks IPMI and video output, but we access the server via SSH, so that wasn't a big issue.

     

    Note that there are some limitations on the Linux kernel version: 3.2.0-41 or 3.8.0-23 for Ubuntu 12.04, and SP3 3.0.76-11 for SUSE Linux Enterprise Server.

     




     

    Results

    CPU: E3-1225 v3, 16 GB RAM, Intel® HD Graphics P4600

    |               | ffmpeg     | sample_full_transcode | streambuilder (no optimization) | streambuilder (optimization) |
    |---------------|------------|-----------------------|---------------------------------|------------------------------|
    | time          | 8 min 42 s | 1 min 19 s            | 2 min 19 s                      | 1 min 40 s                   |
    | cpu (max)     | 750%       | 55%                   | 125%                            | 50%                          |
    | mem (max)     | 3.3%       | 4.6%                  | 0.5%                            | 0.4%                         |
    | PSNR          | 48.107     | 46.68                 |                                 |                              |
    | Average PSNR  | 51.204     | 49.52                 |                                 |                              |
    | SSIM          | 0.99934    | 0.9956                |                                 |                              |
    | MSE           | 1.623      | 2.969                 |                                 |                              |

    CPU: i7-3770, 3 GB RAM, Intel® HD Graphics 4000

    |               | ffmpeg     | sample_full_transcode | streambuilder (no optimization) | streambuilder (optimization) |
    |---------------|------------|-----------------------|---------------------------------|------------------------------|
    | time          | 8 min 48 s | 1 min 24 s            | 2 min 31 s                      | 1 min 23 s                   |
    | cpu (max)     | 750%       | 19%                   | 150%                            | 45%                          |
    | mem (max)     | 18%        | 20%                   | 2.8%                            | 2.3%                         |
    | PSNR          | 48.107     | 46.495                |                                 |                              |
    | Average PSNR  | 51.204     | 49.27                 |                                 |                              |
    | SSIM          | 0.99934    | 0.991                 |                                 |                              |
    | MSE           | 1.623      | 3.036                 |                                 |                              |

    CPU: E3-1285 v3, 16 GB RAM, Intel® HD Graphics P4700

    |               | ffmpeg     | sample_full_transcode | streambuilder (no optimization) | streambuilder (optimization) |
    |---------------|------------|-----------------------|---------------------------------|------------------------------|
    | time          | 8 min 1 s  | 1 min 11 s            | 2 min 11 s                      | 1 min 34 s                   |
    | cpu (max)     | 750%       | 55%                   | 130%                            | 55%                          |
    | mem (max)     | 3.3%       | 4.6%                  | 0.5%                            | 0.4%                         |
    | PSNR          | 48.107     | 46.68                 |                                 |                              |
    | Average PSNR  | 51.204     | 49.52                 |                                 |                              |
    | SSIM          | 0.99934    | 0.9956                |                                 |                              |
    | MSE           | 1.623      | 2.969                 |                                 |                              |

    StreamBuilder's signal-quality metrics (PSNR, SSIM, MSE) are equal to the sample_full_transcode values, so we did not repeat them in the tables.

     

    As you can see from the tables above, the server CPUs with Intel HD Graphics P4700/P4600 perform better in our test and give better output video quality than the i7-3770 with Intel HD Graphics 4000. But that statement is not always correct: Intel keeps improving video encoding with each microchip and SDK version, so encoding speed could be slightly slower on the latest chips while CPU load would be lower too. We have no idea why it is that way.

     

    The encoding quality of Intel HD Graphics P4700 was comparable to the P4600, but it was 14% faster on the E3-1285 v3 with the same resource consumption. Another notable point is that the E3-1285 v3 is about 10% faster than the E3-1225 v3 when encoding with ffmpeg.

     

    A server with StreamBuilder installed and Quick Sync Video enabled makes it possible to encode one input stream into 12 Full HD (1080p) HLS streams, 24 HD (720p) HLS streams, or 46 SD (480p) HLS streams.

     

    Also, optimized memory operations reduced RAM consumption by half.

     

    Our initial goal was exceeded threefold! We can now encode several times more streams on hardware much cheaper than what we used before.

     

    You can try out StreamBuilder too: just email us at ask@streambuilder.pro and we'll send you a demo distribution.

     

    Conclusion

    The Intel Media SDK for Servers allows building cost-effective, high-performance encoding/transcoding servers with high stream-per-rack-unit density. The implementation wasn't a walk in the park; we ran into some difficulties related to the motherboard's built-in GPU, but they were solved eventually. As a reminder, the main hardware requirements are a C226 chipset and a motherboard without a built-in GPU.

     

    The benefits of this solution: besides a significant performance boost, you get much lower CPU usage and low memory consumption, which leaves additional free resources that you can use for other tasks (even extra CPU-based encoding).

  • Video Streaming
  • Intel Media SDK
  • Intel® Quick Sync Video
  • Developers
  • Intel AppUp® Developers
  • Partners
  • Linux*
  • Server
  • C/C++
  • Intermediate
  • Media SDK for Windows*
  • Intel® Media Server Studio Essentials Edition
  • Media processing
  • URL
  • Libraries
  • Topic area: 

    IDZone

    Intel® INDE Media SDK tools - System Analyzer & Tracer

    $
    0
    0


    In this article we will briefly cover the tools included in the Intel® Media Software Development Kit (Intel® Media SDK). The Intel® Media SDK is a framework used in the development of media applications. It includes a software development library that exposes the media acceleration capabilities of Intel platforms for decoding, encoding, and video preprocessing. The API library also covers a wide range of optimizations and targets to help developers integrate encoding and decoding into their applications. The Media SDK is now part of the Intel® Integrated Native Developer Experience (Intel® INDE). The tools are:

    1] MediaSDK Tracer (found in: “<install-folder>\tools\mediasdk_tracer\tracer.exe”) - 64-bit and 32-bit supported.

    2] MediaSDK System Analyzer (found in: "<install-folder>\tools\mediasdk_sys_analyzer\sys_analyzer.exe") - 64-bit and 32-bit supported.

    MediaSDK Tracer:

    This tool captures the basic call information from Media SDK API functions. It generates a full log of the interaction between the application and the SDK library, including per-frame processing.

    Usage: 

    Press Start at any time to start logging and press Stop to stop logging. After logging is stopped, results are appended to the specified file. Press Open to view the output log file. Use Delete to remove previous log information before capturing new data, so as to avoid appending the new data to the existing output file.

    Tracer output log:

    The output log shown here is from the sample_encode example, where the tracer logs the various parameters of the API function calls. Because it is an encode example, the logged functions start with "encode."; the same applies to decode ("decode.") and so on. At the end of the log, if the application ran successfully, you can see a complete summary of all the unique surfaces used.

    For performance profiling, it is recommended to turn off per-frame parameter recording, as it impacts performance. Please refer to the readme-mediasdk-tracer.rtf document for the system requirements and limitations of the tool.

    MediaSDK System Analyzer:

    This utility analyzes the system and reports all Media SDK-related capabilities and the status of drivers and components. It can also be used to diagnose environment setup issues. It reports the installed graphics adapter, basic system information, installed Media SDK versions, installed DirectShow filters and Media Foundation Transforms (MFTs), and offers tips for solutions in case either the software or hardware implementation does not work.

    Usage:

    This tool starts reporting system status immediately; when it is complete, the user can exit by pressing any key.

    Example Output:

    This tool also has few command line options available:

    -skipPackage : Skip query for installed Media SDK packages.

    -skipDShow    : Skip query for installed DirectShow filters.

    -skipMFT         : Skip query for installed MFTs.

    -skipWait         : Do not wait for user key press on analysis completion.

    Please refer to the mediasdk_release_notes.rtf document for the system requirements and limitations of the tool. This concludes our overview of the tools available with the Media SDK; we will update this article when new features and functionality are added to the tools.

  • Developers
  • Media SDK for Windows*
  • Intel® Integrated Native Developer Experience (INDE)
  • URL
  • Topic area: 

    IDZone

    Media SDK Tutorials for Client and Server

    $
    0
    0

    The Media Software Development Kit (Media SDK) Tutorials show you how to use the Media SDK by walking you step by step through use-case examples, from simple to increasingly complex usages.

    The Tutorials are divided into a few parts (sections):

    1. Introduces the Media SDK session concept via a very simple sample.
    2-4. Illustrate how to utilize the three core SDK components: Encode, Decode, and VPP (video pre/post-processing).
    5. Showcases transcode workloads, utilizing the components described in earlier sections.
    6. Showcases more advanced and compound usages of the SDK.

    For simplicity and uniformity the Tutorials focus on the H.264 (AVC) video codec. Other codecs are supported by Intel® Media SDK and can be utilized in a similar way.

    Additional information on the tutorials can be found at https://software.intel.com/en-us/articles/media-sdk-tutorial-tutorial-samples-index. The Media SDK is available for free through the Intel® INDE Starter Edition for client and mobile development, or the Intel® Media Server Studio for datacenter and embedded usages. 

     Download the Media SDK tutorial in the following available formats:

    Quick installation instructions:

    • For Linux, set the MFX_HOME environment variable:
      export MFX_HOME=/opt/intel/mediasdk
    • For Windows, set the INTELMEDIASDKROOT environment variable and build with Microsoft* VS2012.

    Previous versions of the Tutorials package:

  • Intel INDE
  • Intel INDE media client
  • Intel® Integrated Native Developer Experience. Intel INDE
  • Developers
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8
  • Windows*
  • C/C++
  • Advanced
  • Intermediate
  • Media SDK for Windows*
  • Client media solutions
  • Intel® Integrated Native Developer Experience (INDE)
  • Intel® Media Server Studio Essentials Edition
  • Intel® Media Server Studio Professional Edition
  • Embedded
  • Laptop
  • Server
  • Desktop
  • URL
  • Code sample
  • Getting started
  • Topic area: 

    IDZone