08 Nov 2014 |
Research article |
Information and Communications Technologies
Maximizing quality of Multimedia Messaging Services
The Multimedia Messaging Service (MMS) allows users with heterogeneous terminals to exchange structured messages composed of text, audio, still images, and video. It is a major source of revenue for mobile operators. According to Portio Research, 207 billion MMS messages were sent in 2011, and this number is expected to rise to 276.8 billion by 2016. With this volume of messages, MMS would maintain its position as the second most successful non-voice mobile service to date, behind the highly lucrative Short Message Service (SMS) [2, 3]. Informa projects even higher MMS volumes by 2016, predicting that 387.5 billion MMS messages will be sent, representing 10.6% of global messaging revenues, at US$20.7 billion.
MMS technical specifications have been defined by the 3rd Generation Partnership Project (3GPP) and the Open Mobile Alliance (OMA) (and adapted in 3GPP2). The overall architecture of the MMS implementation in cellular networks is shown in Fig. 1 [5, 6].
The Multimedia Message Service Center (MMSC), also called MMS relay/server or MMS proxy-relay, is responsible for storing messages received from a user and relaying them to the intended recipient. As shown in Fig. 1, the MMSC provides access to the cellular network via the MM1 interface (or reference point). This is the interface used when two users having the same operator exchange multimedia messages. If the operators are different, then MM4 must be used between the users’ respective MMS relays/servers. The MMSC provides access for value-added service providers (premium content emails, Web) via MM7. Access to external servers, such as email and fax servers, is performed through the MM3 reference point. MM11 allows the MMSC to access an external transcoding server to perform message adaptation, and is specified by the OMA Standard Transcoding Interface (STI) v1.0 [5, 7]. To ensure interoperability, these interfaces are defined to operate over WAP/WSP and HTTP.
MMS traffic growth
As MMS usage increases and standards evolve, providers are being pressed to handle ever greater numbers of messages containing richer media content. This daunting task cannot be reduced to merely passing along messages: server-side adaptation of messages, performed by the MMSC or an external transcoding server connected to it, is also necessary to ensure interoperability among users. The very high volume of messages to be handled by service providers will call for the most efficient adaptation algorithms and strategies to cope with the demand, the growth and evolution of standards, and scaling problems.
Receiving terminal characterization
For MMS applications, a receiving terminal is characterized by its capabilities (or, perhaps more accurately, by its limitations) as defined by profiles. A profile determines the terminal’s constraints, such as the maximum multimedia message size in bytes, the media types that the terminal can interpret, and the specific constraints of individual media types, such as maximum image resolution. A device supporting the “Content Rich” profile will support the JPEG and GIF image formats with resolutions up to 1600×1200 pixels and a maximum message size of 600 KB. A device supporting the “Image Basic” profile will only support these image formats at resolutions up to 160×120 pixels and a maximum message size of 30 KB. We can see in Fig. 2 that the delivered message, which belonged to the Content Rich class, was adapted to meet the lower resolution and memory capabilities of the receiving “Image Basic” class of terminal.
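The profile constraints described above can be illustrated with a minimal sketch. The class and function names (`Profile`, `Image`, `violations`) are hypothetical, 1 KB is assumed to mean 1024 bytes, and the message size is simplified to the sum of the attachment sizes, ignoring headers and text parts. Only the two profiles quoted in the text are encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    max_message_bytes: int
    max_width: int
    max_height: int
    image_formats: frozenset


@dataclass(frozen=True)
class Image:
    fmt: str
    width: int
    height: int
    size_bytes: int


# Limits quoted in the text; 1 KB = 1024 bytes is an assumption.
CONTENT_RICH = Profile("Content Rich", 600 * 1024, 1600, 1200,
                       frozenset({"JPEG", "GIF"}))
IMAGE_BASIC = Profile("Image Basic", 30 * 1024, 160, 120,
                      frozenset({"JPEG", "GIF"}))


def violations(images, profile):
    """List the reasons why a message (a list of images) exceeds a profile."""
    problems = []
    for i, img in enumerate(images):
        if img.fmt not in profile.image_formats:
            problems.append(f"image {i}: unsupported format {img.fmt}")
        if img.width > profile.max_width or img.height > profile.max_height:
            problems.append(
                f"image {i}: resolution {img.width}x{img.height} too large")
    # Simplification: message size = sum of attachment sizes only.
    total = sum(img.size_bytes for img in images)
    if total > profile.max_message_bytes:
        problems.append(
            f"message size {total} exceeds {profile.max_message_bytes}")
    return problems
```

An empty list means the message can be delivered as is; any entry indicates that server-side adaptation (or a fallback strategy) is needed for that terminal.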
This figure shows a typical message flow for multimedia message delivery between two mobile terminals having the same operator.
Multimedia and server-side adaptation
Server-side adaptation ensures not only that each individual multimedia attachment is compatible with the receiving terminal, but also that the message as a whole can be sent and correctly interpreted. Every attachment must be characterized and transformed, if need be, to satisfy the receiving terminal’s constraints, whether by adjusting its format or its resolution. In the absence of server-side adaptation, a message exceeding the terminal’s capabilities (either by message size or media type) can result in terminal-specific behavior that ranges from merely incomplete messages to a terminal crash. If the server is capable of determining the capabilities of the terminal, but incapable or unwilling to perform adaptation, it can apply an alternative strategy, such as sending a text-only SMS along with the location of the original message, for the user to download or browse by other means. While this ensures the delivery of content, it does not provide the user with a satisfactory experience, which means that server-side adaptation is preferable from the user’s point of view.
We have shown in previous work that strategies involving both adaptation of JPEG compression parameters and scaling produce significantly better results than using either method alone.
Adapting an image, even in JPEG format, against maximum file size and resolution constraints, while maximizing perceived quality in a computationally efficient manner, remains a challenge, as there are no established methods for estimating the resulting file size and quality of an image subject to changes in compression parameters and resolution. To this end, we have proposed predictors and systems in previous work designed to adapt images and messages [12, 13, 14]. These predictors have been exploited in [15, 16] to develop a dynamic content adaptation framework applied to collaborative mobile presentations (e.g. OpenOffice Impress presentations). In this paper, we extend our previous work, in which we proposed a general framework based on dynamic programming for the adaptation of image-only multipart messages that, given receiving terminal constraints, explicitly maximizes the perceived quality of the whole message. The extension is a new algorithm, step dynamic programming.
Rather than performing a transcoding for every combination of transcoding parameters examined, we resort to fast predictors that, given a (superficial) characterization of an image m (such as original file size, quality factor, and resolution) and transcoding parameters t, predict the resulting file size and quality of T(m, t), the image m transcoded with parameters t.
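To make the idea concrete, here is a minimal sketch of a predictor-driven parameter search for a single image. The predictor `toy_predict` is a placeholder with an arbitrary monotone model; it is not the predictor used in the paper, and the parameter grid (quality factors 10 to 100, scaling factors 0.1 to 1.0) is likewise an assumption. The point is only that each candidate t costs one cheap prediction instead of one actual transcoding.

```python
from itertools import product


def toy_predict(orig_size, qf, scale):
    """Placeholder predictor (an assumption, NOT the paper's predictor).

    Returns (predicted_size_bytes, predicted_quality) for an image of
    orig_size bytes transcoded with quality factor qf and scaling scale.
    """
    size = int(orig_size * (scale ** 2) * (qf / 100.0))
    quality = scale * (qf / 100.0) ** 0.5  # arbitrary score in [0, 1]
    return size, quality


def best_parameters(orig_size, budget, predict=toy_predict):
    """Pick the (qf, scale) pair with the best predicted quality whose
    predicted size fits the budget; no image is transcoded here.

    Returns (quality, qf, scale, predicted_size), or None if nothing fits.
    """
    qfs = range(10, 101, 10)
    scales = [s / 10 for s in range(1, 11)]
    best = None
    for qf, scale in product(qfs, scales):
        size, quality = predict(orig_size, qf, scale)
        if size <= budget and (best is None or quality > best[0]):
            best = (quality, qf, scale, size)
    return best
```

Only the winning parameter pair would then be handed to the real transcoder, so the cost of the search is dominated by the predictor rather than by JPEG encoding.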
We have presented such predictors in previous works [12, 14], and in this study we use a file size and quality predictor that will be denoted JQSP (JPEG Quality and Size Predictor). To assess the proposed methods’ resilience to prediction error, we will, in addition to JQSP, use oracular predictors (predictors with known characteristics), which are discussed in the research paper.
For the purposes of estimating the perceived visual quality of images subjected to transformation, we use the widely known Structural Similarity Index (SSIM) proposed by Wang et al. We chose the SSIM because of its popularity in the scientific community and its high level of accuracy. A statistical evaluation of recent full-reference (FR) image quality assessment algorithms on various image databases has been reported in the literature, and the results show that the SSIM is very accurate for various distortions.
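For reference, the SSIM between two image patches x and y is commonly written as follows. This is the standard formulation from Wang et al., not something specific to this article; the constants shown are the commonly used defaults, with L the dynamic range of the pixel values (255 for 8-bit images).

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + C_1)\,(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)},
\qquad C_1 = (K_1 L)^2,\quad C_2 = (K_2 L)^2,\quad K_1 = 0.01,\quad K_2 = 0.03
```

Here \mu_x and \mu_y are the local means, \sigma_x^2 and \sigma_y^2 the local variances, and \sigma_{xy} the local covariance. The index equals 1 when the two patches are identical, and the score for a whole image is typically the mean over local windows.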
The second of our proposed methods, which we refer to as “step dynamic programming”, mitigates error propagation by iteratively refining the solution, again using dynamic programming. Step dynamic programming first optimizes the message globally and determines the predicted optimal series of transcoding parameters, but transcodes only the first image (in attachment order). After the first image has been transcoded, its actual file size is observed and the budget for the remaining images is adjusted to take into account the transcoded image and the corresponding prediction error. The remaining images are then optimized jointly and, again, only the first of the remaining images is transcoded, its transcoded size observed, and the budget adjusted, and so on, until all the images have been transcoded.
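The loop described above can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation: the joint optimization is modelled as a multiple-choice knapsack solved by dynamic programming over a quantized budget, the objective is assumed to be the sum of predicted qualities, and `optimize`, `step_dp`, and the `transcode` callback are hypothetical names.

```python
def optimize(candidates, budget, step=1024):
    """Multiple-choice knapsack by dynamic programming.

    candidates[i] lists (predicted_size, predicted_quality, params) options
    for image i. Returns one option index per image maximizing the summed
    predicted quality within the budget, or None if no combination fits.
    """
    if budget < 0:
        return None
    slots = budget // step
    neg = float("-inf")
    best = [0.0] * (slots + 1)  # best[b]: best quality using at most b slots
    choices = []
    for options in candidates:
        new = [neg] * (slots + 1)
        pick = [None] * (slots + 1)
        for b in range(slots + 1):
            for k, (size, quality, _) in enumerate(options):
                cost = -(-size // step)  # ceiling division
                if cost <= b and best[b - cost] + quality > new[b]:
                    new[b] = best[b - cost] + quality
                    pick[b] = k
        best = new
        choices.append(pick)
    if best[slots] == neg:
        return None
    plan, b = [], slots
    for i in range(len(candidates) - 1, -1, -1):  # backtrack the choices
        k = choices[i][b]
        plan.append(k)
        b -= -(-candidates[i][k][0] // step)
    return plan[::-1]


def step_dp(candidates, budget, transcode, step=1024):
    """Re-optimize after each transcoding, using the observed file size.

    transcode(i, params) is a caller-supplied function returning the actual
    size of image i after transcoding. Returns a list of
    (option_index, actual_size) pairs, or None if the budget is exhausted.
    """
    remaining = budget
    done = []
    for i in range(len(candidates)):
        plan = optimize(candidates[i:], remaining, step)
        if plan is None:
            return None
        k = plan[0]  # transcode only the first image of the plan
        actual = transcode(i, candidates[i][k][2])
        done.append((k, actual))
        remaining -= actual  # absorbs the prediction error
    return done
```

For example, with two images whose high-quality option is predicted at 40 units each and a budget of 80, a single global plan would pick the high-quality option twice. If the first transcoding actually produces 45 units, `step_dp` re-plans with the remaining 35 units and falls back to a smaller option for the second image, so the message still fits.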
For our experiments, we created four groups of 1000 MMS messages, with two (the minimum to qualify as a “multipart” message) to five attached images. The images, with resolutions between 320×200 and 3000×2000, were chosen uniformly at random from a database of 370,000 images obtained by crawling the Web in the fall of 2010. The profile chosen to test adaptation in our experiments was “Image Rich” (supporting images with resolutions up to 640×480 and a maximum message size of 100 KB). Forcing messages into the “Image Rich” profile from the original MMS (with an average message size of 284 KB, 563 KB, 790 KB, 1.2 MB, and 1.4 MB, for 1, 2, 3, 4, and 5 attachments respectively) means that the various algorithms tested were stressed with adaptation ratios of up to ≈14:1. The overall test architecture is shown in Fig. 3.
In this work, we have shown that the two proposed predictor-based dynamic programming multipart message adaptation algorithms maximize quality explicitly (as a proxy for user experience), and also make better use of message capacity (the portion of the allowable message size used) than the comparative algorithms inspired by products currently on the market and described in the literature. We have also shown that, while predictor accuracy is important, our proposed algorithms degrade very gracefully with increases in predictor error, making them robust to prediction errors. Furthermore, our proposed algorithms are significantly faster and better than earlier solutions, and would be of great benefit to the Multimedia Messaging Service.
Pigeon, Steven and Coulombe, Stéphane. 2014. “Quality-aware predictor-based adaptation of still images for the multimedia messaging service”. Multimedia Tools and Applications, vol. 72, no. 2, pp. 1841-1865.
Stéphane Coulombe is a Professor at the Software and IT Engineering Department, at ÉTS where he currently carries out research and development on video processing and systems, media adaptation, and transcoding.