Mobile Phone QoS / QoE Monitor

Preamble

Although this document (written as a white paper a few years ago) describes the service as seen from mobile phones connecting to server environments, the same principles hold when other clients (workstations, business networks) connect to, say, hosting services.

Overview

While the operations center monitors all systems and plots performance, server and network load, storage utilization, database health, and overall availability and throughput to ensure premium service, there may still be situations where the end user sees a completely different picture.  Other factors may be at play that impact phone service and contribute to a poor consumer experience.

The lack of transparency in an end-to-end mobile data service can easily, and without the operator knowing, lead to upset users, even more so when customer support cannot handle their calls promptly and satisfactorily.  Mobile operators need insight into critical key performance indicators (KPIs) to ensure continuous consumer satisfaction.

Service providers want to learn where, when, and how problems occur, and ideally to address incidents before customers start calling the service hotlines and become upset when service personnel appear unaware of the problem, let alone offer a solution!

Customers and operators alike can see service degrade during traffic peaks and high network load, when increased demand causes bottlenecks and availability and performance drop.  Users then get upset when tasks cannot be completed because timeouts and aborts hinder phone use.

Mobile phone operators then feel the users’ pain when hotlines are bombarded and, even worse, when frustrated customers turn to a competitor.  Repairing the damage to the reputation, rebuilding consumer confidence, and winning back customers can become quite an expensive ordeal.  The cost ratio between preventive measures and restoring service after a failure can easily reach 1:10 (yes, one to ten), if not 1:20 or much more.

Device key performance indicator (KPI) tools can be designed to measure end-to-end email, web browsing, the responsiveness of certain Apps, and more.  This gives insight into the overall experience from the customer perspective.  Various phones can be assigned as monitors with the KPI tool installed.

Every so often (15 minutes?) the collected KPI data is sent from the phone to a dedicated database web service for statistics and trend reporting.

In the spirit of the original KPI service, tools will be provided to the mobile phone world to help track the end-to-end experience of the mobile service, in a selected area or around the globe and at any time, to ensure a consistent and desired level of “Quality of Experience” (QoE).  Being proactive is key!

 

Scope

To monitor and record system response to a mobile phone, and to analyze and help improve overall service, critical performance data as seen on a mobile phone will be captured.  This could be simple log reporting, or a more sophisticated App running selectable, pre-defined user profiles that simulate to some extent how end users use their mobile phones.  The data collected this way provides a view into the real world from the consumer perspective and helps the mobile operator understand where and when quality is compromised.

Apps may be distributed to and loaded on phones set aside for that purpose and/or on phones of (selected) Microsoft and mobile operator personnel.  Scripts describe a user profile to benchmark (e.g., load a web site, send a message, receive an email).  The App runs independently without disturbing or interrupting the (human) user of the phone.

In a more generic and smaller fashion, customer service could probe end-to-end phone service from the consumer perspective by issuing and measuring I/O blips of test data blocks and/or by capturing logs from a phone, or even from a defined group of phones.

A designated QoE server collects the data from the connected phones and displays statistics and alerts; it is an essential part of the operations center, showing the service from the outside.  The data can be further filtered by geographic location (cell towers) and correlated with peak vs. quiet times, network or system load, and transfer times probed internally and externally throughout the various services.
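As a rough illustration of filtering by cell tower and comparing peak with quiet times, the sketch below groups response times and reports the median per group.  The field names (cell_id, hour, response_ms) and the peak window are assumptions made for this example, not part of the original design.

```python
from collections import defaultdict
from statistics import median

# Hypothetical peak window (local hours); real peak times would come from traffic data.
PEAK_HOURS = range(17, 21)


def summarize(samples):
    """Group response times by (cell tower, peak/quiet) and return the median per group.

    Each sample is a dict with the assumed keys 'cell_id', 'hour', 'response_ms'.
    """
    groups = defaultdict(list)
    for s in samples:
        period = "peak" if s["hour"] in PEAK_HOURS else "quiet"
        groups[(s["cell_id"], period)].append(s["response_ms"])
    return {key: median(values) for key, values in groups.items()}


samples = [
    {"cell_id": "A", "hour": 18, "response_ms": 900},
    {"cell_id": "A", "hour": 19, "response_ms": 1100},
    {"cell_id": "A", "hour": 3, "response_ms": 200},
    {"cell_id": "B", "hour": 18, "response_ms": 400},
]
summary = summarize(samples)
```

The median is used here because a few extreme timeouts would distort an average; a real deployment might prefer percentiles.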

The goal is to get a sense of the client or phone users’ real-life experience, to allow troubleshooting (perhaps pinpointing the device, the application, or the mobile service), and to implement corrective actions before problems become widespread.  It should also help network operations look outside their “ivory tower”, further improve monitoring, and sustain a positive consumer experience.

 

Scenario

Phone / Client Status

  1. The mobile phone QoE App connects to various services continuously, on demand, or at certain intervals, logging send requests and response receipts and simulating common user profiles.
  2. The end user can use the tool to test the connection status, perhaps even troubleshoot his/her phone.
  3. Customer Service connects to a user’s phone, probes phone health, line quality and connection status.
  4. Server connects to (defined) groups of mobile phones and collects logs with overall phone and line benchmark data.

 

Server Backend

  1. QoE Server monitors “its” phones, requests/receives data and displays key performance data.
  2. Collected data is qualified and stored in a QoE database to allow spot-on analysis and trends.
  3. Data is correlated with the service performance matrix to pinpoint problem areas and to challenge the (internal) network operations display.

 

Specification

Key Performance Data

How:

Phone responsiveness is monitored:

  • Log files, test data I/O
  • QoE App / Scripts (Email, Web, Messaging, real-time vs. batch tasks, etc.)

Who:

Identify client or phone:

  • Client or Phone, OS, etc.
  • IMEI, ICC ID, etc.
  • User ID

Where:

Location parameters:

  • GPS coordinates
  • Cell tower connected
  • Provider (roaming)
  • Signal strength / ASU

When:

Timestamps recorded:

  • Request sent out; start & finish
  • Confirmation receive time
  • First byte of message header (email) or web page (web)
  • Task completed & total bytes

What:

Data collected:

  • Device health (OS, Uptime, CPU %, Mem %, NW load, …)
  • Apps loaded, version number
  • Phone to Provider Network (to Partner Network to …?)
    … and back!
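One possible way to hold the who / where / when / what data in a single record is sketched below.  The class and field names are hypothetical; the actual record layout is not defined in this paper.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class KpiRecord:
    # "Who"
    device_id: str      # e.g. IMEI; should be treated as sensitive data
    os_version: str
    user_id: str
    # "Where"
    latitude: float
    longitude: float
    cell_id: str
    provider: str
    roaming: bool
    signal_asu: int
    # "When": raw timestamps, keyed by event name
    timestamps: dict = field(default_factory=dict)
    # "What": device health figures such as CPU % or memory %
    health: dict = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize the record for transfer to the QoE service."""
        return json.dumps(asdict(self), sort_keys=True)


record = KpiRecord(
    "device-001", "example-os", "qoe-user-1",
    47.64, -122.13, "cell-0421", "ExampleNet", False, 17,
    {"request_start": 10.0}, {"cpu_pct": 12},
)
```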

 

Device QoE Support

The phone collects status changes and other data along with timestamps in a log.  Customer Support can retrieve the log files and query overall phone health; if necessary, test I/O can be initiated.

A special App can do much more, continuously and repeatedly executing the tasks in specified scripts.  Tasks are actions a phone user would perform at any given time: placing or answering a telephone call, sending and receiving emails and messages, browsing the web, etc.

Collected performance data is then sent to the service periodically or on request.
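A minimal sketch of "periodically or on request" might look like this.  The class name, the send callback, and the injectable clock are assumptions for illustration; the 900-second default mirrors the tentative 15-minute interval mentioned in the overview.

```python
import time


class KpiBuffer:
    """Collects records and hands them to `send` periodically or on request."""

    def __init__(self, send, interval_s=900, clock=time.monotonic):
        self._send = send              # callable that takes a list of records
        self._interval_s = interval_s
        self._clock = clock            # injectable so the timing can be tested
        self._records = []
        self._last_flush = clock()

    def add(self, record):
        """Store a record and flush if the interval has elapsed."""
        self._records.append(record)
        if self._clock() - self._last_flush >= self._interval_s:
            self.flush()

    def flush(self):
        """Send everything collected so far; this also serves 'on request'."""
        if self._records:
            self._send(self._records)
            self._records = []
        self._last_flush = self._clock()
```

If the send callback raises, the records stay in the buffer, so a later flush can try again.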

 

Service QoE Tasks

Captured performance data (from the mobile phone) is collected, sanitized, and stored in a database.  From there, reports can be prepared to help monitor overall network quality as experienced from the outside, i.e., by the mobile phone user.

Furthermore, this “outside” data can help put the internal monitoring of network operations, with its many individual components, into perspective.  Of course, communication between those components and their interrelationships is continuously probed, and the health of each service and of the servers, with their processor, memory, network, and disk utilization, is painstakingly watched.  All key service data is continuously gathered and reported, and overall network usage is plotted so that performance at the desired levels can always be assured, even (or in particular) during peak demand.

Nevertheless, it is a long way from the network operations center to the mobile phone, and not all entities responsible for communication with the mobile phone are under the (direct) control of that center.  Are the data trunks to the mobile provider’s central office working within their parameters?  What about the wireless access points?  Or the network traffic “in between”?

Last but not least, we cannot exclude problems with the phone itself.

 

Closing the Loop

[Analytics]

  • Integration to overall operation center monitoring and reporting
  • Correlation of “inside” and “outside” probing
  • Data comparable, scalable, representative
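One simple way to correlate “inside” and “outside” probing is a Pearson correlation between two time-aligned series, for example internally measured server response times and the times seen on the phones.  The sketch below assumes such aligned series exist; the sample values are invented for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two points")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Invented sample values: per-interval response times in milliseconds.
inside_ms = [120, 130, 125, 300, 128]
outside_ms = [800, 820, 810, 2100, 815]
r = pearson(inside_ms, outside_ms)
```

A value near 1 would suggest that slowdowns seen on the phones track the internal measurements; a low value would hint at a cause outside the monitored systems.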

 

Mobile Phone Services Topology

[Figure: Mobile Phone Services Topology (Visio chart)]

 

QoE Program Flow

QoE Scripts

  1. Define start/stop times and repeat cycles, timeouts
  2. List tasks to execute (e.g., email, web, messaging, streaming, down/upload, etc.)
  3. Include “What–If” scenarios (e.g., cannot load website, insufficient resources, etc.)
  4. Send collected data periodically to …
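A script along these lines could be described by a small data structure.  The sketch below is hypothetical: the field names, the task identifiers, and the daily time-window check are assumptions, and the reporting destination is deliberately left out because the paper leaves it open.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass
class QoeScript:
    start: time                  # daily start of the measurement window
    stop: time                   # daily end of the measurement window
    repeat_every_s: int          # pause between repeat cycles
    timeout_s: float             # per-task timeout
    tasks: list                  # e.g. ["email", "web", "messaging"]
    what_if: dict = field(default_factory=dict)   # outcome -> action
    report_every_s: int = 900    # how often collected data is sent

    def is_active(self, now: time) -> bool:
        """Return True if `now` falls inside the start/stop window."""
        if self.start <= self.stop:
            return self.start <= now < self.stop
        # The window spans midnight, e.g. 22:00 to 06:00.
        return now >= self.start or now < self.stop


daytime = QoeScript(time(8, 0), time(20, 0), 600, 30.0, ["email", "web"],
                    {"timeout": "retry"})
overnight = QoeScript(time(22, 0), time(6, 0), 1800, 30.0, ["web"])
```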

QoE App

  1. Retrieves static environment data (“who”)
  2. Executes scripts as specified, launches tasks
  3. Measures response times (“when”)
  4. Records dynamic data (“what” and “where”)
  5. Distributes to QoE Service
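The "launch a task and measure the response time" step might be sketched as follows.  It assumes each task is a callable that accepts a timeout in seconds and raises TimeoutError when it gives up; that convention, like the function name, is an assumption of this example.

```python
import time


def measure(task, timeout_s, clock=time.perf_counter):
    """Run one task and record when it started, how long it took, and how it ended."""
    started = clock()
    try:
        result = task(timeout_s)
        outcome = "ok"
    except TimeoutError:
        result, outcome = None, "timeout"
    except Exception as exc:  # any other failure is recorded, not raised
        result, outcome = repr(exc), "error"
    return {
        "started": started,
        "elapsed_s": clock() - started,
        "outcome": outcome,
        "result": result,
    }
```

Failures are logged rather than raised so that one broken task does not stop the rest of the script.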

QoE Service

  1. Collects data from mobile phones / QoE App
  2. Qualifies, accumulates, analyzes, reports, plots trends
  3. Complements / challenges internal monitoring
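On the service side, "qualifies, accumulates, analyzes" could start as simply as the sketch below: drop unusable records, compute a 95th percentile, and flag it against a threshold.  The field name response_ms, the choice of the 95th percentile, and the threshold are assumptions for illustration.

```python
from statistics import quantiles


def qualify(records):
    """Keep only records with a positive numeric response time."""
    return [
        r for r in records
        if isinstance(r.get("response_ms"), (int, float)) and r["response_ms"] > 0
    ]


def p95(values):
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(values, n=20)[-1]


def check(records, threshold_ms):
    """Summarize qualified records and raise an alert flag above the threshold."""
    good = qualify(records)
    if len(good) < 2:
        # Too little data for a percentile; report the count without alerting.
        return {"count": len(good), "p95_ms": None, "alert": False}
    value = p95([r["response_ms"] for r in good])
    return {"count": len(good), "p95_ms": value, "alert": value > threshold_ms}
```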

QoE Requirements

  1. Service availability and responsiveness
  2. Real-time access 24/7, always on, no wait
  3. Fault-tolerance, resiliency, redundancy
  4. Data security and reliability

 

Basic Program Structure

[tbd]

 

Constraints

[tbd]

  • What does the given mobile phone (standard or debug) log contain?
  • Can customer support / network service initiate benchmark I/Os on a connected mobile phone?
  • It is assumed that the App can retrieve all necessary health data, initiate the proposed tasks, gather key performance data, and send all of it (in compressed form) to the service.
  • Scripts can be readily distributed to the mobile phones to program and set the App to probe specific services.
  • Can the QoE App and/or the service set up their own (QoE) user IDs to allow sending emails, SMS/MMS, etc. to themselves?  What about hooks into partner systems to send (or echo) IMs, tweets, and other messages or content?
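Sending the collected data "in compressed form" could be as simple as gzip-compressed JSON.  The sketch below is one option among many; the format is an assumption, not something the paper prescribes.

```python
import gzip
import json


def pack(records):
    """Serialize records to compact JSON and gzip-compress the result."""
    raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def unpack(payload):
    """Reverse of pack(): decompress and parse the JSON."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


records = [{"cell_id": "cell-0421", "response_ms": 100 + i} for i in range(200)]
payload = pack(records)
```

Repetitive log data such as this tends to compress well, which matters on a metered mobile connection.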

 

Examples

Program Flow

Script Example

Prerequisites

 

Web Benchmark

Email Benchmark

SMS Benchmark

IM Benchmark

APP1 Benchmark

APPn Benchmark

Other Benchmarks

 

Appendices

Environment

Specific services can connect to (defined) groups of phones to collect more generic benchmark data and/or to troubleshoot an individual phone.

An App running on Windows Phone provides sophisticated and continuous monitoring; flexible scripts are easy to change and adapt to resemble (representative) user profiles.

If necessary, dedicated user accounts are defined for the App to perform a variety of phone functions without impacting the environment or the user of the phone.

Collected data is sent to a dedicated server and database farm to analyze, plot, monitor, and alert.

The service provides appropriate support for the QoE tasks running on the mobile phone; it is feasible to implement the various features in phases.

 

QoE Scripts

Universal application, running series of scripts repeatedly to perform individual tasks

Telephone, voice + data + control

Send & Receive emails (text only vs. large attachments)

Messaging (IM, SMS, MMS), Social Network

Web site (simple vs. complex), file up/download

Real-time applications (e.g., navigation, online games, streaming, etc.)

 

Control Parameters

App running distributed script(s) as “user profile”

Script #, Tasks

User/Phone ID

Location & Scheduled Activity

Peak usage vs. sporadic use

 

Script run settings

Think time

Sleep time

Timeout

What–If
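How think time, sleep time, timeout, and What–If rules might interact in one script cycle is sketched below.  The settings keys and the three actions ("retry" once, "skip", "abort") are assumptions chosen for this example.

```python
import time


def run_cycle(tasks, settings, sleep=time.sleep):
    """Run each (name, callable) task once, pausing `think_s` between tasks.

    settings["what_if"] maps an outcome ("timeout", "error") to an action:
    "retry" (one extra attempt), "skip" (the default), or "abort" (stop the cycle).
    """
    log = []
    for name, task in tasks:
        attempts = 0
        while True:
            attempts += 1
            try:
                task(settings["timeout_s"])
                outcome = "ok"
            except TimeoutError:
                outcome = "timeout"
            except Exception:
                outcome = "error"
            log.append((name, outcome))
            action = settings["what_if"].get(outcome, "skip")
            if outcome == "ok" or action != "retry" or attempts >= 2:
                break
        if outcome != "ok" and action == "abort":
            break
        sleep(settings["think_s"])   # simulated user think time
    sleep(settings["sleep_s"])       # rest before the next cycle
    return log
```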

 

Key Performance Data

What:

Script info, data type, bandwidth, threshold

Timestamps of send request and response receipt (aborts, retries)

GPS location, signal strength, mobile connection / service

User ID & Phone ID, (overall) utilization and health

Where:

Transmit collected data to monitor service

Immediately or periodically or on demand or some other trigger

How:

End-to-end

Transaction logs

Collect / Measure / Scale

Multimedia or secure data

Real-time vs. batch


Components

Backend

Wireless, LAN/WAN

Proxy, Filter, …

Database

3rd Party Connect

Services

Content

Phone & V/M, SMS

Web & Forms, down/upload, Email, IM

Streaming Audio/Video

FM Radio / TV

Games, XNA

Microphones, Cameras, Speakers

GPS, gyroscope, environment

Data security

Frontend

Device Hardware (processor, memory, interfaces, sensors, antennas)

Device OS

Installed / Running Apps

Wireless Connection

The Unforeseen

Now what?

Data out of sync / invalid;

Loss of connection;

Service failures;

App aborts; …
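One reading of "data out of sync" is that phone and server clocks disagree, which would distort any comparison of their timestamps.  Under that assumption, the offset can be estimated from a single probe exchange, in the style of NTP.  The sketch assumes the network delay is roughly the same in both directions.

```python
def clock_offset(t0, t1, t2, t3):
    """Estimate how far the server clock is ahead of the phone clock, in seconds.

    t0: phone sends probe (phone clock)    t1: server receives it (server clock)
    t2: server replies (server clock)      t3: phone receives reply (phone clock)
    Returns (offset, round_trip), where round_trip excludes server processing time.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    round_trip = (t3 - t0) - (t2 - t1)
    return offset, round_trip


# Invented example: the phone clock runs 5 s behind the server,
# with 0.25 s network delay each way and 0.5 s server processing.
offset, round_trip = clock_offset(100.0, 105.25, 105.75, 101.0)
```

The estimated offset could then be applied to phone timestamps before they are compared with server-side data.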

 
