Predicting Human Movement based on Telecom s Handoff in Mobile Networks
Yi-Bing Lin, Fellow, IEEE, Chien-Chun Huang-Fu
Department of Computer Science
National Chiao Tung University, Hsinchu, Taiwan
{liny,jjfu}@csie.nctu.edu.tw
and
Nabil Alrajeh
Biomedical Technology Department
King Saud University, Saudi Arabia
*****@***.***.**
Abstract
Investigating human movement behavior is important for studying issues such as prediction of
vehicle traffic and spread of contagious diseases. Since mobile telecom network can effi-
ciently monitor the movement of mobile users, the telecom s mobility management is an ideal
mechanism for studying human movement issues. The problem can be abstracted as follows:
What is the probability that a person at location A will move to location B after T hours. The
answer cannot be directly obtained because commercial telecom networks do not exactly trace
the movement history of every mobile user. In this paper, we show how to use the standard
outputs (handover rates, call arrival rates, call holding time and call traffic) measured in a
mobile telecom network to derive the answer for this problem.
Index Terms: Human movement, Little s Law, mobile computing, mobility management.
1. Introduction
Prediction of human movement behavior is important for studying issues such as prediction of
vehicle traffic and spread of contagious diseases, which requires tracing the movement of
people. In [1], a wireless sensor network technology is utilized to obtain high-resolution data
1
of close proximity interactions which cause the spread of most contagious diseases. However,
this method needs extra effort to distribute the senor network motes, and is constrained in a
small area (e.g., a campus).
An alternative to investigate user movement can be achieved through mobile telecom service.
In a mobile telecom network (e.g., UMTS, cdma2000, and GSM [6]), the users are tracked by
the mobility management mechanism so that the network can connect incoming calls to the
users through Base Stations (BSs) [2]. For this purpose, BSs in the service area are grouped
into location areas (LAs). The users are tracked at the accuracy of a LA coverage, and when
an incoming call arrives, all BSs in that LA will page the user. Since this mobility mana ge-
ment mechanism provides the position information of a user at the accuracy of one LA cov-
erage that may include 10-100 BSs, it cannot be used for location-based applications that re-
quire position accuracy within the size smaller than a cell (the radio coverage of a BS or a
sector of the BS). Location-based services that need to accurately track the position of a user
require specific techniques described in 3GPP TS 25.305 [3]. Details of these methods were
described in [4], and are elaborated here for the reader s benefit:
The Cell-ID-based method determines the mobile user s position based on the coverage of
Service Areas (SAs). An SA includes one or more cells. At most one-cell-sized accuracy
(about 500 meters) can be achieved when the SA includes only one cell. The Observed Time
2
Difference of Arrival (OTDOA) method utilizes trilateration to determine the mobile user s
position. At least three concurrent downlink signals from different cells are measured by the
mobile phone. The time differences among the signal arrivals are calculated to form hyper-
bolic curves. The intersection of these curves is then used to indicate the mobile user s posi-
tion. This method provides location accuracy within 50-150 meters. The Assisted Global Po-
sitioning System (A-GPS) method speeds up GPS positioning by downloading GPS infor-
mation through the Radio Access Network (RAN). Execution of A-GPS positioning only re-
quires several seconds while execution of normal GPS positioning requires 30 seconds to
several minutes. GPS modules are installed in both the mobile phone and the RAN. This
method provides location accuracy within 5-15 meters. The Uplink Time Difference of Arrival
(U-TDOA) method evolves from OTDOA, which utilizes uplink signals instead of downlink
signals. A normal uplink signal from the mobile user is measured in different cells, and no
extra signal is required. Same calculation process as OTDOA is then conducted to find out the
mobile user s position. Since the measurement and the calculation process are exercised onl y
in the RAN, this method does not require any modification to the mobile phone. This method
provides location accuracy within 50-150 meters.
The aforementioned techniques can effectively monitor the behaviors of specific mobile users
at the cost of modifications to telecom network, which are not appropriate to generate behav-
ior statistics for a large number of users that are typically required to study problems such as
3
pedestrian movement and contagious disease spread. In other words, these techniques cannot
be used to answer questions like What is the probability that a person at location A
will move to location B after T hours.
In [5], two data sets of the BS locations are utilized to analyze the human mobility patterns.
The first data set records the BS locations of 100,000 individuals for six -month when they
initiated/received a call or a short message. The second data set captures the BS locations of
206 individuals recorded every two hours for one week. The distribution of the displacements
calculated from these two data sets is found to be well approximated by a truncated pow-
er-law equation. This method as well as other solutions [9,10] require quasi-anonymous
phone identities for tracing individual movements which causes extra undesirable overhead
for the telecom operators. Furthermore, the data cannot be processed quickly (say, in one day)
if the number of mobile users is larger than millions.
In this paper, we propose a novel approach to address the spread problem by only using the
statistics from the standard mobile telecom switches such as Mobile Switching Centers
(MSCs), and Serving General Packet Radio Service Support Nodes (SGSNs) [6]. Our ap-
proach does not need to identify individual users and therefore does not cause any customer
privacy problem. The notation used in this paper is listed in Appendix A.
2. Spread Prediction Model
4
Figure 1 illustrates a mobile telecom service area covered by several BSs. In this figure, a cell
of a BS is represented by a circle. A mobile user is represented by a vehicle moving around
the cells. If a user in conversation moves from one cell to another, then the call connection
must be switched from the old cell to the new cell. This switching operation is called a hand-
over. When a call arrives at a user or when he/she performs a handover, the activity is recor d-
ed at the MSC/SGSN. The mobile telecom network collects the statistics of the activities for
every t interval typically ranging from 15 minutes to several hours. The mobile operator can
then investigate these statistics (output measures) for future network planning.
Mobile Telecom Network
MSC/SGSN
Base Station (BS)
i,j
BSj
BSi
j,i
Cell i Cell j
Figure 1. A simplified mobile telecom network
Four major measures provided by mobile telecom network are the expected call holding time,
the numbers of handovers in and out of the cells, the number of new call arrivals of the cells,
and the voice/data traffic (in Erlang) of the cells. For time define as the timeslot
5
+ 1 . Suppose that a mobile user in conversation moves from cell i to cell j at
time, then he/she contributes to one handover out of cell i, and one handover into cell j in
timeslot . The mobile telecom network measures, the number of handovers from
cell i to cell j in timeslot . Note that it is meaningless that a call hands over to the same cell,
and therefore = 0. A mobile user resides at cell i in timeslot may receive an in-
coming call or originate an outgoing call. Such new calls are counted by the measure .
In telephony engineering, an Erlang represents the continuous use of one voice path. Let
be the measure of the Erlang traffic of cell i in Then is the number of calls
arriving at cell i in times the expected call holding time. Practically, the mobile telecom
by summing up all conversation minutes of cell i in The ex-
network measures
pected call holding time E[ ] is the average of all collected call holding times over a long
observed period.
Consider a cell as the granularity of location coverage. The movement of a mobile user is de-
scribed as follows. The user stays at one cell for a period of time, and then moves to the next
cell. If we sum up the residence times of the cells he/she visited, then we know exactly which
cell the user visited after a specific elapsed time. Unfortunately, the above task cannot be
achieved because the standard outputs E[ ],, and cannot tell you how
a specific mobile user moves exactly. On the other hand, these outputs can be used to derive
the probability that where and when a user moves. When a user arrives at cell i in timeslot,
6
let be the average residence time before the user moves out of the cell. Let be
the transition probability that a user moves from cell i to cell j in timeslot . If both
and are known, then we can predict the probability of the user s location at time +
. That is, the user moves to cell j with probability . Note that = 0
because = 0. We use a prediction model to approximate and by using
E[ ],, and, and show how the model computes the probability
that starting from cell i, a user will move to cell j after a time period T.
The concept behind the prediction model is Little s Law [7], which says that the expected
number of users in a system is the arrival rate of the users times the expected response
time R that a user stays in the system; i.e.,
1
=
Equation (1) is used to compute . We first derive the average number of users at
cell i in timeslot . The intuition behind the derivation of is the following: if every-
one at cell i makes at most one call in and every call takes E[ ] minutes, then
/E[ ]. Let tR be the true cell residence time. Assume that
=
will discuss what happens if this assumption is violated later). When E[ ], the con-
versation minutes contributed by the user is, therefore the denominator of the equa-
tion is adjusted by min E[ ] and the equation is re-written as
=
min E[ ]
7
When E[ ], most calls occurring in are new calls (Section 3 will show in (13) that
handover calls rarely occur when E[ ] is small), and their call holding times are completely
before ends. Therefore, above equation is reasonably accurate when
measured in
E[ ] or E[ ] . When E[ ], many ongoing calls (either in or out of the
cell) are observed in the beginning or the end of, and such a call contributes much less
conversation minutes than min E[ ] . Therefore, we scale down the conversation
minutes by a linear factor expressed as
E[ ] E[ ]
= + [1 ] where 0 1
max E[ ] max E[ ]
Note that when E[ ] or E[ ], 1, and the conversation minutes are
min E[ ] . When E[ ],, the conversation minutes are {min E[ ] }
for incomplete calls that do not begin or end in . The final equation in our predic-
tion model is
2
=
{min E[ ] }
Based on the above discussion, accuracy of (2) is affected by two effects:
Effect 1 (Multiple-Calls-Per-Cell Effect on ). If a user generates multiple calls per cell,
then (2) overestimates .
Effect 2 (Observed Timeslot). If E[ ] and E[ ] E[ ], then the measured
conversation minutes of a call should be min E[ ] E[ ], and (2) will underestimate .
In Section 3, we show that this error can partially corrected by selecting a smaller value (e.g.,
= 0.4 .
In timeslot, the number of handovers flowing into cell i is
= 3
Four types of users are observed at cell i. In timeslot a type 1 user moves into the cell
when he/she is in phone conversation (and a handover occurs). A type 2 user is not in phone
conversation when he/she moves into the cell, and then has phone calls at this cell in . A
type 3 user moves in and/or out of cell i without any call activity. A type 4 user arrives at cell i
earlier than, and then has at least one phone call at this cell in .
The number of type 1 users moving into cell i in timeslot is . If Multi-
ple-Calls-Per-Cell effect does not exist, then is the number of type 2 and type 4 users
of cell i in timeslot .
According to (1), the average residence time of a user arriving in cell i in timeslot
9
is approximated as
=
+
From (2), we have
4
=
{min E[ ] }[ + ]
where in (4) is computed from (3).
Effect 3. (Multiple-Calls-Per-Cell Effect on ). If Effect 1 is significant, i.e., a user tends
to make multiple calls in a cell, then overestimates the number of type 2 users.
Note that due to Effect 1, (4) overestimates the cell residence time. Due to Effect 3, (4) un-
derestimates the cell residence time.
Effect 4. A type 4 user is not supposed to contribute to the arrival rate in (1). However, a
type 4 user does contribute to, and therefore (4) underestimates the cell resi-
dence time.
If > E[ ], a type 4 user becomes a type 2 user. Note that type 3 users contribute to neither
nor, and are reasonable to be ignored in computing (4). The predicted routing
probability is expressed as
= 5
However, behavior of Type 3 users does affect the routing probability, which results in the
10
following effect.
Effect 5. Movement of type 3 user is not included in, which affects the accuracy of
5.
By using (4) and (5), (T) can be expressed recursively as
6
(T)=
= + . For 0
where
1 for =
={
0 for
0. We note that the above recursive algorithm is given for the
The recursion stops when
description purpose. In our implementation of the prediction model, an iterative algorithm is
used to compute (6), where the details are given in Appendix B.
3. Numerical Examples
We have collected, E[ ], and statistics from a commercial mobile tele-
com service area in Hsinchu, Taiwan. The statistics were measured when = 1 hour, which
can be translated into = 15 minutes, and are listed below.
E[ ]: 3.3725 movements per 15 minutes (13.49 movements per hour)
E[ ] : 1.0325 Erlangs per 15 minutes (4.103 Erlangs per hour)
11
E[ ]: 1-5 minutes for voice calls and 10-20 minutes for data sessions
Due to the Personal Information Protection Act in Taiwan, we are not allowed to publish the
mappings between the collected statistics and the base stations in that area. Therefore, we as-
sume that the cell layout is of the Manhattan Street fashion with 7x7 cell structure, where
every cell has 4 neighbors (the boundary cells have 2 or 3 neighbors and the users visiting
these cells will bounce back).
We consider a baseline scenario where call arrivals are a Poisson process with the expected
inter-call arrival time E[ ] = 2 hours, and the call holding time is Exponentially distributed
with the mean E[ ] = 2 minutes. Based on these assumptions as well as the E[ ] and
the E[ ] statistics obtained from the commercial mobile telecom network, we select the
number of users as follows. The total call traffic generated in this system is E[ ]
49 =4.103 49 = 201.05 Erlangs per hour. The Erlang traffic generated from a user is
E[ ] 1 1
= 60. Therefore, the number of users in the system is 201.05 (60) = 12062.82. The
E[ ]
numbers of users considered in our experiments are 12000 and 24000, respectively. Also, as a
baseline scenario, the expected cell residence time E[ ] can be selected as follows. Since we
assume that every cell has 4 neighbors, the handover rate out of a cell is E[ ]
4=13.49 4=53.96 per hour. The expected number of users in a cell is 12000 49 = 244.9.
Since every user leaves the cell with rate 1/E[ ], and such cell crossing is a handover with
12
E[ ] 1
=,
probability we have
E[ ]+E[ ] 61
1 1
244.9 (E[ ]) (61) =53.96
That is, E[ ] = 0.08 hours or 4.8 minutes. Our experiments consider E[ ] = 5, 10, 15,
and 20 minutes.
Based on the above parameters, we simulate user movement and call activities for 24 hours.
The cell residence time has an arbitrary distribution (we specifically consider Gamma,
Normal, and Weibull distributions; the results for Gamma distribution are elaborated in this
paper, other distributions show similar results and are not presented). In our experiments, the
cell residence times E[ ] and the routing probabilities are given. Probabilities
are randomly generated to avoid homogeneous routing; i.e., our experiments arrange
that for cell i s neighboring cells . At the end of a simulation run, we
= 6 12 18 and 24 hours). During the experi-
obtain the real probabilities ) for
ments, the simulated mobile telecom network produces i,j, tc, and in every 15
minutes (i.e., t=15 minutes). Then we use the prediction model to compute
), and compare them with the real values, E[ ], and
and .
In our experiments, phone calls (connected to the MSC) are represented by E[ ] 5
minutes, and data sessions (connected to the SGSN) are represented by E[ ] 10 minutes.
The accuracy of the prediction model is investigated through the following measures (where
13
the number of users is 24000 and the expected inter call arrival time E[ ] =2 hours) :
Measure 1 is the average error between the predicted and the real routing probabilities. Let
be the error between the predicted routing probabilities and the real values
. That is,
=
Then we have
1
=
1 =
1
=
where is the number of cells in the mobile network, and is the set of neighboring cells
of cell .
As pointed out in Effect 5, the users who do not make calls when they cross the cells also af-
fect which are not captured by (5). If increases, decreases, or decreases,
more users are in conversation when they move into a cell (see (13) and (14)), and therefore
there are more handovers. Having more handovers means that the effect of the users who do
not make calls when crossing the cells becomes insignificant and the measured is
more accurate, as indicated in Figure 2.
14
Figure 2. Accuracy of routing probability prediction (E[ ]=2 hours)
Measure 2 is the error between the real cell residence time E[ ] and the predicted cell
residence time computed by (4), which is mainly caused by Effects 1, 2 (on Equation
(2)), 3 and 4. Figure 3 (a) plots 2 against where E[ ]=15 minutes. The figure indicates
that when = 1 (i.e., no adjustment), the largest 2 =50% occurs at E[ ]= t=15 minutes.
This result is exactly what we expected (see Effect 2). When t, 2 decreases as E[ ] decreases due to Effect 6. The above state-
ment is also true for E[ ]