Data Training

Location:

Hawthorne, NY

Posted:

November 23, 2012

Contact this candidate

Resume:

A Pattern Discovery Approach to Retail Fraud Detection

Prasad Gabbur, Sharath Pankanti, Quanfu Fan, and Hoang Trinh

Exploratory Computer Vision Group, IBM Research

** ******* *****, *********, ** 10532

*******@*****.*******.***, {sharat, qfan, trinhh}@us.ibm.com

ABSTRACT 1. INTRODUCTION

A major source of revenue shrink in retail stores is the

A major source of revenue shrink in retail stores is the inten-

intentional or unintentional failure of proper checking out

tional or unintentional failure of proper checking out of items

of items by the cashier. It is estimated to cause billions

by the cashier. More recently, a few automated surveillance

of dollars in revenue losses each year [8]. In the case of

systems have been developed to monitor cashier lanes and

an intentional failure, it is referred to as sweethearting be-

detect non-compliant activities such as fake item checkouts

tween a cashier and a customer as both intend to derive

or scans done with the intention of deriving monetary bene-

monetary bene t from it. More recently a few automated

t. These systems use data from surveillance video cameras

surveillance systems have been developed to monitor cashier

and transaction logs (TLog) recorded at the Point-of-Sale

lanes and detect non-compliant activities. These systems use

(POS). In this paper, we present a pattern discovery based

data from surveillance video cameras and transaction logs

approach to detect fraudulent events at the POS. Our ap-

(TLog) recorded at the Point-of-Sale (POS). Video-based

proach is based on mining time-ordered text streams, rep-

[8, 1, 2] systems visually monitor the activities of a cashier

resenting retail transactions, formed from a combination of

around the Point-of-Sale to detect item checkouts and ver-

visually detected checkout related activities called primitives

ify them using TLog data. Being able to detect as many

and barcodes from TLog data. Patterns representing single

non-compliant events as possible while keeping the number

item checkouts, i.e. anchored around a single barcode, are

of false alarms low is key to the successful deployment of

discovered from these text streams using an e cient pat-

these systems. It is a challenging problem to optimize the

tern discovery technique called Teiresias. The discovered

two con icting objectives due to variations and noise within

patterns are used to build models for true and fake item

the input data streams.

scans by retaining or discarding the anchoring barcodes in

Instead of using the two streams (video and TLog) of data

those patterns respectively. A pattern matching and classi-

separately, we posit that much can be learned about the na-

cation scheme is designed to robustly detect non-compliant

ture of an item scan performed by a cashier by combining

cashier activities in the presence of noise in either the TLog

them into a single stream. This is because most item check-

or the video data. Di erent weighting schemes for quanti-

outs are normal (no fraud) and a barcode is registered in

fying the relative importance of the discovered patterns are

the TLog. By analyzing visual information around the reg-

explored: Frequency, Support Vector Machine (SVM) and

istered barcode events, it is possible to model variations in

Frequency+SVM. Using a large scale dataset recorded from

the cashier s activities for checking out an item. This is

retail stores, our approach discovers semantically meaning-

helpful in detecting non-compliant cashier activities more

ful cashier scan patterns. Our experiments also suggest that

robustly in the presence of noise in either the TLog or the

di erent weighting schemes result in varied false and true

video data.

positive performances on the task of fake scan detection.

Video-based systems usually make use of complex ana-

Categories and Subject Descriptors: I.5.4 [Pattern Recog- lytics to detect and monitor cashier activities [7]. Many of

nition]: Applications computer vision, text processing these approaches make adhoc assumptions such as only a

constrained set of activities being performed by a cashier

General Terms: Security

during retail transactions. Real-world data presents a num-

Keywords: Retail, Security, Sweethearting, Pattern dis- ber of challenges to such approaches due to those assump-

covery, Teiresias tions being violated very often. Computer vision systems are

based on extracting visual features from input video streams.

Recently, there has been some work in transforming videos

into a sequence of discrete features [11] and applying text-

based data mining algorithms on the resulting streams. This

Permission to make digital or hard copies of all or part of this work for has opened up new possibilities for looking at video data in a

personal or classroom use is granted without fee provided that copies are di erent light in order to infer useful knowledge. Text-based

not made or distributed for prof t or commercial advantage and that copies

algorithms are simpler and faster than many sophisticated

bear this notice and the full citation on the f rst page. To copy otherwise, to

video analysis techniques. But their potential in addressing

republish, to post on servers or to redistribute to lists, requires prior specif c

some of the vision challenges have not been explored. In

permission and/or a fee.

KDD 11 August 21 24, 2011, San Diego, California, USA. this work, we propose a text-based approach to analyzing

as another primitive, the alphabet set from which the text

strings are formed has a cardinality of four: {P, S, D, B }.

The output of the pattern discovery algorithm is a set of

patterns within the input strings that are repeated at least

a minimum number of times across those strings. Only those

patterns that have a single barcode B primitive within them

and with particular con gurations of the other three prim-

itives around it are selected and the others are discarded.

Practically most item checkouts in retail transactions are

genuine and fake checkouts are very rare. With this assump-

tion, the selected patterns are representative of single item

checkouts and model the variations in the process of check-

1: A typical checkout involves the pickup (P ), scan (S )

ing out an item represented as a combination of discrete

and drop (D) of the item of interest. Each such action is

primitives. Conceptually, a fake scan or a fraudulent item

detected as a primitive by a motion-based primitive detector

checkout is similar in essence to a genuine checkout except

(See Section 3.1).

that a barcode is not registered in the TLog. So a model

representing fake scans is derived from the model for true

scans by discarding the barcode primitives from the pattern

videos represented as time-ordered discrete features called

set. For the sake of convenience, we term the models cor-

primitives in the context of a retail surveillance application.

responding to true and fake scans as positive and negative

A retail transaction involves the checkout of a set of items

models respectively.

the customer intends to buy. Ideally, each checkout event

The positive and negative models are used to detect the

can be thought of as comprising three sub-events or primi-

presence of any fake scans within a new transaction with the

tives: Pickup (P ), Scan (S ), and Drop (D) in that temporal

help of a pattern matching and classi cation scheme. Each

order, as illustrated in Fig. 1. A visual compliance approach

primitive within the transaction stream is assigned a vote

[7] is designed to detect these sub-events in a video and

by each of the patterns in the positive and negative groups

group them into P SD triplets corresponding to visual scans

when they match a certain neighborhood of that primitive.

of items. A visual scan with an associated barcode con rms

Following a Bayesian approach, these votes are transformed

a valid checkout, otherwise it is a potential candidate for a

into posterior likelihoods of the primitive being a part of

fraud or operational error. We refer to a valid checkout as

a true or a fake scan. A classi cation based on maximum

a true or visual scan and a fraud or operational error as a

posterior likelihood (above a certain minimum con dence

fake scan.

level) is done to assign the primitive to the most likely class.

Usually the output of the primitive and event detection

Finally a grouping of the classi ed primitives based on their

modules is not a single P SD triplet for every checkout.

class labels and temporal adjacency leads to a segmentation

Some of the sub-events may be repeated more than once

of the transaction stream into true and fake scans.

or may not even be detected due to various reasons. One

common situation arises when a cashier has di culty scan-

ning the item in a single attempt and ends up doing mul-

3. CHECKOUT PATTERN DISCOVERY

tiple scans resulting in repeated S primitives for a single

item. Therefore the primitive sequence output by the de- More details of the primitive detection and pattern dis-

tection system is usually noisy with repetitions and/or no covery steps are given in the following subsections.

occurrences of one or more of the P, S and D primitives.

3.1 Primitive Detection

This poses di culties to any approach looking for the pres-

ence of a P SD triplet corresponding to an item checkout.

To detect primitives: Pickup (P ), Scan (S ), and Drop (D)

A more robust method accounting for missing and repeated

from the retail video stream, we implement an e cient vi-

primitives is needed.

sion technique proposed in [14]. The approach is based on

In what follows, a paradigm for visual compliance based

the observation that each primitive P, S, D follows a spe-

on pattern discovery and matching is proposed. A brief de-

ci c motion pattern - each primitive is an in/out process in

scription of the proposed approach is given in Section 2. The

which the cashier s hand enters and exits a region of interest

details of the various steps involved including pattern discov-

(ROI). To detect hand motion, a hand color model is rst

ery and matching, primitive classi cation and true/fake scan

adaptively learned by continuously collecting hand pixel ex-

detection are given in Sections 3 and 4. Section 5 describes

amples. Based on this model, each pixel in the motion map

a few weighting schemes to quantify the relative importance

computed by frame di erencing is classi ed as hand pixel or

of the discovered patterns for classi cation. Experimental

not, to form a hand motion map for each video frame. This

details and results are given in Section 6 followed by a brief

map explicitly captures the hand motion robustly in the

discussion of the results in Section 7 and conclusions in Sec-

presence of motion noise from belt movement, background

tion 8.

changes, and customer interactions. A hierarchical nite

state machine (FSM) is then applied to deterministically

2. OUR APPROACH check whether or not the hand motion follows the prede-

Various patterns of item checkouts are discovered using ned in/out pattern corresponding to each primitive. After

an e cient pattern discovery algorithm called Teiresias [12]. the primitives P, S, D are detected, these are combined with

The input to the pattern discovery algorithm are text strings the barcode primitives (B ) from the TLog data to form a

formed from temporally ordering the output of the primi- temporally ordered text string of {P, S, D, B }, which is used

tive detectors and the TLog data. By treating the barcode as input string to the pattern discovery algorithm.

3.2 TEIRESIAS any wildcard characters so that exact pattern matching can

be done using those patterns. For the experiments in this

Teiresias [12] is a widely used algorithm for discovering re-

work, the L = W values range from 3 to 10. We choose

peated patterns across nite alphabet streams. It was orig-

a conservative value for K (2) so that even patterns that

inally developed for discovering rigid patterns in biological

are repeated only once are captured. From the resulting

sequences, eg., motifs in amino acid sequences. However, it

set, only those patterns that match the regular expression

has been applied to problems in other domains such as spam

P [P SD] B [P SD] D are retained. These are all the sin-

ltering [13] and intrusion detection [18] in UNIX processes.

gle barcode (B ) patterns that start with a pickup (P ) and

Teiresias discovers all unique maximal patterns

end in a drop (D) primitive with any possible number and

having a certain minimum support K within a given set of

arrangement of P, S and D primitives between them.

alphabetic sequences. A pattern is de ned to be

Some of the patterns discovered by Teiresias are represen-

one in which every sub-pattern of length W or more has at

tative of di erent possible ways an item checkout occurs at

most (W L) wildcard characters, the others are alphabets.

the POS. Table 1 lists the rst eight patterns based on their

The term maximal implies that the discovered patterns can-

repeatability across transactions used for pattern discovery

not be made more speci c without reducing their support

along with the number of transactions in which they oc-

sets within the input streams. The computational complex-

cur. Upon visual inspection of the corresponding videos, the

ity of the algorithm is seen to scale quasi-linearly with the

patterns P BSD and P SBD represent idealized item check-

size of the output patterns, hence making it a very e cient

outs involving one each of the P, S, D and B primitives in

pattern discovery tool.

them. P DBSD, P BSP D, and P DBSP D are representa-

3.3 Patterns in retail transactions tive of overlaps between checkouts of two successive items.

P DBSD indicates that the drop of the previous item oc-

The Teiresias pattern discovery algorithm is used to dis-

curs in parallel or after the pickup of the next item, whose

cover patterns that are representative of genuine item scans

barcode is included in the pattern. Similarly, P BSP D rep-

within retail transactions. In our system, a transaction is

resents picking up the next item while or before dropping o

a stream of P, S, D and B primitives, where the ordering

the current item, whose barcode forms a part of the pattern.

is determined by their occurrence in time. Since a typical

These are instances of parallel scanning. Parallel scanning

item checkout involves picking, scanning and then dropping

of the current item with both the previous and the next one

it, some representative patterns that correspond to this pro-

results in the pattern P DBSP D, with the B primitive cor-

cess would be P SBD, P BSD or P SDB, allowing for time

responding to the current item. The other patterns in the

delays in primitive detection or barcode registration. In re-

table result from a missing S primitive in one or more of

ality, there are a number of other patterns that are possible

the cases described above. Similarly, we found that multi-

due to noisy primitive streams rendered by the primitive

ple scanning was represented by the patterns: P DSDSBD

detector and TLog data. Some typical examples include:

and P DSDBSD, although they did not occur in as many

transactions (19 and 53 respectively) as the patterns above.

1. Multiple scan: An item is scanned more than once be-

fore a successful barcode registration. Representative

3.4 Reducing pattern redundancy

patterns in this case might be P SDSBD, P SDSDSBD,

and many more. The output of the Teiresias algorithm is a set of unique

maximal patterns for a particular setting of the

2. Late barcode: A successful scan but a delay in the

parameter values. We capture various levels of noise in the

arrival of barcode might result in patterns such as

representation of a true scan by discovering patterns with

P SDB, P SDP SB, etc. Such patterns might also re-

di erent settings for L = W . This leads to some redun-

sult from items which have to be manually registered

dancy in the discovered pattern set in that a few patterns

at the POS without scanning. Examples include or-

are repeatedly discovered for two di erent settings of L = W

ganic produce such as vegetables and fruits.

and there are overlaps between some patterns. One example

of an overlapping pair of patterns is P SB and SBD, which

3. Parallel scan: Following a successful scan, an item is

form sub-patterns of the canonical true scan pattern P SBD.

dropped after the next item in the pickup queue is

This might be the result of slightly di erent support sets for

picked up. This can be represented using P SP BD,

the two patterns caused by a missing P or a D primitive in

P SBP D and so on.

one of the scans within the input transactions. Repetitions

Variations in the patterns representing item scans are cap- are easily taken care of by retaining only the unique pat-

tured by mining primitive streams, resulting from transac- terns. One possible way to account for overlaps is to nd

tions, for patterns that repeat across transactions. Most of all pairs of patterns with overlaps among them and stitch

the item checkouts in transactions are true scans with a bar- them together into longer and longer patterns in an itera-

code being registered for each scan. Therefore patterns that tive process. But this has a high likelihood of reducing the

are repeated across transactions and have a single barcode support set of the resulting longer patterns relative to the

(B ) primitive within them can be thought of as correspond- component patterns if the component patterns did not come

ing to true scans. We seek such patterns by rst discovering from a single parent pattern. This is not desirable because

unique maximal patterns with support at least each of the overlapping patterns may uniquely represent a

K from a set of transactions, employing the Teiresias algo- true scan with some missing primitives.

rithm. This process is repeated for a few di erent L = W In order to address redundancy due to overlapping pat-

values to capture representative patterns of di erent lengths, terns, we do a pattern-transaction co-occurrence analysis,

i.e., with di erent levels of noise embedded in them. Set- which is similar to Latent Semantic Analysis [6] used in

ting L = W ensures that the resulting patterns do not have the area of information retrieval. A pattern-transaction co-

occurrence matrix C is constructed, where each row in the pattern matching and classi cation procedure using the two

matrix corresponds to a unique discovered pattern and each pattern sets. Each pattern in the positive set contributes a

column to an input transaction used for pattern discovery. positive vote to every primitive of any corresponding sub-

The (i, j )the entry of the matrix is a count of the number of string it matches within the transaction stream. Similarly

times pattern i appears in the j th transaction. Usually C is negative votes are accumulated for each primitive based on

a sparse matrix with a few positive entries scattered among how many patterns in the negative set exactly match a sub-

zeros. Each row of this sparse matrix can be thought of as string containing the primitive. These votes are used for

a representation of the corresponding pattern in a space of classifying the primitives as described below.

dimensionality equal to the number of transactions used for

4.1 Classif cation

pattern discovery. In general, these dimensions are not un-

Each primitive p in the input transaction stream is as-

correlated. So a measure of similarity among patterns such

signed a true (T ) or a fake (F ) scan label depending on the

as the correlation coe cient cannot be reliably computed in

number of votes it has accumulated for the two categories.

this space. Singular Value Decomposition is performed on

In order to do this classi cation, a Bayesian framework is

the co-occurrence matrix C resulting in the following factor-

adopted. Let ST and SF denote the sets of all true and

ization of C

fake scan patterns respectively. Then the likelihood of the

C = U V T (1)

primitive p being a part of a true scan can be written as

where U and V are orthogonal matrices of size m m and

P (T p) P (T, p) (2)

n n, assuming C is of size m n, i.e., there are a total

of m patterns discovered from a set of n transactions. is Similarly, the likelihood of p belonging to a fake scan is given

a rectangular matrix with only non-zero elements along its by

diagonal, which correspond to singular values. The columns

P (F p) P (F, p) (3)

of U and V are the eigenvectors of CC T and C T C respec-

tively. Choosing only the rst k largest singular values and We de ne the joint likelihoods P (T, p) and P (F, p) to be the

the corresponding eigenvectors leads to the best rank k ap- probabilities of the context of p belonging to the set ST and

proximation of C in the least squares sense. Further, each SF respectively.

row of matrix U is a representation of the corresponding

P (T, p) P (p context ST ) (4)

pattern (in C) in an orthogonalized space.

P (F, p) P (p context SF ) (5)

Since the number of transactions n is usually much smaller

than the number of patterns m, there are only n non-zero

where p context is used to denote the context or neighbor-

singular values in the SVD of C . Therefore only the rst n

hood of p. Each of the above likelihoods are proportional

columns of U comprise a full representation of the patterns

to the number of patterns in ST and SF respectively that

in an orthogonal space while allowing a complete reconstruc-

match a sub-string containing p. If we denote the positive

tion of C . Correlation coe cient computed in this space is a

and negative votes accumulated by p with V otesT (p) and

more reliable measure of similarity between patterns based

V otesF (p) respectively, then

on their co-occurrence within the transactions. We use this

P (p context ST ) V otesT (p) (6)

measure for clustering patterns together that have a high

correlation among them. A greedy agglomerative clustering P (p context SF ) V otesF (p) (7)

scheme is adopted, where the patterns are rst ordered in the

Appropriate normalization of the cumulative votes of the

decreasing order of their total number of occurrences. Then

primitive p for the two categories yields the posterior prob-

all the patterns that have a high correlation (> 0.5) with

abilities P (T p) and P (F p). The primitive is assigned to

the most repeated pattern are put into one cluster. This is

the class with the maximum posterior likelihood given it is

repeated with the next available most repeated pattern and

above a certain threshold value.

so on until all patterns are exhausted. Each cluster is then

4.2 True and fake scan detection

represented by the most occurring pattern among its mem-

bers. Only the cluster representatives are used for pattern

Classi cation of a transaction s primitive stream results

matching.

in a class label (T or F ) for those primitives whose poste-

4. PATTERN MATCHING AND CLASSIFI- rior likelihoods for the respective class are above the cho-

sen threshold. Other primitives remain unclassi ed. This

CATION usually results in continuous stretches of T s and F s with

The patterns derived from the above procedure de ne a unclassi ed primitives in between them. One continuous

model for true scans, which we refer to as the positive model stretch of T or F labels need not necessarily correspond to

for the sake of convenience. It can be hypothesized that fake a single true or fake scan respectively. A segmentation like

scans are similar to true scans but without the registration of procedure groups subsets of adjacent primitives that are as-

a barcode. With this assumption, a corresponding negative signed T labels into true scans. This is achieved by choosing

model is constructed for fake scans by eliminating the B a B primitive within each such subset as the anchor for the

primitive from the true scan patterns. Two tables are built, corresponding true scan. Other primitives such as P, S and

one corresponding to true scans (positive) and the other cor- D that form a part of the true scan are searched for locally,

responding to fake scans (negative). The two tables contain i.e., the nearest ones to the anchoring B primitive within

the same set of patterns except for the barcode primitive a small window. The size of the window is limited to the

being deleted from each pattern in the negative set. Given a set of non-B primitives around the anchor, which have been

primitive stream from a new transaction, each primitive in classi ed as belonging to the class T . Only one P, S and D

the stream is assigned a true or fake scan label based on a primitive is assigned to a true scan.

Among the available P s and S s within a window, prefer- a weighting scheme where the weights were derived from a

ence is given to the ones that occur closest to B but earlier classi er trained to distinguish true from fake scans using

in the stream. On the other hand, a D that is closest to B ground truth data.

but occurs later in the stream is preferably assigned to the

5.2 Support Vector Machine (SVM) weights

true scan. In the absence of a P, S or D primitive within

the neighborhood window of an anchoring B, a new such

Support Vector Machine (SVM) [15, 3] is used in a variety

primitive is created and assigned to the true scan. Similar

of classi cation problems because of a number of its favor-

grouping is performed to detect fake scans using S primitives

able properties including superior generalization ability. A

labeled as belonging to class F as anchors and searching for

two-class SVM estimates a hyperplane separating the train-

the nearest P and D primitives. In order to keep the number

ing points in a feature space such that the margin between

of false positives manageable, we assume that at least one

the hyperplane and the two classes is maximized. Assuming

each of P, S and D primitives should be present to provide

a linear kernel SVM, the weights are indicative of how useful

enough evidence for a fake scan and that two successive fake

the corresponding features are for the purpose of classi ca-

scans be at least 3 seconds apart.

tion. This has been used for feature selection [5, 9]. We

exploit this property of SVMs to learn the relative impor-

5. WEIGHTING VOTES tance of the discovered patterns for classifying a primitive

into true or fake scan based on the matches to its context.

Within a transaction stream, a primitive is assigned a vote

In order to achieve this, we sample a set of training points

by each of the patterns matching a sub-string containing the

belonging to true and fake scan classes and map them onto

primitive. A voting scheme where each pattern contributes

a feature space with dimensionality equal to the number of

the same vote to the matching primitives might not be opti-

patterns. The training points are a set of primitive strings

mal. Our goal is to be able to reliably distinguish true scans

centered around the B primitive for a true scan and the S

from fake scans based on the discovered patterns. Not all the

primitive for a fake scan.

patterns contain the same amount of information to distin-

The B primitive locations of true scans are randomly sam-

guish between the two classes. There may still be some re-

pled from the set of transaction streams used for training

dundancies left among patterns ltered out by co-occurrence

leaving out those transactions which have a fake scan in

analysis. Therefore it is not guaranteed that each pattern

them. Note that we hand labeled the locations of the fake

contributes independent information relative to others in the

scans in our transaction streams by a visual inspection of

set. Further some of the patterns may even be noisy, eg.,

the corresponding videos. There are a total of 13 fake scans

resulting from missing or multiple primitive detections for

in our data consisting of 1660 transactions. The number of

single pickup, scan or drop events during transactions. A

randomly sampled true scan points from training data is in

weighting scheme where patterns that carry independent in-

the order of a few hundreds. There is a large imbalance in

formation and are the most important for classi cation are

the number of training points for the two classes. Train-

weighted more relative to others is desirable. We experi-

ing on such data leads to a SVM that is biased towards the

ment with two weighting schemes that assign di erent rela-

class with the larger size [10]. A number of approaches have

tive votes to the patterns.

been proposed for SVM training in the presence of a large

5.1 Frequency weights imbalance in data for the two classes. These methods either

This is a simple weighting scheme where each pattern is undersample the majority class [17], oversample the minor-

assigned a weight inversely proportional to the number of ity class [4] or adjust the SVM margin to reduce the bias

times it occurs across all the transactions used for pattern towards majority class [16, 10].

discovery. This is mostly an empirical weighting scheme Our use of SVM is to learn a weight for each of the fea-

driven by experimentation. We tried di erent heuristic schemes ture dimensions, corresponding to a pattern, that quanti es

that were based on the frequency of patterns and the size the importance of that pattern for classi cation. It is desir-

of their clusters in the co-occurrence analysis (Section 3.4). able to have a non-negative weight for each pattern so that

Among them, this simple weighting scheme led to the best they can be easily interpreted. In order to avoid the data

results in terms of the number of falsely detected fake scans imbalance e ects and to ensure positive weights, we learn a

relative to the total number of detected true scans. zero-bias SVM with a particular mapping of the true and

A possible reason might be that smaller patterns tend to fake scans into a feature space. The mapping is such that

occur more frequently than longer ones among our discov- the true scan points always lie in the positive hyper-quadrant

ered patterns. Using a uniform voting scheme, the smaller and the fake scan points in the negative hyper-quadrant. A

patterns have a higher in uence in the classi cation decision hyper-quadrant refers to the extension of a quadrant in a

of a majority of primitives because smaller patterns match 2D space to a higher dimensional space. So the positive

more sub-strings than longer ones. Further each pattern in hyper-quadrant is the set of all points with only positive

the negative (F ) class is one primitive shorter than its coun- coordinates and similarly for the negative hyper-quadrant.

terpart in the positive class. All these factors contribute The sampled true scan points are matched with the set of

to an increased number of short noisy primitive segments patterns in the positive set to obtain a feature vector for

that are assigned to the negative (F ) class. With the in- each of those points. Only the dimensions corresponding to

verse frequency based voting scheme, longer (less frequent) the patterns with exact matches are set to suitable non-zero

patterns in uence the classi cation decision of many prim- positive values and the others are zero. Similarly the feature

itives leading to longer, less noisy segments of continuous vectors for fake scan points are obtained by pattern match-

T or F labels. However, this scheme causes many genuine ing with the negative set of patterns. However the values for

fake scans to be missed, based on our experiments (See Sec- the dimensions with exact matches are set to suitable nega-

tion 6). In order to circumvent this problem, we designed tive values in this case. Fig. 2 illustrates the mapping and

tern matches regardless of the pattern. Such a mapping

causes true scan points that match to only a few of the pat-

terns occurring less frequently to be moved more towards the

origin in the feature space relative to others. Since the SVM

hyperplane passes through the origin it seems that points

matching to less frequent patterns are most likely to be the

support vectors, which in uence the nal weights. This has

a tendency to favor lesser number of false positives in our

experiments. Table 1 shows the SVM weights obtained by

averaging the learned normalized mean SVM weights across

all the divisions of the data from 6 lanes into 3 training and

3 held-out lanes (See Section 6.1). Interestingly, the two ide-

alized checkout patterns (P BSD and P SBD) have the most

signi cant average SVM weights among the above, implying

2: A zero-bias SVM for classifying true scan points from

their usefulness for classi cation.

fake scans. The mapping of points is such that true scan

5.3 Combining frequency and SVM weights

samples always lie in the positive quadrant and the fake

scan samples in the negative quadrant. There are very few

The SVM-based weights emphasize those patterns that

fake scan samples relative to the number of true scan sam-

are useful for distinguishing true and fake scan points within

ples. The SVM weights are all positive and quantify the

the training data. Interestingly, these weights also lead to

relative importance of the feature dimensions (P1 and P2 )

an increased number of genuine fake scan detections (true

for classi cation.

positives) within the held-out data relative to using the fre-

quency weights (Section 6). However the false positive per-

Mean Normalized Weights

formance of the classi er using SVM weights is worse com-

0.7

0.6

pared to that using the frequency weights. This suggests

Mean weight

0.5

an interesting possibility of combining the two weights such

0.4

0.3

that the combined weights help achieve a better false pos-

0.2

0.1

itive performance relative to the SVM weights and better

0-50-100-***-*** 250 300

Pattern index

true positive performance relative to the frequency weights.

Normalized Weights

The combined weights are obtained by summing the SVM

0.8

and frequency weights, thereby reinforcing only those fre-

Weight

0.6

quently occurring short patterns that are useful for classi-

0.4

cation. For convenience, these weights are referred to as

0.2

(Frequency+SVM) weights.

0-50-100-***-*** 250 300

Pattern index

3: (Below) SVM weights after normalization (maximum

6. EXPERIMENTS

value 1) for each random split of the available data. The

learned weights for the di erent splits are superimposed.

6.1 Data

(Above) Averaged weights across all the random splits. The

Video data recorded from 6 lanes of a retail store during

weights imply relative importance of patterns for the task

one business day (approx. 16 hrs) of transactions was used

of classi cation. Clearly, some patterns are more important

for all the experiments reported in this work. During this pe-

than others.

riod, 18 di erent cashiers checked out a total of about 32,969

registered items within 1660 transactions. The video data

the SVM geometry in a 2D space corresponding to patterns was transformed into discretized and time ordered primi-

P1 and P2 . tives and merged with the barcode events from the TLog

The available data within a training set (3 lanes) (See Sec- data. Primitive streams corresponding to transactions on

tion 6) is split into two equal parts randomly (approx. 250 3 of the lanes were chosen for discovering patterns as de-

true scans and 2-3 fake scans) with one half used for SVM scribed in Section 3.3. The same set of patterns were used

training and the other half held out for testing the learned for all the experiments reported here. The entire data was

classi er. SVMs are trained with 100 di erent such ran- divided into two parts (3 lanes each). These two sets are

dom splits and the weights are averaged to obtain the nal referred to as training and held-out lanes respectively. All

such divisions ( 6 = 20) were considered and the results

weights. Considering one training division of all the data, 3

Fig. 3 shows the normalized weights after SVM training us- were averaged across the divisions. Note that the frequency

ing each of the 100 random splits of the data and also the weights remain the same for all the data divisions resulting

averaged weights across those splits. in the same true and fake scan detections unlike the other

The mapping into the feature space can be done in many two weighting schemes. Only the held-out statistics change

di erent ways. Our experiments favored an asymmetric across those divisions. Similarly for the unweighted Teiresias

mapping scheme between the true and fake scan training and all patterns (Section 6.2). The data from all the lanes

points. In this scheme, the positive value assigned to a was manually evaluated by six human subjects for the pres-

match between a true scan and a pattern is proportional to ence of sweethearting activities or operational errors leading

the frequency of the pattern. On the other hand the same to the failure of barcode registration for the corresponding

negative value is assigned to all the fake scan point and pat- items. A total of 13 such events were found.

Pattern #Repeats across transactions Average SVM weight

1584 0.005

P BD

1302 0.593

P BSD

782 0.009

P DBD

774 0.010

P BP D

734 0.127

P BSP D

629 0.127

P DBSD

590 0.597

P SBD

585 0.014

P DBSP D

1: The rst eight most repeated patterns across all transactions used for pattern discovery by Teiresias. Each of those pattern s

number of repeats (transactions in which they occur) and the learned average SVM weight are reported in the second and

third columns respectively. P BSD and P SBD represent idealized item checkouts and also have the most signi cant average

SVM weights. Other patterns represent parallel scans between successive items and scans with a missing S primitive. See

text for more details.

6.2 All patterns from 0.5 to 0.9 in steps of 0.1 and plot the true and fake

scan detection performances. The number of detected true

Our model for true scans is a set of discovered primitive

(or visual) and fake scans for data from the held out lanes

patterns with di erent combinations of P, S and D primi-

of one particular division of all the data is plotted in Fig. 4.

tives anchored around a single B primitive. The discovered

In a

Contact this candidate