Problem Statement

The main part of the thesis will demonstrate how established methods of data analysis and simulation can be applied to a previously unexplored example data set. The context of the data and of the thesis is customer/user behavior modeling. The analysis focuses on understanding, modeling, and predicting user purchases and cash flow.
The thesis will develop hypotheses within this context. They will address current questions in market and consumer research. The hypotheses may also be refined over the course of the thesis to specify the problems more precisely and to improve their understanding.

Relevance and Motivation

User Behavior Modeling (UBM) is an important part of data analysis. Global players in the market are always interested in how to place and sell their products among a variety of options. Expenditure analysis and shopping cart analysis are just two examples of the wide variety of available methods. This variety makes it hard to find the right (i.e., a high-quality) model. It can take several approaches with different ideas to arrive at satisfactory insights. The question always remains the same:
What drives the market? In our case, the problem can also be phrased as the question: What will the customer do or purchase next? The data we are using are e-mail receipts and order confirmations detailing people's consumption over a period of time. The data is not restricted to one supplier, but gives information from the point of view of the customers. Therefore, the data covers a more complete part of the market than data collected by a single supplier.
A general problem is the variety of ideas on how to explore the data. There are many reporting and modeling techniques that could be applied. To give the thesis a more specific focus, we will concentrate on two groups of methods.
The first is segmentation. Customers and items can be put into different groups according to their purchase characteristics. Customers can be defined by age, gender, or location. Items will be assigned to their general market segments. A segmented customer base opens the way for specialized advertisement and product recommendations.
The second approach is behavior prediction. Given an unconstrained time series of receipts, one can simulate a sequence of purchases that represents the customer's behavior in the market. Any supplier would benefit from being able to predict the customer's purchase decisions. Logistical planning for storage and shipping, in particular, benefits if the customer's next purchase is already known to lie within a limited number of choices. This reduces the overhead of organizing a large-scale distribution network. Distribution is not the only driving factor. Suppliers are also interested in consumer choices in order to place products most effectively through ads and marketing. Consumer decisions can thus be put into perspective for the whole market, giving a