Top 6 data to consider for feature engineering when modelling customer churn
Feature Engineering for customer churn.
A churn model can help you determine the most significant reasons customers decide to stop using your product or service. It is up to the data scientist building the model, however, to decide which factors to train on, test and ultimately include or exclude, a process called feature engineering.
In the machine learning workflow, feature engineering is the most creative part for data scientists. It is the stage of model development where they can employ their experience, business context and creativity to tailor the model and capture why churn happens in a specific business.
Usually, when modelling churn, you will want to test a mix of features related to customer engagement, customer demographics, product and pricing, and, if available, customer satisfaction.
In a broad sense, features are measurable characteristics drawn from customers’ historical data that a machine learning model takes into account to predict future outcomes (in our case, the probability of churn). During the feature engineering process, data scientists create a set of attributes (input features) that represent behaviour patterns related to the level of engagement with a service or product.
Typical features for modelling customer churn
When trying to predict churn, data scientists can consider features in the following broad categories.
In a B2B setting, some customers might churn because of the nature of their business, their number of employees or their revenue. If your business is B2C, you might instead consider customer demographics such as age, education and income.
The more information we have on our customers, the better the picture we can develop to model churn.
For B2C businesses, the top customer features to consider are:
· Race (ethnicity)
· Age (date of birth)
· Household income
· Home ownership (length of residence, home size, mortgage)
· Employment status
· Assets owned
· Marital status (head of household, spouse)
· Savings, mortgage, etc.
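Raw demographic fields usually need a little preparation before a model can use them. The sketch below is one possible way to do that in plain Python: it derives an age from a date of birth and one-hot encodes a categorical field. The field names, the `EMPLOYMENT` category list and the sample customer are all hypothetical, chosen only for illustration.

```python
from datetime import date


def age_in_years(date_of_birth, as_of):
    """Whole years elapsed between date_of_birth and the snapshot date as_of."""
    had_birthday = (as_of.month, as_of.day) >= (date_of_birth.month, date_of_birth.day)
    return as_of.year - date_of_birth.year - (0 if had_birthday else 1)


def one_hot(value, categories, prefix):
    """Encode a categorical value as 0/1 indicator columns, one per category."""
    return {f"{prefix}_{category}": int(value == category) for category in categories}


# Hypothetical category list and customer record (assumptions, not from a real dataset).
EMPLOYMENT = ["employed", "unemployed", "retired"]
customer = {"date_of_birth": date(1985, 6, 15), "employment_status": "retired"}

row = {"age": age_in_years(customer["date_of_birth"], date(2023, 3, 31))}
row.update(one_hot(customer["employment_status"], EMPLOYMENT, "employment"))
```

Computing age relative to a fixed snapshot date, rather than today's date, keeps the feature consistent with the period the churn label refers to.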
Characteristics that define the product type can be a reason customers churn. Churn rates can also differ between products, so it is important to capture product characteristics when modelling churn.
Product characteristics to consider include the delivery method (online vs offline, web app vs mobile app), product price, product size and any other characteristic that helps define your product.
RFM is a strategy for analyzing and estimating the value of a customer, based on three data points: Recency (how recently did the customer make a purchase?), Frequency (how often do they purchase?) and Monetary Value (how much do they spend?).
These factors can be used to reasonably predict how likely (or unlikely) it is that a customer will re-purchase from a company.
RFM features are usually strong indicators of customer behaviour across many kinds of customer analysis.
· Recency: how recent a customer’s last transaction was
· Frequency: how frequently a customer makes a purchase with your business
· Monetary value: how much money a customer spends with you, whether over their lifetime, per year or per subscription
· Length of relationship: how long a customer has been a customer
· Customer trend: increasing revenue, decreasing revenue or seesaw
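The RFM features listed above can be derived from a plain transaction log. The following is a minimal sketch in standard-library Python, assuming each transaction is a `(customer_id, purchase_date, amount)` tuple; the function name, the sample data and the use of "days since first purchase" as a stand-in for length of relationship are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def rfm_features(transactions, as_of):
    """Compute per-customer RFM features from (customer_id, purchase_date, amount) tuples."""
    grouped = defaultdict(list)
    for customer_id, purchase_date, amount in transactions:
        # Ignore anything after the snapshot date so future data cannot leak in.
        if purchase_date <= as_of:
            grouped[customer_id].append((purchase_date, amount))

    features = {}
    for customer_id, rows in grouped.items():
        dates = [purchase_date for purchase_date, _ in rows]
        features[customer_id] = {
            "recency_days": (as_of - max(dates)).days,   # days since last purchase
            "frequency": len(rows),                      # number of purchases
            "monetary": sum(amount for _, amount in rows),
            "tenure_days": (as_of - min(dates)).days,    # proxy for length of relationship
        }
    return features


# Hypothetical sample data.
transactions = [
    ("c1", date(2023, 1, 10), 50.0),
    ("c1", date(2023, 3, 5), 30.0),
    ("c2", date(2022, 11, 20), 120.0),
]
features = rfm_features(transactions, as_of=date(2023, 3, 31))
```

Here customer `c1` ends up with a recency of 26 days, a frequency of 2 and a monetary value of 80.0. In practice the same aggregation is often done with a dataframe library or in SQL; the logic stays the same.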
Engagement features give interesting insights into the customer behaviours that typically precede churn. These features capture how customers interact with your business and what their experience has been. For online products, you can capture web traffic data such as the number of times customers log in to your website or app every month, the time spent during each session and whether they have unsubscribed from marketing emails. Engagement metrics are an ideal starting point because they are relatively easy to measure with precision and they are generally good indicators of whether a customer intends to continue using a service. For online products, Adobe Analytics or Google Analytics are good sources of engagement metrics.
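As a rough illustration of the engagement metrics mentioned above, the sketch below counts logins, averages session length and flags email unsubscribes for a given time window. It assumes session data has already been exported as `(customer_id, session_date, minutes)` tuples; that format, the function name and the sample values are hypothetical and do not reflect the export format of any particular analytics tool.

```python
from collections import defaultdict
from datetime import date


def engagement_features(customer_ids, sessions, unsubscribed_ids, window_start, window_end):
    """Summarise sessions per customer within [window_start, window_end]."""
    minutes_by_customer = defaultdict(list)
    for customer_id, session_date, minutes in sessions:
        if window_start <= session_date <= window_end:
            minutes_by_customer[customer_id].append(minutes)

    features = {}
    # Iterate over all known customers so that inactive ones get explicit zeros.
    for customer_id in customer_ids:
        durations = minutes_by_customer.get(customer_id, [])
        features[customer_id] = {
            "logins": len(durations),
            "avg_session_minutes": sum(durations) / len(durations) if durations else 0.0,
            "unsubscribed": customer_id in unsubscribed_ids,
        }
    return features


# Hypothetical sample data for March 2023.
sessions = [
    ("c1", date(2023, 3, 2), 12.0),
    ("c1", date(2023, 3, 20), 8.0),
    ("c1", date(2023, 2, 14), 30.0),  # outside the window, ignored
    ("c2", date(2023, 2, 1), 5.0),    # outside the window, ignored
]
engagement = engagement_features(
    ["c1", "c2"], sessions, {"c2"}, date(2023, 3, 1), date(2023, 3, 31)
)
```

Passing the full customer list in explicitly matters: a customer with no sessions in the window would otherwise be missing from the output, and that inactivity is often exactly the signal a churn model needs.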
User experience and customer satisfaction features are usually the most reliable in explaining when and why customers churn, but they are also the hardest to capture. That’s because it’s impossible to ask each customer exactly how satisfied they are with a product or service and expect that they will answer honestly, if they even answer at all. Your data scientists will have to examine your data to identify information that can serve as a proxy for customer satisfaction. Product ratings and customer support calls are good examples. These metrics can clearly indicate an individual consumer’s sentiment, but they need to be aggregated across customers before they can tell you how happy your customers are overall.
Sometimes, regardless of customer or product characteristics, usage or satisfaction, customer churn is caused by external factors outside your control. These factors are the most difficult to capture because there can be many confounding factors, but understanding the business context is key to identifying them.
External factors that can be considered when modelling customer behaviour include national economic indicators such as GDP, central bank interest rates and inflation rates, as well as infrastructure measures such as broadband speed and network coverage.
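External indicators are typically recorded per period rather than per customer, so one simple approach is to attach them to each customer snapshot by matching on the period. The sketch below assumes monthly snapshots keyed by a `"YYYY-MM"` string; the function name, field names and all numeric values are made up for illustration and are not real economic figures.

```python
def attach_external(snapshots, indicators_by_month):
    """Add the external indicators for each snapshot's month to that snapshot."""
    enriched = []
    for snapshot in snapshots:
        # Months without indicator data contribute no extra fields.
        external = indicators_by_month.get(snapshot["month"], {})
        enriched.append({**snapshot, **external})
    return enriched


# Hypothetical customer snapshots and placeholder indicator values.
snapshots = [
    {"customer_id": "c1", "month": "2023-02", "logins": 14},
    {"customer_id": "c1", "month": "2023-03", "logins": 2},
]
indicators_by_month = {
    "2023-02": {"interest_rate": 4.0, "inflation_rate": 10.1},
    "2023-03": {"interest_rate": 4.25, "inflation_rate": 9.9},
}
enriched = attach_external(snapshots, indicators_by_month)
```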
Putting it all together
The key to successful feature engineering is for your data scientists to combine their domain expertise with input from stakeholders in your business to construct a comprehensive range of possible features to include in the model. In addition to the readily available metrics, data scientists should apply their creativity to engineer features such as revenue trend (monthly or yearly), usage trend (quarterly or monthly) and the rate of decline or increase in revenue or usage, in order to capture the customer behaviour that leads to churn.
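Trend and rate-of-change features like those described above can be computed from a customer's revenue or usage series. The sketch below shows one possible definition: the least-squares slope of the series against its period index, and the relative change from the first to the last period. Both definitions, the function names and the sample figures are assumptions; other choices (for example, comparing rolling averages) are equally valid.

```python
def trend_slope(values):
    """Least-squares slope of values against their period index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def rate_of_change(values):
    """Relative change from the first to the last period (0.0 if undefined)."""
    if len(values) < 2 or values[0] == 0:
        return 0.0
    return (values[-1] - values[0]) / values[0]


# Hypothetical monthly revenue for one customer.
monthly_revenue = [100.0, 90.0, 80.0, 70.0]
slope = trend_slope(monthly_revenue)      # negative slope: revenue is declining
change = rate_of_change(monthly_revenue)  # fraction lost between first and last month
```

A steadily negative slope captures the "decreasing revenue" pattern, while a slope near zero combined with large month-to-month swings would correspond to the seesaw pattern.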
The next step is to measure the importance of each feature in explaining why churn occurs in your business, and to remove features that are weak predictors or highly correlated with one another. This reduces dimensionality, i.e. the number of features in your model. We will cover the challenges of high dimensionality and how to reduce it in the next article.
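As a small preview of the correlation part of that step, here is a hedged sketch of one simple approach: compute the Pearson correlation between feature columns and greedily keep a feature only if it is not highly correlated with one already kept. The 0.9 threshold, the function names and the sample columns are assumptions, and the sketch does not cover measuring feature importance, which requires a fitted model.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_var = sum((x - x_mean) ** 2 for x in xs)
    y_var = sum((y - y_mean) ** 2 for y in ys)
    if x_var == 0 or y_var == 0:
        return 0.0  # a constant column has no defined correlation
    return covariance / sqrt(x_var * y_var)


def drop_correlated(columns, threshold=0.9):
    """Return feature names to keep, dropping any feature whose absolute
    correlation with an already kept feature reaches the threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[other])) < threshold for other in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns, one value per customer.
columns = {
    "logins_per_month": [20, 15, 3, 1, 12],
    "sessions_per_month": [40, 30, 6, 2, 24],  # exactly twice the logins
    "tenure_months": [50, 80, 60, 75, 55],
}
kept = drop_correlated(columns)
```

Because the procedure is greedy, the order of the columns decides which member of a correlated pair survives, so it is worth listing the features you would rather keep first.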