Bank Marketing in SmartPredict
Since its creation, SmartPredict has been a platform designed to make our work as data scientists easier and to let every step of an Artificial Intelligence project be carried out in a single place. In this post, we describe the steps involved in carrying out an AI project for bank marketing.
2- Presentation of the Bank Marketing project in SmartPredict
Bank marketing campaigns can be optimized and better targeted thanks to better customer knowledge.
To that end, we have a dataset describing a classic marketing campaign of a financial institution, originally uploaded to the UCI Machine Learning Repository; it can also be found at this link, and a description of its attributes is available in the following kernel.
The project consists of creating a machine learning model that can predict whether a customer will open an account called a "term deposit" following a marketing campaign. Furthermore, we can identify the profiles of customers who are more likely to acquire the product.
As stated earlier, the whole project will be done on the SmartPredict platform by following these steps:
- Exploring the dataset in the Dataset Processor app and in the notebook, both integrated into SmartPredict
- Building a flowchart in the Build tab for training our model
- Deploying our model in the Deploy tab and testing it in the Test tab
Before anything else, though, we create a project and upload the dataset; for more information, see the following documentation.
3- Exploring the dataset in SmartPredict's Dataset Processor app and notebook
There is no need to write code or go anywhere else: all the tools we need for our project are included in SmartPredict, such as the Dataset Processor app, where we can carry out explorations to better analyze our data, a crucial step in any machine learning project.
The exploration will take place in two stages:
- Visualizing the dataset in the Processing tab
- Analyzing the dataset using the charts available in the Visualization tab
3.1- Visualizing the dataset in the Processing tab
In the Processing tab, we get a better view of the dataset, as it is presented in the form of a table. In addition, the type and quality of each column are indicated.
As shown in the figure below, a data quality of 100% indicates that there is no missing data and no wrong types. We can also see that there are both categorical and numerical columns.
3.2- Analyzing data using the charts available in the Visualization tab
This tab allows us to visualize the data using different types of charts without writing any code.
Note that, at the time of writing, the Visualization tab in the SmartPredict beta only supports 5,000 rows of data, but we can still get a good analysis of the data from the figures below.
- First, we observe how our categorical and then our numerical data are distributed, with a bar chart and a histogram respectively. Let's look at some figures below.
Viewing these diagrams, we note that there are outliers in the 'pdays' and 'previous' columns. So let's look more closely at the values of these columns over the whole dataset in the notebook integrated into SmartPredict.
As we can see, 50% of the values in 'pdays' and 'previous' are "-1" and "0" respectively. Remember that the 'pdays' column holds the "number of days that passed by after the client was last contacted from a previous campaign", so "-1" probably means that the client wasn't contacted before, or stands for missing data. We therefore suggest dropping this column while cleaning the data.
Besides, the 'previous' column holds the "number of contacts performed before this campaign and for this client". Values of 'previous' above 34 are also really strange, so we suggest imputing them with the average 'previous' value while cleaning the data.
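Checking those percentiles in the notebook can be done in one line of pandas. The sketch below uses a tiny toy sample, not the real dataset, just to show the idea:

```python
import pandas as pd

# Toy sample mimicking the 'pdays' and 'previous' columns (values illustrative)
df = pd.DataFrame({
    "pdays":    [-1, -1, -1, 5, 120, -1],
    "previous": [ 0,  0,  0, 2,  40,  0],
})

# The median (50th percentile) exposes the dominant placeholder values
print(df["pdays"].median())     # -1.0
print(df["previous"].median())  # 0.0
```

On the real dataset, `df.describe()` gives the same percentiles for all numerical columns at once.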
- The next analysis focuses on how the 'deposit' column varies depending on the other categorical and numerical columns' values, using the notebook integrated into SmartPredict.
The following image shows the code we run to make our data accessible in SmartPredict's notebook.
The following figures show how 'deposit' values vary depending on the other categorical columns' values.
By interpreting the diagrams, we can tell that, according to our dataset, the customers least likely to subscribe to a term deposit are:
- those with 'blue-collar' and 'services' jobs
- those who are married
- those with the 'cellular' type of contact.
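The kind of breakdown behind these charts can be reproduced in the notebook with a pandas cross-tabulation. A minimal sketch with toy rows (only the column names come from the dataset):

```python
import pandas as pd

# Toy rows using the dataset's column names; values are illustrative only
df = pd.DataFrame({
    "job":     ["blue-collar", "management", "services", "management"],
    "deposit": ["no", "yes", "no", "yes"],
})

# Cross-tabulate the subscription outcome per job category
table = pd.crosstab(df["job"], df["deposit"])
print(table)
```

Plotting `table.plot(kind="bar")` would give a chart similar to the ones above.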
Furthermore, let's see how, statistically, 'deposit' values vary depending on the numerical columns' values.
Looking at the diagrams, we can conclude that people who subscribed to a term deposit tend to have:
- greater balance and age values
- fewer contacts during this campaign
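These statistics can be checked with a simple group-by in the notebook. Again a toy sketch with made-up values, only the column names match the dataset:

```python
import pandas as pd

# Toy values standing in for the numerical columns
df = pd.DataFrame({
    "deposit":  ["yes", "yes", "no", "no"],
    "balance":  [2000, 1500, 300, 450],
    "age":      [45, 52, 30, 33],
    "campaign": [1, 2, 4, 5],
})

# Mean of each numerical column per 'deposit' outcome
stats = df.groupby("deposit")[["balance", "age", "campaign"]].mean()
print(stats)
```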
4- Building a flowchart for training our model in the Build tab
In SmartPredict, creating a machine learning project is no longer a matter of coding but of dragging, configuring, and interconnecting modules in a "build space", in other words, building a flowchart. In this section, we will mainly describe the modules used in this project, available in the "Core modules" drop-down list in the right pane toolbar, as well as their main functionalities.
Thus, in this Bank Marketing project, we consider three main stages for building our flowchart:
- Cleaning and preparing data
- Selecting data
- Training and evaluating the model
4.1- Cleaning and preparing data
This step can be done in the Dataset Processor app, but here we do it differently, using the modules in the "Data Preprocessing" drop-down list, which also offers complete tools to process data, as shown in the figure below.
According to the analysis above, we should delete the 'pdays' column. The DataFrame loader / converter module in the Data Retrieval drop-down list can perform this task by specifying the column to be deleted, as shown in the diagram below.
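In pandas terms, what this module does amounts to dropping a column. A minimal sketch (toy frame, illustrative values):

```python
import pandas as pd

# Minimal frame with a 'pdays' column to drop; other columns are illustrative
df = pd.DataFrame({
    "age":     [30, 45],
    "pdays":   [-1, 120],
    "deposit": ["no", "yes"],
})

# Equivalent of the DataFrame loader / converter step
df = df.drop(columns=["pdays"])
print(list(df.columns))  # ['age', 'deposit']
```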
Then, as stated before, we should replace the values greater than or equal to 34 in the 'previous' column, so we use the Dataset Transformer module, which we configure as shown in the diagram below.
Explanation of the Dataset Transformer configuration:
In the Expression field, we enter the target value, ['previous'].mean(), because we want to replace values with the average of the 'previous' column whenever the condition entered in the Conditions field, ['previous']>=34, is satisfied.
In the Cibles (targets) field, we indicate the column to which we want to apply the expression, so we enter our 'previous' column like this: ['previous'].
In the Default Values field, we enter the value to use when the condition is not satisfied, so we enter ['previous'], as the values should not change in this case.
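This expression/condition/default pattern maps directly onto a conditional replacement in pandas. A sketch with toy values, where 40 and 58 play the role of the outliers:

```python
import pandas as pd

# Toy 'previous' column; only the column name comes from the dataset
df = pd.DataFrame({"previous": [0, 2, 40, 58]})
mean_prev = df["previous"].mean()  # 25.0 on this toy sample

# Mirror of the Dataset Transformer: where the condition ['previous'] >= 34
# holds, write the expression ['previous'].mean(); otherwise keep the value
df["previous"] = df["previous"].mask(df["previous"] >= 34, mean_prev)
print(df["previous"].tolist())  # [0.0, 2.0, 25.0, 25.0]
```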
Finally, we proceed to the encoding of the categorical columns with the One Hot Encoder module, in which we simply enter the names of the categorical columns. We use two of these modules: one for encoding the 'deposit' data, which will be the labeled data, and the other for encoding the remaining columns.
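One-hot encoding turns each category into its own 0/1 column. A pandas sketch of the same transformation (toy rows, real column names):

```python
import pandas as pd

# Two toy categorical columns; 'deposit' is the label column
df = pd.DataFrame({
    "marital": ["married", "single", "married"],
    "deposit": ["yes", "no", "yes"],
})

# One-hot encode each column, as the two One Hot Encoder modules do
encoded = pd.get_dummies(df, columns=["marital", "deposit"])
print(list(encoded.columns))
```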
4.2- Selecting data
When our data is ready, it's time to split it with the Generic Data Splitter module, so that the 'deposit' column constitutes the target label and the rest composes our train and test data.
Explanation of the Generic Data Splitter configuration:
Axis = 1 indicates a column split, and "index where to split your data = -1" indicates that 'deposit' is the last column.
Finally, the Labeled data splitter is the module we use to split our data into train and test sets. We choose a ratio of 0.1, which represents the proportion of test data relative to the training data.
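The two splits above can be sketched in a few lines of pandas. This is a simplified tail split on toy data (the module may shuffle the rows; column names and values here are illustrative):

```python
import pandas as pd

# Ten toy rows; the last column plays the role of the encoded 'deposit' label
df = pd.DataFrame({
    "age":         range(10),
    "balance":     range(10, 20),
    "deposit_yes": [0, 1] * 5,
})

# Generic Data Splitter: axis=1, index=-1 -> split off the last column as label
X, y = df.iloc[:, :-1], df.iloc[:, -1]

# Labeled data splitter: hold out 10% of the rows (ratio 0.1) for testing
n_test = max(1, int(len(df) * 0.1))
X_train, X_test = X.iloc[:-n_test], X.iloc[-n_test:]
y_train, y_test = y.iloc[:-n_test], y.iloc[-n_test:]
print(len(X_train), len(X_test))  # 9 1
```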
4.3- Training and evaluating model
During this step, again, it is a matter of choosing, dragging, and configuring modules.
First, we choose our machine learning algorithm among those available in the Machine Learning Algorithms drop-down list. For this project we choose the XGBoost Classifier and set it up as shown in the diagram below:
Then, we configure the Trainer ML module to train the model.
Finally, our flowchart closes with an Evaluator module, whose configuration consists of choosing the metric to use; here we use the accuracy metric.
After running our flowchart, a tooltip indicating the accuracy appears next to this module. We obtain an accuracy of 1.0, which expresses how well our machine learning model performs. The figure below shows our flowchart.
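For reference, the accuracy metric the Evaluator reports is just the fraction of correct predictions. A minimal sketch with made-up labels (not our model's real predictions):

```python
# Toy true labels and model predictions (illustrative only)
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

# Accuracy = correct predictions / total predictions
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.8
```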
The Data/Object logger module can always be connected to any module output to see the logs, which help us fix errors.
The Item Saver module takes any type of object as input and generates a module that will be stored in the Trained models drop-down list. These modules will be used in another flowchart that we will build to deploy our model. So in this project, we save two objects, as shown in the following figures:
- the trained machine learning model, which we name "XGBosst_Bank_Marketing_Classifier"
- the one-hot encoding of the data, which we name "data_Bank_Marketing_on_hot_encoding"
5- Deploying our model in the Deploy tab
Deploying our machine learning model generates a REST API to which we can send our data and from which we receive the predictions returned by our model. To deploy our model, we need to assemble a new set of modules, this time in the Deploy tab. Once again, they can be dragged and dropped and are accessible from the right pane toolbar in the same location.
The flowchart for deployment follows the same logic as in the previous step, based on how the data that we present to the web service will be processed by the API.
In the Deploy tab, two predefined modules are already present:
- Web service in and
- Web service out.
So every flowchart in the Deploy tab starts with the Web service in module, which receives our data in JSON format and returns it in the form of a dictionary.
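That JSON-to-dictionary step is the same as parsing a payload with Python's standard library. A sketch with a hypothetical one-row payload (field names borrowed from the dataset):

```python
import json

# Hypothetical payload: one data row sent to the web service as JSON
payload = '{"age": 59, "job": "admin.", "balance": 2343, "campaign": 1}'

# The Web service in module hands the parsed dictionary to the next module
record = json.loads(payload)
print(record["age"], record["job"])
```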
Thus, pursuing the same idea as during the construction of the flowchart in the Build tab, and according to our analysis of the data, we must delete the 'pdays' column. We first use the same DataFrame loader / converter module, but this time it receives data in dictionary format, and we are prompted to enter the names of the columns to keep, so the 'pdays' column is excluded. The configuration is shown in the figure below:
For this flowchart, we don't need the processing carried out by the Dataset Transformer module, so we go straight to encoding.
We therefore use the One Hot Encoder module, but this time, in addition to the output of the previous module, it also receives the "data_Bank_Marketing_on_hot_encoding" model saved in the Trained models drop-down list. There is no need to configure it, because that model is already provided as input, as shown in the figure below.
Next, we use the Predictor ML models module, available in the Training and Prediction drop-down list.
As you can guess, this module receives the XGBosst_Bank_Marketing_Classifier model that we saved with the other Item Saver and retrieve from the Trained models drop-down list. This module requires no configuration.
Finally, we obtain the following flowchart.
We then click the "rocket" deploy icon in the left sidebar, and a personal access token is generated that we can use to access the web service, to which we send data without the 'deposit' column and which predicts it (for example with Postman).
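Besides Postman, the call can be made from Python. The sketch below only builds the request without sending it; the URL, token, and bearer-auth header are placeholders, since the real values come from the Deploy tab:

```python
import json
import urllib.request

# Hypothetical endpoint and token; the real values come from the Deploy tab
url = "https://example.com/predict"           # placeholder URL
token = "YOUR_PERSONAL_ACCESS_TOKEN"          # placeholder token
row = {"age": 59, "job": "admin.", "marital": "married", "balance": 2343}

req = urllib.request.Request(
    url,
    data=json.dumps(row).encode(),
    headers={
        "Authorization": f"Bearer {token}",   # auth scheme assumed
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send the POST and return the prediction
print(req.get_method())  # POST
```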
To test our model, SmartPredict provides a Test tab, in which we enter the data (except the 'deposit' value) between braces.
A "Run" button in the right pane becomes available when the input is correct. Once launched, the prediction (whether a customer will open a term deposit account) is presented as output. As shown in the figure below, we enter the first row of the Bank Marketing dataset, and the prediction is satisfying.
As we can see through this Bank Marketing project, a machine learning project is now a breeze thanks to SmartPredict, and not only this type of project but also many others related to artificial intelligence, such as computer vision and image labeling. This platform is well worth using to streamline the AI modeling process and to gain productivity, time, and efficiency.