
Bank Marketing in SmartPredict

1- Introduction

Since its creation, SmartPredict has been designed to make our work as data scientists easier and to let us carry out every step of an Artificial Intelligence project on a single platform. In this blog post, we describe the steps involved in carrying out an AI project for bank marketing.

2- Presentation of the Bank Marketing project in SmartPredict

Bank marketing campaigns can be optimized and better targeted thanks to better customer knowledge.

To that end, we have a dataset describing a classic marketing campaign of a financial institution, originally uploaded to the UCI Machine Learning Repository. It can also be found at this link, and a description of its attributes is available in the following kernel.

The project consists of creating a machine learning model that predicts whether a customer will open an account called a "term deposit" following a marketing campaign. It also lets us identify the profiles of customers who are most likely to acquire the product.

As stated earlier, the whole project will be done on the SmartPredict platform by following these steps:

  • Exploring the dataset in the Dataset Processor App and in the notebook, both integrated into SmartPredict
  • Building a flowchart in the Build tab to train our model
  • Deploying our model in the Deploy tab and testing it in the Test tab

Before anything else, we create a project and upload the dataset; for more information, see the following documentation.

3- Exploring the dataset in SmartPredict's Dataset Processor App and notebook

No need to write code or go elsewhere: all the tools we need for our project are included in SmartPredict, such as the Dataset Processor App, where we can explore and analyze our data, a crucial step in any machine learning project.

The exploration will take place in two stages:

  • Visualizing the dataset in the Processing tab
  • Analyzing the dataset using the charts available in the Visualization tab

3.1- Visualizing the dataset in the Processing tab

In the Processing tab, we get a better view of the dataset, as it's presented in the form of a table. In addition, the type and quality of each column are indicated.

As shown in the figure below, a data quality of 100% indicates that there is no missing data and no wrongly typed values. We can also see that there are both categorical and numerical columns.

Data Visualization as a table in Processing Tab

3.2- Analyzing data using the charts available in the Visualization tab

This application in SmartPredict allows us to visualize the data using different types of charts without writing any code.

Note that, at the time of writing, the Visualization tab in the SmartPredict beta only supports 5,000 rows of data, but this is still enough for a good analysis, as the figures below show.

  • First, we observe how our categorical data and then our numerical data are distributed, using bar charts and histograms respectively. Let's look at some figures below.

Job distribution with BarChart

Education distribution with BarChart

Housing distribution with BarChart

Marital distribution with BarChart

Pdays distribution with Histogram Chart

Previous distribution with Histogram Chart

Campaign distribution with Histogram Chart

Balance distribution with Histogram Chart

Note:

Looking at these diagrams, we note that there are outliers in the 'pdays' and 'previous' columns. So let's take a closer look at the values of these columns over the whole dataset in the notebook integrated into SmartPredict.

Statistical values of the 'pdays' and 'previous' columns

As we can see, the median (50% value) of 'pdays' and 'previous' is "-1" and "0" respectively. Remember that the 'pdays' column holds the "number of days that passed by after the client was last contacted from a previous campaign", so "-1" probably means that the client wasn't contacted before, or stands for missing data. We therefore suggest dropping this column while cleaning the data.

Besides, the 'previous' column holds the "number of contacts performed before this campaign and for this client". Values of 'previous' above 34 also look really strange, so we suggest imputing them with the average 'previous' value while cleaning the data.
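To double-check this in the notebook, a quick look at the summary statistics is enough. A minimal sketch, assuming the dataset is already loaded in a DataFrame named `df` (the loading code is shown in the next step):

```python
# Summary statistics of the two suspicious columns, and the share of
# clients marked -1 in 'pdays' (i.e. never contacted before).
print(df[["pdays", "previous"]].describe())
print((df["pdays"] == -1).mean())
```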

  • The next analysis focuses on how the 'deposit' column varies depending on the values of the other categorical and numerical columns, using the notebook integrated into SmartPredict.

The following image shows the code that we run so that our data is accessible in SmartPredict's notebook.

Code to upload data into our notebook
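The exact snippet in the screenshot is specific to SmartPredict; as a stand-in, here is a minimal sketch assuming the dataset is reachable as a plain CSV file (the file name is hypothetical):

```python
import pandas as pd

# Load the Bank Marketing dataset into a DataFrame for exploration.
df = pd.read_csv("bank_marketing.csv")  # hypothetical file name
print(df.head())
```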

The following figures show how 'deposit' values vary depending on the other categorical columns' values.

Deposit depending on contact

Deposit depending on Education

Deposit depending on Marital

Deposit depending on Job
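Charts like these can also be reproduced in the notebook. A sketch using seaborn (our choice of library here, not necessarily the one behind the screenshots), shown for the 'job' column:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Count of 'deposit' outcomes per job category.
sns.countplot(data=df, x="job", hue="deposit")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```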

Note:

Interpreting the diagrams, we can tell that, according to our dataset, the customers least likely to subscribe to a term deposit are:

- those with 'blue-collar' and 'services' jobs

- those who are married

- those with the 'cellular' type of contact.

Furthermore, let's see how, statistically, 'deposit' values vary depending on the numerical columns' values.

Deposit by age statistically

Deposit by balance statistically

Deposit by campaign statistically
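The code-level counterpart of these three figures is a simple groupby, sketched below under the same assumption that `df` holds the dataset:

```python
# Mean of each numerical column for subscribers vs. non-subscribers.
print(df.groupby("deposit")[["age", "balance", "campaign"]].mean())
```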

Note:

Looking at the diagrams, we can conclude that people who subscribed to a term deposit tend to have:

- greater balance and age values

- a smaller number of contacts during this campaign

4- Building a flowchart for training our model in the Build tab

In SmartPredict, creating a machine learning project is no longer a question of coding but of dragging, configuring, and interconnecting modules in a "build space"; in other words, building a flowchart. In this section, we describe the modules used in this project, available in the "core modules" drop-down list in the right pane toolbar, as well as their main functionalities.

Thus, in this Bank Marketing project, we consider three main stages when building our flowchart:

  • Cleaning data
  • Selecting data
  • Training and evaluating the model

4.1- Cleaning and preparing data

This step can be done in the Dataset Processor App, but here we proceed differently, using the modules in the "Data Preprocessing" drop-down list, which also offers a complete set of tools for processing data, as shown in the figure below.

Modules in Data Preprocessing

According to the analysis above, we should delete the 'pdays' column. The DataFrame loader / converter module in the Data Retrieval drop-down list can perform this task; we simply specify the column to be deleted, as shown in the diagram below.

Deleting 'pdays' column with DataFrame loader / converter module
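In plain pandas terms, this module's action here amounts to a single drop, sketched below:

```python
# Remove the 'pdays' column flagged during exploration.
df = df.drop(columns=["pdays"])
```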

Then, as stated before, we should replace the values greater than 34 in the 'previous' column, so we use the Dataset Transformer module, configured as shown in the diagram below.

Replace previous value greater than 34 by mean value with Dataset Transformer module

Explanation of the Dataset Transformer configuration:

In the Expression field, we enter the replacement value, ['previous'].mean(), because we want to replace values with the average of the 'previous' column whenever the condition entered in the Conditions field, ['previous']>=34, is satisfied.

In the Cibles (targets) field, we indicate the column to which the expression applies, so we enter our 'previous' column like this: ['previous'].

In the Default Values field, we enter the value to use when the condition is not satisfied; here we enter ['previous'], since the values should remain unchanged in that case.
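Put together, the three fields describe a conditional replacement. A pandas rendering of the same logic, as a sketch rather than what the module runs internally:

```python
# Where 'previous' >= 34 (the Conditions field), write the column mean
# (the Expression field); otherwise keep the original value
# (the Default Values field).
mean_previous = df["previous"].mean()
df.loc[df["previous"] >= 34, "previous"] = mean_previous
```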

Finally, we proceed to encoding the categorical columns with the One Hot Encoder module, in which we just enter the names of the categorical columns. We use two of these modules: one for encoding the 'deposit' column, which will be the labeled data, and the other for encoding the remaining columns.

Encoding 'deposit' column with One Hot Encoder module

Encoding other columns with One Hot Encoder module
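For readers who prefer code, here is a scikit-learn sketch of what the two modules do; the platform's own implementation may differ, and the list of categorical columns below is assumed from the Bank Marketing dataset:

```python
from sklearn.preprocessing import OneHotEncoder

# Assumed categorical columns of the Bank Marketing dataset.
categorical_cols = ["job", "marital", "education", "default",
                    "housing", "loan", "contact", "month", "poutcome"]

# One encoder for the 'deposit' label...
label_encoder = OneHotEncoder()
y_encoded = label_encoder.fit_transform(df[["deposit"]])

# ...and one for the remaining categorical columns.
feature_encoder = OneHotEncoder(handle_unknown="ignore")
X_encoded = feature_encoder.fit_transform(df[categorical_cols])
```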

4.2- Selecting data

When our data is ready, it's time to split it with the Generic Data Splitter module, so that the 'deposit' column constitutes the target label and the rest makes up our training and test data.

Separate target value from the data with Generic Data Splitter module

Explanation of the Generic Data Splitter configuration:

Axis = 1 indicates that we split along the columns, and "index where to split your data = -1" indicates that 'deposit' is the last column.
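In pandas terms, this configuration boils down to the sketch below, where `prepared_df` is a hypothetical DataFrame holding the cleaned, encoded data with 'deposit' as its last column:

```python
# Split along the columns (axis=1) at index -1: the last column
# becomes the label, everything before it the features.
features = prepared_df.iloc[:, :-1]
labels = prepared_df.iloc[:, -1]
```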

Finally, the Labeled data splitter is the module we use to split our data into training and test sets. We choose a ratio of 0.1, which represents the number of test samples relative to the number of training samples.

Splitting test and train data with Labeled data splitter
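A code analogue of this module, using scikit-learn (the fixed random seed is our own addition, for reproducibility):

```python
from sklearn.model_selection import train_test_split

# Hold out 10% of the data for testing, matching the 0.1 ratio above.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.1, random_state=42
)
```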

4.3- Training and evaluating the model

This step is again a matter of choosing, dragging, and configuring modules.

First, we choose our machine learning algorithm among those available in the Machine Learning Algorithms drop-down list. For this project, we choose the XGBoost Classifier and set it up as shown in the diagram below:

Configuration of our machine learning model with XGBoost Classifier module

Then, we configure the Trainer ML module for training the model.

Configuration of Trainer ML module for training the model

Finally, our flowchart closes with an Evaluator module, whose configuration consists of choosing the metric to use; here we use the accuracy metric.
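Outside the platform, these three modules would look roughly like the sketch below, assuming the training and test splits from the previous step; the hyperparameters are illustrative, not necessarily those shown in the configuration screenshot:

```python
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

# Configure, train, and evaluate the classifier on the split data
# (the labels are assumed to be numerically encoded already).
model = XGBClassifier(n_estimators=100, max_depth=3)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```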

After running our flowchart, a tooltip indicating the accuracy appears next to this module. We obtain an accuracy of 1.0, which shows how well our machine learning project performs in SmartPredict. The figure below shows our flowchart.

Flowchart built in the Build tab for training our model

Note:

The Data/Object logger module can always be connected to any module's output to see the logs, which help us fix any errors.

The Item Saver module takes any type of object as input and generates an item that is stored in the Trained models drop-down list. These saved items will be used in another flowchart that we will build to deploy our model. So in this project, we save two models, as shown in the following figures:

- the trained machine learning model, which we will name "XGBoost_Bank_Marketing_Classifier"

- the one-hot encoding of the data, which we will name "data_Bank_Marketing_on_hot_encoding"

Saving trained model by Item Saver named XGBoost_Bank_Marketing_Classifier

Saving the one_hot_encoding model of the data by ItemSaver named data_Bank_Marketing_on_hot_encoding
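In plain Python, the Item Saver's role could be played by something like joblib, persisting both objects so the deployment flowchart can reload them; a sketch, with file names mirroring the items saved above:

```python
import joblib

# Persist the trained classifier and the fitted encoder for reuse.
joblib.dump(model, "XGBoost_Bank_Marketing_Classifier.joblib")
joblib.dump(feature_encoder, "data_Bank_Marketing_on_hot_encoding.joblib")
```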

5- Deploying our model in the Deploy tab

Deploying our machine learning model will generate a REST API to which we can send our data and from which we receive the predictions returned by our model. To deploy it, we need to assemble a new set of modules, this time in the Deploy tab. Once again, they can be dragged and dropped, and are accessible from the right pane toolbar in the same location.

Building the deployment flowchart follows the same logic as the previous step, based on how the data we present to the web service will be processed by the API.

In the Deploy tab, two predefined modules are already present:

  • Web service in, and
  • Web service out.

Every flowchart in the Deploy tab thus starts with the Web service in module, which receives our data in JSON format and returns it in the form of a dictionary.

Following the same idea as during the construction of the flowchart in the Build tab, we must again delete the 'pdays' column. So we first use the same DataFrame loader / converter module, with two differences: it now receives data in dictionary format, and we are prompted this time to enter the names of the columns to keep, so that the 'pdays' column is excluded. The configuration is shown in the figure below:

Configuration of DataFrame loader/converter in deploy tab's flowchart

For this flowchart, we don't need the processing carried out by the Dataset Transformer module, so we go straight to encoding.

We then need the One Hot Encoder module, but this time, in addition to the output of the DataFrame loader / converter module, it also receives the "data_Bank_Marketing_on_hot_encoding" model saved in the Trained models drop-down list. There is no need to configure it, because that model is already provided as an input, as shown in the figure below.

Two inputs of the One Hot Encoder modules in deploy tab

Default configuration of the module One Hot Encoder in deploy Tab

Next, we use the Predictor ML models module, available in the Training and Prediction drop-down list.
As you can guess, this module receives the "XGBoost_Bank_Marketing_Classifier" model that we saved with the other Item Saver and that we retrieve from the Trained models drop-down list. This module requires no configuration.

Finally, we obtain the following flowchart.

Flowchart for the deployment in deploy tab

We then click on the rocket "deploy" icon in the left sidebar, and a personal access token is generated. We can use it to access the web service, to which we send data without the 'deposit' column and from which we receive the prediction (for example with Postman).
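For reference, calling the web service from code would look roughly like the sketch below; the endpoint URL and access token are placeholders generated by the Deploy tab, and the field values are only illustrative of a dataset row:

```python
import requests

url = "https://<smartpredict-endpoint>/predict"  # placeholder URL
headers = {"Authorization": "Bearer <personal_access_token>"}
row = {
    "age": 59, "job": "admin.", "marital": "married",
    "education": "secondary", "default": "no", "balance": 2343,
    "housing": "yes", "loan": "no", "contact": "unknown",
    "day": 5, "month": "may", "duration": 1042,
    "campaign": 1, "pdays": -1, "previous": 0, "poutcome": "unknown",
}
response = requests.post(url, json=row, headers=headers)
print(response.json())  # the predicted 'deposit' value
```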

To test our model, SmartPredict provides a Test tab in which we enter the data (except the 'deposit' value) between braces.
A "run" button in the right pane becomes available when the input is correct. Once launched, a prediction (whether the customer will open a term deposit account) is presented as output. As shown in the figure below, we enter the first row of the Bank Marketing dataset, and the prediction is satisfying.

6- Conclusion

As this Bank Marketing project shows, a machine learning project is now a breeze thanks to SmartPredict, and not only this type of project but also many others related to artificial intelligence, such as computer vision, image labeling, and more. It is clear that this platform can be used to streamline the AI modeling process, increasing productivity, time savings, and efficiency.