
Covid-19 global forecasting with SmartPredict

1- Introduction

COVID-19 is a contagious disease that emerged in Wuhan in December 2019 and was later declared a pandemic by the WHO on 3/11/2020, due to its high rate of spread throughout the world. As of May 26, 2020, 346 700 deaths and 5 518 905 infections had been recorded worldwide. The fight against COVID-19 is not over as long as these numbers keep increasing. In this regard, it is worthwhile to forecast new cases worldwide and in each country with Artificial Intelligence. Hoping you are as interested as I am, let's train and deploy a machine learning model for this problem in the right place: SmartPredict.

Ready! Let's get started!

2- Presentation of the project

The project consists of training and deploying a machine learning model that can predict the number of COVID-19 infected cases and deaths, both in a country of our choice and worldwide.

For this, we use two datasets provided by Johns Hopkins University: confirmed_data.csv and deaths_data.csv, which respectively contain the daily number of infected cases and the daily number of deaths for countries around the world from 1/22/20 to 5/19/20.

It is therefore a time-series forecasting problem in which we predict the daily number of infected cases first, then deaths, from 5/20/20 onward.

All steps will be done on the SmartPredict platform, the right place to carry out any type of Artificial Intelligence project WITHOUT CODING:

- First, we extract the useful data for training our model via SmartPredict's notebook or with the DatasetProcessor module.

- Second, we build a flowchart which trains our model in the Build Tab.

- Third, we deploy our model in the Deploy Tab.

- Finally, we test our model in the Test Tab.

3- Coronavirus infected cases forecasting in SmartPredict

Step 1: Preparing data in SmartPredict's notebook

First of all, we should create an account and an empty project in SmartPredict.

Then, we should upload our datasets to SmartPredict or download them from the web via SmartPredict's notebook.

Next, we are going to extract the useful data that will train our model (with reference to this work).
This step can be carried out without coding, by using modules from the Core Module or by customizing a module. But this time we use SmartPredict's notebook and enter the following code.

# Getting confirmed_data.csv from the web
import pandas as pd

confirmed_df = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")

# Extracting the daily counts (the date columns start at the fifth column)
cols = confirmed_df.keys()
confirmed = confirmed_df.loc[:, cols[4]:cols[-1]]

# Gathering daily "confirmed cases" worldwide and in China
dates = confirmed.keys()
world_cases = []
china_cases = []

for i in dates:
    world_cases.append(confirmed[i].sum())
    china_cases.append(confirmed_df[confirmed_df['Country/Region'] == 'China'][i].sum())

# Recording our datasets as dataframes
confirmed_world_cases = pd.DataFrame({'dates': dates, 'World_cases': world_cases})
china_cases = pd.DataFrame({'dates': dates, 'China_cases': china_cases})

# Saving the data in SmartPredict
api.dataset_save(dataset=confirmed_world_cases, name='confirmed_world_cases1', dataset_type='csv')
api.dataset_save(dataset=china_cases, name='china_cases1', dataset_type='csv')


Now we can see our training datasets, "confirmed_world_cases1" and "china_cases1", in SmartPredict's Dataset section; they represent the infected cases worldwide and in China respectively, and we can visualize them in the DatasetProcessor.

Step 2: Building and training a flowchart in the Build Tab

Training a machine learning model in SmartPredict consists of building and running a flowchart by dragging, dropping, and interconnecting modules available in the Core Module. The principle of the usual methods is maintained, but instead of coding, modules are selected and parameterized to obtain the desired operation.
Every flowchart we build in the Build Tab begins with the training dataset. Hence, we drag and drop our dataset module "confirmed_world_cases1" to train the machine learning model that will predict infected cases in the world.

Our second module is the "Feature Selector", which selects the columns that constitute the features on which our model will be trained.

If you would like to perform a specific operation but no appropriate module exists in the Core Module, it is possible to create your own custom module. This is the case for our third module, which transforms our time series according to the principle described in this article.
The following figure explains this transformation, and the gif below shows how to create a Custom Module.

As we can see, we can create both functions and classes in Python code and save them as a new module to reuse in our projects. You just have to insert your function into the existing template code; the comments serve as a guide.

So we are going to create a module named timeseries_supervised with the following code to transform our time-series dataset.

from smart_predict.modules.base.custom import CustomModule
import numpy as np 


class MyCustomModule(CustomModule):
    """MyCustomModule.

    A custom module must inherit the class CustomModule."""

    #: This dictionary holds the property of your custom module
    p = {
        # Input specification, the keys of the dict are the inputs' name
        'in': {
            'input_1': {
                # Type of input data.
                'type': 'any',

                # Shown name.
                'name': 'Input 1',

                # Short description.
                'description': 'Default Input'
            }
        },

        # Output specification, the keys of the dict are the outputs' name
        'out': {
            'output_1': {
                # Type of output data.
                'type': 'array',

                # Shown name.
                'name': 'Output 1',

                # Short description.
                'description': 'Default Output'
            },
            'output_2': {
                # Type of output data.
                'type': 'array',

                # Shown name.
                'name': 'Output 2',

                # Short description.
                'description': 'Default Output'
            },
            
           'output_3': {
                # Type of output data.
                'type': 'array',

                # Shown name.
                'name': 'Output 3',

                # Short description.
                'description': 'Default Output'
            }, 
            'output_4': {
                # Type of output data.
                'type': 'array',

                # Shown name.
                'name': 'Output 4',

                # Short description.
                'description': 'Default Output'
            }
            
        },

        # Params specification, the keys of the dict are the name of the param.
        'params': {
        # Read the documentation for more details on the kind of parameters and options available in SmartPredict.
            'operation_1': {
                'label': 'Window_size',
                'type': 'int',
                'default': 5,
                'input-type': "text",
                'description': 'Enter the number of features'
            },
             'operation_2': {
                'label': 'output_size',
                'type': 'int',
                'default': 1,
                'input-type': "text",
                'description': 'Enter the number of labels'
            },
            'operation_3': {
                'label': 'index to slice',
                'type': 'int',
                'default': 5,
                'input-type': "text",
                'description': 'Enter the index at which to split the data into train and test sets'
            }
        },

        # Other description of the module.
        'doc': {
            'author': 'John Doe',
            'framework': 'tensorflow, sk-learn',
            'description': 'lorem ipsum dolor sit amet.'
        },

        # Version.
        'version': '0.0'
    }
    
   
    def prepare_data(self, data, win_size, output_size, index):
        """Slide a window over the series and split into train/test at `index`."""
        idx = 0
        X = []
        Y = []

        while idx < len(data):
            inp = data[idx: idx + win_size]
            out = data[idx + win_size: idx + win_size + output_size]

            # Keep only complete (window, label) pairs
            if len(inp) == win_size and len(out) == output_size:
                X.append(inp)
                Y.append(out)
            idx = idx + 1

        X = np.array(X)
        Y = np.array(Y)

        # Split into training and test sets at the given index
        X1 = X[:index]
        X2 = X[index:]

        Y1 = Y[:index]
        Y2 = Y[index:]
        return X1, X2, Y1, Y2

        
    def run(self):
        """This method is called to run your module,

        Get input, read params, process data, set output."""

        # How to retrieve your input data.
        input_1_data = self.in_data['input_1']

        # How to retrieve your params value.
        wsize = self.param['operation_1']
        osize = self.param['operation_2']
        index = self.param['operation_3']

        x1 , x2 , y1 , y2 = self.prepare_data(input_1_data, wsize, osize, index)
        

        # This is how to set output data.
        
        self.out_data['output_1'] = x1
        self.out_data['output_2'] = x2
        self.out_data['output_3'] = y1
        self.out_data['output_4'] = y2

Once created, we can drag and drop our module and set it up as: window_size = 10, output_size = 1, index to slice = 80.

So we have 4 outputs from this module: X train, X test, Y train, Y test as in a supervised learning problem.
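To make this time-series-to-supervised transformation concrete, here is a minimal standalone sketch in plain NumPy (independent of SmartPredict's module system), using a toy series and a smaller window than the one configured above:

```python
import numpy as np

def series_to_supervised(data, win_size=3, output_size=1):
    """Slide a window over the series: each run of `win_size` past
    values becomes one feature row, and the following `output_size`
    values become its label(s)."""
    X, Y = [], []
    for idx in range(len(data) - win_size - output_size + 1):
        X.append(data[idx:idx + win_size])
        Y.append(data[idx + win_size:idx + win_size + output_size])
    return np.array(X), np.array(Y)

# Toy daily case counts
series = [100, 120, 150, 190, 240, 300]
X, Y = series_to_supervised(series, win_size=3, output_size=1)
print(X)  # [[100 120 150] [120 150 190] [150 190 240]]
print(Y)  # [[190] [240] [300]]
```

Each row of X holds the previous three days, and the matching row of Y holds the day to predict, which is exactly the shape a supervised regressor expects.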

As you can guess, the next module receives X train, Y train, and our model (an XGBoost Regressor). Then another module evaluates the model with X test and Y test. We also need to save our trained model.
So the flowchart we are going to launch looks like this.

We use the Mean Absolute Error (MAE) metric to evaluate the model, and we obtain an MAE of 22 828.38, which is very low compared to the dataset's standard deviation of 1 826 330.06. So we can say that our trained model performs very well.
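For readers who want to reproduce the metric locally, MAE is simply the mean of the absolute prediction errors; the numbers below are illustrative, not the article's actual test predictions:

```python
import numpy as np

# Illustrative values only, not the article's actual test set
y_true = np.array([5_200_000.0, 5_280_000.0, 5_360_000.0])
y_pred = np.array([5_190_000.0, 5_300_000.0, 5_390_000.0])

# MAE = mean of |y_true - y_pred|
mae = np.mean(np.abs(y_true - y_pred))
print(mae)  # 20000.0
```

Comparing the MAE to the standard deviation of the target, as done above, is a quick sanity check that the model's typical error is small relative to how much the series varies.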

The following figure indicates these modules' settings.

Note: As shown in the figure above, our trained model is saved as XGBoost_world_cases by an ItemSaver module. It then becomes recoverable as a module in the "Trained Model" drop-down list, which we will use when deploying the model. So let's go to the next step.

Step 3: Deploying our trained model in the Deploy Tab

Deploying our Machine Learning model will generate a REST API to which we can send our data and from which we receive the predictions returned by our model.

To deploy our model, we need to assemble a new set of modules. Once again, they can be dragged and dropped, and they are accessible from the right pane toolbar in the same location.

So the flowchart that we have built looks like the following figure.

Explanation:

- Every flowchart in the Deploy Tab begins with the Web service IN module, which receives data in JSON format as a dictionary.

- Then, we need to translate this dictionary into a dataframe by the DataFrame loader/Converter.

- During training, our model received arrays as input, so once again we need the Feature Selector module.

- As we are no longer training but predicting, the module to use in the Deploy Tab is the Predictor ML models module, which receives our trained model "XGBoost-Regressor-World-case" as input.

- Finally, the flowchart ends with the Web service OUT module, which returns the prediction that we can then retrieve as a Web service response.

All we need to do next is launch the model as a REST API Web service by clicking on the Rocket icon.

For more information, see the deployment documentation.

After deploying our project, we receive the REST API Web Service URL to use, along with an access token. These pieces of information can be copied by clicking on the copy icon. They are the data we must submit in order to call the AI model's Web Service.
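To call the deployed Web service from outside SmartPredict, a request can be assembled as in this sketch. The URL, token, and payload field name are hypothetical placeholders; the real values and the expected input format come from your own Deploy Tab:

```python
import json

# Hypothetical placeholders: the real URL and access token are shown
# in SmartPredict's Deploy Tab after launching the Web service.
SERVICE_URL = "https://<your-smartpredict-endpoint>/predict"
ACCESS_TOKEN = "<your-access-token>"

# Ten previous daily world case counts (window_size = 10); the field
# name "previous_cases" is an assumption for illustration.
payload = {"previous_cases": [5_300_000, 5_340_000, 5_380_000, 5_420_000,
                              5_460_000, 5_500_000, 5_540_000, 5_580_000,
                              5_620_000, 5_660_000]}

headers = {"Authorization": f"Bearer {ACCESS_TOKEN}",
           "Content-Type": "application/json"}
body = json.dumps(payload)

# Sending the request would then look like (requires the `requests` package):
#   response = requests.post(SERVICE_URL, headers=headers, data=body)
#   prediction = response.json()
print(body)
```

The window size of the payload must match the window_size the model was trained with, otherwise the Feature Selector in the deploy flowchart will receive the wrong shape.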

If you don't know it yet: we can test our model directly within SmartPredict, without going through any other software. Let's go to the Test Tab.

Step 4: Testing our model in the Test Tab

Testing our deployed model, available as a Web service, consists of presenting its input features in JSON format.

When we click on the arrow icon, the outputs are displayed: these are the predictions of our Machine Learning model.

Since, in this time-series forecasting, our "time series to supervised" module turned previous values of infected cases into features, our inputs are the numbers of previously infected cases in the world. So we present our model with the 10 most recent confirmed case counts (window size = 10). Our model will then predict the world cases on 5/28/20.

The predicted value is 5 856 567 cases. This latter value can in turn be presented as input for predicting the cases on 5/29/20.

To do so, we will need to save our output in a dataset. This blog will be updated on that point.
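This feedback loop can be sketched locally. Assuming a trained regressor with a scikit-learn-style predict method (the DummyModel below is a stand-in for illustration, not the actual XGBoost model), each prediction is appended to the window to forecast the following day:

```python
import numpy as np

def rolling_forecast(model, last_window, n_days):
    """Forecast `n_days` ahead by feeding each prediction back into
    the input window; the window size stays fixed."""
    window = list(last_window)
    forecasts = []
    for _ in range(n_days):
        next_val = model.predict(np.array(window).reshape(1, -1))[0]
        forecasts.append(next_val)
        window = window[1:] + [next_val]  # slide the window forward
    return forecasts

# Toy stand-in for the trained regressor: predicts last value + 50 000
class DummyModel:
    def predict(self, X):
        return [X[0, -1] + 50_000]

preds = rolling_forecast(DummyModel(), [5_600_000] * 10, n_days=3)
print(preds)  # [5650000, 5700000, 5750000]
```

Note that errors compound in such multi-step forecasts, since each prediction becomes an input for the next one.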

Note:

To forecast the infected cases in China, the flowcharts in the Build Tab and Deploy Tab stay the same. We just change the dataset module and a few parameters, and we can test in the same way as for the world case forecasting.

4- Coronavirus deaths forecasting in SmartPredict

I hope the previous steps have allowed you to understand the principle of tackling a problem on SmartPredict, so I encourage you to do the deaths forecasting yourself.
Come on! For sure, it's not that difficult.
Besides, this will let you practice carrying out a project on SmartPredict: an essential platform for completing AI projects faster and smarter.

Indication:

Throughout the construction of a flowchart, it is essential to know the type and size of each module's inputs and outputs. For this, the DataObject logger module allows us to inspect and display logs.

5- Conclusion

From now on, Artificial Intelligence is a breeze thanks to SmartPredict. It is a suitable place for realizing projects from SCRATCH without coding, and it will be a valuable platform for data scientists and AI practitioners looking to excel in their tasks.
