As a data professional, one of your most important tasks is presenting reports to your team that convey the state of things in a compelling way. Data profiling is not only useful for final reports, however. It is just as valuable for preliminary insights, right at the beginning of a project, while you are still preprocessing your data. In this blog article, let's see how you can get the most out of it.
Pandas Profiling Tool is a Python library that generates profiling reports from a pandas dataframe. If you have limited coding experience, or no time to memorize lines of commands, you surely wish you could create data profiling reports easily, don't you? Well, you can. With the data visualizer module of SmartPredict, you get the features of the Pandas Profiling Tool inside a simple module, including:
- Type inference: detect the types of columns in a dataframe.
- Essentials: type, unique values, missing values
- Quantile statistics
- Descriptive statistics like mean, mode, standard deviation
- Most frequent values
- Correlations (Spearman, Pearson and Kendall matrices)
- Missing values matrix, count, heatmap and dendrogram
This simple portable module offers you the full experience without the code worries.
(See the official Pandas Profiling Tool documentation for details.)
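For comparison, here is a minimal sketch of what a few of the figures listed above look like when computed by hand. It uses plain pandas rather than the profiling library or the SmartPredict module. The toy dataset and the `profile` helper are hypothetical, made up for illustration. The Kendall matrix is left out because pandas relies on SciPy for it.

```python
import pandas as pd

# Hypothetical toy dataset, used only for illustration.
df = pd.DataFrame({
    "age": [25, 32, 47, None, 51, 32],
    "income": [30000, 42000, 58000, 61000, None, 42000],
    "city": ["Paris", "Lyon", "Paris", None, "Nice", "Paris"],
})


def profile(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with a few essential profiling figures."""
    rows = []
    for name in frame.columns:
        col = frame[name]
        modes = col.mode(dropna=True)
        rows.append({
            "column": name,
            "dtype": str(col.dtype),                 # inferred type
            "unique": col.nunique(dropna=True),      # distinct non-missing values
            "missing": int(col.isna().sum()),        # missing values
            "most_frequent": modes.iloc[0] if not modes.empty else None,
        })
    return pd.DataFrame(rows).set_index("column")


summary = profile(df)

# Quantile and descriptive statistics only make sense for numeric columns.
numeric = df.select_dtypes(include="number")
quantiles = numeric.quantile([0.25, 0.5, 0.75])
described = numeric.describe()

# Correlation matrices (missing values are excluded pairwise).
pearson = numeric.corr(method="pearson")
spearman = numeric.corr(method="spearman")

print(summary)
print(quantiles)
print(pearson)
```

Even this partial version takes a fair amount of code, which is the effort a ready-made profiling module is meant to save.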
SmartPredict data visualizer for profiling datasets and processing pipelines
Whether you are a data analyst, a data scientist or a product manager, SmartPredict has designed a special tool for you to produce colorful graphical reports in no time.
As you already know, in SmartPredict, data science pipelines are represented in the form of flowcharts, which makes it easier than ever to create and deploy artificial intelligence projects.
Based on the Pandas profiling tool, the drag-and-drop visualization module can be placed anywhere on the flowchart and linked to any output to display the data before and after processing.
To see how, follow this short tutorial on getting visual reports with the help of the data visualizer.
1-Create a new project in Manualflow
2-Drag and drop your dataset into the build workspace
3-Look for the data visualizer under the control modules drop-down list
4-Link the dataset to the data visualizer, then run the project
5-Open the data visualizer's menu and go to the processing tab, then the profiling tab
Note that you can use the data visualizer to see the content of a dataset anywhere on the flowchart, even after processing operations, just like with a data processor.
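The idea of inspecting the data before and after a processing step can also be sketched in plain pandas. This is only an illustration, not how the SmartPredict module works internally. The dataset, the `fill_missing` step and the `missing_report` helper are all hypothetical.

```python
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.Series:
    """Count missing values per column."""
    return frame.isna().sum()


# Hypothetical raw dataset, for illustration only.
raw = pd.DataFrame({
    "age": [25, None, 47, 51],
    "city": ["Paris", "Lyon", None, "Paris"],
})


def fill_missing(frame: pd.DataFrame) -> pd.DataFrame:
    """Example processing step: impute the median age and a placeholder city."""
    out = frame.copy()  # leave the input untouched so both states can be compared
    out["age"] = out["age"].fillna(out["age"].median())
    out["city"] = out["city"].fillna("unknown")
    return out


processed = fill_missing(raw)

# Side-by-side view of the same check before and after processing.
comparison = pd.DataFrame({
    "missing_before": missing_report(raw),
    "missing_after": missing_report(processed),
})
print(comparison)
```

Running the same check on both sides of a step is the code equivalent of linking a visualizer to the input and to the output of a module on the flowchart.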
You have now seen how to use the data visualizer to gain valuable insights from your data exploration.
Now start using it to present your reports brilliantly.
Find the video here that explains it all in detail.
We have seen that data profiling is a necessary, if not essential, step before diving into a data science project. Because it provides an overview of the data you are dealing with, it helps pinpoint the quantitative and qualitative relationships between the elements of your dataset. It also offers an intuitive roadmap of where to clean your dataset or aggregate features. To make this even more enjoyable, SmartPredict has designed a high-level module based on Pandas Profiling Tool for visualizing the distribution of your data.
Try it out!