Introduction


In real-life scenarios, datasets are rarely clean. Unlike humans, computers cannot easily interpret data that is incomplete or recorded in inconsistent formats. Common examples of unclean datasets are those with missing values and those with other irregularities, such as inconsistently formatted entries.

<aside> 💡 Explore your dataset before cleaning it. Understanding its contents makes it clearer which cleaning steps it needs.

</aside>
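
One way to get to know a dataset is to count, per column, how many values are missing and which value types appear. The following is a minimal sketch using only the Python standard library. The `rows` data and the `summarize` helper are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy dataset: each row is a dict, and None marks a missing value.
rows = [
    {"name": "Ana", "age": 34, "city": "Lima"},
    {"name": "Ben", "age": None, "city": "lima"},
    {"name": "Cho", "age": 29, "city": None},
]


def summarize(rows):
    """Return, per column, the missing-value count and the value types seen."""
    summary = {}
    for row in rows:
        for column, value in row.items():
            stats = summary.setdefault(column, {"missing": 0, "types": Counter()})
            if value is None:
                stats["missing"] += 1
            else:
                stats["types"][type(value).__name__] += 1
    return summary


report = summarize(rows)
for column, stats in report.items():
    print(column, "missing:", stats["missing"], "types:", dict(stats["types"]))
```

A summary like this shows which columns need attention before any cleaning step is chosen.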

Resolution: Missing Values


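Most machine learning algorithms cannot work properly when the data contains missing values. Therefore, missing values should be handled before training. Common methods are removing the rows that contain them, removing the affected column entirely, or filling them in with a value such as zero, the mean, or the median.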
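
As a minimal sketch, three common ways of handling missing values (dropping rows, dropping the column, and filling with the median) could look like this in plain Python. The `rows` data and all function names are hypothetical, and `None` is assumed to mark a missing value.

```python
from statistics import median

# Hypothetical toy dataset: None marks a missing value.
rows = [
    {"rooms": 3, "price": 250.0},
    {"rooms": None, "price": 180.0},
    {"rooms": 5, "price": 410.0},
    {"rooms": 2, "price": 150.0},
]


def drop_rows_with_missing(rows, column):
    """Option 1: discard every row where `column` is missing."""
    return [row for row in rows if row[column] is not None]


def drop_column(rows, column):
    """Option 2: discard the whole column."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


def fill_with_median(rows, column):
    """Option 3: replace missing entries with the median of the known values."""
    known = [row[column] for row in rows if row[column] is not None]
    fill_value = median(known)
    return [
        {**row, column: fill_value if row[column] is None else row[column]}
        for row in rows
    ]


dropped_rows = drop_rows_with_missing(rows, "rooms")
dropped_column = drop_column(rows, "rooms")
filled = fill_with_median(rows, "rooms")
```

Each function returns a new list and leaves `rows` unchanged, so the options can be compared side by side. In pandas, the rough equivalents are `DataFrame.dropna`, `DataFrame.drop`, and `DataFrame.fillna`.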

Resolution: Others


You can write a function to clean your dataset using the libraries available in your programming language, or even implement your own algorithm. Since these issues are specific to each dataset rather than a general problem, they can usually be solved with simple programming logic.
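
For illustration, a small cleaning function built from simple string logic might look like the sketch below. The field names (`city`, `price`) and the cleaning rules are assumptions chosen for the example. A real dataset would need rules that match its own irregularities.

```python
def clean_record(record):
    """Normalize one raw record; the rules here are illustrative assumptions."""
    cleaned = dict(record)
    # Trim stray whitespace and unify capitalization of the city name.
    cleaned["city"] = record["city"].strip().title()
    # Convert a price such as "1,200" or " 950 " to a number.
    cleaned["price"] = float(record["price"].replace(",", "").strip())
    return cleaned


# Hypothetical raw records with inconsistent formatting.
raw = [
    {"city": "  jakarta ", "price": "1,200"},
    {"city": "BANDUNG", "price": " 950 "},
]
cleaned = [clean_record(r) for r in raw]
print(cleaned)
```

Keeping the rules in one function makes it easy to apply the same cleaning to new data later.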

Main Reference


A. Géron, Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Sebastopol: O’Reilly Media, 2019.