In real-world scenarios, datasets are rarely clean. Unlike humans, computers cannot easily interpret data that arrives in inconsistent formats. Examples of unclean datasets include:
<aside> 💡 Exploring your dataset first is good practice: it helps you understand what you need to do to clean it.
</aside>
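One way to get a first look at a dataset is to count, per column, how many values are missing and which value types appear. The sketch below is only an illustration: the `summarize` helper and the `sample` rows are hypothetical, and it assumes the data is a list of dictionaries in which `None` or an empty string marks a missing value.

```python
from collections import Counter

def summarize(rows):
    """Per column: count missing values and tally the Python types that appear."""
    summary = {}
    for row in rows:
        for column, value in row.items():
            stats = summary.setdefault(column, {"missing": 0, "types": Counter()})
            # Assumption: None and "" both mean "missing" in this toy dataset.
            if value is None or value == "":
                stats["missing"] += 1
            else:
                stats["types"][type(value).__name__] += 1
    return summary

# Hypothetical sample with a mixed-type column and some missing cells.
sample = [
    {"age": 25, "city": "Paris"},
    {"age": "31", "city": ""},
    {"age": None, "city": "Rome"},
]
report = summarize(sample)
```

A column whose `types` tally contains more than one type (here `age` holds both an `int` and a `str`) is a hint that its values need to be normalized before use.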
Most machine learning algorithms cannot work properly with missing values. Therefore, missing values should be handled using one of several methods:
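As a minimal sketch, three commonly used options are to drop the rows that contain missing values, drop the affected column, or fill the gaps with a statistic such as the median. The function names and the toy `rows` data below are hypothetical, and `None` is assumed to mark a missing value.

```python
from statistics import median

# Hypothetical toy dataset; None marks a missing value.
rows = [
    {"age": 25, "income": 50000},
    {"age": None, "income": 62000},
    {"age": 40, "income": None},
    {"age": 31, "income": 58000},
]

def drop_incomplete_rows(rows):
    """Option 1: discard every row that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def drop_column(rows, column):
    """Option 2: discard an entire column."""
    return [{k: v for k, v in r.items() if k != column} for r in rows]

def impute_median(rows, column):
    """Option 3: replace missing values in a column with that column's median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

complete_only = drop_incomplete_rows(rows)
without_income = drop_column(rows, "income")
age_filled = impute_median(rows, "age")
```

Dropping rows or columns is simple but discards information, while imputation keeps every row at the cost of introducing estimated values. In pandas, the corresponding calls are `DataFrame.dropna` and `DataFrame.fillna`.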
You can use the libraries available in your programming language to write a function that cleans your dataset, or even implement your own algorithm. Because the cleaning steps depend on the particular dataset rather than forming a general problem, they can usually be handled with simple programming logic.
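To illustrate what such simple programming logic might look like, here is a sketch of a hand-written cleaning function. Everything in it is an assumption made for the example: the `MISSING_TOKENS` set, the `clean_value` and `clean_dataset` helpers, and the `raw` rows are hypothetical, and the rules (trim whitespace, lowercase text, parse numbers, drop exact duplicates) would need to be adapted to a real dataset.

```python
# Hypothetical placeholders that this sketch treats as "missing".
MISSING_TOKENS = {"", "n/a", "na", "nan", "null", "none", "-"}

def clean_value(value):
    """Normalize one raw cell: trim text, map placeholders to None, parse numbers."""
    if value is None:
        return None
    text = str(value).strip()
    if text.lower() in MISSING_TOKENS:
        return None
    try:
        # Accept thousands separators such as "1,200".
        return float(text.replace(",", ""))
    except ValueError:
        # Not a number: keep it as lowercased text.
        return text.lower()

def clean_dataset(rows):
    """Clean every cell and drop exact duplicate rows, keeping the first occurrence."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {key.strip().lower(): clean_value(val) for key, val in row.items()}
        fingerprint = tuple(sorted(record.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(record)
    return cleaned

# Hypothetical raw rows: the first two describe the same record in different formats.
raw = [
    {"Name ": " Alice ", "Age": "30", "Income": "1,200"},
    {"Name ": "alice", "Age": 30, "Income": "1200"},
    {"Name ": "Bob", "Age": "N/A", "Income": " 950 "},
]
cleaned = clean_dataset(raw)
```

After normalization the first two rows become identical, so only one of them is kept, and Bob's `"N/A"` age is stored as `None` so that it can be handled like any other missing value.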
A. Géron, Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Sebastopol: O’Reilly Media, 2019.