Data Science Book Reviews Series #001

I hope this review helps readers choose the right book for their learning needs.

Title of the Book: “Machine Learning using Python” by Manaranjan Pradhan and Prof. U Dinesh Kumar, 1st edition (2019), List Price: Rs. 534/-

Publisher: Wiley India Pvt. Limited, New Delhi

Book Review Rating: 7/10

This book gives beginners a good understanding of machine learning concepts. Because the code is written in Python, it suits learners who have only a basic knowledge of the language. Python is a core requirement for any Data Scientist, so this is a good read for anyone aspiring to become one.

The book consists of 10 chapters. The first three chapters introduce machine learning, the Anaconda distribution used to write the ML code, some useful Python libraries, probability distributions and hypothesis testing. Chapters 4 and 5 explain the supervised machine learning techniques: regression and classification. Chapter 6 is about ensemble algorithms. The last chapter is about text analytics. The remaining chapters cover clustering, time series and recommender systems.

What I liked most about the book is the neatly written Python code, with relevant data, for every important machine learning topic. The authors share a link in the book to the code and data sets used throughout, and both can be downloaded from it. The code is reproducible in Jupyter notebooks. Every important line of code is explained well in Markdown cells. You can practice the code while learning the concepts in the book, which goes a long way towards understanding new concepts in detail.

I like the way machine learning is introduced in the first chapter. Topics like the model development life cycle, the Anaconda distribution and ML algorithms are explained step by step, briefly and with relevant figures. Newer concepts are explained with crisp theory. All the algorithms are demonstrated with relevant code using familiar MOOC data sets.

The NumPy and Pandas libraries are essential for developing ML modelling code. The book covers the important Pandas methods for data analysis and visualisation. It does not cover other important libraries such as re, datetime, requests, collections and xml, which are essential for preprocessing data. You are expected to have basic to intermediate knowledge of Python to understand the code completely.
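
To give a flavour of the preprocessing work those standard-library modules handle, here is a minimal sketch using re, datetime and collections. It is not taken from the book. The sample rows, the "date | text" layout and the clean_text helper are hypothetical and exist only for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical raw records, invented for illustration only.
raw_rows = [
    "2019-03-14 | Great book!!  Loved the CODE samples.",
    "2019-04-02 | great examples, but the theory is short...",
]


def clean_text(text):
    """Lowercase the text and return its runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


parsed = []
for row in raw_rows:
    # Split each row into its date field and its free-text field.
    date_part, text_part = row.split("|", 1)
    review_date = datetime.strptime(date_part.strip(), "%Y-%m-%d")
    parsed.append((review_date, clean_text(text_part)))

# Count how often each token appears across all rows.
word_counts = Counter(token for _, tokens in parsed for token in tokens)

print(parsed[0][0].month, word_counts.most_common(3))
```

Real data usually needs more careful patterns and date formats than this, which is why these modules are worth learning alongside Pandas.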

Most of the main graphical methods are shown in the second chapter. Data visualisation with graphs and figures is at its best here. Correlations, heat maps, dendrograms and hypothesis testing, some of the most important aspects of ML, are explained very well.

My favourite chapters are 4, 5, 7 and 8. In particular, the basic assumptions of regression are checked explicitly on the fitted model, which shows how to improve its performance. Classification algorithms like Logistic Regression and Decision Tree are demonstrated with toy data sets. You can visualise the tree structure of the Decision Tree model with the code provided. You’ll enjoy learning about encoding categorical variables, Variance Inflation Factors, multicollinearity, RMSE, R-squared, accuracy, precision, recall, specificity, F-score, the confusion matrix, the ROC curve, the optimal classification threshold and so on. Gain and lift charts and the elbow curve method are beautifully demonstrated. I really like the simple way the book demonstrates model diagnostics.
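
As a quick refresher on how several of these classification metrics relate to the confusion matrix, here is a minimal sketch in plain Python. It is not the book's code. The function names and the toy labels are my own assumptions, and in practice you would likely use a library such as scikit-learn instead.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred, positive=1):
    """Compute common metrics from the confusion matrix counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical toy labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On these toy labels there are 3 true positives, 1 false positive, 5 true negatives and 1 false negative, so accuracy is 0.8 while precision and recall are both 0.75.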

The main algorithms in time series, recommender systems and clustering are included with well-written code. The chapter on text analytics covers TF-IDF and the Naive Bayes algorithm for sentence classification nicely.
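
For readers new to TF-IDF, here is a minimal sketch of one common textbook variant: term frequency is the count divided by the document length, and inverse document frequency is log(N / df). This is my own illustration, not the book's code. Libraries such as scikit-learn use a smoothed formula, so their numbers will differ. The tiny tokenized documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(tokenized_docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(number of documents / documents containing the term)
    """
    n_docs = len(tokenized_docs)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized_docs:
        term_counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return weights


# Hypothetical tokenized sentences, invented for illustration only.
docs = [["good", "book"], ["good", "code"], ["poor", "print"]]
weights = tf_idf(docs)
print(weights[0])
```

In the first document, "book" gets a higher weight than "good" because "good" also appears in another document and is therefore less distinctive.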

I strongly believe that hands-on practice of these methods makes learning from the book very satisfying. It’s best to practice well on the MOOC data to sharpen your skills.

The mathematical concepts underlying the algorithms are kept to the minimum required, so the fundamentals of each algorithm are easy to understand. As you apply the knowledge from this book to more and more new data, you will feel that something is still missing when you try to develop a better model. You may find it difficult to decide which transformation is necessary in the preprocessing stage of the ML life cycle. Maybe the authors kept the theory short so that readers can learn as fast as they can. You may need to consult other books or Google to find the specific transformations needed to improve model performance on your chosen metrics.

Hyperparameter tuning is one of the important aspects of model training. Although the book explains Grid Search and K-Fold cross validation very well, more modern techniques are not mentioned. Hyperparameter tuning tools like Optuna, Bayesian search using HyperOpt, and Ray are worth practicing.
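
To make the idea of grid search with K-fold cross validation concrete, here is a minimal sketch in plain Python. It is not the book's code. The one-parameter ridge model, the helper names, the toy data and the alpha grid are all assumptions chosen to keep the example short.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k interleaved folds (no shuffling)."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def fit_ridge_slope(xs, ys, alpha):
    """Slope of a no-intercept ridge fit: sum(x*y) / (sum(x*x) + alpha)."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    return numerator / (sum(x * x for x in xs) + alpha)


def cross_val_mse(xs, ys, alpha, k):
    """Mean validation MSE over k folds for a given alpha."""
    fold_errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope = fit_ridge_slope([xs[i] for i in train],
                                [ys[i] for i in train], alpha)
        mse = sum((ys[i] - slope * xs[i]) ** 2 for i in fold) / len(fold)
        fold_errors.append(mse)
    return sum(fold_errors) / len(fold_errors)


def grid_search(xs, ys, alphas, k=5):
    """Try every alpha and return the one with the lowest CV error."""
    scores = {alpha: cross_val_mse(xs, ys, alpha, k) for alpha in alphas}
    best_alpha = min(scores, key=scores.get)
    return best_alpha, scores


# Hypothetical data: y is roughly 2x with a small alternating offset.
xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
alpha_grid = [0.0, 0.1, 1.0, 10.0]

best_alpha, scores = grid_search(xs, ys, alpha_grid, k=5)
print(best_alpha, scores[best_alpha])
```

Grid search evaluates every candidate exhaustively, which is the main cost that tools like Optuna and HyperOpt try to reduce by choosing the next candidates more selectively.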

Important concepts like Gini impurity, cross-entropy loss, entropy, mean squared error, mean absolute error, R-squared, bagging, boosting, the impact of imbalanced data in classification problems, the importance of feature engineering for model performance, imputing missing data, and the inner workings of ensemble algorithms (such as splitting criteria and their hyperparameters) are not fully explained. Data science aspirants have to know these topics thoroughly to tackle interviews.
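
As an example of the kind of detail worth revising, here is a minimal sketch of Gini impurity and entropy, the two splitting criteria most often discussed for decision trees. It is my own illustration rather than anything from the book, and the function names and sample labels are hypothetical.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the share of each class among the labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return [count / len(labels) for count in counts.values()]


def gini_impurity(labels):
    """Gini impurity: 1 - sum(p_i ** 2) over the class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits: -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


# Hypothetical node contents, invented for illustration only.
mixed_node = ["yes", "yes", "no", "no"]
pure_node = ["yes", "yes", "yes", "yes"]

print(gini_impurity(mixed_node), entropy(mixed_node))
print(gini_impurity(pure_node), abs(entropy(pure_node)))
```

A perfectly mixed two-class node has Gini impurity 0.5 and entropy 1 bit, while a pure node scores 0 on both. A tree picks the split that reduces the chosen measure the most.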

Overall, the book is worth buying and reading. You will definitely learn a lot of important ML concepts, whether you are a beginner or at an intermediate level. The book offers promising content that you can refer to whenever you need to look up an important ML topic or piece of code. Most importantly, you can actually practice the majority of the ML algorithms with the help of the code in this book.

The print quality and binding of the book are very good. The font and language are comfortable to read. I’d say it’s one of the best books within the budget for learning machine learning. You can also explore other books, such as “Machine Learning with Python Cookbook” and “Hands-On Machine Learning with Scikit-Learn and TensorFlow”.

It’s very hard to rate technical books. However, I’d rate this book 7 on a scale of 1 to 10.

Topics like loading and reading different file types using Python, web scraping using BeautifulSoup, recent AutoML libraries like PyCaret, model deployment and the latest hyperparameter tuning techniques would add more value to the book. Let’s hope these topics are incorporated in a coming edition.

Do follow me to get to know about more book reviews!

Mohamed Rizwan

Data Scientist, Book Reviewer, Trainer, Script Writer. Loves to build NLP and Computer Vision based tools.