Data Science Book Reviews #4

Mohamed Rizwan
4 min read · Oct 26, 2021


Title of the book: “Getting Started with Google BERT”

Author: Sudharsan Ravichandiran

Edition: 1st edition (2021)

Publisher: Packt Publishing Limited

List Price: Rs.2749/-

ISBN: 978-1838821593

About the Author

Sudharsan Ravichandiran is a data scientist, researcher, Artificial Intelligence enthusiast, and YouTuber. He completed his Bachelor’s in Information Technology at Anna University. His area of research focuses on practical implementations of deep learning and reinforcement learning, including Natural Language Processing and computer vision.

https://www.packtpub.com/product/getting-started-with-google-bert/9781838821593

Rating — 6.5/10

This book is divided into three sections comprising nine chapters in total. The first two chapters are devoted to explaining the building blocks of the transformer and BERT; subword tokenization algorithms are also discussed. The third chapter offers hands-on practice with fine-tuning BERT on downstream tasks.

Section 2 covers the major variants of BERT: ALBERT, RoBERTa, ELECTRA, SpanBERT, DistilBERT, and TinyBERT, as well as transferring knowledge from BERT to simple neural networks.

In Section 3, Chapter 6 elaborates on text summarization using BERT. Chapter 7 deals with multilingual BERT, XLM, XLM-RoBERTa, and language-specific BERT models. Chapter 8 explores sentence and domain-specific BERT models such as Sentence-BERT, ClinicalBERT, and BioBERT. The final chapter is devoted to understanding BART, VideoBERT, and the ktrain library.


Book Review

This book gives an excellent understanding of the transformer encoder and decoder blocks and of how BERT works. Every concept of the transformer and BERT architectures is elaborated step by step, so beginners should be able to understand them clearly and easily. Code examples are written in PyTorch; hence, readers are expected to have basic skills in Python, PyTorch, and Hugging Face.

I was delighted by the step-by-step approach to explaining the transformer encoder and decoder architectures, as well as BERT. The self-attention and multi-head attention mechanisms are explained very well with examples, as are the importance of positional encoding in the transformer and layer normalisation. The working of the encoder and decoder is elaborated in a way readers will really appreciate. Why is the multi-head attention layer masked in the decoder? The intuition behind this question is examined with a nice example. Every component of the transformer is thoroughly discussed so that the reader understands its use in the overall architecture. The first two chapters are the highlights of this book.
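To make that masking intuition concrete, here is a minimal PyTorch sketch of my own (not from the book) of scaled dot-product self-attention with a causal mask, which is what stops each decoder position from attending to future tokens:

```python
import torch
import torch.nn.functional as F

def masked_self_attention(x):
    """Scaled dot-product self-attention with a causal mask.

    x: (seq_len, d_model). For simplicity the queries, keys, and values
    are all x itself (no learned projection matrices).
    """
    seq_len, d_model = x.shape
    scores = x @ x.T / d_model ** 0.5                # (seq_len, seq_len)
    # Causal mask: position i may only attend to positions j <= i,
    # so the decoder cannot peek at future tokens during training.
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)              # masked positions get weight 0
    return weights @ x

out = masked_self_attention(torch.randn(4, 8))
print(out.shape)  # torch.Size([4, 8])
```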

Pre-training BERT with Masked Language Modelling and Next Sentence Prediction is explained wonderfully. Token embeddings, segment embeddings, position embeddings, the WordPiece tokenizer, and subword tokenization algorithms are my favourites in the second chapter. However, the code examples in this book for fine-tuning the pre-trained BERT and its variants on various downstream tasks are for intuition purposes only; there is no complete end-to-end code in the book if you want to fine-tune BERT or one of its variants for some downstream task.
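For readers who do want end-to-end code, a minimal fine-tuning sketch with the Hugging Face Trainer API looks roughly like the following. This is my own illustration, not the book's; the dataset, checkpoint, and hyperparameters are placeholder choices:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Placeholder choices: IMDB sentiment data and the base BERT checkpoint.
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="bert-imdb",
                         num_train_epochs=1,
                         per_device_train_batch_size=16)

# Small subsets keep this sketch quick to run; use the full splits in practice.
trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"].shuffle(seed=42).select(range(2000)),
                  eval_dataset=encoded["test"].select(range(500)))
trainer.train()
```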

The ALBERT and RoBERTa models are explained with toy data, and you will get very good intuition for them. However, ELECTRA and SpanBERT are not as easy to understand, since they are not discussed in depth with the required ML and mathematical concepts; you will need to read the research articles on these architectures to get the complete picture of how and why they are useful. Two other important variants, DistilBERT and TinyBERT, are both explained wonderfully: the necessity of these models is clearly explored, and the importance of data augmentation for training the student BERT is discussed elaborately. The fifth chapter introduces some important varieties of BERT with references to the research articles.
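For intuition, the teacher-student idea behind DistilBERT-style distillation boils down to training the student to match the teacher's softened output distribution. Here is a generic sketch of that loss (the standard technique, not the book's code):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style distillation: a weighted sum of the KL divergence between
    the temperature-softened teacher and student distributions and the usual
    hard-label cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy example: a batch of 4 with 3 classes.
loss = distillation_loss(torch.randn(4, 3), torch.randn(4, 3),
                         torch.tensor([0, 2, 1, 0]))
print(loss.item())
```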

Text summarization is one of the key applications of NLP. In Chapter 6, the types of text summarization, fine-tuning BERT for text summarization, and the evaluation metrics (ROUGE-1, ROUGE-2, and ROUGE-L) are explained. However, end-to-end code examples for extractive and abstractive text summarization with BERT variants are not available.
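To see what these metrics actually measure, here is a tiny self-contained sketch of ROUGE-1 recall, i.e. the fraction of reference unigrams recovered by the candidate summary; real evaluations would use a dedicated library:

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: fraction of reference unigrams found in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], n) for w, n in ref.items())
    return overlap / sum(ref.values())

print(rouge1_recall("the cat sat on the mat",
                    "the cat was on the mat"))  # 5/6 ≈ 0.83
```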

The different pre-training strategies of multilingual BERT (mBERT) covered in the 7th chapter are good to know. XLM and XLM-RoBERTa are not elaborated clearly. Other language-specific models for French, German, Spanish, Chinese, Dutch, Japanese, Italian, Portuguese, and Russian are also introduced.
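As a quick illustration of what makes mBERT multilingual (my own example, not the book's), a single shared WordPiece vocabulary lets one tokenizer handle over a hundred languages:

```python
from transformers import AutoTokenizer

# bert-base-multilingual-cased is the standard mBERT checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

for sentence in ["The weather is nice", "Das Wetter ist schön", "Il fait beau"]:
    print(tokenizer.tokenize(sentence))
```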

I liked the way Sentence-BERT is elaborated in the 8th chapter. You will get a fair idea of when and how to use Sentence-BERT, primarily for sentence similarity tasks. Using a pre-trained Sentence-BERT as teacher and a pre-trained XLM-R as student to compute sentence similarity between sentences in different languages is also elaborated. ClinicalBERT and BioBERT are introduced with examples and relevant datasets.
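In practice, Sentence-BERT similarity takes only a few lines with the sentence-transformers library. This is a minimal sketch; the checkpoint name is a common public choice, not necessarily the one used in the book:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Encode two sentences into fixed-size embeddings and compare them.
embeddings = model.encode(["How old are you?", "What is your age?"])
print(util.cos_sim(embeddings[0], embeddings[1]).item())  # high for paraphrases
```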

In the last chapter, VideoBERT is introduced very well. BART and the ktrain library are among the highlights of this book; both are terrifically explained with simple text examples.
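To give a flavour of how compact ktrain code is, here is a sketch following the library's documented BERT text-classification workflow (not the book's exact example), with toy data standing in for a real labelled dataset:

```python
import ktrain
from ktrain import text

# Toy data; in practice you would load a real labelled dataset.
x = ["great movie", "terrible plot", "loved it", "waste of time"]
y = ["pos", "neg", "pos", "neg"]

trn, val, preproc = text.texts_from_array(
    x_train=x, y_train=y, x_test=x, y_test=y,
    class_names=["pos", "neg"], preprocess_mode="bert", maxlen=64)

model = text.text_classifier("bert", train_data=trn, preproc=preproc)
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=2)
learner.fit_onecycle(2e-5, 1)  # one-cycle LR schedule, one epoch
```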

The book's language is simple and easy to follow. How to use model cards in the Hugging Face library (or any other library) is not covered, so you may need to learn deep learning frameworks, the Hugging Face library, and cross-validation techniques in DL to implement these models practically; Hugging Face itself offers a course on the practical implementation of models for different downstream tasks. You may feel dragged down by repetitive content in some chapters, and the language is informal with several grammatical mistakes. Overall, the book is a suitable read for beginners and intermediate readers who want to explore BERT and its variants.
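For what it's worth, using a checkpoint named on a Hugging Face model card is nearly a one-liner with the pipeline API; this generic example is mine, not the book's:

```python
from transformers import pipeline

# The model identifier comes straight from the checkpoint's model card on the Hub.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Paris is the [MASK] of France.")[0]["token_str"])
```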

You can explore similar books such as “Transformers for Natural Language Processing” by Denis Rothman and “Advanced Natural Language Processing with TensorFlow 2” by Ashish Bansal.

If you found this book review useful, please share this article with your friends and give it a CLAP.

Follow me to stay posted on upcoming Data Science Book Reviews!
