How to choose an appropriate transformer model from the šŸ¤— library to fine-tune on a downstream task

Mohamed Rizwan
4 min readAug 31, 2021


This article helps you understand šŸ¤— (Hugging Face) model cards so you can pick the right model for your downstream task.

Data

The first task is to check whether the text data is monolingual or multilingual, and which languages it covers (Indian or foreign languages, for instance). A little digging into the data (exploratory analysis) yields many other meaningful insights, including whether preprocessing is needed at all. Preprocessing is not required for every task, and it can introduce noise and bias.
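The monolingual-vs-multilingual check above can be sketched with a crude script counter. A real project would use a language-identification library, but this stdlib-only version illustrates the idea; the `script_of` and `scripts` helpers are mine, not from any library:

```python
import unicodedata
from collections import Counter

def script_of(ch):
    """Rough script bucket taken from the Unicode character name,
    e.g. 'LATIN', 'DEVANAGARI', 'TAMIL'."""
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        return None

def scripts(text):
    """Count alphabetic characters per script."""
    return Counter(script_of(ch) for ch in text if ch.isalpha())

sample = "नमस्ते world ą®µą®£ą®•ą¯ą®•ą®®ą¯"
counts = scripts(sample)
print(counts)
print("multilingual:", len(counts) > 1)
```

If more than one script dominates, a multilingual checkpoint is the safer starting point.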

Downstream Task

A wide variety of downstream tasks is listed at the top left of the Hugging Face models page. Some models can be used only for specific tasks, which is easy to figure out from the task names.

šŸ¤— model tasks
source: www.huggingface.co

Generally, any NLP model can be used for a variety of downstream tasks. However, results vary drastically if the model performed well only on a specific kind of task. You can find the test results in the model card.

For example, the GLUE test results for the BERT base (uncased) model are shown below:

From the Model Card of BERT base model (uncased)

It’s advisable to read the description, limitations, bias, and training procedure sections. If you want deeper insight into a particular model, read the related research paper on arXiv (http://arxiv.org/search).

Which model should you choose: encoder-only, decoder-only, or sequence-to-sequence?

BERT is an encoder-only model, GPT-2 is decoder-only, and mBART is a sequence-to-sequence model.

Encoder models use only the encoder of a Transformer model. They are best suited for tasks requiring an understanding of the full sentence:

  • Classification
  • Named Entity Recognition
  • Extractive question answering

Some examples of these models are ALBERT, DistilBERT, ELECTRA and RoBERTa.
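As a sketch of how an encoder-only checkpoint handles a full-sentence task, here is a sentence-classification call via the šŸ¤— `pipeline` API; the DistilBERT checkpoint name is one common example on the Hub, not a recommendation from the article:

```python
from transformers import pipeline

# Encoder-only checkpoint fine-tuned for sentiment classification (SST-2).
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
result = classifier("The model card documentation is really helpful.")[0]
print(result["label"], round(result["score"], 3))
```

The same `pipeline` call pattern works for the other encoder tasks listed above (e.g. `"ner"` or `"question-answering"`).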

Decoder models use only the decoder of a Transformer model and are also called auto-regressive models. They are best suited for tasks involving text generation.

Some examples of these models are CTRL, GPT variants and Transformer XL.
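A minimal sketch of auto-regressive generation with one of those checkpoints (GPT-2 here), again via `pipeline`; the prompt and sampling settings are my own illustration:

```python
from transformers import pipeline, set_seed

# Decoder-only (auto-regressive) checkpoint used for open-ended generation.
generator = pipeline("text-generation", model="gpt2")
set_seed(42)  # makes the sampled continuation reproducible
out = generator("Transformer models are", max_new_tokens=15, num_return_sequences=1)
print(out[0]["generated_text"])
```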

Encoder-decoder models (also called sequence-to-sequence models) use both the encoder and the decoder of the Transformer architecture.

Sequence-to-sequence models are best suited for tasks that generate new sentences conditioned on a given input:

  • Text Summarisation
  • Translation
  • Generative Question Answering (e.g. generating multiple-choice questions)

Some examples of these models are BART, mBART, Marian and T5.
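A sequence-to-sequence sketch using `t5-small`, one of the checkpoints just listed: the encoder reads the English input and the decoder generates a new sentence (a French translation here); the example sentence is my own:

```python
from transformers import pipeline

# Seq2seq checkpoint: encoder reads the input, decoder generates new text.
translator = pipeline("translation_en_to_fr", model="t5-small")
out = translator("The model card lists the training data.")
print(out[0]["translation_text"])
```

Summarisation works the same way with the `"summarization"` task name and a checkpoint such as BART.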

Find the model Card

Once you know which type of model suits your task, it’s easy to browse the šŸ¤— library to find pre-trained models.

Suppose you decided to fine-tune a pre-trained model on question answering. The downstream task is ā€˜question-answering’ and the model is encoder-only. The text data spans more than one Indian language. It’s time to decide on a baseline.

When you click the task on the šŸ¤— library page, it shows a lot of encoder models. Which one to choose? You cannot pick a random model to fine-tune on the downstream task, because fine-tuning consumes a lot of resources and you may end up with disappointing results.

A simple strategy is to understand each candidate better through its model card on the šŸ¤— library.

Explore at least 5–6 model cards. After reading them carefully, you may narrow the list down to 3–4 models.

The majority of model cards show a ā€œHosted inference APIā€ widget on the right side of the page, as seen in the following figure. Copy and paste a question and context from your training data into the respective fields and click the Compute button to get the results. The better the score and/or answer, the better the model; go with that model to fine-tune on the downstream task. Also keep in mind the hyperparameters from the model’s configuration file (a JSON file).
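Those configuration hyperparameters can be inspected programmatically without downloading the model weights. A sketch using `AutoConfig`; the deepset checkpoint id is one multilingual QA example from the Hub, and any model id works in its place:

```python
from transformers import AutoConfig

# Reads only config.json from the Hub, not the model weights.
config = AutoConfig.from_pretrained("deepset/xlm-roberta-base-squad2")
print(config.model_type)          # architecture family
print(config.num_hidden_layers)   # depth
print(config.hidden_size)         # width
```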

deepset xlm-roberta model card

Files and versions

Many researchers link a GitHub repository that contains the code and examples for the model in the ā€œFiles and versionsā€ section of the model card.

The following figure shows the ā€˜Files and versions’ tab of the ā€˜bert-base-uncased’ model card.

Source: šŸ¤— library

Pipeline Function

In case you are stuck selecting the appropriate model, you can try the pipeline function to evaluate candidate models on the raw data.

It’s a very powerful tool for evaluating the performance of models on the given data without fine-tuning.

Example

Example code: pipeline function
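A sketch of the kind of quick check described above, assuming the `transformers` package and the deepset multilingual QA checkpoint mentioned earlier; swap in whichever candidate model you are comparing, and use a question and context from your own training data:

```python
from transformers import pipeline

# Question answering with a multilingual encoder-only checkpoint.
qa = pipeline("question-answering", model="deepset/xlm-roberta-base-squad2")
result = qa(
    question="Where are the model cards hosted?",
    context="Pre-trained checkpoints and their model cards are hosted on the Hugging Face Hub.",
)
print(result["answer"], "(score:", round(result["score"], 3), ")")
```

Run the same question/context pair through each shortlisted model and compare the answers and scores.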

If you like the article, do clap for more motivation.

I am releasing a new article on ā€˜Preprocessing files using Python’ soon. It will help you build hands-on practical skills to load, read, and preprocess text, tabular, image, audio, and video files efficiently.

Click the Follow button to stay posted on unique and exciting content!


Mohamed Rizwan

Data Scientist, Book Reviewer, Script Writer. Building AI applications with GenAI, NLP and OpenCV.