
🏋🏻‍♂️ Fine-tuning

xTuring is easy to use: the library ships with sensible default fine-tuning parameters for each supported model.

For advanced usage, you can customize the finetuning_config attribute of the model object.

In this tutorial, we will load one of the supported models, customize its fine-tuning configuration, and then fine-tune the model on the desired dataset.

Load the model and the dataset

First, we need to load the model and the dataset we want to use.

from xturing.models import BaseModel
from xturing.datasets import InstructionDataset

# Path to a dataset saved in the instruction format
instruction_dataset = InstructionDataset("...")
# Key of one of the supported models
model = BaseModel.create("")
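Here "..." stands for the path to a dataset saved in the instruction format, and the string passed to BaseModel.create is the key of a supported model. As a non-authoritative sketch, assuming InstructionDataset also accepts an in-memory dict of instruction/text/target lists (check the datasets documentation for the exact format your version expects), the dataset could be built directly:

# Sketch only: assumes the dict-based constructor with parallel
# "instruction", "text" and "target" lists is available
instruction_dataset = InstructionDataset({
    "instruction": ["Summarize the following text.", "Translate to French."],
    "text": ["xTuring fine-tunes large language models.", "Good morning."],
    "target": ["xTuring is a fine-tuning library.", "Bonjour."],
})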

Load the config object

Next, we need to fetch the model's fine-tuning configuration using the command below.

finetuning_config = model.finetuning_config()

Print the finetuning_config object to check the default configuration.
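For example, the defaults can be inspected before anything is changed:

print(finetuning_config)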

Customize the configuration

Now, we can customize the fine-tuning configuration as we wish. All the customizable parameters are listed in the Parameters section below.

finetuning_config.batch_size = 64
finetuning_config.num_train_epochs = 1
finetuning_config.learning_rate = 1e-5
finetuning_config.weight_decay = 0.01
finetuning_config.optimizer_name = "adamw"
finetuning_config.output_dir = "training_dir/"

Start the fine-tuning

Now, we can fine-tune the model on our dataset to see how our configuration works.

model.finetune(dataset=instruction_dataset)
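Putting the steps above together, a minimal end-to-end sketch might look like the one below. The dataset path and model key are placeholders, and the final model.save() call is assumed to be the standard way to persist the fine-tuned weights (see the saving and loading documentation if it differs in your version).

from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

# Placeholders: fill in your dataset path and a supported model key
instruction_dataset = InstructionDataset("...")
model = BaseModel.create("...")

# Tweak a couple of defaults before training
finetuning_config = model.finetuning_config()
finetuning_config.num_train_epochs = 1
finetuning_config.learning_rate = 1e-5

# Fine-tune on the dataset and save the result (model.save is assumed here)
model.finetune(dataset=instruction_dataset)
model.save("...")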

Parameters

| Name | Type | Range | Default | Description |
|------|------|-------|---------|-------------|
| learning_rate | float | >0 | 1e-5 | The initial learning rate for the optimizer. |
| gradient_accumulation_steps | int | ≥1 | 1 | The number of update steps to accumulate gradients for before performing a backward/update pass. |
| batch_size | int | ≥1 | 1 | The batch size per device (GPU/TPU core/CPU…) used for training. |
| weight_decay | float | ≥0 | 0.00 | The weight decay to apply to all layers except bias and LayerNorm weights in the optimizer. |
| warmup_steps | int | ≥0 | 50 | The number of steps used for a linear warmup from 0 to learning_rate. |
| max_length | int | ≥1 | 512 | The maximum length when tokenizing the inputs. |
| num_train_epochs | int | ≥1 | 1 | The total number of training epochs to perform. |
| eval_steps | int | ≥1 | 5000 | The number of update steps between two evaluations. |
| save_steps | int | ≥1 | 5000 | The number of update steps between two checkpoint saves. |
| logging_steps | int | ≥1 | 10 | The number of update steps between two logs. |
| max_grad_norm | float | ≥0 | 2.0 | The maximum gradient norm (for gradient clipping). |
| save_total_limit | int | ≥1 | 4 | If a value is passed, limits the total number of checkpoints; older checkpoints in output_dir are deleted. |
| optimizer_name | string | N/A | adamw | The optimizer to be used. |
| output_dir | string | N/A | saved_model | The output directory where the model predictions and checkpoints will be written. |
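As a practical note (not part of the defaults above): batch_size and gradient_accumulation_steps trade off against each other, since the effective batch size per optimizer step is their product. If the per-device batch does not fit in GPU memory, it can be reduced while raising the accumulation steps to keep the effective batch size the same, for example:

# Keep an effective batch size of 64 (16 × 4) while using less GPU memory per step
finetuning_config.batch_size = 16
finetuning_config.gradient_accumulation_steps = 4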