# 🏋🏻‍♂️ Fine-tuning
xTuring is easy to use. The library already loads sensible default parameters for each model. For advanced usage, you can customize the `finetuning_config` attribute of the model object.

In this tutorial, we will load one of the supported models and customize its fine-tuning configuration before training it on the desired dataset.
## Load the model and the dataset
First, we need to load the model and the dataset we want to use.
```python
from xturing.models import BaseModel
from xturing.datasets import InstructionDataset

# Load the instruction dataset from its directory
instruction_dataset = InstructionDataset("...")

# Create one of the supported models by its key
model = BaseModel.create("")
```
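For illustration, a filled-in version might look like the sketch below. The dataset path and model key here are only assumptions; point `InstructionDataset` at your own dataset directory and pick a key from the list of supported models.

```python
# Hypothetical values: an Alpaca-style dataset saved locally and the LoRA variant of LLaMA
instruction_dataset = InstructionDataset("./alpaca_data")
model = BaseModel.create("llama_lora")
```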
## Load the config object
Next, we need to fetch the model's fine-tuning configuration using the command below.
```python
finetuning_config = model.finetuning_config()
```
Print the `finetuning_config` object to check the default configuration.
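For example:

```python
# Inspect the default fine-tuning configuration
print(finetuning_config)
```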
## Customize the configuration
Now, we can customize the fine-tuning configuration as we wish. All the customizable parameters are listed in the table below.
```python
# Override only the fields you want to change; the rest keep their defaults
finetuning_config.batch_size = 64
finetuning_config.num_train_epochs = 1
finetuning_config.learning_rate = 1e-5
finetuning_config.weight_decay = 0.01
finetuning_config.optimizer_name = "adamw"
finetuning_config.output_dir = "training_dir/"
```
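You can print the configuration again to verify that the overrides took effect:

```python
# Confirm the customized values before starting training
print(finetuning_config)
```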
## Start the fine-tuning
Now, we can fine-tune the model on our dataset to see how our configuration works.
```python
model.finetune(dataset=instruction_dataset)
```
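Once fine-tuning finishes, you will usually want to save the weights and try the model out. A minimal sketch, assuming the `save` and `generate` methods of `BaseModel` and a hypothetical output directory:

```python
# Save the fine-tuned weights to a local directory (path is only an example)
model.save("./finetuned_model")

# Generate text with the fine-tuned model
output = model.generate(texts=["Explain what fine-tuning is in one sentence."])
print(output)
```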
## Parameters
Name | Type | Range | Default | Description |
---|---|---|---|---|
learning_rate | float | >0 | 1e-5 | The initial learning rate for the optimizer. |
gradient_accumulation_steps | int | ≥1 | 1 | The number of update steps to accumulate gradients for before performing a backward/update pass. |
batch_size | int | ≥1 | 1 | The batch size per device (GPU/TPU core/CPU…) used for training. |
weight_decay | float | ≥0 | 0.00 | The weight decay applied in the optimizer to all layers except bias and LayerNorm weights. |
warmup_steps | int | ≥0 | 50 | The number of steps used for a linear warmup from 0 to learning_rate. |
max_length | int | ≥1 | 512 | The maximum length when tokenizing the inputs. |
num_train_epochs | int | ≥1 | 1 | The total number of training epochs to perform. |
eval_steps | int | ≥1 | 5000 | The number of update steps between two evaluations. |
save_steps | int | ≥1 | 5000 | The number of update steps between two checkpoint saves. |
logging_steps | int | ≥1 | 10 | The number of update steps between two logs. |
max_grad_norm | float | ≥0 | 2.0 | The maximum gradient norm (for gradient clipping). |
save_total_limit | int | ≥1 | 4 | If a value is passed, limits the total number of checkpoints, deleting the older checkpoints in output_dir. |
optimizer_name | string | N/A | adamw | The optimizer to be used. |
output_dir | string | N/A | saved_model | The output directory where the model predictions and checkpoints will be written. |
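Note that `batch_size` and `gradient_accumulation_steps` combine: the effective batch size seen by the optimizer is their product. The sketch below shows a memory-constrained setup; the specific values are only illustrative.

```python
# Keep an effective batch size of 64 while fitting smaller batches in GPU memory:
# 8 (per-device batch) x 8 (accumulation steps) = 64 samples per optimizer update
finetuning_config.batch_size = 8
finetuning_config.gradient_accumulation_steps = 8
```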