💽 Prepare and save dataset

We provide several types of datasets for your data. Depending on how you want to train and use your model, you can choose one of the following:

InstructionDataset

Here is how you can create this type of dataset:

From a Python dictionary with the following keys:

  • instruction : List of strings representing the instructions/tasks.
  • text : List of strings representing the input text.
  • target : List of strings representing the target text.

from xturing.datasets.instruction_dataset import InstructionDataset

# Each list holds one entry per example; entries at the same index belong together.
dataset = InstructionDataset({
    "text": ["first text", "second text"],
    "target": ["first target", "second target"],
    "instruction": ["first instruction", "second instruction"]
})
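Data often arrives as one record per example rather than as one list per key. The following is a minimal sketch of converting such records into the dictionary layout described above. The helper name records_to_columns and the sample records are hypothetical and not part of xturing; only the three key names come from this page.

```python
from typing import Dict, List

# Key names taken from the list above.
REQUIRED_KEYS = ("instruction", "text", "target")


def records_to_columns(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Turn row-style records into a dictionary with one list per key.

    Hypothetical helper for illustration; not an xturing function.
    """
    columns: Dict[str, List[str]] = {key: [] for key in REQUIRED_KEYS}
    for index, record in enumerate(records):
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise KeyError(f"record {index} is missing keys: {missing}")
        for key in REQUIRED_KEYS:
            columns[key].append(record[key])
    return columns


# Made-up sample data.
records = [
    {"instruction": "Summarize the text.", "text": "first text", "target": "first summary"},
    {"instruction": "Summarize the text.", "text": "second text", "target": "second summary"},
]
columns = records_to_columns(records)

# The result has the shape the constructor above expects:
# dataset = InstructionDataset(columns)
```

Building all three lists in a single pass keeps them the same length, so entries at the same index always describe the same example.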

TextDataset

Here is how you can create this type of dataset:

From a Python dictionary with the following keys:

  • text : List of strings representing the input text.
  • target : List of strings representing the target text.

from xturing.datasets.text_dataset import TextDataset

# Each list holds one entry per example; entries at the same index belong together.
dataset = TextDataset({
    "text": ["first text", "second text"],
    "target": ["first target", "second target"]
})
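Because the dictionary stores each key as a separate list, a dropped or extra entry in one list can go unnoticed. Below is a small sketch of a check to run before building the dataset, assuming the lists are paired index by index, which the layout above suggests but this page does not state outright. The helper check_aligned is hypothetical and not part of xturing.

```python
from typing import Dict, List


def check_aligned(data: Dict[str, List[str]]) -> int:
    """Return the number of examples, or raise if the lists differ in length.

    Hypothetical helper for illustration; not an xturing function.
    """
    lengths = {key: len(values) for key, values in data.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"lists have different lengths: {lengths}")
    # An empty dictionary counts as zero examples.
    return next(iter(lengths.values()), 0)


data = {
    "text": ["first text", "second text"],
    "target": ["first target", "second target"],
}
num_examples = check_aligned(data)

# Once the check passes, the dictionary can be handed to the constructor:
# dataset = TextDataset(data)
```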

Save a dataset

You can save a dataset to a folder using the save method:

from xturing.datasets import ...
dataset = ...

dataset.save('path/to/a/directory')
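This page does not say whether save creates a missing directory. As a precaution, the sketch below creates the target folder first using only the standard library. The helper ensure_directory and the folder names are hypothetical, and a temporary directory stands in for a real project path.

```python
import tempfile
from pathlib import Path


def ensure_directory(path: str) -> Path:
    """Create the directory (and any missing parents) and return it as a Path.

    Hypothetical helper for illustration; not an xturing function.
    """
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


# A temporary base folder is used here so the example leaves no files in your project.
base = tempfile.mkdtemp()
output_dir = ensure_directory(str(Path(base) / "datasets" / "my_dataset"))

# With the folder in place, the save call shown above can target it:
# dataset.save(str(output_dir))
```

Passing exist_ok=True makes the helper safe to call repeatedly, for example when the same script saves a dataset on every run.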