💽 Prepare and save dataset
We provide several types of datasets to use with your data. Depending on how you want to train and use your model, you can choose:
- InstructionDataset - You want the model to generate text based on an instruction/task.
- TextDataset - You want the model to complete your text.
InstructionDataset
You can create this type of dataset from either of two sources:
- Dictionary
- Folder
From a Python dictionary with the following keys:
- instruction : List of strings representing the instructions/tasks.
- text : List of strings representing the input text.
- target : List of strings representing the target text.
from xturing.datasets.instruction_dataset import InstructionDataset
dataset = InstructionDataset({
    "text": ["first text", "second text"],
    "target": ["first text", "second text"],
    "instruction": ["first instruction", "second instruction"]
})
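Data often arrives as one record per example rather than as parallel lists. The following is a minimal sketch of converting such records into the dictionary layout described above. The helper `records_to_columns` and the sample records are hypothetical and are not part of xturing.

```python
from typing import Dict, List

# Keys expected in the InstructionDataset dictionary, as listed above.
INSTRUCTION_KEYS = ("instruction", "text", "target")


def records_to_columns(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Turn per-example records into one list of strings per key.

    Hypothetical helper for illustration; not part of xturing.
    """
    columns: Dict[str, List[str]] = {key: [] for key in INSTRUCTION_KEYS}
    for index, record in enumerate(records):
        for key in INSTRUCTION_KEYS:
            if key not in record:
                raise KeyError(f"record {index} is missing the '{key}' key")
            columns[key].append(record[key])
    return columns


# Made-up sample records, one dictionary per example.
records = [
    {"instruction": "first instruction", "text": "first text", "target": "first target"},
    {"instruction": "second instruction", "text": "second text", "target": "second target"},
]
data = records_to_columns(records)

# The result has the same shape as the dictionary passed to the constructor:
# dataset = InstructionDataset(data)
```

Raising on a missing key keeps the three lists aligned, so the i-th instruction, text, and target always belong to the same example.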
From a saved location:
from xturing.datasets.instruction_dataset import InstructionDataset
dataset = InstructionDataset('path/to/saved/location')
TextDataset
You can create this type of dataset from either of two sources:
- Dictionary
- Folder
From a Python dictionary with the following keys:
- text : List of strings representing the input text.
- target : List of strings representing the target text.
from xturing.datasets.text_dataset import TextDataset
dataset = TextDataset({
    "text": ["first text", "second text"],
    "target": ["first text", "second text"]
})
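Before constructing a dataset, it can help to check that the dictionary has the expected keys and that the lists line up. The sketch below assumes the lists are meant to be aligned index by index. The helper `check_columns` is hypothetical and is not part of xturing.

```python
from typing import Dict, List, Sequence


def check_columns(data: Dict[str, List[str]], required_keys: Sequence[str]) -> int:
    """Check that every required key is present and all lists have equal length.

    Returns the number of examples. Hypothetical helper; not part of xturing.
    """
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    lengths = {key: len(data[key]) for key in required_keys}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"lists differ in length: {lengths}")
    # With no required keys there is nothing to count.
    return next(iter(lengths.values()), 0)


data = {
    "text": ["first text", "second text"],
    "target": ["first text", "second text"],
}
num_examples = check_columns(data, ["text", "target"])

# Once the check passes, the dictionary can be passed on:
# dataset = TextDataset(data)
```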
From a saved location:
from xturing.datasets.text_dataset import TextDataset
dataset = TextDataset('path/to/saved/location')
Save a dataset
You can save a dataset to a folder using the save method:
from xturing.datasets import ...
dataset = ...
dataset.save('path/to/a/directory')
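Saving and loading can be combined into a round trip: build a dataset from a dictionary, save it, then construct it again from the saved location. The sketch below uses only the constructor and `save` calls shown on this page. The helper `save_and_reload` is hypothetical, and it takes the dataset class as an argument so that it works with either InstructionDataset or TextDataset.

```python
import os
from typing import Any, Callable, Dict, List, Union

# A dataset can be built from a dictionary of lists or from a saved location.
DatasetSource = Union[Dict[str, List[str]], str]


def save_and_reload(
    dataset_cls: Callable[[DatasetSource], Any],
    data: Dict[str, List[str]],
    directory: str,
) -> Any:
    """Build a dataset from a dictionary, save it, and load it back.

    Hypothetical helper; `dataset_cls` is expected to be a class such as
    InstructionDataset or TextDataset.
    """
    # Assumption: creating the directory up front is harmless;
    # the save method may also create it on its own.
    os.makedirs(directory, exist_ok=True)
    dataset = dataset_cls(data)
    dataset.save(directory)
    # Constructing from a path loads the saved copy.
    return dataset_cls(directory)
```

A call might look like `save_and_reload(TextDataset, {"text": [...], "target": [...]}, 'path/to/a/directory')`, with the class imported as shown earlier.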