💽 Prepare and save dataset
We provide several dataset types for your data. Depending on how you want to train and use your model, choose one of:
- InstructionDataset - Use this when the model should generate text in response to an instruction/task.
- TextDataset - Use this when the model should complete your text.
 
InstructionDataset
You can create this type of dataset in two ways:
- From a dictionary
- From a saved folder
 
From a Python dictionary with the following keys:
- instruction: List of strings representing the instructions/tasks.
- text: List of strings representing the input text.
- target: List of strings representing the target text.
 
from xturing.datasets.instruction_dataset import InstructionDataset
dataset = InstructionDataset({
    "text": ["first text", "second text"],
    "target": ["first target", "second target"],
    "instruction": ["first instruction", "second instruction"]
})
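If your examples are stored as one record per example rather than as parallel lists, they need to be regrouped into the three keys above first. The following is a minimal sketch in plain Python. `records_to_instruction_dict` is a hypothetical helper, not part of xTuring, and the record field names are assumed to match the dictionary keys.

```python
def records_to_instruction_dict(records):
    """Regroup per-example records into the three-key layout described above.

    Hypothetical helper (not part of xTuring). Each record is assumed to be
    a dict with "instruction", "text" and "target" string fields.
    """
    keys = ("instruction", "text", "target")
    columns = {key: [] for key in keys}
    for index, record in enumerate(records):
        # Fail early with the offending record index if a field is absent.
        missing = [key for key in keys if key not in record]
        if missing:
            raise KeyError(f"record {index} is missing {missing}")
        for key in keys:
            columns[key].append(record[key])
    return columns


records = [
    {"instruction": "first instruction", "text": "first text", "target": "first target"},
    {"instruction": "second instruction", "text": "second text", "target": "second target"},
]
data = records_to_instruction_dict(records)
# `data` now has the same shape as the dictionary passed to InstructionDataset above.
```

Because the helper only produces a plain dictionary, it can be checked or printed before any dataset object is created.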
From a saved location:
from xturing.datasets.instruction_dataset import InstructionDataset
dataset = InstructionDataset('path/to/saved/location')
TextDataset
You can create this type of dataset in two ways:
- From a dictionary
- From a saved folder
 
From a Python dictionary with the following keys:
- text: List of strings representing the input text.
- target: List of strings representing the target text.
 
from xturing.datasets.text_dataset import TextDataset
dataset = TextDataset({
    "text": ["first text", "second text"],
    "target": ["first target", "second target"]
})
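Each position in the lists above presumably describes one example, so `text[i]` pairs with `target[i]`. A quick check before building the dataset can catch mismatched lists early. This is a sketch in plain Python: `check_columns` is a hypothetical helper, not an xTuring function, and whether xTuring performs a similar check itself is not stated here.

```python
def check_columns(data, required_keys=("text", "target")):
    """Return the number of examples, or raise if the lists are inconsistent.

    Hypothetical helper (not part of xTuring).
    """
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    # Every required list must have the same number of entries.
    lengths = {key: len(data[key]) for key in required_keys}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"lists have different lengths: {lengths}")
    return lengths[required_keys[0]]


data = {
    "text": ["first text", "second text"],
    "target": ["first target", "second target"],
}
num_examples = check_columns(data)
```

The same helper could be used for an InstructionDataset dictionary by passing `required_keys=("instruction", "text", "target")`.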
From a saved location:
from xturing.datasets.text_dataset import TextDataset
dataset = TextDataset('path/to/saved/location')
Save a dataset
You can save a dataset to a folder using the save method:
from xturing.datasets import ...
dataset = ...
dataset.save('path/to/a/directory')
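Saving pairs naturally with the "From a saved location" constructors shown earlier. The sketch below ties the two together. `save_and_reload` is a hypothetical helper, not part of xTuring, and it assumes only what this page describes: a `save` method that takes a path and a dataset class whose constructor accepts a saved location.

```python
from pathlib import Path


def save_and_reload(dataset, dataset_cls, directory):
    """Save `dataset` to `directory`, then rebuild it from that location.

    Hypothetical helper (not part of xTuring).
    """
    directory = Path(directory)
    # Assumption: creating the directory up front is harmless; the save
    # method may well create it on its own.
    directory.mkdir(parents=True, exist_ok=True)
    dataset.save(str(directory))
    return dataset_cls(str(directory))


# Usage, assuming xTuring is installed and `dataset` is an InstructionDataset:
#   from xturing.datasets.instruction_dataset import InstructionDataset
#   restored = save_and_reload(dataset, InstructionDataset, "path/to/a/directory")
```

Passing the class explicitly keeps the helper usable for both dataset types without importing either one.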