🐍 Python API
This section includes the API documentation from the Finetuner codebase, as extracted from the docstrings in the code.
BaseModel
BaseModel.load(weights_dir_or_model_name)
Load a model from your local machine or the xTuring Hub.
Parameters:
- weights_dir_or_model_name (str): Path to a local model to be loaded, or the name of a model from the xTuring Hub.
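A minimal usage sketch; the import path xturing.models, the Hub model name, and the local directory below are assumptions for illustration, not part of the documented API:

```python
# Sketch of BaseModel.load; import path and names are assumptions.
from xturing.models import BaseModel

# Load a model by name from the xTuring Hub...
model = BaseModel.load("llama_lora")

# ...or load previously saved weights from a local directory.
model = BaseModel.load("./saved_model")
```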
CausalModel
model.finetune(dataset, logger=True)
Fine-tune the in-memory model on the desired dataset.
Parameters:
- dataset (Union[TextDataset, InstructionDataset]): An object of either of the two dataset classes provided by the library. Required; an error is raised if it is not passed.
- logger (Union[Logger, Iterable[Logger], bool]): Defaults to True, which logs progress with the default logger. To customize logging, pass your own logger or an iterable of loggers.
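A hedged sketch of a fine-tuning run, reusing the assumed import paths from above; the dataset file is hypothetical:

```python
# Sketch of a fine-tuning run; paths and names are assumptions.
from pathlib import Path
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

dataset = InstructionDataset(Path("./alpaca_data.jsonl"))
model = BaseModel.load("llama_lora")

# logger defaults to True, so progress goes to the default logger.
model.finetune(dataset=dataset)
```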
model.generate(texts=None, dataset=None, batch_size=1)
Use the in-memory model to generate outputs by passing either a dataset or texts (a single string or a list of strings).
Parameters:
- texts (Optional[Union[List[str], str]]): Can be a single string or a list of strings on which you want to test your in-memory model.
- dataset (Optional[Union[TextDataset, InstructionDataset]]): The object of either of the 2 dataset classes specified in the library.
- batch_size (Optional[int]): Batch size for generation; tune it to your machine's constraints. A larger batch size allows more parallel computation and returns results faster.
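Continuing the sketch above, both calling conventions look like this (the prompt string is illustrative):

```python
# Generate from raw strings...
outputs = model.generate(texts=["What is quantum computing?"])

# ...or from a dataset, raising batch_size for more parallelism.
outputs = model.generate(dataset=dataset, batch_size=8)
print(outputs)
```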
model.evaluate(dataset, batch_size=1)
Evaluate the in-memory model.
Parameters:
- dataset (Optional[Union[TextDataset, InstructionDataset]]): The object of either of the 2 dataset classes specified in the library.
- batch_size (Optional[int]): Batch size for evaluation; tune it to your machine's constraints. A larger batch size allows more parallel computation and returns results faster.
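A one-line sketch reusing the model and dataset from the examples above; the docstring does not specify the return type, so the result is printed as-is:

```python
# Evaluate the model; a larger batch_size trades memory for speed.
result = model.evaluate(dataset=dataset, batch_size=8)
print(result)
```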
model.save(path)
Save your in-memory model.
Parameters:
- path (Union[str, Path]): The path to the directory where you want to save your in-memory model. Can be either a string or a Path object (from the pathlib module).
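Both accepted argument types, sketched with a hypothetical output directory:

```python
from pathlib import Path

# Equivalent calls: a plain string or a pathlib.Path both work.
model.save("./finetuned_model")
model.save(Path("./finetuned_model"))
```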
InstructionDataset
InstructionDataset(path)
Create an instruction dataset from a .jsonl file where each line is a JSON object with the keys text, instruction, and target.
Parameters:
- path (Path): The path to the .jsonl file. Should be an object of the Path class from the pathlib module.
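A short sketch; the import path and file name are assumptions:

```python
from pathlib import Path
from xturing.datasets import InstructionDataset  # assumed import path

# Each line of the file must be a JSON object of the form:
# {"text": "...", "instruction": "...", "target": "..."}
dataset = InstructionDataset(Path("./my_data.jsonl"))
```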
InstructionDataset.generate_dataset(path, engine, num_instructions, num_instructions_for_finetuning)
Generate a custom dataset using the given text-generation engine.
Parameters:
- path (str): The path where the generated dataset will be saved.
- engine (TextGenerationAPI): An object of one of the classes in the model_apis directory.
- num_instructions (Optional[int]): A cap on the size of the sample set to be generated. Helps you create a more diverse dataset.
- num_instructions_for_finetuning (Optional[int]): The size of the sample set to be generated. Generation consumes credits from your account, so set this number carefully.
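A hedged sketch of dataset generation. The engine class, its import path, and the API-key handling are all assumptions; substitute whichever TextGenerationAPI subclass from the model_apis directory matches your provider:

```python
# Sketch only: engine class and import paths are assumptions.
import os
from xturing.datasets import InstructionDataset  # assumed import path
from xturing.model_apis.openai import Davinci    # assumed engine class

# Generation calls a paid API, so cap the sample size deliberately.
engine = Davinci(os.environ["OPENAI_API_KEY"])
InstructionDataset.generate_dataset(
    path="./generated_dataset",
    engine=engine,
    num_instructions=500,
)
```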