🐍 Python API

This section contains the API documentation for the Finetuner codebase, extracted from the docstrings in the source code.

BaseModel

BaseModel.load(weights_dir_or_model_name)

Load a model from your local machine or from the xTuring Hub.

Parameters:

  • weights_dir_or_model_name (str): Path to a locally saved model to be loaded, or the name of a model on the xTuring Hub.
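
A minimal usage sketch; the import path and the model name "llama_lora" are assumptions for illustration, not guaranteed by this documentation:

```python
from xturing.models import BaseModel  # assumed import path

# Load by name from the xTuring Hub ("llama_lora" is an example name) ...
model = BaseModel.load("llama_lora")

# ... or from a local weights directory
model = BaseModel.load("./saved_model")
```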

CausalModel

model.finetune(dataset, logger=True)

Fine-tune the in-memory model on the desired dataset.

Parameters:

  • dataset (Union[TextDataset, InstructionDataset]): An instance of one of the two dataset classes provided by the library. Required; calling finetune without it throws an error.
  • logger (Union[Logger, Iterable[Logger], bool]): Defaults to True, which logs progress with the library's default logger. Pass one or more Logger objects of your own to customize logging.
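
A sketch of a typical fine-tuning call; the dataset import path and the "./alpaca_data" directory are assumptions for illustration:

```python
from xturing.datasets import InstructionDataset  # assumed import path

dataset = InstructionDataset("./alpaca_data")  # hypothetical local dataset directory
model.finetune(dataset=dataset)  # progress goes to the default logger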

model.generate(texts=None, dataset=None, batch_size=1)

Generate outputs with the in-memory model by passing either texts (a single string or a list of strings) or a dataset object.

Parameters:

  • texts (Optional[Union[List[str], str]]): A single string or a list of strings on which to run the in-memory model.
  • dataset (Optional[Union[TextDataset, InstructionDataset]]): An instance of one of the two dataset classes provided by the library.
  • batch_size (Optional[int]): The number of inputs processed per batch. A larger batch size means more parallel compute and faster results, within your machine's constraints.
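
A minimal generation sketch; the prompt text is illustrative:

```python
# Generate from raw strings; batch_size trades memory for speed
outputs = model.generate(texts=["What is deep learning?"], batch_size=2)
print(outputs)

# Alternatively, generate over an entire dataset object:
# outputs = model.generate(dataset=dataset)
```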

model.evaluate(dataset, batch_size=1)

Evaluate the in-memory model.

Parameters:

  • dataset (Optional[Union[TextDataset, InstructionDataset]]): An instance of one of the two dataset classes provided by the library.
  • batch_size (Optional[int]): The number of inputs processed per batch. A larger batch size means more parallel compute and faster results, within your machine's constraints.
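
A brief sketch; note that this documentation does not state what evaluate returns, so printing the result is an assumption:

```python
# Evaluate the in-memory model on a dataset; larger batches run faster
# if memory allows
result = model.evaluate(dataset=dataset, batch_size=4)
print(result)  # return type not specified by this documentation
```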

model.save(path)

Save your in-memory model.

Parameters:

  • path (Union[str, Path]): The path to the directory where the in-memory model should be saved, given either as a string or as a pathlib.Path object.
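
Both accepted path types, side by side:

```python
from pathlib import Path

model.save("./finetuned_model")        # plain string path
model.save(Path("./finetuned_model"))  # equivalent pathlib.Path
```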

InstructionDataset

dataset.from_jsonl(path)

Load an instruction dataset from a .jsonl file in which each line is a JSON object with the keys text, instruction, and target.

Parameters:

  • path (Path): The path to the .jsonl file, given as an object of the Path class from the pathlib module.
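
A loading sketch; the file name is illustrative, and treating from_jsonl as returning a dataset is an assumption:

```python
from pathlib import Path

# Each line of instructions.jsonl is a JSON object of the form:
# {"text": "...", "instruction": "...", "target": "..."}
dataset = dataset.from_jsonl(Path("./instructions.jsonl"))
```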

InstructionDataset.generate_dataset(path, engine, num_instructions, num_instructions_for_finetuning)

Generate a custom dataset with the given text-generation engine (e.g. a HuggingFace-backed engine).

Parameters:

  • path (str): a string of the path where you want to save the generated dataset.
  • engine (TextGenerationAPI): should be an object of one of the classes mentioned in the model_apis directory.
  • num_instructions (Optional[int]): A cap on the number of samples to generate; bounding the set helps you create a more diverse dataset.
  • num_instructions_for_finetuning (Optional[int]): The size of the sample set to be generated for fine-tuning. Generation uses up credits from your account, so set this number carefully.
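
A heavily hedged sketch of a generate_dataset call: the import path is an assumption, the engine construction is library-specific and omitted, and every concrete value below is illustrative:

```python
from xturing.model_apis import TextGenerationAPI  # assumed import path

# `engine` must be an instance of a TextGenerationAPI subclass from the
# model_apis directory; its construction is omitted here as it depends
# on the specific API backend you use.
engine: TextGenerationAPI = ...

InstructionDataset.generate_dataset(
    path="./generated_dataset",           # where the generated dataset is saved
    engine=engine,
    num_instructions=500,                 # cap on generated samples
    num_instructions_for_finetuning=100,  # samples used for fine-tuning; consumes API credits
)
```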