⚡️ Running model inference with FastAPI Server
After fine-tuning your model, you can serve it for inference with a FastAPI server. The following steps walk you through launching and querying the API server for your fine-tuned model.
1. Launch API server from Command Line Interface (CLI)
To initiate the API server, execute the following command in your command line interface:
```sh
$ xturing api -m "/path/to/the/model"
```
Ensure that the model path you provide is a directory containing a valid xturing.json configuration file.
2. Health check API
Request
URL : http://localhost:{PORT}/health
Method : GET
Response
```json
{
  "success": true,
  "message": "API server is running"
}
```
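As a quick sketch, the health check can be scripted from Python using only the standard library. The helper names (`health_url`, `is_healthy`, `check_health`) and the example port are illustrative assumptions, not part of xTuring itself:

```python
import json
import urllib.request

def health_url(port: int) -> str:
    # Build the health-check endpoint URL for a given port (port is an assumption;
    # substitute whatever port your server is listening on).
    return f"http://localhost:{port}/health"

def is_healthy(payload: dict) -> bool:
    # The server responds with {"success": true, ...} when it is up.
    return payload.get("success") is True

def check_health(port: int) -> bool:
    # Issue the GET request and interpret the JSON response.
    with urllib.request.urlopen(health_url(port)) as resp:
        return is_healthy(json.loads(resp.read().decode("utf-8")))
```

Calling `check_health(5000)` (or whichever port your server uses) returns `True` once the server is ready to accept inference requests.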
3. Inference API
Request
URL : http://localhost:{PORT}/api
Method : POST
Body : The request body can contain the following properties:
- prompt: Required. The prompt for text generation; it can be a single string or an array of strings.
- params: Optional. Generation parameters (e.g. sampling settings).
Here is an example for the request body:
```json
{
  "prompt": ["What is JP Morgan?"],
  "params": {
    "penalty_alpha": 0.6,
    "top_k": 1.0,
    "top_p": 0.92,
    "do_sample": false,
    "max_new_tokens": 256
  }
}
```
Response
```json
{
  "success": true,
  "response": ["JP Morgan is a multinational investment bank and financial services company headquartered in New York City."]
}
```
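A minimal inference client can be sketched in the same way. The helper names (`build_request_body`, `generate`) and the port argument are assumptions for illustration; the endpoint path and body shape follow the request format shown above:

```python
import json
import urllib.request

def build_request_body(prompt, params=None):
    # `prompt` may be a single string or a list of strings; normalize to a list,
    # matching the request body format expected by the inference API.
    body = {"prompt": [prompt] if isinstance(prompt, str) else list(prompt)}
    if params:
        body["params"] = params
    return body

def generate(port, prompt, params=None):
    # POST the JSON body to the inference endpoint and return the generations.
    req = urllib.request.Request(
        f"http://localhost:{port}/api",
        data=json.dumps(build_request_body(prompt, params)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return payload["response"]
```

For example, `generate(5000, "What is JP Morgan?", {"max_new_tokens": 256})` would return the list of generated strings from the `response` field.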
By following these steps, you can run your fine-tuned model for text generation through the FastAPI server, using structured requests and responses for inference.