⚡️ Running model inference with FastAPI Server
After fine-tuning your model, you can serve it for inference with a FastAPI server. The following steps walk you through launching and querying the API server for your fine-tuned model.
1. Launch API server from Command Line Interface (CLI)
To initiate the API server, execute the following command in your command line interface:
```sh
$ xturing api -m "/path/to/the/model"
```
Ensure that the model path you provide is a directory containing a valid xturing.json configuration file.
2. Health check API
Request
URL : http://localhost:{PORT}/health
Method : GET
Response
```json
{
  "success": true,
  "message": "API server is running"
}
```
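As a quick sketch, the health check can be scripted from Python using only the standard library. The helper names (`health_url`, `is_healthy`, `check_health`) and the example port are illustrative assumptions, not part of xTuring itself:

```python
import json
import urllib.request

def health_url(port: int) -> str:
    # Build the health-check endpoint URL for a given port (port is an assumption;
    # substitute whatever port your server is listening on).
    return f"http://localhost:{port}/health"

def is_healthy(payload: dict) -> bool:
    # The server responds with {"success": true, ...} when it is up.
    return payload.get("success") is True

def check_health(port: int) -> bool:
    # Issue the GET request and interpret the JSON response.
    with urllib.request.urlopen(health_url(port)) as resp:
        return is_healthy(json.loads(resp.read().decode("utf-8")))
```

Calling `check_health(5000)` (or whichever port your server uses) returns `True` once the server is ready to accept inference requests.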
3. Inference API
Request
URL : http://localhost:{PORT}/api
Method : POST
Body : The request body can contain the following properties:
- prompt: Required. The prompt for text generation; it can be a single string or an array of strings.
- params: Optional. Generation parameters (e.g. sampling settings).
Here is an example for the request body:
```json
{
  "prompt": ["What is JP Morgan?"],
  "params": {
    "penalty_alpha": 0.6,
    "top_k": 1.0,
    "top_p": 0.92,
    "do_sample": false,
    "max_new_tokens": 256
  }
}
```
Response
```json
{
  "success": true,
  "response": ["JP Morgan is a multinational investment bank and financial services company headquartered in New York City."]
}
```
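A minimal inference client can be sketched in the same way. The helper names (`build_request_body`, `generate`) and the port argument are assumptions for illustration; the endpoint path and body shape follow the request format shown above:

```python
import json
import urllib.request

def build_request_body(prompt, params=None):
    # `prompt` may be a single string or a list of strings; normalize to a list,
    # matching the request body format expected by the inference API.
    body = {"prompt": [prompt] if isinstance(prompt, str) else list(prompt)}
    if params:
        body["params"] = params
    return body

def generate(port, prompt, params=None):
    # POST the JSON body to the inference endpoint and return the generations.
    req = urllib.request.Request(
        f"http://localhost:{port}/api",
        data=json.dumps(build_request_body(prompt, params)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return payload["response"]
```

For example, `generate(5000, "What is JP Morgan?", {"max_new_tokens": 256})` would return the list of generated strings from the `response` field.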
By following these steps, you can run your fine-tuned model for text generation through the FastAPI server, using structured requests and responses for inference.