⚡️ Running model inference with FastAPI Server
After successfully fine-tuning your model, you can perform inference using a FastAPI server. The following steps guide you through launching and using the API server with your fine-tuned model.
1. Launch API server from Command Line Interface (CLI)
To initiate the API server, execute the following command in your command line interface:
```sh
$ xturing api -m "/path/to/the/model"
```
Ensure that the model path you provide is a directory containing a valid xturing.json configuration file.
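As a quick sanity check before launching, you can verify that the directory contains the expected configuration file. A minimal sketch (`is_valid_model_dir` is a hypothetical helper for illustration, not part of xturing):

```python
from pathlib import Path

def is_valid_model_dir(path: str) -> bool:
    """Return True if the directory holds an xturing.json config file."""
    return (Path(path) / "xturing.json").is_file()

# Example: is_valid_model_dir("/path/to/the/model")
```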
2. Health check API
- Request
  - URL: `http://localhost:{PORT}/health`
  - Method: `GET`
- Response

```json
{
  "success": true,
  "message": "API server is running"
}
```
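Once the server is up, the health check can be exercised from Python with only the standard library. A self-contained sketch: a local stub mimics the documented `/health` response here so the snippet runs on its own; in practice you would point the request at the running xturing server instead.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Stub that mimics the documented /health response (illustration only).
class _StubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps(
                {"success": True, "message": "API server is running"}
            ).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("localhost", 0), _StubHandler)  # port 0 = free port
threading.Thread(target=server.serve_forever, daemon=True).start()

port = server.server_address[1]
with urlopen(f"http://localhost:{port}/health") as resp:
    health = json.loads(resp.read())

server.shutdown()
print(health)  # {'success': True, 'message': 'API server is running'}
```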
3. Inference API
- Request
  - URL: `http://localhost:{PORT}/api`
  - Method: `POST`
  - Body: the request body can contain the following properties:
    - `prompt`: required; the prompt for text generation, either a string or an array of strings
    - `params`: optional; parameters for generation
- Example request body:

```json
{
  "prompt": ["What is JP Morgan?"],
  "params": {
    "penalty_alpha": 0.6,
    "top_k": 1,
    "top_p": 0.92,
    "do_sample": false,
    "max_new_tokens": 256
  }
}
```
 
- Response

```json
{
  "success": true,
  "response": ["JP Morgan is a multinational investment bank and financial services company headquartered in New York City."]
}
```
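Putting the pieces together, the inference request can be assembled and sent from Python. A minimal sketch using only the standard library; `build_inference_request` is a hypothetical helper, and the port in the commented call must be replaced with the one your server is listening on:

```python
import json
from urllib.request import Request, urlopen

def build_inference_request(prompts, **params):
    """Assemble the JSON body expected by the /api endpoint."""
    body = {"prompt": prompts}
    if params:
        body["params"] = params
    return body

payload = build_inference_request(
    ["What is JP Morgan?"],
    penalty_alpha=0.6,
    top_k=1,
    top_p=0.92,
    do_sample=False,
    max_new_tokens=256,
)

# Sending the request (requires the server from step 1 to be running):
# req = Request(
#     "http://localhost:{PORT}/api",  # substitute your PORT
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
#     method="POST",
# )
# with urlopen(req) as resp:
#     print(json.loads(resp.read())["response"][0])
```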
By following these steps, you can serve your fine-tuned model for text generation through the FastAPI server, with structured requests and responses.