In the world of AI, fine-tuning is one of the ways to refine LLM answers for a specific objective. At Ensolvers, the need for consistent, refined responses in the different integrations we build on top of the OpenAI API led us to explore fine-tuning as a strategic solution. While in this document we show our approach applied to OpenAI, the same idea can be applied to other AI models.
Our motivation for using fine-tuning at Ensolvers started with a recent project in which the challenge was to feed the model a set of results collected through past processes in order to improve the accuracy of its responses. In this case, the requirement was to find the similarity between two different products: given a main product and other products that might be similar to it, we have to measure the "similarity risk".
In this case, fine-tuning involves providing the model with tailored data, such as examples of the conversations it should engage in, and letting it iterate over these inputs to adapt to our desired response format. Our approach involved a continuous, iterative fine-tuning process: on a weekly basis, we submit real answers generated by the AI platform that have already been curated by users and marked as "correct". To transmit the training data, we employ a scheduled task that runs weekly. Each training piece is expressed in the JSON format shown below.
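To make this concrete, a training piece looks roughly like the following, assuming OpenAI's chat fine-tuning format, where each piece is serialized as a single line of the JSONL file that gets uploaded (the values here are placeholders):

```json
{
  "messages": [
    { "role": "user", "content": "USER_PROMPT" },
    { "role": "assistant", "content": "ASSISTANT_RESPONSE" }
  ]
}
```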
In this structure:
- USER_PROMPT is the product structure to be included in the prompt.
- ASSISTANT_RESPONSE denotes the AI's reply, which must adhere to the format below so it can be easily parsed by the system.
To understand the full picture, it's important to note that after the reply is obtained from the OpenAI API (referred to here as ASSISTANT_RESPONSE), it is parsed and sent to the backend, where it is processed to generate a valuable result for the user. It is also stored so that we can later train the model with it. The ASSISTANT_RESPONSE structure is composed of a set of well-defined pieces describing each potentially conflicting product and its associated similarity risk.
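The exact schema is specific to each integration; purely as a hypothetical illustration of the kind of structure that gets parsed (the field names below are assumptions, not our actual schema), it could look like this:

```json
{
  "mainProduct": "SKU-001",
  "similarProducts": [
    { "product": "SKU-874", "similarityRisk": "HIGH", "reason": "Same brand, category and packaging size" },
    { "product": "SKU-212", "similarityRisk": "LOW", "reason": "Same category but different brand" }
  ]
}
```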
As mentioned above, this response is processed by the app backend and sent to the frontend so the user can review it. The user is able to see all the potentially conflicting products and validate whether the similarity risk is real or not. At this point, two things can happen: if the user does not provide any input, we assume that the result provided by the AI is correct and take no further action; if the user has to correct the AI's result, we include the user's input in the training data.
All this feedback is structured and submitted to OpenAI as training data:
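The following is a minimal Java sketch of that weekly submission, assuming a Spring-style scheduled task; the class, repository and service names (FineTuningScheduler, CuratedFeedback, FeedbackRepository, OpenAIService), the cron expression, the base model and the hyperparameter values are illustrative rather than our exact implementation:

```java
import java.util.List;
import java.util.stream.Collectors;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Illustrative names: CuratedFeedback, FeedbackRepository and OpenAIService stand in
// for the real domain classes; only the pieces relevant to the submission are shown.
@Component
public class FineTuningScheduler {

    /** A user-validated (or user-corrected) AI answer, already stored by the backend. */
    public interface CuratedFeedback {
        /** Serializes the feedback as one line of the JSONL training file. */
        String toTrainingLine();
    }

    public interface FeedbackRepository {
        List<CuratedFeedback> findPendingForTraining();
    }

    private final FeedbackRepository feedbackRepository;
    private final OpenAIService openAIService;

    public FineTuningScheduler(FeedbackRepository feedbackRepository, OpenAIService openAIService) {
        this.feedbackRepository = feedbackRepository;
        this.openAIService = openAIService;
    }

    // Runs once a week and submits the feedback curated by users since the last run
    @Scheduled(cron = "0 0 3 * * MON")
    public void submitWeeklyTraining() {
        List<CuratedFeedback> feedback = feedbackRepository.findPendingForTraining();

        // One JSON object per line, following OpenAI's chat fine-tuning format
        String jsonl = feedback.stream()
                .map(CuratedFeedback::toTrainingLine)
                .collect(Collectors.joining("\n"));

        try {
            // Upload the JSONL file, then start the fine-tuning job (see the service sketch further below)
            String trainingFileId = openAIService.uploadTrainingFile(jsonl);
            openAIService.createFineTuningJob(
                    "gpt-3.5-turbo",        // base model to fine-tune
                    trainingFileId,         // ID of the uploaded training file
                    "product-similarity",   // suffix appended to the fine-tuned model name
                    3);                     // number of training epochs
        } catch (Exception e) {
            // In the real task this is logged and retried; simplified here
            throw new IllegalStateException("Weekly fine-tuning submission failed", e);
        }
    }
}
```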
This code invokes the createFineTuningJob method, which takes four parameters.
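In the sketches included here these are, for illustration, the base model to fine-tune, the ID of the uploaded JSONL training file, a suffix that identifies the resulting fine-tuned model, and the number of training epochs.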
Below you can find the methods that perform this task in the different layers involved.
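As an illustrative sketch of what the lower layer could look like when talking directly to the OpenAI REST API (the /files and /fine_tuning/jobs endpoints), the service below uploads the training file and then creates the fine-tuning job; class and method names are assumptions, and configuration and error handling are simplified compared to a production implementation split across service and client layers:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.UUID;
import java.io.IOException;
import com.fasterxml.jackson.databind.ObjectMapper;

// Service layer talking directly to the OpenAI REST API. Names are illustrative.
public class OpenAIService {

    private static final String API_BASE = "https://api.openai.com/v1";

    private final HttpClient http = HttpClient.newHttpClient();
    private final ObjectMapper mapper = new ObjectMapper();
    private final String apiKey = System.getenv("OPENAI_API_KEY");

    // Uploads the JSONL training data to the /files endpoint and returns the file ID
    public String uploadTrainingFile(String jsonl) throws IOException, InterruptedException {
        String boundary = "----training-" + UUID.randomUUID();
        String body =
                "--" + boundary + "\r\n" +
                "Content-Disposition: form-data; name=\"purpose\"\r\n\r\n" +
                "fine-tune\r\n" +
                "--" + boundary + "\r\n" +
                "Content-Disposition: form-data; name=\"file\"; filename=\"training.jsonl\"\r\n" +
                "Content-Type: application/octet-stream\r\n\r\n" +
                jsonl + "\r\n" +
                "--" + boundary + "--\r\n";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(API_BASE + "/files"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        return mapper.readTree(response.body()).get("id").asText();
    }

    // Creates the fine-tuning job from the uploaded file and returns the job ID
    public String createFineTuningJob(String model, String trainingFileId, String suffix, int epochs)
            throws IOException, InterruptedException {
        String payload = mapper.createObjectNode()
                .put("model", model)
                .put("training_file", trainingFileId)
                .put("suffix", suffix)
                .set("hyperparameters", mapper.createObjectNode().put("n_epochs", epochs))
                .toString();

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(API_BASE + "/fine_tuning/jobs"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        return mapper.readTree(response.body()).get("id").asText();
    }
}
```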
In conclusion, fine-tuning OpenAI models proved to be a pivotal strategy for addressing the challenge of inconsistent and divergent responses. By iteratively customizing the model to our specific requirements, we achieved more refined and reliable output. The weekly incorporation of real user-generated data, coupled with user assessments, ensures a continuous improvement loop. This approach not only enhanced the model's consistency but also allowed us to adapt it dynamically to evolving needs. We have also included in this article the core code pieces that make this possible on the implementation side.