
Improving AI responses via fine-tuning

February 19, 2024

Context / Introduction

In the world of AI, fine-tuning is one of the possible ways to refine LLM answers for a specific objective. At Ensolvers, the need for consistent, refined responses across the different integrations we build on top of the OpenAI API led us to explore fine-tuning as a strategic solution. While this document shows our approach applied to OpenAI, the same technique can be applied to other AI models.

Problem and Solution

The motivation for using fine-tuning at Ensolvers started with a recent project in which we faced the challenge of feeding a set of results collected through past processes back into the model to improve the accuracy of its responses. In this case, the requirement was to find the similarity between two different products. More concretely, we have a main product and other potential products that might be similar to it, and we have to measure the "similarity risk" between them.


In this case, fine-tuning involves providing the model with tailored data, such as examples of conversations it should engage in, and allowing it to iterate through these inputs to adapt to our desired response format. Our approach is a continuous, iterative fine-tuning process: on a weekly basis, we submit real answers generated by the AI platform that have already been curated by the user and marked as "correct". To transmit the training data, we employ a scheduled task that runs weekly. Each training piece follows the JSON format shown below.
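Each line of the training file is a JSON object in the chat format that OpenAI's fine-tuning API expects; the uppercase placeholders below stand for the actual content described next:

    {"messages": [
      {"role": "system", "content": "SYSTEM_MESSAGE"},
      {"role": "user", "content": "USER_PROMPT"},
      {"role": "assistant", "content": "ASSISTANT_RESPONSE"}
    ]}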

In this structure, 

  • SYSTEM_MESSAGE represents the set of instructions guiding the AI's response,
  • USER_PROMPT is the prompt sent to OpenAI. In this case, we've opted for a structured format for the potential competitors, following the standard detailed below,
  • ASSISTANT_RESPONSE represents OpenAI's reply, which must adhere to a format also detailed below, making it easy for the system to parse.

USER_PROMPT: product structure to be included in the prompt.
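As a hypothetical sketch (the field names here are assumptions, not the project's exact ones), each comparison could be laid out in the prompt like this:

    Main product:
      Name: {PRODUCT_NAME}
      Features: {PRODUCT_FEATURES}
    Potential competitor:
      Name: {COMPARED_PRODUCT_NAME}
      Features: {COMPARED_PRODUCT_FEATURES}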

ASSISTANT_RESPONSE: the AI's reply, which should adhere to the format below so it can be easily parsed by the system.
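As a hedged reconstruction, a line-oriented template built from the placeholders described next could look like this:

    Risk: {AI_RISK}
    Value: {AI_VALUE}
    Product: {AI_PRODUCTNAME}
    Compared product: {AI_PRODUCTCOMPARED}
    Matching features: {AI_MATCHINGFEATURES}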

To understand the full picture, it's important to note that after the reply is obtained from the OpenAI API (referred to here as ASSISTANT_RESPONSE), it is parsed and sent to the backend so it can be processed into a valuable result for the user. It is also stored so we can keep training the model. The ASSISTANT_RESPONSE structure is composed of the following pieces, with a parsing sketch after the list:

  • {AI_RISK}: Represents the risk of conflict (similarity) level identified by the AI, categorized as low, medium, or high.
  • {AI_VALUE}: Denotes a floating-point number ranging from 0 to 1, indicating the degree of risk with finer granularity.
  • {AI_PRODUCTNAME}: Corresponds to the product name.
  • {AI_PRODUCTCOMPARED}: The name of the product being compared.
  • {AI_MATCHINGFEATURES}: The matching features of both products.
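For illustration, a minimal TypeScript parsing sketch for the template above (assuming the labels shown there) could be:

    interface RiskAssessment {
      risk: "low" | "medium" | "high";
      value: number; // 0..1
      productName: string;
      productCompared: string;
      matchingFeatures: string;
    }

    // Returns the text that follows a "Label:" prefix on its own line.
    function field(response: string, label: string): string {
      const match = response.match(new RegExp(`^${label}:\\s*(.+)$`, "m"));
      if (!match) throw new Error(`Missing field: ${label}`);
      return match[1].trim();
    }

    function parseAssistantResponse(response: string): RiskAssessment {
      return {
        risk: field(response, "Risk") as RiskAssessment["risk"],
        value: parseFloat(field(response, "Value")),
        productName: field(response, "Product"),
        productCompared: field(response, "Compared product"),
        matchingFeatures: field(response, "Matching features"),
      };
    }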


As mentioned above, this response is processed by the app backend and sent to the frontend so the user can review it. Basically, the user is able to see all the potentially conflicting products and validate whether the risk of similarity is real or not. At this point, two things might happen. If the user does not provide any input, we assume that the result provided by the AI is correct, so we take no action. On the other hand, if the user had to correct the AI's result, we include the user's input in the training data, as sketched below.
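A hypothetical sketch of this curation step, where buildTrainingExample is an illustrative name:

    // One JSONL line in the chat format shown earlier. If the user corrected
    // the AI's reply, the corrected text becomes the assistant message;
    // otherwise the validated original reply is kept as-is.
    function buildTrainingExample(
      systemMessage: string,
      userPrompt: string,
      aiReply: string,
      userCorrection?: string
    ): string {
      return JSON.stringify({
        messages: [
          { role: "system", content: systemMessage },
          { role: "user", content: userPrompt },
          { role: "assistant", content: userCorrection ?? aiReply },
        ],
      });
    }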

All this feedback is structured and submitted to OpenAI as training data:
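A minimal TypeScript sketch of this call using the official openai Node.js client (the explicit file-upload step is an assumption):

    import fs from "fs";
    import OpenAI from "openai";

    const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

    async function createFineTuningJob(
      trainingFilePath: string,
      model: string,
      epochs: number,
      suffix: string
    ) {
      // Upload the JSONL training file generated from the curated feedback.
      const file = await openai.files.create({
        file: fs.createReadStream(trainingFilePath),
        purpose: "fine-tune",
      });

      // Start the fine-tuning job on top of the given model.
      return openai.fineTuning.jobs.create({
        training_file: file.id,
        model,
        hyperparameters: { n_epochs: epochs },
        suffix,
      });
    }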

This code invokes the createFineTuningJob method, which takes four parameters:

  •  The path of the training file we previously generated.
  •  The name of the model to be fine-tuned (in this case, the latest "risk-assessment" model).
  •  The number of epochs for training.
  •  A suffix used to differentiate the resulting model.

Below you can find the methods that perform this task in the different layers of the application:
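As a hypothetical sketch of the scheduling layer, assuming node-cron and an illustrative fetchCuratedExamples repository method:

    import fs from "fs";
    import cron from "node-cron";

    const TRAINING_FILE = "./training-data.jsonl"; // hypothetical path
    const BASE_MODEL = "LATEST_RISK_ASSESSMENT_MODEL_ID"; // placeholder for the latest fine-tuned model

    // Hypothetical repository call: returns the week's user-validated
    // training examples, one JSONL line each.
    async function fetchCuratedExamples(): Promise<string[]> {
      return []; // in practice, query the backend store for curated responses
    }

    // Every Sunday at midnight, rebuild the training file and submit
    // a new fine-tuning job on top of the latest model version.
    cron.schedule("0 0 * * 0", async () => {
      const examples = await fetchCuratedExamples();
      fs.writeFileSync(TRAINING_FILE, examples.join("\n"));
      await createFineTuningJob(TRAINING_FILE, BASE_MODEL, 3, "risk-assessment");
    });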


Conclusion

In conclusion, fine-tuning OpenAI models proved to be a pivotal strategy in addressing the challenge of inconsistent and divergent responses. By iteratively customizing the model with our specific requirements, we achieved a more refined and reliable output. The weekly incorporation of real user-generated data, coupled with user assessments, ensured a continuous improvement loop. This approach not only enhanced the model's consistency but also allowed us to adapt it dynamically to evolving needs. We included in this article the core code pieces that make this possible on the implementation side.

Agustín Ferres
Software Engineer & Solver
