Developing an API for a thematic, dynamic Q&A with an AI solution generally involves setting up or training a system that can answer questions in a particular domain, going beyond a traditional Frequently Asked Questions section. This type of solution is increasingly popular among organizations looking to improve customer support and service by providing quick, accurate responses to common questions. In this article, we show how to implement one with OpenAI's tooling.
The key to creating a Q&A solution with OpenAI's API, without training an entire model from scratch, lies in the prompt: it must be a template so it can be reused for multiple topics, and it should handle special situations, such as when the question is completely unrelated to the Q&A's topic. It also has to be relatively concise: the longer it is, the more tokens each API call consumes and the higher the cost. Examples can be included in the prompt so that the AI produces responses in the same tone.
Regarding costs, OpenAI charges for API usage in "tokens". A token is a discrete unit of text, roughly a word fragment: common words map to a single token, while longer or rarer words are split into several. Prices are quoted per 1,000 tokens, so the more tokens we send and receive, the more we get billed.
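As a rough illustration of how this billing works, the sketch below estimates a prompt's cost using a simple characters-per-token heuristic and a hypothetical price of $0.002 per 1,000 tokens (actual tokenization and prices vary by model; this is not OpenAI's tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: one token is ~4 characters of English text."""
    return max(1, len(text) // 4)

def estimate_cost(text: str, price_per_1k: float = 0.002) -> float:
    """Estimated cost in dollars for sending `text` as a prompt."""
    return estimate_tokens(text) / 1000 * price_per_1k

prompt = "Answer questions about flowers, plants and gardening in less than 50 words."
print(estimate_tokens(prompt), round(estimate_cost(prompt), 6))
```

This kind of back-of-the-envelope estimate is enough to compare prompt variants before committing to one.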
Now, let's focus on the solution. OpenAI's API offers two options for completions: the Chat API and the Completion API. The main difference is that the Chat API takes a conversation, with multiple messages back and forth. While this is useful because the model can keep context and answer follow-up questions, that context has to be resent on every call, consuming more tokens and adding complexity, since we now have to store each user's conversation. To make matters worse, if the conversation grows long enough, the call fails: the models we used accept a maximum of 4,096 tokens per request.
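To make the difference concrete, here is a sketch of the request payloads each endpoint expects (the model names are illustrative; check OpenAI's documentation for current ones). The Completion API takes a single prompt string, while the Chat API takes a list of role-tagged messages that must be resent, and keeps growing, on every call:

```python
# Completion API: one self-contained prompt string per request.
completion_request = {
    "model": "text-davinci-003",
    "prompt": "You are a Q&A bot about gardening.\nQ: How do I grow roses?\nA:",
    "max_tokens": 100,
}

# Chat API: the whole conversation travels with every request,
# so the payload (and its token count) grows as the chat continues.
chat_request = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are a Q&A bot about gardening."},
        {"role": "user", "content": "How do I grow roses?"},
        {"role": "assistant", "content": "Plant them in full sun..."},
        {"role": "user", "content": "How often should I water them?"},
    ],
}
```

For a stateless Q&A bot, the Completion-style shape is simpler: every question arrives with the full template, and nothing needs to be stored between calls.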
The following is an example of a prompt for a Q&A bot using the Completion API:
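A minimal template along these lines could look as follows (this is our own sketch, not the exact prompt we shipped); the topic, word limit, and fallback answer are parameters, so the same template works for any theme:

```python
PROMPT_TEMPLATE = """You are a Q&A bot that answers questions about {topic}.
Answer in less than {max_words} words.
If the question is not related to {topic}, answer exactly: "{fallback}"

Q: {question}
A:"""

def build_prompt(topic, question, max_words=50, fallback="I don't know about that"):
    """Fill the template for one incoming question."""
    return PROMPT_TEMPLATE.format(topic=topic, max_words=max_words,
                                  fallback=fallback, question=question)

print(build_prompt("flowers, plants and gardening", "How do I grow roses?"))
```

The resulting string is what gets sent as the `prompt` field of each Completion API request.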
This is what a conversation with this bot looks like when it's configured to answer questions about flowers, plants, and gardening in fewer than 50 words, with “I don’t know about that” as the answer to unrelated questions.
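An exchange with such a bot might look like this (the gardening answer below is illustrative, not an actual model response):

```
User: How often should I water a cactus?
Bot:  Sparingly. Let the soil dry out completely between waterings,
      roughly every two to four weeks depending on the season.
User: Who won the World Cup in 2018?
Bot:  I don't know about that.
```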
An important parameter to adjust is the temperature. The higher it is, the more randomness there is in the answers, so to get more deterministic answers we should use a low temperature, such as 0.
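Temperature works by rescaling the model's output distribution before sampling. The toy sketch below (our own illustration, not OpenAI code) shows the effect: dividing the logits by a small temperature sharpens the distribution toward the most likely token, while a high temperature flattens it. (A temperature of exactly 0 is treated by the API as greedy argmax selection, since this formula would divide by zero.)

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, rescaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.1)  # near-deterministic
hot = softmax_with_temperature(logits, 2.0)   # much flatter
print(cold[0], hot[0])
```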
Powerful as this tool has proven in our testing, users very often ask a common subset of questions, the well-known Frequently Asked Questions (FAQ). For those cases, we store concrete answers that are auto-completed as the user types, which saves the tokens for less predictable, more complex questions.
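A minimal sketch of that FAQ shortcut (the questions and answers here are made up): match the user's partial input against stored questions and, on a hit, answer locally without calling the API at all.

```python
FAQ = {
    "how often should i water my plants?": "Most houseplants need water once a week.",
    "what is the best soil for roses?": "Roses prefer loamy, well-drained soil.",
}

def faq_suggestions(partial):
    """Return stored FAQ questions that start with what the user has typed."""
    p = partial.lower().strip()
    return [q for q in FAQ if q.startswith(p)]

def answer(question):
    """Answer from the FAQ if possible; None means 'fall back to the API'."""
    return FAQ.get(question.lower().strip())

print(faq_suggestions("how often"))
```

A production version would likely use fuzzy matching rather than exact prefixes, but the principle is the same: every FAQ hit is a request whose tokens we never pay for.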
Finally, it's important to mention the monitoring period, during which automatic answers need to be reviewed alongside the originating questions to discover potential pitfalls in the process. We've been alpha-testing this solution for weeks and, so far, we haven't had to tune the prompt above.
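The monitoring step can be as simple as logging every question/answer pair and flagging the ones that hit the fallback answer for human review. A minimal sketch (the record format here is our own invention):

```python
import datetime

FALLBACK = "I don't know about that"
review_log = []

def log_answer(question, answer):
    """Record each exchange; flag fallback answers for human review."""
    review_log.append({
        "when": datetime.datetime.utcnow().isoformat(),
        "question": question,
        "answer": answer,
        "needs_review": answer.strip() == FALLBACK,
    })

log_answer("How do I prune a bonsai?", "Trim new shoots back to two leaves.")
log_answer("Who won the election?", "I don't know about that")
flagged = [r for r in review_log if r["needs_review"]]
print(len(flagged))
```

Reviewing the flagged entries periodically shows whether the fallback is firing on questions that actually belong to the topic, which is the main signal that the prompt needs tuning.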
In this article, we have discussed how to quickly implement a chatbot using simple prompts via OpenAI's API. We've covered the difference between the Chat and the Completion API and seen an example of how it all looks once implemented. Building this prototype took just a couple of days, and our experience so far has been very positive, which suggests that LLMs (Large Language Models) have huge potential to add value to businesses at a very low cost.