What actually is "AI", and how is "AI" customised to specific industry use cases? In this article, we take a deep dive into the core and advanced concepts of tuning AI for custom processes.
People often think of "AI" as just ChatGPT. Actually, ChatGPT is just the starting point. Generally, there are two goals when we talk about tuning "AI" to a custom use case or process:
1. Infuse the Large Language Model (LLM) – the engine that powers "AI" – with industry-, company- or process-specific knowledge, by either:
   - Providing relevant information in the prompt itself, or
   - Fine-tuning the knowledge into the "core" of the LLM itself
2. Experiment with (i) prompts and (ii) AI techniques to "dial up" the weightage of specific knowledge and increase the consistency of output. For example:
   - Test prompts (which are often LLM-specific) to discover which ones work best
   - Break a complex task down into smaller sub-tasks
   - Provide (dynamic) examples, depending on the current task, to guide the LLM
   - Break down the reasoning steps for the LLM to work through one by one
   - Use multiple LLMs to generate answers, then a judge LLM to select the best answer
   - Run an answer through another (also LLM-powered) reflection process to check for errors and correct them
   - See if we can reverse-engineer the question from the answer
Let's talk through each of these in turn.
1. Infusing Industry/Process-Specific Knowledge
LLMs are generally trained on publicly available data and do not "know" your industry-, company- or process-specific knowledge. That knowledge needs to be taught. This step is really about one question: how do you teach an LLM additional knowledge?
Fine-tuning
If you imagine an LLM as a locked box, one way is to unlock this box and put the new knowledge into it. This is an extremely simplified explanation, however. In reality, this process is called fine-tuning, and it involves highly complex adjustments to what we term an LLM's weights. Fine-tuning requires a lot of GPU processing power, and can take days or even weeks per iteration.
Think of it like this: an LLM has a few billion levers, and each change to one lever may affect the others. Fine-tuning is the process of changing millions of these levers and observing the effects on the rest, to the best of your ability.
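To make this concrete, here is a minimal sketch of parameter-efficient fine-tuning using the Hugging Face transformers and peft libraries. Everything here (the base model, the training file, and the hyperparameters) is an illustrative assumption rather than a prescribed recipe:

```python
# A minimal fine-tuning sketch with LoRA, which adjusts a small fraction
# of the "levers" (weights) instead of all of them. Illustrative only.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # assumption: any causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wrap the model so only the small LoRA adapter weights are trained.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# "company_knowledge.jsonl" is a hypothetical file of {"text": ...} examples.
dataset = load_dataset("json", data_files="company_knowledge.jsonl", split="train")
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=dataset,
    # The collator copies input_ids into labels for causal-LM training.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Even with LoRA keeping the trainable parameter count small, each run still takes real GPU time, which is why iteration speed is fine-tuning's main cost.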
RAG
A much simpler way to infuse knowledge is to leave the box locked, and instead include the additional information within the prompt itself. If you've ever come across the term Retrieval Augmented Generation (RAG), this is essentially what it means.
RAG is much simpler, more controllable, and faster to iterate and experiment with. The downside, however, is that the injected knowledge might not receive as much attention as the "innate" knowledge the LLM already has. This is particularly because LLMs suffer from an effect called lost-in-the-middle, where more attention is paid to the start and end of a prompt than to the middle. As a result, the longer the prompt, the worse the performance tends to be.
The success of RAG also depends on how "smart" the baseline LLM already is. An LLM with stronger reasoning capabilities can make better use of retrieved context than a weaker one. As a result, on certain tasks (like sentiment analysis), RAG does not always perform as well as fine-tuning.
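As a rough sketch, here is what a bare-bones RAG pipeline can look like in Python using the OpenAI SDK. The documents, model names, and prompt format are all illustrative assumptions:

```python
# A minimal RAG sketch: embed documents, retrieve the most relevant one,
# and place it directly in the prompt. Illustrative only.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

documents = [  # hypothetical company-specific knowledge snippets
    "Our QA process requires two reviewer sign-offs before release.",
    "Compliance checks run nightly against the audit database.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(documents)

def answer(question, top_k=1):
    # Retrieve: rank documents by cosine similarity to the question.
    q = embed([question])[0]
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = "\n".join(documents[i] for i in np.argsort(scores)[::-1][:top_k])
    # Augment and generate: the retrieved knowledge rides along in the prompt.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Use this context:\n{context}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content

print(answer("How many sign-offs does QA need?"))
```

Note that nothing about the model changes here; all the "knowledge infusion" happens in the prompt, which is exactly why RAG is so fast to iterate on.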
2(i). Prompt Engineering, or Prompt Tuning
Even if knowledge is infused, the next problem we have to think about is how to get the LLM to understand what we want.
LLMs receive instructions in the form of a prompt, which is written much like how humans talk to other humans. In reality, however, LLMs don't actually "understand" words the way humans do. As a result, instructions that are optimal for a human are not always optimal for an LLM.
It's almost like LLMs have their own lingo, and you have to speak it for best performance. What makes this even trickier is that a good prompt is LLM-specific: a prompt that works well for one LLM might not work for another. This area of AI is called Prompt Engineering, or Prompt Tuning.
While Prompt Engineering sounds simple, it can get quite complex. The gist of it is that there is generally no better way to go about it than trial-and-error. There is, of course, manual trial-and-error, but recent exciting developments in AI include using AI to run its own trial-and-error and evaluation process, and simply tell us the best prompt for a task on each LLM.
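Here is a minimal sketch of what automated trial-and-error can look like: score a few candidate prompts against a small labelled test set and keep the winner. The candidate prompts, test cases, and model name are all hypothetical:

```python
# A minimal prompt-tuning sketch: evaluate candidate prompts on labelled
# examples and pick the one with the highest accuracy. Illustrative only.
from openai import OpenAI

client = OpenAI()

candidates = [
    "Classify the sentiment of this review as positive or negative:",
    "You are a strict sentiment classifier. Answer only 'positive' or 'negative'.",
]
test_cases = [("I loved it", "positive"), ("Terrible service", "negative")]

def score(prompt):
    hits = 0
    for text, label in test_cases:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": f"{prompt}\n\n{text}"}],
        )
        hits += label in resp.choices[0].message.content.lower()
    return hits / len(test_cases)

best = max(candidates, key=score)
print("Best prompt:", best)
```

Because good prompts are LLM-specific, the same loop would need to be re-run for each model you plan to deploy on.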
2(ii). Advanced AI Techniques
If you see running a query against an LLM as a single Lego block, then advanced AI techniques are about stacking these Lego blocks in different ways to achieve a much better overall result.
One such technique is to force an LLM to break a task down into sub-steps, and then work through each step one by one. Just as with humans, this structured process results in stronger outputs.
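A minimal sketch of this decomposition pattern follows; the model name, task, and prompt wording are assumptions for illustration:

```python
# A minimal task-decomposition sketch: ask the LLM to plan sub-steps,
# then work through them one by one, carrying results forward.
from openai import OpenAI

client = OpenAI()

def ask(prompt):
    resp = client.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

task = "Draft a compliance checklist for onboarding a new supplier."

# Step 1: have the LLM break the task into sub-steps.
plan = ask(f"Break this task into 3-5 numbered sub-steps:\n{task}")

# Step 2: work through each sub-step, accumulating progress.
notes = ""
for step in [s for s in plan.splitlines() if s.strip()]:
    notes += ask(f"Task: {task}\nProgress so far:\n{notes}\nNow do: {step}") + "\n"

print(notes)
```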
Another technique is to have the LLM take its initial answer and reflect on why it might or might not be correct, before deciding on a final answer. It seems comically odd, but these human-like methods are remarkably good at improving LLM outputs.
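The reflection loop itself is only a few calls. In this sketch, the question and prompts are hypothetical stand-ins:

```python
# A minimal reflection sketch: generate a draft, critique it, then revise.
from openai import OpenAI

client = OpenAI()

def ask(prompt):
    resp = client.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

question = "Which of our invoices are exempt from GST?"  # hypothetical task
draft = ask(question)
critique = ask(f"Question: {question}\nDraft answer: {draft}\n"
               "List any errors or unsupported claims in the draft.")
final = ask(f"Question: {question}\nDraft: {draft}\nCritique: {critique}\n"
            "Write a corrected final answer.")
print(final)
```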
A third technique involves having several different LLMs each generate an answer, together with their reasoning, and then having a powerful "judge" LLM review all of them and select the best one. Academic papers report that this leads to more accurate overall results.
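Here is a minimal sketch of that generator-plus-judge pattern. The specific models and the question are illustrative assumptions; any mix of LLMs can play the two roles:

```python
# A minimal ensemble sketch: several models answer, a stronger model judges.
from openai import OpenAI

client = OpenAI()

def ask(model, prompt):
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

question = "Summarise the key risk in this audit finding: ..."  # hypothetical
generators = ["gpt-4o-mini", "gpt-3.5-turbo"]
answers = [ask(m, question) for m in generators]

numbered = "\n\n".join(f"Answer {i + 1}:\n{a}" for i, a in enumerate(answers))
verdict = ask("gpt-4o",  # a stronger model acts as the judge
              f"Question: {question}\n\n{numbered}\n\n"
              "Pick the single best answer and explain why.")
print(verdict)
```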
There are, of course, many more techniques that we don't have time to touch on. Suffice it to say, one can squeeze a lot of extra performance out of "AI" by paying close attention to the rapidly unfolding developments in the field.
Conclusion
I hope this article has been helpful for those of you curious about what goes on beyond ChatGPT. There really is so much depth to artificial intelligence, and so many possibilities yet unexplored. With OpenAI's preview of its o1 model, which is reported to reason at PhD level on certain benchmarks, we are clearly only at the start of the AI journey.
Work with us to leverage AI in your QA or compliance processes today!