Gen-AI: Think Beyond RAG

Why do we need specialized Gen-AI products & services beyond COTS LLMs and RAG dev platforms?


Gen-AI has some unique characteristics, which rarely get discussed:


1. Product: Building cool demos is easy, but products are very hard to build. Determining what is productizable at acceptable levels of accuracy is never easy. Note: the product is not just the RAG or the model; it is the system that customers deploy, which may be far more complex.


2. Change: LLM capabilities/reach change weekly. Thus, RAG is a live system & needs regular refresh.


3. Innovation: Rapid innovation means that lessons learned are often tribal knowledge rather than documented practice.


4. Use Models: The complexity scale starts with chatbots, moves to custom LLM apps, and then to multi-step compound AI apps. For true Gen-AI ROI, we find that multi-step inference combining RAG with non-LLM inference (DL, XGBoost) is necessary.
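To make the compound-app pattern concrete, here is a minimal sketch in Python. The toy retriever, the mocked call_llm() wrapper, and the sentiment/risk feature schema are all illustrative assumptions; the point is only the shape of the pipeline: retrieve context, extract structure with an LLM, and let a non-LLM model (here XGBoost) make the final call.

    # Compound pipeline sketch: RAG retrieval -> LLM feature extraction -> XGBoost.
    import json
    import numpy as np
    import xgboost as xgb

    DOCS = ["contract renewal overdue for acme", "acme invoices paid on time",
            "new vendor onboarding checklist"]

    def retrieve(query, docs, k=2):
        # Toy keyword-overlap retriever; a product would use a real vector index.
        q = set(query.lower().split())
        return sorted(docs, key=lambda d: -len(q & set(d.split())))[:k]

    def call_llm(prompt):
        # Mocked response; in reality this calls your LLM provider's API.
        return json.dumps({"sentiment": 0.2, "risk": 0.8})

    # Non-LLM inference: an XGBoost model trained on historical feature vectors.
    model = xgb.XGBClassifier(n_estimators=5).fit(
        np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]), [0, 1, 0, 1])

    def compound_predict(query):
        context = "\n".join(retrieve(query, DOCS))
        feats = json.loads(call_llm(
            f"Context:\n{context}\nExtract sentiment/risk as JSON for: {query}"))
        return model.predict_proba([[feats["sentiment"], feats["risk"]]])[0, 1]

    print(f"churn risk: {compound_predict('is acme likely to churn?'):.2f}")

Keeping the final decision in a trained, testable model is one way to make accuracy measurable in the classical sense.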


5. Cookbooks: Architectures, while helpful, must be supported by recipes & cookbooks. LLM apps are still an experimental field.


6. Custom Leaderboards: Combinations of LLM choice, fine-tuning approach, embedding model, programming vs. prompting, and information retrieval (KG, for example) are all artifacts that need to be evaluated for each industry app; these combinations form the basis of a multi-dimensional leaderboard for industry LLM apps.
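One way to build such a leaderboard is a plain configuration sweep. Everything below is a placeholder: the dimension values are generic stand-ins, and run_eval() fakes a score where a real harness would run the app's eval suite.

    # Multi-dimensional leaderboard sketch: sweep configs, score each, rank.
    from itertools import product

    dimensions = {
        "llm": ["model-A", "model-B"],
        "embedding": ["emb-small", "emb-large"],
        "retrieval": ["vector", "knowledge-graph"],
        "adaptation": ["prompting", "fine-tuning"],
    }

    def run_eval(config):
        # Stand-in score; in reality, run your eval suite (accuracy,
        # groundedness, latency, cost) under this configuration.
        return (hash(frozenset(config.items())) % 1000) / 1000

    configs = [dict(zip(dimensions, combo)) for combo in product(*dimensions.values())]
    leaderboard = sorted(((run_eval(c), c) for c in configs),
                         key=lambda t: t[0], reverse=True)
    for score, config in leaderboard[:3]:
        print(f"{score:.3f}  {config}")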


7. Data Stories: Building data stories (ingestion through prep), i.e., data engineering for LLM apps & compound LLM apps, is remarkably more complex than data engineering for reporting or even tabular ML. However, getting the data story right remains a critical enabler.
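As a tiny illustration, here is a skeleton of an ingestion-through-prep pipeline with hypothetical stages; in a real LLM-app data story, each stage (parsing PDFs/HTML, deduplication, access-control metadata) is a substantial engineering problem in itself.

    # Data-story skeleton: ingest -> normalize -> enrich, with traceable metadata.
    from dataclasses import dataclass, field

    @dataclass
    class Doc:
        source: str
        text: str
        meta: dict = field(default_factory=dict)

    def normalize(doc: Doc) -> Doc:
        # Real pipelines also fix encodings, strip boilerplate, and dedupe.
        return Doc(doc.source, " ".join(doc.text.split()), doc.meta)

    def enrich(doc: Doc) -> Doc:
        # Stand-in for enrichment: timestamps, entities, access controls.
        doc.meta["n_words"] = len(doc.text.split())
        return doc

    def prepare(raw: dict[str, str]) -> list[Doc]:
        return [enrich(normalize(Doc(src, text))) for src, text in raw.items()]

    print(prepare({"memo.txt": "Q3   revenue grew\n\n12%."}))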


8. LLM++: LLMs by themselves often do not give the results needed; knowledge graphs & specialized external tools must be intrinsically integrated. Unsurprisingly, agentic workflows heavily depend upon external tool use in most cases.
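Here is a minimal sketch of the tool-use loop such integration implies, with a mocked call_llm() standing in for a real endpoint and a toy knowledge-graph lookup as one registered tool.

    # Tool-use loop sketch: the LLM emits either a JSON tool request or a final answer.
    import json

    TOOLS = {
        "kg_lookup": lambda entity: {"ACME Corp": "supplier of XYZ parts"}.get(entity, "unknown"),
        "calculator": lambda expr: eval(expr, {"__builtins__": {}}),  # demo only; never eval untrusted input
    }

    def call_llm(messages):
        # Mocked two-step behavior; in reality this calls your LLM endpoint.
        if not any(m["role"] == "tool" for m in messages):
            return json.dumps({"tool": "kg_lookup", "arg": "ACME Corp"})
        return f"Per the knowledge graph, ACME Corp is a {messages[-1]['content']}."

    def answer(question, max_steps=5):
        messages = [{"role": "user", "content": question}]
        for _ in range(max_steps):
            reply = call_llm(messages)
            try:
                req = json.loads(reply)  # a tool request: {"tool": ..., "arg": ...}
                messages.append({"role": "tool", "content": str(TOOLS[req["tool"]](req["arg"]))})
            except (json.JSONDecodeError, KeyError, TypeError):
                return reply  # plain text means a final answer
        return "step budget exhausted"

    print(answer("Who is ACME Corp?"))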


9. Design Hurdles: Key design hurdles include scaling data handling, context infusion (tuning, prompting), optimizing the embeddings, evals, and of course getting beyond simple chatbots, search & productivity tools. Seemingly simple things like prompt engineering are brittle and need significant chains/trees of thought to get good results. Chunking and retrievers for products are full-fledged information retrieval architectures.
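Even the "simple" chunking step hides design choices (window size, overlap, structure awareness). A minimal sliding-window sketch, assuming whitespace tokenization for brevity:

    # Sliding-window chunking sketch; production retrievers add structure-aware
    # splitting, per-chunk metadata, and hybrid (lexical + vector) search.
    def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
        words = text.split()
        step = size - overlap
        return [" ".join(words[i:i + size])
                for i in range(0, max(len(words) - overlap, 1), step)]

    chunks = chunk("some long document " * 100)
    print(len(chunks), chunks[0][:40])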


10. Hallucination as a focus: Many Gen-AI efforts fail due to the inability to bring hallucinations down to acceptable levels. Several methods and approaches are being tried, from knowledge graphs to mixtures of models/experts to self-corrective agents, but this is still an art and needs serious consideration as part of any implementation plan.
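Here is a minimal sketch of the self-corrective pattern: a grader pass checks the draft against the retrieved context and triggers a rewrite. call_llm() is mocked; a real system would also log failures for human review.

    # Self-corrective loop sketch: draft -> grade against context -> rewrite.
    def call_llm(prompt):
        # Mocked responses keyed on the prompt; in reality, your LLM endpoint.
        if "supported by the context" in prompt:
            return "SUPPORTED"
        return "Revenue grew 12% in Q3."

    def grounded_answer(question, context, max_retries=2):
        draft = call_llm(f"Answer using ONLY this context:\n{context}\n\nQ: {question}")
        for _ in range(max_retries):
            verdict = call_llm(
                f"Context:\n{context}\n\nAnswer:\n{draft}\n\n"
                "Is every claim supported by the context? Reply SUPPORTED "
                "or list the unsupported claims.")
            if verdict.strip().startswith("SUPPORTED"):
                return draft
            draft = call_llm(
                f"Rewrite the answer, dropping the unsupported claims:\n{verdict}\n\n"
                f"Context:\n{context}\n\nQ: {question}")
        return draft  # best effort; flag for human review in production

    print(grounded_answer("How did Q3 go?", "Q3 revenue grew 12%."))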


11. LLMOps+TCO: Deployment and serving choices vary, from Amazon Bedrock to Google Colab Pro+, RunPod & Fireworks.ai, and even GroqCloud. Infrastructure choices like VectorDB capabilities, plus ops functionality choices in monitoring, tuning & metrics, make this a complex set of decisions. Inevitably, these choices sit on a graded scale of performance & cost.
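A back-of-the-envelope TCO comparison makes the graded scale visible. All prices below are illustrative placeholders, not vendor quotes:

    # TCO sketch: per-token API pricing vs. a fixed-cost rented GPU.
    def api_monthly_cost(tokens_per_month, usd_per_million_tokens):
        return tokens_per_month / 1e6 * usd_per_million_tokens

    def selfhost_monthly_cost(gpu_usd_per_hour, hours=730):
        return gpu_usd_per_hour * hours  # fixed cost whether or not the GPU is busy

    # Illustrative break-even: tokens/month at which self-hosting matches the API.
    break_even = selfhost_monthly_cost(2.50) / 5.00 * 1e6
    print(f"API @ 50M tok/mo: ${api_monthly_cost(50e6, 5.00):,.0f}")
    print(f"Self-host: ${selfhost_monthly_cost(2.50):,.0f}/mo; break-even ~ {break_even:,.0f} tok/mo")

Utilization is the hidden variable: a self-hosted GPU costs the same whether it serves one query or millions, so low traffic pushes the real comparison sharply toward APIs.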


12. Productization Backbone: While LLMOps platforms do exist, they do not solve the problem of building complex, industry-specific LLM apps. A Gen-AI engineering backbone on top of LLMOps is necessary for true productization, through continuous enhancements, monitoring & experimentation.


13. ROI: Medium/long-term ROI beyond some productivity gains and search 2.0 is difficult to quantify for most firms. The path around this is to think in terms of complex agent workflows, chains of LLM & traditional-AI inference, and the application of complex algorithms/math such as Hidden Markov models and non-linear optimization.
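As one example of chaining classical math after LLM inference: suppose an LLM labels noisy daily signals, and a Hidden Markov Model then recovers the most likely underlying regime via Viterbi decoding. The states, probabilities, and observations below are invented for illustration.

    # Viterbi decoding sketch over LLM-labeled observations (0 = quiet, 1 = noisy).
    import numpy as np

    states = ["calm", "volatile"]
    start = np.log([0.8, 0.2])
    trans = np.log([[0.9, 0.1], [0.3, 0.7]])   # P(next state | state)
    emit = np.log([[0.7, 0.3], [0.2, 0.8]])    # P(observation | state)

    def viterbi(obs):
        v = start + emit[:, obs[0]]
        back = []
        for o in obs[1:]:
            scores = v[:, None] + trans        # scores[i, j]: best path ending i -> j
            back.append(scores.argmax(axis=0))
            v = scores.max(axis=0) + emit[:, o]
        path = [int(v.argmax())]
        for b in reversed(back):               # trace back pointers
            path.append(int(b[path[-1]]))
        return [states[s] for s in reversed(path)]

    print(viterbi([0, 0, 1, 1, 0]))  # e.g., observations produced by an LLM labeler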


14. Agentic caution: While there’s a lot of focus on building agentic workflow-based service-as-software, there’s still a lot of progress needed before we can get reasonable & dependable levels of accuracy. We have tried this for limited use cases such as code review and data report generation, so broad applicability may have to wait.
