AI Engineering with Chip Huyen

pragmaticengineer 98o_L3jlixw Watch on YouTube Published February 04, 2025
Duration: 1:14:43
Views: 202,448
Likes: 4,937



14,927 words · Language: en · Auto-generated transcript

How would you define AI engineer, or AI engineering? Yeah, so before, when you wanted to build a machine learning application, you needed to build your own models. That means you needed your own data, and you needed expertise in how to train and babysit a model. Nowadays, if you want to build an application leveraging machine learning or AI, you can just send a direct API call and get access to this wonderful capability. That really, really lowers the entry barrier: you don't need data anymore, you don't need a fancy AI degree anymore. It's a shift away from machine learning toward more engineering and more product.

Chip Huyen is a computer scientist, writer, and author of the book AI Engineering. This book is currently the most-read title on the O'Reilly platform. Previously, Chip was a researcher at Netflix, a core developer of NeMo, NVIDIA's generative AI framework, an ML engineer at Snorkel AI, and founded and sold an AI startup called Claypot AI. She taught Machine Learning Systems Design at Stanford, and her current book is her second on ML and AI engineering. It's safe to say she's one of the most-read ML engineering and AI engineering experts in the world.

In our conversation today we cover: what is AI engineering, and why does it feel a lot more full-stack than ML engineering did; what are typical steps to build an AI application, from choosing a model through using RAG all the way to fine-tuning; what are practical ways for software engineers to get started building AI applications; and a lot more on this very timely topic. If you enjoyed the show, please subscribe to the podcast on any platform and on YouTube. Thank you, this greatly helps the show get to even more listeners and viewers.

Chip, welcome to the podcast. Hi, I'm Chip, I'm very excited to be here. I've been following your newsletter for a while, so I was really looking forward to this chat. So first of all, I really want to congratulate you on this book. I've started to read it, and
haven't read the whole thing yet; I started with some chapters and went deeper into others. When I looked at the table of contents, I thought, this looks okay in terms of breadth: it goes from how do you understand foundation models, how do you evaluate them, what is prompt engineering, and into things like RAG, fine-tuning, and dataset engineering. But then each of those sections starts to go deeper. For example, for evaluation methodology, we know it's important to evaluate AI models and we know it's harder to do, but then you go into things like AI as a judge, ranking models with comparative evaluation, and the challenges of it. That's where, in some parts, I had to slow down and look things up. It goes really deep into a lot of these sections, which I found pretty refreshing: it has a mix of breadth, but it also goes deep. This is definitely not a fast read for me, but it's one of those books I'm going to keep coming back to.

Thank you. It was not a fast write either. It took quite a while and a lot of references: I think in the book I cited over a thousand references, and I should probably read even more papers. I also went into a lot of codebases. I keep a tracking list of about a thousand repos, each with at least 800 stars on GitHub by now, so I went through a lot of those codebases, and a lot of blog posts, and other books from the '80s and '90s, to understand. I also published a list of approximately 100 reference links I found really, really useful in the process of writing the book. So if you just want to
browse those references on their own, they're on my GitHub.

Yeah, and I was surprised by some of the really original research. It's not just "here are the papers I wrote about, or here is what I read." As you mentioned, the thousand repos: you actually have a section about how GitHub repositories changed over time, which ones were about infrastructure, AI application-level work, and other things, and you have more than 900 repos mapped out. I never saw anything like that, and clearly that was you slicing and dicing and doing your own research.

Yeah, I do a lot of manual labor. I think I get a lot of value out of doing things in non-optimal ways. A lot of people focus on what is the quickest way to do something, what is the fastest way. But sometimes, if you're willing to put effort into things that a lot of people are not willing to, you can get insights that other people don't get.

This episode was brought to you by Swarmia, the engineering intelligence platform for modern software organizations. Swarmia gives everyone in your organization the visibility and tools they need to get better at getting better. Engineering leaders use Swarmia to balance the investment between different types of work, stay on top of cross-team initiatives, and automate the creation of cost capitalization reports. Engineering managers and team leads get access to a powerful combination of research-backed engineering metrics and developer experience surveys to identify and eliminate process bottlenecks. Software engineers speed up their daily workflows with Swarmia's two-way Slack notifications, working agreements, and team focus insights. You can learn more about how some of the world's best software organizations, including Miro, Docker, and Webflow, use Swarmia to build better software faster at swarmia.com/pragmatic. That is s-w-a-r-m-i-a dot com slash
pragmatic. This episode is brought to you by Graphite, the developer productivity platform that helps developers create, review, and merge smaller code changes, stay unblocked, and ship faster. Code review is a huge time sink for engineering teams: most developers spend about a day per week or more reviewing code or blocked waiting for a review. It doesn't have to be this way. Graphite brings stacked pull requests, the workflow at the heart of the best-in-class internal code review tools at companies like Meta and Google, to every software company on GitHub. Graphite also leverages high-signal, codebase-aware AI to give developers immediate, actionable feedback on their pull requests, allowing teams to cut down on review cycles. Tens of thousands of developers at top companies like Asana, Ramp, Tecton, and Vercel rely on Graphite every day. Start stacking with Graphite today for free and reduce your time to merge from days to hours. Get started at gt.dev/pragmatic. That is g for Graphite, t for technology, dot dev, slash pragmatic.

So one thing that is a little interesting about this book: it is about AI engineering, and this field moves so quickly. Just in the past week we've had a new model come out, for example DeepSeek, that people are talking about. How did you write this book? How were you able to write a book about such a fast-moving industry, so that by the time it's released, which was clearly a few months after you finished it, it would still be relevant?

That is a great question. When I started writing the book, I was thinking the same thing. I thought it wasn't the right time to write it, because there were so many things still changing. But when ChatGPT came out, like a lot of people, I had this existential crisis. I was in a group chat and everyone was like, oh no, what does this mean for us as engineers? There are two things I usually identify with: one is being an engineer, and the other is being a writer. And guess what
the two use cases AI is really good at are: writing code, and writing. So I was like, oh shoot, what does this mean? So I started interviewing a lot of people, I started reading so much, I talked to a ton of people, and I started making a lot of notes. In the process, what I realized is that while a lot of these things seem new, a lot of the fundamentals have been there for a while. For example, language modeling is not a new task: Claude Shannon introduced it back in the 1950s. Or when we talk about RAG: RAG is actually not new. It's based on retrieval-augmented generation, and retrieval is a very old technology. It's already powering a lot of use cases on the internet, like search and recommender systems. Vector databases have been around for a while, and vector search has so many cool algorithms already. So first, I realized that not a lot of things are actually new.

The second thing is that I tried to focus on asking the question: when there's a problem, and a solution to that problem, is the problem due to a fundamental limitation of AI, or is it just due to the temporary capabilities of AI? And if it's due to current capabilities, how fast is that capability changing? For example, in the early days a lot of people shared prompt tips, like trying to bribe models: hey, if you answer this correctly, I'm going to give you $200. Or we talk about prompt robustness: how robust is a model to prompt perturbations? I was reading about it, and I felt that models are actually getting more and more robust to
prompts. For example, GPT-3.5 compared to GPT-3 is already so much more robust, meaning small changes to a prompt lead to much less variation in model performance. So this kind of thing, I felt, is probably not going to stick around; those kinds of tips are not going to stay very relevant.

So already, when people were still saying, at the height of it, that there might be a job called prompt engineer, you saw this trending down. Do I understand that correctly?

Yeah. Writing is, of course, making a bet: when you write about a topic, you're betting on whether it's going to stay relevant in the future. So I've been looking at progress and trying to see what things will be like one or two years from now. Another example is context length. At one point everyone said, we want long context length, but then I kept seeing it grow really, really fast: from 8,000 tokens of context to 128,000 in a few months. Super fast. So maybe the question is less about context length and more about context efficiency: can a model actually use its context really, really well? Those are the bets, and there were certain changes during the process of writing the book that made me feel more confident in the bets I made.

Another one is multimodality. When I wrote about multimodal models back in 2023, people told me, you're too early, everyone is still working on language, we're not there yet. But to me it was just inevitable: we learned how to go far with language, but
we want to do a lot of stuff with more than just language. And nowadays it's just everywhere: almost all models now are multimodal.

So the title of the book is AI Engineering, and we also now have this term AI engineer, spreading like wildfire. How would you define AI engineer, or AI engineering? Because I feel it's a little bit of a loaded term these days.

It is. I feel like a lot of terms nowadays are loaded; you're not allowed to use a lot of words anymore. I was agonizing over the title for the book. First, I knew that we needed a different term from machine learning engineering, and the reason is that while with foundation models a lot of the fundamentals, the systematic approaches, are still the same as in machine learning engineering, there are a lot of new things. For example, before, when you wanted to build a machine learning application, you had to build the model.

To be clear, you're comparing it with machine learning, right? What machine learning engineers did versus what AI engineers are doing?

Yes, I'm trying to explain why we need a different term from machine learning engineer to describe what we are doing today. So before, when you wanted to build a machine learning application, you needed to build your own models. That means you needed your own data, and expertise in how to train and babysit a model. Nowadays, if you want to build an application leveraging machine learning or AI, you can just send a direct API call and get access to this wonderful capability. That really, really lowers the entry barrier: you don't need data anymore, you don't need a fancy AI degree anymore.

A second thing is distribution. Before, you needed distribution channels, because you deployed machine learning applications as
part of existing applications. For example, if you built a recommender system, you needed an e-commerce website, or some kind of website, so there was something to recommend on; it was deployed as part of an existing application. Fraud detection was part of, maybe, a banking application, some kind of payment app. But now you can just put a model out as a standalone application. You don't need an existing distribution channel, although having a distribution channel is still really, really useful.

Another very big thing is that it's a shift toward less machine learning and more engineering and more product. Before, as a machine learning engineer you started from data: you had to gather data, maybe with human annotations, then you trained a model, and once the model was good, you deployed it into your product. Nowadays you actually start with a demo. I have a cool idea, okay, let's just try it out and see if it works. So you start with a product, and after you see it works pretty well and want to make it better, you start gathering more data, maybe with more prompt examples, or, in very rare cases in the early days (and I don't recommend most people do it then), fine-tuning. But basically you start getting more data, maybe for evaluation, which is extremely important: having good evaluation becomes way, way more important with AI engineering. So you have data, and then maybe you've been sending a lot of API calls to OpenAI, Anthropic, or Google, and you say, okay, now it's too expensive, I need to develop my own model. So you start hosting your own model using some open-source alternative, or fine-tuning a model. So before, as a machine learning engineer, you went from data to model to
product, and now with AI engineering you go from product to data and then to model. It places a lot more focus on product and data, which become the competitive advantage when everyone shares similar baseline AI capabilities. So I did think we needed a different term, separate from machine learning engineer, and I didn't know what term to use, so I thought, okay, let's just ask people. I surveyed a bunch of people building applications on top of foundation models, and almost everyone said AI engineering, so I thought, okay, let's go with AI engineering.

So do I understand correctly that the biggest difference is that machine learning engineers did a lot more of the groundwork, getting the data and building the model, whereas with AI engineering, at least initially, you can start with APIs or something similar? There's more engineering: you hack things together, put it together, and then over time, as things become more serious or your product is bigger, you might host a model yourself, or one day even build your own model, but that comes a lot later. A lot of ML engineering comes down the path only if it's big enough, if it works, and so on, whereas before it was the other way around: you had to put in all that effort up front and then see if it even works.

So first, every company might have a different definition of the role. Even within the same company, people with the same title can do very different things, so it's never a clear-cut definition. The second thing is that I don't think the question is machine learning versus AI engineering: in the vast majority of generative AI systems I have seen, there are very strong traditional machine learning or classifier components. Imagine you're building a customer support chatbot, which is
a very, very classic gen AI application: whenever I ask at a conference, I see a lot of people raise their hands. So you get a request from a customer, and you might have several different potential solutions for it. If it's an easy query, you might send it to a cheap model; if it's a harder query, you might send it to a more expensive model; but if it's something very sensitive, like "hey, why did you charge me twice for the bill last month?", then you might want to send it to a human operator. So you might have a router, or an intent classifier, to choose where to send each request, and that's a traditional, classical machine learning model you can build. Or, after you get a response from an AI model, you might check: does this contain PII? Because you don't want to send users responses containing private information. PII detection, or toxicity detection, can be a classifier. Or in RAG systems, a lot of what we talk about uses retrieval, and retrieval systems are also, I think, in the realm of classical machine learning things you can build yourself.

So what are the most common techniques used when building AI applications? Things that a software engineer going into building AI applications should know about, and can later go deeper into.

So the assumption here is that you have tried a lot of solutions, now you're trying gen AI, you think gen AI is the solution for you, and the question is what to do from there. Is that correct?

Yeah, what are some common approaches? Things like RAG, fine-tuning, or other things that are just good to know about, topics I should probably learn more about.

Yeah, so I think
those techniques are useful, and a common pattern I have seen is a certain developmental path. Initially, the first thing I would say is to try to understand what a good response is and what a bad response is; that is what you want out of the model, and it's not always intuitive. For example, LinkedIn has a great example. They built a job-fit assessment for candidates, and they found that the majority of their time was spent just understanding what candidates needed from the model. Initially they focused on correctness, but then they realized candidates found it unhelpful. If a candidate asks the model, "am I a good fit for this job?", and the AI responds, "you're a terrible fit", the candidate is like, okay, what am I supposed to do with this information? What they want is more understanding: what are the gaps, how can they fill the gaps, or suggestions for other roles that are a better fit for them right now. So once you have that good understanding, you build a guideline: given this kind of question, answer like this, be helpful, show them the gaps, show them the roles. Very clear guidelines for the model, in the prompt. So you try those prompts, you look at the output, and then maybe you add more examples. And you try to evaluate: maybe create a set of queries and expected responses, and use both automated metrics, like AI as a judge, and human evaluations, to measure progress. So you have done prompting, you have added more examples to the prompt, and maybe then you start making the system more complex.
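The evaluation loop described here, a fixed set of queries scored against a guideline, can be sketched roughly as below. Everything in it is illustrative: in practice the judge would be an LLM call scoring responses against your guideline, while a toy keyword check stands in here so the harness is runnable.

```python
# Sketch of an evaluation harness: a fixed eval set scored by a pluggable
# "judge" function, so progress can be tracked run over run. The toy judge
# below is a stand-in for an LLM-as-judge call; names are illustrative.

from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    query: str
    response: str            # model output being evaluated
    must_mention: list[str]  # guideline: a helpful answer covers these points

def toy_judge(case: EvalCase) -> float:
    """Stand-in judge: fraction of guideline points the response covers."""
    text = case.response.lower()
    hits = sum(1 for kw in case.must_mention if kw.lower() in text)
    return hits / len(case.must_mention)

def evaluate(cases: list[EvalCase], judge: Callable[[EvalCase], float]) -> float:
    """Average judge score across the eval set."""
    return sum(judge(c) for c in cases) / len(cases)

cases = [
    EvalCase(
        query="Am I a good fit for this job?",
        response="You are missing SQL experience; closing that gap would help.",
        must_mention=["gap", "SQL"],
    ),
    EvalCase(
        query="Am I a good fit for this job?",
        response="You're a terrible fit.",  # correct, but unhelpful
        must_mention=["gap", "suggest"],
    ),
]
print(f"{evaluate(cases, toy_judge):.2f}")  # 1.0 and 0.0 average to 0.50
```

Swapping `toy_judge` for a real LLM call keeps the rest of the harness unchanged, which is the point of making the judge pluggable.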
You can give the model more context so it can answer questions better. Maybe when a user asks a question, you have the system pull out documents or job listings related to that question, or information about the company, or the candidate's resume. So you build a system where you augment the context with those documents, and that's the RAG pattern. I do think RAG is a very, very powerful pattern, and it's nothing really fancy. What's interesting about RAG is that a lot of people equate RAG with vector search. When I see people say, "I want to use RAG", the next question is, "what vector database should I use?" People jump straight to vector search.

Well, yeah, because you do have the chunks, and the embeddings can be stored as vectors, right? So as engineers, we're like, I need a vector search database. A vector database.

Yeah, people love databases. Very interesting. But I feel like the first solution is probably not to jump straight to embedding-based retrieval, because you need an embedding model, and the quality is highly dependent on the quality of the embeddings; if you have bad embeddings, you retrieve badly. Also, vector databases can be quite expensive to run, and there's latency. Another thing is that embeddings can obscure certain keywords. For example, if I'm searching for a specific error code, through embeddings you don't really get the exact error code anymore. So there are certain challenges with vector databases and vector search. I think the usual, common approach is to start with something as simple as keyword retrieval: extract all the keywords from the user's query and find all the documents that contain them.
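That "start simple" keyword retrieval can be sketched in a few lines, no embedding model or vector database required. The stopword list and documents below are made up for illustration.

```python
# Minimal keyword retrieval: extract keywords from the query and rank
# documents by how many of those keywords they contain. A hypothetical
# sketch; real systems would add stemming and a fuller stopword list.

STOPWORDS = {"the", "a", "an", "is", "do", "how", "i", "to", "for", "what"}

def keywords(text: str) -> set[str]:
    return {w for w in text.lower().split() if w not in STOPWORDS}

def retrieve(query: str, docs: dict[str, str]) -> list[str]:
    """Return doc ids ranked by query-keyword overlap, best first."""
    q = keywords(query)
    scored = [(len(q & keywords(body)), doc_id) for doc_id, body in docs.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]

docs = {
    "billing-faq": "how refunds and billing charges work for the subscription",
    "login-help": "reset your password and fix login issues",
}
print(retrieve("why was I charged twice for billing", docs))  # ['billing-faq']
```

Note that "charged" does not match "charges" here, which is exactly the kind of gap that later motivates stemming, metadata, or hybrid search.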
Then the documents might be too long, and you can't fit them into the context window; that's when you start chunking. Okay, how do I chunk these documents so they fit into the context length? And chunking has its own problems. For example, maybe a document is about company X, but early on it says "from now on, company X is referred to as 'the company'", so in the rest of the document X doesn't appear anywhere, and if you search for X you don't get those chunks. So now you might say, okay, I need to extract keywords from documents and add metadata to every chunk, or add the title of the document, or, as some people have started doing, add a summary. Anthropic has a very good article called contextual retrieval, where they ask Claude to generate the key information, the metadata, about each document and prepend it to each chunk, so you retrieve the right chunk. So I do think data preparation tends to give a really huge performance boost. I have seen it give way, way better performance boosts than focusing on which vector database to use. I'm not saying those aren't useful; it's just that in the beginning, you probably want to try the simple things with the biggest performance gains, and then start moving up the complexity levels.

Also, on retrieval: someone told me something very interesting, a little bit of a hot take. He said, "I'm not going to take any retrieval system seriously if they don't benchmark against BM25." BM25 is a pretty old-school retrieval algorithm, 20-plus years old now. It's a lot simpler, term-based retrieval, not embedding-based, and it's a really, really hard-to-beat retrieval system.
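The BM25 baseline mentioned here is compact enough to sketch in plain Python. This follows the standard Okapi BM25 scoring formula; the corpus and query are illustrative.

```python
# Okapi BM25: term-based relevance scoring with term-frequency saturation
# (k1) and document-length normalization (b). A compact sketch of the
# classic baseline; production systems use an inverted index instead.

import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    avgdl = sum(len(toks) for toks in tokenized) / n
    df = Counter()                      # document frequency per term
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores

docs = [
    "error code 1045 access denied for user",
    "general guide to database access and users",
]
scores = bm25_scores("error 1045", docs)
print(scores[0] > scores[1])  # the doc with the exact error code wins
```

This also illustrates the earlier point about error codes: an exact term match on "1045" is trivial here, while an embedding can blur it away.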
If you then want to make it more complex, you can combine both term-based retrieval and vector databases, so you get the semantic match on the embedding side and the exact keyword match on the term-based side. This hybrid search is very common.

So we've talked about prompt engineering, adding more examples, and RAG. After that, once you have maxed out on a lot of those things, which usually takes a while, you might consider fine-tuning. But I usually have a lot of reservations about fine-tuning, because fine-tuning brings a whole host of new problems you need to deal with. First, now you have this model you fine-tuned, and you need to think about how to host it. A lot of models are big, in the billions of parameters.

I actually read this part in the book. You go into the details of the problems, like the memory requirements, and you cover the alternatives that might not need as much memory but bring their own trade-offs. So it's trade-offs within trade-offs within trade-offs: you start solving one problem, you get a bunch of others, and you'll need to decide, is it worth my time, effort, and resources?

Yeah. Also, when you fine-tune a model, you own the fine-tuned model, and the question is, how do you maintain it? We have this whole world of very smart people releasing new models, and capabilities increase rapidly, so how long can your fine-tuned model outperform the new models being put out? You might spend a lot of energy and effort fine-tuning a model, and then just a week later, some, I don't know,
random Chinese company you've never heard of releases an extremely fast and strong model. So yeah, it's quite challenging. So yes, fine-tuning, I think, is the last resort, not the first line of defense.

What I've heard is basically, if I got it correctly: take a structured approach. Start with prompting, start simple, get responses that make sense, then add more data. You can do this with RAG, with chunking, with keyword extraction; data preparation really makes a big difference, which a lot of people don't think about. And then you can go to more advanced things; there's a whole host of things you could do, but my understanding is you're saying you'll probably get there over time. Initially these basics will keep you busy, and you'll probably be able to build a pretty good system just with the basics, a little bit of engineering, and, most importantly, understanding the problem you're trying to solve, as opposed to building with whatever shiny technology or approach.

Yeah, and the approach can be a bit different for individual developers versus enterprises, if you look at a whole organization. One thing I have seen is that, especially with early-day technologies, enabling new use cases usually brings more returns than incremental improvements to existing use cases. So instead of investing a lot of energy into eking out a little bit of performance with fancier, more complex techniques, maybe use the same stack you already have and open up new applications. That's why I feel a lot of companies will take a while to get to the fine-tuning phase.

Trust isn't just earned, it's demanded. Whether you're a startup founder
navigating your first audit or a seasoned security professional scaling your governance, risk, and compliance program, proving your commitment to security has never been more critical, or more complex. That's where Vanta comes in. Vanta can help you start or scale your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks, like SOC 2 and ISO 27001. With Vanta, they centralize security workflows, complete questionnaires up to five times faster, and proactively manage vendor risk. Join over 9,000 global companies who manage risk and prove security in real time. For a limited time, my listeners get $1,000 off Vanta at vanta.com/pragmatic. That is v-a-n-t-a dot com slash pragmatic, for $1,000 off.

So let's say that at my company we decide to build an AI solution, and let's take the example of customer service automation. What are the typical approaches I should know about? You cover some of these in your book as well, some of the more common steps I'll need to take.

So, customer support solutions. The first thing I would look into is: what are the bottlenecks in the current solution? For example, I have worked with a startup that had exactly this challenge: they had a lot of customer support requests and didn't know how to answer them all. Their solution was very interesting: let's try to drive a lot of questions into a common channel, a public Discord, so that all the users can also help answer the questions. And in the future, when someone has a question, they can just be referred to previous discussions instead of asking again. So they tried to
make a lot of those customer support discussions public. Another solution that was pretty popular around 2018-2019 is routing the request to the right department. The challenge there is that they realized the bottleneck was in triaging: when you first get a request, you don't know which department to send it to. So I saw a bunch of startups that said, okay, let's build a system to predict whether this should go to the finance department or to the technical support department. Just basic routing already reduces the friction a lot.

And if you then think, okay, we do need gen AI for this, I really recommend the framework that Microsoft introduced. They call it "crawl, walk, run": going from lower-stakes to higher-stakes deployment. So you don't start with a fully autonomous support chatbot. Maybe initially you have a human in the loop. For example, for every request, instead of a human agent writing the response from scratch, you can have AI suggest a few options, and the human can choose one, or use one as a starting point, make quick edits, and send it. Then once you see the accept rate getting really high, say for this category of queries the accept rate is 90%, you might feel more confident and roll it out, maybe to a smaller set of users, or roll it out for internal use cases first. You give it more automation, but you reduce the scope of the deployment. And then, once you're really happy with it, you can roll it out to more users. So yeah, that's how I would go about something like customer support.

Nice, because what I was kind of
expecting you might say is, oh, you know, just build this AI framework and deploy it. I feel that's what a lot of companies are doing, by the way. A lot of teams go, oh, gen AI, let's get this model from ChatGPT or Anthropic, let's put it out there. But I really like what you're saying, because it's not really specific to gen AI. You look at the business problem, look at the problems you have, look at the options, which include not just gen AI but more traditional machine learning, like a classifier, as you said. Then you check whether these tools help your problem, and you make sure it actually solves your problem when you roll it out. Don't just blindly roll it out. All of this sounds to me like it's not really new, is it? You could have said the same thing two or three years ago, before gen AI, except we would not have had those gen AI tools to play with.

Yeah. Actually, before our chat I looked up one of your talks. You have this talk about things that haven't changed, things that feel very similar about engineering, and dealing with a new technology is definitely one of the things that never changes. I feel like every time a new technology comes out, I can hear the collective sigh of senior engineers everywhere saying: not everything is a nail. People just try to get the technology to work for everything. So yeah, a very common challenge I see is that people jump straight into it. They want to use gen AI when they don't need gen AI. I think there are two different headlines here. One headline is "I used gen AI," and the other headline is "I solved the problem." If you want to focus on the first headline, then yeah, you use gen AI. But if you want to solve the problem, then you need to understand what the problem is: what are the challenges, what are the roadblocks, and how do you remove the roadblocks using the simplest solutions, not the fanciest ones.

Yeah, I feel there's a really strong fear of missing out across most tech companies. Everyone knows this is such a transformative technology, that it gives us so many new capabilities, that it will be important. I think everyone knows their company will be using it. But now there's a fear of missing out: what if my team doesn't build it, what if someone else gets ahead of me? So a lot of companies, many teams, are all building it, using a hammer to look for nails, even though they might not need it at the time. I'm not sure this is a bad thing necessarily, because people at least get experience with it, and they will need to learn about it. But it's a very interesting time, because usually we see a new backend framework come out, or something that's limited to a domain, and people jump on it. This is the first time I've seen the whole industry jumping on it, everyone trying to use it and put it in, whether it works or not.

I definitely agree with you on this FOMO thing. I do think everyone jumping on it is actually a pretty good thing. The energy is incredible. I've never before seen so many smart people focusing on the same problem, and the progress is amazing. I do think, however, and I think this is an irony, that the more we try not to miss out on things, the more things we will miss. Because if we try to keep up with the news, jumping from one piece of news to another, we will always stay at the surface level; we never really get to go deep into anything. So I actually don't really read the news. I find it a little bit distracting. My approach is: pick a problem that you care about, and then only care about things that help you solve this problem. If some news comes out, I ask, does this help me solve my problem? If it doesn't, it can wait, because if it's something important, it will still be important two weeks from now, a month from now. I don't drop everything to go and understand what it is. So I try to stay a bit calmer.

Yeah. So when you're building an AI system, one of the things you will come across is that you need to evaluate the output: how well it works, whether it solves your problem. Why is it difficult to evaluate AI systems, and what are common ways to do it?

Evaluation, I think, is a billion-dollar question, or even a trillion-dollar one, given how much is being invested now. It's challenging because the smarter AI becomes, the harder it is for us humans to evaluate it. Before, when AI was incoherent, you could pretty easily tell if a response was bad. It just didn't sound good. But nowadays it's pretty coherent. For example, if you ask ChatGPT for a summary of a book, and the summary sounds convincing, you actually don't know whether it's a good summary or not. You might have to read the entire book yourself just to evaluate it. Or math: I personally use AI to ask a lot of questions because I don't know the answers, and because I don't know the answers, I don't know if the answers are correct. An example: a lot of people
can tell whether a solution to a first-grade math question is correct, but very few people can tell whether a fancy proof is correct. I remember when o1 came out, Terence Tao, an amazing mathematician, I think one of the best mathematicians of our time, actually took the time to evaluate o1, and he said the experience of using o1 was similar to advising a mediocre, but not completely incompetent, graduate student. It made me think: if we really need the brightest minds of today to evaluate AI, then we will soon run out of really, really smart people to evaluate AI. So what would be the next step forward? Before, a lot of the time we used humans as the gold standard for AI performance: humans would write out, here is how you should respond, here is how you should do it, and AI would try to copy humans. But now, for many, many tasks, AI outperforms humans, way better. So I thought about several approaches to deal with this, and that's why I split the chapters. Initially I had one chapter on evaluation, but the more I wrote about it, the more I thought, shoot, there's so much, and I ended up with two pretty long chapters on evaluation. The first chapter is on general methodology, and the second is about how to use different techniques to evaluate an AI system.

One methodology is functional correctness: you evaluate the output of the application based on how well it performs the task. So if you say, hey, use AI to save energy, you can see how much energy it actually saves. Or, hey, use AI to play this video game, and you can see how high a score it can actually get.
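The task-based scoring described here can be made concrete with a small sketch. This is a hypothetical harness, not anything from the book: the add function stands in for model-generated code, and the test cases are invented for illustration. It scores a generated snippet by whether it runs at all and then by how many known test cases it passes:

```python
# Sketch of functional-correctness evaluation for generated code.
# The "generated" snippet below is a stand-in for a real model response.
generated_code = """
def add(a, b):
    return a + b
"""

# Each case: (arguments, expected output).
test_cases = [((1, 2), 3), ((0, 0), 0), ((-1, 5), 4)]

def functional_correctness(code, fn_name, cases):
    """Return the fraction of test cases the generated code passes."""
    namespace = {}
    try:
        exec(code, namespace)  # does the code even parse and run?
    except Exception:
        return 0.0
    fn = namespace.get(fn_name)
    if not callable(fn):       # did it define the function we asked for?
        return 0.0
    passed = 0
    for args, expected in cases:
        try:
            if fn(*args) == expected:  # does it produce the expected output?
                passed += 1
        except Exception:
            pass               # a crash on one case just counts as a failure
    return passed / len(cases)

print(functional_correctness(generated_code, "add", test_cases))  # prints 1.0
```

In practice you would run something like this inside a sandbox, since executing model-generated code directly is unsafe; code-generation benchmarks apply the same run-and-check loop against held-out test suites.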
Or, a very common use case for this: coding. I think it's not a coincidence that coding is one of the most popular use cases, because we actually know how to evaluate generated code. We might not know how to evaluate generated essays and such, but we know how to evaluate code, because we've been testing code for a long time. So with code you can use functional correctness. First, does this code compile, does it run? Second, does it generate the expected output?

The second approach is using AI to evaluate other AI. We have been using AI to automate a lot of applications, so can we also use AI to automate evaluation? It's actually doing pretty well. Even back in 2023, LangChain's report found that the majority of applications they looked at already had some sort of "AI as a judge" or "LLM as a judge," and I think it's growing. It's getting pretty cost-effective and useful, though of course I see a lot of challenges around using AI judges, which we can go into later.

Another approach I find very interesting is comparative evaluation. As humans, it might be hard for us to give an absolute score on something, but if you give us two versions of something, we can tell, oh, this one is better. There have been a lot of studies showing that even for tasks where AI outperforms humans, doing things at a level human experts can't really match, we can still detect the differences between two outputs. So this has been guiding not just evaluation but also model development.

Sounds like there's just no simple answer, right? You kind of need to go through all these
options and figure out which makes sense in your case, for cost, for what you can do, whether you can have a human in the loop. There's no real silver bullet, no one thing you can just use.

Yeah, I don't think there is a simple solution. That's one reason I'm a little bit skeptical about evaluation tooling: a lot of the challenges in evaluation are not because we don't know how to evaluate, but because it requires discipline and hard work, a lot of things tools can't really automate. For example, one thing about evaluation is that we need to evaluate an application based on what users want, and often we don't. That means we need to go and talk to users, we need to look at their interactions, because we have to measure what matters. I have several examples of how counterintuitive this can be: you think you're measuring one thing, but users actually care about something else. For instance, I have a friend who is building a pretty big application that basically summarizes meetings. Initially they tried to measure correctness: does the summary cover the content of the meeting? They also thought about whether the model follows the format, because they assumed users wanted shorter summaries, and they agonized over whether they wanted three-sentence or five-sentence summaries and tried to measure that. But eventually what they found out is that users don't really care about the whole content of the meeting. People only want: what is the action item for me, what do I have to do after this? So they started changing what they measured. It's not about overall coverage anymore. I mean, the summary still shouldn't make things up, but they focus on not missing the action items specific to the person asking for the summary.

Or another example with a chatbot. We talked about a customer support chatbot, so let's go back to that example. A pretty big tax firm, tax software, you can pretty much tell which company it is, launched a chatbot to help people with tax preparation, and the response was very lukewarm. They were measuring by how users used it, and people just didn't really seem to use it. They wondered why. Is it because it hallucinates, or what is the challenge? They were trying to measure all kinds of metrics. But in the end, they found people didn't use it simply because they hate typing. People don't really like typing. And also, when you're facing a domain you don't really know, and I use tax software because I don't know a lot of things about taxes, you don't even know what questions to ask.

Oh, so they didn't know what to type because they didn't understand the domain. They went to the tax software because they wanted it to take care of their taxes.

Yeah. So I think they then started trying to understand what kinds of questions people would ask and suggesting those up front, basically guiding users. It becomes education: hey, here's a question you should ask, and here's the answer, and so on. So a lot of this is just understanding your problem domain: go talk to users, look at the data. I do still think looking at data is very, very important. Greg Brockman has a great quote about it, something like: manual inspection of data is one of the activities with the highest ratio of value to prestige. Meaning people don't think highly of manually inspecting data. Let's give it to some intern while I work on something fancier, algorithms and stuff. But it's actually extremely high value, because by looking at the data you detect patterns and you understand how users use your product.

So a practice I really highly recommend to teams is: don't forget human evaluation. You can use AI as a judge, but AI as a judge has a lot of challenges. The quality of the judge depends on the underlying model and the prompt, and it's non-deterministic, so things can change over time. But if you have some human evaluation, very consistent, with very clear guidelines, where every day you go in and look at maybe 50 samples of actual interactions, or, if you have more resources, 500 or thousands, you can get a picture of how your users are using your product: any changes in behavior based on current events, maybe because of an administration change people have a lot more questions about some topic, for example. It also helps you correlate with the automated metrics. For example, if the AI judge's scores somehow start shifting compared to the human judges' scores, that might be something you need to investigate.

Yeah, so I guess, as you said, you can't really skip the hard work if you want good results, and you can't fully pull humans out of the loop, at least initially. What are some common mistakes you've seen when teams are building AI applications?

Hmm, I don't want to say it in a way that sounds like everyone is an idiot. I think we've touched on several. One very common mistake is using gen AI when you don't need gen AI. For example, there was a startup that came
to me with a pitch: we're going to use gen AI to help people optimize electricity usage. People can tell the AI, hey, here's where I live, and here are the activities I do during the day that are very energy-intensive, maybe charging a car or doing laundry, and the AI is going to tell you, hey, you should do this activity at this time and that time, so that you can minimize your electricity bill. And they said, our research shows you can save on average 30% of your electricity bill. It's free money, why would anyone not want that? So I asked them: what is the cost saving compared to just manually scheduling the most intensive activities during off-peak hours? Right, just charge the car at, I don't know, 10 p.m. or something. And they said, we haven't done that yet, but we're going to try it and let you know. They never got back to me, and they abandoned the idea later. So I feel like a lot of those optimization problems can be solved greedily, maybe even without gen AI.

At the other end of the spectrum, I see a lot of companies giving up on gen AI because they think gen AI is not good for their problem. They've tried it and it doesn't work. And a lot of the time I get surprised: wait a second, I just talked to another company that used it for a similar use case and it worked really well. When we look into it, it's usually because of a bad product: they don't prompt well, they don't understand the users, they don't even know how to evaluate well. For example, I was working with a company that does, basically, resume information extraction. Given a person's resume, they try to map out where the person worked before and create a summary of that person's career. They have two steps: first, from the resume, which is a PDF, not plain text, they extract the raw text, and then from the extracted text they extract the organizations. It worked terribly. They got the organizations wrong about 50% of the time. And I asked them: where in the process does it fail? Is it in going from the PDF to the extracted text, or from the extracted text to the organization extraction? And they said, oh, we don't know, we didn't measure that. But if you can't pinpoint it, if you can't localize where it fails, how can you fix it? A lot of the time it seems like common sense, but somehow this is something that always puzzles me a little.

Another one is chasing the shiny new thing: for example, jumping straight to vector databases, or to fine-tuning. Another common one nowadays is, when you see a fancy agent framework, saying, let's use this framework, let's try it. I think abstractions are really cool, and I'm really grateful for the many abstractions that make my life easier, but I do think an abstraction should encode best practices and should be heavily tested, and I think we're still in a phase where we're still learning the best practices. Also, a lot of abstractions can introduce unnecessary, very painful bugs. When I was going through the codebases of a lot of those frameworks, I found something interesting. A lot of those frameworks ship some default prompts to help you get started, because it makes it very easy for you to begin. But every single one of those prompts I looked at had some typos, and they keep changing: somebody submits a quick PR to fix the typos, and it's not called out in any release. So if you're using one of the default prompts from such a framework, suddenly the performance of your application changes and you don't quite know why. The prompt was changed under your feet.

Yeah, those are very interesting patterns. If I'm collecting these, it's: using this technology when you don't really need it; giving up on it for, you know, common-sense reasons, when you could have just fixed some easy things; using a new framework when it's just not really high quality and doesn't really encode the best practices. This all sounds like stuff where we could replace gen AI with any new technology or new stack and we'd probably hear similar things, right? It's typically these things, because it's changing all the time, there are no best practices, no one really knows how to use it, and whoever tells you they're the expert still has a maximum of a year of experience with it. It's not really new, is it?

Yeah, I definitely agree with you. Even though technologies change over time, systematic thinking, systematic approaches to problems, usually doesn't change. If you want to solve a problem, you first start by breaking down the problem, seeing where the challenges are, and going through different solutions. It seems like common sense, but a lot of the time FOMO gets in the way: okay, we know it's the right thing to do, but I feel like I just need to check this thing out first, you know. And you
keep doing that three times a day, and the day is gone. You never really get time to sit down and think deeply about what it is you're trying to do.

Yeah. So I guess we're going to see a lot of the kinds of mistakes that come with any new technology, and if some of the listeners have adopted new technologies before, they can probably reuse some of those approaches, just localized for gen AI, and see if they can avoid some of those mistakes. Speaking of picking up new technology: for a software engineer who wants to get into AI engineering, what would your recommendation be for learning? Things do change so fast, and you mentioned the importance of fundamentals. What would you focus on?

I have a lot of thoughts on learning, because I like learning a lot, and over time I've tried to observe some patterns. By the way, the way I learn might not be the same as the way you learn; people have different learning styles. But in general, I think of learning as having two different approaches. One is project-based learning, and the other is structured learning. Project-based learning is: you choose a project and you work on it, you go and try to solve every problem in that project, and you finish it. Structured learning is more like taking a course or reading a book: somebody lays out, here are the things you need to know. There's quite a bit of debate on this. A very good friend told me recently that he thinks the problem nowadays with people who want to become engineers is that they spend too much time learning and not enough time doing: forget the courses, forget the books, just pick a project and work on it.

And I do think project-based learning is very, very valuable. But if you think of it as, here is the set of skills and knowledge I need to become really good at something, then project-based learning can help you hit a lot of those points, but it doesn't always hit all of them, and it can sometimes get confusing, so you still need to complement it with structured learning. Another thing about project-based learning is that a lot of people do it by following tutorials: someone has already written up how to do this. I think tutorials are really cool, and I personally do them a lot, but I also notice it's very easy to just mindlessly click from one cell to another, run one cell after another, and never stop to ask: why is this done this way? Why is this library being imported? Why is this code written this way? Why is the batch size 16 instead of 64? There's no mechanism to force you to stop. You just want to run to the end, see the output, and make some changes by best guess.

Something funny: when I was working on an open source project, I wanted to do market research and see who was using this framework. The framework was Ibis, and I knew that if you want to use it, you need to do "import ibis". So I went on GitHub and searched for all the repos that have "import ibis", and I found a lot of repos where a file has "import ibis" but the codebase doesn't use Ibis anywhere else. It's not used at all. What is happening? I realized a lot of those repos had copied from a tutorial, and that tutorial had "import ibis" in it by mistake: maybe the original developer imported Ibis and then deleted the code that used it, and then everyone copied the same thing. So I feel that's something a little bit dangerous. Tutorial-based learning is great, but it's very important to be able to stop and ask questions, and sometimes structured learning can help you ask the right questions and think things through.

So before starting out, I would recommend a mixture. Yes, choose some project you want to work on. It doesn't have to be a big fancy project, just pick one. And at the same time, complement it with some structured learning: maybe a book, or doing a course with friends, or reading papers. I think reading papers is a bit interesting, because reading papers is a skill; it can be quite time-consuming, and you need to know what you want to get out of it. So: start a project, and complement it with structured learning. At the same time, there's an exercise I found very useful, at least for me initially. For a week, try to observe what you do, make notes of what you do, and think about what percentage of that could be automated by AI, what could be done by AI. And then try to use AI to do those things. It gave me a lot of ideas about use cases: just thinking about what matters to me, and whether an application can solve a problem I have, is already great.

Yeah, I think that's an unconventional way, but it's a good way to look at it, because in the end I think it might also help you get ahead of this dread of "what will AI do to me": you realize what happens when you automate things. Which actually leads to my next
question. There's a lot of fear-mongering around how AI will mean the end of software engineering, because AI is very good at coding, a lot better than it is at many other areas. What is your take? As AI gets better, will it actually end software engineering, will it change it, or will it not change it much?

Actually, I think it goes back to the question of what software engineering is. Maybe an analogy can help explain it better: writing. We tend to confuse the most salient activity of something with the thing itself. For writing, in the past, writing meant the physical act of putting words onto paper, and back then people thought of writing as that act. People actually took pride in their calligraphy: oh, you have beautiful handwriting, you must be smart, you must be intelligent. But then we got computers, and now writing doesn't refer to that act anymore. Writing refers to the process of arranging ideas into a readable format. And I think it's the same with coding. People think of software engineering as the physical act of putting code into VS Code or Vim or whatever software you use, but that's not what software engineering is. Software engineering is about solving problems: here's a problem, how do I come up with executable programs to solve it? Coding itself is just the physical act. And yes, maybe AI can automate coding, but I don't think it's going to fully automate problem-solving, because you still need to know what the problem is, and only you can understand the problems you're facing.

Well, and also, the reason the job of software engineer, or programmer, exists is that it is very hard to be specific, to speak the computer's language. If you move that if statement somewhere else, or if you change a variable, suddenly the program crashes because now you have a stack overflow exception, which you of course understand if you're a software engineer. But a business user just says, "when you resize the window, I want the button to move over." It's easy to say, but as a software engineer you know the edge cases, you know the environment, you know you need to worry about system events, and so on, and you write code for all of those things. I'm sure we'll get to a point where AI will be able to generate some of that, but it might not cover everything, and at some point you will still need someone who understands that code and can figure out where the gap is. English as a language is not as precise as a programming language. Programming languages were invented to be precise and unambiguous. You can go from assembly code to machine code because it's roughly a one-to-one mapping, but going from English to a programming language is very fuzzy. So I think the profession will not go away. For more hobbyist use cases it might work: you say something, you get roughly what you asked for, you try multiple times, it generates something, and you're happy. But for business or professional use cases, you will need people who can guarantee that you get exactly what you want.

Yeah, I'm actually really excited about AI automating part of coding, because it will enable software engineers to build much more complex software. Going back to the analogy of writing: before, when people had to manually copy words onto paper,
the books back thing go very small like I think it's like like 5,000 words or 10,000 words considered like big because like it took a long time for people to like copying things uh but but now like we have books like 100 of thousand words right and I think it just make things a lot easier and do things like with s engineering like if you don't have to manual right Cod it quickly like turn ideas into like Snippets ex executable programs I do think enable like us to WR much much more complex software yeah and maybe one software engineer will be able to kind of command a lot more software debug or or maintain a a lot more complex system by uh by oneself CU right now you know there's a reason that you know for a million lines of code usually there's like several Engineers it's it's it's rare to have just one engineer if a company brought that I'm not talking about dependencies so that that would be interesting so what what other use cases are you excited about for that AI could bring outside of just coding um um let's see I'm doing I think I'm excited about education um so so I do things as like AI can help people learn we want to learn uh a lot faster um so so I think one thing I realized is that like nowadays if you know the answers if you know the questions finding the answer is actually quite easy like you can ask like Ai and it's it's usually give you like pretty a lot of or like at least it can give you a lot of like references for you to like go and read more about it um but then like what's still hard is like how how you come up with the right questions and I think it's like education needs to like focus on like forcing students like create the habit of like asking questions and understanding like so so I do things this like is help learning to become uh a lot more affection like people can learn a lot of things um yeah and I feel like I I do I do believe that if if we are can learn better and faster then we can actually do more things so so I'm very excited 
about what that would look like. What other use cases am I excited about? Entertainment. Sometimes we think of entertainment and education as separate things, but I don't see why we can't have games that help us learn, like strategy games that teach you about negotiation. That could be really fun. Or just more intellectually stimulating content: movies and shows don't have to be mindless. People watch them to escape, but I also like content that makes me think a little, or helps me understand different fields. So I do think AI can assist us in creating content that is both entertaining and intellectually stimulating, which could be a lot of fun. One simple example: we already have a lot of medium adaptations. You have a book and convert it into a movie; sometimes you have a movie and convert it into a game; or you have a paper and convert it into a podcast. If AI can help us adapt content across different mediums, that could be very exciting. There are a lot of small problems like that I'm interested in.

I haven't touched on the enterprise side at all, and I feel like that's still where most of the money is. I do think enterprise and company organizational structures are going to change. Think about what the job of middle management is in a lot of organizations: first, to aggregate information from their reports and transmit it up to the executives, and in the other direction, to transmit directions from the executives down to the lower layers. But information aggregation is something AI can do really, really well, so I think companies can become a lot more efficient.

So let's close with some rapid questions. I'll just shoot some questions and you tell me what pops into your mind. What programming language did you use most when you built AI applications or did ML engineering, and why?

Python and JavaScript.

JavaScript as well? How come?

Oh yeah, definitely. A huge part of building products is being able to build demos quickly, so that's very handy. I'm not very good at it, I've always been scared of JavaScript, but I'm very grateful that AI has actually made getting started a lot easier nowadays.

And which is your favorite LLM right now, and why?

I don't really have a favorite; I use different models for different things. I use ChatGPT out of habit, because I have a bunch of prompts that I already reuse. I use Claude sometimes for creative writing, because I think it's sometimes less cliche. I'm reading about DeepSeek R1, and who isn't reading about R1, so I'm trying it out. I don't think I have a favorite. I did use some of the Llama models, like LLaVA, the vision version of Llama, for some interesting use cases like screenshot-to-code, just testing things out and having fun with it. But I'm not emotionally attached to any of them.

And what's a neat AI tool that you've used and that you like?

Something I built myself that really helps me do research. When I see a link to a paper, and there are a lot of links to get through, I realized that when I read a link I go through the same process
every time: I read the abstract, I look up the authors and a little of their work, I ask questions, I check when it came out, I check the citations. So I built a little tool that goes and scans all of that: I give it a link and it gives me all that information.

So, a tool to scratch your own itch.

Yes. I think that's the beauty of AI now: you can build a tool just for yourself, and it takes a very small amount of time. Before it would have taken me weeks, but now I can just do it. That's something to be excited about.

I agree with that. And what are one or two books that you've read and would recommend?

Oh, I recommend a lot of books, but I feel like recommending books is forcing people to do what you enjoy. I like books that give me a new perspective, or insight into topics I don't know much about. I really like a book on complex adaptive systems; it's a very interesting book about systems thinking, about how to design social dynamics so that people work toward the goals you want. It forces you to think about systems. I like The Selfish Gene, because it helps you understand more about free will and makes you question things. It has the idea that you can live on either through genes or through ideas: genes live on through your offspring and reproduction, while ideas can also replicate, which is where the word "meme" comes from, memes like genes. I like Antifragile; I think the author is a very interesting character. I've read his books and I actually really like them. I think there are a lot of books that I like.

Thank you for the recommendations, and thank you for being on the podcast. AI engineering is such a new field, and it was great to hear from someone who has clearly gone very broad and also very deep, and who has been in this field since before it was even called AI engineering. So thank you for this.

Thank you so much for letting me ramble on the show. I really appreciate it. One thing I really enjoy about writing and talking is that I get feedback. When somebody says "I agree with you," that's great to hear, but I really like a little pushback too, even if it's not great for the ego: "you didn't think about this," "you forgot this," "you didn't take this into account." I would love to get that kind of feedback, so if there's anything you feel I missed, do let me know. I'd really appreciate it.

Thank you to Chip for this conversation about AI engineering. To get in touch with Chip, including to give feedback on her book, you can find her contact details on her website, linked in the show notes below. I've found her book AI Engineering to be a broad and deep overview of this important field. It's a book that focuses on the fundamentals that are unlikely to change, so if you want to learn those, it's a good book to have. On The Pragmatic Engineer we've previously done several AI- and ML-related deep dives; check them out, also linked in the show notes below. If you enjoyed this podcast, please do subscribe on your favorite podcast platform and on YouTube. Thanks, and see you in the next one.

Summary

Chip Huyen discusses the evolution of AI engineering, emphasizing the shift from building models to engineering AI applications using APIs and frameworks, and highlights key techniques like RAG, fine-tuning, and evaluation strategies for building effective AI systems.

Key Points

  • AI engineering is a shift from traditional machine learning, where engineers build models from scratch, to leveraging pre-trained foundation models via APIs.
  • The field emphasizes engineering and product design over data collection and model training, with a focus on solving real-world problems.
  • Common techniques for building AI applications include prompt engineering, RAG (Retrieval-Augmented Generation), fine-tuning, and evaluation.
  • Evaluation is a critical challenge, with approaches like functional correctness, AI-as-a-judge, and comparative evaluation being used.
  • RAG is a powerful pattern, but simple keyword retrieval can be more effective than complex vector databases in early stages.
  • Fine-tuning is a last resort due to complexity and maintenance challenges, with hybrid approaches often being better.
  • The best approach is to start simple, understand the problem deeply, and avoid over-engineering with new frameworks.
  • AI engineering requires a mix of project-based learning and structured learning to build practical skills.
  • AI will not replace software engineering but will augment it, allowing engineers to build more complex systems faster.
  • Key tools for learning include Python, JavaScript, and LLMs like GPT, Claude, and Llama.
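To make the evaluation approaches in the key points above concrete, here is a minimal Python sketch of functional correctness (checking generated code against test cases) and comparative evaluation (ranking models by pairwise wins). The function names and the toy data are illustrative assumptions, not anything from the episode:

```python
from collections import Counter

def functional_correctness(candidate_fn, test_cases):
    """Fraction of (args, expected) pairs the candidate function gets right."""
    passed = sum(1 for args, expected in test_cases
                 if candidate_fn(*args) == expected)
    return passed / len(test_cases)

def comparative_ranking(pairwise_wins):
    """Rank models by win counts from pairwise comparisons.
    Models that never win are omitted from the ranking."""
    wins = Counter(winner for winner, _loser in pairwise_wins)
    return [model for model, _count in wins.most_common()]

# Toy example: evaluate a hypothetical "generated" function.
generated_add = lambda a, b: a + b
score = functional_correctness(generated_add, [((1, 2), 3), ((0, 0), 0)])  # 1.0
ranking = comparative_ranking([("A", "B"), ("A", "C"), ("B", "C")])  # ["A", "B"]
```

In practice the pairwise wins would come from human raters or an AI judge comparing two models' answers to the same prompt.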

Key Takeaways

  • Start building AI applications by first understanding the problem and user needs, then use simple techniques like prompt engineering before moving to complex solutions.
  • Prioritize evaluation by using a mix of automated metrics and human judgments to ensure your AI system is effective.
  • Use RAG for information retrieval, but begin with simple keyword-based search before investing in complex vector databases.
  • Avoid jumping to fine-tuning; instead, focus on data quality, prompt engineering, and system design to improve performance.
  • Leverage AI tools to automate repetitive tasks, but remember that problem-solving and system design remain core engineering skills.
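The third takeaway, starting with keyword search before reaching for a vector database, can be sketched in a few lines of plain Python. The scoring (count of shared terms) and the toy documents are illustrative, not from the episode:

```python
def tokenize(text):
    """Lowercased bag of words; a real system would also strip punctuation."""
    return set(text.lower().split())

def keyword_retrieve(query, documents, k=2):
    """Rank documents by how many terms they share with the query.
    A crude first pass that is often good enough before adding embeddings."""
    query_terms = tokenize(query)
    ranked = sorted(documents,
                    key=lambda doc: len(query_terms & tokenize(doc)),
                    reverse=True)
    return ranked[:k]

docs = [
    "Fine-tuning is a last resort for model quality.",
    "RAG retrieves documents to ground model answers.",
    "Prompt engineering shapes model behavior cheaply.",
]
top = keyword_retrieve("how does RAG retrieve documents", docs, k=1)
```

A natural next step, still short of a vector database, would be TF-IDF or BM25 scoring via an off-the-shelf search library.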

Primary Category

AI Engineering

Secondary Categories

Machine Learning, LLMs & Language Models, AI Tools & Frameworks

Topics

AI engineering, machine learning engineering, RAG, fine-tuning, evaluation, prompt engineering, AI applications, customer support, project-based learning, structured learning

Entities

people
Chip Huyen, Gergely Orosz
organizations
Netflix, NVIDIA, Claypot AI, Stanford University, Swarmia, Graphite, Vanta
products
AI Engineering (book), NeMo, RAG, Vanta
technologies
Foundation models, APIs, Vector databases, Retrieval-Augmented Generation, Prompt engineering, Fine-tuning, LLMs, AI agents

Sentiment

0.80 (Positive)

Content Type

interview

Difficulty

intermediate

Tone

educational, professional, inspiring, casual