Episode Transcript
[00:00:00] Speaker A: There's no question AI is changing everything about how we live and work, and if you fast forward five or ten years into the future, our lives are going to look really different than they do today. In this episode, I'm chatting with Stuart Gray about how he's using AI in his work life, in his startup, and in his personal life, and how he's thinking about teaching his students and his customers to use AI. I like this conversation with Stuart a lot because it's not just one perspective. As I mentioned, he has several different kinds of stakeholders and perspectives through which he's exposed to AI and has to consider it, so he has some really interesting insights into how we all can think about using AI, and then really specifically how he's using it: how, as he uses different platforms, models, and tools, he's able to, as he said, make parts of all of his life a lot better. It's not replacing anything wholesale today, but it's replacing the most tedious, boring, lowest value-add parts of, in Stuart's case, a lot of his work life really quickly and really effectively. So I think this is a really good first example in our series on AI of how Stuart is using this across a couple of different areas of his personal and work life, and we extract and abstract that into ways that you can think about applying AI technology to your life as well. I hope you enjoy this conversation with Stuart.
Okay, so Stuart, we're talking about you using AI in, like, two parts of your world, right? Like, day job at the university, and then kind of running the business and startup. Let's maybe start on the university side. So the teaching with AI is super interesting. What does that look like for you now? What kind of tech are you using, and what use cases are you having success with in adopting AI for your students?
[00:01:58] Speaker B: Yeah, so I teach AI to engineers, mechanical engineers and aerospace engineers, and these are very technical young people. But I'm trying to teach them what tools are out there and what the possibilities are, because everyone's got to accept that AI will be a major part of their future working life. There are certain people, certain colleagues, who don't agree with that, and I think they're wrong. So you've got to say, okay, we're teaching people to work in industry, and AI will be a part of that. What I'm actually trying to get across when I'm teaching these things is the critical thinking aspect. It's not about how to use it or prompt engineering or anything like that, because that changes every six months, that changes all the time. It's more: look at what you're doing, think about it, try to understand, and be able to explore the edges of these tools. Push them and see what they're good at, what they're not good at, and then think about it. I actually get them to build chatbots using ChatGPT by making a big prompt, and the chatbot is meant to teach them, or their colleagues, their peers, a given topic, things like ethics, sustainability, non-technical stuff. And these LLMs, maybe unsurprisingly, are not very good at teaching ethics out of the box, for a number of reasons, but you can push them and tweak them, and it's that sort of aspect. I also work with a lot of colleagues with different levels of experience, and there's a lot of that hesitation coming in: okay, what can it do? Is it just for generating essays? When it's really not. It's giving them the confidence to say, look, you can't break it, just ask it for stuff, see what it can do. We talk about the broad areas where it's good, like text transformation, especially on the business side, that's a key use. If you've got something in one format and you want another format, bang, there you go, hours saved, frankly. And then think about ideation, creation, riffing. Again, it's going to give you lots of rubbish, but it's food for thought, and if you're an active participant in that conversation, then you can absolutely do it. And I think from the university side, there's a lot going on.
Universities are up in arms over AI. People don't know what's going to happen, and they think it's going to replace a lot of things and make certain parts of education pointless. But I really don't agree, because it's going to be part of the fabric and everyone's going to use it. Our job as educators is to say, okay, use it correctly, use it for what it's good at. Don't use one tool for everything. Don't use it to write everything, don't use it to write that really important letter to your boss asking for a raise or whatever. Think about where to use it.
[00:04:40] Speaker A: I got one of those a few months ago. It was so obvious. I was like, are you serious?
[00:04:43] Speaker B: Yeah, yeah, yeah. And there are also good things coming from this. It's being used really widely, and when I'm teaching, I don't prescribe, okay, use Claude or use ChatGPT-4 or whatever. It's use whatever's available, because there's Copilot available to everyone, and some students experiment. But what we've seen, and I supervise master's students and projects and see a lot of these dissertation-type reports, is that the quality of English has gone sky high compared to what it was. You used to get very good technical engineering master's-level dissertations where the English wasn't great, but now the floor has been raised, and I think that's an example of the sort of thing that's going to happen. It's not going to make everyone superstars, but it's going to raise the floor. And that's good in a lot of ways. It also poses challenges: if you're producing mediocre work, it means someone else can produce that mediocre work very quickly. But I do see a lot of benefits for it too.
[00:05:39] Speaker A: Yeah, yeah. Well, one of the things that I'm interested in, maybe we can kind of chat through, is that there are one-off things you might use it for. Like yesterday: take this XML and pull all of these items out of it. That's one-time. The other one is projects, right? I want to do this thing, and I want to do it over and over again. And then, I mean, not even between Claude and ChatGPT, but within ChatGPT you have projects, which are like collections of chats, I would say, with some kind of custom instructions, and GPTs.
I don't feel like I have a really good grasp on when to use each of those, much less ChatGPT versus Claude. I want to use Claude for writing because I think it writes better currently.
But it doesn't have the same kind of, I want to come back and do this thing over and over every week, like a GPT, because I probably use GPTs more than anything currently. But if I came to you and said, okay, Stuart, I want to take this transcript from a call and I want to do a thing with it, how would you approach that?
[00:06:56] Speaker B: Yeah, so that is a great question. And I think I would give you a third way, which is maybe more low-tech: write your prompt in your text editor of choice, plain text, notes, Obsidian, whatever it might be. Get a prompt right. You can maybe template it if you want, using TextExpander or something like that, and just start playing: prompt, output, prompt, output, and just iterate. This is the engineer in me, take it back to basics. You want iterations, and you want to learn from those iterations. And for that learning, you want diversity of models. You want to play around with, you know, ChatGPT o1 or 4o or Claude, whatever it might be, and see which one's better, because they do have real styles. I see that too. I read enough student papers that I can eyeball it.
[00:07:46] Speaker A: Oh, interesting.
[00:07:47] Speaker B: Okay, that's okay. But I think it's about taking a more systematic approach to it.
I don't particularly like the create-your-own-GPT thing. I've played around with it, but I much prefer having a set of prompts in plain text or whatever, lying around, like I said, okay, this one. Because you need human judgment when you're learning, which is nearly all the time early on in these projects, you want to be flexible. You don't want to be reconstituting the GPT all the time, re-adding all the contextual data and stuff. Instead, just have a standard set and treat it as a research question.
[00:08:25] Speaker A: I would say, okay, okay, cool. So instead of me saying, yeah, my go-to is GPTs because it can do this thing over and over each time I come to it, I could look at Perplexity or Claude or maybe even Gemini, say I have the prompt or the series of prompts like a GPT would provide, and I could upload a bunch of background information and documentation to help it. I don't want to say fine-tune it, but to narrow in on the context that I'm giving it, and I just do that in a regular chat. I think that's it.
[00:09:00] Speaker B: And I think you can then experiment, you can see what each one does, see what the strengths and weaknesses are, and then it's a quicker iteration cycle. I think that's a good approach. It doesn't tie you in either, because a lot of these things are developed by these companies to tie you in. Once you've got all your projects in OpenAI, well, that's where I'm going to work now, isn't it? I'm not going to go across to Claude or Gemini. Not that I'm cynical, but I think keeping it separate is a good strategy, and it just allows you to be far more flexible in your experimentation.
[00:09:38] Speaker A: Kind of using it at a tool level instead of like a platform level. Right. Like really grokking the kind of basis of what's going on. Okay.
[00:09:46] Speaker B: And I think another one to add to that list would be local LLMs, which is something I use, and we'll maybe get onto this when we talk about what I use for my business. But you can run LLMs locally. They're smaller than the ChatGPTs, but getting better all the time, and especially important if you're using sensitive data or data you don't want to send to a big company. But they work too, and there's again a large selection of different sizes and speeds of model, things like that. The tool I'd shout out would be Ollama, so the animal, llama, with an O in front of it. It's cross-platform; install it, and then in your terminal you say ollama run, whatever model, and then you're just chatting locally in your terminal. And then it's a whole new world of experimentation, and it's easy to pipe things in with a little bit of scripting and say, okay, chuck in all these files, all this text, this folder of transcripts. Right.
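To make that concrete, here is a minimal sketch of the "chuck a folder of transcripts at a local model with a bit of scripting" idea, assuming Ollama is installed and a model has already been pulled. The model name, folder names, and prompt are illustrative assumptions, not Stuart's actual setup.

```python
# Minimal sketch: run every transcript in a folder through a local Ollama model.
# Assumes `ollama pull llama3` has been run; "llama3" and the paths are placeholders.
import subprocess
from pathlib import Path

PROMPT = "Summarise the key points of this transcript in British English:\n\n"

out_dir = Path("summaries")
out_dir.mkdir(exist_ok=True)

for transcript in sorted(Path("transcripts").glob("*.txt")):
    text = transcript.read_text()
    # `ollama run <model>` reads the prompt from stdin and prints the reply to stdout.
    result = subprocess.run(
        ["ollama", "run", "llama3"],
        input=PROMPT + text,
        capture_output=True,
        text=True,
    )
    (out_dir / transcript.name).write_text(result.stdout)
    print(f"Summarised {transcript.name}")
```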
[00:10:44] Speaker A: I want to come back to this, because it's a very specific question I had for the TinySeed group the other day. But first I want to get back to something you mentioned about trying the same thing on multiple different platforms or different models and seeing what you get. I'm somewhat of an engineer too, so how do you take a systematic approach to evaluating that? Instead of just entering the prompt, uploading the transcript, and seeing what happens, how do you systematically evaluate three different models in OpenAI, and then Claude and Gemini and all that kind of stuff?
[00:11:21] Speaker B: Yeah, that's the thing. The evaluation piece is really hard. Now, you can get all technical and use LLMs to evaluate it, do all these things. I don't do that, because I think that's for far later. If you're in production and you've got some really high-stakes production code, you want that, it's like test suites for code. But you want some sort of evaluation in these early stages, and it's something like: if you're generating some text, you have the standard prompt and standard context to give to the model, you do it three times, and you have a look at it. You can eyeball it and see what it's like and then try to correct it. And something I'm very big on is that you need that human step anyway, whatever you're generating. Even if you're generating a thousand things, you need a human step in there somewhere. And okay, how many edits did I have to make to that? What did I have to change?
And use that as the metric. Especially in these early days, you don't want to over-instrument the process early on.
I think that's where you see a lot of stuff about using GenAI in production and all these things, and it's people talking about a far later, bigger stage of the process than we're at, because typically we're doing things in relatively small numbers, like less than a million. So it's far more that experimental process, I think. So I'd use the human brain, which is very good at spotting discrepancies and, you know, the vibe, the smell, whatever you want to call it.
And that's what I do initially, definitely. But the other thing is, as well as comparing between different LLMs, you want to run it two or three times on the same one and see how consistent it is, because sometimes it'll just absolutely wipe out and give you junk or make things up. So another thing to consider is: will this reliably give me a good answer, or does it give me a really good one one time in ten? And what am I optimizing for?
[00:13:17] Speaker A: Yeah, interesting. Okay, so for the repeatability within the same model, is that within the same chat, or are you starting a new chat each time?
[00:13:23] Speaker B: I'd start a new chat. Start a new chat.
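As an aside on what "run it a few times, in a fresh chat each time, and eyeball it" looks like if you script it: each API call only carries the messages you pass it, so every run below is effectively a brand-new chat. This is a small sketch using the OpenAI Python SDK; the model name and file paths are assumptions, not something prescribed in the conversation.

```python
# Run the same prompt + context three times, each call a fresh context,
# and save the outputs side by side for a human to compare.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

prompt = Path("prompt.txt").read_text()                 # your standard prompt
context = Path("example_transcript.txt").read_text()    # your standard context

for run in range(3):
    response = client.chat.completions.create(
        model="gpt-4o",  # swap in whichever model you are comparing (assumed name)
        messages=[{"role": "user", "content": f"{prompt}\n\n{context}"}],
    )
    out = Path(f"run_{run + 1}.md")
    out.write_text(response.choices[0].message.content)
    print(f"Wrote {out} - read all three side by side and eyeball the differences")
```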
[00:13:25] Speaker A: Okay, so the previous context would affect the output. Cool. Okay, I figured. Cool. So, just to rephrase, it's pretty subjective in the evaluation at, like you're saying, our level and stage. Right? Don't make it more complicated. Try a couple of different models within each of the platforms, try the different platforms, and generally eyeball what you get. Do you have rules of thumb, like if I want to do this, I go to Claude; if I want to do this, I go to ChatGPT? Just in case folks want to skip that step and say, oh, Stuart said just do this.
[00:14:04] Speaker B: Yeah, it's dangerous. Well, step number one is always play with the newest models as they come out. So as things get released or announced, have a play, and I think that's where you can discover things quite quickly, because these models come out quite frequently. Just have a standard little task. My company is called Student Voice, so my standard approach is to ask, okay, what is student voice? Student voice is a concept within education and it's something I know really well, sort of a shibboleth or test, so I give that question to every new model and judge the answer. Otherwise, I think the newer models that trade longer runtime for higher accuracy, like o1, are really interesting. I've had really good results with them, and I'm willing to wait 30 seconds because of the scale we're working at. Again, we're doing things ten times, not billions of times, so 30 seconds is fine. I'm willing to wait that long, and I think a lot of listeners will be too. And I see a clear benefit in reliability and overall quality, coherence, understanding of what you're giving it, actually reading the context you give it and parsing it. So the ones that are more skewed towards that runtime chain-of-thought reasoning, I think, are strong. So o1 would be one to try, I think. And yeah, experiment. It's the same thing: experiment. Buy a one-month subscription to the pro plans of the different ones and cancel after a month, but have a play and get a feel for it. I like the text given by Claude, I think. There's another layer on this as well, because as a non-American, some turns of phrase come out of it, because it's trained on this massive corpus of text from the Internet and a lot of it is from American English speakers, and there are certain little bits that rub non-Americans up the wrong way. The Brits and the Australians are like, oh, I can't. It's grammatically correct, but that's not how someone I know would speak. So yeah, there's a bit of that too. So one of my most common prompts is "use British English", because I forget, and then I add that and get the answer again, and it's like, okay, that's far, far nicer.
[00:16:28] Speaker A: That's right, that's more right.
Yeah, it's funny, I was just looking at what I'm using. I think by default, with ChatGPT personal, it's using GPT-4, maybe 4o. Probably. Yeah. So I'll have to...
Oh, regular 4. That's lame. I'm going to use 4o.
[00:16:53] Speaker B: Switching. Yeah, you'll see a big difference.
So I went back to looking at, I think it was 3.5 or 4, because that's what an enterprise license had. And it's just like, oh no. I remember at the time it was like, this is magic, how can it do this? And then you look at it now compared to the models today.
[00:17:10] Speaker A: That was like six weeks ago, too. It's crazy.
[00:17:14] Speaker B: And that's why I'm not going to say use this model or that one. It's more about general approaches, like play with the latest ones, because these things are getting better.
It's an open question whether they'll get more specialized. Like, will Claude lean into being better for writing? I'm not sure, but I do think always use the more recent ones. And likewise, the local models used to be garbage, because you couldn't run anything good on a laptop, but now they're on a par with sort of 4o levels.
Not as good as those reasoning ones, but they're really good.
[00:17:44] Speaker A: Yeah. Yeah. Cool.
[00:17:46] Speaker B: Yeah.
[00:17:46] Speaker A: Let's circle back to running things locally. So my question was, we're a podcasting company. We have 300... I have 335 episodes of this podcast.
I want to load 335 transcripts into a model and do stuff, right? My question was, I just want to write a book, right? I have, you know, 300 hours of podcasts. Surely there's a book there somewhere.
How would you think about, one, when and why you start running models locally? And then just a little bit of tooling talk, because, I know, I don't want to get too specific.
Yeah, but just like, how do you think about it?
[00:18:28] Speaker B: Yeah, exactly. So I think the general approach I'd take there is that experimentation thing again. And you can get these transcripts relatively easily.
The class of tools would be the Whisper tools, of which there are apps and things like that. Chuck them in.
[00:18:49] Speaker A: And then we do it at a platform level. So, like, we have the transcripts.
[00:18:53] Speaker B: Yeah, yeah, exactly. So with the local models, the only good business reason to use them is when you don't want to give the data you're working with to a third party. Normally that involves customer data.
So if it's just you playing around, use the web interface for the tools. The local models I got into out of a bit of interest, but that would only get you a couple of minutes of playing with it. I use them in the company because I can analyze all the data we have from our customers in interesting ways without giving that confidential data to anyone else. And likewise for you, it would maybe be the transcripts of all the episodes on the platform, and you can run that through without giving anything away, and have full control over that pipeline, make it automated, pipe these things together a bit easier. So that's the rationale, really. There are some ethical reasons too, around control and things like that; if you're an open source fan, it's a great thing. But in the more prosaic case, it's just about data protection, and in the UK we're very strong on that. So it's for that reason. But then a typical flow would be: okay, you have these transcripts, and again you experiment, you take the first few, and what you're dealing with then, and again this will be out of date in five minutes, is the context window, right?
[00:20:22] Speaker A: Yeah.
[00:20:23] Speaker B: And how much you can give it. Transcripts can get long. I've recorded a few sales calls, and what's not a particularly long or really talky sales call comes out as a huge, long text file when you paste it into an LLM. So it's about discovering models with a big enough context window, as in, you can give them all this text.
Two, and this is the harder bit, going back to our evaluation piece: are they actually reading it, or just pretending to, reading the first few paragraphs and the last few paragraphs? Which is a real problem. There have been a few prompts I've had to write where you tell it to do something, give it a big chunk of context, and then tell it to do it again at the end, just to make sure it really understands what it's doing. And I think there it's just a matter of experimentation. Then, say you can fit two episodes' worth into the context window, so you can't do the whole thing. So you work out a prompt, experiment with a prompt: okay, give me two transcripts, condense that. Because there's a lot of, it's humans talking, we can compress that. What are the key points?
What are the highlights? What are the most exciting, emotional bits, whatever it might be? And the models are good at that. Where the models fall down and the hallucinations come in is when you're asking them to create things out of whole cloth. If you're saying, here's some text, transform that text, whether it be into a summary or something else, they're getting really good at it now. So then you just build that process out: for each two transcripts, build a summary. You've got 300, so you might have to do a couple of layers of abstraction here, but then you take your summaries, merge all those together, do the same thing, and then iterate on that. And I think that would be a really strong process, because I've used these things for ideation, for writing, and they're really good if you give them good input. So in terms of the day job at the university, give them academic papers, journal papers, PDFs, my own writing notes, things like that, and they can really synthesize this stuff and get across the essence quite well. And then obviously the human side after that.
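A rough sketch of that layered condense-and-merge process, using a local model through the Ollama CLI as discussed earlier. The model name, folder layout, and prompt wording are assumptions for illustration, not Stuart's actual pipeline.

```python
# Layered summarisation: condense transcripts two at a time, then condense the
# condensations, until one document remains. Assumes Ollama with a pulled model.
import subprocess
from pathlib import Path

MODEL = "llama3"  # assumed model name

def ask(prompt: str) -> str:
    """Send one prompt to a local model and return its reply."""
    result = subprocess.run(
        ["ollama", "run", MODEL], input=prompt, capture_output=True, text=True
    )
    return result.stdout

def condense(chunks: list[str]) -> list[str]:
    """Condense chunks two at a time into shorter summaries."""
    summaries = []
    for i in range(0, len(chunks), 2):
        pair = "\n\n---\n\n".join(chunks[i:i + 2])
        summaries.append(ask(
            "Condense these transcripts. Keep the key points, highlights and the "
            "most interesting emotional moments. Do not invent anything.\n\n" + pair
        ))
    return summaries

# Layer 0: raw transcripts; keep condensing until one document remains.
layer = [p.read_text() for p in sorted(Path("transcripts").glob("*.txt"))]
while len(layer) > 1:
    layer = condense(layer)

Path("book_outline_material.md").write_text(layer[0])
```

Each layer roughly halves the number of documents, which is what keeps every individual prompt inside the context window.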
[00:22:34] Speaker A: Right, right, cool. So you're running these locally, just on your MacBook or whatever? Like, this is not... okay, cool. Yeah, okay.
[00:22:41] Speaker B: Yeah. It's one of the benefits of the Mac moving to the new architecture a few years ago, the M1 through M4 chips. Part of that is that the memory is really close to the CPU and there's lots of it. It wasn't done with any particular foresight, Apple just wanted to do it, but it makes them really good for running these sorts of models compared to other laptops. It's crazy, frankly, that it runs on a laptop at all, but you get a really good, straight ChatGPT-4-equivalent model running on the laptop, no bother.
[00:23:11] Speaker A: Oh, wow. Okay. Okay.
You mentioned ethical considerations, and I don't want to get too... I'm not an expert at all here. But at a higher level, I do think a lot about, if you follow this through to the end, what the hell are we going to be doing in, not five years, but maybe ten years? You know, I see the stuff on Twitter saying 80% of people will be working less than 20 hours a week in five years, and first I'm like, bullshit, and then I'm like, yeah, I mean, look at where this was two years ago. Right? Nowhere. Yeah.
And it's exponential, right? So, I don't know, from a societal and a business perspective, how are you thinking about where this is going and how to position yourself best?
[00:24:07] Speaker B: Yeah. Yeah. So I'll try to paraphrase what I tell my students, because, as you can imagine, this is a super common question from 18-to-22-year-olds.
[00:24:16] Speaker A: Sure.
[00:24:17] Speaker B: Are we going to have anything to do? And what it's going to do, as I mentioned before, is remove the possibility of being mediocre.
Which means, if you're generating generic stuff for general people, what's the point? And that's not just content; I'm talking about in a job, an office job. If you're just passing things from A to B, if you're the middleman just moving reports around or turning emails into reports, that's gone. But what it hopefully leads to, and I try to be an optimist on this, I really do, is the possibility that we become far more specialized and creative. You find the thing you are really, really good at, you bring your special sauce, be that personality or technique or problem-solving, whatever it may be, and you use these tools. Now, a certain portion of people will go and live in the woods and not do it, but for everyone else, I think it's going to be about building on top of these tools, and I do think it'll be far more specialist. It'll be all the people-facing stuff. In the AI world you see these video avatars and things like that, for people scaling up outbound, and I'm like, no, no, that's not the way it's going to go, feasibly, because seeing someone face to face for half an hour is a million times better than some pre-canned generated video. It's kind of not the point. Use it to remove the drudgery, frankly, remove the risk of mistakes. Automation isn't a new idea, we've been doing it for hundreds of years. And going back to the 20-hours-a-week thing, Keynes was saying this, the economists in sort of 1910: at the current rate, everyone will be working four hours a week, it'll be brilliant. And that didn't happen. We'll find new things to do, but hopefully more satisfying things, more human things.
Yeah. Which doesn't go down too well with the engineers I talk to, because they're like, well, I like engineering.
But then I get to the next level, which is that engineering, as you know, is a very human, interactive team sport: communication, all those sorts of things. So then we get to the real nub of it, which is quite nice. But yeah, it's scary times, and it is coming, and I think five, ten years from now is going to be wildly different. I've got two young kids and I've not had them playing with it yet.
I feel a kind of aversion. I don't want to ruin that innocence, because as soon as they get something they can talk to on the phone or the iPad, they'll be away. And I think about what they're going to do. And yeah, that human creativity aspect, I think, is going to be more and more important.
[00:27:00] Speaker A: So it's interesting. I have two kids, 12 and 14. One of them has ChatGPT on his phone, and he talks to it all the time. Anytime he has a question, and this is interesting from a business perspective, he doesn't go to Google anymore, he just goes to ChatGPT and talks to it. And it's fascinating. The rabbit holes he goes down are so interesting.
So that's my son, the younger one. My older one doesn't use it at all.
So it's interesting to see how they're looking at it and using it. I agree. I mean, frankly, I look at running Castos and, you know, it's relatively AI-resistant, I think, currently. I don't think we're going to get replaced by AI tomorrow. There's a question about podcasting in general, but it's probably an overall positive trend. If we were a content marketing agency, I'd be shitting the bed right now. But largely, I think podcasting, with that human, in-person thing, is actually on a positive tick.
But yeah, I agree, the in-person stuff and the personal connection stuff, I think, is the most valuable thing that we can all lean into. And then, how do you get the technology or the tool or the platform or the use of AI to replace the bits of your work and your life that aren't that personal connection part, so you can spend more time doing that? That's kind of how I think about it. But it's super interesting from, you know, being a boss, and the discussions with our team, like, hey, development team, how are you guys using this? Support team, how are you guys using this?
Largely very positive, a fair amount of skepticism, and I think mostly healthy caution, maybe I'll say. You know, like, hey, I don't want to just throw my whole current set of responsibilities at this thing that I don't really understand.
But, hey, if it saves me two hours a week, that's cool. That's kind of where I think we are as a company. And we talked about this before we started recording: I'm trying to take anything I have to do.
I want to open ChatGPT and do it. And I try very hard to do that. And that's really where this whole kind of series came about. And it's like I want to do better at that and I want it to be a part of kind of everything I do because I do think it's really efficient.
It's like a starting point at least.
[00:29:46] Speaker B: Yeah, I really agree. I think it's a tool with way more uses than people realize, and way more specific uses, because it comes across as a generic tool. Okay, give me a 500-word essay on the history of Mesopotamia. Right, fine, cool. But no, usually you've got a really specific job that you're doing, whatever it may be, writing, understanding, ideation, any of that, and it's so, so useful. And on the replacement side, again, I'm really firm that in the future you'll continue to need human input on it, because for the ones who don't, who just let it do the whole thing, the whole stack automated, and sit back and do the four-hour work week dream, whatever, the output is not going to be as good as if there's a human constantly tweaking, looking at it, giving some judgment on it. Like with Student Voice: we analyze the text comments from university surveys of students, right? And I did my PhD on autonomous systems, which is what they called AI before there was any money in it. So it's definitely possible to be too late and also too early.
So I had these machine learning models categorizing the text. And you can be sure that the first time I did that for a paying customer,
I ran all the models and I checked every one by hand. And I can still remember it, it was an all-nighter. I went through the night checking a spreadsheet.
I could have done it myself, but I needed to automate this at some point. But it's that checking: you check. And that was before GenAI. There were no hallucinations there, it was just models being wrong, that's what we used to call it.
Errors, being incorrect. And you've got to keep that check in. And I think you can replace small parts of your work wholesale, and you can replace lots of your work partially. I think that's probably the way it'll go.
I think a trap, probably for more technically minded people, is thinking: I can automate this end to end. Like, say, content marketing. I can create...
[00:31:59] Speaker A: I'm going to have this mega chain agent.
[00:32:03] Speaker B: Exactly. And it works, and it takes one minute. Okay, so 60 minutes in an hour, 24 hours a day, I just have to leave that running for a week.
I've got 7,000 new articles, it's going to be brilliant, I'm going to take over the world. It doesn't work like that. What you need to do is automate the stuff it's good at, but then you've got to have that human judgment, that human injection of quality, of nuance, of personality, of style. Perfect.
And so I think a key takeaway is: think about the tasks you do.
Could AI replace the most boring, menial 75% of a given task? Because replacing something wholesale, like, emails come in and I want it to automatically send an email out, is a recipe for disaster, right? But if it could draft you one and then you check it, or do something else, or transform the report into a different format, it could really save time, while leaving that room for you to check and learn. And importantly, from an education point of view, by being part of the process, if you see where it goes wrong, you can fix it. And that's why we talked about experimentation before. If you have a very simple setup, here's my prompt, here are some outputs: okay, it's doing American English, right, so go back to your prompt and add "use British English" at the end, every time. If it's using certain words too much, the whole "delve" thing, give it a list of words specifically not to use and run it again. It's that iteration. If you're not doing that, you're not learning, it's not learning, it doesn't stand a chance. And then something I've grown to accept is that when I first started using these tools, my prompts would be like a paragraph, a block of, okay, I want to tell it what to do quite succinctly. But succinct isn't enough; it needs all the information. Your prompt should be really dripping with: do this, don't do that, definitely look at this. They end up quite structured and quite long, and you iterate your way there, I think, would be the key approach.
[00:34:04] Speaker A: Yeah, yeah. Interesting. Kind of talking about Student Voice and the business: where are you finding a lot of consistent success with it, chipping away at those parts of your work life?
[00:34:17] Speaker B: Yeah, yeah. So, one thing we've talked about before was the content generation side. Having just talked about how you definitely shouldn't have an automated process to generate thousands of articles, no, no. I think for AI content generation there are two types in my mind. There's the type that generates generic rubbish, and you could do that, but I don't recommend it because I don't think there'll be any benefit; it's going to be penalized eventually, if not already. Where I think there's a big opportunity, and people don't seem to be doing a lot of it that I've seen, is that as a business, any business, you will have certain types of data or information or knowledge. I always think of it as in the database, but really it's in the business. For us it's millions of comments from students; for you it's transcripts of podcasts and lots of information about the podcasts. And I think an approach, for your business or your project or whatever it might be, is to look at what data you have that's unique to you, that you're generating anyway: what the students are talking about, how long the podcasts are, what area they're in, whatever it might be, and use that as the impetus to generate the content.
As an example, I generated content based on what students studying different types of courses were talking about. So mechanical engineers had these opinions about the library, and economics students had these opinions about getting feedback on their work. And because I've got enough data, I can do that. And that's using local models, because those comments cannot leave our systems: using the local LLMs to generate a bullet-point list of just, yeah, these things. And then you take that bullet-point list, which is topics, and give that to other models, play around, build a little pipeline. Basically, I automated it to the point of sending those bulleted lists to the OpenAI API and getting back paragraphs of content.
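A simplified sketch of that two-stage shape: a local model turns the confidential comments into an anonymised bullet list, and only the bullet list goes to the OpenAI API to be drafted into paragraphs. Model names, prompts, and file paths are illustrative assumptions, not the production Student Voice code.

```python
# Stage 1 runs locally so raw comments never leave the machine;
# stage 2 sends only the anonymised bullet list to a hosted model.
import subprocess
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def local_bullets(comments: str) -> str:
    """Stage 1 (stays on the laptop): summarise raw comments as topic bullets."""
    result = subprocess.run(
        ["ollama", "run", "llama3"],  # assumed local model
        input="List the main topics in these student comments as short bullet points. "
              "Do not quote or identify anyone.\n\n" + comments,
        capture_output=True, text=True,
    )
    return result.stdout

def expand_to_draft(bullets: str) -> str:
    """Stage 2 (cloud): turn the anonymised bullets into draft paragraphs."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{"role": "user", "content":
                   "Write a short article section, in British English, based only on "
                   "these bullet points. Do not add facts that are not listed.\n\n" + bullets}],
    )
    return response.choices[0].message.content

comments = Path("nursing_comments.txt").read_text()  # confidential input, stage 1 only
draft = expand_to_draft(local_bullets(comments))
Path("draft_nursing.md").write_text(draft)            # a draft for human review, not publishing
```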
And if I look at it now, again, talk about things getting out of date, I could probably do it in about half the lines of code. But the key thing then was reading these documents with a critical eye. I mark a lot of work, so I've got practice doing this. But sit there and read it as if it were a student giving it to you.
And like with a student's work, rather than just deleting the word "delve" in the output, make sure it stops appearing: go back to the prompt in your code and say, don't use "delve". Take that extra two seconds, and then go through it, each one. And that's what I did, and still do. I generate all these drafts, you can generate thousands of drafts, and I go through them. Sometimes a draft is just junk, it got the wrong idea: bin it, don't waste time on it. And then you generate all this content that is super specific, and it isn't anywhere else, really; no one's been that specific about these issues. And it's useful content. I get a long tail of users now coming to specific areas to find out, okay, what are nursing students thinking about this? Because that data isn't out there otherwise. And I think that's an approach: use what you have.
If you can use local models, play with those; otherwise work out a way not to send anything sensitive. If it's just numeric data and things like that, it's probably not a problem. And then generate, and this goes back to that human aspect, what you can uniquely create. If someone else can do it, there's no point doing it, because someone will do it in a factory and churn out millions of them. What can you and your company or your project create, whatever size it may be? It doesn't have to be a big company with loads of data; it could be your experiments, whatever you're creating. Use that as a starting point. And then, as we discussed, build that pipeline to generate drafts and use your critique of the drafts to improve it, and you just go round and round and round. I generate hundreds of these and I continue to refine them. The last coda to this is that they were generated two models ago, so now I've got a workflow: I copy the markdown for the blog post, paste it into a different model and say, clean this up, make it more coherent, don't repeat yourself, all of that. And the model's like, oh yeah, of course, and it makes tighter, nicer, higher-quality writing. So it's got to be a work in progress too, once you've generated something.
I talked about cringing at seeing ChatGPT 3.5 output; you'll be cringing at your own content, and you don't want that to happen, because your customers will have the same reaction you would.
[00:38:41] Speaker A: Yeah, it's interesting. So we are doing a bit of this kind of programmatic stuff right now across a couple different areas. And yeah, we have all the content created already.
We're slower to roll it out because we want to be able to deploy it into WordPress in a way we can update easily. It's not just copy these cells and put them in these custom fields or whatever; it's going to be truly programmatic, so we can go back and say, okay, update this spreadsheet or this doc with the newest model or this information, and I want to expand and add these three new fields.
So yeah, as we get into programmatic, we're looking at making it repeatable and sustainable, and probably updating every six months or so, all these, you know, couple hundred pages. And we'll have several of these, where each quarter, maybe even each month, we're kind of updating a project.
[00:39:36] Speaker B: Yeah, yeah. And that's a good process. Something I did was with articles I had from pre-AI: I paid PhD students to write articles for me, because they're cheap and they're clever, so it was a good deal, I thought, though a lot more expensive than a ChatGPT subscription. I had all these articles and I was like, okay, what more can I do? And something you can do with existing content is add FAQs; ChatGPT is brilliant for that. I saw a real uptick in my visitors and all the metrics around that content after I added them, because I had maybe 80 of those hand-written articles by people, good quality, and I just appended FAQs to the end of them and got much better keyword coverage. And you can also tweak the prompt for that to say, also mention, you know, the words "student voice" in there, mention "the best educational analysis software", whatever it might be; you can do it in a natural way. And I think those sorts of ideas come as you go: okay, using these tools I can add that to everything. Previously you wouldn't have gone through all the previous posts and written an FAQ for every single one by hand, because it would have seemed boring and kind of pointless, but now you can do that really quickly. And I think that constant improvement, whether it be polishing or adding sections or maybe splitting things out or whatever it might be, can be really valuable.
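For illustration, a minimal sketch of that "generate an FAQ for every existing article" loop, again with the OpenAI SDK. The folder names, model, and prompt are assumptions, and the output is saved as a draft for a human to check rather than appended automatically.

```python
# Loop over existing hand-written posts, ask for a short FAQ grounded only in
# each post, and save it next to the original for review before publishing.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

for post in sorted(Path("articles").glob("*.md")):
    article = post.read_text()
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{"role": "user", "content":
                   "Write a short FAQ (3-5 questions) answered only from this article, "
                   "in British English. Mention Student Voice naturally where relevant.\n\n"
                   + article}],
    )
    faq_draft = post.parent / (post.stem + "_faq.md")
    faq_draft.write_text(response.choices[0].message.content)
    print(f"Drafted {faq_draft} for review")
```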
[00:40:57] Speaker A: Yeah. Interesting, interesting. The one thing I want to touch on here is you mentioned having your prompts somewhere. Where is that somewhere for you? You mentioned Obsidian; are you putting all your stuff there?
[00:41:09] Speaker B: Okay, yeah, yeah. It's in Obsidian. So I try my best to be structured about these sorts of things. It turns out it's still just one file called "prompts" that's really long.
[00:41:19] Speaker A: Okay, okay.
[00:41:20] Speaker B: But having it somewhere in plain text, I think, independent of any system, means I can go in there and copy and paste them into whatever I'm using.
I do have a few of them where, once I've built out a bit of a process around doing anything more than a few times, be it in work or in the business, say writing a sort of pro forma letter, I go further. I'm a big user of TextExpander snippets, things like that, and they're great for prompts too. You have your prompt in there with the holes ready to go, fill it in, paste it directly in, and then you go from there. So a few of the more regularly used ones live in a snippet, held in TextExpander for me, but whatever works. And the rest, all the ideas, all the other prompts, live in that one place at least. The other aspect is, if you are coding this up, if you are putting it in a simple Python script where here's your prompt, you give it to an API and get a response back, that's where it lives. And then you've got versions, it's all in git, it's all version controlled, which is quite nice. The more systematic you get, naturally, the more structured the output gets, and you can find it again.
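A tiny sketch of that "prompt lives in a plain text file under version control, with holes to fill in" pattern. The file names, placeholders, and model are assumptions for illustration.

```python
# Load a versioned prompt template, fill in its holes, and send it to an API.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

# prompts/clean_up_post.txt (hypothetical) might contain something like:
#   "Clean up this blog post draft. Use British English. Do not use the word 'delve'.
#    Keep the tone {tone}.\n\n{draft}"
template = Path("prompts/clean_up_post.txt").read_text()
prompt = template.format(
    tone="friendly but precise",
    draft=Path("drafts/post_42.md").read_text(),
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Because the template is just a text file, it can sit in git alongside the script, so prompt changes are versioned like any other code change.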
[00:42:27] Speaker A: Yeah. And I think that's why I've liked the GPTs, because the prompt is there, right? It doesn't have version control, but I don't have to come up with this thing over and over. The one I use the most is for creating YouTube scripts. And I won't get into the whole thing; I'm going to do a whole video about how I do it.
But I just updated it this week, and it's now like seven or eight steps long, where I need to go and review and update and give it input. And it wants to just be done: when I click the button the first time, it has a few steps and then it's like, okay, great, I'm done, here you go. And I'm like, whoa, Jack, I didn't give you this piece yet. So I am finding I want a little more control over the sequencing and the pace so I can have that input. And that's a good learning, right? Because at the beginning I was like, it gave me a script, this is great. And the script sucked.
[00:43:26] Speaker B: Right. I mean, yeah.
[00:43:28] Speaker A: I will say, for anyone out there, if it gets you publishing videos, great. And then, just like anything, try to be 5% better the next time. And that's kind of where I am. I'm on the third iteration of this right now, and, well, like you said, it's replacing a small part of that whole workflow and it's doing a really good job.
[00:43:53] Speaker B: And another point I'd like to make is that if you play with these things, and it should be thought of as play or experimentation, however you want to frame it, if it's serious it's experimentation, but it can be exactly the same thing if it's fun, the point is to get exposure to all the possibilities of what it can do. I learned loads making that content generation pipeline, and that actually led to a feature in the product, because then I had a good idea of what these local models can do, where the strengths and weaknesses are. And then that gives the confidence to say, okay, I can actually do summarization of the comments now, and we do that all locally. We had customers taking our very structured outputs and just chucking them into ChatGPT and saying, look what we've made. And I'm like, ah, don't do that.
Like, great.
But then that's like the perfect product development, when a customer says, look, I've made this Rube Goldberg machine of data loss, could you make a better one? And I could only do that because I had the experience of doing it on low-stakes content generation. And I think that's where content is a good place to start, because anyone writing content for their business knows if it's right or wrong; they can be the critic. It's easy. And you can learn how these things work, their strengths and weaknesses, and then use that.
You'll get a more realistic notion of what they can do, and hopefully that gives you a more realistic idea of where it could fit into your app, if it makes sense at all.
[00:45:26] Speaker A: Yeah, yeah. Cool. I think that's a good place to wrap up, man. Anything you'd like to leave folks with? I like how we've abstracted and generalized as much as we can; instead of giving really specific things, I think these will be more timeless lessons for folks. But anything you want to leave folks with for them to go and play.
[00:45:46] Speaker B: And learn. I think it's just: have that beginner's mindset. Imagine, like, 100 students are doing it in Glasgow, right? They're playing, discovering these things. You can do it too; just be fearless. And then I think the last concrete thing I'd say is, if you can find out where your customers or future customers are using AI already, build the first-party, proper version of it.
When I got those multiple emails from staff saying, look at what we've done, I was panicking, but it's brilliant. So now, ask them: have you got any staff doing this? If you can find where people are already using this with what you provide, or something similar, that gives you really good signal. In terms of business development, I think that'd be a good takeaway.
[00:46:31] Speaker A: Yeah. Awesome. Love it, love it. Stuart, if folks want to connect with you or learn more, what's the best place to connect?
[00:46:39] Speaker B: Stuart Grey, S-T-U-A-R-T, G-R-E-Y, on LinkedIn. Unfortunately, I've honed in on LinkedIn; I don't have time for more, I'm time-starved. LinkedIn works, you've got to play the game a little bit. But yeah, if anyone gets in touch, please do. I'm happy to chat about this stuff. I love teaching people this stuff, so I'm always open to conversations.
[00:46:58] Speaker A: Awesome, buddy, I appreciate it. Thank you.