Hello, welcome back to Sock Talk, episode 3. We're on a roll. Third time's a charm. So last week we said that we would come back and look into Sora. John, did you do your homework? I did a little bit of my homework, but not nearly enough to satisfy an instructor like you. That is a nice reverse Uno. I took a look. I spent some time going over only what's available. Basically there are no deep research papers into exactly what they've been doing, so it's all tied into their website, and they're promoting it as well. So, to summarize what little I learned: I spoke to some of our AI colleagues briefly today, gave them a very high-level explanation, and asked, am I remotely on the mark? And they said, yeah, close enough. So I'm going to take that close enough and at least explain, at a high level, a little of what's going on with Sora. A lot of these text-to-image generation models use diffusion, and the way that works is they'll generate some noise, and from that noise they're trained to recognize patterns and, well, it depends how high-level we're taking this. Ultimately the outputs of the visual model are trying to align with the inputs of the text. The text represents some sort of concepts. So let's say we've got a swimming turtle in the text. The AI model is trained in such a way that it has a concept. I say concept; obviously it's a neural network, and you can't really investigate that much. The diffusion model then starts picking out patterns and, through multiple iterations, tries to formulate that turtle. Now again, let me be absolutely clear to everyone listening: I'm being as high-level as I can, and I'm by no means an expert on this. So with a single image, that's relatively, I say relatively, easy. It's not that it's easy; it just seems easy now because we've got video. The difference is that what they were struggling with before, to make videos, is they were doing it frame by frame, and the consistency was hard. So even a while ago, when you saw Will Smith eating spaghetti, I don't know if you saw that one. Yeah, I did. It was really hard to keep consistency. You'd see hands forming and all this craziness. What they've done with Sora is redefine the unit. With text generation, like ChatGPT, we have tokens. Tokens are what it's trying to produce and align with. Sora uses what they call patches, which are chunks of spacetime rather than tokens. Spatial and temporal, meaning it's not just a single image; they're layering it up and looking at the whole video. So across the whole video, in this four-dimensional spacetime, they're looking for concepts that align with the text. That means if you say swimming turtle, there is this four-dimensional conceptual shape that starts forming, and that's why they get so much consistency: it forms and stays consistent all the way through. It reminded me, when I thought about it like that: you ever see Donnie Darko? Yeah, sure. Do you remember in Donnie Darko when he started seeing these wormholes coming out of people, because those were their pathways through time? I would imagine that's kind of how Sora is working: it's finding these objects and how they transition through time, which is why, whatever the video is, it stays conceptually a swimming turtle the whole way through.
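To make the diffusion idea being described a little more concrete, here is a minimal, purely illustrative Python sketch of that sampling loop: start from pure Gaussian noise and repeatedly denoise, conditioned on a text embedding. Nothing here reflects Sora's actual code; the denoiser, the schedule, and all the names are our own stand-ins.

```python
import numpy as np

def sample_diffusion(denoiser, text_embedding, shape, num_steps=50, seed=0):
    """Toy reverse-diffusion loop: pure noise in, image out.

    `denoiser` stands in for a trained network that predicts the noise
    present in `x` at step `t`, conditioned on the text embedding.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)           # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        predicted_noise = denoiser(x, t, text_embedding)
        x = x - predicted_noise / num_steps  # peel away a bit of noise each step
        if t > 0:
            # real samplers re-inject carefully scheduled noise here
            x = x + 0.01 * rng.standard_normal(shape)
    return x

# Toy usage with a dummy "denoiser" that just nudges values toward zero:
dummy = lambda x, t, emb: 0.5 * x
image = sample_diffusion(dummy, text_embedding=None, shape=(64, 64, 3))
```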
I'm saying a swimming turtle because one of the examples they have is a swimming turtle, or it's a butterfly and a drone, actually: they transition from a drone flying into a butterfly flying underwater, for some reason. That's apparently the sort of thing you can easily do now with AI, and the transition from one to the other is really seamless and absolutely fluid, which is nuts and ridiculous. I wanted to quote something they said. They are surprised by its emergent simulation capabilities, meaning that it can apparently simulate things without specifically being trained how to simulate anything, other than on videos and all the other multimodal stuff. Here they say these capabilities suggest that continued scaling of video models is a promising path towards the development of highly capable simulators of the physical and digital world, and the objects, animals and people that live within them. Another part was that these properties emerge without any explicit inductive biases for 3D objects and so on; they are purely phenomena of scale. So there you go, a high-level take, and all of that is just coming from their website, as far as I can go into it, because the second you start going into the algorithms it starts losing me a little bit. I'm sure it becomes a conversation for other people and for a different audience. Yeah, exactly. Lots of people can have that discussion, but that's not what we're here for. No, no, no. We're here to talk about the high-level fun stuff, then drift off in multiple different directions, and then maybe come back and tie the bow on at the end. Or not. Well, I don't know. Do you want to say anything on that? Sure, I read the same stuff you did, and I was really surprised to hear that it starts with Gaussian noise, because Gaussian blur used to be how we hid details when generating computer imagery. You could apply Gaussian blur to the edges of things, or between frames, in order to give a clearer impression of movement in 3D animation. Gaussian blur was one of the ways you could avoid the artifacts of really sharp, pixelated edges that could stand out when you went frame by frame; instead of supporting the illusion of movement, they broke the illusion of movement, so Gaussian blur was a way to interfere with that. And here they're using Gaussian noise and the concept of four-dimensional spacetime patches to create this illusion of movement. I think that's really cool. And some of the videos they have on the site: the woman playing with the, what's the name of that breed? Shiba Inu. Shiba Inu, thank you, the Japanese dog. Japanese, right? Yep, I believe so. So the woman playing with that dog, and the way its movement totally distorts in the earlier versions they show. I think they have three animated thumbnails side by side. Yeah, they're demonstrating the levels of compute. As they exponentially increase the compute, they get better results. Which is a beautiful illustration of what you were talking about, especially that quote you cited at the end about scale. So the more they throw at it, the more realistic this becomes, and that's where I start to disagree with them. I'm okay right up until that point. But the other quote that you cited, about how it can create a simulation of the natural world and the things that live in it: only if it happens to have considered those things. And I use considered very, very roughly. Because, as you know, I don't think these models actually consider anything.
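As a side note, the "spacetime patch" idea is easy to make concrete with a few lines of array code. This is a minimal sketch under our own assumptions, not OpenAI's implementation: it just cuts a video tensor into small space-and-time blocks, the way a text model cuts a string into tokens.

```python
import numpy as np

def to_spacetime_patches(video, pt=4, ph=16, pw=16):
    """Cut a video of shape (T, H, W, C) into flattened spacetime patches.

    Each patch spans `pt` frames and a `ph` x `pw` pixel region, so a
    single "token" covers a small block of space *and* time.
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    patches = (video
               .reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
               .transpose(0, 2, 4, 1, 3, 5, 6)   # group patch indices first
               .reshape(-1, pt * ph * pw * C))   # one row per patch
    return patches

# A 16-frame, 64x64 RGB clip becomes a sequence of 64 patch "tokens":
clip = np.zeros((16, 64, 64, 3))
print(to_spacetime_patches(clip).shape)  # (4*4*4, 4*16*16*3) = (64, 3072)
```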
They either have a concept or they don't. They're not counterbalancing them. They're literally changing their concepts by iteration through more processing power, not by an attempt to improve. So there's beautiful animation on their site. For instance, the Big Sur coastal highway in California. And it looks gorgeous. There's beautiful animation of woolly mammoths charging through the snow towards the camera. In both of those vast 3D spaces, there is a lot of beautiful animation, secondary animation and tertiary animation going on. The fur moving, the waves crashing, the snow poofing up. But the trees in the background aren't moving. Even though you see trees for kilometers, or hectares, I guess, the trees aren't moving. And if you've ever been at any coast, you know that when the waves are crashing, the trees are moving. Because waves aren't separate from wind, and wind isn't separate from waves. But this software, the algorithm, doesn't know that. So it's doing beautiful, really beautiful wave animation with no corresponding movement in the trees at all. I find that kind of thing interesting. Yeah. Like it's fundamental. It sometimes gets the absolute basics wrong. When they talk about simulating the world, it's really simulating the visual aesthetics, the visual animation of the world, not actually simulating the world. And that's such an important difference. As I think I referenced last week, there is a movement in the field of UX to use AI to replace actually interviewing users. They say you can generate user interviews without having to go through the trouble of interviewing anyone. Well, that would be fine if there were a simulated model somewhere of how humans actually experience things. But there isn't. Right. In the same way that there isn't a simulated model somewhere of how the space around Big Sur actually coexists across four dimensions. It sure looks good, as long as you look at the right things. Yeah. What I also noticed, looking through a lot of their videos, is that it's clear what type of videos they've been using, specifically linking back to Shutterstock. There's a bunch of them with a kangaroo walking along, and they show how changing the prompt changes the kangaroo. And it's clear when some of them are being prompted toward a specific style of animation, a specific style of creating something. One of them even looks completely like a vector image over the top of a real scene, but its feet are perfectly planted on the ground, with perfect shadows and ambient occlusion. And it's such a weird thing that no one would ever create naturally, because why would you do the feet perfectly, integrating them with the ground, when the top of it looks like a vector pasted over the background scene? What you're experiencing there is the uncanny valley. Yes. Right. Where, for those listening who aren't aware of it, the uncanny valley is when something is so out of place from the mental model you have, from the anticipation of what you're watching, that it snaps you out of the illusion of movement, or out of the illusion of reality. My favorite example of that is from The Simpsons, because with The Simpsons, people say none of it is realistic, but it's consistently non-realistic in the movement of the characters and of their world. Their interactions are so consistent that you can fall into it and start thinking of them as characters instead of as drawings.
And then you're in the bar, and Moe threatens somebody that he's going to turn on the high-quality lights. And he does, and all of a sudden he's drawn in a very different style, so you see every wrinkle and wart. That's the uncanny valley. And what you're seeing is the uncanny valley. And the uncanny valley is what these LLMs keep throwing at us. And that, to me, is the evidence that what they're throwing isn't in parallel with the way humans perceive. It touches on it. But our pareidolia, as I keep saying, is what makes us think that this is effective and real. We are convincing ourselves, based on a lot of evidence, that all of it is well done. And I think that's in some ways a danger. In other ways, I think it's a really good thing. And if you'll remind me sometime during this session, I'd love to bring up something that happened in the U.S. in the past week that relates to that. Okay, before we get there, while we stay on this so I can remember: last week we nearly dived into me saying, well, what do our brains do? You were mentioning that the learning of a human is very different. Now, if I were going to make a devil's advocate argument that we're making actually aware AI entities here, and I don't think we are, but let me play with it for a second. Enjoy. Babies are born into the world. They have a beautiful young brain that's ready to wire itself up. You ever look at a baby or a child? Its eyes are wide open the whole time, just trying to make sense of anything and everything. And it takes a long, well, I say a long time, it's only a couple of years, but over time they'll slowly start being able to recognize specific people, things, items, and object permanence isn't a thing that comes for a while, and things like that. That's why peekaboo is a hilarious game for a baby, because... You are literally disappearing and reappearing. What cool magic. Yeah, exactly. So a lot of that is visual; the baby's lying down, and the input is visual data over time. So are these models that we're training: we're giving them a lot of visual data and information on the world. So where's the difference? Okay, excellent question. And it's the kind of thing that's always coming up in discussions with us. Let me give a caveat before I answer. There are lots of different theories about how brains develop. There are lots of different theories about how we form and process and store and lose memories. And I'm not going to give the definitive answer that pleases everyone. Usually it will displease most, and at least be a spark for conversation that way. If you happen to be listening to this, and/or watching this, and thinking, "Hey, I sure would like to disagree with John," please disagree with John. You can do it privately, you can do it by reaching out to us and disagreeing. All of that is cool. So: humans do not learn by stacking up visual impressions. That's not what we do. As near as I know, and there are lots of theories about this, so as near as I know, what we do is we create and then expand and refurbish mental models of the world around us and of ourselves within that world. And the model of ourselves within the world is how we learn more about the world around us. So you currently have spatial models of your body and the way it interacts with space, including the details of your body, the way your fingers and thumb interact, the way your hands interact across space. That's what enables you to reach out and grab hold of things.
That's what enables you to sit upright, even if the chair slips a little bit, or to catch your balance when you stumble while you're walking. It's this dynamic model of who you are in space. That's also how you're forming an understanding of the world. Consider infants; I might get these numbers wrong. In the first four years of their life, their brain doubles in size. In the next four years of their life, their brain doubles in size again. In the next eight years of their life, the brain doubles in size again. Are we talking about physically doubling? Physically, yeah. Physically, okay. And then in the following eight years of their life, so sometime between 16 and 24, and all of these are general rules, sometime between 16 and 24, the brain's complexity continues to grow and its functionality continues to improve at a ridiculous rate, until finally it stops expanding in radically new ways and becomes the adult brain at around the age of 24. What some people say, and I'm one of them, is that babies are hungry for information because they have all these connections waiting to grow. And I don't know which comes first, but they have lots of connections they're trying to make. And little kids always say why, and this is why they always say why: because they need to build some context for the model they're building of the world. Large language models don't say why. They don't question things. They put together inferences, if you want to say. Maybe you could say inductive reasoning. They do inductive reasoning, where data comes in and they use that to build models. Humans do deductive reasoning, which is the opposite of what everyone thinks because of Sherlock Holmes. Deductive reasoning is when you have a model and then you try to find things that will fit it. So when Sherlock Holmes notices the mud on the hem of your trousers and the color of your shoes and the fact that you have ink from a certain train station ticket stub on your thumb, and figures out where you've been, that's inductive reasoning. He's noticing these things and putting them together. When these videos are being made, information is being put together and formatted into a swimming turtle, and then checked against itself, against itself as a four-dimensional model over time. That way we get the smooth swimming, and Will Smith doesn't grow multiple hands to eat the spaghetti with, because he's got hands that just have to be in different places. All of that is putting together ideas and then forming the model that says these are the same hands and have to be the same hands the whole time. Humans have that model right from the get-go and are refining it. Does that make sense? So it's a different form of reasoning. Now, as to the complexity of how memories are stored in the brain and that kind of thing, the general agreement on how that works is a 20- or 30-year-old model that nobody challenges but everybody agrees is wrong, because it requires that there be a central modeling space in the brain that no one has been able to find. In short, I disagree. I don't think that human learning and machine learning are at all the same thing. Could they be someday? Absolutely. But right now it'll be coincidence if that happens, because we don't know how humans do it. Absolutely. I'm always playing devil's advocate, for the record.
What seems to be interesting with all of this, at least the thing that always strikes me with the development of generative AI as it's been happening, is how psychedelic the early image models were, especially in the beginning with DeepDream and the likes of that, which was weird. And then the first generative models struggled with hands, as we know, specifically extra fingers. That immediately struck me as someone who's done lucid dreaming and spent a weird, silly amount of my life trying to be consciously aware during dreams, which is not something many people experience throughout their lives. It's something that maybe happens by happenstance once or twice in a lifetime. But it turns out it's a skill that you can work on and get better at. There's a community of lucid dreamers, and they'll discuss things, and hands are something your brain never gets right in dreams. One of the ways you can actually tell if you're in a dream is to hold up your hand and count your fingers. If you have more than four fingers on one hand, you're in a dream. Which is interesting, because it aligns with those earlier image generation models, which were struggling to do hands, and that to me at least hints there are similar things going on. Similar to what extent, sure, that's arguable, but it's super interesting that that was an immediate thing that happened, and that it aligns with dreams at least. I agree. It's interesting. My propensity is to say, from a scientific standpoint, that that is one factor out of millions of factors you're experiencing while you're dreaming in a lucid fashion, and the one factor aligns, but the others don't. So that makes it easy to think that there's a connection there. There may be, I'm not saying there isn't, but I am saying that the observation of one coincidence in a field of millions isn't the observation of similarity, even though it feels that way to us; that's a natural way humans associate things. If you think about what I was saying before about the model of how we exist, the part of our brain that maintains that model has a very good sense of how many fingers we have. If you rub your thumb and fingers together in that soothing gesture that all primates do from infancy, you have a very good sense of where your fingers are. If one was missing, you would know it. If there were three extras, you would know it, and it would be really weird. But that's tactile, not visual. I don't know if the visual portions of our models have any notion of counting. Before we got started, while you were positioning the camera, there was a moment where there was that lovely camera effect. I tried to recall the word fractals and failed. Just like admitting to not having done my homework well enough, I'm happy to admit when I have mental lapses or an inability to recall things when I want to. I think that's how I can keep track of myself and also how I can learn more. I don't want to hide that from anybody. I'm making a point of saying it now because I say it to my students all the time. We all make mistakes. If we try not to admit it, then we end up with this false image of ourselves that we have to defend, and that's not a good way to learn. Anyhow, that fractal moment of camera within camera within camera, shot within shot within shot, that's a big thing in psychedelia. The very common psychedelic image of a hand in which every finger grows a hand.
Yup, and then even just ancient religious architecture as well. It's everywhere, but even the hand where every finger grows a hand, and those fingers grow hands, you see that in ancient paintings as well. I think that kind of psychedelia is a fundamental part of the models we build. Or just maths, fundamentally, as well. Absolutely, there's no doubt. I saw a great talk by an astronomer, oh man, twenty-something years ago at the University of New Brunswick in Eastern Canada. A visiting lecturer came in and talked about the natural right-handedness of most things on Earth, down to a molecular level. And how if you go and look at asteroids, or meteorites I should say, if you look at meteorites that have come to Earth, some of them have more left-handed nature than right-handed nature. So the recursion that we notice in all things fractal, that fractal recursion, that natural tendency to repeat, does seem to be quite universal. But we happen to be developing in an area of space that might be right-handed more and left-handed less. Not exclusively right-handed, but right-handed more and left-handed less, whereas other areas might be the other way around. Doesn't that depend on what your orientation is? I think I see what you're saying. So if you stand at the other end of the boomerang, it bends left, not right. Yeah. But an asteroid in space, if it's spinning more in one direction, you can just flip your frame the other way around. Yeah, it just flips your frame the other way around. But if you look at the structures of the molecules inside it, or let's say a DNA molecule, we haven't found confirmed DNA from other places yet. I think there's been some that people are saying is DNA from Mars, but I'm not sure that that's real or verified, I don't think. But DNA, we know, is a double helix that curves in a certain direction. No thanks to the two men who got the Nobel Prize for it, but rather to their colleague, Rosalind Franklin, whose name we should put in the footnotes because she deserves to be mentioned: the one who did the work behind the idea and got ripped off. Anyhow, the helix turns in a certain way. And as long as you know top from bottom, then you know which direction it's corkscrewing in. Whether you look at it from one side or the other, the corkscrew goes the same way. It's just that if you reverse it on its poles, it looks like it's going the other way. So it's the same thing with these genetic components, with these molecular components. I hope that makes some sense. If you think about it, you could think of the spirals of the galaxies, where we know that galaxies seem to form in spirals, like the Milky Way, which we only see as a sort of flat Milky Way simply because we're inside it, off towards the edge, but inside it. But when we see other galaxies, we can see that they're spirals. And for the most part, they're spirals that rotate in the same direction. I'm going to challenge that. I'm not sure if that's true. I think that, again, just depends on orientation. There's no universal plane. And I'm pretty sure with the galaxies, we see some going one direction, others in other directions, and it just depends on where you're saying is up and where is down. If I'm wrong, I'm happy to accept that. It happens every single day, most of the time. And yeah, glad to be corrected. Maybe I'll look into it. And ditto for what you just said. I know you do. That's one of the reasons we get along. Yeah, yeah. It could be; that's not something I've completely looked into.
But yeah, from where you are looking in time, then space and time will obviously always depend on what you perceive as orientation. I think it's a question of whether the curvature is across two dimensions of the three, or one. If it's across one dimension, then it can look, from another, removed perspective, as though you can basically flip it without having to flip your perspective, your internal perspective. You can flip its perspective, if that makes sense. It's because it's in three-space, on three axes. I know this is referred to as, well, "chirality" is the word that's coming to my head right now. And I know that's a whole thing in physics that I'm not completely... Yeah, neither am I. I can't speak to that. I can say that the phenomenon we're talking about ties neatly back into digital imagery. Here we go. Bringing it back. There's an effect that used to happen with digital cameras in 3D spaces, where if you moved anything too quickly, and actually not just cameras, but even 3D models, if you moved anything too quickly, there was a propensity for the three-dimensional joint around which it was rotating, so if you had a three-axis joint something was rotating around, there was a propensity for it to flip and go the shorter distance across one of those three axes to get to the final point. Which made for hilarious glitches or bloopers in animation, when either a camera or a character suddenly rotated in ways they shouldn't. That was an interesting side track. I'm going to stand by it. I say it was interesting. I'm the first one to put the label on it. It was interesting. Yeah, yeah. That'll be a fun one to summarize. And of course, where did we go with that one? We went to galaxies. Let's rein it back. I'm going to rein it all the way back to Sora for a moment. Sure. Because it's something I started with in our last podcast, talking about how it's exciting and I don't know where things are going to go. One thing that excites me, though, is that I can predict, and you said it worries you when people confidently make predictions, so I'm going to confidently make a prediction. It's a short-term one, so it's an easy one. And it's something that, from the videos, we can see it can already do. I'm just extrapolating one extra step on how people are going to use it. So in one of them, there is a car driving along a dirt path. And then they have a simple prompt: make it a jungle. And now the car is driving through a jungle. And its visual fidelity is surprisingly good, worryingly good. Now, my immediate thought is: there goes the whole field of VFX. And immediately, at least, it excitingly opens things up to smaller studios, smaller indie-scale productions, where they can't afford ten VFX houses to create Marvel-level worlds and effects. It means that your average group of friends could come together, record something at a beach, and then just prompt the AI engine to say, now put us in this place. They could even train it specifically on the type of place that they want. This is where I think, and this is another point I was going to get to: with all these tools, and we've already seen how people are using them responsibly and not responsibly. There is the whole ethics of the training data in the first place, which is another conversation and another thing to talk about. But right now, I'm going to focus specifically on how people are using AI right now. There are people who stand atop it and use it as a tool and remain the strong creative director.
They always have the creative input. They're really making sure the AI is working for them, and they're not working for the AI. By that, I mean, and I've gone in a circle, but this imaginary group of friends who come to the beach and say, let's record a sci-fi where we're on an alien planet. They record their video, their little film. They come up with a little narrative to say, we go over here, this happens, that happens. And they record it. And then they just say to the AI: make an alien planet. Boom, it's an alien planet. That's not creative. That's boring. The better part of the creative process, for film specifically, is when those people sit down and really think about what this alien planet is. What does it look like? What are the colors? How can we use the visual aesthetics to support the theme we're trying to convey through the film? The larger, longer creative process. And it can still help you; I imagine systems where we can at least train these models on our own inputs, or at least direct them through longer and longer iterative processes, and really narrow in on a unique creative aesthetic that has been created by the humans, and not just the generic first thing that the AI produces. And so I'd like to start answering that. There are parts of what you said at first that I want to come back to, about the new technology. But I'd like to approach this in inverse order and start with what you've just been saying. I've talked to you about the creative process. And this has been a problem as long as there have been tools for creative people to use. It's Maslow's golden hammer, right? When you have a hammer, every problem looks like a nail. When you have a pencil, everything you produce looks like a pencil drawing. And that's not true; there are people who can make photo-realistic drawings with a pencil. But the propensity for most of us is to make a pencil drawing with whatever image we have in our brain of what that entails. And if you're using charcoal or conté, the image you're making changes. It is always of greater value if you use the tool in a way that surpasses the norm for the tool, by imposing your creative image upon it, very much like what you were saying a moment ago. So I agree with you. The kids at the beach who want an alien planet would tell a better story if there was a purpose for the alien planet. And as you were alluding, if the alien planet has elements that address the human issues that are vital to the story, then that will carry value to the viewer beyond the fact that, oh, this is an alien planet. If you look at any good storytelling, that's the case. The deeper, perhaps unconsciously recognized, and only unconsciously recognized, elements of the background of the characters, of their backstories, of everything going on in the mental model you create of the story as you're experiencing it: the richer that is with information you can unconsciously use to make predictions about what happens next, or to attach emotion to what the character is experiencing, the better the story is. And the AI cannot do that, unless it's purely by coincidence. So I agree. This is a problem, and it's going to be a problem for anyone who goes out and thinks they can use the tool quickly. We see the same thing in academia all the time now. A large number of students are using large language models to generate their assignments.
And what we find is that not only is the structure of the assignment consistent whenever it's generated by most of the AIs, and you can recognize a lot of content by the structure, but the vapidity of the content is consistent. And it's saying a lot when you say that it produces writing that is more vapid than the usual undergraduate student's writing. That's saying a lot. Like saying it produces a lecture that's more boring than the usual undergraduate lecturer's lecture. It would take a lot of work, but AI can do that. It can produce really boring lectures, and it can produce really vapid writing. And it's only the students who take the time to become experts at it who can produce great writing using that tool, just like they would be able to produce great writing using a keyboard or using a pencil. And that brings me to what I wanted to talk about earlier. You said the ability to suddenly generate a jungle... Yes. ...raises the possibility of putting special effects houses out of business. I don't think it does. When I was learning 3D animation for the first time, mid-1990s, and working with the two most popular packages at the time, the two best at the time, Softimage and what became Maya, there was a point where particle effects totally changed. So for example, in Maya, you could use the Maya Embedded Language, MEL, to write scripts for particle effects. You could do some pretty cool sparkles, and you could do some pretty cool larger sparkles, and you could do some really shitty, shitty flame and some horrible smoke. And then all of that got better, and particle effects got better and better. And it got to the point where, using the same tool that generated particles, you could use brushes in 3D and, with a brush stroke, create a plant from several different predefined plants. And you could choose how wide it gets, how many branches come out of it, how quickly they branch, and whether their branches produce leaves or flowers. All of that with preset variables you could adjust. And very quickly, people were painting 3D jungle scenes of a complexity that would have taken a very long time to model. You could now paint and animate. You could paint the starting position and the ending position and have it morph: either a direct photographic morph, which everybody laughs at now because it's so horrible, or an actual animated, sensible morph, if you were willing to take the time to do it. And a lot of people were saying, that's a lot of modelers who just lost their jobs. But they didn't, because what happened was a lot of special effects houses started producing really shitty jungles. And that just increased the demand for the people who could do it well. Audiences could soon see the difference, but the professionals could see the difference right from the beginning, and they would make a choice about which version they wanted to use. And I think that's what's going to happen with this, at least for the near future. I think people are going to be saying, yeah, we could generate that with AI, but we could get a professional to generate it with AI, who is going to iterate more than we would, tweak it better than we will, and produce a better final. I think that's much closer. Yep. I 100% hope that VFX is not going anywhere. It's an industry I love a lot. It's something I have spent a lot of time figuring out how to do. So I would really hope for those skills to not become obsolete overnight. And of course, they won't, because it's not just how to use the software. It's art in itself. Exactly.
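For a flavor of what those plant brushes were parameterizing, here is a tiny sketch of preset-driven recursive branching, written in Python rather than MEL; every knob and name here is our own illustrative stand-in, not Maya's actual interface.

```python
import random

def grow_plant(depth, length, width, branch_factor=3,
               branch_decay=0.6, leaf_chance=0.4):
    """Recursively generate a plant as nested branch records.

    Each preset knob (width, branch count, decay, leaf chance) mirrors
    the kind of sliders those 3D plant brushes exposed; the names here
    are illustrative, not taken from any real package.
    """
    branch = {"length": length, "width": width, "children": [], "leaves": 0}
    if depth == 0:
        # terminal twigs sprout leaves with some probability
        branch["leaves"] = sum(random.random() < leaf_chance for _ in range(5))
        return branch
    for _ in range(random.randint(1, branch_factor)):
        branch["children"].append(
            grow_plant(depth - 1, length * branch_decay, width * branch_decay,
                       branch_factor, branch_decay, leaf_chance))
    return branch

# One brush stroke could stamp down many variations of this:
plant = grow_plant(depth=4, length=1.0, width=0.1)
```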
Exactly. And that ability to treat it like art and create art with it is exactly what I was trying to talk about there. The artist is not the tool they use. Absolutely. I don't know if you've ever seen any of the paintings that Picasso did with a pen light, where he just stood in a dark room and painted things with a light in the air. Oh, I... There's a beautiful shot of him from Life magazine, from the 1960s or 70s. He's already a very old man. He's painting a bowl with one continuous line of a pen light. The pen light is not an artistic tool, except in the hands of an artist. Exactly. And all of these AI tools, well, they are tools. So I was about to say, I don't think we'll get artists, but we will. People will figure out how to use these tools in ways that other people want, and they will care, they will treat it as a craft, and they will get better and better at it. Absolutely. There are already people who do that, who manage to prompt and configure these generative models in ways that others don't. It strikes me, with all of this generative AI, I wonder: have you read Dune, Frank Herbert's Dune? You recall the Butlerian Jihad. The Butlerian Jihad in Dune is something that happened in that world thousands and thousands of years ago, when they had an almost utopian AI society with what they call thinking machines. Then the thinking machines, I can't remember specifically why or what happened, I think the thinking machines just started doing that sci-fi thing. I forget what the reason was specifically. It was a long time ago; they stopped having an influence on life. Yes. And human civilization revolted against the thinking machines. So much so that in the Dune world they have Mentats, which are thinking people. They get completely high on a stimulant and are then able to do arithmetic in their heads. They are the computers. But to come back to it: there were two reactions to Sora. Universally, everyone's going, wow, but there was a surprising wave of people who were not happy. Much larger than you would think, people going, what are you doing? Stop. This is going too far. And that was quite a significant sentiment, people saying this is too far. Yeah. I'm always surprised by that with art. If the point of the algorithm was to modify human genetics, I would be very interested in hearing all of the arguments against it. But when the point of the algorithm is to create things that we know are fake, that don't interact with us in the real world, then I don't have a problem. And if people are using the fake stuff in an unethical way, that's not the fault of the fake stuff. That's not necessarily the fault of the people who create the fake stuff. So yeah, I don't have a whole lot of time for people who say no, no, no, just because they can. Two-year-olds do that, and they grow out of it. It would be good if other people could grow out of it too. A common belief, and I subscribe to this, is that two-year-olds say no because they can. They have the ability to say, I will not be carried, when I have always been carried. The ability to say, I will not eat this, when I've always eaten what you give me. They're starting to assert an individualism, a separateness from the rest of the world. But they grow out of that. Ideally, they reach the point where they say no sometimes, and yes sometimes, and ask for more information before they make a decision on other occasions. People who have a knee-jerk reaction to technological change are just being like two-year-olds.
No, no, no, I can resist this, so I will resist this. Rather than becoming informed by change, they're resistant to change because of what it might inform them of. I actually agree. No, I actually agree. I talked in an earlier podcast about how I'm an early adopter of technologies. From the get-go, I got excited by all of these, and I've integrated them into my workflows quite tightly. Through various projects I've done, almost every project I commit to now, I'm using AI in some way, generative AI specifically. That's one of the reasons I enjoy talking with you about this stuff: you have constantly changing and expanding information about it that's way beyond what I have. By all means, keep doing what you're doing. For sure. Now, to clarify a little the special thing we're talking about, the thing these tools perhaps on their own don't have: I'm always struck by a story. Whether it's true or not, I'm not exactly sure, but my friend confidently told it to me once, so I'm going to confidently tell it as true. There was a company that bought a mass spectrometer and set up a facility where they could take any whisky, analyze all of its components, and then perfectly recreate the taste of that whisky within days. Usually, for all of time, to make whisky you've got to stick it in a cask and let it age to get that oaky, deep taste, and the various other tastes that get infused throughout the distilling process. They thought they had a million-dollar industry on their hands, because they said, "Well, we can outdo everyone, because our manufacturing times are nothing. We can just make great-tasting whisky." Then they quickly learned that there's absolutely zero market for that. That's not why people like whisky. People like the story behind it. People like the fact that humans took their time to set aside space in the world, that humans put time and energy into putting all these resources in to create this product. It's the same reason why perfect recreations of artworks are never as valuable as the originals: because it's the human story behind it that matters more than the end product. A lot of these generative AI artworks that are coming along, they're cheap. They're incredibly cheap. Nobody's buying them. If you print a book of AI artworks, I doubt it will sell. Maybe it would if you told a complete lie of a story behind it. You might be able to con people in, but I would posit to you that it would be very hard to sell a book of AI artworks when we're at a point where anybody can go and reproduce these very easily. Right. It's a wonderful story. If it's true, then somebody should alert the estate of Gene Roddenberry that synthehol has been invented. I think it's very cool. I don't know about the validity of the story, but I appreciate the storytelling aspect, because that's, again, like you say, where humans get their value. I do think that there are stories you could tell about a book of AI artwork that would make it valuable to humans. For sure. But then you're introducing a human story. Exactly. And that's what you would have to do. So if you were to say, this is a book of samples of the kind of thing you could generate, then there would be a market for that, because a lot of people who want to generate things with AI don't know how, don't know quite what's possible. Looking at this might give them some ideas. That's one thing.
And then they'd be counting on your expertise and your insights to inform them about which ideas would be worth the money. The story of the creator is why it's valuable information to listen to in the first place. As an alternative, you could have a book and say, here are some of the worst examples, because we all like worst-example stories. So yeah, there are stories you could tell, but you would have to impose that story. If you counted on AI to create the story of why a book would be interesting, or a collection of images would be interesting, or a cartoon would be interesting, if you count on the AI to generate that, then it had better be trained on things which happen to coincidentally add up to story elements and story structure. And this is why I'm always suggesting, when we talk about this stuff, that just a little bit of human control over what is learned and how it's learned could make a huge difference in what the AI is capable of doing. As an example, a reductio ad absurdum, just a silly example: I don't care how many AI-powered bots you have operating, I don't want to get artificial respiration from a series of bots that have to figure out how I breathe and why water in my lungs is bad. I'd much rather they knew that before they start figuring out how to treat me. Does that make sense? Too abstract? Too absurd? Yes. Okay. C'est la vie. No, all I mean is a little bit of instruction can get an AI around a roadblock that it might otherwise take multiple iterations to figure out on its own. It's the same thing, beyond the complex AIs we're talking about, with the more simplistic AIs, right? You've got a fleet of drones on limited battery life, and as the fire marshal you need to use them to map out a factory fire and figure out where people can have safe access in and out, where to concentrate the forces that you have, and even simple things like where to radio the people inside, which are the safe evacuation routes and which ones aren't. They cooperate wonderfully using algorithms that have been developed, among other places, at Lakeside Labs in Austria. They've got great programs for that. A lot of cool people doing the work, but they have to have some instruction, as I understand it, to get them started. Yeah. Yeah. I see what you're saying there. The minute you try to introduce a new element, a new depth to it, they need to have some instruction about it. They may come to the idea on their own that, oh, there are horses here and they require a wider path, or there are orchids that have to be rescued, and they can't just be kept away from flame, they have to be nowhere near heat. But unless you introduce those parameters, the AI will take generations to figure them out, and in a fire you might not have time for those iterations; a sketch of the idea follows. And there's also the hard fact, coming back to your hammer, that just because we have AI tools now doesn't mean they're always the best tool for the job. Hear, hear. I'm so happy to hear you say that. Yes. And I think a lot of special effects houses will be finding that out. And I think a lot of movie producers found that out during the writers' strike. There were a lot of scripts floating around that were just horrible scripts. Well, we recently had an example of that in Scotland. Did you hear about the Willy Wonka factory? Oh, yes. So that's someone very, very, very confidently not using the tool very well. Or perhaps using it exactly as much as they wanted. Exactly as much as they wanted, yeah. They sold a lot of tickets. And maybe the idea was, once I sell enough tickets, I can start building this stuff.
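To illustrate the point about injecting known constraints up front rather than waiting for them to be learned, here is a minimal sketch of a path-cost function where the domain knowledge, a wider clearance for horses and a heat limit for the orchids, enters as explicit parameters. The whole setup is our own hypothetical and has nothing to do with Lakeside Labs' actual algorithms.

```python
def path_cost(path, heat_map, clearance_m=1.0, heat_limit=60.0):
    """Score a candidate evacuation path; domain constraints are inputs.

    `path` is a list of (x, y) waypoints and `heat_map` is a callable
    returning a temperature estimate at a waypoint. Both the clearance
    and the heat limit are told to the planner up front, instead of
    being discovered over many failed iterations.
    """
    cost = 0.0
    for point in path:
        temp = heat_map(point)
        if temp > heat_limit:
            return float("inf")          # hard constraint: route unusable
        cost += 1.0 + temp / heat_limit  # soft preference for cooler routes
    # wider required clearance makes narrow routes effectively pricier
    return cost * clearance_m

# Hypothetical usage: horses need double clearance, orchids a lower heat limit.
flat_heat = lambda p: 25.0
route = [(0, 0), (1, 0), (2, 1)]
print(path_cost(route, flat_heat, clearance_m=2.0, heat_limit=40.0))
```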
Maybe. That was probably the case. Having worked in Silicon Valley a fair bit, I've met a lot of people who go into it with that attitude. And for reference, for anyone listening in the future where the immediate story doesn't make any sense: there was a Willy Wonka event in Glasgow, Scotland, which was pretty much a scam, and the person running it half-arsed it, so to speak. Everything, the promotion and the website, was all done by AI and got people excited. The scripts were done by AI. And then they didn't really deliver on the experience. The experience looked, to the uninitiated, as though it would be a fantastic 3D virtual reality experience full of candy. And according to one news story, when they got there, there were a couple of posters that were printed and put against bare concrete walls. Generative AI posters as well, specifically. And then the candy experience was one jelly bean and half a glass of lemonade per person. Yep. Not quite the same as the giant mushrooms made of candy that were in the photos. No. No. There it is: Willy's Chocolate Experience. Indulge in a chocolate fantasy like never before. Capture the enchantment. The other brilliant part about it all was that, to avoid copyright, it wasn't Willy Wonka. It was Willy something-or-other. Yeah, Willy's chocolate. No, I think they called him Willy McSomething. Oh, did they? Yeah. I didn't know that he had a last name. He did. It was Willy McSomething, I'm sure. Look at that one. Hold on, go down. Go down. So if you're listening to audio only, we're looking at the website, willyschocolateexperience.com. And we have, just scroll up, look at that: "catgacating live performances". A "pasadise" of sweets. Yeah. And cheering, because we know, a "pasadise of sweet teats". Yep. They didn't even bother editing this. It looks like DALL-E, so I would imagine this came from GPT. They were probably speaking directly to GPT to create it, and didn't even bother to edit the text, because we know a lot of these models still struggle with generating text inside images. Yeah, that's wonderful. "Catgacating live performances". "Cartchy tuns". "Cartchy tuns". "Exarserdray lollipops". A "pasadise of sweet teats". Yeah. By the way, that's not a dirty word. The spelling here, in North American English, T-E-A-T-S, that's what you call the organ that is used to funnel milk out of a cow. And the pronunciation of that in North American English is "tit". Yeah. Yeah. I'm not sure how it's said in Scotland, but I just want to be clear, I wasn't saying a dirty word. That's the part of the cow I used to grab hold of in order to milk it. Yeah, I think, if we're being polite, we would say "teat". Teat? Really? Yeah, we'll say that. Yeah, but that's not how it's pronounced in Canada. No, no, we do use the word tit all the time. Sometimes to describe people as well. Like someone who distinguishes between Canadian and Scottish pronunciations. Yeah, right tit. Excellent. Nice to know what people are saying. So John, I want to remind you, you wanted to talk about something that happened in the US. Oh, yeah, great. Nice callback. There you go. I think it's quite related to what we're talking about. They're all a bunch of tits. They truly are. Some of them more than others. And one in particular is famously a right tit to just about everyone who isn't upper class and white and of a German background, as he sometimes claims and sometimes doesn't. So Mr. Trump has recently released a bunch of campaign posters, which were obviously generated by AI.
For one thing, he looks vastly healthier than he does in real life. But these posters are specifically aimed at showing him with what they call an African-American demographic. So it's him with a bunch of black men in some photos and some illustrations, him with a bunch of black women in other photos or illustrations, and him with mixed groups in other illustrations. And they're all generated by AI, because there has never been a photo taken of Trump surrounded by people of a different race than he is, if you'll pardon my using a disgusting term like race. So you will not find photos of Trump surrounded by black men or black women. You will not find Trump in a group where they've got arms around each other's shoulders. But they've released these images trying to say, yeah, here he is. Here he is, a man of the people. The thing is, the images are visibly fake, not just because I don't believe Trump has ever smiled that way, but because of missing fingers, because of extra arms. There's a great photo of a black campaigner talking to a black potential voter, and he's got an extra arm down by his side. And somehow, among the people who generated the artwork and the people who approved the artwork and the people who put it online, none of them either noticed or dared to say, that's a three-armed guy. Yeah, that's quite the oversight. Fake news, as he coined it. Yes, indeed, as he might loudly proclaim. Very fake news, which unfortunately isn't news at all. But I'm really hoping that the rampant use of this technology by incompetents will knock some of the wind out of everyone's sails, in terms of the hurry to jump on bandwagons. When it was just verbal encouragement and the occasional photograph, people created mental models of the horrible claims: "Oh, they're trying to slip microchips into us with vaccines." "Oh my gosh." "Oh, you can cure that with bleach." "Oh my gosh." There was nothing that forced you to admit on the spot, even from a position of ignorance, that this was fake news. Whereas when you see that the man next to Trump has three arms, maybe you can see for yourself: hey, that's a fake photo. So I like that. I hope the propagandists continue to make those mistakes. Yeah, making it completely absurd, to the point where the emperor really has no clothes, to the point where we all have to point out that the emperor has no clothes. Yeah, where even the emperor's favorite defenders might have to say, okay, yeah, we can see his bits. Yeah. Interestingly, while we're talking about weird things happening with these generative models, there's also, on the other end of things, because both extremes are weird and wonderful these days, left and right, an inverse oopsie that Google's Gemini was making. I don't know if you saw this. So Google's Gemini model is multimodal and will create images for you. I experienced this with Midjourney myself, because I was trying to make some images and content, and it would not be very diverse. It would always just give me good-looking white people, all the time. So I would have to put in the word diverse; the second I do that, it gives you the wide spectrum of humanity. Google apparently knew that too, to the point where they were hiding that word in the prompt for a lot of their images, to the point where it was making some embarrassing mistakes when people were writing things like, "Show me German Nazi soldiers," and they would all be of every race apart from German.
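Mechanically, that kind of band-aid is trivial to wire in, which is part of why it's tempting. Here is a minimal sketch of silent prompt rewriting, with everything, the trigger words and the injected text, invented by us for illustration; Google has not published how theirs actually worked.

```python
def augment_prompt(user_prompt: str) -> str:
    """Silently rewrite an image prompt before it reaches the model.

    Illustrative only: if the prompt seems to involve people, append a
    hidden instruction the user never sees. This patches the symptom
    (homogeneous outputs) without touching the underlying model, which
    is exactly why it backfires on historically specific requests.
    """
    people_words = ("person", "people", "man", "woman", "crowd", "soldier")
    if any(word in user_prompt.lower() for word in people_words):
        return user_prompt + ", diverse group of people"
    return user_prompt

print(augment_prompt("Show me German Nazi soldiers"))
# -> "Show me German Nazi soldiers, diverse group of people"
```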
So they completely ran the other way, which was ultimately just as irresponsible. Yeah. I'd like it very much if you would just repeat, one time, for the sake of all the listeners, the fact that Google did something deliberate to hide a flaw in their software, and that caused that error. Would you mind just saying that again? Because that's very cool and very important. The fact that they tried to correct it by hiding the word diverse in the prompt. Yeah. So they knew that there was a weakness in the code they were releasing and having everybody use, the code they had just renamed as part of the big launch. I was an early user of that, so I get all the emails, including the notice that it's changed names now and everybody can start playing with it. The fact that they decided, "Oh man, this makes us look racist, every picture is clearly white and mostly male unless it's overly sexualized. Boy, we'd better hide a prompt in there to change that," instead of saying, "Let's fix it." Let's apply a band-aid rather than heal the wound. That's exactly the kind of thing I'm talking about all the time with the companies that do anything to do with AI. They got away with it with a lot of other algorithms they use. Now they're trying to get away with it with something that thinks in a way they don't understand, but that does do some kind of thinking, some kind of cognitive processing that the AI itself cannot explain. So why on earth would you want it to think in a flawed way and continue being accessible? Well, they very quickly took it down after that was identified. Once enough people posted pictures of black Nazis. That's the one that was in the news. But that happened. That was the second wave. The first wave was, "Hey man, we're not getting any diversity." That's when they should have taken it down, rather than saying, "We're going to insert the word diversity into everybody's prompt so that it starts making diverse crowds, so that our racist app starts making diverse decisions or showing diverse things." And I'm not saying that so that Google can come after me and say, "Hey, we're not a racist company." I'm not saying Google is a racist company. I'm saying that they created something that behaved in a way that looks racist, and their solution was to hide it rather than fix it. Or a quick, cheap fix. But not a fix. They didn't try to fix what was wrong. They tried to work around what was wrong. If you did that with a car you were selling to the public, you would be put out of business. If you did that with a drug you were selling to the public, you'd be put out of business. But because it's software, they get away with it. And that's really dangerous, because software isn't just on our computers anymore. It's everywhere. Well, John, I think that's probably a good place to end this one. We're going to have to think of topics to talk about next. I hope some folks will suggest them. I understand we might have one or two listeners. Yep, yep. All three of you. All three of you, please let us know what you think we should talk about next. I'm going to look right into the camera. All three of you, please let us know what you think we should talk about next. And yeah, with any luck, we'll get some suggestions. If not, we'll come up with some ourselves, and who knows how dangerous that could be. Absolutely. We might even dive more into the art side of things. We do quite often, but maybe we'll just talk about film. That would be lovely. I'd like that very much.
Technology and the history of film. Oh man. Don't get me started on the Lumière brothers. We'll be here for hours. Yep, yep, yep. There we go. Okay, well, there you go, folks. Something exciting to look forward to: Jimmy and John rambling on again, going off on weird tangents and maybe sometimes tying it back to what they initially started talking about. If you have been listening to this point, we really, really appreciate it. And thank you. And we hope to see you next time. Take care, folks. Please come on back. Sock Talk is a production of the Robert Gordon University School of Computing. Today's episode was brought to you by the letter pi and the number pi.