Hello, welcome back to Sock Talk, episode 3. We're on a roll. Third time's a charm. So last week we said that we would come back and look into Sora. John, did you do your homework? I did a little bit of my homework, but not nearly enough to satisfy an instructor like you. That is a nice reverse Uno. I took a look. I spent some time going over only what's available. Basically there are no deep research papers into exactly what they've been doing, so it's all tied into their website, and they're promoting it as well. So, to summarize what little I learned: I spoke to some of our AI colleagues briefly today, gave them a very high-level explanation, and asked, am I remotely on the mark? And they said, yeah, close enough. So I'm going to take that close enough and at least explain, at a high level, a little of what's going on with Sora. A lot of these text-to-image generation models use diffusion, and the way that works is they'll generate some noise, and from that noise they're trained to recognize patterns and, well, it depends how high-level we're taking this. Ultimately the outputs of the visual model are trying to align with the inputs of the text. The text represents some sort of concepts. So let's say we've got a swimming turtle in the text. The AI model is trained in such a way that it has a concept. I say concept; obviously it's a neural network, and you can't really investigate that much. The diffusion model then starts picking out patterns and, through multiple iterations, tries to formulate that turtle. Now again, let me be absolutely clear to everyone listening: I'm being as high-level as I can, and I'm by no means an expert on this. So with a single image, that's relatively, I say relatively, easy. It's not that it's easy; it just seems easy now because we've got video. The difference is that what they were struggling with before, to make videos, is they were doing it frame by frame, and the consistency was hard. So even a while ago, when you saw Will Smith eating spaghetti, I don't know if you saw that one. Yeah, I did. It was really hard to keep consistency. You'd see hands forming and all this craziness. What they've done with Sora is redefine the unit. With text generation, like ChatGPT, we have tokens. Tokens are what it's trying to produce and align with. Sora uses what they call patches, which are chunks of spacetime rather than tokens. Spatial and temporal, meaning it's not just a single image; they're layering it up and looking at the whole video. So across the whole video, in this four-dimensional spacetime, they're looking for concepts that align with the text. That means if you say swimming turtle, there is this four-dimensional conceptual shape that starts forming, and that's why they get so much consistency: it forms and stays consistent all the way through. It reminded me, when I thought about it like that: you ever see Donnie Darko? Yeah, sure. Do you remember in Donnie Darko when he started seeing these wormholes coming out of people, because those were their pathways through time? I would imagine that's kind of how Sora is working: it's finding these objects and how they transition through time, which is why, whatever the video is, it stays conceptually a swimming turtle the whole way through.
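To make the diffusion idea being described a little more concrete, here is a minimal, purely illustrative Python sketch of that sampling loop: start from pure Gaussian noise and repeatedly denoise, conditioned on a text embedding. Nothing here reflects Sora's actual code; the denoiser, the schedule, and all the names are our own stand-ins.

```python
import numpy as np

def sample_diffusion(denoiser, text_embedding, shape, num_steps=50, seed=0):
    """Toy reverse-diffusion loop: pure noise in, image out.

    `denoiser` stands in for a trained network that predicts the noise
    present in `x` at step `t`, conditioned on the text embedding.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)           # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        predicted_noise = denoiser(x, t, text_embedding)
        x = x - predicted_noise / num_steps  # peel away a bit of noise each step
        if t > 0:
            # real samplers re-inject carefully scheduled noise here
            x = x + 0.01 * rng.standard_normal(shape)
    return x

# Toy usage with a dummy "denoiser" that just nudges values toward zero:
dummy = lambda x, t, emb: 0.5 * x
image = sample_diffusion(dummy, text_embedding=None, shape=(64, 64, 3))
```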
I'm saying a swimming turtle because one of the examples they have is a swimming turtle, or it's a butterfly and a drone, actually: they transition from a drone flying into a butterfly flying underwater, for some reason. That's apparently the sort of thing you can easily do now with AI, and the transition from one to the other is really seamless and absolutely fluid, which is nuts and ridiculous. I wanted to quote something they said. They are surprised by its emergent simulation capabilities, meaning that it can apparently simulate things without specifically being trained how to simulate anything, other than on videos and all the other multimodal stuff. Here they say these capabilities suggest that continued scaling of video models is a promising path towards the development of highly capable simulators of the physical and digital world, and the objects, animals and people that live within them. Another part was that these properties emerge without any explicit inductive biases for 3D objects and so on; they are purely phenomena of scale. So there you go, a high-level take, and all of that is just coming from their website, as far as I can go into it, because the second you start going into the algorithms it starts losing me a little bit. I'm sure it becomes a conversation for other people and for a different audience. Yeah, exactly. Lots of people can have that discussion, but that's not what we're here for. No, no, no. We're here to talk about the high-level fun stuff, then drift off in multiple different directions, and then maybe come back and tie the bow on at the end. Or not. Well, I don't know. Do you want to say anything on that? Sure, I read the same stuff you did, and I was really surprised to hear that it starts with Gaussian noise, because Gaussian blur used to be how we hid details when generating computer imagery. You could apply Gaussian blur to the edges of things, or between frames, in order to give a clearer impression of movement in 3D animation. Gaussian blur was one of the ways you could avoid the artifacts of really sharp, pixelated edges that could stand out when you went frame by frame; instead of supporting the illusion of movement, they broke the illusion of movement, so Gaussian blur was a way to interfere with that. And here they're using Gaussian noise and the concept of four-dimensional spacetime patches to create this illusion of movement. I think that's really cool. And some of the videos they have on the site: the woman playing with the, what's the name of that breed? Shiba Inu. Shiba Inu, thank you, the Japanese dog. Japanese, right? Yep, I believe so. So the woman playing with that dog, and the way its movement totally distorts in the earlier versions they show. I think they have three animated thumbnails side by side. Yeah, they're demonstrating the levels of compute. As they exponentially increase the compute, they get better results. Which is a beautiful illustration of what you were talking about, especially that quote you cited at the end about scale. So the more they throw at it, the more realistic this becomes, and that's where I start to disagree with them. I'm okay right up until that point. But the other quote that you cited, about how it can create a simulation of the natural world and the things that live in it: only if it happens to have considered those things. And I use considered very, very roughly. Because, as you know, I don't think these models actually consider anything.
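As a side note, the "spacetime patch" idea is easy to make concrete with a few lines of array code. This is a minimal sketch under our own assumptions, not OpenAI's implementation: it just cuts a video tensor into small space-and-time blocks, the way a text model cuts a string into tokens.

```python
import numpy as np

def to_spacetime_patches(video, pt=4, ph=16, pw=16):
    """Cut a video of shape (T, H, W, C) into flattened spacetime patches.

    Each patch spans `pt` frames and a `ph` x `pw` pixel region, so a
    single "token" covers a small block of space *and* time.
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    patches = (video
               .reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
               .transpose(0, 2, 4, 1, 3, 5, 6)   # group patch indices first
               .reshape(-1, pt * ph * pw * C))   # one row per patch
    return patches

# A 16-frame, 64x64 RGB clip becomes a sequence of 64 patch "tokens":
clip = np.zeros((16, 64, 64, 3))
print(to_spacetime_patches(clip).shape)  # (4*4*4, 4*16*16*3) = (64, 3072)
```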
They either have a concept or they don't. They're not counterbalancing them. They're literally changing their concepts by iteration through more processing power, not by an attempt to improve. So there's beautiful animation on their site. For instance, the Big Sur coastal highway in California. And it looks gorgeous. There's beautiful animation of woolly mammoths charging through the snow towards the camera. In both of those vast 3D spaces, there is a lot of beautiful animation, secondary animation and tertiary animation going on. The fur moving, the waves crashing, the snow poofing up. But the trees in the background aren't moving. Even though you see trees for kilometers, or hectares, I guess, the trees aren't moving. And if you've ever been at any coast, you know that when the waves are crashing, the trees are moving. Because waves aren't separate from wind, and wind isn't separate from waves. But this software, the algorithm, doesn't know that. So it's doing beautiful, really beautiful wave animation with no corresponding movement in the trees at all. I find that kind of thing interesting. Yeah. Like it's fundamental. It sometimes gets the absolute basics wrong. When they talk about simulating the world, it's really simulating the visual aesthetics, the visual animation of the world, not actually simulating the world. And that's such an important difference. As I think I referenced last week, there is a movement in the field of UX to use AI to replace actually interviewing users. They say you can generate user interviews without having to go through the trouble of interviewing anyone. Well, that would be fine if there were a simulated model somewhere of how humans actually experience things. But there isn't. Right. In the same way that there isn't a simulated model somewhere of how the space around Big Sur actually coexists across four dimensions. It sure looks good, as long as you look at the right things. Yeah. What I also noticed, looking through a lot of their videos, is that it's clear what type of videos they've been using, specifically linking back to Shutterstock. There's a bunch of them with a kangaroo walking along, and they show how changing the prompt changes the kangaroo. And it's clear when some of them are being prompted toward a specific style of animation, a specific style of creating something. One of them even looks completely like a vector image over the top of a real scene, but its feet are perfectly planted on the ground, with perfect shadows and ambient occlusion. And it's such a weird thing that no one would ever create naturally, because why would you do the feet perfectly, integrating them with the ground, when the top of it looks like a vector pasted over the background scene? What you're experiencing there is the uncanny valley. Yes. Right. Where, for those listening who aren't aware of it, the uncanny valley is when something is so out of place from the mental model you have, from the anticipation of what you're watching, that it snaps you out of the illusion of movement, or out of the illusion of reality. My favorite example of that is from The Simpsons, because with The Simpsons, people say none of it is realistic, but it's consistently non-realistic in the movement of the characters and of their world. Their interactions are so consistent that you can fall into it and start thinking of them as characters instead of as drawings.
And then you're in the bar, and Moe threatens somebody that he's going to turn on the high-quality lights. And he does, and all of a sudden he's drawn in a very different style, so you see every wrinkle and wart. That's the uncanny valley. And what you're seeing is the uncanny valley. And the uncanny valley is what these LLMs keep throwing at us. And that, to me, is the evidence that what they're throwing isn't in parallel with the way humans perceive. It touches on it. But our pareidolia, as I keep saying, is what makes us think that this is effective and real. We are convincing ourselves, based on a lot of evidence, that all of it is well done. And I think that's in some ways a danger. In other ways, I think it's a really good thing. And if you'll remind me sometime during this session, I'd love to bring up something that happened in the U.S. in the past week that relates to that. Okay, before we get there, while we stay on this so I can remember: last week we nearly dived into me saying, well, what do our brains do? You were mentioning that the learning of a human is very different. Now, if I were going to make a devil's advocate argument that we're making actually aware AI entities here, and I don't think we are, but let me play with it for a second. Enjoy. Babies are born into the world. They have a beautiful young brain that's ready to wire itself up. You ever look at a baby or a child? Its eyes are wide open the whole time, just trying to make sense of anything and everything. And it takes a long, well, I say a long time, it's only a couple of years, but over time they'll slowly start being able to recognize specific people, things, items, and object permanence isn't a thing that comes for a while, and things like that. That's why peekaboo is a hilarious game for a baby, because... You are literally disappearing and reappearing. What cool magic. Yeah, exactly. So a lot of that is visual; the baby's lying down, and the input is visual data over time. So are these models that we're training: we're giving them a lot of visual data and information on the world. So where's the difference? Okay, excellent question. And it's the kind of thing that's always coming up in discussions with us. Let me give a caveat before I answer. There are lots of different theories about how brains develop. There are lots of different theories about how we form and process and store and lose memories. And I'm not going to give the definitive answer that pleases everyone. Usually it will displease most, and at least be a spark for conversation that way. If you happen to be listening to this, and/or watching this, and thinking, "Hey, I sure would like to disagree with John," please disagree with John. You can do it privately, you can do it by reaching out to us and disagreeing. All of that is cool. So: humans do not learn by stacking up visual impressions. That's not what we do. As near as I know, and there are lots of theories about this, so as near as I know, what we do is we create and then expand and refurbish mental models of the world around us and of ourselves within that world. And the model of ourselves within the world is how we learn more about the world around us. So you currently have spatial models of your body and the way it interacts with space, including the details of your body, the way your fingers and thumb interact, the way your hands interact across space. That's what enables you to reach out and grab hold of things.
That's what enables you to sit upright, even if the chair slips a little bit, or to catch your balance when you stumble while you're walking. It's this dynamic model of who you are in space. That's also how you're forming an understanding of the world. Consider infants; I might get these numbers wrong. In the first four years of their life, their brain doubles in size. In the next four years of their life, their brain doubles in size again. In the next eight years of their life, the brain doubles in size again. Are we talking about physically doubling? Physically, yeah. Physically, okay. And then in the following eight years of their life, so sometime between 16 and 24, and all of these are general rules, sometime between 16 and 24, the brain's complexity continues to grow and its functionality continues to improve at a ridiculous rate, until finally it stops expanding in radically new ways and becomes the adult brain at around the age of 24. What some people say, and I'm one of them, is that babies are hungry for information because they have all these connections waiting to grow. And I don't know which comes first, but they have lots of connections they're trying to make. And little kids always say why, and this is why they always say why: because they need to build some context for the model they're building of the world. Large language models don't say why. They don't question things. They put together inferences, if you want to say. Maybe you could say inductive reasoning. They do inductive reasoning, where data comes in and they use that to build models. Humans do deductive reasoning, which is the opposite of what everyone thinks because of Sherlock Holmes. Deductive reasoning is when you have a model and then you try to find things that will fit it. So when Sherlock Holmes notices the mud on the hem of your trousers and the color of your shoes and the fact that you have ink from a certain train station ticket stub on your thumb, and figures out where you've been, that's inductive reasoning. He's noticing these things and putting them together. When these videos are being made, information is being put together and formatted into a swimming turtle, and then checked against itself, against itself as a four-dimensional model over time. That way we get the smooth swimming, and Will Smith doesn't grow multiple hands to eat the spaghetti with, because he's got hands that just have to be in different places. All of that is putting together ideas and then forming the model that says these are the same hands and have to be the same hands the whole time. Humans have that model right from the get-go and are refining it. Does that make sense? So it's a different form of reasoning. Now, as to the complexity of how memories are stored in the brain and that kind of thing, the general agreement on how that works is a 20- or 30-year-old model that nobody challenges but everybody agrees is wrong, because it requires that there be a central modeling space in the brain that no one has been able to find. In short, I disagree. I don't think that human learning and machine learning are at all the same thing. Could they be someday? Absolutely. But right now it'll be coincidence if that happens, because we don't know how humans do it. Absolutely. I'm always playing devil's advocate, for the record.
What seems to be interesting with all of this, at least the thing that always strikes me with the development of generative AI as it's been happening, is how psychedelic the early image models were, especially in the beginning with DeepDream and the likes of that, which was weird. And then the first generative models struggled with hands, as we know, specifically extra fingers. That immediately struck me as someone who's done lucid dreaming and spent a weird, silly amount of my life trying to be consciously aware during dreams, which is not something many people experience throughout their lives. It's something that maybe happens by happenstance once or twice in a lifetime. But it turns out it's a skill that you can work on and get better at. There's a community of lucid dreamers, and they'll discuss things, and hands are something your brain never gets right in dreams. One of the ways you can actually tell if you're in a dream is to hold up your hand and count your fingers. If you have more than four fingers on one hand, you're in a dream. Which is interesting, because it aligns with those earlier image generation models, which were struggling to do hands, and that to me at least hints there are similar things going on. Similar to what extent, sure, that's arguable, but it's super interesting that that was an immediate thing that happened, and that it aligns with dreams at least. I agree. It's interesting. My propensity is to say, from a scientific standpoint, that that is one factor out of millions of factors you're experiencing while you're dreaming in a lucid fashion, and the one factor aligns, but the others don't. So that makes it easy to think that there's a connection there. There may be, I'm not saying there isn't, but I am saying that the observation of one coincidence in a field of millions isn't the observation of similarity, even though it feels that way to us; that's a natural way humans associate things. If you think about what I was saying before about the model of how we exist, the part of our brain that maintains that model has a very good sense of how many fingers we have. If you rub your thumb and fingers together in that soothing gesture that all primates do from infancy, you have a very good sense of where your fingers are. If one was missing, you would know it. If there were three extras, you would know it, and it would be really weird. But that's tactile, not visual. I don't know if the visual portions of our models have any notion of counting. Before we got started, while you were positioning the camera, there was a moment where there was that lovely camera effect. I tried to recall the word fractals and failed. Just like admitting to not having done my homework well enough, I'm happy to admit when I have mental lapses or an inability to recall things when I want to. I think that's how I can keep track of myself and also how I can learn more. I don't want to hide that from anybody. I'm making a point of saying it now because I say it to my students all the time. We all make mistakes. If we try not to admit it, then we end up with this false image of ourselves that we have to defend, and that's not a good way to learn. Anyhow, that fractal moment of camera within camera within camera, shot within shot within shot, that's a big thing in psychedelia. The very common psychedelic image of a hand in which every finger grows a hand.
Yup, and then even just ancient religious architecture as well. It's everywhere, but even the hand where every finger grows a hand, and those fingers grow hands, you see that in ancient paintings as well. I think that kind of psychedelia is a fundamental part of the models we build. Or just maths, fundamentally, as well. Absolutely, there's no doubt. I saw a great talk by an astronomer, oh man, twenty-something years ago at the University of New Brunswick in Eastern Canada. A visiting lecturer came in and talked about the natural right-handedness of most things on Earth, down to a molecular level. And how if you go and look at asteroids, or meteorites I should say, if you look at meteorites that have come to Earth, some of them have more left-handed nature than right-handed nature. So the recursion that we notice in all things fractal, that fractal recursion, that natural tendency to repeat, does seem to be quite universal. But we happen to be developing in an area of space that might be right-handed more and left-handed less. Not exclusively right-handed, but right-handed more and left-handed less, whereas other areas might be the other way around. Doesn't that depend on what your orientation is? I think I see what you're saying. So if you stand at the other end of the boomerang, it bends left, not right. Yeah. But an asteroid in space, if it's spinning more in one direction, you can just flip your frame the other way around. Yeah, it just flips your frame the other way around. But if you look at the structures of the molecules inside it, or let's say a DNA molecule, we haven't found confirmed DNA from other places yet. I think there's been some that people are saying is DNA from Mars, but I'm not sure that that's real or verified, I don't think. But DNA, we know, is a double helix that curves in a certain direction. No thanks to the two men who got the Nobel Prize for it, but rather to their colleague, Rosalind Franklin, whose name we should put in the footnotes because she deserves to be mentioned: the one who did the work behind the idea and got ripped off. Anyhow, the helix turns in a certain way. And as long as you know top from bottom, then you know which direction it's corkscrewing in. Whether you look at it from one side or the other, the corkscrew goes the same way. It's just that if you reverse it on its poles, it looks like it's going the other way. So it's the same thing with these genetic components, with these molecular components. I hope that makes some sense. If you think about it, you could think of the spirals of the galaxies, where we know that galaxies seem to form in spirals, like the Milky Way, which we only see as a sort of flat Milky Way simply because we're inside it, off towards the edge, but inside it. But when we see other galaxies, we can see that they're spirals. And for the most part, they're spirals that rotate in the same direction. I'm going to challenge that. I'm not sure if that's true. I think that, again, just depends on orientation. There's no universal plane. And I'm pretty sure with the galaxies, we see some going one direction, others in other directions, and it just depends on where you're saying is up and where is down. If I'm wrong, I'm happy to accept that. It happens every single day, most of the time. And yeah, glad to be corrected. Maybe I'll look into it. And ditto for what you just said. I know you do. That's one of the reasons we get along. Yeah, yeah. It could be; that's not something I've completely looked into.
But yeah, from where you are looking in time, then space and time will obviously always depend on what you perceive as orientation. I think it's a question of whether the curvature is across two dimensions of the three, or one. If it's across one dimension, then it can look, from another, removed perspective, as though you can basically flip it without having to flip your perspective, your internal perspective. You can flip its perspective, if that makes sense. It's because it's in three-space, on three axes. I know this is referred to as, well, "chirality" is the word that's coming to my head right now. And I know that's a whole thing in physics that I'm not completely... Yeah, neither am I. I can't speak to that. I can say that the phenomenon we're talking about ties neatly back into digital imagery. Here we go. Bringing it back. There's an effect that used to happen with digital cameras in 3D spaces, where if you moved anything too quickly, and actually not just cameras, but even 3D models, if you moved anything too quickly, there was a propensity for the three-dimensional joint around which it was rotating, so if you had a three-axis joint something was rotating around, there was a propensity for it to flip and go the shorter distance across one of those three axes to get to the final point. Which made for hilarious glitches or bloopers in animation, when either a camera or a character suddenly rotated in ways they shouldn't. That was an interesting side track. I'm going to stand by it. I say it was interesting. I'm the first one to put the label on it. It was interesting. Yeah, yeah. That'll be a fun one to summarize. And of course, where did we go with that one? We went to galaxies. Let's rein it back. I'm going to rein it all the way back to Sora for a moment. Sure. Because it's something I started with in our last podcast, talking about how it's exciting and I don't know where things are going to go. One thing that excites me, though, is that I can predict, and you said it worries you when people confidently make predictions, so I'm going to confidently make a prediction. It's a short-term one, so it's an easy one. And it's something that, from the videos, we can see it can already do. I'm just extrapolating one extra step on how people are going to use it. So in one of them, there is a car driving along a dirt path. And then they have a simple prompt: make it a jungle. And now the car is driving through a jungle. And its visual fidelity is surprisingly good, worryingly good. Now, my immediate thought is: there goes the whole field of VFX. And immediately, at least, it excitingly opens things up to smaller studios, smaller indie-scale productions, where they can't afford ten VFX houses to create Marvel-level worlds and effects. It means that your average group of friends could come together, record something at a beach, and then just prompt the AI engine to say, now put us in this place. They could even train it specifically on the type of place that they want. This is where I think, and this is another point I was going to get to: with all these tools, and we've already seen how people are using them responsibly and not responsibly. There is the whole ethics of the training data in the first place, which is another conversation and another thing to talk about. But right now, I'm going to focus specifically on how people are using AI right now. There are people who stand atop it and use it as a tool and remain the strong creative director.
They always have the creative input. They're really making sure the AI is working for them, and they're not working for the AI. By that, I mean, and I've gone in a circle, but this imaginary group of friends who come to the beach and say, let's record a sci-fi where we're on an alien planet. They record their video, their little film. They come up with a little narrative to say, we go over here, this happens, that happens. And they record it. And then they just say to the AI: make an alien planet. Boom, it's an alien planet. That's not creative. That's boring. The better part of the creative process, for film specifically, is when those people sit down and really think about what this alien planet is. What does it look like? What are the colors? How can we use the visual aesthetics to support the theme we're trying to convey through the film? The larger, longer creative process. And it can still help you; I imagine systems where we can at least train these models on our own inputs, or at least direct them through longer and longer iterative processes, and really narrow in on a unique creative aesthetic that has been created by the humans, and not just the generic first thing that the AI produces. And so I'd like to start answering that. There are parts of what you said at first that I want to come back to, about the new technology. But I'd like to approach this in inverse order and start with what you've just been saying. I've talked to you about the creative process. And this has been a problem as long as there have been tools for creative people to use. It's Maslow's golden hammer, right? When you have a hammer, every problem looks like a nail. When you have a pencil, everything you produce looks like a pencil drawing. And that's not true; there are people who can make photo-realistic drawings with a pencil. But the propensity for most of us is to make a pencil drawing with whatever image we have in our brain of what that entails. And if you're using charcoal or conté, the image you're making changes. It is always of greater value if you use the tool in a way that surpasses the norm for the tool, by imposing your creative image upon it, very much like what you were saying a moment ago. So I agree with you. The kids at the beach who want an alien planet would tell a better story if there was a purpose for the alien planet. And as you were alluding, if the alien planet has elements that address the human issues that are vital to the story, then that will carry value to the viewer beyond the fact that, oh, this is an alien planet. If you look at any good storytelling, that's the case. The deeper, perhaps unconsciously recognized, and only unconsciously recognized, elements of the background of the characters, of their backstories, of everything going on in the mental model you create of the story as you're experiencing it: the richer that is with information you can unconsciously use to make predictions about what happens next, or to attach emotion to what the character is experiencing, the better the story is. And the AI cannot do that, unless it's purely by coincidence. So I agree. This is a problem, and it's going to be a problem for anyone who goes out and thinks they can use the tool quickly. We see the same thing in academia all the time now. A large number of students are using large language models to generate their assignments.
And what we find is that not only is the structure of the assignment consistent whenever it's generated by most of the AIs, and you can recognize a lot of content by the structure, but the vapidity of the content is consistent. And it's saying a lot when you say that it produces writing that is more vapid than the usual undergraduate student's writing. That's saying a lot. Like saying it produces a lecture that's more boring than the usual undergraduate lecturer's lecture. It would take a lot of work, but AI can do that. It can produce really boring lectures, and it can produce really vapid writing. And it's only the students who take the time to become experts at it who can produce great writing using that tool, just like they would be able to produce great writing using a keyboard or using a pencil. And that brings me to what I wanted to talk about earlier. You said the ability to suddenly generate a jungle... Yes. ...raises the possibility of putting special effects houses out of business. I don't think it does. When I was learning 3D animation for the first time, mid-1990s, and working with the two most popular packages at the time, the two best at the time, Softimage and what became Maya, there was a point where particle effects totally changed. So for example, in Maya, you could use the Maya Embedded Language, MEL, to write scripts for particle effects. You could do some pretty cool sparkles, and you could do some pretty cool larger sparkles, and you could do some really shitty, shitty flame and some horrible smoke. And then all of that got better, and particle effects got better and better. And it got to the point where, using the same tool that generated particles, you could use brushes in 3D and, with a brush stroke, create a plant from several different predefined plants. And you could choose how wide it gets, how many branches come out of it, how quickly they branch, and whether their branches produce leaves or flowers. All of that with preset variables you could adjust. And very quickly, people were painting 3D jungle scenes of a complexity that would have taken a very long time to model. You could now paint and animate. You could paint the starting position and the ending position and have it morph: either a direct photographic morph, which everybody laughs at now because it's so horrible, or an actual animated, sensible morph, if you were willing to take the time to do it. And a lot of people were saying, that's a lot of modelers who just lost their jobs. But they didn't, because what happened was a lot of special effects houses started producing really shitty jungles. And that just increased the demand for the people who could do it well. Audiences could soon see the difference, but the professionals could see the difference right from the beginning, and they would make a choice about which version they wanted to use. And I think that's what's going to happen with this, at least for the near future. I think people are going to be saying, yeah, we could generate that with AI, but we could get a professional to generate it with AI, who is going to iterate more than we would, tweak it better than we will, and produce a better final. I think that's much closer. Yep. I 100% hope that VFX is not going anywhere. It's an industry I love a lot. It's something I have spent a lot of time figuring out how to do. So I would really hope for those skills to not become obsolete overnight. And of course, they won't, because it's not just how to use the software. It's art in itself. Exactly.
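For a flavor of what those plant brushes were parameterizing, here is a tiny sketch of preset-driven recursive branching, written in Python rather than MEL; every knob and name here is our own illustrative stand-in, not Maya's actual interface.

```python
import random

def grow_plant(depth, length, width, branch_factor=3,
               branch_decay=0.6, leaf_chance=0.4):
    """Recursively generate a plant as nested branch records.

    Each preset knob (width, branch count, decay, leaf chance) mirrors
    the kind of sliders those 3D plant brushes exposed; the names here
    are illustrative, not taken from any real package.
    """
    branch = {"length": length, "width": width, "children": [], "leaves": 0}
    if depth == 0:
        # terminal twigs sprout leaves with some probability
        branch["leaves"] = sum(random.random() < leaf_chance for _ in range(5))
        return branch
    for _ in range(random.randint(1, branch_factor)):
        branch["children"].append(
            grow_plant(depth - 1, length * branch_decay, width * branch_decay,
                       branch_factor, branch_decay, leaf_chance))
    return branch

# One brush stroke could stamp down many variations of this:
plant = grow_plant(depth=4, length=1.0, width=0.1)
```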
Exactly. And that ability to treat it like art and create art with it is exactly what I was trying to talk about there. The artist is not the tool they use. Absolutely. I don't know if you've ever seen any of the paintings that Picasso did with a pen light, where he just stood in a dark room and painted things with a light in the air. Oh, I... There's a beautiful shot of him from Life magazine, from the 1960s or 70s. He's already a very old man. He's painting a bowl with one continuous line of a pen light. The pen light is not an artistic tool, except in the hands of an artist. Exactly. And all of these AI tools, well, they are tools. So I was about to say, I don't think we'll get artists, but we will. People will figure out how to use these tools in ways that other people want, and they will care, they will treat it as a craft, and they will get better and better at it. Absolutely. There are already people who do that, who manage to prompt and configure these generative models in ways that others don't. It strikes me, with all of this generative AI, I wonder: have you read Dune, Frank Herbert's Dune? You recall the Butlerian Jihad. The Butlerian Jihad in Dune is something that happened in that world thousands and thousands of years ago, when they had an almost utopian AI society with what they call thinking machines. Then the thinking machines, I can't remember specifically why or what happened, I think the thinking machines just started doing that sci-fi thing. I forget what the reason was specifically. It was a long time ago; they stopped having an influence on life. Yes. And human civilization revolted against the thinking machines. So much so that in the Dune world they have Mentats, which are thinking people. They get completely high on a stimulant and are then able to do arithmetic in their heads. They are the computers. But to come back to it: there were two reactions to Sora. Universally, everyone's going, wow, but there was a surprising wave of people who were not happy. Much larger than you would think, people going, what are you doing? Stop. This is going too far. And that was quite a significant sentiment, people saying this is too far. Yeah. I'm always surprised by that with art. If the point of the algorithm was to modify human genetics, I would be very interested in hearing all of the arguments against it. But when the point of the algorithm is to create things that we know are fake, that don't interact with us in the real world, then I don't have a problem. And if people are using the fake stuff in an unethical way, that's not the fault of the fake stuff. That's not necessarily the fault of the people who create the fake stuff. So yeah, I don't have a whole lot of time for people who say no, no, no, just because they can. Two-year-olds do that, and they grow out of it. It would be good if other people could grow out of it too. A common belief, and I subscribe to this, is that two-year-olds say no because they can. They have the ability to say, I will not be carried, when I have always been carried. The ability to say, I will not eat this, when I've always eaten what you give me. They're starting to assert an individualism, a separateness from the rest of the world. But they grow out of that. Ideally, they reach the point where they say no sometimes, and yes sometimes, and ask for more information before they make a decision on other occasions. People who have a knee-jerk reaction to technological change are just being like two-year-olds.
No, no, no, I can resist this, so I will resist this. Rather than becoming informed by change, they're resistant to change because of what it might inform them of. I actually agree. No, I actually agree. I talked in an earlier podcast about how I'm an early adopter of technologies. From the get-go, I got excited by all of these, and I've integrated them into my workflows quite tightly. Through various projects I've done, almost every project I commit to now, I'm using AI in some way, generative AI specifically. That's one of the reasons I enjoy talking with you about this stuff: you have constantly changing and expanding information about it that's way beyond what I have. By all means, keep doing what you're doing. For sure. Now, to clarify a little the special thing we're talking about, the thing these tools perhaps on their own don't have: I'm always struck by a story. Whether it's true or not, I'm not exactly sure, but my friend confidently told it to me once, so I'm going to confidently tell it as true. There was a company that bought a mass spectrometer and set up a facility where they could take any whisky, analyze all of its components, and then perfectly recreate the taste of that whisky within days. Usually, for all of time, to make whisky you've got to stick it in a cask and let it age to get that oaky, deep taste, and the various other tastes that get infused throughout the distilling process. They thought they had a million-dollar industry on their hands, because they said, "Well, we can outdo everyone, because our manufacturing times are nothing. We can just make great-tasting whisky." Then they quickly learned that there's absolutely zero market for that. That's not why people like whisky. People like the story behind it. People like the fact that humans took their time to set aside space in the world, that humans put time and energy into putting all these resources in to create this product. It's the same reason why perfect recreations of artworks are never as valuable as the originals: because it's the human story behind it that matters more than the end product. A lot of these generative AI artworks that are coming along, they're cheap. They're incredibly cheap. Nobody's buying them. If you print a book of AI artworks, I doubt it will sell. Maybe it would if you told a complete lie of a story behind it. You might be able to con people in, but I would posit to you that it would be very hard to sell a book of AI artworks when we're at a point where anybody can go and reproduce these very easily. Right. It's a wonderful story. If it's true, then somebody should alert the estate of Gene Roddenberry that synthehol has been invented. I think it's very cool. I don't know about the validity of the story, but I appreciate the storytelling aspect, because that's, again, like you say, where humans get their value. I do think that there are stories you could tell about a book of AI artwork that would make it valuable to humans. For sure. But then you're introducing a human story. Exactly. And that's what you would have to do. So if you were to say, this is a book of samples of the kind of thing you could generate, then there would be a market for that, because a lot of people who want to generate things with AI don't know how, don't know quite what's possible. Looking at this might give them some ideas. That's one thing.
And then they'd be counting on your expertise and your insights to inform them about which ideas would be worth the money. The story of the creator is why it's valuable information to listen to in the first place. As an alternative, you could have a book and say, here are some of the worst examples, because we all like worst-example stories. So yeah, there are stories you could tell, but you would have to impose that story. If you counted on AI to create the story of why a book would be interesting, or a collection of images would be interesting, or a cartoon would be interesting, if you count on the AI to generate that, then it had better be trained on things which happen to coincidentally add up to story elements and story structure. And this is why I'm always suggesting, when we talk about this stuff, that just a little bit of human control over what is learned and how it's learned could make a huge difference in what the AI is capable of doing. As an example, a reductio ad absurdum, just a silly example: I don't care how many AI-powered bots you have operating, I don't want to get artificial respiration from a series of bots that have to figure out how I breathe and why water in my lungs is bad. I'd much rather they knew that before they start figuring out how to treat me. Does that make sense? Too abstract? Too absurd? Yes. Okay. C'est la vie. No, all I mean is a little bit of instruction can get an AI around a roadblock that it might otherwise take multiple iterations to figure out on its own. It's the same thing, beyond the complex AIs we're talking about, with the more simplistic AIs, right? You've got a fleet of drones on limited battery life, and as the fire marshal you need to use them to map out a factory fire and figure out where people can have safe access in and out, where to concentrate the forces that you have, and even simple things like where to radio the people inside, which are the safe evacuation routes and which ones aren't. They cooperate wonderfully using algorithms that have been developed, among other places, at Lakeside Labs in Austria. They've got great programs for that. A lot of cool people doing the work, but they have to have some instruction, as I understand it, to get them started. Yeah. Yeah. I see what you're saying there. The minute you try to introduce a new element, a new depth to it, they need to have some instruction about it. They may come to the idea on their own that, oh, there are horses here and they require a wider path, or there are orchids that have to be rescued, and they can't just be kept away from flame, they have to be nowhere near heat. But unless you introduce those parameters, the AI will take generations to figure them out, and in a fire you might not have time for those iterations; a sketch of the idea follows. And there's also the hard fact, coming back to your hammer, that just because we have AI tools now doesn't mean they're always the best tool for the job. Hear, hear. I'm so happy to hear you say that. Yes. And I think a lot of special effects houses will be finding that out. And I think a lot of movie producers found that out during the writers' strike. There were a lot of scripts floating around that were just horrible scripts. Well, we recently had an example of that in Scotland. Did you hear about the Willy Wonka factory? Oh, yes. So that's someone very, very, very confidently not using the tool very well. Or perhaps using it exactly as much as they wanted. Exactly as much as they wanted, yeah. They sold a lot of tickets. And maybe the idea was, once I sell enough tickets, I can start building this stuff.
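To illustrate the point about injecting known constraints up front rather than waiting for them to be learned, here is a minimal sketch of a path-cost function where the domain knowledge, a wider clearance for horses and a heat limit for the orchids, enters as explicit parameters. The whole setup is our own hypothetical and has nothing to do with Lakeside Labs' actual algorithms.

```python
def path_cost(path, heat_map, clearance_m=1.0, heat_limit=60.0):
    """Score a candidate evacuation path; domain constraints are inputs.

    `path` is a list of (x, y) waypoints and `heat_map` is a callable
    returning a temperature estimate at a waypoint. Both the clearance
    and the heat limit are told to the planner up front, instead of
    being discovered over many failed iterations.
    """
    cost = 0.0
    for point in path:
        temp = heat_map(point)
        if temp > heat_limit:
            return float("inf")          # hard constraint: route unusable
        cost += 1.0 + temp / heat_limit  # soft preference for cooler routes
    # wider required clearance makes narrow routes effectively pricier
    return cost * clearance_m

# Hypothetical usage: horses need double clearance, orchids a lower heat limit.
flat_heat = lambda p: 25.0
route = [(0, 0), (1, 0), (2, 1)]
print(path_cost(route, flat_heat, clearance_m=2.0, heat_limit=40.0))
```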
Maybe. That was probably the case. Having worked in Silicon Valley a fair bit, I've met a lot of people who go into it with that attitude. And for reference, for anyone listening in the future where the immediate story doesn't make any sense: there was a Willy Wonka event in Glasgow, Scotland, which was pretty much a scam, and the person running it half-arsed it, so to speak. Everything, the promotion and the website, was all done by AI and got people excited. The scripts were done by AI. And then they didn't really deliver on the experience. The experience looked, to the uninitiated, as though it would be a fantastic 3D virtual reality experience full of candy. And according to one news story, when they got there, there were a couple of posters that were printed and put against bare concrete walls. Generative AI posters as well, specifically. And then the candy experience was one jelly bean and half a glass of lemonade per person. Yep. Not quite the same as the giant mushrooms made of candy that were in the photos. No. No. There it is: Willy's Chocolate Experience. Indulge in a chocolate fantasy like never before. Capture the enchantment. The other brilliant part about it all was that, to avoid copyright, it wasn't Willy Wonka. It was Willy something-or-other. Yeah, Willy's chocolate. No, I think they called him Willy McSomething. Oh, did they? Yeah. I didn't know that he had a last name. He did. It was Willy McSomething, I'm sure. Look at that one. Hold on, go down. Go down. So if you're listening to audio only, we're looking at the website, willyschocolateexperience.com. And we have, just scroll up, look at that: "catgacating live performances". A "pasadise" of sweets. Yeah. And cheering, because we know, a "pasadise of sweet teats". Yep. They didn't even bother editing this. It looks like DALL-E, so I would imagine this came from GPT. They were probably speaking directly to GPT to create it, and didn't even bother to edit the text, because we know a lot of these models still struggle with generating text inside images. Yeah, that's wonderful. "Catgacating live performances". "Cartchy tuns". "Cartchy tuns". "Exarserdray lollipops". A "pasadise of sweet teats". Yeah. By the way, that's not a dirty word. The spelling here, in North American English, T-E-A-T-S, that's what you call the organ that is used to funnel milk out of a cow. And the pronunciation of that in North American English is "tit". Yeah. Yeah. I'm not sure how it's said in Scotland, but I just want to be clear, I wasn't saying a dirty word. That's the part of the cow I used to grab hold of in order to milk it. Yeah, I think, if we're being polite, we would say "teat". Teat? Really? Yeah, we'll say that. Yeah, but that's not how it's pronounced in Canada. No, no, we do use the word tit all the time. Sometimes to describe people as well. Like someone who distinguishes between Canadian and Scottish pronunciations. Yeah, right tit. Excellent. Nice to know what people are saying. So John, I want to remind you, you wanted to talk about something that happened in the US. Oh, yeah, great. Nice callback. There you go. I think it's quite related to what we're talking about. They're all a bunch of tits. They truly are. Some of them more than others. And one in particular is famously a right tit to just about everyone who isn't upper class and white and of a German background, as he sometimes claims and sometimes doesn't. So Mr. Trump has recently released a bunch of campaign posters, which were obviously generated by AI.
For one thing, he looks vastly healthier than he does in real life. But these posters are specifically aimed at showing him with what they call an African-American demographic. So it's him with a bunch of black men in some photos and some illustrations, him with a bunch of black women in other photos or illustrations, and him with mixed groups in other illustrations. And they're all generated by AI, because there has never been a photo taken of Trump surrounded by people of a different race than he is, if you'll pardon my using a disgusting term like race. So you will not find photos of Trump surrounded by black men or black women. You will not find Trump in a group where they've got arms around each other's shoulders. But they've released these images trying to say, yeah, here he is. Here he is, a man of the people. The thing is, the images are visibly fake, not just because I don't believe Trump has ever smiled that way, but because of missing fingers, because of extra arms. There's a great photo of a black campaigner talking to a black potential voter, and he's got an extra arm down by his side. And somehow, among the people who generated the artwork and the people who approved the artwork and the people who put it online, none of them either noticed or dared to say, that's a three-armed guy. Yeah, that's quite the oversight. Fake news, as he coined it. Yes, indeed, as he might loudly proclaim. Very fake news, which unfortunately isn't news at all. But I'm really hoping that the rampant use of this technology by incompetents will knock some of the wind out of everyone's sails, in terms of the hurry to jump on bandwagons. When it was just verbal encouragement and the occasional photograph, people created mental models of the horrible claims: "Oh, they're trying to slip microchips into us with vaccines." "Oh my gosh." "Oh, you can cure that with bleach." "Oh my gosh." There was nothing that forced you to admit on the spot, even from a position of ignorance, that this was fake news. Whereas when you see that the man next to Trump has three arms, maybe you can see for yourself: hey, that's a fake photo. So I like that. I hope the propagandists continue to make those mistakes. Yeah, making it completely absurd, to the point where the emperor really has no clothes, to the point where we all have to point out that the emperor has no clothes. Yeah, where even the emperor's favorite defenders might have to say, okay, yeah, we can see his bits. Yeah. Interestingly, while we're talking about weird things happening with these generative models, there's also, on the other end of things, because both extremes are weird and wonderful these days, left and right, an inverse oopsie that Google's Gemini was making. I don't know if you saw this. So Google's Gemini model is multimodal and will create images for you. I experienced this with Midjourney myself, because I was trying to make some images and content, and it would not be very diverse. It would always just give me good-looking white people, all the time. So I would have to put in the word diverse; the second I do that, it gives you the wide spectrum of humanity. Google apparently knew that too, to the point where they were hiding that word in the prompt for a lot of their images, to the point where it was making some embarrassing mistakes when people were writing things like, "Show me German Nazi soldiers," and they would all be of every race apart from German.
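Mechanically, that kind of band-aid is trivial to wire in, which is part of why it's tempting. Here is a minimal sketch of silent prompt rewriting, with everything, the trigger words and the injected text, invented by us for illustration; Google has not published how theirs actually worked.

```python
def augment_prompt(user_prompt: str) -> str:
    """Silently rewrite an image prompt before it reaches the model.

    Illustrative only: if the prompt seems to involve people, append a
    hidden instruction the user never sees. This patches the symptom
    (homogeneous outputs) without touching the underlying model, which
    is exactly why it backfires on historically specific requests.
    """
    people_words = ("person", "people", "man", "woman", "crowd", "soldier")
    if any(word in user_prompt.lower() for word in people_words):
        return user_prompt + ", diverse group of people"
    return user_prompt

print(augment_prompt("Show me German Nazi soldiers"))
# -> "Show me German Nazi soldiers, diverse group of people"
```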
So they completely ran the other way, which was ultimately just as irresponsible. Yeah. I'd like it very much if you would just repeat, one time, for the sake of all the listeners, the fact that Google did something deliberate to hide a flaw in their software, and that caused that error. Would you mind just saying that again? Because that's very cool and very important. The fact that they tried to correct it by hiding the word diverse in the prompt. Yeah. So they knew that there was a weakness in the code they were releasing and having everybody use, the code they had just renamed as part of the big launch. I was an early user of that, so I get all the emails, including the notice that it's changed names now and everybody can start playing with it. The fact that they decided, "Oh man, this makes us look racist, every picture is clearly white and mostly male unless it's overly sexualized. Boy, we'd better hide a prompt in there to change that," instead of saying, "Let's fix it." Let's apply a band-aid rather than heal the wound. That's exactly the kind of thing I'm talking about all the time with the companies that do anything to do with AI. They got away with it with a lot of other algorithms they use. Now they're trying to get away with it with something that thinks in a way they don't understand, but that does do some kind of thinking, some kind of cognitive processing that the AI itself cannot explain. So why on earth would you want it to think in a flawed way and continue being accessible? Well, they very quickly took it down after that was identified. Once enough people posted pictures of black Nazis. That's the one that was in the news. But that happened. That was the second wave. The first wave was, "Hey man, we're not getting any diversity." That's when they should have taken it down, rather than saying, "We're going to insert the word diversity into everybody's prompt so that it starts making diverse crowds, so that our racist app starts making diverse decisions or showing diverse things." And I'm not saying that so that Google can come after me and say, "Hey, we're not a racist company." I'm not saying Google is a racist company. I'm saying that they created something that behaved in a way that looks racist, and their solution was to hide it rather than fix it. Or a quick, cheap fix. But not a fix. They didn't try to fix what was wrong. They tried to work around what was wrong. If you did that with a car you were selling to the public, you would be put out of business. If you did that with a drug you were selling to the public, you'd be put out of business. But because it's software, they get away with it. And that's really dangerous, because software isn't just on our computers anymore. It's everywhere. Well, John, I think that's probably a good place to end this one. We're going to have to think of topics to talk about next. I hope some folks will suggest them. I understand we might have one or two listeners. Yep, yep. All three of you. All three of you, please let us know what you think we should talk about next. I'm going to look right into the camera. All three of you, please let us know what you think we should talk about next. And yeah, with any luck, we'll get some suggestions. If not, we'll come up with some ourselves, and who knows how dangerous that could be. Absolutely. We might even dive more into the art side of things. We do quite often, but maybe we'll just talk about film. That would be lovely. I'd like that very much.
Technology and the history of film. Oh man. Don't get me started on the Lumière brothers. We'll be here for hours. Yep, yep, yep. There we go. Okay, well, there you go, folks. Something exciting to look forward to: Jimmy and John rambling on again, going off on weird tangents and maybe sometimes tying it back to what they initially started talking about. If you have been listening to this point, we really, really appreciate it. And thank you. And we hope to see you next time. Take care, folks. Please come on back. Sock Talk is a production of the Robert Gordon University School of Computing. Today's episode was brought to you by the letter pi and the number pi.