manas96 2 days ago [-]
In my day job I program rigid body behaviour in real time amongst other simulations.
I think rigid body contact is hard to learn as it is inherently discontinuous, something you discover when trying to code a solver.
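To illustrate the discontinuity mentioned above, here is a minimal sketch (not any production solver) of a point mass dropped onto the ground in 1D. The function names, the restitution value and the time step are all assumptions chosen for illustration. Between contacts the velocity changes by a tiny amount per step; at contact it flips sign in a single step.

```python
def simulate_bounce(height=1.0, restitution=0.5, dt=0.001, steps=2000, g=9.81):
    """Drop a point mass onto the ground at y = 0 and record its velocity.

    Hypothetical toy model: semi-implicit Euler for free flight plus an
    impulse-style velocity reflection when the mass penetrates the ground.
    """
    y, v = height, 0.0
    velocities = []
    for _ in range(steps):
        v -= g * dt              # smooth part: gravity nudges the velocity
        y += v * dt
        if y < 0.0 and v < 0.0:  # contact: penetrating and still moving down
            y = 0.0
            v = -restitution * v  # velocity jumps within one step
        velocities.append(v)
    return velocities


def largest_jump(values):
    """Largest change between two consecutive samples."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    print("largest per-step change in free flight:",
          largest_jump(simulate_bounce(steps=100)))
    print("largest per-step change with contact:  ",
          largest_jump(simulate_bounce()))
```

In free flight the per-step change is roughly `g * dt`, while the contact step changes the velocity by several metres per second at once. A model trained on smooth motion has to learn that jump from comparatively few frames.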
As such I always use this prompt as a test:
"A video of a jenga brick tower falling over as a brick is removed. The physics of each brick must be realistic."
It gave me a video where bricks suddenly disappear or morph into others[1]. The linked video is after 2-3 iterations of me insisting on realistic physics. If you are just glancing at this, you would believe it is realistic.
That said this is still very impressive and one more step towards .. IDK what. But I am a bit reassured that at least my job won't be fully replaced with AI :)
My point here being that representationally, it might be possible to learn good dynamics without a radically different approach/arch. There are already models that extract 3D tracking points from videos, so they could possibly be leveraged for learning dynamics (which on its own gives precedent for end-to-end approaches also possibly working).
manas96 1 days ago [-]
Thanks for the additional reading. I've often thought about LLMs and their ability to represent the physical world with its laws. And always concluded it is not really possible to do so with "just" text tokens and their relations in a latent space. It looks to me there are different approaches being taken to tackle this:
* You could instruct your LLM to interact with a simulator to run experiments and infer behaviour
* You could edit the transformer model and inject spatially relevant data rather than text as is done in above paper
* You could change the architecture to be more conducive to representing a world state, e.g. LeCun's JEPA world model.
* You could further enhance some of the above by using a differentiable physics engine (e.g. NVIDIA Newton) to calculate losses directly.
But at the end of the day if a model has any hope to always produce realistic physics, it HAS to learn the laws of nature in some form or other. It looks to me that the next big leap could be achieved by combining the last two approaches.
P.S.: I like discussing such topics. If anyone knows a forum or discord with like-minded people, please let me know :)
E-Reverance 16 hours ago [-]
> P.S.: I like discussing such topics. If anyone knows a forum or discord with like-minded people, please let me know :)
Unironically twitter (and only use the "Following" tab as opposed to the "For You")
Make an account that only follows university affiliated researchers with less than 1000 followers. In my experience discord servers get suffocated by beginners and crackpots because conversations don't naturally self-organize into their own threads.
manas96 15 hours ago [-]
Thanks, I'll try using the "Following" tab. I have a lurker account but never really used it because I only ever saw crap in "For You".
AgentMasterRace 1 days ago [-]
I'm not sure why, especially because you're a developer... But damn, the amount of people that expect AI to just one shot stuff is hilarious. Half of the time I make a typo or something, should I be laughed out of the room?
AlecSchueler 1 days ago [-]
They said the given example took 2-3 iterations. If you think it could be done in 4-5 etc maybe you could share your own result?
manas96 1 days ago [-]
I did prompt additional times insisting on realistic physics..
nine_k 2 days ago [-]
Such videos are essentially dreams: how it feels that the planks should move, not what equations of rigid body physics would compute. And the feeling is realistic (even if overly dramatic in the end). If "stylistic transfer" works for static pictures spread out in space, why won't it work for the character of motion spread out in time?
darkwater 2 days ago [-]
I wonder what's the training data that makes it generate the final "explosion"...
Unai 24 hours ago [-]
Interestingly, the video on the announcement also starts with some papers and a toy car on a wooden table exploding like those jenga pieces.
jddj 2 days ago [-]
A little too much Michael Bay
tiahura 2 days ago [-]
I was thinking eleven.
badsectoracula 2 days ago [-]
The physics engine glitching is very realistic :-P
sbinnee 1 days ago [-]
Classic 3D simulation artifact with boundary conditions. I remember an assignment where I had to model liquid with rigid bodies: they would suddenly gain infinite force at the corner and just disappear. It's clear that they must have used a lot of these kinds of synthetic data. But what's impressive to me is that with every release of these models, I feel the uncanny valley less and less.
oceansweep 2 days ago [-]
Totally unrelated, but what would you say the feasibility of writing simulation software for simulation of/replicating body movements during/in a martial arts technique would be?
I’ve often thought it would be very handy to have a proper simulator for being able to simulate and identify inefficiencies in one’s technique, but no idea whether it would be feasible to do.
manas96 1 days ago [-]
I think modelling accurate articulated body dynamics is feasible but when you add deformation (muscles) it gets much harder.
jackling 2 days ago [-]
Would be similar to the typical simulations of humanoids. If you need to model the deformations of the human body, or get a proper model of tendons that make up humans, it'll be more difficult, but possible.
Proper simulators for those exist; you essentially need an engine with a compliant contact model. MuJoCo is the go-to here, see:
These explicitly model biological muscles. IIRC it was originally created to model human hands (I could be misremembering though).
Really depends on the fidelity you want.
Edit: I also work in rigid body simulation for robotics.
manas96 1 days ago [-]
Indeed, it entirely depends on which axis you want to focus on. A loose trade-off chart would be speed, stability and accuracy. You can only have two of these in a simulator.
Robotics folks probably want speed and accuracy. I'm from the video game industry so I generally look for speed and stability.
Note: This is a loose analogy and recent techniques are already blurring the lines between these axes.
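As a rough illustration of the stability axis in the trade-off above, here is a sketch comparing two integrators on a stiff spring at the same step size, so the cost per step is about the same. The spring constants, step size and function names are assumptions picked for the demo, not taken from any particular engine.

```python
def spring_energy(x, v, k=100.0, m=1.0):
    """Total energy (kinetic + potential) of a mass on a linear spring."""
    return 0.5 * m * v * v + 0.5 * k * x * x


def explicit_euler(x, v, dt, steps, k=100.0, m=1.0):
    """Both position and velocity are advanced from the old state."""
    for _ in range(steps):
        a = -k * x / m
        x, v = x + v * dt, v + a * dt
    return x, v


def semi_implicit_euler(x, v, dt, steps, k=100.0, m=1.0):
    """Velocity is updated first, then position uses the new velocity."""
    for _ in range(steps):
        v += (-k * x / m) * dt
        x += v * dt
    return x, v


if __name__ == "__main__":
    dt, steps = 0.05, 200
    print("initial energy:      ", spring_energy(1.0, 0.0))
    print("explicit Euler:      ", spring_energy(*explicit_euler(1.0, 0.0, dt, steps)))
    print("semi-implicit Euler: ", spring_energy(*semi_implicit_euler(1.0, 0.0, dt, steps)))
```

With these values the explicit variant gains energy every step and blows up, while the semi-implicit variant keeps the energy bounded even though neither tracks the exact trajectory. That is the sort of "stable but not accurate" choice that is commonly made when speed matters most.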
soperj 16 hours ago [-]
That[1] video looks very Twin towers. Falls in on itself and then explodes.
christoff12 2 days ago [-]
thanks for intro to streamable
staindk 2 days ago [-]
In my experience (from a couple of years ago), Streamable can be great but it's just worth checking what their current retention policy is like.
We were sharing game clips with each other and after a while realised our old clips were just gone, being deleted after 30 or 90 days or something.
christoff12 12 hours ago [-]
noted!
manas96 2 days ago [-]
it was the first link I got after googling free video hosting sites
christoff12 12 hours ago [-]
I guess I haven't tried hosting/sharing anything outside of an unpublished youtube video or GDrive link in a long time.
aaroninsf 1 days ago [-]
Some serious clipping
torginus 2 days ago [-]
While at a cursory glance it looks as impressive as always, subtle spatial errors, and geometry that changes as it goes out of sight and comes back again, hint at the fact that Google has yet to solve the problem of deep spatial understanding.
Which, considering just how pretty and detailed this whole thing looks, imo points at a fundamental issue with how these things are trained - it's as if there's no structure to its knowledge and training, like how an artist trained to draw would first try to understand simple 2D composition, then perspective, then light and shadow, mastering each concept and gradually building up a hierarchical understanding - it seems like it's trying to learn everything at once.
I would rather see an AI model that I could give a floorplan of a building and it would generate an accurate flythrough on any path, even if it looked like butt.
I'm not just talking out of my arse, I did work for a while in data science/engineering, and one of the big lessons people needed to be reminded of is to clean/downsample the data - a dataset consisting of a million samples could very well take 1000x as long to process as if we downsampled the whole thing to just a couple of thousand samples, and we could reach the same conclusions with a fraction of the expended time/effort.
I'm sure there's a similar logic in RL, that if you dump a trillion samples into the datacenter that consumes the same power as a city, what the model learns is what it could've learned with a much more curated training set and directed approaches.
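The downsampling point above can be sketched with a toy example. This is only an illustration under strong assumptions: the data is synthetic Gaussian noise, the "conclusion" is a simple mean, and the function name and sizes are made up for the demo. Whether the same holds for training large models is exactly what is being debated here.

```python
import random
import statistics


def compare_means(population_size=1_000_000, sample_size=2_000, seed=0):
    """Return (mean of full dataset, mean of a random subsample).

    Hypothetical setup: values drawn from a normal distribution with
    mean 5 and standard deviation 1.
    """
    rng = random.Random(seed)
    population = [rng.gauss(5.0, 1.0) for _ in range(population_size)]
    sample = rng.sample(population, sample_size)  # uniform, without replacement
    return statistics.fmean(population), statistics.fmean(sample)


if __name__ == "__main__":
    full_mean, sample_mean = compare_means()
    print("full dataset mean:", full_mean)
    print("subsample mean:   ", sample_mean)
```

For an aggregate like this, the subsample's error shrinks with the square root of the sample size, so a couple of thousand points already land close to the full answer. Rare events and long-tail behaviour are the usual cases where aggressive downsampling loses information.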
adenta 2 days ago [-]
At first usage I'm not impressed. I've probably spent a couple grand on Seedance 2 to date, and I can't find anything google omni flash does better than Seedance from running a handful of samples through the system. You can find some of the videos I've made in my HN bio link.
kamranjon 2 days ago [-]
Just curious - are you at all concerned about the legal implications of ai-generating property listing videos?
layer8 2 days ago [-]
The legal risk probably lies solely with those who are selling the properties. They are responsible if the video misrepresents anything.
adenta 2 days ago [-]
yeah, it's all about keeping everything grounded in reality.
gowld 2 days ago [-]
But it looks so fake that I wouldn't waste time visiting a property advertised like that.
wcxcv 2 days ago [-]
Agreed, the author surely must see and know this?
leflob 2 days ago [-]
I agree, looks pretty terrible...
HDBaseT 1 days ago [-]
You are misleading people. I think this is disgraceful.
Please grow a spine.
red2awn 2 days ago [-]
I have exactly the same thought. Anyone who has used Seedance 2.0 a bit can tell Gemini is a bit behind, and Seedance 2.1 is on the horizon already.
CommanderData 2 days ago [-]
Seedance 2 is amazing, compared with anything else American tech is producing. It does struggle with consistency like all other models.
The other problem is Seedance is heavily censored because of copyright concerns.
dotancohen 2 days ago [-]
> The other problem is Seedance is heavily censored because of copyright concerns.
Instead of censoring, wouldn't it make sense to simply not train on copyrighted materials?
jarjoura 1 days ago [-]
The problem isn’t training data, it’s reference locking and allowing anyone to make whatever content they want.
dotancohen 1 days ago [-]
What is reference locking?
> allowing anyone to make whatever content they want.
One could draw a Mickey Mouse three-disc logo in inkscape, but nobody would sue inkscape because the application did not know what was being created. Likewise, asking for e.g. "a black disc, with two smaller discs tangent to it at a 105 degree angle" would not be infringing if the model never saw the Mickey Mouse logo to begin with.
enragedcacti 2 days ago [-]
> Prompt: Make it look like the weird shape of my hand hole super zooms and magnifies the ground it's looking at in sharper quality.
There's got to be a reason this is phrased so insanely, right?
bar94 2 days ago [-]
Even weirder:
> Prompt: A skeuomorphism stop motion explainer about how the brain hippocampus works with a compelling voiceover. Don’t add seahorses. No voice cuts at the end. Don’t add text
Seahorses???
gfaure 2 days ago [-]
The genus of the seahorse is _Hippocampus_.
svieira 2 days ago [-]
And the fact that a transformer model can't distinguish between the two in the context of the sentence given is a point against the general nature of the intelligence.
Geee 2 days ago [-]
We are at the point where "don't add seahorses" doesn't actually fill it with seahorses like the previous models.
incognito124 2 days ago [-]
This guy prompts. Insanely astute.
FrostKiwi 1 days ago [-]
The goblins evolved it seems
nightpool 2 days ago [-]
Yes, if you watch the video closely you can see that the "lensing" effect only really covers a circular area—this prompt probably went through multiple iterations where the author was trying to improve it so that the shape of the hand was reflected more closely.
layer8 2 days ago [-]
Image-search for “hand hole” at your own peril.
raincole 2 days ago [-]
At the bottom there is a "Try in Youtube Shorts" button.
your videos and links re: nuts services are very cool. Is that something you're working on with others? What's the best way to keep on top of that?
entropicdrifter 2 days ago [-]
[flagged]
dyauspitr 1 days ago [-]
Yes, each data center uses an entire lake of water. The hysterics are pretty sad and baseless.
Have you ever considered that the solution to not having enough power is to generate more power, not curtail progress?
All of these data center should come with their own solar panel arrays and battery packs. Who knows with enough need they might each come with their own small nuclear reactor.
tsco77 1 days ago [-]
Then, if the Ai bubble is truly a bubble, the giant compute warehouses can be transitioned to power plants and used to store Amazon junk for delivery instead. Wonder what the economic wonkiness looks like when energy is that abundant.
dyauspitr 1 days ago [-]
Anyone that has at least somewhat been following the space knows that this is not a bubble. Once we get to the point where these things can reliably come up with novel actions for physical robots, it’s game over. In fact, Unitree just released a video today where someone asks the robot to do something and it generates a new output action in real time. We’re very close.
suttontom 1 days ago [-]
They can't even reliably follow instructions from text. I think "it's just around the corner/just wait x months/just wait and see bro" is one of the most telling signs of AI psychosis.
dyauspitr 1 days ago [-]
Well, I’m gonna drop out of this because you don’t want to really accept that what we have is genuinely useful. I’ve seen it across multiple companies. It works very well on my team and that has made me a believer. I was skeptical and rightfully so for a very long time.
entropicdrifter 18 hours ago [-]
It can be a useful tool and an economic bubble at the same time. The dot com bubble was due at least in part to the overbuilding of fiber infrastructure in the US.
suttontom 17 hours ago [-]
I think LLMs are extremely useful, mostly for coding. But saying we're extremely close to an AI that can "reliably come up with novel actions for physical robots" feeds into the hype that these tools can do way or are very close to doing more than they're actually capable of, especially when we talk about reliability. That's the kind of rhetoric that has partially created this bubble, because in no world is what you're saying realistic.
The worst thing is when someone cites a video or a demo of an AI doing something and says, "See! It's here!" Remember when the Devin video came out years ago?
You can say "eventually" AI will be able to do xyz, but eventually the sun will blow up, too, so what the fuck are we talking about?
entropicdrifter 18 hours ago [-]
[dead]
baq 2 days ago [-]
We could be solving fusion power and instead we’re generating videos of birds in space or something. The market is a harsh mistress sometimes.
kenjackson 2 days ago [-]
I'm an AI optimist. But AI video is probably the one thing that does depress me. Seeing that we can make anything visually, there's nothing that impresses me visually. I watch a video that two years ago I would've thought was really cool, and now my first thought is, "Yawn, is this AI?".
Video, more than anything else, is the place where I really care if something is AI or not. If I could get a TikTok that had no AI usage -- I'd be in. Which is weird for me, because I'm typically the guy who is all-in on AI.
raincole 2 days ago [-]
It ruined the whole category of "cute animals acting goofy" content for sure.
slfnflctd 2 days ago [-]
Yeah, I'm kinda sad about that one. Most of my friends and family are aware many of these are fake now, but argue that it still invokes the same response in us so it's okay. For me, though, however intangible or irrational it may be, I do feel a sense of loss.
Funny enough, this is actually one of the few things which has bothered me with the AI boom, and I'm mostly pro-acceleration. A lot of what's happening seems inevitable. But surprisingly, knowing that cat or dog or bird or lizard or butterfly or whatever has a strong chance of being generated really does take something out of it to my mind. And I say that also knowing the extreme amount of staging which has long gone on with traditional nature videography. Somehow, knowing the animal is real means something... I'm still trying to figure out how to better understand and express this.
Vachyas 1 days ago [-]
In addition, even knowing it's not real, I feel like I can't appreciate it as much as I did (or would've) a well-made clip that I knew was CGI.
impulser_ 2 days ago [-]
I think the opposite. It allows more people to be creative. Similar to how the DAW allowed more people to become musicians. You can produce a hit song with just a laptop now.
Now you can have people producing videos without needing a crew of people.
LetsGetTechnicl 2 days ago [-]
You never needed a crew of people to make videos. This is just outsourcing people's creativity.
criddell 2 days ago [-]
The potential for harm is so much greater with video than creating an mp3. You can stoke hate and fear so easily.
baq 2 days ago [-]
The method in the madness is to generate so much on demand slop no one will accidentally find your hate and FUD content anyway.
criddell 2 days ago [-]
It will be found because our politicians will share it.
bethekidyouwant 1 days ago [-]
Or the opposite? all tools are dangerous…
jplusequalt 1 days ago [-]
>I think the opposite. It allows more people to be creative.
Why are you assuming that a majority of people don't already have the means to make videos? Many people have access to a phone, laptop, and stable internet connection. What else do they really need? What's stopping them from using their phones to shoot home movies, making animations with MS Paint, recording themselves talking about a subject they're genuinely interested in, etc.?
>Now you can have people producing videos without needing a crew of people.
This is conflating production values with creativity. Mr. Beast's videos cost millions of dollars to film and produce, yet they're creatively bankrupt.
senrex 14 hours ago [-]
I think it was around 2010 or so that I used to upload just god-awful music to early SoundCloud because it was easy to make music with a DAW.
I even remember being on a psytrance production music mailing list 25 years ago and 95% of the tracks people posted were absolutely terrible, including myself.
I have seen a few incredible pieces from AI video but most has just not been that interesting. Then even the incredible pieces are 5 second one offs. No narrative, no continuity. I think of a random, real 5 second clip from Clockwork Orange with no backstory or context in the movie, who cares? Even the most visually interesting scenes wouldn't make sense and would be boring.
Right now it seems like we are at the stage of sampling random 5 second clips from early SoundCloud and concluding this is the artistic utility of an entire new technology like DAW software and VST synths. That is obviously absurd.
criddell 2 days ago [-]
For a few weeks, YouTube thought I wanted to see videos of package thieves being surprised by a booby-trapped box that was actually a glitter bomb. Video after video were these AI created shorts of supposed doorbell camera footage showing a thief running away with a box that explodes into a giant pink cloud.
I eventually picked one and opened the comments and the top comment was something like "This is obviously an AI video. Who watches this?" and the reply was along the lines of "me because I like seeing thieves get what's coming to them".
So you, like me, aren't interested in AI videos but I think there's a lot of people who don't care if it's real or not.
Thankfully, YouTube eventually stopped showing those to me. Now it thinks I'm interested in road rage videos. My YouTube feed outside of the three or four channels I've subscribed to is terrible.
r_lee 2 days ago [-]
> and the reply was along the lines of "me because I like seeing thieves get what's coming to them".
I really wish a subject matter expert would pitch in to tell us what this is about?
like a totally made up thing that is fake, somehow gives a sense of justice and satisfaction?
is it something about imagining it happening in reality, or what?
for me, if I see that something is AI, it's like I just feel nothing. because there's nothing in it, it has nothing of real value? like it doesn't evoke anything in me, it doesn't make me think "this was a great find!" or make me want to send a link over to my friends, etc.
criddell 2 days ago [-]
Do you ever feel a sense of satisfaction watching a movie? I'm thinking of scenarios like when the bad guy is finally defeated or the hero achieves their goal.
Nition 1 days ago [-]
I think with a book or movie, a lot of the emotional reaction actually comes from the work of the human that created it. You can feel the emotion of the creator and the story they set out to tell and have some connection with them. You make a good point about how we've always been able to emotionally connect with fiction, but low effort AI does feel different.
adampunk 23 hours ago [-]
What causes the emotional reaction in a film is moving images in front of you in sync with sound. Further, even the simplest of movies is the product of more than one person with more than one emotional state. What causes the emotional reaction in a book is you reading and understanding what text is in front of you. The emotional reaction can happen in the radical absence of the author and in total contravention of their alleged will!
This is the whole Dhar Mann genre, which is so cringe, but it definitely tickles something in us.
bethekidyouwant 1 days ago [-]
As in any form of fiction?
nowittyusername 2 days ago [-]
You get back as much as you put in. Just like with all generative tools the quality of the output depends on the quality of input. Slapping a prompt together will only get you so far, if you want the models to generate something really striking and unique you need to get your hands dirty. Gotta break out ComfyUI and build yourself a specific workflow, once you dig deep and understand how things are put together, why and so on, you can make really amazing stuff with any generative models. But you have to pay for that experience in patience and knowledge.
jplusequalt 1 days ago [-]
>Gotta break out ComfyUI and build yourself a specific workflow, once you dig deep and understand how things are put together, why and so on, you can make really amazing stuff with any generative models.
Where is this amazing stuff? Social media is a marketplace of ideas supposedly, so why haven't we seen a new wave of creators rise up in popularity?
nowittyusername 3 hours ago [-]
Because there is a stigma about the use of AI in creative spaces, the people that do use it to create very impressive pieces don't disclose that information on their profiles. People tend to see AI mentioned anywhere in the profile and automatically shit on the work regardless of its beauty or creativity. They don't consider the staggering amount of work that goes into the pics with all the ControlNets, custom hyperparameter tuning, custom finetuned LoRAs, and many other techniques like workflow chaining and such. They automatically assume someone only spent 5 seconds on some slop prompt and that's it. But I can assure you that if there is no mention of AI anywhere, everyone who looks at the work is always impressed. So you have an observation bias situation going on. You see only AI slop because a. most of it is low effort slop and b. the good stuff you assume had no AI in it because it wasn't disclosed by the artist.
I tried to watch it, but TikTok kept throwing up a dialog over top asking me to slide a puzzle piece into place. I did three or four before just closing it.
blt 1 days ago [-]
It's funny how they specifically use the phrase "output that follows real-world physics" to describe the marble rolling video. At the end of the zigzag track, the marble jumps up for no reason. In a couple of other places it speeds up with no apparent energy source. It's still an amazing result, but they could have picked a better example for this claim!
There are tons of behind the scenes pictures and video of the Rocky puppet being used on set, and Andy Weir talks in interviews about how almost no CG was used to enhance the puppet. I guess it's possible to fake all that, but it's a lot of lie to cover up.
mrandish 2 days ago [-]
Andy Weir is a wonderful novelist and was truthfully relating his understanding but he's not a VFX person.
I didn't see the quote you did but he probably confused the fact that PHM used physical elements in place of some CGI in certain scenes and the separate fact that a realistic physical puppet was used on set for reference. Some parts of that puppet are seen on-screen in some shots but most of the creature in most shots was CGI or CG enhanced (which looked great thanks to the ideal in-camera puppet reference it replaced). I explained more here: https://news.ycombinator.com/item?id=48198851
mrandish 2 days ago [-]
I agree PHM was great (and I loved the book before the movie). But as a VFX person, please be careful not to buy into the currently popular studio PR line: "it's all real, almost no CGI". Media and influencers love this line and often unknowingly muddle the studio's very carefully crafted press release wording into outright lies by paraphrasing and making assumptions. The problem is these aren't just white lies, they deprive some very talented VFX artists from getting credit for amazing work.
About the misunderstood puppet: A real Rocky puppet was indeed used on set (actually a few different puppets) and some of the puppet is sometimes seen on camera. But most of the puppet was digitally replaced with CGI or CGI-enhanced in most of the scenes. However, using a much more realistic puppet on set is indeed notable but not because the character wasn't CGI. The puppet is worth talking about because it directly enabled the final mostly-CGI character be really good CGI. It's good because shooting the physical puppet gave the VFX character animators an ideal reference that's "grounded" in the physical reality of the set, camera and lens. The subtle interplay of light, shadow, texture and specularity in the CGI are all grounded in reality. The puppet also let the actor interact with something closer to reality. It's a wonderful technique and should be celebrated instead of obfuscated to promote a "No CGI!" falsehood that trends well on social media.
Also, PHM did use real sets (like most movies) and they were able to avoid using green screen for some of the ship exteriors but those backgrounds were still digitally replaced with CGI rendered elements, they just didn't use green screen to pull the matte. But on social media, "No green screen" (true) was conflated into "No CGI" (false). Instead of green screen they used a black backdrop with careful lighting and some hand rotoscoping to extract the digital mattes. Doing it this way had the advantage of not needing to digitally remove green spill on reflective surfaces by hand and it saved money over doing a StageCraft virtual volume at that size. Done well, a green screen could have produced the exact same shot but it would have cost more and taken longer.
But influencers and media are unintentionally perpetuating "No CGI" myths instead of focusing on the actually interesting, more nuanced reality. Using more and better physically grounded references on-set IS a breakthrough that helps turn bad CGI into great CGI. Another example is Top Gun where "artfully misleading but technically true wording" in studio press releases grew into outright falsehoods online. Tom Cruise was truthful in saying that he was flown in a jet right alongside other REAL jets doing simulated dog-fighting. The lost nuance is that all the other jets Cruise flew with in those dog fight scenes were old Soviet trainer jets that look quite different and are much smaller than real MIGs. So the trainer jets were entirely replaced by CGI MIGs in post and are never seen in the final film. And we couldn't tell because the digitally removed jets provided ideal grounded reference for the CGI pixels that replaced them. And that's how we ended up with several famous YouTubers proclaiming "These are REAL jets, not CGI!" while showing 100% CGI jets. Same with Wicked and the CGI tulips. The fact that Wicked used thousands of specially grown tulips on-set (true) was confused into proclaiming "ALL these tulips are real, no CGI!" (false) while showing a scene where >90% of the tulips were CGI.
wcxcv 2 days ago [-]
There's a Steve Jobs quote about this
mackeye 2 days ago [-]
you would watch a movie generated with the sterility of an LLM?
nomel 2 days ago [-]
AI is already in a bunch of creative workflows. Just look at modern Photoshop. Selecting and hitting delete has AI infill for the background replacement.
Creators can use these video gen AIs in various ways. There are some YouTube channels of people using these in creative workflows that are really impressive, from mocap replacement, character insertion, background replacement, changing camera angle in post, animating/inserting characters from character boards, animating between stills generated with traditional methods, etc. It's not just "prompt and generate". It can be, because it's easy, but it also doesn't have to be. It's a tool.
mackeye 2 days ago [-]
i do photo restoration as part of my research (bizarre place to be for a math undergrad), so i do think AI is a lifesaver for very small adjustments that would be tedious or subpar otherwise. i just disagree that its creative output is of value (which isn't the case you made, anyway).
CommanderData 2 days ago [-]
I do wonder how studios are working around consistent human faces, it's a problem on almost every discussion forum I have read for AI videos and not something that seems to be solved yet.
Do you have any examples of those creative workflows that have made it into Hollywood for example?
raincole 2 days ago [-]
I think Hollywood's obsession with unnecessary sex scenes[0] is the #1 reason I have been watching less and less movies. So yeah, probably.
[0] e.g. Don't Look Up
okdood64 21 hours ago [-]
> Don't Look Up
That was just a bad, mildly entertaining movie.
drusepth 2 days ago [-]
Weirdly phrased, but yes, I would watch a movie generated with an LLM by a person passionate about the movie they're creating.
garciasn 2 days ago [-]
Sure; why not? It has to be better than some of the absolute garbage that's out on the various streaming services today; right?
mackeye 2 days ago [-]
god help us if we have to choose between the two );
tiahura 2 days ago [-]
I’m willing to condition long duration copyright on streamers being able to implement mature content edits.
senko 2 days ago [-]
Have you seen the past dozen or so Marvel movies?
mackeye 2 days ago [-]
i've tried not to
breppp 1 days ago [-]
I have been watching movies generated with the sterility of CGI for decades
yojo 2 days ago [-]
Me? No. My kids? I think they already have. I don’t allow YouTube in our house, but they for sure watch slop with friends.
jarjoura 1 days ago [-]
Hollywood is using this tech already. Storyboarding and previs work has already fully become driven by AI tools.
advisedwang 2 days ago [-]
At the moment the duration of each shot is a major limitation. When that limitation gets solved is when we'll see actual disruption.
boredhedgehog 2 days ago [-]
Average shot length is down to something like 3 seconds in modern cinema. That's a pretty low bar.
amelius 2 days ago [-]
What I'm hoping/waiting for is IMDB users creating alternative endings of movies.
It could make the comments section even more fun.
kermatt 1 days ago [-]
> I can create more videos as soon as your limit resets. Check your usage in Settings.
I have not used Gemini in a month.
jackson_mile 1 days ago [-]
To be honest, I think the performance of Gemini Omni Flash is still not as good as Seedance 2.0. You can try using both models on this platform. https://omnivideoai.co
randomthoughts5 1 days ago [-]
What's the end goal of video generation? It feels unnecessary. Text generation leads to AI that can replace workers. Video generation is bad and only for video content generation, like movie and tv show production?
dsign 2 days ago [-]
So it's really good, and we have reason to never again believe anything that happens in a video. Unless there's a super-product somewhere to authenticate footage?
svieira 2 days ago [-]
Now that they've broken the ability to trust video, they're looking to build it back, as long as you're allowed to use the tools:
But it very much is "close the barn door after the horse has bolted and the barn has otherwise burned down".
spogbiper 2 days ago [-]
It seems like this super-product will have to be a thing soon or we will have to just stop using video evidence in court and other critical applications
nl 1 days ago [-]
Interestingly the `o` in GPT-4o stood for Omni too (which I never realized until yesterday when reading random 3rd party documentation)
randomthoughts5 1 days ago [-]
this is omni pro max
dwa3592 2 days ago [-]
Even though I don't have words to express how impressive this capability looks, I am genuinely scared of the harmful use cases of this.
King-Aaron 1 days ago [-]
The people that think this output looks good are the same people that "don't get" art.
From a technical perspective, it's very impressive, no doubt. But from an artistic perspective I thought all of these examples on the site look bad.
Who is creative enough to drive this in any meaningful way?
Certainly not me - you have to be a great artist/designer to even imagine what to do with it.
mrandish 2 days ago [-]
Back in the 90s, during the first wave of the desktop video revolution when desktop editing became possible and consumer camcorders got pretty good, there was a popular marketing slogan: "Now your imagination is the only limit."
I used to joke that was the moment we discovered "for most people that's a pretty big limit."
uejfiweun 2 days ago [-]
Does anyone else feel like Google is just always a dollar short and a day late here? Maybe not a dollar short, but it's like they've consistently been focused on the wrong thing. First they missed chatbots, now they're missing coding agents while they double down on chatbots and video gen (which OpenAI has already basically abandoned). Maybe this strategy is actually genius and I'm too stupid to grasp it.
jarjoura 1 days ago [-]
Nano Banana Pro is still the industry standard as far as I’m concerned. I think giving a vision model spatial awareness is the next evolutionary step here, so I don’t think they’re behind at all.
vldszn 1 days ago [-]
When I click the link, the website crashes on my iPhone 13 iOS Chrome lol
As such I always use this prompt as a test: "A video of a jenga brick tower falling over as a brick is removed. The physics of each brick must be realistic."
It gave me a video where bricks suddenly disappear or morph into others[1]. The linked video is after 2-3 iterations of me insisting on realistic physics. If you are just glancing at this, you would believe it is realistic.
That said this is still very impressive and one more step towards .. IDK what. But I am a bit reassured that at least my job won't be fully replaced with AI :)
[1] https://streamable.com/2em1r3
I honestly can't comment with certainty that training from videos alone and whatever tokenization scheme they're using will ever get perfect dynamics.
However it is worth noting that transformers can do a pretty good job at learning dynamics with the right pipeline (not video): https://arxiv.org/pdf/2605.15305 https://arxiv.org/pdf/2605.09196
My point here being that representationally, it might be possible to learn good dynamics without a radically different approach/arch. There are already models that extract 3D tracking points from videos, so they could possibly be leveraged for learning dynamics (which on its own gives precedent for end-to-end approaches also possibly working).
* You could instruct your LLM to interact with a simulator to run experiments and infer behaviour
* You could edit the transformer model and inject spatially relevant data rather than text as is done in above paper
* You could change the architecture to be more conducive to representing a world state, e.g. LeCun's JEPA world model.
* You could further enhance some of the above by using a differentiable physics engine (e.g. NVIDIA Newton) to calculate losses directly.
But at the end of the day if a model has any hope to always produce realistic physics, it HAS to learn the laws of nature in some form or other. It looks to me that the next big leap could be achieved by combining the last two approaches.
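A minimal sketch of the "differentiable physics engine to calculate losses" idea, under loud assumptions: this is not NVIDIA Newton or any real engine, just a hand-written 1D falling-body integrator that carries the derivative of the state with respect to gravity alongside the state itself. The names `simulate` and `fit_gravity`, the step size, and the learning rate are all made up for illustration.

```python
def simulate(g, steps=100, dt=0.01):
    """Explicit-Euler drop from rest.

    Returns the positions and d(position)/d(g) at every step.
    """
    x, v = 0.0, 0.0          # state: position and velocity
    dx_dg, dv_dg = 0.0, 0.0  # sensitivity of the state to g
    xs, grads = [], []
    for _ in range(steps):
        # Each update rule is differentiated alongside the update itself.
        x, dx_dg = x + v * dt, dx_dg + dv_dg * dt
        v, dv_dg = v - g * dt, dv_dg - dt
        xs.append(x)
        grads.append(dx_dg)
    return xs, grads


def fit_gravity(observed, g=5.0, lr=5.0, iters=60):
    """Gradient descent on g using a loss computed through the simulator."""
    n = len(observed)
    for _ in range(iters):
        xs, grads = simulate(g, steps=n)
        # Gradient of the mean squared position error with respect to g.
        dloss_dg = sum(
            2.0 * (x - o) * dx for x, o, dx in zip(xs, observed, grads)
        ) / n
        g -= lr * dloss_dg
    return g


# "Observed" trajectory generated with the true value we hope to recover.
observed, _ = simulate(9.81)
g_fit = fit_gravity(observed)
print(round(g_fit, 3))
```

The point of the toy is only that the loss gradient flows through the physics step. Contact and collisions, which are discontinuous, are exactly where this gets hard in a real engine.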
P.S.: I like discussing such topics. If anyone knows a forum or discord with like-minded people, please let me know :)
Unironically twitter (and only use the "Following" tab as opposed to the "For You")
Make an account that only follows university affiliated researchers with less than 1000 followers. In my experience discord servers get suffocated by beginners and crackpots because conversations don't naturally self-organize into their own threads.
I’ve often thought it would be very handy to have a proper simulator for being able to simulate and identify inefficiencies in one’s technique, but no idea whether it would be feasible to do.
Proper simulators for those exist; you essentially need an engine with a compliant contact model. MuJoCo is the go-to here, see:
https://mujoco.readthedocs.io/en/stable/modeling.html#muscle... https://mujoco.readthedocs.io/en/stable/computation/fluid.ht...
These explicitly model biological muscles. IIRC it was originally created to model human hands (I could be misremembering though).
Really depends on the fidelity you want.
Edit: I also work in rigid body simulation for robotics.
Robotics folks probably want speed and accuracy. I'm from the video game industry so I generally look for speed and stability.
Note: This is a loose analogy and recent techniques are already blurring the lines between these axes.
We were sharing game clips with each other and after a while realised our old clips were just gone, being deleted after 30 or 90 days or something.
Which, considering just how pretty and detailed this whole thing looks, imo points at a fundamental issue with how these things are trained: it's as if there's no structure to its knowledge and training. An artist learning to draw would first try to understand simple 2D composition, then perspective, then light and shadow, mastering each concept and gradually building up a hierarchical understanding. This seems like it's trying to learn everything at once.
I would rather see an AI model that I could give a floorplan of a building and it would generate an accurate flythrough on any path, even if it looked like butt.
I'm not just talking out of my arse: I did work for a while in data science/engineering, and one of the big lessons people needed to be reminded of is to clean/downsample the data. A dataset consisting of a million samples could very well take 1000x as long to process as if we downsampled the whole thing to just a couple of thousand samples, and we could reach the same conclusions with a fraction of the time and effort.
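A hedged sketch of the downsampling point, using synthetic Gaussian data. The distribution, sizes, and seed are assumptions for illustration, not taken from any real pipeline. It compares one aggregate statistic on the full set against a uniform random subsample.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical "full" dataset: one million noisy measurements.
full = [rng.gauss(10.0, 2.0) for _ in range(1_000_000)]

# Uniform random subsample of a couple of thousand points.
sample = rng.sample(full, 2_000)

full_mean = statistics.fmean(full)
sample_mean = statistics.fmean(sample)

print(f"full mean   = {full_mean:.3f}")
print(f"sample mean = {sample_mean:.3f}")
print(f"difference  = {abs(full_mean - sample_mean):.3f}")
```

The subsample's error on a mean shrinks roughly like 1/sqrt(n), so this holds up for aggregate statistics. Rare events or heavy tails would need a more careful sampling scheme than a uniform draw.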
I'm sure there's a similar logic in RL, that if you dump a trillion samples into the datacenter that consumes the same power as a city, what the model learns is what it could've learned with a much more curated training set and directed approaches.
Please grow a spine.
The other problem is Seedance is heavily censored because of copyright concerns.
There's got to be a reason this is phrased so insanely, right?
> Prompt: A skeuomorphism stop motion explainer about how the brain hippocampus works with a compelling voiceover. Don’t add seahorses. No voice cuts at the end. Don’t add text
Seahorses???
Oh god...
Have you ever considered that the solution to not having enough power is to generate more power, not curtail progress?
All of these data centers should come with their own solar panel arrays and battery packs. Who knows, with enough need they might each come with their own small nuclear reactor.
The worst thing is when someone cites a video or a demo of an AI doing something and says, "See! It's here!" Remember when the Devin video came out years ago?
You can say "eventually" AI will be able to do xyz, but eventually the sun will blow up, too, so what the fuck are we talking about?
Video, more than anything else, is the place where I really care if something is AI or not. If I could get a TikTok that had no AI usage -- I'd be in. Which is weird for me, because I'm typically the guy who is all-in on AI.
Funny enough, this is actually one of the few things which has bothered me with the AI boom, and I'm mostly pro-acceleration. A lot of what's happening seems inevitable. But surprisingly, knowing that cat or dog or bird or lizard or butterfly or whatever has a strong chance of being generated really does take something out of it to my mind. And I say that also knowing the extreme amount of staging which has long gone on with traditional nature videography. Somehow, knowing the animal is real means something... I'm still trying to figure out how to better understand and express this.
Now you can have people producing videos without needing a crew of people.
Why are you assuming that a majority of people don't already have the means to make videos? Many people have access to a phone, laptop, and stable internet connection. What else do they really need? What's stopping them from using their phones to shoot home movies, making animations with MS Paint, recording themselves talking about a subject they're genuinely interested in, etc.?
>Now you can have people producing videos without needing a crew of people.
This is conflating production values with creativity. Mr. Beast's videos cost millions of dollars to film and produce, yet they're creatively bankrupt.
I even remember being on a psytrance production music mailing list 25 years ago and 95% of the tracks people posted were absolutely terrible, including myself.
I have seen a few incredible pieces from AI video but most has just not been that interesting. Then even the incredible pieces are 5 second one offs. No narrative, no continuity. I think of a random, real 5 second clip from Clockwork Orange with no backstory or context in the movie, who cares? Even the most visually interesting scenes wouldn't make sense and would be boring.
Right now it seems like we are at the stage of sampling random 5 second clips from early SoundCloud and concluding this is the artistic utility of an entire new technology like DAW software and VST synths. That is obviously absurd.
I eventually picked one and opened the comments and the top comment was something like "This is obviously an AI video. Who watches this?" and the reply was along the lines of "me because I like seeing thieves get what's coming to them".
So you, like me, aren't interested in AI videos but I think there's a lot of people who don't care if it's real or not.
Thankfully, YouTube eventually stopped showing those to me. Now it thinks I'm interested in road rage videos. My YouTube feed outside of the three or four channels I've subscribed to is terrible.
I really wish a subject matter expert would pitch in to tell us what this is about?
like a totally made up thing that is fake, somehow gives a sense of justice and satisfaction?
is it something about imagining it happening in reality, or what?
for me, if I see that something is AI, it's like I just feel nothing. because there's nothing in it, it has nothing of real value? like it doesn't evoke anything in me, it doesn't make me think "this was a great find!" or make me want to send a link over to my friends, etc.
Where is this amazing stuff? Social media is a marketplace of ideas supposedly, so why haven't we seen a new wave of creators rise up in popularity?
model card: https://deepmind.google/models/model-cards/gemini-omni-flash...
I did not create any videos yet.
Google, building great AI that nobody can try out.
But thx for the press release.
This tech won’t change anything.
I didn't see the quote you did but he probably confused the fact that PHM used physical elements in place of some CGI in certain scenes and the separate fact that a realistic physical puppet was used on set for reference. Some parts of that puppet are seen on-screen in some shots but most of the creature in most shots was CGI or CG enhanced (which looked great thanks to the ideal in-camera puppet reference it replaced). I explained more here: https://news.ycombinator.com/item?id=48198851
About the misunderstood puppet: A real Rocky puppet was indeed used on set (actually a few different puppets) and some of the puppet is sometimes seen on camera. But most of the puppet was digitally replaced with CGI or CGI-enhanced in most of the scenes. However, using a much more realistic puppet on set is indeed notable but not because the character wasn't CGI. The puppet is worth talking about because it directly enabled the final mostly-CGI character to be really good CGI. It's good because shooting the physical puppet gave the VFX character animators an ideal reference that's "grounded" in the physical reality of the set, camera and lens. The subtle interplay of light, shadow, texture and specularity in the CGI are all grounded in reality. The puppet also let the actor interact with something closer to reality. It's a wonderful technique and should be celebrated instead of obfuscated to promote a "No CGI!" falsehood that trends well on social media.
Also, PHM did use real sets (like most movies) and they were able to avoid using green screen for some of the ship exteriors but those backgrounds were still digitally replaced with CGI rendered elements, they just didn't use green screen to pull the matte. But on social media, "No green screen" (true) was conflated into "No CGI" (false). Instead of green screen they used a black backdrop with careful lighting and some hand rotoscoping to extract the digital mattes. Doing it this way had the advantage of not needing to digitally remove green spill on reflective surfaces by hand and it saved money over doing a StageCraft virtual volume at that size. Done well, a green screen could have produced the exact same shot but it would have cost more and taken longer.
But influencers and media are unintentionally perpetuating "No CGI" myths instead of focusing on the actually interesting, more nuanced reality. Using more and better physically grounded references on-set IS a breakthrough that helps turn bad CGI into great CGI. Another example is Top Gun where "artfully misleading but technically true wording" in studio press releases grew into outright falsehoods online. Tom Cruise was truthful in saying that he was flown in a jet right alongside other REAL jets doing simulated dog-fighting. The lost nuance is that all the other jets Cruise flew with in those dog fight scenes were old Soviet trainer jets that look quite different and are much smaller than real MIGs. So the trainer jets were entirely replaced by CGI MIGs in post and are never seen in the final film. And we couldn't tell because the digitally removed jets provided ideal grounded reference for the CGI pixels that replaced them. And that's how we ended up with several famous YouTubers proclaiming "These are REAL jets, not CGI!" while showing 100% CGI jets. Same with Wicked and the CGI tulips. The fact that Wicked used thousands of specially grown tulips on-set (true) was confused into proclaiming "ALL these tulips are real, no CGI!" (false) while showing a scene where >90% of the tulips were CGI.
Creators can use these video gen AIs in various ways. There are some YouTube channels of people using these in creative workflows that are really impressive, from mocap replacement, character insertion, background replacement, changing camera angle in post, animating/inserting characters from character boards, animating between stills generated with traditional methods, etc. It's not just "prompt and generate". It can be, because it's easy, but it also doesn't have to be. It's a tool.
https://blog.google/innovation-and-ai/products/identifying-a...
(and the previous SynthID: https://deepmind.google/blog/identifying-ai-generated-images...)