Denis Shiryaev uses algorithms to colorize and sharpen old movies, bumping them up to a smooth 60 frames per second. The result is a stunning glimpse at the past.
ON APRIL 14, 1906, the Miles brothers left their studio on San Francisco’s Market Street, boarded a cable car, and began filming what would become an iconic short movie. Called A Trip Down Market Street, it’s a fascinating documentation of life at the time: As the cable car rolls slowly along, the brothers aim their camera straight ahead, capturing women in outrageous frilly Victorian hats as they hurry across the tracks. A policeman strolls by wielding a billy club. Newsboys peddle their wares. Early automobiles swerve in front of the cable car, some of them convertibles, so we can see their drivers bouncing inside. After nearly a dozen minutes, the filmmakers arrive at the turntable in front of the Ferry Building, whose towering clock stopped at 5:12 am just four days later when a massive earthquake and consequent fire virtually obliterated San Francisco.
Well over a century later, an artificial intelligence geek named Denis Shiryaev has transformed A Trip Down Market Street into something even more magical. Using a variety of publicly available algorithms, Shiryaev colorized and sharpened the film to 4K resolution (that’s 3,840 horizontal pixels by 2,160 vertical pixels) and bumped the choppy frame rate up to 60 frames per second, a process known as frame interpolation. The resulting movie is mesmerizing. We can finally see vibrant colors on those flamboyant Victorian hats. We can see the puckish looks on those newsboys’ faces. And perhaps most importantly, we can see in unprecedented detail the … byproducts that horses had left on the ground along the cable car’s tracks.
And Shiryaev—product director of the company Neural.love, which provides AI-driven video enhancements for clients—hasn’t stopped at 1906 San Francisco. He’s waved his magic AI wand over another historical film, the Lumière brothers’ 1895 French short of a train pulling into a station and spilling its passengers onto the platform. You can also take a trip through New York City in 1911, or join the astronauts of Apollo 16 as they drive their lunar rover around the moon in 1972. All of the films are gussied up with remarkable clarity, giving us modern folk an enchanting view into life long ago.
To be clear, you can’t call these restorations of films, because the algorithms aren’t just getting rid of imperfections—they’re actually filling in approximations of the data missing from old, blurry, low-frame-rate films. Basically, the algorithms are making stuff up based on their previous training. For example, the algorithm DeOldify, which handles coloration, was trained on over 14 million images to build an understanding of how objects in the world are usually colored. It can then apply that knowledge to old black-and-white films, painting the old footage with vibrant hues. “This is an important thing,” says Shiryaev. “We call it an enhancement, because we are training neural networks. And when neural networks redraw in pictures, it’s adding a new layer of data.”
“So colorization is enhancement,” he adds. “Upscaling is an enhancement. Frame interpolation is an enhancement.” Shiryaev also removes visual noise—those momentary blips and black lines that flash across the screen—and maybe that could be considered a restoration. But film archivists would scoff at the idea of the rest of Shiryaev’s AI wizardry being a kind of restoration, because it layers in so much extra data, and much of that data is machine-learning guesswork, which is not necessarily historically perfect. “We don’t want to argue with people from archives,” Shiryaev says. “We really value their job.”
Let’s take these enhancements one by one. That DeOldify colorization algorithm learns how to recognize certain objects—trees, grass, people wearing different clothes—from a neural network training process. It learns which colors typically correspond to which objects, so when it recognizes them in a historical black-and-white film, it can guess which colors they might have been. The algorithm isn’t perfect by any means: It can only discern the color of objects it’s already seen many times in its training. “We sometimes have terrible problems with flags,” says Shiryaev, “because it wasn’t trained to do it.”
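For the technically curious, the idea can be sketched in a few lines of Python. This is not DeOldify's actual code; it is a toy illustration of one common approach to learned colorization, in which a network (here an untrained stand-in) predicts the two color channels of the Lab color space from the grayscale luminance channel, and the three channels are then recombined into a full-color frame.

```python
# Minimal sketch of learned colorization in Lab space (illustrative only):
# a trained network predicts the two color channels (a, b) from the
# grayscale luminance channel (L). The tiny conv net below is an untrained
# stand-in for a real colorization model.
import numpy as np
import torch
import torch.nn as nn
from skimage import color  # pip install scikit-image

class ToyColorizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1), nn.Tanh(),  # predict (a, b) in [-1, 1]
        )

    def forward(self, l_channel):
        return self.net(l_channel) * 110.0  # scale to the rough range of Lab a/b

def colorize_frame(gray_frame: np.ndarray, model: nn.Module) -> np.ndarray:
    """gray_frame: HxW float array in [0, 1]; returns HxWx3 RGB in [0, 1]."""
    l = gray_frame * 100.0                                  # Lab L channel runs 0..100
    l_tensor = torch.from_numpy(l).float()[None, None]      # shape (1, 1, H, W)
    with torch.no_grad():
        ab = model(l_tensor)[0].numpy().transpose(1, 2, 0)  # (H, W, 2) color guess
    lab = np.dstack([l, ab])
    return np.clip(color.lab2rgb(lab), 0.0, 1.0)

frame = np.random.rand(256, 256)           # stand-in for a scanned black-and-white frame
rgb = colorize_frame(frame, ToyColorizer())
```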
Next is upscaling. This algorithm learns from a neural network trained on pairs of images, one high-quality and the other low-quality. “The neural network is trying to make this low-quality version of the image look exactly like the bigger version of this image,” says Shiryaev. Once it has learned the patterns that turn a patch of a low-res image into a clearer version, the algorithm can analyze the pixels of a low-resolution historical film and upscale it to a higher resolution. For example, Shiryaev adds, “You have a bright pixel here, a bright pixel here, and a dark pixel in the middle. That means you know how to redraw it four times bigger.”
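The paired-image training Shiryaev describes can likewise be sketched as a toy loop, with the low-quality input manufactured by downscaling a high-quality target. The small PixelShuffle model and the random "images" below are stand-ins, not his actual pipeline; the point is only that the network is rewarded for making its enlarged guess match the sharp original.

```python
# Minimal sketch of how a super-resolution network learns from image pairs:
# the low-quality input is made by downscaling the high-quality target, and
# the network is trained to reproduce the full-resolution original.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyUpscaler(nn.Module):
    """2x upscaler: a crude stand-in for a real super-resolution model."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * 4, 3, padding=1),  # 4 = 2x2 upscale factor
        )
        self.shuffle = nn.PixelShuffle(2)        # rearranges channels into pixels

    def forward(self, x):
        return self.shuffle(self.conv(x))

model = ToyUpscaler()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(100):                          # toy loop on random "images"
    high_res = torch.rand(8, 3, 64, 64)          # the sharp target
    low_res = F.interpolate(high_res, scale_factor=0.5, mode="bilinear")
    prediction = model(low_res)                  # the network's guess at the original
    loss = F.l1_loss(prediction, high_res)       # how far off is the guess?
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```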
The frame interpolation algorithm is trained on a database of videos and learns the relationship between a given frame and the one that follows. From these videos, it gleans patterns of how objects like people and cars generally change position from one frame to the next. “We want to feed the model to see as many samples as possible, so that when you’ve seen something like this before, then you can use similar information,” says Ming-Hsuan Yang, a computer scientist at the University of California at Merced. Yang developed the algorithm that Shiryaev used, called Depth-Aware Video Frame Interpolation, or DAIN. “You can think of that as a photographic memory,” he says.
So when you show the DAIN algorithm a historical video like A Trip Down Market Street, which the Prelinger Archives in San Francisco scanned at 16 frames per second, the system looks at a frame and then guesses what the next frame should be. It will actually generate new frames to go between the original ones, estimating where objects would have been positioned in those interstitial frames. The algorithm does this repeatedly until it achieves 60 frames per second, so now when you play it all back, it looks like everyone’s moving around much more smoothly.
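In code, that doubling logic looks something like the sketch below. The midpoint function here is a naive average of two frames, used only as a placeholder so the loop runs end to end; a real model such as DAIN instead estimates depth and motion to predict where objects sit in the in-between frame.

```python
# Minimal sketch of frame interpolation: insert a predicted in-between frame
# for every pair of neighbors, doubling the frame rate each pass. The simple
# average below is only a placeholder for a trained model like DAIN.
from typing import Callable, List
import numpy as np

Frame = np.ndarray  # H x W x 3 image

def blend_midpoint(a: Frame, b: Frame) -> Frame:
    return (a.astype(np.float32) + b.astype(np.float32)) / 2.0  # placeholder prediction

def double_frame_rate(frames: List[Frame],
                      midpoint: Callable[[Frame, Frame], Frame]) -> List[Frame]:
    out: List[Frame] = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.append(midpoint(a, b))      # the new, generated in-between frame
    out.append(frames[-1])
    return out

# 16 fps scan -> ~32 fps -> ~64 fps; a real pipeline would then resample to 60 fps.
clip = [np.random.rand(240, 320, 3) for _ in range(16)]  # one second of footage
for _ in range(2):
    clip = double_frame_rate(clip, blend_midpoint)
print(len(clip), "frames now cover that original second")
```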
Yang’s algorithm can also take a modern movie and boost it from 30 frames per second to a bonkers 480. He can then slow it down by a factor of 16 and it will still play smoothly, whereas slowing the original 30-frames-per-second footage by the same factor would look extremely choppy. There would just be too few frames to play with.
More uncannily still, Shiryaev’s system can render distinct faces out of amorphous face-like blobs. The quality of one original video, taken sometime in the 1910s in Tokyo, was particularly bad—faces were sometimes too blurry to make out. Here a neural network trained on a database of faces takes a guess at how a blurry face should be rendered, based on its knowledge of how the pixels on a face are generally arranged. “But we cannot say that this is accurate and that the face looks exactly like it was 100 years ago,” says Shiryaev.
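Structurally, that face pass is a crop, restore, and paste loop. In the sketch below, both the detector and the restorer are hypothetical placeholders (the "restorer" simply returns the crop unchanged); in a real pipeline each would be a trained network, and the pasted-back result is, as Shiryaev says, a guess.

```python
# Minimal sketch of the face-enhancement step: find face regions, hand each
# blurry crop to a restoration network trained on faces, and paste the
# sharpened guess back into the frame. Detector and restorer are placeholders.
from typing import Callable, List, Tuple
import numpy as np

Box = Tuple[int, int, int, int]  # x, y, width, height

def enhance_faces(frame: np.ndarray,
                  detect: Callable[[np.ndarray], List[Box]],
                  restore: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    out = frame.copy()
    for x, y, w, h in detect(frame):
        crop = frame[y:y + h, x:x + w]
        restored = restore(crop)          # the network's guess at the face
        out[y:y + h, x:x + w] = restored  # paste the guess back in place
    return out

# Placeholders: "detect" one fixed box, "restore" by returning the crop unchanged.
frame = np.random.rand(480, 640, 3)
result = enhance_faces(frame,
                       detect=lambda f: [(100, 100, 64, 64)],
                       restore=lambda crop: crop)
```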
This is where we get into tricky territory. Like colorization and upscaling, an algorithm filling in the details of a person’s face is adding data to the film, and in a sense fictionalizing the past. Such changes to the historical record miff some archivists, says Rick Prelinger, founder of the Prelinger Archives, which gathers amateur videos and other “ephemeral” films like old-timey ads and educational movies. Purists would rather preserve old films as artifacts, no matter the quality, and not subject them to the whims of AI.
Prelinger himself doesn’t have a problem with it. “If somebody does a mashup of Munch’s The Scream or if Duchamp paints a mustache on the Mona Lisa, it’s embellishment, it’s annotation, it’s a remix,” he says. “And we should be free to remix.”
That said, when it comes to Shiryaev’s updates on the Miles and Lumière brothers’ historical films, “I don’t know that they’re honestly adding that much,” Prelinger adds. “But it’s fiction, moving archival footage into the uncanny valley, where we no longer kind of have a sense of what was real and what wasn’t real.” This, he says, might give viewers the false impression that all historical films ought to look so sharp and vibrant, when in reality filmmakers in the early 20th century were working with rudimentary equipment. And, of course, time is generally not kind to the film itself, which degrades over the course of 100 years. That is also history—the quality of the film speaks to the way it was shot—a history that purists argue should be preserved, not “enhanced” with new data.
But let’s go ahead and dive deeper down this philosophical rabbit hole. Shiryaev’s enhancements fictionalize certain aspects of a scene that are unknowable a century after the film was shot. For example, we can’t be certain that the AI got the color perfectly right for those Victorian hats. The result is beautiful, but necessarily imperfect. Yet isn’t the herky-jerky black-and-white A Trip Down Market Street itself imperfect? The world back then existed in color, and people and cars moved through it smoothly. So, in the end, which film is a truer representation of that scene: the original or Shiryaev’s version?
And in a sense, these AI enhancements are continuing a tradition from the days of silent film. Back then, each viewer’s experience would have been unique. Different movie house operators hired their own accompanists to play along to the film. These musicians often just improvised, until the movie industry began to standardize accompanists’ scores, beginning in 1908. The music added drama and helped drown out the clamor of the projector—and wouldn’t that have enhanced the movie? And wasn’t each improvising accompanist putting their own spin on the theatergoer’s experience? Shiryaev is doing much the same, remixing old films into his vision (or really, the AI’s vision) of what life really looked like in the early 20th century. “People usually say something like, ‘It’s the closest that you can have to a time-travel experience,’” Shiryaev says.
And speaking of time travel: Filmmakers play with the look of a scene all the time to transport us to a different era. “Remember in Spinal Tap,” Prelinger asks, “when they go back and talk about how it used to be like a Liverpool skiffle group, and then a psychedelic group, and they make the videos look just the way they would have looked then? It’s part of the fun.”
There’s no reason, Prelinger adds, that historical films in their pure form can’t coexist with enhanced counterparts. “I think there’s something great about reproducing the image exactly as it exists,” he says. “But I do not object to somebody making entertainment out of it. It sensitizes people to the fact that this stuff is there.”
This article first appeared in www.wired.co