Google has introduced Whisk, a new AI tool that takes images rather than text as its prompts.
Users supply images representing a subject, a setting, and a style, and Whisk blends them into a single image.
According to a Google blog post, Whisk is a "creative tool" meant for inspiration, which sets it apart from traditional image editors. In other words, Whisk is geared more toward fun than toward polished professional work.
Leading tech companies such as Google and OpenAI are eager to release consumer products that showcase their latest AI technology, despite warnings that the unregulated development of AI could pose risks to humanity.
Since OpenAI debuted its text-to-image tool, Dall-E, in 2021, AI-generated art has surged in popularity on social media and become a central focus of consumer products. Whisk fits into this category: it is an image-to-image generator that builds on text-to-image technology.
Whisk users can "rework" the final image by changing their inputs and blending categories, producing images styled as plush toys, enamel pins, or stickers. Additional text can steer specific details, but it is not required to create an image.
Thomas Iljic, a product management director at Google Labs, said in a statement that Whisk lets users "remix" a subject, setting, and style in new ways, and that it is meant for quick visual exploration rather than flawless edits.
Whisk is built on generative AI developed by DeepMind, the AI lab that Google acquired in 2014.
Whisk combines Gemini, Google's primary AI offering, which was introduced in December 2023, with Imagen 3, DeepMind's most recent text-to-image generator, released in December.
When users upload their images, Gemini writes a caption for each one, and those captions are then fed into Imagen 3. This approach captures the "essence" of the subject rather than producing an exact replica, which makes the final image easy to remix but means it may deviate from the input images.
For instance, the generated subject may have a different height, hairstyle, or skin tone than the one in the input image, Google explained in a blog post.
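The caption-then-generate flow described above can be illustrated with a short Python sketch. It is not Google's implementation and calls no real Gemini or Imagen API: the names WhiskInputs, build_prompt, and remix, and the two callables passed in, are hypothetical stand-ins chosen for illustration, and the prompt layout is an assumption. What the sketch does show is why details can drift, since only text captions, never the original pixels, reach the image generator.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WhiskInputs:
    """Hypothetical container for the three image inputs plus optional text."""
    subject: bytes
    setting: bytes
    style: bytes
    extra_text: Optional[str] = None


def build_prompt(inputs: WhiskInputs, caption_image: Callable[[bytes], str]) -> str:
    """Caption each input image and join the captions into one text prompt.

    caption_image stands in for a captioning model such as Gemini.
    """
    parts = [
        f"Subject: {caption_image(inputs.subject)}",
        f"Setting: {caption_image(inputs.setting)}",
        f"Style: {caption_image(inputs.style)}",
    ]
    # Extra text is optional; it only refines details when present.
    if inputs.extra_text:
        parts.append(f"Details: {inputs.extra_text}")
    return ". ".join(parts)


def remix(
    inputs: WhiskInputs,
    caption_image: Callable[[bytes], str],
    generate_image: Callable[[str], bytes],
) -> bytes:
    """Generate an image from captions alone.

    generate_image stands in for a text-to-image model such as Imagen 3.
    It receives only the text prompt, so anything the captions omit
    (exact height, hairstyle, skin tone) is left for the model to invent.
    """
    prompt = build_prompt(inputs, caption_image)
    return generate_image(prompt)
```

Swapping one input, such as the style image, and calling remix again would yield a new prompt and therefore a new image, which mirrors the remixing behavior the article describes.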
When Google launched Gemini's text-to-image creator in February, it drew criticism for producing historically inaccurate images.
For now, Whisk is available only as a website on Google Labs for U.S.-based users, and it remains under development.
OpenAI also recently introduced a text-to-video generator called Sora, highlighting the competitive landscape for consumer products.
According to Dan Ives, managing director and senior equity analyst at Wedbush Securities, Whisk represents another "flex the muscles moment" for Google in the AI and tech race.
"DeepMind is a crucial asset for Google," Ives emphasized, noting that AI products are part of Google's "treasure chest" of new products for 2025, which also encompasses a new Android operating system in collaboration with Samsung and Qualcomm.
Whisk reflects Google's broader strategy of releasing consumer products that showcase its advanced AI technology.