Visual and voice search now influence a huge share of discovery and product research, with over half of online searches conducted via voice assistants and Gen Z showing a strong preference for image-based search. For e-commerce managers and local SEO experts in New Zealand and the UK, treating visual search optimisation, voice search SEO, image SEO, and schema markup for voice as “the hidden 40 per cent” of your SEO strategy is how you win visibility in a multisensory search landscape.

Voice search stats and adoption: why this matters now

Voice technology has shifted from novelty to infrastructure. Recent reports estimate that more than 50 per cent of global online searches are now conducted via voice assistants, supported by over 8.4 billion voice-enabled devices in use worldwide. One in three households owns at least one smart speaker, and around 71 per cent of consumers say they prefer using voice search over typing when they can.

Voice Search Statistic

Smartphones account for about 58 per cent of all voice searches, with smart speakers handling roughly a quarter and the rest happening via TVs, cars and wearables. This means voice search is no longer limited to “Hey Google, what’s the weather in Auckland”. It influences how users find local businesses, compare products and navigate e-commerce journeys across devices. For NZ and UK brands, this is critical: if your store, menu or product answers are not voice-ready, assistants will simply highlight competitors whose answers are.

Siteimprove and SEMrush both stress that voice search SEO requires a shift to conversational, long-tail and question-led content that matches how people actually speak. This is where Digital Hothouse’s SEO and Local SEO services can help you pivot from traditional keyword lists to voice-led intent patterns that match real usage.

Visual search opportunities across platforms

Visual search technology has accelerated just as quickly. Platforms such as Google Lens, Pinterest Lens, Snapchat Scan and Amazon’s StyleSnap allow users to search using photos instead of text, dramatically changing how people discover products and brands. Pinterest data indicates that its Lens tool handles more than 600 million visual searches each month, with more recent industry reports suggesting 2.5 billion-plus monthly visual queries across platforms. For Gen Z, more than 60 per cent say they prefer visual search to text when shopping, especially in fashion, home décor and lifestyle categories.

Pinterest Lens Usage Statistic

SEMrush and Moz highlight visual search as a major growth channel for e-commerce, noting that visual searchers often have higher purchase intent because they are literally searching with a product or style in mind. Retail and lifestyle brands are seeing 25 to 40 per cent higher conversions from visual search traffic when images are optimised correctly, with multiple angles, high resolution, real-life context and structured data. For NZ and UK retailers, this is a powerful way to compete with global marketplaces by making your catalogue discoverable via photos of similar products already in users’ hands.

Google’s own image search guidelines emphasise that high-quality, original images with descriptive context, alt text and appropriate structured data are more likely to surface in Google Images and visual search results. This creates a clear roadmap for e-commerce and local businesses wanting to unlock visual search visibility.

Optimising for voice: conversational and question-led queries

Voice search SEO starts with acknowledging that people do not talk like they type. Voice queries tend to be longer, more conversational and more likely to take the form of direct questions like “Where’s the best Thai restaurant near me” or “What time does the store on Queen Street close tonight”. Neil Patel and SEMrush both stress that successful voice optimisation means shifting from short, head terms to natural language phrases and questions that represent real spoken intent.

Practical steps include:

  • Creating FAQ sections that directly mirror common voice queries and answering them in concise, 30 to 50-word passages that assistants can easily read aloud.
  • Using schema markup for voice-friendly content such as FAQPage, HowTo and LocalBusiness, which helps assistants understand your opening hours, services and location details accurately.
  • Focusing on local modifiers and context for NZ and UK audiences, including suburb names, landmarks and colloquial phrases that real customers use.
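To make the first two steps concrete, here is a minimal Python sketch that checks whether FAQ answers fall within the 30 to 50-word range and builds FAQPage structured data as JSON-LD. The question, answer and opening hours are hypothetical placeholders, and the helper names (answers_outside_range, build_faq_jsonld) are our own for illustration; the schema.org types used are FAQPage, Question and Answer.

```python
import json

# Hypothetical FAQ content: the store, hours and wording are placeholders.
FAQS = [
    (
        "What time does the Queen Street store close tonight?",
        "Our Queen Street store closes at 6pm on weekdays and 5pm on "
        "Saturdays and Sundays. Hours can change on public holidays, so it "
        "is worth checking our store page or calling ahead before you visit "
        "us in central Auckland.",
    ),
]


def word_count(text: str) -> int:
    """Count whitespace-separated words in an answer."""
    return len(text.split())


def answers_outside_range(faqs, low=30, high=50):
    """Return the questions whose answers fall outside the target word range."""
    return [q for q, a in faqs if not low <= word_count(a) <= high]


def build_faq_jsonld(faqs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in faqs
        ],
    }


if __name__ == "__main__":
    # Flag answers that are too short or too long to be read aloud comfortably.
    print("Needs editing:", answers_outside_range(FAQS))
    # The JSON output would sit inside a <script type="application/ld+json"> tag.
    print(json.dumps(build_faq_jsonld(FAQS), indent=2))
```

The word-range check is only a drafting aid based on the guideline above; it says nothing about whether an assistant will actually select the answer.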

Voice search also rewards content with a strong E-E-A-T structure. Assistants are more likely to draw answers from sources they “trust” based on topical authority, consistency and user behaviour. Digital Hothouse’s SEO team often combines content rewrites, structured data implementation and Local SEO clean-up so Google, Siri and Alexa can confidently surface your business in spoken answers across both markets.

Image optimisation for visual search discovery

Image SEO is no longer just about alt text for accessibility; it’s about making your images understandable to both classical image search and visual search engines that use image recognition to match patterns, products and scenes. Google Developers outlines best practices such as embedding images with HTML <img> tags rather than CSS, using supported formats, creating responsive images and ensuring pages containing images have descriptive titles and surrounding text.

Modern image SEO best practice includes:

  • Using original, high-quality photos instead of generic stock, since search engines and users prefer sharp, authentic visuals.
  • Renaming files with descriptive, keyword-supportive names and using keyword-rich but natural alt text that describes what is in the image and its context.
  • Adding structured data for products, local businesses or recipes so that images can appear in rich results and product carousels.
  • For local SEO, geotagging photos of your storefront, interior and team to help connect images to place.
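The file naming and alt text points above can be sketched in a few lines of Python. This is an illustrative helper rather than a prescribed workflow: the slugify and img_tag functions, the /images directory and the default dimensions are all assumptions made for the example.

```python
import html
import re
import unicodedata


def slugify(description: str) -> str:
    """Turn a natural-language description into a lowercase, hyphenated file name."""
    # Decompose accented characters, then drop anything that is not ASCII.
    normalised = unicodedata.normalize("NFKD", description)
    ascii_text = normalised.encode("ascii", "ignore").decode("ascii")
    # Remove apostrophes so "women's" becomes "womens" rather than "women-s".
    ascii_text = ascii_text.replace("'", "")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    return slug.strip("-")


def img_tag(description, directory="/images", ext="jpg", width=1200, height=800):
    """Build an HTML <img> tag whose file name and alt text share one description."""
    src = f"{directory}/{slugify(description)}.{ext}"
    alt = html.escape(description, quote=True)
    return (
        f'<img src="{src}" alt="{alt}" '
        f'width="{width}" height="{height}" loading="lazy">'
    )


if __name__ == "__main__":
    print(img_tag("Women's blue waterproof hiking jacket on Routeburn Track NZ"))
```

Writing the description once and deriving the file name from it keeps the two consistent, and it nudges the alt text towards describing the image rather than listing keywords.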

Neil Patel’s work on visual search and Google Lens notes that Lens prioritises high-resolution product images with clear backgrounds and multiple angles, and that adding context via captions, schema and surrounding copy improves match accuracy. For NZ and UK brands, a simple rule is: shoot like a retailer, label like an SEO, structure like a developer.

As a caveat, we have found ourselves revisiting a lot of our alt text optimisation work in recent months. In the past, we “over-optimised” alt text as a way of appearing in image search, a tactic that yielded strong results at the time. However, as search engines become better at recognising what an image actually shows, alt text needs to describe the image itself rather than being written to manipulate search results. Keyword placement is still important, but it needs to be done naturally rather than overtly.

Theoretical case study: implementing a multi-sensory search strategy

Imagine a multi-location NZ retailer selling outdoor gear, with growing UK expansion via e-commerce. Historically, their SEO has focused heavily on category pages, blogs and classic blue link rankings. A multi-sensory search strategy reshapes this:

1. Voice search insights

The team analyses internal search logs and SEMrush/Search Console data to identify common voice-style queries like “best waterproof jacket for hiking near Queenstown” or “open outdoor stores near Bristol on Sunday”. They add FAQ blocks to location pages and key guides, answering these queries directly and marking them up with FAQPage and LocalBusiness schema.
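As a rough illustration of that first pass, the sketch below filters an exported list of queries for conversational, question-led or local phrasing. It is a heuristic only: search tools generally do not tell you whether a query was spoken or typed, and the word lists, the five-word threshold and the looks_conversational helper are assumptions made for this example.

```python
import re

# Assumed signals of spoken-style phrasing; tune these lists to your own data.
QUESTION_WORDS = {
    "who", "what", "when", "where", "why", "how", "which",
    "is", "are", "can", "do", "does",
}
LOCAL_WORDS = {"near", "nearby", "nearest", "closest", "open"}


def looks_conversational(query: str, min_words: int = 5) -> bool:
    """Heuristically flag longer queries that open with a question word or use local terms."""
    words = query.lower().replace("?", "").split()
    if len(words) < min_words:
        return False
    # Treat contractions such as "where's" as their question word ("where").
    first_word = re.split(r"['’]", words[0])[0]
    return first_word in QUESTION_WORDS or any(w in LOCAL_WORDS for w in words)


if __name__ == "__main__":
    exported_queries = [
        "best waterproof jacket for hiking near Queenstown",
        "open outdoor stores near Bristol on Sunday",
        "hiking jacket",
        "waterproof hiking jackets for women sale",
    ]
    for query in exported_queries:
        if looks_conversational(query):
            print(query)
```

The queries that pass the filter become candidates for FAQ blocks; a person still needs to review them before anything is written.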

2. Visual search enhancements

Following Google Developers and Moz guidance, all product images are reshot with consistent lighting, clean backgrounds and lifestyle variants on local trails. Images are compressed, renamed and given descriptive alt text such as “women’s blue waterproof hiking jacket on Routeburn Track NZ” to tie visuals to local search demand. In this example, the alt text is still optimised for “women’s blue waterproof hiking jacket”, but it reads much more naturally.

Women’s blue waterproof hiking jacket on Routeburn Track NZ

3. Structured data and Local SEO

The technical team implements robust Product and LocalBusiness schema, ensuring each store’s opening hours, address and geo coordinates are accurate for voice-powered local queries. They also ensure Google Business Profiles are fully optimised with photos, attributes and Q&A that match spoken queries.
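A LocalBusiness record of this kind might be generated per store along the following lines. This is a sketch under stated assumptions: the store name, address, coordinates and hours are invented placeholders, and the helper functions are hypothetical. The schema.org types used are LocalBusiness, PostalAddress, GeoCoordinates and OpeningHoursSpecification.

```python
import json


def opening_hours(days, opens, closes):
    """Describe one block of opening hours for the given days (24-hour HH:MM times)."""
    return {
        "@type": "OpeningHoursSpecification",
        "dayOfWeek": days,
        "opens": opens,
        "closes": closes,
    }


def build_local_business(name, street, locality, postal_code, country,
                         latitude, longitude, hours):
    """Build a schema.org LocalBusiness object for a single store."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": locality,
            "postalCode": postal_code,
            "addressCountry": country,
        },
        "geo": {
            "@type": "GeoCoordinates",
            "latitude": latitude,
            "longitude": longitude,
        },
        "openingHoursSpecification": hours,
    }


if __name__ == "__main__":
    # Placeholder store details, not a real business.
    store = build_local_business(
        name="Example Outdoor Gear Queenstown",
        street="1 Example Street",
        locality="Queenstown",
        postal_code="9300",
        country="NZ",
        latitude=-45.0312,
        longitude=168.6626,
        hours=[
            opening_hours(["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
                          "09:00", "18:00"),
            opening_hours(["Saturday", "Sunday"], "10:00", "17:00"),
        ],
    )
    print(json.dumps(store, indent=2))
```

Generating the markup from the same source of truth as the store pages and Google Business Profiles helps keep hours and addresses consistent across all three.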

4. Content formats for multi-sensory discovery

The content team produces short video gear guides and infographics on layering for NZ and UK weather, then adds full transcripts and descriptive captions to make them indexable for both voice and visual search. These assets are embedded on key pages and promoted via social channels to feed additional behavioural signals.

Over 12 to 18 months, the retailer sees improved rankings in image search, more impressions in Google Lens-style visual results and higher visibility in voice-informed queries like “best hiking jacket near me”. While not all interactions result in immediate clicks, brand recall, store visits and direct searches increase, which illustrates the value of visual and voice optimisation as part of the broader SEO mix.

Content formats for visual and voice search

Video Optimisation for Visual and Voice Search Example

Visual and voice search both benefit from content that is structured, multimodal and accessible. SEMrush, Search Engine Land and Moz all underline the importance of pairing rich media with strong text layers so search engines and assistants can interpret assets accurately.

Key formats include:

  • Video with transcripts and chapters: how-to guides, product walkthroughs and local stories that include spoken answers to common questions, backed by full text transcripts for indexing.
  • Infographics with HTML support: visual explanations accompanied by descriptive on-page text and alt attributes so visual search engines can contextualise images.
  • Audio content with show notes: podcasts or audio snippets paired with detailed descriptions and structured headings that highlight key questions answered, aiding voice discovery.
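For the video format in particular, transcripts and chapters can also be exposed as structured data. The sketch below builds a schema.org VideoObject with a transcript and Clip entries for chapters. The URLs, dates and chapter names are placeholders, the helper names are our own, and the "?t=" timestamp link format is an assumption that depends on the video player you use.

```python
import json


def iso_duration(total_seconds: int) -> str:
    """Format a length in seconds as an ISO 8601 duration, e.g. 205 -> PT3M25S."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    parts = "PT"
    if hours:
        parts += f"{hours}H"
    if minutes:
        parts += f"{minutes}M"
    if seconds or not (hours or minutes):
        parts += f"{seconds}S"
    return parts


def build_video_jsonld(name, description, page_url, content_url, thumbnail_url,
                       upload_date, duration_seconds, transcript, chapters):
    """Build a VideoObject with a transcript and one Clip per (title, start, end) chapter."""
    return {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "description": description,
        "contentUrl": content_url,
        "thumbnailUrl": thumbnail_url,
        "uploadDate": upload_date,
        "duration": iso_duration(duration_seconds),
        "transcript": transcript,
        "hasPart": [
            {
                "@type": "Clip",
                "name": title,
                "startOffset": start,
                "endOffset": end,
                # Assumed deep-link format; adjust to match your player.
                "url": f"{page_url}?t={start}",
            }
            for title, start, end in chapters
        ],
    }


if __name__ == "__main__":
    video = build_video_jsonld(
        name="How to layer for a wet-weather hike",
        description="A short gear guide covering base, mid and shell layers.",
        page_url="https://www.example.com/guides/layering",
        content_url="https://www.example.com/video/layering.mp4",
        thumbnail_url="https://www.example.com/images/layering-guide.jpg",
        upload_date="2026-01-15",
        duration_seconds=205,
        transcript="Full spoken transcript goes here.",
        chapters=[("Base layers", 0, 60), ("Shell jackets", 60, 205)],
    )
    print(json.dumps(video, indent=2))
```

The markup supports the on-page transcript rather than replacing it; the full text should still be visible on the page.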

For e-commerce managers and local SEO specialists, the goal is to design every asset so that both humans and machines can understand it fully. Digital Hothouse’s SEO and Local SEO services integrate this thinking into content roadmaps, ensuring that image SEO, alt text strategy, schema markup for voice and visual search optimisation are built in from the start rather than retrofitted later.

Conclusion

Visual and voice search represent the next frontier of discovery, where 40 per cent of your SEO opportunity lies beyond traditional text queries. For e-commerce managers, local SEO experts, content writers and digital strategists serving New Zealand and UK audiences, mastering image SEO, schema markup for voice, alt text strategy and visual search optimisation is no longer optional. It is how you stay visible when customers search with their eyes, voices and cameras.

Ready to unlock the hidden 40 per cent of your SEO strategy? Contact the Digital Hothouse team today for expert guidance on implementing a voice and visual SEO strategy. Our SEO and Local SEO specialists will audit your current assets, optimise images and content for multi-sensory discovery, and build a roadmap that drives traffic, conversions and brand authority across text, voice and visual search surfaces.

Check out some of our most recent posts for a fuller picture of how the search landscape is changing, and how visual and voice search fit alongside AI search discovery, organic visibility, the shopping experience and more.

FAQ: visual and voice search SEO

How big is voice search really for SEO?

Current estimates suggest that over 50 per cent of global online searches involve voice assistants and that 71 per cent of consumers prefer voice when possible, with smartphones driving the majority of these queries. This makes voice an essential consideration rather than a niche add-on.

Which platforms matter most for visual search in 2026?

Google Lens and Pinterest Lens are leading visual search tools, handling billions of queries monthly, with Gen Z showing a strong preference for image-first discovery. Retail, fashion, home décor and food brands see some of the strongest conversion lifts from visual search traffic.

What is the most important factor for image SEO?

High-quality, original images combined with descriptive alt text, relevant filenames and appropriate structured data are the foundation. Embedding images using HTML and including them in sitemaps also helps search engines index them correctly.
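As a small illustration of the sitemap point, the Python sketch below generates an image sitemap using Google's image sitemap extension namespace. The page and image URLs are placeholders and the build_image_sitemap helper is a hypothetical name; in practice most CMS and e-commerce platforms can produce this file for you.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages):
    """Return sitemap XML (as bytes) listing each page URL with its image URLs."""
    # Register prefixes so the output uses a default namespace and "image:".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Placeholder URLs for a single product page with two images.
    sitemap = build_image_sitemap({
        "https://www.example.com/jackets/blue-waterproof": [
            "https://www.example.com/images/blue-waterproof-front.jpg",
            "https://www.example.com/images/blue-waterproof-on-trail.jpg",
        ],
    })
    print(sitemap.decode("utf-8"))
```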

How do I optimise my site for voice search?

Use conversational language, create concise answers to common questions, implement schema markup for FAQs and local details and ensure your local listings are consistent and up to date. Focus on natural, question-led queries that reflect how users actually speak.

Where does Digital Hothouse fit into the visual and voice search strategy?

Digital Hothouse helps NZ and UK businesses integrate visual search optimisation, voice search SEO, image SEO and Local SEO into a cohesive strategy. Our SEO and Local SEO teams work together to optimise site structure, media assets and content so your brand is discoverable in text, image and voice-driven experiences.
