In 2000, Yahoo had pole position to win one of the biggest market opportunities of all time, as one of the early World Wide Web’s most popular and fastest-growing services.
The internet was still relatively new (17 million websites, compared to today’s 1.6 billion) and companies like Yahoo were in a clumsily named category sometimes referred to as “starting pages” or “portals”—gateways to services like email, news, finance, and sports. Yahoo was running away with this traffic, because it had the friendliest interface and the best content at that time for this new “web” experience.
In June of that year, Yahoo chose Google as its “default search engine provider” and Yahoo’s search box was suddenly advertised as “powered by Google.” Then users found themselves simply going to Google for that search.
Today Google (aka “Alphabet”) enjoys a $1.7 trillion market cap, while Yahoo is remembered as an also-ran in the early commercial internet, one that somehow failed to capitalize on being in exactly the right place at exactly the right time.
Search is the value driver
The history lesson isn’t just that Google won the internet with search.
It’s that search is what won every dominant tech player its market share. It won apps (App Store) and music (iTunes) for Apple, social for Facebook, e-commerce for Amazon, and more. All of today’s most valuable tech brands are masters of search in their application domains. The market has shown us time and again that search is unequivocally the value driver, and that those who master search control markets.
But many developers today still struggle to understand search as a fundamental part of their application platform. Some look at search as something to be “bolted on” to the application post-facto, while others retreat into LIKE queries in SQL and other half measures.
If you are trying to wrap your head around the importance of search in your application platform strategy, let’s talk about what’s at stake, and why you need to get this right.
Search is a conversation with your users
If you walk into a pharmacy and say, hey, I’m looking for a COVID-19 at-home test kit, and they walk away without answering your question, how does that make you feel? Ignored? Disrespected? You’re not coming back, that’s for sure.
Search is a conversation with your users. Search is how you make it easier for them to interact with your data. What’s more important than that?
Ten years ago, developers working with search were mostly just trying to parse the text. Natural language, the analysis chain, and getting the index set up were all driven by decades of research into understanding how languages are composed, which words are important, how to handle diacritics, and things like that.
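To make the idea of an analysis chain concrete, here is a minimal sketch in Python. It is not how Lucene or any particular engine implements analysis. The function names and the tiny stopword list are assumptions made up for illustration. It simply folds diacritics, lowercases, tokenizes, and drops a few common words.

```python
import re
import unicodedata
from typing import List

# Illustrative stopword list (an assumption, not taken from any real engine).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in"}


def fold_diacritics(text: str) -> str:
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> List[str]:
    """Toy analysis chain: fold diacritics, lowercase, tokenize, remove stopwords."""
    folded = fold_diacritics(text).lower()
    tokens = re.findall(r"[a-z0-9]+", folded)
    return [token for token in tokens if token not in STOPWORDS]


print(analyze("The Café for Crème Brûlée"))
```

Running the same function over both documents and queries is what lets a search for "creme brulee" match text that was written with accents.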
Then search evolved to the concept of learning to rank, so that over time you could reorder search results based on what you’d observed from user conversations in the past. That’s a great baseline search functionality that every search engine today still offers.
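Production learning-to-rank systems train a model on many features. The sketch below is a much simpler stand-in, a click-feedback heuristic, meant only to show the core idea of reordering results based on past user behavior. The class name, the blending weight, and the smoothing choice are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ClickFeedbackRanker:
    """Blend an engine's base relevance score with an observed click rate."""

    def __init__(self, feedback_weight: float = 0.5) -> None:
        self.feedback_weight = feedback_weight  # hypothetical tuning knob
        self.impressions: Dict[Tuple[str, str], int] = defaultdict(int)
        self.clicks: Dict[Tuple[str, str], int] = defaultdict(int)

    def record(self, query: str, doc_id: str, clicked: bool) -> None:
        """Log that doc_id was shown for query, and whether it was clicked."""
        key = (query, doc_id)
        self.impressions[key] += 1
        if clicked:
            self.clicks[key] += 1

    def click_rate(self, query: str, doc_id: str) -> float:
        """Smoothed click rate; unseen documents start at a neutral 0.5."""
        key = (query, doc_id)
        return (self.clicks.get(key, 0) + 1) / (self.impressions.get(key, 0) + 2)

    def rerank(self, query: str, scored_docs: List[Tuple[str, float]]) -> List[str]:
        """Return doc ids ordered by the blended score, best first."""
        w = self.feedback_weight

        def blended(item: Tuple[str, float]) -> float:
            doc_id, base_score = item
            return (1 - w) * base_score + w * self.click_rate(query, doc_id)

        return [doc_id for doc_id, _ in sorted(scored_docs, key=blended, reverse=True)]


ranker = ClickFeedbackRanker()
results = [("doc_a", 0.6), ("doc_b", 0.5)]  # base scores in the range 0..1
before = ranker.rerank("test kit", results)
for _ in range(10):
    ranker.record("test kit", "doc_a", clicked=False)
    ranker.record("test kit", "doc_b", clicked=True)
after = ranker.rerank("test kit", results)
print(before, after)
```

With no history the base score decides the order. Once users consistently click the second result, it moves to the top.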
Surfacing data before your users know they are looking for it
Today we’re seeing a major evolution in how search anticipates what data users want before they even know they are looking for it. I land on Netflix and it already knows I want this movie or that I’m interested in this show—it’s the canonical example of personalization, powered by search indexing and machine learning.
Underneath these use cases of predicting what users want is math that tries to mimic how our brains work. Vector spaces, in which words, phrases, or sentences are represented as points positioned by a language model, are driving this movement.
Search is moving from text representation to vector representation. The digital-native world of ubiquitous internet, ubiquitous e-commerce, and ubiquitous smartphones is pushing us into the next phase of multi-modal information retrieval. Whether the Metaverse wins or a different future platform emerges, sometimes the interface will be text, sometimes it will be voice, and sometimes it will be images or video. Eventually it may even be neural links directly to the brain.
Vector representation makes this type of multi-modal information retrieval possible in search. This is discovery that’s not possible with text alone. If someone under 20 says a new song is sick, that’s probably going to have a different meaning than if someone over 60 says exactly the same thing. We all speak differently, and when we try to anticipate what someone wants we have to parse both who they are and what they are looking for at the same time.
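A rough sketch of how vector matching could handle the "sick" example follows. The three-dimensional vectors are hand-written for illustration and do not come from any real language model, which would typically produce hundreds of dimensions. The names `EMBEDDINGS` and `nearest` are likewise hypothetical. The one real technique here is cosine similarity, which compares the direction of two vectors.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up embeddings; the axes loosely mean [praise, illness, music].
EMBEDDINGS: Dict[str, List[float]] = {
    "this song is awesome": [0.9, 0.0, 0.8],
    "I have the flu": [0.0, 0.9, 0.0],
    "great new album": [0.8, 0.1, 0.9],
}


def nearest(query_vector: Sequence[float], k: int = 1) -> List[str]:
    """Return the k stored phrases whose vectors point most nearly the same way."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda phrase: cosine_similarity(query_vector, EMBEDDINGS[phrase]),
        reverse=True,
    )
    return ranked[:k]


# The same word, embedded differently depending on who says it (assumed values).
sick_as_praise = [0.9, 0.1, 0.7]
sick_as_illness = [0.1, 0.9, 0.1]

print(nearest(sick_as_praise))
print(nearest(sick_as_illness))
```

The point of the sketch is that the text "sick" is identical in both queries. Only the vector, which can encode context about the speaker, separates the two meanings.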
SQL LIKE queries are a dead end; so are proprietary engines
As a developer, the decisions you make today in how you implement search will either set you up to prosper, or block your future use cases and ability to capture this fast-evolving world of vector representation and multi-modal information retrieval.
One severely blocking mindset is relying on SQL LIKE queries. This old relational database approach is a dead end for delivering search in your application platform. LIKE queries simply don’t match the capabilities or features built into Lucene or other modern search engines. They’re also detrimental to the performance of your operational workload: a pattern with a leading wildcard generally can’t use an ordinary index, so the database ends up scanning every row. These are fossils, artifacts of SQL from roughly 50 years ago, which is like a few dozen millennia in application development.
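As a small illustration of the gap, the sketch below runs the same one-word search two ways. One is a LIKE query against an in-memory SQLite table. The other is a toy inverted index of the kind search engines build on. The product titles, table name, and helper functions are all invented for this example, and a real engine does far more than this.

```python
import re
import sqlite3
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical product catalog.
DOCS: Dict[int, str] = {
    1: "COVID-19 at-home test kit",
    2: "Protest signs and banners",
    3: "Rapid antigen test, two pack",
}


def like_search(term: str) -> List[int]:
    """Substring match with LIKE; the leading % means every row is examined."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", DOCS.items())
    rows = conn.execute(
        "SELECT id FROM products WHERE title LIKE ? ORDER BY id",
        (f"%{term}%",),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercase token to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return index


def index_search(index: Dict[str, Set[int]], term: str) -> List[int]:
    """Look up a whole token directly instead of scanning every title."""
    return sorted(index.get(term.lower(), set()))


index = build_inverted_index(DOCS)
print(like_search("test"))          # also matches "Protest"
print(index_search(index, "test"))  # matches only the word "test"
```

The LIKE version returns the protest signs because it matches raw substrings, and it has no notion of tokens or relevance ranking. The inverted index answers with a single lookup on the token.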
Another common architectural pitfall is proprietary search engines that force you to replicate all of your application data to the search engine when you really only need the searchable fields. Maintaining both a document store for search and a separate store for truth leads to significant complexity, increased storage costs, and latency for the modern full-stack developer, who now must be both search expert and part-time database administrator.
Operational workloads like search are adaptive and dynamic. They’re “post-SQL,” obsoleting expensive and ineffective LIKE and CONTAINS operations in legacy databases.
First steps: User journeys and destinations
Developers who have bought into the importance of search can easily find themselves trying to boil the ocean by building a specialized external system and trying to get everything right on the first try. The wise engineer is going to simplify and iterate instead.
Understanding your users is the first step in every successful search implementation that I’ve seen. You have to audit their destinations, and then map out different user paths, just like with user interface design.
Typically you will find that while the user paths might be different, they often start from the same place and reach the same destination. Getting a very precise understanding of what your users are trying to do and how you get them there will reveal the commonalities that bring focus and simplicity to your development efforts around search.
Marcus Eagan is a contributor to Solr and Lucene and is staff product manager of Atlas Search at MongoDB. Before that, he was responsible for developer tools at Lucidworks. He was a global tech lead at Ford Motor Company, and he led an IoT security startup through its acquisition by a router manufacturer. Eagan works hard to help underrepresented groups break into tech, and he has contributed to open source projects since 2011.
New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to email@example.com.
Copyright © 2022 IDG Communications, Inc.