One of the big problems with traditional search engines is that they rely on string matching to find the results that match a user's query. This often leads to many false positives because the search engine looks for any result that contains the same characters as the query. It also means that the search engine can miss the correct content when the query contains misspellings, a different word order, or synonyms of the terms used in the content.
Thankfully, with advances in artificial intelligence (AI), natural language processing (NLP), and machine learning (ML), site search engines have become far better at delivering relevant results.
Understanding search - what makes one search engine different from another
In the early days of search, Boolean logic was used to comb through database records and find those that matched the search criteria. This was an effective way to search for documents when the number of items in a database was small. But as databases grew, Boolean searches became unwieldy. They often produced too many results, most of them irrelevant to what the user was looking for, and they had no way to rank one match above another. Let's take a look at how search standards evolved over time.
Evolution of search standards
TF-IDF
Term frequency-inverse document frequency, or TF-IDF, is a statistical measure used to evaluate a word's importance to a document in a collection or corpus. TF-IDF is based on two intuitions: first, that the importance of a word to a document grows with how often it appears in that document, and second, that the significance of a word across the collection shrinks as the number of documents containing it grows. A word like "the" appears almost everywhere and so carries little weight, while a rare, specific word carries a lot.
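The two intuitions can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it assumes whitespace tokenization, term frequency normalized by document length, and the plain log(N / df) form of inverse document frequency, which is only one of several common variants. The function name tf_idf and the sample documents are made up for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # tf (share of the document) times idf (rarity in the collection)
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = ["the red shoes", "the blue shoes", "the red hat"]
weights = tf_idf(docs)
for doc, w in zip(docs, weights):
    print(doc, "->", w)
```

In this toy collection "the" appears in every document, so its weight drops to zero, while "blue" appears in only one document and ends up as the heaviest term there.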
BM25
BM25 (Best Match 25) is a ranking function based on the probabilistic retrieval framework. Like TF-IDF, it scores documents on the query terms they actually contain, so it cannot retrieve a document that shares no terms with the query. What it adds are two refinements. First, term frequency saturates: each additional occurrence of a term adds less to the score than the one before. Second, scores are normalized by document length, so long documents are not favored simply because they contain more words. BM25 computes an inverse document frequency weight for each query term, combines it with the saturated, length-normalized term frequency, and sums the results into a score for each document-query pair. The higher the score, the more relevant the document is considered to be.
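Here is a rough sketch of how a BM25 score can be computed. It assumes the commonly cited Okapi form with the parameters k1 (how quickly repeated terms stop adding to the score) and b (how strongly document length is normalized), plus a non-negative variant of the IDF term. The function name bm25_scores, the default parameter values, and the sample documents are assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query; higher means more relevant."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = counts[term]
            if tf == 0:
                continue  # terms missing from the document contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Longer-than-average documents get a larger denominator
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores

docs = [
    "red running shoes",
    "red red red red shoes for running and walking in the park",
    "blue hat",
]
scores = bm25_scores("red shoes", docs)
print(scores)
```

The short first document outranks the long second one even though the second repeats "red" four times, which is the effect of saturation and length normalization. The third document shares no terms with the query and scores zero.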
BM25F
BM25F is an extension of BM25 for documents made up of multiple fields (e.g., title, body, etc.). It assigns each field a weight reflecting how significant a match in that field is, so that a query term found in the title can count for more than the same term found in the body. Like BM25, it considers the number of times a term appears and the length of the text it appears in, but it does so per field before combining the results. This allows for more accurate retrieval when query terms occur in different fields within the document.
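A simplified sketch of field-weighted scoring in the spirit of BM25F is shown below. It assumes one common formulation in which each field's term frequency is length-normalized and multiplied by a field weight, the per-field values are summed into a single pseudo-frequency, and saturation is applied once to that sum. The function name bm25f_scores, the field names, the weights, and the sample documents are all hypothetical.

```python
import math
from collections import Counter

def bm25f_scores(query, docs, field_weights, k1=1.2, b=0.75):
    """docs is a list of {field: text} dicts; field_weights maps field -> weight."""
    fields = list(field_weights)
    tokenized = [{f: doc.get(f, "").lower().split() for f in fields} for doc in docs]
    n_docs = len(tokenized)
    avg_len = {f: sum(len(d[f]) for d in tokenized) / n_docs for f in fields}
    # Document frequency counts a term once per document, whatever the field
    df = Counter()
    for d in tokenized:
        for term in set().union(*(d[f] for f in fields)):
            df[term] += 1
    scores = []
    for d in tokenized:
        score = 0.0
        for term in query.lower().split():
            pseudo_tf = 0.0
            for f in fields:
                tf = d[f].count(term)
                if tf and avg_len[f] > 0:
                    length_norm = 1 - b + b * len(d[f]) / avg_len[f]
                    pseudo_tf += field_weights[f] * tf / length_norm
            if pseudo_tf > 0:
                idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
                score += idf * pseudo_tf / (k1 + pseudo_tf)  # saturate once
        scores.append(score)
    return scores

products = [
    {"title": "trail running shoes", "body": "lightweight and durable for long distances"},
    {"title": "summer sale", "body": "discounts on running shoes and hats this week"},
]
weighted = bm25f_scores("running shoes", products, {"title": 3.0, "body": 1.0})
print(weighted)
```

With the title weighted three times as heavily as the body, the product whose title matches the query ranks above the one that only mentions the terms in its body text.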
Even as keyword search technology and standards have improved over time, keyword-based search remains poor at understanding a crucial element of search - the user's intent. This is where semantic search and concept search prove instrumental in offering advanced search functionality.
Concept search - what it is and how it works
In AI search, a concept is a representation of a set of objects, properties, relations, or functions that can be distinguished from other such sets. Concept search is a search technology used to find information based on a user's conceptual understanding of a topic rather than the traditional keyword-based approach.
In contrast to semantic search, which relies on language processing and textual analysis to interpret the user's intent, concept search relies on a database of pre-defined concepts and relationships between those concepts. This means concept search can be more accurate than semantic search at understanding the user's intent, provided the query falls within the concepts the database covers.
One way to think of concept search is as an "ask an expert" system, where the expert is the database of concepts and relationships. When you ask a question using keywords, the system has to guess what you are looking for. With concept search, you are more likely to get results that match your intentions.
Text to math
To understand how concept search works, it's first necessary to understand what a vector is. A vector is a mathematical representation of something with direction and magnitude. In concept search, vectors are used to represent the relationships between concepts.
Concept search is based on a vector space model, which represents documents as vectors of terms. The similarity between two documents is then computed as the cosine of the angle between their vectors: the closer the cosine is to 1, the more alike the documents are. Concept search engines typically return results based on a combination of relevance and novelty, with more relevant results appearing higher up in the list.
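To see the cosine measure in action, here is a small sketch that turns text into term-count vectors and compares them. It assumes simple whitespace tokenization and raw counts as vector components; the helper names term_vector and cosine_similarity are made up for the example.

```python
import math
from collections import Counter

def term_vector(text):
    """Represent a text as a sparse vector of term counts."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

same_words = cosine_similarity(term_vector("red running shoes"),
                               term_vector("running shoes red"))
some_overlap = cosine_similarity(term_vector("red running shoes"),
                                 term_vector("red leather boots"))
no_overlap = cosine_similarity(term_vector("red running shoes"),
                               term_vector("blue hat"))
print(same_words, some_overlap, no_overlap)
```

Word order does not matter to this measure, so the first pair scores essentially 1, the pair with one shared word lands in between, and the pair with no shared words scores 0.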
When you perform a concept search, the search engine looks at the relationship between words to understand the concept behind your search. This is done by looking at how often certain words appear together in a document and using that information to create a "vector" of related terms. The more often two terms appear together, the more closely related they are considered to be.
So, for example, if you were to search for "machine learning," the vector might include related terms like "algorithm," "data," "model," and "prediction." By understanding the concept behind your search, the search engine can return results that are more relevant to what you're looking for.
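The co-occurrence idea can be sketched as follows. This is a deliberately small illustration that assumes two terms "appear together" whenever they occur in the same document and that each pair counts once per document; the function name cooccurrence_vectors and the sample documents are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence_vectors(docs):
    """Map each term to a Counter of the terms it shares documents with."""
    vectors = defaultdict(Counter)
    for doc in docs:
        terms = set(doc.lower().split())
        for a, b in combinations(sorted(terms), 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors

docs = [
    "machine learning model training",
    "machine learning algorithm",
    "learning algorithm for prediction",
    "banana bread recipe",
]
vectors = cooccurrence_vectors(docs)
related = [term for term, _ in vectors["learning"].most_common(2)]
print(related)
```

Here "machine" and "algorithm" each share two documents with "learning", so they come out as its closest related terms, while nothing from the unrelated recipe document shows up in its vector at all.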
Semantic Search - what it is and how it is different from concept search
Semantic search is based on the idea that the best way to understand the meaning of a query is to look at the entire sentence or passage rather than just the individual words. This approach allows for a more natural way of searching, where users can phrase a query in their own words and still get results that are relevant to what they meant.
There are a number of different algorithms that can be used for semantic search, but one of the most popular is latent semantic analysis (LSA). LSA looks at an extensive collection of documents and identifies relationships between the terms used in them. It then uses these relationships to determine the meaning of new queries.
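To make the idea concrete, here is a minimal sketch of LSA in plain Python. It assumes a toy three-document corpus, whitespace tokenization, and two latent topics, and it uses a hand-rolled power iteration on the term-term matrix in place of a proper singular value decomposition routine, which a real system would take from a numerical library. All function names and the sample data are made up for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_eigenvectors(matrix, k, iterations=200):
    """Top-k eigenvectors of a symmetric matrix via power iteration with deflation."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    found = []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # fixed starting vector
        for _step in range(iterations):
            w = [dot(row, v) for row in work]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        eigenvalue = dot(v, [dot(row, v) for row in work])
        found.append(v)
        # Remove the direction just found so the next pass finds the next one
        for i in range(n):
            for j in range(n):
                work[i][j] -= eigenvalue * v[i] * v[j]
    return found

def cosine(a, b):
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0

docs = ["car engine", "automobile engine", "banana fruit"]
vocab = sorted({term for doc in docs for term in doc.split()})

def term_vector(text):
    tokens = text.lower().split()
    return [float(tokens.count(term)) for term in vocab]

doc_vectors = [term_vector(doc) for doc in docs]
size = len(vocab)
# Term-term matrix: how strongly each pair of terms shares documents
term_term = [[sum(dv[i] * dv[j] for dv in doc_vectors) for j in range(size)]
             for i in range(size)]
topics = top_eigenvectors(term_term, k=2)

def to_latent(text):
    """Project a text onto the latent topics learned from the corpus."""
    vec = term_vector(text)
    return [dot(topic, vec) for topic in topics]

query = to_latent("car")
similarities = [cosine(query, to_latent(doc)) for doc in docs]
keyword_overlap = dot(term_vector("car"), doc_vectors[1])
print(similarities, keyword_overlap)
```

The query "car" shares no words with "automobile engine", yet in the latent space the two are nearly identical, because both "car" and "automobile" co-occur with "engine" in the corpus. The fruit document stays unrelated.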
The main difference between concept search and semantic search is that concept search looks for documents related to a particular topic or concept, while semantic search looks for documents that match the meaning of a specific query. Additionally, concept search leans on pre-defined concepts and term co-occurrence vectors to map out how terms relate to one another, while semantic search leans on language processing of the query itself to interpret what the user intends.
Wrapping Up
The rise of machine learning and AI-based search techniques like concept search and semantic search has led to enhanced search performance combined with a superior user experience.
AI-powered search and discovery solutions like Zevi understand this and leverage the power of neural search and natural language processing to unlock intelligent on-site search functionality for businesses. Zevi offers advanced search capabilities such as auto-correct, autocomplete, synonym handling, typo and spelling-error tolerance, and search personalization to create seamless online experiences for your customers.
To explore more, book a free demo with us today.