chunk-match

NodeJS library that semantically chunks text and matches it against a user query using cosine similarity for precise and relevant text retrieval

A powerful way to improve the accuracy and relevance of generative AI outputs is to combine vector embeddings with retrieval-augmented generation (RAG). In this article I will explain how they work and show you a fun project of mine that uses these mechanisms.

What are Vector Embeddings?

Vector embeddings are a way to represent complex concepts like words in a numerical format that is easier for computers to work with. For example, the sentence "The hat is a classic fedora style, made of black straw with a black and white striped band around the center." might be represented by the 768-dimensional vector [0.05241162329912186, -0.0720922201871872, -0.008915719576179981, ..., 0.011770552024245262, 0.031724244356155396, 0.0102184247225523].

I have worked with 768-dimensional vector representations, but there are also 1536-dimensional vectors, and I can only see this trend of dimensional growth continuing as we try to represent more complex concepts.

Vector similarity

Once you have vector embeddings for texts and concepts, you can compare them to identify similarities. This process is akin to the k-Nearest Neighbors (kNN) algorithm. In kNN, you calculate the distance between a new data point and existing data points in a training set. Similarly, with vector embeddings, you vectorize the input prompt and compare it to other vectors in the vector space.
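One common way to do that comparison is cosine similarity, which scores two vectors by the angle between them. The sketch below is a minimal, dependency-free illustration rather than anything taken from the project. The three-dimensional vectors and item names are made up for readability (real embeddings have hundreds of dimensions), and `cosine_similarity` and `closest_match` are hypothetical helper names, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def closest_match(query, candidates):
    """Return (name, score) for the candidate most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return max(scored, key=lambda pair: pair[1])


# Toy 3-dimensional "embeddings" (made-up values for illustration only).
candidates = {
    "black fedora": [0.9, 0.1, 0.0],
    "white sneakers": [0.1, 0.9, 0.2],
    "striped scarf": [0.5, 0.2, 0.8],
}
query = [0.8, 0.2, 0.1]  # stands in for the embedded input prompt

name, score = closest_match(query, candidates)
print(name, round(score, 3))
```

Scores close to 1 mean the vectors point in nearly the same direction, and scores near 0 mean they are unrelated. Because only direction matters, the length of each vector does not affect the result.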
The closest matching vector indicates the most relevant concept or text.

Cosine similarity is a popular method for measuring the similarity between vector embeddings, especially in applications like text similarity and image search.

What is RAG?

Retrieval-augmented generation (RAG) is a two-step process:

1. Extract relevant information from a knowledge base or corpus of text using techniques like semantic search or vector embeddings.
2. Combine the retrieved information with the original prompt to generate a more comprehensive and accurate text response.

A customer service chatbot for an appliance company could use RAG to provide accurate answers to user queries. When a customer asks, "How do I replace the HEPA filter in my Model XYZ air purifier?" the chatbot would retrieve the relevant instructions from the user manual and generate a step-by-step response.

Summarization is another application of RAG. When you give a large language model (LLM) a document and ask it for a summary, the generated summary is based on the document that was provided. This grounding makes creative but irrelevant responses less likely.

What problem do they solve?

One significant challenge LLMs face is the generation of inaccurate or fabricated information, often referred to as "hallucinations." These responses can be misleading, especially when the model appears highly confident in its output.

To mitigate this issue, techniques like RAG and vector embeddings can be employed. By grounding the generated text in factual information from external sources, these methods improve the accuracy and relevance of LLM responses and reduce the likelihood of hallucinations.

A sample app: MR Stylist

[M]ultimodal [R]AG Stylist was inspired by Dale Markowitz's AI Stylist. I thought I would put a generative AI spin on it using Vertex AI with Gemini 1.0 Pro.
This project aims to create a system that recommends clothing from your wardrobe based on an image of someone else's outfit.

Male model source: via Google search

NOTE: this is just a very early prototype and you may not agree with the prompts I chose or the logic I used, and that's okay; we're all here to learn.

How it works

Step 1: Creating a Wardrobe Embedding Database

Data Collection: I start by taking pictures of a few key pieces from the wardrobe, focusing on different types of clothing.

Embedding Generation: I use a script called embed_wardrobe.py to process these images. This script generates a text description of each piece's type, color, and style. These descriptions are then fed into a text embedding model, creating a vector representation for each clothing item in the wardrobe.

Storage: Currently, these vector embeddings are saved in a simple CSV file. For larger wardrobes, a more scalable solution such as Firestore, a cloud-hosted database with vector search support, would be more efficient.

Step 2: Recommending Wardrobe Matches

User Interface: A simple Flask web application serves as the user interface. Users can upload an image (e.g., of a model or celebrity) wearing an outfit.

Image Analysis: The uploaded image is sent to the model together with a text prompt asking for a detailed description of each clothing item in the picture, including its style, color, and any design details. This ensures separate descriptions for individual pieces.

Text Embedding & Similarity Matching: Each description is processed by the same text embedding model used in Step 1, generating a vector representation. We then calculate the cosine similarity between these generated vectors and the vectors in the wardrobe database. The wardrobe items with the highest cosine similarity scores are considered the closest matches and are recommended to the user.

What's next?

My initial clothing recommendation project, built with Gemini 1.0 Pro models, was a fun learning experience.
Now, I'm excited to revisit and enhance it using the latest Gemini 1.5 Flash models. These newer models offer improved speed and efficiency, which should translate to a faster and more cost-effective system and a better user experience.

To further refine the project's performance, I plan to explore different prompting techniques. Few-shot prompting, for example, could help me achieve a more consistent and predictable format for the generated clothing descriptions. Additionally, I'm interested in experimenting with image embedding models. By directly processing the images, these models might provide a different perspective and potentially improve the accuracy of the clothing recommendations.
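To make the few-shot idea concrete, here is a rough sketch of how such a prompt might be assembled. Everything in it is an assumption for illustration: the helper name `build_few_shot_prompt`, the instruction wording, the field labels, and the shirt description are invented, while the hat description reuses the sentence from earlier in this article. The resulting string would be sent to the model along with the image; that call is left out here.

```python
def build_few_shot_prompt(examples, instruction):
    """Assemble a few-shot prompt: instruction, worked examples, then an open slot."""
    lines = [instruction.strip(), ""]
    for i, (item_type, description) in enumerate(examples, start=1):
        lines.append(f"Example {i}")
        lines.append(f"Item: {item_type}")
        lines.append(f"Description: {description}")
        lines.append("")
    # Finish with the same field layout left open so the model continues the pattern.
    lines.append("Now describe each clothing item in the attached image.")
    lines.append("Item:")
    return "\n".join(lines)


# Hypothetical instruction text; the wording is an assumption, not the project's prompt.
INSTRUCTION = (
    "Describe each clothing item in one sentence covering its style, "
    "material, and color. Follow the format of the examples."
)

# The first example is taken from the article; the second is invented.
EXAMPLES = [
    ("hat", "The hat is a classic fedora style, made of black straw with a "
            "black and white striped band around the center."),
    ("shirt", "The shirt is a slim-fit oxford style, made of light blue cotton "
              "with a button-down collar."),
]

prompt = build_few_shot_prompt(EXAMPLES, INSTRUCTION)
print(prompt)
```

Keeping the field labels identical across every example is the point of the exercise: the model is more likely to return descriptions in the same shape, which in turn should make the text fed to the embedding model more uniform.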