The Next Generation Semantic Search
Next Generation Semantic Search grew out of a game a friend runs on WhatsApp called Picardle. Instead of guessing words, players are shown screenshots from Star Trek: The Next Generation and try to identify the episode by describing its plot. Exact episode titles are rarely guessable, so scoring is based on semantic closeness rather than string matching.
To automate that judgment, I built a semantic search system over the full set of TNG episodes. The goal wasn’t classification or question answering, but similarity: given a free-form description of an episode, return the most semantically related synopses.
The system uses WordLlama, a compact NLP embedding model derived from large language models. WordLlama extracts the token-embedding codebook from a state-of-the-art LLM (e.g. LLaMA-3-70B) and retrains it as a small, context-free embedding model. The result is a fast, lightweight representation comparable to GloVe, Word2Vec, or FastText, but grounded in modern LLM token spaces.
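The core idea behind a context-free embedding model can be sketched in a few lines: the model is essentially a lookup table of per-token vectors, and a text is embedded by averaging the vectors of its tokens. The toy codebook and vocabulary below are invented for illustration and are not WordLlama's actual weights or tokenizer:

```python
import numpy as np

# Toy stand-in for a token-embedding codebook: one vector per token.
# (WordLlama's real codebook is distilled from an LLM and is far larger.)
rng = np.random.default_rng(0)
vocab = {"ship": 0, "doctor": 1, "holodeck": 2, "malfunction": 3}
codebook = rng.normal(size=(len(vocab), 8))  # 8-dimensional toy vectors

def embed(text: str) -> np.ndarray:
    """Embed text by averaging the codebook vectors of its known tokens.

    There is no attention or context mixing: each token contributes the
    same vector regardless of its neighbors, which is what makes the
    model so small and fast.
    """
    ids = [vocab[t] for t in text.lower().split() if t in vocab]
    return codebook[ids].mean(axis=0)

v = embed("holodeck malfunction")
assert v.shape == (8,)
```

Because lookup plus averaging is all that happens at inference time, embedding a synopsis costs a handful of memory reads rather than a full forward pass through a transformer.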
I scraped and cleaned episode synopses, normalized formatting, and embedded each synopsis into a vector space using WordLlama. User guesses are embedded using the same model, and cosine similarity is used to rank candidate episodes. Because the embeddings are context-free and compact, the system performs well without fine-tuning or large infrastructure.
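The ranking step can be sketched as follows, assuming the synopses have already been embedded (random stand-in vectors here; the real system uses WordLlama embeddings for both the synopses and the guess, and the titles are just examples):

```python
import numpy as np

# Stand-in corpus: pretend these are WordLlama embeddings of synopses.
rng = np.random.default_rng(42)
titles = ["Darmok", "The Inner Light", "Cause and Effect"]
synopsis_vecs = rng.normal(size=(3, 64))

def rank(query_vec: np.ndarray, vecs: np.ndarray, names: list[str], top_k: int = 3):
    """Return (name, score) pairs sorted by cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    m = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    scores = m @ q  # cosine similarity via dot products of unit vectors
    order = np.argsort(scores)[::-1][:top_k]
    return [(names[i], float(scores[i])) for i in order]

# A guess whose embedding lies close to the first synopsis should win.
guess_vec = synopsis_vecs[0] + 0.1 * rng.normal(size=64)
results = rank(guess_vec, synopsis_vecs, titles)
```

Normalizing both sides up front turns every similarity computation into a single matrix-vector product, which is why the whole corpus can be scored on each guess with no index structure at all.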
To make the experiment usable, I deployed a simple front end using Gradio and hosted it as a Hugging Face Space. The interface accepts a natural-language description and returns the closest matching episodes, making the underlying similarity judgments visible and testable.
This project is less about novelty and more about fit: choosing an embedding approach that’s small, inspectable, and appropriate to the task. It demonstrates how modern semantic search can be built without full LLM inference, while still benefiting from representations distilled from large models.
The live demo is available here:
https://huggingface.co/spaces/m-butler/picardle