Is it possible to create vector indices on attributes, populate them using an OpenAI embedding model, and perform similarity search using cosine similarity?
Consider a vertex: movie with an attribute: tagline.
- Create vector indices on attribute: tagline in vertex: movie
- Populate vector indices: calculate a vector representation for each movie.tagline using OpenAI. Add the vector to the vertex: movie as attribute: tagline_embedding.
- Perform similarity search using cosine similarity
Neo4j Cypher queries for reference:
CREATE VECTOR INDEX
kg.query("""
CREATE VECTOR INDEX movie_tagline_embeddings IF NOT EXISTS
FOR (m:Movie) ON (m.taglineEmbedding)
OPTIONS { indexConfig: {
`vector.dimensions`: 1536,
`vector.similarity_function`: 'cosine'
}}"""
)
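The `vector.dimensions` value in the index definition above has to match the length of the vectors later written to the indexed property. The post does not name the embedding model; 1536 is consistent with OpenAI's text-embedding-ada-002, which is an assumption here. A minimal sketch of a client-side check, where `INDEX_DIMENSIONS` and `check_dimensions` are hypothetical names introduced only for illustration:

```python
from typing import Sequence

# Assumption: mirrors `vector.dimensions` from the index definition above.
INDEX_DIMENSIONS = 1536


def check_dimensions(vector: Sequence[float], expected: int = INDEX_DIMENSIONS) -> None:
    """Raise ValueError if the vector length differs from the index dimension."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
```

Running such a check before storing an embedding would surface a mismatch between the model and the index configuration early, rather than at query time.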
POPULATE
kg.query("""
MATCH (movie:Movie) WHERE movie.tagline IS NOT NULL
WITH movie, genai.vector.encode(
movie.tagline,
"OpenAI",
{
token: $openAiApiKey,
endpoint: $openAiEndpoint
}) AS vector
CALL db.create.setNodeVectorProperty(movie, "taglineEmbedding", vector)
""",
params={"openAiApiKey": OPENAI_API_KEY, "openAiEndpoint": OPENAI_ENDPOINT})
PERFORM SIMILARITY SEARCH
- CALCULATE EMBEDDING FOR QUESTION
- IDENTIFY MATCHING MOVIES BASED ON SIMILARITY OF QUESTION AND taglineEmbedding VECTORS
kg.query("""
WITH genai.vector.encode(
$question,
"OpenAI",
{
token: $openAiApiKey,
endpoint: $openAiEndpoint
}) AS question_embedding
CALL db.index.vector.queryNodes(
'movie_tagline_embeddings',
$top_k,
question_embedding
) YIELD node AS movie, score
RETURN movie.title, movie.tagline, score
""",
params={"openAiApiKey": OPENAI_API_KEY,
"openAiEndpoint": OPENAI_ENDPOINT,
"question": question,
"top_k": 5
})
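To make the requested behaviour concrete, the ranking that the last query delegates to the vector index can be sketched in plain Python as a brute-force cosine top-k. This is only an illustration of the operation being asked about, not how any index implements it; `cosine_similarity`, `top_k`, and the toy 3-dimensional vectors are hypothetical and stand in for real 1536-dimensional embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|); the value lies in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(
    question_embedding: Sequence[float],
    tagline_embeddings: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Score every stored vector against the question and keep the k best."""
    scored = [
        (title, cosine_similarity(question_embedding, vector))
        for title, vector in tagline_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data for illustration only (not real embeddings).
toy_embeddings = {
    "Movie A": [0.9, 0.1, 0.0],
    "Movie B": [0.0, 1.0, 0.0],
    "Movie C": [0.5, 0.5, 0.0],
}
matches = top_k([1.0, 0.0, 0.0], toy_embeddings, k=2)
```

A vector index would be expected to return the same kind of (item, score) pairs without scanning every vertex, and a database may rescale the raw cosine value before reporting it as the score.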