How to create vector indices on vertex attributes, populate them using an OpenAI embedding model, and perform cosine similarity search?

Is it possible to create vector indices on vertex attributes, populate them using an OpenAI embedding model, and perform similarity search using cosine similarity?

Consider a vertex type movie with an attribute tagline.

  1. Create a vector index on the movie vertex for the attribute tagline_embedding, which will hold the embedding of tagline.
  2. Populate the vector index: compute a vector representation of each movie's tagline using OpenAI and store it on the movie vertex as the attribute tagline_embedding.
  3. Perform similarity search using cosine similarity.
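For reference, the cosine similarity in step 3 between a question embedding q and a stored tagline embedding t, both of dimension d (1536 in the Neo4j example), is the cosine of the angle between the two vectors:

```latex
\cos(\mathbf{q}, \mathbf{t})
  = \frac{\mathbf{q} \cdot \mathbf{t}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{t} \rVert}
  = \frac{\sum_{i=1}^{d} q_i t_i}{\sqrt{\sum_{i=1}^{d} q_i^2} \, \sqrt{\sum_{i=1}^{d} t_i^2}}
```

The value lies in [-1, 1], and 1 means the vectors point in the same direction. OpenAI states that its embeddings are normalized to length 1, so for those vectors the denominator is 1 and the similarity reduces to the plain dot product.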

Neo4j Cypher queries for reference, run from Python via kg.query():

CREATE VECTOR INDEX

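# 1536 is the output size of OpenAI's text-embedding-ada-002 (and the default for text-embedding-3-small)
kg.query("""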
  CREATE VECTOR INDEX movie_tagline_embeddings IF NOT EXISTS
  FOR (m:Movie) ON (m.taglineEmbedding) 
  OPTIONS { indexConfig: {
    `vector.dimensions`: 1536,
    `vector.similarity_function`: 'cosine'
  }}"""
)
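The index options fix the vector size and similarity function, and the size must match the embedding model's output: switching to a model with a different output size means recreating the index. A minimal sketch of a helper that builds the same statement for other labels, properties or sizes follows. The function name `vector_index_statement` is hypothetical; the identifier check is an assumption of this sketch (it simply refuses anything that is not a plain name, since names are interpolated into the query text); and the allowed similarity values are the two that Neo4j's vector index documentation lists (check the docs for your version).

```python
import re

# Plain identifiers only, because they are interpolated into the statement text
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
SUPPORTED_SIMILARITIES = {"cosine", "euclidean"}


def vector_index_statement(index_name, label, prop, dimensions, similarity="cosine"):
    """Build a CREATE VECTOR INDEX statement like the one above (hypothetical helper)."""
    for ident in (index_name, label, prop):
        if not _IDENTIFIER.match(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    if similarity not in SUPPORTED_SIMILARITIES:
        raise ValueError(f"unsupported similarity function: {similarity!r}")
    if not isinstance(dimensions, int) or dimensions <= 0:
        raise ValueError(f"dimensions must be a positive int, got {dimensions!r}")
    return (
        f"CREATE VECTOR INDEX {index_name} IF NOT EXISTS\n"
        f"FOR (m:{label}) ON (m.{prop})\n"
        "OPTIONS { indexConfig: {\n"
        f"  `vector.dimensions`: {dimensions},\n"
        f"  `vector.similarity_function`: '{similarity}'\n"
        "}}"
    )
```

For example, `kg.query(vector_index_statement("movie_tagline_embeddings", "Movie", "taglineEmbedding", 1536))` would issue the same statement as the query above.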

POPULATE

# Embed each non-null tagline with OpenAI and store it in the taglineEmbedding property
kg.query("""
    MATCH (movie:Movie) WHERE movie.tagline IS NOT NULL
    WITH movie, genai.vector.encode(
        movie.tagline, 
        "OpenAI", 
        {
          token: $openAiApiKey,
          endpoint: $openAiEndpoint
        }) AS vector
    CALL db.create.setNodeVectorProperty(movie, "taglineEmbedding", vector)
    """, 
    params={"openAiApiKey":OPENAI_API_KEY, "openAiEndpoint": OPENAI_ENDPOINT} )
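The Cypher above calls genai.vector.encode once per movie inside the database. If the target database has no built-in equivalent, the embeddings can instead be computed client-side and written back as the tagline_embedding attribute. The sketch below does that in batches, which cuts the number of API requests. Assumptions: movies are represented as plain dicts standing in for movie vertices (the write-back to the database is left out because it depends on the database client); `openai_embed_batch` and `populate_tagline_embeddings` are hypothetical names; `openai_embed_batch` uses the OpenAI Python SDK (openai>=1.0) embeddings endpoint, and the default model name is an assumption that must produce vectors matching the index size.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A function mapping a batch of strings to one embedding per string
EmbedBatch = Callable[[Sequence[str]], List[List[float]]]


def openai_embed_batch(texts: Sequence[str], model: str = "text-embedding-ada-002",
                       api_key: Optional[str] = None,
                       base_url: Optional[str] = None) -> List[List[float]]:
    """Embed a batch of texts with the OpenAI SDK.

    api_key/base_url mirror $openAiApiKey/$openAiEndpoint; None falls back
    to the SDK defaults (e.g. the OPENAI_API_KEY environment variable).
    """
    from openai import OpenAI  # imported lazily so the rest runs without the SDK

    client = OpenAI(api_key=api_key, base_url=base_url)
    response = client.embeddings.create(model=model, input=list(texts))
    # Order by each item's index so vectors line up with the input texts
    return [item.embedding for item in sorted(response.data, key=lambda i: i.index)]


def populate_tagline_embeddings(movies: List[Dict], embed_batch: EmbedBatch,
                                batch_size: int = 100) -> int:
    """Set movie["tagline_embedding"] for every movie whose tagline is not None.

    Returns the number of movies updated.
    """
    # Same filter as WHERE movie.tagline IS NOT NULL
    pending = [m for m in movies if m.get("tagline") is not None]
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        vectors = embed_batch([m["tagline"] for m in batch])
        if len(vectors) != len(batch):
            raise RuntimeError("embedding service returned the wrong number of vectors")
        for movie, vector in zip(batch, vectors):
            movie["tagline_embedding"] = vector
    return len(pending)
```

In real use, `populate_tagline_embeddings(movies, openai_embed_batch)` would fill the attribute, after which each vector would be written back to its vertex with the database's own client.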

PERFORM SIMILARITY SEARCH

  • CALCULATE THE EMBEDDING FOR THE QUESTION
  • IDENTIFY MATCHING MOVIES BY THE SIMILARITY OF THE QUESTION AND taglineEmbedding VECTORS
# question holds the user's query text; top_k sets how many movies to return
kg.query("""
    WITH genai.vector.encode(
        $question, 
        "OpenAI", 
        {
          token: $openAiApiKey,
          endpoint: $openAiEndpoint
        }) AS question_embedding
    CALL db.index.vector.queryNodes(
        'movie_tagline_embeddings', 
        $top_k, 
        question_embedding
        ) YIELD node AS movie, score
    RETURN movie.title, movie.tagline, score
    """, 
    params={"openAiApiKey":OPENAI_API_KEY,
            "openAiEndpoint": OPENAI_ENDPOINT,
            "question": question,
            "top_k": 5
            })
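db.index.vector.queryNodes delegates the search to the vector index. To reason about, or sanity-check, what it should return, an exact brute-force equivalent over in-memory vertices is a handy baseline. The sketch below scores every movie that has an embedding and keeps the top k, returning the same three fields as the Cypher RETURN clause. Assumptions: movies are hypothetical dicts with title, tagline and tagline_embedding keys, and question_embedding comes from the same model as the stored vectors. Vector indexes generally perform approximate nearest-neighbour search, and the score an index reports may be a rescaled form of the raw cosine value, so compare rankings rather than raw numbers.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Raw cosine similarity in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as having no similarity to anything
    return dot / norm if norm else 0.0


def top_k_by_cosine(question_embedding: Sequence[float], movies: List[Dict],
                    k: int = 5) -> List[Tuple[str, str, float]]:
    """Exact top-k search; returns (title, tagline, score), best match first."""
    scored = (
        (_cosine(question_embedding, m["tagline_embedding"]), m["title"], m["tagline"])
        for m in movies
        if m.get("tagline_embedding") is not None  # skip movies without a vector
    )
    best = heapq.nlargest(k, scored, key=lambda item: item[0])
    return [(title, tagline, score) for score, title, tagline in best]
```

The brute-force scan costs O(n·d) per query, which is fine for spot checks but is exactly the work a vector index exists to avoid on large graphs.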