Hi, I’m currently trying to identify patterns of the same type, using potential fraud as an example.
I’m aware that there are prebuilt algorithms such as centrality, community detection, PageRank, and so on. I have a fairly niche question: which algorithm would be the best fit, or can I customize an algorithm (and if so, how can I do it in GSQL)? What I want to achieve is to define a pattern for a particular individual based on certain attribute values (for example: days > 3, status is “non approved”, record is “clear”) and then find individuals similar to that person. When I apply the prebuilt algorithms, though, they give me a score for each vertex, not a score based on its attributes. For example, if the vertex is “person”, I get a score for the person based on which vertices it is connected to. What I want instead is a score for each user broken down by attribute (for example a score for the person’s age, a score for their productivity, etc.), so that I can differentiate between users. I’m stuck on this. I know it sounds a bit confusing, but the basic idea is to score attribute values, not the vertex.
I have even tried specifying the “result attribute” value, but that is not working.
Many thanks in advance.
Just saw your question. Are you still looking for help?
Let me see if I understand what you want to do:
- You have a certain individual as a reference or goal, and you want to find other persons who have similar characteristics. Is that correct?
This is a pretty classic task, whether you have a graph or not. There are a couple of issues you need to address:
- What are the particular characteristics that you care about? Do you want to choose, or do you want an ML system to automatically choose, based on a set of learning data?
- Are those characteristics all in numerical format, or are they categorical properties? Can a value be “close”, or is it simply “matches / doesn’t match”?
Hi Victor, thank you for the suggestion.
Yes, I’m still trying to figure this out.
Point 1: Yes, we are trying to find individuals with similar characteristics, but the reference person/goal is something we create from certain conditions, as I mentioned; it is not readily available in the data.
Point 2: We want to choose the characteristics we care about ourselves rather than have an ML system choose them automatically. In this case there is no learning data (and it is not exactly something that would come under unsupervised learning).
In addition to Point 2: once we define a rule and create an individual as a reference point, we want the algorithm to give us individuals similar to the one we created. For example, when new data comes in, we want the algorithm to find individuals that match the same patterns/criteria we have set. (Sorry for the confusion, but that’s the best way I can explain this.)
These characteristics are a mixture of strings and numbers (i.e. both categorical and numeric data), and the values used to identify the patterns can be close; they need not be equal.
I hope this covers what you need to know in order to help us.
Thank you for your help and hopefully we can find some solution.
I’m still not sure what you are saying about Point 1. If you have a particular Person in mind, and you can see each person’s attributes and connections to other entities, then we can perform similarity analysis. We have several out-of-the-box similarity algorithms, but you will probably need either to massage your data to fit the algorithm or to modify the algorithm to work with your data.
Algorithm library - 2 useful categories:
In a tabular database, you would have the characteristics as columns in a table. With a graph database, you can do that too, or you can work with the connections to neighboring elements. For example, in a business organization graph, you can say that person A is very similar to person B because:
- They both have several “supervises” connections to “Data Scientist”, though different numbers of connections.
- They both have a “reports to” connection to “CTO”.
- One has the title “Director of Data Science” and one has “Machine Learning Manager”. There are no words in common, but if you have some auxiliary language information, you could see that they are similar.
- There could also be information about experience, skills, location, etc. That information could be either attributes of the Employee vertices or connected through edges.
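To make the connection-based comparison above concrete, here is a minimal sketch in plain Python (not GSQL, and not the library's implementation). It compares two people by the sets of neighbors they reach through each edge type, using Jaccard similarity. The edge-type names and the sample data are hypothetical, chosen only to mirror the organization example.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |a ∩ b| / |a ∪ b| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbor_similarity(p, q):
    """Return per-edge-type Jaccard scores and their unweighted average."""
    edge_types = sorted(set(p) | set(q))
    scores = {t: jaccard(p.get(t, ()), q.get(t, ())) for t in edge_types}
    if not scores:
        return scores, 0.0
    return scores, sum(scores.values()) / len(scores)


# Hypothetical neighbor sets, keyed by edge type (assumed names, not a real schema).
person_a = {
    "supervises": {"Data Scientist"},
    "reports_to": {"CTO"},
    "skills": {"python", "ml", "graph"},
}
person_b = {
    "supervises": {"Data Scientist"},
    "reports_to": {"CTO"},
    "skills": {"python", "ml", "statistics"},
}

scores, overall = neighbor_similarity(person_a, person_b)
print(scores)   # one score per edge type
print(overall)  # average across edge types
```

Because this works on sets, it ignores how many “supervises” connections each person has; if the counts matter, a count-aware measure such as cosine similarity would be a better fit.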
No algorithm works out of the box with mixed numerical and categorical data. It sounds like you are familiar with those concepts. If you want to get detailed help with your specific graph schema and data types, we can take a look at that.
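As a rough illustration of one way to handle the mixed case, here is a sketch in plain Python (again, not GSQL) of a Gower-style similarity: numeric attributes are compared by scaled distance, categorical attributes by exact match, and you get a score per attribute as well as an overall score. The reference record, the attribute names (`days`, `status`, `record`), the value range, and the equal weighting are all assumptions for illustration, loosely based on the rule you described.

```python
def gower_similarity(ref, cand, numeric_ranges, weights=None):
    """Compare a candidate to a reference record, attribute by attribute.

    Numeric attributes score 1 - |difference| / range (clamped at 0).
    Categorical attributes score 1.0 on an exact match, otherwise 0.0.
    Returns (per-attribute scores, weighted average of those scores).
    """
    per_attr = {}
    for name, ref_value in ref.items():
        value = cand[name]
        if name in numeric_ranges:
            lo, hi = numeric_ranges[name]
            span = hi - lo
            diff = abs(ref_value - value) / span if span else 0.0
            per_attr[name] = max(0.0, 1.0 - diff)
        else:
            per_attr[name] = 1.0 if value == ref_value else 0.0
    weights = weights or {name: 1.0 for name in per_attr}
    total = sum(weights[n] for n in per_attr)
    overall = sum(per_attr[n] * weights[n] for n in per_attr) / total
    return per_attr, overall


# Hypothetical reference individual built from a rule, plus two sample people.
reference = {"days": 4, "status": "non approved", "record": "clear"}
people = {
    "p1": {"days": 5, "status": "non approved", "record": "clear"},
    "p2": {"days": 1, "status": "approved", "record": "clear"},
}
numeric_ranges = {"days": (0, 10)}  # assumed min/max of "days" in the data

results = {
    pid: gower_similarity(reference, attrs, numeric_ranges)
    for pid, attrs in people.items()
}
ranked = sorted(results, key=lambda pid: results[pid][1], reverse=True)
for pid in ranked:
    per_attr, overall = results[pid]
    print(pid, round(overall, 3), per_attr)
```

The per-attribute dictionary is the part that speaks to your original question: it shows how close each person is to the reference on each attribute, rather than giving only a single score per vertex. The same logic could be moved into a GSQL query with accumulators, but the details depend on your schema.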
P.S. Looks like we are in different time zones. I will try to sync better.