neo4j introduction operation and some thoughts

summary

The open source NoSQL database, the native graph database, was developed in 2003, using scala and java language, and was released in 2007;
One of the most advanced graph databases in the world, which provides native graph data storage, retrieval and processing;
The Property graph model is adopted to greatly improve and enrich the graph data model;
Cypher, an exclusive query language, is intuitive and efficient;

Storage structure of Neo4j

In Neo4j, its database files are divided into four categories for classified storage:

label
node
attribute
relationship

neo4j browser

Basic statement

// Running statement
MATCH (n) DETACH DELETE n;
// Find all
match(n) return n;

//Create a single node
CREATE (a:Person {name:'a', born:1997}) return a;
//Create multiple node s
CREATE (b:Person {name:'b', born:1997}),(c:Person {name:'c', born:1961}) Return b,c;
//Two statements for creating relationships
1.according to Id Create relationship
start m =node(1025),f=node(1024) create (m)-[n:gift]->(f) return m,f;
2.Create relationships based on conditions
CREATE (m:Person{name:'a'})-[:gift]->(f:Person{name:'b'}) return m,f;
//Create relationships for previously defined nodes
MATCH(n:Person{name:'a',born:1997}),(b:Person{name:'b',born:1997}) merge(n)-[:teacher]->(b);
                                                                                    
//Condition query
MATCH (n:Person{age:'18'}) return n;
match(n:Person) where n.age = '18' return n;
//Specify number
MATCH (n:Test) RETURN n LIMIT 25；
//Query first level relationship
match q=(n:A{name:'House 2'})-[]-()return q;
//Query secondary relationship
match q=(n:A{name:'House 2'})-[]-(),p=(n:A{name:'House 2'})-[]-()-[]-() return p,q;

//Modify statement
MATCH (n:Person)WHERE n.name="a" SET n.born = 2003RETURN n;
                                               
//Delete statement (when the node has a relationship, it cannot be deleted)
match(n:Person{name:'Zhang San'}) delete n;
//If there is a relationship, delete the relationship first
MATCH(n:Person{name:'a',born:1997}),(b:Person{name:'d',born:1987}) merge(n)-[p:children]->(b) delete p;
//Delete node b and its relationship
MATCH(n:Person{name:'a',born:1997}),(b:Person{name:'b',born:1997}) merge(n)-[p:teacher]->(b) delete p,b;
//Delete Test and all its relationships
match p=(n:Test)-[]-() delete p;
match(n:Test) detach delete n;

Application scenario

Recommendation engine

In 2012, google officially released the knowledge map search engine
2013 facebook open knowledge map search portal

knowledge graph

Star Wars character relationship map

Social network map

In the social network, if you want to know Trump's daughter Ivanka, query the shortest path to get to know Ivanka, and everyone in this path is single. And no matter who you are, you are likely to know Ivanka through 6 ~ 7 people. This is the famous six degree separation theory (small world effect): any two strangers can always have an inevitable connection or relationship through a certain way of contact.

In the hypothetical program implementation, a global character relationship network can be built through Neo4j, and Ivanka can be found by specifying the start node and end node.

MATCH (p1:Person {name:"xxx"}),(p2:Person{name:"ivanka "}),
p=shortestpath((p1)-[*..6]->(p2))
RETURN p

other

In the multi-dimensional association analysis scenario of anti fraud in the financial industry, anti fraud is already a core application. Through the graph database, association analysis can be done for different individuals and groups, from the behavior of people in a specified time, such as the IP address of places they have been Correlation analysis of used MAC addresses (including mobile phone, PC, WIFI, etc.) and social networks, whether they have appeared near the same geographical location at the same time point, and whether there is historical transaction information between bank accounts.

In daily operation, enterprises will deal with customers, partners, channels and investors, which also determines that enterprises are widely involved in all fields of society and present complex aspects. Therefore, they can query and mine information layer by layer through the enterprise data map.

Figure advantages and disadvantages of database

Relational database was originally designed to deal with paper forms and tabular structure. When relational database modeled complex relationships in the actual model, it did not do very well, that is to say, relational database did not do well in dealing with links, and its expansibility was also very poor.

Relational database is a powerful mainstream database. After 40 years of development and improvement, it has been very reliable, powerful and practical, and can store a large amount of data.

What are relational databases not good at? When you look for data items, relational patterns, or relationships between multiple data items, they often end in failure.

In the search of understanding Ivanka just mentioned, we deal with the social network map with depth of 6. Now suppose two people are randomly selected, and whether there is a path to associate them with a relationship length of up to 5. For a social network containing 1 million people and about 50 friends per person, the performance comparison between the traditional relational database and Neo4j is as follows:

The results clearly show that graph database is the best choice for associated data.

Any shallow traversal query that goes beyond looking for direct friends or looking for friends will slow the search because of the number of indexes involved. Because graph database uses graph traversal technology, the amount of data to be calculated is far less than that of relational database, so it is very fast.

The graph database is not perfect. Although it makes up for the defects of many relational databases, it also has some inapplicability, such as the following fields:

Record a large amount of event based data (such as log entries or sensor data);
Processing large-scale distributed data, similar to Hadoop;
Binary data storage;
It is suitable for structured data stored in relational database.

expectation

Although relational database is still the best choice for storing structured data, graph database is more suitable for managing semi-structured data, unstructured data and graphic data. If the data model contains a large amount of associated data, and you want to use an intuitive, interesting and fast database for development, you can consider trying the graph database.

In the actual production environment, a truly mature and effective analysis environment should include relational database and graph database, which can be combined for effective analysis according to different application scenarios.

On the whole, there are still many unsolved problems in graph database, and many technologies need to be developed, such as super node problem and distributed large graph storage. It can be predicted that with the expansion of Internet data, graph database will usher in a development opportunity, and various computing and data mining jobs based on graph will become more and more hot.