mondodb aggregation framework

mondodb aggregation framework

brief introduction

MongoDB aggregation framework is mainly used to transform and combine documents in the collection.
Thus, the data are analyzed and utilized.

The basic idea of aggregation framework is to create a pipeline for processing a series of documents using multiple constructs.

These constructs include filtering, projecting, grouping, sorting, limiting and skipping.

Ways to use aggregation frameworks

db. Collection. aggregate (build 1, build 2, build 3...)
Note: Since the aggregated results are to be returned to the client, the aggregated results must be limited to 16M, which is the maximum response message size supported by MomgoDB.


Production test data

        for(var i=0;i<100;i++){
                for(var j=0;j<4;j++){
                          db.scores.insert({"studentId":"s"+i,"course":"curriculum"+j,"score":Math.random()*100});
            }
         }
400 data will be generated, with one student having four courses:
    { "_id" : ObjectId("597ffbc1d8c0f25813bcc58b"), "studentId" : "s0", "course" : "Course 0", "score" : 5.999703989474325 }
    { "_id" : ObjectId("597ffbc1d8c0f25813bcc58c"), "studentId" : "s0", "course" : "Course 1", "score" : 5.040777429297738 }
    { "_id" : ObjectId("597ffbc1d8c0f25813bcc58d"), "studentId" : "s0", "course" : "Course 2", "score" : 63.74926111710963 }
    { "_id" : ObjectId("597ffbc1d8c0f25813bcc58e"), "studentId" : "s0", "course" : "Course 3", "score" : 66.50186953829127 }
    { "_id" : ObjectId("597ffbc1d8c0f25813bcc58f"), "studentId" : "s1", "course" : "Course 0", "score" : 84.74255359132857 }
    { "_id" : ObjectId("597ffbc1d8c0f25813bcc590"), "studentId" : "s1", "course" : "Course 1", "score" : 85.7428949944855 }
    { "_id" : ObjectId("597ffbc1d8c0f25813bcc591"), "studentId" : "s1", "course" : "Course 2", "score" : 33.198227427842056 }
    { "_id" : ObjectId("597ffbc1d8c0f25813bcc592"), "studentId" : "s1", "course" : "Course 3", "score" : 5.346516072417174 }
    { "_id" : ObjectId("597ffbc1d8c0f25813bcc593"), "studentId" : "s2", "course" : "Course 0", "score" : 97.1040803312415 }
    { "_id" : ObjectId("597ffbc1d8c0f25813bcc594"), "studentId" : "s2", "course" : "Course 1", "score" : 20.47611488352149 }
   ...


Introduction to Pipeline Operators

Each operator accepts a series of documents, processes them in response, and then passes the converted document as a document to the next operator. The last operator returns the result.


Pipeline operator $match

Used for filtering a collection of documents, where all regular operators can be used. Usually placed at the front of the pipe.
The reasons are as follows.
1: Quickly filter unnecessary documents to reduce the amount of data for subsequent operations.
2: Screen before projecting and grouping. Queries can use indexes

Find three data with score greater than 90

    db.scores.aggregate({"$match":{"score":{$gt:90}}},{"$limit":3})

    { "_id" : ObjectId("597ffbc1d8c0f25813bcc593"), "studentId" : "s2", "course" : "Course 0", "score" : 97.1040803312415 }
    { "_id" : ObjectId("597ffbc1d8c0f25813bcc5a6"), "studentId" : "s6", "course" : "Course 3", "score" : 95.37347295013626 }
    { "_id" : ObjectId("597ffbc1d8c0f25813bcc5ab"), "studentId" : "s8", "course" : "Course 0", "score" : 90.49825247672626 }

Pipeline operator $project

Used to extract fields from documents, you can specify inclusion and exclusion fields, or rename fields.

Show studentId score only

   db.scores.aggregate({"$match":{"score":{$gt:90}}},{"$project":{studentId:1,score:1,_id:0}},{"$limit":3})

{ "studentId" : "s2", "score" : 97.1040803312415 }
{ "studentId" : "s6", "score" : 95.37347295013626 }
{ "studentId" : "s8", "score" : 90.49825247672626 }


Rename studentId to Sid $studentId to indicate value

    db.scores.aggregate({"$match":{"score":{$gt:90}}},{"$project":{"sid":"$studentId",score:1,_id:0}},{"$limit":3})


    { "score" : 97.1040803312415, "sid" : "s2" }
    { "score" : 95.37347295013626, "sid" : "s6" }
    { "score" : 90.49825247672626, "sid" : "s8" }


Pipeline operators can also use expressions to meet more complex requirements.

Supported operators and corresponding syntax:

  • 1:$add : [expr1[,expr2,...exprn]]

  • 2:$subtract:[expr1,expr2]

  • 3:$multiply:[expr1[,expr2,...exprn]]

  • 4:$divice:[expr1,expr2] 5:$mod:[expr1,expr2]


Question 1: Query out the score of people who score more than 90 points plus 20 points

db.scores.aggregate({"$match":{"score":{$gt:90}}},{"$project":{"sid":"$studentId","score":1,_id:0,"newScore":{$add:["$score",20]}}},{"$limit":3})

{ "score" : 97.1040803312415, "sid" : "s2", "newScore" : 117.1040803312415 }
{ "score" : 95.37347295013626, "sid" : "s6", "newScore" : 115.37347295013626 }
{ "score" : 90.49825247672626, "sid" : "s8", "newScore" : 110.49825247672626 }


Date expression of pipeline operator $project

The combined framework contains expressions for extracting date information as follows: $year, $month, $week, $dayOfMonth, $dayOfWeek, $dayOfYear, $hour, $minute, $second.

Insert a date document

 db.userdatas.insert({"date":new Date()})
 { "_id" : ObjectId("5983f5c88eec53fbcd56a7ca"), "date" : ISODate("2017-08-04T04:19:20.693Z") }

 db.userdatas.aggregate({"$match":{"_id":ObjectId("5983f5c88eec53fbcd56a7ca")}},
 {"$project":{"year":{"$year":"$date"},"month":{"$month":"$date"}}})

 { "_id" : ObjectId("5983f5c88eec53fbcd56a7ca"), "year" : 2017, "month" : 8 }


String expression of the pipe operator $project

  • 1: $substr: [expr, start position, number of bytes to fetch]

  • 2:$concat:[expr1[,expr2,...exprn]]

  • 3:$toLower:expr

  • 4:$toUpper:expr

db.scores.aggregate({"$project":{"sid":{"$concat":["$studentId","--"]}}},{"$limit":2})

{ "_id" : ObjectId("597ffbc1d8c0f25813bcc58b"), "sid" : "s0--" }
{ "_id" : ObjectId("597ffbc1d8c0f25813bcc58c"), "sid" : "s0--" }


The logical expression of the pipe operator $project

1:$cmp:[expr1,expr2]: Compare two expressions, 0 means equal, big before positive number and big after negative number.

2:$strcasecmp:[string1,string2]: Compare two strings, case-sensitive, only valid for strings composed of Roman characters.

3:$eq,$ne,$gt,$gte,$lt,$lte :[expr1,expr2]

4:$and,\$or,\$not

5:$cond:[booleanExpr,trueExpr,falseExpr]: If the boolean expression is true, return the true expression, otherwise return the false expression.

6:$ifNull:[expr,otherExpr]: If expr is null, return otherExpr, otherwise return expr


Question 1: Compare field studentId to character "s3"

db.scores.aggregate({"$project":{"result":{"$strcasecmp":["$studentId","s3"]},"studentId":1}},{"$skip":8},{"$limit":12})


{ "_id" : ObjectId("597ffbc1d8c0f25813bcc593"), "studentId" : "s2", "result" : -1 }
{ "_id" : ObjectId("597ffbc1d8c0f25813bcc594"), "studentId" : "s2", "result" : -1 }
{ "_id" : ObjectId("597ffbc1d8c0f25813bcc595"), "studentId" : "s2", "result" : -1 }
{ "_id" : ObjectId("597ffbc1d8c0f25813bcc596"), "studentId" : "s2", "result" : -1 }
{ "_id" : ObjectId("597ffbc1d8c0f25813bcc597"), "studentId" : "s3", "result" : 0 }
{ "_id" : ObjectId("597ffbc1d8c0f25813bcc598"), "studentId" : "s3", "result" : 0 }
{ "_id" : ObjectId("597ffbc1d8c0f25813bcc599"), "studentId" : "s3", "result" : 0 }
{ "_id" : ObjectId("597ffbc1d8c0f25813bcc59a"), "studentId" : "s3", "result" : 0 }
{ "_id" : ObjectId("597ffbc1d8c0f25813bcc59b"), "studentId" : "s4", "result" : 1 }
{ "_id" : ObjectId("597ffbc1d8c0f25813bcc59c"), "studentId" : "s4", "result" : 1 }
{ "_id" : ObjectId("597ffbc1d8c0f25813bcc59d"), "studentId" : "s4", "result" : 1 }
{ "_id" : ObjectId("597ffbc1d8c0f25813bcc59e"), "studentId" : "s4", "result" : 1 }


group grouping

Used to group documents according to different values of a particular field. Once you have selected the grouping fields, you can pass them to the "_id" field of the $group function.
  • 1:$sum:value: For each document, add value to the calculated result

  • 2:$avg:value: Returns the average value of each group

  • 3:$max:expr: Returns the maximum value in the group

  • 4:$min:expr: Returns the minimum value in the grouping

  • 5:$first:expr: Returns the first value of the grouping, ignoring other values. This operation is meaningful only when the data order is clearly known after sorting.

  • 6:$last:expr: Returns the last value of the grouping, as opposed to the previous one

  • 7:$addToSet:expr: If expr is not included in the current array, add it to the array

  • 8:$push:expr: Add expr to the array


Question 1: Grouping by people

db.scores.aggregate({$group:{"_id":"$studentId"}}) 


Question 2: Calculate the top ten students of each person's average score.

db.scores.aggregate({$group:{"_id":"$studentId","avgscore":{"$avg":"$score"}}},{"$project":{"score":1,"avgscore":1}},{"$sort":{"avgscore":-1}},{"$limit":10})
{ "_id" : "s68", "avgscore" : 83.18088197080301 }
{ "_id" : "s29", "avgscore" : 81.47034353997472 }
{ "_id" : "s87", "avgscore" : 80.55368613639939 }
{ "_id" : "s14", "avgscore" : 78.15436971363175 }
{ "_id" : "s33", "avgscore" : 76.56089285481109 }
{ "_id" : "s35", "avgscore" : 74.60393122282694 }
{ "_id" : "s99", "avgscore" : 73.79357383668147 }
{ "_id" : "s6", "avgscore" : 72.401925493081 }
{ "_id" : "s89", "avgscore" : 71.40932193466372 }
{ "_id" : "s41", "avgscore" : 70.28421418013374 }


Component 1: {group:{"_id": "$studentId", "avg score": {"$avg": "$score"}}} Calculates the average score according to the student group.

Component 2: {"$project": {"score": 1, "avgscore": 1}: Displaying score avgscore field_id automatically displays

Component 3: {"$sort": {"avgscore": -1}: Sort in descending order according to avgscore

Component 4: {"$limit": 10} displays 10 items


Question 3: Find out 10 people with the highest total score for each person.

    db.scores.aggregate({"$group":{"_id":"$studentId","count":{"$sum":"$score"}}},{"$sort":{"count":-1}},{"$limit":10})

    { "_id" : "s68", "count" : 332.72352788321206 }
    { "_id" : "s29", "count" : 325.8813741598989 }
    { "_id" : "s87", "count" : 322.21474454559757 }
    { "_id" : "s14", "count" : 312.617478854527 }
    { "_id" : "s33", "count" : 306.24357141924435 }
    { "_id" : "s35", "count" : 298.4157248913078 }
    { "_id" : "s99", "count" : 295.1742953467259 }
    { "_id" : "s6", "count" : 289.607701972324 }
    { "_id" : "s89", "count" : 285.6372877386549 }
    { "_id" : "s41", "count" : 281.136856720535 }


Question 4: Each subject with the highest score.

    db.scores.aggregate({"$group":{"_id":{"sid":"$studentId","course":"$course"},"max":{"$max":"$score"}}},{"$project":{"max":1,"course":1}},{"$sort":{"max":-1}},{"$limit":10})
{ "_id" : { "sid" : "s63", "course" : "Course 1" }, "max" : 99.90599299936622 }
{ "_id" : { "sid" : "s35", "course" : "Course 3" }, "max" : 99.30405860298266 }
{ "_id" : { "sid" : "s64", "course" : "Course 0" }, "max" : 99.1537696673095 }
{ "_id" : { "sid" : "s50", "course" : "Course 2" }, "max" : 98.52420500572934 }
{ "_id" : { "sid" : "s20", "course" : "Course 0" }, "max" : 98.28681013083249 }
{ "_id" : { "sid" : "s40", "course" : "Course 0" }, "max" : 98.2529822556756 }
{ "_id" : { "sid" : "s30", "course" : "Course 2" }, "max" : 98.2195611010153 }
{ "_id" : { "sid" : "s65", "course" : "Course 3" }, "max" : 97.93186806691125 }
{ "_id" : { "sid" : "s10", "course" : "Course 1" }, "max" : 97.87209639127002 }
{ "_id" : { "sid" : "s23", "course" : "Course 3" }, "max" : 97.36489703312259 }



Split command: $unwind

Used to split each value in an array into separate documents.


Question 1: Split the score field array.

    db.userdatas.find({"name":"u4"})
{ "_id" : ObjectId("597f357a09c84cf58880e411"), "name" : "u4", "age" : 30, "score" : [ 7, 4, 2, 0 ] }

    db.userdatas.aggregate({"$match":{"name":"u4"}},{"$unwind":"$score"})
{ "_id" : ObjectId("597f357a09c84cf58880e411"), "name" : "u4", "age" : 30, "score" : 7 }
{ "_id" : ObjectId("597f357a09c84cf58880e411"), "name" : "u4", "age" : 30, "score" : 4 }
{ "_id" : ObjectId("597f357a09c84cf58880e411"), "name" : "u4", "age" : 30, "score" : 2 }
{ "_id" : ObjectId("597f357a09c84cf58880e411"), "name" : "u4", "age" : 30, "score" : 0 }

Component 1: {"$match": {"name":"u4"}} queries the document with name=u4, and then gives the result to Component 2.

Component 2: {"unwind":"score"} splits the score into separate documents.


Sort command: $sort

It can be sorted according to any field, the same grammar as in normal queries. If you want to sort a large number of documents.
Sorting in the first phase of the pipeline is strongly recommended, where indexes can be used.


Question 1: Sort by age field

    db.userdatas.find()
{ "_id" : ObjectId("59789a56bc629e73c4f09e1c"), "name" : "wang wu", "age" : 45 }
{ "_id" : ObjectId("59789a74bc629e73c4f09e1e"), "name" : "wang wu", "age" : 8 }
{ "_id" : ObjectId("59789ac0bc629e73c4f09e20"), "name" : "wang wu", "age" : 33 }
{ "_id" : ObjectId("597f357a09c84cf58880e40e"), "name" : "u1", "age" : 37 }
{ "_id" : ObjectId("597f357a09c84cf58880e40f"), "name" : "u1", "age" : 37 }
{ "_id" : ObjectId("597f357a09c84cf58880e410"), "name" : "u5", "age" : 78 }
{ "_id" : ObjectId("597f357a09c84cf58880e412"), "name" : "u3", "age" : 32 }
{ "_id" : ObjectId("597f357a09c84cf58880e411"), "name" : "u4", "age" : 30, "score" : [ 7, 4, 2, 0 ] }
{ "_id" : ObjectId("597fcc0f411f2b2fd30d0b3f"), "age" : 20, "score" : [ 7, 4, 2, 0, 10, 9, 8, 7 ], "name" : "lihao" }
{ "_id" : ObjectId("597f357a09c84cf58880e413"), "name" : "u2", "age" : 33, "wendang" : { "yw" : 80, "xw" : 90 } }
{ "_id" : ObjectId("5983f5c88eec53fbcd56a7ca"), "date" : ISODate("2017-08-04T04:19:20.693Z") }

    db.userdatas.aggregate({"$sort":{"age":-1}},{"$match":{"age":{"$exists":1}}},{"$project":{"name":1,"age":1,"_id":0}})

{ "name" : "u5", "age" : 78 }
{ "name" : "wang wu", "age" : 45 }
{ "name" : "u1", "age" : 37 }
{ "name" : "u1", "age" : 37 }
{ "name" : "wang wu", "age" : 33 }
{ "name" : "u2", "age" : 33 }
{ "name" : "u3", "age" : 32 }
{ "name" : "u4", "age" : 30 }
{ "age" : 20, "name" : "lihao" }
{ "name" : "wang wu", "age" : 8 }


Component 1: {"$sort": {"age": -1}} descends according to the age field and gives the data to Component 2.

Component 2: {"$match": {"age": {"$exists": 1}}} filters out the fields where age exists, and then gives the data to Component 3.

Component 3: {"$project": {"name": 1, "age": 1,"_id": 0}} projection only shows name, age field.

Keywords: MongoDB

Added by Singularity on Sat, 08 Jun 2019 01:03:21 +0300