Go deep into LINQ | unveil IQueryable

stay LastIn depth LINQ | dynamic construction of LINQ expression In this blog post, we explore the power of expression and use it to dynamically build a JSON based rule engine. In this article, we start with expressions instead. Considering the diversity of expression types and the complexity of expression tree, what is a good way to decompose expression tree? Can we mutate the expression to make it behave differently?

First, if you haven't read it Article 1 Please take a few minutes to read the article. The source code of this series is placed in GitHub:

https://github.com/JeremyLikness/ExpressionExplorer

1 Preparation

First, suppose I have a common CLR entity class (you may have heard of it called POCO), which is called {Thing. Here is its definition:

public class Thing
{
    public Thing()
    {
        Id = Guid.NewGuid().ToString();
        Created = DateTimeOffset.Now;
        Name = Guid.NewGuid().ToString().Split("-")[0];
    }

    public string Id { get; set; }
    public string Name { get; set; }
    public DateTimeOffset Created { get; private set; }

    public string GetId() => Id;

    public override string ToString() =>
        $"({Id}: {Name}@{Created})";
}

For simulation, I added a static method to make it easy to generate N numbers of # things:

public static IList<Thing> Things(int count)
{
    var things = new List<Thing>();
    while (count-- > 0)
    {
        things.Add(new Thing());
    }
    return things;
}

Now I can generate a data source and query it. Here is a LINQ expression that can generate 500 "things" and query them:

var query = Thing.Things(500).AsQueryable()
    .Where(t =>
        t.Name.Contains("a", StringComparison.InvariantCultureIgnoreCase) &&
        t.Created > DateTimeOffset.Now.AddDays(-1))
    .Skip(2)
    .Take(50)
    .OrderBy(t => t.Created);

If you call ToString() on query, you will get the following result:

System.Collections.Generic.List`1[ExpressionExplorer.Thing]
    .Where(t =>
        (t.Name.Contains("a", InvariantCultureIgnoreCase)
            AndAlso
        (t.Created > DateTimeOffset.Now.AddDays(-1))))
    .Skip(2)
    .Take(50)
    .OrderBy(t => t.Created)

You may not notice that query has a property called Expression.

The way expressions are built won't be too mysterious. Start with the list, enumerable The Where method is called. The first parameter is an enumerable list (IEnumerable < T >), and the second parameter is a predicate. Inside the predicate, string Contains is called. Enumerable. The Skip method receives an enumerable list and an integer representing the count. Although the syntax for building queries looks simple, you can think of it as a series of progressive filters. The Skip call is an extension method of the enumerable list, which gets the result from the Where call, and so on.

Also to help understand, I drew an illustration to illustrate this:

However, if you want to parse the expression tree, you may be surprised. There are many different expression types, and each expression has a different parsing method. For example, BinaryExpression , has a , Left , and a , Right , but , MethodCallExpression , has a list of , Arguments , expressions. Just traversing the expression tree, there are many type checks and conversions!

2 another Visitor

LINQ provides a special class called ExpressionVisitor. It contains all the logic needed to recursively parse the expression tree. You just need to pass an expression into the {Visit} method, and it will access each node and return the expression (more about it later). It contains node type specific methods that can be overloaded to intercept the process. The following is a basic implementation that simply rewrites some methods to write information to the console.

public class BasicExpressionConsoleWriter : ExpressionVisitor
{
    protected override Expression VisitBinary(BinaryExpression node)
    {
        Console.Write($" binary:{node.NodeType} ");
        return base.VisitBinary(node);
    }

    protected override Expression VisitUnary(UnaryExpression node)
    {
        if (node.Method != null)
        {
            Console.Write($" unary:{node.Method.Name} ");
        }
        Console.Write($" unary:{node.Operand.NodeType} ");
        return base.VisitUnary(node);
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        Console.Write($" constant:{node.Value} ");
        return base.VisitConstant(node);
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        Console.Write($" member:{node.Member.Name} ");
        return base.VisitMember(node);
    }

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        Console.Write($" call:{node.Method.Name} ");
        return base.VisitMethodCall(node);
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        Console.Write($" p:{node.Name} ");
        return base.VisitParameter(node);
    }
}

To use it, simply create an instance and pass an expression to it. Here, we will pass our query expression to it:

new BasicExpressionConsoleWriter().Visit(query.Expression);

After running, it outputs non intuitive results, as follows:

call:OrderBy  call:Take  call:Skip  call:Where
constant:System.Collections.Generic.List`1[ExpressionExplorer.Thing]  unary:Lambda
binary:AndAlso  call:Contains  member:Name  p:t  constant:a
constant:InvariantCultureIgnoreCase  binary:GreaterThan  member:Created  p:t
call:AddDays  member:Now  constant:-1  p:t  constant:2  constant:50
unary:Lambda  member:Created  p:t  p:t

Pay attention to the access order. It may take some time to understand this logic, but it makes sense:

  1. OrderBy is the outermost call (last in first out), which accepts a list and a field

  2. The first parameter of OrderBy , is the list, which is provided by , Take ,

  3. Take , needs a list, which is provided by , Skip ,

  4. Skip needs a list provided by Where

  5. Where , requires a list provided by the , Thing , list

  6. The second parameter of Where is a predicate lambda expression

  7. ... It's binary logic, and also

  8. On the left side of binary logic is a "Contains" call

  9. (skip a bunch of logic)

  10. The second parameter of Take # is 50

  11. The second parameter of Skip is 2

  12. The OrderBy property is Created

Did you Get the logic here? Understanding how the tree is parsed is the key to making our Visitor more readable. Here is a more obvious output implementation:

public class ExpressionConsoleWriter
    : ExpressionVisitor
{
    int indent;

    private string Indent =>
        $"\r\n{new string('\t', indent)}";

    public void Parse(Expression expression)
    {
        indent = 0;
        Visit(expression);
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        if (node.Value is Expression value)
        {
            Visit(value);
        }
        else
        {
            Console.Write($"{node.Value}");
        }
        return node;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        Console.Write(node.Name);
        return node;
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        if (node.Expression != null)
        {
            Visit(node.Expression);
        }
        Console.Write($".{node.Member?.Name}.");
        return node;
    }

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        if (node.Object != null)
        {
            Visit(node.Object);
        }
        Console.Write($"{Indent}{node.Method.Name}( ");
        var first = true;
        indent++;
        foreach (var arg in node.Arguments)
        {
            if (first)
            {
                first = false;
            }
            else
            {
                indent--;
                Console.Write($"{Indent},");
                indent++;
            }
            Visit(arg);
        }
        indent--;
        Console.Write(") ");
        return node;
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        Console.Write($"{Indent}<");
        indent++;
        Visit(node.Left);
        indent--;
        Console.Write($"{Indent}{node.NodeType}");
        indent++;
        Visit(node.Right);
        indent--;
        Console.Write(">");
        return node;
    }
}

A new entry method, Parse, is introduced to Parse and set indentation. The Indent property returns a line feed and the correct number of tabs based on the current Indent value. It is called by each method and formats the output.

Rewriting VisitMethodCall and VisitBinary can help us understand how they work. In VisitMethodCall, the name of the method is printed with an open parenthesis representing the parameters (. Then these parameters are accessed in turn, and recursion will continue for each parameter until it is completed. Then print the closed parentheses). Because the method explicitly accesses the child node instead of calling the base class, the node is simply returned. This is because the base class also recursively accesses parameters and causes duplication. For binary expressions, first print an open angle <, then the left node accessed, then the type of binary operation, then the right node, and finally the closed node. Similarly, the base class method is not called because these nodes have been accessed.

Run this new visitor:

new ExpressionConsoleWriter().Visit(query.Expression);

Better readability of output results:

OrderBy(
    Take(
        Skip(
            Where( System.Collections.Generic.List`1[ExpressionExplorer.Thing]
            ,
                <t.Name.
                    Contains( a
                    ,InvariantCultureIgnoreCase)
                AndAlso
                    <t.Created.
                    GreaterThan.Now.
                        AddDays( -1) >>t)
        ,2)
    ,50)
,t.Created.t)

To see the full implementation, LINQ's own expression StringBuilder contains everything you need to print the expression tree in a friendly format. You can view the source code here:

https://github.com/dotnet/runtime/blob/master/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/ExpressionStringBuilder.cs

The ability to parse expression trees is quite powerful. I'll dig deeper into it in another blog post. Before that, I want to solve the elephant in the room: besides helping to parse the expression tree, what's the meaning of the expression returned by the Visit # method? Facts have proved that ExpressionVisitor can do more than check your query!

3 intrusion query

One of the magic features of ExpressionVisitor is that it can quickly form a query. To understand this, please consider this scenario: your task is to build an order input system with powerful query function, and you must complete it quickly. After reading my article, you decided to use Blazor WebAssembly and write LINQ query on the client. You use a custom visitor to skillfully serialize the query and pass it to the server, where you deserialize and run it. Everything went well until the safety audit. There, it was determined that the query engine was too open. A malicious client can issue extremely complex queries and return a large number of result sets, which paralyzes the system. What would you do?

One advantage of using the Visitor method is that you don't have to reconstruct the entire expression tree in order to modify a child node. The expression tree cannot be changed, but the Visitor can return a new expression tree. You can write the logic of modifying the expression tree and receive the complete expression tree and modification content at the end. To illustrate this, let's write a special Visitor called "ExpressionTakeRestrainer":

public class ExpressionTakeRestrainer : ExpressionVisitor
{
    private int maxTake;
    public bool ExpressionHasTake { get; private set; }

    public Expression ParseAndConstrainTake(
        Expression expression, int maxTake)
    {
        this.maxTake = maxTake;
        ExpressionHasTake = false;
        return Visit(expression);
    }
}

The special , ParseAndConstrainTake , method calls , Visit , and returns an expression. Note that it uses expression hastake to mark whether the expression has a Take. Suppose we only want to return five results. In theory, you can add "Take" at the end of the query:

var myQuery = theirQuery.Take(5);
return myQuery.ToList();

But where is the fun? Let's modify an expression tree. We will only cover one method, that is, VisitMethodCall:

protected override Expression VisitMethodCall(MethodCallExpression node)
{
    if (node.Method.Name == nameof(Enumerable.Take))
    {
        ExpressionHasTake = true;
        if (node.Arguments.Count == 2 &&
            node.Arguments[1] is ConstantExpression constant)
        {
            var takeCount = (int)constant.Value;
            if (takeCount > maxTake)
            {
                var arg1 = Visit(node.Arguments[0]);
                var arg2 = Expression.Constant(maxTake);
                var methodCall = Expression.Call(
                    node.Object,
                    node.Method,
                    new[] { arg1, arg2 } );
                return methodCall;
            }
        }
    }
    return base.VisitMethodCall(node);
}

This logic checks whether the method call is enumerable Take. If so, it sets the "ExpressionHasTake" flag. The second parameter is the number to be read, so the value is checked and compared with the maximum value. If it exceeds the maximum allowed, a new node will be established to limit it to the maximum. This new node will be returned instead of the original node. If the method is not enumerable Take, then the base class will be called and everything will be resolved "as usual".

We can test it by running the following code:

new ExpressionConsoleWriter().Parse(
    new ExpressionTakeRestrainer()
        .ParseAndConstrainTake(query.Expression, 5));

Look at the following results: the query has been modified to take only 5 pieces of data.

OrderBy(
    Take(
        Skip(
            Where( System.Collections.Generic.List`1[ExpressionExplorer.Thing]
            ,
                <t.Name.
                    Contains( a
                    ,InvariantCultureIgnoreCase)
                AndAlso
                    <t.Created.
                    GreaterThan.Now.
                        AddDays(-1) >>t)
        ,2)
    ,5)
,t.Created.t)

But wait Are there five!? Try running this:

var list = query.ToList();
Console.WriteLine($"\r\n---\r\nQuery results: {list.Count}");

And, unfortunately, what you will see is 50 The number of original "fetches". The problem is that we generated a new expression, but we didn't replace it in the query. In fact, we can't This is a read-only property, and the expression is immutable. So what now?

4. Transplanting flowers and trees

We can simply implement iorderedqueryable < T > to make our own query. This interface is a collection of other interfaces. The following is the detailed rules of the interface requirements.

  1. ElementType - this is a simple type of the element being queried.

  2. Expression - the expression behind the query.

  3. Provider - this is the query provider, which completes the actual work of applying queries. Instead of implementing our own providers, we use built-in, in this case LINQ to objects.

  4. GetEnumerator - it will be called when running the query. You can create, expand and modify it at will, but once it is called, the query will be materialized.

Here is an implementation of "TranslatingHost", which translates queries:

public class TranslatingHost<T> : IOrderedQueryable<T>, IOrderedQueryable
{
    private readonly IQueryable<T> query;

    public Type ElementType => typeof(T);

    private Expression TranslatedExpression { get; set; }

    public TranslatingHost(IQueryable<T> query, int maxTake)
    {
        this.query = query;
        var translator = new ExpressionTakeRestrainer();
        TranslatedExpression = translator
            .ParseAndConstrainTake(query.Expression, maxTake);
    }

    public Expression Expression => TranslatedExpression;

    public IQueryProvider Provider => query.Provider;

    public IEnumerator<T> GetEnumerator()
        => Provider.CreateQuery<T>(TranslatedExpression)
        .GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

It's quite simple. It receives an existing query and uses , ExpressionTakeRestrainer , to generate a new expression. It uses an existing provider (for example, if this is a query from dbset < T > and EF Core is used on SQL Server, it will be translated into an SQL statement). When the enumerator is requested, it does not pass the original expression, but the translated expression.

Let's use it:

var transformedQuery =
    new TranslatingHost<Thing>(query, 5);
var list2 = transformedQuery.ToList();
Console.WriteLine($"\r\n---\r\nModified query results: {list2.Count}");

The result this time is what we want Only 5 records are returned.

So far, I've introduced checking an existing query and replacing it. This is helpful when you execute a query. If your code is executing query Tolist(), then you can modify the query at will. But what happens when your code is not responsible for materializing queries? If you expose a class library, such as a warehouse class, what happens if it has the following interface?

public IQueryable<Thing> QueryThings { get; }

Or when using EF Core:

public DbSet<Thing> Things { get; set; }

How do you "intercept" a query when the caller calls {ToList()? This requires a Provider, which I will cover in detail in the next article in this series.

Keywords: ASP.NET linq

Added by Scriptor on Tue, 01 Feb 2022 05:59:42 +0200