Design and implementation of Django sub table

Demand background

The rapid growth of data leads to interface performance problems, and sub table is a more effective solution. This paper mainly discusses the following problems

  1. Which model s need sub tables.
  2. By table.
  3. How to create tables dynamically.
  4. How to dynamically add, delete, query and modify corresponding tables.
  5. How to handle ForeignKey and ManyToManyField after table splitting.
  6. migration problem after table splitting.

Which model s need sub tables

First, start from the interface performance, test the interface with jmeter to find out the interface with performance problems, and then add the management log to find the relevant models that affect the performance. Basically, they are some models with large amount of data, many association relationships and multiple textfield queries. What I want to say here is that some of Django's built-in interfaces have serious performance problems, such as bulk_update is a batch update interface. If there is a many to many relationship in the updated object, you can see from the debug log that Django has done a lot of table linking operations, which will be particularly slow, even for several seconds.

Finally, our system determines to divide the five large tables according to a labeling task (the amount of data varies from tens of thousands to hundreds of thousands).

Table splitting method

Vertical table splitting is to vertically divide a table with many fields into N tables. These N tables have a 1-to-1 relationship. This scheme is not applicable to our system.

Horizontal table splitting, as the name suggests, is to divide a table horizontally into N tables. The table structure is the same. Tables are divided according to different rules, such as a table with an odd id and an even id. In this scheme, we use task id to divide tables, one task and one table.

How to create a table dynamically

Django provides the method create to create a table based on the model_ Model. For details, please refer to Official documents , the following is a concrete implementation example.

How to dynamically add, delete, query and modify corresponding tables

With the method of creating tables based on the model, what we need to consider is how to dynamically generate the model. I investigated the practices of the online bigwigs and sorted them out. There are roughly two methods:

  1. Encapsulate the method in the model, and then dynamically generate the model through the type function, so as to dynamically add, delete, check and modify the table. You can check this article for details (scheme 2 in this article can be directly ignored, and direct splicing of sql is actually a scheme), but this method has a fatal disadvantage, that is, it needs to re encapsulate the get, filter, update and other methods of the model, which is really troublesome.
  2. Second encapsulate the model and dynamically modify the DB in the Meta class by passing parameters_ Table field. This method can completely reuse all methods of Django orm, and the implementation is very simple. Now let's realize it.
from django.db import models, connection

# Secondary encapsulation of the model
def get_task_data_model(task_id=None, init=False):
    table_name = 'task_data' # Here is the old data without sub table. It needs to be compatible
    split_table_name = 'task_data_%r' % task_id
    if task_id and (split_table_name in connection.introspection.table_names() or init):
        table_name = split_table_name
    class TaskData(models.Model):
        status = models.IntegerField(default=0)
        desc = models.TextField(null=True)
        ctime = models.DateTimeField(auto_now_add=True)
        mtime = models.DateTimeField(auto_now=True)

        class Meta:
            db_table = table_name
    return TaskData


# Create the corresponding table according to the dynamic model
with connection.schema_editor() as schema_editor:
    logger.info("start create table [%s]" % table_name)
    task_data_model = get_task_data_model(task_id, init=True)
    schema_editor.create_model(task_data_model)
    logger.info("create table [%s] successfully" % table_name)
        
# Add, delete, check and modify according to the dynamic model. All Django orm methods can be used
task_data_model = get_task_data_model(task_id)
task_data_model.objects.create()            # increase
task_data_model.objects.filter()            # check
task_data_model.objects.filter().delete()   # Delete
task_data_model.objects.filter().update()   # change

How to handle ForeignKey and ManyToManyField after table splitting

To illustrate this problem, let me give an example. Suppose I have two model s before splitting the table, User and Project. There is a creator field in the Project, and the foreign key is associated with the User table:

creator = models.ForeignKey(
    User,
    db_constraint=False,
    null=True,
    on_delete=models.DO_NOTHING
)

We know that the foreign key actually stores an id. if the user divides the tables and each table is self incremented, then the creator_id will be repeated, so we don't know which sub table user it comes from. So is ManyToManyField. There are several ways to solve this problem

  1. After splitting a table, you cannot use the self incrementing id of a single table. Maintain a global self incrementing id, and then judge which sub table it comes from through the id. this method is cumbersome to implement, and there will be bug s if you are careless.
  2. A field is added to the Project table to record which sub table is associated. An additional field is added to the filter when querying the associated information.
  3. Change the ForeignKey field to the IntergerField type. You can't use '.' to query. Finally, we use this method.

Keywords: Python Django

Added by reapfyre on Tue, 28 Sep 2021 05:42:41 +0300