Large Data Migration

Preface:

The company is going through a system restructuring, with around 4 million records to migrate. The table designs of the old and new databases have changed, the relationships between tables have changed, and some data even has to be fetched through a third-party interface.

Dependency:

  1. PHP 7
  2. Laravel 5.5+
  3. Redis

Migration logic:

Old Database --> New Database --> Third-Party Platform (e.g. Taobao)
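
For the migration commands below to read from the old database and write to the new one, both databases need to be registered as connections in config/database.php. A minimal sketch, assuming hypothetical connection names and env keys (not the project's actual configuration):

// config/database.php (excerpt) - connection names and env keys are assumptions
'connections' => [
    // New database: the default connection used by the Eloquent models
    'mysql' => [
        'driver'   => 'mysql',
        'host'     => env('DB_HOST', '127.0.0.1'),
        'database' => env('DB_DATABASE', 'new_db'),
        'username' => env('DB_USERNAME', 'root'),
        'password' => env('DB_PASSWORD', ''),
        'charset'  => 'utf8mb4',
    ],
    // Old database: the read-only source, used via DB::connection('old_database')
    'old_database' => [
        'driver'   => 'mysql',
        'host'     => env('OLD_DB_HOST', '127.0.0.1'),
        'database' => env('OLD_DB_DATABASE', 'old_db'),
        'username' => env('OLD_DB_USERNAME', 'root'),
        'password' => env('OLD_DB_PASSWORD', ''),
        'charset'  => 'utf8mb4',
    ],
],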

Time spent during testing:

  • The first attempt, migrating over 4 million records from the old database to the new one, took 9 hours
  • The second attempt, migrating the same 4 million+ records, took 80 minutes
  • The third attempt, migrating the same 4 million+ records, took a little over 20 minutes

 

Optimization process for migrating from the old database to the new database

  • First attempt: a Laravel console command using the chunk feature, for example:

//Fetch only 1000 rows at a time and insert them into the new database; raising 1000 to a larger value may run into the memory_limit

class MigrateGoodSpecCommand extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string Script commands
     */
    protected $signature = 'migrate:good_spec';

    public function handle(){

       DB::connection('old_database')->table('good_spec')->orderBy('Id')->chunk(1000, function($data){

         //Look up related data from other old tables, e.g. category, brand, warehouse, inventory, bc/cc tax, etc.

       //Organize the data into $insertData and insert it directly into the new database

          GoodSpecModel::insert($insertData);

        });
    }
 
}
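
The related-table lookups are only sketched as comments above. Below is a minimal illustration of how $insertData might be assembled inside the chunk callback; the brand/category table names and the column mapping are hypothetical, not the actual schema:

//Hypothetical sketch of the chunk callback body (table and column names assumed)
$insertData = [];
foreach ($data as $row) {
    //Look up related records from the old database, e.g. brand and category
    $brand    = DB::connection('old_database')->table('brand')->where('Id', $row->brand_id)->first();
    $category = DB::connection('old_database')->table('category')->where('Id', $row->category_id)->first();

    //Map the old columns onto the new schema
    $insertData[] = [
        'old_id'        => $row->Id,
        'spec_name'     => $row->name,
        'brand_name'    => $brand->name ?? '',
        'category_name' => $category->name ?? '',
    ];
}

GoodSpecModel::insert($insertData);

In practice the lookups would be batched per chunk (e.g. one whereIn query per related table) rather than queried row by row.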

 

  • Second attempt: separate reads from writes by pushing the read data onto a Redis queue and then writing it to the database from queue workers
class MigrateGoodSpecCommand extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string Script commands
     */
    protected $signature = 'migrate:good_spec';

    public function handle(){

       DB::connection('old_database')->table('good_spec')->orderBy('Id')->chunk(1000, function($data){

         //Look up related data from other old tables, e.g. category, brand, warehouse, inventory, bc/cc tax, etc.

       //Organize the data into $insertData and dispatch it to the MigrateGoodSpecDataJob queue job

          MigrateGoodSpecDataJob::dispatch($insertData)->onConnection('redis')->onQueue('migrateGyGood_sku');

        });
    }
 
}
//Implement ShouldQueue interface for queue tasks
class MigrateGoodSpecDataJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;
    use LogTrait;

    protected $insertData;
    /**
     * Create a new job instance.
     *
     * @return void
     */
    public function __construct($insertData)
    {
        $this->insertData = $insertData;
    }

    /**
     * Execute the job.
     *
     * @return void
     */
    public function handle()
    {
        GoodSpecModel::insert($this->insertData);
    }
}
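
Since the job is dispatched onConnection('redis'), a redis queue connection must exist in config/queue.php. A minimal sketch (the retry_after value is an assumption):

// config/queue.php (excerpt)
'connections' => [
    'redis' => [
        'driver'      => 'redis',
        'connection'  => 'default', //redis connection defined in config/database.php
        'queue'       => 'default', //overridden per job via onQueue('migrateGyGood_sku')
        'retry_after' => 600,       //allow large batch inserts to finish before the job is retried
    ],
],

Workers are then started against that connection and queue, e.g. php artisan queue:work redis --queue=migrateGyGood_sku, and several of them can run in parallel.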
  • Third attempt: add a data-range argument to the command (similar to paging) so that multiple read commands can run in parallel
class MigrateGoodSpecCommand extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string Script commands
     */
    protected $signature = 'migrate:good_spec {limit}';

    public function handle(){
        $arr = $this->getBetweenId();
        if (empty($arr['min']) || empty($arr['max']))
        {
            $this->log('Data out of range:'.json_encode($arr));
            return;
        }
       DB::connection('old_database')->table('good_spec')
            ->where('Id', '>=', $arr['min'])
            ->where('Id', '<', $arr['max'])
            ->orderBy('Id')
            ->chunk(1000, function($data){

         //Look up related data from other old tables, e.g. category, brand, warehouse, inventory, bc/cc tax, etc.

       //Organize the data into $insertData and dispatch it to the MigrateGoodSpecDataJob queue job

          MigrateGoodSpecDataJob::dispatch($insertData)->onConnection('redis')->onQueue('migrateGyGood_sku');

        });
    }

    public function getBetweenId()
    {
        //limit argument format: "offset,count", e.g. 0,500000
        $str = $this->argument('limit');
        $arr = explode(',', $str);
        $maxLimit = $arr[0]+$arr[1];
        $minId = GoodsModel::orderBy('Id', 'asc')->offset($arr[0])->limit(1)->value('Id');
        $maxId = GoodsModel::orderBy('Id', 'asc')->offset($maxLimit)->limit(1)->value('Id');
        if(empty($maxId))
        {
            $maxId = GoodsModel::orderBy('Id', 'desc')->where('Id', '>', $minId)->value('Id');
            $maxId = $maxId +1;
        }
        return [
            'min' => $minId,
            'max' => $maxId
        ];
    }
 
}
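
A worked example of the range calculation (the argument values are chosen for illustration):

// php artisan migrate:good_spec 500000,500000
// $arr      = ['500000', '500000']
// $maxLimit = 1000000
// $minId    = Id of the row at offset 500000 (the 500,001st row ordered by Id ascending)
// $maxId    = Id of the row at offset 1000000
// The chunked read then covers Id >= $minId AND Id < $maxId, i.e. rows 500,001 through 1,000,000.
// For the last range, offset 1000000 may fall past the end of the table, so $maxId comes back null;
// the fallback takes the largest Id above $minId and adds 1 so the strict '<' bound still includes the final row.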

 
  • First trial

The command can only be executed once: php artisan migrate:good_spec

Thoughts:

This single command does both the reading and the writing; doing everything in one single-threaded process is too slow, so the idea is to decouple reading from writing to speed up the migration

  • Second trial

The read command can still only be executed once: php artisan migrate:good_spec

Multiple write commands can be executed (similar to multithreading) by opening several terminals and running: php artisan queue:work redis --queue=migrateGyGood_sku

Reflection:

While reading and writing at the same time, it was found that once about 2 million records had been written, the read speed dropped dramatically.

Migration speed can be improved by reading the data into the queue first and only then writing it out.

Since the read command could only be executed once, it was changed to accept a data range, so that data can be read concurrently using a pseudo-multithreading approach (multiple processes).

That is, different ranges of data can be read at the same time while the already-read data is written out in parallel, which leads to the third trial

  • Trial 3: multiple read commands can be executed

Each command reads 500,000 records, so the 4 million+ records are split across the following commands

                    php artisan migrate:good_spec 0,500000
                    php artisan migrate:good_spec 500000,500000
                    php artisan migrate:good_spec 1000000,500000
                    php artisan migrate:good_spec 1500000,500000
                    php artisan migrate:good_spec 2000000,500000
                    php artisan migrate:good_spec 2500000,500000
                    php artisan migrate:good_spec 3000000,500000
                    php artisan migrate:good_spec 3500000,500000
                    php artisan migrate:good_spec 4000000,500000

Multiple write commands can also be executed, with several terminals open: php artisan queue:work redis --queue=migrateGyGood_sku

Thoughts:

Reading 500,000 records per command already feels like a reasonable pseudo-multithreaded optimization. Are there better options? Comments are welcome.

Because the approach depends on Redis, queueing 4 million records costs roughly 2-3 GB of memory.

Keywords: Database PHP Redis Laravel
