Standard two body model implemented with CUDA Thrust

Standard two body model implemented with CUDA Thrust

The space target orbit parallel computing technology based on CUDA has four sections, of which the contents of the first, second and third sections are as follows

1. Orbit calculation requirements and mission analysis of space target based on CUDA

2 Calculation of spatial coordinate system transformation matrix based on CUDA
2.1 spatial coordinate system transformation based on SOFA
2.2 spatial coordinate system transformation realized by CUDA Thrust
2.3 spatial coordinate system transformation realized by CUDA Runtime**

3 track prediction algorithm of standard two body model based on CUDA
3.1 two body motion model of space target and its CPU implementation
3.2 standard two body model implemented by CUDA Thrust
3.1 standard two body model implemented by CUDA Runtime

1, Direct implementation and efficiency analysis

The two body model is implemented by Thrust, and the first is direct implementation. The definition of the Thrust functor is as follows

struct twoBodyModelTransFunctor_Thrust_1{
	double		_jdOfBt2Epo;	//Gap between start time and ephemeris time
	OrbitElement _orbitElement;
	int		_stepMilliSecond;
	__device__ pvOfOrbit operator()(int& i, mat3x3& c2tmat)const {
		double timeOfCycle=PI*2/(sqrt(MIU_EARTH/_semimajorAxis)/_semimajorAxis);
		double rmat[3][3];
		iauIr(rmat);
		iauRz(-_orbitElement._RAAN*PI / 180.0, rmat);
		iauRx(-_orbitElement._inclination*PI / 180.0, rmat);
		iauRz(-_orbitElement._argumentOfPerigee*PI / 180.0, rmat);
		double	t = (double)_stepMilliSecond*i / 86400.0 / 1000.0 + _jdOfBt2Epo;
		t = fmod(t, timeOfCycle);
		t *= 86400;
		double M=t*sqrt(MIU_EARTH/_semimajorAxis)/_semimajorAxis;
		M = fmod(M, PI * 2);
		double	radM = _meanAnomaly*PI / 180 + M;	//Add initial value
		double E0 = radM;
		double E1;
		while (1){
			E1 = radM + _orbitElement._eccentricity * sin(E0);
			if (fabs(E1 - E0) < 1e-10)break;
			E0 = E1;}
		E1 = fmod(E1, PI * 2);
		double f = 2 * atan((sqrt((1 + _eccentricity) / (1 - _eccentricity))) *tan(E1 / 2));
		double  tmpcosf = cos(f);
		double r=_semimajorAxis*(1-_eccentricity*_eccentricity)/(1+_eccentricity*tmpcosf);
		double h = sqrt(MIU_EARTH*(r + r * _eccentricity*tmpcosf));
		double vt = h / r;   //Tangential component of velocity
		double vr = MIU_EARTH / h * _eccentricity*sin(f); //Radial component of velocity
		double pt[3];
		pt[0] = r, pt[1] = 0.0, pt[2] = 0.0;
		iauRxp(mat, pt, ECIp);
		iauRxp(c2tmat._mat, ECIp, pv._ECFp._vec);
		return pv;};};

The input of the function is the sequence number and transformation matrix, which are pre generated arrays and passed as parameters when you call thrust::transform. For each thread, the corresponding data is provided as a parameter. The output returns the calculation result data structure, including the position vector and velocity vector of inertial system and ground solid system, a total of 4 vectors (the velocity vector calculation part is omitted in the above code). Code for calling imitation function, omitted.

Its efficiency is shown in the table below

Sampling pointsCPU version time (s)Thrust version time (s)
144000.0110.002
864000.0990.016
8640000.5860.098
864000*52.7980.509

CUDA version takes less than 20% of CPU version. Its efficiency has been improved, but it is not as obvious as the transformation of spatial coordinate system.

2, Optimize

The following optimization is based on the direct Thrust version. The main optimization point is the calculation of the conversion matrix from the orbital coordinate system to the inertial coordinate system. This matrix is the same for all sampling points. Therefore, the calculation is carried out on the CPU side and the calculation results are directly transmitted to the GPU (the member variables in the Thrust imitation function should be stored in the constant memory), and the calculation is no longer carried out in the CUDA thread.

The optimized functor code is

struct twoBodyModelTransFunctor_Thrust_3{
	double		_jdOfBt2Epo;	//Gap between start time and ephemeris time
	OrbitElement _orbitElement;
	int		_stepMilliSecond;
	double	_rmat[3][3];
	__device__ pvOfOrbit operator()(int& i, mat3x3& c2tmat)const {
		double timeOfCycle=PI*2/(sqrt(MIU_EARTH /_semimajorAxis)/_semimajorAxis);
		double	t = (double)_stepMilliSecond*i / 86400.0 / 1000.0 + _jdOfBt2Epo;
		t = fmod(t, timeOfCycle);
		Same as the previous code;};};

The difference between the code and the previous version of the code lies in the addition of a transformation matrix_ rmat[3][3], that is, the transformation matrix from orbital coordinate system to inertial coordinate system, is no longer calculated by each thread.

Correspondingly, in the code calling the imitation function, the calculation of the transformation matrix from the orbital coordinate system to the inertial coordinate system needs to be completed. The code for calling the imitation function is as follows.

int		propagateOrbitWithTwoBody_Thrust_3(OrbitElement eo, const timeOfSpace& ept, const timeOfSpace& t0, int step, device_vector<mat3x3>& d_mats, device_vector<pvOfOrbit>& d_pvs)	{
	twoBodyModelTransFunctor_Thrust_3 functor;
	double jd00, jd01, jd10, jd11;
	iauCal2jd(t0._year, t0._month, t0._day, &jd10, &jd11);
	double tmp = (jd10 + jd11);
	tmp+=double((60*(60*t0._hour+t0._minute)+t0._second)*1000+t0._millisecond)/86400;
	iauCal2jd(ept._year, ept._month, ept._day, &jd00, &jd01);
	double tmp1 = jd00 + jd01;
	tmp1+=double((60*(60*ept._hour+ept._minute)+ept._second)*1000+ept._millisecond)/86400;
	functor._jdOfBt2Epo = tmp - tmp1;
	double rmat[3][3];
	iauIr(rmat);
	iauRz(-eo._RAAN*PI / 180.0, rmat);
	iauRx(-eo._inclination*PI / 180.0, rmat);
	iauRz(-eo._argumentOfPerigee*PI / 180.0, rmat);
	memcpy(functor._rmat,rmat,9*sizeof(double)) ;
	functor._orbitElement = eo;
	functor._stepMilliSecond = stepMilliSecond;
	d_pvs.resize(d_mats.size());
	thrust::device_vector<int> d_tms(d_mats.size());
	thrust::sequence(d_tms.begin(), d_tms.end(), 0, 1);
	thrust::transform(d_tms.begin(), d_tms.end(), d_mats.begin(), d_pvs.begin(), functor);}

It can be seen that these codes can be written according to the general CPU code, including the assignment of imitation function member variables, etc. the initialization and call of CUDA are completed within Thrust, which really simplifies the development process.

Calculate the time of 864000 * 5 sampling points, i.e. about 4 million sampling points, as shown in the table below

methodTime (s)
CPU2.798
Direct Thrust0.509
Optimize Thrust0.435

3, Full code link

The complete source code (combined with the two body model implementation) has been uploaded, and the link is source code

Keywords: C++ Algorithm CUDA

Added by mclamais on Thu, 06 Jan 2022 02:22:34 +0200