Standard two body model implemented with CUDA Thrust
The space target orbit parallel computing technology based on CUDA has four sections, of which the contents of the first, second and third sections are as follows1. Orbit calculation requirements and mission analysis of space target based on CUDA
2 Calculation of spatial coordinate system transformation matrix based on CUDA
2.1 spatial coordinate system transformation based on SOFA
2.2 spatial coordinate system transformation realized by CUDA Thrust
2.3 spatial coordinate system transformation realized by CUDA Runtime**
3 track prediction algorithm of standard two body model based on CUDA
3.1 two body motion model of space target and its CPU implementation
3.2 standard two body model implemented by CUDA Thrust
3.1 standard two body model implemented by CUDA Runtime
1, Direct implementation and efficiency analysis
The two body model is implemented by Thrust, and the first is direct implementation. The definition of the Thrust functor is as follows
struct twoBodyModelTransFunctor_Thrust_1{ double _jdOfBt2Epo; //Gap between start time and ephemeris time OrbitElement _orbitElement; int _stepMilliSecond; __device__ pvOfOrbit operator()(int& i, mat3x3& c2tmat)const { double timeOfCycle=PI*2/(sqrt(MIU_EARTH/_semimajorAxis)/_semimajorAxis); double rmat[3][3]; iauIr(rmat); iauRz(-_orbitElement._RAAN*PI / 180.0, rmat); iauRx(-_orbitElement._inclination*PI / 180.0, rmat); iauRz(-_orbitElement._argumentOfPerigee*PI / 180.0, rmat); double t = (double)_stepMilliSecond*i / 86400.0 / 1000.0 + _jdOfBt2Epo; t = fmod(t, timeOfCycle); t *= 86400; double M=t*sqrt(MIU_EARTH/_semimajorAxis)/_semimajorAxis; M = fmod(M, PI * 2); double radM = _meanAnomaly*PI / 180 + M; //Add initial value double E0 = radM; double E1; while (1){ E1 = radM + _orbitElement._eccentricity * sin(E0); if (fabs(E1 - E0) < 1e-10)break; E0 = E1;} E1 = fmod(E1, PI * 2); double f = 2 * atan((sqrt((1 + _eccentricity) / (1 - _eccentricity))) *tan(E1 / 2)); double tmpcosf = cos(f); double r=_semimajorAxis*(1-_eccentricity*_eccentricity)/(1+_eccentricity*tmpcosf); double h = sqrt(MIU_EARTH*(r + r * _eccentricity*tmpcosf)); double vt = h / r; //Tangential component of velocity double vr = MIU_EARTH / h * _eccentricity*sin(f); //Radial component of velocity double pt[3]; pt[0] = r, pt[1] = 0.0, pt[2] = 0.0; iauRxp(mat, pt, ECIp); iauRxp(c2tmat._mat, ECIp, pv._ECFp._vec); return pv;};};
The input of the function is the sequence number and transformation matrix, which are pre generated arrays and passed as parameters when you call thrust::transform. For each thread, the corresponding data is provided as a parameter. The output returns the calculation result data structure, including the position vector and velocity vector of inertial system and ground solid system, a total of 4 vectors (the velocity vector calculation part is omitted in the above code). Code for calling imitation function, omitted.
Its efficiency is shown in the table below
Sampling points | CPU version time (s) | Thrust version time (s) |
---|---|---|
14400 | 0.011 | 0.002 |
86400 | 0.099 | 0.016 |
864000 | 0.586 | 0.098 |
864000*5 | 2.798 | 0.509 |
CUDA version takes less than 20% of CPU version. Its efficiency has been improved, but it is not as obvious as the transformation of spatial coordinate system.
2, Optimize
The following optimization is based on the direct Thrust version. The main optimization point is the calculation of the conversion matrix from the orbital coordinate system to the inertial coordinate system. This matrix is the same for all sampling points. Therefore, the calculation is carried out on the CPU side and the calculation results are directly transmitted to the GPU (the member variables in the Thrust imitation function should be stored in the constant memory), and the calculation is no longer carried out in the CUDA thread.
The optimized functor code is
struct twoBodyModelTransFunctor_Thrust_3{ double _jdOfBt2Epo; //Gap between start time and ephemeris time OrbitElement _orbitElement; int _stepMilliSecond; double _rmat[3][3]; __device__ pvOfOrbit operator()(int& i, mat3x3& c2tmat)const { double timeOfCycle=PI*2/(sqrt(MIU_EARTH /_semimajorAxis)/_semimajorAxis); double t = (double)_stepMilliSecond*i / 86400.0 / 1000.0 + _jdOfBt2Epo; t = fmod(t, timeOfCycle); Same as the previous code;};};
The difference between the code and the previous version of the code lies in the addition of a transformation matrix_ rmat[3][3], that is, the transformation matrix from orbital coordinate system to inertial coordinate system, is no longer calculated by each thread.
Correspondingly, in the code calling the imitation function, the calculation of the transformation matrix from the orbital coordinate system to the inertial coordinate system needs to be completed. The code for calling the imitation function is as follows.
int propagateOrbitWithTwoBody_Thrust_3(OrbitElement eo, const timeOfSpace& ept, const timeOfSpace& t0, int step, device_vector<mat3x3>& d_mats, device_vector<pvOfOrbit>& d_pvs) { twoBodyModelTransFunctor_Thrust_3 functor; double jd00, jd01, jd10, jd11; iauCal2jd(t0._year, t0._month, t0._day, &jd10, &jd11); double tmp = (jd10 + jd11); tmp+=double((60*(60*t0._hour+t0._minute)+t0._second)*1000+t0._millisecond)/86400; iauCal2jd(ept._year, ept._month, ept._day, &jd00, &jd01); double tmp1 = jd00 + jd01; tmp1+=double((60*(60*ept._hour+ept._minute)+ept._second)*1000+ept._millisecond)/86400; functor._jdOfBt2Epo = tmp - tmp1; double rmat[3][3]; iauIr(rmat); iauRz(-eo._RAAN*PI / 180.0, rmat); iauRx(-eo._inclination*PI / 180.0, rmat); iauRz(-eo._argumentOfPerigee*PI / 180.0, rmat); memcpy(functor._rmat,rmat,9*sizeof(double)) ; functor._orbitElement = eo; functor._stepMilliSecond = stepMilliSecond; d_pvs.resize(d_mats.size()); thrust::device_vector<int> d_tms(d_mats.size()); thrust::sequence(d_tms.begin(), d_tms.end(), 0, 1); thrust::transform(d_tms.begin(), d_tms.end(), d_mats.begin(), d_pvs.begin(), functor);}
It can be seen that these codes can be written according to the general CPU code, including the assignment of imitation function member variables, etc. the initialization and call of CUDA are completed within Thrust, which really simplifies the development process.
Calculate the time of 864000 * 5 sampling points, i.e. about 4 million sampling points, as shown in the table below
method | Time (s) |
---|---|
CPU | 2.798 |
Direct Thrust | 0.509 |
Optimize Thrust | 0.435 |
3, Full code link
The complete source code (combined with the two body model implementation) has been uploaded, and the link is source code