Note: Due to Proprietary Knowledge agreements, I cannot share the entire code of the simulation, however I can describe my problem solving process and share some snippets of my code.
Core Distribution of a CPU versus a GPU.
The CPU is purposed build for sequential computer operations. Hosting only a few cores, these processors are purpose built for a variety of complex tasks. The GPU however, hosts hundreds of different cores, with dedicated high-speed memory and integer and floating-point processors. If you need to perform simple arithmetic calculations repeatedly on a large data set, then parallelizing your code to process on the GPU can greatly reduce the time complexity of your algorithm.
In this project, I received a MATLAB simulation which modeled the force dynamics of cellular microtubules pushing and pulling the centrosomes as part of the mitosis process. These microtubules would dynamically form and disintegrate within the cell and could number anywhere from three hundred to fifteen hundred microtubules at any given time. The data of any given microtubule would be stored as an indexed entry in twenty four different arrays.
The GPU only outperforms the CPU provided a sufficient space complexity and algorithmic complexity. This is because using the GPU in MATLAB requires specific time consuming steps in order to perform.
To solve these problems, my MATLAB GPU version of the simulation pre-initialized large arrays which held required values of all the microtubules which could exist in, and held a Boolean array to dictate whether these microtubules existed or not, influencing whether they would contribute to the overall force dynamics of the model.
%Total number of possible MT's if you consider growth curves and time steps
n_MT_GPU = n_MT_init;
upBound = 1500;
extraMT = upBound - n_MT_GPU;
%Initializes extra space for new microtubuals. Instead of appending entries
%or removing entries to for creating and deleting a MT, all possible MT's are
%initialized beforehand, a specific subset will be marked as existing by a logic array while
%the rest are marked as non-existing. Both existing and non-existing MT's
%will have their location data updated with each time step according to the
%motion of the centrosomes, however only the existing subset will be
%calculated as contributing to the force calculations on the centrosomes
%GPU Versions of Initializes Arrays
%Centrosome Info
centers_1_GPU = 2*(xRadius-7.5)*rand(1,nc)-xRadius+7.5; %Line 22 initialization
centers_2_GPU = 2*(yRadius-7.5)*rand(1,nc)-xRadius+7.5; %Line 23 initialization
centers_1_old_GPU = centers_1_GPU;
centers_2_old_GPU = centers_2_GPU;
vel_centers_1_GPU = (centers_1_GPU - centers_1_old_GPU)/dt;
vel_centers_2_GPU = (centers_2_GPU - centers_2_old_GPU)/dt;
%Centrosome Force Applications
F_DCL_1_GPU = zeros(1,nc,'gpuArray');
F_DCL_2_GPU = zeros(1,nc,'gpuArray');
%MT Info
c_attach_GPU = randi([1 nc],1,upBound,'gpuArray'); %Line 44 initialization
MT_length_GPU = [0.5*rand(1,n_MT,'gpuArray'),0.01*ones(1,extraMT,'gpuArray')]; %Line 48 initialization, Line 8 createnewMTs
MT_angle_GPU = rand(1,upBound,'gpuArray')*2*pi; %Line 50 initialization, Line 9 createnewMTs
%Initializes info about the centers which is connected to each MT
mt_centers_1_GPU = zeros(1,upBound,'gpuArray');
mt_centers_2_GPU = zeros(1,upBound,'gpuArray');
for nc_iter = 1:nc
mt_centers_1_GPU(c_attach == nc_iter) = centers_1_GPU(nc_iter);%Line 52-54 initialization, Line 7 createnewMTs
mt_centers_2_GPU(c_attach == nc_iter) = centers_2_GPU(nc_iter);
end
mt_vel_1_GPU = zeros(1,upBound,'gpuArray');
mt_vel_2_GPU = zeros(1,upBound,'gpuArray');
X_MT_1_GPU = (MT_length_GPU+c_rad).*cos(MT_angle_GPU)+mt_centers_1_GPU;%Line 56 initialization
X_MT_2_GPU = (MT_length_GPU+c_rad).*sin(MT_angle_GPU)+mt_centers_2_GPU;%Line 56 initialization
mt_vec_1_GPU = (1./(MT_length_GPU)).*(X_MT_1_GPU-(mt_centers_1_GPU+c_rad*cos(MT_angle_GPU)));%Line 57-60 initialization
mt_vec_2_GPU = (1./(MT_length_GPU)).*(X_MT_2_GPU-(mt_centers_2_GPU+c_rad*cos(MT_angle_GPU)));%Line 57-60 initialization
%MT State Data
MT_state_GPU = ones(1,upBound,'gpuArray');
%Initializing another array called MT_Flavor to keep track of force
%application
MT_flavor_GPU = zeros(1,upBound,'gpuArray');
%For now, by convention will make it such that f applying spindle pole
%dynein force then it's flavor will be 1, if applying no force then it's
%flavor is 0, will likely have to change later if an MT can contribute to
%multiple different force types.
%Other GPU parameters which aren't CPU initialized
k2_GPU =fac*gv_astral*dt*MT_length_GPU;
ProbRand = rand(1,upBound,'gpuArray');
Exist = boolean([ones(1,n_MT,'gpuArray'),zeros(1,extraMT,'gpuArray')]);
f_MT_1_GPU = zeros(nc,upBound,'gpuArray');
f_MT_2_GPU = zeros(nc,upBound,'gpuArray');
net_force_1_GPU = zeros(1,nc,'gpuArray');
net_force_2_GPU = zeros(1,nc,'gpuArray');
%Scrap variables to let the Spindle_Pole_Dynein Function work
a = 1;
b = 1;
c = 1;
xclose = 1;
yclose = 1;
The model accounted for several force contributing processes, one in particular is the force contributions between microtubules connecting to different centrosomes and pushing against each other via the motor protein Dynein. Translating these calculations to the GPU provided the most significant improvement to the models efficiency as the calculation required checking the connection of each microtubule to every other microtubule, under a $O(N^2)$ space complexity.
%This function calcualtes the force component generated by dynein acting on
%the spindle pole between respective centrosomes.
%The function uses the velocity of the centers in accordance with their
%locaion in a pervious timestep, this will be caculated earlier outside
%this force function
function [F_DCL_1_GPU,F_DCL_2_GPU,MT_flavor_GPU] = Spindle_Pole_Dynein_GPU(F_DCL_1_GPU,F_DCL_2_GPU,f_MT_1_GPU,f_MT_2_GPU,Exist,c_attach_GPU,MT_length_GPU,X_MT_1_GPU,X_MT_2_GPU,mt_vec_1_GPU,mt_vec_2_GPU,mt_centers_1_GPU,mt_centers_2_GPU,...
mt_vel_1_GPU,mt_vel_2_GPU,centers_1_GPU,centers_2_GPU,vel_centers_1_GPU,vel_centers_2_GPU,MT_flavor_GPU,ProbRand,nc,MinDBind,vf,f0_dynein,v0_dynein,LengthFac,a,b,c,xclose,yclose)
%Iterates over each centrosome, and sees if the microtubual will apply
%a force to that centrosome
for i = 1:nc
[f_MT_1_GPU(i,:),f_MT_2_GPU(i,:),MT_flavor_GPU] = arrayfun(@Sub_Spindle_Pole_Dynein_GPU,Exist,f_MT_1_GPU(i,:),f_MT_2_GPU(i,:),c_attach_GPU,MT_length_GPU,X_MT_1_GPU,X_MT_2_GPU,mt_vec_1_GPU,mt_vec_2_GPU,mt_centers_1_GPU,mt_centers_2_GPU,...
mt_vel_1_GPU,mt_vel_2_GPU,centers_1_GPU(i),centers_2_GPU(i),vel_centers_1_GPU(i),vel_centers_2_GPU(i),MT_flavor_GPU,ProbRand,i,MinDBind,vf,f0_dynein,v0_dynein,LengthFac,a,b,c,xclose,yclose);
end
%Now that the matrix containing force contributions from each MT has
%been filled out, we iterate over that matrix, summing up the
%contributions onto each centrosome
for j = 1:nc
F_DCL_1_GPU(j) = sum(f_MT_1_GPU(j,:)) + sum(sum(f_MT_1_GPU(:,c_attach_GPU==j)));
F_DCL_2_GPU(j) = sum(f_MT_2_GPU(j,:)) + sum(sum(f_MT_2_GPU(:,c_attach_GPU==j)));
end
end
This sub-function script was parallelized to run on every microtubule simultaneously on the GPU.
%This function is built to operate using arrayfun to permute across every
%MT to calculate if it contributes a force component onto a specific
%centrosome. Orchestrated to check the simplest parameters first in order
%to speed up calculation
%i is the target centrosome for the force, j is the other centrosome we are
%looking at
function [f_MT_1_GPU,f_MT_2_GPU,MT_flavor_GPU] = Sub_Spindle_Pole_Dynein_GPU(Exist,f_MT_1_GPU,f_MT_2_GPU,c_attach_GPU,MT_length_GPU,X_MT_1_GPU,X_MT_2_GPU,mt_vec_1_GPU,mt_vec_2_GPU,mt_centers_1_GPU,mt_centers_2_GPU,...
mt_vel_1_GPU,mt_vel_2_GPU,centers_1_GPU,centers_2_GPU,vel_centers_1_GPU,vel_centers_2_GPU,MT_flavor_GPU,ProbRand,i,MinDBind,vf,f0_dynein,v0_dynein,LengthFac,a,b,c,xclose,yclose)
%If this MT doesn't contribute any force amount to the centrosome, then
%it's contribution is zero. It is preemptively set to zero and only by
%passing all the conditional checks will it's contribution become nonzero.
f_MT_1_GPU = 0;
f_MT_2_GPU = 0;
%Checks existence of the MT and whether it will bind if it can, and ensures
%that the specific MT isn't bound to the target centrosome i
if Exist && (c_attach_GPU ~= i) && (ProbRand<=0.3)%ProbD
%Checks if the length of the MT is long enough to span the distance between the target centrosome i and the
%other centrosome j, while using the info that the MT is bound to the
%other centrosome j in the mt_centers data arrays
if (MT_length_GPU >= sqrt((centers_1_GPU-mt_centers_1_GPU)^2+(centers_2_GPU-mt_centers_2_GPU)^2) -MinDBind)%Line 20 Spindle_Pole_Dynein
%mindistCtoC = sqrt((centers_1_GPU-mt_centers_1_GPU)^2+(centers_2_GPU-mt_centers_2_GPU)^2)
%Checks if the MT is interpolant/facing towards the target
%centrosome.
%dot product
if (real(acos(complex(mt_vec_1_GPU*(centers_1_GPU-mt_centers_1_GPU) + mt_vec_2_GPU*(centers_2_GPU-mt_centers_2_GPU))))<=pi/2)
%Final check to see if the distMTtoi parameter is small enough
%as calculated in the Spindle_Pole_Dynein function
a = -mt_vec_1_GPU/mt_vec_2_GPU;
c = -(X_MT_2_GPU)+a*X_MT_1_GPU;
xclose=(b*(b*centers_1_GPU-a*centers_2_GPU)-a*c)/(a^2+b^2);
yclose=(a*(-b*centers_1_GPU+a*centers_2_GPU)-b*c)/(a^2+b^2);
if ((xclose-centers_1_GPU)^2+(yclose-centers_2_GPU)^2)<1
%Dynein force interaction
f_MT_1_GPU = -(f0_dynein*(1-(vel_centers_1_GPU-mt_vel_1_GPU-vf)/v0_dynein))*(mt_vec_1_GPU)*exp(-MT_length_GPU/(LengthFac*(sqrt((centers_1_GPU-mt_centers_1_GPU)^2+(centers_2_GPU-mt_centers_2_GPU)^2))));%Eq2 and Eq8 in paper
f_MT_2_GPU = -(f0_dynein*(1-(vel_centers_2_GPU-mt_vel_2_GPU-vf)/v0_dynein))*(mt_vec_2_GPU)*exp(-MT_length_GPU/(LengthFac*(sqrt((centers_1_GPU-mt_centers_1_GPU)^2+(centers_2_GPU-mt_centers_2_GPU)^2))));
MT_flavor_GPU = 1;
end
end
end
end
end