Thanks for the fast response.
Hi
On 07/22/2014 01:24 AM, Erik Schultheis wrote:
Dear all,
I am simulating particle trajectories, and depending on the data I want
to generate I may or may not have to track additional data [a boost.ublas
c_matrix] along those trajectories. Right now I enable/disable that via
a template parameter (so, using std::tuple(vec, vec) or tuple(vec, vec,
c_matrix) and fusion_algebra), which has a few drawbacks.
The algorithm is already templated to support varying dimensions via a
template parameter, so the additional data practically doubles the
number of template instantiations for the system and all possible observers.
This actually got so bad that gcc (mingw gcc 4.8) refuses to compile
with debugging enabled. Therefore, I would like to make the switch
between tracking the matrix and ignoring it a runtime choice instead of
a template parameter.
I don't fully understand what exactly you want to do. Can you show us
some more code? Can you put the tracking into an observer? I mean, do
you need to solve an ODE to track the additional data?
Unfortunately, the tracking is an ODE itself: the change depends on
the position and the current matrix value, so
it cannot be done inside an observer.
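(For reference, the extra equations have the form dM/dt = K(q(t)) * M,
where q(t) is the current position and M the tracked matrix; this is
what the axpy_prod call in MonodromyIntegrator below computes.)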
As for the current implementation:
I have a class BasicTracer<S, I>, where S is the template parameter
that decides the state type (currently one of std::array<vec<2>, 2>,
std::array<vec<3>, 2>, std::tuple<vec<2>, vec<2>, matrix<4>> and
std::tuple<vec<3>, vec<3>, matrix<6>>) and I the integration strategy
(either Integrator, which only operates on the vectors, or
MonodromyIntegrator, which also operates on the matrix). The reason for
this is that I have two different use cases for the program: 1) a
human-viewable visualisation of the trajectories that does not need
matrices, and 2) the actual data I want to generate. Since 1) needs far
more trajectories to be calculated, it is really helpful that the
computation time in this case is lower.
With the mentioned limitation (gcc refuses to compile with -g,
"tracer.o: too many sections (50892)" and "ccQF7l7o.s: Fatal error:
can't close .obj\Debug\src\tracer.o: File too big") the code works as it
is now.
For now I have only worked with small problem sizes, where the wait time
for a result is on the order of a few minutes; however, this is going to
change, and therefore I want to parallelize the code. The differentiation
between the use cases via templates makes the code quite complex at times,
and I hope to reduce that complexity before I add the parallelization.
Therefore, I want to move from a templated approach to a runtime
approach. I am not sure if it is possible to make the code select the
dimension at runtime without significant performance impact, but at
least the second parameter should be removable without increasing
runtime significantly.
I am not really sure which parts of the code to post to significantly
improve my description.
The tracer itself is a type like
template class BasicTracer<std::tuple<vec<3>, vec<3>, monodromy_matrix<3>>, MonodromyIntegrator<std::tuple<vec<3>, vec<3>, monodromy_matrix<3>>>>;
or
template class BasicTracer<std::array<vec<2>, 2>, Integrator<std::array<vec<2>, 2>>>;
the integration is performed like
double starte = mEnergy(p);
try
{
    mMasterObserver.startTrajectory(p);
    perform_integration(mIntegrator, p, 2.0, 1.0 / mPotential.getSize(), mMasterObserver, mAbsErrorBound, mRelErrorBound);
} catch(int& i) {};
/// \todo this is the only reason for getLastState's existence
mEnergyError += std::pow( starte - mEnergy( mMasterObserver.getLastState() ), 2 );
where perform_integration just passes the parameters on to the odeint function:
template<class System, class State, class Observer>
std::size_t perform_integration(const System& system, State& start, double t_end, double dt, Observer& obs, double abs_err, double rel_err)
{
    using namespace boost::numeric::odeint;
    // when range access is possible, try range algebra, otherwise, try fusion
    using algebra_type = typename std::conditional::type;
    typedef runge_kutta_cash_karp54 error_stepper_type;
    typedef controlled_runge_kutta controlled_stepper_type;
    typedef default_error_checker error_checker_type;
    controlled_stepper_type stepper(error_checker_type(abs_err, rel_err));
    return integrate_const(stepper, system, start, 0.0, t_end, dt, std::ref(obs) );
}
and the operator() of the integrators looks like
template<class S>
void Integrator<S>::operator()( const S& state, S& deriv, double/*t*/ ) const
{
    using namespace system_state;
    auto position = get_position(state);
    auto p = position * mPotential.getSize();
    double x = p[0];
    double y = p[1];
    // prevent out of bounds access
    if(y < 0 || y > mPotential.getSize()-1 || x < 0 || x > mPotential.getSize()-1)
    {
        throw(1);
    }
    // calculate acceleration
    auto& acceleration = get_velocity(deriv);
    for(int i = 0; i < DIMENSION; ++i)
    {
        /// the compiler should be able to optimize this
        std::array<int, DIMENSION> didx;
        didx.fill(0);
        didx[i] = 1;
        acceleration[i] = -linearInterpolate( mPotential.getDerivative(didx), p );
    }
    // change in position = current velocity
    get_position( deriv ) = get_velocity( state );
}
template<class S>
void MonodromyIntegrator<S>::operator()( const S& state, S& deriv, double/*t*/ ) const
{
    using namespace system_state;
    auto position = get_position(state);
    auto p = position * mPotential.getSize();
    double x = p[0];
    double y = p[1];
    // prevent out of bounds access
    if(y < 0 || y > mPotential.getSize()-1 || x < 0 || x > mPotential.getSize()-1)
    {
        throw(1);
    }
    // calculate acceleration
    auto& acceleration = get_velocity(deriv);
    for(int i = 0; i < DIMENSION; ++i)
    {
        std::array<int, DIMENSION> didx;
        didx.fill(0);
        didx[i] = 1;
        acceleration[i] = -linearInterpolate( mPotential.getDerivative(didx), p );
    }
    // change in position = current velocity
    get_position( deriv ) = get_velocity( state );
    // change in monodromy
    boost::numeric::ublas::axpy_prod(getMonodromyCoeff( mPotential, p ), get_data(state), get_data(deriv), true);
}
If there is some other part of the code that would be helpful, please
let me know.
However, the cost of tracking that matrix is quite high (that is why it
is optional), but simply disabling the d matrix / dt calculation in the
system does not speed up the calculation much (86 s -> 58 s vs 27 s
without the matrix). A lot of time is spent adding and multiplying
numbers that are not used anywhere, or actually even copying them (e.g.
std::swap(c_matrix, c_matrix) shows up quite high in gprof, even though
I am not sure how reliable the -pg option is in conjunction with -O3).
So what I am asking for here are suggestions on how to best let odeint
treat the c_matrix as uninitialized memory and never perform any
operations on it, depending on a runtime parameter.
Options I thought of were:
* deriving a class from c_matrix and providing my own versions of the
operator overloads that check a flag:
as far as I know, boost.ublas matrices use some template magic so
that operators only return expression templates, to optimize evaluation when
operations are chained, and I would not be sure how to actually
retain that behaviour in case the matrix is tracked.
* creating my own algebra class:
from what I could tell from a few quick glances into the algebra
files, this would probably be conceptually easier than the other option,
but would require a lot of repetitive coding.
Does anyone have a suggestion for a more efficient implementation strategy?
Have you considered using a fusion view which only points to tuple( vec
, vec )? This could be used as the state type. Nevertheless, you
need to instantiate the stepper with a real resizable state_type, which
could again be a tuple( vec , vec ).
No, I didn't know about fusion views up until now :), so I am going to
take a look at them. Why would I need a resizable state_type?
Whether I track the matrix or not is decided before the call into the
integrate function.
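Just to make sure I understood the idea: would the view be something
along these lines (the types below are simplified stand-ins for my real
ones, 3D case only)?

#include <array>
#include <tuple>
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/fusion/include/vector.hpp>
#include <boost/fusion/include/vector_tie.hpp>

// simplified stand-in types, just for this sketch
using vec        = std::array<double, 3>;
using matrix     = boost::numeric::ublas::c_matrix<double, 6, 6>;
using full_state = std::tuple<vec, vec, matrix>;

// a fusion sequence of references to the two vectors, ignoring the matrix
using vec_view = boost::fusion::vector<vec&, vec&>;

vec_view make_view(full_state& s)
{
    return boost::fusion::vector_tie(std::get<0>(s), std::get<1>(s));
}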
From the options you showed here, a custom algebra looks most promising.
This algebra could also be implemented in terms of the existing
fusion_algebra and redirect all calls to fusion_algebra.
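If I understand that correctly, the skeleton would look roughly like
this (completely untested; switchable_algebra and mTrackMatrix are just
names I made up): when mTrackMatrix is false, only views of the two
vectors are handed to fusion_algebra and the matrix is never touched.

#include <boost/fusion/adapted/std_tuple.hpp>  // so fusion can see std::tuple as a sequence
#include <boost/fusion/include/at_c.hpp>
#include <boost/fusion/include/vector_tie.hpp>
#include <boost/numeric/odeint/algebra/fusion_algebra.hpp>

namespace fusion = boost::fusion;
namespace odeint = boost::numeric::odeint;

struct switchable_algebra
{
    bool mTrackMatrix = true;   // runtime switch, fixed before the integration starts

    template<class S1, class S2, class S3, class Op>
    void for_each3(S1& s1, S2& s2, S3& s3, Op op) const
    {
        if(mTrackMatrix)
        {
            // full state: simply redirect to the existing fusion_algebra
            odeint::fusion_algebra::for_each3(s1, s2, s3, op);
        }
        else
        {
            // views of references to the first two elements only
            auto v1 = fusion::vector_tie(fusion::at_c<0>(s1), fusion::at_c<1>(s1));
            auto v2 = fusion::vector_tie(fusion::at_c<0>(s2), fusion::at_c<1>(s2));
            auto v3 = fusion::vector_tie(fusion::at_c<0>(s3), fusion::at_c<1>(s3));
            odeint::fusion_algebra::for_each3(v1, v2, v3, op);
        }
    }

    // the other for_eachN overloads would be forwarded in the same way
};

The stepper would then have to be constructed with an instance of this
algebra so that the flag is actually available during the integration.
Would that be the intended direction?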