./configure; make speed_test
edit src/config.h (set CONFIG_NOS1SYM)
cd exe; ./spin >output
grep ns.t output   (for numsymconf speed)
The most data-intensive part is the sparse matrix (kept in memory or on disk),
followed by the vectors and the configuration space (kept in memory).
The symmetries (permutations) normally fit into the CPU cache.
The matrix is read out sequentially
(no latency problem; for big systems it lives on disk -> bandwidth limited).
The configuration-space computation is mainly integer or bit driven, but
because CPUs lack a bit-permutation instruction it is very CPU intensive.
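A minimal sketch of why this is expensive on a CPU, assuming configurations
are stored as bit masks in a 64-bit word and a symmetry is given as a
target-position table (names are illustrative, not the actual spinpack code):

#include <stdint.h>

/* Apply a bit permutation to a spin configuration, bit by bit.
 * cfg  - spin configuration, one bit per site (up to 64 sites)
 * perm - perm[i] is the target position of source bit i
 * n    - number of sites (e.g. 40)
 * Without a hardware bit-permute instruction this costs a few
 * instructions per site, i.e. O(n) per symmetry operation. */
static uint64_t permute_bits(uint64_t cfg, const int *perm, int n)
{
    uint64_t out = 0;
    for (int i = 0; i < n; i++)
        out |= ((cfg >> i) & 1ULL) << perm[i];
    return out;
}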
As a first test, the space generation could be done completely within the FPGA,
replacing the numsymconf() function and writing the minimum symmetric
configurations out to memory (byte packed or as a long array).
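A rough sketch of the operation that would move into the FPGA, assuming the
symmetry group is given as a list of permutation tables and reusing the
hypothetical permute_bits() above (this is not the real numsymconf() interface):

/* Return the minimum configuration in the symmetry orbit of cfg;
 * cfg belongs to the basis only if it equals this minimum.
 * nsym permutations (e.g. 160 for the N=40 square lattice) are tried.
 * On a CPU this costs nsym * O(n) work per configuration; on an FPGA
 * all permutations could be wired in parallel and reduced by a
 * comparator tree. */
static uint64_t min_sym_config(uint64_t cfg, const int perm[][64],
                               int nsym, int n)
{
    uint64_t min = cfg;
    for (int s = 0; s < nsym; s++) {
        uint64_t p = permute_bits(cfg, perm[s], n);
        if (p < min)
            min = p;
    }
    return min;
}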
A second test would be to implement parts of or the full Hamiltonian matrix
generation on the FPGA. If the speedup is about 100, the matrix could be
generated on the fly in every iteration without the need to store it at all.
This would remove the disk bandwidth problem for bigger spin systems.
Today we are limited by disk bandwidth (about 100 MB/s) and could go to
FPGA streams of about 1 GB/s per node (speedup 10, without the need for
disks and with better scaling).
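To make "on the fly" concrete, a minimal sketch of a matrix-free iteration
step, assuming a hypothetical row generator that recomputes the nonzero
elements of one Hamiltonian row (not the actual spinpack interface):

#include <stdint.h>

/* Row generator type: fills col[]/val[] with the nonzeros of row i and
 * returns their count.  In the proposed setup this work would be
 * streamed by the FPGA instead of being read back from disk. */
typedef int (*row_gen)(uint64_t i, uint64_t *col, double *val, int maxnz);

/* One matrix-free y = H*x step: every row is generated, used and
 * discarded again, so no matrix storage (memory or disk) is needed. */
static void hmul_on_the_fly(uint64_t dim, const double *x, double *y,
                            row_gen gen_row)
{
    uint64_t col[64];
    double   val[64];

    for (uint64_t i = 0; i < dim; i++) {
        int nz = gen_row(i, col, val, 64);
        double sum = 0.0;
        for (int k = 0; k < nz; k++)
            sum += val[k] * x[col[k]];
        y[i] = sum;
    }
}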
Estimation of the FPGA logic: it needs to compare 40-bit configurations
to find the minimum; the permutations themselves come at zero cost (just wires)?
A conventional 1.5 GHz x86 CPU needs around 8000 CPU clocks to find the
minimum configuration for the N=40 square system (consistent with roughly
160 permutations applied bit by bit plus the comparisons), whereas an FPGA
needs one FPGA clock.
- add ImpulseC codes and results; in short: the C-to-VHDL and VHDL compilers did a bad job. It looks like only the demo codes compile; trying to compile more complex networks (160 permutations + minimum) makes the compiler hang. It is no fun to work with; I assume memory-wasting algorithms and badly scaling code cause the OOMs.
- my hope is that someone writes better (open) compilers for FPGAs
- the other way is to make the circuit much simpler: one could design a butterfly network for the permutations (of course it would be better to integrate it into the CPU); a sketch follows below
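A butterfly (or, for arbitrary permutations, a Benes) network reduces a bit
permutation to a few stages of conditional pairwise swaps; in hardware each
stage is just a column of 2:1 multiplexers. A minimal software model of one
stage using the standard delta-swap trick (the per-stage control masks for
each symmetry would be precomputed; illustrative, not spinpack code):

#include <stdint.h>

/* One butterfly stage of distance dist: every pair of bits (i, i+dist)
 * is conditionally exchanged.  ctrl must have its set bits only at the
 * lower position i of each pair, i.e. at positions with (i & dist) == 0.
 * A full network is log2(n) such stages (2*log2(n)-1 for Benes), which
 * a pipelined FPGA can evaluate in one clock per configuration. */
static uint64_t butterfly_stage(uint64_t cfg, uint64_t ctrl, int dist)
{
    /* positions where the pair differs and the swap is selected */
    uint64_t diff = (cfg ^ (cfg >> dist)) & ctrl;
    return cfg ^ diff ^ (diff << dist);
}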