Simulation data come in diverse ASCII formats. MP_tools
converts them into a proprietary binary format to facilitate their further handling and to standardise the input information. At this initial stage an accompanying <project_name>.par file (referred to as .PAR hereafter) is also created, containing additional information and specific MP_tools settings for each project. The present section describes the main principles of this conversion; more details follow in the sections devoted to the specific conversion tools.
1/ SIMULATION TYPES
MP_tools were originally developed to treat sequences of dump files from DL_POLY molecular dynamics (MD) simulations. Their use was later extended to the treatment of output from other sources. The type of simulation has to be specified in the associated parameter file (.PAR) by setting
sim_type = 'TIMESTEP'
The alternative value is 'STATIC', in which case all the energy-resolved functions of MP_tools will be blocked.
2/ DATA TYPES
At present MP_tools can handle ASCII output data from a wide variety of simulation packages. For historical reasons DL_POLY and LAMMPS (MD) are particular cases: their file headers are decoded by the two respective input tools, MP_DBIN and MP_LBIN, to provide automatically the accompanying information on auxiliary parameters of the simulation and on the details of the data structure. In all other cases the user has to fill in the corresponding parameter values in the accompanying .PAR file (cf. templates in this package) and provide a symbolic description of the data records, which will be parsed by the input code of the MP_LBIN tool.
The data type has to be specified by setting
DAT_TYPE = 'DL_POLY'
Alternative values are 'LAMMPS' and 'GENERAL'. In the latter case an arbitrary string (<16 characters), e.g. 'VASP', can be supplied to identify the specific case. This name will be stored with the binary data for later reference, and the input ASCII data will be treated according to the 'GENERAL' scheme.
3/ UNITS OF MEASUREMENT
Intrinsically, MP_tools use the system of 'atomic' units (sometimes called DAPS): Dalton (Da) or gram/mole for mass, Angstrom (Å) for length and picosecond (ps) for time. The energy of excitations is expressed as frequency in THz (1 THz = 1 ps⁻¹ ≈ 4.136 meV).
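The THz to meV relation quoted above follows from E = hν. A minimal Python sketch of the conversion is given here for orientation only; the function names are hypothetical and are not part of MP_tools.

```python
# Planck constant in eV*s (value fixed by the 2019 SI definition).
PLANCK_EV_S = 4.135667696e-15


def thz_to_mev(freq_thz: float) -> float:
    """Convert a frequency in THz to the equivalent energy in meV (E = h * nu)."""
    # 1 THz = 1e12 Hz, 1 eV = 1e3 meV
    return PLANCK_EV_S * freq_thz * 1e12 * 1e3


def mev_to_thz(energy_mev: float) -> float:
    """Convert an energy in meV to the equivalent frequency in THz."""
    return energy_mev / thz_to_mev(1.0)


print(round(thz_to_mev(1.0), 3))  # 4.136
```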
On input, the DL_POLY data are assumed to have space coordinates in Angstroms, with velocities and forces following the DAPS logic. For LAMMPS the convention is similar (corresponding to its METAL style).
In the GENERAL cases the atom position coordinates may be given in absolute units (Angstrom), in lattice units (l.u.) or as fractions of the simulation box size (i.e. on the scale 0. to 1.). The corresponding case is to be specified in the .PAR file by
POS_UNITS = 'LATTICE'
or, alternatively, 'ANGSTROM' or 'BOX'. After conversion the binary data use either 'LATTICE' or 'ANGSTROM' units in a transparent way (cf. below). The case of 'BOX' is in fact a special situation of 'LATTICE' with a single unit cell occupying the whole simulation box.
4/ DATA INPUT METHODS
Each of the two MP_tools data conversion utilities, MP_DBIN and MP_LBIN, can use two fundamentally different input methods to interpret the input data: CELL and BULK. Common to the two methods is the conversion to a coordinate system centred in the middle of the simulation box, as required by the subsequent use of the NUFFT library to calculate the lattice sums by non-uniform FFT. The CELL method converts coordinates into LATTICE units (l.u.) and assigns each atom to a specific unit cell, while the BULK method may produce coordinates in either LATTICE or ANGSTROM units. In the BULK method the atoms are merely grouped according to their chemical symbol (or another supplied key) without identifying their unit cell (cf. below). This latter method is useful for heavily distorted systems with interstitial atoms and/or coexisting highly disoriented sublattices.
With the CELL input method the coordinates are converted to lattice units (l.u.). Each atom is checked against the structural information given in the ATOMS table of the .PAR file in order to identify its special position in the unit cell and to assign it to the corresponding sublattice. This makes it possible to address any single sublattice individually (e.g. the oxygens at the 0.5 0.5 0 positions in a perovskite crystal) in the course of further analysis. This task is greatly facilitated if each atom in the input ASCII file already has its unique symbol (e.g. O1 identifying the [.5 .5 0] oxygen in an ABO3 perovskite) or a numerical key identifying its position in the crystallographic basis (atom type in LAMMPS). In the absence of such information the MP_*BIN code attempts to identify each atom by matching its position against the basis vectors given in the ATOMS section of the .PAR file, with a tolerance margin given by the EPS parameter. This procedure is only effective for relatively small atom displacements (EPS not exceeding half of the distance between neighbours).
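The position-matching fallback described above can be illustrated by a short Python sketch. It is not the MP_*BIN implementation: the function name, the per-axis comparison and the interpretation of EPS in lattice units are assumptions made for illustration only.

```python
from typing import Optional, Sequence

Vec3 = Sequence[float]


def match_basis(pos_lu: Vec3, basis: Sequence[Vec3], eps: float) -> Optional[int]:
    """Return the index of the first basis site within eps of pos_lu, else None.

    pos_lu is an atom position in lattice units; basis holds the fractional
    positions of the sites in one unit cell (hypothetical stand-in for the
    ATOMS table). The comparison is done axis by axis (an assumption).
    """
    for index, site in enumerate(basis):
        matched = True
        for x, b in zip(pos_lu, site):
            # Periodic distance along one axis, folded into [0, 0.5].
            d = (x - b) % 1.0
            d = min(d, 1.0 - d)
            if d > eps:
                matched = False
                break
        if matched:
            return index
    return None


# ABO3 perovskite basis in fractional coordinates, for illustration only.
PEROVSKITE = [
    (0.0, 0.0, 0.0),  # A
    (0.5, 0.5, 0.5),  # B
    (0.5, 0.5, 0.0),  # O1
    (0.5, 0.0, 0.5),  # O2
    (0.0, 0.5, 0.5),  # O3
]

# A slightly displaced atom near the O1 site of some unit cell.
site = match_basis((12.52, 3.48, 7.03), PEROVSKITE, eps=0.1)
```

With a tolerance larger than half of the distance between neighbouring sites, more than one basis position could match, which is why small displacements are required for this approach.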
In the BULK mode the atom coordinates may either be converted to l.u., if the corresponding dimensions of the simulation box are specified in the .PAR file, e.g. for a box of 32 x 32 x 32 unit cells
N_ROW = 32 32 32
or they can be kept as they are (arbitrary in principle, but usually Angstrom, Å) by specifying
N_ROW = 1 1 1
If the input coordinates are given in fractions of the simulation box size, MP_LBIN may convert them by multiplying them by the simulation box size (in LATTICE or ANGSTROM units) specified in .PAR (e.g. in ANGSTROM):
A_CELL_PAR = 12.012030 10.402723 42.498058
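As an illustration of the conversion from box fractions, the following Python sketch multiplies fractional coordinates by the box size and then shifts the origin to the middle of the box. The function names are hypothetical and the exact order of operations inside MP_LBIN is an assumption.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def box_to_absolute(frac: Sequence[float], box: Sequence[float]) -> Vec3:
    """Scale fractional coordinates (0. to 1.) by the simulation box size."""
    return tuple(f * b for f, b in zip(frac, box))


def centre_in_box(pos: Sequence[float], box: Sequence[float]) -> Vec3:
    """Shift the origin to the middle of the box (assumed convention)."""
    return tuple(p - 0.5 * b for p, b in zip(pos, box))


# Box size in Angstrom, taken from the A_CELL_PAR example above.
A_CELL_PAR = (12.012030, 10.402723, 42.498058)

pos = box_to_absolute((0.5, 0.25, 1.0), A_CELL_PAR)
centred = centre_in_box(pos, A_CELL_PAR)
```

In this sketch an atom at the box fractions 0.5 0.5 0.5 ends up at the origin of the centred coordinate system.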
In any case the BULK mode reads in the atom data without any attempt to put them into correspondence with a regular lattice. On output the atoms are classified by their labels (chemical species); within each class they are indexed in the order of the input. The atom labels have to be exactly reproduced in the first column of the ATOMS section of the .PAR file. This approach can handle systems with arbitrary distortion and disorder, including irregular shapes of the simulation box. On the other hand, in such a case not all functionalities of MP_tools are available because of the missing information.
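The BULK classification described above amounts to grouping atoms by label while preserving the input order within each group. A minimal Python sketch follows; the function name and the data layout are hypothetical, and the strict label check mirrors the requirement that labels match the ATOMS section exactly.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Atom = Tuple[str, Tuple[float, float, float]]


def group_by_label(atoms: Iterable[Atom], labels: Sequence[str]) -> Dict[str, List[tuple]]:
    """Group atom positions by label, keeping the input order in each group.

    labels stands in for the first column of the ATOMS section; an atom
    with an unknown label is rejected.
    """
    groups: Dict[str, List[tuple]] = {label: [] for label in labels}
    for label, pos in atoms:
        if label not in groups:
            raise KeyError(f"atom label {label!r} not found in ATOMS list")
        groups[label].append(pos)
    return groups


atoms = [
    ("O", (0.1, 0.2, 0.3)),
    ("Ti", (1.0, 1.1, 1.2)),
    ("O", (2.0, 2.1, 2.2)),
]
groups = group_by_label(atoms, labels=["Ti", "O"])
```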
Further details are given in the sections dealing with the LAMMPS, GENERAL and DL_POLY cases.
5/ MP_tools BINARY DATA STRUCTURE
The output uses a binary direct-access file structure with 1024-byte records (L_REC4 = 256 words of 32 bits), a compromise between I/O speed and the possibility to selectively access individual atom data in the binary files. All the binary data are aligned to the 32-bit word length.
The first three records contain header information, which is stored in text format, so that it can be displayed in ASCII without knowledge of the details of the binary file structure.
RECORD 1 contains information on the origin of the data (MP_tools), version of the program (1.54), number of header lines (including this one) and time stamp:
MP_tools 1.54 3 2022/11/01 17:42:22
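Because the header records are plain text, the first record can be inspected without any knowledge of the binary layout. The Python sketch below parses a line such as the one shown above; it assumes whitespace-separated fields and blank or null padding up to the record length, neither of which is confirmed here, and the function name is hypothetical.

```python
RECORD_BYTES = 1024  # record length stated for the binary file


def parse_record1(record: bytes) -> dict:
    """Split the text of RECORD 1 into its four pieces of information."""
    # Treat null padding like blanks so that split() discards it (assumption).
    text = record.decode("ascii", errors="replace").replace("\x00", " ")
    fields = text.split()
    return {
        "origin": fields[0],
        "version": fields[1],
        "n_header": int(fields[2]),
        "time_stamp": " ".join(fields[3:5]),
    }


# Simulated first record, padded with blanks to the full record length.
sample = b"MP_tools 1.54 3 2022/11/01 17:42:22".ljust(RECORD_BYTES)
header = parse_record1(sample)
```

For a real file the record could be obtained by opening the file in binary mode and reading its first 1024 bytes.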
RECORD 2 contains the namelist DATA_HEADER_1 with scalar variables and fixed-dimension arrays:
&DATA_HEADER_1
SIM_TYPE      CHARACTER(16)  simulation type
DAT_TYPE      CHARACTER(16)  data type
INPUT_METHOD  CHARACTER(16)  input method
FILE_PAR      CHARACTER(16)  name of the associated parameter file
SUBST_NAME    CHARACTER(16)  substance name
T_MS          REAL(4)        MD integration step (microstep) [ps]
T_DUMP        REAL(4)        time tag of the frame (snapshot) dump [ps]
TEMP          REAL(4)        temperature [K]
A_PAR(3)      REAL(4)        lattice parameter [Å] (simulation box if N_ROW = 1 1 1)
ANGLE(3)      REAL(4)        unit cell angles [so far 90 degrees assumed]
N_ROW(3)      INTEGER(4)     simulation box size [l.u.]
N_ATOM        INTEGER(4)     number of atoms in unit cell (incl. mixed occupancies)
N_EQ          INTEGER(4)     number of symmetry equivalent positions (not yet used)
N_TRAJ        INTEGER(4)     MD trajectory: 0 = positions only, 1 = positions and velocities, 2 = positions, velocities and forces
J_SHELL_OUT   INTEGER(4)     1 = shell data recorded
N_COND        INTEGER(4)     3 = periodic boundary conditions
N_REC         INTEGER(4)     number of records per output array
N_TOT         INTEGER(4)     total number of atoms in simulation box
RECORD 3 contains the namelist DATA_HEADER_2 with allocatable arrays, which can be allocated using the parameter values obtained from DATA_HEADER_1:
&DATA_HEADER_2
AT_NAME_OUT(N_ATOM)  CHARACTER(4),ALLOCATABLE  atom type names
AT_OCCUP_(N_ATOM)    REAL(4),ALLOCATABLE       atom occupancies
NSUPER_R(N_ATOM)     INTEGER(4),ALLOCATABLE    atom numbers in simulation box
The rest of the binary file contains sequences of N_REC records each for the following arrays:
AT_IND(4,N_TOT)    unit cell indices (1:3), atom type index (4)
AT_POS(4,N_TOT)    atom position vectors (1:3), atom charge (4, optional)
AT_VEL(4,N_TOT)    atom velocities [Å/ps] (1:3), atom mass (4) (optional, N_TRAJ ≥ 1)
AT_FORCE(4,N_TOT)  atom forces (1:3) (optional, N_TRAJ ≥ 2)
In the case of data from core-shell simulations with J_SHELL_OUT = 1, the same sequence of shell positions, velocities and forces follows, subject to the same option rules. The last record of each array is zero-padded if N_TOT is not commensurate with the record length (L_REC4).
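The relation between N_TOT, the record length and the zero padding can be made concrete with a little arithmetic. The Python sketch below assumes that each array of shape (4, N_TOT) is written as one contiguous run of 32-bit words split into records of L_REC4 words; this layout is an inference from the description above, not a confirmed detail, and the function names are hypothetical.

```python
L_REC4 = 256  # 32-bit words per record (1024 bytes)


def records_per_array(n_tot: int, n_comp: int = 4, l_rec4: int = L_REC4) -> int:
    """Number of records needed for an array of n_comp * n_tot words."""
    n_words = n_comp * n_tot
    return -(-n_words // l_rec4)  # ceiling division on integers


def padding_words(n_tot: int, n_comp: int = 4, l_rec4: int = L_REC4) -> int:
    """Number of zero words that fill up the last record of the array."""
    return records_per_array(n_tot, n_comp, l_rec4) * l_rec4 - n_comp * n_tot


n_rec = records_per_array(1000)  # 4000 words need 16 records
n_pad = padding_words(1000)      # 16 * 256 - 4000 = 96 zero words
```

Under this assumption one record holds the data of 64 atoms, so no padding is needed when N_TOT is a multiple of 64.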