Simulation data come in diverse ASCII formats. MP_tools converts them into a proprietary binary format to facilitate their further handling and to standardise the input information. At this initial stage an accompanying <project_name>.par file (referred to as .PAR hereafter) is also created, holding additional information and the specific MP_tools settings for each project. The present section describes the main principles of this conversion; more details follow in the sections devoted to the specific conversion tools.

1/ SIMULATION TYPES

MP_tools was originally developed to treat sequences of dump files from DL_POLY molecular dynamics (MD) simulations. Its use was later extended to output from other sources. The type of simulation has to be specified in the associated parameter file (.PAR) by setting

sim_type = 'TIMESTEP'

The alternative value is 'STATIC', in which case all the energy-resolved functions of MP_tools are blocked.

2/ DATA TYPES

At present MP_tools can handle ASCII output data from a wide variety of simulation packages. For historical reasons DL_POLY and LAMMPS (MD) are special cases: their file headers are decoded by the two respective input tools, MP_DBIN and MP_LBIN, to provide automatically the accompanying information on auxiliary simulation parameters and on the details of the data structure. In all other cases the user has to enter the corresponding parameter values in the accompanying .PAR file (cf. templates in this package) and provide a symbolic description of the data records, which will be parsed by the input code of the MP_LBIN tool.

The data type has to be specified by setting

DAT_TYPE = 'DL_POLY'

Alternative values are 'LAMMPS' and 'GENERAL'. In the latter case an arbitrary string (<16 characters), e.g. 'VASP', can be supplied to identify the specific case. This name will be stored with the binary data for later reference and the input ASCII data will be treated according to the 'GENERAL' scheme.

3/ UNITS OF MEASUREMENT

Internally, MP_tools uses a system of 'atomic' units (sometimes called DAPS): Dalton (Da), i.e. gram/mole, for mass, Angstrom (Å) for length and picosecond (ps) for time. The energy of excitations is expressed as frequency in THz (1 THz = 1 ps⁻¹ ≈ 4.136 meV).
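
The THz to meV correspondence quoted above follows from E = h·f. As a minimal sketch (the function names are illustrative and not part of MP_tools), the conversion can be written as:

```python
# Planck constant in eV*s (exact value in the 2019 SI definition)
H_EV_S = 4.135667696e-15

def thz_to_mev(freq_thz):
    """Convert a frequency in THz to an energy in meV via E = h*f."""
    return freq_thz * 1e12 * H_EV_S * 1e3

def mev_to_thz(energy_mev):
    """Convert an energy in meV back to a frequency in THz."""
    return energy_mev / thz_to_mev(1.0)

print(round(thz_to_mev(1.0), 3))  # the factor quoted in the text
```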

On input, DL_POLY data are assumed to have space coordinates in Angstroms, with velocities and forces following the DAPS logic. For LAMMPS the convention is similar (corresponding to its METAL style).

In the GENERAL case the atom position coordinates may be given in absolute units (Angstrom), in lattice units (l.u.) or as fractions of the simulation box size (i.e. on the scale 0. to 1.). The choice is to be specified in the .PAR file by

POS_UNITS = 'LATTICE'

or, alternatively, 'ANGSTROM' or 'BOX'. After conversion the binary data use either 'LATTICE' or 'ANGSTROM' units in a transparent way (cf. below). The case of 'BOX' is in fact a special situation of 'LATTICE' with a single unit cell occupying the whole simulation box.
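
The relation between the three input conventions can be illustrated by a short sketch. It is not the MP_LBIN code: the function name and argument layout are hypothetical, and an orthogonal cell is assumed (lattice parameters a_par in Å, box size n_row in unit cells).

```python
def to_lattice_units(pos, pos_units, a_par, n_row):
    """Express one atom position in lattice units (l.u.).

    pos       : (x, y, z) as read from the ASCII input
    pos_units : 'LATTICE', 'ANGSTROM' or 'BOX'
    a_par     : lattice parameters in Angstrom (orthogonal cell assumed)
    n_row     : simulation box size in unit cells
    """
    if pos_units == "LATTICE":
        return tuple(pos)
    if pos_units == "ANGSTROM":
        # absolute coordinate divided by the lattice parameter
        return tuple(x / a for x, a in zip(pos, a_par))
    if pos_units == "BOX":
        # box fraction (0. to 1.) scaled by the number of unit cells
        return tuple(x * n for x, n in zip(pos, n_row))
    raise ValueError("unknown POS_UNITS value: " + repr(pos_units))

print(to_lattice_units((0.5, 0.25, 0.0), "BOX", (4.0, 4.0, 4.0), (32, 32, 32)))
```

With n_row = (1, 1, 1) the 'BOX' branch returns the fractions unchanged, which mirrors the remark that 'BOX' is a 'LATTICE' case with a single cell filling the box.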

4/ DATA INPUT METHODS

Each of the two MP_tools data conversion utilities, MP_DBIN and MP_LBIN, can use two fundamentally different input methods to interpret the input data: CELL and BULK. Common to both methods is the conversion to a coordinate system centred in the middle of the simulation box, as required by the subsequent use of the NUFFT library to calculate the lattice sums by non-uniform FFT. The CELL method converts coordinates into LATTICE units (l.u.) and assigns each atom to a specific unit cell. The BULK method may produce coordinates in either LATTICE or ANGSTROM units; it merely groups atoms according to their chemical symbol (or another supplied key) without identifying their unit cell (cf. below). The BULK method is useful for heavily distorted systems with interstitial atoms and/or coexisting highly disoriented sublattices.

With the CELL input method the coordinates are converted to lattice units (l.u.). Each atom is checked against the structural information given in the ATOMS table of the .PAR file in order to identify its special position in the unit cell and to assign it to the corresponding sublattice. This makes it possible to address any single sublattice individually (e.g. the oxygens at the 0.5 0.5 0 positions in a perovskite crystal) in the course of further analysis. The task is greatly facilitated if each atom in the input ASCII file already has a unique symbol (e.g. O1 identifying the [.5 .5 0] oxygen in an ABO3 perovskite) or a numerical key identifying its position in the crystallographic basis (atom type in LAMMPS). In the absence of such information the MP_*BIN code attempts to identify each atom by matching its position against the basis vectors given in the ATOMS section of the .PAR file, with a tolerance margin given by the EPS parameter. This procedure is only effective for relatively small atom displacements (EPS not exceeding half of the distance between neighbours).
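
The position-matching fallback can be sketched as follows. This is an illustration of the idea only, not the actual MP_*BIN algorithm: the function name is hypothetical, positions are taken in l.u., indices are 0-based, and the first basis position that fits within EPS is accepted.

```python
def identify_atom(pos, basis, eps):
    """Match a position (l.u.) against the basis positions of the unit cell.

    Returns (cell_indices, basis_index) for the first basis position whose
    lattice translate lies within eps of pos along every axis, else None.
    """
    for j, b in enumerate(basis):
        # displacement from the basis position; for a match it is
        # an integer lattice translation plus a small residual
        d = [p - q for p, q in zip(pos, b)]
        cell = [round(x) for x in d]
        if all(abs(x - c) <= eps for x, c in zip(d, cell)):
            return tuple(cell), j
    return None

# ABO3 perovskite basis: A, B, O1, O2, O3
basis = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5),
         (0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5)]

# a slightly displaced O1 atom in the cell (3, 7, 2)
print(identify_atom((3.52, 7.47, 2.03), basis, eps=0.1))
```

An atom displaced by more than EPS from every basis position returns None in this sketch, which is the situation where a unique symbol or numerical key in the input file becomes necessary.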

In the BULK mode the atom coordinates may either be converted to l.u., if the corresponding dimensions of the simulation box are specified in the .PAR file, e.g. for a box of 32 x 32 x 32 unit cells

N_ROW = 32 32 32

or they can be kept as they are (arbitrary in principle, but usually Angstrom, Å) by specifying

N_ROW = 1 1 1

If the input coordinates are given as fractions of the simulation box size, MP_LBIN may convert them by multiplying by the simulation box size (in LATTICE or ANGSTROM units) specified in .PAR (e.g. in ANGSTROM):

A_CELL_PAR = 12.012030 10.402723 42.498058

In any case the BULK mode reads in the atom data without any attempt to put them into correspondence with a regular lattice. On output the atoms are classified by their labels (chemical species) and, within each class, indexed in the order of input. The atom labels have to be reproduced exactly in the first column of the ATOMS section of the .PAR file. This approach makes it possible to handle systems with unlimited distortion and disorder, including irregular shapes of the simulation box. On the other hand, not all functionalities of MP_tools are available in such a case because of the missing information.
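
The BULK classification amounts to grouping atoms by label while preserving the input order. A minimal sketch, with a hypothetical function name and 0-based indices, where known_labels stands for the first column of the ATOMS section:

```python
def group_by_label(labels, known_labels):
    """Group atom indices by label, keeping the input order in each class.

    labels       : label of each atom, in the order read from the input
    known_labels : labels listed in the ATOMS section of the .PAR file
    """
    groups = {name: [] for name in known_labels}
    for i, label in enumerate(labels):
        if label not in groups:
            # labels must match the ATOMS section exactly
            raise KeyError("label not found in ATOMS section: " + repr(label))
        groups[label].append(i)
    return groups

print(group_by_label(["Si", "O", "O", "Si", "O"], ["Si", "O"]))
```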

Further details are given in the sections dealing with the LAMMPS, GENERAL and DL_POLY cases.

5/ MP_tools BINARY DATA STRUCTURE

The output uses a binary direct-access file structure with 1024-byte records (L_REC4 = 256 32-bit words), a compromise between I/O speed and the ability to access individual atom data selectively in the binary files. All binary data are aligned to the 32-bit word length.

The first three records contain header information, which is stored in text format, so that it can be displayed in ASCII without knowledge of the details of the binary file structure.

RECORD 1 contains information on the origin of the data (MP_tools), version of the program (1.54), number of header lines (including this one) and time stamp:

MP_tools     1.54        3   2022/11/01 17:42:22
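
Because the header records are plain text, they can be inspected without decoding the binary part. The sketch below reads RECORD 1, takes the number of header records from it and returns the text of all of them. The function names are hypothetical, and it is an assumption that the unused part of each 1024-byte record is filled with blanks or zeros.

```python
import io

REC_BYTES = 1024  # record length quoted in the text (256 words of 32 bits)

def parse_first_record(text):
    """Split RECORD 1 into origin, version, header count and time stamp."""
    fields = text.split()
    return {
        "origin": fields[0],
        "version": fields[1],
        "n_head": int(fields[2]),
        "timestamp": " ".join(fields[3:5]),
    }

def read_header_records(stream):
    """Return the parsed RECORD 1 and the text of all header records."""
    def next_record():
        raw = stream.read(REC_BYTES)
        # Assumption: padding bytes are blanks or zeros; strip both
        return raw.decode("ascii", errors="replace").strip(" \x00\r\n")

    first = next_record()
    info = parse_first_record(first)
    records = [first] + [next_record() for _ in range(info["n_head"] - 1)]
    return info, records

# Usage on an in-memory stand-in for a real binary file
line = "MP_tools     1.54        3   2022/11/01 17:42:22"
texts = (line, "&DATA_HEADER_1", "&DATA_HEADER_2")
fake = io.BytesIO(b"".join(t.encode("ascii").ljust(REC_BYTES) for t in texts))
info, records = read_header_records(fake)
print(info["version"], info["n_head"], len(records))
```

For a real file the stream would be opened with open(path, "rb"); the in-memory stand-in only keeps the sketch self-contained.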

RECORD 2 contains the namelist DATA_HEADER_1 with scalar variables and fixed-dimension arrays:

&DATA_HEADER_1
 SIM_TYPE      CHARACTER(16)   simulation type
 DAT_TYPE      CHARACTER(16)   data type
 INPUT_METHOD  CHARACTER(16)   input method
 FILE_PAR      CHARACTER(16)   name of the associated parameter file
 SUBST_NAME    CHARACTER(16)   substance name
 T_MS          REAL(4)         MD integration step (microstep) [ps]
 T_DUMP        REAL(4)         time tag of the frame (snapshot) dump [ps]
 TEMP          REAL(4)         temperature [K]
 A_PAR(3)      REAL(4)         lattice parameter [Å] (simulation box if N_ROW = 1 1 1)
 ANGLE(3)      REAL(4)         unit cell angles [so far 90 degrees assumed]
 N_ROW(3)      INTEGER(4)      simulation box size [l.u.]
 N_ATOM        INTEGER(4)      number of atoms in unit cell (incl. mixed occupancies)
 N_EQ          INTEGER(4)      number of symmetry equivalent positions (not yet used)
 N_TRAJ        INTEGER(4)      MD trajectory: 0 positions, 1 & velocities, 2 & forces
 J_SHELL_OUT   INTEGER(4)      1 = shell data recorded
 N_COND        INTEGER(4)      3 = periodic boundary conditions
 N_REC         INTEGER(4)      number of records per output array
 N_TOT         INTEGER(4)      total number of atoms in simulation box

RECORD 3 contains the namelist DATA_HEADER_2 with allocatable arrays, which can be allocated using the parameter values obtained from DATA_HEADER_1:

&DATA_HEADER_2
 AT_NAME_OUT(N_ATOM)    CHARACTER(4),ALLOCATABLE  atom type names
 AT_OCCUP_(N_ATOM)      REAL(4),ALLOCATABLE       atom occupancies
 NSUPER_R(N_ATOM)       INTEGER(4),ALLOCATABLE    atom numbers in simulation box

The rest of the binary file contains sequences of N_REC records each, holding the following arrays:

AT_IND(4,N_TOT)    unit cell indices (1:3), atom type index (4)
AT_POS(4,N_TOT)    atom position vectors (1:3), atom charge (4, optional)
AT_VEL(4,N_TOT)    atom velocities [Å/ps] (1:3), atom mass (4) (optional, N_TRAJ ≥ 1)
AT_FORCE(4,N_TOT)  atom forces (1:3) (optional, N_TRAJ ≥ 2)

For data from core-shell simulations with J_SHELL_OUT = 1, the same sequence of shell positions, velocities and forces follows, subject to the same rules for optional arrays. The last record of each array is zero-padded if N_TOT is not commensurate with the record pattern (L_REC4).
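
The zero-padding arithmetic can be illustrated by a short sketch. The packing is an assumption that the text does not spell out: here the 4 x N_TOT words of each array are taken to be stored contiguously in records of L_REC4 = 256 words. The function name is hypothetical, and the authoritative record count is the N_REC value stored in DATA_HEADER_1.

```python
L_REC4 = 256  # 32-bit words per 1024-byte record, as stated in the text

def record_layout(n_tot, words_per_atom=4, l_rec4=L_REC4):
    """Return (n_rec, n_pad) for one (4, N_TOT) array.

    Assumption: the array is packed contiguously into fixed-length records.
    n_rec is the number of records needed, n_pad the number of zero words
    filling the tail of the last record.
    """
    n_words = n_tot * words_per_atom
    n_rec = -(-n_words // l_rec4)  # ceiling division
    n_pad = n_rec * l_rec4 - n_words
    return n_rec, n_pad

print(record_layout(100))  # 400 words: 2 records, 112 padding words
print(record_layout(64))   # 256 words: 1 record, no padding
```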