# Orthogonal structure generation¶

In this notebook we will generate structures based on orthogonalization of the sensing matrix. This entails interatively generating structures by constructing cluster vectors orthogonal to all previous cluster vectors and finding the closest matching structure.

## Enumerate supercells¶

First, we enumerate all supercells up to 25 repetitions of the primitive cell which we will randomly choose from when matching cluster vector to structure.

```
[1]:
```

```
%%time
import numpy as np
from ase.db import connect
from ase.io import read
from icet.tools import enumerate_supercells
# read primitive structure
primitive_structure = read('../structures/MoC_rocksalt_prim.xyz')
# matrix representation of primitive cell
P = primitive_structure.get_cell().array
# Enumerate supercells
db_sc = connect('supercells.db', append=False)
for supercell in enumerate_supercells(
primitive_structure,
sizes=list(range(1, 26)),
niggli_reduce=True,
):
# matrix representation of supercell
C = supercell.get_cell().array
# check that the width of the supercell is less than three
# times the width of the primitive cell to avoid skewed cells
H = C.dot(np.linalg.inv(P)) # Transformation matrix H P = C
if np.amax(abs(H)) <= 3.001:
db_sc.write(supercell)
print(f'Number of supercells: {len(db_sc) + 1}')
```

```
Number of supercells: 340
CPU times: user 2min 23s, sys: 3min 24s, total: 5min 48s
Wall time: 1min 30s
```

## Set up cluster space¶

Next, we setup a cluster space with cutoffs that will be used for orthogonalization.

**Note:** Below “Be” is used to represent a vacant site.

```
[2]:
```

```
from icet import ClusterSpace
cutoffs = [9.0, 5.0]
chemical_symbols = [['C', 'Be'], ['Mo', 'V']]
cs = ClusterSpace(primitive_structure, cutoffs, chemical_symbols)
cs
```

```
[2]:
```

#### Cluster Space

Field | Value |
---|---|

Space group | Fm-3m (225) |

Sublattice A | ('Be', 'C') |

Sublattice B | ('Mo', 'V') |

Cutoffs | [9.0, 5.0] |

Total number of parameters | 52 |

Number of parameters of order 0 | 1 |

Number of parameters of order 1 | 2 |

Number of parameters of order 2 | 25 |

Number of parameters of order 3 | 24 |

fractional_position_tolerance | 2e-06 |

position_tolerance | 1e-05 |

symprec | 1e-05 |

## Generate orthogonal structures¶

Then we are ready to start iteratively generating structures. The algorithm consists of the following steps:

Generate a random cluster vector

Make cluster vector orthogonal to all previous cluster vectors

From the cluster vector, calculate the corresponding concentrations and impose a maximum of 30% vacancies (=Be)

Select a random supercell

Find the structure with the target concentration and supercell that closest matches the cluster vector and add it to the data set

Repeat until the number of structures reaches the size of the cluster space (minus 1 since the zerolet is not included in the orthogonalization)

A database with a set of final DFT relaxed structures generated using this procedure is available at `../dft-databases/rocksalt_MoVCvac_orthogonal.db`

.

```
[3]:
```

```
# Helper functions
def orthogonalize_cv(cv, cv_basis):
for cv_prev in cv_basis:
cv = cv - np.dot(cv, cv_prev) / np.dot(cv_prev, cv_prev) * cv_prev
return cv
def concentrations_from_singlets(cv):
"""
Get the concentration for sublattices A (C, Be) and B (Mo, V) from
the values of the singlets (cv[0] and cv[1] for A and B, respectively).
The singlet has a value between 1 and -1 where 1 corresponds to 100% of the
first species and -1 to 100% of the second species.
"""
conc_Be = 1 - (cv[0] + 1) / 2
conc_V = 1 - (cv[1] + 1) / 2
return conc_Be, conc_V
```

```
[4]:
```

```
%%time
from icet.tools.structure_generation import generate_target_structure_from_supercells
db = connect('orthogonal-structures.db')
# List of basis vectors for the cluster space
cv_basis = []
count = 0
while count < len(cs) - 1:
print('No. ', count + 1)
# generate random cluster vector (without zerolet)
cv = np.array([2 * i - 1 for i in np.random.rand(len(cs) - 1)])
# orthogonalize against cluster vector basis
cv = orthogonalize_cv(cv, cv_basis)
# randomly select a supercell
supercell = db_sc.get(id=np.random.randint(1, len(db_sc)+1)).toatoms()
n_atoms_per_lattice = len(supercell) / 2
# convert singlets to concentrations
conc_Be, conc_V = concentrations_from_singlets(cv)
# scale vacancy (=Be) concentration with 0.3 since
# we only allow for a maximum of 30% vacancies
conc_Be *= 0.3
# round concentrations to be commensurate with supercell
conc_Be = np.round(conc_Be*n_atoms_per_lattice)/n_atoms_per_lattice
if conc_Be >= 0.3: # decrease by 1 Be atom
conc_Be = (np.round(conc_Be * n_atoms_per_lattice) - 1.0) / n_atoms_per_lattice
conc_V = np.round(conc_V*n_atoms_per_lattice)/n_atoms_per_lattice
conc_dict = {'A': {'C': 1.0 - conc_Be, 'Be': conc_Be},
'B': {'Mo': 1.0 - conc_V, 'V': conc_V}}
# add initial cluster vector element 1.0 for zerolet
cv = np.insert(cv, 0, 1.0)
# find structure that matches cluster vector
structure = generate_target_structure_from_supercells(
cs, [supercell], conc_dict, cv)
# get cluster vector for the identified structure
cv = cs.get_cluster_vector(structure)
# remove zerolet
cv = np.delete(cv, 0)
# calculate final concentrations
conc_Be = structure.get_chemical_symbols().count('Be') / n_atoms_per_lattice
conc_V = structure.get_chemical_symbols().count('V') / n_atoms_per_lattice
# orthogonalize updated cluster vector and add to basis
cv = orthogonalize_cv(cv, cv_basis)
cv_basis.append(cv)
# sanity check
if np.linalg.norm(cv) < 1e-6:
print('Norm of cv basis vector is 0, something probably went wrong')
raise
# write to database
db.write(structure)
count += 1
```

```
No. 1
No. 2
No. 3
No. 4
No. 5
No. 6
No. 7
No. 8
No. 9
No. 10
No. 11
No. 12
No. 13
No. 14
No. 15
No. 16
No. 17
No. 18
No. 19
No. 20
No. 21
No. 22
No. 23
No. 24
No. 25
No. 26
No. 27
No. 28
No. 29
No. 30
No. 31
No. 32
No. 33
No. 34
No. 35
No. 36
No. 37
No. 38
No. 39
No. 40
No. 41
No. 42
No. 43
No. 44
No. 45
No. 46
No. 47
No. 48
No. 49
No. 50
No. 51
CPU times: user 14min 12s, sys: 0 ns, total: 14min 12s
Wall time: 22min 44s
```