Tutorial 1 - Basic concepts: using AtomsCollection objects


      _
    /|_|\   
   / / \ \  
  /_/   \_\  
  \ \   / /  
   \ \_/ /  
    \|_|/  

SOPRANO: a Python library for generation, manipulation and analysis of large batches of crystalline structures

Developed within the CCP-NC project. Copyright STFC 2022

# Basic imports
import os, sys
sys.path.insert(0, os.path.abspath('..')) # Add the parent directory to the module search path
                                          # so Soprano can be imported without installing it
# Other useful imports

import glob

import numpy as np

import ase
from ase import io as ase_io

from soprano.collection import AtomsCollection

1 - LOADING STRUCTURES

Soprano can load multiple structures into a single AtomsCollection object. Each structure is loaded individually as an ASE (Atomic Simulation Environment) Atoms object.

# List all CIF files matching the pattern (glob returns them in arbitrary order)
cifs = glob.glob('tutorial_data/struct*.cif')

aColl = AtomsCollection(cifs, progress=True) # progress=True displays a loading bar
Loading collection...

Loading: [████████████████████] |
Loaded 10 structures
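`glob.glob` returns paths in whatever order the filesystem lists them, which is why the `name` entries in the info dictionaries printed further down come out shuffled (struct_8, struct_9, struct_10, struct_1, ...). If a reproducible order matters, the file list can be sorted before building the collection. The sketch below uses only the standard library to sort names "naturally", so that `struct_10` comes after `struct_9`; the helper `natural_key` is hypothetical and not part of Soprano. The sorted list could then be passed to `AtomsCollection` exactly as `cifs` is above.

```python
import re

def natural_key(path):
    """Hypothetical helper: split a path into text and integer chunks so numbers compare numerically."""
    # re.split with a capture group alternates text, number, text, ... so types always line up
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r'(\d+)', path)]

# A few file names in a shuffled order, like the one glob produced in this tutorial
cifs = ['tutorial_data/struct_8.cif', 'tutorial_data/struct_10.cif',
        'tutorial_data/struct_1.cif', 'tutorial_data/struct_2.cif',
        'tutorial_data/struct_9.cif']

print(sorted(cifs))                   # plain string sort puts struct_10 before struct_2
print(sorted(cifs, key=natural_key))  # natural sort: 1, 2, 8, 9, 10
```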

2 - HANDLING COLLECTIONS

Collections are a convenient way of manipulating multiple structures. They allow for many operations that act collectively on all Atoms objects, or return values from them all at once.

# To access an individual structure, one can simply use indexing
# (the order follows the file list returned by glob, so this is not necessarily struct_1.cif):
a0 = aColl.structures[0]
print('---- aColl.structures[0] positions ----\n')
print(a0.get_positions(), '\n\n')

# All properties and methods of Atoms objects are available on an entire collection too, by using
# the meta-element 'all'

print('---- all struct_*.cif positions----\n')
print(aColl.all.get_positions(), '\n\n')

print('---- all struct_*.cif info dictionaries----\n')
print(aColl.all.info, '\n\n')
---- aColl.structures[0] positions ----

[[0.05895857 0.06582596 0.07966662]
 [1.72613191 1.81057858 0.0119536 ]
 [3.5064522  1.84354883 1.80932793]
 [1.81202532 0.05911503 1.69412552]] 


---- all struct_*.cif positions----

[[[ 5.89585653e-02  6.58259617e-02  7.96666218e-02]
  [ 1.72613191e+00  1.81057858e+00  1.19536046e-02]
  [ 3.50645220e+00  1.84354883e+00  1.80932793e+00]
  [ 1.81202532e+00  5.91150302e-02  1.69412552e+00]]

 [[ 8.24872667e-02  3.55901214e+00  1.10421932e-02]
  [ 1.72351233e+00  1.88405678e+00  5.28293163e-02]
  [ 3.51426700e+00  1.81580154e+00  1.75271361e+00]
  [ 1.73825398e+00  4.84128249e-02  1.79915577e+00]]

 [[ 3.42796466e-02  6.91411917e-02  5.37884717e-03]
  [ 1.39547491e+00  1.40001475e+00  1.48655720e+00]
  [ 3.03357630e+00  2.94880358e+00  2.87166818e+00]
  [ 4.34194710e+00  1.47101907e+00  1.48470641e+00]]

 [[ 3.48288473e+00  4.75675904e-02  8.97808307e-02]
  [ 1.70569009e+00  1.77260990e+00  3.52281688e+00]
  [-1.08356961e-02  1.84483312e+00  1.71758982e+00]
  [ 1.67108507e+00  3.54946776e+00  1.85131431e+00]]

 [[ 3.56910526e+00  5.99952895e-03  3.52821034e+00]
  [ 1.83266686e+00  1.85504344e+00  3.59265978e+00]
  [ 3.51933020e+00  1.81900427e+00  1.70614393e+00]
  [ 1.83770992e+00  5.27131706e-02  1.75249850e+00]]

 [[ 5.74883048e+00  7.91573696e-02  8.31980011e-02]
  [ 1.43302687e+00  1.49128521e+00  1.53879773e+00]
  [ 2.91035182e+00  5.46785737e-02  2.85012330e+00]
  [ 4.39840032e+00  1.45361753e+00  1.43423769e+00]]

 [[ 4.60285008e-04  5.12367189e-02  9.08900430e-02]
  [ 1.77401329e+00  1.85725741e+00  3.53019651e+00]
  [ 3.55679575e+00  1.86861735e+00  1.76536705e+00]
  [ 1.73128434e+00  3.54250246e+00  1.82163573e+00]]

 [[ 5.79569455e+00  2.89409981e+00  6.50747171e-02]
  [ 1.40122641e+00  1.38759482e+00  1.48040628e+00]
  [ 2.96041901e+00  2.92373080e+00  2.86494199e+00]
  [ 4.37069091e+00  1.35628096e+00  1.40592736e+00]]

 [[ 3.50671463e+00  1.23313009e-01  3.47969537e+00]
  [ 1.86416396e+00  1.74336553e+00  2.91507769e-02]
  [ 3.52197558e+00  1.82648779e+00  1.76115704e+00]
  [ 1.80603566e+00  3.57107775e+00  1.78205558e+00]]

 [[ 5.79085147e+00  6.01135888e-02  1.51527660e-02]
  [ 1.47864707e+00  1.52581067e+00  1.54393698e+00]
  [ 2.81985227e+00  8.05176223e-02  8.22001478e-02]
  [ 4.39080428e+00  1.52250146e+00  1.42424456e+00]]] 


---- all struct_*.cif info dictionaries----

[{'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_8'}
 {'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_9'}
 {'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_10'}
 {'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_1'}
 {'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_4'}
 {'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_6'}
 {'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_7'}
 {'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_2'}
 {'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_3'}
 {'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_5'}] 
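Conceptually, the `all` meta-element forwards an attribute access or a method call to every structure and gathers the results, one per structure. The sketch below reproduces that idea in plain Python with `__getattr__`; the classes `AllProxy` and `Box` are hypothetical stand-ins, not Soprano's actual implementation (which, as the array-style output above suggests, returns NumPy arrays rather than lists).

```python
class AllProxy:
    """Hypothetical stand-in for a collection's 'all' meta-element."""

    def __init__(self, items):
        self._items = items

    def __getattr__(self, name):
        # Only called for names not defined on the proxy itself
        values = [getattr(item, name) for item in self._items]
        if all(callable(v) for v in values):
            # Methods: return a function that calls each one with the same arguments
            def call_all(*args, **kwargs):
                return [v(*args, **kwargs) for v in values]
            return call_all
        # Plain attributes: return the list of values
        return values


class Box:
    """Toy object standing in for an ASE Atoms object."""

    def __init__(self, side):
        self.side = side
        self.info = {'name': 'box_{0}'.format(side)}

    def get_volume(self):
        return self.side ** 3


boxes = AllProxy([Box(1), Box(2), Box(3)])
print(boxes.get_volume())  # [1, 8, 27]
print(boxes.info)          # [{'name': 'box_1'}, {'name': 'box_2'}, {'name': 'box_3'}]
```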
# Collections can also be sliced like NumPy arrays for convenience
aColl02 = aColl[0:2]
aColl25 = aColl[2:5]

# Then join them together
aColl05 = aColl02 + aColl25

print("---- Collection slice lengths ---- \n")
print("aColl02 = {0}\taColl25 = {1}\taColl05 = {2}\n\n".format(aColl02.length, aColl25.length, aColl05.length))
---- Collection slice lengths ---- 

aColl02 = 2	aColl25 = 3	aColl05 = 5
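Slicing and `+` on collections rely on the standard Python container protocols (`__getitem__` and `__add__`). A minimal sketch of how a collection-like class could support both, assuming it simply wraps a list of structures; `MiniCollection` is hypothetical and far simpler than `AtomsCollection`, and strings stand in for Atoms objects.

```python
class MiniCollection:
    """Hypothetical list-backed collection supporting slicing and joining."""

    def __init__(self, structures):
        self.structures = list(structures)

    @property
    def length(self):
        return len(self.structures)

    def __len__(self):
        return len(self.structures)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # In this sketch a slice yields a new collection
            return MiniCollection(self.structures[index])
        # ...while a single integer yields the stored item itself
        return self.structures[index]

    def __add__(self, other):
        # Joining concatenates the two structure lists into a new collection
        return MiniCollection(self.structures + other.structures)


coll = MiniCollection(['s{0}'.format(i) for i in range(10)])
coll02, coll25 = coll[0:2], coll[2:5]
coll05 = coll02 + coll25
print(coll02.length, coll25.length, coll05.length)  # 2 3 5
```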
# Collections can also store "arrays" of data, similarly to Atoms objects in ASE.
# Each array holds one element per structure, and can be used to sort the collection

arr = range(10, 0, -1) # Let's use this array to reverse the order of a collection

aColl.set_array('reversed_range', arr)

aCollSorted = aColl.sorted_byarray('reversed_range')

print("---- Getting an array from a collection ---- \n")
print("Unsorted: ", aColl.get_array('reversed_range'), "\n")
print("Sorted: ", aCollSorted.get_array('reversed_range'), "\n\n")

# Check: the first structure of aColl should now be the last one of aCollSorted
print("---- First vs. last elements ---- \n")
print(aColl.structures[0].get_positions(), "\n")
print(aCollSorted.structures[-1].get_positions())
---- Getting an array from a collection ---- 

Unsorted:  [10  9  8  7  6  5  4  3  2  1] 

Sorted:  [ 1  2  3  4  5  6  7  8  9 10] 


---- First vs. last elements ---- 

[[0.05895857 0.06582596 0.07966662]
 [1.72613191 1.81057858 0.0119536 ]
 [3.5064522  1.84354883 1.80932793]
 [1.81202532 0.05911503 1.69412552]] 

[[0.05895857 0.06582596 0.07966662]
 [1.72613191 1.81057858 0.0119536 ]
 [3.5064522  1.84354883 1.80932793]
 [1.81202532 0.05911503 1.69412552]]
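Sorting by an array amounts to computing the permutation of indices that orders the array and applying it to the structures. A dependency-free sketch, assuming ascending order as in the output above; `sorted_by` is a hypothetical helper rather than Soprano's implementation, and the names are placeholders for Atoms objects. On NumPy arrays, `np.argsort` computes the same kind of permutation.

```python
def sorted_by(structures, values):
    """Hypothetical helper: reorder structures and values by ascending value."""
    # Permutation of indices that sorts 'values'
    order = sorted(range(len(values)), key=lambda i: values[i])
    return [structures[i] for i in order], [values[i] for i in order]


names = ['struct_{0}'.format(i) for i in range(10)]
values = list(range(10, 0, -1))  # 10, 9, ..., 1 as in 'reversed_range'

sorted_names, sorted_values = sorted_by(names, values)
print(sorted_values)                 # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(sorted_names[-1] == names[0])  # True: the first structure ends up last
```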
# Collections are iterable as well

for a in aColl:
    print(a.get_volume())
46.67652880128671
46.54212437721978
49.04344576266337
45.899607683178004
46.53632814886902
49.10641277045326
46.00236376936779
48.923313970995935
46.09636367152582
48.47547426282922
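Each value printed above is the unit cell volume returned by ASE's `get_volume()`, i.e. the absolute value of the determinant of the 3×3 cell matrix, which equals the scalar triple product |a · (b × c)| of the three cell vectors. A dependency-free sketch; the cubic cell with a 3.6 Å edge is an illustrative assumption, not one of the tutorial structures.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(u, v))

def cell_volume(cell):
    """Volume of the parallelepiped spanned by the rows of a 3x3 cell matrix."""
    a, b, c = cell
    return abs(dot(a, cross(b, c)))

cubic = [[3.6, 0.0, 0.0], [0.0, 3.6, 0.0], [0.0, 0.0, 3.6]]
print(cell_volume(cubic))  # about 46.656, up to floating-point rounding
```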

Filtering and classifying

Collections can also be split in more advanced ways. Filtering creates a new collection containing only those Atoms objects that satisfy a given condition, while classifying splits a collection into several sub-collections, one per class, based on an arbitrary integer array that assigns a class to each Atoms object.

# Filter: only structures with volume >= 47

def isBig(a):
    return a.get_volume() >= 47

aCollBig = aColl.filter(isBig)

print('{0} structures have V >= 47'.format(len(aCollBig)))
4 structures have V >= 47
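Filtering keeps every structure for which a predicate returns `True`. Using the ten volumes printed above as plain numbers, the same selection can be reproduced with the standard library; this only mirrors the logic of `aColl.filter(isBig)` and does not use Soprano.

```python
# The ten volumes printed by the iteration example above
volumes = [46.67652880128671, 46.54212437721978, 49.04344576266337,
           45.899607683178004, 46.53632814886902, 49.10641277045326,
           46.00236376936779, 48.923313970995935, 46.09636367152582,
           48.47547426282922]

def is_big(v):
    return v >= 47

big = [v for v in volumes if is_big(v)]  # same result as list(filter(is_big, volumes))
print('{0} structures have V >= 47'.format(len(big)))  # 4 structures have V >= 47
```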
# Classify: split in volume classes

volumes = aColl.all.get_volume()
classes = [int(np.floor(v)) for v in volumes]

aCollHist = aColl.classify(classes)

for v, c in aCollHist.items():
    print('{0} structures with volume within {1} and {2}'.format(len(c), v, v+1))
2 structures with volume within 48 and 49
2 structures with volume within 49 and 50
1 structures with volume within 45 and 46
5 structures with volume within 46 and 47
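As the `.items()` loop above shows, `classify` returns a dictionary mapping each class label to the sub-collection of structures with that label. The grouping step itself can be sketched with `collections.defaultdict`, again working on the printed volumes rather than on Atoms objects; the helper `classify` below is hypothetical and only mirrors the idea.

```python
import math
from collections import defaultdict

# The ten volumes printed by the iteration example above
volumes = [46.67652880128671, 46.54212437721978, 49.04344576266337,
           45.899607683178004, 46.53632814886902, 49.10641277045326,
           46.00236376936779, 48.923313970995935, 46.09636367152582,
           48.47547426282922]

def classify(items, labels):
    """Hypothetical helper: group items into lists keyed by their integer class label."""
    groups = defaultdict(list)
    for item, label in zip(items, labels):
        groups[label].append(item)
    return dict(groups)

labels = [math.floor(v) for v in volumes]  # same labels as int(np.floor(v))
hist = classify(volumes, labels)
for v in sorted(hist):
    print('{0} structure(s) with volume between {1} and {2}'.format(len(hist[v]), v, v + 1))
```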