Tutorial 1 - Basic concepts: using AtomsCollection objects
      _
    /|_|\
   / / \ \
  /_/   \_\
  \ \   / /
   \ \_/ /
    \|_|/
SOPRANO: a Python library for generation, manipulation and analysis of large batches of crystalline structures
Developed within the CCP-NC project. Copyright STFC 2022
# Basic imports
import os, sys
sys.path.insert(0, os.path.abspath('..')) # Add the Soprano path to sys.path
# so we can load it without installing it
# Other useful imports
import glob
import numpy as np
import ase
from ase import io as ase_io
from soprano.collection import AtomsCollection
1 - LOADING STRUCTURES
Soprano can load multiple structures into a single AtomsCollection object. Each structure is loaded individually as an ASE (Atomic Simulation Environment) Atoms object.
# List all structure files in the tutorial data directory
cifs = glob.glob('tutorial_data/struct*.cif')
aColl = AtomsCollection(cifs, progress=True) # progress=True displays a loading bar
Loading collection...
Loading: [██ ] -
Loading: [████ ] |
Loading: [██████ ] -
Loading: [████████ ] |
Loading: [██████████ ] -
Loading: [████████████ ] |
Loading: [██████████████ ] -
Loading: [████████████████ ] |
Loading: [██████████████████ ] -
Loading: [████████████████████] |
Loaded 10 structures
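Because glob.glob returns paths in an arbitrary, platform-dependent order, the position of a structure in the collection need not follow the numbering in its file name. The sketch below shows one way to put the file list in a predictable order before loading. The helper natural_key and the file names are hypothetical, chosen only to match the struct*.cif pattern; nothing here is part of Soprano.

```python
import re

def natural_key(filename):
    """Split a name into text and integer chunks so 'struct_10' sorts after 'struct_2'.

    Hypothetical helper for illustration, not a Soprano function.
    """
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r"(\d+)", filename)]

# Illustrative file names, in the kind of arbitrary order glob may return
filenames = [
    "tutorial_data/struct_8.cif",
    "tutorial_data/struct_10.cif",
    "tutorial_data/struct_1.cif",
    "tutorial_data/struct_2.cif",
]

ordered = sorted(filenames, key=natural_key)
for name in ordered:
    print(name)
```

The sorted list could then be passed to AtomsCollection in place of the raw glob result. A plain sorted(filenames) would place struct_10 before struct_2, which is why the numeric chunks are compared as integers.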
2 - HANDLING COLLECTIONS
Collections are a convenient way of manipulating multiple structures. They allow for many operations that act collectively on all Atoms objects, or return values from them all at once.
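# To access an individual structure, index into the collection's list of structures.
# Note that glob returns files in arbitrary order, so index 0 is simply the first
# file that was loaded, whatever its name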
a0 = aColl.structures[0]
print('---- struct_0.cif positions ----\n')
print(a0.get_positions(), '\n\n')
# All properties and methods of Atoms objects are available on an entire collection too, by using
# the meta-element 'all'
print('---- all struct_*.cif positions----\n')
print(aColl.all.get_positions(), '\n\n')
print('---- all struct_*.cif info dictionaries----\n')
print(aColl.all.info, '\n\n')
---- struct_0.cif positions ----
[[0.05895857 0.06582596 0.07966662]
[1.72613191 1.81057858 0.0119536 ]
[3.5064522 1.84354883 1.80932793]
[1.81202532 0.05911503 1.69412552]]
---- all struct_*.cif positions----
[[[ 5.89585653e-02 6.58259617e-02 7.96666218e-02]
[ 1.72613191e+00 1.81057858e+00 1.19536046e-02]
[ 3.50645220e+00 1.84354883e+00 1.80932793e+00]
[ 1.81202532e+00 5.91150302e-02 1.69412552e+00]]
[[ 8.24872667e-02 3.55901214e+00 1.10421932e-02]
[ 1.72351233e+00 1.88405678e+00 5.28293163e-02]
[ 3.51426700e+00 1.81580154e+00 1.75271361e+00]
[ 1.73825398e+00 4.84128249e-02 1.79915577e+00]]
[[ 3.42796466e-02 6.91411917e-02 5.37884717e-03]
[ 1.39547491e+00 1.40001475e+00 1.48655720e+00]
[ 3.03357630e+00 2.94880358e+00 2.87166818e+00]
[ 4.34194710e+00 1.47101907e+00 1.48470641e+00]]
[[ 3.48288473e+00 4.75675904e-02 8.97808307e-02]
[ 1.70569009e+00 1.77260990e+00 3.52281688e+00]
[-1.08356961e-02 1.84483312e+00 1.71758982e+00]
[ 1.67108507e+00 3.54946776e+00 1.85131431e+00]]
[[ 3.56910526e+00 5.99952895e-03 3.52821034e+00]
[ 1.83266686e+00 1.85504344e+00 3.59265978e+00]
[ 3.51933020e+00 1.81900427e+00 1.70614393e+00]
[ 1.83770992e+00 5.27131706e-02 1.75249850e+00]]
[[ 5.74883048e+00 7.91573696e-02 8.31980011e-02]
[ 1.43302687e+00 1.49128521e+00 1.53879773e+00]
[ 2.91035182e+00 5.46785737e-02 2.85012330e+00]
[ 4.39840032e+00 1.45361753e+00 1.43423769e+00]]
[[ 4.60285008e-04 5.12367189e-02 9.08900430e-02]
[ 1.77401329e+00 1.85725741e+00 3.53019651e+00]
[ 3.55679575e+00 1.86861735e+00 1.76536705e+00]
[ 1.73128434e+00 3.54250246e+00 1.82163573e+00]]
[[ 5.79569455e+00 2.89409981e+00 6.50747171e-02]
[ 1.40122641e+00 1.38759482e+00 1.48040628e+00]
[ 2.96041901e+00 2.92373080e+00 2.86494199e+00]
[ 4.37069091e+00 1.35628096e+00 1.40592736e+00]]
[[ 3.50671463e+00 1.23313009e-01 3.47969537e+00]
[ 1.86416396e+00 1.74336553e+00 2.91507769e-02]
[ 3.52197558e+00 1.82648779e+00 1.76115704e+00]
[ 1.80603566e+00 3.57107775e+00 1.78205558e+00]]
[[ 5.79085147e+00 6.01135888e-02 1.51527660e-02]
[ 1.47864707e+00 1.52581067e+00 1.54393698e+00]
[ 2.81985227e+00 8.05176223e-02 8.22001478e-02]
[ 4.39080428e+00 1.52250146e+00 1.42424456e+00]]]
---- all struct_*.cif info dictionaries----
[{'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_8'}
{'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_9'}
{'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_10'}
{'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_1'}
{'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_4'}
{'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_6'}
{'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_7'}
{'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_2'}
{'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_3'}
{'spacegroup': Spacegroup(1, setting=1), 'unit_cell': 'conventional', 'occupancy': {'0': {'Fe': 1.0}, '1': {'Fe': 1.0}, '2': {'Fe': 1.0}, '3': {'Fe': 1.0}}, 'name': 'struct_5'}]
# Collections can also be sliced like NumPy arrays for convenience
aColl02 = aColl[0:2]
aColl25 = aColl[2:5]
# Then join them together
aColl05 = aColl02+aColl25
print("---- Collection slice lengths ---- \n")
print("aColl02 = {0}\taColl25 = {1}\taColl05 = {2}\n\n".format(aColl02.length, aColl25.length, aColl05.length))
---- Collection slice lengths ----
aColl02 = 2 aColl25 = 3 aColl05 = 5
# Collections can also store "arrays" of data, similarly to Atoms objects in ASE
# Each element of such an array is tied to one structure, and the array can be used to sort the collection
arr = range(10, 0, -1) # Let's use this array to reverse the order of a collection
aColl.set_array('reversed_range', arr)
aCollSorted = aColl.sorted_byarray('reversed_range')
print("---- Getting an array from a collection ---- \n")
print("Unsorted: ", aColl.get_array('reversed_range'), "\n")
print("Sorted: ", aCollSorted.get_array('reversed_range'), "\n\n")
# And to make sure
print("---- First vs. last elements ---- \n")
print(aColl.structures[0].get_positions(), "\n")
print(aCollSorted.structures[-1].get_positions())
---- Getting an array from a collection ----
Unsorted: [10 9 8 7 6 5 4 3 2 1]
Sorted: [ 1 2 3 4 5 6 7 8 9 10]
---- First vs. last elements ----
[[0.05895857 0.06582596 0.07966662]
[1.72613191 1.81057858 0.0119536 ]
[3.5064522 1.84354883 1.80932793]
[1.81202532 0.05911503 1.69412552]]
[[0.05895857 0.06582596 0.07966662]
[1.72613191 1.81057858 0.0119536 ]
[3.5064522 1.84354883 1.80932793]
[1.81202532 0.05911503 1.69412552]]
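The idea of sorting by a per-structure array can be illustrated without Soprano. The sketch below is a conceptual model only, not Soprano's implementation: each structure is stood in for by a placeholder label (hypothetical names), and both the labels and the array are reordered by the same permutation.

```python
# Placeholder labels standing in for the ten Atoms objects (hypothetical names)
labels = ["structure_{0}".format(i) for i in range(10)]

# The same array as in the cell above: one value per structure
keys = list(range(10, 0, -1))

# Indices that would sort the keys in ascending order
order = sorted(range(len(keys)), key=keys.__getitem__)

# Apply the same permutation to the labels and to the array itself
sorted_labels = [labels[i] for i in order]
sorted_keys = [keys[i] for i in order]

print(sorted_keys)
print(labels[0], "ends up last:", sorted_labels[-1] == labels[0])
```

Since the array runs from 10 down to 1, sorting by it reverses the order, which is why the first structure of the unsorted collection matches the last structure of the sorted one in the output above.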
# Collections are iterable as well
for a in aColl:
    print(a.get_volume())
46.67652880128671
46.54212437721978
49.04344576266337
45.899607683178004
46.53632814886902
49.10641277045326
46.00236376936779
48.923313970995935
46.09636367152582
48.47547426282922
Filtering and classifying
Collections can also be split in more advanced ways. Filtering creates a new collection containing only those Atoms objects that satisfy a given condition, while classifying splits a collection into several collections, one per class, based on an arbitrary integer array that assigns a class to each Atoms object.
# Filter: only structures with volume >= 47
def isBig(a):
    return a.get_volume() >= 47
aCollBig = aColl.filter(isBig)
print('{0} structures have V >= 47'.format(len(aCollBig)))
4 structures have V >= 47
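The count above can be checked by hand. The sketch below applies the same threshold to the volumes printed earlier by the loop over the collection, rounded here to four decimals. It uses plain Python numbers rather than Atoms objects, and is_big is a hypothetical stand-in for isBig.

```python
# Volumes printed by the loop over aColl, rounded to four decimals
volumes = [46.6765, 46.5421, 49.0434, 45.8996, 46.5363,
           49.1064, 46.0024, 48.9233, 46.0964, 48.4755]

def is_big(volume, threshold=47.0):
    """Same test as isBig, applied to a bare number (hypothetical helper)."""
    return volume >= threshold

# Positions in the collection of the structures that pass the filter
big_indices = [i for i, v in enumerate(volumes) if is_big(v)]

print(big_indices)
print(len(big_indices))
```

Four volumes pass the test, in agreement with the size of aCollBig.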
# Classify: split in volume classes
volumes = aColl.all.get_volume()
classes = [int(np.floor(v)) for v in volumes]
aCollHist = aColl.classify(classes)
for v, c in aCollHist.items():
    print('{0} structures with volume within {1} and {2}'.format(len(c), v, v+1))
2 structures with volume within 48 and 49
2 structures with volume within 49 and 50
1 structures with volume within 45 and 46
5 structures with volume within 46 and 47
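The grouping can be reproduced with plain Python as a sanity check on these counts. This is a conceptual sketch, not Soprano's implementation: it groups positions in the collection by class label, using the volumes printed earlier (rounded to four decimals) and math.floor in place of np.floor.

```python
import math
from collections import defaultdict

# Volumes printed by the loop over aColl, rounded to four decimals
volumes = [46.6765, 46.5421, 49.0434, 45.8996, 46.5363,
           49.1064, 46.0024, 48.9233, 46.0964, 48.4755]

# One integer class per structure: the integer part of its volume
classes = [math.floor(v) for v in volumes]

# Group the structure indices by class label
groups = defaultdict(list)
for index, cls in enumerate(classes):
    groups[cls].append(index)

for cls in sorted(groups):
    print("class {0}: {1} structures, indices {2}".format(
        cls, len(groups[cls]), groups[cls]))
```

The group sizes match the output above. The classes are printed in sorted order here, whereas the order of the keys in the output above is not sorted, so a loop over the classes should not rely on any particular ordering.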