# Tutorial 1 - Basic concepts: using AtomsCollection objects



```
      _
    /|_|\   
   / / \ \  
  /_/   \_\  
  \ \   / /  
   \ \_/ /  
    \|_|/  

```
SOPRANO: a Python library for generation, manipulation and analysis of large batches of crystalline structures


*Developed within the CCP-NC project. Copyright STFC 2022*


In [10]:
# Basic imports
import os, sys
sys.path.insert(0, os.path.abspath('..')) # Add the Soprano path to sys.path
                                          # so we can import it without installing it

In [11]:
# Other useful imports

import glob

import numpy as np

import ase
from ase import io as ase_io

from soprano.collection import AtomsCollection

## 1 - LOADING STRUCTURES

Soprano can load multiple structures into a single AtomsCollection object.
Each structure is loaded individually as an ASE (Atomic Simulation Environment) Atoms object.

In [12]:
# List all CIF files matching struct*.cif in the tutorial data directory
cifs = glob.glob('tutorial_data/struct*.cif')

aColl = AtomsCollection(cifs, progress=True) # progress=True displays a loading bar

Loading collection...
Loading: [████████████████████] |
Loaded 10 structures


## 2 - HANDLING COLLECTIONS

Collections are a convenient way of manipulating multiple structures. They allow for many operations that act
collectively on all Atoms objects, or return values from them all at once.

In [13]:
# To access an individual structure, one can simply use indexing:
a0 = aColl.structures[0]
print('---- struct_0.cif positions ----\n')
print(a0.get_positions(), '\n\n')

# All properties and methods of Atoms objects are available on an entire collection too, by using
# the meta-element 'all'

print('---- all struct_*.cif positions----\n')
print(aColl.all.get_positions(), '\n\n')

print('---- all struct_*.cif info dictionaries----\n')
print(aColl.all.info, '\n\n')

---- struct_0.cif positions ----

[[3.50671463 0.12331301 3.47969537]
 [1.86416396 1.74336553 0.02915078]
 [3.52197558 1.82648779 1.76115704]
 [1.80603566 3.57107775 1.78205558]] 


---- all struct_*.cif positions----

[[[ 3.50671463e+00  1.23313009e-01  3.47969537e+00]
  [ 1.86416396e+00  1.74336553e+00  2.91507769e-02]
  [ 3.52197558e+00  1.82648779e+00  1.76115704e+00]
  [ 1.80603566e+00  3.57107775e+00  1.78205558e+00]]

 [[ 5.89585653e-02  6.58259617e-02  7.96666218e-02]
  [ 1.72613191e+00  1.81057858e+00  1.19536046e-02]
  [ 3.50645220e+00  1.84354883e+00  1.80932793e+00]
  [ 1.81202532e+00  5.91150302e-02  1.69412552e+00]]

 [[ 5.79085147e+00  6.01135888e-02  1.51527660e-02]
  [ 1.47864707e+00  1.52581067e+00  1.54393698e+00]
  [ 2.81985227e+00  8.05176223e-02  8.22001478e-02]
  [ 4.39080428e+00  1.52250146e+00  1.42424456e+00]]

 [[ 5.74883048e+00  7.91573696e-02  8.31980011e-02]
  [ 1.43302687e+00  1.49128521e+00  1.53879773e+00]
  [ 2.91035182e+00  5.46785737e-02  2.85012330e
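
The behaviour of the `all` meta-element can be pictured with a small plain-Python stand-in. This is only a sketch of the idea, not Soprano's actual implementation: the `AllProxy` and `Box` classes below are hypothetical names invented for illustration, and the proxy returns plain lists where Soprano returns arrays.

```python
class AllProxy:
    """Toy stand-in (hypothetical): forwards attribute access to every item."""

    def __init__(self, items):
        self._items = items

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself
        values = [getattr(item, name) for item in self._items]
        if all(callable(v) for v in values):
            # Methods: return a function that calls each one and collects results
            def call_all(*args, **kwargs):
                return [v(*args, **kwargs) for v in values]
            return call_all
        # Plain attributes: return one value per item
        return values


class Box:
    """Hypothetical object with a method and an info dictionary."""

    def __init__(self, a, b, c, label):
        self.dims = (a, b, c)
        self.info = {"label": label}

    def get_volume(self):
        a, b, c = self.dims
        return a * b * c


boxes = [Box(1, 2, 3, "first"), Box(2, 2, 2, "second")]
everything = AllProxy(boxes)

volumes = everything.get_volume()  # one volume per box
infos = everything.info            # one info dictionary per box
print(volumes)
print(infos)
```

In this sketch, a method call through the proxy returns one result per object, and a plain attribute returns one value per object, which mirrors how `aColl.all.get_positions()` and `aColl.all.info` are used in the cell above.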

In [14]:
# Collections can also be sliced like NumPy arrays for convenience
aColl02 = aColl[0:2]
aColl25 = aColl[2:5]

# Then join them together
aColl05 = aColl02+aColl25

print("---- Collection slice lengths ---- \n")
print("aColl02 = {0}\taColl25 = {1}\taColl05 = {2}\n\n".format(aColl02.length, aColl25.length, aColl05.length))

---- Collection slice lengths ---- 

aColl02 = 2	aColl25 = 3	aColl05 = 5




In [15]:
# Collections can also store "arrays" of data, similarly to Atoms objects in ASE.
# Each array element is tied to one structure, so an array can be used to sort the collection

arr = range(10, 0, -1) # Let's use this array to reverse the order of a collection

aColl.set_array('reversed_range', arr)

aCollSorted = aColl.sorted_byarray('reversed_range')

print("---- Getting an array from a collection ---- \n")
print("Unsorted: ", aColl.get_array('reversed_range'), "\n")
print("Sorted: ", aCollSorted.get_array('reversed_range'), "\n\n")

# Check: the first structure of the original collection is the last one of the sorted collection
print("---- First vs. last elements ---- \n")
print(aColl.structures[0].get_positions(), "\n")
print(aCollSorted.structures[-1].get_positions())

---- Getting an array from a collection ---- 

Unsorted:  [10  9  8  7  6  5  4  3  2  1] 

Sorted:  [ 1  2  3  4  5  6  7  8  9 10] 


---- First vs. last elements ---- 

[[3.50671463 0.12331301 3.47969537]
 [1.86416396 1.74336553 0.02915078]
 [3.52197558 1.82648779 1.76115704]
 [1.80603566 3.57107775 1.78205558]] 

[[3.50671463 0.12331301 3.47969537]
 [1.86416396 1.74336553 0.02915078]
 [3.52197558 1.82648779 1.76115704]
 [1.80603566 3.57107775 1.78205558]]
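
Sorting by an attached array amounts to an "argsort": find the order of indices that sorts the array, then apply that order to the structures. The following is a minimal plain-Python sketch of that idea, not Soprano's own code. The `labels` list is a hypothetical stand-in for the ten structures.

```python
# Hypothetical labels standing in for the ten structures of the collection
labels = ["s{0}".format(i) for i in range(10)]

# The same values as the 'reversed_range' array used in the cell above
keys = list(range(10, 0, -1))

# Argsort: the indices that would put the keys in ascending order
order = sorted(range(len(keys)), key=lambda i: keys[i])

# Apply the same ordering to both the keys and the items tied to them
sorted_keys = [keys[i] for i in order]
sorted_labels = [labels[i] for i in order]

print(sorted_keys)
print(labels[0] == sorted_labels[-1])
```

Because the keys run from 10 down to 1, sorting them in ascending order reverses the items, so the first item of the original list ends up last.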


In [16]:
# Collections are iterable as well

for a in aColl:
    print(a.get_volume())

46.09636367152582
46.67652880128671
48.47547426282922
49.10641277045326
45.899607683178004
46.53632814886902
46.00236376936779
48.923313970995935
49.04344576266337
46.54212437721978


## 3 - FILTERING AND CLASSIFYING

Collections can also be split in more advanced ways. Filtering creates a new collection containing only those Atoms objects that satisfy a given condition. Classifying splits a collection into several sub-collections, one per class, based on an arbitrary integer array that assigns a class to each Atoms object; the result maps each class value to its collection.
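
The two operations can be pictured with plain Python lists. This is only a sketch of the logic, not Soprano's API: it works on bare numbers instead of Atoms objects, using the volumes printed by the loop above rounded to three decimals. The names `is_big` and `groups` are illustrative.

```python
import math
from collections import defaultdict

# Volumes printed by the iteration cell above, rounded to three decimals
volumes = [46.096, 46.677, 48.475, 49.106, 45.900,
           46.536, 46.002, 48.923, 49.043, 46.542]


def is_big(v):
    """Condition used for filtering: volume of at least 47."""
    return v >= 47


# Filtering: keep only the items that satisfy the condition
big = [v for v in volumes if is_big(v)]

# Classifying: assign an integer class to each item, then group by class
classes = [int(math.floor(v)) for v in volumes]
groups = defaultdict(list)
for v, c in zip(volumes, classes):
    groups[c].append(v)

print(len(big))
for c in sorted(groups):
    print(c, len(groups[c]))
```

The filter keeps a single subset, while the classification partitions the whole list, so every item ends up in exactly one group.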

In [17]:
# Filter: only structures with volume >= 47

def isBig(a):
    return a.get_volume() >= 47

aCollBig = aColl.filter(isBig)

print('{0} structures have V >= 47'.format(len(aCollBig)))

4 structures have V >= 47


In [18]:
# Classify: split in volume classes

volumes = aColl.all.get_volume()
classes = [int(np.floor(v)) for v in volumes]

aCollHist = aColl.classify(classes)

for v, c in aCollHist.items():
    print('{0} structures with volume within {1} and {2}'.format(len(c), v, v+1))

2 structures with volume within 48 and 49
2 structures with volume within 49 and 50
1 structures with volume within 45 and 46
5 structures with volume within 46 and 47
