Tutorial 0 - Python quickstart guide

      _
    /|_|\   
   / / \ \  
  /_/   \_\  
  \ \   / /  
   \ \_/ /  
    \|_|/  

SOPRANO: a Python library for generation, manipulation and analysis of large batches of crystalline structures

Developed within the CCP-NC project. Copyright STFC 2022

In this tutorial we look at how to get started quickly with Python, Jupyter notebooks and Soprano.

Setting up Python#

There are many ways to run Python, either on your own machine or in the cloud. In these tutorials we use Jupyter notebooks, which let you run Python code interactively and combine text, code and code output such as plots in one document. More on these notebooks later.

Installing Python#

If you are new to Python, we recommend using the Anaconda distribution, which is a free and open-source distribution of Python. It is widely used in the scientific community and comes with many pre-installed packages. You can download it from the Anaconda website; it is available for Windows, macOS and Linux.

There are different flavours of “conda”, but we recommend using the full Anaconda distribution if you’re new to Python. This will install Python, Jupyter notebooks and many other useful packages. If you have limited disk space, you can use the Miniconda distribution, which is a smaller version of Anaconda. Another alternative is Mamba, which can be a faster and more efficient package manager than conda.

For Windows users, the Anaconda installer offers an option to add Python to your PATH, but this option is not selected by default. The simplest route is to run Python from the Anaconda Prompt that is installed alongside it. You can also use the Anaconda Navigator, which is a graphical user interface for managing packages and environments.

Alternatively, WSL (Windows Subsystem for Linux) allows you to run a Linux distribution on Windows. You can then install Python using the package manager of that Linux distribution (see the instructions below). WSL is a good option if you want to use Python in a Linux-like environment (scientific packages are sometimes less well tested on Windows, for example). It also helps when it comes to ssh’ing into an HPC cluster such as Young.

For Linux users, you can also install Python via Anaconda, or you can use your distribution's package manager. For example, on Ubuntu you can install Python using the following command:

```
sudo apt-get install python3
```

For macOS users, you can also install Python via Anaconda, or you can use Homebrew, which is a package manager for macOS. If you already have Homebrew, you can install Python using the following command:

```
brew install python3
```

Using Python in the cloud#

Instead of installing Python on your own machine, you can also use Python in the cloud via e.g. Google Colab. This is a free service that allows you to run Python code in the cloud. You can also use Binder to run Jupyter notebooks in the cloud. These tutorials can be launched in either Binder or Google Colab by clicking on the respective badge after hovering over the rocket icon at the top of the page. Note that launching in Binder may take a few minutes but brings you into an environment with Soprano pre-installed, whereas Google Colab requires you to install the software yourself (you need to add !pip install soprano into a new cell and run that first).
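Before running the tutorials in a fresh cloud session, it can help to check whether Soprano is already importable. The following is a minimal sketch using only the standard library. The helper name is_installed is hypothetical (it is not part of Soprano), and the sketch assumes the package's import name is soprano.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the package installed by `pip install soprano` imports as "soprano".
if is_installed("soprano"):
    print("Soprano is available in this environment.")
else:
    # In Google Colab, run `!pip install soprano` in a cell of its own first.
    print("Soprano is not installed yet.")
```

Running this as the first cell of a notebook gives a quick indication of whether the install step is still needed.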

Environment management#

Once you have Python installed, it is a good idea to create a new environment for each project (or set of projects with similar requirements) you are working on. This helps to keep your packages separate and avoid conflicts between different versions of packages. The Anaconda Navigator allows you to create and manage environments using a graphical user interface.

You can also create a new environment using conda with the following command:

```
conda create --name myenv python=3.9
```

This will create a new environment called myenv with Python 3.9. You can then activate the environment with the following command:

```
conda activate myenv
```

You can then install packages into this environment using conda install or pip install.

Alternatively, if you installed Python using your package manager, you can use virtualenv to create environments. You can install virtualenv using the following command:

```
pip install virtualenv
```

You can then create a new environment using the following command:

```
virtualenv myenv
```

This will create the environment in your current directory; you can specify a different directory if you want.

You can activate the environment on Linux, macOS or WSL using the following command:

```
source myenv/bin/activate
```

where myenv is the name of your environment and also the name of the directory where the environment is stored. On native Windows, the activation script is located at myenv\Scripts\activate instead.

You can then install packages using pip install.
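To confirm that the environment you activated is the one Python is actually running in, you can inspect the interpreter from within Python. The following is a minimal sketch using only the standard library. The function name describe_environment is hypothetical, and the virtual-environment check assumes a reasonably recent virtualenv (or the built-in venv module), where sys.prefix and sys.base_prefix differ inside an environment.

```python
import os
import sys


def describe_environment() -> dict:
    """Collect basic facts about the Python interpreter currently running."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        # Inside a virtualenv/venv, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # conda sets this variable on `conda activate`; it is None otherwise.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If the reported executable does not live inside your environment's directory, the environment is probably not active in the current terminal or notebook kernel.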

Running Python#

Once installed, you can use Python either by:

running a Python script (python my_script.py) or by
using an interactive Python shell (python or, better, ipython if it’s installed) from your terminal.

You can also use Jupyter notebooks, which allow you to run Python code interactively in a browser, or an IDE (Integrated Development Environment) such as PyCharm. Another popular choice is VS Code, which has very good Python and Jupyter notebook support via the Python extension. VS Code also works well with WSL (Windows Subsystem for Linux) if you are using Windows.
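As an illustration of the first option, here is a minimal sketch of what a file such as my_script.py might contain. The function names and the numbers are hypothetical and only show a common layout: reusable functions at the top and a guarded entry point at the bottom.

```python
"""my_script.py: a minimal script layout (illustrative only)."""


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def main():
    # Hypothetical data; replace with your own calculation.
    bond_lengths = [1.09, 1.10, 1.08]
    print(f"Mean bond length: {mean(bond_lengths):.2f}")


# This guard runs main() for `python my_script.py` but not when the file is imported.
if __name__ == "__main__":
    main()
```

Keeping the work inside functions means the same file can be run from the terminal or imported into a notebook or IPython session without side effects.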

Jupyter notebooks#

There is excellent documentation on installing and using Jupyter notebooks on the web (for example, the official Jupyter documentation).

Once you have a notebook running, here are a few tips for using notebooks for computational materials science:

Use the %matplotlib inline magic command to display matplotlib plots inline in the notebook. For interactive plots, you can use %matplotlib notebook in the classic Notebook interface; JupyterLab and Notebook 7 use %matplotlib widget instead, which requires the ipympl package.
Append ? to a name to get help on a function or module. For example, np.linspace? will show you the documentation for the linspace function in numpy (assuming numpy has been imported as np).
Prefix a line with ! to run it as a shell command. For example, !ls will list the files in the current directory. Any other shell command can be run in this way.
Put the %%time magic command at the top of a cell to report how long the cell takes to run.
Put the %%bash magic command at the top of a cell to run the whole cell as bash commands.
In markdown cells, you can use LaTeX to write mathematical equations. For example, typing $\int_0^\infty e^{-x} dx$ in a markdown cell renders the corresponding integral as typeset mathematics. This is very helpful for documenting your research in a notebook.
The Atomic Simulation Environment (ASE) is a useful package for working with atomic structures in Python. It can read and write many file formats and has many tools for manipulating atomic structures. Soprano is built on top of ASE, so learning a little about how ASE works can be very helpful. One useful feature of ASE is the view function, which opens atomic structures in the ASE GUI. For example, a cell containing:
```
from ase.io import read
from ase.visualize import view

# Read the structure from a CIF file into an ASE Atoms object
atoms = read('my_structure.cif')
# Open the structure in the ASE GUI
view(atoms)
```
will open a GUI window showing the structure in my_structure.cif. This is very useful for quickly checking that you have read in the correct structure. Note that the GUI will only work if you are running the notebook on your local machine, not in the cloud. For WSL users, you will need to either have WSLg installed or use an X server such as VcXsrv.
Whichever way you start a Jupyter notebook, you will have the option to pick which Python environment you want to use. This is useful if you have multiple environments set up on your machine with different versions of packages. For example, if you have a conda environment called myenv, you can start Jupyter notebook with this environment using the following command:
```
conda activate myenv
jupyter notebook
```
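This will start a Jupyter notebook server from within the myenv environment (Jupyter must be installed in that environment), so the default Python kernel uses the packages installed there. If the environment has also been registered as a kernel, for example with python -m ipykernel install --user --name myenv, you can select it from any notebook by going to Kernel -> Change kernel and choosing myenv from the list of available kernels.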
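The ? and %%time tips above rely on IPython syntax, which only works inside a notebook or an IPython shell. As a rough sketch, the standard library offers approximate equivalents that also work in ordinary scripts. The helper name time_call is hypothetical, and unlike %%time it reports only wall-clock time.

```python
import inspect
import math
import time


def time_call(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Roughly what `math.sqrt?` shows in a notebook: the function's docstring.
print(inspect.getdoc(math.sqrt))

# Roughly what `%%time` reports: how long a piece of work took.
total, seconds = time_call(sum, range(1_000_000))
print(f"sum = {total}, took {seconds:.4f} s")
```

Inside a notebook the magic commands are usually more convenient; the plain-Python versions are mainly useful when the same code later moves into a script.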