Using the InVEST Tools on the Palmetto Cluster

The Integrated Valuation of Ecosystem Services and Tradeoffs tool, or InVEST, is a free and open-source suite of models developed by the Natural Capital Project at Stanford University for estimating the value of various ecosystem functions. The models are written in Python and can be run by installing the Natural Capital package. There are standard installers for Windows and Mac OS, but the tool can also be run on Linux by setting up a Python environment and installing it as a Python package. This enables us to leverage Clemson's high-performance Palmetto Cluster to speed up the computations and to distribute multiple jobs across many different processing nodes.

This tutorial will show you how to get the InVEST tools up and running on your Palmetto account using a pre-built Anaconda environment, which will install the necessary Python libraries and dependencies. There are a few bugs in the libraries as packaged, and this tutorial will also help you fix them.

Prerequisites

- An account on the Palmetto Cluster. You can request one here.
- Basic familiarity with using a command line interface, including writing and modifying text files.
- Basic familiarity with the Palmetto Cluster (storage, resources, file transfer, submitting jobs, etc.).

Getting the Data

Download all files from the bottom of this page. These include sample data, an example model inputs (datastack) file, an example PBS job file, and patched Python files that fix bugs encountered during processing. The sample data are from the Natural Capital Project documentation (http://data.naturalcapitalproject.org/invest-data/, version 3.6.0).

Setting up Your InVEST Environment

Using your preferred SSH client, log in to Palmetto. For example, the MobaXterm Portable client can be downloaded here. Once downloaded, extract the file and run the program.
Select Session in the upper left-hand corner, then select SSH.
For the Remote host, type: login.palmetto.clemson.edu. Click OK.
When prompted, enter your Clemson username and password, then complete the two-factor (Duo) authentication.

When you are successfully logged in, start an interactive session.
qsub -I -l select=1:ncpus=1:mem=6gb,walltime=2:00:00

Next, load the Anaconda module.
module add anaconda3/5.1.0
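To confirm the module loaded, you can list your currently loaded modules:
module list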

Create the Anaconda environment. This command clones an environment published by a particular user (balytle) on Anaconda Cloud. The environment will be stored in your home directory under /home/username/.conda/envs.
conda env create balytle/invest

This will take several minutes as the required packages are downloaded and installed. If prompted, press Y to continue installation. Once finished, activate the environment.
source activate invest

Depending on your SSH client, you should now see (invest) prepended to your prompt, indicating that the environment is active.
***placeholder for before/after screenshots of the prompt***
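You can also confirm this from the command line by listing your conda environments; the active one is marked with an asterisk:
conda env list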

During testing, we found some bugs in the pygeoprocessing library and in the natcap.invest library. Download the geoprocessing.py and cli.py files, which have been altered to fix these bugs, then upload them to the indicated file paths:

- geoprocessing.py: /home/username/.conda/envs/invest/lib/python2.7/site-packages/pygeoprocessing/
- cli.py: /home/username/.conda/envs/invest/lib/python2.7/site-packages/natcap/invest/

For example, if using MobaXterm, use the file explorer on the left-hand side of the screen. Starting from your home directory (/home/username/), click through the appropriate folders to navigate to the target directory, then click the upload button and select the Python file.
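Alternatively, you can upload both files with scp from a terminal on your local machine, run from the directory containing the downloaded files (replace username as before):
scp geoprocessing.py username@login.palmetto.clemson.edu:/home/username/.conda/envs/invest/lib/python2.7/site-packages/pygeoprocessing/
scp cli.py username@login.palmetto.clemson.edu:/home/username/.conda/envs/invest/lib/python2.7/site-packages/natcap/invest/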

Once the files have uploaded, validate the installation by calling the invest command with the help flag.

invest --help

The standard InVEST help dialog is displayed.

Running a Model

As the help shows, the InVEST models can be called from the command line by invoking invest along with optional flags to specify processing conditions. It can also be given the model inputs as a JSON file, referred to as a datastack, along with the workspace directory where any intermediate files and the final output will be written. Finally, it requires the name of the model to be run.
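Putting those pieces together, a typical invocation has the general form below, where the angle-bracket names are placeholders rather than literal values:
invest -l -d <datastack.invest.json> --workspace <output_directory> <model_name>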

We can test-run a simple model from the command line during this interactive session. For longer-running processes, or to run many models at once, you can submit jobs to the queue. Let's take a look at both approaches using the Carbon Storage and Sequestration model, which calculates quickly and has fairly simple inputs and outputs.

The example datastack can be modified to match the requirements for any model. Let's take a look at the basic requirements for the Carbon Storage and Sequestration model:
- Current land use/land cover (required): Raster of land use/land cover (LULC) for each pixel, where each unique integer represents a different land use/land cover class.
- Carbon pools (required): A CSV (comma-separated value) table of LULC classes, containing data on carbon stored in each of the four fundamental pools for each LULC class.

The input files are already contained in the sample data as lulc_samp_redd.tif and carbon_pools_samp.csv. The basic run calculates carbon storage for our input land use/cover data; we will specify not to calculate valuation or sequestration, and not to evaluate a REDD (Reducing Emissions from Deforestation and Forest Degradation) scenario.

We will use the /scratch2 workspace to store our files. First, let's create new directories and upload the necessary files; then we will modify the example datastack file to point to this location.

Navigate to your user directory on /scratch2.
cd /scratch2/username

Make a directory to store the data and files and move into the new directory.
mkdir carbon
cd carbon

Upload carbon.zip and carbon_datastack.invest.json to the newly created directory. Extract carbon.zip.
unzip carbon.zip
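You can confirm the extraction and take a quick look at the carbon pools table; note that the archive extracts into a carbon/ subdirectory, which is why the paths in the datastack below contain carbon/carbon/:
ls carbon
head carbon/carbon_pools_samp.csv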

Using a text editor, open and modify the datastack file. For example, using nano:
nano carbon_datastack.invest.json

The contents of the datastack file are below. Change the carbon_pools_path and lulc_cur_path, replacing "username" with your Clemson username.
{
    "args": {
        "calc_sequestration": false, 
        "carbon_pools_path": "/scratch2/username/carbon/carbon/carbon_pools_samp.csv", 
        "do_redd": false, 
        "do_valuation": false, 
        "lulc_cur_path": "/scratch2/username/carbon/carbon/lulc_samp_redd.tif", 
        "results_suffix": ""
    }, 
    "invest_version": "3.7.0", 
    "model_name": "natcap.invest.carbon"
}

Save the changes. In nano, press Ctrl + O to write out the file, then Enter to keep the same filename, followed by Ctrl + X to exit.
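Because a stray comma or quotation mark will break the run, you can optionally confirm the file is still valid JSON with Python's built-in json.tool module, which prints the parsed contents or an error:
python -m json.tool carbon_datastack.invest.json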

You can now run the model by issuing the command below. The -l flag tells it to run in headless mode without the GUI, -d tells it the datastack to use, --workspace is the location for the output files, and carbon tells InVEST which model we want to run. Change "username" to your actual username.
invest -l -d carbon_datastack.invest.json --workspace /scratch2/username/carbon/output carbon

After completion, inspect the output folder to see the result.
ls output

The file tot_c_cur.tif contains the resulting carbon storage data.
***placeholder for screenshot of the data***
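If the GDAL command-line utilities are available in your invest environment (GDAL is a dependency of the geoprocessing libraries, so they typically are), you can summarize the raster without leaving the terminal:
gdalinfo output/tot_c_cur.tif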

Submitting InVEST as a Job

Now that you see the workflow for running a model, we can write and submit a job script to perform the calculation. Let's navigate to your home directory to write the job file.
cd ~

In a text editor, open a new file called carbonJob.pbs.
nano carbonJob.pbs

Type the following PBS job file, which will submit a job named investC, requesting 1 CPU, 6gb of RAM, and 1:00:00 of walltime. It then performs the same commands we ran interactively; the -j oe directive joins any command-line output and errors into a single file. Replace "username" with your actual username.
#PBS -N investC
#PBS -l select=1:ncpus=1:mem=6gb,walltime=1:00:00
#PBS -j oe

# Move to the directory the job was submitted from
cd $PBS_O_WORKDIR

# Load Anaconda and activate the InVEST environment
module add anaconda3/5.1.0
source activate invest

# Run the carbon model headless, with the same datastack and workspace as before
invest -v -y -l -d /scratch2/username/carbon/carbon_datastack.invest.json -w /scratch2/username/carbon/output/ carbon

Write out the file by pressing Ctrl + O followed by Enter, then Ctrl + X to exit nano. Let's remove the output directory created during the interactive run so we can be sure our job runs successfully.
rm -R /scratch2/username/carbon/output

Exit the interactive session so we can submit the job.
exit

If needed, navigate to your home directory.
cd ~

Submit the job to the queue.
qsub carbonJob.pbs
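You can monitor the job's progress in the queue; it will disappear from the list once it finishes:
qstat -u username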

Validate that the model ran and produced output. You should once again see the tot_c_cur.tif file containing the carbon storage.
ls /scratch2/username/carbon/output
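Because the job script joins standard output and error (-j oe), PBS writes both streams to a single file in the directory you submitted from, named after the job and its ID (for example, investC.o12345, where 12345 is a hypothetical job ID). If something went wrong, inspect this log:
cat investC.o*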


Congratulations! You now have the InVEST tool up and running. The job script and datastack can be modified to run any model.



Attachments:
- carbon.zip (746k)
- cli.py (22k)