Introduction

Question: Who is responsible for what CO2 emissions?

1.1.   Data selection

We’ll use EPA’s FLIGHT database, a list of large industrial/commercial facilities across the U.S. and their GHG emissions. Later, we’ll join this dataset with the EPA TRI database, a list of facilities and their toxic releases, and the EIA power-plant data to get at electric generation numbers.

The FLIGHT data looks like this:

Looking at just the column headers, we get:

| Columns | Data Type | Description |
|---|---|---|
| Facility Id | ID (number) | ID specific to the EPA FLIGHT database |
| FRS Id | ID (number) | Standard EPA ID, used in multiple databases |
| Facility Name | ID (string) | |
| City | geographic ID (discrete) | |
| State | geographic ID (discrete) | |
| Zip Code | geographic ID (discrete) | |
| Address | geographic ID (discrete) | |
| County | geographic ID (discrete) | |
| Latitude | geographic ID (continuous) | |
| Longitude | geographic ID (continuous) | |
| Industry Type (sectors) | Classifier | Facilities are classified as one of 30+ industry types, based primarily on product output. |
| reported CO2e emissions (metric tons) | measurement | Total CO2 equivalent emissions reported by each facility. This is the sum of multiple indicators, including the ones below. |
| CO2 emissions (non-biogenic) | measurement | Total direct CO2 emissions per facility, not from biological sources. |
| Methane (CH4) emissions | measurement | Total methane emissions per facility, in units of CO2e |
| Nitrous Oxide (N2O) emissions | measurement | Total N2O emissions per facility, in units of CO2e |
| Biogenic CO2 emissions (metric tons) | measurement | Total CO2 emissions from biological sources (cows, agriculture, etc.) |

Now, there are way too many rows here to work with using just Excel. Let’s load the data into Tableau.

What is Tableau

Open Tableau and connect to a text file. Navigate to the GHGRP data.
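If you’d like to peek at the column headers outside Tableau first, here is a minimal Python sketch using only the standard library. The inline sample is a made-up two-row extract: the header names come from the table above, but the facility names and values are hypothetical, and a real FLIGHT export will have many more rows and columns.

```python
import csv
import io

# Hypothetical two-row extract (values are invented for illustration only).
SAMPLE = """Facility Id,Facility Name,State,reported CO2e emissions (metric tons)
1000001,Example Plant A,OH,25000
1000002,Example Plant B,TX,1200000
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE)
headers = list(rows[0].keys())

print(headers)
print(len(rows), "rows")
```

To try this on a real export, you could pass the contents of the downloaded text file to `read_rows` instead of `SAMPLE`.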


1.2.  Dimensions / measures and interface

Basic Tableau Interface:


A) This pane contains all the headers of your data, split into Dimensions and Measures. What are Dimensions and Measures? Typically, dimensions are your main ID variables and are almost always discrete data. Measures are usually measurements, often continuous data that are dependent on dimensions.

B) This pane contains the Columns and Rows shelves. Tableau is almost all drag-and-drop. You can drag dimensions and / or measures to these shelves to build a graph.

C) This is the pane where a graph will show up. If empty, you can drag a dimension / measure directly to this area.

D) This pane contains the Filters shelf and Mark control. Marks are Tableau’s name for data-points. You can modify the look of data-points by adding dimensions to “color” or “size” or just clicking on these buttons. Filters are an option to crop out or refine the data. Drag a dimension or measure to the Filters shelf in order to filter the data being graphed.
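To make the dimension / measure split concrete, here is a rough Python sketch of the idea (an analogy, not how Tableau is implemented). The records are made up; `state` plays the role of a dimension and `co2e` the role of a measure. The helper names `sum_by` and `keep` are hypothetical.

```python
from collections import defaultdict

# Made-up records: "state" acts as a dimension, "co2e" as a measure.
records = [
    {"state": "OH", "co2e": 25000.0},
    {"state": "OH", "co2e": 75000.0},
    {"state": "TX", "co2e": 1200000.0},
]


def sum_by(records, dimension, measure):
    """Aggregate a measure (SUM) for each distinct value of a dimension."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[dimension]] += rec[measure]
    return dict(totals)


def keep(records, dimension, allowed):
    """Filter: keep only records whose dimension value is in `allowed`."""
    return [rec for rec in records if rec[dimension] in allowed]


totals = sum_by(records, "state", "co2e")
ohio_only = sum_by(keep(records, "state", {"OH"}), "state", "co2e")

print(totals)
print(ohio_only)
```

The dimension decides how the rows are split up, the measure is what gets aggregated within each split, and a filter just drops rows before any of that happens.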

1.3.  Create 1-D graph (aka, a histogram)

Question: What’s the distribution of facility CO2e emissions? Are there many large emitters vs small emitters? (We’ll need a histogram to answer this)

To start figuring this out, we likely need to do something with the “reported CO2e emissions” measure. Let’s drag that onto the Rows shelf and see what we get:


Because we haven’t differentiated by any dimensions, this just shows the sum of all facilities. Notice that, in the Rows shelf, the “reported CO2e emissions” has a SUM before it. We can left click on that and change it to average or std deviation, but let’s change it to COUNT. This is almost a histogram, as it counts the number of facilities, but it doesn’t split it out or generate bins based on facility emission output.
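For intuition, here is a small Python sketch of what these aggregations do to a single measure when no dimension splits it up. The emission values are made up, and the labels in the dictionary are just illustrative names, not Tableau syntax.

```python
import statistics

# Made-up CO2e values (metric tons) for four hypothetical facilities.
co2e = [25000.0, 75000.0, 1200000.0, 40000.0]

# With no dimension on the shelves, each aggregation collapses
# the whole column into a single number.
aggregations = {
    "SUM": sum(co2e),
    "AVG": statistics.mean(co2e),
    "STDEV": statistics.stdev(co2e),
    "COUNT": len(co2e),
}

for name, value in aggregations.items():
    print(name, value)
```

COUNT tells us how many facilities there are, but it is still one number for the whole dataset, which is why we need bins to get a histogram.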



There is an easy way to change this to a histogram: click on the “Show me” button in the top right of the screen.

The “Show me” panel lists all the basic graph types that Tableau can generate. Most are greyed out, because we don’t have enough Dimensions / Measures on the Columns / Rows shelf to generate anything interesting. It’s always worthwhile to click “Show me” if you’re not sure how to graph stuff.


Let’s click on Histogram, the only option not greyed out under “Show me”. Its icon is a bell curve.

There are a ton of small facilities and few large facilities, which seems to follow a power law. So we’ve answered our initial question. Yay!

Note that Tableau created a new dimension, “reported CO2e emissions (bin)”. This allowed Tableau to count the CO2e emissions reports within each bin.
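The binned dimension can be sketched in a few lines of Python. This assumes fixed-width bins with a bin size we pick by hand (100,000 metric tons here); how Tableau chooses its default bin size is not covered in this tutorial. The emission values and the function names `bin_start` and `histogram` are made up for illustration.

```python
from collections import Counter


def bin_start(value, bin_size):
    """Return the lower edge of the fixed-width bin that `value` falls into."""
    return int(value // bin_size) * bin_size


def histogram(values, bin_size):
    """Count how many values land in each bin, keyed by the bin's lower edge."""
    counts = Counter(bin_start(v, bin_size) for v in values)
    return dict(sorted(counts.items()))


# Made-up CO2e values (metric tons) for six hypothetical facilities.
co2e = [12000, 30000, 45000, 99000, 120000, 1250000]

hist = histogram(co2e, 100000)
print(hist)
```

Each key is a bin’s lower edge and each value is the number of facilities in that bin, which is what the bars of the histogram show.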





1.4.  Create a group from an existing dimension (select groups of states to form regions)

Question: Might the distribution be different for different regions in the U.S.?

Start a new worksheet through the menu bar at the top: Worksheet -> New Worksheet

Our data’s largest geographic Dimension is State; we do not have a region Dimension. So, we need to create one. The easiest way to do this is to drag State onto the middle of the graph area, labeled “Drop field here”.

This should cause a map to pop up, with a mark (circle) in the middle of each state. We can click and drag on the map to select groups of states.







Right click on the selected group of states and click the “group” option.






This creates a legend on the right side of the screen, displaying two groups: the one we just made and all other states. Let’s change the label of the new group to improve readability. Right-click on the group in the legend, then click “Edit Alias”.





Now, group the rest of the states into their respective regions: Northwest, Southwest, Midwest, Northeast, Southeast. Ignore Alaska, Hawaii, and other US territories.
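Conceptually, a group is just a lookup from each State value to a region label, with everything ungrouped falling into a catch-all. Here is a minimal Python sketch of that idea. The state-to-region assignments below are only an illustrative assumption covering a handful of states (the tutorial does not specify which states belong to which region), and `REGION_OF` and `region` are hypothetical names.

```python
# Illustrative assumption: a partial state-to-region lookup.
# Extend it with whatever regional split you drew on the map.
REGION_OF = {
    "WA": "Northwest", "OR": "Northwest",
    "CA": "Southwest", "AZ": "Southwest",
    "OH": "Midwest", "IL": "Midwest",
    "NY": "Northeast", "MA": "Northeast",
    "FL": "Southeast", "GA": "Southeast",
}


def region(state):
    """Map a state abbreviation to its region; ungrouped states become 'Other'."""
    return REGION_OF.get(state, "Other")


for state in ["OH", "FL", "AK"]:
    print(state, "->", region(state))
```

States we chose to ignore, such as Alaska and Hawaii, simply fall through to the “Other” group.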


Now we have a good Region dimension to use on other graphs. And, we can answer our question: what are the regional distributions of GHG emissions? Go back to our histogram, and add the new dimension to the Columns shelf.

1.5.  Modify the first histogram worksheet to generate a histogram for each region

Add the new regions dimension to the Columns shelf

Takeaway from graph: Northwest and “other” (AK, HI) have few facilities. Midwest has a ton of small facilities. South has many larger facilities.
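Adding the region dimension to the histogram amounts to counting facilities per (region, bin) pair instead of per bin. Here is a rough Python sketch. The facilities, the partial `REGION_OF` lookup, and the 100,000 metric ton bin size are all made-up assumptions, so the printed counts say nothing about the real data.

```python
from collections import Counter, defaultdict

# Illustrative assumption: a partial state-to-region lookup.
REGION_OF = {"OH": "Midwest", "IL": "Midwest", "FL": "Southeast", "WA": "Northwest"}

# Made-up (state, CO2e in metric tons) pairs for hypothetical facilities.
facilities = [
    ("OH", 12000), ("IL", 30000), ("OH", 45000),
    ("FL", 250000), ("FL", 1250000),
    ("WA", 60000), ("AK", 80000),
]


def histograms_by_region(facilities, bin_size):
    """Build one fixed-width histogram (bin lower edge -> count) per region."""
    per_region = defaultdict(Counter)
    for state, co2e in facilities:
        region = REGION_OF.get(state, "Other")
        lower_edge = int(co2e // bin_size) * bin_size
        per_region[region][lower_edge] += 1
    return {region: dict(counts) for region, counts in per_region.items()}


result = histograms_by_region(facilities, 100000)
for region, hist in result.items():
    print(region, hist)
```

Each region gets its own set of bins, which is what the side-by-side histograms in the worksheet show.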


Cool. So, we’ve seen some regional stuff, but there’s another question: What’s the average CO2e emissions per industry? On to the next section.
