tutorial banner

Tutorial 1 - Weather Data: Accessing it, understanding it, visualizing it!#

This notebook explores a standard type of weather data, the typical meteorological year (TMY), and how to summarize it with Python and Pandas.

Overview

Steps:#

  • Weather data in PV performance models

  • Looking at a sample weather data file

  • Where to get weather data from?

  • Weather data to API

PV Concepts:#

  • TMY

  • GHI, DNI, DHI

  • DryBulb, Wspd

  • Irradiance vs. Insolation

Python Concepts:#

  • Exploring a Pandas dataframe (df): len(), df.head(), df.keys()

  • pvlib input-output tools

  • Ploting a Pandas dataframe (df): df.plot()

  • Aggregating data in a dataframe (df): df.resample(freq).sum()

  • Pandas DateOffsets - shortcuts to set the frequency when resampling

  • Getting NREL irradiance data from the web based API using pvlib

1. Weather Data & PV#

Weather and irradiance data are used as input to PV performance models.

SRRL

These data are directly measured, derived from measured data, or simulated using a stochastic model.

If you want to learn more about irradiance data, you can visit the Solar Radiometry webpage

Typical Meteorological Year#

TMY datasets are intended to represent the weather for a typical year at a given location.

TMY datasets provide hourly solar irradiance, air temperature, wind speed, and other weather measurements for a hypothetical year that represents more or less a “median year” for solar resource.

TMY3 screenshot

TMY datasets are created by selecting individual months out of an extended period of weather measurememts (say, 20 years of data) to construct a single year’s worth of data. There are several methods for selecting which months to include, but the general idea is to calculate monthly summary statistics and take the month that lies in the middle of the distribution. For example, no two Januaries will be exactly the same, so summing the total solar irradiance for each January will give a normal distribution, and the month that falls closest to the median is chosen as the representative month. The same process is followed for February, March, and so on, and all twelve representative months are stitched together into a year-long dataset.

The oldest TMYs were calculated using data from the nearest weather station (airports and such). Today, it’s common to use TMYs calculated using simulated weather data from satellite imagery because of the improved spatial resolution.

To get a better feel for TMY data, we’ll first explore an example TMY dataset that is bundled with pvlib.

# if running on google colab, uncomment the next line and execute this cell to install the dependencies and prevent "ModuleNotFoundError" in later cells:
# !pip install -r https://raw.githubusercontent.com/PVSC-Python-Tutorials/PVPMC_2022/main/requirements.txt

First Step: Import Libraries#

In Python, some functions are builtin like print() but others must be imported before they can be used. For this notebook we’re going to import three packages:

  • pvlib - library for simulating performance of photovoltaic energy systems.

  • pandas - analysis tool for timeseries and tabular data

  • matplotlib - data visualization for Python

Some Python modules are part of the standard library, but are not imported with builtins. We’ll use the pathlib module which is useful for accessing files and folders.

import os  # for getting environment variables
import pathlib  # for finding the example dataset
import pvlib
import pandas as pd  # for data wrangling
import matplotlib.pyplot as plt  # for visualization

Query which version you are using of pvlib:

print(pvlib.__version__)
0.9.1

Reading a TMY dataset with pvlib#

First, we’ll read the TMY dataset with pvlib.iotools.read_tmy3() which returns a Pandas DataFrame of the timeseries weather data and a second output with a Python dictionary of the TMY metadata like longitude, latitude, elevation, etc.

We will use the Python pathlib to get the path to the 'data' directory which comes with the pvlib package. Then we can use the slash operator, / to make the full path to the TMY file.

help(pvlib.iotools.read_tmy3)
Help on function read_tmy3 in module pvlib.iotools.tmy:

read_tmy3(filename, coerce_year=None, recolumn=True)
    Read a TMY3 file in to a pandas dataframe.
    
    Note that values contained in the metadata dictionary are unchanged
    from the TMY3 file (i.e. units are retained). In the case of any
    discrepancies between this documentation and the TMY3 User's Manual
    [1]_, the TMY3 User's Manual takes precedence.
    
    The TMY3 files were updated in Jan. 2015. This function requires the
    use of the updated files.
    
    Parameters
    ----------
    filename : str
        A relative file path or absolute file path.
    
    coerce_year : None or int, default None
        If supplied, the year of the index will be set to `coerce_year`, except
        for the last index value which will be set to the *next* year so that
        the index increases monotonically.
    
    recolumn : bool, default True
        If ``True``, apply standard names to TMY3 columns. Typically this
        results in stripping the units from the column name.
    
    Returns
    -------
    Tuple of the form (data, metadata).
    
    data : DataFrame
        A pandas dataframe with the columns described in the table
        below. For more detailed descriptions of each component, please
        consult the TMY3 User's Manual ([1]), especially tables 1-1
        through 1-6.
    
    metadata : dict
        The site metadata available in the file.
    
    Notes
    -----
    The returned structures have the following fields.
    
    ===============   ======  ===================
    key               format  description
    ===============   ======  ===================
    altitude          Float   site elevation
    latitude          Float   site latitudeitude
    longitude         Float   site longitudeitude
    Name              String  site name
    State             String  state
    TZ                Float   UTC offset
    USAF              Int     USAF identifier
    ===============   ======  ===================
    
    =============================       ======================================================================================================================================================
    TMYData field                       description
    =============================       ======================================================================================================================================================
    TMYData.Index                       A pandas datetime index. NOTE, the index is currently timezone unaware, and times are set to local standard time (daylight savings is not included)
    TMYData.ETR                         Extraterrestrial horizontal radiation recv'd during 60 minutes prior to timestamp, Wh/m^2
    TMYData.ETRN                        Extraterrestrial normal radiation recv'd during 60 minutes prior to timestamp, Wh/m^2
    TMYData.GHI                         Direct and diffuse horizontal radiation recv'd during 60 minutes prior to timestamp, Wh/m^2
    TMYData.GHISource                   See [1]_, Table 1-4
    TMYData.GHIUncertainty              Uncertainty based on random and bias error estimates                        see [2]_
    TMYData.DNI                         Amount of direct normal radiation (modeled) recv'd during 60 mintues prior to timestamp, Wh/m^2
    TMYData.DNISource                   See [1]_, Table 1-4
    TMYData.DNIUncertainty              Uncertainty based on random and bias error estimates                        see [2]_
    TMYData.DHI                         Amount of diffuse horizontal radiation recv'd during 60 minutes prior to timestamp, Wh/m^2
    TMYData.DHISource                   See [1]_, Table 1-4
    TMYData.DHIUncertainty              Uncertainty based on random and bias error estimates                        see [2]_
    TMYData.GHillum                     Avg. total horizontal illuminance recv'd during the 60 minutes prior to timestamp, lx
    TMYData.GHillumSource               See [1]_, Table 1-4
    TMYData.GHillumUncertainty          Uncertainty based on random and bias error estimates                        see [2]_
    TMYData.DNillum                     Avg. direct normal illuminance recv'd during the 60 minutes prior to timestamp, lx
    TMYData.DNillumSource               See [1]_, Table 1-4
    TMYData.DNillumUncertainty          Uncertainty based on random and bias error estimates                        see [2]_
    TMYData.DHillum                     Avg. horizontal diffuse illuminance recv'd during the 60 minutes prior to timestamp, lx
    TMYData.DHillumSource               See [1]_, Table 1-4
    TMYData.DHillumUncertainty          Uncertainty based on random and bias error estimates                        see [2]_
    TMYData.Zenithlum                   Avg. luminance at the sky's zenith during the 60 minutes prior to timestamp, cd/m^2
    TMYData.ZenithlumSource             See [1]_, Table 1-4
    TMYData.ZenithlumUncertainty        Uncertainty based on random and bias error estimates                        see [1]_ section 2.10
    TMYData.TotCld                      Amount of sky dome covered by clouds or obscuring phenonema at time stamp, tenths of sky
    TMYData.TotCldSource                See [1]_, Table 1-5, 8760x1 cell array of strings
    TMYData.TotCldUncertainty           See [1]_, Table 1-6
    TMYData.OpqCld                      Amount of sky dome covered by clouds or obscuring phenonema that prevent observing the sky at time stamp, tenths of sky
    TMYData.OpqCldSource                See [1]_, Table 1-5, 8760x1 cell array of strings
    TMYData.OpqCldUncertainty           See [1]_, Table 1-6
    TMYData.DryBulb                     Dry bulb temperature at the time indicated, deg C
    TMYData.DryBulbSource               See [1]_, Table 1-5, 8760x1 cell array of strings
    TMYData.DryBulbUncertainty          See [1]_, Table 1-6
    TMYData.DewPoint                    Dew-point temperature at the time indicated, deg C
    TMYData.DewPointSource              See [1]_, Table 1-5, 8760x1 cell array of strings
    TMYData.DewPointUncertainty         See [1]_, Table 1-6
    TMYData.RHum                        Relatitudeive humidity at the time indicated, percent
    TMYData.RHumSource                  See [1]_, Table 1-5, 8760x1 cell array of strings
    TMYData.RHumUncertainty             See [1]_, Table 1-6
    TMYData.Pressure                    Station pressure at the time indicated, 1 mbar
    TMYData.PressureSource              See [1]_, Table 1-5, 8760x1 cell array of strings
    TMYData.PressureUncertainty         See [1]_, Table 1-6
    TMYData.Wdir                        Wind direction at time indicated, degrees from north (360 = north; 0 = undefined,calm)
    TMYData.WdirSource                  See [1]_, Table 1-5, 8760x1 cell array of strings
    TMYData.WdirUncertainty             See [1]_, Table 1-6
    TMYData.Wspd                        Wind speed at the time indicated, meter/second
    TMYData.WspdSource                  See [1]_, Table 1-5, 8760x1 cell array of strings
    TMYData.WspdUncertainty             See [1]_, Table 1-6
    TMYData.Hvis                        Distance to discernable remote objects at time indicated (7777=unlimited), meter
    TMYData.HvisSource                  See [1]_, Table 1-5, 8760x1 cell array of strings
    TMYData.HvisUncertainty             See [1]_, Table 1-6
    TMYData.CeilHgt                     Height of cloud base above local terrain (7777=unlimited), meter
    TMYData.CeilHgtSource               See [1]_, Table 1-5, 8760x1 cell array of strings
    TMYData.CeilHgtUncertainty          See [1]_, Table 1-6
    TMYData.Pwat                        Total precipitable water contained in a column of unit cross section from earth to top of atmosphere, cm
    TMYData.PwatSource                  See [1]_, Table 1-5, 8760x1 cell array of strings
    TMYData.PwatUncertainty             See [1]_, Table 1-6
    TMYData.AOD                         The broadband aerosol optical depth per unit of air mass due to extinction by aerosol component of atmosphere, unitless
    TMYData.AODSource                   See [1]_, Table 1-5, 8760x1 cell array of strings
    TMYData.AODUncertainty              See [1]_, Table 1-6
    TMYData.Alb                         The ratio of reflected solar irradiance to global horizontal irradiance, unitless
    TMYData.AlbSource                   See [1]_, Table 1-5, 8760x1 cell array of strings
    TMYData.AlbUncertainty              See [1]_, Table 1-6
    TMYData.Lprecipdepth                The amount of liquid precipitation observed at indicated time for the period indicated in the liquid precipitation quantity field, millimeter
    TMYData.Lprecipquantity             The period of accumulatitudeion for the liquid precipitation depth field, hour
    TMYData.LprecipSource               See [1]_, Table 1-5, 8760x1 cell array of strings
    TMYData.LprecipUncertainty          See [1]_, Table 1-6
    TMYData.PresWth                     Present weather code, see [2]_.
    TMYData.PresWthSource               Present weather code source, see [2]_.
    TMYData.PresWthUncertainty          Present weather code uncertainty, see [2]_.
    =============================       ======================================================================================================================================================
    
    .. warning:: TMY3 irradiance data corresponds to the *previous* hour, so
        the first index is 1AM, corresponding to the irradiance from midnight
        to 1AM, and the last index is midnight of the *next* year. For example,
        if the last index in the TMY3 file was 1988-12-31 24:00:00 this becomes
        1989-01-01 00:00:00 after calling :func:`~pvlib.iotools.read_tmy3`.
    
    .. warning:: When coercing the year, the last index in the dataframe will
        become midnight of the *next* year. For example, if the last index in
        the TMY3 was 1988-12-31 24:00:00, and year is coerced to 1990 then this
        becomes 1991-01-01 00:00:00.
    
    References
    ----------
    
    .. [1] Wilcox, S and Marion, W. "Users Manual for TMY3 Data Sets".
       NREL/TP-581-43156, Revised May 2008.
    
    .. [2] Wilcox, S. (2007). National Solar Radiation Database 1991 2005
       Update: Users Manual. 472 pp.; NREL Report No. TP-581-41364.
DATA_DIR = pathlib.Path(pvlib.__file__).parent / 'data'
df_tmy, meta_dict = pvlib.iotools.read_tmy3(DATA_DIR / '723170TYA.CSV', coerce_year=1990)
meta_dict  # display the dictionary of metadata
{'USAF': 723170,
 'Name': '"GREENSBORO PIEDMONT TRIAD INT"',
 'State': 'NC',
 'TZ': -5.0,
 'latitude': 36.1,
 'longitude': -79.95,
 'altitude': 273.0}

Let’s display the first 4 lines of the dataframe

df_tmy.head(4)
Date (MM/DD/YYYY) Time (HH:MM) ETR ETRN GHI GHISource GHIUncertainty DNI DNISource DNIUncertainty ... Alb AlbSource AlbUncertainty Lprecipdepth Lprecipquantity LprecipSource LprecipUncertainty PresWth PresWthSource PresWthUncertainty
1990-01-01 01:00:00-05:00 01/01/1988 01:00 0 0 0 1 0 0 1 0 ... 0.0 ? 0 0 1 D 9 0 C 8
1990-01-01 02:00:00-05:00 01/01/1988 02:00 0 0 0 1 0 0 1 0 ... 0.0 ? 0 0 1 D 9 0 C 8
1990-01-01 03:00:00-05:00 01/01/1988 03:00 0 0 0 1 0 0 1 0 ... 0.0 ? 0 0 1 D 9 0 C 8
1990-01-01 04:00:00-05:00 01/01/1988 04:00 0 0 0 1 0 0 1 0 ... 0.0 ? 0 0 1 D 9 0 C 8

4 rows × 71 columns

This dataset follows the standard format of handling timeseries data with pandas – one row per timestamp, one column per measurement type. Because TMY files represent one year of data (no leap years), that means they’ll have 8760 rows. The number of columns can vary depending on the source of the data.

print("Number of rows:", len(df_tmy))
print("Number of columns:", len(df_tmy.columns))
Number of rows: 8760
Number of columns: 71

You can access single rows by pointing to its number location (iloc) or by using the index name it has. In this case, that is a dateTime

df_tmy.iloc[0];
df_tmy.loc['1990-01-01 01:00:00-05:00'];

You can also print all the column names in the dataframe

df_tmy.keys()
Index(['Date (MM/DD/YYYY)', 'Time (HH:MM)', 'ETR', 'ETRN', 'GHI', 'GHISource',
       'GHIUncertainty', 'DNI', 'DNISource', 'DNIUncertainty', 'DHI',
       'DHISource', 'DHIUncertainty', 'GHillum', 'GHillumSource',
       'GHillumUncertainty', 'DNillum', 'DNillumSource', 'DNillumUncertainty',
       'DHillum', 'DHillumSource', 'DHillumUncertainty', 'Zenithlum',
       'ZenithlumSource', 'ZenithlumUncertainty', 'TotCld', 'TotCldSource',
       'TotCldUncertainty', 'OpqCld', 'OpqCldSource', 'OpqCldUncertainty',
       'DryBulb', 'DryBulbSource', 'DryBulbUncertainty', 'DewPoint',
       'DewPointSource', 'DewPointUncertainty', 'RHum', 'RHumSource',
       'RHumUncertainty', 'Pressure', 'PressureSource', 'PressureUncertainty',
       'Wdir', 'WdirSource', 'WdirUncertainty', 'Wspd', 'WspdSource',
       'WspdUncertainty', 'Hvis', 'HvisSource', 'HvisUncertainty', 'CeilHgt',
       'CeilHgtSource', 'CeilHgtUncertainty', 'Pwat', 'PwatSource',
       'PwatUncertainty', 'AOD', 'AODSource', 'AODUncertainty', 'Alb',
       'AlbSource', 'AlbUncertainty', 'Lprecipdepth', 'Lprecipquantity',
       'LprecipSource', 'LprecipUncertainty', 'PresWth', 'PresWthSource',
       'PresWthUncertainty'],
      dtype='object')

There are 71 columns, which is quite a lot! For now, let’s focus just on the ones that are most important for PV modeling – the irradiance, temperature, and wind speed columns, and extract them into a new DataFrame.

Irradiance#

Irradiance is an instantaneous measurement of solar power over some area. For practical purposes of measurement and interpretation, irradiance is expressed and separated into different components.

overview irradiance

The units of irradiance are watts per square meter.

Wind#

Wind speed is measured with an anemometer. The most common type is a the cup-type anemometer, shown on the right side of the picture below. The number of rotations per time interval is used to calculate the wind speed. The vane on the left is used to measure the direction of the wind. Wind direction is reported as the direction from which the wind is blowing.

Air temperature#

Also known as dry-bulb temperature, is the temperature of the ambient air when the measurement device is shielded from radiation and moisture. The most common method of air temperature measurement uses a resistive temperature device (RTD) or thermocouple within a radiation shield. The shield blocks sunlight from reaching the sensor (avoiding radiative heating), yet allows natural air flow around the sensor. More accurate temperature measurement devices utilize a shield which forces air across the sensor.

Air temperature is typically measured on the Celsius scale.

Air temperature plays a large role in PV system performance as PV modules and inverters are cooled convectively by the surrounding air.

Downselect columns#

There are a lot more weather data in that file that you can access. To investigate all the column headers, we used .keys() above. Always read the Instruction Manual for the weather files to get more details on how the data is aggregated, units, etc.

At this point we are interested in GHI, DHI, DNI, DryBulb and Wind Speed . For this NREL TMY3 dataset the units of irradiance are W/m², dry bulb temperature is in °C, and wind speed is m/s.

# GHI, DHI, DNI are irradiance measurements
# DryBulb is the "dry-bulb" (ambient) temperature
# Wspd is wind speed
df = df_tmy[['GHI', 'DHI', 'DNI', 'DryBulb', 'Wspd']]
# show the first 15 rows:
df.head(15)
GHI DHI DNI DryBulb Wspd
1990-01-01 01:00:00-05:00 0 0 0 10.0 6.2
1990-01-01 02:00:00-05:00 0 0 0 10.0 5.2
1990-01-01 03:00:00-05:00 0 0 0 10.0 5.7
1990-01-01 04:00:00-05:00 0 0 0 10.0 5.7
1990-01-01 05:00:00-05:00 0 0 0 10.0 5.2
1990-01-01 06:00:00-05:00 0 0 0 10.0 4.1
1990-01-01 07:00:00-05:00 0 0 0 10.0 4.1
1990-01-01 08:00:00-05:00 9 9 1 10.0 5.2
1990-01-01 09:00:00-05:00 46 46 3 10.0 5.2
1990-01-01 10:00:00-05:00 79 78 4 10.6 5.2
1990-01-01 11:00:00-05:00 199 198 3 11.7 6.2
1990-01-01 12:00:00-05:00 261 260 3 11.7 5.2
1990-01-01 13:00:00-05:00 155 155 0 11.7 5.2
1990-01-01 14:00:00-05:00 144 144 2 11.7 3.1
1990-01-01 15:00:00-05:00 131 131 1 11.1 4.1

2. Plotting Weather time series data with pandas and matplotlib#

Let’s make some plots to get a better idea of what TMY data gives us.

Irradiance#

First, the three irradiance fields:

first_week = df.head(24*7)  # Plotting 7 days, each one has 24 hours or entries
first_week[['GHI', 'DHI', 'DNI']].plot()
plt.ylabel('Irradiance [W/m$^2$]');
_images/Tutorial 1 - TMY Weather Data_32_0.png

Let’s control the parameters a bit more

birthday = df.loc['1990-11-06':'1990-11-06']
plt.plot(birthday['DNI'], color='r') 
plt.plot(birthday['DHI'], color='g', marker='.') 
plt.plot(birthday['GHI'], color='b', marker='s') 
plt.ylabel('Irradiance [W/m$^2$]');
_images/Tutorial 1 - TMY Weather Data_34_0.png

Exercise#

How does the Irradiance look like in YOUR birthday?

Hint: the next cell is ‘Markdown’, you need to switch it to ‘Code’ for it to run

birthday = df.loc[] # Type your birthday here using the 1990 year, i.e. ‘1990-11-06’:’1990-11-06’ plt.plot(birthday[‘DNI’], color=’g’) plt.plot(birthday[‘DHI’], color=’b’, marker=’.’) plt.plot(birthday[‘GHI’], color=’r’, marker=’s’) plt.ylabel(‘Irradiance [W/m\(^2\)]’);

GHI, DHI, and DNI are the three “basic” ways of measuring irradiance, although each of them is measured in units of power per area (watts per square meter):

  • GHI: Global Horizontal Irradiance; the total sunlight intensity falling on a horizontal plane

  • DHI: Diffuse Horizontal Irradiance; the subset of sunlight falling on a horizontal plane that isn’t coming directly from the sun (e.g., the light that makes the sky blue)

  • DNI: Direct Normal Irradiance; the subset of sunlight coming directly from the sun

Overview

Overview

Overview

Later tutorials will show how these three values are used in PV modeling. For now, let’s just get a qualitative understanding of the differences between them: looking at the above plot, there is a pattern where when DNI is high, DHI is low. The sun puts out a (roughly) constant amount of energy, which means photons either make it through the atmosphere without scattering and are counted as direct irradiance, or they tend to get scattered and become part of the diffuse irradiance, but not both. Looking at DNI makes it easy to pick out which hours are cloud and which are sunny – most days in January are rather overcast with low irradiance, but the sun does occasionally break through.

In addition to daily variation, there is also seasonal variation in irradiance. Let’s compare a winter week with a summer week, using pandas to select out a summertime subset. Notice the increased intensity in summer – GHI peaks around 900 W/m^2, whereas in winter the maximum was more like 500 W/m^2.

summer_week = df.loc['1990-06-01':'1990-06-08']
summer_week[['GHI', 'DHI', 'DNI']].plot()
plt.ylabel('Irradiance [W/m$^2$]');
_images/Tutorial 1 - TMY Weather Data_41_0.png

Temperature#

Next up is temperature:

first_week['DryBulb'].plot()
plt.ylabel('Ambient Temperature [°C]');
_images/Tutorial 1 - TMY Weather Data_43_0.png

Wind speed#

And finally, wind speed:

first_week['Wspd'].plot()
plt.ylabel('Wind Speed [m/s]');
_images/Tutorial 1 - TMY Weather Data_45_0.png

Aggregating hourly data to monthly summaries#

Pandas makes it easy to roll-up timeseries data into summary values. We can use the DataFrame.resample() function with DateOffsets like 'M' for months. For example, we can calculate total monthly GHI as a quick way to visualize the seasonality of solar resource:

# summing hourly irradiance (W/m^2) gives insolation (W h/m^2)
monthly_ghi = df['GHI'].resample('M').sum()
monthly_ghi.head(4)
1990-01-31 00:00:00-05:00     74848
1990-02-28 00:00:00-05:00     85751
1990-03-31 00:00:00-05:00    131766
1990-04-30 00:00:00-05:00    162302
Freq: M, Name: GHI, dtype: int64
monthly_ghi = monthly_ghi.tz_localize(None)  # don't need timezone for monthly data
monthly_ghi.plot.bar()
plt.ylabel('Monthly Global Horizontal Irradiance\n[W h/m$^2$]');
_images/Tutorial 1 - TMY Weather Data_48_0.png

We can also take monthly averages instead of monthly sums:

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # add a second y-axis
monthly_average_temp_wind = df[['DryBulb', 'Wspd']].resample('M').mean()
monthly_average_temp_wind['DryBulb'].plot(ax=ax1, c='tab:blue')
monthly_average_temp_wind['Wspd'].plot(ax=ax2, c='tab:orange')
ax1.set_ylabel(r'Ambient Temperature [$\degree$ C]')
ax2.set_ylabel(r'Wind Speed [m/s]')
ax1.legend(loc='lower left')
ax2.legend(loc='lower right');
_images/Tutorial 1 - TMY Weather Data_50_0.png

Exercise#

Plot the Average DNI per Day

try:
    daily_average_DNI = df[['']].resample('').mean()  # Add the column name, and resample by day. Month is 'M', day is..
    daily_average_DNI.plot()
except:
    print("You haven't finished this exercise correctly, try again!")
You haven't finished this exercise correctly, try again!

3. Where to get Free Solar Irradiance Data?#

There are many different sources of solar irradiance data. For your projects, these are some of the most common:

  • TMY3/TMY2 - like the data we loaded earlier on this example. This database of TMY3 .csv files and TMY2 files have been archived by NREL.

  • NSRDB - National Solar Radiation Database. You can access data through the website for many locations accross the world, or you can use their web API to download data programmatically. An “API” is an “application programming interface”, and a “web API” is a programming interface that allows you to write code to interact with web services like the NSRDB.

  • EPW - Energy Plus Weather data is available for many locations accross the world. It’s in its own format file (‘EPW’) so you can’t open it easily in a spreadsheet program like Excel, but you can use pvlib.iotools.read_epw() to get it into a dataframe and use it.

  • PVGIS - Free global weather data provided by the European Union and derived from many govermental agencies including the NSRDB. PVGIS also provides a web API. You can get PVGIS TMY data using pvlib.iotools.get_pvgis_tmy().

  • Perhaps another useful link: https://sam.nrel.gov/weather-data.html

Where else can you get historical irradiance data?#

There are several commercial providers of solar irradiance data. Data is available at different spatial and time resolutions. Each provider offers data under subscription that will provide access to irradiance (and other weather variables) via API to leverage in python.

NSRDB Example

NREL API Key#

At the NREL Developer Network, there are APIs to a lot of valuable solar resources like weather data from the NSRDB, operational data from PVDAQ, or indicative calculations using PVWatts. In order to use these resources from NREL, you need to register for a free API key. You can test out the APIs using the DEMO_KEY but it has limited bandwidth compared to the usage limit for registered users. NREL has some API usage instructions, but pvlib has a few builtin functions, like pvlib.iotools.get_psm3(), that wrap the NREL API, and call them for you to make it much easier to use. Skip ahead to the next section to learn more. But before you do…

Please pause now to visit https://developer.nrel.gov/signup/ and get an API key.

Application Programming Interface (API)#

What exactly is an API? Nowadays, the phrase is used interchangeably with a “web API” but in general an API is just a recipe for how to interface with a application programmatically, IE: in code. An API could be as simple as a function signature or its published documentation, EG: the API for the solarposition function is you give it an ISO8601 formatted date with a timezone, the latitude, longitude, and elevation as numbers, and it returns the zenith and azimuth as numbers.

A web API is the same, except the application is a web service, that you access at its URL using web methods. We won’t go into too much more detail here, but the most common web method is GET which is pretty self explanatory. Look over the NREL web usage instructions for some examples, but interacting with a web API can be as easy as entering a URL into a browser. Try the URL below to get the PVWatts energy output for a fixed tilt site in Broomfield, CO.

https://developer.nrel.gov/api/pvwatts/v6.json?api_key=DEMO_KEY&lat=40&lon=-105&system_capacity=4&azimuth=180&tilt=40&array_type=1&module_type=1&losses=10

In addition to just using your browser, you can also access web APIs programmatically. The most popular Python package to interact with web APIs is requests. There’s also free open source command-line tools like cURL and HTTPie, and a popular nagware/freemium GUI application called Postman.

If you have an NREL API key please enter it in the next cell.

NREL_API_KEY = None  # <-- please set your NREL API key here

# note you must use "quotes" around your key, for example:
# NREL_API_KEY = 'DEMO_KEY'  # single or double both work fine

# during the live tutorial, we've stored a dedicated key on our server
if NREL_API_KEY is None:
    try:
        NREL_API_KEY = os.environ['NREL_API_KEY']  # get dedicated key for tutorial from servier
    except KeyError:
        NREL_API_KEY = 'DEMO_KEY'  # OK for this demo, but better to get your own key

Fetching TMYs from the NSRDB#

The example TMY dataset used here is from an airport in North Carolina, but what if we wanted to model a PV system somewhere else? The NSRDB, one of many sources of weather data intended for PV modeling, is free and easy to access using pvlib. As an example, we’ll fetch a TMY dataset for Albuquerque, New Mexico at coordinates (35.0844, -106.6504). We use pvlib.iotools.get_psm3() which returns a Python dictionary of metadata and a Pandas dataframe of the timeseries weather data.

df_abq, metadata = pvlib.iotools.get_psm3(
    latitude=35.0844, longitude=-106.6504,
    api_key=NREL_API_KEY,
    email='mark.mikofski@dnv.com',  # <-- any email works here fine
    names='tmy',
    map_variables=True)
metadata
{'Source': 'NSRDB',
 'Location ID': '94139',
 'City': '-',
 'State': '-',
 'Country': '-',
 'Time Zone': -7,
 'Local Time Zone': -7,
 'Dew Point Units': 'c',
 'DHI Units': 'w/m2',
 'DNI Units': 'w/m2',
 'GHI Units': 'w/m2',
 'Temperature Units': 'c',
 'Pressure Units': 'mbar',
 'Wind Direction Units': 'Degrees',
 'Wind Speed Units': 'm/s',
 'Surface Albedo Units': 'N/A',
 'Version': '3.2.0',
 'latitude': 35.09,
 'longitude': -106.66,
 'altitude': 1496}

TMY datasets from the PSM3 service of the NSRDB are timestamped using the real year that the measurements came from. The pvlib.iotools.read_tmy3() function had a coerce_year argument to force everything to align to a single dummy year, but pvlib.iotools.get_psm3() doesn’t have that feature. For convenience let’s standardize the data to 1990 and then compare monthly GHI to the North Carolina location:

df_abq
Year Month Day Hour Minute dhi wind_speed Dew Point albedo pressure temp_air wind_direction dni ghi
2020-01-01 00:30:00-07:00 2020 1 1 0 30 0.0 1.6 -8.8 0.17 845.0 -2.6 324.0 0.0 0.0
2020-01-01 01:30:00-07:00 2020 1 1 1 30 0.0 1.5 -9.1 0.17 845.0 -3.0 327.0 0.0 0.0
2020-01-01 02:30:00-07:00 2020 1 1 2 30 0.0 1.4 -9.4 0.17 844.0 -3.4 333.0 0.0 0.0
2020-01-01 03:30:00-07:00 2020 1 1 3 30 0.0 1.4 -9.7 0.17 844.0 -3.8 336.0 0.0 0.0
2020-01-01 04:30:00-07:00 2020 1 1 4 30 0.0 1.3 -9.8 0.17 844.0 -4.1 334.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2012-12-31 19:30:00-07:00 2012 12 31 19 30 0.0 1.4 -7.9 0.17 846.0 -0.7 329.0 0.0 0.0
2012-12-31 20:30:00-07:00 2012 12 31 20 30 0.0 1.4 -8.0 0.17 846.0 -1.3 333.0 0.0 0.0
2012-12-31 21:30:00-07:00 2012 12 31 21 30 0.0 1.4 -8.2 0.17 846.0 -1.7 330.0 0.0 0.0
2012-12-31 22:30:00-07:00 2012 12 31 22 30 0.0 1.5 -8.4 0.17 846.0 -1.9 326.0 0.0 0.0
2012-12-31 23:30:00-07:00 2012 12 31 23 30 0.0 1.5 -8.6 0.17 846.0 -2.2 324.0 0.0 0.0

8760 rows × 14 columns

df_abq['Year'] = 1990
df_abq.index = pd.to_datetime(df_abq[['Year', 'Month', 'Day', 'Hour']])

ghi_comparison = pd.DataFrame({
    'NC': monthly_ghi,  # using the monthly values from earlier
    'NM': df_abq['ghi'].resample('M').sum(),
})

ghi_comparison.plot.bar()
plt.ylabel('Monthly GHI [W h/m^2]');
_images/Tutorial 1 - TMY Weather Data_62_0.png

It’s not too surprising to see that our New Mexico location is significantly sunnier than the one in North Carolina.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.