Tutorial 1 - Weather Data: Accessing it, understanding it, visualizing it!#
This notebook explores a standard type of weather data, the typical meteorological year (TMY), and how to summarize it with Python and Pandas.
Steps:#
Looking at a sample weather data file
Where to get weather data from?
Weather data to API
PV Concepts:#
TMY
GHI, DNI, DHI
DryBulb, Wspd
Irradiance vs. Insolation
Python Concepts:#
Exploring a Pandas dataframe (df): len(), df.head(), df.keys()
Plotting a Pandas dataframe (df): df.plot()
Aggregating data in a dataframe (df): df.resample(freq).sum()
Pandas DateOffsets - shortcuts to set the frequency when resampling
Getting NREL irradiance data from the web-based API using pvlib
Weather Data & PV#
Weather and irradiance data are used as input to PV performance models.
These data are directly measured, derived from measured data, or simulated using a stochastic model.
Typical Meteorological Year#
TMY datasets are intended to represent the weather for a typical year at a given location.
TMY datasets provide hourly solar irradiance, air temperature, wind speed, and other weather measurements for a hypothetical year that represents more or less a “median year” for solar resource.
TMY datasets are created by selecting individual months out of an extended period of weather measurements (say, 20 years of data) to construct a single year’s worth of data. There are several methods for selecting which months to include, but the general idea is to calculate monthly summary statistics and take the month that lies in the middle of the distribution. For example, no two Januaries will be exactly the same, so summing the total solar irradiance for each January gives a distribution of values, and the month that falls closest to the median is chosen as the representative month. The same process is followed for February, March, and so on, and all twelve representative months are stitched together into a year-long dataset.
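The month-selection idea can be sketched with synthetic data. This is a simplified illustration, not the actual TMY procedure (real methods weigh several weather variables); the 20-year record, the random monthly totals, and the helper name `pick_typical_years` are all assumptions made up for this example.

```python
import numpy as np
import pandas as pd

# Synthetic record: monthly GHI totals (kWh/m^2) for 20 hypothetical years.
rng = np.random.default_rng(seed=0)
monthly_totals = pd.DataFrame(
    rng.normal(loc=150, scale=15, size=(20, 12)),
    index=pd.Index(range(2001, 2021), name="year"),
    columns=pd.Index(range(1, 13), name="month"),
)

def pick_typical_years(totals):
    """For each calendar month, return the year whose total is closest to the median."""
    distance_from_median = (totals - totals.median(axis=0)).abs()
    return distance_from_median.idxmin(axis=0)

typical_years = pick_typical_years(monthly_totals)
print(typical_years)  # one source year per calendar month
```

Stitching the selected months together would then give one synthetic year whose months come from different source years.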
The oldest TMYs were calculated using data from the nearest weather station (airports and such). Today, it’s common to use TMYs calculated using simulated weather data from satellite imagery because of the improved spatial resolution.
To get a better feel for this kind of data, we’ll explore an example weather dataset downloaded from the NSRDB using pvlib.
Irradiance#
Irradiance is an instantaneous measurement of solar power over some area. For practical purposes of measurement and interpretation, irradiance is expressed and separated into different components.
The units of irradiance are watts per square meter.
GHI, DHI, and DNI are the three “basic” ways of measuring irradiance. Each of them is expressed in units of power per area (watts per square meter):
GHI: Global Horizontal Irradiance; the total sunlight intensity falling on a horizontal plane
DHI: Diffuse Horizontal Irradiance; the subset of sunlight falling on a horizontal plane that isn’t coming directly from the sun (e.g., the light that makes the sky blue)
DNI: Direct Normal Irradiance; the subset of sunlight coming directly from the sun
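The three components are linked by the closure relation GHI = DHI + DNI × cos(solar zenith angle), because only the part of the direct beam perpendicular to the ground counts toward a horizontal plane. A minimal sketch with illustrative numbers; the function name `ghi_from_components` is made up for this example.

```python
import math

def ghi_from_components(dni, dhi, zenith_deg):
    """Closure relation: GHI = DHI + DNI * cos(solar zenith)."""
    # When the sun is below the horizon the direct beam contributes nothing.
    cos_zenith = max(math.cos(math.radians(zenith_deg)), 0.0)
    return dhi + dni * cos_zenith

# Illustrative midday values in W/m^2, with the sun 41.4 degrees from vertical.
ghi = ghi_from_components(dni=904.0, dhi=114.0, zenith_deg=41.4)
print(round(ghi))
```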
Wind#
Wind speed is measured with an anemometer. The most common type is the cup-type anemometer, shown on the right side of the picture below. The number of rotations per time interval is used to calculate the wind speed. The vane on the left is used to measure the direction of the wind. Wind direction is reported as the direction from which the wind is blowing.
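The "direction from which the wind is blowing" convention matters when converting speed and direction into vector components. A small sketch of the usual meteorological conversion; the function name `wind_components` is hypothetical.

```python
import math

def wind_components(speed, direction_deg):
    """Convert speed and meteorological direction to (u, v).

    u is the eastward and v the northward component of the air motion.
    The direction is where the wind comes FROM, hence the minus signs.
    """
    direction = math.radians(direction_deg)
    u = -speed * math.sin(direction)
    v = -speed * math.cos(direction)
    return u, v

# A 6 m/s wind reported as 90 degrees comes from the east, so the air moves west.
u, v = wind_components(6.0, 90.0)
print(round(u, 2), round(v, 2))
```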
Air temperature#
Air temperature, also known as dry-bulb temperature, is the temperature of the ambient air when the measurement device is shielded from radiation and moisture. The most common method of air temperature measurement uses a resistive temperature device (RTD) or thermocouple within a radiation shield. The shield blocks sunlight from reaching the sensor (avoiding radiative heating), yet allows natural air flow around the sensor. More accurate temperature measurement devices use a shield that forces air across the sensor.
Air temperature is typically measured on the Celsius scale.
Air temperature plays a large role in PV system performance as PV modules and inverters are cooled convectively by the surrounding air.
Where to get Free Solar Irradiance Data?#
There are many different sources of solar irradiance data. For your projects, these are some of the most common:
NSRDB - National Solar Radiation Database. You can access data through the website for many locations across the world, or you can use their web API to download data programmatically. An “API” is an “application programming interface”, and a “web API” is a programming interface that allows you to write code to interact with web services like the NSRDB.
EPW - Energy Plus Weather data is available for many locations across the world. It comes in its own file format (‘EPW’), so you can’t open it easily in a spreadsheet program like Excel, but you can use pvlib.iotools.read_epw() to get it into a dataframe and use it.
PVGIS - Free global weather data provided by the European Union and derived from many governmental agencies including the NSRDB. PVGIS also provides a web API. You can get PVGIS TMY data using pvlib.iotools.get_pvgis_tmy().
Perhaps another useful link: https://sam.nrel.gov/weather-data.html
Where else can you get historical irradiance data?#
There are several commercial providers of solar irradiance data. Data is available at different spatial and time resolutions. Each provider offers subscription-based API access to irradiance (and other weather variables) that you can use from Python.
NREL API Key#
At the NREL Developer Network, there are APIs to a lot of valuable solar resources like weather data from the NSRDB, operational data from PVDAQ, or indicative calculations using PVWatts. In order to use these resources from NREL, you need to register for a free API key. You can test out the APIs using the DEMO_KEY, but it has a much lower usage limit than registered users get. NREL has some API usage instructions, but pvlib has a few built-in functions, like pvlib.iotools.get_psm3(), that wrap the NREL API and call it for you, which makes it much easier to use.
Application Programming Interface (API)#
What exactly is an API? Nowadays, the phrase is used interchangeably with “web API”, but in general an API is just a recipe for how to interface with an application programmatically, i.e., in code. An API could be as simple as a function signature or its published documentation. For example, the API for the solarposition function is: you give it an ISO 8601 formatted date with a timezone, plus the latitude, longitude, and elevation as numbers, and it returns the zenith and azimuth as numbers.
A web API is the same, except the application is a web service that you access at its URL using web methods. We won’t go into too much more detail here, but the most common web method is GET, which retrieves data from the service. Look over the NREL web usage instructions for some examples, but interacting with a web API can be as easy as entering a URL into a browser. Try the URL below to get the PVWatts energy output for a fixed tilt site in Broomfield, CO.
In addition to just using your browser, you can also access web APIs programmatically. The most popular Python package to interact with web APIs is requests. There are also free open source command-line tools like cURL and HTTPie, and a popular nagware/freemium GUI application called Postman.
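A GET request is essentially a URL with query parameters attached. The sketch below assembles one with the standard library without sending it. The endpoint path and parameter names are assumptions based on the PVWatts V8 API and should be checked against the NREL documentation; the site values are illustrative.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the NREL developer documentation.
BASE_URL = "https://developer.nrel.gov/api/pvwatts/v8.json"

params = {
    "api_key": "DEMO_KEY",   # replace with your own key
    "lat": 39.92,            # illustrative site near Broomfield, CO
    "lon": -105.09,
    "system_capacity": 4,    # kW
    "module_type": 0,
    "array_type": 0,         # assumed code for a fixed-tilt array
    "tilt": 40,
    "azimuth": 180,
    "losses": 14,
}

url = BASE_URL + "?" + urlencode(params)
print(url)

# To actually send the request (needs network access and the requests package):
# import requests
# response = requests.get(BASE_URL, params=params, timeout=30)
# response.raise_for_status()
# print(response.json())
```

Pasting the printed URL into a browser is equivalent to the commented-out `requests.get` call.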
Now to Coding:#
0. First Step for Google Colab:#
If running on Google Colab, uncomment the next cell (remove the # sign) and execute it to install the dependencies and prevent “ModuleNotFoundError” in later cells.
# !pip install -r https://raw.githubusercontent.com/PV-Tutorials/2024_PVSC/main/requirements.txt
1. Import Libraries#
In Python, some functions are built in, like print(), but others must be imported before they can be used. For this notebook we’re going to import three packages:
pvlib - library for simulating performance of photovoltaic energy systems.
pandas - analysis tool for timeseries and tabular data
matplotlib - data visualization for Python
Some Python modules are part of the standard library, but are not imported with builtins.
import os # for getting environment variables
import pathlib # for finding the example dataset
import pvlib
import pandas as pd # for data wrangling
import matplotlib.pyplot as plt # for visualization
Check which version of pvlib you are using:
print(pvlib.__version__)
0.11.2
2. Fetching TMYs from the NSRDB#
The NSRDB, one of many sources of weather data intended for PV modeling, is free and easy to access using pvlib. As an example, we’ll fetch one year of hourly data (2021) for San Juan, Puerto Rico at coordinates (18.4671, -66.1185). We use pvlib.iotools.get_psm3(), which returns a Pandas dataframe of the timeseries weather data and a Python dictionary of metadata.
Please pause now to visit https://developer.nrel.gov/signup/ and get an API key.
If you have an NREL API key please enter it in the next cell.
NREL_API_KEY = None # <-- please set your NREL API key here
# note you must use "quotes" around your key, for example:
# NREL_API_KEY = 'DEMO_KEY' # single or double both work fine
# during the live tutorial, we've stored a dedicated key on our server
if NREL_API_KEY is None:
    try:
        NREL_API_KEY = os.environ['NREL_API_KEY']  # get dedicated key for tutorial from server
    except KeyError:
        NREL_API_KEY = 'DEMO_KEY'  # OK for this demo, but better to get your own key
df_tmy, metadata = pvlib.iotools.get_psm3(
    latitude=18.4671, longitude=-66.1185,
    api_key=NREL_API_KEY,
    email='silvana.ovaitt@nrel.com',  # <-- any email works fine here
    names='2021')
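The key lookup above can also be written more compactly with `os.environ.get`, which returns a default instead of raising `KeyError`. A minimal sketch; `resolve_api_key` is a hypothetical helper, not part of pvlib.

```python
import os

def resolve_api_key(explicit_key=None, env_var="NREL_API_KEY", fallback="DEMO_KEY"):
    """Return the explicit key, else the environment variable, else the fallback."""
    if explicit_key:
        return explicit_key
    return os.environ.get(env_var, fallback)

api_key = resolve_api_key()
# Avoid printing the key itself; only report which kind is in use.
print("Using the demo key" if api_key == "DEMO_KEY" else "Using a personal key")
```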
metadata
{'Source': 'NSRDB',
'Location ID': '1493238',
'City': '-',
'State': '-',
'Country': '-',
'Time Zone': -4,
'Local Time Zone': -4,
'Clearsky DHI Units': 'w/m2',
'Clearsky DNI Units': 'w/m2',
'Clearsky GHI Units': 'w/m2',
'Dew Point Units': 'c',
'DHI Units': 'w/m2',
'DNI Units': 'w/m2',
'GHI Units': 'w/m2',
'Solar Zenith Angle Units': 'Degree',
'Temperature Units': 'c',
'Pressure Units': 'mbar',
'Relative Humidity Units': '%',
'Precipitable Water Units': 'cm',
'Wind Direction Units': 'Degrees',
'Wind Speed Units': 'm/s',
'Cloud Type -15': 'N/A',
'Cloud Type 0': 'Clear',
'Cloud Type 1': 'Probably Clear',
'Cloud Type 2': 'Fog',
'Cloud Type 3': 'Water',
'Cloud Type 4': 'Super-Cooled Water',
'Cloud Type 5': 'Mixed',
'Cloud Type 6': 'Opaque Ice',
'Cloud Type 7': 'Cirrus',
'Cloud Type 8': 'Overlapping',
'Cloud Type 9': 'Overshooting',
'Cloud Type 10': 'Unknown',
'Cloud Type 11': 'Dust',
'Cloud Type 12': 'Smoke',
'Fill Flag 0': 'N/A',
'Fill Flag 1': 'Missing Image',
'Fill Flag 2': 'Low Irradiance',
'Fill Flag 3': 'Exceeds Clearsky',
'Fill Flag 4': 'Missing CLoud Properties',
'Fill Flag 5': 'Rayleigh Violation',
'Surface Albedo Units': 'N/A',
'Version': 'v3.2.2',
'latitude': 18.45,
'longitude': -66.1,
'altitude': 3}
What if we want a specific year? Can the NSRDB API provide it? Find the answer with the cell below:
help(pvlib.iotools.get_psm3)
Help on function get_psm3 in module pvlib.iotools.psm3:
get_psm3(latitude, longitude, api_key, email, names='tmy', interval=60, attributes=('air_temperature', 'dew_point', 'dhi', 'dni', 'ghi', 'surface_albedo', 'surface_pressure', 'wind_direction', 'wind_speed'), leap_day=True, full_name='pvlib python', affiliation='pvlib python', map_variables=True, url=None, timeout=30)
Retrieve NSRDB PSM3 timeseries weather data from the PSM3 API. The NSRDB
is described in [1]_ and the PSM3 API is described in [2]_, [3]_, and [4]_.
.. versionchanged:: 0.9.0
The function now returns a tuple where the first element is a dataframe
and the second element is a dictionary containing metadata. Previous
versions of this function had the return values switched.
.. versionchanged:: 0.10.0
The default endpoint for hourly single-year datasets is now v3.2.2.
The previous datasets can still be accessed (for now) by setting
the ``url`` parameter to the original API endpoint
(``"https://developer.nrel.gov/api/nsrdb/v2/solar/psm3-download.csv"``).
Parameters
----------
latitude : float or int
in decimal degrees, between -90 and 90, north is positive
longitude : float or int
in decimal degrees, between -180 and 180, east is positive
api_key : str
NREL Developer Network API key
email : str
NREL API uses this to automatically communicate messages back
to the user only if necessary
names : str, default 'tmy'
PSM3 API parameter specifing year (e.g. ``2020``) or TMY variant
to download (e.g. ``'tmy'`` or ``'tgy-2019'``). The allowed values
update periodically, so consult the NSRDB references below for the
current set of options.
interval : int, {60, 5, 15, 30}
interval size in minutes, must be 5, 15, 30 or 60. Must be 60 for
typical year requests (i.e., tmy/tgy/tdy).
attributes : list of str, optional
meteorological fields to fetch. If not specified, defaults to
``pvlib.iotools.psm3.ATTRIBUTES``. See references [2]_, [3]_, and [4]_
for lists of available fields. Alternatively, pvlib names may also be
used (e.g. 'ghi' rather than 'GHI'); see :const:`REQUEST_VARIABLE_MAP`.
To retrieve all available fields, set ``attributes=[]``.
leap_day : bool, default : True
include leap day in the results. Only used for single-year requests
(i.e., it is ignored for tmy/tgy/tdy requests).
full_name : str, default 'pvlib python'
optional
affiliation : str, default 'pvlib python'
optional
map_variables : bool, default True
When true, renames columns of the Dataframe to pvlib variable names
where applicable. See variable :const:`VARIABLE_MAP`.
url : str, optional
API endpoint URL. If not specified, the endpoint is determined from
the ``names`` and ``interval`` parameters.
timeout : int, default 30
time in seconds to wait for server response before timeout
Returns
-------
data : pandas.DataFrame
timeseries data from NREL PSM3
metadata : dict
metadata from NREL PSM3 about the record, see
:func:`pvlib.iotools.parse_psm3` for fields
Raises
------
requests.HTTPError
if the request response status is not ok, then the ``'errors'`` field
from the JSON response or any error message in the content will be
raised as an exception, for example if the `api_key` was rejected or if
the coordinates were not found in the NSRDB
Notes
-----
The required NREL developer key, `api_key`, is available for free by
registering at the `NREL Developer Network <https://developer.nrel.gov/>`_.
.. warning:: The "DEMO_KEY" `api_key` is severely rate limited and may
result in rejected requests.
.. warning:: PSM3 is limited to data found in the NSRDB, please consult the
references below for locations with available data. Additionally,
querying data with < 30-minute resolution uses a different API endpoint
with fewer available fields (see [4]_).
See Also
--------
pvlib.iotools.read_psm3, pvlib.iotools.parse_psm3
References
----------
.. [1] `NREL National Solar Radiation Database (NSRDB)
<https://nsrdb.nrel.gov/>`_
.. [2] `Physical Solar Model (PSM) v3.2.2
<https://developer.nrel.gov/docs/solar/nsrdb/psm3-2-2-download/>`_
.. [3] `Physical Solar Model (PSM) v3 TMY
<https://developer.nrel.gov/docs/solar/nsrdb/psm3-tmy-download/>`_
.. [4] `Physical Solar Model (PSM) v3 - Five Minute Temporal Resolution
<https://developer.nrel.gov/docs/solar/nsrdb/psm3-5min-download/>`_
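The help text answers the question: `names` accepts either a year or a typical-year variant, and typical-year requests must use a 60-minute interval. The hypothetical helper below encodes only those two documented rules as a pre-flight check. It does not contact the API and does not know which years are currently available.

```python
def describe_psm3_request(names, interval=60):
    """Classify a get_psm3 request using the rules stated in its docstring."""
    if interval not in (5, 15, 30, 60):
        raise ValueError("interval must be 5, 15, 30 or 60 minutes")
    name = str(names).lower()
    is_typical_year = name.startswith(("tmy", "tgy", "tdy"))
    if is_typical_year and interval != 60:
        raise ValueError("typical-year requests must use interval=60")
    kind = "typical year" if is_typical_year else "single year"
    return f"{kind} request: names={name!r}, interval={interval} min"

print(describe_psm3_request("tmy"))
print(describe_psm3_request(2021, interval=30))
```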
4. Explore your weatherfile#
Let’s display the first 4 lines of the dataframe
df_tmy.head(4)
| | Year | Month | Day | Hour | Minute | temp_air | temp_dew | dhi | dni | ghi | albedo | pressure | wind_direction | wind_speed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-01-01 00:30:00-04:00 | 2021 | 1 | 1 | 0 | 30 | 25.6 | 20.0 | 0.0 | 0.0 | 0.0 | 0.08 | 1030.0 | 75.0 | 6.1 |
| 2021-01-01 01:30:00-04:00 | 2021 | 1 | 1 | 1 | 30 | 25.5 | 19.8 | 0.0 | 0.0 | 0.0 | 0.08 | 1030.0 | 76.0 | 6.0 |
| 2021-01-01 02:30:00-04:00 | 2021 | 1 | 1 | 2 | 30 | 25.4 | 19.7 | 0.0 | 0.0 | 0.0 | 0.08 | 1029.0 | 77.0 | 5.8 |
| 2021-01-01 03:30:00-04:00 | 2021 | 1 | 1 | 3 | 30 | 25.4 | 19.8 | 0.0 | 0.0 | 0.0 | 0.08 | 1029.0 | 77.0 | 5.7 |
This dataset follows the standard format of handling timeseries data with pandas – one row per timestamp, one column per measurement type. Because TMY files represent one year of data (no leap years), that means they’ll have 8760 rows. The number of columns can vary depending on the source of the data.
print("Number of rows:", len(df_tmy))
print("Number of columns:", len(df_tmy.columns))
Number of rows: 8760
Number of columns: 14
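The 8760 figure is simply 365 days × 24 hours. A quick check with pandas, which also shows that a leap year (when the leap day is kept) has 8784 hourly rows; the timestamps are centered on the half hour to mimic the data above.

```python
import pandas as pd

one_hour = pd.Timedelta(hours=1)

# Hourly timestamps for a non-leap year.
index_2021 = pd.date_range("2021-01-01 00:30", "2021-12-31 23:30", freq=one_hour)
# 2020 is a leap year, so it contains one extra day of hourly rows.
index_2020 = pd.date_range("2020-01-01 00:30", "2020-12-31 23:30", freq=one_hour)

print(len(index_2021), len(index_2020))  # 8760 8784
```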
You can access a single row by its integer position (iloc) or by its index label (loc). In this case, the label is a datetime.
df_tmy.iloc[0];
df_tmy.loc['2021-01-01 01:30:00-04:00'];
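The difference between position-based and label-based access is easier to see on a tiny stand-in dataframe. This sketch builds three rows with the same kind of timezone-aware index (a fixed UTC-4 offset is assumed); `demo` is not the tutorial dataset.

```python
import datetime
import pandas as pd

# Three hourly rows with a fixed UTC-4 offset, mimicking the df_tmy index.
utc_minus_4 = datetime.timezone(datetime.timedelta(hours=-4))
index = pd.date_range("2021-01-01 07:30", periods=3,
                      freq=pd.Timedelta(hours=1), tz=utc_minus_4)
demo = pd.DataFrame({"ghi": [70.0, 262.0, 437.0]}, index=index)

by_position = demo.iloc[0]                        # first row, whatever its label
by_label = demo.loc["2021-01-01 08:30:00-04:00"]  # row with this exact timestamp
by_slice = demo.loc["2021-01-01 08:00":"2021-01-01 10:00"]  # all rows in a time range

print(by_position["ghi"], by_label["ghi"], len(by_slice))
```

Slicing with date strings is the same mechanism used later in the tutorial to pick out a single day or week.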
You can also print all the column names in the dataframe
df_tmy.keys()
Index(['Year', 'Month', 'Day', 'Hour', 'Minute', 'temp_air', 'temp_dew', 'dhi',
'dni', 'ghi', 'albedo', 'pressure', 'wind_direction', 'wind_speed'],
dtype='object')
There are 14 columns. For now, let’s focus just on the ones that are most important for PV modeling – the irradiance, temperature, and wind speed columns, and extract them into a new DataFrame.
5. Downselect columns#
The file contains more weather data than we need here. To see all the column headers, we used .keys() above. Always read the documentation for your weather files to get more details on how the data is aggregated, units, etc.
At this point we are interested in GHI, DHI, DNI, air (dry-bulb) temperature, and wind speed. For this NSRDB dataset the units of irradiance are W/m², air temperature is in °C, and wind speed is in m/s.
# ghi, dhi, dni are irradiance measurements
# temp_air is the "dry-bulb" (ambient) temperature
# wind_speed is wind speed
df_tmy = df_tmy[['ghi', 'dhi', 'dni', 'temp_air', 'wind_speed']]
# show the first 15 rows:
df_tmy.head(15)
| | ghi | dhi | dni | temp_air | wind_speed |
|---|---|---|---|---|---|
| 2021-01-01 00:30:00-04:00 | 0.0 | 0.0 | 0.0 | 25.6 | 6.1 |
| 2021-01-01 01:30:00-04:00 | 0.0 | 0.0 | 0.0 | 25.5 | 6.0 |
| 2021-01-01 02:30:00-04:00 | 0.0 | 0.0 | 0.0 | 25.4 | 5.8 |
| 2021-01-01 03:30:00-04:00 | 0.0 | 0.0 | 0.0 | 25.4 | 5.7 |
| 2021-01-01 04:30:00-04:00 | 0.0 | 0.0 | 0.0 | 25.3 | 5.5 |
| 2021-01-01 05:30:00-04:00 | 0.0 | 0.0 | 0.0 | 25.4 | 5.4 |
| 2021-01-01 06:30:00-04:00 | 0.0 | 0.0 | 0.0 | 25.7 | 5.5 |
| 2021-01-01 07:30:00-04:00 | 70.0 | 46.0 | 215.0 | 26.3 | 5.8 |
| 2021-01-01 08:30:00-04:00 | 262.0 | 85.0 | 553.0 | 26.9 | 6.4 |
| 2021-01-01 09:30:00-04:00 | 437.0 | 120.0 | 634.0 | 27.2 | 6.7 |
| 2021-01-01 10:30:00-04:00 | 577.0 | 162.0 | 651.0 | 27.4 | 6.8 |
| 2021-01-01 11:30:00-04:00 | 755.0 | 112.0 | 890.0 | 27.6 | 6.7 |
| 2021-01-01 12:30:00-04:00 | 792.0 | 114.0 | 904.0 | 27.6 | 6.5 |
| 2021-01-01 13:30:00-04:00 | 638.0 | 208.0 | 598.0 | 27.6 | 6.3 |
| 2021-01-01 14:30:00-04:00 | 641.0 | 107.0 | 848.0 | 27.5 | 6.3 |
Plotting time series data with pandas and matplotlib#
Let’s make some plots to get a better idea of what TMY data gives us.
Irradiance#
First, the three irradiance fields:
first_week = df_tmy.head(24*7) # Plotting 7 days, each one has 24 hours or entries
first_week[['ghi', 'dhi', 'dni']].plot()
plt.ylabel('Irradiance [W/m$^2$]');

Let’s control the parameters a bit more
birthday = df_tmy.loc['2021-11-06':'2021-11-06']
plt.plot(birthday['dni'], color='r')
plt.plot(birthday['dhi'], color='g', marker='.')
plt.plot(birthday['ghi'], color='b', marker='s')
plt.ylabel('Irradiance [W/m$^2$]');

Exercise#
What does the irradiance typically look like this week?
Hint: the next cell is ‘Markdown’, you need to switch it to ‘Code’ for it to run
classweek = df_tmy.loc[]  # Type the conference week here, using the last code cell as an example, i.e. '2021-11-06':'2021-11-06'
plt.plot(classweek['dni'], color='g')
plt.plot(classweek['dhi'], color='b', marker='.')
plt.plot(classweek['ghi'], color='r', marker='s')
plt.ylabel('Irradiance [W/m$^2$]');
Later tutorials will show how these three values are used in PV modeling. For now, let’s just get a qualitative understanding of the differences between them: looking at the above plot, there is a pattern where when DNI is high, DHI is low. The sun puts out a (roughly) constant amount of energy, which means photons either make it through the atmosphere without scattering and are counted as direct irradiance, or they tend to get scattered and become part of the diffuse irradiance, but not both. Looking at DNI makes it easy to pick out which hours are cloudy and which are sunny – most days in January are rather overcast with low irradiance, but the sun does occasionally break through.
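The inverse relationship between DNI and DHI can be put into a number with the diffuse fraction, DHI divided by GHI. A minimal sketch using two hours copied from the 15-row table shown earlier; the helper name `diffuse_fraction` is made up for this illustration.

```python
# Two hours copied from the table of 2021-01-01 values (W/m^2).
hours = {
    "11:30": {"ghi": 755.0, "dhi": 112.0, "dni": 890.0},
    "13:30": {"ghi": 638.0, "dhi": 208.0, "dni": 598.0},
}

def diffuse_fraction(ghi, dhi):
    """Share of the horizontal irradiance that arrives as scattered light."""
    return dhi / ghi if ghi > 0 else float("nan")

for label, row in hours.items():
    fraction = diffuse_fraction(row["ghi"], row["dhi"])
    print(f"{label}: DNI = {row['dni']:.0f} W/m^2, diffuse fraction = {fraction:.2f}")
```

The hour with the higher DNI has the lower diffuse fraction, which is the pattern described above.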
Temperature#
Next up is temperature:
first_week['temp_air'].plot()
plt.ylabel('Ambient Temperature [°C]');

Wind speed#
And finally, wind speed:
first_week['wind_speed'].plot()
plt.ylabel('Wind Speed [m/s]');

3. Aggregating hourly data to monthly summaries#
Pandas makes it easy to roll up timeseries data into summary values. We can use the DataFrame.resample() function with DateOffsets like 'ME' for month end (older pandas versions spell this alias 'M', which is now deprecated). For example, we can calculate total monthly GHI as a quick way to visualize the seasonality of solar resource:
# summing hourly irradiance (W/m^2) gives insolation (W h/m^2)
monthly_ghi = df_tmy['ghi'].resample('ME').sum()
monthly_ghi.head(4)
2021-01-31 00:00:00-04:00 153522.0
2021-02-28 00:00:00-04:00 159212.0
2021-03-31 00:00:00-04:00 204065.0
2021-04-30 00:00:00-04:00 195086.0
Freq: ME, Name: ghi, dtype: float64
monthly_ghi = monthly_ghi.tz_localize(None) # don't need timezone for monthly data
monthly_ghi.plot.bar()
plt.ylabel('Monthly Global Horizontal Insolation\n[W h/m$^2$]');
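Summing the values directly gives W h/m² only because each row represents one hour. For other time steps, each value has to be multiplied by the step length first. The sketch below, on synthetic 30-minute data, shows one way to do that and to convert to kWh/m² (for scale, the January total above, 153522 W h/m², is about 153.5 kWh/m²). The helper name `insolation_kwh_per_m2` is hypothetical, and it assumes a regular time step.

```python
import pandas as pd

def insolation_kwh_per_m2(irradiance, rule="D"):
    """Integrate irradiance (W/m^2) over time and return kWh/m^2 per period."""
    # Infer the (assumed constant) time step in hours from the first two timestamps.
    step_hours = (irradiance.index[1] - irradiance.index[0]) / pd.Timedelta(hours=1)
    energy_wh = irradiance * step_hours             # W/m^2 * h = W h/m^2
    return energy_wh.resample(rule).sum() / 1000.0  # W h -> kW h

# Synthetic 30-minute data: a constant 500 W/m^2 for one day.
index = pd.date_range("2021-06-01", periods=48, freq=pd.Timedelta(minutes=30))
ghi_30min = pd.Series(500.0, index=index)

daily = insolation_kwh_per_m2(ghi_30min)
print(daily.iloc[0])  # 500 W/m^2 for 24 h is 12 kWh/m^2
```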

We can also take monthly averages instead of monthly sums:
fig, ax1 = plt.subplots()
ax2 = ax1.twinx() # add a second y-axis
monthly_average_temp_wind = df_tmy[['temp_air', 'wind_speed']].resample('ME').mean()
monthly_average_temp_wind['temp_air'].plot(ax=ax1, c='tab:blue')
monthly_average_temp_wind['wind_speed'].plot(ax=ax2, c='tab:orange')
ax1.set_ylabel(r'Ambient Temperature [$\degree$ C]')
ax2.set_ylabel(r'Wind Speed [m/s]')
ax1.legend(loc='lower left')
ax2.legend(loc='lower right');

Exercise#
Plot the Average DNI per Day
try:
    daily_average_DNI = df_tmy[['']].resample('').mean()  # Add the column name, and resample by day. Month is 'ME', day is..
    daily_average_DNI.plot()
except Exception:
    print("You haven't finished this exercise correctly, try again!")
You haven't finished this exercise correctly, try again!
This work is licensed under a Creative Commons Attribution 4.0 International License.