07. Interactive Catalog Visualization#
Interactive Solar System Catalog Visualization with Holoviews, Bokeh, and Datashader
Contact author(s): Sarah Greenstreet
Last verified to run: 2024-11-20
LSST Science Pipelines version: Weekly 2024_42
Container Size: large
Targeted learning level: intermediate
Description: Interactive Solar System catalog data visualizations with three open-source python libraries.
Skills: Create linked interactive plots for large datasets, and output interactive plots to interactive HTML files. Use Bokeh, HoloViews, and Datashader.
LSST Data Products: Solar System Object catalogs.
Packages: bokeh, holoviews, datashader
Credit: This tutorial is based on the DP02 NB06b tutorial notebook developed by Leanne Guy. It has been adapted and extended to visualizing data in the DP0.3 simulated Solar System catalogs by Sarah Greenstreet.
Get Support: Find DP0-related documentation and resources at dp0-3.lsst.io. Questions are welcome as new topics in the Support - Data Preview 0 Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.
1. Introduction¶
The Rubin Science Platform was designed to enable scientific analysis of the LSST data sets, which will be unprecedentedly large and complex. The software and techniques that are best suited for visualizing large data sets might be new to many astronomers. This notebook introduces learners with some knowledge of python to three open-source Python libraries that enable powerful interactive visualization of catalogs.
- HoloViews: Produce high-quality interactive visualizations easily by annotating plots and images rather than using direct calls to a plotting library.
- Bokeh: A powerful data visualization library that provides interactive tools including brushing and linking between multiple plots.
- Datashader: Accurately render very large datasets quickly and flexibly.
These packages are part of the Holoviz ecosystem of tools intended for visualization in a web browser and can be used to create quite sophisticated dashboard-like interactive displays and widgets. The goal of this tutorial is to provide an introduction and starting point from which to create more advanced, custom interactive visualizations of the DP0.3 simulated Solar System catalogs. Holoviz has a vibrant and active community where you can ask questions and discuss visualizations with a global community.
Notice: If the notebook or any interactive features seem to stall, first try going a few cells back and rerunning them in order (the order in which cells are run is imporant for this notebook's functionality). If that does not work, try restarting the kernel. If issues persist, try logging out and restarting the Notebook aspect using a "large" instance of the JupyterLab environment.
Warning: It is not recommended to "Restart Kernel and Run All Cells" in this notebook, or to execute multiple cells very quickly. Some of the examples require interaction (e.g., for the user to select points on a graph) in order to run correctly, and going too fast can cause some plots to not display properly.
1.1. Import packages¶
Import general scientific python packages (os
, numpy
, pandas
),
functions from the astronomy python package astropy
,
the Rubin function for accessing the TAP service (lsst.rsp.get_tap_service
),
and various functions from the bokeh
and holoviews
packages
that are used in this tutorial.
import os
import numpy as np
import pandas as pd
from lsst.rsp import get_tap_service
from math import pi
import bokeh
from bokeh.io import output_notebook, show, output_file, reset_output
from bokeh.models import ColumnDataSource, Range1d, HoverTool, CDSView
from bokeh.plotting import figure, gridplot
from bokeh.transform import factor_cmap, cumsum
from bokeh.palettes import Colorblind
import holoviews as hv
from holoviews import streams, dim, opts
from holoviews.operation.datashader import datashade, dynspread
from holoviews.plotting.util import process_cmap
import datashader as dsh
Show which versions of Bokeh, HoloViews, and Datashader we are working with. This is important when referring to online documentation as APIs can change between versions.
print("Bokeh version: " + bokeh.__version__)
print("HoloViews version: " + hv.__version__)
print("Datashader version: " + dsh.__version__)
Bokeh version: 3.5.2 HoloViews version: 1.19.1 Datashader version: 0.16.3
1.2. Define functions and parameters¶
Update the maximum number of display rows for Pandas tables.
pd.set_option('display.max_rows', 5)
Notice: The ordering of the next two cells is important for ensuring the plots in this notebook render properly.
Set the display of the output Bokeh plots to be inline, in the notebook.
Set the HoloViews plotting library to be bokeh. The HoloViews and Bokeh icons are displayed when the library is loaded successfully.
hv.extension('bokeh')
Notice: Sometimes the
bokeh.io.show
function can be finicky when output modes are switched (e.g., from inline to an HTML file and back again).
To avert a "Models must be owned by only a single document" error (see, e.g., https://github.com/bokeh/bokeh/issues/8579), define the following two functions and use them in Section 4.
def show_bokeh_inline(p):
try:
reset_output()
output_notebook()
show(p)
except Exception:
output_notebook()
show(p)
def show_bokeh_to_file(p, outputFile):
try:
reset_output()
output_file(outputFile)
show(p)
except Exception:
output_file(outputFile)
show(p)
Define a function to convert a given perihelion distance ($q$) and eccentricity ($e$) to an orbital semimajor axis ($a$). Their relationship is defined by $q = a * (1 - e)$.
def calc_semimajor_axis(q, e):
"""
Given a perihelion distance and orbital eccentricity,
calculate the semi-major axis of the orbit.
Parameters
----------
q: ndarray
Distance at perihelion, in au.
e: ndarray
Orbital eccentricity.
Returns
-------
a: ndarray
Semi-major axis of the orbit, in au.
q = a(1-e), so a = q/(1-e)
"""
return q / (1.0 - e)
Define a function to convert a given perihelion distance ($q$) and eccentricity ($e$) to an aphelion distance ($Q$). Their relationship is defined by $Q = q * (1 + e) / (1 - e)$.
def calc_aphelion(q, e):
"""
Given a perihelion distance and orbital eccentricity,
calculate the semi-major axis of the orbit.
Parameters
----------
q: ndarray
Distance at perihelion, in au.
e: ndarray
Orbital eccentricity.
Returns
-------
Q: ndarray
Distance at aphelion, in au.
Q = q*(1+e)/(1-e)
"""
return q * (1.0 + e) / (1.0 - e)
2. Use the TAP service to obtain table data¶
The basis for any data visualization is the underlying data. This tutorial works with tabular data that is retrieved from the DP0.3 simulated Solar System catalogs using the Rubin TAP service.
Get a Rubin TAP service instance.
service = get_tap_service("ssotap")
assert service is not None
Define the orbital parameter boundaries for querying the DP0.3 catalogs for all main-belt asteroids, as an additional example subset of Solar System objects to explore with these data visualization tools. Create a query for the DP0.3 MPCORB and SSObject catalogs for the orbital parameters above to obtain a sample of main-belt asteroids with semimajor axis 1.8 < $a$ < 3.7 au, eccentricity $e$ < 1.0, and perihelion $q$ > 1.3.
a_MBA_min = 1.8
a_MBA_max = 3.7
q_MBA_min = 1.3
e_MBA_max = 1.0
query = """
SELECT
mpc.ssObjectId, mpc.mpcDesignation, mpc.mpcNumber,
mpc.mpcH, mpc.e, mpc.q, mpc.incl, mpc.n,
sso.arc, sso.numObs, sso.g_H, sso.r_H, sso.i_H, sso.z_H
FROM
dp03_catalogs_10yr.MPCORB as mpc
INNER JOIN dp03_catalogs_10yr.SSObject as sso
ON mpc.ssObjectId = sso.ssObjectId
WHERE mpc.q/(1.0-mpc.e) > {}
AND mpc.q/(1.0-mpc.e) < {}
AND mpc.e < {}
AND mpc.q > {}
ORDER by sso.ssObjectId
""".format(a_MBA_min, a_MBA_max, e_MBA_max, q_MBA_min)
Run the query.
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
assert job.phase == 'COMPLETED'
Job phase is COMPLETED
Retrieve the query results as a pandas table.
uniqueMBAs = job.fetch_result().to_table().to_pandas()
uniqueMBAs
ssObjectId | mpcDesignation | mpcNumber | mpcH | e | q | incl | n | arc | numObs | g_H | r_H | i_H | z_H | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -9223370383071521539 | S101q3vm | 0 | 20.580000 | 0.313794 | 1.630436 | 23.84439 | NaN | 112.611702 | 45 | 21.215717 | 20.761335 | 20.668646 | 20.5917 |
1 | -9223366535898786459 | S1008gC7 | 0 | 20.670000 | 0.127774 | 1.929541 | 6.54474 | NaN | 2077.049805 | 22 | NaN | 20.813944 | 20.724117 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4249461 | 9223365960393921544 | S100Ulqw | 0 | 20.379999 | 0.124147 | 2.283488 | 13.68164 | NaN | 3341.052979 | 27 | NaN | 20.351301 | 20.545135 | NaN |
4249462 | 9223366833108713925 | S101CtVy | 0 | 20.299999 | 0.075326 | 2.198481 | 7.32284 | NaN | 2105.805664 | 57 | 20.889555 | 20.384977 | 20.312166 | NaN |
4249463 rows × 14 columns
Calculate the semimajor axis $a$ for all objects in uniqueMBAs
and add as a new column.
a_MBAs = calc_semimajor_axis(uniqueMBAs['q'], uniqueMBAs['e'])
uniqueMBAs['a'] = a_MBAs
Calculate the aphelion distance $Q$ for all objects in uniqueMBAs
and add as a new column.
Q_MBAs = calc_aphelion(uniqueMBAs['q'], uniqueMBAs['e'])
uniqueMBAs['Q'] = Q_MBAs
Define the conditions for assigning MBAs to their dynamical classifications (Note: the conditions used here are simplified to only include limits in semimajor axis and not eccentricity or inclination for the purposes of this tutorial).
MBApop_conditions = [
(uniqueMBAs['a'] >= 1.8) & (uniqueMBAs['a'] < 2.0),
(uniqueMBAs['a'] >= 2.0) & (uniqueMBAs['a'] < 2.5),
(uniqueMBAs['a'] >= 2.5) & (uniqueMBAs['a'] < 2.82),
(uniqueMBAs['a'] >= 2.82) & (uniqueMBAs['a'] < 3.25),
(uniqueMBAs['a'] >= 3.25) & (uniqueMBAs['a'] <= 3.7)
]
Define the names of the MBA dynamical classes as they correspond to the MBApop_conditions
above.
MBApop_types = ['Hungaria', 'Inner Belt',
'Middle Belt', 'Outer Belt',
'Cybele']
Determine the MBA population for each object in uniqueMBAs
and add as a new column.
uniqueMBAs['MBApop'] = np.select(MBApop_conditions, MBApop_types)
uniqueMBAs
ssObjectId | mpcDesignation | mpcNumber | mpcH | e | q | incl | n | arc | numObs | g_H | r_H | i_H | z_H | a | Q | MBApop | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -9223370383071521539 | S101q3vm | 0 | 20.580000 | 0.313794 | 1.630436 | 23.84439 | NaN | 112.611702 | 45 | 21.215717 | 20.761335 | 20.668646 | 20.5917 | 2.376014 | 3.121593 | Inner Belt |
1 | -9223366535898786459 | S1008gC7 | 0 | 20.670000 | 0.127774 | 1.929541 | 6.54474 | NaN | 2077.049805 | 22 | NaN | 20.813944 | 20.724117 | NaN | 2.212204 | 2.494867 | Inner Belt |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4249461 | 9223365960393921544 | S100Ulqw | 0 | 20.379999 | 0.124147 | 2.283488 | 13.68164 | NaN | 3341.052979 | 27 | NaN | 20.351301 | 20.545135 | NaN | 2.607158 | 2.930828 | Middle Belt |
4249462 | 9223366833108713925 | S101CtVy | 0 | 20.299999 | 0.075326 | 2.198481 | 7.32284 | NaN | 2105.805664 | 57 | 20.889555 | 20.384977 | 20.312166 | NaN | 2.377575 | 2.556669 | Inner Belt |
4249463 rows × 17 columns
uniqueMBAs['MBApop'].value_counts()
MBApop Middle Belt 1514199 Outer Belt 1348022 Inner Belt 1296318 Hungaria 66272 Cybele 24652 Name: count, dtype: int64
To visualize the fraction of MBAs in each population, a pie chart can be created.
MBApop_counts = uniqueMBAs['MBApop'].value_counts().to_dict()
data = pd.Series(MBApop_counts).reset_index(name='value').rename(
columns={'index': 'subpop'})
data['angle'] = data['value']/data['value'].sum() * 2*pi
data['color'] = Colorblind[len(MBApop_counts)]
MBApop = figure(height=350, title="MBA Populations",
toolbar_location=None, tools="hover",
tooltips="@subpop: @value", x_range=(-0.5, 1.0))
MBApop.wedge(x=0, y=1, radius=0.4,
start_angle=cumsum('angle', include_zero=True),
end_angle=cumsum('angle'),
line_color="white", fill_color='color',
legend_field='subpop', source=data)
MBApop.axis.axis_label = None
MBApop.axis.visible = False
MBApop.grid.grid_line_color = None
show_bokeh_inline(MBApop)
Figure 1: Pie chart showing the fraction of objects in each MBA population.
Compute three colors from the apparent magnitudes.
uniqueMBAs['gmi'] = uniqueMBAs['g_H'] - uniqueMBAs['i_H']
uniqueMBAs['rmi'] = uniqueMBAs['r_H'] - uniqueMBAs['i_H']
uniqueMBAs['gmr'] = uniqueMBAs['g_H'] - uniqueMBAs['r_H']
uniqueMBAs
ssObjectId | mpcDesignation | mpcNumber | mpcH | e | q | incl | n | arc | numObs | g_H | r_H | i_H | z_H | a | Q | MBApop | gmi | rmi | gmr | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -9223370383071521539 | S101q3vm | 0 | 20.580000 | 0.313794 | 1.630436 | 23.84439 | NaN | 112.611702 | 45 | 21.215717 | 20.761335 | 20.668646 | 20.5917 | 2.376014 | 3.121593 | Inner Belt | 0.547071 | 0.092690 | 0.454382 |
1 | -9223366535898786459 | S1008gC7 | 0 | 20.670000 | 0.127774 | 1.929541 | 6.54474 | NaN | 2077.049805 | 22 | NaN | 20.813944 | 20.724117 | NaN | 2.212204 | 2.494867 | Inner Belt | NaN | 0.089827 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4249461 | 9223365960393921544 | S100Ulqw | 0 | 20.379999 | 0.124147 | 2.283488 | 13.68164 | NaN | 3341.052979 | 27 | NaN | 20.351301 | 20.545135 | NaN | 2.607158 | 2.930828 | Middle Belt | NaN | -0.193834 | NaN |
4249462 | 9223366833108713925 | S101CtVy | 0 | 20.299999 | 0.075326 | 2.198481 | 7.32284 | NaN | 2105.805664 | 57 | 20.889555 | 20.384977 | 20.312166 | NaN | 2.377575 | 2.556669 | Inner Belt | 0.577389 | 0.072811 | 0.504578 |
4249463 rows × 20 columns
3. Holoviews¶
Holoviews supports easy analysis and visualization by annotating data rather than utilizing direct calls to plotting packages. This tutorial uses Bokeh as the plotting library backend for HoloViews. HoloViews supports several plotting libraries, and exercise 1 in Section 5 is to explore using HoloViews with other plotting packages.
Create a random subsample of 20,000 MBAs from uniqueMBAs
to use to demonstrate some basic HoloViews functionality. Print the length of this subset and confirm that it contains roughly 20K objects.
frac = 0.005
data20K_MBAs = uniqueMBAs.sample(frac=frac, axis='index')
print(len(data20K_MBAs))
assert len(data20K_MBAs) == round(frac * len(uniqueMBAs))
21247
3.1. Single plots¶
The basic core primitives of HoloViews are Elements (hv.Element
). Elements are simple wrappers for data which provide a semantically meaningful visual representation. An Element may be a set of Points, an Image, a Curve, a Histogram, etc. See the HoloViews Reference Gallery for all the various types of Elements that can be created with HoloViews.
The example in this section uses the HoloViews Scatter Element to quickly visualize the catalog data as a scatter plot.
Instead of subsetting a dataset to choose which columns to plot, HoloViews allows the user to specify the dimensionality directly. kdims
are the key dimensions or the independent variable(s) and vdims
are the value dimensions or the dependenent variable(s). The dimensions have to be specified as strings as below, but they are in fact rich objects. Dimension objects support a long descriptive label, which complements the short programmer-friendly name.
HoloViews maintains a strict separation between content and presentation. This separation is achieved by maintaining sets of keyword values as options
that specify how Elements
are to appear.
This example plots the semimajor axes and eccentricities with chosen x-axis limits, fontscale, plot height and width, and removes the toolbar.
Make a simple scatter plot of the data using the Scatter element.
aeplot = hv.Scatter(data20K_MBAs, kdims=['a'],
vdims=['e']).options(xlim=(1.8, 3.7), toolbar=None,
fontscale=1.2, height=350, width=350)
aeplot
Figure 2: A non-interactive plot of semimajor axis $a$ vs. eccentricity $e$ appears as a blue circle composed of individual, but mostly blended, blue dots.
The data20K
set contains several columns. If no columns are specified explicitly, the first 2 columns are taken for x and y respectively by the Scatter Element.
Now bin the data in semimajor axis $a$ using the robust Freedman Diaconis Estimator, plot the resulting distribution using the HoloViews Histogram Element, and add in some basic plot options. Read more about about customizing plots via options
. Note that options
can be shortened to opts
.
(H_bin, count) = np.histogram(data20K_MBAs['mpcH'], bins='fd')
H_distribution = hv.Histogram((H_bin, count)).opts(
title="H Magnitude Distribution", color='darkmagenta',
xlabel='H mag', fontscale=1.2,
height=400, width=400)
H_distribution
Figure 3: A histogram (bar chart in purple) of the number of objects in a given $H$ magnitude bin in the subset of twenty thousand objects. This plot is interactive and displays a tool bar at right for the user to, e.g., zoom in on the plot.
3.2. Layouts of unlinked plots¶
Create a layout of several plots. A Layout
is a type of Container
that can contain any HoloViews object. Other types of Containers
that exist include Overlay
, Gridspace
, Dynamicmap
, etc. See the HoloViews Reference Gallery for the full list of Layouts
that can be created with HoloViews. See Building Composite Objects for the full details about the ways Containers
can be composed.
Slice the data and set some more options, and then construct a layout using the +
operator.
aeplot = hv.Scatter(data20K_MBAs, kdims=['a'],
vdims=['e']).opts(
title="Semimajor Axis vs Eccentricity",
toolbar='above', tools=['hover'],
height=350, width=350, alpha=0.2,
size=2)
OrbHPlots = aeplot + H_distribution.options(height=350, width=350)
OrbHPlots