07. Interactive Catalog Visualization#
Interactive Solar System Catalog Visualization with Holoviews, Bokeh, and Datashader
Contact author(s): Sarah Greenstreet
Last verified to run: 2024-11-20
LSST Science Pipelines version: Weekly 2024_42
Container Size: large
Targeted learning level: intermediate
Description: Interactive Solar System catalog data visualizations with three open-source python libraries.
Skills: Create linked interactive plots for large datasets, and output interactive plots to interactive HTML files. Use Bokeh, HoloViews, and Datashader.
LSST Data Products: Solar System Object catalogs.
Packages: bokeh, holoviews, datashader
Credit: This tutorial is based on the DP02 NB06b tutorial notebook developed by Leanne Guy. It has been adapted and extended to visualizing data in the DP0.3 simulated Solar System catalogs by Sarah Greenstreet.
Get Support: Find DP0-related documentation and resources at dp0-3.lsst.io. Questions are welcome as new topics in the Support - Data Preview 0 Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.
1. Introduction¶
The Rubin Science Platform was designed to enable scientific analysis of the LSST data sets, which will be unprecedentedly large and complex. The software and techniques that are best suited for visualizing large data sets might be new to many astronomers. This notebook introduces learners with some knowledge of python to three open-source Python libraries that enable powerful interactive visualization of catalogs.
- HoloViews: Produce high-quality interactive visualizations easily by annotating plots and images rather than using direct calls to a plotting library.
- Bokeh: A powerful data visualization library that provides interactive tools including brushing and linking between multiple plots.
- Datashader: Accurately render very large datasets quickly and flexibly.
These packages are part of the Holoviz ecosystem of tools intended for visualization in a web browser and can be used to create quite sophisticated dashboard-like interactive displays and widgets. The goal of this tutorial is to provide an introduction and starting point from which to create more advanced, custom interactive visualizations of the DP0.3 simulated Solar System catalogs. Holoviz has a vibrant and active community where you can ask questions and discuss visualizations with a global community.
Notice: If the notebook or any interactive features seem to stall, first try going a few cells back and rerunning them in order (the order in which cells are run is imporant for this notebook's functionality). If that does not work, try restarting the kernel. If issues persist, try logging out and restarting the Notebook aspect using a "large" instance of the JupyterLab environment.
Warning: It is not recommended to "Restart Kernel and Run All Cells" in this notebook, or to execute multiple cells very quickly. Some of the examples require interaction (e.g., for the user to select points on a graph) in order to run correctly, and going too fast can cause some plots to not display properly.
1.1. Import packages¶
Import general scientific python packages (os
, numpy
, pandas
),
functions from the astronomy python package astropy
,
the Rubin function for accessing the TAP service (lsst.rsp.get_tap_service
),
and various functions from the bokeh
and holoviews
packages
that are used in this tutorial.
import os
import numpy as np
import pandas as pd
from lsst.rsp import get_tap_service
from math import pi
import bokeh
from bokeh.io import output_notebook, show, output_file, reset_output
from bokeh.models import ColumnDataSource, Range1d, HoverTool, CDSView
from bokeh.plotting import figure, gridplot
from bokeh.transform import factor_cmap, cumsum
from bokeh.palettes import Colorblind
import holoviews as hv
from holoviews import streams, dim, opts
from holoviews.operation.datashader import datashade, dynspread
from holoviews.plotting.util import process_cmap
import datashader as dsh
Show which versions of Bokeh, HoloViews, and Datashader we are working with. This is important when referring to online documentation as APIs can change between versions.
print("Bokeh version: " + bokeh.__version__)
print("HoloViews version: " + hv.__version__)
print("Datashader version: " + dsh.__version__)
Bokeh version: 3.5.2 HoloViews version: 1.19.1 Datashader version: 0.16.3
1.2. Define functions and parameters¶
Update the maximum number of display rows for Pandas tables.
pd.set_option('display.max_rows', 5)
Notice: The ordering of the next two cells is important for ensuring the plots in this notebook render properly.
Set the display of the output Bokeh plots to be inline, in the notebook.
Set the HoloViews plotting library to be bokeh. The HoloViews and Bokeh icons are displayed when the library is loaded successfully.
hv.extension('bokeh')
Notice: Sometimes the
bokeh.io.show
function can be finicky when output modes are switched (e.g., from inline to an HTML file and back again).
To avert a "Models must be owned by only a single document" error (see, e.g., https://github.com/bokeh/bokeh/issues/8579), define the following two functions and use them in Section 4.
def show_bokeh_inline(p):
try:
reset_output()
output_notebook()
show(p)
except Exception:
output_notebook()
show(p)
def show_bokeh_to_file(p, outputFile):
try:
reset_output()
output_file(outputFile)
show(p)
except Exception:
output_file(outputFile)
show(p)
Define a function to convert a given perihelion distance ($q$) and eccentricity ($e$) to an orbital semimajor axis ($a$). Their relationship is defined by $q = a * (1 - e)$.
def calc_semimajor_axis(q, e):
"""
Given a perihelion distance and orbital eccentricity,
calculate the semi-major axis of the orbit.
Parameters
----------
q: ndarray
Distance at perihelion, in au.
e: ndarray
Orbital eccentricity.
Returns
-------
a: ndarray
Semi-major axis of the orbit, in au.
q = a(1-e), so a = q/(1-e)
"""
return q / (1.0 - e)
Define a function to convert a given perihelion distance ($q$) and eccentricity ($e$) to an aphelion distance ($Q$). Their relationship is defined by $Q = q * (1 + e) / (1 - e)$.
def calc_aphelion(q, e):
"""
Given a perihelion distance and orbital eccentricity,
calculate the semi-major axis of the orbit.
Parameters
----------
q: ndarray
Distance at perihelion, in au.
e: ndarray
Orbital eccentricity.
Returns
-------
Q: ndarray
Distance at aphelion, in au.
Q = q*(1+e)/(1-e)
"""
return q * (1.0 + e) / (1.0 - e)
2. Use the TAP service to obtain table data¶
The basis for any data visualization is the underlying data. This tutorial works with tabular data that is retrieved from the DP0.3 simulated Solar System catalogs using the Rubin TAP service.
Get a Rubin TAP service instance.
service = get_tap_service("ssotap")
assert service is not None
Define the orbital parameter boundaries for querying the DP0.3 catalogs for all main-belt asteroids, as an additional example subset of Solar System objects to explore with these data visualization tools. Create a query for the DP0.3 MPCORB and SSObject catalogs for the orbital parameters above to obtain a sample of main-belt asteroids with semimajor axis 1.8 < $a$ < 3.7 au, eccentricity $e$ < 1.0, and perihelion $q$ > 1.3.
a_MBA_min = 1.8
a_MBA_max = 3.7
q_MBA_min = 1.3
e_MBA_max = 1.0
query = """
SELECT
mpc.ssObjectId, mpc.mpcDesignation, mpc.mpcNumber,
mpc.mpcH, mpc.e, mpc.q, mpc.incl, mpc.n,
sso.arc, sso.numObs, sso.g_H, sso.r_H, sso.i_H, sso.z_H
FROM
dp03_catalogs_10yr.MPCORB as mpc
INNER JOIN dp03_catalogs_10yr.SSObject as sso
ON mpc.ssObjectId = sso.ssObjectId
WHERE mpc.q/(1.0-mpc.e) > {}
AND mpc.q/(1.0-mpc.e) < {}
AND mpc.e < {}
AND mpc.q > {}
ORDER by sso.ssObjectId
""".format(a_MBA_min, a_MBA_max, e_MBA_max, q_MBA_min)
Run the query.
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
assert job.phase == 'COMPLETED'
Job phase is COMPLETED
Retrieve the query results as a pandas table.
uniqueMBAs = job.fetch_result().to_table().to_pandas()
uniqueMBAs
ssObjectId | mpcDesignation | mpcNumber | mpcH | e | q | incl | n | arc | numObs | g_H | r_H | i_H | z_H | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -9223370383071521539 | S101q3vm | 0 | 20.580000 | 0.313794 | 1.630436 | 23.84439 | NaN | 112.611702 | 45 | 21.215717 | 20.761335 | 20.668646 | 20.5917 |
1 | -9223366535898786459 | S1008gC7 | 0 | 20.670000 | 0.127774 | 1.929541 | 6.54474 | NaN | 2077.049805 | 22 | NaN | 20.813944 | 20.724117 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4249461 | 9223365960393921544 | S100Ulqw | 0 | 20.379999 | 0.124147 | 2.283488 | 13.68164 | NaN | 3341.052979 | 27 | NaN | 20.351301 | 20.545135 | NaN |
4249462 | 9223366833108713925 | S101CtVy | 0 | 20.299999 | 0.075326 | 2.198481 | 7.32284 | NaN | 2105.805664 | 57 | 20.889555 | 20.384977 | 20.312166 | NaN |
4249463 rows × 14 columns
Calculate the semimajor axis $a$ for all objects in uniqueMBAs
and add as a new column.
a_MBAs = calc_semimajor_axis(uniqueMBAs['q'], uniqueMBAs['e'])
uniqueMBAs['a'] = a_MBAs
Calculate the aphelion distance $Q$ for all objects in uniqueMBAs
and add as a new column.
Q_MBAs = calc_aphelion(uniqueMBAs['q'], uniqueMBAs['e'])
uniqueMBAs['Q'] = Q_MBAs
Define the conditions for assigning MBAs to their dynamical classifications (Note: the conditions used here are simplified to only include limits in semimajor axis and not eccentricity or inclination for the purposes of this tutorial).
MBApop_conditions = [
(uniqueMBAs['a'] >= 1.8) & (uniqueMBAs['a'] < 2.0),
(uniqueMBAs['a'] >= 2.0) & (uniqueMBAs['a'] < 2.5),
(uniqueMBAs['a'] >= 2.5) & (uniqueMBAs['a'] < 2.82),
(uniqueMBAs['a'] >= 2.82) & (uniqueMBAs['a'] < 3.25),
(uniqueMBAs['a'] >= 3.25) & (uniqueMBAs['a'] <= 3.7)
]
Define the names of the MBA dynamical classes as they correspond to the MBApop_conditions
above.
MBApop_types = ['Hungaria', 'Inner Belt',
'Middle Belt', 'Outer Belt',
'Cybele']
Determine the MBA population for each object in uniqueMBAs
and add as a new column.
uniqueMBAs['MBApop'] = np.select(MBApop_conditions, MBApop_types)
uniqueMBAs
ssObjectId | mpcDesignation | mpcNumber | mpcH | e | q | incl | n | arc | numObs | g_H | r_H | i_H | z_H | a | Q | MBApop | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -9223370383071521539 | S101q3vm | 0 | 20.580000 | 0.313794 | 1.630436 | 23.84439 | NaN | 112.611702 | 45 | 21.215717 | 20.761335 | 20.668646 | 20.5917 | 2.376014 | 3.121593 | Inner Belt |
1 | -9223366535898786459 | S1008gC7 | 0 | 20.670000 | 0.127774 | 1.929541 | 6.54474 | NaN | 2077.049805 | 22 | NaN | 20.813944 | 20.724117 | NaN | 2.212204 | 2.494867 | Inner Belt |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4249461 | 9223365960393921544 | S100Ulqw | 0 | 20.379999 | 0.124147 | 2.283488 | 13.68164 | NaN | 3341.052979 | 27 | NaN | 20.351301 | 20.545135 | NaN | 2.607158 | 2.930828 | Middle Belt |
4249462 | 9223366833108713925 | S101CtVy | 0 | 20.299999 | 0.075326 | 2.198481 | 7.32284 | NaN | 2105.805664 | 57 | 20.889555 | 20.384977 | 20.312166 | NaN | 2.377575 | 2.556669 | Inner Belt |
4249463 rows × 17 columns
uniqueMBAs['MBApop'].value_counts()
MBApop Middle Belt 1514199 Outer Belt 1348022 Inner Belt 1296318 Hungaria 66272 Cybele 24652 Name: count, dtype: int64
To visualize the fraction of MBAs in each population, a pie chart can be created.
MBApop_counts = uniqueMBAs['MBApop'].value_counts().to_dict()
data = pd.Series(MBApop_counts).reset_index(name='value').rename(
columns={'index': 'subpop'})
data['angle'] = data['value']/data['value'].sum() * 2*pi
data['color'] = Colorblind[len(MBApop_counts)]
MBApop = figure(height=350, title="MBA Populations",
toolbar_location=None, tools="hover",
tooltips="@subpop: @value", x_range=(-0.5, 1.0))
MBApop.wedge(x=0, y=1, radius=0.4,
start_angle=cumsum('angle', include_zero=True),
end_angle=cumsum('angle'),
line_color="white", fill_color='color',
legend_field='subpop', source=data)
MBApop.axis.axis_label = None
MBApop.axis.visible = False
MBApop.grid.grid_line_color = None
show_bokeh_inline(MBApop)
Figure 1: Pie chart showing the fraction of objects in each MBA population.
Compute three colors from the apparent magnitudes.
uniqueMBAs['gmi'] = uniqueMBAs['g_H'] - uniqueMBAs['i_H']
uniqueMBAs['rmi'] = uniqueMBAs['r_H'] - uniqueMBAs['i_H']
uniqueMBAs['gmr'] = uniqueMBAs['g_H'] - uniqueMBAs['r_H']
uniqueMBAs
ssObjectId | mpcDesignation | mpcNumber | mpcH | e | q | incl | n | arc | numObs | g_H | r_H | i_H | z_H | a | Q | MBApop | gmi | rmi | gmr | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -9223370383071521539 | S101q3vm | 0 | 20.580000 | 0.313794 | 1.630436 | 23.84439 | NaN | 112.611702 | 45 | 21.215717 | 20.761335 | 20.668646 | 20.5917 | 2.376014 | 3.121593 | Inner Belt | 0.547071 | 0.092690 | 0.454382 |
1 | -9223366535898786459 | S1008gC7 | 0 | 20.670000 | 0.127774 | 1.929541 | 6.54474 | NaN | 2077.049805 | 22 | NaN | 20.813944 | 20.724117 | NaN | 2.212204 | 2.494867 | Inner Belt | NaN | 0.089827 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4249461 | 9223365960393921544 | S100Ulqw | 0 | 20.379999 | 0.124147 | 2.283488 | 13.68164 | NaN | 3341.052979 | 27 | NaN | 20.351301 | 20.545135 | NaN | 2.607158 | 2.930828 | Middle Belt | NaN | -0.193834 | NaN |
4249462 | 9223366833108713925 | S101CtVy | 0 | 20.299999 | 0.075326 | 2.198481 | 7.32284 | NaN | 2105.805664 | 57 | 20.889555 | 20.384977 | 20.312166 | NaN | 2.377575 | 2.556669 | Inner Belt | 0.577389 | 0.072811 | 0.504578 |
4249463 rows × 20 columns
3. Holoviews¶
Holoviews supports easy analysis and visualization by annotating data rather than utilizing direct calls to plotting packages. This tutorial uses Bokeh as the plotting library backend for HoloViews. HoloViews supports several plotting libraries, and exercise 1 in Section 5 is to explore using HoloViews with other plotting packages.
Create a random subsample of 20,000 MBAs from uniqueMBAs
to use to demonstrate some basic HoloViews functionality. Print the length of this subset and confirm that it contains roughly 20K objects.
frac = 0.005
data20K_MBAs = uniqueMBAs.sample(frac=frac, axis='index')
print(len(data20K_MBAs))
assert len(data20K_MBAs) == round(frac * len(uniqueMBAs))
21247
3.1. Single plots¶
The basic core primitives of HoloViews are Elements (hv.Element
). Elements are simple wrappers for data which provide a semantically meaningful visual representation. An Element may be a set of Points, an Image, a Curve, a Histogram, etc. See the HoloViews Reference Gallery for all the various types of Elements that can be created with HoloViews.
The example in this section uses the HoloViews Scatter Element to quickly visualize the catalog data as a scatter plot.
Instead of subsetting a dataset to choose which columns to plot, HoloViews allows the user to specify the dimensionality directly. kdims
are the key dimensions or the independent variable(s) and vdims
are the value dimensions or the dependenent variable(s). The dimensions have to be specified as strings as below, but they are in fact rich objects. Dimension objects support a long descriptive label, which complements the short programmer-friendly name.
HoloViews maintains a strict separation between content and presentation. This separation is achieved by maintaining sets of keyword values as options
that specify how Elements
are to appear.
This example plots the semimajor axes and eccentricities with chosen x-axis limits, fontscale, plot height and width, and removes the toolbar.
Make a simple scatter plot of the data using the Scatter element.
aeplot = hv.Scatter(data20K_MBAs, kdims=['a'],
vdims=['e']).options(xlim=(1.8, 3.7), toolbar=None,
fontscale=1.2, height=350, width=350)
aeplot
Figure 2: A non-interactive plot of semimajor axis $a$ vs. eccentricity $e$ appears as a blue circle composed of individual, but mostly blended, blue dots.
The data20K
set contains several columns. If no columns are specified explicitly, the first 2 columns are taken for x and y respectively by the Scatter Element.
Now bin the data in semimajor axis $a$ using the robust Freedman Diaconis Estimator, plot the resulting distribution using the HoloViews Histogram Element, and add in some basic plot options. Read more about about customizing plots via options
. Note that options
can be shortened to opts
.
(H_bin, count) = np.histogram(data20K_MBAs['mpcH'], bins='fd')
H_distribution = hv.Histogram((H_bin, count)).opts(
title="H Magnitude Distribution", color='darkmagenta',
xlabel='H mag', fontscale=1.2,
height=400, width=400)
H_distribution
Figure 3: A histogram (bar chart in purple) of the number of objects in a given $H$ magnitude bin in the subset of twenty thousand objects. This plot is interactive and displays a tool bar at right for the user to, e.g., zoom in on the plot.
3.2. Layouts of unlinked plots¶
Create a layout of several plots. A Layout
is a type of Container
that can contain any HoloViews object. Other types of Containers
that exist include Overlay
, Gridspace
, Dynamicmap
, etc. See the HoloViews Reference Gallery for the full list of Layouts
that can be created with HoloViews. See Building Composite Objects for the full details about the ways Containers
can be composed.
Slice the data and set some more options, and then construct a layout using the +
operator.
aeplot = hv.Scatter(data20K_MBAs, kdims=['a'],
vdims=['e']).opts(
title="Semimajor Axis vs Eccentricity",
toolbar='above', tools=['hover'],
height=350, width=350, alpha=0.2,
size=2)
OrbHPlots = aeplot + H_distribution.options(height=350, width=350)
OrbHPlots
Figure 4: Two side-by-side plots, Fig 2 from above on the left with the
hover
tool, and Fig 3 from above on the right, with the interactive toolbar at upper right.
Note that these two plots above are not linked, they are two independent plots laid out next to each other.
Zoom in on the Semimajor Axis vs Eccentricity plot and notice that the data are not rebinned in the semimajor axis distribution plot. Linking plots is demonstrated below.
The tools, however, do apply to both plots. Try modifying both plots and then use the "reset" tool (the circular arrow symbol). Notice that both plots are reset to their original states.
3.3. Layouts of linked plots¶
Set up some default plot options to avoid duplicating long lists for every new plot.
Different plotting packages typically provide different customization capabilities. Below, one set of options is defined for a Bokeh backend, and one for a matplotlib backend.
Set Bokeh customizations as a python dictionary.
plot_style_bkh = dict(alpha=0.5, color='darkmagenta',
marker='triangle', size=3,
xticks=5, yticks=5,
height=400, width=400,
toolbar='above')
Set matplotlib customizations.
plot_style_mpl = dict(alpha=0.2, color='c', marker='s',
fig_size=200, s=2,
fontsize=14, xticks=8, yticks=8)
Choose to use the Bokeh plot style.
plot_style = plot_style_bkh
Below, create a color-color diagram of the MBAs in the dataset, and also display the distribution of samples along both value dimensions using the hist()
method of the Scatter Element.
Set the axes as rich objects.
rmi = hv.Dimension('rmi', label='(r-i)', range=(-1.0, 1.0))
gmr = hv.Dimension('gmr', label='(g-r)', range=(-0.5, 1.5))
Create the scatter plot.
col_col = hv.Scatter(data20K_MBAs, kdims=gmr,
vdims=rmi).opts(**plot_style)
Use the hist method to show the distribution of samples along both value dimensions.
col_col = col_col.hist(dimension=[gmr, rmi],
num_bins=100, adjoin=True)
col_col
Figure 5: The r-i color versus the g-r color (purple triangles), with histograms (blue bar charts) of the color distributions above and to the right. The interactive tool bar is at upper right.
4. Bokeh¶
A very useful feature of Bokeh is the ability to add connected interactivity between plots that show different attributes of the same data. This is called linking.
With linked plots it is possible to carry out data brushing, whereby data can be selected and manipulated synchronously across multiple linked plots.
For example, if an orbital element plot is linked with a colour-magnitude diagram of the same dataset, it becomes possible to interactively explore the relationship between the positions of objects in each plot.
This section uses the Bokeh plotting library to demonstrate how to set up brushing and linking between two panels showing different representations of the same dataset. A selection applied to either panel will highlight the selected points in the other panel.
This section is based on Bokeh linked brushing.
4.1. Data preparation¶
Getting the data preparation phase right is key to creating powerful visualizations. Bokeh works with a ColumnDataSource (CDS). A CDS is essentially a collection of sequences of data that have their own unique column name.
The CDS is the core of bokeh plots. Bokeh automatically creates a CDS from data passed as python lists or numpy arrays. CDS are useful as they allow data to be shared between multiple plots and renderers, enabling brushing and linking.
Below, a CDS is created from the data returned by the query above and passed directly to Bokeh.
Create a CDS for the plots to share. The data defined as x0
, y0
, x1
, y1
will be used to plot the left and right plots, respectively.
col_data = dict(x0=data20K_MBAs['a'],
y0=data20K_MBAs['incl'],
x1=data20K_MBAs['gmr'],
y1=data20K_MBAs['rmi'],
a=data20K_MBAs['a'],
ecc=data20K_MBAs['e'],
incl=data20K_MBAs['incl'],
Hmag=data20K_MBAs['mpcH'],
gmi=data20K_MBAs['gmi'],
rmi=data20K_MBAs['rmi'],
gmr=data20K_MBAs['gmr'],
MBApop=data20K_MBAs['MBApop'],
g_H=data20K_MBAs['g_H'],
r_H=data20K_MBAs['r_H'],
i_H=data20K_MBAs['i_H'],
z_H=data20K_MBAs['z_H'])
source_MBAs = ColumnDataSource(data=col_data)
Additional data can be added to the CDS after creation.
source_MBAs.data['mpcDesignation'] = data20K_MBAs['mpcDesignation']
source_MBAs.data['q'] = data20K_MBAs['q']
source_MBAs.data['arc'] = data20K_MBAs['arc']/365.25
source_MBAs.data['numObs'] = data20K_MBAs['numObs']
The pointsize used for plotting can be defined based on a given parameter. Define the conditions for assigning a point size for each MBA to be plotted based on its $H$ magnitude.
pointconditions_fromH = [
(data20K_MBAs['mpcH'] <= 5.),
(data20K_MBAs['mpcH'] <= 10.) & (data20K_MBAs['mpcH'] > 5.),
(data20K_MBAs['mpcH'] <= 15.) & (data20K_MBAs['mpcH'] > 10.),
(data20K_MBAs['mpcH'] <= 20.) & (data20K_MBAs['mpcH'] > 15.),
(data20K_MBAs['mpcH'] <= 25.) & (data20K_MBAs['mpcH'] > 20.),
(data20K_MBAs['mpcH'] <= 30.) & (data20K_MBAs['mpcH'] > 25.)
]
Define the point sizes for the given $H$ magnitude ranges in pointconditions_fromH
above.
pointsize_fromH = [12, 10, 8, 6, 4, 2]
Determine the point size bin for each object and add it to the source.
pointsize_bin = np.select(pointconditions_fromH, pointsize_fromH)
source_MBAs.data['pointsize_bin'] = pointsize_bin
Create a "points" view.
points = CDSView()
4.2. Linked plots with data brushing¶
Use Bokeh to plot semimajor axis vs eccentricity and semimajor axis vs inclination, and then link them.
Create a custom hover tool for each panel.
hover_left = HoverTool(tooltips=[("mpcDesignation", "@mpcDesignation"),
("MBApop", "@MBApop"),
("(a,e,inc, H)", "(@a, @ecc, @incl, @Hmag)"),
("(g,r,i,z)", "(@g_H, @r_H, @i_H, @z_H)"),
("(g-i), (r-i), (g-r)",
"(@gmi), (@rmi), (@gmr)")
])
hover_right = HoverTool(tooltips=[("mpcDesignation", "@mpcDesignation"),
("MBApop", "@MBApop"),
("(a,e,inc, H)", "(@a, @ecc, @incl, @Hmag)"),
("(g,r,i,z)", "(@g_H, @r_H, @i_H, @z_H)"),
("(g-i), (r-i), (g-r)",
"(@gmi), (@rmi), (@gmr)")
])
tools = "box_zoom,box_select,lasso_select,reset,help"
tools_left = [hover_left, tools]
tools_right = [hover_right, tools]
Create a new two-panel plot and add a renderer. Use the "points" view defined above.
Create the left-side plot.
left = figure(tools=tools_left, width=400, height=400,
title='Semimajor Axis vs Inclination')
left.scatter('x0', 'y0', hover_color='firebrick',
size='pointsize_bin', alpha=0.7, source=source_MBAs, view=points)
left.x_range = Range1d(1.7, 3.7)
left.y_range = Range1d(0., 55.)
left.xaxis.axis_label = 'Semimajor Axis (au)'
left.yaxis.axis_label = 'Inclination (deg)'
Create the right-side plot.
right = figure(tools=tools_right, width=400, height=400,
title='Color-Color')
right.scatter('x1', 'y1', hover_color='firebrick',
size='pointsize_bin', alpha=0.8, source=source_MBAs, view=points)
right.x_range = Range1d(-0.5, 1.5)
right.y_range = Range1d(-1., 1.)
right.xaxis.axis_label = '(g - r)'
right.yaxis.axis_label = '(r - i)'
Display the grid of plots. This can take a moment to render.
Figure 6: At left, MBA semimajor axis vs eccentricity, and at right, MBA semimajor axis vs inclination. All points are blue circles and the interactive toolbar is at upper right.
Use the hover tool to see information about individual data points (e.g., the "mpcDesignation"). This information should appear automatically when the mouse is hovered over the data points. Notice the data points highlighted in red on one panel with the hover tool are also highlighted on the other panel.
Next, click on the selection box icon or the selection lasso icon found in the upper right corner of the figure. Use the selection box and selection lasso to make various selections in either panel by clicking and dragging on either panel. The selected data points will be displayed in the other panel.
4.3. Output to interactive HTML file¶
Output this interactive plot to an interactive HTML file.
Define the output file name to store it in the home directory ~/.
.
outputDir = os.path.expanduser('~')
outputFileName = 'draft_DP03_advanced_plots_plot1.html'
outputFile = os.path.join(outputDir, outputFileName)
print('The full pathname of the interactive HTML file will be '+outputFile)
The full pathname of the interactive HTML file will be /home/melissagraham/draft_DP03_advanced_plots_plot1.html
Use the bokeh.io.show
method, embedded in the show_bokeh_to_file
function defined in Section 1.2 above, to output the interactive HTML file.
show_bokeh_to_file(p, outputFile)
To view the interactive HTML file navigate to it via the directory listing in the left-hand side panel of this Jupyter notebook graphical interface, and click on its name. This will open another tab containing the HTML file.
Since the file is relatively small (about 3.6 MB) it should load quickly (within a few seconds). Once loading is complete, click on the "Trust HTML" button at the top-left of the tab's window. Then, a near-duplicate of the two linked plots above should be displayed.
It is also possible to download the HTML file, interact with it in a browser as a a local file.
4.4. Linked streams¶
To do subsequent calculations with the set of selected points, it is possible to use HoloViews linked streams for custom interactivity. The following visualization is a modification of this example.
As for the example above, the plots generated below use the selection box and selection lasso to choose data points on the left panel, and then the selected points appear in the right panel.
Notice that as the selection is changed in the left panel, the mean x- and y-values for selected data points are shown in the title of the right panel.
This section is based on HoloViews Selection1D points.
Declare some points, and define a selection for the stream.
points = hv.Points((data20K_MBAs['gmr'],
data20K_MBAs['rmi'])).options(
tools=['box_select', 'lasso_select'])
selection = streams.Selection1D(source=points)
Define a function that uses the selection indices to slice points and compute statistics. This function is defined here (and not in Section 1.2) because it is only used in the subsequent (i.e., it is not a globally defined and used function).
def selected_info(index) -> str:
selected = points.iloc[index]
if index:
label = 'Mean (g-r), (r-i): %.3f, %.3f' % tuple(
selected.array().mean(axis=0))
else:
label = 'No selection'
return selected.relabel(label).options(color='red')
Combine "points" and DynamicMap. Notice the syntax used here, how the "+" sign makes side-by-side panels.
points.opts(xlabel='(g-r)', ylabel='(r-i)') + hv.DynamicMap(
selected_info, streams=[selection]).opts(xlabel='(g-r)',
ylabel='(r-i)')
Figure 7: At left, the MBA semimajor axis vs eccentricity. The plot at right has the same axes, but will be empty until the action below is taken to select points in the left plot. The interactive toolbar is at upper right.
Use the lasso- or box-select to select a region in the left-hand plot.
Print the number of objects selected.
print(len(selection.index))
54
5.0 Datashader¶
The interactive features of Bokeh work well with datasets up to a few tens of thousands of data points. To efficiently explore larger datasets it is recommended to use another visualization model that offers better scalability: Datashader.
The examples below will show how, when zooming in on the datashaded two-dimensional histograms, the bin sizes are dynamically adjusted to show finer or coarser granularity in the distribution. This allows for the interactive exploration of large datasets without having to manually adjust the bin sizes while panning and zooming.
Zoom in all the way to see individual points (i.e., bins contain either zero or one count). Soom in far enough and find that the individual points are represented by extremely small pixels in datashader that are difficult to see. A solution is to dynspread instead of datashade, which will preserve a finite size of the plotted points.
5.1. Plotting thousands of data points¶
Plot a color-colour diagram using Bokeh with a customized hover tool.
plot_options = {'height': 400, 'width': 800,
'tools': ['pan', 'box_zoom', 'box_select',
'wheel_zoom', 'reset', 'help']}
p = figure(title="Color-Color Diagram",
x_axis_label="(g-r)", y_axis_label="(r-i)",
x_range=(0., 1.5), y_range=(-0.5, 1.0),
**plot_options)
p.scatter(x='gmr', y='rmi', source=source_MBAs,
size='pointsize_bin', alpha=0.2,
hover_color='firebrick',
legend_field="MBApop",
color=factor_cmap('MBApop', 'Category10_5',
['Hungaria', 'Inner Belt', 'Middle Belt', 'Outer Belt', 'Cybele']))
hover = HoverTool(tooltips=[("mpcDesignation", "@mpcDesignation"),
("Class", "@MBApop"),
("(a, ecc, inc, H)", "(@a, @ecc, @incl, @Hmag)"),
("(g_H, r_H, i_H, z_H)",
"(@g_H, @r_H, @i_H, @z_H)"),
("(g-r, r-i)", "(@gmr, @rmi)"),
("(numObs, arc (yr))", "(@numObs, @arc)")])
p.add_tools(hover)
Figure 8: The r-i color vs. the g-r color for the MBAs, with Hungarias in blue, Inner Belt objects in orange, Middle Belt objects in green, Outer Belt objects in red, and Cybeles in purple. The interactive tool bar is at right.
Create another plot using Bokeh with a customized hover tool for the semimajor axis vs eccentricity distribution of the MBAs, divided into their respective populations. First create a CDS from the TNO data returned by the query above to be passed directly to Bokeh.
plot_options = {'height': 400, 'width': 800,
'tools': ['pan', 'box_zoom', 'box_select',
'wheel_zoom', 'reset', 'help']}
p = figure(title="Semimajor Axis vs Eccentricity",
x_axis_label="a (au)", y_axis_label="e",
x_range=(1.7, 3.7), y_range=(0., 0.6),
**plot_options)
p.scatter(x='a', y='ecc', source=source_MBAs,
size='pointsize_bin', alpha=0.2,
hover_color='firebrick',
legend_field="MBApop",
color=factor_cmap('MBApop', 'Category10_5',
['Hungaria', 'Inner Belt', 'Middle Belt', 'Outer Belt', 'Cybele']))
hover = HoverTool(tooltips=[("mpcDesignation", "@mpcDesignation"),
("Class", "@MBApop"),
("(a, ecc, inc, H)", "(@a, @ecc, @incl, @Hmag)"),
("(g_H, r_H, i_H, z_H)",
"(@g_H, @r_H, @i_H, @z_H)"),
("(g-r, r-i)", "(@gmr, @rmi)"),
("(numObs, arc (yr))", "(@numObs, @arc)")])
p.add_tools(hover)
Figure 9: Semimajor axis (au) vs. eccentricity for the MBAs, with the Hungarias in blue, Inner Belt objects in orange, Middle Belt objects in green, Outer Belt objects in red, and Cybeles in purple. The interactive tool bar is at right.
The plot above suffers from overplotting (confusion), even though the dataset only contains ~20K points. A classic strategy to visualize the dense areas is to specify the transparency (alpha) of the glyphs; in the plot above, alpha=0.2 was used. This has helped, but washes out the detail in the sparser regions.
HoloViews + Datashader allow millions to billions of points to be plotted, and this produces much more informative plots. Datashader rasterizes or aggregates datasets into regular grids that can then be further analysed or viewed as plots or images.
Create a Holoviews object "points" to hold and plot data.
points = hv.Points((source_MBAs.to_df()['gmr'], source_MBAs.to_df()['rmi']))
Create the linked streams instance.
boundsxy = (0, 0, 0, 0)
box = streams.BoundsXY(source=points, bounds=boundsxy)
bounds = hv.DynamicMap(lambda bounds: hv.Bounds(bounds), streams=[box])
Apply the datashader.
p = dynspread(datashade(points, cmap="Viridis"))
p = p.opts(width=800, height=300, padding=0.05, show_grid=True,
xlim=(0., 1.5), ylim=(-0.5, 1.0), xlabel="(g-r)", ylabel="(r-i)",
tools=['box_select'])
Render the datashaded plot.
p * bounds
Figure 10: The MBA r-i vs. the g-r colors as in Fig 8, but displayed as a 2-dimensional density map (with a purple-green-yellow colormap) in regions of high density, and as individual points in regions of low density. Interactive toolbar at right.
This datashaded plot of the same color-color diagram as above does not require any magic-number parameters such as size and alpha and automatically ensures that there is no saturation or overplotting.
Above, select the wheel zoom and adjust the image to interact with the plot. Note how the shades of color of the data points change according to the local density.
5.2. Plotting millions of data points¶
The datasest of ~20K points used above is actually too small to demonstrate the power of datashader.
Below, visualize the full >1 million main-belt object dataset returned by the query.
Create a Points Element for the data.
points = hv.Points((uniqueMBAs['gmr'], uniqueMBAs['rmi']))
Create the linked streams instance.
boundsxy = (0, 0, 0, 0)
box = streams.BoundsXY(source=points, bounds=boundsxy)
bounds = hv.DynamicMap(lambda bounds: hv.Bounds(bounds), streams=[box])
Apply the datashader.
p = dynspread(datashade(points, cmap="Viridis"))
p = p.opts(width=800, height=300, padding=0.05, show_grid=True,
xlim=(-2.0, 7.0), ylim=(-5.0, 3.0), xlabel="(g-r)", ylabel="(r-i)",
tools=['box_select'])
Render the datashaded plot.
p * bounds
Figure 11: Similar to Fig 10, but for MBAs and with many more objects included.
Above, use the box zoom or wheel zoom functionality to zoom in on the plot, and watch how datashader adapts the view to optimally display the data.
5.3. Adding a callback function¶
Add callback functionality to the color-color diagam above to retrieve the indices of selected points.
Above, use the box selection tool to select data.
STOP - Select some data points from the plot above using the box select tool before proceeding.
Print the number of objects selected.
selection = (points.data.x > box.bounds[0]) \
& (points.data.y > box.bounds[1]) \
& (points.data.x < box.bounds[2]) \
& (points.data.y < box.bounds[3])
print('The selection box contains %i data points'%(np.sum(selection)))
The selection box contains 5870 data points
5.4. Interactive selection¶
Below, create two side-by-side plots.
The left-hand plot will show the datashaded spatial distribution on the sky, and the right-hand plot will be a linked and brushed plot showing the r-band magnitude distribution for objects selected in the left-hand plot. It will be possible to use the box selection in the spatial distribution plot to change which data are included in the histogram.
First, create a holoviews dataset instance, and label some of the columns.
kdims = [('a', 'semimajor axis (au)'), ('e', 'eccentricity')]
vdims = [('incl', 'inclination (deg)')]
ds = hv.Dataset(uniqueMBAs, kdims, vdims)
points = hv.Points(ds)
boundsxy = (np.min(ds.data['a']), np.min(ds.data['e']),
np.max(ds.data['a']), np.max(ds.data['e']))
box = streams.BoundsXY(source=points, bounds=boundsxy)
box_plot = hv.DynamicMap(lambda bounds: hv.Bounds(bounds), streams=[box])
Create custom callback functionality to update the linked histogram.
These functions are defined here (and not in Section 1.2) because they are only used in the following cell.
def log_inf(x) -> float:
return np.log(x) if x > 0 else 0
def update_histogram(bounds=bounds) -> hv.Histogram:
selection = (ds.data['a'] > bounds[0]) & \
(ds.data['e'] > bounds[1]) & \
(ds.data['a'] < bounds[2]) & \
(ds.data['e'] < bounds[3])
selected_mag = ds.data.loc[selection]['incl']
frequencies, edges = np.histogram(selected_mag)
hist = hv.Histogram(
(list(map(log_inf, frequencies)), edges)).opts(
xlabel='inclination (deg)')
return hist
dmap = hv.DynamicMap(
update_histogram, streams=[box]).options(height=400, width=400)
datashade(points, cmap=process_cmap("Viridis", provider="bokeh")) * \
box_plot.options(height=400, width=400, tools=['box_select']) + \
dmap
Figure 12: At right, semimajor axis vs eccentricity as a 2-dimensional density map with a purple-green-yellow colormap. At left, a histogram (blue bar chart) of the fraction of objects in bins of number of observations.
Try changing the box selection across the inner, middle, and outer main belt, and watch as the histogram is recomputed and displayed.
6. Exercises for the learner¶
HoloViews works with a wide range of plotting libraries; Bokeh, matplotlib, plotly, mpld3, pygal to name a few. As an exercise, try changing the HoloViews plotting library to be matplotlib instead of bokeh at the beginning of the notebook with hv.extension('matplotlib'). Notice the holoviews + matplotlib icons displayed when the library is loaded successfully. Recreate a few plots and compare the outputs. Try again with some other plotting library. Don't forget to set the plotting library back to Bokeh, which is used for this tutorial. Note that some warnings might be raised.
As in Section 4.3, try writing other interactive plots from this notebook to an interactive HTML file. Which ones can easily be output to an interactive HTML file? Which HTML files show full interactive functionality? Which ones don't?
Try making the above interactive plots with a different small body population, such as the transneptunian objects (TNOs) or near-Earth objects (NEOs).