%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
To start with, read in the two data files representing the master source list and observations source list. The fields for the two tables are respectively documented in:
from astropy.table import Table
master_sources = Table.read('data/cdfs_master_sources.fits')
obs_sources = Table.read('data/cdfs_obs_sources.fits')
Do the following to explore the two tables:
<TAB>
completion to easily discover all the attributes and methods, e.g. type master_sources.
and then hit the <TAB>
key.master_sources
obs_sources
master_sources.colnames
obs_sources.colnames
len(master_sources)
len(obs_sources)
master_sources.dtype
obs_sources.dtype
Normally one displays a table in IPython notebook by entering the variable name in a cell and pressing shift-Enter
. In a terminal session the default method is using something like print(my_table)
. In both cases the Table
object prefers to display only a screenful of data to prevent having a zillion lines of output if the table is huge. If you really want to see all the data you can use the Table.pprint method.
master_sources
table using its pprint()
method.more()
method as well.master_sources.pprint(max_lines=-1)
For our analysis we don't actually need the obi
(observation interval) column in the obs_sources
table.
obi
column from the obs_sources
table.The gti_obs
column name is a bit obscure (GTI is a good time interval, FWIW).
gti_obs
column to obs_date
.It would be nice to have a count rate in addition to the source counts.
src_rate_aper_b
which is the source counts divided by observation duration in sec.Some of the sources have a negative net flux in the broad band
obs_sources.remove_column('obi')
obs_sources.rename_column('gti_obs', 'obs_date')
obs_sources['src_rate_aper_b'] = obs_sources['src_cnts_aper_b'] / obs_sources['livetime']
obs_sources.colnames
For each source detected in an individual observation (in the obs_sources
table), let's look at the source flux values.
hist()
function to make a histogram of the source fluxes. Since the fluxes vary by orders of magnitude, use the numpy.log10
to put the fluxes in log space.HINT: if you did not start notebook with --pylab=inline
then try:
import matplotlib.pyplot as plt
%matplotlib inline
theta
that are less than 4.0.vals, bins, patches = plt.hist(np.log10(obs_sources['flux_aper_b']), bins=50)
ok = obs_sources['theta'] < 4
out = plt.hist(np.log10(obs_sources['flux_aper_b'][ok]), bins=bins)
The master_sources
and obs_sources
tables share a common msid
column. What we now want is to join the master RA and Dec positions and master source names with the individual observations table.
sources
that has the master RA, Dec, and name included for each observation source.HINT: the defaults for keys
and join_type='inner'
are correct in this case, so the simplest possible call to join()
will work!
Intermediate: Is the length of the new sources
the same as obs_sources
? What happened?
Advanced: Make a scatter plot of the RA (x-axis) and Dec (y-axis) difference between the master source position and the observation source position. You'll need to use coordinates
!
from astropy.table import join
sources = join(master_sources, obs_sources)
print(len(sources), len(obs_sources))
# Some of the obs_sources had an `msid` value that is not in the `master_sources` table
from astropy.coordinates import SkyCoord
master_c = SkyCoord(sources['ra'], sources['dec'], unit='hourangle,deg', frame='icrs')
obs_c = SkyCoord(sources['ra_b'], sources['dec_b'], unit='hourangle,deg', frame='icrs')
plt.plot((master_c.ra - obs_c.ra).value, (master_c.dec - obs_c.dec).value, '.')
sources
¶Finally, we can look at the variability properties of sources in the CDFS using the group_by()
functionality.
This method makes a new table in which all the sources with identical master ID are next to each other.
g_sources
which is the sources
table grouped by the msid
key using the group_by()
method.The g_sources
table is just a regular table with all the sources
in a particular order. The attribute g_sources.groups
is an object that provides access to the msid
sub-groups. You can access the \(i^{th}\) group with g_sources.groups[i]
.
In addition the g_sources.groups.indices
attribute is an array with the indicies of the group boundaries.
np.diff()
find the number of repeat observations of each master sources. HINT: use the indices, Luke.g_sources = sources.group_by('msid')
np.diff(g_sources.groups.indices)
g_sources.groups[50]
The real power of grouping comes in the ability to create aggregate values for each of the groups, for instance the mean flux for each unique source. This is done with the aggregate()
method, which takes a function reference as its input. This function must take as input an array of values and return a single value.
Aggregate returns a new table that has a length equal to the number of groups.
aggregate
and the np.mean
function. Call this table g_sources_mean
.join()
function to restore the master_sources
information to g_sources_mean
.g_sources_mean = g_sources.groups.aggregate(np.mean)
g_sources_mean
g_sources_mean = join(master_sources, g_sources_mean)
g_sources_mean