utils
Utility functions for construction of graphs.
- graphnet.models.graphs.utils.lex_sort(x, cluster_columns)
Sort a numpy array according to the columns in cluster_columns.
Note that x is sorted along the dimensions in cluster_columns in reverse order, i.e. the last listed column is the primary sort key. For example, cluster_columns = [0,1,2] means x is sorted along [2,1,0].
- Parameters:
x (array) – Array to be sorted.
cluster_columns (List[int]) – Columns of x to be sorted along.
- Return type:
ndarray
- Returns:
A sorted version of x.
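The reversed ordering described above matches the convention of NumPy's np.lexsort, where the last key is the primary one. The following is a minimal sketch under that assumption; lex_sort_sketch is a hypothetical stand-in and not necessarily how graphnet implements lex_sort.

```python
import numpy as np


def lex_sort_sketch(x, cluster_columns):
    """Sort the rows of x with np.lexsort (hypothetical helper)."""
    # np.lexsort treats the LAST key as the primary sort key, which gives
    # the reversed ordering described in the note above.
    keys = tuple(x[:, column] for column in cluster_columns)
    return x[np.lexsort(keys), :]


x = np.array([[2.0, 1.0], [1.0, 2.0], [3.0, 1.0]])
sorted_x = lex_sort_sketch(x, cluster_columns=[0, 1])
# Rows are ordered by column 1 first; ties are broken by column 0.
```

With cluster_columns = [0, 1], the two rows sharing the value 1.0 in column 1 come first and are ordered among themselves by column 0.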
- graphnet.models.graphs.utils.gather_cluster_sequence(x, feature_idx, cluster_columns)
Turn x into rows of clusters with sequences along columns.
Each row corresponds to one cluster. The values of the feature in column feature_idx of x that belong to that cluster are gathered into a sequence laid out along the columns. Sequences are padded with NaN to be of the same length. The clustered array has dimensions [n_clusters, l + len(cluster_columns)], where l is the largest sequence length.
Example: Suppose x represents a neutrino event, we have chosen to cluster on the PMT positions, and feature_idx corresponds to pulse time.
The resulting array will have dimensions [n_pmts, m + 3], where m is the maximum number of same-PMT pulses found in x and the +3 accounts for the three spatial directions defining each cluster.
- Parameters:
x (ndarray) – Array for clustering.
feature_idx (int) – Index of the feature in x to be gathered for each cluster.
cluster_columns (List[int]) – Indices of the columns in x from which to build clusters.
- Returns:
array – Array with dimensions [n_clusters, l + len(cluster_columns)].
column_offset – Indices of the columns in array that define clusters.
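To make the output layout concrete, here is a small sketch of gathering NaN-padded sequences per cluster with plain NumPy. It assumes the cluster columns are placed first and the sequence follows; gather_sequences_sketch is a hypothetical helper, not the library function, and the real column order may differ.

```python
import numpy as np


def gather_sequences_sketch(x, feature_idx, cluster_columns):
    """Return (array, column_offset) in the layout assumed above."""
    clusters, inverse, counts = np.unique(
        x[:, cluster_columns], axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # keep a flat index across NumPy versions
    n_cluster_cols = len(cluster_columns)

    # One row per cluster, padded with NaN up to the longest sequence.
    array = np.full((len(clusters), n_cluster_cols + counts.max()), np.nan)
    array[:, :n_cluster_cols] = clusters
    for row in range(len(clusters)):
        sequence = x[inverse == row, feature_idx]
        array[row, n_cluster_cols:n_cluster_cols + len(sequence)] = sequence

    column_offset = np.arange(n_cluster_cols)
    return array, column_offset


# Three pulses on two sensors; columns are x, y, z, time.
x = np.array([
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 2.0],
    [1.0, 0.0, 0.0, 5.0],
])
array, column_offset = gather_sequences_sketch(x, feature_idx=3, cluster_columns=[0, 1, 2])
```

Here the first sensor has two pulses and the second has one, so the result has 3 + 2 columns and the second row ends in a NaN.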
- graphnet.models.graphs.utils.identify_indices(feature_names, cluster_on)
Identify indices for clustering and summarization.
- Return type:
Tuple[List[int], List[int], List[str]]
- Parameters:
feature_names (List[str])
cluster_on (List[str])
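The entry above does not describe the three returned values. One plausible reading, given the return type, is sketched below: the indices of the cluster_on features, the indices of the remaining features, and the names of those remaining features. The helper name and the order of the returned values are assumptions.

```python
def identify_indices_sketch(feature_names, cluster_on):
    """Split feature_names into cluster and summarization parts (assumed semantics)."""
    cluster_indices = [feature_names.index(name) for name in cluster_on]
    # Every feature that is not used for clustering is summarized.
    summarization_names = [name for name in feature_names if name not in cluster_on]
    summarization_indices = [feature_names.index(name) for name in summarization_names]
    return cluster_indices, summarization_indices, summarization_names


# Illustrative feature names only.
feature_names = ["dom_x", "dom_y", "dom_z", "dom_time", "charge"]
cluster_idx, summary_idx, summary_names = identify_indices_sketch(
    feature_names, cluster_on=["dom_x", "dom_y", "dom_z"]
)
```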
- graphnet.models.graphs.utils.cluster_summarize_with_percentiles(x, summarization_indices, cluster_indices, percentiles, add_counts)
Turn x into clusters with percentile summary.
From variables specified by column indices cluster_indices, x is turned into clusters. Information in the columns of x specified by summarization_indices is summarized within each cluster using percentiles. It is assumed x represents a single event.
Example use-case: Suppose x contains raw pulses from a neutrino event where some DOMs have multiple measurements of Cherenkov radiation. If cluster_indices is set to the columns corresponding to the xyz-position of the DOMs, and the features specified in summarization_indices correspond to time, charge, then each row in the returned array will correspond to a DOM, and the time and charge for each DOM will be summarized by percentiles. Returned output array has dimensions [n_clusters, len(percentiles)*len(summarization_indices) + len(cluster_indices)].
- Parameters:
x (ndarray) – Array to be clustered.
summarization_indices (List[int]) – List of column indices that define the features to be summarized with percentiles.
cluster_indices (List[int]) – List of column indices on which the clusters are constructed.
percentiles (List[int]) – Percentiles used to summarize x, e.g. [10,50,90].
add_counts (bool)
- Return type:
ndarray
- Returns:
Percentile-summarized array
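The output shape quoted above can be checked with a short NumPy sketch. It assumes the cluster columns come first, followed by the percentiles of each summarized feature in turn; summarize_sketch is a hypothetical helper, and the add_counts option is not modelled.

```python
import numpy as np


def summarize_sketch(x, summarization_indices, cluster_indices, percentiles):
    """Percentile summary per cluster (assumed column order, see lead-in)."""
    clusters, inverse = np.unique(x[:, cluster_indices], axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep a flat index across NumPy versions
    rows = []
    for k in range(len(clusters)):
        members = x[inverse == k][:, summarization_indices]
        # Shape [len(percentiles), len(summarization_indices)].
        pct = np.percentile(members, percentiles, axis=0)
        # Transpose so all percentiles of one feature are adjacent.
        rows.append(np.concatenate([clusters[k], pct.T.ravel()]))
    return np.stack(rows)


# Columns: x, y, z, time, charge. The first DOM has two pulses.
x = np.array([
    [0.0, 0.0, 0.0, 1.0, 10.0],
    [0.0, 0.0, 0.0, 3.0, 30.0],
    [1.0, 1.0, 1.0, 5.0, 50.0],
])
summary = summarize_sketch(
    x, summarization_indices=[3, 4], cluster_indices=[0, 1, 2], percentiles=[0, 50, 100]
)
```

With two clusters, three percentiles, two summarized features and three cluster columns, the result has shape [2, 3*2 + 3].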
- class graphnet.models.graphs.utils.cluster_and_pad(x, cluster_columns, input_names)
Bases:
object
Cluster and pad the data for further summarization.
Clusters the input data according to the specified columns and computes aggregate statistics on the clusters. The clustering happens only once, creating a cluster matrix, which holds all the aggregated statistics, and a padded matrix, which holds the padded data for quick calculation of aggregate statistics.
Example:
    from graphnet.models.graphs.utils import cluster_and_pad

    cluster_class = cluster_and_pad(x = single_event_as_array,
                                    cluster_columns = [0,1,2])
    # Creates a cluster matrix and a padded matrix,
    # the cluster matrix will contain the unique values of the cluster columns,
    # no additional aggregate statistics are added yet.
    cluster_class.add_percentile_summary(summarization_indices = [3,4,5],
                                         percentiles = [10,50,90])
    # Adds the 10th, 50th and 90th percentile of columns 3, 4
    # and 5 in the input data to the cluster matrix.
    cluster_class.add_std(columns = [4])
    # Adds the standard deviation of column 4 in the input data
    # to the cluster matrix.
    x = cluster_class.clustered_x
    # Gets the clustered matrix with all the aggregate statistics.
Initialize the class with the data and cluster columns.
- Parameters:
x (ndarray) – Array to be clustered.
cluster_columns (List[int]) – List of column indices on which the clusters are constructed.
input_names (Optional[List[str]], default: None) – Names of the columns in the input data for automatic generation of names.
- Adds:
- clustered_x: Added to the class.
- _counts: Added to the class.
- _padded_x: Added to the class.
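The idea of clustering once and then computing statistics on a padded array can be illustrated with plain NumPy. This is a sketch under the assumption that the padded data is a NaN-padded array of shape [n_clusters, max_pulses, n_features]; the internal layout of _padded_x is not specified here, and cluster_and_pad_sketch is a hypothetical helper.

```python
import numpy as np


def cluster_and_pad_sketch(x, cluster_columns):
    """Return (clustered_x, counts, padded_x) for a single event."""
    clustered_x, inverse, counts = np.unique(
        x[:, cluster_columns], axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # keep a flat index across NumPy versions
    padded_x = np.full((len(clustered_x), counts.max(), x.shape[1]), np.nan)
    for k in range(len(clustered_x)):
        padded_x[k, :counts[k], :] = x[inverse == k]
    return clustered_x, counts, padded_x


# Columns: x, y, z, time, charge.
x = np.array([
    [0.0, 0.0, 0.0, 1.0, 10.0],
    [0.0, 0.0, 0.0, 3.0, 30.0],
    [1.0, 1.0, 1.0, 5.0, 50.0],
])
clustered_x, counts, padded_x = cluster_and_pad_sketch(x, cluster_columns=[0, 1, 2])

# Any per-cluster statistic is now a single vectorised call that skips the padding.
median_time = np.nanpercentile(padded_x[:, :, 3], 50, axis=1)
clustered_with_median = np.column_stack([clustered_x, median_time])
```

Because the grouping is done once, each additional statistic only needs a NaN-aware reduction over the padded axis.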
- add_charge_threshold_summary(summarization_indices, percentiles, charge_index, location)
Summarize features through percentiles on charge of sensor.
- Parameters:
summarization_indices (List[int]) – List of column indices that define the features to be summarized with percentiles.
percentiles (List[int]) – Percentiles used to summarize x, e.g. [10,50,90].
charge_index (int) – Index of the charge column in the padded tensor.
location (Optional[int], default: None) – Location at which to insert the summarization indices in the clustered tensor. Defaults to adding at the end.
- Return type:
ndarray
- Adds:
- _charge_sum: Added to the class.
- _charge_weights: Added to the class.
- Altered:
- _padded_x: Charge is altered to be the cumulative sum of the charge divided by the total charge.
- clustered_x: The summarization indices are added at the end of the tensor or inserted at the specified location.
- _cluster_names: The names are added at the end of the tensor or inserted at the specified location.
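One way to read "percentiles on charge" is the following: within a sensor, the cumulative charge fraction is computed, and for each percentile the feature value of the first pulse at which that fraction is reached is reported. The sketch below shows this for a single sensor; this interpretation and the helper charge_threshold_sketch are assumptions, not a statement of the library's implementation.

```python
import numpy as np


def charge_threshold_sketch(values, charges, percentiles):
    """Feature value at which each cumulative charge fraction is first reached."""
    cumulative_fraction = np.cumsum(charges) / np.sum(charges)
    thresholds = np.asarray(percentiles) / 100.0
    # argmax over a boolean array returns the index of the first True.
    first_index = np.argmax(
        cumulative_fraction[:, None] >= thresholds[None, :], axis=0
    )
    return values[first_index]


# Four pulses on one sensor, ordered in time.
times = np.array([10.0, 20.0, 30.0, 40.0])
charges = np.array([2.0, 1.0, 5.0, 2.0])
threshold_times = charge_threshold_sketch(times, charges, percentiles=[10, 50, 90])
```

In this toy case the cumulative fractions are 0.2, 0.3, 0.8 and 1.0, so the 10%, 50% and 90% thresholds are reached at the first, third and fourth pulse.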
- add_percentile_summary(summarization_indices, percentiles, method, location)
Summarize the features of the sensors using percentiles.
- Parameters:
summarization_indices (List[int]) – List of column indices that define the features to be summarized with percentiles.
percentiles (List[int]) – Percentiles used to summarize x, e.g. [10,50,90].
method (str, default: 'linear') – Method used to summarize the features, e.g. "linear".
location (Optional[int], default: None) – Location at which to insert the summarization indices in the clustered tensor. Defaults to adding at the end.
- Return type:
ndarray
- Altered:
- clustered_x: The summarization indices are added at the end of the tensor or inserted at the specified location.
- _cluster_names: The names are added at the end of the tensor or inserted at the specified location.
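The roles of method and location can be sketched with NumPy alone. The example assumes that method corresponds to the method argument of np.nanpercentile (available from NumPy 1.22) and that location is a column index before which the new columns are inserted; both are assumptions, and add_columns_sketch is a hypothetical helper.

```python
import numpy as np


def add_columns_sketch(clustered_x, new_columns, location=None):
    """Append new_columns, or insert them before column `location`."""
    if location is None:
        return np.column_stack([clustered_x, new_columns])
    return np.concatenate(
        [clustered_x[:, :location], new_columns, clustered_x[:, location:]],
        axis=1,
    )


# Toy inputs: two clusters, padded to two pulses each (NaN marks padding).
clustered_x = np.array([[0.0, 0.0], [1.0, 1.0]])
padded_feature = np.array([[1.0, 3.0], [5.0, np.nan]])

# Percentiles per cluster, ignoring padding; shape [n_clusters, 2] after transpose.
summary = np.nanpercentile(padded_feature, [0, 100], axis=1, method="linear").T

appended = add_columns_sketch(clustered_x, summary)
inserted = add_columns_sketch(clustered_x, summary, location=0)
```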
- add_counts(location)
Add the counts of the sensor to the summarization features.
- Return type:
ndarray
- Parameters:
location (int | None)
- add_sum_charge(charge_index, location)
Add the sum of the charge to the summarization features.
- Return type:
ndarray
- Parameters:
charge_index (int)
location (int | None)
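Pulse counts and summed charge per sensor are both simple reductions over a NaN-padded array. The sketch below assumes a padded layout of [n_clusters, max_pulses, n_features] and reports raw values; whether the library transforms the counts or the sum before storing them is not stated here.

```python
import numpy as np

# Two sensors, padded to two pulses each; features are (time, charge).
padded_x = np.array([
    [[10.0, 2.0], [20.0, 1.0]],
    [[30.0, 4.0], [np.nan, np.nan]],
])
charge_index = 1  # hypothetical position of the charge column

# Number of real (non-padding) pulses per sensor.
counts = np.sum(~np.isnan(padded_x[:, :, 0]), axis=1)

# Total charge per sensor, ignoring the padding.
sum_charge = np.nansum(padded_x[:, :, charge_index], axis=1)
```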
- add_std(columns, location, weights)
Add the standard deviation of the column.
- Parameters:
columns (List[int]) – Indices of the columns from which to calculate the standard deviation.
location (Optional[int], default: None) – Location at which to insert the standard deviation in the clustered tensor. Defaults to adding at the end.
weights (Union[ndarray, int], default: 1) – Optional weights to be applied to the standard deviation.
- Return type:
ndarray
- graphnet.models.graphs.utils.ice_transparency(z_offset, z_scaling)
Return interpolation functions for optical properties of IceCube.
NOTE: The resulting interpolation functions assume that the Z-coordinates of pulses are scaled as z = z/500. Any deviation from this scaling gives inaccurate results.
- Parameters:
z_offset (Optional[float], default: None) – Offset to be added to the depth of the DOM.
z_scaling (Optional[float], default: None) – Scaling factor to be applied to the depth of the DOM.
- Returns:
f_scattering – Function that takes a normalized depth and returns the corresponding normalized scattering length.
f_absorption – Function that takes a normalized depth and returns the corresponding normalized absorption length.
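The general pattern of returning two depth interpolators can be sketched as follows. Everything specific in this example is an assumption: the table values are made up and are not measured ice properties, min-max scaling stands in for whatever normalization the library applies, np.interp is used as a simple linear interpolator, and make_ice_interpolators_sketch is a hypothetical helper.

```python
import numpy as np


def make_ice_interpolators_sketch(depth, scattering, absorption, z_offset, z_scaling):
    """Build two normalized-depth -> normalized-property interpolators."""
    # Normalize depth as (depth + offset) / scaling, following the
    # parameter descriptions above.
    z_norm = (np.asarray(depth, dtype=float) + z_offset) / z_scaling
    order = np.argsort(z_norm)  # np.interp requires increasing x values
    z_sorted = z_norm[order]

    def min_max(values):
        # Assumption: min-max scaling replaces the library's normalization.
        values = np.asarray(values, dtype=float)
        return (values - values.min()) / (values.max() - values.min())

    scattering_norm = min_max(scattering)[order]
    absorption_norm = min_max(absorption)[order]

    def f_scattering(z):
        return np.interp(z, z_sorted, scattering_norm)

    def f_absorption(z):
        return np.interp(z, z_sorted, absorption_norm)

    return f_scattering, f_absorption


# Made-up table for illustration only; NOT measured ice properties.
depth = [1000.0, 1500.0, 2000.0]
scattering = [10.0, 30.0, 20.0]
absorption = [50.0, 100.0, 150.0]

f_scattering, f_absorption = make_ice_interpolators_sketch(
    depth, scattering, absorption, z_offset=0.0, z_scaling=500.0
)
scattering_value = f_scattering(2.5)  # halfway between the first two table rows
absorption_value = f_absorption(3.5)  # halfway between the last two table rows
```

The callers must pass depths normalized in the same way as the table, which is why the note above warns against using a different scaling.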