dataconverter
Contains DataConverter.
- graphnet.data.dataconverter.init_global_index(index, output_files)
Make global_index available to pool workers.
- Return type:
None
- Parameters:
index (Synchronized)
output_files (List[str])
- class graphnet.data.dataconverter.DataConverter(file_reader, save_method, outdir, extractors, index_column, num_workers)
Bases: ABC, Logger
A finalized data conversion class in GraphNeT.
DataConverter converts files in parallel, extracting data from experiment-specific file formats into GraphNeT-supported data formats. It also assigns event IDs to training examples.
Initialize DataConverter.
- Parameters:
file_reader (GraphNeTFileReader) – The method used for reading and applying Extractors.
save_method (GraphNeTWriter) – The method used to save the interim data format to a graphnet supported file format.
outdir (str) – The directory to save the files in.
extractors (Union[List[Extractor], List[I3Extractor], List[ParquetExtractor], List[H5Extractor], List[PrometheusExtractor]]) – The `Extractor`(s) that will be applied to the input files.
index_column (str, default: 'event_no') – Name of the event id column added to the events. Defaults to “event_no”.
num_workers (int, default: 1) – The number of CPUs used for parallel processing. Defaults to 1 (no multiprocessing).
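
To make the roles of `extractors` and `index_column` concrete, here is a toy, single-process sketch of what applying extractors and adding an event id column could look like. It does not use GraphNeT: `convert_events`, `extract_energy`, and `extract_zenith` are hypothetical stand-ins, and events are assumed to be plain dictionaries.

```python
from typing import Any, Callable, Dict, Iterable, List

Event = Dict[str, float]
Extractor = Callable[[Event], Dict[str, float]]


def convert_events(
    events: Iterable[Event],
    extractors: List[Extractor],
    index_column: str = "event_no",
    first_id: int = 0,
) -> List[Dict[str, Any]]:
    """Apply every extractor to every event and tag each row with an id."""
    rows: List[Dict[str, Any]] = []
    for offset, event in enumerate(events):
        # The id column is added first, then each extractor adds its fields.
        row: Dict[str, Any] = {index_column: first_id + offset}
        for extractor in extractors:
            row.update(extractor(event))
        rows.append(row)
    return rows


def extract_energy(event: Event) -> Dict[str, float]:
    """Hypothetical extractor that keeps only the energy field."""
    return {"energy": event["energy"]}


def extract_zenith(event: Event) -> Dict[str, float]:
    """Hypothetical extractor that keeps only the zenith field."""
    return {"zenith": event["zenith"]}
```

Calling `convert_events(events, [extract_energy, extract_zenith], first_id=10)` yields one row per event, each carrying an `event_no` value counted up from 10.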
- get_map_function(nb_files, unit='file(s)')
Identify map function to use (pure python or multiprocess).
- Return type:
Tuple[Any, Optional[Pool]]
- Parameters:
nb_files (int)
unit (str)
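
The choice between a pure-Python map and a multiprocessing map, together with the `Tuple[Any, Optional[Pool]]` return type, can be sketched as follows. This is an illustration under assumptions, not the library's code: `effective_workers` and `choose_map_function` are hypothetical names, and capping the worker count at the number of files is an assumed policy.

```python
from multiprocessing import Pool
from multiprocessing.pool import Pool as PoolType
from typing import Any, Optional, Tuple


def effective_workers(nb_files: int, num_workers: int) -> int:
    """Never use more workers than there are files (assumed policy)."""
    return max(1, min(num_workers, nb_files))


def choose_map_function(
    nb_files: int, num_workers: int
) -> Tuple[Any, Optional[PoolType]]:
    """Return a map-like callable and the pool backing it, if any."""
    n_workers = effective_workers(nb_files, num_workers)
    if n_workers > 1:
        pool = Pool(processes=n_workers)
        return pool.imap, pool  # the caller must close the pool afterwards
    return map, None  # built-in map: no extra processes


if __name__ == "__main__":
    map_fn, pool = choose_map_function(nb_files=4, num_workers=2)
    try:
        print(list(map_fn(abs, [-1, -2, 3, -4])))
    finally:
        if pool is not None:
            pool.close()
            pool.join()
```

Returning the pool alongside the callable lets the caller run the same loop in both modes and still release the worker processes when a pool was created.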
- merge_files(files, output_dir, **kwargs)
Merge converted files.
DataConverter will call the .merge_files method in the GraphNeTWriter module that it was instantiated with.
- Parameters:
files (Union[List[str], str, None], default: None) – Intermediate files to be merged.
output_dir (Optional[str], default: None) – Directory to save the merged files in.
**kwargs (Any) – Additional keyword arguments to be passed to the GraphNeTWriter.merge_files method.
- Return type:
None
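
The delegation described for `merge_files` can be sketched with two toy classes. Everything here is hypothetical: `ToyWriter` and `ToyConverter` stand in for a writer and a converter, and the fallbacks used when `files` or `output_dir` is `None` (previously written files and a `merged` subdirectory) are assumptions made for illustration.

```python
import os
import tempfile
from typing import Any, List, Optional, Union


class ToyWriter:
    """Hypothetical writer that merges text files by concatenation."""

    def merge_files(self, files: List[str], output_dir: str, **kwargs: Any) -> None:
        os.makedirs(output_dir, exist_ok=True)
        name = kwargs.get("name", "merged.txt")
        with open(os.path.join(output_dir, name), "w") as out:
            for path in files:
                with open(path) as src:
                    out.write(src.read())


class ToyConverter:
    """Hypothetical converter that forwards merging to its writer."""

    def __init__(self, save_method: ToyWriter, outdir: str) -> None:
        self._save_method = save_method
        self._outdir = outdir
        self._output_files: List[str] = []

    def merge_files(
        self,
        files: Union[List[str], str, None] = None,
        output_dir: Optional[str] = None,
        **kwargs: Any,
    ) -> None:
        if files is None:
            files = self._output_files  # assumed fallback
        elif isinstance(files, str):
            files = [files]
        if output_dir is None:
            output_dir = os.path.join(self._outdir, "merged")  # assumed fallback
        # The converter does no merging itself; the writer does the work.
        self._save_method.merge_files(files=files, output_dir=output_dir, **kwargs)


def demo() -> str:
    """Write two small files, merge them, and return the merged text."""
    with tempfile.TemporaryDirectory() as tmp:
        converter = ToyConverter(ToyWriter(), tmp)
        for i, text in enumerate(["a\n", "b\n"]):
            path = os.path.join(tmp, f"part{i}.txt")
            with open(path, "w") as handle:
                handle.write(text)
            converter._output_files.append(path)
        converter.merge_files()
        with open(os.path.join(tmp, "merged", "merged.txt")) as handle:
            return handle.read()
```

Because the converter only forwards its arguments, swapping the writer changes the merged file format without touching the converter.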