K - Type of a key in upstream data.
V - Type of a value in upstream data.

public class BootstrappedDatasetBuilder<K,V> extends Object implements PartitionDataBuilder<K,V,EmptyContext,BootstrappedDatasetPartition>
This builder uses BootstrappedVector, which holds each vector from the original sample together with counters of repetitions for each subsample. As a heuristic, this implementation uses the Poisson distribution to generate the counter values.

Constructor and Description |
---|
BootstrappedDatasetBuilder(Preprocessor<K,V> preprocessor, int samplesCnt, double subsampleSize) Creates an instance of BootstrappedDatasetBuilder. |
Modifier and Type | Method and Description |
---|---|
BootstrappedDatasetPartition | build(LearningEnvironment env, Iterator<UpstreamEntry<K,V>> upstreamData, long upstreamDataSize, EmptyContext ctx) Builds new partition data from the partition upstream data and the partition context. |
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface PartitionDataBuilder: andThen, build
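
The class description says that repetition counters are generated from a Poisson distribution. The following is a minimal, self-contained sketch of that idea in plain Java; it is not the library's implementation. The class PoissonBootstrapSketch and its methods are hypothetical, and the sketch assumes that subsampleSize serves as the Poisson mean for each counter. The sampler is Knuth's multiplication method, chosen here only for brevity.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch only; names are hypothetical and not part of the library. */
final class PoissonBootstrapSketch {
    /** Draws one Poisson(mean) value using Knuth's multiplication method (suited to small means). */
    static int samplePoisson(double mean, Random rnd) {
        double limit = Math.exp(-mean);
        double product = rnd.nextDouble();
        int count = 0;

        // Multiply uniform draws until the running product drops to the limit or below.
        while (product > limit) {
            count++;
            product *= rnd.nextDouble();
        }

        return count;
    }

    /**
     * Returns one repetition counter per subsample for a single vector.
     * Assumption: subsampleSize is used as the Poisson mean.
     */
    static int[] repetitionCounters(int samplesCnt, double subsampleSize, Random rnd) {
        int[] counters = new int[samplesCnt];

        for (int i = 0; i < samplesCnt; i++)
            counters[i] = samplePoisson(subsampleSize, rnd);

        return counters;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42L);

        // Three subsamples, mean of one repetition per vector in each subsample.
        int[] counters = repetitionCounters(3, 1.0, rnd);

        System.out.println(Arrays.toString(counters));
    }
}
```

A counter of 0 means the vector is absent from that subsample, and a counter of 2 means it appears twice. With a mean of 1.0, each subsample holds on average as many vector repetitions as the original sample, which approximates sampling with replacement.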
public BootstrappedDatasetBuilder(Preprocessor<K,V> preprocessor, int samplesCnt, double subsampleSize)
preprocessor - Mapper of upstream entries into LabeledVector.
samplesCnt - Samples count.
subsampleSize - Subsample size.

public BootstrappedDatasetPartition build(LearningEnvironment env, Iterator<UpstreamEntry<K,V>> upstreamData, long upstreamDataSize, EmptyContext ctx)
Builds new partition data from the partition upstream data and the partition context.
Important: there is no guarantee that at most one UpstreamEntry exists for a given key. An UpstreamEntry should rather be thought of as a container that holds all data from upstream without the uniqueness constraint. The constraint is omitted so that upstream data transformers in DatasetBuilder can replicate entries, which is useful, for example, for bootstrapping.
Specified by: build in interface PartitionDataBuilder<K,V,EmptyContext,BootstrappedDatasetPartition>
env - Learning environment.
upstreamData - Partition upstream data.
upstreamDataSize - Partition upstream data size.
ctx - Partition context.
Returns: Partition data.
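
The note on duplicate keys matters for any code that consumes upstreamData. The sketch below uses java.util.Map.Entry as a hypothetical stand-in for UpstreamEntry (it is not the library type) and shows that collecting entries into a map by key silently drops replicated entries, whereas plain iteration keeps them.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; Map.Entry stands in for UpstreamEntry. */
final class UpstreamDuplicatesSketch {
    /** Counts entries by plain iteration, so replicated keys are all counted. */
    static <K, V> int countAll(Iterator<Map.Entry<K, V>> upstream) {
        int cnt = 0;

        while (upstream.hasNext()) {
            upstream.next();
            cnt++;
        }

        return cnt;
    }

    /** Counts entries after collecting them by key, which collapses replicated keys. */
    static <K, V> int countDistinctKeys(Iterator<Map.Entry<K, V>> upstream) {
        Map<K, V> byKey = new HashMap<>();

        while (upstream.hasNext()) {
            Map.Entry<K, V> e = upstream.next();
            byKey.put(e.getKey(), e.getValue());
        }

        return byKey.size();
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> upstream = new ArrayList<>();

        // Key 1 appears twice, as it could after an upstream transformer replicates entries.
        upstream.add(new AbstractMap.SimpleEntry<>(1, "a"));
        upstream.add(new AbstractMap.SimpleEntry<>(1, "a"));
        upstream.add(new AbstractMap.SimpleEntry<>(2, "b"));

        System.out.println(countAll(upstream.iterator()));          // prints 3
        System.out.println(countDistinctKeys(upstream.iterator())); // prints 2
    }
}
```

Under this assumption, a build implementation that needs every replicated entry should process the iterator as a stream rather than index entries by key.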
GridGain In-Memory Computing Platform : ver. 8.9.14 Release Date : November 5 2024