public interface DataStreamGenerator
Modifier and Type | Field and Description |
---|---|
static int | FILL_CACHE_BATCH_SIZE - Size of batch for IgniteCache.putAll(Map). |
Modifier and Type | Method and Description |
---|---|
default DatasetBuilder<Vector,Double> | asDatasetBuilder(int datasetSize, IgniteBiPredicate<Vector,Double> filter, int partitions) - Converts the first datasetSize values of the stream to a DatasetBuilder. |
default DatasetBuilder<Vector,Double> | asDatasetBuilder(int datasetSize, IgniteBiPredicate<Vector,Double> filter, int partitions, UpstreamTransformerBuilder upstreamTransformerBuilder) - Converts the first datasetSize values of the stream to a DatasetBuilder. |
default DatasetBuilder<Vector,Double> | asDatasetBuilder(int datasetSize, int partitions) - Converts the first datasetSize values of the stream to a DatasetBuilder. |
default Map<Vector,Double> | asMap(int datasetSize) - Converts the first datasetSize values of the stream to a map. |
default DataStreamGenerator | blur(RandomProducer rnd) - Applies pseudorandom noise to the vectors, leaving the labels unchanged. |
default <K> void | fillCacheWithCustomKey(int datasetSize, IgniteCache<K,LabeledVector<Double>> cache, Function<LabeledVector<Double>,K> keyMapper) - Fills the given cache with labeled vectors from this generator, using a user-defined mapper from vectors to keys. |
default void | fillCacheWithVecHashAsKey(int datasetSize, IgniteCache<Integer,LabeledVector<Double>> cache) - Fills the given cache with labeled vectors from this generator as values and their hash codes as keys. |
default void | fillCacheWithVecUUIDAsKey(int datasetSize, IgniteCache<UUID,LabeledVector<Double>> cache) - Fills the given cache with labeled vectors from this generator as values and random UUIDs as keys. |
Stream<LabeledVector<Double>> | labeled() |
default Stream<LabeledVector<Double>> | labeled(IgniteFunction<Vector,Double> classifier) |
default DataStreamGenerator | mapVectors(IgniteFunction<Vector,Vector> f) - Applies a user-defined mapper to the vector stream, keeping the labels. |
default Stream<Vector> | unlabeled() |
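The asMap and mapVectors entries describe two stream operations: taking the first datasetSize values and transforming vectors while keeping their labels. The following is a minimal JDK-only sketch of that idea, not the library's implementation and not code against the GridGain ML classes. Map.Entry<List<Double>, Double> stands in for LabeledVector<Double>, and the class StreamToMapSketch, its labeled() source, and the sample data are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// JDK-only stand-in (assumption): no Ignite/GridGain types are used here.
final class StreamToMapSketch {
    /** Hypothetical infinite source: row i has features [i, i*i] and label i % 2. */
    static Stream<Map.Entry<List<Double>, Double>> labeled() {
        return Stream.iterate(0, i -> i + 1)
            .map(i -> Map.entry(List.of((double) i, (double) (i * i)), (double) (i % 2)));
    }

    /** Transforms the features only; the label of each row passes through unchanged. */
    static Stream<Map.Entry<List<Double>, Double>> mapVectors(
        Stream<Map.Entry<List<Double>, Double>> rows, UnaryOperator<List<Double>> f) {
        return rows.map(r -> Map.entry(f.apply(r.getKey()), r.getValue()));
    }

    /** Takes the first datasetSize rows and collects them into a features-to-label map. */
    static Map<List<Double>, Double> asMap(
        Stream<Map.Entry<List<Double>, Double>> rows, int datasetSize) {
        return rows.limit(datasetSize)
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        // Shift every feature by 0.5, then keep only the first three rows.
        Map<List<Double>, Double> data = asMap(
            mapVectors(labeled(), v -> v.stream().map(x -> x + 0.5).collect(Collectors.toList())),
            3);
        System.out.println(data.size());
    }
}
```

Because the source stream is infinite, limit(datasetSize) is what makes the collection terminate. In this sketch Collectors.toMap throws IllegalStateException on duplicate keys, so the stand-in source emits distinct feature lists; how the documented asMap treats equal vectors is not stated on this page.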
static final int FILL_CACHE_BATCH_SIZE
Size of batch for IgniteCache.putAll(Map).

Stream<LabeledVector<Double>> labeled()
Returns: Stream of LabeledVector according to the dataset shape.

default Stream<Vector> unlabeled()
Returns: Stream of Vector according to the dataset shape.

default Stream<LabeledVector<Double>> labeled(IgniteFunction<Vector,Double> classifier)
Parameters:
classifier - User-defined classifier for the vector stream.
Returns: Stream of LabeledVector according to the dataset shape and the user's classifier.

default DataStreamGenerator mapVectors(IgniteFunction<Vector,Vector> f)
Applies a user-defined mapper to the vector stream, keeping the labels.
Parameters:
f - Mapper applied to the vectors of the data stream.

default DataStreamGenerator blur(RandomProducer rnd)
Applies pseudorandom noise to the vectors, leaving the labels unchanged.
Parameters:
rnd - Generator of pseudorandom scalars that modify vector components while preserving labels.

default Map<Vector,Double> asMap(int datasetSize)
Converts the first datasetSize values of the stream to a map.
Parameters:
datasetSize - Dataset size.

default DatasetBuilder<Vector,Double> asDatasetBuilder(int datasetSize, int partitions)
Converts the first datasetSize values of the stream to a DatasetBuilder.
Parameters:
datasetSize - Dataset size.
partitions - Partitions count.

default DatasetBuilder<Vector,Double> asDatasetBuilder(int datasetSize, IgniteBiPredicate<Vector,Double> filter, int partitions)
Converts the first datasetSize values of the stream to a DatasetBuilder.
Parameters:
datasetSize - Dataset size.
filter - Data filter.
partitions - Partitions count.

default DatasetBuilder<Vector,Double> asDatasetBuilder(int datasetSize, IgniteBiPredicate<Vector,Double> filter, int partitions, UpstreamTransformerBuilder upstreamTransformerBuilder)
Converts the first datasetSize values of the stream to a DatasetBuilder.
Parameters:
datasetSize - Dataset size.
filter - Data filter.
partitions - Partitions count.
upstreamTransformerBuilder - Upstream transformer builder.

default <K> void fillCacheWithCustomKey(int datasetSize, IgniteCache<K,LabeledVector<Double>> cache, Function<LabeledVector<Double>,K> keyMapper)
Fills the given cache with labeled vectors from this generator, using a user-defined mapper from vectors to keys.
Type Parameters:
K - Key type.
Parameters:
datasetSize - Number of rows to put.
cache - Cache.
keyMapper - Mapping from vectors to keys.

default void fillCacheWithVecHashAsKey(int datasetSize, IgniteCache<Integer,LabeledVector<Double>> cache)
Fills the given cache with labeled vectors from this generator as values and their hash codes as keys.
Parameters:
datasetSize - Number of rows to put.
cache - Cache.

default void fillCacheWithVecUUIDAsKey(int datasetSize, IgniteCache<UUID,LabeledVector<Double>> cache)
Fills the given cache with labeled vectors from this generator as values and random UUIDs as keys.
Parameters:
datasetSize - Number of rows to put.
cache - Cache.
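
The fillCache methods and FILL_CACHE_BATCH_SIZE together suggest that rows are written to the cache in batches through putAll. One way such a batched fill could look is sketched below, using JDK types only. A plain Map stands in for IgniteCache, the batch size is passed as a parameter because this page does not state the constant's value, and the class BatchedFillSketch and its fill method are hypothetical rather than the library's implementation.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Stream;

// JDK-only stand-in (assumption): a plain Map plays the role of the cache.
final class BatchedFillSketch {
    /**
     * Puts the first datasetSize rows into the cache in batches of at most batchSize,
     * deriving each key with keyMapper. Returns the number of putAll calls made.
     */
    static <K, V> int fill(Stream<V> rows, int datasetSize, Map<K, V> cache,
        Function<V, K> keyMapper, int batchSize) {
        if (batchSize <= 0)
            throw new IllegalArgumentException("batchSize must be positive");

        Map<K, V> batch = new HashMap<>();
        int flushes = 0;
        Iterator<V> it = rows.limit(datasetSize).iterator();

        while (it.hasNext()) {
            V row = it.next();
            batch.put(keyMapper.apply(row), row);

            // Flush a full batch with a single putAll call.
            if (batch.size() >= batchSize) {
                cache.putAll(batch);
                batch.clear();
                flushes++;
            }
        }

        // Flush the remaining partial batch, if any.
        if (!batch.isEmpty()) {
            cache.putAll(batch);
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = new HashMap<>();
        int flushes = fill(Stream.iterate(0, i -> i + 1), 10, cache, i -> "key-" + i, 4);
        System.out.println(flushes + " " + cache.size()); // prints "3 10"
    }
}
```

In this sketch a key mapper that returns the same key for two rows silently overwrites the earlier row, so hash-based keys can yield fewer entries than datasetSize, while randomly generated keys such as UUIDs make collisions practically impossible.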
GridGain In-Memory Computing Platform : ver. 8.9.15 Release Date : December 3 2024