mlpack
2.2.5
|
The LSHSearch class; this class builds a hash on the reference set and uses this hash to compute the distance-approximate nearest-neighbors of the given queries. More...
Public Member Functions | |
LSHSearch (const arma::mat &referenceSet, const arma::cube &projections, const double hashWidth=0.0, const size_t secondHashSize=99901, const size_t bucketSize=500) | |
This function initializes the LSH class. More... | |
LSHSearch (const arma::mat &referenceSet, const size_t numProj, const size_t numTables, const double hashWidth=0.0, const size_t secondHashSize=99901, const size_t bucketSize=500) | |
This function initializes the LSH class. More... | |
LSHSearch () | |
Create an untrained LSH model. More... | |
~LSHSearch () | |
Clean memory. More... | |
size_t | BucketSize () const |
Get the bucket size of the second hash. More... | |
size_t | DistanceEvaluations () const |
Return the number of distance evaluations performed. More... | |
size_t & | DistanceEvaluations () |
Modify the number of distance evaluations performed. More... | |
size_t | NumProjections () const |
Get the number of projections. More... | |
const arma::mat & | Offsets () const |
Get the offsets 'b' for each of the projections. (One 'b' per column.) More... | |
const arma::cube & | Projections () |
Get the projection tables. More... | |
void | Projections (const arma::cube &projTables) |
Change the projection tables (this retrains the LSH model). More... | |
const arma::mat & | ReferenceSet () const |
Return the reference dataset. More... | |
void | Search (const arma::mat &querySet, const size_t k, arma::Mat< size_t > &resultingNeighbors, arma::mat &distances, const size_t numTablesToSearch=0, const size_t T=0) |
Compute the nearest neighbors of the points in the given query set and store the output in the given matrices. More... | |
void | Search (const size_t k, arma::Mat< size_t > &resultingNeighbors, arma::mat &distances, const size_t numTablesToSearch=0, size_t T=0) |
Compute the nearest neighbors and store the output in the given matrices. More... | |
const std::vector< arma::Col < size_t > > & | SecondHashTable () const |
Get the second hash table. More... | |
const arma::vec & | SecondHashWeights () const |
Get the weights of the second hash. More... | |
template<typename Archive > | |
void | Serialize (Archive &ar, const unsigned int version) |
Serialize the LSH model. More... | |
void | Train (const arma::mat &referenceSet, const size_t numProj, const size_t numTables, const double hashWidth=0.0, const size_t secondHashSize=99901, const size_t bucketSize=500, const arma::cube &projection=arma::cube()) |
Train the LSH model on the given dataset. More... | |
Static Public Member Functions | |
static double | ComputeRecall (const arma::Mat< size_t > &foundNeighbors, const arma::Mat< size_t > &realNeighbors) |
Compute the recall (% of neighbors found) given the neighbors returned by LSHSearch::Search and a "ground truth" set of neighbors. More... | |
The LSHSearch class; this class builds a hash on the reference set and uses this hash to compute the distance-approximate nearest-neighbors of the given queries.
SortPolicy | The sort policy for distances; see NearestNeighborSort. |
Definition at line 62 of file lsh_search.hpp.
LSHSearch | ( | const arma::mat & | referenceSet, |
const arma::cube & | projections, | ||
const double | hashWidth = 0.0 , |
||
const size_t | secondHashSize = 99901 , |
||
const size_t | bucketSize = 500 |
||
) |
This function initializes the LSH class.
It builds the hash on the reference set with 2-stable distributions. See the individual functions performing the hashing for details on how the hashing is done.
referenceSet | Set of reference points and the set of queries. |
projections | Cube of projection tables. For a cube of size (a, b, c) we set numProj = a, numTables = c. b is the reference set dimensionality. |
hashWidth | The width of hash for every table. If 0 (the default) is provided, then the hash width is automatically obtained by computing the average pairwise distance of 25 pairs. This should be a reasonable upper bound on the nearest-neighbor distance in general. |
secondHashSize | The size of the second hash table. This should be a large prime number. |
bucketSize | The size of the bucket in the second hash table. This is the maximum number of points that can be hashed into single bucket. A value of 0 indicates that there is no limit (so the second hash table can be arbitrarily large—be careful!). |
LSHSearch | ( | const arma::mat & | referenceSet, |
const size_t | numProj, | ||
const size_t | numTables, | ||
const double | hashWidth = 0.0 , |
||
const size_t | secondHashSize = 99901 , |
||
const size_t | bucketSize = 500 |
||
) |
This function initializes the LSH class.
It builds the hash one the reference set using the provided projections. See the individual functions performing the hashing for details on how the hashing is done.
referenceSet | Set of reference points and the set of queries. |
numProj | Number of projections in each hash table (anything between 10-50 might be a decent choice). |
numTables | Total number of hash tables (anything between 10-20 should suffice). |
hashWidth | The width of hash for every table. If 0 (the default) is provided, then the hash width is automatically obtained by computing the average pairwise distance of 25 pairs. This should be a reasonable upper bound on the nearest-neighbor distance in general. |
secondHashSize | The size of the second hash table. This should be a large prime number. |
bucketSize | The size of the bucket in the second hash table. This is the maximum number of points that can be hashed into single bucket. A value of 0 indicates that there is no limit (so the second hash table can be arbitrarily large—be careful!). |
LSHSearch | ( | ) |
~LSHSearch | ( | ) |
Clean memory.
|
inline |
Get the bucket size of the second hash.
Definition at line 252 of file lsh_search.hpp.
|
static |
Compute the recall (% of neighbors found) given the neighbors returned by LSHSearch::Search and a "ground truth" set of neighbors.
The recall returned will be in the range [0, 1].
foundNeighbors | Set of neighbors to compute recall of. |
realNeighbors | Set of "ground truth" neighbors to compute recall against. |
|
inline |
Return the number of distance evaluations performed.
Definition at line 235 of file lsh_search.hpp.
|
inline |
Modify the number of distance evaluations performed.
Definition at line 237 of file lsh_search.hpp.
|
inline |
Get the number of projections.
Definition at line 243 of file lsh_search.hpp.
|
inline |
Get the offsets 'b' for each of the projections. (One 'b' per column.)
Definition at line 246 of file lsh_search.hpp.
|
inline |
Get the projection tables.
Definition at line 259 of file lsh_search.hpp.
|
inline |
Change the projection tables (this retrains the LSH model).
Definition at line 262 of file lsh_search.hpp.
References LSHSearch< SortPolicy >::Train().
|
inline |
Return the reference dataset.
Definition at line 240 of file lsh_search.hpp.
void Search | ( | const arma::mat & | querySet, |
const size_t | k, | ||
arma::Mat< size_t > & | resultingNeighbors, | ||
arma::mat & | distances, | ||
const size_t | numTablesToSearch = 0 , |
||
const size_t | T = 0 |
||
) |
Compute the nearest neighbors of the points in the given query set and store the output in the given matrices.
The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.
querySet | Set of query points. |
k | Number of neighbors to search for. |
resultingNeighbors | Matrix storing lists of neighbors for each query point. |
distances | Matrix storing distances of neighbors for each query point. |
numTablesToSearch | This parameter allows the user to have control over the number of hash tables to be searched. This allows the user to pick the number of tables it can afford for the time available without having to build hashing for every table size. By default, this is set to zero in which case all tables are considered. |
T | The number of additional probing bins to examine with multiprobe LSH. If T = 0, classic single-probe LSH is run (default). |
void Search | ( | const size_t | k, |
arma::Mat< size_t > & | resultingNeighbors, | ||
arma::mat & | distances, | ||
const size_t | numTablesToSearch = 0 , |
||
size_t | T = 0 |
||
) |
Compute the nearest neighbors and store the output in the given matrices.
The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.
k | Number of neighbors to search for. |
resultingNeighbors | Matrix storing lists of neighbors for each query point. |
distances | Matrix storing distances of neighbors for each query point. |
numTablesToSearch | This parameter allows the user to have control over the number of hash tables to be searched. This allows the user to pick the number of tables it can afford for the time available without having to build hashing for every table size. By default, this is set to zero in which case all tables are considered. |
|
inline |
Get the second hash table.
Definition at line 255 of file lsh_search.hpp.
|
inline |
Get the weights of the second hash.
Definition at line 249 of file lsh_search.hpp.
void Serialize | ( | Archive & | ar, |
const unsigned int | version | ||
) |
Serialize the LSH model.
ar | Archive to serialize to. |
void Train | ( | const arma::mat & | referenceSet, |
const size_t | numProj, | ||
const size_t | numTables, | ||
const double | hashWidth = 0.0 , |
||
const size_t | secondHashSize = 99901 , |
||
const size_t | bucketSize = 500 , |
||
const arma::cube & | projection = arma::cube() |
||
) |
Train the LSH model on the given dataset.
If a correctly-sized projection cube is not provided, this means building new hash tables. Otherwise, we use the projections provided by the user.
referenceSet | Set of reference points and the set of queries. |
numProj | Number of projections in each hash table (anything between 10-50 might be a decent choice). |
numTables | Total number of hash tables (anything between 10-20 should suffice). |
hashWidth | The width of hash for every table. If 0 (the default) is provided, then the hash width is automatically obtained by computing the average pairwise distance of 25 pairs. This should be a reasonable upper bound on the nearest-neighbor distance in general. |
secondHashSize | The size of the second hash table. This should be a large prime number. |
bucketSize | The size of the bucket in the second hash table. This is the maximum number of points that can be hashed into single bucket. A value of 0 indicates that there is no limit (so the second hash table can be arbitrarily large—be careful!). |
projections | Cube of projection tables. For a cube of size (a, b, c) we set numProj = a, numTables = c. b is the reference set dimensionality. |
Referenced by LSHSearch< SortPolicy >::Projections().