% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/searchMetadataText.R
\name{searchMetadataText}
\alias{searchMetadataText}
\alias{gypsum.search.clause}
\alias{Ops.gypsum.search.clause}
\alias{defineTextQuery}
\alias{searchMetadataTextFilter}
\title{Text search on the metadata database}
\usage{
searchMetadataText(path, query, latest = TRUE, include.metadata = TRUE)

defineTextQuery(text, field = NULL, partial = FALSE)

searchMetadataTextFilter(query, pid.name = "paths.pid")
}
\arguments{
\item{path}{String containing a path to a SQLite file, usually obtained via \code{\link{fetchMetadataDatabase}}.}

\item{query}{Character vector specifying the query to execute.
Alternatively, a \code{gypsum.search.clause} produced by \code{defineTextQuery}.}

\item{latest}{Logical scalar indicating whether to only search for matches within the latest version of each asset.}

\item{include.metadata}{Logical scalar indicating whether metadata should be returned.}

\item{text}{String containing the text to query on.
This will be automatically tokenized; see Details.}

\item{field}{String specifying the name of the metadata field in which to search for \code{text}.
If \code{NULL}, the search is performed on all available metadata fields.}

\item{partial}{Logical scalar indicating whether \code{text} contains SQLite wildcards (\code{\%}, \code{_}) for a partial search.
If \code{TRUE}, the wildcards are preserved during tokenization.}

\item{pid.name}{String containing the name/alias of the column of the \code{paths} table that contains the path ID.}
}
\value{
For \code{searchMetadataText}, a data frame containing the search results.
\itemize{
\item The \code{project}, \code{asset} and \code{version} columns contain the identity of the version with matching metadata.
\item The \code{path} column contains the suffix of the object key of the metadata document,
i.e., the relative \dQuote{path} within the version's \dQuote{directory} to the metadata document.
The full object key of the document inside the bucket is defined as \code{{project}/{asset}/{version}/{path}}.
\item If \code{include.metadata=TRUE}, a \code{metadata} column is present with the nested metadata for each match.
\item If \code{latest=FALSE}, a \code{latest} column is present indicating whether the matching version is the latest for its asset.
Otherwise, only the latest version is returned.
}

For \code{searchMetadataTextFilter}, a list containing \code{where}, a string that can be directly used as a WHERE filter condition in a SQL SELECT statement;
and \code{parameters}, the parameter bindings to be used in \code{where}.
The return value may also be \code{NULL} if the query has no well-defined filter.

For \code{defineTextQuery}, a \code{gypsum.search.clause} object that can be used in \code{|}, \code{&} and \code{!} to create more complex queries involving multiple text clauses.
}
\description{
Perform a text search on a SQLite database containing metadata from the gypsum backend.
This is based on a precomputed tokenization of all string properties in each metadata document;
see \url{https://github.com/ArtifactDB/bioconductor-metadata-index} for details.
}
\details{
Each string is tokenized by converting it to lower case and splitting it on characters that are not Unicode letters/numbers or a dash.
Diacritics are currently not removed, so users will need to convert them to ASCII themselves.
If a text query involves only non-letter/number/dash characters, the filter will not be well-defined and will be ignored when constructing SQL statements.

For convenience, a non-empty character vector may be used in \code{query}.
A character vector of length 1 is treated as shorthand for a text query with default arguments in \code{defineTextQuery}.
A character vector of length greater than 1 is treated as shorthand for an AND operation on default text queries for each of the individual strings.
}
\examples{
path <- fetchMetadataDatabase()
searchMetadataText(path, c("mouse", "brain"), include.metadata=FALSE)
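
# As described in Details, the character vector above is shorthand for an
# AND of default text queries. The explicit form below is a sketch of what
# should be an equivalent search; 'explicit' is just an illustrative name.
explicit <- defineTextQuery("mouse") & defineTextQuery("brain")
searchMetadataText(path, explicit, include.metadata=FALSE)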

# Now for a slightly more complex query:
is.mouse <- defineTextQuery("10090", field="taxonomy_id")
query <- (defineTextQuery("brain") | defineTextQuery("pancreas")) & is.mouse
searchMetadataText(path, query, include.metadata=FALSE)

# Throwing in some wildcards.
has.neuro <- defineTextQuery("Neuro\%", partial=TRUE)
searchMetadataText(path, has.neuro, include.metadata=FALSE)
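
# A sketch of inspecting the SQL filter generated for a query, e.g., for use
# in a custom SELECT statement. 'filter.query' and 'filt' are illustrative
# names; the result may be NULL if the query has no well-defined filter.
filter.query <- defineTextQuery("brain") & defineTextQuery("10090", field="taxonomy_id")
filt <- searchMetadataTextFilter(filter.query)
filt$where # string for use as a WHERE condition
filt$parameters # parameter bindings for 'where'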

}
\seealso{
\code{\link{fetchMetadataDatabase}}, to download and cache the database files.

\url{https://github.com/ArtifactDB/bioconductor-metadata-index}, for details on the SQLite file contents and table structure.
}
\author{
Aaron Lun
}
