Box keeps a search index for all files and folders stored in Box. The words in an item's name and, for supported file types, its content are stored in this index, and whenever content is added, updated, or deleted in Box, the index is updated accordingly. When a search is performed, the API looks in the search index for files and folders that match the query.
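The add, update, and delete cycle described above can be pictured with a toy inverted index. This is a minimal Python sketch under loose assumptions, not how Box implements its index: the class ToySearchIndex and its methods are hypothetical, and tokenization is a simple lowercase split on letters and digits. Updating an item replaces its previous entry, so only the latest text stays searchable.

```python
import re
from collections import defaultdict


class ToySearchIndex:
    """Hypothetical in-memory inverted index illustrating the
    add/update/delete cycle. Not Box's implementation."""

    def __init__(self):
        # word -> set of item IDs whose indexed text contains that word
        self._postings = defaultdict(set)
        # item ID -> set of words currently indexed for that item
        self._words_by_item = {}

    @staticmethod
    def _tokenize(text):
        # Lowercase, then keep runs of ASCII letters and digits as words
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def upsert(self, item_id, text):
        # Adding or updating replaces whatever was indexed before
        self.delete(item_id)
        words = self._tokenize(text)
        self._words_by_item[item_id] = words
        for word in words:
            self._postings[word].add(item_id)

    def delete(self, item_id):
        # Remove every posting that points at this item
        for word in self._words_by_item.pop(item_id, set()):
            self._postings[word].discard(item_id)
            if not self._postings[word]:
                del self._postings[word]

    def search(self, word):
        # .get() avoids creating empty entries in the defaultdict
        return set(self._postings.get(word.lower(), set()))
```

For example, after upsert("f1", "Quarterly budget draft") a search for "budget" returns {"f1"}; after upsert("f1", "Quarterly forecast") the same search returns an empty set.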
After a file is uploaded or modified, it can take some time before it is fully indexed and searchable. In most cases, new or changed files become available via Box search within 10 minutes, but indexing time depends on the current service load and can occasionally take longer.
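Because indexing is asynchronous, code that uploads a file and then immediately searches for it may need to retry. The sketch below polls a caller-supplied check with exponential backoff. The function wait_until_searchable, its is_found callback, and the default 15-minute timeout (headroom beyond the typical 10 minutes, not a documented guarantee) are all hypothetical choices; sleep and clock are injectable so the logic can be tested without actually waiting.

```python
import time


def wait_until_searchable(is_found, timeout_s=15 * 60, first_delay_s=5.0,
                          max_delay_s=60.0, sleep=time.sleep,
                          clock=time.monotonic):
    """Poll is_found() with exponential backoff until it returns True
    or timeout_s elapses. is_found is a hypothetical caller-supplied
    function, e.g. one that runs a search and checks for a file ID."""
    deadline = clock() + timeout_s
    delay = first_delay_s
    while True:
        if is_found():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline
        sleep(min(delay, remaining))
        # Double the wait each round, capped at max_delay_s
        delay = min(delay * 2, max_delay_s)
```

In practice, is_found would run a search and look for the uploaded file's ID in the results. A False return does not mean indexing failed; the file may still be waiting to be indexed under heavy load.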
Only content that the authenticated user has access to (items they can preview and/or view) will be returned in the search results.
In other words, a user needs to either own an item or be a collaborator on it for it to show up in the search results. If a user doesn't have access to an item, or has only been given the item via a shared link, the item will not appear in the search results.
One exception is that items recently accessed via a shared link can be included in the search results by setting the include_recent_shared_links query parameter to true.
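A minimal sketch of a search request that opts in to recently accessed shared-link items, using only the Python standard library. It assumes the GET https://api.box.com/2.0/search endpoint with query and include_recent_shared_links parameters and bearer-token authentication; build_search_request is a hypothetical helper, and access_token is a placeholder for a token obtained through your own authentication flow. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; confirm against the Box API reference
SEARCH_URL = "https://api.box.com/2.0/search"


def build_search_request(query, access_token,
                         include_recent_shared_links=False):
    """Build (but do not send) a GET request to the search endpoint.
    build_search_request is a hypothetical helper."""
    params = {"query": query}
    if include_recent_shared_links:
        # Also return items recently accessed via a shared link
        params["include_recent_shared_links"] = "true"
    return Request(
        f"{SEARCH_URL}?{urlencode(params)}",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )
```

To send it, pass the request to urllib.request.urlopen and parse the JSON body; check the exact response shape against the Box API reference.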
Prefix Matching and Wildcard Search
Trailing wildcards (also known as prefix matching) are implicitly included in search results because of the way text is indexed. Searching for "Bo" returns items with titles containing "Boxer"; it is the equivalent of searching for "Bo%" in traditional search engines. Box does not support traditional wildcard notation such as "%ox%".
While prefix matching is supported on titles, Box does not support prefix matching on body content, suffix matching in the title or body content, or infix (middle of the word) matching in the title or body content. For example, a search for "cal" would match a file named "California" but not one named "recall". It would also not match body content in which "cal" appears as a prefix, infix, or suffix of a word.
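The title-matching rules above can be summarized as a toy predicate. This is an illustrative Python model, not Box's matching code: title_prefix_match is hypothetical, it handles a single-word query only, and it treats any run of letters, digits, or underscores in the title as a word.

```python
import re


def title_prefix_match(query, title):
    """Hypothetical model of title prefix matching: the query matches if
    any word in the title starts with it (case-insensitive). Infix and
    suffix matches are not modeled, and characters such as % are treated
    literally rather than as wildcards."""
    q = query.lower()
    words = re.findall(r"\w+", title.lower())
    return any(word.startswith(q) for word in words)
```

With this model, "cal" matches "California.pdf" but not "recall.docx", and "%ox%" matches nothing because % has no wildcard meaning.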
Box Search uses stemming to match terms from the query to terms in the index. Because of this, words that share the same stem may be included in the result set, even if they do not contain the exact form used in the query. For example, "run" and "running" map to the same stem, so a search for "running" may return a document containing "run" in the title.
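To make the stemming example concrete, the sketch below maps "running" and "run" to the same stem with a deliberately crude suffix stripper. The stemming algorithm Box uses is not documented here; crude_stem and stems_match are hypothetical stand-ins, and a real stemmer (for example, a Porter-style stemmer) handles far more cases, including irregular forms such as "ran".

```python
def crude_stem(word):
    """Hypothetical, deliberately crude suffix stripper; NOT Box's stemmer."""
    w = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three letters would remain
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            # Undo a doubled final consonant left by -ing/-ed ("runn" -> "run")
            if suffix != "s" and w[-1] == w[-2] and w[-1] not in "aeiou":
                w = w[:-1]
            break
    return w


def stems_match(query_word, indexed_word):
    # Two words match when they reduce to the same stem
    return crude_stem(query_word) == crude_stem(indexed_word)
```

Here stems_match("running", "run") is True, while "ran" is left unchanged, which is why real stemmers add special handling for irregular forms.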
File Content Searching
The content within files is also stored within the Box search index. The following file types allow searching for their content:
Indexed Text per Document
The Box search index stores up to 10,000 bytes (about 10,000 characters of English text) per document for accounts at the Business level and above. The amount actually indexed can vary from document to document depending on language, Box’s indexing method, and document type.
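The gap between bytes and characters comes from multi-byte text encodings. Assuming the limit is counted in UTF-8 bytes (the encoding is not stated above), the hypothetical helper below shows that a 10,000-byte budget holds 10,000 ASCII characters but only 3,333 Japanese characters, which take 3 bytes each in UTF-8.

```python
def truncate_to_byte_budget(text, budget=10_000):
    """Hypothetical helper: keep as much of text as fits in budget UTF-8
    bytes without splitting a multi-byte character. UTF-8 is an
    assumption, not a documented detail of the Box index."""
    raw = text.encode("utf-8")[:budget]
    # errors="ignore" drops a trailing partial character, if any
    return raw.decode("utf-8", errors="ignore")
```

For instance, "é" takes 2 bytes in UTF-8, so the same budget holds 5,000 of them.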
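Box does not currently perform OCR (optical character recognition) on its documents, so text that exists only as an image, such as a scanned page, is not indexed.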
Search indexes content only from the current version of a document, so results are not cluttered with matches from outdated versions. You cannot use search to query non-current versions of a document.
Box search supports the following languages: Chinese, English, French, German, Italian, Japanese, and Spanish. Box does not support indexing of multiple languages within a single document.
Searching the trash is available via the API by using the
trash_content query parameter.
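A minimal sketch of building a search URL that looks in the trash, using only the Python standard library. It assumes the GET https://api.box.com/2.0/search endpoint and the trash_content values non_trashed_only (the default), trashed_only, and all_items as listed in the Box API reference; confirm them there before relying on this. search_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed values of trash_content; confirm against the Box API reference
TRASH_CONTENT_VALUES = {"non_trashed_only", "trashed_only", "all_items"}


def search_url(query, trash_content="non_trashed_only"):
    """Hypothetical helper returning a search URL scoped by trash state."""
    if trash_content not in TRASH_CONTENT_VALUES:
        raise ValueError(f"unsupported trash_content: {trash_content!r}")
    params = {"query": query, "trash_content": trash_content}
    return "https://api.box.com/2.0/search?" + urlencode(params)
```

Validating the value locally catches typos before a request is sent; the request itself still needs the same bearer-token authentication as any other search call.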
Check our community article for the latest details on Search in Box.