User Guides
Milvus Search Tool: User Guide
Overview
The Milvus Search tool provides powerful vector-based search capabilities for your text collection. This tool allows you to perform semantic searches, exact text matching, and hybrid searches that combine both approaches. It’s particularly useful for researchers who need to find documents based on conceptual similarity as well as specific textual content.
Getting Started
The Milvus Search tool works with a vector database backend that needs to be initialized before you can start searching. The tool comes pre-configured with default settings for your collection.
Basic Commands
To get started, launch the program from the Unix command line as follows:
$ python milvus_search
Once launched, milvus_search responds with a menu of options. The commands it responds to are described here:
Creating the Collection
create
This command initializes the Milvus collection and indexes all documents from the configured directory. You only need to run this once when setting up the tool for the first time, or if you want to rebuild the index.
Searching Documents
search <query>
Performs a hybrid search combining semantic and lexical (keyword) approaches. This finds documents that are conceptually related to your query even if they don’t contain the exact search terms.
Example:
search joseph in venice
Exact Text Matching
match <exact text>
Searches for documents containing the exact text you specify. This is useful when you need to find specific phrases or terms.
Example:
match jesus christ
Combined Hybrid Search
hybrid <query> match:<exact text>
Combines semantic search with exact text matching. This allows you to find documents that are conceptually related to your query but must also contain specific text.
Example:
hybrid joseph in venice with canals match:jesus christ
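As a rough illustration of the syntax above, the following Python sketch splits a hybrid command's arguments into the semantic query and the exact text. The function name parse_hybrid is hypothetical, and treating the first match: marker as the separator is an assumption made for this example; it is not a description of how milvus_search parses its input.

```python
def parse_hybrid(args: str) -> tuple[str, str]:
    """Split '<query> match:<exact text>' into (query, exact_text).

    Assumption: the first 'match:' marker separates the two parts.
    """
    query, marker, exact = args.partition("match:")
    if not marker or not query.strip() or not exact.strip():
        raise ValueError("expected: <query> match:<exact text>")
    return query.strip(), exact.strip()


# Prints: ('joseph in venice with canals', 'jesus christ')
print(parse_hybrid("joseph in venice with canals match:jesus christ"))
```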
Term Indexing
index <term1,term2,...>
Creates a comprehensive index of all occurrences of the specified terms across your document collection. Results are saved to a text file.
Example:
index jesus,christ,holy
Adjusting Search Parameters
weights <dense_weight> <sparse_weight>
Adjusts the balance between semantic search (dense) and keyword search (sparse). Higher dense weight emphasizes conceptual similarity, while higher sparse weight emphasizes keyword matching.
Example:
weights 0.7 0.3
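To make the effect of the two weights concrete, here is a minimal Python sketch of a weighted score combination. It assumes that dense and sparse scores are already normalized to the 0-1 range and that the final score is a simple weighted sum. The function name combine_scores and the document IDs are hypothetical, and the tool's actual ranking method may differ.

```python
def combine_scores(dense: dict, sparse: dict,
                   dense_weight: float = 0.7,
                   sparse_weight: float = 0.3) -> list:
    """Rank documents by a weighted sum of dense and sparse scores.

    Assumption: both score sets are normalized to 0-1; a document that is
    missing from one set contributes 0 for that part.
    """
    combined = {}
    for doc_id in dense.keys() | sparse.keys():
        combined[doc_id] = (dense_weight * dense.get(doc_id, 0.0)
                            + sparse_weight * sparse.get(doc_id, 0.0))
    # Highest score first; ties are broken by document ID for stable output.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores: doc_a is a strong semantic match, doc_b a strong keyword match.
dense_scores = {"doc_a": 0.9, "doc_b": 0.4}
sparse_scores = {"doc_a": 0.1, "doc_b": 0.8}

print(combine_scores(dense_scores, sparse_scores, 0.7, 0.3)[0][0])  # doc_a ranks first
print(combine_scores(dense_scores, sparse_scores, 0.3, 0.7)[0][0])  # doc_b ranks first
```

With the same underlying scores, shifting the weights from 0.7/0.3 to 0.3/0.7 reverses the ranking, which is the behavior the weights command is meant to control.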
limit <number>
Sets the maximum number of search results to display. The current upper bound is 5000.
Example:
limit 50
Exiting the Program
exit
Closes the Milvus Search tool.
Understanding Search Results
For each search result, you’ll see:
- Result number: Position in the results list
- Filename: Name of the document file
- Original file: For chunked documents, shows the source file and chunk number
- Score: Relevance score (higher is more relevant)
- Occurrences: For text matching, shows the number of times the term appears
- Preview: A snippet of text from the document
- Text length: Total size of the document in characters
Advanced Features
Document Chunking
Large documents are automatically split into manageable chunks with overlap to maintain context. When searching, chunks from the same document are consolidated in the results.
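The idea of overlapping chunks can be sketched in a few lines of Python. The character-based sizes (chunk_size, overlap) and the function name chunk_text are assumptions chosen for illustration; the chunk size and overlap the tool actually uses are not documented here.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into chunks of chunk_size characters.

    Each chunk repeats the last `overlap` characters of the previous one,
    so context is preserved across chunk boundaries.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


# Prints: ['abcd', 'cdef', 'efgh', 'ghij']
print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```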
Highlighting
When using text matching, the matched terms are highlighted in the preview with asterisks (**Term**).
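A highlighting step of this kind could look like the Python sketch below. The function name highlight is hypothetical, and case-insensitive matching is an assumption; the tool may match differently.

```python
import re


def highlight(preview: str, term: str) -> str:
    """Wrap each occurrence of term in double asterisks, ignoring case."""
    if not term:
        return preview
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    # Keep the original casing of the matched text inside the asterisks.
    return pattern.sub(lambda match: f"**{match.group(0)}**", preview)


# Prints: the grace of **Jesus Christ** our Lord
print(highlight("the grace of Jesus Christ our Lord", "jesus christ"))
```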
Term Indexing
The index command creates a comprehensive report of all occurrences of specified terms:
- Lists all documents containing each term
- Counts total occurrences
- Shows each occurrence in context
- Saves results to a timestamped text file
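The report described above can be approximated with the following Python sketch. It assumes plain substring matching that ignores case, a fixed number of context characters on each side, and a hypothetical file-name pattern for the timestamped report. None of these details are confirmed by the tool itself.

```python
import re
from datetime import datetime


def build_term_index(documents: dict, terms: list, context: int = 30) -> dict:
    """For each term, collect the matching documents, the total count,
    and every occurrence with `context` characters on either side."""
    index = {}
    for term in terms:
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        hits = []
        for name, text in documents.items():
            for match in pattern.finditer(text):
                start = max(0, match.start() - context)
                end = min(len(text), match.end() + context)
                hits.append((name, text[start:end]))
        index[term] = {
            "documents": sorted({name for name, _ in hits}),
            "total": len(hits),
            "occurrences": hits,
        }
    return index


def report_filename(now: datetime) -> str:
    """Hypothetical naming scheme for the timestamped report file."""
    return now.strftime("term_index_%Y%m%d_%H%M%S.txt")


docs = {"a.txt": "Prayer and worship. prayer again.", "b.txt": "No match here."}
index = build_term_index(docs, ["prayer", "worship"])
print(index["prayer"]["total"], index["prayer"]["documents"])  # 2 ['a.txt']
```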
Example Workflow
- Initialize the collection (first-time setup):
    create
- Perform a broad semantic search:
    search religious practices in 17th century England
- Refine with exact text matching:
    hybrid religious practices match:prayer book
- Adjust weights for better results:
    weights 0.8 0.2
- Create an index of specific terms:
    index prayer,worship,communion
Tips for Effective Searching
- Start with semantic search for broad topics and concepts, then refine with text matching.
- Adjust the weights based on your search needs:
  - Higher dense weight (0.7-0.9) for conceptual, topic-based searches
  - Higher sparse weight (0.7-0.9) for keyword-focused searches
  - Balanced weights (0.5/0.5) for general searching
- Use term indexing to create comprehensive lists of all occurrences of key terms in your corpus.
- Combine with other tools in the suite:
  - Use Milvus Search to find relevant documents
  - Use the Document Viewer to examine the full text of interesting results
  - Use Index Search for specialized variant-aware searches
- Increase the limit when performing broader searches to see more potential matches.
Troubleshooting
- No results found: Try broadening your search terms, using fewer terms, or adjusting the weights to favor dense vectors (semantic search).
- Too many irrelevant results: Use the hybrid command to combine semantic search with exact text matching, or adjust the weights to favor sparse vectors (keyword search).
- Search is too slow: Reduce the search limit or use more specific search terms.
- “Collection not found” error: Run the create command to initialize the collection.
Using with Other Tools
For a comprehensive research workflow:
- Use Milvus Search for semantic and hybrid searching to find conceptually related documents
- Use Index Search when you need to account for historical spelling variants or when you need a quick search on just a few words
- Use the Document Viewer to examine full documents identified by either search method
This combined approach gives you powerful tools for analyzing historical documents from multiple perspectives.
Index Search Tool: User Guide
Overview
The Index Search tool allows you to search through a collection of historical texts with support for historical spelling variants. This makes it particularly useful for research involving early modern texts where spelling was not standardized.
Getting Started
The search tool comes with a pre-built index file that contains information about all documents in the collection. When you run the program, it will either load this existing index or build a new one if it doesn’t exist.
Launch the Program
To get started, launch the program from the Unix command line as follows:
$ python index_search
Basic Search
To perform a search:
- Run the program
- When prompted, enter your search term(s)
- Follow the additional prompts to refine your search
- Review the search results
Search Options
When performing a search, you’ll be prompted for several options:
- Search Terms: Enter one or more words to search for. The system will automatically generate historical spelling variants. Currently, searching for terms without variants is not permitted.
- Before Year: Enter a year to only see results published before that year (leave empty for no filter).
- After Year: Enter a year to only see results published after that year (leave empty for no filter).
- Results Per Page: Enter the number of results you want to see at once (default is 10).
- AND/OR Operation: Specify whether all terms must be present (‘and’) or whether any one of the terms is enough (‘or’). The default is ‘or’.
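The way these options interact can be sketched in Python as follows. This is a simplified illustration, not the program's code: it compares plain lower-cased terms rather than spelling variants, and it assumes that a document without a parseable year skips the date checks. The function name passes_filters is hypothetical.

```python
from typing import Optional


def passes_filters(doc_terms: set, doc_year: Optional[int], search_terms: list,
                   operation: str = "or",
                   before: Optional[int] = None,
                   after: Optional[int] = None) -> bool:
    """Return True if a document satisfies the term and date options."""
    wanted = {term.lower() for term in search_terms}
    found = wanted & {term.lower() for term in doc_terms}
    if operation == "and":
        terms_ok = found == wanted   # every search term must be present
    elif operation == "or":
        terms_ok = bool(found)       # at least one search term is present
    else:
        raise ValueError("operation must be 'and' or 'or'")
    if not terms_ok:
        return False
    # Assumption: a document with no parseable year is not date-filtered.
    if doc_year is None:
        return True
    if before is not None and doc_year >= before:
        return False
    if after is not None and doc_year <= after:
        return False
    return True


# A 1538 document containing both terms, filtered to the years 1501-1649.
print(passes_filters({"jesus", "christ"}, 1538, ["jesus", "christ"],
                     operation="and", before=1650, after=1500))  # True
```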
Understanding Search Results
For each matching document, you’ll see:
- Filename: The name of the matching text file
- Score: A numerical score indicating relevance (higher is more relevant)
- Metadata: Information about the document including:
- Title
- Author
- Date
- Publisher
- Place of publication
- Variants: The specific word variants that were matched in the document with their occurrence count
Navigating Results
After viewing a page of results, you’ll be prompted with options:
- Enter ‘n’ to see the next page of results
- Enter ‘new’ to start a new search
- Enter ‘exit’ to quit the program
Example Search
Here’s an example of how to perform a search:
Search (or 'exit' to quit): jesus christ
Before year (leave empty if no filter): 1650
After year (leave empty if no filter): 1500
Results per page (default 10): 3
and/or (default 'or'): and
The program then responds with the variant spellings it generated for each search term:
Variants for 'jesus': Jesus, Iesvs, IESUS, IESVS, Iesus, JESVS, jesvs, jesus, Jesvs, iesvs, JESUS, iesus
Variants for 'christ': CHRIST, christ, Christ
Found 327 total matches
… and the documents found:
A16017.P4.txt
Score: 46.203
Title: The newe testamente both Latine and Englyshe ech correspondent to the other after the vulgare texte, communely called S. Ieroms. Faythfully translated by Myles Couerdale. Anno. M.CCCCC.XXXVIII.
Author: Coverdale, Miles, 1488-1568.
Date: 1538
Publisher: Text Creation Partnership,
Place: Ann Arbor, MI
Variants: jesus: iesvs(244), iesus(734)
--------------------------------------------------------------------------------
A02928.P4.txt
Score: 32.171
Title: The vvay of lyfe A Christian, and catholique institution comprehending principal poincts of Christian religion, which are necessary to bee knowne of all men, to the atteyning of saluation. First delyuered, in the Danish language for the instruction of those people, by Doctor Nicolas Hemmingius, preacher of the Gospell, and professor of diuinitie, for the Kynge of Denmarcke, in his Uniuersitie of Hafnia: and about three yeares past, (for the commoditie of others) translated into Latine, by Andrew Seurinus Velleius: and now first, and newly Englished, for the commodity of English readers: by N. Denham, this yeare of our redemption. 1578.
Author: Hemmingsen, Niels, 1513-1600.
Date: 1578
Publisher: Text Creation Partnership,
Place: Ann Arbor, MI
Variants: jesus: iesvs(34), iesus(55)
--------------------------------------------------------------------------------
A12716.P4.txt
Score: 27.518
Title: A cloud of vvitnesses and they the holy genealogies of the sacred Scriptures. Confirming vnto vs the truth of the histories in Gods most holy word, and the humanitie of Christ Iesus. The second addition. By Io. Speed.
Author: Speed, John, 1552?-1629.
Date: 1620
Publisher: Text Creation Partnership,
Place: Ann Arbor, MI
Variants: jesus: iesvs(5), iesus(81)
--------------------------------------------------------------------------------
Historical Spelling Variants
The system automatically generates historical spelling variants for your search terms. For example:
- For “jesus”: jesus, Iesus, IESVS, etc.
- For words with ‘i’: variants with ‘y’ and ‘j’
- For words with ‘v’: variants with ‘u’
- For words with ‘f’: variants with ‘ff’
This feature is particularly useful when searching early modern texts, where spelling was not standardized.
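A simple way to picture variant generation is a per-letter substitution table combined with three case forms (lower case, capitalized, upper case). The Python sketch below follows that idea. The substitution table is an assumption based on the rules listed above plus the i/j and u/v interchange visible in the example output; the program's real rule set is not documented here and may be larger or more selective.

```python
from itertools import product

# Assumed substitution rules, for illustration only.
SUBSTITUTIONS = {
    "i": ["i", "y", "j"],
    "j": ["j", "i"],
    "u": ["u", "v"],
    "v": ["v", "u"],
    "f": ["f", "ff"],
}


def spelling_variants(word: str) -> set:
    """Return candidate historical spellings of word in three case forms."""
    options = [SUBSTITUTIONS.get(letter, [letter]) for letter in word.lower()]
    lowered = {"".join(combo) for combo in product(*options)}
    variants = set()
    for spelling in lowered:
        variants.update({spelling, spelling.capitalize(), spelling.upper()})
    return variants


# 'jesus' yields 12 candidates, including 'Iesus' and 'IESVS'.
print(sorted(spelling_variants("jesus")))
```

Because every substitutable letter multiplies the number of candidates, a practical implementation would probably keep only the variants that actually occur in the index.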
Tips for Effective Searching
- Start broad, then narrow: Begin with a general search, then add date filters if needed.
- Check variants: Review the variants found to understand how your search terms appear in the documents.
- Try both ‘and’ and ‘or’: If you’re not finding what you need with ‘and’, try switching to ‘or’ to get more results.
- Use date filters wisely: The date filters work on publication year, so only documents with parseable dates will be filtered.
- Look at scores: Higher scores indicate better matches according to the search algorithm.
Troubleshooting
- No matches found: Try broadening your search by using fewer terms, removing date filters, or using the ‘or’ operation instead of ‘and’.
- Too many results: Try narrowing your search by adding more specific terms, using date filters, or switching from ‘or’ to ‘and’.
- Slow startup: Loading the index generally takes around 90-130 seconds. Once the index has loaded, searches should take less than 3 seconds.
Document Viewer Tool: User Guide
Overview
The Document Viewer tool allows you to quickly locate and view the content of historical documents by searching for their document IDs. This companion tool supplements the Index Search and Milvus Search programs to help you examine the full text of documents found by these two search programs.
Getting Started
The Document Viewer comes with a pre-configured default directory where it will search for documents.
Basic Usage
To view a document:
- Run the program
- When prompted, enter the document ID you want to view
- The document will open in a new window if found
- You can continue searching for additional documents without closing the program
See the Example Usage section below for a worked example of how to use this program.
Search Options
When looking for documents, the system will:
- Search for document IDs in the specified directory and all its subdirectories
- Match document IDs regardless of case (e.g., “A02495” will find files with “a02495”)
- Look for files with ‘.txt’ and ‘.P4.txt’ extensions
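The lookup rules above can be sketched in Python with the standard pathlib module. The function name find_documents is hypothetical, and matching the ID against the whole file name (ID plus extension) is an assumption; the Document Viewer's own matching may be more lenient.

```python
from pathlib import Path


def find_documents(doc_id: str, root: str) -> list:
    """Search root and its subdirectories for '<id>.txt' or '<id>.P4.txt',
    comparing names without regard to case."""
    wanted = doc_id.strip().lower()
    targets = {f"{wanted}.txt", f"{wanted}.p4.txt"}
    matches = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.name.lower() in targets:
            matches.append(path)
    return sorted(matches)


# Example call; the directory name here is only a placeholder.
for found in find_documents("A02495", "."):
    print("Found document:", found)
```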
Using the Document Viewer Window
When a document is found, it will open in a new window with these features:
- The window title shows the document filename
- Line numbers are displayed for easy reference
- You can scroll through the document using the scrollbar
- Click the “Close” button to close the document window
The main console program remains open, allowing you to search for additional documents without restarting the tool.
Command Line Options
Launch the program as shown below, then type a document ID at the prompt to open the matching file. Enter ‘exit’ (in any letter case) to quit the program.
$ python document_viewer
Example Usage
Here’s an example:
$ python document_viewer
After running this command, the program responds:
Document viewer initialized. Searching in: C:/Users/documents/text
Enter document ID (e.g., 'A02495') or 'exit' to quit.
In this example, we want to view document ID A02495. So, we enter that and hit return.
If the document is found, document_viewer responds:
Found document: C:/Users/documents/text/A02495.P4.txt
Document opened in new window. You can continue searching.
If no document is found, document_viewer responds:
Document ID: B12345
No document found with ID: B12345
After viewing documents, you can then quit document_viewer using the exit command:
Document ID: exit
After hitting return, document_viewer responds:
Exiting document viewer.
Tips for Effective Use
- Partial IDs work: You don’t need to enter the complete filename – just the unique ID portion is sufficient.
- Multiple windows: You can have multiple document windows open simultaneously to compare texts.
- Case insensitive: Document IDs are not case-sensitive, so “a02495” and “A02495” will find the same documents.
- Continue searching: You can search for additional documents while keeping previously opened document windows open.
Troubleshooting
- Document not found: Verify that you entered the correct document ID and that the document exists in the search directory.
- Found file but couldn’t read it: The file exists but may be corrupted or have permission issues. Check the file’s permissions and encoding.
- Window doesn’t appear: Make sure your system allows the program to open new windows. Some security settings might prevent this.
Exiting the Program
To exit the Document Viewer:
- Type ‘exit’ at the Document ID prompt
- Close any open document windows
Using with Index Search and Milvus Search
For best results, use the Document Viewer in conjunction with the Index Search and Milvus Search tools:
- Use Index Search or Milvus Search to find relevant documents and note their filenames
- Extract the document ID from the filename (e.g., extract “A02495” from “A02495.P4.txt”)
- Enter this document ID in the Document Viewer to examine the full text
This workflow allows you to quickly find relevant historical texts and then read them in their entirety.
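If you are copying many filenames from search results, the ID extraction step can be automated. The Python sketch below strips the two known extensions from a filename; the function name document_id is hypothetical and not part of any of the tools.

```python
def document_id(filename: str) -> str:
    """Strip a trailing '.P4.txt' or '.txt' (in any case) from a filename."""
    name = filename.strip()
    # Check the longer suffix first so '.P4.txt' is removed as a whole.
    for suffix in (".p4.txt", ".txt"):
        if name.lower().endswith(suffix):
            return name[:-len(suffix)]
    return name


print(document_id("A02495.P4.txt"))  # A02495
```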