"

1 User Guides

Milvus Search Tool: User Guide

Overview

The Milvus Search tool provides powerful vector-based search capabilities for your text collection. This tool allows you to perform semantic searches, exact text matching, and hybrid searches that combine both approaches. It’s particularly useful for researchers who need to find documents based on conceptual similarity as well as specific textual content.

Getting Started

The Milvus Search tool works with a vector database backend that needs to be initialized before you can start searching. The tool comes pre-configured with default settings for your collection.

Basic Commands

To get started, launch the program from the Unix command line as follows:

$ python milvus_search

Once launched, milvus_search displays a menu of options. The commands it accepts are described below.

Creating the Collection

create

This command initializes the Milvus collection and indexes all documents from the configured directory. You only need to run this once when setting up the tool for the first time, or if you want to rebuild the index.

Searching Documents

search <query>

Performs a hybrid search combining semantic and lexical (keyword) approaches. This finds documents that are conceptually related to your query even if they don’t contain the exact search terms.

Example:

search joseph in venice

Exact Text Matching

match <exact text>

Searches for documents containing the exact text you specify. This is useful when you need to find specific phrases or terms.

Example:

match jesus christ

Combined Hybrid Search

hybrid <query> match:<exact text>

Combines semantic search with exact text matching. This allows you to find documents that are conceptually related to your query but must also contain specific text.

Example:

hybrid joseph in venice with canals match:jesus christ
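
The hybrid syntax above puts the semantic query first and the required text after the match: marker. As a rough illustration only, the sketch below shows one way such a line could be split into its two parts. The function name parse_hybrid is hypothetical and is not taken from the tool's source.

```python
def parse_hybrid(args: str) -> tuple[str, str]:
    """Split 'QUERY match:EXACT TEXT' into (query, exact_text).

    Hypothetical helper for illustration; the real tool may parse differently.
    """
    query, sep, exact = args.partition("match:")
    if not sep or not query.strip() or not exact.strip():
        raise ValueError("expected: hybrid <query> match:<exact text>")
    return query.strip(), exact.strip()


query, exact = parse_hybrid("joseph in venice with canals match:jesus christ")
print(query)  # joseph in venice with canals
print(exact)  # jesus christ
```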

Term Indexing

index <term1,term2,...>

Creates a comprehensive index of all occurrences of the specified terms across your document collection. Results are saved to a text file.

Example:

index jesus,christ,holy

Adjusting Search Parameters

weights <dense_weight> <sparse_weight>

Adjusts the balance between semantic search (dense) and keyword search (sparse). Higher dense weight emphasizes conceptual similarity, while higher sparse weight emphasizes keyword matching.

Example:

weights 0.7 0.3
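
To see why the two weights change the ranking, here is a simplified sketch of a weighted sum over a dense (semantic) score and a sparse (keyword) score. It assumes both scores are already scaled to the range 0 to 1, and the file names and score values are invented for the example. The tool's actual scoring may normalize or combine scores differently.

```python
def combine_scores(dense: float, sparse: float,
                   dense_weight: float, sparse_weight: float) -> float:
    """Weighted sum of a semantic score and a keyword score (illustrative only)."""
    return dense_weight * dense + sparse_weight * sparse


def rank(docs: dict[str, tuple[float, float]],
         dense_weight: float, sparse_weight: float) -> list[str]:
    """Return document names ordered from highest to lowest combined score."""
    return sorted(
        docs,
        key=lambda name: combine_scores(*docs[name], dense_weight, sparse_weight),
        reverse=True,
    )


# Invented (dense, sparse) scores: a.txt is a strong conceptual match,
# b.txt is a strong keyword match.
docs = {"a.txt": (0.9, 0.2), "b.txt": (0.4, 0.9)}

print(rank(docs, 0.7, 0.3))  # ['a.txt', 'b.txt']
print(rank(docs, 0.3, 0.7))  # ['b.txt', 'a.txt']
```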

limit <number>

Sets the maximum number of search results to display. Currently, the maximum number we support is 5000.

Example:

limit 50

Exiting the Program

exit

Closes the Milvus Search tool.

Understanding Search Results

For each search result, you’ll see:

  • Result number: Position in the results list
  • Filename: Name of the document file
  • Original file: For chunked documents, shows the source file and chunk number
  • Score: Relevance score (higher is more relevant)
  • Occurrences: For text matching, shows the number of times the term appears
  • Preview: A snippet of text from the document
  • Text length: Total size of the document in characters

Advanced Features

Document Chunking

Large documents are automatically split into manageable chunks with overlap to maintain context. When searching, chunks from the same document are consolidated in the results.
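
The idea of overlapping chunks can be sketched as follows. The chunk size and overlap used by the tool are not documented here, so the defaults below (1000 and 200 characters) are assumptions chosen only for illustration, and chunk_text is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks where each chunk repeats the last
    `overlap` characters of the previous one (sizes are assumed values)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```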

Highlighting

When using text matching, the matched terms are highlighted in the preview with asterisks (**Term**).
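
A minimal sketch of this kind of highlighting, using Python's re module. Whether the tool matches case-insensitively is an assumption made here, and highlight is a hypothetical function name.

```python
import re


def highlight(preview: str, term: str) -> str:
    """Wrap every occurrence of `term` in double asterisks.

    Case-insensitive matching is an assumption for this sketch.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: f"**{m.group(0)}**", preview)


print(highlight("The Prayer Book of 1549", "prayer book"))
# The **Prayer Book** of 1549
```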

Term Indexing

The index command creates a comprehensive report of all occurrences of specified terms:

  • Lists all documents containing each term
  • Counts total occurrences
  • Shows each occurrence in context
  • Saves results to a timestamped text file
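
The report described above could be assembled along these lines. This is a sketch under stated assumptions: the in-memory docs dictionary, the 30-character context window, the whole-word matching, and the file name pattern are all invented for the example and may not match the tool's output.

```python
import re
from datetime import datetime


def index_terms(docs: dict[str, str], terms: list[str], context: int = 30) -> str:
    """Build a plain-text report listing each term, its total count, and
    every occurrence with some surrounding context."""
    lines = []
    for term in terms:
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        total = 0
        hits = []
        for name, text in docs.items():
            for m in pattern.finditer(text):
                total += 1
                start = max(0, m.start() - context)
                snippet = text[start:m.end() + context].replace("\n", " ")
                hits.append(f"  {name}: ...{snippet}...")
        lines.append(f"{term}: {total} occurrence(s)")
        lines.extend(hits)
    return "\n".join(lines)


# Invented sample document
docs = {"A.txt": "Morning prayer and evening prayer."}
report = index_terms(docs, ["prayer", "communion"])

# A timestamped file name; the naming pattern is an assumption
filename = f"term_index_{datetime.now():%Y%m%d_%H%M%S}.txt"
print(filename)
print(report)
```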

Example Workflow

  1. Initialize the collection (first-time setup):
     create
  2. Perform a broad semantic search:
     search religious practices in 17th century England
  3. Refine with exact text matching:
     hybrid religious practices match:prayer book
  4. Adjust weights for better results:
     weights 0.8 0.2
  5. Create an index of specific terms:
     index prayer,worship,communion

Tips for Effective Searching

  1. Start with semantic search for broad topics and concepts, then refine with text matching.
  2. Adjust the weights based on your search needs:
    • Higher dense weight (0.7-0.9) for conceptual, topic-based searches
    • Higher sparse weight (0.7-0.9) for keyword-focused searches
    • Balanced weights (0.5/0.5) for general searching
  3. Use term indexing to create comprehensive lists of all occurrences of key terms in your corpus.
  4. Combine with other tools in the suite:
    • Use Milvus Search to find relevant documents
    • Use the Document Viewer to examine the full text of interesting results
    • Use Index Search for specialized variant-aware searches
  5. Increase the limit when performing broader searches to see more potential matches.

Troubleshooting

  • No results found: Try broadening your search terms, using fewer terms, or adjusting the weights to favor dense vectors (semantic search).
  • Too many irrelevant results: Try using the hybrid command to combine semantic search with exact text matching or adjust weights to favor sparse vectors (keyword search).
  • Search is too slow: Reduce the search limit or use more specific search terms.
  • “Collection not found” error: Run the create command to initialize the collection.

Using with Other Tools

For a comprehensive research workflow:

  1. Use Milvus Search for semantic and hybrid searching to find conceptually related documents
  2. Use Index Search when you need to account for historical spelling variants or when you want a quick search on only a few words
  3. Use the Document Viewer to examine full documents identified by either search method

This combined approach gives you powerful tools for analyzing historical documents from multiple perspectives.


Index Search Tool: User Guide

Overview

The Index Search tool allows you to search through a collection of historical texts with support for historical spelling variants. This makes it particularly useful for research involving early modern texts where spelling was not standardized.

Getting Started

The search tool comes with a pre-built index file that contains information about all documents in the collection. When you run the program, it will either load this existing index or build a new one if it doesn’t exist.

Launch the Program

To get started, launch the program from the Unix command line as follows:

$ python index_search

Basic Search

To perform a search:

  1. Run the program
  2. When prompted, enter your search term(s)
  3. Follow the additional prompts to refine your search
  4. Review the search results

Search Options

When performing a search, you’ll be prompted for several options:

  • Search Terms: Enter one or more words to search for. The system will automatically generate historical spelling variants. Currently, searching for terms without variants is not permitted.
  • Before Year: Enter a year to only see results published before that year (leave empty for no filter).
  • After Year: Enter a year to only see results published after that year (leave empty for no filter).
  • Results Per Page: Enter the number of results you want to see at once (default is 10).
  • AND/OR Operation: Specify whether all terms must be present (‘and’) or if any of the terms can be present (‘or’). The default is ‘or’.
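
How these options interact can be sketched as a single filter function. This is an illustration, not the tool's code: the name passes_filters is hypothetical, and letting documents without a parseable year pass the date filters is an assumption based on the tip about date filters later in this guide.

```python
from typing import Optional


def passes_filters(doc_terms: set[str], year: Optional[int], query_terms: list[str],
                   operation: str = "or",
                   before: Optional[int] = None,
                   after: Optional[int] = None) -> bool:
    """Return True if a document satisfies the and/or term rule and the year filters."""
    found = [term in doc_terms for term in query_terms]
    terms_ok = all(found) if operation == "and" else any(found)
    # Assumption: a document with no parseable year (None) skips the date filters.
    if year is not None:
        if before is not None and year >= before:
            return False
        if after is not None and year <= after:
            return False
    return terms_ok


doc = {"iesus", "christ"}
print(passes_filters(doc, 1538, ["iesus", "christ"], "and", before=1650, after=1500))  # True
print(passes_filters(doc, 1620, ["iesus", "lord"], "and"))  # False
print(passes_filters(doc, 1620, ["iesus", "lord"], "or"))   # True
print(passes_filters(doc, 1700, ["iesus"], "or", before=1650))  # False
```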

Understanding Search Results

For each matching document, you’ll see:

  • Filename: The name of the matching text file
  • Score: A numerical score indicating relevance (higher is more relevant)
  • Metadata: Information about the document including:
    • Title
    • Author
    • Date
    • Publisher
    • Place of publication
  • Variants: The specific word variants that were matched in the document with their occurrence count

Navigating Results

After viewing a page of results, you’ll be prompted with options:

  • Enter ‘n’ to see the next page of results
  • Enter ‘new’ to start a new search
  • Enter ‘exit’ to quit the program
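
Paging through results amounts to slicing the ranked list into groups of the chosen size. A minimal sketch, with a hypothetical paginate helper and made-up file names:

```python
from typing import Iterator


def paginate(results: list[str], per_page: int = 10) -> Iterator[list[str]]:
    """Yield successive pages of at most `per_page` results."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    for start in range(0, len(results), per_page):
        yield results[start:start + per_page]


# Made-up result names; each 'n' at the prompt would move to the next page
pages = list(paginate(["doc1.txt", "doc2.txt", "doc3.txt", "doc4.txt"], per_page=3))
print(pages)  # [['doc1.txt', 'doc2.txt', 'doc3.txt'], ['doc4.txt']]
```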

Example Search

Here’s an example of how to perform a search:

Search (or 'exit' to quit): jesus christ

Before year (leave empty if no filter): 1650

After year (leave empty if no filter): 1500

Results per page (default 10): 3

and/or (default 'or'): and

The program then responds with a list of the variant spellings generated for each search term:

Variants for 'jesus': Jesus, Iesvs, IESUS, IESVS, Iesus, JESVS, jesvs, jesus, Jesvs, iesvs, JESUS, iesus

Variants for 'christ': CHRIST, christ, Christ

Found 327 total matches

… and the documents found:

A16017.P4.txt

Score: 46.203

Title: The newe testamente both Latine and Englyshe ech correspondent to the other after the vulgare texte, communely called S. Ieroms. Faythfully translated by Myles Couerdale. Anno. M.CCCCC.XXXVIII.

Author: Coverdale, Miles, 1488-1568.

Date: 1538

Publisher: Text Creation Partnership,

Place: Ann Arbor, MI

Variants: jesus: iesvs(244), iesus(734)

--------------------------------------------------------------------------------

A02928.P4.txt

Score: 32.171

Title: The vvay of lyfe A Christian, and catholique institution comprehending principal poincts of Christian religion, which are necessary to bee knowne of all men, to the atteyning of saluation. First delyuered, in the Danish language for the instruction of those people, by Doctor Nicolas Hemmingius, preacher of the Gospell, and professor of diuinitie, for the Kynge of Denmarcke, in his Uniuersitie of Hafnia: and about three yeares past, (for the commoditie of others) translated into Latine, by Andrew Seurinus Velleius: and now first, and newly Englished, for the commodity of English readers: by N. Denham, this yeare of our redemption. 1578.

Author: Hemmingsen, Niels, 1513-1600.

Date: 1578

Publisher: Text Creation Partnership,

Place: Ann Arbor, MI

Variants: jesus: iesvs(34), iesus(55)

--------------------------------------------------------------------------------

A12716.P4.txt

Score: 27.518

Title: A cloud of vvitnesses and they the holy genealogies of the sacred Scriptures. Confirming vnto vs the truth of the histories in Gods most holy word, and the humanitie of Christ Iesus. The second addition. By Io. Speed.

Author: Speed, John, 1552?-1629.

Date: 1620

Publisher: Text Creation Partnership,

Place: Ann Arbor, MI

Variants: jesus: iesvs(5), iesus(81)

--------------------------------------------------------------------------------

Historical Spelling Variants

The system automatically generates historical spelling variants for your search terms. For example:

  • For “jesus”: jesus, Iesus, IESVS, etc.
  • For words with ‘i’: variants with ‘y’ and ‘j’
  • For words with ‘v’: variants with ‘u’
  • For words with ‘f’: variants with ‘ff’

This feature is particularly useful when searching for early modern texts where spelling was not standardized.
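
The substitution rules listed above can be sketched as a letter-by-letter expansion. This is a simplified illustration, not the tool's algorithm: the SUBSTITUTIONS table is an assumption that combines the rules above with the j/i and u/v swaps visible in the example output, and it ignores the capitalization variants the tool also reports.

```python
from itertools import product

# Assumed substitution table: each letter maps to the spellings it may take.
SUBSTITUTIONS = {
    "i": ["i", "y", "j"],
    "j": ["j", "i"],
    "v": ["v", "u"],
    "u": ["u", "v"],
    "f": ["f", "ff"],
}


def spelling_variants(word: str) -> set[str]:
    """Generate lowercase spelling variants by trying every combination
    of the per-letter substitutions."""
    options = [SUBSTITUTIONS.get(ch, [ch]) for ch in word.lower()]
    return {"".join(combo) for combo in product(*options)}


print(sorted(spelling_variants("jesus")))
# ['iesus', 'iesvs', 'jesus', 'jesvs']
```

Because every combination is generated, the number of variants grows quickly for long words with many substitutable letters. A real implementation would likely cap or prune the list.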

Tips for Effective Searching

  1. Start broad, then narrow: Begin with a general search, then add date filters if needed.
  2. Check variants: Review the variants found to understand how your search terms appear in the documents.
  3. Try both ‘and’ and ‘or’: If you’re not finding what you need with ‘and’, try switching to ‘or’ to get more results.
  4. Use date filters wisely: The date filters work on publication year, so only documents with parseable dates will be filtered.
  5. Look at scores: Higher scores indicate better matches according to the search algorithm.

Troubleshooting

  • No matches found: Try broadening your search by using fewer terms, removing date filters, or using the ‘or’ operation instead of ‘and’.
  • Too many results: Try narrowing your search by adding more specific terms, using date filters, or switching from ‘or’ to ‘and’.
  • Slow startup: Loading the index generally takes around 90-130 seconds. Once the index has been loaded, searches should take less than 3 seconds.

Document Viewer Tool: User Guide

Overview

The Document Viewer tool allows you to quickly locate and view the content of historical documents by searching for their document IDs. This companion tool supplements the Index Search and Milvus Search programs to help you examine the full text of documents found by these two search programs.

Getting Started

The Document Viewer comes with a pre-configured default directory where it will search for documents.

Basic Usage

To view a document:

  1. Run the program
  2. When prompted, enter the document ID you want to view
  3. The document will open in a new window if found
  4. You can continue searching for additional documents without closing the program

See the Example Usage section below for a worked example of how to use this program.

Search Options

When looking for documents, the system will:

  • Search for document IDs in the specified directory and all its subdirectories
  • Match document IDs regardless of case (e.g., “A02495” will find files with “a02495”)
  • Look for files with ‘.txt’ and ‘.P4.txt’ extensions
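
The lookup rules above can be sketched with Python's pathlib. This is an illustration under assumptions: find_documents is a hypothetical name, matching the ID against the start of the file name is a guess at the matching rule, and the temporary directory and file names exist only for the demonstration.

```python
import tempfile
from pathlib import Path


def find_documents(root: str, doc_id: str) -> list[Path]:
    """Search `root` and all subdirectories for text files whose name
    starts with `doc_id`, ignoring letter case."""
    wanted = doc_id.lower()
    hits = []
    for path in Path(root).rglob("*"):
        name = path.name.lower()
        # '.p4.txt' names also end in '.txt', so one suffix check covers both
        if path.is_file() and name.startswith(wanted) and name.endswith(".txt"):
            hits.append(path)
    return sorted(hits)


# Demonstration in a throwaway directory with invented files
with tempfile.TemporaryDirectory() as root:
    sub = Path(root) / "sub"
    sub.mkdir()
    (sub / "a02495.P4.txt").write_text("sample text")
    (Path(root) / "B00001.txt").write_text("other text")
    found_names = [p.name for p in find_documents(root, "A02495")]

print(found_names)  # ['a02495.P4.txt']
```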

Using the Document Viewer Window

When a document is found, it will open in a new window with these features:

  • The window title shows the document filename
  • Line numbers are displayed for easy reference
  • You can scroll through the document using the scrollbar
  • Click the “Close” button to close the document window

The main console program remains open, allowing you to search for additional documents without restarting the tool.

Command Line Options

Launch the program from the Unix command line:

$ python document_viewer

At the prompt, type a document ID to open the matching file. Enter ‘exit’ (in any letter case) to quit the program.

Example Usage

Here’s an example:

$ python document_viewer

After running this command, the program responds:

Document viewer initialized. Searching in: C:/Users/documents/text

Enter document ID (e.g., 'A02495') or 'exit' to quit.

In this example, we want to view document ID A02495, so we enter it and press Return.

If the document is found, document_viewer responds:

Found document: C:/Users/documents/text/A02495.P4.txt

Document opened in new window. You can continue searching.

If no document is found, document_viewer responds:

Document ID: B12345

No document found with ID: B12345

After viewing documents, you can then quit document_viewer using the exit command:

Document ID: exit

After hitting return, document_viewer responds:

Exiting document viewer.

Tips for Effective Use

  1. ID only: You don’t need to enter the complete filename; the unique ID portion is sufficient.
  2. Multiple windows: You can have multiple document windows open simultaneously to compare texts.
  3. Case insensitive: Document IDs are not case-sensitive, so “a02495” and “A02495” will find the same documents.
  4. Continue searching: You can search for additional documents while keeping previously opened document windows open.

Troubleshooting

  • Document not found: Verify that you entered the correct document ID and that the document exists in the search directory.
  • Found file but couldn’t read it: The file exists but may be corrupted or have permission issues. Check the file’s permissions and encoding.
  • Window doesn’t appear: Make sure your system allows the program to open new windows. Some security settings might prevent this.

Exiting the Program

To exit the Document Viewer:

  1. Type ‘exit’ at the Document ID prompt
  2. Close any open document windows

Using with Index Search and Milvus Search

For best results, use the Document Viewer in conjunction with the Index Search and Milvus Search tools:

  1. Use Index Search or Milvus Search to find relevant documents and note their filenames
  2. Extract the document ID from the filename (e.g., extract “A02495” from “A02495.P4.txt”)
  3. Enter this document ID in the Document Viewer to examine the full text

This workflow allows you to quickly find relevant historical texts and then read them in their entirety.
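
Extracting the ID from a filename is a one-line operation. A small sketch, assuming the ID is everything before the first period, as in the filenames shown in this guide (document_id is a hypothetical helper):

```python
def document_id(filename: str) -> str:
    """Return the part of a filename before its first period."""
    return filename.strip().split(".", 1)[0]


print(document_id("A02495.P4.txt"))  # A02495
print(document_id("A16017.txt"))     # A16017
```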

 

License

The Alex Research System Copyright © by Daniel Maxwell. All Rights Reserved.