In this blog post I’m going to talk you through a proof of concept we created here at Silversands that explores the AI capabilities of Azure Search across multiple document types.
What is Azure Search?
Azure Search is not a search of Azure, as the name might suggest. It is in fact a managed search service from Microsoft that allows you to index data from one or more sources and use ‘document cracking’ to extract further information using Cognitive Services. You would then typically integrate the search service into a website or bot for users to query.
Step 1 – Select the Data Source
The first task was to gather some test data. This could be an existing database or just a collection of unstructured data in various formats. With this in mind, there are two indexing methods to consider.
- Pulling Data – An indexer automatically crawls supported Azure data sources, such as Azure SQL, Cosmos DB and Azure Blob Storage, and uploads the data into the index.
- Pushing Data – You programmatically send documents to Azure Search, either individually or in batches, regardless of where the data lives.
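We did not use the push method in this proof of concept, but to make the idea concrete, here is a minimal sketch of what batching documents for the indexing REST endpoint might look like. The service name, index name, API version and document fields are placeholders I have assumed for illustration; the request body shape (a `value` array with an `@search.action` on each document) and the limit of 1,000 documents per request follow the documented REST API.

```python
import json
import urllib.request

# Documented per-request limit for the document indexing endpoint.
MAX_BATCH_SIZE = 1000


def build_batches(documents, batch_size=MAX_BATCH_SIZE, action="upload"):
    """Yield request bodies, each holding at most batch_size documents."""
    for start in range(0, len(documents), batch_size):
        chunk = documents[start:start + batch_size]
        # Each document carries the action the service should perform on it.
        yield {"value": [{"@search.action": action, **doc} for doc in chunk]}


def push_batch(service, index, api_key, batch, api_version="2020-06-30"):
    """POST one batch to the index. All arguments are assumed placeholders."""
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index}/docs/index?api-version={api_version}"
    )
    request = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `build_batches(docs)` on a list of dictionaries and passing each result to `push_batch` would upload them in chunks. Splitting the payload building from the network call keeps the batching logic easy to check without a live search service.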
In this proof of concept, we wanted to highlight how Azure Search could index various document types, including images, PDFs, HTML and text, all stored in blob storage. The import data wizard within the Azure portal provided a simple way to get started with the pull method, so we used that and selected Azure Blob Storage from the list.
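We used the portal wizard rather than code, but for anyone who prefers scripting, the same blob data source can be described as a JSON definition and sent to the REST API. The sketch below only builds that definition; the data source name, container name and connection string are hypothetical values, and `blob_data_source` is a helper made up for this example rather than part of any SDK.

```python
import json


def blob_data_source(name, connection_string, container, folder=None):
    """Build a data source definition pointing at an Azure Blob container."""
    definition = {
        "name": name,
        "type": "azureblob",  # data source type for Azure Blob Storage
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }
    if folder:
        # Optional: restrict indexing to one virtual folder in the container.
        definition["container"]["query"] = folder
    return definition


# Hypothetical values, for illustration only.
example = blob_data_source(
    name="poc-blob-datasource",
    connection_string="<storage-connection-string>",
    container="poc-documents",
)
print(json.dumps(example, indent=2))
```

An indexer would then reference this data source by name and pull the blobs into the index on a schedule or on demand.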