
      Detecting Document Type

      Using A.I. to detect the document type

      If we receive a file with a name like "1.pdf" or "document1.pdf", we have to open it to find out what it is. Then we usually rename it to something like "a big document from a customer that I will forget about in the next hour.pdf".

      This is an amazingly inefficient and cumbersome way to control the flow of information. A colleague usually names files differently; multiply that by the number of employees, contractors, customers, and so on, and consistent naming becomes impossible to enforce.

      In this article, we want to show how you can improve productivity around files and documents in a project. We will create a workflow that detects the document type and assigns a corresponding label or custom field value.

      In our example, we will be detecting shipping documentation - several documents are required to send something internationally. We will try to detect the following types: a bill of lading, insurance, certificate of origin, commercial invoice, and dangerous goods certificate.

      Additionally, because there are many shipping companies and they all use different templates, step 2 will detect the company name. Knowing the vendor, we can then extract data from the document using the corresponding template workflow. Our logic is the following:

      1. Detect Document Type (Bitskout workflow - Shipping Documents Detector)
      2. Detect Vendor - we will use a bill of lading document to detect vendor in the example (Bitskout workflow - Bill of Lading Vendor)
      3. Extract Data from the bill of lading depending on the vendor.
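
      The three steps above can be sketched in plain Python. This is only an illustration of the control flow, not Bitskout's API: the function `process_document`, the callables passed into it, and the vendor name "Acme Lines" are all hypothetical stand-ins for the workflows described in this article.

```python
from typing import Callable, Dict, Optional

# Hypothetical type for a vendor-specific extraction step (not a Bitskout API).
Extractor = Callable[[str], Dict[str, str]]


def process_document(
    text: str,
    detect_type: Callable[[str], Optional[str]],
    detect_vendor: Callable[[str], Optional[str]],
    extractors: Dict[str, Extractor],
) -> Dict[str, object]:
    """Run the three steps in order and stop early when a step has no answer."""
    result: Dict[str, object] = {"type": None, "vendor": None, "data": None}

    # Step 1: detect the document type.
    doc_type = detect_type(text)
    result["type"] = doc_type
    if doc_type != "Bill of Lading":
        # The example only detects vendors for bills of lading.
        return result

    # Step 2: detect the vendor.
    vendor = detect_vendor(text)
    result["vendor"] = vendor

    # Step 3: extract data with the workflow that matches the vendor, if any.
    extractor = extractors.get(vendor) if vendor else None
    if extractor is not None:
        result["data"] = extractor(text)
    return result


# Made-up sample text and trivial stand-in detectors, for illustration only.
sample = "BILL OF LADING\nCarrier: Acme Lines\nB/L No: 12345"
result = process_document(
    sample,
    detect_type=lambda t: "Bill of Lading" if "bill of lading" in t.lower() else None,
    detect_vendor=lambda t: "Acme Lines" if "acme lines" in t.lower() else None,
    extractors={"Acme Lines": lambda t: {"bl_number": t.rsplit(": ", 1)[-1]}},
)
print(result)
```

      Keeping the vendor-specific extractors in a dictionary means a new vendor template can be added without touching the detection steps.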

      First, let's prepare our model for Document Type Detection. We will use Data Extraction for that. Type the model name and add a description, then select the data extraction model type.

      A view of Bitskout screenshot where we create an A.I. model for data extraction to detect our shipping document type

      The next step is to choose the type of data extraction. We will use standard data extraction, which allows us to select a region of the file.

      screenshot showing the selection of various data extraction techniques that are supported by Bitskout no-code A.I. models

      Then we need to load a template. Here we will use a trick: because all the documents have a similar structure (A4 pages), they all have a title in the top part of the page. So we will capture all the text in the top part of the document, find the keyword that tells us the document type, and use it as a detector.
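
      To make the "top part of the page" idea concrete, here is a minimal sketch. It assumes the recognized text is already available as lines with a vertical position, which is a hypothetical format and not what Bitskout exposes. The `TextLine` class, the `header_text` function, and the 20% cut-off are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextLine:
    """One line of recognized text with its vertical position on the page."""
    text: str
    top: float  # distance from the top edge of the page, in points


def header_text(lines: List[TextLine], page_height: float, fraction: float = 0.2) -> str:
    """Join the text of all lines whose top edge lies in the upper part of the page."""
    limit = page_height * fraction
    in_header = sorted((ln for ln in lines if ln.top <= limit), key=lambda ln: ln.top)
    return " ".join(ln.text.strip() for ln in in_header)


A4_HEIGHT_PT = 842.0  # A4 is 297 mm tall, roughly 842 points at 72 points per inch

# Made-up page content; the lines are deliberately out of reading order.
page = [
    TextLine("Shipper: Example Exports Ltd", 210.0),
    TextLine("BILL OF LADING", 48.0),
    TextLine("Acme Lines", 30.0),
    TextLine("Total weight: 1200 kg", 700.0),
]
header = header_text(page, A4_HEIGHT_PT)
print(header)
```

      Only the two lines near the top of the page end up in `header`; the shipper and weight lines further down are ignored, which keeps body text from triggering a false match.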

      Let's load a sample.

      A view of Bitskout screenshot where we create an A.I. model for data extraction and select areas to get data from to detect our shipping document type

      After the sample has been loaded, let's select a region that is neither too big nor too small and name it Document Type (the string type). Then move the selection so that it reaches the top of the page (the region name will disappear). We need to grab the whole top part of the page to be sure that we've captured all the text.

      A view of Bitskout screenshot where we create an A.I. model and select areas we want to extract data from for data extraction to detect our shipping document type

      You can leave the rest of the options as they are. Press Apply to save the model.

      The next step is to create a Label Output. Click on Outputs in the left main menu and choose Label Mappings: 

      a screenshot of label mapping feature where a user can set up the mapping of A.I. model output with a label or dropdown list.

      Label mappings allow you to map A.I. model output to labels or dropdown lists. Let's add a new mapping by pressing Add. As per the instructions, we first select the A.I. model from the list, then a project management service and the project/board where we want to map the output. The next step is the actual configuration.

      As you can see, we've added keywords to look for in the scanned text that allow us to detect the document type. Press Apply and the output is saved. Now we need to create a workflow:
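
      Conceptually, a label mapping is a lookup from keywords to a label. The sketch below shows that idea in Python. The keyword lists are assumptions based on the five document types named earlier, not the exact values from the screenshot, and `label_for` is a hypothetical helper rather than a Bitskout function.

```python
import re
from typing import Dict, List, Optional

# Assumed keywords per label; adjust them to the wording used on your documents.
LABEL_KEYWORDS: Dict[str, List[str]] = {
    "Bill of Lading": ["bill of lading"],
    "Insurance": ["insurance certificate", "certificate of insurance", "insurance policy"],
    "Certificate of Origin": ["certificate of origin"],
    "Commercial Invoice": ["commercial invoice"],
    "Dangerous Goods Certificate": ["dangerous goods"],
}


def label_for(text: str) -> Optional[str]:
    """Return the first label whose keyword occurs in the text, or None if nothing matches."""
    # Collapse line breaks and repeated spaces so a title split over two lines still matches.
    normalized = re.sub(r"\s+", " ", text).lower()
    for label, keywords in LABEL_KEYWORDS.items():
        if any(keyword in normalized for keyword in keywords):
            return label
    return None


print(label_for("ACME LINES   Bill of\nLading"))
print(label_for("Packing list"))
```

      Returning `None` for unmatched text is a deliberate choice: a document that fits none of the keywords is left unlabeled instead of being forced into the nearest category.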

      a screenshot of Bitskout Workflow configuration to use no-code A.I. model to detect documents

      Once the workflow is saved, let's try and use it. If you've configured the output for monday.com, then you will need to use the recipe:

      a screenshot of monday.com recipe with Bitskout no-code A.I. workflow

      And once you run the recipe, the Bitskout workflow will set the labels automatically. 

      Now that we know the document type, we can extract information from it. But before we do, we need to work out which vendor the document comes from. Continued in part 2.

      This functionality is very useful if you want to get back control of your files and documents. Obviously, it has its limits, and we don't recommend using this technique to detect every possible document you have. Some filtering should already be done beforehand. In our case, the client allowed only shipping documentation to be uploaded via the form, so the workflow was quite efficient.

      Feel free to contact us if you have a use case or have any questions. 

      Cheers,

      Ilia
