Detecting Document Type

Ilia Zelenkin
May 2, 2021

When we receive a file named something like "1.pdf" or "document1.pdf", we have to open it to find out what it is. Then we usually rename it to something like this:

"important-customer-document-that-I-will-forget-about-in-the-next-hour.pdf"

This is an amazingly inefficient and cumbersome way to control the flow of information. A colleague usually names files differently; multiply that by the number of employees you have, then add contractors, customers, and so on, and it becomes impossible to keep file naming under control.

In this article, we show how you can improve productivity around the files and documents in a project. We will build a workflow that detects the document type and assigns a corresponding label or custom field value.
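Once a document type is known, it can drive a consistent label instead of whatever name the sender chose. As an illustration only, here is a minimal sketch of one possible naming convention (date, detected type, original name), assuming the type has already been detected. The function name labelled_filename and the convention itself are hypothetical, not part of Bitskout.

```python
import re
from datetime import date


def _slug(text: str) -> str:
    """Lowercase the text and collapse anything non-alphanumeric into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def labelled_filename(doc_type: str, original_name: str, received: date) -> str:
    """Build a consistent file name from a detected document type.

    Hypothetical convention: <received date>_<document type>_<original stem>.<ext>
    """
    stem, dot, ext = original_name.rpartition(".")
    if not dot:  # the original name has no extension
        stem, ext = original_name, ""
    suffix = f".{ext.lower()}" if ext else ""
    return f"{received.isoformat()}_{_slug(doc_type)}_{_slug(stem)}{suffix}"


# Example: a file called "document1.pdf" detected as a certificate of origin
print(labelled_filename("Certificate of Origin", "document1.pdf", date(2021, 5, 2)))
# prints: 2021-05-02_certificate-of-origin_document1.pdf
```

Keeping the original stem in the name means the sender's file can still be traced, while the date and type prefix make files sort and search predictably.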

TL;DR: video

Detecting Document Type Process

In our example, we will detect the type of shipping documents: sending goods internationally requires several documents. We will try to detect the following types:

bill of lading,
insurance,
certificate of origin,
commercial invoice,
dangerous goods certificate.

Additionally, because there are many shipping companies and they all use different templates, in step 2 we'll use a Bitskout template to process the same document types regardless of format.

Step 1 - Configure Bitskout

First, let's set up the plugin for document type detection. We'll use an Understand-type plugin: choose the "Understand Document Type" option.

Next, you'll see a screen for loading examples. Bitskout works from your examples, so you'll need to upload a few and tell Bitskout what you want to detect. Let's start with a certificate of origin: load one file and type the document type, "certificate of origin".

Then load another example by pressing Add Example.

The next example is a dangerous goods certificate. Same procedure: upload the file and type the document type.

Finally, let's load one example of a bill of lading.
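The article doesn't say how Bitskout matches a new file against the examples you upload, so the snippet below is not Bitskout's algorithm. It is a minimal sketch of the general idea of example-based detection, assuming plain text has already been extracted from each file: every labelled example is stored as a bag of words, and a new document gets the label of its most similar example by cosine similarity. The class and method names, and the sample texts, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count word occurrences (a simple bag of words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ExampleClassifier:
    """Hypothetical nearest-example classifier, NOT Bitskout's implementation."""

    def __init__(self) -> None:
        self.examples: list[tuple[str, Counter]] = []

    def add_example(self, doc_type: str, text: str) -> None:
        # Loosely analogous to uploading a file and typing its document type.
        self.examples.append((doc_type, tokenize(text)))

    def predict(self, text: str) -> tuple[str, float]:
        # Return the label of the most similar stored example and its similarity score.
        query = tokenize(text)
        best_type, best_score = "unknown", 0.0
        for doc_type, vec in self.examples:
            score = cosine(query, vec)
            if score > best_score:
                best_type, best_score = doc_type, score
        return best_type, best_score


clf = ExampleClassifier()
clf.add_example("certificate of origin",
                "Certificate of origin: the goods described below originate in the country of origin stated")
clf.add_example("dangerous goods certificate",
                "Shipper's declaration for dangerous goods: UN number, proper shipping name, hazard class")
clf.add_example("bill of lading",
                "Bill of lading: shipper, consignee, notify party, port of loading, port of discharge, vessel")

# A document that was not among the examples
label, score = clf.predict("Port of loading Hamburg, port of discharge Shanghai, consignee ACME, vessel MSC")
print(label)  # prints: bill of lading
```

A production system would rely on much richer signals (layout, OCR output, learned embeddings); in this sketch, adding more varied examples per type simply gives the matcher more texts to compare against.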

Once we are done, rename the plugin to Detect Logistics Document type.

Now our plugin is ready to be used.

You can always update the list of document types by adding more examples via the Validation tab.

You can also test the plugin (tests run inside Bitskout are not counted as plugin runs).

For the test, I'll load a document that was not among the examples, to show that Bitskout tries to infer what you need from your examples.

As you can see, Bitskout detected the document type correctly.

Now your plugin is ready and you can use it in various scenarios. Below are a few examples:

Document Type Detection For Airtable

Document Type Detection for monday.com

Conclusion

Now that we know the document type, we can extract information from it using other templates.

This functionality is very useful if you want to regain control of your files and documents. It does have limits, though: we don't recommend using this technique to detect every possible kind of document you have. Some filtering should already happen upstream. In our case, the client allowed only shipping documentation to be uploaded through the form, so the workflow was quite efficient.
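Why does upstream filtering matter? An example-based detector typically picks the closest known type even for a document that matches none of them. As a hedged illustration (not a Bitskout feature), a minimal sketch of a guard in your own workflow, assuming you have per-type similarity scores from some detector: keep only the expected shipping types and fall back to "unknown" when the best score is low. The scores and the 0.3 threshold are hypothetical and would need tuning on real documents.

```python
# The five shipping document types from this article; anything else is rejected.
ALLOWED_TYPES = {
    "bill of lading",
    "insurance",
    "certificate of origin",
    "commercial invoice",
    "dangerous goods certificate",
}


def accept_prediction(scores: dict[str, float], min_score: float = 0.3) -> str:
    """Return the best-scoring allowed type, or "unknown" if nothing is close enough.

    `scores` maps a document type to a similarity score (hypothetical input);
    `min_score` is an assumed threshold, not a value from Bitskout.
    """
    candidates = {t: s for t, s in scores.items() if t in ALLOWED_TYPES}
    if not candidates:
        return "unknown"
    best = max(candidates, key=candidates.get)
    return best if candidates[best] >= min_score else "unknown"


print(accept_prediction({"bill of lading": 0.75, "insurance": 0.2}))         # bill of lading
print(accept_prediction({"bill of lading": 0.12, "commercial invoice": 0.08}))  # unknown
print(accept_prediction({"passport": 0.9}))                                   # unknown
```

Documents labelled "unknown" can then be routed to a person for review instead of being silently mislabelled.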

Feel free to contact us if you have a use case or have any questions.
