How Are Tech Companies Utilizing AI To Gather Data and Analyze Risk?

Chris Hedenberg • September 3, 2020

A story of how AI can be deployed to benefit carriers, brokers, and policyholders.


In our first post, we covered why we set out to analyze litigation patterns and described how we used Natural Language Processing (NLP), an AI technique, to solve a sticky issue with the way companies are named. See Part I to catch up.

Now, we’re going to look at how we handled another challenge in building our database of litigation, and consider what it means for insurance companies to take this kind of approach.

Oh, and of course, we’ll review whether this all worked!

What is a tech company, anyway?

After solving how to deal with variance in company names, we turned our attention to enriching the database with information to enable segmentation. Most importantly, we wanted to classify both plaintiff and defendant companies by industry. Since this is Technology E&O, we didn’t want to make assumptions about litigation risk based on defendants from other industries, which may be more or less likely than technology companies to be sued.

Of course, industry class is readily available from a number of data sources. But we’ve found that tech companies, in particular, are often misclassified. A provider of EMR software for hospitals may be (not incorrectly) called a “healthcare” company or something ill-defined like “e-health”. We’d want to differentiate the EMR company from both a healthcare provider and from another tech company like Fitbit that has an entirely different business model. An additional wrinkle is that many of the ways to categorize companies are not in sync with the NAICS codes widely used in the insurance industry, which we use as the basis for our own categorizations.

Utilizing AI to Gather Data

We thought we could better classify companies using information from the source that knows them best: the companies themselves.

A typical company's website contains thousands of words describing products and services. So we again turned to NLP, this time specifically to BERT, a language model developed by Google and used in its search engine. NLP models like BERT can ingest a large volume of written text and interpret it much the way a human being would, as long as they have been adequately trained. We fed the model websites of tech companies so that it could learn what kinds of language were present on them, then let it loose on our own database.

Having learned what tech companies typically have on their websites, our NLP model was able to re-classify thousands of companies into the preferred buckets, based on how they describe their own products and services.
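To make the idea concrete, here is a minimal sketch of classifying companies from their own website text. It is not our production model: instead of BERT it uses a small multinomial naive Bayes classifier as a stand-in, so that it runs with only the Python standard library. The class name, the two labels, and the training snippets are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one smoothing (a stand-in, not BERT)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training docs
        self.vocab = set()

    def train(self, labeled_texts):
        for text, label in labeled_texts:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods per token.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy website snippets, invented for illustration only.
TRAINING = [
    ("Cloud platform and API for electronic medical records software", "software"),
    ("Our SaaS dashboard integrates with your hospital scheduling system", "software"),
    ("Our clinic offers primary care, surgery, and patient visits", "healthcare_provider"),
    ("Board-certified physicians providing urgent care to patients", "healthcare_provider"),
]

clf = NaiveBayesTextClassifier()
clf.train(TRAINING)
label = clf.predict("EMR software platform with an open API for hospitals")
print(label)
```

A real pipeline would train on far more text per company and on many more classes, and a model like BERT captures context that a bag-of-words approach cannot. The shape of the task is the same, though: labeled website text in, an industry bucket out.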

The Payoff

By now, we had a rich database of legal actions going back many years with the correct plaintiffs and defendants identified and classified, as well as the lawyers and judges involved. You may be asking: after all that, did it actually deliver anything?

Thankfully, yes!

The database has provided the foundation for a number of uses, most notably to create scoring mechanisms that feed into our proprietary underwriting model. We can now score risk from both the “defendant” side (how risky is tech company “X”, regardless of who its customers are) and from the “plaintiff” side (how litigious is customer “Y”, and does that impact X's risk). The model's scores align closely with real-world results, as shown in the chart.

[GRAPH] Actual vs. Predicted Litigation Risk
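For readers who want to run this kind of actual-versus-predicted comparison on their own scores, one common approach is a calibration table: group cases by predicted probability and compare each group's average prediction with the rate actually observed. The sketch below is a generic illustration, not the method behind the chart above; the function name and the sample numbers are made up.

```python
from collections import defaultdict


def calibration_table(predicted, actual, n_bins=5):
    """Group cases by predicted probability and compare with observed rates.

    predicted: iterable of probabilities in [0, 1]
    actual: iterable of 0/1 outcomes (1 = the company was sued)
    Returns a list of (bin_index, count, mean_predicted, observed_rate).
    """
    bins = defaultdict(list)
    for p, y in zip(predicted, actual):
        idx = min(int(p * n_bins), n_bins - 1)  # keep p == 1.0 in the top bin
        bins[idx].append((p, y))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(y for _, y in pairs) / len(pairs)
        rows.append((idx, len(pairs), mean_pred, observed))
    return rows


# Made-up scores and outcomes, purely to exercise the function.
predicted = [0.05, 0.10, 0.15, 0.30, 0.35, 0.50, 0.55, 0.85, 0.90, 0.95]
actual = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]

table = calibration_table(predicted, actual)
for idx, n, mean_pred, observed in table:
    print(f"bin {idx}: n={n} predicted={mean_pred:.2f} observed={observed:.2f}")
```

When a model is well calibrated, the predicted and observed columns track each other from the lowest bin to the highest.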

With these scores, we can now add litigation risk to the many other rating factors involved in Tech E&O underwriting, for a more well-rounded view of risk. We’re continuing to refine the model and are excited to deploy it in more sophisticated ways. New inputs, like data on settlement amounts, will yield more detailed outputs. We’re also learning a lot about litigation patterns across different industries, company sizes, and other slices of the database, enabling more potential scoring and underwriting measures.

In fact, these scores were key to the development of automated underwriting for Tech E&O, which Corvus launched earlier this year to our broker partners. We’ve started automating quotes for lower-risk applications and are working to increase the complexity of risks we can automate.

What Does This Tell Us About the Future of Insurance?

We felt this story was worth telling not because its ultimate findings were earth-shattering, but because of what it represents: the future of insurance.

Traditional insurers have a few well-worn tools for determining pricing and underwriting rules. Primarily they have their own claims information, the filings of competitors, and market feedback (whether the price and coverage they quote is competitive). These tools have worked well enough for many lines of insurance for centuries. But in a world where nearly all business activity leaves a digital fingerprint that can be analyzed, the traditional model looks more and more outdated.

Projects like this one represent how insurers can go about expanding the amount of data that feeds into underwriting and proactively improving their risk assessments. They enable us to go broader in scope – by factoring in an entirely new set of risk data – and simultaneously get more granular, by enabling us to drill down to highly specific sub-sets of industry and company size.

Using an example from this project, we can say that a company that initiated at least one lawsuit in the last two years, is in the manufacturing sector, and has over 250 employees presents one of the highest risks of litigation for its IT or software vendors. Even with a highly developed claims database, a traditional underwriter would struggle to be so prescriptive.
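Expressed as a rule, that segment is just three conditions checked together. The sketch below shows one way to write it; the `Customer` record, its field names, and the sample companies are hypothetical, and only the three thresholds come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical record for a customer of a tech company (a potential plaintiff)."""
    name: str
    sector: str
    employees: int
    lawsuits_filed_last_2y: int


def in_high_risk_segment(c: Customer) -> bool:
    """True if the customer matches the segment described in the text."""
    return (
        c.lawsuits_filed_last_2y >= 1       # initiated at least one lawsuit
        and c.sector == "manufacturing"     # is in the manufacturing sector
        and c.employees > 250               # has over 250 employees
    )


# Invented sample customers, for illustration only.
customers = [
    Customer("Acme Castings", "manufacturing", 900, 2),
    Customer("Small Parts Co", "manufacturing", 40, 1),
    Customer("Quiet Mills", "manufacturing", 1200, 0),
    Customer("Metro Retail", "retail", 5000, 3),
]

flagged = [c.name for c in customers if in_high_risk_segment(c)]
print(flagged)
```

The rule is simple to state once you know it. The hard part is having enough classified litigation data to discover which combinations of attributes matter.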

The introduction of cyber perils into nearly all corners of the P/C risk environment further underscores that change is needed. Claims information is necessarily backward-looking and does not account, for instance, for rapid changes in the behavior of cybercriminals or the discovery of a major new vulnerability. Just look at the rise of ransomware and specifically Remote Desktop Protocol (RDP) as a key vector for attack.

[GRAPH] Ransomware Attack Vectors from Q4 2018 - Q1 2020

Source: Coveware

One could argue that badly secured RDP ports are the most critical factor in determining the risk of a ransomware attack. But ratings built from years of claims data on factors like industry class or revenue size won’t tell us whether the insured has this particular exposure. For that, we need up-to-the-moment data: we need to know if the insured is doing a good job securing its RDP ports today.
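As a rough illustration of what an up-to-the-moment signal can look like, the sketch below checks whether a host accepts TCP connections on port 3389, the default RDP port. This is a simplified assumption-laden example, not a description of our scanning technology: the function name is ours, a reachable port is only a first signal and says nothing about how well the service behind it is secured, and a check like this should only be run against hosts you are authorized to assess.

```python
import socket

RDP_PORT = 3389  # default TCP port for Remote Desktop Protocol


def is_port_open(host: str, port: int = RDP_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Calling `is_port_open("203.0.113.10")` (a documentation-only address used here as a placeholder) would return `True` only if something is listening on the default RDP port and is reachable from where the check runs.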

Policyholders and Brokers Win

The most direct application of this kind of data is for insurers to make determinations about risk. But it's far from the only one.

With a little ingenuity, brokers and their policyholders can see improvements in speed and efficiency in the quoting process. We already mentioned that at Corvus the litigation database has enabled our team to automate underwriting for Tech E&O, resulting in quotes delivered in minutes. This is coupled with a faster application process since we are able to develop a risk profile from fewer initial pieces of information. Faster quote delivery, less repetitive data entry thanks to shorter forms, and deeper integration with APIs across platforms can all be unlocked when more forms of data are available.

In future iterations of this project, we plan to take the further step of informing policyholders about the litigation patterns that may affect them in their particular industry segment. This will include a Dynamic Loss Prevention scorecard that provides information about the most litigious customers in their areas and claim severity estimations based on analysis of settlement amounts. This will help organizations better manage risk and find the safest ways to succeed.

***

This is the future of insurance: bringing in new sources of data to broaden the scope of what’s considered for underwriting, ensuring that data is as recent as possible wherever needed, and then sharing that information with brokers and policyholders to make everyone safer from adverse events. Thanks to AI, we can accomplish all of this at scale. It’s an exciting moment for insurance, and we’re eager to share more of our progress soon.

