You MUST have a Contract Database if you want to use AI

Transcript

Hello, it’s me again, Josh from Nomio, and I love building and maintaining contract repositories so much that I have spent the last seven years of my life doing pretty much nothing but that, which is something that I believe no one else in the world can say.

So, in the previous videos we talked about what happens if you try to try and stick AI on top of your contracts.

And the two big problems were that it’s really expensive and it’s really slow. And the reason for this is if you consider your entire contractual estate as the big circle, whenever you ask a question to Claude sitting on top of your contracts.

It’s having to look through the entire estate to answer a question that probably only concerns a small slice of contracts or maybe even a small slice of each contract, i.e. the red dot. And we’ve looked at a couple of ways that we can go about fixing this. And what they really involve is trying to trim down how much of that contractual estate we look at when we ask these questions.

So instead of having to look through this big white circle, can we instead zoom in on the small red circle?

And then, the other side of it is when we’ve answered a question, why should we have to do all of that work again every time we have the same question?

We end up with something like this, where the cost compounds because we want to modify the question slightly, or the next question depends on the answers of the previous question, and we have to go and do all of that work again.

This makes little sense. So, we’re going to going to condense everything that we’ve gone through to come up with the ideal solution.

For what you stick Claude on top of if you want to have a really good contract repository. Also, remember that, when we are just sticking AI on top of our files, it’s extraordinarily hard to have any confidence that the right people are going to see the right stuff.

So, let’s just recap. You’ve not got contracts, you’ve got documents. So here we’ve got 500 documents. The first one thing that we have to do is we have to filter out the trash.

So, we keep all the stuff that’s genuine contractual documents, we get rid of the duplicates, we get rid of what’s irrelevant and junk.

You need some process that’s going to do that. That doesn’t just happen automatically. And then, we end up with our day space.

Which, first, as its core, has built up that document structure. So, we group all of our individual documents into the contracts that they form.

And these can come in all different kinds of arrangements, as you can see here. Once we’ve got those contracts, we can start doing some accurate data extraction.

And we’ll pull all of that into a table. A bit like the spreadsheet that we talked about earlier. And this table is really useful because it removes any ambiguity when we’re asking questions.

We have pre-structured the data in advance of asking the question. If you remember back to a previous video, one of the big, problems was that we’re leaving it to chance every time we ask a question, because we’re relying on ourselves to specify the structure of the output every time we ask a question. Instead, we should be doing the work once to define the kind of structure that we care about. Querying a spreadsheet is so much cheaper and faster than having to interrogate the text and, reform that information every single time.

But we know that a spreadsheet has problems. For example, spreadsheets are divorced from the documents that they summarise, which is why the kind of database that you want is one that links every cell here to exactly where in the document contains that piece of information.

And then we also know that a spreadsheet only contains the information that you have populated it with in advance, which means that when you have a question that whose answer isn’t contained in the spreadsheet, you need a fallback mechanism that can go and intelligently search through the text itself. And that’s what we’ve got down here.

Something similar to the system that we looked at earlier. Where we’ve broken each contract up into its themes and then we can attack those themes when we have a question that can’t be answered with the spreadsheet and we actually have to look at the text of the contract itself. And then all of this passes through secure filter that makes, and that happens before Claude touches anything, that says which person is trying to query this information.

I need to filter all of the results so that they’re only seeing the stuff that they’re meant to see. You cannot rely on Claude to do this.

This is how big enterprise security leaks have already happened. Trying to depend on an LLM, a probabilistic model, to manage your security.

It just doesn’t work. So, the real insight here is that we cannot directly stick Claude on top of our contracts.

We need this intermediate layer. The intermediate layer is what makes everything cheap, fast, reliable, verifiable, secure, and, most importantly, it completely eliminates the dependence that you have on people within your organisation to actually keep the thing up to date and operate it.

It translates your repository from being something that represents a huge key person risk, to something that is an asset, an institutional asset, that outlasts any individual who worked on it.

Now, if you accept that this intermediate layer is a good thing, and we should have this sitting between our pile of documents and Claude or whoever is going to be querying the database, the next question is, who should be building this thing?

Should you go and get it from a vendor, or should you do it DIY, should you build it yourself? And we’re going to cover that in the next video.

In the meantime, thank you very much for watching, have a nice day.

Transcript

More from the blog

The 9 biggest challenges facing in-house lawyers in 2025

Why structured databases still matter in the AI era

You don’t have Contracts, you have Documents

Stop managing your contracts manually.