JSON and Case Law#

How can we extract data from APIs (machine-readable online data sources)?

In this lesson, we look at how we can use data from the web. We will use real-world data from Harvard’s Caselaw Access Project (“CAP”). CAP aims to make all published US court decisions freely available in a standard, machine-readable format. CAP and its data format are documented here.

JSON#

JSON is a machine-readable data format. Machine-readable data is easy to read and process with a computer. JSON data is usually tree-structured, with multiple levels containing information. This is somewhat like a directory tree containing files.
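
As a small illustration of the tree structure, the sketch below parses a short JSON string with Python's built-in json module. The record and its field names are made up for this example and are not taken from CAP.

```python
import json

# A made-up JSON document (not real CAP data) with three nested levels.
raw = '{"name": "Example v. Sample", "court": {"name": "Example Court", "location": {"state": "Illinois"}}}'

# json.loads turns the JSON text into nested Python dictionaries.
record = json.loads(raw)

# Each pair of brackets goes one level deeper, like opening a subdirectory.
print(record["name"])
print(record["court"]["name"])
print(record["court"]["location"]["state"])
```

Each level of the tree is an ordinary Python dictionary, so the same indexing syntax works at every depth.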

Fetching Data#

To fetch data from the web, we can use a library called requests, which makes this task quite easy. Since we are fetching data in the JSON format, we will also import a library to decode JSON data. Libraries are collections of code written by others that we can use instead of writing everything from scratch ourselves.

import requests
import json

We need to specify the URL to the data we want to fetch.

URL = "https://api.case.law/v1/cases/"

We include some parameters that specify which cases we want to load:

parameters = {'jurisdiction': 'ill',
              'full_case': 'true',
              'decision_date_min': '2011-01-01',
              'page_size': 3}
  • jurisdiction is Illinois in this example

  • full_case includes the full text of each case

  • decision_date_min is the minimum date; we only want decisions later than this date

  • page_size is the number of items to return per page

More parameters are listed in the CAP documentation linked above.
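
To see what these parameters turn into, the sketch below builds the query string by hand with urlencode from the standard library. This only approximates what requests does when it receives params, and the variable names query and full_url are introduced here for illustration.

```python
from urllib.parse import urlencode

URL = "https://api.case.law/v1/cases/"
parameters = {'jurisdiction': 'ill',
              'full_case': 'true',
              'decision_date_min': '2011-01-01',
              'page_size': 3}

# urlencode joins the key-value pairs as key=value, separated by "&".
query = urlencode(parameters)

# The query string is attached to the base URL after a "?".
full_url = URL + "?" + query
print(full_url)
```

Printing the full URL is a simple way to check a query before sending it, since the same address can be opened in a web browser.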

Now, let’s fetch the data.

data = requests.get(URL, params=parameters).json()

This step both fetches and decodes the JSON data in one line. We can also do this step by step, to see how the process works. If you don’t want to get into the details at this point, you can skip ahead to the section “Using the Data”. The server response also contains metadata, but we want the content:

response = requests.get(URL, params=parameters)
content = response.content

We can look at the first 100 bytes of the raw data. We can see the same data if we open the URL in a web browser: https://api.case.law/v1/cases/?jurisdiction=ill&full_case=true&decision_date_min=2011-01-01&page_size=3

print(content[:100])

To use the data, we must decode it. First we decode the raw bytes into text, which requires specifying the character set, often UTF-8. Then we decode the JSON text into a Python dictionary.

text = content.decode("utf-8")
data = json.loads(text)
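
The decoding step fails with a JSONDecodeError when the response body is not JSON, for instance when a server answers with an HTML error page. The sketch below wraps the two decoding steps in a function that reports this case instead of stopping the program. The function name decode_json_bytes and the sample byte strings are assumptions made for this example.

```python
import json

def decode_json_bytes(content, encoding="utf-8"):
    """Decode raw bytes to text, then parse the text as JSON.

    Returns the parsed data, or None if the text is not valid JSON.
    """
    text = content.decode(encoding)
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # Show the beginning of the body to help diagnose the problem.
        print("Could not decode JSON:", err)
        print("Body starts with:", text[:60])
        return None

# Made-up examples: one valid JSON body and one HTML page.
good = decode_json_bytes(b'{"count": 0, "results": []}')
bad = decode_json_bytes(b'<!DOCTYPE html><html>Not found</html>')
print(good)
print(bad)
```

Looking at the start of the body usually shows quickly whether the server sent an error page rather than data.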

Using the Data#

We can print the data, but this produces a lot of text:

print(data)

Instead, we can print only the keys using list():

keys = list(data)
print(keys)

The count field contains the total number of hits in the database. This is usually different from the number of items we requested with page_size. If the count is zero, we have no results and need to check the query in the URL.

print(data["count"])

That looks good. Let’s fetch the list of cases:

cases = data["results"]
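
The difference between count and the number of returned items can be checked directly. The dictionary below is a hand-written stand-in for a decoded response, so the example runs without network access. Its values are invented, and only the keys count, results and name_abbreviation are taken from this lesson.

```python
# Hand-written stand-in for a decoded response (values are invented).
sample = {
    "count": 1250,
    "results": [
        {"name_abbreviation": "Case A"},
        {"name_abbreviation": "Case B"},
        {"name_abbreviation": "Case C"},
    ],
}

# list() on a dictionary gives its keys.
print(list(sample))

# count is the total number of hits; results holds only one page of them.
total_hits = sample["count"]
returned = len(sample["results"])
print("Total hits:", total_hits)
print("Items on this page:", returned)

if total_hits == 0:
    print("No results; check the query parameters.")
```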

Now we can inspect each case. Let’s loop over the cases and get some of the information. The data contains various metadata about each case, such as the case name and the abbreviated case name.

It’s often useful to look at the data in a web browser to get an overview.

for case in cases:
    print("Case name:", case["name_abbreviation"])

Exercise: Docket Number#

Complete the code below to print the docket number and decision date of each case. You will need to browse the data.

URL = "https://api.case.law/v1/cases/?jurisdiction=ill&full_case=true&decision_date_min=2011-01-01&page_size=3"
data = requests.get(URL).json()

cases = data["results"]
for case in cases:
    print("Case name:", case["name_abbreviation"])
    # Your code here

Following the Path#

As mentioned, JSON data is a tree structure. It can contain many nested levels. In that case, we need to follow the path to find the entry we’re looking for. It’s usually advisable to follow the path one step at a time. This makes it easier to find errors in our programs.

cases = data["results"]
for case in cases:
    print("Case name:", case["name_abbreviation"])

    # Step-by-step:
    court = case["court"]
    court_name = court["name"]
    print("Court name:", court_name)

    # We start a new path from the root:
    casebody = case["casebody"]
    case_data = casebody["data"]  # we have already used the variable name data
    attorneys = case_data["attorneys"]
    print("Attorneys:", attorneys)

    # Extra linebreak:
    print()
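
When a key is missing somewhere along the path, indexing raises a KeyError. One possible way to make the step-by-step approach reusable is a small helper that walks a list of keys and stops at the first missing one. The helper name follow_path and the sample case are assumptions for this sketch and are not part of CAP or requests.

```python
def follow_path(node, keys):
    """Follow keys one level at a time; return None if a key is missing."""
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            print("Missing key:", key)
            return None
        node = node[key]
    return node

# Invented case record that mimics the nesting used in this lesson.
sample_case = {
    "name_abbreviation": "Case A",
    "court": {"name": "Example Court"},
    "casebody": {"data": {"attorneys": ["A. Lawyer", "B. Counsel"]}},
}

print(follow_path(sample_case, ["court", "name"]))
print(follow_path(sample_case, ["casebody", "data", "attorneys"]))
print(follow_path(sample_case, ["casebody", "data", "judges"]))
```

Reporting the first missing key points to the exact level where the path and the data disagree.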

Exercise: Attorneys and Head Matter#

Complete the code below to print the attorneys and head matter of each case. You will need to browse the data.

URL = "https://api.case.law/v1/cases/?jurisdiction=ill&full_case=true&decision_date_min=2011-01-01&page_size=3"
data = requests.get(URL).json()

cases = data["results"]
for case in cases:
    print("Case name:", case["name_abbreviation"])
    # Your solution

Working with Lists#

Each case in the data set contains a list of one or more opinions. These lists are located quite deep in the data structure, in case["casebody"]["data"]["opinions"]. These levels are somewhat like directories in a file tree, and can be seen when browsing the web interface.

We can look at some of the data for each opinion:

for case in cases:
    print("Case name:", case["name_abbreviation"])
    court = case["court"]
    print("Court name:", court["name"])
    casebody = case["casebody"]
    case_data = casebody["data"]
    opinions = case_data["opinions"]
    
    for opinion in opinions:
        print("  Opinion author:", opinion["author"])
    print()
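
The nested loop can also collect values into a list instead of printing them. The sketch below gathers the opinion authors of each case. The sample cases are invented, and the possibility that an author is missing (None) is an assumption made for illustration.

```python
# Invented cases that mimic the nesting case["casebody"]["data"]["opinions"].
sample_cases = [
    {"name_abbreviation": "Case A",
     "casebody": {"data": {"opinions": [{"author": "Judge One"},
                                        {"author": "Judge Two"}]}}},
    {"name_abbreviation": "Case B",
     "casebody": {"data": {"opinions": [{"author": None}]}}},
]

authors_by_case = {}
for case in sample_cases:
    opinions = case["casebody"]["data"]["opinions"]
    # Keep only opinions where an author is recorded.
    authors = [opinion["author"] for opinion in opinions if opinion["author"]]
    authors_by_case[case["name_abbreviation"]] = authors

for name, authors in authors_by_case.items():
    print(name, "has", len(authors), "named author(s):", authors)
```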

Key Points#

  • The requests library can be used to fetch data from the web.

  • Many data providers offer an API from which we can fetch data programmatically.

  • Parameters can be used to control what data we get from an API.

  • Many APIs provide data in the JSON format, and JSON is well supported in Python.

  • Additional filtering and processing of the retrieved data can be done using loops and conditions (if-statements, covered in the next chapter).