It’s 2025 and you find yourself using Caddy as the home webserver.

One nice feature of Caddy is structured logging.

Caddy’s access logs include all request/response headers and fields as newline-delimited JSON. This is a bit mind-blowing coming from something like nginx or Apache’s Common Log Format from 3 decades ago.

Structured logs are great when loaded into ELK to automatically parse and index fields.

But what to do at home where there is no ELK, and I want to know simple things like:

  • What is the date/time for the log?
  • What countries/Cloudflare POPs are requests coming from?
  • What URLs are bots hitting most?
  • Are there unexpected errors?

How to parse the wall of JSON documents?

jq is a great for pretty printing or extracting a single field from a JSON document. But I find the syntax hard to understand and remember especially for extracting multiple fields.

We can do this with a few lines of Python. These particular batteries are included in reasonably-modern python3, no extra packages needed:

#!/usr/bin/env python3

import datetime
import json
import sys

for line in sys.stdin:
    data = json.loads(line)
    localTS = datetime.datetime.fromtimestamp(data['ts'])
    host = data['request']['host']
    uri = data['request']['uri']

    requestHeaders = data['request']['headers']
    cfRay = requestHeaders.get('Cf-Ray')
    cfIpCountry = requestHeaders.get('Cf-Ipcountry')

    status = data['status']

    responseHeaders = data['resp_headers']
    cacheControl = responseHeaders.get('Cache-Control')

    print(f'localTS={localTS} host={host} cfRay={cfRay} cfIpCountry={cfIpCountry} status={status} cacheControl={cacheControl} uri={uri}')

Above reads and parses all lines from stdin so works nicely in a shell pipeline. The output is easy to parse with grep/awk/sed, etc.

Below is an example of spotting a bad guy looking for wordpress sites:

$ tail -1000 access.log | parse_logs.py | grep 'status=404' | grep wp

localTS=2025-04-20 07:04:04.185519 host=aaronr.digital cfRay=['93348085ad18f826-MAN'] cfIpCountry=['IE'] status=404 cacheControl=['public, max-age=60'] uri=/wp_class_datalib.php
localTS=2025-04-20 07:04:04.701258 host=aaronr.digital cfRay=['933480878df7f826-MAN'] cfIpCountry=['IE'] status=404 cacheControl=['public, max-age=60'] uri=/wp_wrong_datlib.php