Kotlin script to download NYC Yellow Taxi Data

Attempting to learn Kotlin by using it in place of Python

Created: by Pradeep Gowda Updated: Nov 04, 2023 Tagged: kotlin

I am trying to do two things:

  1. Use more Java (ecosystem)
  2. Use less Python (get out of the goldilocks zone)

So, the natural answer is to use Kotlin ;)

I wrote a “throwaway” script to download NYC Yellow Taxi data from here

import java.net.URL
import java.nio.file.Files
import java.nio.file.Paths

fun main(args: Array<String>) {

    for (year in 2009..2022) {
        for (month in 1..12) {
            // they only have data up to 2022-06
            if (year == 2022 && month > 6) continue
            val uri = "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_${year}-${month.toString().padStart(2, '0')}.parquet"
            val fileName = uri.split("/").last()
            println("${uri} -> ${fileName}")
            val url = URL(uri)
            // yes, this does not handle exceptions
            // it's a script, YOLO
            // (Files.copy throws FileAlreadyExistsException if fileName already exists)
            url.openStream().use { Files.copy(it, Paths.get(fileName)) }
        }
    }

}
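If the script ever stops being throwaway, the download step is the part I would harden first. Here is a minimal sketch of what that might look like, assuming you want re-runs to skip files that are already on disk and a failed month to be reported instead of killing the loop. The fetchIfMissing name and the ".part" suffix are made up for illustration; they are not part of the script above.

```kotlin
import java.io.IOException
import java.net.URI
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardCopyOption

// Hypothetical helper, not part of the original script.
// Returns true if `target` exists afterwards, false if the download failed.
fun fetchIfMissing(uri: String, target: Path): Boolean {
    // Skip files left over from an earlier run so the script can be restarted.
    if (Files.exists(target)) return true

    // Download to a sibling ".part" file first (assumed naming), so an
    // interrupted run does not leave a truncated file under the final name.
    val partial = target.resolveSibling("${target.fileName}.part")
    return try {
        URI(uri).toURL().openStream().use { input ->
            Files.copy(input, partial, StandardCopyOption.REPLACE_EXISTING)
        }
        Files.move(partial, target, StandardCopyOption.REPLACE_EXISTING)
        true
    } catch (e: IOException) {
        // Covers network errors and missing files; report and move on.
        System.err.println("failed: $uri (${e.message})")
        Files.deleteIfExists(partial)
        false
    }
}
```

Inside the loop, the last line would then become something like fetchIfMissing(uri, Paths.get(fileName)). It uses URI(uri).toURL() because the URL(String) constructor is deprecated in recent JDKs.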

Some observations about this code:

  • Didn’t have to use a third-party library, the way I usually reach for requests in Python.
  • This isn’t that much longer than an equivalent Python script, except for the curly braces.
  • String interpolation - ${uri} is a must-have. Don’t know why Python didn’t have f-strings for so long.
  • No semicolons is a nice touch.
  • Integrated IDE support out of the box…? Chef’s kiss.
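To make the string interpolation point concrete, here is a small sketch of the two template forms: a bare $name for a simple variable and ${...} for anything more involved. The tripFileName function is a hypothetical helper written for this note; the script above gets the file name by splitting the URL instead.

```kotlin
// Hypothetical helper: builds the same file name the script derives from the URL.
fun tripFileName(year: Int, month: Int): String {
    // Zero-pad the month to two digits, as the script does.
    val mm = month.toString().padStart(2, '0')
    // $year is a simple-name template; the braces around mm are optional here
    // but make it obvious where the name ends.
    return "yellow_tripdata_$year-${mm}.parquet"
}
```

"%02d".format(month) would pad the month just as well, if you prefer printf-style formatting.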

Follow-up - I plan to:

  1. Take DuckDB for a spin using these parquet data files.
  2. Play with Tantivy and “search indexes” and see if Tantivy et al can be a replacement for Solr for certain use cases.