Open Data Datasets
Free Parquet datasets from GARUDA. Download historical BMKG climate data and start analyzing.
BMKG Climate Observations
Historical temperature, humidity, and precipitation data from 100+ BMKG stations across Indonesia.
bmkg_climate_2020_2024.parquet
~2.3 GB | 50M+ rows
Schema
- station_id: String
- timestamp: Timestamp
- temperature_c: Float64
- humidity_pct: Float64
- precipitation_mm: Float64
- province: String
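To show how this schema maps onto Python types, here is a minimal standard-library sketch. The `ClimateObservation` record and `check_observation` helper are hypothetical names. They are not part of the dataset or any library. The checks encode only limits implied by the units: relative humidity is a percentage, and precipitation cannot be negative. The sketch assumes non-null values. The station ID and province are placeholders, not real BMKG readings.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ClimateObservation:
    """Hypothetical record mirroring one row of bmkg_climate_2020_2024.parquet."""
    station_id: str          # String
    timestamp: datetime      # Timestamp
    temperature_c: float     # Float64, degrees Celsius
    humidity_pct: float      # Float64, relative humidity in percent
    precipitation_mm: float  # Float64, millimetres
    province: str            # String


def check_observation(obs: ClimateObservation) -> list[str]:
    """Return the problems found in one row; an empty list means it passed.

    Assumes non-null numeric fields (nulls would need explicit handling).
    """
    problems = []
    if not 0.0 <= obs.humidity_pct <= 100.0:
        problems.append(f"humidity_pct out of range: {obs.humidity_pct}")
    if obs.precipitation_mm < 0.0:
        problems.append(f"negative precipitation_mm: {obs.precipitation_mm}")
    return problems


# Placeholder values for illustration only.
sample = ClimateObservation(
    station_id="STATION-001",
    timestamp=datetime(2021, 3, 14, 7, 0),
    temperature_c=27.5,
    humidity_pct=104.0,
    precipitation_mm=0.0,
    province="EXAMPLE PROVINCE",
)
print(check_observation(sample))  # ['humidity_pct out of range: 104.0']
```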
Saka Calendar Reference
Complete Saka Calendar mapping for 2020–2030. Use it to enrich time-based queries, for example by joining observations on the date.
saka_calendar_2020_2030.parquet
~5 MB | 4,000 rows
Schema
- gregorian_date: Date
- saka_sasih: String
- saka_pawukon: String
- saka_eka: Int32
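Temporal enrichment amounts to a join on the calendar date. Each observation's `timestamp` is truncated to a date and matched against `gregorian_date`. The standard-library sketch below shows that logic on in-memory rows. Note the following assumptions:

- `build_calendar_index` and `enrich` are hypothetical helpers.
- The calendar values are placeholders, not real Saka mappings.
- The timestamps are assumed to already be in the local time zone you want to group by. If they are stored in UTC, convert them before truncating to a date.

With a dataframe engine, the same step is a left join on the timestamp cast to a date.

```python
from datetime import date, datetime

# Hypothetical calendar rows: (gregorian_date, saka_sasih, saka_pawukon, saka_eka).
# The string and integer values are placeholders, not real Saka mappings.
CALENDAR_ROWS = [
    (date(2021, 3, 14), "SASIH_A", "WUKU_A", 1),
    (date(2021, 3, 15), "SASIH_A", "WUKU_B", 2),
]


def build_calendar_index(rows):
    """Index calendar rows by Gregorian date for constant-time lookups."""
    return {
        gregorian_date: {"saka_sasih": sasih, "saka_pawukon": pawukon, "saka_eka": eka}
        for gregorian_date, sasih, pawukon, eka in rows
    }


def enrich(observations, calendar_index):
    """Left join: attach Saka fields using each observation's calendar date.

    Observations whose date is missing from the calendar get None for every
    Saka field. Assumes timestamps are already in the desired local time zone.
    """
    empty = {"saka_sasih": None, "saka_pawukon": None, "saka_eka": None}
    enriched = []
    for obs in observations:
        saka = calendar_index.get(obs["timestamp"].date(), empty)
        enriched.append({**obs, **saka})
    return enriched


# Placeholder observations; the 2019 row falls outside the calendar range.
observations = [
    {"station_id": "STATION-001", "timestamp": datetime(2021, 3, 14, 7, 0), "temperature_c": 27.5},
    {"station_id": "STATION-001", "timestamp": datetime(2019, 1, 1, 7, 0), "temperature_c": 26.0},
]
for row in enrich(observations, build_calendar_index(CALENDAR_ROWS)):
    print(row["timestamp"].date(), row["saka_sasih"], row["saka_pawukon"])
```

The left join keeps every observation even when no calendar row matches, so an out-of-range date shows up as None rather than silently dropping data.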
Usage Examples
Python with Polars
import polars as pl
# Scan the climate data lazily so only the needed columns are read from disk
lf = pl.scan_parquet('bmkg_climate_2020_2024.parquet')
# Query average temperature by province, warmest first
avg_temp = (
    lf.group_by('province')
    .agg(pl.col('temperature_c').mean())
    .sort('temperature_c', descending=True)
    .collect()
)
print(avg_temp)
Rust with DataFusion
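use datafusion::error::Result;
use datafusion::prelude::*;

// Requires the datafusion and tokio crates (tokio with the "macros" and "rt-multi-thread" features)
#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    // Register the Parquet file as a table named "climate"
    ctx.register_parquet(
        "climate",
        "bmkg_climate_2020_2024.parquet",
        ParquetReadOptions::default(),
    ).await?;
    // Average temperature per province, warmest first
    let df = ctx.sql(
        "SELECT province, AVG(temperature_c) AS avg_temp
         FROM climate
         GROUP BY province
         ORDER BY avg_temp DESC"
    ).await?;
    // Print the result as a table
    df.show().await?;
    Ok(())
}
SQL with DuckDB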
-- Load and query with DuckDB
SELECT
    province,
    AVG(temperature_c) AS avg_temp,
    COUNT(*) AS observation_count
FROM read_parquet('bmkg_climate_2020_2024.parquet')
GROUP BY province
ORDER BY avg_temp DESC;
License & Attribution
All open datasets are provided under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Attribution: Data sourced from BMKG (Badan Meteorologi, Klimatologi, dan Geofisika) and processed by TeknoRakit.
CC BY 4.0 permits commercial use with attribution; for other licensing arrangements, contact us.