An ETL pipeline that extracts weather and air quality data from public APIs, transforms the data into a clean, analyzable format, and loads it into a PostgreSQL database.
View the Project on GitHub themispap/weather_air_quality_etl_pipline

This project demonstrates an ETL (Extract, Transform, Load) pipeline that integrates weather and air quality data for a specified city over a defined time period. The data is extracted from external APIs, transformed (cleaned and processed), and loaded into a PostgreSQL database. The goal of this project is to provide a clean, well-structured dataset for further analysis or reporting.
├── src
│ ├── extract.py # Code for extracting data from APIs
│ ├── transform.py # Code for cleaning and transforming the data
│ ├── load.py # Code for loading data into PostgreSQL
│ ├── config.py # Configuration for API keys, database credentials
│ └── utils.py # Utility functions for the ETL process
├── sql
│ └── schema.sql # SQL script for creating database schema
├── notebooks
│ └── analysis.ipynb # Jupyter notebook for data exploration and analysis
├── requirements.txt # Python dependencies
├── README.md # Project documentation
├── LICENSE # License details
└── run_etl.sh # Executable for ETL
requests: For interacting with APIs.pandas: For data manipulation and cleaning.SQLAlchemy: For database interaction.numpy: For numerical computations.dotenv: To handle environment variables (API keys, DB credentials).git clone https://github.com/your_username/weather_air_quality_etl_pipline.git
cd weather_air_quality_etl_pipline
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
.env file in the root directory with the following keys:
VISUALCROSSING_API_KEY = <your_visualcrossing_api_key>
AIRVISUAL_API_KEY = <your_airvisual_api_key>
DATABASE_URL=postgresql://<username>:<password>@<host>:<port>/<dbname>
psql -U <username> -d <dbname> -f sql/schema.sql
python src/extract.py
python src/transform.py
python src/load.py
Alternatively,
Make sure the run_etl.sh is executable by running
chmod +x run_etl.sh
Execute the script with
./run_etl.sh
jupyter notebook notebooks/analysis.ipynb
This project is licensed under the MIT License. See the LICENSE file for details.