Merge Multiple CSV Files
Problem Statement
You have 3 monthly sales CSV files: `sales_jan.csv`, `sales_feb.csv`, `sales_mar.csv`. Each file has the same columns: salesperson, region, product, amount, sale_date.

Write a Python pipeline that:
1. Reads all 3 files and adds a `source_file` column to each
2. Merges them into one combined dataset
3. Removes duplicate rows (same salesperson + product + sale_date)
4. Calculates total sales per region
5. Saves the merged data to `sales_combined.csv`
6. Prints total rows before and after deduplication
Sample Data
sales_jan.csv: Alice,North,Laptop,50000,2024-01-10
sales_feb.csv: Bob,South,Mouse,1500,2024-02-05
sales_mar.csv: Alice,North,Laptop,50000,2024-01-10 ← duplicate of Jan row

Use pandas for this task.
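To try the task locally, the sample rows can be written to disk first. The sketch below is one way to do that. It assumes each file starts with a header row naming the five columns, which the assignment does not show explicitly, and `write_sample_files` is a hypothetical helper name, not part of the assignment.

```python
from pathlib import Path

# Assumed header row: the assignment lists these column names but does not
# show whether the files actually include a header line.
HEADER = "salesperson,region,product,amount,sale_date"

SAMPLE_ROWS = {
    "sales_jan.csv": "Alice,North,Laptop,50000,2024-01-10",
    "sales_feb.csv": "Bob,South,Mouse,1500,2024-02-05",
    # Same salesperson, product, and sale_date as the January row.
    "sales_mar.csv": "Alice,North,Laptop,50000,2024-01-10",
}


def write_sample_files(directory="."):
    """Write the three sample CSV files into `directory` and return their paths."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, row in SAMPLE_ROWS.items():
        path = target / name
        path.write_text(f"{HEADER}\n{row}\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `write_sample_files()` with no argument writes the three files into the current directory.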
Expected Output
Before dedup: 3 rows
After dedup: 2 rows
Total sales by region: North=50000, South=1500
sales_combined.csv saved with 2 rows
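One possible shape for the pipeline is sketched below; it is not a reference solution. It assumes the CSV files include a header row with the five column names, and that region totals are computed after deduplication (which matches North=50000 in the expected output). The function name `merge_sales_files` and its parameters are hypothetical.

```python
from pathlib import Path

import pandas as pd

# Columns that identify a duplicate sale, per the problem statement.
DEDUP_KEYS = ["salesperson", "product", "sale_date"]


def merge_sales_files(paths, output_path="sales_combined.csv"):
    """Merge monthly sales CSVs, drop duplicate sales, and save the result.

    Assumes every file starts with a header row naming the five columns.
    """
    frames = []
    for path in paths:
        frame = pd.read_csv(path)
        # Record which file each row came from.
        frame["source_file"] = Path(path).name
        frames.append(frame)

    combined = pd.concat(frames, ignore_index=True)
    rows_before = len(combined)

    # source_file is left out of the key on purpose: the same sale appearing
    # in two files should count once. keep="first" retains the copy from the
    # earliest file in `paths`.
    deduped = combined.drop_duplicates(subset=DEDUP_KEYS, keep="first")
    rows_after = len(deduped)

    # Totals are computed on the deduplicated rows.
    region_totals = deduped.groupby("region")["amount"].sum()

    deduped.to_csv(output_path, index=False)

    print(f"Before dedup: {rows_before} rows")
    print(f"After dedup: {rows_after} rows")
    totals_text = ", ".join(
        f"{region}={total}" for region, total in region_totals.items()
    )
    print(f"Total sales by region: {totals_text}")
    print(f"{output_path} saved with {rows_after} rows")

    return deduped, region_totals
```

With the three sample files in the working directory, the call would be `merge_sales_files(["sales_jan.csv", "sales_feb.csv", "sales_mar.csv"])`. Deduplicating on the three key columns rather than on whole rows matters here: once `source_file` is added, the January and March rows are no longer identical, so a plain `drop_duplicates()` would keep both.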