10 NumPy Exercises to Analyze Data in Python

NumPy, the workhorse library for numerical computing in Python, is an essential tool for anyone working with data. Its efficient arrays and powerful functions make data manipulation and analysis a breeze. But how do you translate theory into practice? Here, we delve into 10 engaging NumPy exercises designed to solidify your understanding and equip you to tackle real-world data challenges.

1. Array Creation and Exploration:

Task: Create a NumPy array containing the temperatures recorded for a week (7 days). Populate the array with random values between 10°C and 30°C. Subsequently, calculate the average, minimum, and maximum temperatures for the week.
Solution:

import numpy as np

# Generate random temperatures between 10 and 30 (inclusive)
temperatures = np.random.randint(10, 31, size=7)  # size defines array length

# Calculate descriptive statistics
average_temp = np.mean(temperatures)
min_temp = np.min(temperatures)
max_temp = np.max(temperatures)

print(f"Weekly Temperatures: {temperatures}")
print(f"Average Temperature: {average_temp:.2f}°C")
print(f"Minimum Temperature: {min_temp}°C")
print(f"Maximum Temperature: {max_temp}°C")

This exercise introduces you to creating NumPy arrays with np.random.randint and basic statistical operations using np.mean, np.min, and np.max.
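NumPy's newer Generator interface (np.random.default_rng) is the recommended way to draw random numbers in new code. The sketch below repeats the exercise with it. The seed value 42, the use of uniform() for fractional readings, and the np.argmax lookup of the hottest day are illustrative additions, not part of the original exercise.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the "random" week is reproducible

# Whole degrees from 10 to 30 inclusive: Generator.integers excludes `high` by default
int_temps = rng.integers(10, 31, size=7)

# Fractional readings in [10, 30): uniform() excludes the upper bound
float_temps = rng.uniform(10, 30, size=7)

# Index (0-6) of the warmest day in the fractional readings
hottest_day = np.argmax(float_temps)

print(f"Integer readings: {int_temps}")
print(f"Fractional readings: {np.round(float_temps, 1)}")
print(f"Hottest day index: {hottest_day}, {float_temps[hottest_day]:.1f}°C")
```

Fixing the seed is mainly useful while learning or testing: every run produces the same week, so you can check your statistics by hand.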

2. Conditional Selection with Boolean Indexing:

Task: Given an array of exam scores, identify students who scored above 80% and display their corresponding scores.
Solution:

# Sample exam scores
scores = np.array([75, 88, 92, 67, 95, 82])

# Select elements where score is greater than 80%
high_scorers = scores[scores > 80]

print("Scores above 80%:", high_scorers)

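This exercise demonstrates conditional selection with boolean indexing: scores > 80 produces an array of True/False values, and using that array as an index keeps only the elements where it is True. This differs from slicing (e.g., scores[1:4]), which selects elements by position.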
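The task asks you to identify the students, but the solution only prints their scores. A minimal sketch of recovering who they are, assuming a hypothetical names array that lines up one-to-one with the scores: np.flatnonzero returns the positions where the mask is True, and the same mask can index any array of the same length.

```python
import numpy as np

scores = np.array([75, 88, 92, 67, 95, 82])
students = np.array(["Ana", "Ben", "Chen", "Dev", "Eli", "Fay"])  # hypothetical names, one per score

mask = scores > 80                  # boolean array, same shape as scores
positions = np.flatnonzero(mask)    # indices where the mask is True

for i in positions:
    print(f"Student #{i} ({students[i]}): {scores[i]}")

# The same mask works on any array with the same length
top_students = students[mask]
print("High scorers:", top_students)
```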

3. Array Manipulation and Reshaping:

Task: Reshape a 1D array of customer purchase data (item IDs) into a 2D array with one row per customer, where every customer bought the same number of items.
Solution:

# Sample 1D purchase data (item IDs): 3 customers x 2 items each = 6 elements
purchases = np.array([101, 202, 303, 101, 404, 202])

# Reshape into a 2D array with 3 rows (one per customer)
customer_purchases = purchases.reshape(3, -1)  # -1 lets NumPy infer the column count (6 / 3 = 2)

print("Customer Purchases:")
print(customer_purchases)

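Here, we explore reshaping arrays with reshape. The total number of elements must stay the same: 6 items fit into 3 rows of 2, but 7 items cannot be divided evenly into 3 rows (reshape raises a ValueError), and a regular NumPy array cannot hold rows of different lengths.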
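When customers buy different numbers of items, a rectangular array no longer fits. One option, sketched here under the assumption that you know how many items each customer bought (the items_per_customer array is hypothetical), is to cut the flat array at the customer boundaries with np.split, which returns a list of 1D arrays.

```python
import numpy as np

purchases = np.array([101, 202, 303, 101, 404, 202, 505])
items_per_customer = np.array([2, 3, 2])  # hypothetical: items bought by each customer

# Split points are the running totals without the last one: [2, 5]
split_points = np.cumsum(items_per_customer)[:-1]
per_customer = np.split(purchases, split_points)  # list of 1D arrays, one per customer

for i, items in enumerate(per_customer):
    print(f"Customer {i}: {items}")
```

The trade-off is that the result is a Python list rather than a single array, so whole-array operations have to be applied per customer.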

4. Element-wise Operations and Broadcasting:

Task: Calculate the total cost of items in a shopping cart represented by an array of prices. You have a separate NumPy array containing discount percentages for each item. Apply the discounts element-wise to the original prices.
Solution:

# Sample item prices
prices = np.array([10.99, 5.49, 12.75])

# Discount percentages (as decimals)
discounts = np.array([0.1, 0.05, 0.15])

# Apply discounts element-wise (prices and discounts have the same shape)
discounted_prices = prices - (prices * discounts)

# Calculate total cost after discounts
total_cost = discounted_prices.sum()

print("Discounted Prices:", discounted_prices)
print(f"Total Cost: ${total_cost:.2f}")

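This exercise showcases element-wise arithmetic with - and *. Because prices and discounts have the same shape, NumPy simply pairs their elements position by position. Broadcasting comes into play when the shapes differ, for example when every price is multiplied by a single scalar.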
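To see broadcasting in action, the shapes have to differ. This sketch uses a hypothetical 8% tax rate (a scalar stretched across every price) and two hypothetical discount scenarios stored as a (2, 1) column, which NumPy broadcasts against the (3,) prices to produce a (2, 3) table.

```python
import numpy as np

prices = np.array([10.99, 5.49, 12.75])

# Scalar broadcasting: the single number applies to every element
prices_with_tax = prices * 1.08  # hypothetical 8% tax rate

# 2D broadcasting: (2, 1) scenarios against (3,) prices -> (2, 3) result
scenario_discounts = np.array([[0.0], [0.2]])  # hypothetical: no discount vs. 20% off everything
scenario_prices = prices * (1 - scenario_discounts)

# One cart total per scenario: sum across items (axis=1)
scenario_totals = scenario_prices.sum(axis=1)

print("Shape:", scenario_prices.shape)
print("Totals per scenario:", np.round(scenario_totals, 2))
```

Broadcasting compares shapes from the right: a trailing dimension of 1 is stretched to match, so no explicit loop or copying of the prices array is needed.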

5. Array Concatenation and Stacking:

Task: Combine sales data for two different product categories (represented by separate NumPy arrays) into a single array for further analysis.
Solution:

# Sales data for categories A and B
category_a_sales = np.array([100, 120, 80])
category_b_sales = np.array([150, 90, 110])

# Concatenate the 1D arrays end to end (axis=0 is the only axis a 1D array has)
combined_sales = np.concatenate((category_a_sales, category_b_sales))

# Alternatively, stack the arrays vertically: each one becomes a row of a 2D array
stacked_sales = np.vstack((category_a_sales, category_b_sales))

print("Combined Sales (concatenated):", combined_sales)
print("Stacked Sales (vertical):", stacked_sales)

np.concatenate joins the two 1D arrays end to end into a single 6-element array, while np.vstack stacks them as the rows of a 2x3 array, keeping each category on its own row.
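If you would rather have one column per category, np.column_stack does that. The sketch assumes (this is not stated in the exercise) that the three values in each array are three consecutive periods, such as months, so rows become periods and columns become categories. Summing along each axis then gives totals per period and per category.

```python
import numpy as np

category_a_sales = np.array([100, 120, 80])
category_b_sales = np.array([150, 90, 110])

# Each 1D array becomes a column: rows = periods (assumed), columns = categories
sales_table = np.column_stack((category_a_sales, category_b_sales))  # shape (3, 2)

totals_per_period = sales_table.sum(axis=1)    # add across the categories in each row
totals_per_category = sales_table.sum(axis=0)  # add down each column

print(sales_table)
print("Per period:", totals_per_period)
print("Per category:", totals_per_category)
```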

6. Random Number Generation with Distributions:

Task: Generate a sample population with ages following a normal distribution, simulating a real-world scenario. Calculate basic statistics (mean, standard deviation) of the generated population.
Solution:

# Sample size (number of individuals)
population_size = 1000

# Generate random ages following a normal distribution (mean 30, standard deviation 5)
ages = np.random.normal(loc=30, scale=5, size=population_size)

# Calculate mean and standard deviation
average_age = np.mean(ages)
std_dev_age = np.std(ages)

print(f"Sample Population Age Distribution (Normal, Mean: 30, Std. Dev: 5)")
print(f"Average Age: {average_age:.2f} years")
print(f"Standard Deviation: {std_dev_age:.2f} years")

This exercise explores generating random numbers with specific distributions (np.random.normal) for simulating real-world data.
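Two details are worth knowing when simulating data this way. First, np.std divides by n by default (ddof=0); pass ddof=1 for the sample estimate with Bessel's correction. Second, a normal distribution can in principle produce negative ages, so clipping is a cheap safeguard. The sketch below uses a seeded Generator and a larger sample (10,000, an illustrative choice) and checks the familiar rule that about 68% of values fall within one standard deviation of the mean.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # reproducible draws
ages = rng.normal(loc=30, scale=5, size=10_000)

# Safeguard: ages cannot be negative
ages = np.clip(ages, 0, None)

population_std = ages.std()    # ddof=0 (default): divides by n
sample_std = ages.std(ddof=1)  # ddof=1: divides by n - 1

# Share of the population within one standard deviation (25 to 35 years)
within_one_sd = np.mean(np.abs(ages - 30) <= 5)

print(f"Mean: {ages.mean():.2f}")
print(f"Population std: {population_std:.3f}, sample std: {sample_std:.3f}")
print(f"Share aged 25-35: {within_one_sd:.1%}")
```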

7. Linear Algebra Operations with NumPy:

Task: Calculate the dot product of two vectors representing customer preferences and product features. Interpret the result in the context of recommendation systems.
Solution:

# Sample customer preferences vector
customer_prefs = np.array([4, 3, 2])

# Sample product features vector (e.g., ratings for different attributes)
product_features = np.array([1, 5, 2])

# Calculate dot product
dot_product = np.dot(customer_prefs, product_features)

print(f"Customer Preferences: {customer_prefs}")
print(f"Product Features: {product_features}")
print(f"Dot Product: {dot_product}")

# Interpretation: Higher dot product suggests a better match between customer preferences and product features.

This exercise introduces the dot product (np.dot), a core linear algebra operation that combines two vectors into a single similarity score.
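In a recommender you usually score many products at once. This sketch uses a hypothetical catalogue of three products (one row each, same feature order as the preferences) and a matrix-vector product to compute every dot product in one call. Because a raw dot product grows with vector length, it also computes cosine similarity, which divides out the magnitudes; the two rankings can differ.

```python
import numpy as np

customer_prefs = np.array([4, 3, 2])

# Hypothetical catalogue: one row per product
product_features = np.array([
    [1, 5, 2],
    [5, 1, 1],
    [2, 2, 4],
])

# Matrix-vector product: one dot product per product row
dot_scores = product_features @ customer_prefs

# Cosine similarity: dot product divided by the product of the vector lengths
norms = np.linalg.norm(product_features, axis=1) * np.linalg.norm(customer_prefs)
cosine_scores = dot_scores / norms

# Rank products from best to worst match under each score
rank_by_dot = np.argsort(dot_scores)[::-1]
rank_by_cosine = np.argsort(cosine_scores)[::-1]

print("Dot scores:", dot_scores, "ranking:", rank_by_dot)
print("Cosine scores:", np.round(cosine_scores, 3), "ranking:", rank_by_cosine)
```

Here product 0 beats product 2 on the raw dot product but loses on cosine similarity, which is why normalized scores are often preferred when feature magnitudes vary.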

8. Boolean Indexing and Advanced Filtering:

Task: Identify customers within a specific age range (e.g., 25-35 years old) from a dataset containing customer age information.
Solution:

# Sample customer ages
customer_ages = np.array([22, 30, 18, 38, 27, 42])

# Filter customers between 25 and 35 years old (inclusive)
filtered_customers = customer_ages[(customer_ages >= 25) & (customer_ages <= 35)]

print("Customer Ages:", customer_ages)
print("Customers Aged 25-35:", filtered_customers)

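This exercise shows how to combine conditions with the element-wise & operator. Each comparison must be wrapped in parentheses because & binds more tightly than >= and <=, and Python's and keyword cannot be used here because it tries to reduce a whole array to a single True or False.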
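A mask built once can be reused across parallel arrays, inverted with ~, or counted. The sketch assumes a hypothetical customer_ids array aligned with the ages.

```python
import numpy as np

customer_ages = np.array([22, 30, 18, 38, 27, 42])
customer_ids = np.array([101, 102, 103, 104, 105, 106])  # hypothetical IDs, aligned with ages

# Build the mask once...
in_range = (customer_ages >= 25) & (customer_ages <= 35)

# ...then reuse it on any array with the same length
target_ids = customer_ids[in_range]
outside_ages = customer_ages[~in_range]  # ~ inverts the mask
count_in_range = np.count_nonzero(in_range)

print("Target customer IDs:", target_ids)
print("Ages outside the range:", outside_ages)
print("Number in range:", count_in_range)
```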

9. Custom Array Functions with NumPy:

Task: Write a custom function that uses NumPy to calculate the absolute difference between corresponding elements of two arrays.
Solution:

def absolute_difference(arr1, arr2):
  """
  Calculates the absolute difference between elements of two NumPy arrays.

  Args:
      arr1 (np.ndarray): First NumPy array.
      arr2 (np.ndarray): Second NumPy array.

  Returns:
      np.ndarray: Array containing the absolute difference between corresponding elements.
  """
  return np.abs(arr1 - arr2)

# Sample arrays
array1 = np.array([5, 10, 15])
array2 = np.array([3, 8, 12])

# Apply custom function
difference = absolute_difference(array1, array2)

print("Array 1:", array1)
print("Array 2:", array2)
print("Absolute Difference:", difference)
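absolute_difference works on whole arrays because it only uses operations NumPy already vectorizes. When your logic is written for one pair of numbers at a time (for example, with an if statement), np.vectorize can wrap it so it accepts arrays. The capped_difference function and its cap of 3 below are hypothetical examples. Note that np.vectorize is a convenience that still loops in Python; an equivalent built from native NumPy functions is usually much faster.

```python
import numpy as np

def capped_difference(a, b, cap=3):
    """Scalar logic: absolute difference, limited to `cap`."""
    d = abs(a - b)
    return d if d <= cap else cap

# Wrap the scalar function so it accepts arrays (still a Python-level loop)
capped_vec = np.vectorize(capped_difference)

array1 = np.array([5, 10, 15])
array2 = np.array([3, 8, 9])

looped = capped_vec(array1, array2)
native = np.minimum(np.abs(array1 - array2), 3)  # same result, fully vectorized

print("np.vectorize:", looped)
print("Native NumPy:", native)
```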

10. Data Loading and Saving with NumPy:

Task: Load a CSV file containing weather data (temperature, humidity) into a NumPy array and save the processed data (e.g., average temperature for each month) to a new CSV file.
Solution:

import csv

# Load weather data from CSV
weather_data = []
with open("weather_data.csv", newline="") as csvfile:
  reader = csv.reader(csvfile)
  next(reader)  # Skip header row
  for row in reader:
    weather_data.append([float(val) for val in row])

weather_data = np.array(weather_data)  # Convert list to NumPy array

# Process data (example: calculate average monthly temperature)
# ... (your data processing logic here)

# Save processed data to a new CSV file
processed_data = []
# ... (prepare processed data for saving)

with open("processed_weather.csv", "w", newline="") as csvfile:
  writer = csv.writer(csvfile)
  writer.writerow(["Month", "Average Temperature"])  # Write header row
  writer.writerows(processed_data)

print("Weather data loaded successfully!")
print("Processed data saved to processed_weather.csv")

This final exercise reads a CSV file with Python's built-in csv module, converts the rows into a NumPy array for processing, and writes the results to a new CSV file. The processing step is left as a placeholder for you to fill in.
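NumPy can also handle both ends of this task itself with np.loadtxt and np.savetxt. The sketch below assumes a hypothetical column layout of month (as a number), temperature, humidity, which the exercise does not specify, and uses io.StringIO in place of a real file so it runs on its own; pass a file path such as "weather_data.csv" instead when working with actual data. Monthly averages are computed by grouping with np.unique and summing with np.bincount.

```python
import io
import numpy as np

# Hypothetical CSV contents (assumed layout: month, temperature, humidity)
csv_text = """month,temperature,humidity
1,4.5,80
1,5.5,78
2,7.0,75
2,9.0,70
3,12.0,65
"""

# Load numeric data, skipping the header row
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
months = data[:, 0].astype(int)
temps = data[:, 1]

# Group by month: `inverse` maps each row to its position in `unique_months`
unique_months, inverse = np.unique(months, return_inverse=True)
monthly_avg = np.bincount(inverse, weights=temps) / np.bincount(inverse)

# Save month and average temperature, with a plain (uncommented) header line
processed = np.column_stack((unique_months, monthly_avg))
out = io.StringIO()
np.savetxt(out, processed, delimiter=",", fmt=["%d", "%.2f"],
           header="Month,Average Temperature", comments="")

print(out.getvalue())
```

Passing comments="" matters: by default np.savetxt prefixes the header with "# ", which many CSV readers would treat as part of the first column name.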

Conclusion

By diligently practicing these exercises, you’ll solidify your grasp of NumPy’s core functionalities and gain the confidence to tackle more complex data analysis tasks. Remember, consistency is key! Explore additional datasets, experiment with different NumPy functions, and continuously challenge yourself to unlock the full potential of NumPy for data manipulation and analysis in Python.

By Jay Patel