# Introduction to Artificial Intelligence (AI)

Artificial Intelligence (AI) is the field of computer science that focuses on creating systems that can perform tasks that typically require human intelligence. These tasks include problem-solving, learning, decision-making, and understanding natural language. AI is used in various industries to improve efficiency and automate complex processes.
## Applications of AI

AI is widely applied in different sectors, including:

1. Healthcare – AI assists in diagnosing diseases, predicting patient outcomes, and automating administrative tasks.
2. Finance – Used for fraud detection, risk assessment, and automated trading.
3. Education – AI-powered chatbots and adaptive learning help personalize education for students.
4. Transportation – Self-driving cars and traffic management systems use AI for navigation and safety.
5. Customer Service – AI chatbots handle customer inquiries efficiently.
6. Entertainment – AI recommends music, movies, and games based on user preferences.
## Setting Up Python for AI Projects

To work on AI projects, we need to set up a Python environment. Below are three popular tools:

### 1. Installing Anaconda

Anaconda is a distribution of Python that includes essential libraries for data science and AI.

- Download Anaconda from [anaconda.com](https://www.anaconda.com/).
- Install it by following the on-screen instructions.
- Open Anaconda Navigator and launch Jupyter Notebook to start coding.
### 2. Using Jupyter Notebook

Jupyter Notebook is an interactive coding environment commonly used for AI and machine learning.

- It allows you to write and execute Python code in cells.
- You can install additional libraries from within a notebook using commands like:

```python
!pip install numpy pandas
```
### 3. Google Colab

Google Colab is a cloud-based platform that allows you to run Python code without installing anything on your computer.

- Visit [colab.research.google.com](https://colab.research.google.com/) and sign in with your Google account.
- You can create a new notebook and start coding immediately.
## Python Basics for AI

To build AI applications, understanding basic Python concepts is important.

### 1. Variables and Data Types

Variables store data values, and Python has different data types such as integers, floats, strings, and booleans.

```python
name = "AI Learning"  # String
age = 25              # Integer
is_smart = True       # Boolean
```
### 2. Loops

Loops help in executing repetitive tasks.

```python
for i in range(5):
    print("AI is powerful!")
```
### 3. Functions

Functions are used to organize code into reusable blocks.

```python
def greet(name):
    return f"Hello, {name}!"

print(greet("AI Student"))
```
## Introduction to NumPy and Pandas for Data Handling

AI projects involve handling large amounts of data. NumPy and Pandas are Python libraries designed for efficient data processing.

### 1. NumPy – For numerical computing

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(arr * 2)  # Multiply each element by 2
```

### 2. Pandas – For data analysis and manipulation

```python
import pandas as pd

data = {"Name": ["Alice", "Bob"], "Age": [25, 30]}
df = pd.DataFrame(data)
print(df)
```

These tools help process and analyze data, which is essential for training AI models.
Assignment:

- Write a Python program to analyze simple data (e.g., sales data).
- Create a NumPy array and perform basic operations.
# Week 2: Machine Learning Basics & Data Preprocessing

This week, we will explore the fundamentals of Machine Learning (ML) and learn how to prepare data for building ML models.
## 1. Introduction to Machine Learning (ML)

Machine Learning is a subset of Artificial Intelligence that allows computers to learn from data and make predictions or decisions without being explicitly programmed. It is widely used in various applications, such as:

- Fraud detection in banking
- Recommendation systems (Netflix, YouTube)
- Self-driving cars
- Medical diagnosis
### Types of Machine Learning

There are three main types of Machine Learning:

1. Supervised Learning
   - The model learns from labeled data (input-output pairs).
   - Example: Predicting house prices based on size, location, and number of rooms.
   - Algorithms: Linear Regression, Decision Trees, Neural Networks.
2. Unsupervised Learning
   - The model finds patterns in data without labels.
   - Example: Customer segmentation in marketing.
   - Algorithms: K-Means Clustering, PCA (Principal Component Analysis).
3. Reinforcement Learning
   - The model learns by interacting with an environment and receiving rewards.
   - Example: Training a robot to walk or play chess.
   - Algorithms: Q-Learning, Deep Q-Networks (DQN).

A minimal sketch contrasting the first two types is shown below.
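To make the distinction concrete, here is a minimal sketch (with made-up toy data) that fits a supervised model on labeled pairs and an unsupervised K-Means model on unlabeled points:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Supervised: inputs X come with known answers y (labels)
X = np.array([[1], [2], [3], [4]])
y = np.array([10, 20, 30, 40])
reg = LinearRegression().fit(X, y)
print(reg.predict([[5]]))  # Learns the pattern and predicts ~50

# Unsupervised: inputs only, no labels; the model finds groups itself
points = np.array([[1, 1], [1, 2], [8, 8], [8, 9]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(km.labels_)  # Cluster assignment for each point
```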
## 2. Understanding Datasets (CSV, JSON formats)

Before training a machine learning model, we need to understand how data is stored.

### 1. CSV (Comma-Separated Values)

A CSV file is a simple text file where data is stored in rows and columns.

Example:

```
Name, Age, Score
Alice, 25, 90
Bob, 30, 85
```

Reading CSV files in Python using Pandas:

```python
import pandas as pd

df = pd.read_csv("data.csv")
print(df.head())  # Display the first 5 rows
```
### 2. JSON (JavaScript Object Notation)

JSON stores data in a structured format, often used in web applications.

Example:

```json
{
  "students": [
    {"name": "Alice", "age": 25, "score": 90},
    {"name": "Bob", "age": 30, "score": 85}
  ]
}
```

Reading JSON files in Python:

```python
df = pd.read_json("data.json")
print(df)
```
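Note that for a nested file like the example above, pd.read_json returns a single "students" column holding dictionaries. To flatten it into one row per student, pandas offers json_normalize; a short sketch, assuming the file matches the example:

```python
import json
import pandas as pd

with open("data.json") as f:
    raw = json.load(f)

df = pd.json_normalize(raw["students"])  # One row per student record
print(df)
```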
## 3. Data Cleaning using Pandas

Raw data often contains errors, missing values, or duplicates. Data cleaning is a crucial step in ML.

### 1. Handling Missing Values

```python
df.fillna(0, inplace=True)  # Option 1: replace missing values with 0
df.dropna(inplace=True)     # Option 2: remove rows with missing values
```

### 2. Removing Duplicates

```python
df.drop_duplicates(inplace=True)
```

### 3. Converting Data Types

```python
df["Age"] = df["Age"].astype(int)  # Convert age to integer
```

A small end-to-end sketch of these steps is shown below.
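Putting these steps together on a small made-up DataFrame (toy data for illustration only):

```python
import numpy as np
import pandas as pd

# Toy data with a duplicate row and a missing value
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", "Carol"],
    "Age": [25, 30, 30, np.nan],
})

df = df.drop_duplicates()          # Drop the repeated "Bob" row
df["Age"] = df["Age"].fillna(0)    # Replace the missing age with 0
df["Age"] = df["Age"].astype(int)  # Convert age to integer
print(df)
```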
## 4. Data Visualization with Matplotlib & Seaborn

Data visualization helps us understand patterns in data.

### 1. Matplotlib for Basic Charts

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]

plt.plot(x, y, marker='o')
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Simple Line Graph")
plt.show()
```
### 2. Seaborn for Advanced Visualization

```python
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(df["Age"], bins=5)  # Distribution of the "Age" column
plt.show()
```
This lesson covered the basics of Machine Learning, dataset formats, data cleaning, and visualization.

Assignment:

- Download a dataset (e.g., the Titanic Dataset) and clean it using Pandas.
- Create basic charts to visualize the data.
# Week 3: Supervised Learning - Regression

This week, we will explore *Regression*, a fundamental technique in Supervised Learning used for predicting continuous values.

Supervised learning is a way of teaching a computer using examples that already have answers. It's like a teacher giving both the question and the correct answer, and the student (the computer) learns from that. One type of supervised learning is called regression.

Regression is used when we want the computer to predict a number. For example, if we give the computer a list that shows how many hours students studied and the scores they got, it can learn that studying more gives better scores. So if one student studied for 1 hour and got 30 marks, another studied 2 hours and got 40 marks, and a third studied 3 hours and got 50 marks, the computer will learn this pattern. Then, if we ask it to guess the score for someone who studied 4 hours, it may say 60 marks. This is regression – predicting a number using what we already know.
The computer uses a simple rule, often written as Y = mX + b, where Y is the number we want to guess (like score), X is the number we know (like study hours), m shows how fast the score increases, and b is where the prediction starts. In real life, we use regression to guess things like house prices, product sales, or temperatures.
If we show the computer many examples of house sizes and their prices, it can learn and predict the price of a new house. So, regression helps us make smart guesses with numbers, and it is very useful in business, science, and technology.
## Code Example: Predict a student's exam score based on how many hours they studied

```python
# 1. Import the tools
from sklearn.linear_model import LinearRegression

# 2. Create the learning data (hours studied and exam scores)
X = [[1], [2], [3]]  # Hours studied
y = [30, 40, 50]     # Exam scores

# 3. Create the model (the "AI brain")
model = LinearRegression()

# 4. Train the model (teach it using our examples)
model.fit(X, y)

# 5. Predict a new score
predicted_score = model.predict([[4]])  # Student studied 4 hours
print("Predicted Score:", predicted_score[0])
```
Simple Explanation:

- Import the tools: We use LinearRegression from a machine learning library called scikit-learn. It's like bringing a ready-made calculator that can learn.
- Prepare the data: We give it examples:
  - X is the input (hours studied)
  - y is the correct answer (exam scores)
- Create the model: We create a model called model. Think of it like a small AI brain.
- Train the model: We train the model using .fit(). This means we show it the examples so it can learn the pattern.
- Make a prediction: Now we ask the model: "If a student studies for 4 hours, what score will they get?" It uses what it learned and gives us a smart guess (e.g., 60 marks).
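To connect the code back to the rule Y = mX + b from earlier, we can inspect the slope and intercept the model learned; a quick sketch, assuming the model above has already been trained:

```python
# m (slope): how much the score rises per extra hour of study
print("m =", model.coef_[0])    # 10.0 for this data

# b (intercept): the predicted score at 0 hours of study
print("b =", model.intercept_)  # 20.0 for this data
```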
Regression models help predict a numerical outcome based on input features. Common applications include:

- House price prediction (based on location, size, etc.)
- Stock price forecasting
- Sales prediction
## Regression Models
When we talk about regression models, we are simply talking about ways to help the computer learn how to guess numbers. For example, we may want the computer to guess how much money a person will make if we tell it how many hours the person works. To do this, we show the computer some examples, and it tries to learn the pattern.
### Types of Regression

#### 1. Linear Regression – Straight Line Prediction

Imagine you are looking at how study hours affect exam scores. If you study more, your score usually goes up. If you study less, your score goes down. This kind of pattern can be drawn with a straight line. That is what Linear Regression does – it draws the best straight line through the points (data) to show the relationship between one input (like study hours) and one output (like exam score).

So, Linear Regression answers a question like: "If you study for 5 hours, what score will you likely get?"
#### 2. Multiple Regression – More Than One Input

Now, imagine we want to predict a student's exam score, but this time we don't only look at study hours. We also look at how many hours they sleep and how many classes they attended. That means we now have more than one input.

→ Inputs: Study hours, Sleep hours, Class attendance
→ Output: Exam score

Multiple Regression helps us do this. It's like Linear Regression, but instead of one input, it uses many inputs at the same time. It draws a line in many directions – like a flat surface that tries to match all the data together. A short sketch follows below.
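A minimal sketch of Multiple Regression with scikit-learn, using made-up numbers for study hours, sleep hours, and classes attended:

```python
from sklearn.linear_model import LinearRegression

# Each row: [study hours, sleep hours, classes attended] (toy data)
X = [
    [1, 8, 2],
    [2, 7, 3],
    [3, 6, 4],
    [4, 8, 5],
]
y = [40, 50, 62, 75]  # Exam scores (made up)

model = LinearRegression()
model.fit(X, y)

# Predict the score for 5 study hours, 7 sleep hours, 4 classes attended
print(model.predict([[5, 7, 4]]))
```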
#### 3. Polynomial Regression – Curvy Line

Let's say you look at the speed of a car and its fuel use. At first, the faster you go, the more fuel you use. But after a point, going even faster may actually waste fuel or not change the use much. So the pattern goes up and then down, or forms a curve.

In such cases, a straight line cannot explain the pattern, so we use Polynomial Regression to draw a curved line that better fits the ups and downs in the data. It's still regression – we are still predicting numbers – but we allow the line to bend and twist. A sketch follows below.
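One common way to fit a curve with scikit-learn is to expand the input into polynomial features and then fit an ordinary linear model on top; a sketch with made-up speed vs. fuel-use numbers:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data: fuel use falls, bottoms out, then rises again with speed
speed = np.array([[20], [40], [60], [80], [100]])
fuel = np.array([8.0, 6.5, 6.0, 6.8, 8.5])

# Degree-2 polynomial regression: fits a parabola instead of a line
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(speed, fuel)

print(model.predict([[70]]))  # Estimated fuel use at 70 km/h
```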
Summary (like you're in class):

- Linear Regression: one input, one output, straight line (e.g., study hours → score).
- Multiple Regression: many inputs, one output, still a straight line in many directions (e.g., study hours + sleep hours → score).
- Polynomial Regression: one or more inputs, but we fit a curve, not a straight line (e.g., speed vs. fuel usage).
### Code Example

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Sample data: hours studied vs exam score
X = np.array([[1], [2], [3], [4], [5]])  # Study hours
y = np.array([40, 50, 60, 70, 80])       # Exam scores

# Create and train the model
model = LinearRegression()
model.fit(X, y)

# Predict the score for 6 hours of study
predicted_score = model.predict([[6]])
print("Predicted score for 6 hours study:", predicted_score[0])

# Plotting
plt.scatter(X, y, color='blue')             # The data points
plt.plot(X, model.predict(X), color='red')  # The fitted line
plt.title("Linear Regression: Study Hours vs Score")
plt.xlabel("Study Hours")
plt.ylabel("Exam Score")
plt.show()
```
Explanation:

- We give the model some study hours and the actual exam scores.
- The model finds the best straight line that connects them.
- It then uses the line to predict the score for 6 hours.
- The red line is the model's prediction.
Let's look more closely at Linear Regression. Don't worry, I'll explain it in a very simple way. Imagine you have a line, like a straight line drawn on a piece of paper. Now, this line is very special because it helps us understand how two things are connected. For example, let's say you want to know how the number of hours you study affects your marks. If you study more hours, you will likely get more marks. So, this line helps us see that when one thing changes, the other thing changes too.
Now, to make this line, we use a tool called Scikit-Learn. It's like a magical helper in the computer that makes drawing the line really easy for us. We can take some numbers like the number of hours studied and the marks you get, and the tool will help us draw a line that shows how studying more or less affects the marks.
Let’s pretend we have some data. We have 4 students. The first one studied 1 hour and got 50 marks, the second one studied 2 hours and got 60 marks, the third one studied 3 hours and got 70 marks, and the last one studied 4 hours and got 80 marks. We can use this data to draw our line, which will tell us what marks we might get if we study more or less.
Once we have this line, we can use it to predict things! For example, if a student studied 5 hours, we can use the line to predict that the student would get around 90 marks. This is really helpful because we can use it to guess how much someone will score based on how much they study.
So, in simple words, Linear Regression is like drawing a line that shows us how two things are related, and we use a computer tool to help us draw that line and make predictions. Isn’t that cool? You can practice by changing the numbers and seeing what happens when we study for different hours. Let’s try it out on the computer!
```python
# Step 1: Import the necessary libraries
from sklearn.linear_model import LinearRegression
import numpy as np

# Step 2: Define the data
# X is the number of hours studied (independent variable)
# Y is the marks obtained (dependent variable)
X = np.array([[1], [2], [3], [4]])  # Study hours
Y = np.array([50, 60, 70, 80])      # Marks

# Step 3: Create a model for Linear Regression
model = LinearRegression()

# Step 4: Train the model with the data
model.fit(X, Y)

# Step 5: Make a prediction
# Let's predict what happens if a student studies for 5 hours
predicted_marks = model.predict([[5]])

# Step 6: Print the result
print(f"Predicted marks for 5 hours of study: {predicted_marks[0]}")
```
Explanation of the Code:

- Step 1: We import LinearRegression from Scikit-Learn, which is the tool we use to create our line, and numpy for working with arrays.
- Step 2: We create two arrays:
  - X represents the hours studied (1, 2, 3, and 4 hours).
  - Y represents the marks received (50, 60, 70, and 80 marks).
- Step 3: We create a model using LinearRegression(). This is our helper that will draw the line.
- Step 4: We train the model using the data we have (study hours and marks).
- Step 5: We ask the model to predict what marks a student would get if they study for 5 hours.
- Step 6: Finally, we print the predicted marks for studying 5 hours.
## Evaluating Our Model

Alright, class! Now that we know how to draw the line and make predictions using Linear Regression, let's talk about evaluating our model. This helps us understand how well our line (or model) is working. There are two important measures we use to evaluate our regression models: R² Score and Mean Squared Error (MSE).

Let's break these down into simple terms:

### 1. R² Score (R-squared)

- What is R²? R², or R-squared, tells us how well our line fits the data. It gives us a score that shows how much of the variation in the marks can be explained by the study hours.
- How to understand R²?
  - If R² = 1 (100%), the model perfectly predicts the marks; the line goes through every point exactly.
  - If R² = 0 (0%), the model is not good at all; the line doesn't explain the marks well.
  - Higher R² is better.

### 2. Mean Squared Error (MSE)

- What is MSE? MSE helps us see how far off our predictions are from the actual marks. It is the average of the squared differences between the actual and predicted values: MSE = (1/n) × Σ(actual - predicted)².
- How to understand MSE?
  - Lower MSE means the model is doing a good job of predicting the marks.
  - Higher MSE means the model is not doing well because the predictions are far from the actual values.
```python
# Step 1: Import the necessary libraries
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

# Step 2: Define the data
X = np.array([[1], [2], [3], [4]])  # Study hours
Y = np.array([50, 60, 70, 80])      # Marks

# Step 3: Create the model and train it
model = LinearRegression()
model.fit(X, Y)

# Step 4: Make predictions
predictions = model.predict(X)

# Step 5: Evaluate the model using the R² Score and MSE
r2 = r2_score(Y, predictions)             # R² Score
mse = mean_squared_error(Y, predictions)  # Mean Squared Error

# Step 6: Print the evaluation results
print(f"R² Score: {r2}")
print(f"Mean Squared Error (MSE): {mse}")
```
Explanation of the Code:

- Step 1: We import mean_squared_error and r2_score from sklearn.metrics. These will help us calculate the MSE and R² score.
- Step 2: We use the same data for study hours (X) and marks (Y).
- Step 3: We create and train the linear regression model as before.
- Step 4: We make predictions using our model, just like we did earlier.
- Step 5: We use the r2_score() and mean_squared_error() functions to evaluate the model.
  - r2_score(Y, predictions) calculates how well the model fits the data.
  - mean_squared_error(Y, predictions) tells us how far off the predictions are from the actual values.
- Step 6: Finally, we print the results!
Assignment:

- Train a Linear Regression model on house price data.
- Evaluate the model's accuracy and improve it.
# Week 4: Supervised Learning - Classification

This week, we will explore *Classification*, a key technique in Supervised Learning used for predicting categories.
## 1. What is Classification?

Classification is a machine learning task where the goal is to categorize data into predefined groups. Examples include:

- Spam Detection: Classifying emails as spam or not spam.
- Medical Diagnosis: Identifying diseases based on symptoms.
- Sentiment Analysis: Classifying text as positive, neutral, or negative.

Common Classification Algorithms:

1. Logistic Regression – Used for binary classification (e.g., spam vs. non-spam).
2. Decision Trees – Uses tree structures to make decisions.
3. Random Forest – An ensemble of multiple decision trees for better accuracy.
## 2. Logistic Regression, Decision Trees, and Random Forest

### Logistic Regression

- Used when the target variable has two classes (e.g., spam vs. not spam).
- Uses the *sigmoid function*, σ(z) = 1 / (1 + e^(-z)), to turn model outputs into probabilities between 0 and 1.

### Decision Trees

- Splits data based on feature conditions.
- Easy to interpret but may overfit the data.

### Random Forest

- Uses multiple decision trees and averages their predictions.
- More accurate and less prone to overfitting than a single decision tree.

A small sketch comparing these three classifiers is shown below.
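Here is a minimal sketch (using scikit-learn's built-in Iris dataset purely for illustration) that trains all three classifiers on the same split and compares their accuracy:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small labeled dataset and split it into train/test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Train each model and report its accuracy on the held-out test set
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name, "accuracy:", clf.score(X_test, y_test))
```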
## 3. Implementing a Spam Email Classifier

We will use Scikit-Learn to build a spam classifier using the Naïve Bayes algorithm, a common choice for text classification.

### Step 1: Import Libraries

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score
```

### Step 2: Load Dataset

```python
# Load the dataset (example dataset with 'text' and 'label' columns)
data = pd.read_csv("spam.csv")
print(data.head())
```
### Step 3: Data Preprocessing

```python
X = data["text"]  # Features (email content)
y = data["label"].map({"spam": 1, "ham": 0})  # Convert labels to numerical values

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

### Step 4: Convert Text to Features

```python
vectorizer = TfidfVectorizer(stop_words="english")
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
```
### Step 5: Train the Model

```python
model = MultinomialNB()
model.fit(X_train_tfidf, y_train)
```

### Step 6: Make Predictions

```python
y_pred = model.predict(X_test_tfidf)
```
## 4. Model Evaluation: Accuracy, Precision, Recall

1. Accuracy – Measures overall correctness (correct predictions / all predictions).

```python
print("Accuracy:", accuracy_score(y_test, y_pred))
```

2. Precision – Measures how many predicted spam emails are actually spam (true positives / all predicted positives).

```python
print("Precision:", precision_score(y_test, y_pred))
```

3. Recall – Measures how many actual spam emails were correctly classified (true positives / all actual positives).

```python
print("Recall:", recall_score(y_test, y_pred))
```
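All three metrics can be read off a confusion matrix; a quick check, assuming y_test and y_pred from the steps above:

```python
from sklearn.metrics import confusion_matrix

# Rows are actual classes (ham=0, spam=1); columns are predicted classes
print(confusion_matrix(y_test, y_pred))
```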
## Conclusion

This lesson covered:

✅ *Classification & its applications*
✅ *Logistic Regression, Decision Trees, Random Forest*
✅ *Spam Email Classifier using Naïve Bayes*
✅ *Model evaluation with Accuracy, Precision, and Recall*

Assignment:

- Train a Spam Classifier model using the SMS Spam Dataset.
- Compare the accuracy of different models (Logistic Regression vs. Random Forest).
# Week 5: Unsupervised Learning - Clustering & NLP

This week, we will explore Unsupervised Learning, focusing on Clustering and Natural Language Processing (NLP).

## 1. What is Unsupervised Learning?

Unsupervised learning is a type of machine learning where the algorithm finds patterns in data without labeled outputs.

Example applications:

- Customer Segmentation: Grouping similar customers based on behavior.
- Anomaly Detection: Identifying fraud in financial transactions.
- Document Clustering: Organizing news articles into topics.
## 2. Clustering Techniques

### K-Means Clustering

- A popular algorithm that groups data points into K clusters.
- Each data point is assigned to the nearest cluster center.
- Used in: market segmentation, image compression.

Implementation in Python:

```python
from sklearn.cluster import KMeans
import numpy as np

# Sample data
data = np.array([[1, 2], [1, 4], [1, 0],
                 [10, 2], [10, 4], [10, 0]])

# Apply K-Means with 2 clusters
kmeans = KMeans(n_clusters=2, random_state=0).fit(data)

print("Cluster Centers:", kmeans.cluster_centers_)
print("Labels:", kmeans.labels_)
```
### Hierarchical Clustering

- Creates a tree-like structure of clusters.
- Does not require specifying the number of clusters beforehand.

Implementation in Python:

```python
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# Perform hierarchical clustering (reusing `data` from the K-Means example)
linked = linkage(data, method='ward')

# Plot the dendrogram
plt.figure(figsize=(8, 5))
dendrogram(linked)
plt.show()
```
## 3. Natural Language Processing (NLP) Basics

NLP enables machines to understand, interpret, and generate human language.

Common NLP tasks:

✅ *Text Classification* (spam detection)
✅ *Named Entity Recognition* (identifying names and places in text)
✅ *Sentiment Analysis* (determining the emotion behind text)

### Preprocessing Text Data with NLP
```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import string

nltk.download('punkt')
nltk.download('stopwords')

text = "Natural Language Processing (NLP) is amazing!"

tokens = word_tokenize(text.lower())  # Tokenization
filtered_words = [word for word in tokens
                  if word not in stopwords.words('english')
                  and word not in string.punctuation]

print("Processed Text:", filtered_words)
```
## 4. Sentiment Analysis using NLP

We will analyze *Twitter data* to classify sentiments as *positive, negative, or neutral*.

Implementation in Python:

```python
from textblob import TextBlob

# Sample tweets
tweets = ["I love Python!", "This is so frustrating.", "I am feeling okay today."]

# Perform sentiment analysis
for tweet in tweets:
    sentiment = TextBlob(tweet).sentiment.polarity
    if sentiment > 0:
        print(f"'{tweet}' - Positive 😊")
    elif sentiment < 0:
        print(f"'{tweet}' - Negative 😠")
    else:
        print(f"'{tweet}' - Neutral 😐")
```
## Project: Twitter Sentiment Analysis

### Step 1: Install Required Libraries

```bash
pip install tweepy textblob
```

### Step 2: Authenticate with the Twitter API

```python
import tweepy

# Set up API keys (get these from the Twitter Developer Portal)
api_key = "your_api_key"
api_secret = "your_api_secret"
access_token = "your_access_token"
access_secret = "your_access_secret"

auth = tweepy.OAuthHandler(api_key, api_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)
```

### Step 3: Fetch and Analyze Tweets

```python
from textblob import TextBlob

public_tweets = api.search_tweets(q="AI", count=10)  # Search for "AI" tweets

for tweet in public_tweets:
    analysis = TextBlob(tweet.text)
    sentiment = ("Positive" if analysis.sentiment.polarity > 0
                 else "Negative" if analysis.sentiment.polarity < 0
                 else "Neutral")
    print(f"Tweet: {tweet.text}\nSentiment: {sentiment}\n")
```
## Conclusion

This week, we learned:

✅ *Clustering with K-Means & Hierarchical Clustering*
✅ *NLP Basics & Sentiment Analysis*
✅ *Twitter Sentiment Analysis Project*

Assignment:

- Use NLP to classify tweets as positive or negative.
- Visualize sentiment trends using word clouds.
# Week 6: Deep Learning & Neural Networks

This week, we will dive into *Deep Learning* and explore how *Neural Networks* work. We will also implement a project on *Handwritten Digit Recognition* using *TensorFlow and Keras*.

## 1. What is Deep Learning?

Deep Learning is a subset of *Machine Learning* that uses *Artificial Neural Networks (ANNs)* to learn from large amounts of data.

Key features of Deep Learning:

✅ *Learns from raw data* (images, text, audio)
✅ *Reduces the need for manual feature engineering*
✅ *Performs well on complex tasks* like image recognition and NLP
## 2. Building a Simple Neural Network

We will use *TensorFlow* and *Keras* to create a basic neural network.

### Step 1: Install Required Libraries

```bash
pip install tensorflow keras numpy matplotlib
```

### Step 2: Create a Neural Network

```python
import tensorflow as tf
from tensorflow import keras
import numpy as np

# Sample dataset (X: inputs, Y: outputs)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
Y = np.array([[0], [1], [1], [0]], dtype=np.float32)  # XOR logic

# Define the model
model = keras.Sequential([
    keras.layers.Dense(4, activation='relu', input_shape=(2,)),
    keras.layers.Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X, Y, epochs=100, verbose=1)

# Test the model
predictions = model.predict(X)
print("Predictions:", predictions)
```
## 3. Convolutional Neural Networks (CNNs)

CNNs are a special type of neural network designed for *image recognition*.

Key components of CNNs:

✅ *Convolution Layer* – Extracts features from images
✅ *Pooling Layer* – Reduces the size of the feature maps
✅ *Fully Connected Layer* – Makes the final predictions

### CNN Architecture Example

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Define the CNN model
cnn_model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
cnn_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

cnn_model.summary()
```
## 4. Project: Handwritten Digit Recognition (MNIST Dataset)

We will build a CNN model to recognize handwritten digits from the *MNIST dataset*.

### Step 1: Load the MNIST Dataset

```python
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt

# Load the dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Display a sample image
plt.imshow(X_train[0], cmap='gray')
plt.show()
```
### Step 2: Preprocess the Data

```python
# Normalize pixel values to the range 0-1
X_train = X_train / 255.0
X_test = X_test / 255.0

# Reshape for CNN input (samples, height, width, channels)
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)
```
### Step 3: Train the CNN Model

```python
cnn_model.fit(X_train, y_train, epochs=5, validation_data=(X_test, y_test))
```

### Step 4: Evaluate and Test

```python
test_loss, test_acc = cnn_model.evaluate(X_test, y_test)
print("Test Accuracy:", test_acc)

# Predict a sample image
import numpy as np

sample = np.expand_dims(X_test[0], axis=0)
prediction = np.argmax(cnn_model.predict(sample))
print("Predicted Label:", prediction)
```
## Conclusion

This week, we learned:

✅ *How Neural Networks work*
✅ *Building a simple ANN with Keras*
✅ *Understanding CNNs for image classification*
✅ *Handwritten Digit Recognition with MNIST*
Assignment:

- Train a CNN model to recognize handwritten digits.
- Test your model with new images.
# Final Project Ideas (Choose One)

✅ Chatbot using NLP
✅ Face Recognition System
✅ Movie Recommendation System
✅ Stock Market Price Prediction