String Manipulations
1. String Slicing
Syntax:
substring = string[start:end:step]
- start: Starting index (inclusive).
- end: Ending index (exclusive).
- step: How many characters to jump (optional).
Examples:
text = "Hello, World!"
# Basic slicing: get "Hello"
print(text[0:5]) # Output: Hello
# Omitting end runs to the end of the string: get "World!"
print(text[7:]) # Output: World!
# Using step: every 2nd character
print(text[::2]) # Output: Hlo ol!
# Negative indices: get last character
print(text[-1]) # Output: !
Explanation:
Slicing lets you extract portions of a string. If you omit start or end, Python uses the beginning or end of the string, respectively. A negative step walks the string from right to left, so text[::-1] returns the whole string reversed.
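A few more slices on the same sample string may make the defaults and the negative step concrete. This is a small sketch; the variable names are only illustrative.

```python
text = "Hello, World!"

# A step of -1 walks the string from right to left.
reversed_text = text[::-1]

# Negative start: the last six characters.
last_six = text[-6:]

# Negative start and end: "World" without the trailing "!".
without_bang = text[-6:-1]

# Slices never raise IndexError; an out-of-range end is clamped.
clamped = text[7:100]

print(reversed_text)  # !dlroW ,olleH
print(last_six)       # World!
print(without_bang)   # World
print(clamped)        # World!
```

Note that indexing (text[100]) does raise IndexError, while slicing silently clamps to the string's bounds.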
2. String Concatenation
Concatenation using the + operator:
greeting = "Hello"
name = "Alice"
message = greeting + ", " + name + "!"
print(message) # Output: Hello, Alice!
Explanation:
The + operator joins two or more strings into one.
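One detail worth seeing in code: + only joins strings with strings, so other types need an explicit str() call. The sketch below also shows an f-string producing the same result; the visits variable is invented for the example.

```python
greeting = "Hello"
name = "Alice"
visits = 3  # hypothetical non-string value

# str + int raises TypeError, so convert explicitly.
try:
    broken = greeting + ", " + name + "! Visits: " + visits
except TypeError:
    broken = None

with_plus = greeting + ", " + name + "! Visits: " + str(visits)

# An f-string formats the value for you and is usually easier to read.
with_fstring = f"{greeting}, {name}! Visits: {visits}"

print(with_plus)     # Hello, Alice! Visits: 3
print(with_fstring)  # Hello, Alice! Visits: 3
```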
3. Common String Methods
a. .strip()
- Purpose: Removes whitespace (or specified characters) from the beginning and end of a string.
s = " Hello, World! "
print(s.strip()) # Output: "Hello, World!"
print(s.strip(" !")) # Output: Hello, World (leading/trailing spaces and "!" removed)
b. .split()
- Purpose: Splits a string into a list based on a specified delimiter (defaults to whitespace).
sentence = "Python is fun"
words = sentence.split() # Splits on whitespace by default
print(words) # Output: ['Python', 'is', 'fun']
csv_data = "apple,banana,cherry"
fruits = csv_data.split(",")
print(fruits) # Output: ['apple', 'banana', 'cherry']
c. .replace()
- Purpose: Replaces occurrences of a specified substring with another substring.
text = "I like apples. Apples are my favorite fruit."
new_text = text.replace("apples", "oranges")
print(new_text)
# Output: I like oranges. Apples are my favorite fruit.
# Note: .replace() is case-sensitive.
d. .find()
- Purpose: Returns the index of the first occurrence of a substring (or -1 if not found).
text = "Find the needle in the haystack."
index = text.find("needle")
print(index) # Output: 9
# If the substring is not found:
print(text.find("thread")) # Output: -1
e. .join()
- Purpose: Concatenates an iterable (like a list of strings) into a single string, using a specified separator.
words = ['Join', 'these', 'words']
sentence = " ".join(words)
print(sentence) # Output: Join these words
# Using a comma as a separator:
csv = ",".join(words)
print(csv) # Output: Join,these,words
Explanation:
- .strip() is useful for cleaning up user input or data read from files.
- .split() converts a string into a list, which can be helpful for parsing text.
- .replace() allows for quick modifications to string content.
- .find() helps locate substrings for further processing.
- .join() efficiently creates strings from collections of words or characters.
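The five methods are often chained. Here is a minimal sketch that cleans a messy, semicolon-separated line; the input string is made up for illustration.

```python
raw = "  apple ; banana;cherry  "

# 1. Trim the outer whitespace.
cleaned = raw.strip()

# 2. Split on the delimiter, then trim each piece.
items = [item.strip() for item in cleaned.split(";")]

# 3. Re-join with a comma to get a tidy CSV-style line.
csv_line = ",".join(items)

# 4. Swap one value and locate it in the result.
swapped = csv_line.replace("banana", "mango")
position = swapped.find("mango")

print(items)     # ['apple', 'banana', 'cherry']
print(csv_line)  # apple,banana,cherry
print(swapped)   # apple,mango,cherry
print(position)  # 6
```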
Regular Expressions
Regular expressions (regex) are a powerful tool for pattern matching and text manipulation in Python. They allow you to search, match, and extract complex patterns from strings. Below is a comprehensive review covering the core pattern syntax, capturing groups, and common use cases with functions such as re.match(), re.search(), and re.findall().
1. Pattern Syntax
Regular expression patterns are strings that describe the text you want to match. Some key elements include:
- Literals: Characters that match themselves (e.g., "abc" matches “abc”).
- Meta-characters: Symbols with special meaning:
  - . matches any character except a newline.
  - ^ asserts the start of the string (or of each line with re.MULTILINE).
  - $ asserts the end of the string (or of each line with re.MULTILINE).
  - * matches 0 or more occurrences of the preceding element.
  - + matches 1 or more occurrences.
  - ? makes the preceding element optional (0 or 1 occurrence) or, placed after another quantifier, makes it non-greedy.
  - {n,m} specifies between n and m occurrences.
- Character classes:
  - [abc] matches any one of a, b, or c.
  - [^abc] matches any character except a, b, or c.
  - \d matches any digit (equivalent to [0-9] for ASCII text).
  - \w matches any word character (letters, digits, and underscore).
  - \s matches any whitespace character.
- Escaping:
  - Use a backslash (\) to escape meta-characters when you want to match them literally (e.g., \. to match a period).
Example Pattern:
To match an email address, you might use a simplified pattern like:
r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"
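To see the simplified pattern in action, the sketch below compiles it and uses re.fullmatch() for validation and findall() for extraction. The helper name looks_like_email and the sample addresses are assumptions made for the example; the pattern is deliberately loose and does not cover the full email grammar.

```python
import re

# The simplified pattern from above, compiled once for reuse.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")


def looks_like_email(candidate):
    """Return True if the whole string matches the simplified pattern."""
    return EMAIL_PATTERN.fullmatch(candidate) is not None


print(looks_like_email("john@example.com"))  # True
print(looks_like_email("john@example"))      # False (no dot after the domain)
print(looks_like_email("not an email"))      # False

# findall() pulls every match out of a longer text.
found = EMAIL_PATTERN.findall(
    "Write to alice@example.com or bob.smith@mail.example.org today"
)
print(found)  # ['alice@example.com', 'bob.smith@mail.example.org']
```

Because the last character class accepts dots, an address followed directly by a sentence-ending period would capture that period too, so real input usually needs a stricter pattern or post-processing.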
2. Capturing Groups
Capturing groups are parts of the regex pattern enclosed in parentheses (). They allow you to extract sub-parts of a match.
- Basic Usage:
import re
text = "John Doe, john@example.com"
pattern = r"(\w+ \w+), (\S+@\S+)"
match = re.search(pattern, text)
if match:
    full_name = match.group(1)  # Capturing group 1: "John Doe"
    email = match.group(2)      # Capturing group 2: "john@example.com"
    print(full_name, email)
- Non-Capturing Groups:
Use (?:...) when you want to group parts of a pattern without capturing them.
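The difference shows up clearly with re.findall(), which returns the captured groups rather than the whole match. A small sketch with an invented sentence:

```python
import re

text = "Dr. Smith met Ms. Jones"

# Two capturing groups: findall() returns a tuple per match.
with_capture = re.findall(r"(Mr|Ms|Dr)\. (\w+)", text)

# The title is grouped for alternation only, so just the name is captured.
without_capture = re.findall(r"(?:Mr|Ms|Dr)\. (\w+)", text)

print(with_capture)     # [('Dr', 'Smith'), ('Ms', 'Jones')]
print(without_capture)  # ['Smith', 'Jones']
```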
3. Common Use Cases with re.match(), re.search(), and re.findall()
a. re.match()
- Purpose:
Checks for a match only at the beginning of the string.
import re
text = "Hello, world!"
pattern = r"Hello"
match = re.match(pattern, text)
if match:
    print("Match found:", match.group())  # Output: "Hello"
else:
    print("No match found.")
Explanation:
Since re.match() only checks the start of the string, it will return a match only if the pattern is found right at the beginning.
b. re.search()
- Purpose:
Scans through the entire string and returns the first match found.
import re
text = "Say Hello, then say Hi."
pattern = r"Hi"
match = re.search(pattern, text)
if match:
    print("Search found:", match.group())  # Output: "Hi"
else:
    print("No match found.")
Explanation:
re.search() will find “Hi” even though it doesn’t occur at the beginning of the string.
c. re.findall()
- Purpose:
Returns a list of all non-overlapping matches in the string.
import re
text = "The numbers are 123, 456, and 789."
pattern = r"\d+"
matches = re.findall(pattern, text)
print("All numbers found:", matches) # Output: ['123', '456', '789']
Explanation:
re.findall() scans the whole string and returns all sequences of digits as a list.
Final Thoughts
- Choosing the Right Function:
  - Use re.match() when you’re only interested in matches at the beginning of a string.
  - Use re.search() when you need to find the first occurrence anywhere in the string.
  - Use re.findall() when you want all occurrences of a pattern.
- Capturing Groups: They are crucial when you need to extract specific parts of the matched text, and can be combined with the above functions to tailor your text processing tasks.
Regular expressions are an essential tool in text processing, enabling you to perform sophisticated pattern matching and extraction in a concise manner. Experimenting with different patterns and functions will help you master their use in your projects.
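Running the three functions against the same input makes the choice easier to remember. The sample sentence below is invented for the comparison.

```python
import re

text = "Order 66 shipped 12 boxes on day 3"
pattern = r"\d+"

# match(): only looks at position 0, where the text starts with "O".
at_start = re.match(pattern, text)

# search(): first match anywhere in the string.
first = re.search(pattern, text)

# findall(): every non-overlapping match, as a list of strings.
everything = re.findall(pattern, text)

print(at_start)       # None
print(first.group())  # 66
print(first.start())  # 6
print(everything)     # ['66', '12', '3']
```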
Regex & String Manipulations
Below is a short, self-contained Python script that demonstrates:
- Cleaning a string: Using both .strip() and a regex with re.sub() to remove unwanted characters.
- Extracting substrings: Using .split() and regex capturing groups.
- Replacing/reformatting parts of a string: Using .replace() and re.sub().
import re
# Sample string with extra whitespace and unwanted characters.
raw_string = " [Hello] - this is a sample string! "
# --- Step 1: Clean the String ---
# Option A: Use .strip() to remove leading/trailing whitespace.
cleaned = raw_string.strip()
print("After strip():", cleaned)
# Output: "[Hello] - this is a sample string!"
# Option B: Use regex (re.sub) to remove unwanted characters.
# For example, remove square brackets from the entire string.
cleaned_regex = re.sub(r'[\[\]]', '', cleaned)
print("After re.sub() removing brackets:", cleaned_regex)
# Output: "Hello - this is a sample string!"
# --- Step 2: Extract Specific Substrings ---
# Let's assume we want to extract:
# a) The greeting ("Hello")
# b) The sentence ("this is a sample string!")
# Option A: Using .split() on the cleaned string.
parts = cleaned_regex.split(" - ")
if len(parts) == 2:
    greeting = parts[0].strip()  # "Hello"
    sentence = parts[1].strip()  # "this is a sample string!"
    print("Greeting (split):", greeting)
    print("Sentence (split):", sentence)
# Option B: Using regex capturing groups.
# Pattern explanation:
# \s* -> Optional leading whitespace.
# \[? -> Optional opening square bracket.
# (?P<greeting>\w+) -> Captures a word as 'greeting'.
# \]? -> Optional closing square bracket.
# \s*-\s* -> A hyphen surrounded by optional whitespace.
# (?P<sentence>.+) -> Captures the rest of the string as 'sentence'.
pattern = r'\s*\[?(?P<greeting>\w+)\]?\s*-\s*(?P<sentence>.+)'
match = re.match(pattern, cleaned)
if match:
    greeting_rgx = match.group('greeting')
    sentence_rgx = match.group('sentence')
    print("Greeting (regex):", greeting_rgx)
    print("Sentence (regex):", sentence_rgx)
# --- Step 3: Replace/Reformat Parts of a String ---
# Example 1: Use .replace() to change "sample" to "test" in the sentence.
formatted_sentence = sentence.replace("sample", "test")
print("Formatted sentence (.replace):", formatted_sentence)
# Output: "this is a test string!"
# Example 2: Use re.sub() to remove exclamation marks from the entire string.
final_string = re.sub(r'!', '', cleaned_regex)
print("Final string after reformatting (re.sub):", final_string)
# Output: "Hello - this is a sample string"
Explanation
- Cleaning:
  - .strip() removes extra whitespace at the start and end.
  - re.sub(r'[\[\]]', '', cleaned) removes any [ or ] characters from the string.
- Extraction:
  - Using .split(" - "): Splits the string into parts where the hyphen acts as a delimiter.
  - Using regex capturing groups: The pattern captures the greeting and sentence into named groups (greeting and sentence).
- Replacement/Reformatting:
  - .replace("sample", "test"): Directly replaces the substring “sample” with “test”.
  - re.sub(r'!', '', cleaned_regex) removes exclamation marks from the string.
This script combines several common string processing techniques to clean, extract, and reformat text in Python.
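re.sub() can also reuse what it captured, which is where it goes beyond .replace(). The sketch below is an illustrative extension, not part of the script above: it reorders dates through named-group backreferences and doubles numbers with a replacement function. The sample strings are invented.

```python
import re

log_line = "Released on 2024-01-15, patched on 2024-02-03."

# \g<name> in the replacement refers back to a named group.
reordered = re.sub(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    r"\g<day>/\g<month>/\g<year>",
    log_line,
)

# The replacement may also be a function that receives each match object.
doubled = re.sub(
    r"\d+",
    lambda m: str(int(m.group()) * 2),
    "3 apples and 10 pears",
)

print(reordered)  # Released on 15/01/2024, patched on 03/02/2024.
print(doubled)    # 6 apples and 20 pears
```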
OOP in Python
1. Core Concepts
- Classes & Objects:
  - Class: A blueprint for creating objects (instances) that defines attributes (data) and methods (behavior).
  - Object: An instance of a class that holds specific data and can execute the defined methods.
- Inheritance:
  - Single Inheritance: A class inherits from one base class.
  - Multiple Inheritance: A class inherits from more than one base class.
  - Purpose: To reuse code and establish a relationship between a general class (parent) and specialized classes (children).
- Encapsulation:
  - Concept: Bundling data (attributes) and methods that operate on the data within a class.
  - Private Attributes/Methods: Use naming conventions such as _protected (convention) and __private (name mangling) to discourage direct access from outside the class; Python does not strictly enforce either.
- Polymorphism:
  - Concept: Different classes can be treated as instances of the same parent class.
  - Method Overriding: Child classes can override methods defined in a parent class to provide specialized behavior.
UML-Like Notation (Simplified)
For conceptualizing relationships:
- Inheritance:
        Animal
          ↑
     -----------
     |         |
    Dog       Cat
- Composition:
  If an Animal “has a” behavior or component, you might depict it as:
    Animal
      |
    has a
      |
    Tail (or another component)
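The two relationships can be sketched side by side in code. The Tail class and its wag() method are hypothetical, chosen only to mirror the diagram: Dog "is an" Animal (inheritance), while an Animal "has a" Tail (composition).

```python
class Tail:
    """A component that an Animal owns (composition)."""

    def __init__(self, length_cm):
        self.length_cm = length_cm

    def wag(self):
        return "wag wag"


class Animal:
    def __init__(self, name, tail):
        self.name = name
        self.tail = tail  # Animal "has a" Tail

    def greet(self):
        # Delegate part of the behavior to the component.
        return f"{self.name}: {self.tail.wag()}"


class Dog(Animal):  # Dog "is an" Animal
    pass


dog = Dog("Buddy", Tail(length_cm=30))
print(dog.greet())                   # Buddy: wag wag
print(isinstance(dog, Animal))       # True
print(isinstance(dog.tail, Animal))  # False
```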
2. Hands-On Practice
Below is a Python script that creates a simple class hierarchy for an Animal base class and its child classes Dog and Cat. It demonstrates inheritance by overriding a method (make_sound()), and encapsulation using private attributes and methods. We also briefly discuss the Singleton design pattern as an optional bonus.
# Base class representing a general Animal.
class Animal:
    def __init__(self, name, age):
        self.name = name  # public attribute
        self.age = age  # public attribute
        self.__secret = "I love food"  # private attribute (name mangling)
    def make_sound(self):
        # General animal sound (could be abstract in a real application)
        return "Some generic sound"
    def get_secret(self):
        # Public method to access private attribute safely.
        return self.__secret
    def __private_method(self):
        # Private method (name mangled)
        return "This is a private method."
# Child class inheriting from Animal.
class Dog(Animal):
    def __init__(self, name, age, breed):
        super().__init__(name, age)  # Inherit initialization from Animal.
        self.breed = breed  # additional attribute for Dog
    # Overriding the make_sound() method
    def make_sound(self):
        return "Woof!"
# Another child class inheriting from Animal.
class Cat(Animal):
    def __init__(self, name, age, color):
        super().__init__(name, age)  # Inherit initialization from Animal.
        self.color = color  # additional attribute for Cat
    # Overriding the make_sound() method
    def make_sound(self):
        return "Meow!"
# Optional: Singleton Pattern Example using a class decorator.
def singleton(cls):
    instances = {}
    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return get_instance
@singleton
class Zoo:
    def __init__(self):
        self.animals = []
    def add_animal(self, animal):
        self.animals.append(animal)
    def list_animals(self):
        return [animal.name for animal in self.animals]
# --- Testing the classes ---
if __name__ == "__main__":
    # Create instances of Dog and Cat.
    dog = Dog("Buddy", 3, "Golden Retriever")
    cat = Cat("Whiskers", 2, "Tabby")
    # Demonstrate polymorphism: each animal makes its own sound.
    print(f"{dog.name} says: {dog.make_sound()}")  # Buddy says: Woof!
    print(f"{cat.name} says: {cat.make_sound()}")  # Whiskers says: Meow!
    # Demonstrate encapsulation by accessing a private attribute via a public method.
    print(f"{dog.name}'s secret: {dog.get_secret()}")  # Buddy's secret: I love food
    # Show that the private method is not directly accessible.
    try:
        print(dog.__private_method())
    except AttributeError as e:
        print("Error accessing private method:", e)
    # Demonstrate the Singleton pattern with the Zoo class.
    zoo1 = Zoo()
    zoo2 = Zoo()
    zoo1.add_animal(dog)
    zoo1.add_animal(cat)
    print("Animals in zoo1:", zoo1.list_animals())
    print("Zoo1 and Zoo2 are the same instance:", zoo1 is zoo2)
Explanation
- Classes & Inheritance:
  - Animal serves as the base class.
  - Dog and Cat inherit from Animal and override the make_sound() method, demonstrating polymorphism.
- Encapsulation:
  - Animal includes a private attribute __secret and a private method __private_method(). Both are name-mangled, so they cannot be reached under those names from outside the class; the attribute is exposed deliberately through the public method get_secret().
- Singleton Pattern (Optional):
  - The Zoo class is decorated with a singleton decorator. This ensures that only one instance of Zoo is created, which is useful in scenarios where a single point of access is required (e.g., managing a shared resource).
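Name mangling is a renaming scheme rather than true access control, and a short sketch makes that visible. The trimmed-down Animal class below is a stand-in for the one in the script.

```python
class Animal:
    def __init__(self, name):
        self.name = name
        self.__secret = "I love food"  # stored as _Animal__secret

    def get_secret(self):
        return self.__secret


pet = Animal("Buddy")

# Outside the class, the unmangled name does not exist.
try:
    pet.__secret
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

# The mangled name is still reachable, so this is a convention, not security.
mangled_value = pet._Animal__secret
attribute_names = sorted(vars(pet))

print(direct_access_failed)  # True
print(mangled_value)         # I love food
print(attribute_names)       # ['_Animal__secret', 'name']
```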
Pandas Operations
1. Importing Data
Pandas provides several functions to import data from various file formats. The most common is pd.read_csv():
import pandas as pd
# Importing data from a CSV file
df = pd.read_csv("data.csv")
# For other formats:
# df_excel = pd.read_excel("data.xlsx")
# df_json = pd.read_json("data.json")
Explanation:
- pd.read_csv("data.csv") reads a CSV file into a DataFrame.
- Similar functions exist for Excel, JSON, SQL, etc.
2. Data Inspection
Once the data is loaded, you can inspect it using several useful methods:
a. Viewing Data
- .head(): Displays the first few rows (default is 5).
- .tail(): Displays the last few rows.
print(df.head())
b. Data Overview
- .info(): Provides summary information about DataFrame columns, data types, and non-null counts.
- .describe(): Generates descriptive statistics for numeric columns.
df.info() # Prints its summary directly and returns None, so no print() is needed
print(df.describe())
Explanation:
These methods help you understand the structure and basic statistics of your dataset.
3. Common Transformations
a. Grouping Data
Use .groupby() to aggregate data based on one or more columns. For example, to calculate the mean value per group:
grouped = df.groupby("Category")["Value"].mean()
print(grouped)
Explanation:
- groupby("Category") groups rows based on the unique values in the “Category” column.
- ["Value"].mean() computes the mean of the “Value” column for each group.
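The snippet above assumes df already exists. A self-contained sketch with a tiny, made-up DataFrame shows what the grouped result looks like, plus .agg() for several statistics at once.

```python
import pandas as pd

# Invented sample data for illustration.
df = pd.DataFrame(
    {
        "Category": ["A", "B", "A", "B"],
        "Value": [10, 20, 30, 40],
    }
)

# One aggregate per group: a Series indexed by Category.
means = df.groupby("Category")["Value"].mean()

# Several aggregates per group: a DataFrame with one column per statistic.
summary = df.groupby("Category")["Value"].agg(["mean", "sum", "count"])

print(means)
print(summary)
```

Here group A averages 20.0 and group B averages 30.0.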
b. Merging/Joins
Pandas offers multiple methods to combine DataFrames:
- pd.merge(): Merges DataFrames based on one or more common keys.
- .join(): Joins columns of another DataFrame based on the index.
# Merge two DataFrames on a common column "ID"
df1 = pd.DataFrame({"ID": [1, 2, 3], "A": ["foo", "bar", "baz"]})
df2 = pd.DataFrame({"ID": [1, 2, 4], "B": [10, 20, 30]})
merged_df = pd.merge(df1, df2, on="ID", how="inner")
print(merged_df)
Explanation:
how="inner" returns only rows with matching keys in both DataFrames. Other options include “left”, “right”, and “outer”.
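To compare the how options, the sketch below reruns the merge on the same two frames with inner, left, and outer joins. The indicator=True argument adds a _merge column showing where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "A": ["foo", "bar", "baz"]})
df2 = pd.DataFrame({"ID": [1, 2, 4], "B": [10, 20, 30]})

# inner: only IDs present in both frames (1 and 2).
inner = pd.merge(df1, df2, on="ID", how="inner")

# left: every row of df1; B is NaN where df2 has no match (ID 3).
left = pd.merge(df1, df2, on="ID", how="left")

# outer: union of the keys; _merge records the origin of each row.
outer = pd.merge(df1, df2, on="ID", how="outer", indicator=True)

print(inner)
print(left)
print(outer)
```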
c. Reshaping Data
Reshape data using pivot tables, .melt(), and .pivot().
Pivot Tables
# Create a pivot table summarizing values by two categorical variables
pivot_table = pd.pivot_table(df, values="Value", index="Category", columns="Type", aggfunc="sum")
print(pivot_table)
Melt
# Unpivot a DataFrame from wide to long format
melted_df = pd.melt(df, id_vars=["Category"], value_vars=["Value1", "Value2"], var_name="Variable", value_name="Value")
print(melted_df)
Pivot
# Pivot data from long to wide format (each Category/Variable pair must be unique, otherwise .pivot() raises ValueError)
pivot_df = melted_df.pivot(index="Category", columns="Variable", values="Value")
print(pivot_df)
Explanation:
- Pivot Table: Aggregates data and summarizes values across categories.
- Melt: Converts wide-format data into a long-format, which is often easier for analysis or plotting.
- Pivot: The reverse of melt—converts long data back into a wide format.
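A round trip on a small, invented frame may make the wide/long relationship easier to picture. The column names follow the snippets above.

```python
import pandas as pd

# Wide format: one row per category, one column per measurement.
wide = pd.DataFrame(
    {
        "Category": ["A", "B"],
        "Value1": [1, 2],
        "Value2": [3, 4],
    }
)

# Wide -> long: one row per (Category, Variable) pair.
long_df = pd.melt(
    wide,
    id_vars=["Category"],
    value_vars=["Value1", "Value2"],
    var_name="Variable",
    value_name="Value",
)

# Long -> wide again; this works because each pair appears only once.
back = long_df.pivot(index="Category", columns="Variable", values="Value")

print(long_df)
print(back)
```

If the same Category/Variable pair appeared more than once, pd.pivot_table() with an aggfunc would be the tool to reach for instead of .pivot().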
4. Dealing with Missing Data
Handling missing data is a critical part of data cleaning. Common methods include:
a. Filling Missing Data
- .fillna(): Replace missing values with a specified value or a computed statistic (mean, median, etc.).
# Fill missing values with zero
df_filled = df.fillna(0)
# Fill missing values with the column mean (for numeric columns)
df['Value'] = df['Value'].fillna(df['Value'].mean())
b. Dropping Missing Data
- .dropna(): Remove rows (or columns) that contain missing data.
# Drop rows with any missing values
df_dropped = df.dropna()
# Drop columns with any missing values
df_dropped_cols = df.dropna(axis=1)
Explanation:
- Use .fillna() when you want to impute missing values and maintain the data shape.
- Use .dropna() to remove incomplete rows or columns if the missing data is not critical.
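Before choosing between the two, it usually helps to count the gaps first. The sketch below uses a small, made-up frame; .isna() and the subset argument of .dropna() are standard pandas features not shown in the snippets above.

```python
import pandas as pd

# Invented data with one missing number and one missing label.
df = pd.DataFrame(
    {
        "Value": [1.0, float("nan"), 3.0],
        "Label": ["x", "y", None],
    }
)

# Count missing values per column.
missing_per_column = df.isna().sum()

# Impute: the mean of the non-missing values (1.0 and 3.0) is 2.0.
filled = df["Value"].fillna(df["Value"].mean())

# Drop rows with any missing value vs. only rows missing "Value".
dropped_any = df.dropna()
dropped_value_only = df.dropna(subset=["Value"])

print(missing_per_column)
print(filled.tolist())          # [1.0, 2.0, 3.0]
print(len(dropped_any))         # 1
print(len(dropped_value_only))  # 2
```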
Final Thoughts
Pandas provides a robust toolkit for data wrangling and transformation:
- Importing data: Read data from various formats.
- Inspecting data: Understand your dataset’s structure and statistics.
- Transformations: Grouping, merging, reshaping data make it easier to analyze.
- Handling missing data: Cleaning data by filling or dropping missing values ensures more robust analysis.
Python Tricky Questions
1. List Comprehensions
Syntax Refresher:
A list comprehension creates a new list by evaluating an expression for every item in an iterable, optionally including a condition.
# Syntax:
new_list = [expression for item in iterable if condition]
Example:
# Standard for-loop to create a list of even numbers:
even_numbers = []
for num in range(10):
    if num % 2 == 0:
        even_numbers.append(num)
print(even_numbers) # Output: [0, 2, 4, 6, 8]
# Using list comprehension:
even_numbers_comp = [num for num in range(10) if num % 2 == 0]
print(even_numbers_comp) # Output: [0, 2, 4, 6, 8]
Comparison:
- Clarity: List comprehensions offer a concise syntax that makes the intent clear.
- Performance: They are often faster than the equivalent for-loop because they avoid the repeated lookup and call of list.append on every iteration. They still build the entire list in memory, so they are not more memory-efficient than the loop.
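If you want to check the speed difference on your own machine, timeit from the standard library is enough. This is a rough sketch: the function names and sizes are arbitrary, and the timings vary by machine and Python version, so no expected numbers are shown.

```python
import timeit


def with_loop(n):
    result = []
    for num in range(n):
        if num % 2 == 0:
            result.append(num)
    return result


def with_comprehension(n):
    return [num for num in range(n) if num % 2 == 0]


# Time 200 runs of each version on the same input size.
loop_time = timeit.timeit(lambda: with_loop(1000), number=200)
comp_time = timeit.timeit(lambda: with_comprehension(1000), number=200)

print(f"for-loop:      {loop_time:.4f} s")
print(f"comprehension: {comp_time:.4f} s")
```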
2. Decorators
Decorators are a way to wrap functions (or methods) to modify their behavior without permanently modifying them.
Concept:
A decorator is a callable that takes a function as an argument and returns a new function with added behavior.
Example: A Simple Logger Decorator
def logger(func):
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__} with args: {args} kwargs: {kwargs}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned: {result}")
        return result
    return wrapper
@logger
def add(a, b):
    return a + b
# Calling the decorated function:
add(3, 5)
Explanation:
- The logger decorator wraps the add function.
- When add is called, it first prints the input arguments, then calls the original function, prints the output, and finally returns the result.
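One side effect of wrapping is that the decorated function takes on the wrapper's name and docstring. The standard-library functools.wraps copies that metadata back. The sketch below contrasts the two; plain_logger and add_plain are names invented for the comparison.

```python
import functools


def plain_logger(func):
    # No functools.wraps: metadata of func is lost.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def logger(func):
    @functools.wraps(func)  # copy __name__, __doc__, etc. from func
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__} with args: {args} kwargs: {kwargs}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned: {result}")
        return result
    return wrapper


@plain_logger
def add_plain(a, b):
    """Add two numbers."""
    return a + b


@logger
def add(a, b):
    """Add two numbers."""
    return a + b


print(add_plain.__name__)  # wrapper
print(add.__name__)        # add
print(add.__doc__)         # Add two numbers.
```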
3. Generators
Generators are a special type of iterator defined using functions and the yield statement.
Creating a Generator with yield:
def count_up_to(n):
    count = 1
    while count <= n:
        yield count
        count += 1
# Using the generator:
for number in count_up_to(5):
    print(number)  # Outputs: 1, 2, 3, 4, 5 (one by one)
Memory Usage Comparison:
- Generators: Produce one value at a time, which makes them very memory-efficient especially when processing large datasets.
- List Comprehensions: Create the entire list in memory at once, which can be less efficient for very large sequences.
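The memory difference can be observed with sys.getsizeof(). It is a rough measure, since it reports only the container's own size and not the integers it references. The sketch compares a list comprehension with the equivalent generator expression.

```python
import sys

n = 100_000

# The list holds references to all n results at once.
squares_list = [i * i for i in range(n)]

# The generator expression holds only its current state.
squares_gen = (i * i for i in range(n))

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

print(f"list:      {list_size} bytes")
print(f"generator: {gen_size} bytes")

# Both produce the same values, but a generator can be consumed only once.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
second_pass = sum(squares_gen)  # already exhausted

print(total_from_list == total_from_gen)  # True
print(second_pass)                        # 0
```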
4. Context Managers
Context managers allow you to allocate and release resources precisely, typically using the with statement.
Using with Statements:
with open("example.txt", "w") as file:
    file.write("Hello, World!")
Explanation:
- The file is automatically closed when the block inside the with statement is exited, even if an error occurs.
Creating Your Own Context Manager:
To create a custom context manager, define __enter__ and __exit__ methods in a class.
class MyContextManager:
    def __enter__(self):
        print("Entering the context")
        # Return an object if needed
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        print("Exiting the context")
        # Handle exceptions if necessary (return True to suppress)
        return False
# Using the custom context manager:
with MyContextManager() as manager:
    print("Inside the context")
Benefits:
- Readability: The with statement makes it clear where a resource is acquired and released.
- Maintainability: Encapsulates resource management, reducing boilerplate code and potential errors.
- Reliability: Ensures resources like file handles or network connections are properly closed, improving program stability.
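As an alternative to writing a class, the standard library's contextlib.contextmanager decorator turns a generator function into a context manager: code before yield plays the role of __enter__, and code in the finally block plays the role of __exit__. The managed function and the events list below are invented for the sketch.

```python
from contextlib import contextmanager

events = []  # records the order in which things happen


@contextmanager
def managed(name):
    events.append(f"enter {name}")      # setup, like __enter__
    try:
        yield name                      # value bound by "as"
    finally:
        events.append(f"exit {name}")   # teardown, runs even on error


with managed("demo") as resource:
    events.append(f"inside {resource}")

# The teardown still runs when the block raises, and the error propagates.
try:
    with managed("failing"):
        raise ValueError("boom")
except ValueError:
    events.append("error propagated")

print(events)
```

The recorded order is enter, inside, exit for the first block, then enter, exit, and finally the propagated error for the second.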
Final Thoughts
- List Comprehensions: Provide a concise and often faster alternative to for-loops for generating lists.
- Decorators: Allow you to modify function behavior in a reusable and readable way.
- Generators: Enable efficient iteration over large or infinite sequences by yielding one item at a time.
- Context Managers: Simplify resource management with the with statement, ensuring clean setup and teardown.
Each of these features can dramatically improve the quality and performance of your Python code when used appropriately.