Raj Shaikh    16 min read    3216 words

String Manipulations

1. String Slicing

Syntax:

substring = string[start:end:step]
  • start: Starting index (inclusive).
  • end: Ending index (exclusive).
  • step: How many characters to jump (optional).

Examples:

text = "Hello, World!"

# Basic slicing: get "Hello"
print(text[0:5])   # Output: Hello

# Omitting end defaults to the end of the string, so this gets "World!"
print(text[7:])    # Output: World!

# Using step: every 2nd character
print(text[::2])   # Output: Hlo ol!

# Negative indices: get last character
print(text[-1])    # Output: !

Explanation:
Slicing lets you extract portions of a string. If you omit start or end, Python uses the beginning or end of the string, respectively (the defaults swap when step is negative). A negative step walks the string from right to left, so text[::-1] returns a reversed copy.
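
As a small sketch of the slicing rules above (the variable names are illustrative only), the snippet below shows a negative step, a negative start index, and the fact that out-of-range slice bounds are clamped rather than raising an error:

```python
text = "Hello, World!"

# A step of -1 walks the string backwards, producing a reversed copy.
reversed_text = text[::-1]
print(reversed_text)        # Output: !dlroW ,olleH

# Negative start: the last six characters.
last_six = text[-6:]
print(last_six)             # Output: World!

# Slice bounds past the end are clamped instead of raising IndexError.
clamped = text[7:100]
print(clamped)              # Output: World!

# Indexing (not slicing) past the end does raise IndexError.
try:
    text[100]
except IndexError as exc:
    print("IndexError:", exc)
```

Slices never modify the original string; each one returns a new string object.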


2. String Concatenation

Concatenation using the + Operator:

greeting = "Hello"
name = "Alice"
message = greeting + ", " + name + "!"
print(message)  # Output: Hello, Alice!

Explanation:
The + operator joins two or more strings into one.
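
One thing worth seeing in practice is that + only works between strings. The sketch below (the visits variable is an assumption added for illustration) shows the TypeError you get when mixing in an integer, and an f-string as a common alternative:

```python
greeting = "Hello"
name = "Alice"
visits = 3  # hypothetical extra value, not part of the example above

# + only works between strings; mixing str and int raises TypeError.
try:
    message = greeting + ", " + name + "! Visits: " + visits
except TypeError:
    # Convert explicitly with str() before concatenating.
    message = greeting + ", " + name + "! Visits: " + str(visits)
print(message)    # Output: Hello, Alice! Visits: 3

# An f-string formats the same values without explicit conversion.
f_message = f"{greeting}, {name}! Visits: {visits}"
print(f_message)  # Output: Hello, Alice! Visits: 3
```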


3. Common String Methods

a. .strip()

  • Purpose: Removes whitespace (or specified characters) from the beginning and end of a string.
s = "   Hello, World!   "
print(s.strip())          # Output: "Hello, World!"
print(s.strip(" !"))       # Output: "Hello, World" (strips leading/trailing spaces and exclamation marks)

b. .split()

  • Purpose: Splits a string into a list based on a specified delimiter (defaults to whitespace).
sentence = "Python is fun"
words = sentence.split()   # Splits on whitespace by default
print(words)               # Output: ['Python', 'is', 'fun']

csv_data = "apple,banana,cherry"
fruits = csv_data.split(",")
print(fruits)              # Output: ['apple', 'banana', 'cherry']

c. .replace()

  • Purpose: Replaces occurrences of a specified substring with another substring.
text = "I like apples. Apples are my favorite fruit."
new_text = text.replace("apples", "oranges")
print(new_text)
# Output: I like oranges. Apples are my favorite fruit.
# Note: .replace() is case-sensitive.

d. .find()

  • Purpose: Returns the index of the first occurrence of a substring (or -1 if not found).
text = "Find the needle in the haystack."
index = text.find("needle")
print(index)  # Output: 9

# If the substring is not found:
print(text.find("thread"))  # Output: -1

e. .join()

  • Purpose: Concatenates an iterable (like a list of strings) into a single string, using a specified separator.
words = ['Join', 'these', 'words']
sentence = " ".join(words)
print(sentence)  # Output: Join these words

# Using a comma as a separator:
csv = ",".join(words)
print(csv)  # Output: Join,these,words

Explanation:

  • .strip() is useful for cleaning up user input or data read from files.
  • .split() converts a string into a list, which can be helpful for parsing text.
  • .replace() allows for quick modifications to string content.
  • .find() helps locate substrings for further processing.
  • .join() efficiently creates strings from collections of words or characters.
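
These methods are often chained. The following is a minimal sketch, using an invented input string, of cleaning a delimited line with .strip() and .split(), rebuilding it with .join(), and checking the result of .find() safely:

```python
raw = "  apple ; Banana;cherry ;  "

# strip() trims the ends, split(";") breaks on the delimiter,
# and the inner strip() cleans each piece; empty pieces are dropped.
items = [part.strip() for part in raw.strip().split(";") if part.strip()]
print(items)                 # Output: ['apple', 'Banana', 'cherry']

# join() is the inverse of split(): rebuild a normalized string.
normalized = ", ".join(items)
print(normalized)            # Output: apple, Banana, cherry

# find() returns -1 when absent, so compare against -1 rather than
# relying on truthiness (index 0 is a valid but falsy result).
position = normalized.find("apple")
print(position)              # Output: 0
found = position != -1
print(found)                 # Output: True
```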

Regular Expressions

Regular expressions (regex) are a powerful tool for pattern matching and text manipulation in Python. They allow you to search, match, and extract complex patterns from strings. Below is a comprehensive review covering the core pattern syntax, capturing groups, and common use cases with functions such as re.match(), re.search(), and re.findall().


1. Pattern Syntax

Regular expression patterns are strings that describe the text you want to match. Some key elements include:

  • Literals: Characters that match themselves (e.g., "abc" matches “abc”).
  • Meta-characters: Symbols with special meaning:
    • . matches any character except a newline.
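    • ^ asserts the start of the string (or of each line with re.MULTILINE).
    • $ asserts the end of the string, or just before a trailing newline (and the end of each line with re.MULTILINE).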
    • * matches 0 or more occurrences of the preceding element.
    • + matches 1 or more occurrences.
    • ? makes the preceding element optional (0 or 1 occurrence) or denotes non-greedy matching.
    • {n,m} specifies between n and m occurrences.
  • Character classes:
    • [abc] matches any one of a, b, or c.
    • [^abc] matches any character except a, b, or c.
    • \d matches any digit ([0-9]; for str patterns it also matches other Unicode decimal digits unless re.ASCII is set).
    • \w matches any word character (letters, digits, and underscore).
    • \s matches any whitespace character.
  • Escaping:
    • Use a backslash (\) to escape meta-characters when you want to match them literally (e.g., \. to match a period).
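
A few of the elements listed above are easier to grasp when run. This is an illustrative sketch with invented sample strings, showing greedy versus non-greedy matching, the {n,m} quantifier, and escaping:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, from the first < to the last >.
greedy = re.findall(r"<.+>", html)
print(greedy)        # Output: ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .+? stops at the first > it can reach.
lazy = re.findall(r"<.+?>", html)
print(lazy)          # Output: ['<b>', '</b>', '<i>', '</i>']

# {n,m} with \d: two to four consecutive digits.
digits = re.findall(r"\d{2,4}", "7 42 2024 123456")
print(digits)        # Output: ['42', '2024', '1234', '56']

# An unescaped . matches any character; \. matches only a literal period.
loose = bool(re.fullmatch(r"3.14", "3x14"))
strict = bool(re.fullmatch(r"3\.14", "3x14"))
print(loose, strict)  # Output: True False
```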

Example Pattern:
To match an email address, you might use a simplified pattern like:

r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"
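
As a rough sketch of how this pattern might be applied (the candidate strings and the EMAIL_PATTERN name are assumptions for illustration), re.fullmatch() can be used so that the whole string has to match. The pattern is deliberately simplified and will not cover every valid address:

```python
import re

# The simplified pattern from above, compiled once for reuse.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")

candidates = [
    "john@example.com",
    "jane.doe+tag@mail.example.org",
    "not-an-email",
    "a@b",
]

# fullmatch() requires the entire string to match, not just a prefix.
valid = [c for c in candidates if EMAIL_PATTERN.fullmatch(c)]
print(valid)
# Output: ['john@example.com', 'jane.doe+tag@mail.example.org']
```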

2. Capturing Groups

Capturing groups are parts of the regex pattern enclosed in parentheses (). They allow you to extract sub-parts of a match.

  • Basic Usage:
import re

text = "John Doe, john@example.com"
pattern = r"(\w+ \w+), (\S+@\S+)"
match = re.search(pattern, text)
if match:
    full_name = match.group(1)  # Capturing group 1: "John Doe"
    email = match.group(2)      # Capturing group 2: "john@example.com"
    print(full_name, email)
  • Non-Capturing Groups:
    Use (?:...) when you want to group parts of a pattern without capturing them.
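
The difference between the two group types shows up clearly with re.findall(). A minimal sketch, using an invented sample string:

```python
import re

text = "cat, dog, catfish, dogs"

# With a capturing group, findall() returns only what the group captured.
captured = re.findall(r"(cat|dog)\w*", text)
print(captured)      # Output: ['cat', 'dog', 'cat', 'dog']

# With a non-capturing group, findall() returns the whole match.
whole = re.findall(r"(?:cat|dog)\w*", text)
print(whole)         # Output: ['cat', 'dog', 'catfish', 'dogs']
```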

3. Common Use Cases with re.match(), re.search(), and re.findall()

a. re.match()

  • Purpose:
    Checks for a match only at the beginning of the string.
import re

text = "Hello, world!"
pattern = r"Hello"
match = re.match(pattern, text)
if match:
    print("Match found:", match.group())  # Output: "Hello"
else:
    print("No match found.")

Explanation:
Since re.match() only checks the start of the string, it will return a match only if the pattern is found right at the beginning.

b. re.search()

  • Purpose:
    Scans through the entire string and returns the first match found.
import re

text = "Say Hello, then say Hi."
pattern = r"Hi"
match = re.search(pattern, text)
if match:
    print("Search found:", match.group())  # Output: "Hi"
else:
    print("No match found.")

Explanation:
re.search() will find “Hi” even though it doesn’t occur at the beginning of the string.
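
To make the contrast concrete, the sketch below runs re.match() and re.search() against the same string. It also shows re.fullmatch(), which is not covered above but is part of the standard re module and requires the pattern to span the entire string:

```python
import re

text = "Say Hello, then say Hi."

# match() anchors at position 0, so "Hello" is not found there.
at_start = re.match(r"Hello", text)
print(at_start)                      # Output: None

# search() scans forward and reports where the first match begins.
anywhere = re.search(r"Hello", text)
print(anywhere.start())              # Output: 4

# fullmatch() succeeds only if the pattern covers the entire string.
whole = re.fullmatch(r"Say .* Hi\.", text)
print(whole is not None)             # Output: True
```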

c. re.findall()

  • Purpose:
    Returns a list of all non-overlapping matches in the string.
import re

text = "The numbers are 123, 456, and 789."
pattern = r"\d+"
matches = re.findall(pattern, text)
print("All numbers found:", matches)  # Output: ['123', '456', '789']

Explanation:
re.findall() scans the whole string and returns all sequences of digits as a list.
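
One detail that often surprises people: when the pattern contains capturing groups, re.findall() returns the groups rather than the whole match. The sketch below (sample text invented for illustration) shows that, along with re.finditer(), which yields match objects and so also exposes positions:

```python
import re

text = "Order 123 shipped on 2024-05-17; order 456 shipped on 2024-06-02."

# With more than one capturing group, findall() returns a list of tuples.
dates = re.findall(r"(\d{4})-(\d{2})-(\d{2})", text)
print(dates)   # Output: [('2024', '05', '17'), ('2024', '06', '02')]

# finditer() yields match objects; \b keeps us to standalone 3-digit numbers.
spans = [(m.group(), m.start()) for m in re.finditer(r"\b\d{3}\b", text)]
print(spans)   # Output: [('123', 6), ('456', 39)]
```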


Final Thoughts

  • Choosing the Right Function:

    • Use re.match() when you’re only interested in matches at the beginning of a string.
    • Use re.search() when you need to find the first occurrence anywhere in the string.
    • Use re.findall() when you want all occurrences of a pattern.
  • Capturing Groups:
    They are crucial when you need to extract specific parts of the matched text, and can be combined with the above functions to tailor your text processing tasks.

Regular expressions are an essential tool in text processing, enabling you to perform sophisticated pattern matching and extraction in a concise manner. Experimenting with different patterns and functions will help you master their use in your projects.


Regex & String Manipulations

Below is a short, self-contained Python script that demonstrates:

  1. Cleaning a string: Using both .strip() and a regex with re.sub() to remove unwanted characters.
  2. Extracting substrings: Using .split() with list indexing, and regex capturing groups.
  3. Replacing/reformatting parts of a string: Using .replace() and re.sub().
import re

# Sample string with extra whitespace and unwanted characters.
raw_string = "   [Hello] - this is a sample string!   "

# --- Step 1: Clean the String ---

# Option A: Use .strip() to remove leading/trailing whitespace.
cleaned = raw_string.strip()
print("After strip():", cleaned)
# Output: "[Hello] - this is a sample string!"

# Option B: Use regex (re.sub) to remove unwanted characters.
# For example, remove square brackets from the entire string.
cleaned_regex = re.sub(r'[\[\]]', '', cleaned)
print("After re.sub() removing brackets:", cleaned_regex)
# Output: "Hello - this is a sample string!"

# --- Step 2: Extract Specific Substrings ---

# Let's assume we want to extract:
#   a) The greeting ("Hello")
#   b) The sentence ("this is a sample string!")
# Option A: Using .split() and list indexing on the cleaned string.
parts = cleaned_regex.split(" - ")
if len(parts) == 2:
    greeting = parts[0].strip()  # "Hello"
    sentence = parts[1].strip()  # "this is a sample string!"
    print("Greeting (split):", greeting)
    print("Sentence (split):", sentence)

# Option B: Using regex capturing groups.
# Pattern explanation:
#   \s*               -> Optional leading whitespace.
#   \[?               -> Optional opening square bracket.
#   (?P<greeting>\w+) -> Captures a word as 'greeting'.
#   \]?               -> Optional closing square bracket.
#   \s*-\s*           -> A hyphen surrounded by optional whitespace.
#   (?P<sentence>.+)  -> Captures the rest of the string as 'sentence'.
pattern = r'\s*\[?(?P<greeting>\w+)\]?\s*-\s*(?P<sentence>.+)'
match = re.match(pattern, cleaned)
if match:
    greeting_rgx = match.group('greeting')
    sentence_rgx = match.group('sentence')
    print("Greeting (regex):", greeting_rgx)
    print("Sentence (regex):", sentence_rgx)

# --- Step 3: Replace/Reformat Parts of a String ---

# Example 1: Use .replace() to change "sample" to "test" in the sentence.
formatted_sentence = sentence.replace("sample", "test")
print("Formatted sentence (.replace):", formatted_sentence)
# Output: "this is a test string!"

# Example 2: Use re.sub() to remove exclamation marks from the entire string.
final_string = re.sub(r'!', '', cleaned_regex)
print("Final string after reformatting (re.sub):", final_string)
# Output: "Hello - this is a sample string"

Explanation

  1. Cleaning:

    • .strip() removes extra whitespace at the start and end.
    • re.sub(r'[\[\]]', '', cleaned) removes any [ or ] characters from the string.
  2. Extraction:

    • Using .split(" - "): Splits the string into parts where the hyphen acts as a delimiter.
    • Using regex capturing groups: The pattern captures the greeting and sentence into named groups (greeting and sentence).
  3. Replacement/Reformatting:

    • .replace("sample", "test"): Directly replaces the substring “sample” with “test”.
    • re.sub(r'!', '', cleaned_regex) removes exclamation marks from the string.

This script combines several common string processing techniques to clean, extract, and reformat text in Python.
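
re.sub() can do more than delete characters. As a further sketch (the input line is invented for illustration), the replacement can reference captured groups with \1 and \2, or be a function that receives each match object:

```python
import re

line = "Doe, John <JOHN@Example.COM>"

# Backreferences (\1, \2) reorder the captured groups in the replacement.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", line)
print(swapped)   # Output: John Doe <JOHN@Example.COM>

# A function as the replacement receives each match object.
lowered = re.sub(r"<([^>]+)>", lambda m: "<" + m.group(1).lower() + ">", swapped)
print(lowered)   # Output: John Doe <john@example.com>
```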


OOP in Python

Core Concepts

  • Classes & Objects:

    • Class: A blueprint for creating objects (instances) that defines attributes (data) and methods (behavior).
    • Object: An instance of a class that holds specific data and can execute the defined methods.
  • Inheritance:

    • Single Inheritance: A class inherits from one base class.
    • Multiple Inheritance: A class inherits from more than one base class.
    • Purpose: To reuse code and establish a relationship between a general class (parent) and specialized classes (children).
  • Encapsulation:

    • Concept: Bundling data (attributes) and methods that operate on the data within a class.
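    • Private Attributes/Methods: Use naming conventions such as _protected (a convention only) and __private (triggers name mangling) to signal that access is restricted. Python does not enforce privacy; these names discourage outside access rather than prevent it.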
  • Polymorphism:

    • Concept: Different classes can be treated as instances of the same parent class.
    • Method Overriding: Child classes can override methods defined in a parent class to provide specialized behavior.

UML-Like Notation (Simplified)

For conceptualizing relationships:

  • Inheritance:
           Animal
             ↑
      -----------------
      |               |
     Dog             Cat
  • Composition:
    If an Animal “has a” behavior or component, you might depict it as:
      Animal
        |
      has a
        |
    Tail (or another component)
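
The composition diagram can be sketched in code as follows. The Tail class, its wag() method, and express_joy() are hypothetical names chosen for illustration; this Animal is a stripped-down stand-in, not the class used in the hands-on script:

```python
class Tail:
    """Hypothetical component owned by an Animal."""

    def __init__(self, length_cm):
        self.length_cm = length_cm

    def wag(self):
        return "wag wag"


class Animal:
    def __init__(self, name, tail=None):
        self.name = name
        self.tail = tail  # Animal "has a" Tail (or None)

    def express_joy(self):
        # Delegate to the component instead of inheriting from it.
        if self.tail is None:
            return f"{self.name} has no tail to wag"
        return f"{self.name}: {self.tail.wag()}"


dog = Animal("Buddy", tail=Tail(length_cm=30))
frog = Animal("Hopper")
print(dog.express_joy())    # Output: Buddy: wag wag
print(frog.express_joy())   # Output: Hopper has no tail to wag
```

Composition keeps Tail reusable on its own and avoids forcing an "is a" relationship where "has a" is the better fit.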

2. Hands-On Practice

Below is a Python script that creates a simple class hierarchy for an Animal base class and its child classes Dog and Cat. It demonstrates inheritance by overriding a method (make_sound()), and encapsulation using private attributes and methods. We also briefly discuss the Singleton design pattern as an optional bonus.

# Base class representing a general Animal.
class Animal:
    def __init__(self, name, age):
        self.name = name         # public attribute
        self.age = age           # public attribute
        self.__secret = "I love food"  # private attribute (name mangling)

    def make_sound(self):
        # General animal sound (could be abstract in a real application)
        return "Some generic sound"

    def get_secret(self):
        # Public method to access private attribute safely.
        return self.__secret

    def __private_method(self):
        # Private method (name mangled)
        return "This is a private method."

# Child class inheriting from Animal.
class Dog(Animal):
    def __init__(self, name, age, breed):
        super().__init__(name, age)  # Inherit initialization from Animal.
        self.breed = breed          # additional attribute for Dog

    # Overriding the make_sound() method
    def make_sound(self):
        return "Woof!"

# Another child class inheriting from Animal.
class Cat(Animal):
    def __init__(self, name, age, color):
        super().__init__(name, age)  # Inherit initialization from Animal.
        self.color = color         # additional attribute for Cat

    # Overriding the make_sound() method
    def make_sound(self):
        return "Meow!"

# Optional: Singleton Pattern Example using a class decorator.
def singleton(cls):
    instances = {}
    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return get_instance

@singleton
class Zoo:
    def __init__(self):
        self.animals = []

    def add_animal(self, animal):
        self.animals.append(animal)

    def list_animals(self):
        return [animal.name for animal in self.animals]

# --- Testing the classes ---
if __name__ == "__main__":
    # Create instances of Dog and Cat.
    dog = Dog("Buddy", 3, "Golden Retriever")
    cat = Cat("Whiskers", 2, "Tabby")

    # Demonstrate polymorphism: each animal makes its own sound.
    print(f"{dog.name} says: {dog.make_sound()}")  # Buddy says: Woof!
    print(f"{cat.name} says: {cat.make_sound()}")  # Whiskers says: Meow!

    # Demonstrate encapsulation by accessing a private attribute via a public method.
    print(f"{dog.name}'s secret: {dog.get_secret()}")  # Buddy's secret: I love food

    # Show that the private method is not directly accessible.
    try:
        print(dog.__private_method())
    except AttributeError as e:
        print("Error accessing private method:", e)

    # Demonstrate the Singleton pattern with the Zoo class.
    zoo1 = Zoo()
    zoo2 = Zoo()
    zoo1.add_animal(dog)
    zoo1.add_animal(cat)
    print("Animals in zoo1:", zoo1.list_animals())
    print("Zoo1 and Zoo2 are the same instance:", zoo1 is zoo2)

Explanation

  1. Classes & Inheritance:

    • Animal serves as the base class.
    • Dog and Cat inherit from Animal and override the make_sound() method, demonstrating polymorphism.
  2. Encapsulation:

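    • Animal includes a private attribute __secret and a private method __private_method(). Python rewrites these names to _Animal__secret and _Animal__private_method (name mangling), so dog.__secret and dog.__private_method() raise AttributeError outside the class. The intended access path is a public method such as get_secret(); the mangled names remain reachable, so this is a convention rather than strict enforcement.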
  3. Singleton Pattern (Optional):

    • The Zoo class is decorated with a singleton decorator. This ensures that only one instance of Zoo is created, which is useful in scenarios where a single point of access is required (e.g., managing a shared resource).
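
To see what name mangling actually does, here is a minimal sketch using a cut-down Animal class (an assumption for brevity, not the full class from the script):

```python
class Animal:
    def __init__(self):
        self.__secret = "I love food"   # stored as _Animal__secret


pet = Animal()

# The unmangled name does not exist on the instance.
try:
    pet.__secret
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True
print(direct_access_failed)             # Output: True

# The mangled name is still reachable, so this is a convention, not a lock.
print(pet._Animal__secret)              # Output: I love food
print("_Animal__secret" in vars(pet))   # Output: True
```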

Pandas Operations

1. Importing Data

Pandas provides several functions to import data from various file formats. The most common is using pd.read_csv():

import pandas as pd

# Importing data from a CSV file
df = pd.read_csv("data.csv")

# For other formats:
# df_excel = pd.read_excel("data.xlsx")
# df_json  = pd.read_json("data.json")

Explanation:

  • pd.read_csv("data.csv") reads a CSV file into a DataFrame.
  • Similar functions exist for Excel, JSON, SQL, etc.
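
If you want to try pd.read_csv() without a file on disk, a string buffer works as a stand-in. The sketch below uses invented CSV content; parse_dates and index_col are standard read_csv parameters, and the column names are assumptions for illustration:

```python
import io

import pandas as pd

# Invented CSV content; row 2 has a missing Value.
csv_text = """ID,Category,Value,Date
1,A,10.5,2024-01-01
2,B,,2024-01-02
3,A,7.0,2024-01-03
"""

# io.StringIO lets read_csv treat a string like an open file.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="ID")

print(df.shape)                     # Output: (3, 3)
print(df["Value"].isna().sum())     # Output: 1
print(df.dtypes)                    # Date is parsed as a datetime column
```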

2. Data Inspection

Once the data is loaded, you can inspect it using several useful methods:

a. Viewing Data

  • .head(): Displays the first few rows (default is 5).
  • .tail(): Displays the last few rows.
print(df.head())

b. Data Overview

  • .info(): Provides summary information about DataFrame columns, data types, and non-null counts.
  • .describe(): Generates descriptive statistics for numeric columns.
df.info()            # prints the summary itself and returns None, so no print() is needed
print(df.describe())

Explanation:
These methods help you understand the structure and basic statistics of your dataset.


3. Common Transformations

a. Grouping Data

Use .groupby() to aggregate data based on one or more columns. For example, to calculate the mean value per group:

grouped = df.groupby("Category")["Value"].mean()
print(grouped)

Explanation:

  • groupby("Category") groups rows based on the unique values in the “Category” column.
  • ["Value"].mean() computes the mean of the “Value” column for each group.
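
A self-contained sketch of the same idea, with invented sample data, showing that .agg() can compute several statistics per group in one pass:

```python
import pandas as pd

# Invented sample data for illustration.
df = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "B"],
    "Value": [10, 20, 5, 15, 25],
})

# One row per Category, one column per aggregate.
summary = df.groupby("Category")["Value"].agg(["mean", "sum", "count"])
print(summary)
```

The result is indexed by Category, so summary.loc["A", "mean"] looks up a single group's statistic.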

b. Merging/Joins

Pandas offers multiple methods to combine DataFrames:

  • pd.merge(): Merges DataFrames based on one or more common keys.
  • .join(): Joins columns of another DataFrame based on the index.
# Merge two DataFrames on a common column "ID"
df1 = pd.DataFrame({"ID": [1, 2, 3], "A": ["foo", "bar", "baz"]})
df2 = pd.DataFrame({"ID": [1, 2, 4], "B": [10, 20, 30]})
merged_df = pd.merge(df1, df2, on="ID", how="inner")
print(merged_df)

Explanation:

  • how="inner" returns only rows with matching keys in both DataFrames. Options include “left”, “right”, and “outer”.
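
To compare the join types, it can help to run an outer merge with indicator=True, which adds a _merge column recording where each row came from. A sketch reusing the two small DataFrames from the example:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "A": ["foo", "bar", "baz"]})
df2 = pd.DataFrame({"ID": [1, 2, 4], "B": [10, 20, 30]})

# Outer merge keeps every ID from both sides; unmatched cells become NaN.
outer = pd.merge(df1, df2, on="ID", how="outer", indicator=True)
print(outer)

# Map each ID to its origin: "both", "left_only", or "right_only".
origin = outer.set_index("ID")["_merge"].astype(str).to_dict()
print(origin)
```

Here IDs 1 and 2 appear in both frames, ID 3 only on the left, and ID 4 only on the right, so an inner merge would keep just the first two.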

c. Reshaping Data

Reshape data using pivot tables, .melt(), and .pivot().

Pivot Tables

# Create a pivot table summarizing values by two categorical variables
pivot_table = pd.pivot_table(df, values="Value", index="Category", columns="Type", aggfunc="sum")
print(pivot_table)

Melt

# Unpivot a DataFrame from wide to long format
melted_df = pd.melt(df, id_vars=["Category"], value_vars=["Value1", "Value2"], var_name="Variable", value_name="Value")
print(melted_df)

Pivot

# Pivot data from long to wide format
pivot_df = melted_df.pivot(index="Category", columns="Variable", values="Value")
print(pivot_df)

Explanation:

  • Pivot Table: Aggregates data and summarizes values across categories.
  • Melt: Converts wide-format data into a long-format, which is often easier for analysis or plotting.
  • Pivot: The reverse of melt—converts long data back into a wide format.
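
The round trip between wide and long formats can be sketched with a tiny invented DataFrame. The example also shows a common pitfall: .pivot() raises an error when an index/column pair occurs more than once, in which case pd.pivot_table() with an aggregation function is the usual fallback:

```python
import pandas as pd

# Invented wide-format data.
wide = pd.DataFrame({"Category": ["A", "B"], "Value1": [1, 2], "Value2": [3, 4]})

# Wide -> long: one row per (Category, Variable) pair.
long_df = pd.melt(wide, id_vars=["Category"], value_vars=["Value1", "Value2"],
                  var_name="Variable", value_name="Value")
print(long_df)

# Long -> wide again.
back = long_df.pivot(index="Category", columns="Variable", values="Value")
print(back)

# Duplicate (Category, Variable) pairs make pivot() ambiguous.
dupes = pd.concat([long_df, long_df.iloc[[0]]], ignore_index=True)
try:
    dupes.pivot(index="Category", columns="Variable", values="Value")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table() resolves duplicates by aggregating them.
totals = dupes.pivot_table(index="Category", columns="Variable",
                           values="Value", aggfunc="sum")
print(totals)
```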

4. Dealing with Missing Data

Handling missing data is a critical part of data cleaning. Common methods include:

a. Filling Missing Data

  • .fillna(): Replace missing values with a specified value or a computed statistic (mean, median, etc.).
# Fill missing values with zero
df_filled = df.fillna(0)

# Fill missing values with the column mean (for numeric columns)
df['Value'] = df['Value'].fillna(df['Value'].mean())

b. Dropping Missing Data

  • .dropna(): Remove rows (or columns) that contain missing data.
# Drop rows with any missing values
df_dropped = df.dropna()

# Drop columns with any missing values
df_dropped_cols = df.dropna(axis=1)

Explanation:

  • Use .fillna() when you want to impute missing values and maintain the data shape.
  • Use .dropna() to remove incomplete rows or columns if the missing data is not critical.
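
In practice you often want different treatment per column. The following sketch, on invented data, counts missing values first, fills each column with its own replacement by passing a dict to .fillna(), and drops rows only when a specific column is missing via the subset parameter:

```python
import numpy as np
import pandas as pd

# Invented data with gaps in both columns.
df = pd.DataFrame({
    "Category": ["A", "B", None, "A"],
    "Value": [1.0, np.nan, 3.0, np.nan],
})

# Count missing values per column before deciding what to do.
print(df.isna().sum())

# A dict maps each column to its own fill value.
filled = df.fillna({"Category": "Unknown", "Value": df["Value"].median()})
print(filled)

# Drop rows only where Value is missing; a missing Category is tolerated.
kept = df.dropna(subset=["Value"])
print(kept)
```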

Final Thoughts

Pandas provides a robust toolkit for data wrangling and transformation:

  • Importing data: Read data from various formats.
  • Inspecting data: Understand your dataset’s structure and statistics.
  • Transformations: Grouping, merging, reshaping data make it easier to analyze.
  • Handling missing data: Cleaning data by filling or dropping missing values ensures more robust analysis.

Python Tricky Questions

1. List Comprehensions

Syntax Refresher:
A list comprehension creates a new list by evaluating an expression for every item in an iterable, optionally including a condition.

# Syntax:
new_list = [expression for item in iterable if condition]

Example:

# Standard for-loop to create a list of even numbers:
even_numbers = []
for num in range(10):
    if num % 2 == 0:
        even_numbers.append(num)
print(even_numbers)  # Output: [0, 2, 4, 6, 8]

# Using list comprehension:
even_numbers_comp = [num for num in range(10) if num % 2 == 0]
print(even_numbers_comp)  # Output: [0, 2, 4, 6, 8]

Comparison:

  • Clarity: List comprehensions offer a concise syntax that makes the intent clear.
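  • Performance: They are often faster than an equivalent for-loop with .append(), because CPython builds the list with a dedicated bytecode instruction instead of looking up and calling append on every iteration. They are not more memory-efficient, though: the full list is still built in memory.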
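
Two points that tend to trip people up are where the if goes and how nested loops are ordered. An illustrative sketch:

```python
numbers = list(range(6))

# A trailing `if` filters items out of the result.
evens = [n for n in numbers if n % 2 == 0]
print(evens)    # Output: [0, 2, 4]

# A conditional expression before `for` transforms every item instead.
labels = ["even" if n % 2 == 0 else "odd" for n in numbers]
print(labels)   # Output: ['even', 'odd', 'even', 'odd', 'even', 'odd']

# Multiple `for` clauses read left to right, like nested for statements.
pairs = [(x, y) for x in range(2) for y in range(3) if x != y]
print(pairs)    # Output: [(0, 1), (0, 2), (1, 0), (1, 2)]
```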

2. Decorators

Decorators wrap functions (or methods) to extend or modify their behavior without editing the function's own code.

Concept:
A decorator is a callable that takes a function as an argument and returns a new function with added behavior.

Example: A Simple Logger Decorator

def logger(func):
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__} with args: {args} kwargs: {kwargs}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned: {result}")
        return result
    return wrapper

@logger
def add(a, b):
    return a + b

# Calling the decorated function:
add(3, 5)

Explanation:

  • The logger decorator wraps the add function.
  • When add is called, it first prints the input arguments, then calls the original function, prints the output, and finally returns the result.
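
Two follow-up questions come up often: how to pass arguments to a decorator, and why functools.wraps is recommended. The sketch below uses a hypothetical repeat decorator (not part of the original example). Without functools.wraps, a decorated function would report the wrapper's name and docstring instead of its own:

```python
import functools


def repeat(times):
    """Hypothetical decorator factory: call the function `times` times."""
    def decorator(func):
        @functools.wraps(func)  # copy __name__, __doc__, etc. onto wrapper
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator


@repeat(times=3)
def greet(name):
    """Return a greeting."""
    return f"Hi, {name}"


print(greet("Alice"))    # Output: ['Hi, Alice', 'Hi, Alice', 'Hi, Alice']
print(greet.__name__)    # Output: greet
print(greet.__doc__)     # Output: Return a greeting.
```

The extra nesting level exists because repeat(times=3) is called first and must return the actual decorator.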

3. Generators

Generators are a special type of iterator defined using functions and the yield statement.

Creating a Generator with yield:

def count_up_to(n):
    count = 1
    while count <= n:
        yield count
        count += 1

# Using the generator:
for number in count_up_to(5):
    print(number)  # Outputs: 1, 2, 3, 4, 5 (one by one)

Memory Usage Comparison:

  • Generators: Produce one value at a time, which makes them very memory-efficient especially when processing large datasets.
  • List Comprehensions: Create the entire list in memory at once, which can be less efficient for very large sequences.
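
The memory difference can be observed directly with sys.getsizeof(). This is a rough sketch: exact byte counts vary by Python version and platform, so only the comparison is shown. It also demonstrates that a generator can be consumed only once:

```python
import sys

# Same values, two containers: a full list versus a lazy generator.
squares_list = [n * n for n in range(100_000)]
squares_gen = (n * n for n in range(100_000))

list_size = sys.getsizeof(squares_list)   # grows with the number of items
gen_size = sys.getsizeof(squares_gen)     # small and roughly constant
print(list_size > gen_size)               # Output: True

# Consuming the generator produces the values one at a time.
total = sum(squares_gen)
print(total == sum(squares_list))         # Output: True

# A generator is exhausted after one pass.
leftover = list(squares_gen)
print(leftover)                           # Output: []
```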

4. Context Managers

Context managers allow you to allocate and release resources precisely, typically using the with statement.

Using with Statements:

with open("example.txt", "w") as file:
    file.write("Hello, World!")

Explanation:

  • The file is automatically closed when the block inside the with statement is exited, even if an error occurs.

Creating Your Own Context Manager:
To create a custom context manager, define __enter__ and __exit__ methods in a class.

class MyContextManager:
    def __enter__(self):
        print("Entering the context")
        # Return an object if needed
        return self
    
    def __exit__(self, exc_type, exc_value, traceback):
        print("Exiting the context")
        # Handle exceptions if necessary (return True to suppress)
        return False

# Using the custom context manager:
with MyContextManager() as manager:
    print("Inside the context")

Benefits:

  • Readability: The with statement makes it clear where a resource is acquired and released.
  • Maintainability: Encapsulates resource management, reducing boilerplate code and potential errors.
  • Reliability: Ensures resources like file handles or network connections are released even when an exception occurs, which prevents leaks and improves program stability.
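
For simple cases, the standard library's contextlib.contextmanager offers a shorter alternative to writing a class. The sketch below uses a hypothetical tracked() helper and an events list (both invented for illustration) to show that the cleanup code still runs when the block raises:

```python
from contextlib import contextmanager

events = []  # records the order in which things happen


@contextmanager
def tracked(name):
    events.append(f"enter {name}")       # plays the role of __enter__
    try:
        yield name                       # value bound by `as`
    finally:
        events.append(f"exit {name}")    # plays the role of __exit__


try:
    with tracked("job") as label:
        events.append(f"working on {label}")
        raise ValueError("boom")
except ValueError:
    events.append("error handled")

print(events)
# Output: ['enter job', 'working on job', 'exit job', 'error handled']
```

The try/finally around yield is what guarantees the exit step; without it, an exception inside the with block would skip the cleanup.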

Final Thoughts

  • List Comprehensions: Provide a concise and often faster alternative to for-loops for generating lists.
  • Decorators: Allow you to modify function behavior in a reusable and readable way.
  • Generators: Enable efficient iteration over large or infinite sequences by yielding one item at a time.
  • Context Managers: Simplify resource management with the with statement, ensuring clean setup and teardown.

Used appropriately, each of these features can make Python code shorter, easier to read, and in some cases faster or lighter on memory.
