12 - PANDAS

Study Guide for Python Programming: Unit on Data Analysis

Syntax summary for key Pandas functions:

loc
- Accesses a group of rows and columns by labels or a boolean array.
- Syntax: DataFrame.loc[<row_labels>, <column_labels>]
iloc
- Accesses a group of rows and columns by integer index positions.
- Syntax: DataFrame.iloc[<row_indices>, <column_indices>]
Series()
- Creates a Pandas Series from a list, array, or dictionary.
- Syntax: pandas.Series(data=None, index=None, dtype=None, name=None, ...)
DataFrame()
- Creates a DataFrame from a variety of input data structures like a 2D array, dictionary of arrays, Series, or another DataFrame.
- Syntax: pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
read_csv()
- Reads a CSV file into a DataFrame.
- Syntax: pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, ...)
Row / Column Filters
- Used to filter the rows or columns of a DataFrame based on some condition or criteria.
- Syntax for Row Filter: DataFrame[DataFrame['column_name'] <operator> <value>]
- Syntax for Column Filter: DataFrame[['column_name1', 'column_name2', ...]]
to_records()
- Converts DataFrame to a NumPy record array.
- Syntax: DataFrame.to_records(index=True, column_dtypes=None, index_dtypes=None)
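
The selection and conversion calls above can be tried on a small table. The following is a minimal sketch; the student names, grades, and the "year" column are hypothetical sample data, not part of the guide.

```python
import pandas as pd

# Hypothetical sample data, with student names as row labels.
df = pd.DataFrame(
    {"grades": [50, 90, 100, 45], "year": [1, 2, 2, 1]},
    index=["bob", "ken", "art", "joe"],
)

# loc: select by label (row label first, then column label).
ken_grade = df.loc["ken", "grades"]

# iloc: select by integer position (row 0, column 0).
first_cell = df.iloc[0, 0]

# Label slices with loc include both endpoints; iloc slices exclude the stop.
by_label = df.loc["bob":"art", ["grades"]]
by_position = df.iloc[0:2, :]

# Row filter: keep only the rows where the condition is True.
passing = df[df["grades"] >= 50]

# Column filter: a list of column names returns a DataFrame.
only_grades = df[["grades"]]

# to_records: convert to a NumPy record array (the index is included by default).
records = df.to_records()
print(records.dtype.names)
```

Note the difference between the two slices: the loc slice keeps "art" because label slices are inclusive, while the iloc slice stops before position 2.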

Data Analysis: The process of applying logical techniques to describe, condense, recap, and evaluate data, and to illustrate information.
Goals: To discover useful information, provide insights, suggest conclusions, and support decision-making.

Pandas Package: A Python package for data analysis, providing built-in data structures for manipulating and analyzing data sets.
Functionality: Allows fetching data from various sources and tabulating it for analysis.
Data Structures: Includes Series and DataFrame, which simplify data manipulation.
Documentation: Extensive documentation is available on the official Pandas site (https://pandas.pydata.org/docs/).

Series: A one-dimensional labeled array, much like a named Python list; comparable to a single dictionary entry whose value is a list.
- Example: {'grades': [50, 90, 100, 45]}
DataFrame: A dictionary of Series that share an index, representing a tabular data structure.
- Example: {'names': ['bob', 'ken', 'art', 'joe'], 'grades': [50, 90, 100, 45]}
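
As a rough illustration of the two structures, the sketch below builds a Series and a DataFrame from the same example values. The variable names are chosen here for illustration only.

```python
import pandas as pd

# A Series: one-dimensional values with a name and an index.
grades = pd.Series([50, 90, 100, 45], name="grades")

# A DataFrame built from a dictionary: each key becomes a column.
df = pd.DataFrame({
    "names": ["bob", "ken", "art", "joe"],
    "grades": [50, 90, 100, 45],
})

# Selecting a single column gives back a Series.
grades_column = df["grades"]
print(type(grades_column).__name__)  # Series
print(df.shape)                      # (4, 2)
```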

Creating a DataFrame: From a dictionary, CSV, JSON, HTML tables, or database queries.
Column Selection: Accessing specific columns in a DataFrame.
Row Selection: Using Boolean indexing to filter rows based on conditions.
Data Manipulation: Basic operations such as row/column filters, handling null values, head(), sample(), and value_counts().
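
The basic operations listed above might look like this in practice. This is a sketch on hypothetical data with one missing grade; isna(), dropna(), and fillna() are shown as common ways to handle null values, not as the only options.

```python
import pandas as pd

# Hypothetical data; the None entry becomes a null (NaN) value.
df = pd.DataFrame({
    "names": ["bob", "ken", "art", "joe", "amy"],
    "grades": [50, 90, None, 45, 90],
})

# head(n): the first n rows.
first_two = df.head(2)

# sample(n): n randomly chosen rows; random_state makes the draw repeatable.
one_random = df.sample(n=1, random_state=0)

# value_counts(): how often each non-null value occurs.
counts = df["grades"].value_counts()

# Handling null values: count them, drop them, or fill them.
missing = df["grades"].isna().sum()
dropped = df.dropna()
filled = df.fillna({"grades": 0})
```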

Reading Data: Pandas can read data from CSV, Excel, delimited files, HTML tables, JSON, and API outputs.
Data Exploration: Once data is loaded into a DataFrame, it’s easy to explore and manipulate.
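
A minimal sketch of reading and exploring data follows. To keep it self-contained, the CSV content is hypothetical and held in memory with io.StringIO; in practice a file path would normally be passed to read_csv() instead.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk.
csv_text = """names,grades
bob,50
ken,90
art,100
joe,45
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)      # column types inferred while parsing
print(df.describe())  # summary statistics for the numeric columns
```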

Which Pandas data structure is similar to a Python dictionary of Series?
- A. DataFrame
- B. Series
- C. List
- D. Tuple

Answer: A. DataFrame

Explanation: A DataFrame is a tabular data structure in Pandas, similar to a dictionary of Series.

Which method in Pandas is used to read a CSV file into a DataFrame?
- A. pd.read_csv()
- B. pd.read_excel()
- C. pd.read_json()
- D. pd.read_html()

Answer: A. pd.read_csv()

Explanation: The pd.read_csv() function is used in Pandas to read data from a CSV file into a DataFrame.

How do you select a column named ‘Age’ from a DataFrame df?
- A. df['Age']
- B. df.Age
- C. Both A and B
- D. df(Age)

Answer: C. Both A and B

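Explanation: In Pandas, a column can be selected using either df['Age'] or df.Age. The attribute form works only when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method.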
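
The sketch below compares the two selection forms. The "First Name" and "count" columns are hypothetical and were picked to show cases where only the bracket form reaches the column.

```python
import pandas as pd

# Hypothetical data chosen to show where attribute access falls short.
df = pd.DataFrame({
    "Age": [31, 25],
    "First Name": ["Ann", "Raj"],
    "count": [3, 7],
})

# For a simple column name, both forms return the same Series.
same = df["Age"].equals(df.Age)

# A name containing a space can only be reached with brackets.
first_names = df["First Name"]

# 'count' is also a DataFrame method, so df.count is the method, not the column.
count_column = df["count"]
attribute_is_method = callable(df.count)
```

Because the bracket form works for any column name, it is usually the safer habit.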

What is the output of df.head(3) where df is a Pandas DataFrame?
- A. The first 3 columns of df
- B. The first 3 rows of df
- C. The last 3 rows of df
- D. The header row of df

Answer: B. The first 3 rows of df

Explanation: The head() method in Pandas returns the first N rows of the DataFrame, with df.head(3) returning the first 3 rows.

In Pandas, how do you use Boolean indexing to filter rows in DataFrame df where the column ‘Sales’ is greater than 100?
- A. df[df['Sales'] > 100]
- B. df('Sales' > 100)
- C. df.get('Sales' > 100)
- D. df.query('Sales > 100')

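Answer: A. df[df['Sales'] > 100]

Explanation: This syntax is Boolean indexing in Pandas: the inner expression df['Sales'] > 100 produces a True/False value per row, and the outer brackets keep the rows marked True. Note that df.query('Sales > 100') in option D returns the same rows, but it uses a query string rather than Boolean indexing.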
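
To see the filter at work, here is a short sketch on hypothetical sales figures. The "Region" column and the combined condition are additions for illustration.

```python
import pandas as pd

# Hypothetical sales figures.
df = pd.DataFrame({
    "Region": ["N", "S", "E", "W"],
    "Sales": [80, 150, 100, 220],
})

# The comparison yields a Boolean Series: one True/False per row.
mask = df["Sales"] > 100

# Indexing with the mask keeps the rows where it is True.
high_sales = df[mask]

# Conditions combine with & (and) and | (or); each side needs parentheses.
north_or_high = df[(df["Region"] == "N") | (df["Sales"] > 200)]

# query() expresses the same row filter as a string.
via_query = df.query("Sales > 100")
```

The row with Sales equal to 100 is excluded because the comparison is strictly greater than.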