Python Data Types and Structures

 

 

Programming languages store and process data of various types in variables, and each value is defined by its type. A data type dictates how the data is represented and what operations can be performed on it. One way to convert a value to a particular data type is casting, which is achieved by calling the appropriate constructor, such as int(), float(), str(), or bool().

 

If a variable's data type is unknown, Python offers the following functionality to return the variable's type:

type(<variable>)
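As a minimal sketch of type() and casting, the snippet below converts a few values with their constructors and inspects the results. The variable names are illustrative only.

```python
# Inspect types with type() and convert values with constructors.
age_text = "42"              # a string that happens to contain digits
age_number = int(age_text)   # cast the string to an integer
price = float("19.99")       # cast a string to a float
label = str(7)               # cast an integer to a string

print(type(age_text))        # <class 'str'>
print(type(age_number))      # <class 'int'>
print(type(price))           # <class 'float'>
print(type(label))           # <class 'str'>
```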

 

Python's built-in data types are broken out into five categories: Numeric, Boolean, Sequence, Mapping, and Set. Generally, data types define the kind of data stored in a variable, and data structures define how the data is organized and stored. In other words, data structures are collections of values, each of which has its own data type.

 

These types are detailed below, along with some examples and useful Python methods that facilitate processing data of various types and structures. Practice implementing various data types through these exercises.

 


 

 

None

·       None: null value or null object, represents the absence of an object

o   NOT the same as False, 0, or an empty variable

o   Used as a default value when a true value does not exist or is yet to be defined

o   None is the only value of the class NoneType

§  Example: None
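The points above can be sketched as follows. The greet function and its default argument are hypothetical, chosen only to show None used as a placeholder default.

```python
# None marks "no value yet"; it is distinct from False, 0, and "".
result = None

print(type(result))      # <class 'NoneType'>
print(result is None)    # True (the usual way to test for None)
print(result == False)   # False
print(result == 0)       # False
print(result == "")      # False

# A common pattern: use None as a default, then fill in a real value.
def greet(name=None):
    if name is None:
        name = "stranger"
    return "Hello, " + name

print(greet())           # Hello, stranger
print(greet("Ada"))      # Hello, Ada
```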



Numeric

·       Integer: Numeric, Whole

 

 

·       Float: Numeric, Decimal
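A short sketch of the two numeric types, assuming the int() and float() constructors are used for casting. The variable names are illustrative.

```python
whole = 7        # int: a whole number
decimal = 7.5    # float: a number with a decimal point

print(type(whole))      # <class 'int'>
print(type(decimal))    # <class 'float'>

print(int(7.9))         # 7 (int() truncates toward zero; it does not round)
print(float(7))         # 7.0
print(whole + decimal)  # 14.5 (mixing int and float yields a float)
```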





Boolean

·       Boolean: Binary, False/True, 0/1

o   True or False values, which are equal to 1 and 0, respectively. This is particularly useful in conditional checks, where populated (non-empty, non-zero) variables or data structures evaluate to True, and empty or zero values evaluate to False.

§  Constructor: bool(<value>)

§  Example: True

§  Example: 1

§  Example: False

§  Example: 0

 

o   True is equal to 1, and False is equal to 0 (True == 1 and False == 0)

o   bool("") yields False

o   bool("<char>") yields True
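The Boolean rules above can be checked directly. This is a small sketch using literal values only.

```python
# Empty or zero values are False; populated values are True.
print(bool(""))      # False
print(bool("a"))     # True
print(bool(0))       # False
print(bool(1))       # True
print(bool([]))      # False (empty list)
print(bool([0]))     # True (the list holds one item, so it is populated)
print(bool(None))    # False

# True and False are equal to 1 and 0.
print(True == 1)     # True
print(False == 0)    # True
print(True + True)   # 2
```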



Sequence

·       String: Text, sequence of characters

 

 

 

String Methods

<upper_cased_string_value> = <string_value>.upper()

Returns <upper_cased_string_value> with all <string_value> characters upper-cased

<lower_cased_string_value> = <string_value>.lower()

Returns <lower_cased_string_value> with all <string_value> characters lower-cased

<title_string_value> = <string_value>.title()

Returns <title_string_value> with the first character of each word in <string_value> capitalized

<stripped_string_value> = <string_value>.strip(<chars>)

Returns <stripped_string_value> with leading and trailing <chars> removed from <string_value>. If <chars> is omitted, leading and trailing whitespace is removed, including spaces, \t, and \n

<split_string_value> = <string_value>.split(<delimiter>)

Returns <split_string_value>, a list of the substrings of <string_value> split at <delimiter>. If no delimiter is given, the string is split at runs of whitespace

<replaced_string_value> = <string_value>.replace(<current_substring>, <new_substring>, <count>)

Returns <replaced_string_value>, a copy of <string_value> with instances of <current_substring> replaced with <new_substring>. <count> is optional and limits the replacement to the first <count> instances; if omitted, all instances are replaced
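A brief sketch of the string methods in use. The sample text is made up for illustration; note that every method returns a new value and leaves the original string unchanged.

```python
raw = "  hello, wide world\n"

clean = raw.strip()             # whitespace (spaces, \t, \n) removed from both ends
print(clean)                    # hello, wide world
print(clean.upper())            # HELLO, WIDE WORLD
print(clean.lower())            # hello, wide world
print(clean.title())            # Hello, Wide World

print(clean.split())            # ['hello,', 'wide', 'world']
print(clean.split(", "))        # ['hello', 'wide world']

print("xxhixx".strip("x"))      # hi
print("a-b-c".replace("-", "+"))     # a+b+c
print("a-b-c".replace("-", "+", 1))  # a+b-c (third argument limits the count)
```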

 

 

Strings can also contain escape characters, which represent characters that would otherwise be illegal or hard to type in a string. For example, a double quote is illegal inside a string delimited by double quotes, because Python would read it as the end of the string. A user can still insert double quotes in such a string by preceding each one with a backslash:

"My message to this planet we call \"Earth\" is Hello!"

 

Escape characters can also define how a string is spaced and distributed. These escape characters are placed directly in the text where the user wants a particular spacing or distribution to occur. Some escape characters include:

String Escape Characters

\n

new line

\t

tab

\\

backslash

\'

single quote, apostrophe

\"

double quote
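The escape characters listed above can be tried out as in this sketch. The sample strings are illustrative.

```python
two_lines = "Line one\nLine two"   # \n starts a new line
tabbed = "Name:\tAda"              # \t inserts a tab
path = "C:\\temp"                  # \\ produces a single backslash
single = 'It\'s here'              # \' inside a single-quoted string
double = "She said \"hi\""         # \" inside a double-quoted string

for text in (two_lines, tabbed, path, single, double):
    print(text)

# Each escape sequence counts as one character.
print(len("\n"))   # 1
print(len(path))   # 7
```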

 

 

String formatting inserts the values of variables into a string. These variables may be defined programmatically or as user inputs, and can be represented in a string with placeholders such as curly braces, {}. Examples of how this is achieved are shown below:

 

**A user is prompted to input the <current_month> and <current_day> as string values:

 

%s is the placeholder for the variable in the string: % marks the placeholder and 's' indicates that the value is formatted as a string. In this instance, there are multiple inputs, so the order in which the variables are listed determines the order in which they appear in the string

'Today\'s date is %s/%s' % (<current_month>, <current_day>)

 

 

Empty placeholders, {}, are populated sequentially, in the order the arguments are passed to format()

'Today\'s date is {}/{}'.format(<current_month>, <current_day>)

 

 

Placeholders containing integers are populated by the format() argument at that index

'Today\'s date is {0}/{1}'.format(<current_month>, <current_day>)

OR

'Today\'s date is {1}/{0}'.format(<current_day>, <current_month>)

 

 

Placeholders containing names are populated by the format() keyword argument of the same name

'Today\'s date is {month}/{day}'.format(month=<current_month>, day=<current_day>)

 


The 'f' prefix creates a formatted string literal (f-string), which evaluates the expressions inside the curly braces directly (introduced in Python 3.6)

f'Today\'s date is {<current_month>}/{<current_day>}'
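The formatting styles above all produce the same text. In this sketch, hard-coded values stand in for the user input (an assumption made so that the example runs without prompting).

```python
# Stand-ins for values a user might type in.
current_month = "07"
current_day = "04"

percent_style = 'Today\'s date is %s/%s' % (current_month, current_day)
empty_braces = 'Today\'s date is {}/{}'.format(current_month, current_day)
indexed = 'Today\'s date is {1}/{0}'.format(current_day, current_month)
named = 'Today\'s date is {month}/{day}'.format(month=current_month, day=current_day)
f_string = f"Today's date is {current_month}/{current_day}"

print(percent_style)   # Today's date is 07/04
print(percent_style == empty_braces == indexed == named == f_string)   # True
```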



If a float is being inserted into a string, it can follow the methods above or be further formatted using the methods below:

 

** A user is prompted to input a price:

 

The price is inserted into a string and formatted to show 2 decimal places

'The price is {:.2f}'.format(price)



The price is inserted into a string literal and formatted to show 3 decimal places

f'The price is {price:.3f}'
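A quick sketch of the float format specifiers, with a hard-coded price standing in for the user input (an assumption for the sake of a runnable example).

```python
price = 3.14159   # stand-in for a value entered by the user

two_places = 'The price is {:.2f}'.format(price)
three_places = f'The price is {price:.3f}'

print(two_places)      # The price is 3.14
print(three_places)    # The price is 3.142
print(f'{2.5:.2f}')    # 2.50 (padded with a trailing zero)
```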

 



Data Structures

·       Data structures are containers within which data can be stored. Data within data structures can be standalone data of the types defined above, or other data structures through a process known as nesting

 

·       Terminology

o   Ordered: Container retains the order of the values

o   Mutable: Container values can be changed

o   Heterogeneous: Container can hold more than one data type

o   Duplicates: Container allows repeated values

 

·       The four data structures to know are Lists, Tuples, Dictionaries, and Sets.

 

·       List: An ordered sequence structure for storing a collection of items in a single variable

 

o   It is one of the most versatile and frequently used data types in the language

o   Syntax: [<val_0>, <val_1>]

o   Properties:

§  Ordered: YES

§  Mutable: YES

§  Heterogeneous: YES

§  Duplicates: YES

§  Indexable: YES (integer)

List Methods

list()

Constructor

<list_obj_length> = len(<list_obj>)

Get length of <list_obj>, which is the number of objects in <list_obj>

<value_count> = <list_obj>.count(<value>)

Count how many times a specific <value> appears in <list_obj>

<value> = <list_obj>[<index>]

Index <list_obj> to access the value in the indicated <index> position

<index> = <list_obj>.index(<value>)

Get the index of a <value> that exists in <list_obj>

<list_obj>.insert(<index>, <value>)

Inserts <value> into the indicated index in <list_obj>, in-place

<list_obj>.remove(<value>)

Removes one (first as it appears) instance of <value> from <list_obj>, in-place

<list_obj>.sort(reverse=<boolean>)

Sorts <list_obj> values, in-place

reverse=False (default): ascending  |  reverse=True: descending

<new_list_obj> = sorted(<list_obj>, reverse=<boolean>)

Creates <new_list_obj> of <list_obj> values sorted

reverse=False (default): ascending  |  reverse=True: descending

<popped_val> = <list_obj>.pop(<index>)

Removes the value in the indicated <index> in <list_obj>, returns the value as <popped_val>, and <list_obj> now exists without the value

<list_obj>.append(<value>)

Appends <value> to the end of <list_obj>, in-place

<list_obj>.extend(<other_list>)

Appends all values in <other_list> to the end of <list_obj>, in-place

<new_list_obj> = <list_obj_1> + <list_obj_2>

Combine <list_obj_1> and <list_obj_2> into one <new_list_obj>
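The list methods can be sketched on a small list of numbers. The variable names are illustrative, and the comments track the list contents after each step.

```python
scores = [70, 95, 82]

scores.append(82)          # [70, 95, 82, 82]
scores.insert(0, 60)       # [60, 70, 95, 82, 82]
print(len(scores))         # 5
print(scores.count(82))    # 2
print(scores[2])           # 95
print(scores.index(95))    # 2

scores.remove(82)          # removes the first 82: [60, 70, 95, 82]
last = scores.pop()        # pop() with no index removes the last value: 82
scores.extend([88, 91])    # [60, 70, 95, 88, 91]

ordered = sorted(scores)   # new list, ascending: [60, 70, 88, 91, 95]
scores.sort(reverse=True)  # in-place, descending: [95, 91, 88, 70, 60]
combined = scores + [1, 2] # new list: [95, 91, 88, 70, 60, 1, 2]
print(ordered, scores, combined)
```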

 

 

·       Tuple: An ordered sequence structure for storing a collection of items in a single variable

 

o   Syntax: (<val_0>, <val_1>)

o   Properties:

§  Ordered: YES

§  Mutable: NO

§  Heterogeneous: YES

§  Duplicates: YES

§  Indexable: YES (integer)

 

Tuple Methods

tuple()

Constructor

<tuple_object_length> = len(<tuple_obj>)

Get <tuple_object_length> of <tuple_obj>, which is the number of values in <tuple_obj>

<value_count> = <tuple_obj>.count(<value>)

Count how many times a specific <value> appears in <tuple_obj>

<value> = <tuple_obj>[index]

Index <tuple_obj> to get <value> in the indicated <index> position

<index> = <tuple_obj>.index(<value>)

Get the index of a <value> that exists in <tuple_obj>

<new_tuple_obj> = <tuple_obj_1> + <tuple_obj_2>

Combine <tuple_obj_1> and <tuple_obj_2> into one <new_tuple_obj>

A single-value tuple requires a trailing comma to be recognized as a tuple, for example (5,)
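A minimal sketch of the tuple methods, the single-value tuple rule, and immutability. The names and values are illustrative.

```python
point = (3, 4, 3)

print(len(point))        # 3
print(point.count(3))    # 2
print(point[1])          # 4
print(point.index(4))    # 1

single = (5,)            # the trailing comma makes this a tuple
not_a_tuple = (5)        # without the comma this is just the integer 5
print(type(single))      # <class 'tuple'>
print(type(not_a_tuple)) # <class 'int'>

# Tuples are immutable, so item assignment raises a TypeError.
try:
    point[0] = 9
except TypeError as error:
    print("Cannot modify a tuple:", error)
```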

 

Values cannot be appended to a tuple in-place because tuples are immutable. Instead, a new tuple is created by concatenation. Tuples can only be concatenated with other tuples, so the following process satisfies this requirement:

 

1.       Identify the primary tuple

primary_tuple=(1, 2, 3)

 

 

2.       Identify the other tuple within which values to append are contained

other_tuple=('a', 'b', 'c')

 

 

3.       Perform an addition of the primary tuple and other tuple, and assign to a variable

concat_tuple=primary_tuple + other_tuple

 

 

4.       The new tuple contains the primary tuple values and the other tuple values

concat_tuple=(1, 2, 3, 'a', 'b', 'c')

 


Concatenation is the standard way to add values to an existing tuple, and it produces a new tuple rather than modifying the original. If an individual value is being added, it must first be wrapped in a single-value tuple, thereby satisfying the tuples-only requirement:

 

primary_tuple=(1, 2, 3)

concat_tuple=primary_tuple + (4, )

concat_tuple=(1, 2, 3, 4)

 

OR

 

primary_tuple=(1, 2, 3)

concat_tuple=primary_tuple + ('a', )

concat_tuple=(1, 2, 3, 'a')
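The concatenation steps can be run as in the sketch below, which also shows that the original tuple is left unchanged and that adding a bare value fails. The name with_single is an illustrative addition.

```python
primary_tuple = (1, 2, 3)
other_tuple = ('a', 'b', 'c')

# Concatenation builds a new tuple from both operands.
concat_tuple = primary_tuple + other_tuple
print(concat_tuple)       # (1, 2, 3, 'a', 'b', 'c')

# A single value must be wrapped in a one-item tuple first.
with_single = primary_tuple + (4,)
print(with_single)        # (1, 2, 3, 4)

# The original tuple is unchanged.
print(primary_tuple)      # (1, 2, 3)

# Adding a bare value instead of a tuple raises a TypeError.
try:
    primary_tuple + 4
except TypeError as error:
    print("Concatenation failed:", error)
```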

 

 

Note that the parentheses when defining a tuple are optional, as Python recognizes when a tuple is being created:

 

x = 5, 11

 

is the same as

 

x = (5, 11)

 

is NOT the same as (tuples are ordered, so swapping the values creates a different tuple)

 

x = (11, 5)

 

 

The parentheses indicate how the values are grouped in the collection. When the tuple is part of another collection, the parentheses are necessary to define the structure. For example:

 

y = [5, 11]

 

is NOT the same as

 

z = [(5, 11)]

 

is NOT the same as

 

a, b = 5, 11

 

 

In the examples above, variable 'y' is a list of two values, whereas variable 'z' is a list containing a single tuple. The variables 'a' and 'b' are assigned integers 5 and 11, respectively, through what is known as unpacking (also called destructuring).
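The comparisons above can be verified with a few checks. This is a small sketch reusing the same variable names.

```python
x = 5, 11                 # parentheses are optional here
print(x == (5, 11))       # True
print(x == (11, 5))       # False (order matters in a tuple)

y = [5, 11]               # a list holding two integers
z = [(5, 11)]             # a list holding one tuple
print(len(y), len(z))     # 2 1
print(y == z)             # False

a, b = 5, 11              # unpacking: a gets 5, b gets 11
print(a, b)               # 5 11
```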



Mapping

·        Dictionary: Also known as a hashmap, a mapping data structure that stores data in key-value pairs

o   Efficiently retrieves stored data by mapping a key (think of this as an address) to its corresponding value

§  This is significantly more efficient than sequentially iterating through a data structure index by index

 

o   Syntax: {<key>: <value>}

o   Properties:

§  Ordered: YES in Python 3.7+ (keys retain the order in which they were inserted)

§  Mutable: YES (key-value pairs can be added, changed, or removed; the keys themselves must be immutable types)

§  Heterogeneous: YES (keys can be different data types)

§  Duplicates: NO (keys must be unique, values can be repeated)

§  Indexable: YES (keys)

 

Dictionary Methods

dict()

Constructor

<all_dict_keys> = <dict_obj>.keys()

Returns a view of all <dict_obj> keys, which can be converted to a list with list()

<all_dict_keys> = list(<dict_obj>)

Returns a list of all <dict_obj> keys

<all_dict_values> = <dict_obj>.values()

Returns a view of all <dict_obj> values

<key_value_pairs> = <dict_obj>.items()

Returns a view of <dict_obj> key-value pairs as (key, value) tuples

<value> = <dict_obj>[<key>]

Returns <value> from <dict_obj> at the <key>, raising a KeyError if <key> does not exist

<value> = <dict_obj>.get(<key>)

Returns <value> from <dict_obj> at the <key>, or None if <key> does not exist

<popped_val> = <dict_obj>.pop(<key>)

Removes the value at <key> in <dict_obj>, returns the value as <popped_val>, and <dict_obj> now exists without the key-value pair

<dict_obj>.update({<key>: <value>})

Inserts <value> at <key> in <dict_obj>, or overwrites the existing value if <key> already exists, in-place

<dict_obj>.clear()

Removes all elements from <dict_obj>, in-place
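The dictionary methods can be sketched on a small, made-up inventory. The keys and values are illustrative only.

```python
inventory = {"apples": 3, "pears": 5}

print(list(inventory.keys()))     # ['apples', 'pears']
print(list(inventory))            # ['apples', 'pears']
print(list(inventory.values()))   # [3, 5]
print(list(inventory.items()))    # [('apples', 3), ('pears', 5)]

print(inventory["apples"])        # 3
print(inventory.get("kiwis"))     # None (no error for a missing key)

# update() adds new keys and overwrites existing ones.
inventory.update({"kiwis": 2, "apples": 4})
removed = inventory.pop("pears")  # 5
snapshot = dict(inventory)        # copy taken before clearing
print(snapshot)                   # {'apples': 4, 'kiwis': 2}

inventory.clear()
print(inventory)                  # {}
```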

 

·        Keys and Values don’t have defined data types, so the developer gets to define them in the code design

o   Keys can be any immutable (hashable) data type (str, int, float, bool, None, or a tuple of immutable values)

§  Keys are unique

§  Keys are immutable

o   Values can be any data type (str, int, float, bool, None) OR data structure (list, tuple, set, dict)

§  There is no limit to how many data structures you nest, but indexing can become complex when you need to access the dictionary values
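A short sketch of nesting and key rules, using a hypothetical student record. Each extra level of nesting adds another index or key to the lookup.

```python
student = {
    "name": "Ada",
    "grades": [90, 85],                          # list as a value
    "contact": {"email": "ada@example.com"},     # dict as a value
}

print(student["grades"][0])          # 90
print(student["contact"]["email"])   # ada@example.com

# Keys of different types can live in the same dictionary.
mixed_keys = {"text": 1, 2: "two", (0, 0): "origin", None: "nothing"}
print(mixed_keys[(0, 0)])            # origin

# A mutable object such as a list cannot be used as a key.
try:
    bad = {[1, 2]: "list key"}
except TypeError as error:
    print("Invalid key:", error)
```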

 

Set

·       Set: An unordered collection of unique elements. The structure is designed to test membership and perform mathematical set operations, like those pictured in a Venn diagram

 

o   Syntax: {<val_0>, <val_1>}

o   Properties:

§  Ordered: NO

§  Mutable: YES

§  Heterogeneous: YES

§  Duplicates: NO

§  Indexable: NO (unordered structure cannot be indexed)

 

Set Methods

set()

Constructor

<set_obj>.add(<value>)

Adds <value> to <set_obj>, in-place

<set_obj>.update(<other_set_obj>)

Inserts values from <other_set_obj> into <set_obj>, in-place

<set_obj>.remove(<value>)

Removes <value> from <set_obj>, raising an error if <value> does not exist, in-place

<set_obj>.discard(<value>)

Removes <value> from <set_obj>, NOT raising an error if <value> does not exist, in-place

<popped_val> = <set_obj>.pop()

Removes an arbitrary value from <set_obj>, returns the value as <popped_val>, and <set_obj> now exists without the value

<set_obj>.clear()

Removes all values from <set_obj>, in-place
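The set methods can be sketched as follows. The color names are illustrative, and because sets are unordered, the example checks sizes and membership rather than printing the set contents in a fixed order.

```python
colors = {"red", "green"}

colors.add("blue")
colors.add("red")                  # already present, so nothing changes
print(len(colors))                 # 3
print("red" in colors)             # True

colors.update({"cyan", "green"})   # only "cyan" is new
print(len(colors))                 # 4

colors.discard("pink")             # missing value, no error
try:
    colors.remove("pink")          # missing value, raises KeyError
except KeyError:
    print("remove() raised KeyError for a missing value")

popped = colors.pop()              # removes and returns an arbitrary value
print(popped in colors)            # False
remaining = len(colors)            # 3

colors.clear()
print(colors)                      # set()
```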

 

 

Set Value Membership Methods

 

 

<new_set_obj> = <set_obj>.union(<other_set_obj>)

 

 

 

Returns <new_set_obj> with ALL values from <set_obj> and <other_set_obj>


<set_obj>.update(<other_set_obj>)

<set_obj> |= <other_set>

Updates <set_obj> in-place, keeping ALL values existing in <set_obj> AND <other_set_obj>

 

 

<new_set_obj> = <set_obj>.difference(<other_set>)

 

 

 

Returns <new_set_obj> with values that exist in <set_obj> but not in <other_set_obj>


<set_obj>.difference_update(<other_set>)

<set_obj> -= <other_set>

Updates <set_obj> in-place, keeping only values existing in <set_obj> BUT NOT in <other_set_obj>

 

 

<new_set_obj> = <set_obj>.symmetric_difference(<other_set>)

 

 

 

Returns <new_set_obj> with values that exist in <set_obj> OR <other_set_obj>, but not in both

<set_obj>.symmetric_difference_update(<other_set>)

<set_obj> ^= <other_set>

Updates <set_obj> in-place, keeping only values existing in <set_obj> OR <other_set_obj>, not both

 

 

<new_set_obj> = <set_obj>.intersection(<other_set>)

 

 

 

Returns <new_set_obj> with only values that exist in <set_obj> AND <other_set_obj>

<set_obj>.intersection_update(<other_set>)

<set_obj> &= <other_set>

Updates <set_obj> in-place, keeping only values existing in BOTH <set_obj> AND <other_set_obj>
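The four set operations and their in-place forms can be compared side by side. This sketch uses two small integer sets and wraps results in sorted() so the printed order is predictable.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(sorted(a.union(b)))                  # [1, 2, 3, 4, 5]
print(sorted(a.difference(b)))             # [1, 2]
print(sorted(a.symmetric_difference(b)))   # [1, 2, 5]
print(sorted(a.intersection(b)))           # [3, 4]

# The in-place operators modify the set on the left.
merged = set(a)
merged |= b             # same as merged.update(b)
only_a = set(a)
only_a -= b             # same as only_a.difference_update(b)
either = set(a)
either ^= b             # same as either.symmetric_difference_update(b)
both = set(a)
both &= b               # same as both.intersection_update(b)

print(sorted(merged), sorted(only_a), sorted(either), sorted(both))
```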

 

 

NOTE: 1 and True, and 0 and False are considered the same value in sets, so the Boolean value and equivalent numeric value cannot exist in the same set
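The note above can be confirmed with a short check. This is a minimal sketch using literal values.

```python
mixed = {1, True, 0, False}

# 1 and True collapse into one element, as do 0 and False.
print(len(mixed))        # 2
print(mixed == {0, 1})   # True

# The values compare equal and hash identically, so a set treats them as one.
print(True == 1 and hash(True) == hash(1))     # True
print(False == 0 and hash(False) == hash(0))   # True
```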

 

Operators

 

Assignment Operator

=

assignment operator, assigns a value to a variable

 

 

Conditional Statements

Code will step through the comparison block from top to bottom and exit the block once a condition is met, even if the conditions later in the block have not been checked yet.  The order in which a comparison block is built matters!

 

if <conditional_statement>:

              <action to execute>

elif <conditional_statement>:

              <action to execute>

elif <conditional_statement>:

              <action to execute>

else:

              <action to execute for all other conditions not captured above>

 

if

primary comparison (only one per comparison block)

elif

alternative comparison (can be multiple elif statements in a comparison block)

else

all other comparisons (only one per comparison block)
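To illustrate why the order of a comparison block matters, the sketch below defines two hypothetical grading functions that differ only in the order of their conditions.

```python
def grade(score):
    # Conditions run from most to least restrictive.
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    else:
        return "F"

def grade_wrong_order(score):
    # The first condition already matches every score of 70 or more,
    # so the later branches for "A" and "B" can never run.
    if score >= 70:
        return "C"
    elif score >= 80:
        return "B"
    elif score >= 90:
        return "A"
    else:
        return "F"

print(grade(95))              # A
print(grade_wrong_order(95))  # C
print(grade(50))              # F
```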

 

 

Comparison Operators

==

determine equality

!=

determine inequality

> 

determine greater than

>=

determine greater than or equal to

< 

determine less than

<=

determine less than or equal to
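Each comparison operator returns a Boolean, as in this brief sketch with literal values.

```python
print(3 == 3.0)            # True (values are equal even though types differ)
print(3 != 4)              # True
print(5 > 5)               # False
print(5 >= 5)              # True
print(2 < 10)              # True
print(10 <= 2)             # False
print("apple" < "banana")  # True (strings compare character by character)
```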

 

 

Identity Operator

is

determines whether two names refer to the same object (identity), rather than whether their values are equal
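The difference between `is` and `==` shows up with two lists that hold equal values. The variable names in this sketch are illustrative.

```python
first = [1, 2, 3]
second = [1, 2, 3]   # equal contents, but a separate list object
alias = first        # a second name for the same list object

print(first == second)   # True (same values)
print(first is second)   # False (different objects)
print(first is alias)    # True (same object)

# "is" is the usual way to test for None.
value = None
print(value is None)     # True
```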

 

 

Logical Operators

and

combines conditions, returns True if both conditions are met

or

combines conditions, returns True if at least one condition is met

not

negates a Boolean value or condition
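A small sketch of the logical operators, using two made-up variables.

```python
age = 25
has_ticket = False

print(age >= 18 and has_ticket)   # False (only one condition is met)
print(age >= 18 or has_ticket)    # True (at least one condition is met)
print(not has_ticket)             # True
print(not (age >= 18))            # False
```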

 

 

Math Operators

+

Addition

-

Subtraction

*

Multiplication

**

Exponent

/

Division

//

floor division (performs division and rounds down to the nearest integer)

%

modulus (returns the remainder of a division operation)
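The math operators can be tried on small integers, as in this sketch. The negative example shows that floor division rounds down, not toward zero.

```python
print(7 + 2)     # 9
print(7 - 2)     # 5
print(7 * 2)     # 14
print(7 ** 2)    # 49
print(7 / 2)     # 3.5 (division always returns a float)
print(7 // 2)    # 3
print(7 % 2)     # 1
print(-7 // 2)   # -4 (rounded down, away from zero)
```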

 

 

 

Helpful Resources