The Python Tutorial
1. Whetting Your Appetite
The Python interpreter is easily extended with new functions and data types implemented in C or C++ (or other languages callable from C).
Python allows you to split your program into modules that can be reused in other Python programs. It comes with a large collection of standard modules that you can use as the basis of your programs — or as examples to start learning to program in Python. Some of these modules provide things like file I/O, system calls, sockets, and even interfaces to graphical user interface toolkits like Tk.
The interpreter can be used interactively, which makes it easy to experiment with features of the language, to write throw-away programs, or to test functions during bottom-up program development.
Python is extensible: if you know how to program in C it is easy to add a new built-in function or module to the interpreter, either to perform critical operations at maximum speed, or to link Python programs to libraries that may only be available in binary form (such as a vendor-specific graphics library). Once you are really hooked, you can link the Python interpreter into an application written in C and use it as an extension or command language for that application.
2. Using the Python Interpreter
2.1. Invoking the Interpreter
Typing an end-of-file character (Control-D on Unix, Control-Z on Windows) at the primary prompt causes the interpreter to exit with a zero exit status.
A second way of starting the interpreter is python -c command [arg] ..., which executes the statement(s) in command, analogous to the shell’s -c option.
When a script file is used, it is sometimes useful to be able to run the script and enter interactive mode afterwards. This can be done by passing -i before the script.
2.1.1. Argument Passing
When known to the interpreter, the script name and additional arguments thereafter are turned into a list of strings and assigned to the argv variable in the sys module. You can access this list by executing import sys. The length of the list is at least one; when no script and no arguments are given, sys.argv[0] is an empty string. When the script name is given as ‘-’ (meaning standard input), sys.argv[0] is set to ‘-’. When -c command is used, sys.argv[0] is set to ‘-c’. When -m module is used, sys.argv[0] is set to the full name of the located module. Options found after -c command or -m module are not consumed by the Python interpreter’s option processing but left in sys.argv for the command or module to handle.
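A quick way to see how sys.argv is filled in is to launch a child interpreter with -c and print the list. This is a minimal sketch: it uses subprocess with sys.executable (the interpreter currently running), and the extra arguments "spam" and "eggs" are arbitrary placeholders.

```python
import subprocess
import sys

# Run a one-line program with -c; the words after the command are left in sys.argv
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv)", "spam", "eggs"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # ['-c', 'spam', 'eggs']
```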
2.1.2. Interactive Mode
When commands are read from a terminal, the interpreter is said to be in interactive mode. In this mode it prompts for the next command with the primary prompt, usually three greater-than signs (>>>); for continuation lines it prompts with the secondary prompt, by default three dots (...).
2.2. The Interpreter and Its Environment
2.2.1. Source Code Encoding
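By default, Python source files are treated as encoded in UTF-8. To declare an encoding other than the default one, add a special comment line as the first line of the file, of the form # -*- coding: encoding -*- (for example, # -*- coding: cp1252 -*-).
One exception to the first-line rule is when the source code starts with a UNIX “shebang” line (such as #!/usr/bin/env python3). In this case, the encoding declaration should be added as the second line of the file.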
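The encoding rules above can be checked with the standard tokenize.detect_encoding() function, which looks at no more than the first two lines of a source file for an encoding declaration. This is a minimal sketch. The byte strings are made-up sample sources, and the check is not the path the interpreter itself takes when it runs a file.

```python
import io
import tokenize

# Shebang on line 1, encoding declaration on line 2
with_cookie = b"#!/usr/bin/env python3\n# -*- coding: cp1252 -*-\nprint('hi')\n"
# No declaration: the UTF-8 default applies
without_cookie = b"print('hi')\n"

enc1, _ = tokenize.detect_encoding(io.BytesIO(with_cookie).readline)
enc2, _ = tokenize.detect_encoding(io.BytesIO(without_cookie).readline)
print(enc1, enc2)  # cp1252 utf-8
```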
3. An Informal Introduction to Python
Note that a secondary prompt on a line by itself in an example means you must type a blank line; this is used to end a multi-line command.
Comments in Python start with the hash character, #, and extend to the end of the physical line.
3.1. Using Python as a Calculator
3.1.1. Numbers
In addition to int and float, Python supports other types of numbers, such as Decimal and Fraction. Python also has built-in support for complex numbers, and uses the j or J suffix to indicate the imaginary part (e.g. 3+5j).
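The sketch below contrasts the numeric types mentioned above. It uses only the standard decimal and fractions modules and the built-in complex type. The values are arbitrary examples.

```python
from decimal import Decimal
from fractions import Fraction

# Binary floats cannot represent 0.1 exactly, so small rounding errors appear
print(0.1 + 0.2 == 0.3)                      # False
# Decimal does exact decimal arithmetic when built from strings
exact = Decimal("0.1") + Decimal("0.2")
print(exact == Decimal("0.3"))               # True
# Fraction keeps exact rational values
half = Fraction(1, 3) + Fraction(1, 6)
print(half)                                  # 1/2
# Complex numbers use the j suffix for the imaginary part
z = 3 + 5j
print(z.real, z.imag, abs(3 + 4j))           # 3.0 5.0 5.0
```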
3.1.2. Strings
Python strings can be enclosed in single quotes ('...') or double quotes ("...") with the same result. Unlike some other languages, special characters such as \n have the same meaning with both single ('...') and double ("...") quotes.
If you don’t want characters prefaced by \ to be interpreted as special characters, you can use raw strings by adding an r before the first quote:
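A short illustration of raw strings. The Windows-style path is only an example value.

```python
normal = 'C:\\some\\name'       # in a normal string, each backslash must be doubled
raw = r'C:\some\name'           # in a raw string, backslashes are kept literally
print(normal == raw)            # True
print(len('\n'), len(r'\n'))    # 1 2  (one newline vs. backslash + 'n')
print(raw)                      # C:\some\name
```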
String literals can span multiple lines. One way is using triple-quotes: """...""" or '''...'''. End-of-line characters are automatically included in the string, but it’s possible to prevent this by adding a \ at the end of the line.
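Here is a triple-quoted string in which the backslash right after the opening quotes suppresses the first newline. The usage text is an invented example.

```python
usage = """\
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
"""
print(usage)
print(usage.startswith("Usage"))   # True: no leading newline
print(usage.count("\n"))           # 3: one per remaining line end
```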
Strings can be concatenated (glued together) with the + operator, and repeated with *:
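A minimal example of + and * on strings:

```python
word = 3 * 'un' + 'ium'    # repeat 'un' three times, then append 'ium'
prefix = 'Py'
name = prefix + 'thon'     # + also works with variables
print(word, name)          # unununium Python
```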
Strings can be indexed (subscripted), with the first character having index 0, last character having index -1. Attempting to use an index that is too large will result in an error:
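The sketch below shows positive and negative indexes, and the error raised for an index that is too large.

```python
word = 'Python'
print(word[0], word[5])     # P n
print(word[-1], word[-6])   # n P  (negative indexes count from the right)
try:
    word[42]                # index too large
except IndexError as exc:
    error_message = str(exc)
    print('IndexError:', error_message)   # IndexError: string index out of range
```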
Strings also support slicing: string_variable[included_start:excluded_end]. The start is always included and the end always excluded, so s[:i] + s[i:] is always equal to s. Out-of-range slice indexes are handled gracefully.
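A few slices of the same example string:

```python
word = 'Python'
print(word[0:2])              # Py   (positions 0 and 1; 2 is excluded)
print(word[2:5])              # tho
print(word[:2] + word[2:])    # Python
print(word[4:42])             # on   (an out-of-range end is clamped)
print(repr(word[42:]))        # ''   (an out-of-range start gives an empty string)
```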
Python strings cannot be changed — they are immutable. Therefore, assigning to an indexed position in the string results in an error. Immutable objects include numbers, strings and tuples. A new object has to be created if a different value has to be stored.
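Assigning to an index of a string fails, so a new string has to be built instead:

```python
word = 'Python'
try:
    word[0] = 'J'                 # strings are immutable
except TypeError as exc:
    message = str(exc)
    print('TypeError:', message)  # 'str' object does not support item assignment

new_word = 'J' + word[1:]         # create a new string instead
print(new_word, word)             # Jython Python
```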
The built-in function len(s) returns the length of a sequence (such as a string, bytes, tuple, list, or range) or a collection (such as a dictionary, set, or frozen set).
Common Sequence Operations
In the table, s and t are sequences of the same type, n, i, j and k are integers and x is an arbitrary object.
| Operation | Result | Notes |
|---|---|---|
| x in s | True if an item of s is equal to x, else False | (1) |
| x not in s | False if an item of s is equal to x, else True | (1) |
| s + t | the concatenation of s and t | (6)(7) |
| s * n or n * s | equivalent to adding s to itself n times | (2)(7) |
| s[i] | ith item of s, origin 0 | (3) |
| s[i:j] | slice of s from i to j | (3)(4) |
| s[i:j:k] | slice of s from i to j with step k | (3)(5) |
| len(s) | length of s | |
| min(s) | smallest item of s | |
| max(s) | largest item of s | |
| s.index(x[, i[, j]]) | index of the first occurrence of x in s (at or after index i and before index j) | (8) |
| s.count(x) | total number of occurrences of x in s | |
(2) Values of n less than 0 are treated as 0 (which yields an empty sequence of the same type as s). Note that items in the sequence s are not copied; they are referenced multiple times. This often haunts new Python programmers: after lists = [[]] * 3 and lists[0].append(3), lists is [[3], [3], [3]].
What has happened is that [[]] is a one-element list containing an empty list, so all three elements of [[]] * 3 are references to this single empty list. Modifying any of the elements of lists modifies this single list. You can create a list of different lists with a list comprehension instead: lists = [[] for i in range(3)].
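The pitfall from note (2) and its fix can be reproduced directly:

```python
shared = [[]] * 3                    # three references to ONE empty list
shared[0].append(3)
print(shared)                        # [[3], [3], [3]]

separate = [[] for i in range(3)]    # three distinct empty lists
separate[0].append(3)
separate[1].append(5)
separate[2].append(7)
print(separate)                      # [[3], [5], [7]]
```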
(6) Concatenating immutable sequences always results in a new object. This means that building up a sequence by repeated concatenation will have a quadratic runtime cost in the total sequence length. To get a linear runtime cost, you must switch to one of the alternatives below:
- if concatenating str objects, you can build a list and use str.join() at the end, or else write to an io.StringIO instance and retrieve its value when complete
- if concatenating bytes objects, you can similarly use bytes.join() or io.BytesIO, or you can do in-place concatenation with a bytearray object; bytearray objects are mutable and have an efficient overallocation mechanism
- if concatenating tuple objects, extend a list instead
- for other types, investigate the relevant class documentation
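The sketch below applies the str and bytes alternatives from note (6) to a small, arbitrary list of pieces.

```python
import io

pieces = ['spam', 'eggs', 'ham']

# str, option 1: collect the pieces in a list and join once at the end
joined = ', '.join(pieces)

# str, option 2: write the pieces to an in-memory text buffer
buffer = io.StringIO()
for i, piece in enumerate(pieces):
    if i:
        buffer.write(', ')
    buffer.write(piece)
streamed = buffer.getvalue()

# bytes: a mutable bytearray supports efficient in-place +=
data = bytearray()
for piece in pieces:
    data += piece.encode('ascii')

print(joined)               # spam, eggs, ham
print(streamed == joined)   # True
print(bytes(data))          # b'spameggsham'
```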
(7) Some sequence types, such as range, don’t support sequence concatenation or repetition.
3.1.3. Lists
Lists might contain items of different types, but usually the items all have the same type.
Lists support the common sequence operations: they can be indexed, sliced and concatenated. All slice operations return a new list containing the requested elements (a shallow copy).
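Here are the sequence operations above applied to a list. The final lines check that a full slice really is a new list object.

```python
squares = [1, 4, 9, 16, 25]
print(squares[0])               # 1
print(squares[-3:])             # [9, 16, 25]
print(squares + [36, 49])       # [1, 4, 9, 16, 25, 36, 49]

copy = squares[:]               # slicing returns a new (shallow) copy
print(copy == squares, copy is squares)   # True False
```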
Unlike strings, which are immutable, lists are a mutable type: it is possible to change their content.
You can also add new items at the end of the list by using the append() method.
Assignment to slices is also possible, and this can even change the size of the list or clear it entirely.
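The sketch below covers item assignment, append(), and slice assignment, including shrinking and clearing a list. The values are illustrative.

```python
cubes = [1, 8, 27, 65, 125]      # 65 is wrong: 4 ** 3 is 64
cubes[3] = 64                    # replace a single item in place
cubes.append(216)                # add an item at the end
print(cubes)                     # [1, 8, 27, 64, 125, 216]

letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
letters[2:5] = ['C', 'D', 'E']   # replace a slice
letters[2:5] = []                # remove a slice: the list shrinks
print(letters)                   # ['a', 'b', 'f', 'g']
letters[:] = []                  # clear the whole list
print(letters, len(letters))     # [] 0
```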
The built-in function len() also applies to lists:
It is possible to nest lists (create lists containing other lists).
3.2. First Steps Towards Programming
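Multiple assignment, as in a, b = b, a+b, assigns several variables at once; the expressions on the right-hand side are all evaluated first, before any of the assignments take place.
A while loop (while condition:) executes as long as the condition remains true. Any non-zero integer value is true; zero is false. Anything with a non-zero length is true, empty sequences are false.
The standard comparison operators are < (less than), > (greater than), == (equal to), <= (less than or equal to), >= (greater than or equal to) and != (not equal to).
The body of a loop is indented: indentation is Python’s way of grouping statements. Each line within a basic block must be indented by the same amount.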
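The pieces above come together in the classic Fibonacci loop. The function name fib_below is hypothetical, and this version collects the numbers in a list instead of printing them so the result is easy to check.

```python
def fib_below(limit):
    """Return the Fibonacci numbers smaller than limit."""
    result = []
    a, b = 0, 1              # multiple assignment
    while a < limit:         # repeat while the condition is true
        result.append(a)     # indented: part of the loop body
        a, b = b, a + b      # right-hand side is evaluated before assigning
    return result

print(fib_below(10))         # [0, 1, 1, 2, 3, 5, 8]
```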
4. More Control Flow Tools
4.1. if Statements
An if … elif … elif … sequence is a substitute for the switch or case statements found in other languages.
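An if … elif … else chain used in place of a switch. The function describe() and its categories are illustrative only.

```python
def describe(x):
    if x < 0:
        return 'Negative'
    elif x == 0:
        return 'Zero'
    elif x == 1:
        return 'Single'
    else:
        return 'More'

print([describe(n) for n in (-5, 0, 1, 42)])
# ['Negative', 'Zero', 'Single', 'More']
```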
4.2. for Statements
The for statement is used to iterate over the elements of a sequence (such as a string, tuple or list) or other iterable object:
for target_list in expression_list:
    suite
Names in the target list are not deleted when the loop is finished.
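The built-in function range() returns an immutable sequence of integers suitable to emulate the effect of Pascal’s for i := a to b do; e.g., list(range(3)) returns the list [0, 1, 2].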
There is a subtlety when the sequence is being modified by the loop (this can only occur for mutable sequences, i.e. lists). An internal counter is used to keep track of which item is used next, and this is incremented on each iteration. This means that if the suite deletes the current (or a previous) item from the sequence, the next item will be skipped (since it gets the index of the current item which has already been treated). Likewise, if the suite inserts an item in the sequence before the current item, the current item will be treated again the next time through the loop. This can lead to nasty bugs that can be avoided by making a temporary copy using a slice of the whole sequence, e.g.,
for x in a[:]:
    if x < 0:
        a.remove(x)
# a itself is modified safely, because the loop iterates over the copy a[:]
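The skipped-item bug and the slice-copy fix can be seen side by side. Both function names are hypothetical.

```python
def remove_negatives_buggy(a):
    for x in a:              # iterates over the list that is being modified
        if x < 0:
            a.remove(x)
    return a

def remove_negatives(a):
    for x in a[:]:           # iterates over a copy made by slicing
        if x < 0:
            a.remove(x)
    return a

print(remove_negatives_buggy([-1, -2, 3]))   # [-2, 3]: -2 was skipped
print(remove_negatives([-1, -2, 3]))         # [3]
```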
4.3. The range() Function
The built-in function range() is used to iterate over a sequence of numbers.
class range(stop)
class range(start, stop[, step])
The given end point is never part of the generated sequence; range(10) generates 10 values, 0 through 9.
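A few range() calls, with list() used to make the generated values visible:

```python
print(list(range(5)))                 # [0, 1, 2, 3, 4]
print(list(range(5, 10)))             # [5, 6, 7, 8, 9]
print(list(range(0, 10, 3)))          # [0, 3, 6, 9]
print(list(range(-10, -100, -30)))    # [-10, -40, -70]

r = range(10)
print(r)                              # range(0, 10): a lazy sequence, not a list
print(len(r), r[2], 10 in r)          # 10 2 False
```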
4.4. break and continue Statements, and else Clauses on Loops
Loop statements may have an else clause; it is executed when the loop terminates through exhaustion of the list (with for) or when the condition becomes false (with while), but not when the loop is terminated by a break statement.
When used with a loop, the else clause has more in common with the else clause of a try statement than it does that of if statements: a try statement’s else clause runs when no exception occurs, and a loop’s else clause runs when no break occurs.
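A prime search is a standard illustration of a loop else clause. Look closely: the else belongs to the inner for loop, not to the if statement. The function name primes_below is hypothetical.

```python
def primes_below(limit):
    primes = []
    for n in range(2, limit):
        for x in range(2, n):
            if n % x == 0:
                break            # found a factor: the else clause is skipped
        else:
            primes.append(n)     # inner loop finished without break: n is prime
    return primes

print(primes_below(10))          # [2, 3, 5, 7]
```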
4.5. pass Statements
The pass statement does nothing. It can be used when a statement is required syntactically but the program requires no action. This is commonly used for creating minimal classes. Another place pass can be used is as a place-holder for a function or conditional body when you are working on new code, allowing you to keep thinking at a more abstract level.
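The two uses of pass mentioned above. Both names are placeholders.

```python
class MyEmptyClass:
    pass                    # minimal class: a body is required syntactically

def initlog(*args):
    pass                    # placeholder: remember to implement this!

print(initlog() is None)                  # True: an empty body returns None
print(type(MyEmptyClass()).__name__)      # MyEmptyClass
```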
4.6. Defining Functions
The keyword def introduces a function definition. It must be followed by the function name and the parenthesized list of formal parameters. The statements that form the body of the function start at the next line, and must be indented. The first statement of the function body can optionally be a string literal; this string literal is the function’s documentation string, or docstring.
The execution of a function introduces a new symbol table used for the local variables of the function. More precisely, all variable assignments in a function store the value in the local symbol table; whereas variable references first look in the local symbol table, then in the local symbol tables of enclosing functions, then in the global symbol table, and finally in the table of built-in names. Thus, global variables cannot be directly assigned a value within a function (unless named in a global statement), although they may be referenced.
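The sketch below shows the lookup rules just described. A plain assignment inside a function creates a local variable, a global statement redirects the assignment to the module-level name, and a bare reference falls through to the global table. All names are illustrative.

```python
counter = 0

def increment_without_global():
    counter = 1              # creates a NEW local variable; the global is untouched
    return counter

def increment_with_global():
    global counter           # assignments now target the module-level name
    counter += 1
    return counter

def read_global():
    return counter           # no assignment here, so the global is found

increment_without_global()
print(counter)                    # 0
increment_with_global()
print(counter, read_global())     # 1 1
```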
The actual parameters (arguments) to a function call are introduced in the local symbol table of the called function when it is called; thus, arguments are passed using call by value (where the value is always an object reference, not the value of the object). [1] When a function calls another function, a new local symbol table is created for that call.
[1] Actually, call by object reference would be a better description, since if a mutable object is passed, the caller will see any changes the callee makes to it (items inserted into a list).
------------------- Passing arguments using call by object -------------------
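In Python, arguments are always passed to a function by object.
That is, the behavior cannot be described as either pass-by-value or pass-by-reference; this is a distinctive feature of Python.
If the object itself is immutable, such as an integer, you can think of it as pass-by-value: the function cannot change the object, and assigning to the parameter only binds the local name to a new object, as follows: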
a=10
def function1(value):
    value=20
    print(value)
function1(a)
print(a)
Output:
20
10
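Although the object a was passed in, the assignment value=20 inside function1 does not change a; it creates a new object and binds the local name value to it. The global a is unchanged.
If the object itself is mutable, such as a list, then modifying it in place inside the function (for example with slice assignment) changes the object itself, so you can think of it as pass-by-reference, as follows: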
a=[10,11,12,13]
def function1(value):
    value[1:3]=[]
    print(value)
function1(a)
print(a)
Output:
[10, 13]
[10, 13]
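list.reverse is an in-place method: it operates on the original object instead of creating a new one. For example, after t = s, Python's pass-by-object rule means t and s refer to the same object, so calling .reverse() on that object changes what both t and s see; they are just names. Slicing with [::], by contrast, creates a new object. If the list contains nested mutable objects, the safest way to get a fully independent copy is a deep copy (copy.deepcopy()).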
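The aliasing, in-place reversal, and copying behavior described above can be checked directly with small example lists:

```python
import copy

s = [1, 2, 3]
t = s                  # t and s are two names for ONE list object
t.reverse()            # in place: mutates that shared object
print(s, t is s)       # [3, 2, 1] True

u = s[::-1]            # slicing builds a NEW list
print(u, u is s)       # [1, 2, 3] False

nested = [[1, 2], [3, 4]]
shallow = nested[:]             # new outer list, but the same inner lists
deep = copy.deepcopy(nested)    # inner lists are copied too
nested[0].append(99)
print(shallow[0], deep[0])      # [1, 2, 99] [1, 2]
```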
-------------------------------------------------------------------------------
A function definition introduces the function name in the current symbol table. The value of the function name has a type that is recognized by the interpreter as a user-defined function. This value can be assigned to another name which can then also be used as a function.
In fact, even functions without a return statement do return a value, albeit a rather boring one. This value is called None (it’s a built-in name).
4.7. More on Defining Functions
4.7.1. Default Argument Values
i = 5
def f(arg=i):
    print(arg)
i = 6
f()
will print 5.
Important warning: The default value is evaluated only once. This makes a difference when the default is a mutable object such as a list, dictionary, or instances of most classes. For example, the following function accumulates the arguments passed to it on subsequent calls:
def f(a, L=[]):
    L.append(a)
    return L
print(f(1))
print(f(2))
print(f(3))
[1]
[1, 2]
[1, 2, 3]
If you don’t want the default to be shared between subsequent calls, you can write the function like this instead:
def f(a, L=None):
    if L is None:
        L = []
    L.append(a)
    return L
4.7.2. Keyword Arguments
Functions can also be called using keyword arguments of the form kwarg=value.
parrot(1000) # 1 positional argument
parrot(voltage=1000) # 1 keyword argument
In a function call, keyword arguments must follow positional arguments. All the keyword arguments passed must match one of the arguments accepted by the function, and their order is not important.
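The sketch below uses a simplified, hypothetical parrot() that returns a sentence instead of printing, so the calls can be compared. The last call is passed to compile() because a positional argument after a keyword argument is rejected at compile time.

```python
def parrot(voltage, state='a stiff', action='voom', type='Norwegian Blue'):
    # Simplified stand-in: returns a sentence instead of printing it
    return f"This {type} would {action} with {voltage} volts; it's {state}."

print(parrot(1000))                                 # 1 positional argument
print(parrot(voltage=1000))                         # 1 keyword argument
print(parrot(voltage=1000000, action='VOOOOOM'))    # 2 keyword arguments
print(parrot(action='VOOOOOM', voltage=1000000))    # same call, other order

try:
    parrot(110, voltage=220)        # voltage given twice
except TypeError as exc:
    print('TypeError:', exc)

try:
    compile("parrot(voltage=5.0, 'dead')", "<example>", "eval")
except SyntaxError:
    print('SyntaxError: positional argument after keyword argument')
```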
4.7.3. Arbitrary Argument Lists
When a final formal parameter of the form **name is present, it receives a dictionary (see Mapping Types — dict) containing all keyword arguments except for those corresponding to a formal parameter. This may be combined with a formal parameter of the form *name which receives a tuple containing the positional arguments beyond the formal parameter list. (*name must occur before **name.) For example, if we define a function like this:
def cheeseshop(kind, *arguments, **keywords):
    print("-- Do you have any", kind, "?")
    print("-- I'm sorry, we're all out of", kind)
    for arg in arguments:
        print(arg)
    print("-" * 40)
    keys = sorted(keywords.keys())
    for kw in keys:
        print(kw, ":", keywords[kw])
>>> def concat(*args, sep="/"):
...     return sep.join(args)
...
>>> concat("earth", "mars", "venus")
'earth/mars/venus'
>>> concat("earth", "mars", "venus", sep=".")
'earth.mars.venus'
Any formal parameters which occur after the *args parameter are ‘keyword-only’ arguments, meaning that they can only be used as keywords rather than positional arguments.
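The sketch below makes the collected values visible. The hypothetical collect() returns its tuple and dict instead of printing them, and concat() shows that a positional value can never reach the keyword-only sep.

```python
def collect(kind, *arguments, **keywords):
    # arguments: tuple of extra positional args; keywords: dict of extra keyword args
    return kind, arguments, keywords

print(collect("Limburger", "It's very runny, sir.", shopkeeper="Michael Palin"))
# ('Limburger', ("It's very runny, sir.",), {'shopkeeper': 'Michael Palin'})

def concat(*args, sep="/"):
    return sep.join(args)

print(concat("earth", "mars", sep="."))   # earth.mars
# sep follows *args, so it is keyword-only: a positional "." is just another item
print(concat("earth", "mars", "."))       # earth/mars/.
```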
4.7.4. Unpacking Argument Lists
>>> list(range(3, 6))            # normal call with separate arguments
[3, 4, 5]
>>> args = [3, 6]
>>> list(range(*args))           # call with arguments unpacked from a list
[3, 4, 5]
In the same fashion, dictionaries can deliver keyword arguments with the **-operator:
>>> def parrot(voltage, state='a stiff', action='voom'):
...     print("-- This parrot wouldn't", action, end=' ')
...     print("if you put", voltage, "volts through it.", end=' ')
...     print("E's", state, "!")
...
>>> d = {"voltage": "four million", "state": "bleedin' demised", "action": "VOOM"}
>>> parrot(**d)
-- This parrot wouldn't VOOM if you put four million volts through it. E's bleedin' demised !
4.7.5. Lambda Expressions
Small anonymous functions can be created with the lambda keyword.
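The syntax is lambda arguments: expression.
Lambda functions can be used wherever function objects are required. They are syntactically restricted to a single expression. Typical uses are returning a function from another function, or passing a small function as an argument (for example, as the key for list.sort()).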
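Here are both uses of lambda just mentioned: a factory that returns a function, and a small key function passed to sort(). make_incrementor is an illustrative name.

```python
def make_incrementor(n):
    return lambda x: x + n        # the returned function remembers n

f = make_incrementor(42)
print(f(0), f(1))                 # 42 43

pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
pairs.sort(key=lambda pair: pair[1])   # sort by the English word
print(pairs)
# [(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]
```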
4.7.6. Documentation Strings
The first line should always be a short, concise summary of the object’s purpose.
If there are more lines in the documentation string, the second line should be blank, visually separating the summary from the rest of the description.
When a tool strips the indentation from a multi-line docstring, equivalence of whitespace should be tested after expansion of tabs (to 8 spaces, normally).
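The sketch below follows these conventions and uses inspect.cleandoc() to strip the uniform indentation (cleandoc expands tabs first). Depending on the Python version, the raw __doc__ may still contain the indentation, so the check is made on the cleaned text.

```python
import inspect

def my_function():
    """Do nothing, but document it.

    No, really, it doesn't do anything.
    """
    pass

print(my_function.__doc__)       # raw docstring (indentation may be present)
cleaned = inspect.cleandoc(my_function.__doc__)
print(cleaned)
# Do nothing, but document it.
#
# No, really, it doesn't do anything.
```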
4.7.7. Function Annotations
Annotations are stored in the __annotations__ attribute of the function as a dictionary and have no effect on any other part of the function. Parameter annotations are defined by a colon after the parameter name, followed by an expression evaluating to the value of the annotation. Return annotations are defined by a literal ->, followed by an expression, between the parameter list and the colon denoting the end of the def statement.
>>> def f(ham: str, eggs: str = 'eggs') -> str:
...     print("Annotations:", f.__annotations__)
...     print("Arguments:", ham, eggs)
...     return ham + ' and ' + eggs
...
>>> f('spam')
Annotations: {'ham': <class 'str'>, 'eggs': <class 'str'>, 'return': <class 'str'>}
Arguments: spam eggs
'spam and eggs'
4.8. Intermezzo: Coding Style
For Python, PEP 8 has emerged as the style guide that most projects adhere to; it promotes a very readable and eye-pleasing coding style. Every Python developer should read it at some point; here are its most important points:
- Use 4-space indentation, and no tabs.
- Wrap lines so that they don’t exceed 79 characters.
- Use blank lines to separate functions and classes, and larger blocks of code inside functions.
- When possible, put comments on a line of their own.
- Use docstrings.
- Use spaces around operators and after commas, but not directly inside bracketing constructs: a = f(1, 2) + g(3, 4).
- Name your classes and functions consistently; the convention is to use CamelCase for classes and lower_case_with_underscores for functions and methods. Always use self as the name for the first method argument (see A First Look at Classes for more on classes and methods)
- Don’t use fancy encodings if your code is meant to be used in international environments. Python’s default, UTF-8, or even plain ASCII work best in any case.
- Likewise, don’t use non-ASCII characters in identifiers if there is only the slightest chance people speaking a different language will read or maintain the code.
5. Data Structures
5.1. More on Lists
list.append(x)
list.extend(L)
list.insert(i, x) # The first argument is the index of the element before which to insert, so a.insert(0, x) inserts at the front of the list, and a.insert(len(a), x) is equivalent to a.append(x).
list.remove(x) # Remove the first item from the list whose value is x. It is an error if there is no such item.
list.pop([i]) # Remove the item at the given position in the list, and return it. If no index is specified, a.pop() removes and returns the last item in the list.
list.clear() # Equivalent to del a[:].
list.index(x) # Return the index in the list of the first item whose value is x. It is an error if there is no such item.
list.count(x)
list.sort(key=None, reverse=False) # Sort the items of the list in place.
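sorted(iterable, *, key=None, reverse=False) # Return a new sorted list from the items in iterable. key specifies a function of one argument that is used to extract a comparison key from each item; reverse is a boolean value (if True, the items are sorted as if each comparison were reversed).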
list.reverse() # in place.
list.copy() # shallow copy
Methods such as insert, remove or sort that only modify the list have no return value printed; they return the default None.
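The sketch below exercises several of the methods listed above on an example list. It shows that the in-place methods return None, while sorted() returns a new list.

```python
fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']
print(fruits.count('apple'))       # 2
print(fruits.index('banana'))      # 3
print(fruits.index('banana', 4))   # 6  (search starts at index 4)
print(fruits.reverse())            # None: in-place methods return None
print(fruits)   # ['banana', 'apple', 'kiwi', 'banana', 'pear', 'apple', 'orange']

fruits.append('grape')
fruits.sort()
print(fruits)
# ['apple', 'apple', 'banana', 'banana', 'grape', 'kiwi', 'orange', 'pear']
last = fruits.pop()                # removes and returns the last item
print(last)                        # pear
print(sorted([3, 1, 2], reverse=True))   # [3, 2, 1]  (a new list)
```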
5.1.1. Using Lists as Stacks
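To use a list as a stack (last-in, first-out), use append() to add an item to the top of the stack and pop() without an explicit index to retrieve an item from the top.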
5.1.2. Using Lists as Queues
>>> from collections import deque
>>> queue = deque(["Eric", "John", "Michael"])
>>> queue.append("Graham")           # Graham arrives
>>> queue.popleft()                  # The first to arrive now leaves
'Eric'
5.1.3. List Comprehensions
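① Using a for loop:
>>> squares = []
>>> for x in range(10):
...     squares.append(x**2)
...
② Using map() with a lambda:
squares = list(map(lambda x: x**2, range(10)))
③ Using a list comprehension (equivalent, but more concise and readable):
squares = [x**2 for x in range(10)]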
A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses.
>>> [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
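The comprehension above is equivalent to nested loops written in the same order as its for and if clauses, which this sketch checks:

```python
combs = [(x, y) for x in [1, 2, 3] for y in [3, 1, 4] if x != y]

# Same result with explicit loops; the clauses keep their order
loop_combs = []
for x in [1, 2, 3]:
    for y in [3, 1, 4]:
        if x != y:
            loop_combs.append((x, y))

print(combs == loop_combs)   # True
print(combs)
# [(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
```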
5.1.4. Nested List Comprehensions
>>> matrix = [
...     [1, 2, 3, 4],
...     [5, 6, 7, 8],
...     [9, 10, 11, 12],
... ]
>>> [[row[i] for row in matrix] for i in range(4)]
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
The zip() function would do a great job for this use case:
>>> list(zip(*matrix))
[(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)]
5.2. The del statement
>>> a = [-1, 1, 66.25, 333, 333, 1234.5]
>>> del a[0]
>>> a
[1, 66.25, 333, 333, 1234.5]
>>> del a[2:4]
>>> a
[1, 66.25, 1234.5]
>>> del a[:]
>>> a
[]
5.3. Tuples and Sequences
A tuple consists of a number of values separated by commas. Tuples may be nested. Tuples are immutable.
>>> t = 12345, 54321, 'hello!'
>>> t[0]
12345
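Tuples may be input with or without surrounding parentheses, although often parentheses are necessary anyway (if the tuple is part of a larger expression). On output, tuples are always enclosed in parentheses, so that nested tuples are interpreted correctly.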
Tuples are immutable, and usually contain a heterogeneous sequence of elements that are accessed via unpacking (see later in this section) or indexing (or even by attribute in the case of namedtuples). Lists are mutable, and their elements are usually homogeneous and are accessed by iterating over the list.
>>> empty = () # Empty tuples are constructed by an empty pair of parentheses
>>> singleton = 'hello', # a tuple with one item is constructed by following a value with a comma (it is not sufficient to enclose a single value in parentheses)
>>> x, y, z = t # unpacking
Note that multiple assignment is really just a combination of tuple packing and sequence unpacking.
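The sketch below covers packing, unpacking, nesting, the one-item tuple, and a swap done through multiple assignment.

```python
t = 12345, 54321, 'hello!'       # tuple packing
x, y, z = t                      # sequence unpacking
print(x, z)                      # 12345 hello!

u = t, (1, 2, 3, 4, 5)           # tuples may be nested
print(u)   # ((12345, 54321, 'hello!'), (1, 2, 3, 4, 5))

singleton = 'hello',             # the trailing comma makes a 1-tuple
print(len(singleton), type(('hello')) is str)   # 1 True

a, b = 1, 2
a, b = b, a                      # swap: packing on the right, unpacking on the left
print(a, b)                      # 2 1
```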
5.4. Sets
A set is an unordered collection with no duplicate elements. Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations like union, intersection, difference, and symmetric difference.
Curly braces or the set() function can be used to create sets. Note: to create an empty set you have to use set(), not {}; the latter creates an empty dictionary.
>>> basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}
>>> print(basket) # show that duplicates have been removed
{'orange', 'banana', 'pear', 'apple'}
>>> 'orange' in basket # fast membership testing
True
>>> 'crabgrass' in basket
False
>>> # Demonstrate set operations on unique letters from two words
...
>>> a = set('abracadabra')
>>> b = set('alacazam')
>>> a # unique letters in a
{'a', 'r', 'b', 'c', 'd'}
>>> a - b # letters in a but not in b
{'r', 'd', 'b'}
>>> a | b # letters in either a or b
{'a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'}
>>> a & b # letters in both a and b
{'a', 'c'}
>>> a ^ b # letters in a or b but not both
{'r', 'd', 'b', 'm', 'z', 'l'}
Similarly to list comprehensions, set comprehensions are also supported:
>>> a = {x for x in 'abracadabra' if x not in 'abc'}
>>> a
{'r', 'd'}
5.5. Dictionaries
Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by keys, which can be any immutable type; strings and numbers can always be keys. Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key.
A pair of braces creates an empty dictionary: {}. Placing a comma-separated list of key:value pairs within the braces adds initial key:value pairs to the dictionary.
It is also possible to delete a key:value pair with del. It is an error to extract a value using a non-existent key.
Performing list(d.keys()) on a dictionary returns a list of all the keys used in the dictionary, in insertion order (if you want it sorted, just use sorted(d.keys()) instead). [2] To check whether a single key is in the dictionary, use the in keyword.
>>> 'guido' in tel
>>> 'jack' not in tel
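The membership tests above assume a dictionary named tel. This sketch builds an example tel and walks through the basic operations from this section, including the KeyError raised for a missing key.

```python
tel = {'jack': 4098, 'sape': 4139}
tel['guido'] = 4127          # store a value under a key
print(tel['jack'])           # 4098
del tel['sape']              # delete a key:value pair
tel['irv'] = 4127
print(list(tel))             # ['jack', 'guido', 'irv']  (insertion order)
print(sorted(tel))           # ['guido', 'irv', 'jack']
print('guido' in tel, 'jack' not in tel)   # True False

try:
    tel['nobody']            # extracting a value for a missing key is an error
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'nobody'
```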
The dict() constructor builds dictionaries directly from sequences of key-value pairs:
>>> dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])
{'sape': 4139, 'guido': 4127, 'jack': 4098}
In addition, dict comprehensions can be used to create dictionaries from arbitrary key and value expressions:
>>> {x: x**2 for x in (2, 4, 6)}
{2: 4, 4: 16, 6: 36}
When the keys are simple strings, it is sometimes easier to specify pairs using keyword arguments:
>>> dict(sape=4139, guido=4127, jack=4098)
{'sape': 4139, 'guido': 4127, 'jack': 4098}
5.6. Looping Techniques
When looping through dictionaries, the key and corresponding value can be retrieved at the same time using the items() method.
>>> knights = {'gallahad': 'the pure', 'robin': 'the brave'}
>>> for k, v in knights.items():
...     print(k, v)
...
gallahad the pure
robin the brave
When looping through a sequence, the position index and corresponding value can be retrieved at the same time using the enumerate() function.
enumerate(iterable, start=0) Return an enumerate object. iterable must be a sequence, an iterator, or some other object which supports iteration.
>>> for i, v in enumerate(['tic', 'tac', 'toe']):
...     print(i, v)
...
0 tic
1 tac
2 toe
To loop over two or more sequences at the same time, the entries can be paired with the zip() function.
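>>> questions = ['name', 'quest', 'favorite color']
>>> answers = ['lancelot', 'the holy grail', 'blue']
>>> for q, a in zip(questions, answers):
...     print('What is your {0}? It is {1}.'.format(q, a))
...
What is your name? It is lancelot.
What is your quest? It is the holy grail.
What is your favorite color? It is blue.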
To loop over a sequence in reverse, call the reversed() function.
To loop over a sequence in sorted order, use the sorted() function which returns a new sorted list.
It is sometimes tempting to change a list while you are looping over it; however, it is often simpler and safer to create a new list instead.
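Instead of deleting items while looping, the sketch below builds a new list containing only the values to keep. The sample readings are made up, and math.isnan() is used to detect the NaN entries.

```python
import math

raw_data = [56.2, float('NaN'), 51.7, 55.3, 52.5, float('NaN'), 47.8]

filtered_data = []
for value in raw_data:            # raw_data itself is never modified
    if not math.isnan(value):
        filtered_data.append(value)

print(filtered_data)              # [56.2, 51.7, 55.3, 52.5, 47.8]
```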
5.7. More on Conditions
Comparison operators:
The comparison operators in and not in check whether a value occurs (does not occur) in a sequence. The operators is and is not compare whether two objects are really the same object; this only matters for mutable objects like lists.
Comparisons can be chained. For example, a < b == c tests whether a is less than b and moreover b equals c.
Boolean operators:
Comparisons may be combined using the Boolean operators and and or, and the outcome of a comparison (or of any other Boolean expression) may be negated with not.
The Boolean operators and and or have lower priority than comparison operators; between them, not has the highest priority and or the lowest, so that A and not B or C is equivalent to (A and (not B)) or C. As always, parentheses can be used to express the desired composition.
The Boolean operators and and or are so-called short-circuit operators.
When used as a general value and not as a Boolean, the return value of a short-circuit operator is the last evaluated argument.
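The sketch below demonstrates short-circuit evaluation, the "last evaluated argument" rule, and a chained comparison. The helper noisy() is hypothetical; it only records which operands were evaluated.

```python
def noisy(value, log):
    log.append(value)        # record that this operand was evaluated
    return value

log = []
result = noisy(0, log) and noisy(1, log)
print(result, log)           # 0 [0]: "and" stopped at the first false operand

string1, string2, string3 = '', 'Trondheim', 'Hammer Dance'
non_null = string1 or string2 or string3
print(non_null)              # Trondheim: the last evaluated argument

a, b, c = 1, 2, 2
print(a < b == c)            # True: means (a < b) and (b == c)
```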
5.8. Comparing Sequences and Other Types
Sequence objects may be compared to other objects with the same sequence type. The comparison uses lexicographical ordering. If one sequence is an initial sub-sequence of the other, the shorter sequence is the smaller (lesser) one. Note that comparing objects of different types with < or > is legal provided that the objects have appropriate comparison methods. Otherwise, rather than providing an arbitrary ordering, the interpreter will raise a TypeError exception.
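Some lexicographic comparisons, each of which evaluates to True, plus the TypeError raised when no ordering is defined between the types:

```python
print((1, 2, 3) < (1, 2, 4))               # True
print([1, 2, 3] < [1, 2, 4])               # True
print('ABC' < 'C' < 'Pascal' < 'Python')   # True
print((1, 2) < (1, 2, -1))                 # True: an initial sub-sequence is smaller
print((1, 2, 3) == (1.0, 2.0, 3.0))        # True: 1 == 1.0, and so on

try:
    1 < 'a'                                # int and str have no ordering
    raised = False
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```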
6. Modules
As your program gets longer, you may want to split it into several files for easier maintenance. You may also want to use a handy function that you’ve written in several programs without copying its definition into each program.
A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended. Within a module, the module’s name (as a string) is available as the value of the global variable __name__.
>>> import fibo
This does not enter the names of the functions defined in fibo directly in the current symbol table; it only enters the module name fibo there. Using the module name you can access the functions.
>>> fibo.fib(1000)
If you intend to use a function often you can assign it to a local name.
>>> fib = fibo.fib
6.1. More on Modules
A module can contain executable statements as well as function definitions. These statements are intended to initialize the module, and they are executed only the first time the module name is encountered in an import statement. In fact, function definitions are also ‘statements’ that are ‘executed’; the execution of a module-level function definition enters the function name in the module’s global symbol table.
>>> from fibo import fib, fib2
>>> from fibo import *
This imports all names except those beginning with an underscore (_). In general, Python programmers do not use this facility, since it introduces an unknown set of names into the interpreter, possibly hiding some things you have already defined.
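Each module has its own private symbol table, which is used as the global symbol table by all functions defined in the module. Thus, the author of a module can use global variables in the module without worrying about accidental clashes with a user’s global variables. If needed, a module’s global variables can be accessed with the same notation used to refer to its functions: modname.itemname.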
For efficiency reasons, each module is only imported once per interpreter session. Therefore, if you change your modules, you must restart the interpreter – or, if it’s just one module you want to test interactively, use importlib.reload(), e.g. import importlib; importlib.reload(modulename).
6.1.1. Executing modules as scripts
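When you run a Python module with python fibo.py <arguments>, the code in the module is executed just as if you had imported it, but with __name__ set to "__main__". The module can check for this:
if __name__ == "__main__":
    import sys
    fib(int(sys.argv[1]))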
By adding this code at the end of your module, you can make the file usable as a script as well as an importable module.
If the module is imported, the code is not run.
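Both behaviors can be checked by writing a small module to a temporary directory, running it once as a script and once through an import. This is a sketch under stated assumptions. The module source is a hypothetical stand-in modeled on fibo.py, and both runs use a child interpreter (sys.executable).

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for fibo.py, with the __main__ guard at the end
FIBO_SOURCE = '''\
def fib(n):
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a + b
    print()

if __name__ == "__main__":
    import sys
    fib(int(sys.argv[1]))
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "fibo.py"
    path.write_text(FIBO_SOURCE)
    # Run as a script: __name__ == "__main__", so the guarded block executes
    as_script = subprocess.run([sys.executable, str(path), "50"],
                               capture_output=True, text=True, check=True)
    # Import as a module: the guarded block is skipped, nothing is printed
    as_import = subprocess.run([sys.executable, "-c", "import fibo"],
                               cwd=tmp, capture_output=True, text=True, check=True)

print(as_script.stdout)            # 0 1 1 2 3 5 8 13 21 34
print(repr(as_import.stdout))      # ''
```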
6.1.2. The Module Search Path
When a module named spam is imported, the interpreter first searches for a built-in module with that name. If not found, it then searches for a file named spam.py in a list of directories given by the variable sys.path. sys.path is initialized from these locations:
The directory containing the input script (or the current directory when no file is specified).
PYTHONPATH (a list of directory names, with the same syntax as the shell variable PATH).
The installation-dependent default.
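On file systems which support symlinks, the directory containing the input script is calculated after the symlink is followed; in other words, the directory containing the symlink is not added to the module search path.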
After initialization, Python programs can modify sys.path. The directory containing the script being run is placed at the beginning of the search path, ahead of the standard library path. This means that scripts in that directory will be loaded instead of modules of the same name in the library directory. This is an error unless the replacement is intended.
6.1.3. “Compiled” Python files
To speed up loading modules, Python caches the compiled version of each module in the __pycache__ directory under the name module.version.pyc, where the version encodes the format of the compiled file; it generally contains the Python version number (for example, in CPython 3.3 the compiled version of spam.py would be cached as __pycache__/spam.cpython-33.pyc).
You can use the -O or -OO switches on the Python command to reduce the size of a compiled module. The -O switch removes assert statements, the -OO switch removes both assert statements and doc strings. Since some programs may rely on having these available, you should only use this option if you know what you’re doing. “Optimized” modules have an opt- tag and are usually smaller. Future releases may change the effects of optimization.
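The standard importlib.util.cache_from_source() function computes the cache path that CPython uses for a source file, with or without an opt- tag. This is a sketch: fibo.py is only a name here (the file does not need to exist), and the exact tag depends on your interpreter version.

```python
import importlib.util
import sys

tag = sys.implementation.cache_tag     # e.g. 'cpython-312'; varies by version
plain = importlib.util.cache_from_source('fibo.py', optimization='')
optimized = importlib.util.cache_from_source('fibo.py', optimization=1)

print(tag)
print(plain)        # __pycache__/fibo.<tag>.pyc        (separator is OS-specific)
print(optimized)    # __pycache__/fibo.<tag>.opt-1.pyc  (used with -O)
```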
A program doesn’t run any faster when it is read from a .pyc file than when it is read from a .py file; the only thing that’s faster about .pyc files is the speed with which they are loaded.
The module compileall can create .pyc files for all modules in a directory.
6.2. Standard Modules
Python comes with a library of standard modules, described in a separate document, the Python Library Reference (“Library Reference” hereafter).
One particular module deserves some attention: sys, which is built into every Python interpreter. The variables sys.ps1 and sys.ps2 define the strings used as primary and secondary prompts. These two variables are only defined if the interpreter is in interactive mode.
The variable sys.path is a list of strings that determines the interpreter’s search path for modules. It is initialized to a default path taken from the environment variable PYTHONPATH, or from a built-in default if PYTHONPATH is not set.
6.3. The dir() Function
The built-in function dir() is used to find out which names a module defines: dir(module) returns a sorted list of strings.
Without arguments, dir() lists the names you have defined currently. It lists all types of names: variables, modules, functions, etc.
>>> a = [1, 2, 3, 4, 5]
>>> import fibo
>>> fib = fibo.fib
>>> dir()
['__builtins__', '__name__', 'a', 'fib', 'fibo', 'sys']
dir() does not list the names of built-in functions and variables. If you want a list of those, they are defined in the standard module builtins:
>>> import builtins
>>> dir(builtins)
6.4. Packages
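Packages are a way of structuring Python’s module namespace by using “dotted module names”. For example, the module name A.B designates a submodule named B in a package named A. Suppose you want to design a collection of modules (a “package”) for the uniform handling of sound files and sound data; one possible structure, expressed as a hierarchical filesystem, is: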
sound/                          Top-level package
      __init__.py               Initialize the sound package
      formats/                  Subpackage for file format conversions
              __init__.py
              wavread.py
              wavwrite.py
              aiffread.py
              aiffwrite.py
              auread.py
              auwrite.py
              …
      effects/                  Subpackage for sound effects
              __init__.py
              echo.py
              surround.py
              reverse.py
              …
      filters/                  Subpackage for filters
              __init__.py
              equalizer.py
              vocoder.py
              karaoke.py
When importing the package, Python searches through the directories on sys.path looking for the package subdirectory.
The __init__.py files are required to make Python treat directories containing the file as packages (namespace packages, a more advanced feature, are the exception). In the simplest case, __init__.py can just be an empty file, but it can also execute initialization code for the package or set the __all__ variable, described later.
Users of the package can import individual modules from the package, for example:
import sound.effects.echo
This loads the submodule sound.effects.echo. It must be referenced with its full name.
sound.effects.echo.echofilter(input, output, delay=0.7, atten=4)
An alternative way of importing the submodule is:
from sound.effects import echo
This also loads the submodule echo, and makes it available without its package prefix, so it can be used as follows:
echo.echofilter(input, output, delay=0.7, atten=4)
Yet another variation is to import the desired function or variable directly:
from sound.effects.echo import echofilter
Again, this loads the submodule echo, but this makes its function echofilter() directly available:
echofilter(input, output, delay=0.7, atten=4)
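You can try the three import styles on a throwaway copy of the package. This sketch builds a hypothetical, stripped-down sound/effects/echo.py in a temporary directory, then uses each form in turn. Its echofilter() only returns its arguments.

```python
import importlib
import os
import sys
import tempfile

# Build a miniature, hypothetical version of the sound package on disk:
#   sound/__init__.py, sound/effects/__init__.py, sound/effects/echo.py
root = tempfile.mkdtemp()
effects_dir = os.path.join(root, "sound", "effects")
os.makedirs(effects_dir)
for pkg_dir in (os.path.join(root, "sound"), effects_dir):
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()  # empty __init__.py
with open(os.path.join(effects_dir, "echo.py"), "w", encoding="utf-8") as f:
    f.write(
        "def echofilter(input, output, delay=0.7, atten=4):\n"
        "    return (input, output, delay, atten)\n"
    )

sys.path.insert(0, root)
importlib.invalidate_caches()

# 1. Import the submodule; it must be referenced with its full name.
import sound.effects.echo
r1 = sound.effects.echo.echofilter("in.wav", "out.wav")

# 2. Import the submodule so it can be used without its package prefix.
from sound.effects import echo
r2 = echo.echofilter("in.wav", "out.wav")

# 3. Import the function directly.
from sound.effects.echo import echofilter
r3 = echofilter("in.wav", "out.wav")

print(r1 == r2 == r3, echo is sound.effects.echo)
```

All three forms load the same module object; they differ only in which name gets bound in the importing namespace.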
Note that when using from package import item, the item can be either a submodule (or subpackage) of the package, or some other name defined in the package, like a function, class or variable. The import statement first tests whether the item is defined in the package; if not, it assumes it is a module and attempts to load it. If it fails to find it, an ImportError exception is raised.
Contrarily, when using syntax like import item.subitem.subsubitem, each item except for the last must be a package; the last item can be a module or a package but can’t be a class or function or variable defined in the previous item.
6.4.1. Importing * From a Package
The import statement uses the following convention: if a package’s __init__.py code defines a list named __all__, it is taken to be the list of module names that should be imported when from package import * is encountered. It is up to the package author to keep this list up-to-date when a new version of the package is released. Package authors may also decide not to support it, if they don’t see a use for importing * from their package.
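Here is a sketch of the __all__ convention. It uses a hypothetical package effects_pkg whose __init__.py lists only two of its three submodules. The star-import runs inside exec() with a fresh dictionary only so we can inspect which names it binds.

```python
import importlib
import os
import sys
import tempfile

# A hypothetical package "effects_pkg" with three submodules, whose
# __init__.py lists only two of them in __all__.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "effects_pkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w", encoding="utf-8") as f:
    f.write('__all__ = ["echo", "surround"]\n')
for name in ("echo", "surround", "reverse"):
    with open(os.path.join(pkg_dir, name + ".py"), "w", encoding="utf-8") as f:
        f.write(f"NAME = {name!r}\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# Run "from effects_pkg import *" in a fresh namespace so we can inspect it.
namespace = {}
exec("from effects_pkg import *", namespace)
imported = sorted(k for k in namespace if not k.startswith("__"))
print(imported)  # the submodules named in __all__; "reverse" is not loaded
```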
If __all__ is not defined, the statement from sound.effects import * does not import all submodules from the package sound.effects into the current namespace; it only ensures that the package sound.effects has been imported (possibly running any initialization code in __init__.py) and then imports whatever names are defined in the package. This includes any names defined (and submodules explicitly loaded) by __init__.py. It also includes any submodules of the package that were explicitly loaded by previous import statements.
Remember, there is nothing wrong with using from package import specific_submodule! In fact, this is the recommended notation unless the importing module needs to use submodules with the same name from different packages.
6.4.2. Intra-package References
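When packages are structured into subpackages (as with the sound package in the example), you can use absolute imports to refer to submodules of sibling packages. For example, if the module sound.filters.vocoder needs to use the echo module in the sound.effects package, it can use from sound.effects import echo.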
You can also write relative imports, with the from module import name form of import statement. These imports use leading dots to indicate the current and parent packages involved in the relative import.
For example, from the surround module you might use:
from . import echo
from .. import formats
from ..filters import equalizer
Note that relative imports are based on the name of the current module. Since the name of the main module is always “__main__”, modules intended for use as the main module of a Python application must always use absolute imports.
A module can detect that it is being run as the main module with the test if __name__ == '__main__'.
6.4.3. Packages in Multiple Directories
Packages support one more special attribute, __path__. This is initialized to be a list containing the name of the directory holding the package’s __init__.py before the code in that file is executed. This variable can be modified; doing so affects future searches for modules and subpackages contained in the package.
7. Input and Output
There are several ways to present the output of a program; data can be printed in a human-readable form, or written to a file for future use.
7.1. Fancier Output Formatting
So far we’ve encountered two ways of writing values: expression statements and the print() function. (A third way is using the write() method of file objects; the standard output file can be referenced as sys.stdout.)
There are two ways to format your output; the first way is to do all the string handling yourself; using string slicing and concatenation operations you can create any layout you can imagine. The string type has some methods that perform useful operations for padding strings to a given column width; these will be discussed shortly. The second way is to use the str.format() method.
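Python has ways to convert any value to a string: pass it to the repr() or str() functions. The str() function is meant to return representations of values which are fairly human-readable, while repr() is meant to generate representations which can be read by the interpreter.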
Many values, such as numbers or structures like lists and dictionaries, have the same representation using either function. Strings, in particular, have two distinct representations.
>>> s = 'Hello, world.'
>>> str(s)
'Hello, world.'
>>> repr(s)
"'Hello, world.'"
>>> # The repr() of a string adds string quotes and backslashes:
... hello = 'hello, world\n'
>>> hellos = repr(hello)
>>> print(hellos)
'hello, world\n'
>>> x = 10 * 3.25
>>> y = 200 * 200
>>> # The argument to repr() may be any Python object:
... repr((x, y, ('spam', 'eggs')))
"(32.5, 40000, ('spam', 'eggs'))"
The following example demonstrates the str.rjust() method of string objects, which right-justifies a string in a field of a given width by padding it with spaces on the left. There are similar methods str.ljust() and str.center(). These methods do not write anything; they just return a new string. If the input string is too long, they don’t truncate it but return it unchanged.
>>> for x in range(1, 11):
...     print(repr(x).rjust(2), repr(x*x).rjust(3), end=' ')
...     # Note use of 'end' on previous line; print() adds a space between its arguments
...     print(repr(x*x*x).rjust(4))
...
 1   1    1
 2   4    8
 3   9   27
 4  16   64
 5  25  125
 6  36  216
 7  49  343
 8  64  512
 9  81  729
10 100 1000
There is another method, str.zfill(), which pads a numeric string on the left with zeros. It understands about plus and minus signs:
>>> '12'.zfill(5)
'00012'
>>> '-3.14'.zfill(7)
'-003.14'
>>> '3.14159265359'.zfill(5)
'3.14159265359'
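Basic usage of the str.format() method looks like this; the brackets ({}) are replaced by the objects passed to format(), taken in order, by position, or by keyword:
>>> print('We are the {} who say "{}!"'.format('knights', 'Ni'))
We are the knights who say "Ni!"
>>> print('{1} and {0}'.format('spam', 'eggs'))
eggs and spam
>>> print('This {food} is {adjective}.'.format(food='spam', adjective='absolutely horrible'))
This spam is absolutely horrible.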
The conversions '!a' (apply ascii()), '!s' (apply str()) and '!r' (apply repr()) can be used to convert the value before it is formatted:
>>> contents = 'eels'
>>> print('My hovercraft is full of {!r}.'.format(contents))
My hovercraft is full of 'eels'.
An optional ‘:’ and format specifier can follow the field name. This allows greater control over how the value is formatted. The following example rounds Pi to three places after the decimal.
>>> import math
>>> print('The value of PI is approximately {0:.3f}.'.format(math.pi))
The value of PI is approximately 3.142.
Passing an integer after the ‘:’ will cause that field to be a minimum number of characters wide.
>>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 7678}
>>> for name, phone in table.items():
...     print('{0:10} ==> {1:10d}'.format(name, phone))
...
Sjoerd     ==>       4127
Jack       ==>       4098
Dcab       ==>       7678
If you have a really long format string that you don’t want to split up, it would be nice if you could reference the variables to be formatted by name instead of by position. This can be done by simply passing the dict and using square brackets '[]' to access the keys:
>>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 8637678}
>>> print('Jack: {0[Jack]:d}; Sjoerd: {0[Sjoerd]:d}; '
...       'Dcab: {0[Dcab]:d}'.format(table))
Jack: 4098; Sjoerd: 4127; Dcab: 8637678
This could also be done by passing the table as keyword arguments with the ‘**’ notation.
>>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 8637678}
>>> print('Jack: {Jack:d}; Sjoerd: {Sjoerd:d}; Dcab: {Dcab:d}'.format(**table))
Jack: 4098; Sjoerd: 4127; Dcab: 8637678
This is particularly useful in combination with the built-in function vars(), which returns a dictionary containing all local variables.
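Here is a small sketch of combining vars() with keyword formatting. The function describe() is hypothetical. Inside a function, calling vars() with no argument returns the local variables as a dict, so the format string can refer to each one by name.

```python
def describe(name, phone):
    # vars() with no argument returns the local namespace as a dict,
    # so each local variable can be referenced by name in the format string.
    return "{name}: {phone:d}".format(**vars())

print(describe("Jack", 4098))
```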
7.2. Reading and Writing Files
open() returns a file object, and is most commonly used with two arguments: f = open(filename, mode).
mode can be ‘r’ when the file will only be read, ‘w’ for only writing (an existing file with the same name will be erased), and ‘a’ opens the file for appending; any data written to the file is automatically added to the end. ‘r+’ opens the file for both reading and writing. The mode argument is optional; ‘r’ will be assumed if it’s omitted.
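Normally, files are opened in text mode, which means you read and write strings from and to the file, encoded in a specific encoding. If encoding is not specified, the default is platform dependent. 'b' appended to the mode opens the file in binary mode: now the data is read and written in the form of bytes objects. This mode should be used for all files that don’t contain text.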
In text mode, the default when reading is to convert platform-specific line endings (\n on Unix, \r\n on Windows) to just \n. When writing in text mode, the default is to convert occurrences of \n back to platform-specific line endings. This behind-the-scenes modification to file data is fine for text files, but will corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files.
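You can observe the line-ending translation directly. This sketch assumes a scratch file in a temporary directory. It writes one line in text mode, then reads it back in binary mode and in text mode. The raw bytes end with os.linesep, which is "\r\n" on Windows and "\n" on Unix.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "workfile.txt")  # hypothetical file name

# Text mode: "\n" is translated to the platform line ending (os.linesep) on write.
with open(path, "w", encoding="utf-8") as f:
    f.write("line\n")

# Binary mode: the raw bytes come back untouched, as a bytes object.
with open(path, "rb") as f:
    raw = f.read()

# Text mode again: platform line endings are translated back to "\n".
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

print(repr(raw), repr(text))
```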
7.2.1. Methods of File Objects
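To read a file’s contents, call f.read(size), which reads some quantity of data and returns it as a string (in text mode) or bytes object (in binary mode). size is an optional numeric argument; when it is omitted or negative, the entire contents of the file are read and returned.
If the end of the file has been reached, f.read() will return an empty string ('').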
f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by ‘\n’, a string containing only a single newline.
For reading lines from a file, you can loop over the file object.
>>> for line in f:
...     print(line, end='')
...
If you want to read all the lines of a file in a list you can also use list(f) or f.readlines().
f.write(string) writes the contents of string to the file, returning the number of characters written.
f.tell() returns an integer giving the file object’s current position in the file represented as number of bytes from the beginning of the file when in binary mode and an opaque number when in text mode.
To change the file object’s position, use f.seek(offset, from_what). The position is computed from adding offset to a reference point; the reference point is selected by the from_what argument. A from_what value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. from_what can be omitted and defaults to 0, using the beginning of the file as the reference point.
>>> f.seek(-3, 2)  # Go to the 3rd byte before the end (file opened in binary mode)
In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)) and the only valid offset values are those returned from the f.tell(), or zero. Any other offset value produces undefined behaviour.
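Here is a runnable version of the seek example. It assumes a scratch file opened in binary mode, where arbitrary byte offsets, including offsets relative to the end, are allowed.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "workfile")  # hypothetical file name

with open(path, "wb+") as f:            # binary mode: byte offsets are exact
    written = f.write(b"0123456789abcdef")
    f.seek(5)                           # 6th byte from the start (from_what=0)
    sixth = f.read(1)
    f.seek(-3, 2)                       # 3rd byte before the end (from_what=2)
    third_from_end = f.read(1)
    position = f.tell()                 # offset just after the byte we read

print(written, sixth, third_from_end, position)
```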
When you’re done with a file, call f.close() to close it and free up any system resources taken up by the open file. After calling f.close(), attempts to use the file object will automatically fail.
>>> f.close()
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file
It is good practice to use the with keyword when dealing with file objects. This has the advantage that the file is properly closed after its suite finishes, even if an exception is raised on the way. It is also much shorter than writing equivalent try-finally blocks:
>>> with open('workfile', 'r') as f:
...     read_data = f.read()
...
>>> f.closed
True
File objects have some additional methods, such as isatty() and truncate() which are less frequently used; consult the Library Reference for a complete guide to file objects.
7.2.2. Saving structured data with json
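Strings can easily be written to and read from a file. Numbers take a bit more effort, since the read() method only returns strings, which have to be passed to a function like int(), which takes a string like '123' and returns its numeric value 123.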
Rather than having users constantly writing and debugging code to save complicated data types to files, Python allows you to use the popular data interchange format called JSON (JavaScript Object Notation). The standard module called json can take Python data hierarchies, and convert them to string representations; this process is called serializing. Reconstructing the data from the string representation is called deserializing. Between serializing and deserializing, the string representing the object may have been stored in a file or data, or sent over a network connection to some distant machine.
The JSON format is commonly used by modern applications to allow for data exchange.
If you have an object x, you can view its JSON string representation with a simple line of code, using the json.dumps() function:
>>> import json
>>> json.dumps([1, 'simple', 'list'])
'[1, "simple", "list"]'
Another variant of the dumps() function, called dump(), simply serializes the object to a text file. So if f is a text file object opened for writing, we can do this:
json.dump(x, f)
To decode the object again, if f is a text file object which has been opened for reading:
x = json.load(f)
This simple serialization technique can handle lists and dictionaries, but serializing arbitrary class instances in JSON requires a bit of extra effort. The reference for the json module contains an explanation of this.
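Here is a round trip through a file, assuming a scratch file in a temporary directory. For class instances, one option among several is the default parameter of json.dumps(). Here it is set to vars, so a hypothetical Point instance is encoded as its attribute dictionary.

```python
import json
import os
import tempfile

data = {"name": "spam", "sizes": [1, 2, 3], "fresh": True}

path = os.path.join(tempfile.mkdtemp(), "data.json")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f)            # serialize to a text file

with open(path, encoding="utf-8") as f:
    restored = json.load(f)       # deserialize it again

class Point:                      # hypothetical example class
    def __init__(self, x, y):
        self.x = x
        self.y = y

# "default" is called for objects json cannot encode; vars() turns the
# instance into its attribute dictionary, which json can encode.
encoded = json.dumps(Point(1, 2), default=vars)
print(restored == data, encoded)
```

Decoding gives back a plain dict, not a Point. Rebuilding the instance, for example with the object_hook parameter of json.loads(), is left to the application.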
Contrary to JSON, pickle is a protocol which allows the serialization of arbitrarily complex Python objects.
8. Errors and Exceptions
8.1. Syntax Errors
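When the parser detects a syntax error, it repeats the offending line and displays a little ‘arrow’ pointing at the earliest point in the line where the error was detected. The error is caused by (or at least detected at) the token preceding the arrow.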
8.2. Exceptions
Errors detected during execution are called exceptions and are not unconditionally fatal: you will soon learn how to handle them in Python programs.
Exceptions come in different types, and the type is printed as part of the message.
Standard exception names are built-in identifiers (not reserved keywords).
For a list of the built-in exceptions and their meanings, see Built-in Exceptions (section 5 of the Library Reference).
8.3. Handling Exceptions
It is possible to write programs that handle selected exceptions. The following example asks the user for input until a valid integer has been entered, but allows the user to interrupt the program (using Control-C or whatever the operating system supports); note that a user-generated interruption is signalled by raising the KeyboardInterrupt exception.
>>> while True:
...     try:
...         x = int(input("Please enter a number: "))
...         break
...     except ValueError:
...         print("Oops! That was no valid number. Try again...")
...
If an exception occurs which does not match the exception named in the except clause, it is passed on to outer try statements; if no handler is found, it is an unhandled exception and execution stops with a message as shown above.
A try statement may have more than one except clause, to specify handlers for different exceptions. At most one handler will be executed. An except clause may name multiple exceptions as a parenthesized tuple.
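Here is a sketch of one handler that names several exceptions as a parenthesized tuple, next to a separate handler. classify() is a hypothetical helper.

```python
def classify(exc):
    try:
        raise exc
    except (RuntimeError, TypeError, NameError):   # one handler, several exceptions
        return "runtime/type/name"
    except ValueError:                             # a separate handler
        return "value"

print(classify(TypeError()), classify(ValueError("bad")))
```

Any other exception, such as a KeyError, matches neither clause and propagates to the caller.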
A class in an except clause is compatible with an exception if it is the same class or a base class thereof (but not the other way around — an except clause listing a derived class is not compatible with a base class). For example, the following code will print B, C, D in that order:
class B(Exception):
    pass

class C(B):
    pass

class D(C):
    pass

for cls in [B, C, D]:
    try:
        raise cls()
    except D:
        print("D")
    except C:
        print("C")
    except B:
        print("B")
Note that if the except clauses were reversed (with except B first), it would have printed B, B, B — the first matching except clause is triggered.
The last except clause may omit the exception name(s), to serve as a wildcard. Use this with extreme caution, since it is easy to mask a real programming error in this way! It can also be used to print an error message and then re-raise the exception (allowing a caller to handle the exception as well):
import sys

try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except OSError as err:
    print("OS error: {0}".format(err))
except ValueError:
    print("Could not convert data to an integer.")
except:
    print("Unexpected error:", sys.exc_info()[0])
    raise
The try … except statement has an optional else clause, which, when present, must follow all except clauses. It is useful for code that must be executed if the try clause does not raise an exception.
import sys

for arg in sys.argv[1:]:
    try:
        f = open(arg, 'r')
    except OSError:
        print('cannot open', arg)
    else:
        print(arg, 'has', len(f.readlines()), 'lines')
        f.close()
The use of the else clause is better than adding additional code to the try clause because it avoids accidentally catching an exception that wasn’t raised by the code being protected by the try … except statement.
When an exception occurs, it may have an associated value, also known as the exception’s argument. The presence and type of the argument depend on the exception type.
The except clause may specify a variable after the exception name. The variable is bound to an exception instance with the arguments stored in instance.args. For convenience, the exception instance defines __str__() so the arguments can be printed directly without having to reference .args.
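Here is a sketch of inspecting the exception instance bound by except ... as inst. Its arguments live in inst.args, and str(inst) formats them directly.

```python
try:
    raise Exception("spam", "eggs")
except Exception as inst:
    kind = type(inst)         # the exception class
    args = inst.args          # arguments stored in .args
    text = str(inst)          # __str__ lets the arguments be printed directly
    x, y = inst.args          # unpack the arguments

print(kind, args, text, x, y)
```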
8.4. Raising Exceptions
The raise statement allows the programmer to force a specified exception to occur.
The sole argument to raise indicates the exception to be raised. This must be either an exception instance or an exception class (a class that derives from Exception).
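>>> raise NameError('HiThere')
If an exception class is passed, it will be implicitly instantiated by calling its constructor with no arguments:
raise ValueError  # shorthand for 'raise ValueError()'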
If you need to determine whether an exception was raised but don’t intend to handle it, a simpler form of the raise statement allows you to re-raise the exception:
>>> try:
...     raise NameError('HiThere')
... except NameError:
...     print('An exception flew by!')
...     raise
...
8.5. User-defined Exceptions
Programs may name their own exceptions by creating a new exception class. Exceptions should typically be derived from the Exception class, either directly or indirectly. When creating a module that can raise several distinct errors, a common practice is to create a base class for exceptions defined by that module, and subclass that to create specific exception classes for different error conditions.
Most exceptions are defined with names that end in “Error,” similar to the naming of the standard exceptions. Many standard modules define their own exceptions to report errors that may occur in functions they define.
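Here is a sketch of the module-level base class pattern. Error, InputError, and parse_age() are hypothetical names. Catching the base class handles every error the module defines.

```python
class Error(Exception):
    """Base class for exceptions raised by this hypothetical module."""

class InputError(Error):
    """Raised for errors in the input."""

    def __init__(self, expression, message):
        super().__init__(expression, message)
        self.expression = expression  # input expression in which the error occurred
        self.message = message        # explanation of the error

def parse_age(text):
    # Hypothetical helper that reports bad input with the module's own exception.
    if not text.isdigit():
        raise InputError(text, "age must be a non-negative integer")
    return int(text)

try:
    parse_age("abc")
except Error as err:          # the base class catches every error from the module
    caught = err

print(type(caught).__name__, caught.message)
```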
8.6. Defining Clean-up Actions
The try statement has another optional clause which is intended to define clean-up actions that must be executed under all circumstances. A finally clause is always executed before leaving the try statement, whether an exception has occurred or not (it is re-raised after the finally clause has been executed).
The finally clause is also executed “on the way out” when any other clause of the try statement is left via a break, continue or return statement.
In real world applications, the finally clause is useful for releasing external resources (such as files or network connections), regardless of whether the use of the resource was successful.
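Here is a sketch showing that the finally clause runs on every exit path: a normal return, a handled exception, and an unhandled one. The function divide() and the list cleanup_log are hypothetical. Appending to the list stands in for releasing a real resource.

```python
cleanup_log = []

def divide(x, y):
    try:
        result = x / y
    except ZeroDivisionError:
        return "division by zero!"
    else:
        return result
    finally:
        # Runs on every path out of the try statement, including the
        # return statements above and any exception that is not handled.
        cleanup_log.append((x, y))

print(divide(2, 1), divide(2, 0))
try:
    divide("2", "1")          # TypeError is not handled, but finally still runs
except TypeError:
    pass
print(cleanup_log)
```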
8.7. Predefined Clean-up Actions
Some objects define standard clean-up actions to be undertaken when the object is no longer needed, regardless of whether or not the operation using the object succeeded or failed.
with open("myfile.txt") as f:
    for line in f:
        print(line, end="")
After the statement is executed, the file f is always closed, even if a problem was encountered while processing the lines.
9. Classes
Python classes provide all the standard features of object-oriented programming: the class inheritance mechanism allows multiple base classes, a derived class can override any methods of its base class or classes, and a method can call the method of a base class with the same name. Objects can contain arbitrary amounts and kinds of data.
① The method function is declared with an explicit first argument representing the object, which is provided implicitly by the call. ② Classes themselves are objects; this provides semantics for importing and renaming.
③ Most built-in operators with special syntax (arithmetic operators, subscripting, etc.) can be redefined for class instances. ④ Normally, class members (including the data members) are public (see Private Variables for the exceptions), and all member functions are virtual. ⑤ Built-in types can be used as base classes for extension by the user.
9.1. A Word About Names and Objects
Objects have individuality, and multiple names (in multiple scopes) can be bound to the same object. This is known as aliasing in other languages. It can be safely ignored when dealing with immutable basic types (numbers, strings, tuples). However, aliasing has a possibly surprising effect on the semantics of Python code involving mutable objects such as lists, dictionaries, and most other types.
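Here is a short sketch of aliasing. Two names bound to one list see each other's changes. "Changing" an immutable string through one name only rebinds that name.

```python
# Two names bound to the same mutable list object (aliasing).
a = [1, 2, 3]
b = a
b.append(4)          # mutating through one name...
print(a, a is b)     # ...is visible through the other: [1, 2, 3, 4] True

# Immutable objects cannot be changed in place, so "modifying" rebinds the name.
s = "spam"
t = s
t += " and eggs"     # builds a new string and rebinds t only
print(s, t)          # spam spam and eggs
```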
9.2. Python Scopes and Namespaces
A namespace is a mapping from names to objects. Examples of namespaces are: the set of built-in names (containing functions such as abs(), and built-in exception names); the global names in a module; and the local names in a function invocation. In a sense the set of attributes of an object also form a namespace. The important thing to know about namespaces is that there is absolutely no relation between names in different namespaces.
We use the word attribute for any name following a dot; for example, in the expression z.real, real is an attribute of the object z. Strictly speaking, references to names in modules are attribute references: in the expression modname.funcname, modname is a module object and funcname is an attribute of it.
Module objects have a secret read-only attribute called __dict__ which returns the dictionary used to implement the module’s namespace; the name __dict__ is an attribute but not a global name.
Attributes may be read-only or writable. In the latter case, assignment to attributes is possible. Module attributes are writable: you can write modname.the_answer = 42. Writable attributes may also be deleted with the del statement. For example, del modname.the_answer will remove the attribute the_answer from the object named by modname.
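Here is a sketch of writing and deleting a module attribute. It uses a throwaway module object created with types.ModuleType instead of a real import. The assigned attribute shows up in the module's __dict__.

```python
import types

# A hypothetical module object created on the fly (normally it would come from import).
modname = types.ModuleType("modname")

modname.the_answer = 42                           # module attributes are writable
in_namespace = "the_answer" in modname.__dict__   # __dict__ is the namespace itself
del modname.the_answer                            # writable attributes can be deleted
print(in_namespace, hasattr(modname, "the_answer"))
```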
Namespaces are created at different moments and have different lifetimes. The namespace containing the built-in names is created when the Python interpreter starts up, and is never deleted. The global namespace for a module is created when the module definition is read in; normally, module namespaces also last until the interpreter quits. The statements executed by the top-level invocation of the interpreter, either read from a script file or interactively, are considered part of a module called __main__, so they have their own global namespace. (The built-in names actually also live in a module; this is called builtins.)
The local namespace for a function is created when the function is called, and deleted when the function returns or raises an exception that is not handled within the function. Of course, recursive invocations each have their own local namespace.
A scope is a textual region of a Python program where a namespace is directly accessible. “Directly accessible” here means that an unqualified reference to a name attempts to find the name in the namespace.
Although scopes are determined statically, they are used dynamically. At any time during execution, there are at least three nested scopes whose namespaces are directly accessible:
- the innermost scope, which is searched first, contains the local names
- the scopes of any enclosing functions, which are searched starting with the nearest enclosing scope, contain names that are non-local but also non-global
- the next-to-last scope contains the current module’s global names
- the outermost scope (searched last) is the namespace containing built-in names
If a name is declared global, then all references and assignments go directly to the middle scope containing the module’s global names. To rebind variables found outside of the innermost scope, the nonlocal statement can be used; if not declared nonlocal, those variables are read-only (an attempt to write to such a variable will simply create a new local variable in the innermost scope, leaving the identically named outer variable unchanged).
Class definitions place yet another namespace in the local scope.
It is important to realize that scopes are determined textually: the global scope of a function defined in a module is that module’s namespace, no matter from where or by what alias the function is called. On the other hand, the actual search for names is done dynamically, at run time.
A special quirk of Python is that, if no global or nonlocal statement is in effect, assignments to names always go into the innermost scope. Assignments do not copy data; they just bind names to objects. The same is true for deletions: the statement del x removes the binding of x from the namespace referenced by the local scope. In fact, all operations that introduce new names use the local scope: in particular, import statements and function definitions bind the module or function name in the local scope.
The global statement can be used to indicate that particular variables live in the global scope and should be rebound there; the nonlocal statement indicates that particular variables live in an enclosing scope and should be rebound there. In other words, nonlocal tells Python not to bind the name in the current scope, but to look it up in the enclosing (parent) function’s scope and bind the object there.
9.2.1. Scopes and Namespaces Example
def scope_test():
    def do_local():
        spam = "local spam"

    def do_nonlocal():
        nonlocal spam
        spam = "nonlocal spam"

    def do_global():
        global spam
        spam = "global spam"

    spam = "test spam"
    do_local()
    print("After local assignment:", spam)
    do_nonlocal()
    print("After nonlocal assignment:", spam)
    do_global()
    print("After global assignment:", spam)

scope_test()
print("In global scope:", spam)
The output of the example code is:
After local assignment: test spam
After nonlocal assignment: nonlocal spam
After global assignment: nonlocal spam
In global scope: global spam
9.3. A First Look at Classes
9.3.1. Class Definition Syntax
class ClassName:
    <statement-1> ... <statement-N>
Class definitions, like function definitions (def statements) must be executed before they have any effect. (You could conceivably place a class definition in a branch of an if statement, or inside a function.)
When a class definition is entered, a new namespace is created, and used as the local scope — thus, all assignments to local variables go into this new namespace.
When a class definition is left normally (via the end), a class object is created. This is basically a wrapper around the contents of the namespace created by the class definition. The original local scope (the one in effect just before the class definition was entered) is reinstated, and the class object is bound there to the class name given in the class definition header (ClassName in the example).
9.3.2. Class Objects
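Class objects support two kinds of operations: attribute references and instantiation. Attribute references use the standard syntax used for all attribute references in Python: obj.name. Valid attribute names are all the names that were in the class’s namespace when the class object was created. So, given the class definition below, MyClass.i and MyClass.f are valid attribute references, returning an integer and a function object, respectively. Class attributes can also be assigned to, so you can change the value of MyClass.i by assignment. __doc__ is also a valid attribute, returning the docstring belonging to the class: “A simple example class”.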
class MyClass:
    """A simple example class"""
    i = 12345

    def f(self):
        return 'hello world'
Class instantiation uses function notation. Just pretend that the class object is a parameterless function that returns a new instance of the class. For example, x = MyClass() creates a new instance of the class and assigns this object to the local variable x. Many classes like to create objects with instances customized to a specific initial state. Therefore a class may define a special method named __init__(), like this:
def __init__(self):
    self.data = []
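When a class defines an __init__() method, class instantiation automatically invokes __init__() for the newly created class instance. Of course, the __init__() method may have arguments for greater flexibility; in that case, arguments given to the class instantiation operator are passed on to __init__(). For example: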
>>> class Complex:
...     def __init__(self, realpart, imagpart):
...         self.r = realpart
...         self.i = imagpart
...
>>> x = Complex(3.0, -4.5)
>>> x.r, x.i
(3.0, -4.5)
9.3.3. Instance Objects
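The only operations understood by instance objects are attribute references. There are two kinds of valid attribute names: data attributes and methods.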
Data attributes need not be declared; like local variables, they spring into existence when they are first assigned to.
x.counter = 1
while x.counter < 10:
    x.counter = x.counter * 2
print(x.counter)
del x.counter
The other kind of instance attribute reference is a method. A method is a function that “belongs to” an object.
But x.f is not the same thing as MyClass.f — it is a method object, not a function object.
9.3.4. Method Objects
Usually, a method is called right after it is bound: x.f()
However, it is not necessary to call a method right away: x.f is a method object, and can be stored away and called at a later time. For example:
xf = x.f
print(xf())
The special thing about methods is that the instance object is passed as the first argument of the function. In our example, the call x.f() is exactly equivalent to MyClass.f(x). In general, calling a method with a list of n arguments is equivalent to calling the corresponding function with an argument list that is created by inserting the method’s instance object before the first argument.
When an instance attribute is referenced that isn’t a data attribute, its class is searched. If the name denotes a valid class attribute that is a function object, a method object is created by packing (pointers to) the instance object and the function object just found together in an abstract object: this is the method object. When the method object is called with an argument list, a new argument list is constructed from the instance object and the argument list, and the function object is called with this new argument list.
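Here is a sketch of the mechanism just described, reusing the MyClass example. The bound method object records the instance in __self__ and the class's function object in __func__. Calling it is equivalent to calling that function with the instance prepended.

```python
class MyClass:
    """A simple example class"""
    i = 12345

    def f(self):
        return 'hello world'

x = MyClass()

xf = x.f                                   # a bound method object, stored for later
same_call = x.f() == MyClass.f(x)          # x.f() is equivalent to MyClass.f(x)
bound_to_x = xf.__self__ is x              # the instance packed into the method
same_function = xf.__func__ is MyClass.f   # the function object found on the class
print(xf(), same_call, bound_to_x, same_function)
```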
9.3.5. Class and Instance Variables
Generally speaking, instance variables are for data unique to each instance and class variables are for attributes and methods shared by all instances of the class.
class Dog:
    kind = 'canine'         # class variable shared by all instances

    def __init__(self, name):
        self.name = name    # instance variable unique to each instance
Shared data can have possibly surprising effects when mutable objects such as lists and dictionaries are involved. For example, the tricks list in the following code should not be used as a class variable because just a single list would be shared by all Dog instances:
class Dog:
    tricks = []             # mistaken use of a class variable

    def __init__(self, name):
        self.name = name

    def add_trick(self, trick):
        self.tricks.append(trick)
Correct design of the class should use an instance variable instead:
class Dog:
    def __init__(self, name):
        self.name = name
        self.tricks = []    # creates a new empty list for each dog

    def add_trick(self, trick):
        self.tricks.append(trick)
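Side by side, the two designs behave differently. In this sketch the mistaken version is renamed SharedDog, a hypothetical name, so both classes can coexist.

```python
class SharedDog:                 # the mistaken design: one list for every dog
    tricks = []

    def __init__(self, name):
        self.name = name

    def add_trick(self, trick):
        self.tricks.append(trick)

class Dog:                       # the corrected design: one list per dog
    def __init__(self, name):
        self.name = name
        self.tricks = []

    def add_trick(self, trick):
        self.tricks.append(trick)

shared_a, shared_b = SharedDog("Fido"), SharedDog("Buddy")
shared_a.add_trick("roll over")
shared_b.add_trick("play dead")
print(shared_a.tricks)           # ['roll over', 'play dead']: unexpectedly shared

own_a, own_b = Dog("Fido"), Dog("Buddy")
own_a.add_trick("roll over")
own_b.add_trick("play dead")
print(own_a.tricks, own_b.tricks)  # ['roll over'] ['play dead']
```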
9.4. Random Remarks
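Data attributes override method attributes with the same name; to avoid accidental name conflicts, which may cause hard-to-find bugs in large programs, it is wise to use some kind of convention that minimizes the chance of conflicts. Possible conventions include capitalizing method names, prefixing data attribute names with a small unique string (perhaps just an underscore), or using verbs for methods and nouns for data attributes.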
Data attributes may be referenced by methods as well as by ordinary users (“clients”) of an object. In other words, classes are not usable to implement pure abstract data types. In fact, nothing in Python makes it possible to enforce data hiding — it is all based upon convention. (On the other hand, the Python implementation, written in C, can completely hide implementation details and control access to an object if necessary; this can be used by extensions to Python written in C.)
Clients should use data attributes with care — clients may mess up invariants maintained by the methods by stamping on their data attributes. Note that clients may add data attributes of their own to an instance object without affecting the validity of the methods, as long as name conflicts are avoided — again, a naming convention can save a lot of headaches here.
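There is no shorthand for referencing data attributes (or other methods!) from within methods; they are always reached through the instance, as in self.name. This makes it impossible to confuse local variables with instance variables when reading a method.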
By convention, the first argument of a method is called self. This is nothing more than a convention: the name self has no special meaning to Python.
Any function object that is a class attribute defines a method for instances of that class. It is not necessary that the function definition is textually enclosed in the class definition: assigning a function object to a local variable in the class is also ok.
# Function defined outside the class
def f1(self, x, y):
    return min(x, x+y)

class C:
    f = f1

    def g(self):
        return 'hello world'

    h = g
Now f, g and h are all attributes of class C that refer to function objects, and consequently they are all methods of instances of C.
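A quick sketch confirming that all three names behave as ordinary methods; the instance name obj and the arguments 3 and -1 are arbitrary choices for illustration.

```python
# Function defined outside the class
def f1(self, x, y):
    return min(x, x+y)

class C:
    f = f1

    def g(self):
        return 'hello world'

    h = g

obj = C()
print(obj.f(3, -1))   # f1(obj, 3, -1) returns min(3, 2), so this prints 2
print(obj.g())        # hello world
print(obj.h())        # hello world, since h is just another name for g
print(C.h is C.g)     # True: both names refer to the same function object
```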
Methods may reference global names in the same way as ordinary functions. The global scope associated with a method is the module containing its definition. (A class is never used as a global scope.)
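Although there is rarely a good reason to use global data in a method, the global scope has many legitimate uses: functions and modules imported into the global scope can be used by methods, as can functions and classes defined in it.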
Each value is an object, and therefore has a class (also called its type). It is stored as object.__class__.
9.5. Inheritance
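The syntax for a derived class definition looks like this:
class DerivedClassName(BaseClassName):
    <statement-1>
    ...
    <statement-N>
The name BaseClassName must be defined in a scope containing the derived class definition. Other arbitrary expressions are also allowed in place of a base class name; this is useful, for example, when the base class is defined in another module:
class DerivedClassName(modname.BaseClassName):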
When the class object is constructed, the base class is remembered. This is used for resolving attribute references: if a requested attribute is not found in the class, the search proceeds to look in the base class. This rule is applied recursively if the base class itself is derived from some other class.
Method references are resolved as follows: the corresponding class attribute is searched, descending down the chain of base classes if necessary, and the method reference is valid if this yields a function object.
Derived classes may override methods of their base classes. Because methods have no special privileges when calling other methods of the same object, a method of a base class that calls another method defined in the same base class may end up calling a method of a derived class that overrides it. (For C++ programmers: all methods in Python are effectively virtual.)
An overriding method in a derived class may in fact want to extend rather than simply replace the base class method of the same name. There is a simple way to call the base class method directly: just call BaseClassName.methodname(self, arguments). This is occasionally useful to clients as well. (Note that this only works if the base class is accessible as BaseClassName in the global scope.)
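The sketch below, built on hypothetical Base and Derived classes, shows both points from the two paragraphs above: Base.greet() ends up calling the overriding name() of the derived class, and Derived.greet() extends the base version by calling Base.greet(self) directly.

```python
class Base:
    def greet(self):
        # calls whichever name() the instance's class provides
        return 'Hello from ' + self.name()

    def name(self):
        return 'Base'

class Derived(Base):
    def name(self):              # overrides Base.name
        return 'Derived'

    def greet(self):
        # extend rather than replace: call the base class method directly
        return Base.greet(self) + '!'

print(Base().greet())     # Hello from Base
print(Derived().greet())  # Hello from Derived!
```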
Python has two built-in functions that work with inheritance:
- Use isinstance() to check an instance’s type: isinstance(obj, int) will be True only if obj.__class__ is int or some class derived from int.
- Use issubclass() to check class inheritance: issubclass(bool, int) is True since bool is a subclass of int. However, issubclass(float, int) is False since float is not a subclass of int.
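A small sketch of both functions; MyInt is a hypothetical subclass of int introduced only for this example.

```python
class MyInt(int):      # hypothetical subclass of int
    pass

n = MyInt(5)
print(isinstance(n, int))            # True: MyInt derives from int
print(isinstance(5.0, int))          # False
print(issubclass(bool, int))         # True
print(issubclass(float, int))        # False
print(isinstance(True, (str, int)))  # True: a tuple of classes is also accepted
```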
9.5.1. Multiple Inheritance
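Python supports a form of multiple inheritance as well. A class definition with multiple base classes looks like this:
class DerivedClassName(Base1, Base2, Base3):
    <statement-1>
    ...
    <statement-N>
For most purposes, in the simplest cases, you can think of the search for attributes inherited from a parent class as depth-first, left-to-right, not searching twice in the same class where there is an overlap in the hierarchy. Thus, if an attribute is not found in DerivedClassName, it is searched for in Base1, then (recursively) in the base classes of Base1, and if it was not found there, it is searched for in Base2, and so on.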
In fact, it is slightly more complex than that; the method resolution order changes dynamically to support cooperative calls to super(). This approach is known in some other multiple-inheritance languages as call-next-method and is more powerful than the super call found in single-inheritance languages.
Dynamic ordering is necessary because all cases of multiple inheritance exhibit one or more diamond relationships (where at least one of the parent classes can be accessed through multiple paths from the bottommost class): every class inherits from object, so any case of multiple inheritance provides more than one path to reach object. To keep the base classes from being accessed more than once, the dynamic algorithm linearizes the search order in a way that preserves the left-to-right ordering specified in each class, that calls each parent only once, and that is monotonic (meaning that a class can be subclassed without affecting the precedence order of its parents).
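A minimal diamond, using hypothetical classes A, B, C and D, illustrates the linearized order and the cooperative super() calls described above. Each class prepends its own name and then delegates to the next class in D's method resolution order, so A is visited exactly once.

```python
class A:
    def who(self):
        return ['A']

class B(A):
    def who(self):
        return ['B'] + super().who()

class C(A):
    def who(self):
        return ['C'] + super().who()

class D(B, C):
    def who(self):
        return ['D'] + super().who()

print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
print(D().who())                            # ['D', 'B', 'C', 'A']
```

Note that super() inside B hands control to C rather than to A, because the next class is taken from the MRO of the instance's class (D), not from B's own bases.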
9.6. Private Variables
“Private” instance variables that cannot be accessed except from inside an object don’t exist in Python. However, there is a convention that is followed by most Python code: a name prefixed with an underscore (e.g. _spam) should be treated as a non-public part of the API (whether it is a function, a method or a data member). It should be considered an implementation detail and subject to change without notice.
Since there is a valid use-case for class-private members (namely to avoid name clashes of names with names defined by subclasses), there is limited support for such a mechanism, called name mangling. Any identifier of the form __spam (at least two leading underscores, at most one trailing underscore) is textually replaced with _classname__spam, where classname is the current class name with leading underscore(s) stripped. This mangling is done without regard to the syntactic position of the identifier, as long as it occurs within the definition of a class.
Name mangling is helpful for letting subclasses override methods without breaking intraclass method calls.
class Mapping:
    def __init__(self, iterable):
        self.items_list = []
        self.__update(iterable)

    def update(self, iterable):
        for item in iterable:
            self.items_list.append(item)

    __update = update   # private copy of original update() method

class MappingSubclass(Mapping):

    def update(self, keys, values):
        # provides new signature for update()
        # but does not break __init__()
        for item in zip(keys, values):
            self.items_list.append(item)
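To see the mangling at work, the sketch below repeats the two classes and exercises them; the sample data ['a', 'b'] and the pair ('c', 3) are arbitrary.

```python
class Mapping:
    def __init__(self, iterable):
        self.items_list = []
        self.__update(iterable)          # compiled as self._Mapping__update(...)

    def update(self, iterable):
        for item in iterable:
            self.items_list.append(item)

    __update = update                    # private copy of original update() method

class MappingSubclass(Mapping):
    def update(self, keys, values):      # new signature for update()
        for item in zip(keys, values):
            self.items_list.append(item)

m = MappingSubclass(['a', 'b'])          # __init__ still uses Mapping's private copy
m.update(['c'], [3])                     # the subclass's two-argument update()
print(m.items_list)                      # ['a', 'b', ('c', 3)]
print(hasattr(Mapping, '_Mapping__update'))  # True: the mangled name
print(hasattr(Mapping, '__update'))          # False: no attribute by the unmangled name
```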
Note that the mangling rules are designed mostly to avoid accidents; it still is possible to access or modify a variable that is considered private. This can even be useful in special circumstances, such as in the debugger.
Notice that code passed to exec() or eval() does not consider the classname of the invoking class to be the current class; this is similar to the effect of the global statement, the effect of which is likewise restricted to code that is byte-compiled together. The same restriction applies to getattr(), setattr() and delattr(), as well as when referencing __dict__ directly.
9.7. Odds and Ends
Sometimes it is useful to have a data type similar to the Pascal “record” or C “struct”, bundling together a few named data items. An empty class definition will do nicely:
class Employee:
    pass

john = Employee()  # Create an empty employee record

# Fill the fields of the record
john.name = 'John Doe'
john.dept = 'computer lab'
john.salary = 1000
A piece of Python code that expects a particular abstract data type can often be passed a class that emulates the methods of that data type instead. For instance, if you have a function that formats some data from a file object, you can define a class with methods read() and readline() that get the data from a string buffer instead, and pass it as an argument.
Instance method objects have attributes, too: m.__self__ is the instance object with the method m(), and m.__func__ is the function object corresponding to the method.
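A short sketch of these two attributes, using a hypothetical Greeter class:

```python
class Greeter:                          # hypothetical example class
    def hello(self):
        return 'hi'

g = Greeter()
m = g.hello                             # a bound method object
print(m.__self__ is g)                  # True: the instance the method is bound to
print(m.__func__ is Greeter.hello)      # True: the underlying function object
print(m.__func__(g))                    # hi, the same as calling m()
```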
9.8. Exceptions Are Classes Too
9.9. Iterators
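By now you have probably noticed that most container objects, including lists, tuples, dictionaries, strings and open files, can be looped over with a for statement of the form for element in container:.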
Behind the scenes, the for statement calls iter() on the container object. The function returns an iterator object that defines the method __next__(), which accesses elements in the container one at a time. When there are no more elements, __next__() raises a StopIteration exception, which tells the for loop to terminate. You can call the __next__() method using the next() built-in function.
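The sketch below performs by hand what a for loop does internally; the string 'abc' is just a convenient container.

```python
s = 'abc'
it = iter(s)          # ask the container for an iterator
print(next(it))       # a
print(next(it))       # b
print(next(it))       # c
try:
    next(it)          # no more elements
except StopIteration:
    print('StopIteration raised: a for loop would stop here')
```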
Having seen the mechanics behind the iterator protocol, it is easy to add iterator behavior to your classes. Define an __iter__() method which returns an object with a __next__() method. If the class defines __next__(), then __iter__() can just return self:
class Reverse:
    """Iterator for looping over a sequence backwards."""
    def __init__(self, data):
        self.data = data
        self.index = len(data)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]
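Using the Reverse class looks like this; the word 'spam' is arbitrary. Because __iter__() returns self, each Reverse object can be iterated over only once.

```python
class Reverse:
    """Iterator for looping over a sequence backwards."""
    def __init__(self, data):
        self.data = data
        self.index = len(data)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]

rev = Reverse('spam')
print(iter(rev) is rev)   # True: the object is its own iterator
for char in rev:
    print(char)           # m, a, p, s on separate lines
```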
9.10. Generators
Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called on it, the generator resumes where it left off (it remembers all the data values and which statement was last executed).
def reverse(data):
    for index in range(len(data)-1, -1, -1):
        yield data[index]

>>> for char in reverse('golf'):
...     print(char)
...
f
l
o
g
Anything that can be done with generators can also be done with class-based iterators as described in the previous section. What makes generators so compact is that the __iter__() and __next__() methods are created automatically.
Another key feature is that the local variables and execution state are automatically saved between calls.
In addition to automatic method creation and saving program state, when generators terminate, they automatically raise StopIteration.
9.11. Generator Expressions
Some simple generators can be coded succinctly as expressions using a syntax similar to list comprehensions but with parentheses instead of brackets. These expressions are designed for situations where the generator is used right away by an enclosing function. Generator expressions are more compact but less versatile than full generator definitions and tend to be more memory friendly than equivalent list comprehensions.
>>> sum(i*i for i in range(10))                 # sum of squares
285

>>> xvec = [10, 20, 30]
>>> yvec = [7, 5, 3]
>>> sum(x*y for x, y in zip(xvec, yvec))         # dot product
260

>>> from math import pi, sin
>>> sine_table = dict((x, sin(x*pi/180)) for x in range(0, 91))

>>> unique_words = set(word for line in page for word in line.split())

>>> valedictorian = max((student.gpa, student.name) for student in graduates)

>>> data = 'golf'
>>> list(data[i] for i in range(len(data)-1, -1, -1))
['f', 'l', 'o', 'g']
10. Brief Tour of the Standard Library
The built-in dir() and help() functions are useful as interactive aids for working with large modules like os:
>>> import os
>>> dir(os)
<returns a list of all module functions>
>>> help(os)
<returns an extensive manual page created from the module's docstrings>
10.1. Operating System Interface
The os module provides dozens of functions for interacting with the operating system:
Be sure to use the import os style instead of from os import *. This will keep os.open() from shadowing the built-in open() function which operates much differently.
import os
os.getcwd()                        # Return the current working directory
os.chdir('/server/accesslogs')     # Change current working directory
os.system('mkdir today')           # Run the command mkdir in the system shell
For daily file and directory management tasks, the shutil module provides a higher level interface that is easier to use:
import shutil
shutil.copyfile('data.db', 'archive.db')
shutil.move('/build/executables', 'installdir')
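The shutil calls above operate on real paths. The following sketch runs the same two operations inside a throwaway directory created with tempfile, so it can be tried safely; the file and directory names are hypothetical.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:   # removed automatically afterwards
    src = os.path.join(workdir, 'data.db')
    with open(src, 'w') as f:
        f.write('example contents')

    shutil.copyfile(src, os.path.join(workdir, 'archive.db'))   # copy the file
    os.mkdir(os.path.join(workdir, 'installdir'))
    shutil.move(src, os.path.join(workdir, 'installdir'))       # move into an existing directory

    listing = sorted(os.listdir(workdir))
    moved = os.listdir(os.path.join(workdir, 'installdir'))

print(listing)   # ['archive.db', 'installdir']
print(moved)     # ['data.db']
```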
10.2. File Wildcards
The glob module provides a function for making file lists from directory wildcard searches:
>>> import glob
>>> glob.glob('*.py')
['primes.py', 'random.py', 'quote.py']
10.3. Command Line Arguments
Common utility scripts often need to process command line arguments. These arguments are stored in the sys module’s argv attribute as a list, whose first element is the script name.
The getopt module processes sys.argv using the conventions of the Unix getopt() function. More powerful and flexible command line processing is provided by the argparse module.
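A minimal argparse sketch follows. The program name 'top', the -l/--lines option and the file names are assumptions for illustration; an explicit argument list is passed to parse_args() so the example runs without a real command line (by default it reads sys.argv[1:]).

```python
import argparse

parser = argparse.ArgumentParser(
    prog='top',
    description='Show top lines from each file')
parser.add_argument('filenames', nargs='+')               # one or more positional names
parser.add_argument('-l', '--lines', type=int, default=10)

# Normally parse_args() is called with no arguments and reads sys.argv[1:].
args = parser.parse_args(['--lines=5', 'alpha.txt', 'beta.txt'])
print(args.lines, args.filenames)   # 5 ['alpha.txt', 'beta.txt']
```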
10.4. Error Output Redirection and Program Termination
The sys module also has attributes for stdin, stdout, and stderr. The latter is useful for emitting warnings and error messages to make them visible even when stdout has been redirected:
>>> import sys
>>> sys.stderr.write('Warning, log file not found starting a new one\n')
Warning, log file not found starting a new one
The most direct way to terminate a script is to use sys.exit().
10.5. String Pattern Matching
The re module provides regular expression tools for advanced string processing.
>>> import re
>>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest')
['foot', 'fell', 'fastest']
>>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat')
'cat in the hat'
When only simple capabilities are needed, string methods are preferred because they are easier to read and debug.
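For instance, a plain substitution reads more clearly as a string method than as a regular expression; the sample text is illustrative.

```python
import re

text = 'tea for too'
print(text.replace('too', 'two'))        # tea for two
print(re.sub(r'\btoo\b', 'two', text))   # same result, but harder to read
```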
10.6. Mathematics
The math module gives access to the underlying C library functions for floating point math.
import math
math.cos(math.pi / 4)
math.log(1024, 2)
The random module provides tools for making random selections:
import random
random.choice(['apple', 'pear', 'banana'])
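A few more of the module's selection tools are sketched below. The example uses a dedicated random.Random instance seeded with 42 (an arbitrary choice) so that runs are repeatable; the assertions check properties of the results rather than specific values.

```python
import random

rng = random.Random(42)                          # seeded generator for repeatable runs
fruit = rng.choice(['apple', 'pear', 'banana'])  # one element
sample = rng.sample(range(100), 10)              # 10 distinct values, sampling without replacement
x = rng.random()                                 # a float in [0.0, 1.0)
n = rng.randrange(6)                             # an integer chosen from range(6)
print(fruit, sample, x, n)
```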
The statistics module calculates basic statistical properties (the mean, median, variance, etc.) of numeric data:
>>> import statistics
>>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
>>> statistics.mean(data)
>>> statistics.median(data)
>>> statistics.variance(data)
The SciPy project <https://scipy.org> has many other modules for numerical computations.
10.7. Internet Access
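There are a number of modules for accessing the internet and processing internet protocols. Two of the simplest are the urllib.request module for retrieving data from URLs and the smtplib module for sending mail: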
>>> from urllib.request import urlopen
>>> with urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl') as response:
...     for line in response:
...         line = line.decode('utf-8')  # Decoding the binary data to text.
...         if 'EST' in line or 'EDT' in line:  # look for Eastern Time
...             print(line)

Nov. 25, 09:43:32 PM EST
>>> import smtplib
>>> server = smtplib.SMTP('localhost')
>>> server.sendmail('soothsayer@example.org', 'jcaesar@example.org',
... """To: jcaesar@example.org
... From: soothsayer@example.org
...
... Beware the Ides of March.
... """)
>>> server.quit()
(Note that the second example needs a mail server running on localhost.)
10.8. Dates and Times
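The datetime module supplies classes for manipulating dates and times in both simple and complex ways. While date and time arithmetic is supported, the focus of the implementation is on efficient member extraction for output formatting and manipulation.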
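>>> # dates support calendar arithmetic
>>> from datetime import date
>>> now = date(2003, 12, 2)     # a fixed date; date.today() returns the current date
>>> birthday = date(1964, 7, 31)
>>> age = now - birthday
>>> age.days
14368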
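A slightly fuller sketch of formatting and arithmetic with date objects. It uses a fixed date so the output is reproducible, and sticks to numeric strftime codes, which do not depend on the current locale.

```python
from datetime import date, timedelta

d = date(2003, 12, 2)                  # fixed date so the output is reproducible
print(d.isoformat())                   # 2003-12-02
print(d.strftime('%d/%m/%Y'))          # 02/12/2003
print(d.weekday())                     # 1, i.e. Tuesday (Monday is 0)
print(d + timedelta(days=30))          # 2004-01-01
```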
10.9. Data Compression
Common data archiving and compression formats are directly supported by modules including: zlib, gzip, bz2, lzma, zipfile and tarfile.
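As a small example of one of these modules, zlib can compress and restore a byte string losslessly; the sample sentence is arbitrary.

```python
import zlib

s = b'witch which has which witches wrist watch'
t = zlib.compress(s)
print(len(s), len(t))             # compare the original and compressed sizes
print(zlib.decompress(t) == s)    # True: lossless round trip
print(zlib.crc32(s))              # a 32-bit checksum of the data
```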
10.10. Performance Measurement
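For example, it may be tempting to use tuple packing and unpacking instead of the traditional approach to swapping two variables. The timeit module quickly demonstrates a modest performance advantage: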
>>> from timeit import Timer
>>> Timer('t=a; a=b; b=t', 'a=1; b=2').timeit()
0.57535828626024577
>>> Timer('a,b = b,a', 'a=1; b=2').timeit()
0.54962537085770791
In contrast to timeit’s fine level of granularity, the profile and pstats modules provide tools for identifying time-critical sections in larger blocks of code.
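A sketch of that workflow using cProfile, the C implementation of the profile module's interface, together with pstats. The function slow_sum is a hypothetical stand-in for the code being investigated, and the report is written to a StringIO buffer so it can be inspected.

```python
import cProfile
import io
import pstats

def slow_sum(n):                       # hypothetical function to profile
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats('cumulative').print_stats(5)   # top 5 entries by cumulative time
report = buffer.getvalue()
print(report)
```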
10.11. Quality Control
One approach for developing high quality software is to write tests for each function as it is developed and to run those tests frequently during the development process.
The doctest module provides a tool for scanning a module and validating tests embedded in a program’s docstrings.
def average(values):
    """Computes the arithmetic mean of a list of numbers.

    >>> print(average([20, 30, 70]))
    40.0
    """
    return sum(values) / len(values)

import doctest
doctest.testmod()   # automatically validate the embedded tests
The unittest module is not as effortless as the doctest module, but it allows a more comprehensive set of tests to be maintained in a separate file:
import unittest
class TestStatisticalFunctions(unittest.TestCase):
    def test_average(self):
        self.assertEqual(average([20, 30, 70]), 40.0)
        self.assertEqual(round(average([1, 5, 7]), 1), 4.3)
        with self.assertRaises(ZeroDivisionError):
            average([])
        with self.assertRaises(TypeError):
            average(20, 30, 70)
unittest.main() # Calling from the command line invokes all tests
10.12. Batteries Included
- The xmlrpc.client and xmlrpc.server modules make implementing remote procedure calls into an almost trivial task. Despite the modules’ names, no direct knowledge or handling of XML is needed.
- The email package is a library for managing email messages, including MIME and other RFC 2822-based message documents. Unlike smtplib and poplib which actually send and receive messages, the email package has a complete toolset for building or decoding complex message structures (including attachments) and for implementing internet encoding and header protocols.
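A sketch of building a multipart message with email.message.EmailMessage, without sending it. The addresses come from the smtplib example earlier; the subject line and the attachment bytes and file name are hypothetical.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg['From'] = 'soothsayer@example.org'
msg['To'] = 'jcaesar@example.org'
msg['Subject'] = 'A warning'
msg.set_content('Beware the Ides of March.\n')
msg.add_attachment(b'\x00\x01\x02', maintype='application',
                   subtype='octet-stream', filename='omen.bin')   # hypothetical attachment

raw = msg.as_string()           # the full MIME text, which smtplib could then send
print(msg.get_content_type())   # multipart/mixed once an attachment is added
```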
- The json package provides robust support for parsing this popular data interchange format. The csv module supports direct reading and writing of files in Comma-Separated Value format, commonly supported by databases and spreadsheets. XML processing is supported by the xml.etree.ElementTree, xml.dom and xml.sax packages. Together, these modules and packages greatly simplify data interchange between Python applications and other tools.
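A round-trip sketch for json and csv. It reuses the employee record fields from the Odds and Ends section, and an in-memory StringIO stands in for a real .csv file.

```python
import csv
import io
import json

record = {'name': 'John Doe', 'dept': 'computer lab', 'salary': 1000}

text = json.dumps(record)            # Python dict -> JSON string
restored = json.loads(text)          # JSON string -> Python dict

buffer = io.StringIO()               # stands in for a real .csv file
writer = csv.DictWriter(buffer, fieldnames=['name', 'dept', 'salary'])
writer.writeheader()
writer.writerow(record)

buffer.seek(0)
rows = list(csv.DictReader(buffer))  # every CSV field is read back as a string
print(restored == record, rows[0]['salary'])   # True 1000
```

Note that CSV carries no type information, so the salary comes back as the string '1000', whereas JSON preserves it as an integer.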
- The sqlite3 module is a wrapper for the SQLite database library, providing a persistent database that can be updated and accessed using slightly nonstandard SQL syntax.
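A minimal sqlite3 session against a temporary in-memory database; the stocks table and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(':memory:')   # temporary database that lives only in memory
conn.execute('CREATE TABLE stocks (symbol TEXT, qty INTEGER, price REAL)')
conn.executemany('INSERT INTO stocks VALUES (?, ?, ?)',   # ? placeholders avoid SQL injection
                 [('RHAT', 100, 35.14), ('IBM', 500, 53.00)])
conn.commit()

rows = conn.execute('SELECT symbol, qty FROM stocks ORDER BY qty').fetchall()
print(rows)   # [('RHAT', 100), ('IBM', 500)]
conn.close()
```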
- Internationalization is supported by a number of modules including gettext, locale, and the codecs package.
11. Brief Tour of the Standard Library – Part II
11.1. Output Formatting
The reprlib module provides a version of repr() customized for abbreviated displays of large or deeply nested containers:
>>> import reprlib
>>> reprlib.repr(set('supercalifragilisticexpialidocious'))
"{'a', 'c', 'd', 'e', 'f', 'g', ...}"
The pprint module offers more sophisticated control over printing both built-in and user defined objects in a way that is readable by the interpreter. When the result is longer than one line, the “pretty printer” adds line breaks and indentation to more clearly reveal data structure:
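A sketch with a deeply nested list of color names (arbitrary sample data). At a width of 30 columns the structure cannot fit on one line, so the pretty printer breaks it up; pformat() returns the same layout as a string instead of printing it.

```python
import pprint

t = [[[['black', 'cyan'], 'white', ['green', 'red']],
      [['magenta', 'yellow'], 'blue']]]

pprint.pprint(t, width=30)          # split across several indented lines
text = pprint.pformat(t, width=30)  # the same layout, returned as a string
```

Because the output is readable by the interpreter, evaluating the formatted text gives back an equal list.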
The textwrap module formats paragraphs of text to fit a given screen width:
>>> import textwrap
>>> doc = """The wrap() method is just like fill() except that it returns
... a list of strings instead of one big string with newlines to separate
... the wrapped lines."""
...
>>> print(textwrap.fill(doc, width=40))
The wrap() method is just like fill()
except that it returns a list of strings
instead of one big string with newlines
to separate the wrapped lines.
The locale module accesses a database of culture specific data formats. The grouping attribute of locale’s format function provides a direct way of formatting numbers with group separators:
11.2. Templating
The string module includes a versatile Template class with a simplified syntax suitable for editing by end-users. This allows users to customize their applications without having to alter the application.
The format uses placeholder names formed by $ with valid Python identifiers (alphanumeric characters and underscores). Surrounding the placeholder with braces allows it to be followed by more alphanumeric letters with no intervening spaces. Writing $$ creates a single escaped $:
>>> from string import Template
>>> t = Template('${village}folk send $$10 to $cause.')
>>> t.substitute(village='Nottingham', cause='the ditch fund')
'Nottinghamfolk send $10 to the ditch fund.'
The substitute() method raises a KeyError when a placeholder is not supplied in a dictionary or a keyword argument. For mail-merge style applications, user supplied data may be incomplete and the safe_substitute() method may be more appropriate — it will leave placeholders unchanged if data is missing:
>>> t = Template('Return the $item to $owner.')
>>> d = dict(item='unladen swallow')
>>> t.substitute(d)
Traceback (most recent call last):
  ...
KeyError: 'owner'
>>> t.safe_substitute(d)
'Return the unladen swallow to $owner.'
Template subclasses can specify a custom delimiter. For example, a batch renaming utility for a photo browser may elect to use percent signs for placeholders such as the current date, image sequence number, or file format:
>>> import time, os.path
>>> photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg']
>>> class BatchRename(Template):
...     delimiter = '%'
>>> fmt = input('Enter rename style (%d-date %n-seqnum %f-format):  ')
Enter rename style (%d-date %n-seqnum %f-format):  Ashley_%n%f

>>> t = BatchRename(fmt)
>>> date = time.strftime('%d%b%y')
>>> for i, filename in enumerate(photofiles):
...     base, ext = os.path.splitext(filename)
...     newname = t.substitute(d=date, n=i, f=ext)
...     print('{0} --> {1}'.format(filename, newname))

img_1074.jpg --> Ashley_0.jpg
img_1076.jpg --> Ashley_1.jpg
img_1077.jpg --> Ashley_2.jpg
Another application for templating is separating program logic from the details of multiple output formats. This makes it possible to substitute custom templates for XML files, plain text reports, and HTML web reports.
11.3. Working with Binary Data Record Layouts
The struct module provides pack() and unpack() functions for working with variable length binary record formats. The following example shows how to loop through header information in a ZIP file without using the zipfile module. Pack codes “H” and “I” represent two and four byte unsigned numbers respectively. The “<” indicates that they are standard size and in little-endian byte order:
import struct
with open('myfile.zip', 'rb') as f:
    data = f.read()

start = 0
for i in range(3):                      # show the first 3 file headers
    start += 14
    fields = struct.unpack('<IIIHH', data[start:start+16])
    crc32, comp_size, uncomp_size, filenamesize, extra_size = fields

    start += 16
    filename = data[start:start+filenamesize]
    start += filenamesize
    extra = data[start:start+extra_size]
    print(filename, hex(crc32), comp_size, uncomp_size)

    start += extra_size + comp_size     # skip to the next header
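The loop above needs a real ZIP file. This self-contained sketch instead packs one hypothetical record with the same '<IIIHH' layout and unpacks it again, which shows the pack()/unpack() symmetry and the fixed 16-byte size of that format.

```python
import struct

# One hypothetical record: a 4-byte CRC, two 4-byte sizes and two 2-byte lengths,
# little-endian with standard sizes (the same '<IIIHH' layout used above).
record = struct.pack('<IIIHH', 0xDEADBEEF, 120, 300, 8, 0)
print(len(record), struct.calcsize('<IIIHH'))    # 16 16

crc32, comp_size, uncomp_size, filenamesize, extra_size = struct.unpack('<IIIHH', record)
print(hex(crc32), comp_size, uncomp_size)        # 0xdeadbeef 120 300
```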
11.4. Multi-threading
Threading is a technique for decoupling tasks which are not sequentially dependent. Threads can be used to improve the responsiveness of applications that accept user input while other tasks run in the background. A related use case is running I/O in parallel with computations in another thread.
The following code shows how the high level threading module can run tasks in background while the main program continues to run:
import threading, zipfile
class AsyncZip(threading.Thread):
    def __init__(self, infile, outfile):
        threading.Thread.__init__(self)
        self.infile = infile
        self.outfile = outfile

    def run(self):
        f = zipfile.ZipFile(self.outfile, 'w', zipfile.ZIP_DEFLATED)
        f.write(self.infile)
        f.close()
        print('Finished background zip of:', self.infile)

background = AsyncZip('mydata.txt', 'myarchive.zip')
background.start()
print('The main program continues to run in foreground.')

background.join()    # Wait for the background task to finish
print('Main program waited until background was done.')
The principal challenge of multi-threaded applications is coordinating threads that share data or other resources. To that end, the threading module provides a number of synchronization primitives including locks, events, condition variables, and semaphores.
While those tools are powerful, minor design errors can result in problems that are difficult to reproduce. So, the preferred approach to task coordination is to concentrate all access to a resource in a single thread and then use the queue module to feed that thread with requests from other threads. Applications using Queue objects for inter-thread communication and coordination are easier to design, more readable, and more reliable.
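A minimal sketch of the queue-based approach: one worker thread owns the shared resource (here just a results list), and other threads only put requests on a queue. The None sentinel used to shut the worker down is a common convention, not something the queue module requires.

```python
import queue
import threading

work_queue = queue.Queue()
results = []                      # touched only by the worker thread while it runs

def worker():
    while True:
        item = work_queue.get()
        if item is None:          # sentinel meaning "shut down"
            work_queue.task_done()
            break
        results.append(item * item)
        work_queue.task_done()

t = threading.Thread(target=worker)
t.start()
for n in range(5):                # other threads (here, the main one) only enqueue work
    work_queue.put(n)
work_queue.put(None)
work_queue.join()                 # wait until every queued item has been processed
t.join()
print(results)                    # [0, 1, 4, 9, 16]
```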
11.5. Logging
The logging module offers a full featured and flexible logging system. At its simplest, log messages are sent to a file or to sys.stderr:
import logging
logging.debug('Debugging information')
logging.info('Informational message')
logging.warning('Warning:config file %s not found', 'server.conf')
logging.error('Error occurred')
logging.critical('Critical error -- shutting down')
This produces the following output:
WARNING:root:Warning:config file server.conf not found
ERROR:root:Error occurred
CRITICAL:root:Critical error -- shutting down
By default, informational and debugging messages are suppressed and the output is sent to standard error. Other output options include routing messages through email, datagrams, sockets, or to an HTTP Server. New filters can select different routing based on message priority: DEBUG, INFO, WARNING, ERROR, and CRITICAL.
The logging system can be configured directly from Python or can be loaded from a user editable configuration file for customized logging without altering the application.
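A sketch of configuring logging directly from Python: a dedicated logger (the name 'tutorial.demo' is hypothetical) gets its own handler, format and level, so debugging messages are no longer suppressed. A StringIO stream stands in for a file or sys.stderr. For the configuration-file route, the logging.config module provides fileConfig() and dictConfig().

```python
import io
import logging

stream = io.StringIO()                         # stands in for a file or sys.stderr
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

logger = logging.getLogger('tutorial.demo')    # hypothetical logger name
logger.setLevel(logging.DEBUG)                 # let DEBUG and INFO messages through
logger.addHandler(handler)
logger.propagate = False                       # keep these records away from the root logger

logger.debug('Debugging information')
logger.warning('Warning:config file %s not found', 'server.conf')
print(stream.getvalue())
```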
11.6. Weak References
Python does automatic memory management (reference counting for most objects and garbage collection to eliminate cycles). The memory is freed shortly after the last reference to it has been eliminated.
This approach works fine for most applications but occasionally there is a need to track objects only as long as they are being used by something else. Unfortunately, just tracking them creates a reference that makes them permanent. The weakref module provides tools for tracking objects without creating a reference. When the object is no longer needed, it is automatically removed from a weakref table and a callback is triggered for weakref objects. Typical applications include caching objects that are expensive to create:
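A small sketch with weakref.WeakValueDictionary, where a hypothetical class A stands in for an expensive object. gc.collect() is called explicitly to make the removal deterministic; in CPython, reference counting normally frees the object as soon as the last strong reference disappears.

```python
import gc
import weakref

class A:
    def __init__(self, value):
        self.value = value

a = A(10)                            # a strong reference
cache = weakref.WeakValueDictionary()
cache['primary'] = a                 # does not keep the object alive by itself
print(cache['primary'].value)        # 10

del a                                # remove the only strong reference
gc.collect()                         # run garbage collection explicitly
print('primary' in cache)            # False: the entry was removed automatically
```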
11.7. Tools for Working with Lists
Many data structure needs can be met with the built-in list type. However, sometimes there is a need for alternative implementations with different performance trade-offs.
The array module provides an array() object that is like a list that stores only homogeneous data and stores it more compactly. The following example shows an array of numbers stored as two byte unsigned binary numbers (typecode “H”) rather than the usual 16 bytes per entry for regular lists of Python int objects:
>>> from array import array
>>> a = array('H', [4000, 10, 700, 22222])
>>> sum(a)
26932
>>> a[1:3]
array('H', [10, 700])
The collections module provides a deque() object that is like a list with faster appends and pops from the left side but slower lookups in the middle. These objects are well suited for implementing queues and breadth first tree searches:
>>> from collections import deque
>>> d = deque(["task1", "task2", "task3"])
>>> d.append("task4")
>>> print("Handling", d.popleft())
Handling task1
unsearched = deque([starting_node])
def breadth_first_search(unsearched):
    node = unsearched.popleft()
    for m in gen_moves(node):
        if is_goal(m):
            return m
        unsearched.append(m)
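The fragment above relies on gen_moves() and is_goal(), which are not defined. Here is a complete, runnable variant over a small hypothetical graph. It loops until the deque is empty, tracks visited nodes, and returns the shortest path rather than just the goal node.

```python
from collections import deque

# Hypothetical graph: each node maps to the nodes reachable in one move.
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D', 'E'], 'D': ['F'], 'E': ['F'], 'F': []}

def breadth_first_search(start, goal):
    """Return the shortest path from start to goal as a list, or None."""
    unsearched = deque([[start]])       # queue of paths, oldest first
    seen = {start}
    while unsearched:
        path = unsearched.popleft()     # fast removal from the left end
        node = path[-1]
        if node == goal:
            return path
        for m in graph[node]:
            if m not in seen:
                seen.add(m)
                unsearched.append(path + [m])
    return None

print(breadth_first_search('A', 'F'))   # ['A', 'B', 'D', 'F']
```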
In addition to alternative list implementations, the library also offers other tools such as the bisect module with functions for manipulating sorted lists:
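For example, bisect.insort() inserts into a list while keeping it sorted, and bisect.bisect() finds the insertion point for a value. The score pairs and grade boundaries below are illustrative data.

```python
import bisect

scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]
bisect.insort(scores, (300, 'ruby'))     # insert while keeping the list sorted
print(scores)
# [(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')]

breakpoints = [60, 70, 80, 90]           # hypothetical grade boundaries
print('FDCBA'[bisect.bisect(breakpoints, 85)])   # B: 85 falls after the first three boundaries
```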
The heapq module provides functions for implementing heaps based on regular lists. The lowest valued entry is always kept at position zero. This is useful for applications which repeatedly access the smallest element but do not want to run a full list sort:
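A short heapq sketch on an arbitrary list of numbers:

```python
from heapq import heapify, heappop, heappush

data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
heapify(data)                      # rearrange the list into heap order, in place
print(data[0])                     # 0: the smallest entry is always at position zero
heappush(data, -5)                 # add a new entry
smallest = [heappop(data) for i in range(3)]   # fetch the three smallest entries
print(smallest)                    # [-5, 0, 1]
```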
11.8. Decimal Floating Point Arithmetic
The decimal module offers a Decimal datatype for decimal floating point arithmetic. Compared to the built-in float implementation of binary floating point, the class is especially helpful for
- financial applications and other uses which require exact decimal representation,
- control over precision,
- control over rounding to meet legal or regulatory requirements,
- tracking of significant decimal places, or
- applications where the user expects the results to match calculations done by hand.
For example, calculating a 5% tax on a 70 cent phone charge gives different results in decimal floating point and binary floating point. The difference becomes significant if the results are rounded to the nearest cent:
>>> from decimal import *
>>> round(Decimal('0.70') * Decimal('1.05'), 2)
Decimal('0.74')
>>> round(.70 * 1.05, 2)
0.73
The Decimal result keeps a trailing zero, automatically inferring four place significance from multiplicands with two place significance. Decimal reproduces mathematics as done by hand and avoids issues that can arise when binary floating point cannot exactly represent decimal quantities.
Exact representation enables the Decimal class to perform modulo calculations and equality tests that are unsuitable for binary floating point:
>>> Decimal('1.00') % Decimal('.10')
Decimal('0.00')
>>> 1.00 % 0.10
0.09999999999999995

>>> sum([Decimal('0.1')]*10) == Decimal('1.0')
True
>>> sum([0.1]*10) == 1.0
False
The decimal module provides arithmetic with as much precision as needed:
>>> getcontext().prec = 36
>>> Decimal(1) / Decimal(7)
Decimal('0.142857142857142857142857142857142857')
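To illustrate the "control over rounding" and "control over precision" points listed earlier, this sketch rounds the phone-charge example to whole cents with quantize() and an explicit rounding mode, then changes precision only temporarily with localcontext(). Choosing ROUND_HALF_UP is an illustrative assumption; which mode a legal or regulatory rule demands varies.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext

charge = Decimal('0.70') * Decimal('1.05')       # Decimal('0.7350')
cents = Decimal('0.01')

print(charge.quantize(cents))                          # 0.74 (default ROUND_HALF_EVEN)
print(charge.quantize(cents, rounding=ROUND_HALF_UP))  # 0.74

# The two modes differ on an exact tie such as 0.125:
print(Decimal('0.125').quantize(cents))                          # 0.12
print(Decimal('0.125').quantize(cents, rounding=ROUND_HALF_UP))  # 0.13

with localcontext() as ctx:          # temporary settings, restored on exit
    ctx.prec = 5
    seventh = Decimal(1) / Decimal(7)
print(seventh)                       # 0.14286
```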
12. Virtual Environments and Packages
12.1. Introduction
If application A needs version 1.0 of a particular module but application B needs version 2.0, then the requirements are in conflict and installing either version 1.0 or 2.0 will leave one application unable to run.
The solution for this problem is to create a virtual environment (often shortened to “virtualenv”), a self-contained directory tree that contains a Python installation for a particular version of Python, plus a number of additional packages.
Different applications can then use different virtual environments.
12.2. Creating Virtual Environments
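The script used to create and manage virtual environments is called pyvenv. If you have multiple versions of Python on your system, you can select a specific Python version by running pyvenv-3.4 or whichever version you want. (pyvenv was deprecated in Python 3.6 and removed in Python 3.8; on current versions, run python3 -m venv instead.)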
To create a virtualenv, decide upon a directory where you want to place it and run pyvenv with the directory path:
pyvenv tutorial-env
This will create the tutorial-env directory if it doesn’t exist, and also create directories inside it containing a copy of the Python interpreter, the standard library, and various supporting files.
Once you’ve created a virtual environment, you need to activate it.
On Windows, run:
tutorial-env/Scripts/activate
On Unix or MacOS, run:
source tutorial-env/bin/activate
(This script is written for the bash shell. If you use the csh or fish shells, there are alternate activate.csh and activate.fish scripts you should use instead.)
12.3. Managing Packages with pip
Once you’ve activated a virtual environment, you can install, upgrade, and remove packages using a program called pip.
pip has a number of subcommands: “search”, “install”, “uninstall”, “freeze”, etc. Note that the Python Package Index has since disabled the API that pip search relies on, so the search example below may fail on current installations.
(tutorial-env) -> pip search astronomy
-> pip install novas
You can also install a specific version of a package by giving the package name followed by == and the version number:
-> pip install requests==2.6.0
You can run pip install --upgrade to upgrade a package to the latest version:
-> pip install --upgrade requests
pip uninstall followed by one or more package names will remove the packages from the virtual environment.
pip show will display information about a particular package:
(tutorial-env) -> pip show requests
pip list will display all of the packages installed in the virtual environment:
pip freeze will produce a similar list of the installed packages, but the output uses the format that pip install expects. A common convention is to put this list in a requirements.txt file:
(tutorial-env) -> pip freeze > requirements.txt
(tutorial-env) -> cat requirements.txt
novas==3.1.1.3
numpy==1.9.2
requests==2.7.0
The requirements.txt can then be committed to version control and shipped as part of an application. Users can then install all the necessary packages with pip install -r:
-> pip install -r requirements.txt
pip has many more options. Consult the Installing Python Modules guide for complete documentation for pip. When you’ve written a package and want to make it available on the Python Package Index, consult the Distributing Python Modules guide.
13. What Now?
The Python Standard Library: You should browse through this manual.
Installing Python Modules: explains how to install additional modules written by other Python users.
The Python Language Reference: A detailed explanation of Python’s syntax and semantics. It’s heavy reading, but is useful as a complete guide to the language itself.
https://www.python.org: It contains code, documentation, and pointers to Python-related pages around the Web.
https://pypi.python.org/pypi: The Python Package Index
https://code.activestate.com/recipes/langs/python/: The Python Cookbook is a sizable collection of code examples, larger modules, and useful scripts. Particularly notable contributions are collected in a book also titled Python Cookbook.
http://www.pyvideo.org collects links to Python-related videos from conferences and user-group meetings.
https://scipy.org: The Scientific Python project includes modules for fast array computations and manipulations plus a host of packages for such things as linear algebra, Fourier transforms, non-linear solvers, random number distributions, statistical analysis and the like.
14. Interactive Input Editing and History Substitution
14.1. Tab Completion and History Editing
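Completion of variable and module names is automatically enabled at interpreter startup so that the Tab key invokes the completion function; it looks at Python statement names, the current local variables, and the available module names. For dotted expressions such as string.a, it will evaluate the expression up to the final '.' and then suggest completions from the attributes of the resulting object. Note that this may execute application-defined code if an object with a __getattr__() method is part of the expression. The default configuration also saves your history into a file named .python_history in your user directory. The history will be available again during the next interactive interpreter session.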
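The standard rlcompleter module supplies the default Tab-completion function, and its Completer class can also be called directly. The sketch below is only an illustration (the Tracer class and the names obj and log are made up for the demonstration). It shows that completing a dotted expression evaluates everything before the final '.', so a __getattr__() method really does run.

```python
import rlcompleter

log = []


class Tracer:
    """Illustrative class whose __getattr__ records every lookup."""

    def __getattr__(self, name):
        log.append(name)      # side effect: proves this code ran
        return "some string"  # hand back an ordinary str object


# Complete against an explicit namespace instead of __main__.
completer = rlcompleter.Completer({"obj": Tracer()})

# Completing "obj.child.up" first evaluates "obj.child",
# which calls Tracer.__getattr__("child").
first = completer.complete("obj.child.up", 0)
print(first)  # a completion starting with "obj.child.upper("
print(log)    # ['child']
```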
14.2. Alternatives to the Interactive Interpreter
One alternative enhanced interactive interpreter that has been around for quite some time is IPython, which features tab completion, object exploration and advanced history management. It can also be thoroughly customized and embedded into other applications. Another similar enhanced interactive environment is bpython.
15. Floating Point Arithmetic: Issues and Limitations
Floating-point numbers are represented in computer hardware as base 2 (binary) fractions.
For example, the decimal fraction 0.125 has value 1/10 + 2/100 + 5/1000, and in the same way the binary fraction 0.001 has value 0/2 + 0/4 + 1/8.
These two fractions have identical values, the only real difference being that the first is written in base 10 fractional notation, and the second in base 2.
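Both expansions can be checked with exact rational arithmetic. This small illustration uses the standard fractions module purely as a calculator; the variable names are illustrative.

```python
from fractions import Fraction

# 0.125 in base 10: 1/10 + 2/100 + 5/1000
decimal_expansion = Fraction(1, 10) + Fraction(2, 100) + Fraction(5, 1000)

# 0.001 in base 2: 0/2 + 0/4 + 1/8
binary_expansion = Fraction(0, 2) + Fraction(0, 4) + Fraction(1, 8)

print(decimal_expansion, binary_expansion)    # 1/8 1/8
print(decimal_expansion == binary_expansion)  # True
print(float(binary_expansion) == 0.125)       # True: 1/8 is exact in binary
```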
Unfortunately, most decimal fractions cannot be represented exactly as binary fractions. A consequence is that, in general, the decimal floating-point numbers you enter are only approximated by the binary floating-point numbers actually stored in the machine.
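On most machines today, floats are approximated using a binary fraction with the numerator using the first 53 bits starting with the most significant bit and with the denominator as a power of two. In the case of 1/10, the binary fraction is 3602879701896397 / 2 ** 55, which is close to but not exactly equal to the true value of 1/10. If Python were to print the true decimal value of the binary approximation stored for 0.1, it would have to display 0.1000000000000000055511151231257827021181583404541015625.
That is more digits than most people find useful, so Python keeps the number of digits manageable by displaying a rounded value instead:
>>> 1 / 10
0.1
Just remember, even though the printed result looks like the exact value of 1/10, the actual stored value is the nearest representable binary fraction.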
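To see the exact value behind the rounded display, you can convert the float to a decimal.Decimal. Constructing a Decimal from a float converts its exact binary value without any rounding, so this minimal illustration prints every stored digit:

```python
from decimal import Decimal

x = 0.1
print(x)           # 0.1 (the rounded display)
print(Decimal(x))  # 0.1000000000000000055511151231257827021181583404541015625
```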
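Interestingly, there are many different decimal numbers that share the same nearest approximate binary fraction. For example, the numbers 0.1, 0.10000000000000001 and 0.1000000000000000055511151231257827021181583404541015625 are all approximated by 3602879701896397 / 2 ** 55. Since all of these decimal values share the same approximation, any one of them could be displayed while still preserving the invariant eval(repr(x)) == x. Historically, the Python prompt and built-in repr() function would choose the one with 17 significant digits, 0.10000000000000001. Starting with Python 3.1, Python (on most systems) is able to choose the shortest of these and simply display 0.1.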
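The shortest-repr rule can be observed directly with built-ins only: each of these decimal strings converts to the same float, and repr() picks the shortest string that converts back to it. The list name candidates is just for illustration.

```python
candidates = [
    "0.1",
    "0.10000000000000001",
    "0.1000000000000000055511151231257827021181583404541015625",
]
values = [float(s) for s in candidates]

print(len(set(values)))                     # 1: all map to the same float
print(repr(values[0]))                      # 0.1
print(float(repr(values[0])) == values[0])  # True: repr round-trips
print(format(values[0], ".17g"))            # 0.10000000000000001 (old 17-digit form)
```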
For more pleasant output, you may wish to use string formatting to produce a limited number of significant digits:
>>> import math
>>> format(math.pi, '.12g')  # give 12 significant digits
'3.14159265359'
>>> format(math.pi, '.2f')   # give 2 digits after the point
'3.14'
>>> repr(math.pi)
'3.141592653589793'
It’s important to realize that this is simply rounding the display of the true machine value.
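One illusion may beget another. For example, since 0.1 is not exactly 1/10, summing three values of 0.1 may not yield exactly 0.3, either:
>>> .1 + .1 + .1 == .3
False
Also, since 0.1 cannot get any closer to the exact value of 1/10 and 0.3 cannot get any closer to the exact value of 3/10, pre-rounding with the round() function cannot help:
>>> round(.1, 1) + round(.1, 1) + round(.1, 1) == round(.3, 1)
False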
Though the numbers cannot be made closer to their intended exact values, the round() function can be useful for post-rounding so that results with inexact values become comparable to one another:
>>> round(.1 + .1 + .1, 10) == round(.3, 10)
True
Still, don’t be unduly wary of floating-point, but you do need to keep in mind that it’s not decimal arithmetic and that every float operation can suffer a new rounding error.
While pathological cases do exist, for most casual use of floating-point arithmetic you’ll see the result you expect in the end if you simply round the display of your final results to the number of decimal digits you expect. str() usually suffices, and for finer control see the str.format() method’s format specifiers in Format String Syntax.
For use cases which require exact decimal representation, try using the decimal module which implements decimal arithmetic suitable for accounting applications and high-precision applications.
Another form of exact arithmetic is supported by the fractions module which implements arithmetic based on rational numbers (so the numbers like 1/3 can be represented exactly).
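As a brief illustration of the two modules just mentioned, compare binary floats with decimal values (built from strings so they start out exact) and with rational numbers:

```python
from decimal import Decimal
from fractions import Fraction

# Binary floats: the sum picks up representation error.
print(0.1 + 0.1 + 0.1 == 0.3)                # False

# decimal: values built from strings are exact decimal numbers.
print(Decimal("0.1") * 3 == Decimal("0.3"))  # True
print(Decimal("0.1") + Decimal("0.2"))       # 0.3

# fractions: rational arithmetic, so 1/3 is represented exactly.
third = Fraction(1, 3)
print(third * 3 == 1)                        # True
print(third + Fraction(1, 6))                # 1/2
```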
If you are a heavy user of floating point operations you should take a look at the Numerical Python package and many other packages for mathematical and statistical operations supplied by the SciPy project. See https://scipy.org.
Python provides tools that may help on those rare occasions when you really do want to know the exact value of a float. The float.as_integer_ratio() method expresses the value of a float as a fraction:
>>> x = 3.14159
>>> x.as_integer_ratio()
(3537115888337719, 1125899906842624)
Since the ratio is exact, it can be used to losslessly recreate the original value:
>>> x == 3537115888337719 / 1125899906842624
True
The float.hex() method expresses a float in hexadecimal (base 16), again giving the exact value stored by your computer:
>>> x.hex()
'0x1.921f9f01b866ep+1'
This precise hexadecimal representation can be used to reconstruct the float value exactly:
>>> x == float.fromhex('0x1.921f9f01b866ep+1')
True
Since the representation is exact, it is useful for reliably porting values across different versions of Python (platform independence) and exchanging data with other languages that support the same format (such as Java and C99).
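A minimal sketch of that use: exchange floats as hexadecimal strings so the receiving side recovers every bit. The helper names are hypothetical; float.hex() and float.fromhex() are the methods described above.

```python
def floats_to_hex(values):
    """Encode floats as exact hexadecimal strings (hypothetical helper)."""
    return [v.hex() for v in values]


def floats_from_hex(strings):
    """Decode strings produced by floats_to_hex (hypothetical helper)."""
    return [float.fromhex(s) for s in strings]


data = [0.1, 3.14159, 1e-300, -2.5]
encoded = floats_to_hex(data)
print(encoded[0])                        # 0x1.999999999999ap-4
print(floats_from_hex(encoded) == data)  # True: bit-for-bit round trip
```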
Another helpful tool is the math.fsum() function which helps mitigate loss-of-precision during summation. It tracks “lost digits” as values are added onto a running total. That can make a difference in overall accuracy so that the errors do not accumulate to the point where they affect the final total:
>>> import math
>>> sum([0.1] * 10) == 1.0
False
>>> math.fsum([0.1] * 10) == 1.0
True
15.1. Representation Error
Representation error refers to the fact that some (most, actually) decimal fractions cannot be represented exactly as binary (base 2) fractions.
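For example, 1/10 is not exactly representable as a binary fraction. Almost all machines today use IEEE-754 floating-point arithmetic, and almost all platforms map Python floats to IEEE-754 double precision values, which contain 53 bits of precision. So on input the computer strives to convert 0.1 to the closest fraction it can of the form J / 2**N, where J is an integer containing exactly 53 bits. Rewriting
1 / 10 ~= J / (2**N)
as
J ~= 2**N / 10
and recalling that J has exactly 53 bits (is >= 2**52 but < 2**53), the best value for N is 56:
>>> 2**52 <= 2**56 // 10 < 2**53
True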
That is, 56 is the only value for N that leaves J with exactly 53 bits.
>>> q, r = divmod(2**56, 10)
>>> r
6
Since the remainder is more than half of 10, the best approximation is obtained by rounding up:
>>> q+1
7205759403792794
Therefore the best possible approximation to 1/10 in IEEE-754 double precision is: 7205759403792794 / 2 ** 56
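Dividing both the numerator and denominator by two reduces the fraction to: 3602879701896397 / 2 ** 55
Note that since we rounded up, this is actually a little bit larger than 1/10; if we had not rounded up, the quotient would have been a little bit smaller than 1/10. In no case can it be exactly 1/10.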
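The derivation above can be automated. The following is a minimal sketch (the function name is hypothetical) that repeats the same steps for any fraction strictly between 0 and 1: grow N until J = 2**N * numerator // denominator has 53 bits, then round using the remainder. It assumes round-half-to-even on exact ties, the IEEE-754 default, and it ignores subnormal numbers.

```python
from fractions import Fraction


def nearest_53bit_fraction(numerator: int, denominator: int) -> tuple[int, int]:
    """Return (J, N) such that J / 2**N is the closest fraction to
    numerator / denominator with J exactly 53 bits long.

    Hypothetical helper; assumes 0 < numerator < denominator and
    ignores subnormal numbers.
    """
    if not 0 < numerator < denominator:
        raise ValueError("expected 0 < numerator < denominator")
    n = 0
    # Smallest N for which J = 2**N * numerator // denominator reaches 2**52.
    while (2**n * numerator) // denominator < 2**52:
        n += 1
    q, r = divmod(2**n * numerator, denominator)
    # Round up if the remainder exceeds half the denominator;
    # on an exact tie, round to the even quotient.
    if 2 * r > denominator or (2 * r == denominator and q % 2 == 1):
        q += 1
    if q == 2**53:  # rounding up spilled into a 54th bit
        q, n = q // 2, n - 1
    return q, n


j, n = nearest_53bit_fraction(1, 10)
print(j, n)                                # 7205759403792794 56
print(Fraction(j, 2**n) == Fraction(0.1))  # True
print(j / 2**n == 0.1)                     # True
```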
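If we multiply the fraction 3602879701896397 / 2 ** 55 by 10**55, we can see the value out to 55 decimal digits:
>>> 3602879701896397 * 10 ** 55 // 2 ** 55
1000000000000000055511151231257827021181583404541015625
meaning that the exact number stored in the computer is equal to the decimal value 0.1000000000000000055511151231257827021181583404541015625. Instead of displaying the full decimal value, many languages (including older versions of Python) round the result to 17 significant digits:
>>> format(0.1, '.17f')
'0.10000000000000001'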
The fractions and decimal modules make these calculations easy:
>>> from decimal import Decimal
>>> from fractions import Fraction
>>> Fraction.from_float(0.1)
Fraction(3602879701896397, 36028797018963968)
>>> (0.1).as_integer_ratio()
(3602879701896397, 36028797018963968)
>>> Decimal.from_float(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')
>>> format(Decimal.from_float(0.1), '.17')
'0.10000000000000001'
16. Appendix
16.1. Interactive Mode
16.1.1. Error Handling
When an error occurs, the interpreter prints an error message and a stack trace.
Some errors are unconditionally fatal and cause an exit with a nonzero exit status; this applies to internal inconsistencies and some cases of running out of memory. All error messages are written to the standard error stream; normal output from executed commands is written to standard output.
Typing the interrupt character (usually Control-C or Delete) at the primary or secondary prompt cancels the input and returns to the primary prompt. [1] Typing an interrupt while a command is executing raises the KeyboardInterrupt exception, which may be handled by a try statement.
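Because the interrupt arrives as an ordinary exception, a long-running loop can catch it and stop cleanly. The sketch below simulates the interrupt by raising KeyboardInterrupt from a stand-in step so it runs without a keyboard; the function names are hypothetical.

```python
def run_interruptibly(steps):
    """Run each callable in steps, stopping cleanly on KeyboardInterrupt.

    Returns the number of steps that completed (hypothetical helper).
    """
    completed = 0
    try:
        for step in steps:
            step()
            completed += 1
    except KeyboardInterrupt:
        print(f"Interrupted after {completed} step(s); cleaning up.")
    return completed


def ok():
    pass


def simulated_ctrl_c():
    # Stand-in for the user pressing Control-C during this step.
    raise KeyboardInterrupt


print(run_interruptibly([ok, ok, simulated_ctrl_c, ok]))  # prints the message, then 2
```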
16.1.2. Executable Python Scripts
On BSD’ish Unix systems, Python scripts can be made directly executable, like shell scripts, by putting the line
#!/usr/bin/env python3.5
(assuming that the interpreter is on the user’s PATH) at the beginning of the script and giving the file an executable mode.
The script can be given an executable mode, or permission, using the chmod command.
$ chmod +x myscript.py
$ ./myscript.py
On Windows systems, there is no notion of an “executable mode”. The Python installer automatically associates .py files with python.exe so that a double-click on a Python file will run it as a script.
16.1.3. The Interactive Startup File
When you use Python interactively, it is frequently handy to have some standard commands executed every time the interpreter is started. You can do this by setting an environment variable named PYTHONSTARTUP to the name of a file containing your start-up commands. This is similar to the .profile feature of the Unix shells.
If you want to read an additional start-up file from the current directory, you can program this in the global start-up file using code like if os.path.isfile('.pythonrc.py'): exec(open('.pythonrc.py').read()). If you want to use the startup file in a script, you must do this explicitly in the script:
import os
filename = os.environ.get('PYTHONSTARTUP')
if filename and os.path.isfile(filename):
    with open(filename) as fobj:
        startup_file = fobj.read()
    exec(startup_file)
16.1.4. The Customization Modules
Python provides two hooks to let you customize it: sitecustomize and usercustomize. To see how it works, you first need to find the location of your user site-packages directory. Start Python and run this code:
>>> import site
>>> site.getusersitepackages()
'/home/user/.local/lib/python3.5/site-packages'
Now you can create a file named usercustomize.py in that directory and put anything you want in it. It will affect every invocation of Python, unless it is started with the -s option to disable the automatic import.
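To confirm the effect of -s from a script, you can start a child interpreter and inspect sys.flags.no_user_site, which is 1 when the user site-packages directory (and with it usercustomize) is skipped. A small sketch, with a hypothetical helper name:

```python
import subprocess
import sys


def user_site_disabled(*interpreter_args: str) -> bool:
    """Report whether a child interpreter started with these options
    skips the user site-packages directory (hypothetical helper)."""
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c",
         "import sys; print(sys.flags.no_user_site)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "1"


print(user_site_disabled("-s"))  # True
print(user_site_disabled())      # usually False; environment settings can change this
```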
sitecustomize works in the same way, but is typically created by an administrator of the computer in the global site-packages directory, and is imported before usercustomize. See the documentation of the site module for more details.