Commit 66b940c

[3.13] gh-141444: fix broken URLs and examples in urllib.request.rst (GH-144863) (#150647)

* Doc: fix broken URLs and examples in urllib.request.rst (gh-141444)
* Doc: update urllib.request examples to handle gzip compression

---------

(cherry picked from commit 0f1f7c7)

Co-authored-by: Paper Moon <tangyuan0821@email.cn>
1 parent ca2fdca commit 66b940c

1 file changed

Lines changed: 32 additions & 24 deletions

Doc/library/urllib.request.rst

@@ -1007,7 +1007,7 @@ AbstractBasicAuthHandler Objects
 *headers* should be the error headers.

 *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
-authority component (e.g. ``"http://python.org/"``). In either case, the
+authority component (e.g. ``"https://python.org/"``). In either case, the
 authority must not contain a userinfo component (so, ``"python.org"`` and
 ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).
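
Reviewer note: the rule on *host* in this hunk (an authority or a URL, never with a userinfo component) can be checked mechanically. The sketch below is only an illustration; `has_userinfo` is a hypothetical helper and not part of `urllib.request`, and it assumes that looking for `@` in the parsed network location is a good enough test.

```python
from urllib.parse import urlsplit


def has_userinfo(host):
    """Return True if *host* (an authority or a URL) carries a userinfo part.

    Hypothetical helper for illustration; not part of urllib.request.
    """
    # A bare authority such as "python.org:80" has no "//", so add one to make
    # urlsplit() treat the text as a network location rather than a path.
    if "//" not in host:
        host = "//" + host
    return "@" in urlsplit(host).netloc


for candidate in ("python.org", "python.org:80", "https://python.org/",
                  "joe:password@python.org"):
    print(candidate, has_userinfo(candidate))
```

Only the last candidate should be reported as carrying userinfo, which matches the wording of the paragraph.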

@@ -1203,10 +1203,14 @@ This example gets the python.org main page and displays the first 300 bytes of
 it::

 >>> import urllib.request
->>> with urllib.request.urlopen('http://www.python.org/') as f:
-...     print(f.read(300))
-...
-b'<!doctype html>\n<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 8]> <html class="no-js ie8 lt-ie9">
+>>> with urllib.request.urlopen('https://www.python.org/') as f:
+...     # The response may be compressed (for example, 'gzip').
+...     print(f.headers.get('Content-Encoding'))
+...     data = f.read()
+...     if f.headers.get('Content-Encoding') == 'gzip':
+...         import gzip
+...         data = gzip.decompress(data)
+...     print(data[:300].decode('utf-8', errors='replace'))

 Note that urlopen returns a bytes object. This is because there is no way
 for urlopen to automatically determine the encoding of the byte stream
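
Reviewer note: the context paragraph says the body arrives as bytes and that the caller has to pick the encoding. A minimal sketch of one way to do that is to read the charset from the response headers when the server declares one. The `data:` URL is an assumption made so the sketch runs without a network, and the `utf-8` fallback is a guess rather than a rule.

```python
import urllib.request

# A data: URL keeps the sketch offline; its charset parameter stands in for
# the Content-Type header a real server would send.
url = "data:text/plain;charset=utf-8,caf%C3%A9"

with urllib.request.urlopen(url) as f:
    raw = f.read()  # always a bytes object
    # get_content_charset() returns None when no charset is declared,
    # so fall back to an assumed default.
    charset = f.headers.get_content_charset() or "utf-8"
    text = raw.decode(charset, errors="replace")

# Byte length and character length differ once the text is decoded.
print(charset, len(raw), len(text))
```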
@@ -1223,26 +1227,30 @@ For additional information, see the W3C document: https://www.w3.org/Internation
 As the python.org website uses *utf-8* encoding as specified in its meta tag, we
 will use the same for decoding the bytes object::

->>> with urllib.request.urlopen('http://www.python.org/') as f:
-...     print(f.read(100).decode('utf-8'))
+>>> with urllib.request.urlopen('https://www.python.org/') as f:
+...     # Check for compression and decode appropriately.
+...     enc = f.headers.get('Content-Encoding')
+...     data = f.read()
+...     if enc == 'gzip':
+...         import gzip
+...         data = gzip.decompress(data)
+...     print(data[:100].decode('utf-8', errors='replace'))
 ...
-<!doctype html>
-<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
-<!-

 It is also possible to achieve the same result without using the
 :term:`context manager` approach::

 >>> import urllib.request
->>> f = urllib.request.urlopen('http://www.python.org/')
+>>> f = urllib.request.urlopen('https://www.python.org/')
 >>> try:
-...     print(f.read(100).decode('utf-8'))
+...     enc = f.headers.get('Content-Encoding')
+...     data = f.read()
+...     if enc == 'gzip':
+...         import gzip
+...         data = gzip.decompress(data)
+...     print(data[:100].decode('utf-8', errors='replace'))
 ... finally:
 ...     f.close()
-...
-<!doctype html>
-<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
-<!--

 In the following example, we are sending a data-stream to the stdin of a CGI
 and reading the data it returns to us. Note that this example will only work
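
Reviewer note: the gzip check is repeated in three examples in this commit. One possible way to factor it out is sketched below. `decode_body` and its parameter names are hypothetical and do not appear in the patch, and the payload is compressed locally so the sketch can run without a live response.

```python
import gzip


def decode_body(data, content_encoding=None, charset="utf-8"):
    """Undo optional gzip compression, then decode bytes to str.

    Hypothetical helper; the name and parameters are illustrative only.
    """
    # Header values are case-insensitive tokens, so normalise before comparing.
    if content_encoding and content_encoding.strip().lower() == "gzip":
        data = gzip.decompress(data)
    return data.decode(charset, errors="replace")


# Offline check with a locally compressed payload instead of a live response.
payload = "<!doctype html>".encode("utf-8")
print(decode_body(gzip.compress(payload), "gzip"))
print(decode_body(payload))
```

With a real response object `f`, the call would presumably look like `decode_body(f.read(), f.headers.get('Content-Encoding'))`.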
@@ -1313,7 +1321,7 @@ Use the *headers* argument to the :class:`Request` constructor, or::

 import urllib.request
 req = urllib.request.Request('http://www.example.com/')
-req.add_header('Referer', 'http://www.python.org/')
+req.add_header('Referer', 'https://www.python.org/')
 # Customize the default User-Agent header value:
 req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)')
 with urllib.request.urlopen(req) as f:
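
Reviewer note: the hunk header mentions both the *headers* argument and `add_header`. The sketch below uses each once and inspects the request without sending it. The User-Agent value is a placeholder, and the remark about name capitalisation reflects how `Request` stores header names in current CPython as far as I can tell.

```python
import urllib.request

req = urllib.request.Request(
    "http://www.example.com/",
    headers={"Referer": "https://www.python.org/"},
)
req.add_header("User-Agent", "urllib-example/0.1")  # placeholder value

# Request stores names via str.capitalize(), hence the "User-agent" spelling
# when reading the value back.
print(req.get_header("Referer"))
print(req.get_header("User-agent"))
print(req.has_header("User-Agent"))
```

The last line is expected to print `False`, because lookups on the `Request` object are case-sensitive even though HTTP header names are not.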
@@ -1342,7 +1350,7 @@ containing parameters::
 >>> import urllib.request
 >>> import urllib.parse
 >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
->>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params
+>>> url = "https://www.python.org/?%s" % params
 >>> with urllib.request.urlopen(url) as f:
 ...     print(f.read().decode('utf-8'))
 ...
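
Reviewer note: to see what the GET example actually puts in the URL, the query string can be built and parsed back without any request being made. This is a small sketch; the extra `q` parameter is an assumption added to show how reserved characters are quoted.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

params = urlencode({"spam": 1, "eggs": 2, "bacon": 0})
url = "https://www.python.org/?%s" % params

print(params)
# Round-trip: parse_qs() returns every value as a list of strings.
print(parse_qs(urlsplit(url).query))

# Spaces become "+" and "&" is percent-encoded, so values cannot break the query.
print(urlencode({"q": "a b&c"}))
```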
@@ -1354,7 +1362,7 @@ from urlencode is encoded to bytes before it is sent to urlopen as data::
 >>> import urllib.parse
 >>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
 >>> data = data.encode('ascii')
->>> with urllib.request.urlopen("http://requestb.in/xrbl82xr", data) as f:
+>>> with urllib.request.urlopen("https://httpbin.org/post", data) as f:
 ...     print(f.read().decode('utf-8'))
 ...
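
Reviewer note: passing *data* is what turns the request into a POST. The sketch below builds two `Request` objects for the URL used in the patch and compares their methods. Nothing is sent, so it does not depend on httpbin.org being reachable.

```python
import urllib.parse
import urllib.request

# The body must be bytes, so the urlencoded string is encoded first.
data = urllib.parse.urlencode({"spam": 1, "eggs": 2, "bacon": 0}).encode("ascii")

get_req = urllib.request.Request("https://httpbin.org/post")
post_req = urllib.request.Request("https://httpbin.org/post", data=data)

print(get_req.get_method())
print(post_req.get_method())
print(post_req.data)
```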

@@ -1363,16 +1371,16 @@ environment settings::

 >>> import urllib.request
 >>> proxies = {'http': 'http://proxy.example.com:8080/'}
->>> opener = urllib.request.FancyURLopener(proxies)
->>> with opener.open("http://www.python.org") as f:
+>>> opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
+>>> with opener.open("https://www.python.org") as f:
 ...     f.read().decode('utf-8')
 ...

 The following example uses no proxies at all, overriding environment settings::

 >>> import urllib.request
->>> opener = urllib.request.FancyURLopener({})
->>> with opener.open("http://www.python.org/") as f:
+>>> opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
+>>> with opener.open("https://www.python.org/") as f:
 ...     f.read().decode('utf-8')
 ...
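
Reviewer note: the replacement for `FancyURLopener` can be inspected without opening a connection. The sketch below relies on the `handlers` list of the opener, which is an implementation detail rather than documented API, so treat it as an assumption. It also suggests that a mapping with only an `http` key may not route the `https://` URL in the new example through the proxy.

```python
import urllib.request

proxies = {"http": "http://proxy.example.com:8080/"}
handler = urllib.request.ProxyHandler(proxies)
opener = urllib.request.build_opener(handler)

# ProxyHandler grows one "<scheme>_open" method per key in the mapping.
print(hasattr(handler, "http_open"))
print(hasattr(handler, "https_open"))
print(handler in opener.handlers)

# An empty mapping suppresses the default, environment-driven ProxyHandler.
direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))
print(any(isinstance(h, urllib.request.ProxyHandler) for h in direct.handlers))
```

If proxying HTTPS traffic is intended, adding an `https` key to the mapping would be the natural change, though the patch does not do so.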

@@ -1405,7 +1413,7 @@ some point in the future.
 The following example illustrates the most common usage scenario::

 >>> import urllib.request
->>> local_filename, headers = urllib.request.urlretrieve('https://python.org/')
+>>> local_filename, headers = urllib.request.urlretrieve('https://python.org/')
 >>> html = open(local_filename)
 >>> html.close()
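
Reviewer note: the `urlretrieve` example opens the downloaded file and closes it without reading. A slightly fuller sketch is given below. It assumes a `data:` URL as a stand-in for the real page so that it runs offline, and it calls `urlcleanup()` to remove the temporary file afterwards.

```python
import urllib.request

# data: URL used so the sketch needs no network; it stands in for a real page.
url = "data:text/html;charset=utf-8,%3C!doctype%20html%3E"

local_filename, headers = urllib.request.urlretrieve(url)
try:
    with open(local_filename, encoding="utf-8") as html:
        first_line = html.readline()
finally:
    # Remove temporary files left behind by urlretrieve().
    urllib.request.urlcleanup()

print(first_line)
print(headers.get_content_type())
```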
