@eendebakpt
Contributor

@eendebakpt eendebakpt commented Apr 15, 2022

Improve the performance of getproxies_environment when there are many environment variables. The gain depends on the number of environment variables; in the benchmark below the function is roughly 6.8 times faster with this PR.

Fixes #91539

Performance test details

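Results (seconds for 400 calls: first the current urllib.request.getproxies_environment, then the version in this PR):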

0.1634683609008789
0.024187326431274414

with

import os
import time
import urllib.request

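if 0:  # change to 1 to simulate a large environment
    # os.environ only accepts str values, so use '1' rather than 1
    os.environ.update({f'{ii}': '1' for ii in range(8000)})
    os.environ.update({f'{ii}_proxy': '1' for ii in range(30)})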

def getproxies_environment():
    """Return a dictionary of scheme -> proxy server URL mappings.
    Scan the environment for variables named <scheme>_proxy;
    this seems to be the standard convention.  If you need a
    different way, you can pass a proxies dictionary to the
    [Fancy]URLopener constructor.
    """
    # in order to prefer lowercase variables, process environment in
    # two passes: first matches any, second pass matches lowercase only

    # select only environment variables which end in (after making lowercase) _proxy
    # fast selection of candidates: check the underscore position first
    candidate_names = [name for name in os.environ.keys()
                       if len(name) > 5 and name[-6] == '_']
    environment = [(name, os.environ[name], name.lower())
                   for name in candidate_names
                   if name[-6:].lower() == '_proxy']

    proxies = {}
    for name, value, name_lower in environment:
        if value and name_lower[-6:] == '_proxy':
            proxies[name_lower[:-6]] = value
    # CVE-2016-1000110 - If we are running as CGI script, forget HTTP_PROXY
    # (non-all-lowercase) as it may be set from the web server by a "Proxy:"
    # header from the client
    # If "proxy" is lowercase, it will still be used thanks to the next block
    if 'REQUEST_METHOD' in os.environ:
        proxies.pop('http', None)
    for name, value, name_lower in environment:
        if name[-6:] == '_proxy':
            if value:
                proxies[name_lower[:-6]] = value
            else:
                proxies.pop(name_lower[:-6], None)
    return proxies

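nn = 400

# time the current standard-library implementation
t0 = time.perf_counter()
for ii in range(nn):
    urllib.request.getproxies_environment()
dt = time.perf_counter() - t0
print(dt)

# time the implementation defined above
t0 = time.perf_counter()
for ii in range(nn):
    getproxies_environment()
dt = time.perf_counter() - t0
print(dt)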
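
The benchmark only measures speed, so it may help to also check the behaviour that the comments in the function describe: lowercase variables take precedence, and an uppercase HTTP_PROXY is ignored when REQUEST_METHOD is set. The following is a minimal sketch of such a check, not part of the PR. The helper name `proxies_for` and the proxy URLs are made up for illustration, and `os.environ` is swapped for a plain dict so that the real environment stays untouched.

```python
import os
import urllib.request
from unittest import mock


def proxies_for(env):
    """Run getproxies_environment() against a stand-in environment mapping."""
    # Hypothetical helper: temporarily replace os.environ with a plain dict.
    with mock.patch.object(os, "environ", env):
        return urllib.request.getproxies_environment()


# Lowercase wins when both spellings are set.
both = proxies_for({"HTTP_PROXY": "http://upper:3128",
                    "http_proxy": "http://lower:3128"})

# Under CGI (REQUEST_METHOD set), an uppercase-only HTTP_PROXY is dropped.
cgi = proxies_for({"REQUEST_METHOD": "GET",
                   "HTTP_PROXY": "http://upper:3128"})

# An empty lowercase variable removes the entry set by the uppercase one.
cleared = proxies_for({"FTP_PROXY": "ftp://upper:21", "ftp_proxy": ""})

print(both, cgi, cleared)
```

Running the same three cases against the rewritten function should give identical dictionaries if the optimisation preserves behaviour.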

@eendebakpt eendebakpt marked this pull request as draft April 15, 2022 11:31
@eendebakpt eendebakpt marked this pull request as ready for review April 15, 2022 12:18
eendebakpt and others added 4 commits May 18, 2022 10:03
Co-authored-by: Carl Meyer <carl@oddbird.net>
Co-authored-by: Carl Meyer <carl@oddbird.net>
Co-authored-by: Carl Meyer <carl@oddbird.net>
Co-authored-by: Carl Meyer <carl@oddbird.net>
@eendebakpt eendebakpt force-pushed the performance/getproxies_environment branch 2 times, most recently from 0f694c9 to aeb96ea Compare May 18, 2022 08:14
@eendebakpt eendebakpt force-pushed the performance/getproxies_environment branch from aeb96ea to f961505 Compare May 18, 2022 08:52
@eendebakpt
Contributor Author

@carljm Thanks for the suggestion. Benchmarks show it is just as fast, and the code is much cleaner.
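
The suggested change itself is not shown in this conversation. As a separate illustration of the screening idea from the PR description, here is a rough sketch that compares case-folding every name with checking the underscore position first. The variable names, list sizes and both function names are assumptions chosen to resemble the benchmark script; timings will vary by machine.

```python
import timeit

# Hypothetical environment: 8000 unrelated names plus 30 proxy variables,
# mirroring the sizes used in the benchmark script.
names = ([f"VAR_{i}" for i in range(8000)]
         + [f"scheme{i}_PROXY" for i in range(30)])


def lowercase_everything(names):
    # Case-fold every name before testing the suffix.
    return [n for n in names if n.lower()[-6:] == "_proxy"]


def screen_first(names):
    # Check the cheap underscore position first; case-fold only the survivors.
    return [n for n in names
            if len(n) > 5 and n[-6] == "_" and n[-6:].lower() == "_proxy"]


# Both filters must select exactly the same names.
same = lowercase_everything(names) == screen_first(names)

t_all = timeit.timeit(lambda: lowercase_everything(names), number=50)
t_screen = timeit.timeit(lambda: screen_first(names), number=50)
print(f"lowercase all: {t_all:.4f}s, screen first: {t_screen:.4f}s, "
      f"same result: {same}")
```

The equality check matters more than the timing here: a faster filter is only useful if it selects the same variables.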

Member

@carljm carljm left a comment

Sorry I missed this in the first review, but the name of the function has a typo (an extra underscore) in the NEWS entry.

Also suggested an added comment. (Wouldn't have bothered with this if it was the only thing, but may be worth it if you are doing one more update to fix NEWS anyway.)

Thanks for the improvements to this function!

eendebakpt and others added 2 commits May 18, 2022 18:48
…gVuA.rst

Co-authored-by: Carl Meyer <carl@oddbird.net>
Co-authored-by: Carl Meyer <carl@oddbird.net>
@eendebakpt
Contributor Author

@ambv As the latest core dev touching Lib/urllib/request.py, would you be able to review this PR?

@iritkatriel iritkatriel requested a review from orsenthil August 31, 2022 20:47
@iritkatriel iritkatriel added the performance and 3.12 labels Aug 31, 2022
@orsenthil orsenthil self-assigned this Oct 5, 2022
@orsenthil orsenthil added the needs backport to 3.10 and needs backport to 3.11 labels Oct 5, 2022
Member

@orsenthil orsenthil left a comment

LGTM

@orsenthil orsenthil merged commit aeb28f5 into python:main Oct 5, 2022
@miss-islington
Contributor

Thanks @eendebakpt for the PR, and @orsenthil for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11.
🐍🍒⛏🤖

@bedevere-bot

GH-97918 is a backport of this pull request to the 3.11 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.11 label Oct 5, 2022
@bedevere-bot bedevere-bot removed the needs backport to 3.10 label Oct 5, 2022
@bedevere-bot

GH-97919 is a backport of this pull request to the 3.10 branch.

@eendebakpt eendebakpt deleted the performance/getproxies_environment branch October 5, 2022 19:23
@eendebakpt eendebakpt restored the performance/getproxies_environment branch October 5, 2022 19:23
@eendebakpt eendebakpt deleted the performance/getproxies_environment branch October 5, 2022 19:23
@eendebakpt eendebakpt restored the performance/getproxies_environment branch October 5, 2022 19:24

Labels

3.12, performance

Development

Successfully merging this pull request may close these issues.

speed up urllib.request.getproxies_environment

6 participants