Unicode problems #35

@uholzer

There are still some Unicode-related issues in 1.6.2. I should have spotted them sooner,
sorry for that. The most problematic one is caused by urllib.urlencode
mishandling unicode objects, which I wasn't aware of.

Personally, I would make sure that internally you only have unicode objects,
i.e. that SPARQLWrapper.setQuery and similar methods convert str objects to
unicode objects. Then encode them to UTF-8 before applying urlencode and assembling
the HTTP request. I hope 2to3 is then able to apply the correct transformations.
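
A minimal sketch of that idea, written for Python 3 and not taken from the SPARQLWrapper code base: the helper names `ensure_text` and `encode_for_wire` are hypothetical. Input is normalised to text once, and encoded to UTF-8 only at the point where the request is assembled.

```python
def ensure_text(query, encoding="utf-8"):
    """Return `query` as a text string, decoding byte strings as UTF-8.

    Hypothetical helper; something like this could run inside setQuery.
    """
    if isinstance(query, bytes):
        return query.decode(encoding)
    return query


def encode_for_wire(query, encoding="utf-8"):
    """Encode the internal text form exactly once, just before sending."""
    return ensure_text(query).encode(encoding)


text_query = 'INSERT DATA { <urn:michel> <urn:says> "\u00e9" }'
byte_query = text_query.encode("utf-8")

# Both kinds of input normalise to the same internal text
# and produce the same bytes on the wire.
normalised = [ensure_text(text_query), ensure_text(byte_query)]
wire = [encode_for_wire(text_query), encode_for_wire(byte_query)]
```

With this split, everything between setQuery and the request assembly only ever sees text, so callers may pass either type.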

Python 2.7.6 (default, Mar 22 2014, 15:40:47) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from SPARQLWrapper import SPARQLWrapper, XML, POST, GET, URLENCODED, POSTDIRECTLY
/home/urs/.local/lib/python2.7/site-packages/SPARQLWrapper/Wrapper.py:100: RuntimeWarning: JSON-LD disabled because no suitable support has been found
  warnings.warn("JSON-LD disabled because no suitable support has been found", RuntimeWarning)
>>> uquery = u'INSERT DATA { <urn:michel> <urn:says> "é" }'
>>> query = uquery.encode('UTF-8')
>>> uquery
u'INSERT DATA { <urn:michel> <urn:says> "\xe9" }'
>>> query
'INSERT DATA { <urn:michel> <urn:says> "\xc3\xa9" }'
>>> wrapper = SPARQLWrapper('http://localhost:3030/ukpp/sparql', 'http://localhost:3030/ukpp/update')

POSTDIRECTLY only works with unicode objects. Apart from the unclear error
message, this is not necessarily wrong, because a SPARQL query is Unicode text
and the SPARQL protocol mandates UTF-8 as the charset.

>>> wrapper.setMethod(POST)
>>> wrapper.setRequestMethod(POSTDIRECTLY)
>>> wrapper.setQuery(uquery)
>>> wrapper.query()
<SPARQLWrapper.Wrapper.QueryResult object at 0x7f7513e5c450>
>>> wrapper.setRequestMethod(POSTDIRECTLY)
>>> wrapper.setQuery(query)
>>> wrapper.query()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/urs/.local/lib/python2.7/site-packages/SPARQLWrapper/Wrapper.py", line 515, in query
    return QueryResult(self._query())
  File "/home/urs/.local/lib/python2.7/site-packages/SPARQLWrapper/Wrapper.py", line 483, in _query
    request = self._createRequest()
  File "/home/urs/.local/lib/python2.7/site-packages/SPARQLWrapper/Wrapper.py", line 442, in _createRequest
    request.data = self.queryString.encode('UTF-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 39: ordinal not in range(128)
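
The error message above is confusing because a call to encode raises a *decode* error. In Python 2, calling .encode on a byte string first decodes it implicitly with the ASCII codec. Python 3 no longer does this, so the sketch below spells out the hidden step to reproduce the failure; `py2_style_encode` is a hypothetical name used only for illustration.

```python
query = 'INSERT DATA { <urn:michel> <urn:says> "\u00e9" }'.encode("utf-8")


def py2_style_encode(byte_string, encoding="utf-8"):
    """Emulate Python 2's str.encode: implicit ASCII decode, then encode."""
    return byte_string.decode("ascii").encode(encoding)


try:
    py2_style_encode(query)
    error = None
except UnicodeDecodeError as exc:
    # The first non-ASCII byte (0xc3, the start of the UTF-8 form of "é")
    # is what the implicit ASCII decode chokes on.
    error = exc
```

The offending byte is the first byte of the UTF-8 encoded "é", which matches the position reported in the traceback.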

With URLENCODED, a unicode object doesn't work: as the traceback shows,
urllib.urlencode calls str() on the value, which implicitly encodes it with the
ASCII codec. A UTF-8 encoded byte string works.

>>> wrapper.setRequestMethod(URLENCODED)
>>> wrapper.setQuery(uquery)
>>> wrapper.query()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/urs/.local/lib/python2.7/site-packages/SPARQLWrapper/Wrapper.py", line 515, in query
    return QueryResult(self._query())
  File "/home/urs/.local/lib/python2.7/site-packages/SPARQLWrapper/Wrapper.py", line 483, in _query
    request = self._createRequest()
  File "/home/urs/.local/lib/python2.7/site-packages/SPARQLWrapper/Wrapper.py", line 448, in _createRequest
    request.data = urllib.urlencode(parameters, True)
  File "/usr/lib/python2.7/urllib.py", line 1357, in urlencode
    l.append(k + '=' + quote_plus(str(elt)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 39: ordinal not in range(128)
>>> wrapper.setRequestMethod(URLENCODED)
>>> wrapper.setQuery(query)
>>> wrapper.query()
<SPARQLWrapper.Wrapper.QueryResult object at 0x7f7513e5c290>
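
Since the byte string works, one possible fix is to encode every value to UTF-8 before it reaches urlencode. The sketch below is written for Python 3 (on Python 2 the function lives at urllib.urlencode instead). The parameter name `update` and the list-valued layout are assumptions made for illustration, chosen to match the doseq=True call visible in the traceback.

```python
from urllib.parse import unquote_plus, urlencode

uquery = 'INSERT DATA { <urn:michel> <urn:says> "\u00e9" }'

# Assumed parameter layout: one list of values per key.
# Values are encoded to UTF-8 bytes *before* urlencode sees them.
parameters = {"update": [uquery.encode("utf-8")]}

# doseq=True makes urlencode emit one key=value pair per list element.
body = urlencode(parameters, True)

# Decoding the percent-encoded value (UTF-8 by default) gives the query back.
round_tripped = unquote_plus(body.split("=", 1)[1])
```

The resulting body is pure ASCII, with "é" sent as the percent-encoded UTF-8 sequence %C3%A9.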

The same test with Python 3:

Python 3.4.1 (default, Jul  6 2014, 20:01:46) 
[GCC 4.9.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from SPARQLWrapper import SPARQLWrapper, XML, POST, GET, URLENCODED, POSTDIRECTLY
/home/urs/.local/lib/python3.4/site-packages/SPARQLWrapper/Wrapper.py:100: RuntimeWarning: JSON-LD disabled because no suitable support has been found
  warnings.warn("JSON-LD disabled because no suitable support has been found", RuntimeWarning)
>>> uquery = 'INSERT DATA { <urn:michel> <urn:says> "é" }'
>>> query = uquery.encode('UTF-8')
>>> uquery
'INSERT DATA { <urn:michel> <urn:says> "é" }'
>>> query
b'INSERT DATA { <urn:michel> <urn:says> "\xc3\xa9" }'
>>> wrapper = SPARQLWrapper('http://localhost:3030/ukpp/sparql', 'http://localhost:3030/ukpp/update')
>>> wrapper.setMethod(POST)
>>> wrapper.setRequestMethod(POSTDIRECTLY)
>>> wrapper.setQuery(uquery)
>>> wrapper.query()
<SPARQLWrapper.Wrapper.QueryResult object at 0x7f8587444048>
>>> wrapper.setRequestMethod(POSTDIRECTLY)
>>> wrapper.setQuery(query)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/urs/.local/lib/python3.4/site-packages/SPARQLWrapper/Wrapper.py", line 319, in setQuery
    self.queryType   = self._parseQueryType(query)
  File "/home/urs/.local/lib/python3.4/site-packages/SPARQLWrapper/Wrapper.py", line 335, in _parseQueryType
    query = re.sub(re.compile("#.*?\n" ), "" , query) # remove all occurance singleline comments (issue #32)
  File "/usr/lib/python3.4/re.py", line 175, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: can't use a string pattern on a bytes-like object
>>> wrapper.setRequestMethod(URLENCODED)
>>> wrapper.setQuery(uquery)
>>> wrapper.query()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/urs/.local/lib/python3.4/site-packages/SPARQLWrapper/Wrapper.py", line 515, in query
    return QueryResult(self._query())
  File "/home/urs/.local/lib/python3.4/site-packages/SPARQLWrapper/Wrapper.py", line 485, in _query
    response = urlopener(request)
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 453, in open
    req = meth(req)
  File "/usr/lib/python3.4/urllib/request.py", line 1120, in do_request_
    raise TypeError(msg)
TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str.
>>> wrapper.setRequestMethod(URLENCODED)
>>> wrapper.setQuery(query)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/urs/.local/lib/python3.4/site-packages/SPARQLWrapper/Wrapper.py", line 319, in setQuery
    self.queryType   = self._parseQueryType(query)
  File "/home/urs/.local/lib/python3.4/site-packages/SPARQLWrapper/Wrapper.py", line 335, in _parseQueryType
    query = re.sub(re.compile("#.*?\n" ), "" , query) # remove all occurance singleline comments (issue #32)
  File "/usr/lib/python3.4/re.py", line 175, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: can't use a string pattern on a bytes-like object
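
Both Python 3 failures come down to mixing str and bytes: the regular expression needs text, while urllib.request needs bytes as POST data. A rough sketch of how the two could be reconciled follows. The helper names and the `update` parameter name are hypothetical and not SPARQLWrapper's actual code; no request is sent.

```python
import re
from urllib.parse import urlencode
from urllib.request import Request


def as_text(query):
    """Decode byte strings as UTF-8 so that later code only sees text."""
    return query.decode("utf-8") if isinstance(query, bytes) else query


def without_comments(query):
    """Strip single-line comments, as done before detecting the query type."""
    return re.sub(r"#.*?\n", "", as_text(query))


def build_request(endpoint, query):
    """Build a URL-encoded POST request whose body is bytes, not str."""
    # urlencode returns an ASCII-only str; Request needs bytes for POST data.
    data = urlencode({"update": as_text(query)}).encode("ascii")
    return Request(endpoint, data=data)


byte_query = b'INSERT DATA { <urn:michel> <urn:says> "\xc3\xa9" }'
request = build_request("http://localhost:3030/ukpp/update", byte_query)
stripped = without_comments(b"# a comment\nASK {}")
```

Here a bytes query no longer breaks the comment-stripping regex, and the request body satisfies the "POST data should be bytes" check.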
