Problem: Python Command Execution Fails with AttributeError

Problem

When you run a notebook, Python command execution fails with the following error and stack trace:

AttributeError: 'tuple' object has no attribute 'type'
Traceback (most recent call last):
File "/local_disk0/tmp/1547561952809-0/PythonShell.py", line 23, in <module>
  import matplotlib as mpl
File "/databricks/python/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 122, in <module>
  from matplotlib.cbook import is_string_like, mplDeprecation, dedent, get_label
File "/databricks/python/local/lib/python2.7/site-packages/matplotlib/cbook.py", line 33, in <module>
  import numpy as np
File "/databricks/python/local/lib/python2.7/site-packages/numpy/__init__.py", line 142, in <module>
  from . import core
File "/databricks/python/local/lib/python2.7/site-packages/numpy/core/__init__.py", line 57, in <module>
  from . import numerictypes as nt
File "/databricks/python/local/lib/python2.7/site-packages/numpy/core/numerictypes.py", line 111, in <module>
  from ._type_aliases import (
File "/databricks/python/local/lib/python2.7/site-packages/numpy/core/_type_aliases.py", line 63, in <module>
  _concrete_types = {v.type for k, v in _concrete_typeinfo.items()}
File "/databricks/python/local/lib/python2.7/site-packages/numpy/core/_type_aliases.py", line 63, in <setcomp>
  _concrete_types = {v.type for k, v in _concrete_typeinfo.items()}
AttributeError: 'tuple' object has no attribute 'type'


19/01/15 11:29:26 WARN PythonDriverWrapper: setupRepl:ReplId-7d8d1-8cc01-2d329-9: at the end, the status is
Error(ReplId-7d8d1-8cc01-2d329-,com.databricks.backend.daemon.driver.PythonDriverLocal$PythonException: Python shell failed to start in 30 seconds)
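
The failing line in the traceback reads the .type attribute from every value of _concrete_typeinfo, and the message shows that those values were plain tuples. The following minimal sketch reproduces the same message without numpy. The dictionary contents and the collect_types helper are hypothetical stand-ins for illustration, and the sketch does not show why the values were tuples on the affected cluster.

```python
# Hypothetical stand-in for numpy's _concrete_typeinfo: the values here are
# plain tuples, which is what the error message reports. The entries are
# made up for illustration and are not numpy's real data.
_concrete_typeinfo = {
    "int8": ("b", 1, 8),
    "float64": ("d", 12, 64),
}


def collect_types(typeinfo):
    """Run the same set comprehension as the failing line in the traceback."""
    return {v.type for k, v in typeinfo.items()}


try:
    collect_types(_concrete_typeinfo)
    message = None
except AttributeError as exc:
    # A tuple has no attribute named "type", so the lookup fails.
    message = str(exc)

print(message)  # 'tuple' object has no attribute 'type'
```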

Cause

Some PyPI clients install a newer version of numpy (1.16.1) by default, and this version is incompatible with other libraries.

Solution

Follow the steps below to create a cluster-scoped init script that removes the installed version of numpy and installs version 1.15.0.

  1. If the init script does not already exist, create a base directory to store it:

    dbutils.fs.mkdirs("dbfs:/databricks/<directory>/")
    
  2. Create the following script:

    • If the cluster is running Python 2, use this init script:

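      # Write the init script to DBFS. The final argument (True) overwrites an existing file.
      # The shebang directly follows the opening quotes so that it is the first line of the script.
      dbutils.fs.put("dbfs:/databricks/<directory>/numpy.sh","""#!/bin/bash
      pip uninstall --yes numpy
      rm -rf /home/ubuntu/databricks/python/lib/python2.7/site-packages/numpy*
      rm -rf /databricks/python/lib/python2.7/site-packages/numpy*
      /usr/bin/yes | /home/ubuntu/databricks/python/bin/pip install numpy==1.15.0
      """,True)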
      
    • If the cluster is running Python 3, use this init script:

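      # Write the init script to DBFS. The final argument (True) overwrites an existing file.
      # The shebang directly follows the opening quotes so that it is the first line of the script.
      dbutils.fs.put("dbfs:/databricks/<directory>/numpy.sh","""#!/bin/bash
      pip uninstall --yes numpy
      rm -rf /home/ubuntu/databricks/python/lib/python3.5/site-packages/numpy*
      rm -rf /databricks/python/lib/python3.5/site-packages/numpy*
      /usr/bin/yes | /home/ubuntu/databricks/python/bin/pip install numpy==1.15.0
      """,True)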
      
  3. Confirm that the script exists:

    display(dbutils.fs.ls("dbfs:/databricks/<directory>/numpy.sh"))
    
  4. Go to the cluster configuration page and click the Advanced Options toggle.

  5. At the bottom of the page, click the Init Scripts tab:

    [Image: Init Scripts tab]

  6. In the Destination drop-down, select DBFS, provide the file path to the script, and click Add.

  7. Restart the cluster.

  8. In your PyPI client, pin the numpy installation to version 1.15.1, the latest working version.
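
After the cluster restarts, it can help to confirm which numpy version is actually installed. The sketch below is one possible check, not part of the documented procedure. The helper names installed_version and numpy_pin_applied are hypothetical, and the expected version is assumed to be 1.15.0, the version the init script above installs. The lookup reads package metadata instead of importing numpy, so it still returns a result if the import itself is broken.

```python
EXPECTED_NUMPY = "1.15.0"  # assumption: the version installed by the init script


def installed_version(package):
    """Return the installed version string of a package, or None if not found.

    Reads package metadata only; the package itself is never imported.
    """
    try:
        # Available on Python 3.8 and later.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Older interpreters (such as 2.7 or 3.5): fall back to setuptools.
        try:
            import pkg_resources
        except ImportError:
            return None
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def numpy_pin_applied(found, expected=EXPECTED_NUMPY):
    """Return True only when a version was found and it equals the expected one."""
    return found is not None and found == expected


found = installed_version("numpy")
print("numpy version found:", found)
print("pin applied:", numpy_pin_applied(found))
```

Run this in a notebook cell attached to the restarted cluster. If the second line prints False, check the init script path on the cluster configuration page and the cluster's init script logs.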

Learn more about cluster-scoped init scripts.