%sh
wget -O nyc_zip_codes.zip https://data.cityofnewyork.us/download/i8iw-xf4u/application%2Fzip
ls
--2022-09-28 14:55:04-- https://data.cityofnewyork.us/download/i8iw-xf4u/application%2Fzip
Resolving data.cityofnewyork.us (data.cityofnewyork.us)... 52.206.68.26, 52.206.140.199, 52.206.140.205
Connecting to data.cityofnewyork.us (data.cityofnewyork.us)|52.206.68.26|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://data.cityofnewyork.us/api/views/i8iw-xf4u/files/YObIR0MbpUVA0EpQzZSq5x55FzKGM2ejSeahdvjqR20?filename=ZIP_CODE_040114.zip [following]
--2022-09-28 14:55:05-- https://data.cityofnewyork.us/api/views/i8iw-xf4u/files/YObIR0MbpUVA0EpQzZSq5x55FzKGM2ejSeahdvjqR20?filename=ZIP_CODE_040114.zip
Reusing existing connection to data.cityofnewyork.us:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]
Saving to: ‘nyc_zip_codes.zip’
2022-09-28 14:55:06 (1.52 MB/s) - ‘nyc_zip_codes.zip’ saved [1514401]
azure
conf
eventlogs
ganglia
hadoop_accessed_config.lst
logs
nyc_zip_codes.zip
preload_class.lst
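The download step above can also be done from a Python cell. The following is a minimal sketch using only the standard library, not the notebook's own method. The helper names `download_file` and `extract_zip` and the output directory name are hypothetical; only the URL and the archive file name come from the cell above.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the %sh cell above; the server answers with a 302 redirect,
# which urllib follows by default.
ZIP_URL = "https://data.cityofnewyork.us/download/i8iw-xf4u/application%2Fzip"


def download_file(url, dest):
    """Stream the response body of `url` into the local file `dest` (hypothetical helper)."""
    dest_path = Path(dest)
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path


def extract_zip(zip_path, dest_dir):
    """Extract every member of the archive into `dest_dir` and return the member names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


# Example usage (performs a network request, so it is left commented out):
# zip_path = download_file(ZIP_URL, "nyc_zip_codes.zip")
# print(extract_zip(zip_path, "nyc_zip_codes"))
```

Listing the extracted member names is a quick way to confirm that the shapefile components arrived before reading them.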
gdf.crs
Out[8]: <Derived Projected CRS: EPSG:2263>
Name: NAD83 / New York Long Island (ftUS)
Axis Info [cartesian]:
- X[east]: Easting (US survey foot)
- Y[north]: Northing (US survey foot)
Area of Use:
- name: United States (USA) - New York - counties of Bronx; Kings; Nassau; New York; Queens; Richmond; Suffolk.
- bounds: (-74.26, 40.47, -71.8, 41.3)
Coordinate Operation:
- name: SPCS83 New York Long Island zone (US Survey feet)
- method: Lambert Conic Conformal (2SP)
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
gdf.crs
Out[10]: <Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
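The two `gdf.crs` outputs above show the data moving from EPSG:2263 (US survey feet) to EPSG:4326 (degrees). The cell that performs the change is not shown here, so the following is only a sketch of how such a reprojection could look in GeoPandas. The single demo point and the names `gdf_demo` and `gdf_wgs84` are made up for illustration; `geom_wkt` mirrors the column name in the schema further down.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical one-row frame in EPSG:2263; the coordinates are an invented
# point in Manhattan, expressed in US survey feet.
gdf_demo = gpd.GeoDataFrame(
    {"name": ["demo"]},
    geometry=[Point(987000.0, 211000.0)],
    crs="EPSG:2263",
)

# Reproject to WGS 84 so that coordinates become longitude/latitude degrees.
gdf_wgs84 = gdf_demo.to_crs(epsg=4326)

# Keep a text copy of each geometry as WKT, which can be stored as a plain string column.
gdf_wgs84["geom_wkt"] = gdf_wgs84.geometry.apply(lambda geom: geom.wkt)

print(gdf_wgs84.crs.to_epsg())
print(gdf_wgs84["geom_wkt"].iloc[0])
```

`to_crs` returns a new frame by default, so the original frame keeps its EPSG:2263 coordinates unless the result is assigned back to it.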
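# Build a Spark DataFrame from the attribute columns only; the shape is carried as text in geom_wkt.
df_zip = spark.createDataFrame(gdf.drop(['geometry'], axis=1, inplace=False))
print(f"count? {df_zip.count()}")
# printSchema() prints the schema itself and returns None, hence the trailing "None" in the output.
print(df_zip.printSchema())
display(df_zip)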
count? 263
root
|-- ZIPCODE: string (nullable = true)
|-- BLDGZIP: string (nullable = true)
|-- PO_NAME: string (nullable = true)
|-- POPULATION: double (nullable = true)
|-- AREA: double (nullable = true)
|-- STATE: string (nullable = true)
|-- COUNTY: string (nullable = true)
|-- ST_FIPS: string (nullable = true)
|-- CTY_FIPS: string (nullable = true)
|-- URL: string (nullable = true)
|-- SHAPE_AREA: double (nullable = true)
|-- SHAPE_LEN: double (nullable = true)
|-- geom_wkt: string (nullable = true)
None
Set Up Data in Unity Catalog
After the copy completes (that is, after running this notebook), any future access to the prepared tables will be as simple as the following:
NOTE: This must be run outside of DBSQL, e.g. on a Databricks Spark cluster.