读写和解析简单的 nc 文件
NetCDF 文件格式在气象数据工程领域占据着举足轻重的地位,其结构灵活、强兼容性等优势使其成为该领域的一个标准。无论是从事学术研究还是工程实践,掌握这种数据格式变得越发重要。其次,我注意到目前社区中气象编程大多数课程都聚焦于某个特定库的使用方法,而鲜有以数据格式本身为中心的内容。因此,我决定将 NetCDF 数据格式置于核心位置,同时辅以 xarray、netCDF4、nco、cdo 等工具,共同构建本次培训的内容框架。
关卡 1 我们会由浅入深,假设从我们刚拿到一个陌生的 nc 文件开始,一步一步教大家如何快速查看 nc 文件的元信息、简单读取和创建 nc 文件,同时也会教大家理解 nc 文件中的一些重要的特点,帮助大家避免一些初学者常遇到的坑。
1.1 查看文件信息:使用 ncdump/ncinfo 等命令行
当我们拿到一个 netcdf 格式的文件,想立刻快速看一下这个文件的元信息(也就是有哪些变量,有哪些维度等),最快的方式是使用命令行。我们看一下下面这个例子:
ERA5_FP = '/home/mw/input/training_camp9055/era5.nc'
!ncinfo $ERA5_FP
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
CDI: Climate Data Interface version 2.3.0 (https://mpimet.mpg.de/cdi)
Conventions: CF-1.6
history: Fri Dec 01 02:13:30 2023: cdo setattribute,latitude@units=degrees_north era5.nc era5_1.nc
2023-09-16 06:32:48 GMT by grib_to_netcdf-2.25.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf.bin -S param -o /cache/data7/adaptor.mars.internal-1694845966.6698666-25558-2-ac87386c-9688-494c-aad2-3f57cee71470.nc /cache/tmp/ac87386c-9688-494c-aad2-3f57cee71470-adaptor.mars.internal-1694845965.2818055-25558-3-tmp.grib
CDO: Climate Data Operators version 2.3.0 (https://mpimet.mpg.de/cdo)
dimensions(sizes): time(80), longitude(1440), latitude(721)
variables(dimensions): int32 time(time), float32 longitude(longitude), float32 latitude(latitude), int16 u10(time, latitude, longitude), int16 v10(time, latitude, longitude)
groups:
上面这个命令展示的是在 Jupyter 环境里执行终端命令的例子,也就是 !
语法,即你在 Jupyter 中如果想要执行一个 shell 命令,只需要在前面增加一个 !
即可。上面的例子里也就是相当于直接在终端执行 ncinfo ./era5.nc
。
可以看到,返回的结果里显示了这个文件的组织形式和关键信息。这个命令行得以正常运行的前提是已经安装了 netCDF4
这个包,你可以使用 conda install -c conda-forge netCDF4
等方式安装。我们看一下文件中比较重要的内容1:
dimensions(sizes): time(80), longitude(1440), latitude(721)
variables(dimensions): int32 time(time), float32 longitude(longitude), float32 latitude(latitude), int16 u10(time, latitude, longitude), int16 v10(time, latitude, longitude)
以上信息显示文件维度包含了时间 80 个元素、经度 1440 个元素,纬度 721 个元素。文件中存储的变量包含经度、纬度、时间、u10(10 米风的纬向分量)、v10(10 米风的经向分量)。变量名后面括号内是该变量的维度结构,比如 u10 变量的数组是由(时间、纬度、经度)的三维结构组成,也就是可以推断出它的数组形状应该为(80, 721, 1440),而 time 变量仅由时间维单独组成,即(80,)。
-
🐳:预览下方代码块时使光标悬浮在代码块上并左右拖动,即可完整显示内容↩
当然上面的命令只显示了最简略的信息,而变量内部的属性信息被省略了。如果你想查看变量内部的更多信息,可以执行下面的命令:
!ncinfo -v u10 $ERA5_FP # 查看 u10 变量的详细元信息
# 本地执行命令为 ncinfo -v u10 ./era5.nc
<class 'netCDF4._netCDF4.Variable'>
int16 u10(time, latitude, longitude)
long_name: 10 metre U wind component
units: m s**-1
add_offset: -2.1228631327102883
scale_factor: 0.000929512490889013
_FillValue: -32767
missing_value: -32767
unlimited dimensions: time
current shape = (80, 721, 1440)
filling off
除了 ncinfo
命令以外,我们还需要知道另一个很方便的命令可以实现类似的效果,那就是 ncdump
:
!ncdump -h $ERA5_FP
netcdf era5 {
dimensions:
time = UNLIMITED ; // (80 currently)
longitude = 1440 ;
latitude = 721 ;
variables:
int time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:units = "hours since 1900-01-01" ;
time:calendar = "gregorian" ;
time:axis = "T" ;
float longitude(longitude) ;
longitude:axis = "X" ;
longitude:units = "degrees_east" ;
longitude:long_name = "longitude" ;
float latitude(latitude) ;
latitude:long_name = "latitude" ;
latitude:units = "degrees_north" ;
latitude:axis = "Y" ;
short u10(time, latitude, longitude) ;
u10:long_name = "10 metre U wind component" ;
u10:units = "m s**-1" ;
u10:add_offset = -2.12286313271029 ;
u10:scale_factor = 0.000929512490889013 ;
u10:_FillValue = -32767s ;
u10:missing_value = -32767s ;
short v10(time, latitude, longitude) ;
v10:long_name = "10 metre V wind component" ;
v10:units = "m s**-1" ;
v10:add_offset = 1.84205168976882 ;
v10:scale_factor = 0.000878920267046397 ;
v10:_FillValue = -32767s ;
v10:missing_value = -32767s ;
// global attributes:
:CDI = "Climate Data Interface version 2.3.0 (https://mpimet.mpg.de/cdi)" ;
:Conventions = "CF-1.6" ;
:history = "Fri Dec 01 02:13:30 2023: cdo setattribute,latitude@units=degrees_north era5.nc era5_1.nc\n2023-09-16 06:32:48 GMT by grib_to_netcdf-2.25.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf.bin -S param -o /cache/data7/adaptor.mars.internal-1694845966.6698666-25558-2-ac87386c-9688-494c-aad2-3f57cee71470.nc /cache/tmp/ac87386c-9688-494c-aad2-3f57cee71470-adaptor.mars.internal-1694845965.2818055-25558-3-tmp.grib" ;
:CDO = "Climate Data Operators version 2.3.0 (https://mpimet.mpg.de/cdo)" ;
}
ncdump
从名字可以看出来,它是要把数据内容“倒出来”。当我们增加 -h
的属性时(head
的缩写),它就会只显示元信息。可以看到,ncdump -h
显示的信息很全面,除了基本的变量信息以外,变量内部的元信息也一并显示出来了。当然这种方式会增加篇幅,如果你有一个变量巨多的 nc 文件,可能还是要考虑先用 ncinfo
浅看一下。
当然了 ncdump
这个命令并不是只能看这种程度的元信息,它其实是可以把整个文件内容都给你“倒出来”的。在我们查看 nc 文件的时候,很多时候需要大致看一眼它的时间范围、空间范围。这个时候,我们就可以用 -c
属性:
!ncdump -c $ERA5_FP
netcdf era5 {
dimensions:
time = UNLIMITED ; // (80 currently)
longitude = 1440 ;
latitude = 721 ;
variables:
int time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:units = "hours since 1900-01-01" ;
time:calendar = "gregorian" ;
time:axis = "T" ;
float longitude(longitude) ;
longitude:axis = "X" ;
longitude:units = "degrees_east" ;
longitude:long_name = "longitude" ;
float latitude(latitude) ;
latitude:long_name = "latitude" ;
latitude:units = "degrees_north" ;
latitude:axis = "Y" ;
short u10(time, latitude, longitude) ;
u10:long_name = "10 metre U wind component" ;
u10:units = "m s**-1" ;
u10:add_offset = -2.12286313271029 ;
u10:scale_factor = 0.000929512490889013 ;
u10:_FillValue = -32767s ;
u10:missing_value = -32767s ;
short v10(time, latitude, longitude) ;
v10:long_name = "10 metre V wind component" ;
v10:units = "m s**-1" ;
v10:add_offset = 1.84205168976882 ;
v10:scale_factor = 0.000878920267046397 ;
v10:_FillValue = -32767s ;
v10:missing_value = -32767s ;
// global attributes:
:CDI = "Climate Data Interface version 2.3.0 (https://mpimet.mpg.de/cdi)" ;
:Conventions = "CF-1.6" ;
:history = "Fri Dec 01 02:13:30 2023: cdo setattribute,latitude@units=degrees_north era5.nc era5_1.nc\n2023-09-16 06:32:48 GMT by grib_to_netcdf-2.25.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf.bin -S param -o /cache/data7/adaptor.mars.internal-1694845966.6698666-25558-2-ac87386c-9688-494c-aad2-3f57cee71470.nc /cache/tmp/ac87386c-9688-494c-aad2-3f57cee71470-adaptor.mars.internal-1694845965.2818055-25558-3-tmp.grib" ;
:CDO = "Climate Data Operators version 2.3.0 (https://mpimet.mpg.de/cdo)" ;
data:
time = 1083792, 1083798, 1083804, 1083810, 1083816, 1083822, 1083828,
1083834, 1083840, 1083846, 1083852, 1083858, 1083864, 1083870, 1083876,
1083882, 1083888, 1083894, 1083900, 1083906, 1083912, 1083918, 1083924,
1083930, 1083936, 1083942, 1083948, 1083954, 1083960, 1083966, 1083972,
1083978, 1083984, 1083990, 1083996, 1084002, 1084008, 1084014, 1084020,
1084026, 1084032, 1084038, 1084044, 1084050, 1084056, 1084062, 1084068,
1084074, 1084080, 1084086, 1084092, 1084098, 1084104, 1084110, 1084116,
1084122, 1084128, 1084134, 1084140, 1084146, 1084152, 1084158, 1084164,
1084170, 1084176, 1084182, 1084188, 1084194, 1084200, 1084206, 1084212,
1084218, 1084224, 1084230, 1084236, 1084242, 1084248, 1084254, 1084260,
1084266 ;
longitude = -180, -179.75, -179.5, -179.25, -179, -178.75, -178.5, -178.25,
-178, -177.75, -177.5, -177.25, -177, -176.75, -176.5, -176.25, -176,
-175.75, -175.5, -175.25, -175, -174.75, -174.5, -174.25, -174, -173.75,
-173.5, -173.25, -173, -172.75, -172.5, -172.25, -172, -171.75, -171.5,
-171.25, -171, -170.75, -170.5, -170.25, -170, -169.75, -169.5, -169.25,
-169, -168.75, -168.5, -168.25, -168, -167.75, -167.5, -167.25, -167,
-166.75, -166.5, -166.25, -166, -165.75, -165.5, -165.25, -165, -164.75,
-164.5, -164.25, -164, -163.75, -163.5, -163.25, -163, -162.75, -162.5,
-162.25, -162, -161.75, -161.5, -161.25, -161, -160.75, -160.5, -160.25,
-160, -159.75, -159.5, -159.25, -159, -158.75, -158.5, -158.25, -158,
-157.75, -157.5, -157.25, -157, -156.75, -156.5, -156.25, -156, -155.75,
-155.5, -155.25, -155, -154.75, -154.5, -154.25, -154, -153.75, -153.5,
-153.25, -153, -152.75, -152.5, -152.25, -152, -151.75, -151.5, -151.25,
-151, -150.75, -150.5, -150.25, -150, -149.75, -149.5, -149.25, -149,
-148.75, -148.5, -148.25, -148, -147.75, -147.5, -147.25, -147, -146.75,
-146.5, -146.25, -146, -145.75, -145.5, -145.25, -145, -144.75, -144.5,
-144.25, -144, -143.75, -143.5, -143.25, -143, -142.75, -142.5, -142.25,
-142, -141.75, -141.5, -141.25, -141, -140.75, -140.5, -140.25, -140,
-139.75, -139.5, -139.25, -139, -138.75, -138.5, -138.25, -138, -137.75,
-137.5, -137.25, -137, -136.75, -136.5, -136.25, -136, -135.75, -135.5,
-135.25, -135, -134.75, -134.5, -134.25, -134, -133.75, -133.5, -133.25,
-133, -132.75, -132.5, -132.25, -132, -131.75, -131.5, -131.25, -131,
-130.75, -130.5, -130.25, -130, -129.75, -129.5, -129.25, -129, -128.75,
-128.5, -128.25, -128, -127.75, -127.5, -127.25, -127, -126.75, -126.5,
-126.25, -126, -125.75, -125.5, -125.25, -125, -124.75, -124.5, -124.25,
-124, -123.75, -123.5, -123.25, -123, -122.75, -122.5, -122.25, -122,
-121.75, -121.5, -121.25, -121, -120.75, -120.5, -120.25, -120, -119.75,
-119.5, -119.25, -119, -118.75, -118.5, -118.25, -118, -117.75, -117.5,
-117.25, -117, -116.75, -116.5, -116.25, -116, -115.75, -115.5, -115.25,
-115, -114.75, -114.5, -114.25, -114, -113.75, -113.5, -113.25, -113,
-112.75, -112.5, -112.25, -112, -111.75, -111.5, -111.25, -111, -110.75,
-110.5, -110.25, -110, -109.75, -109.5, -109.25, -109, -108.75, -108.5,
-108.25, -108, -107.75, -107.5, -107.25, -107, -106.75, -106.5, -106.25,
-106, -105.75, -105.5, -105.25, -105, -104.75, -104.5, -104.25, -104,
-103.75, -103.5, -103.25, -103, -102.75, -102.5, -102.25, -102, -101.75,
-101.5, -101.25, -101, -100.75, -100.5, -100.25, -100, -99.75, -99.5,
-99.25, -99, -98.75, -98.5, -98.25, -98, -97.75, -97.5, -97.25, -97,
-96.75, -96.5, -96.25, -96, -95.75, -95.5, -95.25, -95, -94.75, -94.5,
-94.25, -94, -93.75, -93.5, -93.25, -93, -92.75, -92.5, -92.25, -92,
-91.75, -91.5, -91.25, -91, -90.75, -90.5, -90.25, -90, -89.75, -89.5,
-89.25, -89, -88.75, -88.5, -88.25, -88, -87.75, -87.5, -87.25, -87,
-86.75, -86.5, -86.25, -86, -85.75, -85.5, -85.25, -85, -84.75, -84.5,
-84.25, -84, -83.75, -83.5, -83.25, -83, -82.75, -82.5, -82.25, -82,
-81.75, -81.5, -81.25, -81, -80.75, -80.5, -80.25, -80, -79.75, -79.5,
-79.25, -79, -78.75, -78.5, -78.25, -78, -77.75, -77.5, -77.25, -77,
-76.75, -76.5, -76.25, -76, -75.75, -75.5, -75.25, -75, -74.75, -74.5,
-74.25, -74, -73.75, -73.5, -73.25, -73, -72.75, -72.5, -72.25, -72,
-71.75, -71.5, -71.25, -71, -70.75, -70.5, -70.25, -70, -69.75, -69.5,
-69.25, -69, -68.75, -68.5, -68.25, -68, -67.75, -67.5, -67.25, -67,
-66.75, -66.5, -66.25, -66, -65.75, -65.5, -65.25, -65, -64.75, -64.5,
-64.25, -64, -63.75, -63.5, -63.25, -63, -62.75, -62.5, -62.25, -62,
-61.75, -61.5, -61.25, -61, -60.75, -60.5, -60.25, -60, -59.75, -59.5,
-59.25, -59, -58.75, -58.5, -58.25, -58, -57.75, -57.5, -57.25, -57,
-56.75, -56.5, -56.25, -56, -55.75, -55.5, -55.25, -55, -54.75, -54.5,
-54.25, -54, -53.75, -53.5, -53.25, -53, -52.75, -52.5, -52.25, -52,
-51.75, -51.5, -51.25, -51, -50.75, -50.5, -50.25, -50, -49.75, -49.5,
-49.25, -49, -48.75, -48.5, -48.25, -48, -47.75, -47.5, -47.25, -47,
-46.75, -46.5, -46.25, -46, -45.75, -45.5, -45.25, -45, -44.75, -44.5,
-44.25, -44, -43.75, -43.5, -43.25, -43, -42.75, -42.5, -42.25, -42,
-41.75, -41.5, -41.25, -41, -40.75, -40.5, -40.25, -40, -39.75, -39.5,
-39.25, -39, -38.75, -38.5, -38.25, -38, -37.75, -37.5, -37.25, -37,
-36.75, -36.5, -36.25, -36, -35.75, -35.5, -35.25, -35, -34.75, -34.5,
-34.25, -34, -33.75, -33.5, -33.25, -33, -32.75, -32.5, -32.25, -32,
-31.75, -31.5, -31.25, -31, -30.75, -30.5, -30.25, -30, -29.75, -29.5,
-29.25, -29, -28.75, -28.5, -28.25, -28, -27.75, -27.5, -27.25, -27,
-26.75, -26.5, -26.25, -26, -25.75, -25.5, -25.25, -25, -24.75, -24.5,
-24.25, -24, -23.75, -23.5, -23.25, -23, -22.75, -22.5, -22.25, -22,
-21.75, -21.5, -21.25, -21, -20.75, -20.5, -20.25, -20, -19.75, -19.5,
-19.25, -19, -18.75, -18.5, -18.25, -18, -17.75, -17.5, -17.25, -17,
-16.75, -16.5, -16.25, -16, -15.75, -15.5, -15.25, -15, -14.75, -14.5,
-14.25, -14, -13.75, -13.5, -13.25, -13, -12.75, -12.5, -12.25, -12,
-11.75, -11.5, -11.25, -11, -10.75, -10.5, -10.25, -10, -9.75, -9.5,
-9.25, -9, -8.75, -8.5, -8.25, -8, -7.75, -7.5, -7.25, -7, -6.75, -6.5,
-6.25, -6, -5.75, -5.5, -5.25, -5, -4.75, -4.5, -4.25, -4, -3.75, -3.5,
-3.25, -3, -2.75, -2.5, -2.25, -2, -1.75, -1.5, -1.25, -1, -0.75, -0.5,
-0.25, 0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3,
3.25, 3.5, 3.75, 4, 4.25, 4.5, 4.75, 5, 5.25, 5.5, 5.75, 6, 6.25, 6.5,
6.75, 7, 7.25, 7.5, 7.75, 8, 8.25, 8.5, 8.75, 9, 9.25, 9.5, 9.75, 10,
10.25, 10.5, 10.75, 11, 11.25, 11.5, 11.75, 12, 12.25, 12.5, 12.75, 13,
13.25, 13.5, 13.75, 14, 14.25, 14.5, 14.75, 15, 15.25, 15.5, 15.75, 16,
16.25, 16.5, 16.75, 17, 17.25, 17.5, 17.75, 18, 18.25, 18.5, 18.75, 19,
19.25, 19.5, 19.75, 20, 20.25, 20.5, 20.75, 21, 21.25, 21.5, 21.75, 22,
22.25, 22.5, 22.75, 23, 23.25, 23.5, 23.75, 24, 24.25, 24.5, 24.75, 25,
25.25, 25.5, 25.75, 26, 26.25, 26.5, 26.75, 27, 27.25, 27.5, 27.75, 28,
28.25, 28.5, 28.75, 29, 29.25, 29.5, 29.75, 30, 30.25, 30.5, 30.75, 31,
31.25, 31.5, 31.75, 32, 32.25, 32.5, 32.75, 33, 33.25, 33.5, 33.75, 34,
34.25, 34.5, 34.75, 35, 35.25, 35.5, 35.75, 36, 36.25, 36.5, 36.75, 37,
37.25, 37.5, 37.75, 38, 38.25, 38.5, 38.75, 39, 39.25, 39.5, 39.75, 40,
40.25, 40.5, 40.75, 41, 41.25, 41.5, 41.75, 42, 42.25, 42.5, 42.75, 43,
43.25, 43.5, 43.75, 44, 44.25, 44.5, 44.75, 45, 45.25, 45.5, 45.75, 46,
46.25, 46.5, 46.75, 47, 47.25, 47.5, 47.75, 48, 48.25, 48.5, 48.75, 49,
49.25, 49.5, 49.75, 50, 50.25, 50.5, 50.75, 51, 51.25, 51.5, 51.75, 52,
52.25, 52.5, 52.75, 53, 53.25, 53.5, 53.75, 54, 54.25, 54.5, 54.75, 55,
55.25, 55.5, 55.75, 56, 56.25, 56.5, 56.75, 57, 57.25, 57.5, 57.75, 58,
58.25, 58.5, 58.75, 59, 59.25, 59.5, 59.75, 60, 60.25, 60.5, 60.75, 61,
61.25, 61.5, 61.75, 62, 62.25, 62.5, 62.75, 63, 63.25, 63.5, 63.75, 64,
64.25, 64.5, 64.75, 65, 65.25, 65.5, 65.75, 66, 66.25, 66.5, 66.75, 67,
67.25, 67.5, 67.75, 68, 68.25, 68.5, 68.75, 69, 69.25, 69.5, 69.75, 70,
70.25, 70.5, 70.75, 71, 71.25, 71.5, 71.75, 72, 72.25, 72.5, 72.75, 73,
73.25, 73.5, 73.75, 74, 74.25, 74.5, 74.75, 75, 75.25, 75.5, 75.75, 76,
76.25, 76.5, 76.75, 77, 77.25, 77.5, 77.75, 78, 78.25, 78.5, 78.75, 79,
79.25, 79.5, 79.75, 80, 80.25, 80.5, 80.75, 81, 81.25, 81.5, 81.75, 82,
82.25, 82.5, 82.75, 83, 83.25, 83.5, 83.75, 84, 84.25, 84.5, 84.75, 85,
85.25, 85.5, 85.75, 86, 86.25, 86.5, 86.75, 87, 87.25, 87.5, 87.75, 88,
88.25, 88.5, 88.75, 89, 89.25, 89.5, 89.75, 90, 90.25, 90.5, 90.75, 91,
91.25, 91.5, 91.75, 92, 92.25, 92.5, 92.75, 93, 93.25, 93.5, 93.75, 94,
94.25, 94.5, 94.75, 95, 95.25, 95.5, 95.75, 96, 96.25, 96.5, 96.75, 97,
97.25, 97.5, 97.75, 98, 98.25, 98.5, 98.75, 99, 99.25, 99.5, 99.75, 100,
100.25, 100.5, 100.75, 101, 101.25, 101.5, 101.75, 102, 102.25, 102.5,
102.75, 103, 103.25, 103.5, 103.75, 104, 104.25, 104.5, 104.75, 105,
105.25, 105.5, 105.75, 106, 106.25, 106.5, 106.75, 107, 107.25, 107.5,
107.75, 108, 108.25, 108.5, 108.75, 109, 109.25, 109.5, 109.75, 110,
110.25, 110.5, 110.75, 111, 111.25, 111.5, 111.75, 112, 112.25, 112.5,
112.75, 113, 113.25, 113.5, 113.75, 114, 114.25, 114.5, 114.75, 115,
115.25, 115.5, 115.75, 116, 116.25, 116.5, 116.75, 117, 117.25, 117.5,
117.75, 118, 118.25, 118.5, 118.75, 119, 119.25, 119.5, 119.75, 120,
120.25, 120.5, 120.75, 121, 121.25, 121.5, 121.75, 122, 122.25, 122.5,
122.75, 123, 123.25, 123.5, 123.75, 124, 124.25, 124.5, 124.75, 125,
125.25, 125.5, 125.75, 126, 126.25, 126.5, 126.75, 127, 127.25, 127.5,
127.75, 128, 128.25, 128.5, 128.75, 129, 129.25, 129.5, 129.75, 130,
130.25, 130.5, 130.75, 131, 131.25, 131.5, 131.75, 132, 132.25, 132.5,
132.75, 133, 133.25, 133.5, 133.75, 134, 134.25, 134.5, 134.75, 135,
135.25, 135.5, 135.75, 136, 136.25, 136.5, 136.75, 137, 137.25, 137.5,
137.75, 138, 138.25, 138.5, 138.75, 139, 139.25, 139.5, 139.75, 140,
140.25, 140.5, 140.75, 141, 141.25, 141.5, 141.75, 142, 142.25, 142.5,
142.75, 143, 143.25, 143.5, 143.75, 144, 144.25, 144.5, 144.75, 145,
145.25, 145.5, 145.75, 146, 146.25, 146.5, 146.75, 147, 147.25, 147.5,
147.75, 148, 148.25, 148.5, 148.75, 149, 149.25, 149.5, 149.75, 150,
150.25, 150.5, 150.75, 151, 151.25, 151.5, 151.75, 152, 152.25, 152.5,
152.75, 153, 153.25, 153.5, 153.75, 154, 154.25, 154.5, 154.75, 155,
155.25, 155.5, 155.75, 156, 156.25, 156.5, 156.75, 157, 157.25, 157.5,
157.75, 158, 158.25, 158.5, 158.75, 159, 159.25, 159.5, 159.75, 160,
160.25, 160.5, 160.75, 161, 161.25, 161.5, 161.75, 162, 162.25, 162.5,
162.75, 163, 163.25, 163.5, 163.75, 164, 164.25, 164.5, 164.75, 165,
165.25, 165.5, 165.75, 166, 166.25, 166.5, 166.75, 167, 167.25, 167.5,
167.75, 168, 168.25, 168.5, 168.75, 169, 169.25, 169.5, 169.75, 170,
170.25, 170.5, 170.75, 171, 171.25, 171.5, 171.75, 172, 172.25, 172.5,
172.75, 173, 173.25, 173.5, 173.75, 174, 174.25, 174.5, 174.75, 175,
175.25, 175.5, 175.75, 176, 176.25, 176.5, 176.75, 177, 177.25, 177.5,
177.75, 178, 178.25, 178.5, 178.75, 179, 179.25, 179.5, 179.75 ;
latitude = 90, 89.75, 89.5, 89.25, 89, 88.75, 88.5, 88.25, 88, 87.75, 87.5,
87.25, 87, 86.75, 86.5, 86.25, 86, 85.75, 85.5, 85.25, 85, 84.75, 84.5,
84.25, 84, 83.75, 83.5, 83.25, 83, 82.75, 82.5, 82.25, 82, 81.75, 81.5,
81.25, 81, 80.75, 80.5, 80.25, 80, 79.75, 79.5, 79.25, 79, 78.75, 78.5,
78.25, 78, 77.75, 77.5, 77.25, 77, 76.75, 76.5, 76.25, 76, 75.75, 75.5,
75.25, 75, 74.75, 74.5, 74.25, 74, 73.75, 73.5, 73.25, 73, 72.75, 72.5,
72.25, 72, 71.75, 71.5, 71.25, 71, 70.75, 70.5, 70.25, 70, 69.75, 69.5,
69.25, 69, 68.75, 68.5, 68.25, 68, 67.75, 67.5, 67.25, 67, 66.75, 66.5,
66.25, 66, 65.75, 65.5, 65.25, 65, 64.75, 64.5, 64.25, 64, 63.75, 63.5,
63.25, 63, 62.75, 62.5, 62.25, 62, 61.75, 61.5, 61.25, 61, 60.75, 60.5,
60.25, 60, 59.75, 59.5, 59.25, 59, 58.75, 58.5, 58.25, 58, 57.75, 57.5,
57.25, 57, 56.75, 56.5, 56.25, 56, 55.75, 55.5, 55.25, 55, 54.75, 54.5,
54.25, 54, 53.75, 53.5, 53.25, 53, 52.75, 52.5, 52.25, 52, 51.75, 51.5,
51.25, 51, 50.75, 50.5, 50.25, 50, 49.75, 49.5, 49.25, 49, 48.75, 48.5,
48.25, 48, 47.75, 47.5, 47.25, 47, 46.75, 46.5, 46.25, 46, 45.75, 45.5,
45.25, 45, 44.75, 44.5, 44.25, 44, 43.75, 43.5, 43.25, 43, 42.75, 42.5,
42.25, 42, 41.75, 41.5, 41.25, 41, 40.75, 40.5, 40.25, 40, 39.75, 39.5,
39.25, 39, 38.75, 38.5, 38.25, 38, 37.75, 37.5, 37.25, 37, 36.75, 36.5,
36.25, 36, 35.75, 35.5, 35.25, 35, 34.75, 34.5, 34.25, 34, 33.75, 33.5,
33.25, 33, 32.75, 32.5, 32.25, 32, 31.75, 31.5, 31.25, 31, 30.75, 30.5,
30.25, 30, 29.75, 29.5, 29.25, 29, 28.75, 28.5, 28.25, 28, 27.75, 27.5,
27.25, 27, 26.75, 26.5, 26.25, 26, 25.75, 25.5, 25.25, 25, 24.75, 24.5,
24.25, 24, 23.75, 23.5, 23.25, 23, 22.75, 22.5, 22.25, 22, 21.75, 21.5,
21.25, 21, 20.75, 20.5, 20.25, 20, 19.75, 19.5, 19.25, 19, 18.75, 18.5,
18.25, 18, 17.75, 17.5, 17.25, 17, 16.75, 16.5, 16.25, 16, 15.75, 15.5,
15.25, 15, 14.75, 14.5, 14.25, 14, 13.75, 13.5, 13.25, 13, 12.75, 12.5,
12.25, 12, 11.75, 11.5, 11.25, 11, 10.75, 10.5, 10.25, 10, 9.75, 9.5,
9.25, 9, 8.75, 8.5, 8.25, 8, 7.75, 7.5, 7.25, 7, 6.75, 6.5, 6.25, 6,
5.75, 5.5, 5.25, 5, 4.75, 4.5, 4.25, 4, 3.75, 3.5, 3.25, 3, 2.75, 2.5,
2.25, 2, 1.75, 1.5, 1.25, 1, 0.75, 0.5, 0.25, 0, -0.25, -0.5, -0.75, -1,
-1.25, -1.5, -1.75, -2, -2.25, -2.5, -2.75, -3, -3.25, -3.5, -3.75, -4,
-4.25, -4.5, -4.75, -5, -5.25, -5.5, -5.75, -6, -6.25, -6.5, -6.75, -7,
-7.25, -7.5, -7.75, -8, -8.25, -8.5, -8.75, -9, -9.25, -9.5, -9.75, -10,
-10.25, -10.5, -10.75, -11, -11.25, -11.5, -11.75, -12, -12.25, -12.5,
-12.75, -13, -13.25, -13.5, -13.75, -14, -14.25, -14.5, -14.75, -15,
-15.25, -15.5, -15.75, -16, -16.25, -16.5, -16.75, -17, -17.25, -17.5,
-17.75, -18, -18.25, -18.5, -18.75, -19, -19.25, -19.5, -19.75, -20,
-20.25, -20.5, -20.75, -21, -21.25, -21.5, -21.75, -22, -22.25, -22.5,
-22.75, -23, -23.25, -23.5, -23.75, -24, -24.25, -24.5, -24.75, -25,
-25.25, -25.5, -25.75, -26, -26.25, -26.5, -26.75, -27, -27.25, -27.5,
-27.75, -28, -28.25, -28.5, -28.75, -29, -29.25, -29.5, -29.75, -30,
-30.25, -30.5, -30.75, -31, -31.25, -31.5, -31.75, -32, -32.25, -32.5,
-32.75, -33, -33.25, -33.5, -33.75, -34, -34.25, -34.5, -34.75, -35,
-35.25, -35.5, -35.75, -36, -36.25, -36.5, -36.75, -37, -37.25, -37.5,
-37.75, -38, -38.25, -38.5, -38.75, -39, -39.25, -39.5, -39.75, -40,
-40.25, -40.5, -40.75, -41, -41.25, -41.5, -41.75, -42, -42.25, -42.5,
-42.75, -43, -43.25, -43.5, -43.75, -44, -44.25, -44.5, -44.75, -45,
-45.25, -45.5, -45.75, -46, -46.25, -46.5, -46.75, -47, -47.25, -47.5,
-47.75, -48, -48.25, -48.5, -48.75, -49, -49.25, -49.5, -49.75, -50,
-50.25, -50.5, -50.75, -51, -51.25, -51.5, -51.75, -52, -52.25, -52.5,
-52.75, -53, -53.25, -53.5, -53.75, -54, -54.25, -54.5, -54.75, -55,
-55.25, -55.5, -55.75, -56, -56.25, -56.5, -56.75, -57, -57.25, -57.5,
-57.75, -58, -58.25, -58.5, -58.75, -59, -59.25, -59.5, -59.75, -60,
-60.25, -60.5, -60.75, -61, -61.25, -61.5, -61.75, -62, -62.25, -62.5,
-62.75, -63, -63.25, -63.5, -63.75, -64, -64.25, -64.5, -64.75, -65,
-65.25, -65.5, -65.75, -66, -66.25, -66.5, -66.75, -67, -67.25, -67.5,
-67.75, -68, -68.25, -68.5, -68.75, -69, -69.25, -69.5, -69.75, -70,
-70.25, -70.5, -70.75, -71, -71.25, -71.5, -71.75, -72, -72.25, -72.5,
-72.75, -73, -73.25, -73.5, -73.75, -74, -74.25, -74.5, -74.75, -75,
-75.25, -75.5, -75.75, -76, -76.25, -76.5, -76.75, -77, -77.25, -77.5,
-77.75, -78, -78.25, -78.5, -78.75, -79, -79.25, -79.5, -79.75, -80,
-80.25, -80.5, -80.75, -81, -81.25, -81.5, -81.75, -82, -82.25, -82.5,
-82.75, -83, -83.25, -83.5, -83.75, -84, -84.25, -84.5, -84.75, -85,
-85.25, -85.5, -85.75, -86, -86.25, -86.5, -86.75, -87, -87.25, -87.5,
-87.75, -88, -88.25, -88.5, -88.75, -89, -89.25, -89.5, -89.75, -90 ;
}
可以看到,它把维度上的值都给你打出来了,而没有打出变量的值,这个命令可以让你快速了解你所要处理的 nc 文件的经纬度坐标大概是在哪个范围。当然,如果你的 nc 文件经纬度坐标非常的巨大,那么不建议直接像上面这样执行,否则就被刷屏了。可以尝试使用管道符 |
加 more
来限制一次性输出:ncdump -c ./era5.nc | more
如果要查看某一个变量的值,可以执行:
!ncdump -v u10 $ERA5_FP | head -n 50
netcdf era5 {
dimensions:
time = UNLIMITED ; // (80 currently)
longitude = 1440 ;
latitude = 721 ;
variables:
int time(time) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:units = "hours since 1900-01-01" ;
time:calendar = "gregorian" ;
time:axis = "T" ;
float longitude(longitude) ;
longitude:axis = "X" ;
longitude:units = "degrees_east" ;
longitude:long_name = "longitude" ;
float latitude(latitude) ;
latitude:long_name = "latitude" ;
latitude:units = "degrees_north" ;
latitude:axis = "Y" ;
short u10(time, latitude, longitude) ;
u10:long_name = "10 metre U wind component" ;
u10:units = "m s**-1" ;
u10:add_offset = -2.12286313271029 ;
u10:scale_factor = 0.000929512490889013 ;
u10:_FillValue = -32767s ;
u10:missing_value = -32767s ;
short v10(time, latitude, longitude) ;
v10:long_name = "10 metre V wind component" ;
v10:units = "m s**-1" ;
v10:add_offset = 1.84205168976882 ;
v10:scale_factor = 0.000878920267046397 ;
v10:_FillValue = -32767s ;
v10:missing_value = -32767s ;
// global attributes:
:CDI = "Climate Data Interface version 2.3.0 (https://mpimet.mpg.de/cdi)" ;
:Conventions = "CF-1.6" ;
:history = "Fri Dec 01 02:13:30 2023: cdo setattribute,latitude@units=degrees_north era5.nc era5_1.nc\n2023-09-16 06:32:48 GMT by grib_to_netcdf-2.25.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf.bin -S param -o /cache/data7/adaptor.mars.internal-1694845966.6698666-25558-2-ac87386c-9688-494c-aad2-3f57cee71470.nc /cache/tmp/ac87386c-9688-494c-aad2-3f57cee71470-adaptor.mars.internal-1694845965.2818055-25558-3-tmp.grib" ;
:CDO = "Climate Data Operators version 2.3.0 (https://mpimet.mpg.de/cdo)" ;
data:
u10 =
8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845,
8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845,
8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845,
8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845,
8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845,
8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845,
8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845, 8845,
由于数据量过于巨大,因此上面的命令使用 Unix 管道符将输出结果限制在了 50 行以内。
可以看到输出的 u10 的数值好像很反直觉(值全是 8845),但是事实上这个数据并非错误,而是涉及到一个打包和解包的过程,这里暂时不表,后面会专门讲到。
1.2 读取:使用 netcdf4-python、xarray 等常⻅工具
在简要查看了 nc 文件之后,我们如何在 Python 程序中读取 nc 文件呢?其实方法有很多,在这里我介绍两个比较主流的方法:使用 xarray 和 netCDF4-python。
使用 netCDF4-python
我们先来介绍基于 netCDF4-python 包的方法,netCDF4-python 包是由 C 版本 netCDF 模块包的 Python 版 API 接口。在 Python 环境下安装方法是 pip install netCDF4
。
我们来看看如何用 netCDF4 简单读取 nc 文件:
import netCDF4 as nc
ds = nc.Dataset(ERA5_FP)
ds
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
CDI: Climate Data Interface version 2.3.0 (https://mpimet.mpg.de/cdi)
Conventions: CF-1.6
history: Fri Dec 01 02:13:30 2023: cdo setattribute,latitude@units=degrees_north era5.nc era5_1.nc
2023-09-16 06:32:48 GMT by grib_to_netcdf-2.25.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf.bin -S param -o /cache/data7/adaptor.mars.internal-1694845966.6698666-25558-2-ac87386c-9688-494c-aad2-3f57cee71470.nc /cache/tmp/ac87386c-9688-494c-aad2-3f57cee71470-adaptor.mars.internal-1694845965.2818055-25558-3-tmp.grib
CDO: Climate Data Operators version 2.3.0 (https://mpimet.mpg.de/cdo)
dimensions(sizes): time(80), longitude(1440), latitude(721)
variables(dimensions): int32 time(time), float32 longitude(longitude), float32 latitude(latitude), int16 u10(time, latitude, longitude), int16 v10(time, latitude, longitude)
groups:
从上面可以看到 dataset 数据中的变量和维度信息,这个信息和之前 !ncinfo ./era5.nc
显示的结果一样。
如果我们想要读取某一个变量,可以按以下方式读取:
u10 = ds.variables['u10']
u10
<class 'netCDF4._netCDF4.Variable'>
int16 u10(time, latitude, longitude)
long_name: 10 metre U wind component
units: m s**-1
add_offset: -2.1228631327102883
scale_factor: 0.000929512490889013
_FillValue: -32767
missing_value: -32767
unlimited dimensions: time
current shape = (80, 721, 1440)
filling off
上述代码选取了 u10
这个要素,当我们直接打印时显示的是该要素的元信息。如果我们要取其中的值,可以这样执行:
u10_data = u10[:]
u10_data
masked_array(
data=[[[ 6.09867485e+00, 6.09867485e+00, 6.09867485e+00, ...,
6.09867485e+00, 6.09867485e+00, 6.09867485e+00],
[ 3.15119074e+00, 3.13724805e+00, 3.12237585e+00, ...,
3.19394832e+00, 3.18000563e+00, 3.16606294e+00],
[ 2.89185676e+00, 2.86397138e+00, 2.83515649e+00, ...,
2.97830142e+00, 2.94948653e+00, 2.92160116e+00],
...,
[ 1.26613941e+00, 1.26056233e+00, 1.25312623e+00, ...,
1.28565917e+00, 1.27915258e+00, 1.27264600e+00],
[ 1.16575206e+00, 1.16296352e+00, 1.16017499e+00, ...,
1.17504718e+00, 1.17225865e+00, 1.16854060e+00],
[ 3.29991274e+00, 3.29991274e+00, 3.29991274e+00, ...,
3.29991274e+00, 3.29991274e+00, 3.29991274e+00]],
[[ 4.51478556e+00, 4.51478556e+00, 4.51478556e+00, ...,
4.51478556e+00, 4.51478556e+00, 4.51478556e+00],
[ 2.25142265e+00, 2.23933899e+00, 2.22725532e+00, ...,
2.28767364e+00, 2.27651949e+00, 2.26350631e+00],
[ 2.60184886e+00, 2.57675202e+00, 2.55258470e+00, ...,
2.67435083e+00, 2.65018351e+00, 2.62601618e+00],
...,
[ 8.71096600e-01, 8.69237575e-01, 8.66449038e-01, ...,
8.79462213e-01, 8.76673675e-01, 8.73885138e-01],
[ 1.42601556e+00, 1.42415653e+00, 1.42229751e+00, ...,
1.42973361e+00, 1.42787458e+00, 1.42694507e+00],
[ 3.27202736e+00, 3.27202736e+00, 3.27202736e+00, ...,
3.27202736e+00, 3.27202736e+00, 3.27202736e+00]],
[[ 3.90409586e+00, 3.90409586e+00, 3.90409586e+00, ...,
3.90409586e+00, 3.90409586e+00, 3.90409586e+00],
[ 1.93167035e+00, 1.92144572e+00, 1.91122108e+00, ...,
1.96420329e+00, 1.95304914e+00, 1.94282450e+00],
[ 2.01439696e+00, 1.99301818e+00, 1.97163939e+00, ...,
2.07853333e+00, 2.05715454e+00, 2.03484624e+00],
...,
[ 1.34235943e+00, 1.33213480e+00, 1.32098065e+00, ...,
1.37210383e+00, 1.36280871e+00, 1.35165456e+00],
[ 1.76993518e+00, 1.76435810e+00, 1.75971054e+00, ...,
1.78573689e+00, 1.77923030e+00, 1.77458274e+00],
[ 4.89867422e+00, 4.89867422e+00, 4.89867422e+00, ...,
4.89867422e+00, 4.89867422e+00, 4.89867422e+00]],
...,
[[-3.65191118e+00, -3.65191118e+00, -3.65191118e+00, ...,
-3.65191118e+00, -3.65191118e+00, -3.65191118e+00],
[-2.23068658e+00, -2.22510951e+00, -2.21767341e+00, ...,
-2.24927683e+00, -2.24369976e+00, -2.23719317e+00],
[-1.79288620e+00, -1.78080254e+00, -1.76778936e+00, ...,
-1.83006670e+00, -1.81798304e+00, -1.80496986e+00],
...,
[-1.64044615e+00, -1.63858712e+00, -1.63486907e+00, ...,
-1.64974127e+00, -1.64695274e+00, -1.64416420e+00],
[-4.04194537e-01, -4.03265025e-01, -4.01406000e-01, ...,
-4.09771612e-01, -4.07912587e-01, -4.06983075e-01],
[-2.70344738e-01, -2.70344738e-01, -2.70344738e-01, ...,
-2.70344738e-01, -2.70344738e-01, -2.70344738e-01]],
[[-2.39335127e+00, -2.39335127e+00, -2.39335127e+00, ...,
-2.39335127e+00, -2.39335127e+00, -2.39335127e+00],
[-3.16820363e-01, -3.08454750e-01, -2.98230113e-01, ...,
-3.44705738e-01, -3.34481100e-01, -3.26115488e-01],
[ 4.29009711e-02, 5.96321959e-02, 7.91519582e-02, ...,
-1.28697784e-02, 5.72047143e-03, 2.43107212e-02],
...,
[-1.28129327e-01, -1.17904690e-01, -1.07680052e-01, ...,
-1.59732752e-01, -1.49508115e-01, -1.38353965e-01],
[-1.94124714e-01, -1.90406664e-01, -1.84829589e-01, ...,
-2.10855939e-01, -2.05278864e-01, -1.99701789e-01],
[ 3.84032055e-01, 3.84032055e-01, 3.84032055e-01, ...,
3.84032055e-01, 3.84032055e-01, 3.84032055e-01]],
[[-3.02263122e+00, -3.02263122e+00, -3.02263122e+00, ...,
-3.02263122e+00, -3.02263122e+00, -3.02263122e+00],
[-6.78400722e-01, -6.66317059e-01, -6.55162910e-01, ...,
-7.14651709e-01, -7.03497559e-01, -6.90484384e-01],
[-6.42149735e-01, -6.17052897e-01, -5.94744598e-01, ...,
-7.12792684e-01, -6.89554872e-01, -6.65387547e-01],
...,
[ 3.15026123e+00, 3.14747269e+00, 3.14375464e+00, ...,
3.16234489e+00, 3.15862684e+00, 3.15490879e+00],
[ 2.87419602e+00, 2.87140748e+00, 2.86954846e+00, ...,
2.87791407e+00, 2.87605504e+00, 2.87512553e+00],
[ 1.97535744e+00, 1.97535744e+00, 1.97535744e+00, ...,
1.97535744e+00, 1.97535744e+00, 1.97535744e+00]]],
mask=False,
fill_value=1e+20)
如果我们想要获取变量的属性,可以按照以下两种方法获取:
print(u10.getncattr('units')) # 函数法
print(u10.units) # 属性法
m s**-1
m s**-1
这里我比较推荐使用函数法,因为这种方法可以将属性名称作为变量传入而避免成为硬编码,属性法的写法是一种硬编码。
硬编码是一种有缺陷的编程习惯,它会降低代码模块的可复用性。以上述获取变量属性为例,如果我们使用硬编码的属性方式获取
units
属性,如果后续想要获取其他属性,则需要通过“修改源码”的方式进行,但如果我们使用函数法获取属性,那么属性的 key 就可以在封装时作为一种参数进行传递,在需要获取的时候只需要更换为想要的 key 即可,如果辅以循环逻辑,可以非常灵活地变更所要获取属性的内容和数量。类似地,当我们要给一个变量赋值属性时,如果使用属性法,就需要根据需要把每一个属性都写在源码里,并且一旦属性内容变更,就要修改源码;而如果我们使用函数法,可以将属性内容以字典的方式通过参数传入,如果属性内容有变更,无需修改源码只需要修改传递的参数即可。这种设计思路也是依据软件编程范式中大名鼎鼎的开闭原则
使用 xarray
下面我们再来使用 xarray 读取 nc 文件。xarray 是基于各种格点数据底层 IO 包的高级封装。在读取 netcdf 数据时,通常也是基于 netcdf4 包作为依赖。
import xarray as xr
ds = xr.open_dataset(ERA5_FP)
ds
/opt/conda/lib/python3.9/site-packages/xarray/backends/plugins.py:71: RuntimeWarning: Engine 'cfradial1' loading failed:
cannot import name 'ParasiteAxesAuxTrans' from 'mpl_toolkits.axisartist' (/opt/conda/lib/python3.9/site-packages/mpl_toolkits/axisartist/__init__.py)
warnings.warn(f"Engine {name!r} loading failed:\n{ex}", RuntimeWarning)
要获取某个要素:
u10 = ds['u10']
u10
如果要获取要素的数组:
print(u10.values)
得到的数组实质上是 numpy 的 ndarray 类型数组
print(type(u10.values))
<class 'numpy.ndarray'>
如果要获取要素的属性,方法和 netCDF4 包里的方法类似,有两种方法:
print(u10.attrs.get('units')) # 函数法
print(u10.units) # 属性法
m s**-1
m s**-1
netCDF-4 ,还是 xarray?
xarray 相对 netcdf4 来说,有更高级和方便的封装,例如通过直接传递经纬度的方式在空间取值:
u10.sel(longitude=130, latitude=40, method='nearest')
或者用时间信息直接取值:
u10.sel(time='2023-08-22 00:00:00')
这些高级封装都让使用者获得了极其舒适的使用体验,这也让 xarray 成为科研人员处理格点数据时的首选模块包。对于科研人员来说,面对的数据处理情况相对简单,但对于工程师来说,处理 nc 文件所要面临的问题会更复杂,因此如果只掌握 xarray 是无法解决工作中遇到的所有问题的,所以仍需要学会更底层的 netcdf 包的相关技巧。
1.3 解析:时间维度、数据值
初学者在读取 nc 文件时,可能会在时间信息解析或者数据解包问题上遇到麻烦,下面我们就来看一下如果正确解析时间信息以及如何对数据进行正确的解包操作。
使用 netCDF4 时,正确解析时间维度信息的方法
上面我们可以看到使用 xarray 读取的数据,在对时间的选取上相对直观,但是在 netcdf4 包里,对时间的处理就会略显复杂。
ds = nc.Dataset(ERA5_FP)
time = ds.variables['time'][:]
time
masked_array(data=[1083792, 1083798, 1083804, 1083810, 1083816, 1083822,
1083828, 1083834, 1083840, 1083846, 1083852, 1083858,
1083864, 1083870, 1083876, 1083882, 1083888, 1083894,
1083900, 1083906, 1083912, 1083918, 1083924, 1083930,
1083936, 1083942, 1083948, 1083954, 1083960, 1083966,
1083972, 1083978, 1083984, 1083990, 1083996, 1084002,
1084008, 1084014, 1084020, 1084026, 1084032, 1084038,
1084044, 1084050, 1084056, 1084062, 1084068, 1084074,
1084080, 1084086, 1084092, 1084098, 1084104, 1084110,
1084116, 1084122, 1084128, 1084134, 1084140, 1084146,
1084152, 1084158, 1084164, 1084170, 1084176, 1084182,
1084188, 1084194, 1084200, 1084206, 1084212, 1084218,
1084224, 1084230, 1084236, 1084242, 1084248, 1084254,
1084260, 1084266],
mask=False,
fill_value=999999,
dtype=int32)
可以看到,netcdf4 包读取的时间直接读取是一串整数,并不是直观的时间字符串。如果要正确解析它的时间信息,需要用一些专门的方法进行转换:
dts = nc.num2date(time, ds.variables['time'].units)
dts
masked_array(data=[cftime.DatetimeGregorian(2023, 8, 22, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 22, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 22, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 22, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 23, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 23, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 23, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 23, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 24, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 24, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 24, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 24, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 25, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 25, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 25, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 25, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 26, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 26, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 26, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 26, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 27, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 27, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 27, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 27, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 28, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 28, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 28, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 28, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 29, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 29, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 29, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 29, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 30, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 30, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 30, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 30, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 31, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 31, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 31, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 8, 31, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 1, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 1, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 1, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 2, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 2, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 2, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 2, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 3, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 3, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 3, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 3, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 4, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 4, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 4, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 4, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 5, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 5, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 5, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 5, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 6, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 6, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 6, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 6, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 7, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 7, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 7, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 7, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 8, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 8, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 8, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 8, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 9, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 9, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 9, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 9, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 10, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 10, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 10, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2023, 9, 10, 18, 0, 0, 0, has_year_zero=False)],
mask=False,
fill_value='?',
dtype=object)
如果想以 Python 的 datetime 对象形式来存储返回值,可以增加 only_use_python_datetimes
和 only_use_cftime_datetimes
来强制将结果按照符合 Python datetime 对象的形式返回。
dts = nc.num2date(time, ds.variables['time'].units, only_use_python_datetimes=True, only_use_cftime_datetimes=False)
dts
masked_array(data=[real_datetime(2023, 8, 22, 0, 0),
real_datetime(2023, 8, 22, 6, 0),
real_datetime(2023, 8, 22, 12, 0),
real_datetime(2023, 8, 22, 18, 0),
real_datetime(2023, 8, 23, 0, 0),
real_datetime(2023, 8, 23, 6, 0),
real_datetime(2023, 8, 23, 12, 0),
real_datetime(2023, 8, 23, 18, 0),
real_datetime(2023, 8, 24, 0, 0),
real_datetime(2023, 8, 24, 6, 0),
real_datetime(2023, 8, 24, 12, 0),
real_datetime(2023, 8, 24, 18, 0),
real_datetime(2023, 8, 25, 0, 0),
real_datetime(2023, 8, 25, 6, 0),
real_datetime(2023, 8, 25, 12, 0),
real_datetime(2023, 8, 25, 18, 0),
real_datetime(2023, 8, 26, 0, 0),
real_datetime(2023, 8, 26, 6, 0),
real_datetime(2023, 8, 26, 12, 0),
real_datetime(2023, 8, 26, 18, 0),
real_datetime(2023, 8, 27, 0, 0),
real_datetime(2023, 8, 27, 6, 0),
real_datetime(2023, 8, 27, 12, 0),
real_datetime(2023, 8, 27, 18, 0),
real_datetime(2023, 8, 28, 0, 0),
real_datetime(2023, 8, 28, 6, 0),
real_datetime(2023, 8, 28, 12, 0),
real_datetime(2023, 8, 28, 18, 0),
real_datetime(2023, 8, 29, 0, 0),
real_datetime(2023, 8, 29, 6, 0),
real_datetime(2023, 8, 29, 12, 0),
real_datetime(2023, 8, 29, 18, 0),
real_datetime(2023, 8, 30, 0, 0),
real_datetime(2023, 8, 30, 6, 0),
real_datetime(2023, 8, 30, 12, 0),
real_datetime(2023, 8, 30, 18, 0),
real_datetime(2023, 8, 31, 0, 0),
real_datetime(2023, 8, 31, 6, 0),
real_datetime(2023, 8, 31, 12, 0),
real_datetime(2023, 8, 31, 18, 0),
real_datetime(2023, 9, 1, 0, 0),
real_datetime(2023, 9, 1, 6, 0),
real_datetime(2023, 9, 1, 12, 0),
real_datetime(2023, 9, 1, 18, 0),
real_datetime(2023, 9, 2, 0, 0),
real_datetime(2023, 9, 2, 6, 0),
real_datetime(2023, 9, 2, 12, 0),
real_datetime(2023, 9, 2, 18, 0),
real_datetime(2023, 9, 3, 0, 0),
real_datetime(2023, 9, 3, 6, 0),
real_datetime(2023, 9, 3, 12, 0),
real_datetime(2023, 9, 3, 18, 0),
real_datetime(2023, 9, 4, 0, 0),
real_datetime(2023, 9, 4, 6, 0),
real_datetime(2023, 9, 4, 12, 0),
real_datetime(2023, 9, 4, 18, 0),
real_datetime(2023, 9, 5, 0, 0),
real_datetime(2023, 9, 5, 6, 0),
real_datetime(2023, 9, 5, 12, 0),
real_datetime(2023, 9, 5, 18, 0),
real_datetime(2023, 9, 6, 0, 0),
real_datetime(2023, 9, 6, 6, 0),
real_datetime(2023, 9, 6, 12, 0),
real_datetime(2023, 9, 6, 18, 0),
real_datetime(2023, 9, 7, 0, 0),
real_datetime(2023, 9, 7, 6, 0),
real_datetime(2023, 9, 7, 12, 0),
real_datetime(2023, 9, 7, 18, 0),
real_datetime(2023, 9, 8, 0, 0),
real_datetime(2023, 9, 8, 6, 0),
real_datetime(2023, 9, 8, 12, 0),
real_datetime(2023, 9, 8, 18, 0),
real_datetime(2023, 9, 9, 0, 0),
real_datetime(2023, 9, 9, 6, 0),
real_datetime(2023, 9, 9, 12, 0),
real_datetime(2023, 9, 9, 18, 0),
real_datetime(2023, 9, 10, 0, 0),
real_datetime(2023, 9, 10, 6, 0),
real_datetime(2023, 9, 10, 12, 0),
real_datetime(2023, 9, 10, 18, 0)],
mask=False,
fill_value='?',
dtype=object)
基于 scale_factor 和 add_offset 参数,正确解析数据值
还记得我们在 1.1 章节里提到的 scale_factor
和 add_offset
吗?
这是一种压缩参数,使用这两个参数对原始数据值进行一个公式计算,就可以把它转为更节省空间的整型数据类型进行存储以节约存储空间(打包),而在读取数据的时候,再使用这两个参数反向计算一下获取真实的原始数据(解包)。
ds.variables['u10']
<class 'netCDF4._netCDF4.Variable'>
int16 u10(time, latitude, longitude)
long_name: 10 metre U wind component
units: m s**-1
add_offset: -2.1228631327102883
scale_factor: 0.000929512490889013
_FillValue: -32767
missing_value: -32767
unlimited dimensions: time
current shape = (80, 721, 1440)
filling off
我们在 u10 要素的属性中,可以看到 add_offset
和 scale_factor
两个参数。
在最新版的 netcdf4 包以及依赖于 netcdf4 包的 xarray 中,通常是不需要考虑 scale_factor
和 add_offset
的影响的,因为 netcdf4 包已经可以自动解包数据了。例如我们在 1.2 章节里,无论使用 xarray 还是 netcdf4,读取出来的 u10 的值看起来都是在比较合理的范围内。我们根本感觉不到解包的过程。
但是如果我们使用 h5py 等其他并不自动解包的库来处理数据时,就需要手动解包:
import h5py
ds = h5py.File(ERA5_FP)
ds['u10'][:]
array([[[ 8845, 8845, 8845, ..., 8845, 8845, 8845],
[ 5674, 5659, 5643, ..., 5720, 5705, 5690],
[ 5395, 5365, 5334, ..., 5488, 5457, 5427],
...,
[ 3646, 3640, 3632, ..., 3667, 3660, 3653],
[ 3538, 3535, 3532, ..., 3548, 3545, 3541],
[ 5834, 5834, 5834, ..., 5834, 5834, 5834]],
[[ 7141, 7141, 7141, ..., 7141, 7141, 7141],
[ 4706, 4693, 4680, ..., 4745, 4733, 4719],
[ 5083, 5056, 5030, ..., 5161, 5135, 5109],
...,
[ 3221, 3219, 3216, ..., 3230, 3227, 3224],
[ 3818, 3816, 3814, ..., 3822, 3820, 3819],
[ 5804, 5804, 5804, ..., 5804, 5804, 5804]],
[[ 6484, 6484, 6484, ..., 6484, 6484, 6484],
[ 4362, 4351, 4340, ..., 4397, 4385, 4374],
[ 4451, 4428, 4405, ..., 4520, 4497, 4473],
...,
[ 3728, 3717, 3705, ..., 3760, 3750, 3738],
[ 4188, 4182, 4177, ..., 4205, 4198, 4193],
[ 7554, 7554, 7554, ..., 7554, 7554, 7554]],
...,
[[-1645, -1645, -1645, ..., -1645, -1645, -1645],
[ -116, -110, -102, ..., -136, -130, -123],
[ 355, 368, 382, ..., 315, 328, 342],
...,
[ 519, 521, 525, ..., 509, 512, 515],
[ 1849, 1850, 1852, ..., 1843, 1845, 1846],
[ 1993, 1993, 1993, ..., 1993, 1993, 1993]],
[[ -291, -291, -291, ..., -291, -291, -291],
[ 1943, 1952, 1963, ..., 1913, 1924, 1933],
[ 2330, 2348, 2369, ..., 2270, 2290, 2310],
...,
[ 2146, 2157, 2168, ..., 2112, 2123, 2135],
[ 2075, 2079, 2085, ..., 2057, 2063, 2069],
[ 2697, 2697, 2697, ..., 2697, 2697, 2697]],
[[ -968, -968, -968, ..., -968, -968, -968],
[ 1554, 1567, 1579, ..., 1515, 1527, 1541],
[ 1593, 1620, 1644, ..., 1517, 1542, 1568],
...,
[ 5673, 5670, 5666, ..., 5686, 5682, 5678],
[ 5376, 5373, 5371, ..., 5380, 5378, 5377],
[ 4409, 4409, 4409, ..., 4409, 4409, 4409]]], dtype=int16)
使用 h5py 读取 u10 时得到的是原始值,而我们需要从它的属性中读取解包参数然后使用相应的解包公式来完成解包:
add_offset = ds['u10'].attrs.get('add_offset', None)
scale_factor = ds['u10'].attrs.get('scale_factor', None)
real_u10 = ds['u10'][:] * scale_factor + add_offset # 解包公式
real_u10
array([[[ 6.09867485e+00, 6.09867485e+00, 6.09867485e+00, ...,
6.09867485e+00, 6.09867485e+00, 6.09867485e+00],
[ 3.15119074e+00, 3.13724805e+00, 3.12237585e+00, ...,
3.19394832e+00, 3.18000563e+00, 3.16606294e+00],
[ 2.89185676e+00, 2.86397138e+00, 2.83515649e+00, ...,
2.97830142e+00, 2.94948653e+00, 2.92160116e+00],
...,
[ 1.26613941e+00, 1.26056233e+00, 1.25312623e+00, ...,
1.28565917e+00, 1.27915258e+00, 1.27264600e+00],
[ 1.16575206e+00, 1.16296352e+00, 1.16017499e+00, ...,
1.17504718e+00, 1.17225865e+00, 1.16854060e+00],
[ 3.29991274e+00, 3.29991274e+00, 3.29991274e+00, ...,
3.29991274e+00, 3.29991274e+00, 3.29991274e+00]],
[[ 4.51478556e+00, 4.51478556e+00, 4.51478556e+00, ...,
4.51478556e+00, 4.51478556e+00, 4.51478556e+00],
[ 2.25142265e+00, 2.23933899e+00, 2.22725532e+00, ...,
2.28767364e+00, 2.27651949e+00, 2.26350631e+00],
[ 2.60184886e+00, 2.57675202e+00, 2.55258470e+00, ...,
2.67435083e+00, 2.65018351e+00, 2.62601618e+00],
...,
[ 8.71096600e-01, 8.69237575e-01, 8.66449038e-01, ...,
8.79462213e-01, 8.76673675e-01, 8.73885138e-01],
[ 1.42601556e+00, 1.42415653e+00, 1.42229751e+00, ...,
1.42973361e+00, 1.42787458e+00, 1.42694507e+00],
[ 3.27202736e+00, 3.27202736e+00, 3.27202736e+00, ...,
3.27202736e+00, 3.27202736e+00, 3.27202736e+00]],
[[ 3.90409586e+00, 3.90409586e+00, 3.90409586e+00, ...,
3.90409586e+00, 3.90409586e+00, 3.90409586e+00],
[ 1.93167035e+00, 1.92144572e+00, 1.91122108e+00, ...,
1.96420329e+00, 1.95304914e+00, 1.94282450e+00],
[ 2.01439696e+00, 1.99301818e+00, 1.97163939e+00, ...,
2.07853333e+00, 2.05715454e+00, 2.03484624e+00],
...,
[ 1.34235943e+00, 1.33213480e+00, 1.32098065e+00, ...,
1.37210383e+00, 1.36280871e+00, 1.35165456e+00],
[ 1.76993518e+00, 1.76435810e+00, 1.75971054e+00, ...,
1.78573689e+00, 1.77923030e+00, 1.77458274e+00],
[ 4.89867422e+00, 4.89867422e+00, 4.89867422e+00, ...,
4.89867422e+00, 4.89867422e+00, 4.89867422e+00]],
...,
[[-3.65191118e+00, -3.65191118e+00, -3.65191118e+00, ...,
-3.65191118e+00, -3.65191118e+00, -3.65191118e+00],
[-2.23068658e+00, -2.22510951e+00, -2.21767341e+00, ...,
-2.24927683e+00, -2.24369976e+00, -2.23719317e+00],
[-1.79288620e+00, -1.78080254e+00, -1.76778936e+00, ...,
-1.83006670e+00, -1.81798304e+00, -1.80496986e+00],
...,
[-1.64044615e+00, -1.63858712e+00, -1.63486907e+00, ...,
-1.64974127e+00, -1.64695274e+00, -1.64416420e+00],
[-4.04194537e-01, -4.03265025e-01, -4.01406000e-01, ...,
-4.09771612e-01, -4.07912587e-01, -4.06983075e-01],
[-2.70344738e-01, -2.70344738e-01, -2.70344738e-01, ...,
-2.70344738e-01, -2.70344738e-01, -2.70344738e-01]],
[[-2.39335127e+00, -2.39335127e+00, -2.39335127e+00, ...,
-2.39335127e+00, -2.39335127e+00, -2.39335127e+00],
[-3.16820363e-01, -3.08454750e-01, -2.98230113e-01, ...,
-3.44705738e-01, -3.34481100e-01, -3.26115488e-01],
[ 4.29009711e-02, 5.96321959e-02, 7.91519582e-02, ...,
-1.28697784e-02, 5.72047143e-03, 2.43107212e-02],
...,
[-1.28129327e-01, -1.17904690e-01, -1.07680052e-01, ...,
-1.59732752e-01, -1.49508115e-01, -1.38353965e-01],
[-1.94124714e-01, -1.90406664e-01, -1.84829589e-01, ...,
-2.10855939e-01, -2.05278864e-01, -1.99701789e-01],
[ 3.84032055e-01, 3.84032055e-01, 3.84032055e-01, ...,
3.84032055e-01, 3.84032055e-01, 3.84032055e-01]],
[[-3.02263122e+00, -3.02263122e+00, -3.02263122e+00, ...,
-3.02263122e+00, -3.02263122e+00, -3.02263122e+00],
[-6.78400722e-01, -6.66317059e-01, -6.55162910e-01, ...,
-7.14651709e-01, -7.03497559e-01, -6.90484384e-01],
[-6.42149735e-01, -6.17052897e-01, -5.94744598e-01, ...,
-7.12792684e-01, -6.89554872e-01, -6.65387547e-01],
...,
[ 3.15026123e+00, 3.14747269e+00, 3.14375464e+00, ...,
3.16234489e+00, 3.15862684e+00, 3.15490879e+00],
[ 2.87419602e+00, 2.87140748e+00, 2.86954846e+00, ...,
2.87791407e+00, 2.87605504e+00, 2.87512553e+00],
[ 1.97535744e+00, 1.97535744e+00, 1.97535744e+00, ...,
1.97535744e+00, 1.97535744e+00, 1.97535744e+00]]])
1.4 简单输出:用 netcdf4-python、xarray 等常⻅工具
知道了如何读取 nc 文件,也需要知道如何输出和保存 nc 文件,我们还是以 netcdf4-python 和 xarray 两个库为例,来看一看如何保存一个简单结构的 nc 文件。
首先还是以 netcdf4 为例,我们从前面的文件里,把 u10、v10 数据计算为10米风速 wspd ,然后保存为 nc 文件。简单起见,这里我们省去对时间维度的处理。
ds = nc.Dataset(ERA5_FP)
u10 = ds.variables['u10'][0] # 取第 0 个时间点,从而忽略时间维
v10 = ds.variables['v10'][0] # 同上
wspd = (u10 ** 2 + v10 ** 2 ) ** 0.5 # 勾股定理计算风速值
wspd
masked_array(
data=[[7.45549172, 7.45549172, 7.45549172, ..., 7.45549172, 7.45549172,
7.45549172],
[7.58922772, 7.59065141, 7.59092519, ..., 7.58714413, 7.58766872,
7.58822541],
[7.33959868, 7.34241225, 7.34581297, ..., 7.33310267, 7.33512501,
7.33766044],
...,
[3.81937921, 3.81753401, 3.81508504, ..., 3.82589441, 3.82371284,
3.8215411 ],
[3.65374161, 3.65285287, 3.65196603, ..., 3.6567179 , 3.65582279,
3.65463227],
[3.9934181 , 3.9934181 , 3.9934181 , ..., 3.9934181 , 3.9934181 ,
3.9934181 ]],
mask=False,
fill_value=1e+20)
wspd.shape # 查看风速矩阵的形状
(721, 1440)
使用 netcdf4 创建和保存 nc 文件的基本步骤
with nc.Dataset('./output.nc', 'w') as dsout: # 以“写”模式创建 Dataset 实例
dsout.createDimension('lat', wspd.shape[0]) # 创建纬度维
dsout.createDimension('lon', wspd.shape[1]) # 创建经度维
dsout.createVariable('longitude', float, ('lon',))
dsout.createVariable('latitude', float, ('lat',))
dsout.createVariable('wspd', float, ('lat', 'lon'))
dsout.variables['longitude'][:] = ds.variables['longitude'][:]
dsout.variables['latitude'][:] = ds.variables['latitude'][:]
dsout.variables['wspd'][:] = wspd
看一下生成的数据
with nc.Dataset('./output.nc') as new_ds:
print(new_ds)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): lat(721), lon(1440)
variables(dimensions): float64 longitude(lon), float64 latitude(lat), float64 wspd(lat, lon)
groups:
下面用 xarray 来实现上面的步骤
ds = xr.open_dataset(ERA5_FP)
dsout = xr.DataArray(wspd,
name='wspd',
dims=['lat', 'lon'],
coords={'latitude': ('lat', ds['latitude'].values),
'longitude': ('lon', ds['longitude'].values)})
dsout.to_netcdf('./output_by_xarray.nc')
dsout
可以看出,实现相同功能的情况下,使用 xarray 比使用 netcdf4 的代码量要少得多。
✍️ 练习:创建纬度和经度范围的 NetCDF 文件并填充随机数据
使用随机数组生成一个简单结构的 netcdf 格式文件,要素的名叫 myvar
,数组尺寸为 (181, 360),纬度为90~-90,长度181间隔为-1°,经度为-179~180,长度360间隔为1°。
# import netCDF4 as nc
# import xarray as xr
# import numpy as np
# filename = "simple_netcdf.nc"
# with nc.Dataset(filename, "w", format="NETCDF4") as dataset:
# lat_dim = dataset.createDimension('latitude', 181)
# lon_dim = dataset.createDimension('longitude', 360)
# latitude = dataset.createVariable('latitude', np.float32, ('latitude',))
# longitude = dataset.createVariable('longitude', np.float32, ('longitude',))
# latitude[:] = np.arange(90, -91, -1)
# longitude[:] = np.arange(-179, 181)
# myvar = dataset.createVariable('myvar', np.float32, ('latitude', 'longitude'))
# myvar[:, :] = np.random.rand(181, 360)
# ds = xr.open_dataset('simple_netcdf.nc')
# ds
1.5 修改并覆写:针对简单结构的 netcdf 文件
如果我们有一个已经存在的 nc 文件,想要在不创建新的文件的情况下直接原地修改文件,要怎么做呢?这个时候,xarray 就排不上用场了,只能用 netcdf4 来实现。我们就拿刚才创建的 output.nc
文件为例,来看一下如果我们想把所有风速小于 5 的值都改为 0。
ds = nc.Dataset('./output.nc', 'r+') # 这里使用 r+ 模式打开,即覆写模式
wspd = ds.variables['wspd'][:]
wspd
masked_array(
data=[[7.45549172, 7.45549172, 7.45549172, ..., 7.45549172, 7.45549172,
7.45549172],
[7.58922772, 7.59065141, 7.59092519, ..., 7.58714413, 7.58766872,
7.58822541],
[7.33959868, 7.34241225, 7.34581297, ..., 7.33310267, 7.33512501,
7.33766044],
...,
[3.81937921, 3.81753401, 3.81508504, ..., 3.82589441, 3.82371284,
3.8215411 ],
[3.65374161, 3.65285287, 3.65196603, ..., 3.6567179 , 3.65582279,
3.65463227],
[3.9934181 , 3.9934181 , 3.9934181 , ..., 3.9934181 , 3.9934181 ,
3.9934181 ]],
mask=False,
fill_value=1e+20)
我们使用 numpy 的数值筛选功能,把所有小于 5 的元素全部改为 0:
wspd[wspd<5] = 0
wspd
masked_array(
data=[[7.45549172, 7.45549172, 7.45549172, ..., 7.45549172, 7.45549172,
7.45549172],
[7.58922772, 7.59065141, 7.59092519, ..., 7.58714413, 7.58766872,
7.58822541],
[7.33959868, 7.34241225, 7.34581297, ..., 7.33310267, 7.33512501,
7.33766044],
...,
[0. , 0. , 0. , ..., 0. , 0. ,
0. ],
[0. , 0. , 0. , ..., 0. , 0. ,
0. ],
[0. , 0. , 0. , ..., 0. , 0. ,
0. ]],
mask=False,
fill_value=1e+20)
最后把修改后的值赋值给原数组,并关闭对象:
ds.variables['wspd'][:] = wspd # 赋值给 dataset 对象
ds.close() # 关闭对象
完成覆写修改以后,我们重新打开文件看看:
ds = nc.Dataset('./output.nc')
wspd = ds.variables['wspd'][:]
wspd
masked_array(
data=[[7.45549172, 7.45549172, 7.45549172, ..., 7.45549172, 7.45549172,
7.45549172],
[7.58922772, 7.59065141, 7.59092519, ..., 7.58714413, 7.58766872,
7.58822541],
[7.33959868, 7.34241225, 7.34581297, ..., 7.33310267, 7.33512501,
7.33766044],
...,
[0. , 0. , 0. , ..., 0. , 0. ,
0. ],
[0. , 0. , 0. , ..., 0. , 0. ,
0. ],
[0. , 0. , 0. , ..., 0. , 0. ,
0. ]],
mask=False,
fill_value=1e+20)
可以看到文件已经被修改完成了。
在工程实践中,以覆写方式修改已存在的文件可以提高处理大数据集的性能,减少代码的复杂程度。
通过上面的讲解,我们已经初步了解了如何通过各种工具对 nc 文件进行查看、读写,同时也了解了 nc 文件的时间信息存储的特点及正确的解析方式,此外还对 nc 数据中的打包和解包有了初步的认识。下面我们准备了几道闯关题,来测测你对知识的吸收程度如何吧。
闯关题
STEP1:请根据要求完成题目
Q1:读取 ERA5 样例数据,获取特定时间和坐标点的值
读取样例文件 /home/mw/input/training_camp9055/era5-temp.nc
文件,从中取出 2023 年 8 月 26 日 10:00 时刻的数据,并获取坐标点为 lon=110, lat=40 处的值,保留 1 位小数后赋值给 a1。
### ...your code here... # a1 =
Q2:修改一个存储气温的 nc 文件,将开氏温度改为摄氏度
将 /home/mw/input/training_camp9055/era5-temp-float.nc
拷贝到工作目录,并原地修改拷贝过来的 era5-temp-float.nc
,将原本的开氏温度转为摄氏度,然后计算转换后气温整个数据矩阵的平均值,保留 1 位小数后赋值给 a2。
In [38]:
### ...your code here... # a2 =
STEP2:将结果保存为 csv 文件
csv 需要有两列,列名:id、answer。其中,id 列为题号,如 q1、q2;answer 列为 STEP1 中各题你计算出来的结果。💡 这一步的代码你无需修改,直接运行即可。
# 生成 csv 作业答案文件
def save_csv(answers):
import pandas as pd
df = pd.DataFrame({"id": ["q1", "q2"], "answer": answers})
df.to_csv("answer_MeteoEng_1_1.csv", index=None)
save_csv([a1,a2]) # 该csv文件在左侧文件树project工作区下,你可以自行右击下载或者读取查看