Fixing TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Reader submission, 2022-08-30


Contents

1. Problem description
2. Solution
Reference

1. Problem description

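from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# `spark` is an existing SparkSession, such as the one a notebook or the pyspark shell provides.
@udf(returnType=StringType())
def bad_funify(s):
    return s + " is fun!"  # raises TypeError when s is None

countries2 = spark.createDataFrame([("Thailand", 3), (None, 4)], ["country", "id"])
countries2.withColumn("fun_country", bad_funify("country")).show()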

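The goal is to use a UDF to add a new column, fun_country, to a DataFrame that has two fields, country and id. The new column should hold a string such as "Thailand is fun!". Some rows, however, have no value in the country field. Note that although Spark displays the missing value as null, the Python UDF receives it as None. Running the code fails with the following error: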

PythonException: An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
  File "/usr/lib/spark-current/python/lib/pyspark.zip/pyspark/worker.py", line 619, in main
    process()
  File "/usr/lib/spark-current/python/lib/pyspark.zip/pyspark/worker.py", line 611, in process
    serializer.dump_stream(out_iter, outfile)
  File "/usr/lib/spark-current/python/lib/pyspark.zip/pyspark/serializers.py", line 211, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/usr/lib/spark-current/python/lib/pyspark.zip/pyspark/serializers.py", line 132, in dump_stream
    for obj in iterator:
  File "/usr/lib/spark-current/python/lib/pyspark.zip/pyspark/serializers.py", line 200, in _batched
    for item in iterator:
  File "/usr/lib/spark-current/python/lib/pyspark.zip/pyspark/worker.py", line 452, in mapper
    result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
  File "/usr/lib/spark-current/python/lib/pyspark.zip/pyspark/worker.py", line 452, in <genexpr>
    result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
  File "/usr/lib/spark-current/python/lib/pyspark.zip/pyspark/worker.py", line 87, in <lambda>
    return lambda *a: f(*a)
  File "/usr/lib/spark-current/python/lib/pyspark.zip/pyspark/util.py", line 74, in wrapper
    return f(*args, **kwargs)
  File "", line 5, in bad_funify
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
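
The last line of the traceback is an ordinary Python error, so it can be reproduced without Spark. The sketch below is only an illustration: bad_funify_plain and try_funify are hypothetical names, not part of the original code, and the function body is copied from the UDF so it can be called directly.

```python
def bad_funify_plain(s):
    # Same body as the UDF, but called directly so no Spark session is needed.
    return s + " is fun!"


def try_funify(value):
    """Return the result, or the error message if the call raises TypeError."""
    try:
        return bad_funify_plain(value)
    except TypeError as exc:
        return f"TypeError: {exc}"


print(try_funify("Thailand"))  # Thailand is fun!
print(try_funify(None))        # TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
```

The second call fails because None does not define + with a str, which is exactly what happens inside the worker when a row with a missing country reaches the UDF.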

2. Solution

This turns out to be a silly mistake. When country is empty, fun_country should be empty as well, so the UDF only needs one extra check. After changing the UDF to good_funify:

@udf(returnType=StringType())
def good_funify(s):
    # Pass missing values through instead of concatenating onto None.
    return None if s is None else s + " is fun!"

countries2.withColumn("fun_country", good_funify("country")).show()

+--------+---+----------------+
| country| id|     fun_country|
+--------+---+----------------+
|Thailand|  3|Thailand is fun!|
|    null|  4|            null|
+--------+---+----------------+
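
If several UDFs need the same guard, the None check could be factored out into a decorator. The following is a minimal sketch in plain Python, assuming that every positional argument should be checked. The names null_safe and funify are hypothetical and do not come from the original article or from PySpark.

```python
import functools


def null_safe(func):
    """Wrap func so that it returns None when any positional argument is None."""
    @functools.wraps(func)
    def wrapper(*args):
        if any(arg is None for arg in args):
            return None
        return func(*args)
    return wrapper


@null_safe
def funify(s):
    # The body can now assume s is a real string.
    return s + " is fun!"


print(funify("Thailand"))  # Thailand is fun!
print(funify(None))        # None
```

With PySpark available, the wrapped function could then be registered in the usual way, for example udf(funify, StringType()), and should behave like good_funify for this case.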

Reference

[1] Navigating None and null in PySpark
