Troubleshoot: NoClassDefFoundError when reading with Spark Avro

Just a quick post on an error you might run into when using Apache Spark and Avro, caused by a version mismatch.

Software     Version
Spark        3.3.2, 3.3
spark-avro   2.12:3.4.0

If you have an older version of Spark (the latest as of now is 3.4.0) and have followed the current documentation on how to work with Avro files, you might run into this error: the spark-avro package tries to use a newer class, FileSourceOptions, that is not available in older Spark versions.

The solution is to use the spark-avro version that matches your Spark installation. In our case, that would be org.apache.spark:spark-avro_2.12:3.3.2.
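As a rough sketch of that fix in PySpark, the package coordinate can be derived from the Spark version instead of being copied from the documentation. The helper names `spark_avro_coordinate` and `build_session` are made up for this example, and the Scala binary version 2.12 is an assumption taken from the coordinate above; adjust it if your Spark build uses a different one.

```python
def spark_avro_coordinate(spark_version: str, scala_binary: str = "2.12") -> str:
    """Build the Maven coordinate of the spark-avro release for a Spark version."""
    return f"org.apache.spark:spark-avro_{scala_binary}:{spark_version}"


def build_session(app_name: str = "avro-example"):
    """Start a SparkSession that pulls the spark-avro release matching pyspark."""
    # Imported lazily so the helper above can be used without pyspark installed.
    import pyspark
    from pyspark.sql import SparkSession

    # Assumption: the installed pyspark package is the Spark that runs the job.
    coordinate = spark_avro_coordinate(pyspark.__version__)
    return (
        SparkSession.builder.appName(app_name)
        # Must be set before the session (and its JVM) is first created.
        .config("spark.jars.packages", coordinate)
        .getOrCreate()
    )
```

With this, `build_session().read.format("avro").load("data_avro")` should resolve the Avro classes against the same Spark release. The equivalent on the command line is passing the coordinate with `--packages` when launching `pyspark` or `spark-submit`.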

Error Stacktrace for pyspark

ℹ️
The important parts of the stacktrace are the following:
1. Py4JJavaError: An error occurred while calling o257.load. This is the PySpark-to-Scala Spark link (Py4J), so we know the error is in the Java/Scala code.
2. java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/FileSourceOptions. The class was not found at runtime.
3. org.apache.spark.sql.avro.AvroUtils$.inferSchema(AvroUtils.scala:51). The failing call is in the Avro library, so we know that is where the missing class is referenced.

Usually, when something is not found, it indicates mismatched library versions.
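A quick way to confirm that kind of mismatch is to compare the version in the package coordinate with the Spark version. This is a minimal sketch: `parse_coordinate` and `versions_match` are hypothetical helpers, and requiring an exact version match is an assumption made here for simplicity.

```python
def parse_coordinate(coordinate: str) -> dict:
    """Split 'group:artifact_scala:version' into its parts."""
    group, artifact, version = coordinate.split(":")
    # 'spark-avro_2.12' -> name 'spark-avro', Scala binary version '2.12'
    name, _, scala = artifact.rpartition("_")
    return {"group": group, "name": name, "scala": scala, "version": version}


def versions_match(coordinate: str, spark_version: str) -> bool:
    """Return True when the package version equals the Spark version."""
    return parse_coordinate(coordinate)["version"] == spark_version


# The combination from this post: spark-avro 3.4.0 on Spark 3.3.2.
print(versions_match("org.apache.spark:spark-avro_2.12:3.4.0", "3.3.2"))  # False
print(versions_match("org.apache.spark:spark-avro_2.12:3.3.2", "3.3.2"))  # True
```

In a running session, the two inputs can be read from `spark.version` and from `spark.sparkContext.getConf().get("spark.jars.packages", "")`, assuming the package was configured through that setting.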

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 dataset = spark.read.format("avro").load(f"data_avro")

File ~/venv/lib/python3.9/site-packages/pyspark/sql/readwriter.py:177, in DataFrameReader.load(self, path, format, schema, **options)
    175 self.options(**options)
    176 if isinstance(path, str):
--> 177     return self._df(self._jreader.load(path))
    178 elif path is not None:
    179     if type(path) != list:

File ~/venv/lib/python3.9/site-packages/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +\
   1316     self.command_header +\
   1317     args_command +\
   1318     proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
   1322     answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325     temp_arg._detach()

File ~/venv/lib/python3.9/site-packages/pyspark/sql/utils.py:190, in capture_sql_exception.<locals>.deco(*a, **kw)
    188 def deco(*a: Any, **kw: Any) -> Any:
    189     try:
--> 190         return f(*a, **kw)
    191     except Py4JJavaError as e:
    192         converted = convert_exception(e.java_exception)

File ~/venv/lib/python3.9/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o257.load.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/FileSourceOptions
	at org.apache.spark.sql.avro.AvroUtils$.inferSchema(AvroUtils.scala:51)
	at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:56)
	at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$getOrInferFileFormatSchema$11(DataSource.scala:210)
	at scala.Option.orElse(Option.scala:447)
	at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:207)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:411)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:185)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:829)