Spark
Spark and Hugging Face: Named Entity Recognition for aircraft ownership
How do you use one of the best AI/ML libraries (Hugging Face) with Spark? Let's try it.
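A sketch of the general pattern (an assumption, not the post's exact code): wrap a Hugging Face NER pipeline in a pandas UDF so each executor loads the model lazily and tags a text column. The dataframe `df` and its `description` column are placeholders.

```python
import pandas as pd
from pyspark.sql import functions as F, types as T

@F.pandas_udf(T.ArrayType(T.StringType()))
def extract_entities(texts: pd.Series) -> pd.Series:
    # Import and build the pipeline on the executor, once per Python worker.
    from transformers import pipeline
    global _ner
    if "_ner" not in globals():
        _ner = pipeline("ner", aggregation_strategy="simple")
    # Return the recognized entity strings for each input text.
    return texts.apply(lambda t: [e["word"] for e in _ner(t)])

tagged = df.withColumn("entities", extract_entities("description"))  # placeholder names
```

Keeping the model in a module-level global avoids reloading it for every batch the UDF processes.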
The pyspark library communicates with a Spark driver. If the driver dies, we get odd error messages that do not directly point to the root cause.
Just a quick post on an error you might find when using Apache Spark and Avro due to a version mismatch. Software versions: Spark 3.3.2 (3.3), spark-avro_2.12:3.4.0. If you have an older version of Spark (latest as of now is
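The usual fix (stated here as an assumption, since the teaser is cut off) is to pin the spark-avro artifact to the same version as the Spark install:

```python
from pyspark.sql import SparkSession

# Minimal sketch: the spark-avro artifact should match the Spark version
# (3.3.2 here, per the versions listed above); mixing Spark 3.3.x with
# spark-avro 3.4.0 is exactly the kind of mismatch that triggers odd errors.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.3.2")
    .getOrCreate()
)

df = spark.read.format("avro").load("/tmp/events.avro")  # placeholder path
```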
It's sometimes difficult to access S3 files in Apache Spark if you don't use a prebuilt environment like Zeppelin, Glue Notebooks, Hue, Databricks Notebooks or other alternatives. And googling around might get you half-working solutions. But do not worry, I'll show you how
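As a taste of where the post is likely headed, a minimal sketch using the s3a connector; the hadoop-aws version, bucket, and credentials are placeholders you must adapt:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # hadoop-aws must match the Hadoop libraries your Spark ships with.
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.2")
    # Credentials are placeholders; prefer instance profiles or env vars in practice.
    .config("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")
    .getOrCreate()
)

df = spark.read.csv("s3a://my-bucket/path/data.csv", header=True)  # placeholder bucket
```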
Using Spark interactively is sometimes a bit cumbersome if you don't want to go to the good old terminal and decide that something like a Jupyter notebook suits you better, or you are doing a more complex analysis. If that is the
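A minimal sketch of the end state, assuming pyspark is pip-installed in the same environment the Jupyter kernel runs in:

```python
from pyspark.sql import SparkSession

# Start a local Spark session directly from a notebook cell.
spark = (
    SparkSession.builder
    .master("local[*]")   # use all local cores
    .appName("notebook")
    .getOrCreate()
)

spark.range(10).show()  # quick smoke test
```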
In physics or biology you sometimes simulate processes on a two-dimensional lattice, or discrete space. In those cases you usually compute some local interactions between "cells" and, from those, calculate a result. An example of this is the Ising model, which was proposed in 1920 for
After reading this, you will be able to execute Python files and Jupyter notebooks that run Apache Spark code in your local environment. This tutorial applies to OS X and Linux systems. We assume you already have knowledge of Python and a console environment. 1. Download Apache Spark We will
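A sketch of the kind of setup the tutorial builds toward; /opt/spark is a placeholder for wherever you unpack the download, and findspark is one optional helper for making the install visible to Python:

```python
import os

# Point at the unpacked Spark distribution (placeholder path).
os.environ["SPARK_HOME"] = "/opt/spark"

import findspark  # third-party helper: pip install findspark
findspark.init()  # adds pyspark from SPARK_HOME to sys.path

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
```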
Spark has a way to compute a histogram; however, it is hidden behind low-level and sometimes obscure classes. In this post I'll give you a function that returns the desired values given a dataframe. In the official documentation the only mention [https://spark.apache.org/
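One way to get there (an assumption about the post's approach, with placeholder names): drop from the dataframe to the RDD API, where RDD.histogram is exposed:

```python
from pyspark.sql import DataFrame

def column_histogram(df: DataFrame, col: str, buckets: int = 10):
    """Return (bucket_edges, counts) for a numeric column."""
    return (
        df.select(col).rdd
        .map(lambda row: row[0])
        .filter(lambda v: v is not None)  # RDD.histogram cannot compare nulls
        .histogram(buckets)
    )

edges, counts = column_histogram(df, "price", buckets=20)  # placeholder dataframe/column
```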
This is one of those things that makes sense when you stop to think about it. When you perform the same aggregation over a window on more than one column, it is recommended to execute that windowing only once. Let's dive in with an example: I am working with
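The gist, sketched with placeholder columns: define the window specification once and reuse it for every column, so Spark can evaluate both aggregations in a single window operator instead of two:

```python
from pyspark.sql import Window, functions as F

# One window spec, defined once and shared by both aggregations.
w = Window.partitionBy("customer_id")  # placeholder key

df2 = (
    df.withColumn("max_amount", F.max("amount").over(w))
      .withColumn("max_discount", F.max("discount").over(w))
)
```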
Nginx is a common web server used as a reverse proxy for things like adding TLS and basic authentication and forwarding requests to other internal servers on your network. In this case we are going to serve the Spark UI, adding security (HTTPS) and authentication, and serving it on a different
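A sketch of the shape such a config takes (domain, certificate, and htpasswd paths are placeholders, not the post's values); the Spark UI listens on port 4040 by default:

```
server {
    listen 443 ssl;
    server_name spark.example.com;      # placeholder domain

    ssl_certificate     /etc/nginx/certs/spark.crt;   # placeholder paths
    ssl_certificate_key /etc/nginx/certs/spark.key;

    auth_basic           "Spark UI";
    auth_basic_user_file /etc/nginx/.htpasswd;        # created with htpasswd

    location / {
        proxy_pass http://127.0.0.1:4040;   # default Spark UI port
        proxy_set_header Host $host;
    }
}
```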
When working with change data capture data, which sometimes contains just the updated cells for a given PK, it is not easy to efficiently obtain the latest value for an entry (row). Let's dive into an example. First, though, we need a couple of definitions about change data capture
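A common approach, sketched here as an assumption about where the post goes: rank the change events per key by their timestamp and keep only the newest one. The `changes` dataframe and the `pk` and `updated_at` columns are placeholders.

```python
from pyspark.sql import Window, functions as F

# Newest event per key comes first within its partition.
w = Window.partitionBy("pk").orderBy(F.col("updated_at").desc())

latest = (
    changes.withColumn("rn", F.row_number().over(w))  # newest event gets rn = 1
           .filter("rn = 1")
           .drop("rn")
)
```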
Imagine we have a table with a sort of primary key where information is added or updated partially: not all the columns for a key are updated each time. We now want a consolidated view of the information, with just one row per key containing the
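A sketch of the usual trick for this (assumed, not quoted from the post): F.last with ignorenulls=True over an ordered, unbounded window carries the latest non-null cell forward for every column, after which one row per key is kept. `pk` and `updated_at` are placeholder columns.

```python
from pyspark.sql import Window, functions as F

# Order each key's history and look across the whole partition.
w = (
    Window.partitionBy("pk")
          .orderBy("updated_at")
          .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
)

consolidated = (
    df.select(
        "pk",
        # For every data column, take the most recent non-null value.
        *[F.last(c, ignorenulls=True).over(w).alias(c)
          for c in df.columns if c not in ("pk", "updated_at")],
    )
    .dropDuplicates(["pk"])  # rows per key are now identical; keep one
)
```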