How to determine if Spark is rewriting data
First, open the SQL DAG for your write stage. Scroll to the top of the job’s page and click the Associated SQL Query link.
The DAG for the query should now be displayed. If you don’t see it right away, scroll through the page until you find it.
If you’re doing a Delete or Update operation, compare the amount of data written by the writer node with the amount you expect the operation to change. If far more data is being written than you expect, you’re probably rewriting data.
If you’re doing a merge, the merge node has explicit statistics about how much data it’s rewriting.
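If you'd rather check the numbers programmatically than read them off the DAG, the sketch below shows one possible way. It assumes the table is a Delta table and that you have already fetched the `operationMetrics` map for the operation from `DESCRIBE HISTORY`. The metric names used (`numCopiedRows`, `numUpdatedRows`, `numDeletedRows`, `numTargetRowsCopied`, and so on) are the ones Delta Lake reports in that map, but which ones are present can vary by operation and version, so treat this as a rough heuristic. The function name `rewrite_summary` is hypothetical.

```python
def rewrite_summary(operation_metrics):
    """Summarize copied (rewritten unchanged) rows versus rows the operation changed.

    operation_metrics: dict of metric name -> value, as found in the
    operationMetrics column of DESCRIBE HISTORY (values are usually strings).
    Metrics that are missing are treated as 0.
    """
    def metric(name):
        # Missing or null metrics count as zero; values arrive as strings.
        return int(operation_metrics.get(name) or 0)

    # Rows copied into new files without being modified.
    # Delete/Update report numCopiedRows; Merge reports numTargetRowsCopied.
    copied = metric("numCopiedRows") + metric("numTargetRowsCopied")

    # Rows the operation actually updated, deleted, or inserted.
    changed = (
        metric("numUpdatedRows")
        + metric("numDeletedRows")
        + metric("numTargetRowsUpdated")
        + metric("numTargetRowsDeleted")
        + metric("numTargetRowsInserted")
    )

    total = copied + changed
    return {
        "copied_rows": copied,
        "changed_rows": changed,
        # Share of all touched rows that were only copied, not changed.
        "copied_fraction": copied / total if total else 0.0,
    }


# Example with made-up numbers: an Update that changed 10 rows but copied 990.
example = rewrite_summary({"numUpdatedRows": "10", "numCopiedRows": "990"})
print(example)
```

In a Spark session, the metrics for the most recent operation could be fetched with something like `spark.sql("DESCRIBE HISTORY my_table LIMIT 1").first()["operationMetrics"]`, where `my_table` is a placeholder for your own table name. A `copied_fraction` close to 1 suggests that most of what was written was existing data being rewritten.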