Shoumik Palkar

I am currently a software engineer at Databricks. I work on Photon, a new native vectorized execution engine for Spark SQL.

Before joining Databricks, I was a Ph.D. student in the Computer Science department at Stanford University, advised by Prof. Matei Zaharia. My dissertation work focused on designing new interfaces for efficient software library composition on modern hardware. The full dissertation is available here.


Publications

Photon: A Fast Query Engine for Lakehouse Systems.
Alexander Behm, Shoumik Palkar, Utkarsh Agarwal, Timothy Armstrong, David Cashman, Ankur Dave, Todd Greenstein, Shant Hovsepian, Ryan Johnson, Arvind Sai Krishnan, Paul Leventis, Ala Luszczak, Prashanth Menon, Mostafa Mokhtar, Gene Pang, Sameer Paranjpye, Greg Rahn, Bart Samwel, Tom van Bussel, Herman van Hovell, Maryann Xue, Reynold Xin, and Matei Zaharia.
In SIGMOD 2022 (Best Industry Paper Award).

Offload Annotations: Bringing Heterogeneous Computing to Existing Libraries and Workloads.
Gina Yuan, Shoumik Palkar, Deepak Narayanan, and Matei Zaharia.
In USENIX ATC 2020.

Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference.
Peter Kraft, Daniel Kang, Deepak Narayanan, Shoumik Palkar, Peter Bailis, and Matei Zaharia.
In MLSys 2020.

Optimizing Data-Intensive Computations in Existing Libraries with Split Annotations.
Shoumik Palkar and Matei Zaharia.
In SOSP 2019.

Exploring the Use of Learning Algorithms for Efficient Performance Profiling.
Shoumik Palkar, Sahaana Suri, Peter Bailis, and Matei Zaharia.
In NeurIPS 2018 Workshop on Machine Learning for Systems.

Filter Before You Parse: Faster Analytics on Raw Data with Sparser.
Shoumik Palkar, Firas Abuzaid, Peter Bailis, and Matei Zaharia.
In PVLDB 2018.

Evaluating End-to-End Optimization for Data Analytics Applications in Weld.
Shoumik Palkar, James Thomas, Deepak Narayanan, Pratiksha Thaker, Parimarjan Negi, Rahul Palamuttam, Anil Shanbhag, Holger Pirk, Malte Schwarzkopf, Saman Amarasinghe, Samuel Madden, and Matei Zaharia.
In PVLDB 2018.

DIY Hosting for Online Privacy.
Shoumik Palkar and Matei Zaharia.
In HotNets 2017.

Weld: A Common Runtime for Data Analytics.
Shoumik Palkar, James Thomas, Anil Shanbhag, Deepak Narayanan, Holger Pirk, Malte Schwarzkopf, Saman Amarasinghe, and Matei Zaharia.
In CIDR 2017.

E2: A Framework for NFV Applications.
Shoumik Palkar, Chang Lan, Sangjin Han, Keon Jang, Aurojit Panda, Sylvia Ratnasamy, Luigi Rizzo, and Scott Shenker.
In SOSP 2015.

Tech Reports

Interfaces for Efficient Software Composition on Modern Hardware.
Shoumik Palkar
Ph.D. Dissertation.

Weld: Rethinking the Interface Between Data-Intensive Applications.
Shoumik Palkar, James Thomas, Deepak Narayanan, Anil Shanbhag, Holger Pirk, Malte Schwarzkopf, Saman Amarasinghe, Samuel Madden, and Matei Zaharia.
arXiv preprint arXiv:1709.06416.

SoftNIC: A Software NIC to Augment Hardware.
Sangjin Han, Keon Jang, Aurojit Panda, Shoumik Palkar, Dongsu Han, and Sylvia Ratnasamy.
UC Berkeley Technical Report No. UCB/EECS-2015-155.


Academic Service


Talks

Interfaces for Efficient Software Composition on Modern Hardware
Ph.D. Defense, April 2020, Stanford, CA.

Weld: An Optimizing Runtime for High Performance Data Analytics
at Scale By the Bay 2019, November 2019, Oakland, CA.

Rust for Weld: A High Performance Parallel JIT Compiler
at RustConf 2019, August 2019, Portland, OR.

Sparser: Fast Analytics over Raw Data by Avoiding Parsing
at Spark + AI Summit, June 2018, San Francisco, CA.

Weld: Accelerating Data Science by 100x
at DataEngConf, April 2018, San Francisco, CA.

DIY Hosting for Online Privacy
at the Stanford NetSeminar, January 2018, Stanford, CA.

DIY Hosting for Online Privacy
at HotNets 2017, November 2017, Palo Alto, CA.

Generating Fast Data Planes for Data-Intensive Systems
at the 17th International Workshop on High Performance Transaction Systems (HPTS), October 2017, Asilomar, CA.

Weld: Accelerating Data Science by 100x
at Strata Data Conference, September 2017, New York, NY.

Weld: An Optimizing Runtime for High-Performance Data Analytics
at Strata + Hadoop World, March 2017, San Jose, CA.

Weld: A Common Runtime for Data Analytics
at the Stanford Platform Lab Seminar, January 2017, Stanford, CA.


In the Past

From 2016 to 2020, I was a Ph.D. student in the Computer Science department at Stanford University, advised by Prof. Matei Zaharia. My dissertation work was on making the interfaces used to compose independently written software systems more efficient on modern hardware. You can find the slides for my Ph.D. defense talk here and a video recording here. The full dissertation is available here.

During the 2015-2016 academic year, I was at MIT working in the PDOS Lab, supported by a Jacobs Presidential Fellowship. Before that, I received a B.S. in Electrical Engineering and Computer Science from UC Berkeley, where I worked with Profs. Scott Shenker and Sylvia Ratnasamy in the NetSys Lab on E2, a scalable, high-performance framework for network functions virtualization (NFV).


Selected Projects from Thesis Work

I have also worked on using learning methods to improve the performance of tracing profilers and, earlier, on systems for network functions virtualization.


Other Things


Contact