Stylistic analysis can de-anonymize code, even compiled code

doctorow · August 10, 2018, 7:03pm

Originally published at: https://boingboing.net/2018/08/10/greenstadt-and-caliskan.html

…

brad_quinn · August 10, 2018, 8:07pm

Any relation to Niko Matsakis? That guy’s on a whole other level when it comes to language design and implementation.

Boundegar · August 10, 2018, 8:10pm

Nope, not buying it. This analysis might be able to distinguish between two coders with high confidence, but there is no international sample-bank to compare against. There isn’t even an international fingerprint registry yet.

RickMycroft · August 10, 2018, 8:11pm

I hope this also has worrying implications for the writers of ransomware and other nasties.

brad_quinn · August 10, 2018, 8:14pm

That’s why I use Google translate to convert all of my code to Visual Basic and then to Python and finally Fortran.

Gutierrez · August 10, 2018, 9:28pm

knappa · August 11, 2018, 1:09am

From their 2017 paper; the numbers are typical, to an order of magnitude, of the restricted pool of programmers they consider:

Assuming a known set of suspect programmers, such as the employees of a company, and some form of segmentation and grouping by authorship, such as accounts on a version control system, we present a technique which performs stylistic authorship attribution of a collection of partial source code samples written by the same programmer with up to 99% accuracy for a set of 106 suspect programmers.
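To make the "known set of suspects" setup concrete, here is a minimal sketch of closed-set attribution. It is not the paper's method: it swaps in a much cruder technique (bag-of-tokens profiles compared by cosine similarity), and the names `profile`, `cosine`, and `attribute` are made up for illustration. The point is only that the task is "pick the closest of N known authors", not "identify anyone in the world".

```python
import math
import re
from collections import Counter


def profile(source: str) -> Counter:
    """Bag of lexical tokens: identifiers/keywords plus single punctuation marks.

    Assumption: a crude stand-in for real stylometric features.
    """
    return Counter(re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_0-9]", source))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count profiles."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(sample: str, known: dict[str, list[str]]) -> str:
    """Return the suspect whose pooled known samples are most similar to `sample`."""
    target = profile(sample)
    scores = {}
    for author, samples in known.items():
        pooled = Counter()
        for text in samples:
            pooled.update(profile(text))
        scores[author] = cosine(target, pooled)
    return max(scores, key=scores.get)
```

Note that `attribute` always returns one of the suspects, even if the real author is not in `known`. That is exactly the closed-set assumption the quoted passage spells out.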

Also, they seem to focus on code written in code-jams. Personally, I write code entirely differently when I’m in a rush versus when I have time to really plan.

gatto · August 11, 2018, 2:51am

or, do you?

personally, this research makes a bit of sense to me.

i feel many people’s code has telltale signals. in c++: the amount of template usage, references vs pointers, return vars vs out values, heavy inheritance vs composition, ifs, switches, flags, bools.
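as a rough illustration of how signals like these could be turned into numbers, here is a small sketch that counts a few of them in c++ source with regular expressions. everything in it is an assumption for illustration: the pattern set, the names `STYLE_PATTERNS` and `style_features`, and the regexes themselves, which are crude (a multiplication like `a * b)` would count as a pointer parameter). real stylometry would work from a parser, not regexes.

```python
import re

# Hypothetical, hand-picked patterns; each is a rough proxy for one habit.
STYLE_PATTERNS = {
    "templates": r"\btemplate\s*<",
    "reference_params": r"[\w>]\s*&\s*\w+\s*[,)]",
    "pointer_params": r"[\w>]\s*\*\s*\w+\s*[,)]",
    "switch_statements": r"\bswitch\s*\(",
    "if_statements": r"\bif\s*\(",
    "bool_uses": r"\bbool\b",
    "inheritance": r"\b(?:class|struct)\s+\w+\s*:\s*(?:public|protected|private)\b",
}


def style_features(source: str) -> dict[str, int]:
    """Count how often each style pattern occurs in a C++ source string."""
    return {
        name: len(re.findall(pattern, source))
        for name, pattern in STYLE_PATTERNS.items()
    }
```

two programmers solving the same problem would likely produce different count vectors, and a classifier can be trained on vectors like these.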

it’d make sense if some of that leaked out into the binary. there really is very little one right way in programming. or, we’d just have the programs program the programs.

( except my way. my way is the right way. )

werdnagreb · August 11, 2018, 5:00am

I think it also depends heavily on the culture of the company you work for. My current and previous workplaces do heavy code reviews, and over time everyone’s code starts to look alike. I’ve seen it happen when newbies code in an unacceptable way at first, and slowly learn the proper style, patterns, naming conventions, tests, etc.

It’s definitely interesting research, but it seems to me that hacking things out overnight produces a vastly different style than production code.

doctorow · August 15, 2018, 7:03pm

This topic was automatically closed after 5 days. New replies are no longer allowed.
