"This limits the value of testing, because if you had the foresight to write a test for a particular case, then you probably had the foresight to make the code handle that case too. This makes conventional testing great for catching regressions, but really terrible at catching all the “unknown unknowns” that life, the universe, and your endlessly creative users will throw at you."
I loved this quote!
Helping me create unit tests (UTs) is my main use of LLMs when coding (in addition to creating ORM models).
I feel that it's indeed the best strategic use of their ability to think differently than you and catch things you missed, with minimal risk for the company.
Thanks Anton! I agree - I find LLMs really useful for giving me ideas and options that I can then evaluate, rather than strictly for generating code (at least when I'm not writing boilerplate).
Using the existing code instead of actual reasoning (be it human or automated) seems like a bad idea. Imagine my code has an unknown bug. This LLM could write a test case that asserts that the bug stays in the code. If someone or something later discovers the bug and tries to fix it, the generated unit test would fail. Detecting that the failure comes from a wrongly generated test seems much more difficult than simply using human reasoning to write correct test cases in the first place. Does the paper mention anything about this risk?
You're correct, which is why each test that makes it through all the filters requires a manual check-off by humans. A good way to think about this LLM is that it's a junior dev with the task of creating more comprehensive tests for existing code. Other devs have more important things to work on, so this LLM gets the fun task of improving unit tests.
The tests that it creates in its pull requests are often good, but sometimes trivial or pointless. Occasionally, a test it produces is really good or inadvertently uncovers a bug. Regardless, this work wouldn't have been done by humans anyway due to priorities. All of its pull requests require a human reviewer before being merged into the codebase.
Unit tests shouldn't necessarily become "change detectors", which is what would happen if the LLM generically wrote tests to cover 100% of the code base. It would solidify any bugs in the codebase, like you said.
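To make the "change detector" risk discussed above concrete, here is a minimal Python sketch. Everything in it is hypothetical and not taken from the paper: `is_leap_year` stands in for existing code with a latent bug, `generated_test` is the kind of test you might get by asserting whatever the code currently does, and `spec_test` is a test written from the intended behavior.

```python
def is_leap_year(year: int) -> bool:
    # Hypothetical code under test with a latent bug: it ignores the century rules.
    return year % 4 == 0


def is_leap_year_fixed(year: int) -> bool:
    # The later bug fix, following the Gregorian calendar rules.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def generated_test(impl) -> bool:
    # A test derived from observed behavior: it pins the buggy answer for 1900.
    return impl(1900) is True


def spec_test(impl) -> bool:
    # A test derived from the intended behavior: 1900 is not a leap year.
    return impl(1900) is False and impl(2000) is True and impl(2024) is True


if __name__ == "__main__":
    # The behavior-pinning test passes on the buggy code and fails on the fix.
    print("generated test, buggy code:", generated_test(is_leap_year))
    print("generated test, fixed code:", generated_test(is_leap_year_fixed))
    # The spec-based test fails on the buggy code and passes on the fix.
    print("spec test, buggy code:", spec_test(is_leap_year))
    print("spec test, fixed code:", spec_test(is_leap_year_fixed))
```

The generated test goes red exactly when someone fixes the bug, so the reviewer has to work out whether the fix or the test is wrong. That's the case the human check-off is there to catch.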