Craftify

It Compiles, Therefore It Works

The “vibecoding” experiment can be considered complete. Its goal was to evaluate how viable this development method is, how effective it can be, and what its limitations are.

As a test task, I chose to develop an iOS app: first, because iOS development hasn’t interested me for the past 10 years, and second, because I actually needed such an app. Its core idea is to automate routine text operations with an LLM: in my case, fixing my clumsy texts in English and Ukrainian, translating them into English and Bulgarian, and summarizing web pages. I had already built something similar for Raycast in one evening, so the real challenge was the iOS part, and that was supposed to be handled entirely by the LLM, with me offering only minimal assistance.
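
Mechanically, every operation in the app boils down to the same call: pair a fixed instruction with the user’s text and send both to a chat completions endpoint. Here is a minimal sketch of that call, assuming OpenAI’s chat completions API; the function name, the model choice, and the bare-bones error handling are mine for illustration, not the app’s actual code:

    import Foundation

    struct ChatMessage: Codable {
        let role: String
        let content: String
    }

    struct ChatRequest: Codable {
        let model: String
        let messages: [ChatMessage]
    }

    struct ChatResponse: Codable {
        struct Choice: Codable { let message: ChatMessage }
        let choices: [Choice]
    }

    // Send the user’s text plus an instruction (“fix the grammar”,
    // “translate to Bulgarian”, ...) and return the rewritten text.
    func transform(_ text: String, instruction: String, apiKey: String) async throws -> String {
        var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
        request.httpMethod = "POST"
        request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        request.httpBody = try JSONEncoder().encode(ChatRequest(
            model: "gpt-4o-mini", // arbitrary choice for the sketch
            messages: [
                ChatMessage(role: "system", content: instruction),
                ChatMessage(role: "user", content: text)
            ]
        ))
        let (data, _) = try await URLSession.shared.data(for: request)
        let reply = try JSONDecoder().decode(ChatResponse.self, from: data)
        return reply.choices.first?.message.content ?? ""
    }

Everything else in the app is plumbing to get text into a call like this and show the result.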

Here’s the bottom line: I built the app in 60 hours. It works and does its job. It has 6 screens, a custom UI, and around 6,500 lines of Swift code (per cloc). It has about a dozen unresolved issues, but nothing critical. It’s uploaded to TestFlight, so I can share it from there. It probably won’t pass App Store review, though, due to guideline non-compliance :)

Conclusions:

  • There’s no way I could have built such an app solo in 60 hours. I’d have had to learn way too much.
  • I lost about 15 hours battling with xcodegen, wrapping tooling around it, integrating swiftlint, swiftformat, etc. If I had gone with Tuist from the beginning, I probably wouldn’t have wasted hours on trivial stuff like “the app isn’t showing in the share sheet because of a wrong YAML nesting” or “the app icon doesn’t show up due to a typo in its name.”
  • On the other hand, if I hadn’t tried to set up an environment where everything runs via the CLI, I’d still be poking at buttons in Xcode, following tutorials.
  • This highlights the importance of building the fastest possible change/test cycle for the LLM, whether via CLI or MCP. The faster the loop and the clearer the error messages, the faster development goes.
  • Having some basic conceptual knowledge would have significantly sped things up, helping avoid mistakes and duct-tape fixes.
  • The model has no instinct for generalizing from existing code. An experienced developer often intuitively senses where to lay down abstractions to account for future changes (the so-called “technical buffer for future hacks”). I had a standing instruction for the LLM to suggest two ways to improve the code at the end of each task, and not once did it come up with anything meaningful at the architectural level. I had to explicitly request architecture reviews to get genuinely useful suggestions.
  • You constantly need to refine your instructions, adding common mistakes, new requirements, and so on. Even that doesn’t help during long sessions: the model starts “forgetting” earlier context. In my case, it kept trying to import code from the Common module, even though that code was already visible without any import (see the sketch after this list).
  • The project has to be kept continuously in a valid state: the linter passing and the tests green.
  • Sometimes it’s easier to start a task from scratch than to figure out what’s wrong with the current solution.
  • Image support in Cursor is a killer feature for explaining problems. You can even solve UI alignment and layout issues this way: take a screenshot, highlight the problem area, upload it to the IDE, and if the LLM doesn’t get confused about what’s causing the spacing issue, it might even fix it ;)
  • Next time I dive into development from scratch on an unfamiliar stack, I’ll:
    • carefully choose tooling that offers maximum validation and clear error messages,
    • prepare a highly detailed spec that is constantly updated (maybe ditch architecture.md and implementation.md from the memory bank),
    • set up a short and model-friendly change/test loop with clear error feedback.
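
As an aside, the Common mix-up from the list above comes down to Swift’s module visibility rules, which the model kept tripping over. A contrived illustration, with invented file and type names: when the Common sources are compiled into the same target, its internal declarations are visible everywhere in that module, so an import is not just unnecessary, it doesn’t even compile.

    import Foundation

    // Sources/Common/TextCleaner.swift, compiled into the app target:
    // `internal` (the default) is visible across the whole module.
    struct TextCleaner {
        func clean(_ s: String) -> String {
            s.trimmingCharacters(in: .whitespacesAndNewlines)
        }
    }

    // Sources/App/Editor.swift, same target and therefore same module:
    // import Common   // what the model kept adding; this fails with
    //                 // “no such module 'Common'”, because Common here is
    //                 // a source folder, not a separately built framework
    func polish(_ draft: String) -> String {
        TextCleaner().clean(draft)   // visible with no import at all
    }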

Still, the app is written and:

  • works with the OpenAI key you provide,
  • translates, simplifies, summarizes, and fixes errors,
  • and yes, it can even translate into Klingon.