Have you heard of libido-driven development, where development is fueled not by profit but by pure primal desire?
This is my journey building a side project for fun that ended up becoming more serious than I initially planned.
It was a few years ago (2022? 2023?). I had just stumbled upon a new technology, an AI-powered waifu summoner called automatic1111 (a1111 from here on).
I tried running it on my potato laptop. It took about a minute just to make one picture, but the power was real; it was just probably chipping away at my GPU's lifespan.
So I wondered if there was a better way to play around with this new power. That's why I made a simple AMI in AWS to run on an EC2 spot instance: I baked Stable Diffusion inside, added a script that watches for new files in the output directory and dumps them into S3, and, as a safety measure, added a command that stops the instance once 60 minutes have passed since it last synced a file to S3.
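Here's a minimal sketch of that watch-and-sync logic in Python, not the actual script. It assumes the S3 upload and the instance stop are passed in as callables (`upload` and `stop_instance` are hypothetical hooks, which could wrap boto3's `upload_file` and `stop_instances`), and that a file counts as synced once it has been uploaded.

```python
import time
from pathlib import Path
from typing import Callable

IDLE_LIMIT_SECONDS = 60 * 60  # stop after 60 minutes without a sync


def find_new_files(output_dir: Path, seen: set) -> list:
    """Return files under output_dir that haven't been uploaded yet."""
    return [p for p in sorted(output_dir.rglob("*")) if p.is_file() and str(p) not in seen]


def sync_once(output_dir: Path, seen: set, upload: Callable[[Path], None],
              last_sync: float, now: float) -> float:
    """Upload new files and return the updated last-sync timestamp."""
    new_files = find_new_files(output_dir, seen)
    for path in new_files:
        upload(path)  # hypothetical hook, e.g. wraps s3.upload_file(...)
        seen.add(str(path))
    return now if new_files else last_sync


def is_idle(last_sync: float, now: float, limit: float = IDLE_LIMIT_SECONDS) -> bool:
    """True once `limit` seconds have passed since the last sync."""
    return now - last_sync >= limit


def watch(output_dir: Path, upload: Callable[[Path], None],
          stop_instance: Callable[[], None], poll_seconds: float = 30) -> None:
    """Poll the output directory until the idle limit is hit, then stop the instance."""
    seen: set = set()
    last_sync = time.time()  # start the idle clock at boot
    while True:
        now = time.time()
        last_sync = sync_once(output_dir, seen, upload, last_sync, now)
        if is_idle(last_sync, now):
            stop_instance()  # hypothetical hook, e.g. wraps ec2.stop_instances(...)
            return
        time.sleep(poll_seconds)
```

Tying the idle clock to the last sync rather than to a1111 itself means the instance still shuts down if a1111 hangs or I simply walk away. A real version would probably also skip files a1111 is still in the middle of writing.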
That let me run a spawn.sh -> wait -> generate -> enjoy -> generate -> enjoy loop, finishing by running the terminate script.
After some fiddling around, I found out that the prompt used to generate an image is stored in the PNG metadata. Storing the images under date-based directories made sense, but the more images I had, the harder they became to navigate.
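Reading that metadata needs nothing beyond the standard library. A sketch, assuming a1111's default of writing the generation info into a PNG `tEXt` chunk with the keyword `parameters` (the prompt, then an optional `Negative prompt:` line, then a `Steps: ...` settings line). `iTXt`/`zTXt` chunks aren't handled here, the CRC isn't verified, and the settings line is kept as a raw string.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_text_chunks(data: bytes) -> dict:
    """Collect tEXt chunks (keyword -> text) from raw PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4-byte length + 4-byte type + body + 4-byte CRC
        if ctype == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = text.decode("latin-1")
        elif ctype == b"IEND":
            break
    return chunks


def parse_parameters(text: str) -> dict:
    """Split an a1111-style 'parameters' block into prompt, negative prompt and settings."""
    lines = text.strip().split("\n")
    settings = lines.pop() if lines and lines[-1].startswith("Steps:") else ""
    prompt_lines, negative_lines = [], []
    target = prompt_lines
    for line in lines:
        if line.startswith("Negative prompt:"):
            target = negative_lines  # everything from here on is the negative prompt
            line = line[len("Negative prompt:"):].strip()
        target.append(line)
    return {
        "prompt": "\n".join(prompt_lines),
        "negative_prompt": "\n".join(negative_lines),
        "settings": settings,
    }
```

With Pillow installed, `Image.open(path).info.get("parameters")` gets you the same text with less code.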
So I needed to decide which database to use for this, so I could just query by prompt and get the matching images.
Some options that came to mind:
- DynamoDB (hard to maintain in this flow, and expensive if I just scan the whole table)
- SQLite (simple: just save it on S3, read it as needed, rewrite it after each update; but it has a synchronization problem)
- BigQuery (fucking cheap, plain SQL, relatively fast)
With that in mind, I made another page that simply accepts a filter (prompt/checkpoint/LoRA), searches BigQuery, and returns all matching images.
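For the search page, a query along these lines would do. This is a sketch: the table name and the columns (`s3_key`, `prompt`, `checkpoint`, a repeated `loras` column, `created_at`) are assumptions about the schema, and `build_search_query` is a hypothetical helper. It uses BigQuery named parameters (`@prompt` and friends) instead of pasting user input into the SQL.

```python
from typing import Dict, Optional, Tuple


def build_search_query(table: str, prompt: Optional[str] = None,
                       checkpoint: Optional[str] = None, lora: Optional[str] = None,
                       limit: int = 200) -> Tuple[str, Dict[str, str]]:
    """Build a BigQuery SQL string with named parameters for the given filters."""
    clauses, params = [], {}
    if prompt:
        # case-insensitive substring match on the prompt text
        clauses.append("LOWER(prompt) LIKE CONCAT('%', LOWER(@prompt), '%')")
        params["prompt"] = prompt
    if checkpoint:
        clauses.append("checkpoint = @checkpoint")
        params["checkpoint"] = checkpoint
    if lora:
        # assumes `loras` is a repeated (ARRAY<STRING>) column
        clauses.append("@lora IN UNNEST(loras)")
        params["lora"] = lora
    where = " AND ".join(clauses) or "TRUE"
    sql = (
        f"SELECT s3_key, prompt, checkpoint FROM `{table}` "
        f"WHERE {where} ORDER BY created_at DESC LIMIT {int(limit)}"
    )
    return sql, params
```

With the `google-cloud-bigquery` client, the params would go into `bigquery.QueryJobConfig(query_parameters=[bigquery.ScalarQueryParameter(name, "STRING", value), ...])`. Since on-demand BigQuery bills by the bytes of the columns a query reads, selecting only the columns you need is what keeps it cheap.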
It was fun for a while, until it started to become a chore. I didn't always have access to my laptop, and thanks to my unoptimized AMI, the instance took about 30 minutes to boot, so I felt I needed to be able to spawn it from my mobile phone.
So I moved my spot instance spawning logic from a simple bash script to proper AWS Lambda code, made a simple launcher page, and this is what I got.
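A minimal sketch of such a launcher Lambda, assuming a Python runtime and an event carrying `ami_id` and `instance_profile` (hypothetical field names); the default instance type and the tag value are placeholders too. `build_spot_request` only assembles the keyword arguments for boto3's `ec2.run_instances`, so it can be checked without touching AWS.

```python
import json


def build_spot_request(ami_id: str, instance_type: str, profile_name: str, tag_value: str) -> dict:
    """Keyword arguments for EC2 RunInstances requesting a one-time spot instance."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "IamInstanceProfile": {"Name": profile_name},  # lets the instance reach S3/DynamoDB
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {
                "SpotInstanceType": "one-time",
                "InstanceInterruptionBehavior": "terminate",
            },
        },
        "TagSpecifications": [
            {"ResourceType": "instance", "Tags": [{"Key": "Name", "Value": tag_value}]}
        ],
    }


def handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    ec2 = boto3.client("ec2")
    params = build_spot_request(
        ami_id=event["ami_id"],
        instance_type=event.get("instance_type", "g4dn.xlarge"),  # placeholder default
        profile_name=event["instance_profile"],
        tag_value="a1111-spot",
    )
    response = ec2.run_instances(**params)
    instance_id = response["Instances"][0]["InstanceId"]
    return {"statusCode": 200, "body": json.dumps({"instanceId": instance_id})}
```

The Lambda's own role needs `ec2:RunInstances`, plus `iam:PassRole` for the instance profile it hands to the new instance.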
While I was at it, I also noticed that it's much faster if you don't bake everything into the AMI, and instead let the instance fetch what it needs when it spawns.
So I just added more scripts to the AMI's systemd services that read the launch-config from DynamoDB and fetch the requested checkpoint from S3. Instead of the bloated 100 GB AMI snapshot, I'm now only charged for about 20 to 30 GB of snapshot, boot time dropped from 30 minutes to 5 or 6 minutes, and a cheaper snapshot bill = huge win.
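A rough sketch of the boot-time checkpoint step, assuming the launch config carries the checkpoint's S3 `bucket`, `key` and `size`, and that the size is used to skip re-downloading a file that's already on disk (my assumption). `plan_checkpoint_fetch` is a hypothetical name, and the models directory is whatever you pass in; a1111 normally keeps checkpoints under `models/Stable-diffusion`.

```python
from pathlib import Path
from typing import Optional, Tuple


def plan_checkpoint_fetch(config: dict, models_dir: Path) -> Optional[Tuple[str, str, Path]]:
    """Return (bucket, key, destination) if the checkpoint needs downloading, else None."""
    ckpt = config["checkpoint"]
    dest = models_dir / Path(ckpt["key"]).name  # keep just the file name from the S3 key
    if dest.exists() and dest.stat().st_size == int(ckpt["size"]):
        return None  # already on disk with the expected size, skip the download
    return ckpt["bucket"], ckpt["key"], dest
```

If it returns a tuple, the download itself could be boto3's `s3.download_file(bucket, key, str(dest))`.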
I also started using SQS to send commands to the spot instance, e.g. to fetch a new LoRA or checkpoint, or to grab one that's already in S3.
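The command side could be a small dispatch table. In this sketch the message schema (`command` plus `args`) and the command names are hypothetical, and the handlers only describe the download instead of performing it.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], str]


def fetch_from_s3(args: dict) -> str:
    # placeholder: a real handler would download the object here
    return f"download s3://{args['bucket']}/{args['key']} -> {args['dest']}"


def fetch_from_url(args: dict) -> str:
    # placeholder: a real handler would download from the URL here
    return f"download {args['url']} -> {args['dest']}"


HANDLERS: Dict[str, Handler] = {
    "fetch-s3": fetch_from_s3,
    "fetch-url": fetch_from_url,
}


def dispatch(message_body: str) -> str:
    """Route one queue message to its handler; unknown commands raise ValueError."""
    msg = json.loads(message_body)
    handler = HANDLERS.get(msg.get("command"))
    if handler is None:
        raise ValueError(f"unknown command: {msg.get('command')!r}")
    return handler(msg.get("args", {}))
```

Messages would come from `sqs.receive_message(QueueUrl=..., WaitTimeSeconds=20)` (long polling) and only be removed with `sqs.delete_message(...)` after `dispatch` succeeds, so a failed fetch becomes visible again once its visibility timeout expires.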
The launch config structure is something simple like:
{ key: "launch-config", checkpoint: { bucket, key, size }, idle: { tolerance, action } }
Since I'd added config reading from DynamoDB anyway, I figured it was wise to make the idle behavior customizable too: how long to tolerate idle time, and whether to stop or terminate the instance afterwards.
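Reading the idle settings out of that item could look like this. It's a sketch that assumes `tolerance` is in minutes and falls back to the original behavior (stop after 60 minutes) when `idle` is missing; `IdlePolicy` and `parse_idle_policy` are hypothetical names.

```python
from dataclasses import dataclass

DEFAULT_TOLERANCE_MINUTES = 60  # matches the original "stop after 60 min" rule
DEFAULT_ACTION = "stop"
VALID_ACTIONS = {"stop", "terminate"}


@dataclass(frozen=True)
class IdlePolicy:
    tolerance_minutes: int
    action: str


def parse_idle_policy(item: dict) -> IdlePolicy:
    """Read the `idle` map of a launch-config item, filling in defaults."""
    idle = item.get("idle") or {}
    tolerance = int(idle.get("tolerance", DEFAULT_TOLERANCE_MINUTES))
    action = idle.get("action", DEFAULT_ACTION)
    if tolerance <= 0:
        raise ValueError("idle tolerance must be positive")
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported idle action: {action!r}")
    return IdlePolicy(tolerance, action)
```

The `int(...)` also covers the `Decimal` values that boto3's DynamoDB resource returns for numbers.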
Satisfied with the spawn and stop flow, the next thing I felt needed optimizing was how to structure the prompts. There were already some options available, like an internal queue system and batching prompts from a file.
But for my creative flow, those felt unsatisfying, so I put some effort into working out how I wanted it to work while keeping it simple to maintain. After a few iterations, I settled on this:
Basically, there are two important parts in this image: the templates and the variables.
{ title, defaultConfig: { seed, sampler, ... }, templates: [{ prompt, seed, sampler, ... }], variables: { ... } }
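To make the template/variables idea concrete, here's one way the expansion could work. It's a sketch under my own assumptions: each variable maps to a list of values, templates reference them as `$name` (Python's `string.Template` syntax, picked so it doesn't clash with a1111's `(...)`, `[...]` and `<lora:...>` prompt syntax), every combination of values is rendered for every template, and a template's own fields override `defaultConfig`.

```python
import itertools
import string
from typing import Dict, List


def expand(doc: Dict) -> List[Dict]:
    """Render every template against every combination of variable values."""
    variables = doc.get("variables", {})
    names = sorted(variables)
    jobs = []
    # product() of zero lists yields one empty combo, so templates without variables still render once
    for combo in itertools.product(*(variables[n] for n in names)):
        values = dict(zip(names, combo))
        for template in doc["templates"]:
            job = {**doc.get("defaultConfig", {}), **template}  # template fields win
            job["prompt"] = string.Template(template["prompt"]).substitute(values)
            jobs.append(job)
    return jobs
```

`substitute` raises `KeyError` for a placeholder with no matching variable, which is usually what you want when a template has a typo.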
For a while, I was satisfied with the small blue button in the bottom-left corner, which let me copy the built prompts and paste them into a1111's batch feature.
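The copy button essentially turns the built jobs into text for a1111's batch input. A sketch, assuming the "Prompts from file or textbox" script's one-job-per-line `--option value` format; the job field names on the left of `FIELD_TO_FLAG` are hypothetical, and `shlex.quote` is used because, as far as I know, that script parses each line with `shlex.split`.

```python
import shlex
from typing import Dict, List

# Hypothetical job field -> the script's option name.
FIELD_TO_FLAG = {
    "prompt": "prompt",
    "negative": "negative_prompt",
    "seed": "seed",
    "steps": "steps",
    "sampler": "sampler_name",
    "cfg": "cfg_scale",
    "width": "width",
    "height": "height",
}


def to_batch_line(job: Dict) -> str:
    """One job -> one line of `--option value` pairs, shell-quoted."""
    parts = []
    for field, flag in FIELD_TO_FLAG.items():
        if field in job:
            parts.append(f"--{flag} {shlex.quote(str(job[field]))}")
    return " ".join(parts)


def to_batch_text(jobs: List[Dict]) -> str:
    """All jobs, one per line, ready to paste into the textbox."""
    return "\n".join(to_batch_line(j) for j in jobs)
```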
But as time went on and I had some free time, I decided to make my queue engine more generic, so it not only fetches things but also generates them, and to migrate it from SQS long polling to simple polling of DynamoDB.
So, I made a new table with this structure:
And my consumer, running inside the spot instance, queries something like:
QUERY GSI WHERE (status = pending) and (createdAt < now) ORDER BY prioScore LIMIT 1
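In plain Python, the intent of that query is roughly the following (a sketch; the field names mirror the pseudo-query, and a lower `prioScore` runs first).

```python
from typing import List, Optional


def pick_next_task(tasks: List[dict], now: float) -> Optional[dict]:
    """Mimic: status = pending AND createdAt < now ORDER BY prioScore LIMIT 1."""
    ready = [t for t in tasks if t["status"] == "pending" and t["createdAt"] < now]
    # min() picks the lowest prioScore, i.e. the first row of an ascending ORDER BY
    return min(ready, key=lambda t: t["prioScore"], default=None)
```

One DynamoDB gotcha if the index's sort key is `prioScore`: the `createdAt < now` condition can't go into the key condition and has to be a `FilterExpression`, and DynamoDB applies `Limit` before the filter. So `Limit=1` can come back empty while a ready task sits further down the index; dropping the limit (or paging until a match) avoids that.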
Then, depending on the command, it either generates 80+ images or fetches safetensors. That all worked, but it was not fun waiting for a1111 to finish generating all of my requested images (300 or so) before I could terminate the spot instance.
So I considered some options, like idle detection or a shorter idle tolerance, but the best solution I came up with was... just push the terminate command into the queue. I can also give it a very high prioScore, so it becomes the last action after all the requested work is done.
I can also set its createdAt to 1 hour from now if I want the spot instance to keep running even when there's no work to be done.
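Both tricks are just task fields. A sketch with hypothetical helper names and an arbitrary "always last" score, assuming lower `prioScore` runs first and `createdAt` is stored as integer epoch seconds (boto3's DynamoDB resource rejects Python floats anyway).

```python
import time
import uuid
from typing import Optional

# Hypothetical "always last" score: lower prioScore runs first, so this sorts after normal work.
TERMINATE_PRIO = 10**9


def make_task(command: str, prio_score: int, delay_seconds: int = 0,
              now: Optional[float] = None, **args) -> dict:
    """Build a pending task item; a future createdAt keeps it out of `createdAt < now` until then."""
    now = time.time() if now is None else now
    return {
        "id": str(uuid.uuid4()),
        "command": command,
        "status": "pending",
        "prioScore": prio_score,
        "createdAt": int(now + delay_seconds),  # integer epoch seconds
        "args": args,
    }


def make_terminate(keep_alive_seconds: int = 0, now: Optional[float] = None) -> dict:
    """Queue a terminate that runs after all other work, and not before keep_alive_seconds."""
    return make_task("terminate", TERMINATE_PRIO, delay_seconds=keep_alive_seconds, now=now)
```

So `make_terminate(keep_alive_seconds=3600)` covers the keep-it-running-for-an-hour case.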
Lastly, I noticed a problem in my flow: I could reorder priorities or cancel pending tasks, but once a task was picked up and running, I had no way to stop it.
So, I added some more logic to the code:
With that, I can now view my progress, cancel, and reorder priorities even mid-progress.
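The core idea can be sketched like this (not the exact code): generate one image at a time and re-check the task between images. `check` is a hypothetical callback (e.g. a DynamoDB `GetItem` on the task) returning `"continue"`, `"cancel"` or `"yield"`; on `"yield"` the task goes back to pending with its position saved, so a task that was reordered above it gets picked on the next poll.

```python
from typing import Callable, List, Tuple


def run_generate(task_id: str, prompts: List[str], start: int,
                 check: Callable[[str], str],
                 generate_one: Callable[[str], None],
                 report: Callable[[str, int, int], None]) -> Tuple[str, int]:
    """Generate prompts[start:] one by one, re-checking the task before each image.

    Returns (new_status, next_index) so the caller can write both back to the task.
    """
    total = len(prompts)
    for i in range(start, total):
        action = check(task_id)  # hypothetical: "continue", "cancel" or "yield"
        if action == "cancel":
            return "cancelled", i
        if action == "yield":
            return "pending", i  # back to the queue; resume from index i next time
        generate_one(prompts[i])
        report(task_id, i + 1, total)  # e.g. update a progress field on the task
    return "done", total
```

Checking between images means a cancel takes effect within one image; as far as I know, a1111's API also has a `POST /sdapi/v1/interrupt` endpoint for stopping the image currently being generated.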
Banzai.