(Probably) my most complex line of code ever written

The line of code I am going to present here is probably the most complex one I have ever written. The goal was to import StackOverflow’s questions and answers into MongoDB for further analysis. You can find the whole dump of StackOverflow in XML format here. Once you unpack it, it takes 8 lines of code to load it into MongoDB:

from pymongo.mongo_client import MongoClient
import xml.etree.ElementTree as etree
if __name__ == '__main__':
    db = MongoClient('localhost', 27017)['so']
    for event, elem in etree.iterparse('/home/kokan/Posts.xml', events=('end',)):
        if elem.tag != 'row': continue
        db.entries.insert_one(elem.attrib)
        elem.clear()

And this is literally the whole program!
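
To try the parsing loop without a MongoDB instance, here is a minimal sketch that runs the same iterparse pattern over a small in-memory sample. The sample XML and the helper name read_rows are made up for illustration; they are not part of the real dump or of the script above.

```python
import io
import xml.etree.ElementTree as etree

# Hypothetical two-row sample in the shape of Posts.xml (not real dump data).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Score="12" Title="First question" />
  <row Id="2" PostTypeId="2" ParentId="1" Score="3" />
</posts>
"""

def read_rows(source):
    """Collect the attributes of every <row> element using iterparse."""
    rows = []
    for event, elem in etree.iterparse(source, events=('end',)):
        if elem.tag != 'row':
            continue
        rows.append(dict(elem.attrib))  # copy first, clear() empties attrib
        elem.clear()                    # release the element once it is handled
    return rows

rows = read_rows(io.BytesIO(SAMPLE))
for row in rows:
    print(row)
```

In the real script each row goes straight to the database instead of into a list, so memory use stays flat even for a dump of many gigabytes.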

However, you might notice that all fields end up as strings in MongoDB. Somebody might not care and just live with this, but I have OCD and I just couldn’t let that happen. So I started looking at all the attributes in the XML and figuring out their types. It turns out we have strings, integers, dates and even one list (the attribute “Tags”, which is in the format “<html><css><css3><internet-explorer-7>”). My first reaction was to add code like this:

for key,value in elem.attrib.items():
    if key == 'Id':
        elem.attrib[key] = int(value)
    elif key == 'CreationDate':
        elem.attrib[key] = dateutil.parser.parse(value + 'Z')
    elif key == 'Body':
        pass # this is already string
    ...
    else:
        print('Unknown key %s with value %s' % (key, value))

You can see where this is going… So I wanted a way to apply preprocessing logic to any given key, casting its value from a string to its real type. Another requirement was not to miss any key: I should have a list of all known keys, so that if a new key pops up, I can examine it and determine its type before rerunning the script. Here is my end result, a typed import in 23 lines of code (imports not shown: the two from the first listing plus dateutil.parser):

INTEGER_KEYS = ('Id', 'ParentId', 'LastEditorUserId', 'OwnerUserId', 'PostTypeId', 'ViewCount', 'Score', 'AcceptedAnswerId', 'AnswerCount', 'CommentCount', 'FavoriteCount')
STRING_KEYS = ('Title', 'LastEditorDisplayName', 'Body', 'OwnerDisplayName')
DATE_KEYS = ('CommunityOwnedDate', 'LastActivityDate', 'LastEditDate', 'CreationDate', 'ClosedDate')
LIST_KEYS = ('Tags',)
 
def warning_nonexistant_key(key, value):
    print('Unknown key %s with value %s' % (key, value))
    return value
 
PREPROCESSOR = {
    INTEGER_KEYS: lambda k,v: int(v),
    STRING_KEYS: lambda k,v: v,
    DATE_KEYS: lambda k,v: dateutil.parser.parse(v + 'Z'),
    LIST_KEYS: lambda k,v: v[1:-1].split('><'),
    '': warning_nonexistant_key 
}
 
if __name__ == '__main__':
    db = MongoClient('localhost', 27017)['so']
    for event, elem in etree.iterparse('/home/kokan/Posts.xml', events=('end',)):
        if elem.tag != 'row': continue
        db.entries.insert_one(dict([key, PREPROCESSOR[next((key_type for key_type in PREPROCESSOR if key in key_type), '')](key, value)] for key,value in elem.attrib.items()))
        elem.clear()

A brief explanation: I created a dictionary PREPROCESSOR whose keys are tuples of all XML attribute names of a given type, and whose values are lambda functions that know how to cast a value from a string to that type. The key line here is 22. For each XML attribute, it looks for the attribute name in each tuple that is a key of PREPROCESSOR and, if it finds it, executes the matching preprocessor lambda. If it doesn’t find it, it prints the default warning and returns the unmodified value (as a string). There is so much in this line: generator expressions, dictionaries, tuples, lambdas and a couple of awesome built-in functions. If we unwrapped it, it would look something like this:

entry = {}
for key,value in elem.attrib.items():
    found_key_type = ''
    for key_type in PREPROCESSOR.keys():
        if key in key_type:
            found_key_type = key_type
            break
    cast_function = PREPROCESSOR[found_key_type]
    entry[key] = cast_function(key, value)
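
To see the lookup at work without MongoDB or the XML dump, here is a minimal, self-contained sketch of the same idea applied to a single hand-made row. The shortened key tuples, the sample row, and the helper names parse_utc and convert are assumptions for illustration. The standard library's datetime.fromisoformat stands in for dateutil.parser.parse, on the assumption that the dates are ISO-8601 strings meant as UTC.

```python
from datetime import datetime, timezone

# Shortened key lists, for illustration only.
INTEGER_KEYS = ('Id', 'ParentId', 'Score')
STRING_KEYS = ('Title', 'Body')
DATE_KEYS = ('CreationDate',)
LIST_KEYS = ('Tags',)  # the trailing comma makes this a one-element tuple

def parse_utc(value):
    # Stand-in for dateutil.parser.parse(v + 'Z'): assumes ISO-8601 input in UTC.
    return datetime.fromisoformat(value).replace(tzinfo=timezone.utc)

def warning_nonexistant_key(key, value):
    print('Unknown key %s with value %s' % (key, value))
    return value

PREPROCESSOR = {
    INTEGER_KEYS: lambda k, v: int(v),
    STRING_KEYS: lambda k, v: v,
    DATE_KEYS: lambda k, v: parse_utc(v),
    LIST_KEYS: lambda k, v: v[1:-1].split('><'),
    '': warning_nonexistant_key,  # fallback for keys not listed above
}

def convert(attrib):
    """Cast every attribute with the first PREPROCESSOR entry that lists its key."""
    return {
        key: PREPROCESSOR[next((kt for kt in PREPROCESSOR if key in kt), '')](key, value)
        for key, value in attrib.items()
    }

# Hypothetical row, shaped like the attributes of one <row> element.
row = {'Id': '4', 'CreationDate': '2008-07-31T21:42:52.667',
       'Tags': '<c#><winforms>', 'Mystery': '?'}
typed = convert(row)
print(typed)
```

The 'Mystery' key is not in any tuple, so it triggers the warning and comes back unchanged, which is exactly the signal for adding it to one of the key lists before rerunning.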

Don’t get me wrong, I would never write a line like that in production code, nor would I encourage others to do so. But this was fun, it was a one-time script, and I wanted to push my (and Python’s) limits. And it turned out pretty cool, admit it :)

The whole source code, if you are interested, is here.

Posted in Uncategorized | 3 Comments

Programming languages playground

Preface to second edition

This text is around 4-5 years old. My old blog vanished and this is one of the posts that were on it. I am putting it back because it got very good reviews. Heck, somebody even translated it into a language so obscure that even Google can’t translate it (try it for yourself, I think it is Burmese). Although it is that old, much of it still holds true, and when I read it today I think that not much has changed.

If programming languages were kids

It’s summer. The day is sunny and all the kids have gone out to play. They have all gathered at the playground to enjoy the beautiful day, and we are now going to describe some of them.

The first kid that catches the eye is a tall boy, larger than all the other kids and obviously older than all of them. His name is C. He’s casually dressed and is always smiling cheerfully. All the smaller kids are swirling around him and he obviously enjoys playing with them. He knows he is the coolest kid there and that he has the respect of all of them, but he is not presumptuous about it. He’s fast, his moves are sharp and intelligent, and he likes to help the other kids, knowing that they are helpless without him. Look, he just helped that kid Python climb that tree. Python could climb the tree himself, but it would take him forever, so he asked C for help. C, smiling as always, immediately picked him up and put him on a branch. He really is like an older brother to all of them.

Speaking of brothers, C really does have a younger brother. His name is C++. Actually, they are stepbrothers. C’s fathers had thicker beards than C++’s father, and C++’s father has less hair. You can see that the two kids look alike, but C++ is a little smaller and younger. C++’s father thought he could give C a little brother who would be better at handling objects at the playground but would still look similar to C, and although he succeeded, the other kids still prefer to play with C. C++ is also a bit fat for a child of his age and a little slower than his older brother, but granted, he’s better at handling various objects. The reason for his slowness may be that he carries a lot of equipment with him. He has a little shovel, a rake and a little plastic bin, and man, he even has a Swiss knife. The other kids look in awe at C++’s tools and what he can do with them, but they have also heard from their elders that those tools are just a burden, that it takes a skilled person to use them properly and wisely, and that you may even cut yourself if you try to use them without any training. That’s why they mostly like to play with his older brother.

We already mentioned the Python kid. He’s one of the kids that often ask C for help. He is fast and agile, but alas, there are times when only C can help. He’s a small kid with shaggy hair and he enjoys doing things both nicely and quickly. He never stops, he’s restless, and his trousers are scraped at the knees from constant running, jumping and falling. In one word, he’s very dynamic. Because of his dynamic nature he often breaks things, but he can also put them back together very quickly, because he always carries duct tape with him. There is one more thing he always carries with him. If you ask him to do something, he’ll get to work immediately, no matter how hard it is, and when the job is finished he likes to take a couple of batteries out of his pocket and shout childishly “batteries included!”.

Another kid similar to Python is Ruby. They are very much alike but, interestingly, they like to compete. If one of them does something, the other will try to do the same, only quicker and nicer, to show that he’s better. They even dress similarly, except that Ruby likes to wear red. When they have to do something in parallel and Python is faster than Ruby, he likes to say that Ruby’s red shirt is in fact woven from green threads. If, on the other hand, Ruby wins, he can jump around Python all day making fun of him by shouting “Global Interpreter Lock!”. Ruby’s dad is from Japan and he really likes his kid. He’s so protective, and sometimes worries so much about his child’s security, that the child can’t develop normally because of him. C is also like an older brother to Ruby and helps him a lot when he’s stuck.

PHP is one of the weirdest kids out there. He’s there, he loves C, but he rarely hangs out with him or asks for help, mostly because he never needs it. He’s the smallest, but also one of the fastest, liveliest and most popular kids out there. Python and Ruby want to be as popular and fast as he is when they grow up. Also, PHP doesn’t respect anyone; he is kind of a rebel and has his own ways. One time, for example, the kids wanted to build a sand castle. They all gathered and started to talk about how to do it. They mentioned “frameworks”, “scalability”, “paradigms”, “design patterns” and all that other stuff kids talk about when they build sand castles, and suddenly, in the middle of the talk, they turned and saw that PHP had already built his castle. He just said “Architecture, who needs that” and built it. It might not last long and you can’t build a new floor on top of it, but ironically, his castle was better and more stable than any of the other castles.

Of course, not all the kids like C and playing with him. There is this kid called Java. Although he depends on C, he thinks he’s better than him and doesn’t like to ask him for any help. Yes, he respects him, but he thinks he can do everything on his own. He doesn’t like to play with other children and is very introverted. This is because when he was younger he was extremely fat and slow, and the other kids always made fun of him. He has not been accepted since then. All the kids remember the day when a long-bearded man dressed like a hippie, calling himself RMS, came to them and told them they should avoid Java because he is not open to other kids, going on and on about the trap they would fall into if they hung out with Java. This made Java develop an inferiority complex. The inferiority complex soon grew into a superiority complex, and that explains a lot of his behavior. Since then, Java has tried hard to overcome his obesity problem and, although he’s slim now, the scars from bullying and the wrinkled skin are still visible. Even today he tries to be more open, but it seems to be all in vain. He doesn’t even want to hear about the other kids. He created his own tools, his own toys that are not compatible with the other kids’ toys, and even his own part of the playground, which he calls open and accessible to everyone. He tries to lure other kids to join him, but they know that once you enter his part of the playground, there is no going back. For lack of other kids’ company, he artificially created his own kids from his special DNA called JVM and now plays with them.

There is one other kid who also thinks he’s too good to hang out with C. His name is C#. He is just an ordinary kid, but he thinks that somehow he is better than the other kids. He wears a corporate suit with a pink tie and always keeps his head high. He doesn’t speak with other children; to him they are all stupid and immature. He is always surrounded by his fathers, who also wear corporate suits and forbid him to play with other kids. He is very spoiled, because he’s very rich and his fathers buy him everything he wants. His suit is always clean because he really doesn’t want to play very much. If, for example, he needs to climb a tree, he just calls one of his dads and orders him to buy a ladder. Like Java, he has all his tools built for him by his fathers, as people from the community rarely donate any tools to him. The other kids and the other kids’ fathers despise him because of his attitude and don’t want to have anything to do with him. The only ones that adore him very, very much are some other elders who also wear corporate suits with pink ties, because they like that he is always safe and secure with his fathers.

Oh, I almost forgot one other kid. His name is Visual Basic. Unfortunately, he is retarded. He just sits all day long by the sandbox, with his head low, drooling in sand and hitting himself in head with his hand. Poor kid.


Programming to the people

At my work, as an exercise, we needed to take some time and come up with a “vision”. This is the complete, unedited text of the vision I have always had and dreamt about; only now did I need to put it into words and present it to colleagues. I am sharing it now with the world.

My vision is simple.

My vision is that my boss gets fired.

Did I get your attention? I hope I did, because when you skimmed over this text, you probably thought “Oh my God, look how long this guy’s vision is, who is going to read all that”. OK, let me clarify this some more for you: I also want myself to be fired; I want my colleagues to get fired; I don’t want programming to exist at all!

People to the programming

 

This may sound pretty radical to you, but that is intentional (and by the way, if it does, thank you). Joking aside, I want you to look at the current state of “computing”, or whatever you want to call it. Today you get a box we call a computer, and this box is driven by something we call an OS. On top of that, we have a bunch of programs we use to accomplish something. These programs are real little gems. Each and every one of them is designed to have some purpose. All of them are hand-crafted, nourished, polished and constantly taken great care of. All of them required human labor to create and to keep updated. Real calorie-burning, sweat-in-the-pants, hemorrhoids-in-the-butt type of labor. Never underestimate even the simplest of programs, such as Notepad, as each one of them is a little piece of art, where every line has been looked at thousands of times, every condition triple-checked, every bug opened and resolved three times before finally being closed! Is this sustainable? I think not. And that’s only one of the problems. The bigger problem, as I already said, is that all of those programs, no matter how big they are, have a static purpose: they let you solve one problem. No matter how complex they are (think of Excel), you are the one who needs to drive them and you are the one who needs to be creative and thorough to solve your problem; programs are merely helpers. And to make things even worse, your problems are usually not solved by one program alone: you need to switch constantly between something called “windows” to get your work done.


To conclude: programs are stupid helpers and you are the one doing all the work, not the programs. You… you, my friend, should be making choices, not solving simple problems! So, I say:

Programming to the people

 

Imagine now what it would be like for a machine to be more than just the “compute” in “computer”. Imagine an AI built on those machines, so advanced that it is capable of thinking like a human, but at the same time powerful enough to complement man in what we are not good at: “computing”. I am aware that this idea is not new and is in fact a cliché; almost every SF book or movie has an AI embedded somewhere in it. But let’s forget for a moment Arthur Clarke’s HAL 9000, Asimov’s Multivac, Skynet from Terminator and hundreds of others. They are all sweet and nice, but if you remove the characters and plots, or the evilness of the AIs, you are left with one idea, and the sole purpose of this vision is to emphasize that idea. Having AI means that your machine could be converted from a set of unconnected, isolated, primitive tools into a real assistant, one that understands the context of the problems you are trying to solve. Just imagine the freedom you would have if you could articulate your problem to an AI and let it solve it for you. The implications of this are enormous. First of all, people will have more time to focus on their businesses and on important decisions; the AI is the one that will do the boring stuff. Second, there will be no programming as a concept and no developers; in fact, everybody will be a developer in some way. Mary from Southborough, England will not have to bug her shy geek friend to write her a program that renames her vacation pictures by date pattern (she would not have sex with him anyway), nor would she need to search the internet for programs called “Super JPEG Renamer 3000” just to extract EXIF data from pictures and batch rename them. She would just explain to her computer what she needs to accomplish.
Deepak from Bhopal, India, an avid writer and former developer, will not have to hire an IT company or fiddle with WordPress just to create his blog explaining why the new AI technology sucks. He will ask the computer to do all the boring stuff, and he will just pick a domain, theme and options and start writing immediately. Larry from Louisiana, USA, also known as “Fatty McFatFat” on RottenTomatoes, 35 years old and still living in the basement of his parents’ house, always wanted to be a movie director. He doesn’t need a big studio or fancy programs to create CG effects; he will direct a whole movie from the comfort of his couch in the basement. Cristobal from Santiago, Chile has a charitable organization that collects second-hand clothes, and he wants to know what type of clothes people need most and for which gender. He has all the data, but he doesn’t know how to query the database to get that information. The computer is there to figure out the existing schema for him and to obtain any information he requests from it.


As you have probably noticed, I am giving examples as if this AI existed today. In reality (or at least, in a theoretical reality), decades will pass before we have that AI, and a lot of crucial things will change. The possibilities I am presenting here are probably ridiculously elementary and unimaginative, but the constant is the same: the computer should free us. Are we going in the right direction? Well, I think it depends on how you look at things. I feel we are more interested in optimizing the lovely little dumb tools that we currently have than in investing in this approach. On the other hand, with the current state of our knowledge of physics, the materials we use and the state of software engineering, I don’t think we could do better at this point anyway; it will take time for conditions to evolve before we start going there. Anyhow, the child in me likes to think that this is the future and that this future is near… even if it means I would be part of a layoff as technologically redundant.

Power to the people, right on

 

Similar to how we brought books from the monasteries’ elite to the peasants using Gutenberg’s machine, similar to how we provided electricity to those who couldn’t tell the difference between AC and DC even if it hit them in the head, and similar to how we brought computers even to people who were not wearing thick glasses and didn’t have tics, the goal for the centuries to come is to bring programming to the masses. And this is the next step to really empower people. I only hope that there will come a day when human civilization will see and speak to a real Multivac and not just read about it. And as for the reading: thank you for reading this!


Allow access to whole S3 Bucket to IAM user

It took me a while to figure this out. Googling helped, but the answers are not obvious. So, you have an IAM user and you want to grant that user complete read-write access to some bucket. The catch is that you need two statements to achieve this. Here is the full bucket policy (just replace the account ID “821707826313”, “YourIAMUser” and “YourBucketName” in the policy below):

{
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Principal": {
                "AWS": "arn:aws:iam::821707826313:user/YourIAMUser"
            },
            "Resource": [
                "arn:aws:s3:::YourBucketName"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Principal": {
                "AWS": "arn:aws:iam::821707826313:user/YourIAMUser"
            },
            "Resource": [
                "arn:aws:s3:::YourBucketName/*"
            ]
        }
    ]
}

So, the explanation now: as I already mentioned, notice that we have two separate statements (lines 3-14 and 15-28).

  • The first one allows the IAM user to list the contents of the bucket (“s3:ListBucket”, line 6), and the resource given here is just the plain ARN of the bucket (line 12).
  • The second statement gives that IAM user permissions on the objects in the bucket (lines 18-20), but the resource given here is the path to your bucket plus “/*” (line 26). This is the key thing I was missing when trying to create the policy using the AWS policy tool.
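
If you need this policy for more than one bucket, the two statements can be generated instead of edited by hand. Below is a minimal sketch in Python; the function name build_bucket_policy and the account ID 123456789012 are placeholders of my choosing, and the output has the same shape as the policy above.

```python
import json

def build_bucket_policy(account_id, user_name, bucket_name):
    """Build the two-statement policy: one on the bucket, one on its objects."""
    principal = {"AWS": "arn:aws:iam::%s:user/%s" % (account_id, user_name)}
    bucket_arn = "arn:aws:s3:::%s" % bucket_name
    return {
        "Statement": [
            {   # bucket-level permission: the resource is the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Principal": principal,
                "Resource": [bucket_arn],
            },
            {   # object-level permissions: the resource is the bucket plus "/*"
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Principal": principal,
                "Resource": [bucket_arn + "/*"],
            },
        ]
    }

# Placeholder values; substitute your own account ID, user and bucket.
policy = build_bucket_policy("123456789012", "YourIAMUser", "YourBucketName")
print(json.dumps(policy, indent=4))
```

The printed JSON can then be pasted into the bucket policy editor the same way as the hand-written version.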

Hope this helps you!
