Semantic Werks

Thoughts on people, machines and systems.

My sample Google/Microsoft interview question

with 4 comments

As in, one I have to deal with, as opposed to being asked.

Let’s say you want to calculate the powerset P of a list of items. For each subset of P you will do a non-zero amount of work, so we would like to make P really small. The size of P is … 2^N. Meaning of course for any useful input, P is enormous, heat-death of the universe type size.

We have some function f(s) that takes a subset member  of P (a set) and returns True or False. If True, we want to filter out all the subsets of the subset set.  E.g., if f([1,2,3]) = True, we would like to remove [1,2], [1], [2] etc. from P (so we don’t evaluate them as well). So the question is, what is an efficient way to do this? The naïve solution is to generate all the members of P, then iterate over them to remove the members that are subsets of a solution. But of course, for large N, this is infeasible to store and to do. Consider the case where the member we find is the set of all items. In this case we should stop our loop. A good solution will allow us to do this in constant time (and not by using a special case).

Update: I should make it clear that f([1]) = T and f([2]) = T does not mean f([1,2] = T.

Solutions I’ve pondered include using a bit matrix and bit masking, but I can’t see how to do that in constant time. We might also represent the subsets using a tree with related by subset, which would probably be log or sublog time.

One question I have is whether these questions are always checked for solvability. I mean, it is relatively easy to pose a question that is not solvable in deterministic polynomial time, and just as easy to pose one whose answer cannot even be checked for correctness. That would be a rather unfair question, wouldn’t it? Like, find a linear time algorithm for finding a truth assignment to the following formula. If you can solve that you should probably start your own company.

Written by Neil

2009 November 14 at 08:38

Posted in Uncategorized

Tagged with , ,

4 Responses

Subscribe to comments with RSS.

  1. Hey Neil, I probably don’t understand the problem exactly because the following solution seems trivial. First off, I assume that f takes an element of P (the powerset)… that is, f takes a set. In your post you say it takes a subset, but that would mean f takes a set of sets, no? I assume not. Anyhow, if f returns true on set S then you say the we must remove all of the elements in the power set of S from P. Presumably f also returns true for all the elements of the powerset of S if it returns true for S. If so, then it returns true for each of the original set items (the ones that which P is the powerset of) that appear in any S which f(S) returns true on.

    So solution: just run f on every element in the original set of items and filter out the powerset of the set of items for which f returns true. E.g. if f returns true for f(1), f(3), and f(5), subtract powerset([1,3,5]) from P and you’re done.

    What am I missing?

    Jon P

    2009 November 14 at 12:37

    • S is a set in P, I should fix that.

      The trouble is that f(1) = True and f(2) = True does not imply F([1,2]) = True. What I’m dealing with are requirements, so P is a powerset of requirements, and requirements might interact, e.g., requirement 1 makes it impossible to achieve requirement 2.

      We want to maximize the set of requirements we can achieve (where f() tells us if we can achieve that set). So we just remove subsets of it without knowing or caring what their individual results are.

      thanks for the comment!

      Neil

      2009 November 14 at 13:23

  2. Knuth’s Art of Computer Volume 4 on Combinatorics, Permutations and Combinations would give you the answer.

    As long as you are allowed to check each subset once and in order you can do this with the storage limit of a single job. How? Iterate through all combinations, one at a time, and then operate on each of those subsets.

    So as long as all you need to do is iterate and test, you’re good, you don’t need crazy memory.

    [1] Donald E. Knuth, The Art of Computer Programming, Volume 4, Fascicle
    2: Generating All Tuples and Permutations. Addison Wesley Professional,
    2005. ISBN 0201853930.

    [2] Donald E. Knuth, The Art of Computer Programming, Volume 4, Fascicle
    3: Generating All Combinations and Partitions. Addison Wesley
    Professional, 2005. ISBN 0201853949.

    [3] Michael Orlov, Efficient Generation of Set Partitions,
    http://www.informatik.uni-ulm.de/ni/Lehre/WS03/DMM/Software/partitions.pdf.

    also if you like perl: http://search.cpan.org/~fxn/Algorithm-Combinatorics/Combinatorics.pm

    Anonymous

    2009 November 14 at 16:04

  3. The math is a little beyond the time I have, but … this seems to address the space issue — won’t a generator do the same thing? — but I don’t see how it handles the time issue. For N items I still need to do 2^N operations, don’t I? I really need to filter the subsets of a given solution, so I never need to operate on them.

    Thanks for the links.

    Neil

    2009 November 14 at 16:25


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 384 other followers