Notes for Computer Security

I’m not a security expert. For a long time I used the same terrible password everywhere because I’m a lazy moron. But I finally cleaned up my act. The first thing to decide is who my adversary is. I’m not trying to protect myself against government agencies or violent crime lords; either would merely cut off my fingers until I gave up my password. I’m primarily worried about hackers grabbing passwords in bulk from web sites. I’m only a tiny bit concerned about hackers getting on my machine directly.

I used Diceware to come up with a 6-word password, which is about 77 bits (each word is drawn from a list of 7776, contributing log2(7776) ≈ 12.9 bits). I then changed a few characters randomly; each change adds roughly another 10 bits. I wrote it down on paper and keep it in my wallet. I use this as my master password for LastPass. In addition, I use two-factor authentication (2FA) with Google Authenticator to generate login codes (TOTP). Using LastPass I’ve generated unique, long, complex passwords for all the sites I use. Basically, there’s no way for me to log into anything without LastPass. For important sites that support it, I use 2FA with Authenticator. For those same sites I generated backup codes and stored them as secure notes in LastPass.

I use the LastPass app on my iPhone 5. It’s a pain to type in my master password, so I use a PIN code to protect it. I wish LastPass copied the iPhone’s Erase Data option: after 10 failed attempts it should delete the encrypted LastPass file. For someone to get this data, they need to get through both the iPhone PIN and the LastPass PIN. I should use a password instead of a simple 4-digit passcode on the iPhone.

On my Mac I use the LastPass browser extension and the LastPass desktop app. Neither of these is password-protected, and I turned off 2FA. I can set them to log off automatically after a timeout, but then I have to type in my master password every time. I’d prefer to enter a PIN code that falls back to the master password after 3 failures. Instead, the only thing protecting my computer is a terrible account password. I should change that. Again, I wish the Mac supported a PIN for quick login with a backup password. Or 2FA somehow.

My laptop is encrypted with FileVault. Backups run by CrashPlan are encrypted with my account password. The spare computer holding my backups has a terrible password and is not encrypted. I think that’s all OK so far, though I’m not sure why security people trust FileVault but not Microsoft’s BitLocker.

For some inexplicable reason, none of my banks (the most important sites to protect!) support 2FA. E*Trade does support Symantec’s VIP Access app, but why don’t they support Authenticator or just plain TOTP? Also, if I turn it on, I can’t use Quicken or Mint.com. They should support app-specific passwords that have read-only access to my accounts. This is a serious weakness.

I really don’t like Google Authenticator. I’d prefer to use Authy. It’s easier to use and it stores an encrypted backup of your codes to load onto another phone. But I don’t quite understand how they ensure the other phone is under my control. Could a hacker fool them into copying my Authy data to another device? Plus, their desktop app doesn’t feel like it’s in the spirit of 2FA.

LastPass is the center of my security universe. If someone breaks into LastPass, as happened recently, I’m pretty much screwed. Other people use KeePass because they control the encrypted DB themselves, but it isn’t as easy to use as LastPass. That’s a tradeoff I’m willing to make. I’d like to use a YubiKey once LastPass supports U2F, which would prevent phishing attacks. However, I’m still vulnerable if someone figures out how to write a page that extracts data from the browser extension. Also, I’m not comfortable with their account recovery schemes. I think there’s a way for an attacker to turn off 2FA and then force recovery using an OTP stored in your browser. This would work if they have your laptop, including email access. I should fix this somehow.

Overall this setup is moderately secure, but the NSA could break it in a jiffy.

3-way merge of PowerPoint with git

This doesn’t work for some reason, but I’ll make a note and look at it later. In the Mac’s Script Editor, the dictionary for MS PowerPoint 2011 says it has a method called mergeWithBaseline, which sounds like a 3-way merge. Below is a sketch of what the code should look like; for some reason I can’t get mergeWithBaseline to do anything.

#!bash
# pptmerge LOCAL REMOTE BASE MERGED
osascript -l JavaScript << EOF
var PPT = Application("Microsoft PowerPoint");
PPT.activate();
// Quote the shell parameters so paths with spaces survive the heredoc interpolation.
var local = Path("$1");
PPT.open(local);
// Guessing at the parameter types from the dictionary; these may need to be Path objects.
PPT.activeWindow.presentation.mergeWithBaseline({withRevisionPath: "$2", withBaselinePath: "$3"});
PPT.activeWindow.presentation.endReview();
PPT.save({in: Path("$4"), as: "save as presentation"});
PPT.quit();
EOF

In the .gitconfig there should be something like this:

[mergetool "powerpoint"]
    cmd = pptmerge \"$LOCAL\" \"$REMOTE\" \"$BASE\" \"$MERGED\"

It looks like you can select a different merge tool with “git mergetool --tool=powerpoint fileName”. Is there a way to trigger this by file extension (.pptx)?
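
One way to do that, though I haven’t tried it: git’s merge drivers (as opposed to merge tools) can be mapped to file patterns in .gitattributes, and a driver receives the base, ours, and theirs files as %O, %A, and %B, writing its result back over %A. If pptmerge is adapted to that calling convention (LOCAL=%A, REMOTE=%B, BASE=%O, MERGED written over %A), something like this should fire automatically during a merge:

# In .gitattributes:
*.pptx merge=powerpoint

# And in .gitconfig:
[merge "powerpoint"]
    name = 3-way PowerPoint merge
    driver = pptmerge %A %B %O %A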

Build an interpreter in .NET

A long while ago I noted that one could build an efficient language interpreter if the runtime could inline operations more aggressively. .NET has long had an attribute called MethodImplAttribute that gives the runtime hints on how to compile a method. In .NET 4.5 they added one more hint: AggressiveInlining, which instructs the JIT to inline the method if possible. It took some doing, but I verified that it does inline methods most of the time. This post talks more about it. Plus, at the bottom is a comment from (probably) a .NET dev that says, “It simply turns off all our heuristics, and causes us to inline the method if the JIT is capable of inlining the method. The reason the documentation is so vague is that there are limitations to what the JIT can actually inline…”. So it usually works, but not always.
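
As a minimal sketch of how the hint is applied (the interpreter-ish names here are invented for illustration):

using System.Runtime.CompilerServices;

static class Interp
{
    // Without the hint, the JIT's size heuristics decide; with it,
    // the method is inlined whenever the JIT is capable of doing so.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static int Add(int a, int b)
    {
        return a + b;
    }

    static int Run(int[] ops)
    {
        var acc = 0;
        foreach (var op in ops)
            acc = Add(acc, op);   // candidate call site for inlining
        return acc;
    }
}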

To determine whether inlining is working, you can scan the ETW output from the JIT as described here. I did the following:

REM Run wevtutil only once on your machine. It should work fine forever after. :)
wevtutil im C:\Windows\Microsoft.NET\Framework\v4.0.30319\CLR-ETW.man
REM Do the following each time.
logman start clrevents -p {e13c0d23-ccbc-4e12-931b-d9cc2eee27e4} 0x1000 5 -ets
REM RUN YOUR PROGRAM IN RELEASE MODE, NOT IN THE DEBUGGER
logman stop clrevents -ets
tracerpt clrevents.etl

Search dumpfile.xml for MethodJitInliningSucceeded and MethodJitInliningFailed events; they give the reason for what the JIT chose to do. To make this a bit easier, you might be able to use Reactive Extensions to monitor the logfile as described here.

The goal is to inline and optimize the interpreter directly into the program, thus creating a compiled program. This is the first Futamura projection. That’s already pretty good, but to get the 2nd and 3rd projections you’d need to do the inlining at the bytecode level at compile time, not at runtime. That way you could save the optimized assembly and specialize again. This should exist somewhere. I couldn’t find anything comparable in the JVM. Perhaps LLVM? Maybe Roslyn?

Storm w/ Reactive Extensions and Dataflow

The Apache Storm project is a distributed dataflow framework. It’s used by Twitter to process a continuous stream of tweets through a network of machines. Microsoft’s TPL Dataflow library is similar but works only within a single process. To get an approximation in .NET, I want to turn MSMQ into a Spout.

One solution is to convert MSMQ into an Observable that pushes elements into a Dataflow block. I packaged this into the constructor of a QueueSourceBlock&lt;T&gt;, which implements ISourceBlock&lt;T&gt;. Here’s the code snippet:

public QueueSourceBlock(string queueAddress)
{
    var queue = new MessageQueue(queueAddress);
    queue.Formatter = new XmlMessageFormatter(new Type[] { typeof(T) });

    // The TransformBlock unwraps each MSMQ Message body into a T.
    var tb = new TransformBlock<Message, T>(m => (T) m.Body);
    block = tb;

    // FromAsync wraps a single BeginReceive/EndReceive pair; Defer creates a
    // fresh one for each subscription.
    var queueObserveOnce =
        Observable.Defer(
            () =>
                Observable.FromAsync(
                    () => Task<Message>.Factory.FromAsync(queue.BeginReceive(), queue.EndReceive)));

    // While resubscribes forever, turning one-shot receives into a stream.
    queueObservable = Observable.While(() => true, queueObserveOnce);

    // Push each received message into the dataflow pipeline.
    queueDispose = queueObservable.Subscribe(m => tb.Post(m));
}

The FromAsync method receives only one message from the queue. The Defer method generates a new FromAsync call on demand, and the While method keeps calling the Defer forever. Whenever a message arrives from the Observable, it calls Post to push it into the TransformBlock, which extracts the data and sends it to the next node it’s linked to. This code doesn’t handle cancellation; AFAIK there’s no way to cancel a pending BeginReceive on a queue, but I could support cancellation in other places.
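
For reference, wiring it up might look something like this (the queue path and the OrderPlaced payload type are made up):

// Hypothetical usage: drain OrderPlaced messages from a local private queue
// and print them as they arrive.
var source = new QueueSourceBlock<OrderPlaced>(@".\private$\orders");
var printer = new ActionBlock<OrderPlaced>(o => Console.WriteLine(o.Id));
source.LinkTo(printer);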

Surprisingly, there doesn’t appear to be a way to split a stream in .NET’s Dataflow. They’ve got a JoinBlock that merges streams, but no corresponding split block. I think if you increase the parallelism in a block it behaves sort of like “shuffleGrouping” in Storm. Still, it’s a weird oversight.
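
For what it’s worth, a rough way to fake a split without a dedicated block is to link one source to several bounded workers: a full worker declines the offered message, so it flows to the next link, which load-balances roughly like shuffleGrouping. A sketch (Process is a stand-in):

var source = new BufferBlock<int>();
var opts = new ExecutionDataflowBlockOptions { BoundedCapacity = 1 };
var workerA = new ActionBlock<int>(x => Process("A", x), opts);
var workerB = new ActionBlock<int>(x => Process("B", x), opts);

// With BoundedCapacity = 1, a busy worker postpones the offered message
// and the BufferBlock passes it to the next linked target instead.
source.LinkTo(workerA);
source.LinkTo(workerB);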

Simple Dropbox client w/ Reactive Extensions

Reactive Extensions is ideally suited to the task of monitoring a directory for changes. The events from FileSystemWatcher are OK, but reacting to them naively isn’t efficient. The goal is to send as little data as possible to the server: if a file is moved or copied, there’s no need to upload the file again. Instead, you should recognize the event on the client and simply send a Moved or Copied message pointing to the original file on the server.

Here’s a simple prototype. I left out all the code that tracks files by their hashes. It gets a little trickier trying to track directories: if a directory is deleted, the entire subtree is deleted. You can use SQLite with the closure.c extension to track hierarchical data.

Anyway, this proof-of-concept is easy.

    public class DropboxClient
    {
        private readonly FileSystemWatcher watcher;

        public IObservable<DropboxEventArg> FileSystemObservable { get; private set; } 

        public DropboxClient(string home)
        {
            watcher = new FileSystemWatcher
            {
                Path = home,
                EnableRaisingEvents = true,
                IncludeSubdirectories = true
            };

            SetupRx();
        }

        private void SetupRx()
        {
            var changed = Observable.FromEventPattern<FileSystemEventArgs>(watcher, "Changed");
            var created = Observable.FromEventPattern<FileSystemEventArgs>(watcher, "Created");
            var deleted = Observable.FromEventPattern<FileSystemEventArgs>(watcher, "Deleted");
            var renamed = Observable.FromEventPattern<FileSystemEventArgs>(watcher, "Renamed");

            // Often it repeats a change event for every little change (timestamp, file size, etc).
            // Key on the path: the parameterless DistinctUntilChanged compares the EventPattern
            // wrappers themselves, which are never equal, so it would filter nothing.
            var dbchanged = changed
                .DistinctUntilChanged(fe => fe.EventArgs.FullPath)
                //.Do(UpdateRecord)
                .Select(fe => new DropboxEventArg(fe.EventArgs));

            // This seems to work fine, I think
            var dbrenamed = renamed.Select(fe => new DropboxEventArg(fe.EventArgs));

            // Deleted is ok, too
            var dbdeleted = deleted
                //.Do(DeleteRecord)
                .Select(fe => new DropboxEventArg(fe.EventArgs));

            // If file already exists, then a created file is a copy of another file
            var dbcreated = created
                .Select(fe =>
                {
                    if (FileExists(fe.EventArgs.FullPath))
                        return new DropboxEventArg(fe.EventArgs, DropboxChangeTypes.Copied);
                    else
                    {
                        //CreateRecord(fe);
                        return new DropboxEventArg(fe.EventArgs);
                    }
                });

            FileSystemObservable = dbchanged.Merge(dbrenamed).Merge(dbdeleted).Merge(dbcreated);
        }

        private void CreateRecord(EventPattern<FileSystemEventArgs> fe) {
            // Create row in repo
            throw new NotImplementedException();
        }

        private void UpdateRecord(EventPattern<FileSystemEventArgs> obj)
        {
            // If file size is different, rehash and update
            // If dir, maybe do nothing. Not sure.
            throw new NotImplementedException();
        }

        private void DeleteRecord(EventPattern<FileSystemEventArgs> obj) {
            // Delete file from repository
            // If directory, delete entire subtree from repo
            throw new NotImplementedException();
        }

        private bool FileExists(string fpath)
        {
            // If file, hash and lookup 
            // If dir, maybe do nothing
            return false;
        }
    }

    public enum DropboxChangeTypes
    {
        Changed,
        Created,
        Deleted,
        Renamed,
        Moved,
        Copied
    }

    public class DropboxEventArg
    {
        public DropboxChangeTypes ChangeType;
        public string FullPath;
        public string Name;

        public DropboxEventArg()
        {
        }

        public DropboxEventArg(FileSystemEventArgs fe)
        {
            FullPath = fe.FullPath;
            Name = fe.Name;
            switch (fe.ChangeType)
            {
                case WatcherChangeTypes.Changed:
                    ChangeType = DropboxChangeTypes.Changed; break;
                case WatcherChangeTypes.Created:
                    ChangeType = DropboxChangeTypes.Created; break;
                case WatcherChangeTypes.Deleted:
                    ChangeType = DropboxChangeTypes.Deleted; break;
                case WatcherChangeTypes.Renamed:
                    ChangeType = DropboxChangeTypes.Renamed; break;
            }
        }

        public DropboxEventArg(FileSystemEventArgs fe, DropboxChangeTypes ct)
        {
            FullPath = fe.FullPath;
            Name = fe.Name;
            ChangeType = ct;
        }
    }
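
A quick usage sketch (the directory path is invented):

    var client = new DropboxClient(@"C:\Users\me\Dropbox");
    using (client.FileSystemObservable.Subscribe(
               e => Console.WriteLine("{0}: {1}", e.ChangeType, e.FullPath)))
    {
        Console.ReadLine();   // keep watching until Enter is pressed
    }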

Probabilistic Reactive Extensions: ProbablyDistinct

Reactive Extensions has an operator called Distinct. As data streams through, it filters out any items it has seen before, allowing only unique items to pass to OnNext. The problem is that Distinct stores all the unique items in a HashSet, which means it will consume a lot of memory if the number of unique items is large. The solution is to implement a ProbablyDistinct operator that uses a Bloom filter to track the items seen so far. A Bloom filter is a very compact data structure that tests each item and replies “item is probably in the set” or “item is definitely not in the set”. In this case there will be some false positives (it says an item is in the set when it actually isn’t), meaning a genuinely new item will occasionally be filtered out. But it’s a tradeoff some applications might need, particularly long-running server apps that see lots of unique data items.

More broadly, it would be useful to have a few Rx operators that use probabilistic data structures. For example, HyperLogLog can estimate the number of distinct items using very little storage. Another useful operator is the opposite of Distinct (Indistinct? SeenItBefore?). This is how most people use a Bloom filter: rather than do an expensive DB lookup every time, first check the filter to see whether the item is probably in the DB, and only then run the query.
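
A tiny sketch of that last pattern (seenUsers is a hypothetical BloomFilter&lt;string&gt; with a MightContain method, and db.LookupUser is invented):

// Skip the DB round-trip for keys that definitely haven't been stored.
if (seenUsers.MightContain(userId))   // "probably in the set"
{
    var user = db.LookupUser(userId); // may still miss: false positive
    // ...
}
// else: definitely not in the DB, so don't bother querying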

Here’s the source for Distinct. All you have to do is replace the HashSet with a Bloom Filter (+ minor code tweaks).

        private static IObservable<TSource> Distinct_<TSource, TKey>(IObservable<TSource> source,
            Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer)
        {
            return new AnonymousObservable<TSource>(observer =>
            {
                var hashSet = new HashSet<TKey>(comparer);
                return source.Subscribe(
                    x =>
                    {
                        var key = default(TKey);
                        var hasAdded = false;

                        try
                        {
                            key = keySelector(x);
                            hasAdded = hashSet.Add(key);
                        }
                        catch (Exception exception)
                        {
                            observer.OnError(exception);
                            return;
                        }

                        if (hasAdded)
                            observer.OnNext(x);
                    },
                    observer.OnError,
                    observer.OnCompleted
                    );
            });
        }
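
And a sketch of the tweak itself, assuming a hypothetical BloomFilter&lt;TKey&gt; with Add and MightContain (any real implementation would do). AnonymousObservable is internal to Rx, so outside the Rx source this uses Observable.Create instead:

        private static IObservable<TSource> ProbablyDistinct_<TSource, TKey>(IObservable<TSource> source,
            Func<TSource, TKey> keySelector, BloomFilter<TKey> bloom)
        {
            return Observable.Create<TSource>(observer =>
                source.Subscribe(
                    x =>
                    {
                        bool probablyNew;
                        try
                        {
                            var key = keySelector(x);
                            // No false negatives: if the filter says "not present", the key is new.
                            // A false positive silently drops a genuinely new item.
                            probablyNew = !bloom.MightContain(key);
                            if (probablyNew)
                                bloom.Add(key);
                        }
                        catch (Exception exception)
                        {
                            observer.OnError(exception);
                            return;
                        }

                        if (probablyNew)
                            observer.OnNext(x);
                    },
                    observer.OnError,
                    observer.OnCompleted));
        }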

LINQ Unfold Operator

In Scheme’s SRFI-1 list library there are fold and unfold functions. The fold is similar to the Aggregate operator in LINQ, but there doesn’t appear to be anything like unfold in LINQ. What is it exactly? The Aggregate operator takes an IEnumerable and applies a Func to the elements until you get a single result value. The unfold operation goes the other way: it takes a single value and returns an IEnumerable. I needed this because I wanted an easy way to get all the parent classes and interfaces of a type T.

/// seeds: the initial data to unfold
/// stop: if stop(seed) is True, don't go any further
/// map: transform the seed into the final data
/// next: generate the next seed value from the current seed 
public static IEnumerable<R> UnFold<T,R>(this IEnumerable<T> seeds, Predicate<T> stop,
                                         Func<T,R> map, Func<T,IEnumerable<T>> next) {
    foreach (var seed in seeds) {
        if (!stop(seed)) {
            yield return map(seed);
            foreach (var val in next(seed).UnFold(stop, map, next))
                yield return val;
        }
    }
}

I’m not too happy about this code, but it should probably work. Now to produce all the parents of a type T:

var parents = new[]{someType}.UnFold(t => t == null, t => t, 
                                     t => t.GetInterfaces().Concat(new[]{t.BaseType}))
                             .Distinct();

For type theory nerds, this is called an anamorphism, which is the dual of a catamorphism. There’s a way to generalize this to arbitrary recursive data structures, described in the paper “A Fold for All Seasons” (PDF link). I’ll leave it to someone else to generalize this beyond IEnumerable.