At the confluence of machine learning and library science

I am taking a class titled “Information Retrieval” this semester. It covers general topics about how to organize information so that it can be easily searched, retrieved, and used later. Much of the content overlaps with previous coursework I’ve had on databases and machine learning, but with a different emphasis.

I’m really enjoying the assignments in the class so far. In the first one, we were given a collection of scanned postcards to analyze. We looked at them in batches of three (randomly selected), and then were asked to come up with an “attribute” that was true for two of the postcards but false for the third. After doing this 20 times, we each had a list of 20 descriptive postcard attributes; this process was referred to as “attribute elicitation.”
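
The batching step of that exercise can be sketched in a few lines of Python. This is only an illustration: the postcard identifiers, the `sample_triads` helper, and the fixed seed are made up here, and the attribute itself still has to come from a person looking at each batch.

```python
import random


def sample_triads(items, rounds=20, seed=0):
    """Return `rounds` random batches of three distinct items each."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return [rng.sample(items, 3) for _ in range(rounds)]


# Hypothetical identifiers; the real postcard collection is not listed in the post.
postcards = [f"postcard_{i:02d}" for i in range(1, 31)]
triads = sample_triads(postcards)

# For each batch, a person would write down one attribute that is true for
# two of the cards and false for the third. The code only deals the batches.
for number, triad in enumerate(triads, start=1):
    print(number, triad)
```

Twenty rounds yield twenty batches, and therefore twenty candidate attributes, though nothing stops the same card from turning up in more than one batch.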

I was delighted! The representation question is at the heart of machine learning, too, but we rarely (if ever) are given the chance to MAKE UP THE ATTRIBUTES ourselves. (Unless, of course, it’s a data set that we’re creating, which is also rare.) I felt such freedom. At the same time, I realized that the objective wasn’t quite the same. In machine learning, you want a representation that maximizes your later ability to classify, cluster, or otherwise analyze the data. In library science, you want a representation that maximizes your later ability to find particular items that satisfy a query. This perhaps boils down to discriminability vs. findability.

This goes deeper than it may seem at first. Often the machine learning (ML) task at hand is one of classification, in which case the universe of classes of interest is known in advance. Each item can be assigned to one of those classes. The representation can be (and sometimes is) optimized to maximize performance in classifying the known classes of interest. One of the latest trends in ML is to use “deep learning” to manufacture the representation automatically.

For information retrieval, in the sense used by library science folks, the classes of interest are not known, nor is the goal to craft an automated classifier for future data. Instead, the system (and representation) should support a potentially unlimited variety of future user (human) queries about any of the items in the collection. Success is measured not by classification or clustering accuracy, but by how many queries successfully locate the desired item or items, and how easily this is achieved (from the user perspective).

Has anyone tried to apply deep learning to library collections? Would it be useful here?

There is a terminology shift between the fields, too. The process of deciding on a representation is called “attribute elicitation” in library science, not to be confused with “feature extraction” in ML, which means the (automated) calculating of feature *values*. That process (assigning attribute values to items), in turn, is called “indexing.” (After creating 20 attributes, we then indexed 10 postcards by filling in their values for each of the attributes.) In ML we generally don’t get to do that, either, and especially not in a manual fashion. It was fun!
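
To make the distinction concrete, here is a minimal sketch of indexing followed by a query, assuming simple true/false attributes. The attribute names, the postcard ids, and the `find` helper are all invented for illustration; they are not the ones from the class.

```python
# Hypothetical attributes, as might come out of attribute elicitation.
ATTRIBUTES = ["color_image", "shows_building", "has_handwriting"]

# "Indexing": a person fills in a value for every attribute of every item.
index = {
    "postcard_01": {"color_image": True, "shows_building": True, "has_handwriting": False},
    "postcard_02": {"color_image": False, "shows_building": True, "has_handwriting": True},
    "postcard_03": {"color_image": True, "shows_building": False, "has_handwriting": True},
}


def find(index, **wanted):
    """Return ids of items whose indexed values match every requested value."""
    return sorted(
        item
        for item, values in index.items()
        if all(values.get(attr) == value for attr, value in wanted.items())
    )


print(find(index, color_image=True))
# ['postcard_01', 'postcard_03']
print(find(index, shows_building=True, has_handwriting=True))
# ['postcard_02']
```

Nothing here is learned from data. The representation is only as good as the attributes chosen and the care taken by whoever fills in the values, which is the findability concern rather than the discriminability one.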

Going through the attribute elicitation and item indexing process raised other questions for me. It quickly becomes obvious that some attributes are easier or faster to “compute” than others, even for humans doing the task. “Color image” vs. “not a color image” is an easy decision, but “picture of a French location” can be far more difficult, relying as it does on deeper domain knowledge and deeper analysis of the image.

Should we prefer those that are easier to compute, all other things being equal? If you assume human indexers, then it seems you’d also prefer attributes that are most likely to be consistently computed by different people. We talked briefly about the “indexing rules” (crafted by humans) that go along with any such representation, to help with consistency. However, there was no discussion about informativeness, discriminability, or other properties that would guide you in selecting the best attributes to use. Perhaps we’ll get to that later.

Our next task is a group exercise in creating a database catalog of any objects we like, other than books. My group has chosen candles, and we’re now discussing what the most useful attributes might be; what might one like to search on, when in need of a candle? Or candles?

We’re only required to input five (five!) items into the final database. If we manage to get a few more in there, I’m tempted to do a clustering or PCA analysis and examine the distribution of candles that we end up with. :)

How to fail to lease a car (in practice)

In July, I did a couple of test drives with the Leaf and other electric vehicles. In early August, I settled on the Leaf as my car of choice. Not one to be rushed into a purchase before doing my research, I passed up the last day of the August window for getting a free home charging station, figuring I would come back in September if I really wanted that car.

This may have been a mistake. (Or maybe not.)

That same day (August 10), I began negotiations with Rudy, my salesperson from Nissan Duarte, for a lease price. We dickered over the price a little by email, and then we agreed that he’d contact me again on September 1 with updated prices “in case anything changes.” I would then be poised to swoop in on September 3, the first day that the free charging stations would be available that month, and drive away in my Leaf.

In the meantime, I called two other local dealerships and compared prices; they were all about the same. I also belatedly realized that the $2500 CA electric vehicle rebate was rapidly exhausting its funds. Would they last until September?

Late in the month, I did a bunch of research on leases so as to be a savvy consumer and learned that the prices I’d gotten were much inflated. But now I was equipped to really negotiate.

Monday
On September 1, I failed to receive updated numbers from Rudy. So I called him on Monday, September 2. I negotiated the price down. I was excited. This was working!

“You have a car for me?” I asked.

“Oh yes,” he said. “No problem.”

“In the color I want? Blue?” I realized he’d never asked me what color.

“Yes, we’ll have it. When do you want to make your appointment for?”

Tuesday
On September 3, I woke up all psyched to be getting a new car at my 7 p.m. appointment. All was going well until 2 p.m., when I received this email from Rudy:

Hi i know your at work so i did not want to call with this important information.
I am very sorry but the sv leaf that you we were out to get has been sold by the dealership. Is there a second color option?

I called him immediately. He confirmed: they had SOLD THE CAR I HAD AN APPOINTMENT TO LEASE that day.

“It’s first-come, first-served, you know?”

No, I did not want another color. I wanted the car I’d spec’d out. I asked him to see if other dealerships in the area had it. His answer: No. There was not a single Leaf SV in blue with premium package to be had in the Los Angeles area.

I couldn’t believe it. I hung up and started calling. I called 6 dealerships in an ever-widening circle, despite knowing they must be checking the same inventory database. No. No one had that car. I was stunned. THEY SOLD MY CAR!

A couple of hours later, I get a call from Nissan Alhambra. Miguel tells me that they don’t have the car, but will be getting one that night or maybe the next morning. I quote the amount I’d negotiated with Rudy, and Miguel sputters and says that it’s way too low. So I try to call Rudy back and see if he can get that car transferred to his dealership. Rudy is “with a client” so I instead get Matt, who says, “Oh yeah, I’m the one who sold your Leaf.” Thanks, Matt.

Matt claims that Alhambra may be lying about their inventory, as he looked “30 days out” and no one is getting that particular car. I can’t figure out why he doesn’t contact them himself to find out. We go in circles until I finally say, “So, what I’m hearing is, you’re not willing to CALL Alhambra and get this car from them, if it does exist. So basically, you don’t want my business.” Matt then helpfully adds that they actually DO have the blue Leaf that I want, because someone bought it two weeks ago and just traded it in for a Sentra. But they can’t lease it to me because they can only lease new cars, not two-week-old cars.

I give up on Duarte and call Alhambra back to begin negotiating. I get Emmett, a manager. At first he tries to say something about MSRP, but I’m not having any of that and after some arm-twisting, he seems to switch out of salesperson mode and become this completely reasonable, straight-talking guy who’s a pleasure to work with. We end up at a number for the car that’s actually $500 cheaper than the price Rudy and I had reached. Unlike Rudy, he readily itemizes the “non-tax fees” so I know exactly what’s included. Emmett tells me that Danny will call me back the next day as soon as the car comes in.

Wednesday
I hear nothing from Danny, so I call the dealership at 6 p.m. Turns out that Danny wasn’t working that day. I ask for Emmett. He’s “with a client.” I get Omar, who transfers me to Thomas, an “inventory manager,” which sounds promising. Thomas says, “We have your car! Only it’s got the charcoal interior, not the light gray. Let me do another search and call you back.” He doesn’t. I call back at 6:30. Thomas says he found a blue Leaf with gray interior but needs to consult with Emmett to make sure the MSRP matches what we’d negotiated, and he’ll call me back in 5-10 minutes.

At 7 p.m., I call Thomas back. He ruefully reports that he “made a mistake” and there is no blue/gray Leaf. But they absolutely positively will have one by 10 a.m. the next morning (“just as you’re waking up!” he adds bizarrely), and that Emmett will call me.

Thursday
I don’t get any calls so I call Emmett at 11 a.m. He’s not in the office that day but takes my call on his cell. They don’t have the Leaf I want (model SV) and in fact the blue/charcoal one Thomas mentioned to me was actually a model S. Emmett hands me off to Ron, another manager, who is in the office that day. Ron also turns out to be a very reasonable, straight-talking guy. The bad news from Ron: There is no blue Leaf SV to be found. The nearest one is in Seattle. No one is getting one for at least the next 30 days, which means the CA rebate would undoubtedly be exhausted. Would I consider another color?

At this point I was so exhausted that I contemplated it. Silver. Would I be okay with silver? I detested the idea of pursuing a stupid quixotic quest that turned so pointedly on something as inane as the color of the vehicle. Did I really need blue? But if I got silver, would I regret it for the 36 months of the lease? Ron and I briefly discuss whether I can have the car PAINTED. Turns out, no: or at least, not without paying a fine at the end of the lease for modifying a car I don’t own.

And then a weird thing happened. See, the nearest silver Leaf SV + premium package was 55 miles away, barely within the Leaf’s range. And that one, too, had the charcoal interior, not a desirable attribute in southern CA. There was also a silver SV with the light gray interior, but it was 110 miles away, definitely not in the Leaf’s range. Yes, they could have it shipped. Shipped! It hit me like a bizarre reality check. I’d talked myself into how I could make a Leaf work, since most of my driving does fall within the vehicle’s 75-mile range. I think. But seeing how hard it was just to get the car within reach made me rethink that. I’ve been saying all along that I wanted the Leaf even though it wasn’t really the “practical” choice… and now, did I really want to bend over backwards to get a car that wasn’t really a practical car?

No.

I didn’t.

So I said, “No, thanks,” along with a “but you’ve been so great through all of this that I *wish* I were buying a car from you,” to which Ron laughed.

And I hung up the phone.

How to lease a car (in theory)

After researching and test-driving a number of electric and hybrid cars, I settled on the Nissan Leaf as my car of choice. Because electric car technology is still changing so much each year, and given the current battery range limitations (the Tesla Model S excepted) as well as the hefty incentives currently available, I determined that a car lease would be a better option for me than buying.

Benefits of buying a Leaf:

  • $7500 federal rebate
  • $2500 CA state rebate
  • Carpool use permission
  • Free home charging station ($1000 value)
  • Zero gas to buy (~$90/month for me)

Drawback (and yes, this is a biggie): 75-mile range.

At any rate, once I settled on the Leaf, I did some research to find out how to negotiate a lease, which was a new experience for me. I found some great advice on lease negotiation as well as how to find out empirical data on what people are actually paying for your car model in your area. (I ended up not needing to use this data, because the offer I got was sufficiently attractive already.)

Most educational (and confidence-building) for me was this information on how to calculate the monthly payment on a lease. It turns out that there is a standard formula, and it isn’t even complicated. You need to ask the dealer for some additional numbers, then plug them in. If your number and their offered number differ, insist that they justify the difference.

In the process of this calculation, you also learn the (effective, average) interest rate that you’re paying, even though the car is not “financed.” For me that was 4.272%.
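
For reference, the commonly published lease formula adds a depreciation fee to a finance fee, and the "money factor" a dealer quotes converts to an approximate annual interest rate when multiplied by 2400. The sketch below assumes that is the formula meant here. The car price, residual value, and term are hypothetical placeholders, not the numbers from this negotiation, and the money factor of 0.00178 is only an inference: it is the value that would correspond to the 4.272% quoted above.

```python
def lease_payment(cap_cost, residual, term_months, money_factor):
    """Pre-tax monthly lease payment under the commonly published formula."""
    # Depreciation fee: the value the car loses over the lease, spread evenly.
    depreciation_fee = (cap_cost - residual) / term_months
    # Finance fee ("rent charge"): money factor applied to cap cost plus residual.
    finance_fee = (cap_cost + residual) * money_factor
    return depreciation_fee + finance_fee


def money_factor_to_apr(money_factor):
    """Approximate annual interest rate, in percent, implied by a money factor."""
    return money_factor * 2400


# Hypothetical inputs, chosen only to show the mechanics.
payment = lease_payment(
    cap_cost=30000.0,      # negotiated car cost (assumed)
    residual=15000.0,      # value at lease end (assumed)
    term_months=36,
    money_factor=0.00178,  # assumed; would imply the 4.272% rate
)
print(f"monthly payment: ${payment:.2f}")
print(f"implied rate: {money_factor_to_apr(0.00178):.3f}%")
```

Because the cap cost appears in both terms, every dollar negotiated off the car price lowers the monthly payment directly, which is why the car cost is the number to argue about first.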

I also valued the advice to focus on the total car cost first, even though you are not purchasing the vehicle. The monthly payment is directly determined by the car cost that you negotiate, and it is much easier to understand the car cost than a monthly payment figure, in which small monthly differences balloon into BIG total cost differences. ($40/month for 36 months = $1440.)

Armed with this information, I set out to do battle. The salesperson initially quoted me a monthly payment of $335 for the car I wanted: the Leaf model SV with the “premium” package and NO OTHER OPTIONS. I found that you have to re-state NO OTHER OPTIONS frequently and with volume. I wanted the premium package because it includes a back-up camera. It also comes with three more cameras for full surround vision and a Bose stereo system, none of which I needed. Hooray for option packaging. This number also included a $309 “marketing assessment fee” which I indicated that I did not want to pay.

$335/mo
I then had to wait out the rest of the month so that I could time my purchase to get a free home charging station from Aerovironment. When I made contact again, the salesperson was offering a lower car cost ($1296 lower — and actually $1000 below invoice) with everything I wanted and without the marketing fee, for the suddenly lower and more attractive price of $299/month. I didn’t even need to use the average car sales data because the car cost was already lower than the empirical average.

That seemed a little weird (too good to be true?), so I went to work with the calculations. Using the numbers I’d been given and my math, the monthly payment worked out instead to $222/month. Wow! Where was the extra $77 coming from?

$299/mo
When I asked how he’d arrived at $299, the salesperson sent me a high-level list of costs along with a monthly payment that was mysteriously suddenly lower ($270), though still higher than expected. Where was the extra $48 coming from?

A variety of things:

  • VIN etching: $350. Some research indicates that this can increase the chance of recovery of a stolen car (it etches the VIN into the window glass, making it harder to disguise the car), but you get it for free in some locations or do it yourself with a home kit that costs $20-25. No thank you to this charge.
  • Documentation fee: $80.00
  • Non-tax fees: $921.75. Presumably this includes car registration and licensing. I looked up the cost for these items at the CA DMV website: $282. Plus, why would I (effectively) finance this charge? I decided to fold it in to my down payment instead.
$270/mo
My car insurance, of course, would go up to cover the new car. And a slight wrinkle: due to the range limitation, I would need to keep (and therefore maintain, license – $6/month, and insure – $31/month) my old car. However, that ’99 Nissan Sentra XE has a trade-in value of only $1400, so it wouldn’t have saved me much on the Leaf cost. My insurance cost (to cover both cars) would go up by $69/month, which is offset by the gas savings.

When all was said and done, I expected to pay ~$2500 to drive away in a car that was effectively free for 6 months, and then ~$230/month for the next 2.5 years. Not bad at all!

$230/mo

But that’s not what happened.

I learned that all the research in the world can’t help you if the car doesn’t exist.

My first day roving on Mars

I recently joined the Mars Exploration Rover team as a TAP/SIE (Tactical Activity Planner / Sequence Integration Engineer) for the Opportunity rover. That means it’s my job to sit in on the morning SOWG (Science Operations Working Group) meeting, in which the rover’s scientific goals for the day are set, and then work with payload, thermal, downlink, mobility, and other experts to come up with a plan to achieve those goals. What pictures will we take? When? Where will we drive? Is there enough power?

Reading the training documents only gets you so far. I’ve just begun “shadowing” the current TAP/SIEs so that I can learn on the job, watching over their shoulders through a day of planning. My first shift was on Wednesday, and it was supposed to be an “easy” day: pick one of two rock targets, drive towards it, and take some pictures looking backward at yesterday’s tracks.

Scientists dialed in from all over the country for the SOWG meeting. After some debate and consultation with the Rover Planners, they settled on the rock that had the easiest approach. The scientists signed off and we went to work building the plan.

The TAP/SIE’s job is facilitated by a bewildering array of scripts and tools. These allow for the setup, development, refinement, and checking of the plan. Are power or thermal constraints violated? Do we have enough onboard storage space for the new images to be collected and enough downlink allocation to get them back to Earth?

While the RPs (Rover Planners) settled in to their job of constructing the drive sequence, we worked on the full sol’s plan (a day on Mars, which is 24 hours and 40 minutes in Earth time, is called a sol). Very quickly we realized that the planned drive, despite covering only a couple of meters, would drain the rover’s battery dangerously low. Opportunity was starting the sol at only 80% charge because of two long instrument observations the previous sol.

We modified the plan to give the rover a morning “nap” in which it could sun itself and collect power, like a desert lizard. That helped the power situation, but not enough. Several iterations later, we finally squeaked by at 0.1 Amp-hour above the required threshold.
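
The kind of check involved can be sketched as a running battery budget. Everything in this example is hypothetical: the step names, the amp-hour figures, and the threshold are invented to echo the story and are not mission values, and the real planning tools model far more than this (the sketch does not even cap the charge at battery capacity).

```python
def simulate_charge(start_ah, steps):
    """Return the battery charge (amp-hours) after each (name, delta_ah) step."""
    charge = start_ah
    history = []
    for name, delta_ah in steps:
        charge += delta_ah  # positive = solar charging, negative = energy used
        history.append((name, charge))
    return history


def margin_above_threshold(history, threshold_ah):
    """Smallest margin between the charge and the threshold over the whole plan."""
    return min(charge for _, charge in history) - threshold_ah


# Invented numbers: start at 80% of a made-up 10 Ah battery.
START_AH = 8.0
THRESHOLD_AH = 5.0

plan_without_nap = [("drive", -3.0), ("imaging", -1.0), ("overnight", -0.4)]
plan_with_nap = [("morning nap", +1.5)] + plan_without_nap

for label, plan in [("no nap", plan_without_nap), ("with nap", plan_with_nap)]:
    margin = margin_above_threshold(simulate_charge(START_AH, plan), THRESHOLD_AH)
    status = "ok" if margin >= 0 else "VIOLATION"
    print(f"{label}: margin {margin:+.1f} Ah ({status})")
```

With these made-up figures the plan without the nap dips below the threshold, and adding the nap leaves it just above, which is the shape of the trade-off described here.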

Meanwhile, the RPs were growing concerned about a different problem. To reach its goal, Opportunity would have to straddle a rock that, while small by human standards, could pose a risk to the rover’s instrument arm, which dangles slightly down when stowed for driving. The RPs put their 3D simulation of the rover and the terrain up for all to see, and we stared at the screen while they spun the rover and tried to examine the rock from all angles.

“Can we raise the arm while it drives over the rock?” I whispered to the TAP/SIE I was shadowing. “It’s risky to do that,” she whispered back. “The arm bobs around, especially going over a big rock.”

A few minutes later, a scientist on the telecon asked, “Can’t we just put the arm up?” but was quickly shot down by the TUL (Tactical Uplink Lead, head planner): “Too risky.” The TAP/SIE and I grinned at each other.

Ultimately, it was deemed too dangerous to drive over the rock with our current data (images the rover had taken the sol before), and they decided to drive up to that rock and stop. Post-drive imaging would illuminate the obstacle in more detail.

At the end of our shift, which apparently was two hours later than usual, we had a plan. We ran it through multiple checks and re-checks and manually confirmed all of the sequences. The final walk-through was punctuated with “check!” coming from different areas of the room as each person confirmed that their part was correctly represented. The plan was finalized and transmitted to the rover using the Deep Space Network later that night.

There’s nothing like seeing a job in action. I learned a lot about the steps involved in planning and (unexpectedly) a lot of re-planning. For the rover, today is “tosol” and yesterday is “yestersol.” I got to practice the phonetic alphabet, which is used to communicate letters (in rover sequence ids) with a minimal chance that they will be misheard. I even got to help out a bit as a second pair of eyes to catch typos, spot constraint violations, and suggest alternative solutions. And I’ll be back on shift next Monday!
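
As a small illustration of the phonetic alphabet practice, here is a sketch that spells out an id letter by letter. It assumes the standard NATO/ICAO spelling alphabet is the one meant, and the example id `mb2` is made up, since the real sequence id format is not described here.

```python
# NATO/ICAO spelling alphabet (assumed to be the one used on shift).
WORDS = (
    "Alfa Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliett Kilo "
    "Lima Mike November Oscar Papa Quebec Romeo Sierra Tango Uniform "
    "Victor Whiskey X-ray Yankee Zulu"
).split()
NATO = dict(zip("ABCDEFGHIJKLMNOPQRSTUVWXYZ", WORDS))


def spell_out(sequence_id):
    """Replace each letter with its code word; digits and other characters pass through."""
    return " ".join(NATO.get(ch.upper(), ch) for ch in sequence_id)


# Hypothetical id, for illustration only.
print(spell_out("mb2"))  # Mike Bravo 2
```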

Opportunity is near Endeavour Crater, working its way along a ridge that is at the perfect tilt to keep its solar arrays pointed toward the sun. This is important because Winter Is Coming, even on Mars, and we want to keep it sufficiently powered to make it through to spring, its fifth spring on Mars. (Opportunity landed 9.5 Earth years ago!)

Test-driving the Tesla

I slid into the driver’s seat of the Tesla Model S with eager anticipation. Would it live up to all the hype?

Yes.

This is a car the way cars should be. The fact that it’s also an electric car catapults it into the realm of Sublimely Awesome, but even without that it would be a thing of beauty and utility. Everything in it works. The touch screen is responsive. The user interface is simple, clean, and beautiful. The sun roof is generous, the passenger space expansive, the cargo space eyebrow-raising. The seats are comfortable, the ride is smooth. The backup camera view is cinematic. Really, there is nothing NOT to like about this car. It’s practically the Platonic ideal of a car.

Here’s the dashboard, which consists of a simple customizable display in front of the wheel and a simply IMMENSE touch screen in the center console:


You can’t imagine how huge this thing is until you sit down next to it. It’s the size of a legal pad. However, it is often used in split-screen mode (as shown here) to give you, effectively, two displays. Its response is never jumpy, just smooth. It has magical powers, like a touch slider that opens the sun roof to the percent specified and other touch buttons that let you change the suspension of the car, while driving.

Oh, speaking of driving: this car moves. At the freeway on-ramp, I was warned to put both hands on the wheel, “because it takes people by surprise.” Boy, did it! You press on the accelerator and it leaves your stomach behind. I didn’t even get to floor it because we were already at the top of the ramp going 60 (zero to 60 in 4.2 seconds!) and I didn’t want to ram the cars ahead of me. I got onto the freeway and tested the acceleration from a starting point of 70 mph. Zoom! It *still* took off with the same unnaturally intense pickup (unnatural today because we don’t expect that from a car. BUT WE WILL.).

The Tesla battery design is a leap ahead of the other electric vehicles on the market. It is composed of 8000 individual lithium-ion batteries, each about the size of a AA battery. Loss of individual batteries doesn’t affect the overall performance, and Tesla (the company) constantly monitors your battery status and contacts you to bring the car in if there are failures. The batteries are liquid-cooled and therefore perform much better, especially on hot days, than other electric vehicles that use air-cooled batteries. The Model S also has a range of more than 250 miles, handily addressing range anxiety.

I still can’t bring myself to spend $62,000 (after rebates) on a car. But I’m almost persuaded that this car is actually worth it.
